Hear from your colleagues, get to know the latest trends and offerings inside Adevinta. Find inspiration.
Put faces to all those Slack names and gifs. Get in touch with colleagues from other offices.
Cutting-edge technology around data, the maturity of the latest frameworks: learn about the hype.
Data-oriented workshops for different roles and levels. Get your hands on some action.
Our colleagues from ECG have already joined us. Take the opportunity to get to know new people and ways of using data.
Find colleagues and take the time to ask those questions you always wanted to ask.
We are working on building an interesting agenda full of great talks and discussions for you.
Stay connected for more info.
🎥 Recording
Welcome from the organisation committee and an introduction to Data Days
🎥 Recording
The teams behind Odin and Data Highway will share their backstory, the process and the context that led them to build these two incredible products, which are similar and different in equal measure.
Data Engineering Track
🎥 Recording
With the introduction of the European GDPR regulation in 2018, companies are required to change the way they process and store their customers' personal information (PII). This task is particularly challenging considering that big data platforms running on Hadoop-based systems were not designed for mutability. We will present how the Data Strategy team at mobile.de has built a GDPR-compliant data lake on the Google Cloud Platform.
Leveraging Delta Lake as a storage layer on GCS allowed us to implement the new requirements at petabyte scale. Using BigQuery as an access layer provides additional capabilities for managing fine-grained access to PII fields, which enables a privacy-first data approach without reducing comfort or speed for our users.
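To make the mutability point concrete, here is a minimal sketch of how a right-to-be-forgotten deletion can look with Delta Lake on object storage. The table path, column name and Spark configuration are illustrative assumptions, not the mobile.de implementation.

```python
# Illustrative only: erase one user's rows from a Delta table on GCS.
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable

builder = (
    SparkSession.builder
    .appName("gdpr-erasure-sketch")
    # Register Delta Lake's SQL extensions and catalog.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Assumed table location on GCS and an assumed `user_id` column.
events = DeltaTable.forPath(spark, "gs://example-bucket/delta/user_events")

# Delta's transactional DELETE is what makes erasure requests feasible
# on otherwise append-only object storage.
events.delete("user_id = '12345'")

# Deleted rows linger in old data files until they are vacuumed away.
events.vacuum(168)  # retention window in hours
```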
🎥 Recording
In the last few quarters we started from BumbleBee to get metrics from our experiments and made a lot of changes to compute advanced metrics.
Engineers adapted the pipeline to consume from either raw data or the fact layer where possible, and added a layer of prepared datasets from which we can compute all sorts of metrics, from simple counts to advanced aggregations (session length, number of sessions ending in a lead, dwell time, returning users, ...).
Analysts are working on dashboards in Tableau so we can also share our learnings from experiments with the marketplaces in an efficient manner.
The presentation will be given by a data engineer and a data analyst from P10N. We will briefly introduce our backend A/B testing setup, our adaptation of the BumbleBee pipeline and the Tableau dashboards built on top of these results (see the sketch below for the kind of aggregation involved).
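As a flavour of what a prepared-dataset layer enables, here is a minimal sketch of a session-level aggregation per experiment variant. The event schema and column names are assumptions for illustration, not the actual BumbleBee/P10N datasets.

```python
# Illustrative session metrics per A/B-test variant with PySpark.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("experiment-metrics-sketch").getOrCreate()

# Assumed prepared dataset with session_id, variant, event_type, timestamp.
events = spark.read.parquet("s3://example-bucket/prepared/events/")

sessions = (
    events.groupBy("session_id", "variant")
    .agg(
        # Session length: span between first and last event, in seconds.
        (F.max("timestamp").cast("long") - F.min("timestamp").cast("long"))
        .alias("session_length_s"),
        # Whether the session contained a lead event.
        F.max(F.when(F.col("event_type") == "lead", 1).otherwise(0))
        .alias("has_lead"),
    )
)

# Per-variant metrics that a Tableau dashboard could be built on top of.
sessions.groupBy("variant").agg(
    F.count("*").alias("sessions"),
    F.avg("session_length_s").alias("avg_session_length_s"),
    F.avg("has_lead").alias("lead_conversion_rate"),
).show()
```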
🎥 Recording
dbt (data build tool) is a tool for developing, testing and deploying data transformation jobs, bringing software engineering best practices to SQL development. In the eCG Personalization team we use dbt with BigQuery to provide user profile data points for our tenants and features for downstream ML models. In this talk we will present how we have built our data transformation pipelines and how they fit with the rest of our data stack. We will also show how dbt can reduce the engineering effort required to develop data products and democratize access to data.
🎥 Recording
At Leboncoin, we have close to 200 Kafka topics produced by microservices covering all domains of the organisation (ad publication, ad validation, authentication, transactions, messaging...).
These topics are used by business intelligence, analytics, and machine learning teams, but first they need to be transferred to a more "offline-friendly" storage with a proper query engine such as Spark or Athena.
Their schemas also tend to evolve quite rapidly, beyond what the central data engineering team is able to cope with.
This talk will delve into the challenges and the solutions we implemented to fully automate, using Kafka Connect, the process of discovering, normalizing, extracting, storing, and exposing those topics in our Hive metastore, without any human intervention along the way, in order to enable greater agility for our downstream teams and make the data engineering team less of a bottleneck when it comes to accessing data.
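As a rough illustration of the automation idea, a discovery job could register a sink connector for each newly found topic through the Kafka Connect REST API. The connector class, Connect URL, bucket and topic name below are placeholders, not Leboncoin's actual setup.

```python
# Illustrative only: register a sink connector for a newly discovered topic.
import requests

CONNECT_URL = "http://kafka-connect.example.internal:8083"  # placeholder

def create_sink_connector(topic: str) -> None:
    """Land one Kafka topic in object storage via a Connect sink connector."""
    payload = {
        "name": f"s3-sink-{topic}",
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "topics": topic,
            "s3.bucket.name": "example-datalake-bucket",
            "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
            "flush.size": "10000",
        },
    }
    response = requests.post(f"{CONNECT_URL}/connectors", json=payload)
    response.raise_for_status()

# A discovery job could call this for every topic it finds, then register
# the resulting location as an external table in the Hive metastore.
create_sink_connector("ad-publication-events")
```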
Data Engineering Track
We will go on a quest to see who can uncover more secrets and get to meet a more diverse group of people in our now bigger-than-ever family of data lovers. There will be prizes for the winners.
🎥 Recording
Welcome from the ML organisation panel and highlights of what's going on with ML in Adevinta
🎥 Recording
The first Kaggle Triple Grandmaster, Abhishek Thakur, will share his learnings, experiences and vision for the future of ML.
Machine Learning Track
🎥 Recording
In this talk we will present how we improved the performance of the most impactful product provided by Personalisation called Related Items, which presents users with similar ads to the one they are currently looking at. We currently generate recommendations for this product by a combination of two batch algorithms that run in an hourly fashion: a graph-based Collaborative Filtering (CF) algorithm and a text-matching Content-Based (CB) approach. This offline architecture has 2 main limitations: (1) low scalability / high computational costs incurred by precomputing all recommendations every hour, even though only 10-20% of them are actually served; and (2) the item cold-start problem, not being able to return recommendations for classified ads published in the last hour.
We addressed these limitations by moving towards an online hybrid recommendation architecture on top of Elasticsearch, powered by a neural recommender model called LightFM. This model learns item embeddings from user interactions, which are then used to retrieve related-items recommendations online via efficient approximate nearest-neighbour search. We will also talk about the novel streaming content ingestion pipeline built on top of Yotta's self-serve services, which is key to having access to near-real-time ad content data and to generating recommendations for new ads as well, allowing us to reach almost 100% coverage. The most recent A/B test on Segundamano.mx has shown a significant uplift in several user-centric online metrics, which validates the effectiveness of the new architecture not only in terms of cost reduction and coverage, but also in terms of user engagement.
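For readers unfamiliar with LightFM, here is a minimal sketch of the core idea: learn item embeddings from interactions, then retrieve related items by nearest-neighbour search. The toy data and the in-process index are placeholders; in the architecture described above, retrieval happens via approximate nearest-neighbour search in Elasticsearch.

```python
# Illustrative only: LightFM item embeddings + nearest-neighbour lookup.
import numpy as np
from lightfm import LightFM
from scipy.sparse import coo_matrix
from sklearn.neighbors import NearestNeighbors

# Toy user-item interaction matrix (rows: users, columns: ads).
rng = np.random.default_rng(0)
interactions = coo_matrix(rng.binomial(1, 0.05, size=(1000, 500)))

# WARP loss suits implicit-feedback ranking tasks.
model = LightFM(no_components=64, loss="warp")
model.fit(interactions, epochs=10, num_threads=4)

# Item embeddings learned from the interactions.
item_vectors = model.item_embeddings  # shape: (n_items, 64)

# An in-process index stands in for the Elasticsearch ANN retrieval.
index = NearestNeighbors(n_neighbors=10, metric="cosine").fit(item_vectors)
_, related = index.kneighbors(item_vectors[[42]])
print("Items related to item 42:", related[0])
```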
🎥 Recording
Connecting buyers and sellers in a safe and secure environment is one of the biggest challenges in online marketplaces. Probabilistic models built upon user-item databases address the challenge, but often encounter issues such as a lack of stability and robustness. These issues are magnified in fraud scenarios, where datasets are highly imbalanced and noisy, and malicious users deliberately adapt their behaviour to avoid detection.
In this context, we leveraged the power of the existing open-source machine learning libraries H2O and CatBoost and designed a pipeline to collect, process and predict the likelihood that a private seller's listing data is fraudulent. We found that the stacked ensemble model provides the best performance (F1 = 0.73) when compared to other commonly used models in the field. Further, our models are benchmarked on a public Kaggle dataset, the TalkingData AdTracking Fraud Detection Challenge, where we compared them to other studies and highlighted their generalizability and effectiveness at handling online fraud.
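As a small illustration of one ingredient of such a pipeline, here is a sketch of a CatBoost classifier on a synthetic, highly imbalanced dataset, evaluated with F1. The features, class weights and the H2O stacking step are simplifications and assumptions, not the actual pipeline.

```python
# Illustrative only: CatBoost on an imbalanced fraud-like dataset.
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for listing data, roughly 1% positive (fraud) class.
X, y = make_classification(n_samples=20000, n_features=30,
                           weights=[0.99, 0.01], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=0)

model = CatBoostClassifier(
    iterations=300,
    depth=6,
    class_weights=[1.0, 10.0],  # up-weight the rare fraud class
    verbose=False,
)
model.fit(X_train, y_train)

print("F1 on the held-out set:", f1_score(y_test, model.predict(X_test)))
```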
🎥 Recording
Our team (PnR) provides recommendations for various markets. In this talk we want to give an overview of how our solutions evolved, what the driving factors were, how they work, and the advantages and disadvantages of the different approaches we encountered along the way.
🎥 Recording
In this talk we will share our way of working and dive into our personalised homepage recommendations. The first half will focus on our ML development lifecycle, the tech stack, the challenges we faced and how we overcame some of them. In the second half we will show how we increased revenue and connections on Marktplaats by adding a personalised ranking layer on top of multiple existing recommender systems. We will discuss the details of its two core components, one for producing relevant content and one for adding a taste of inspiration. We will also provide an overview of the experiments we did in the past year to share our key learnings.
Machine Learning Track
Quiz Challenge: Who knows more about ML?
🎥 Recording
Welcome from the organisation committee and an introduction to Analytics Day
🎥 Recording
Prof. Dr. Arif Wider is a full professor of software engineering at HTW Berlin and a fellow technology consultant (part-time) with ThoughtWorks Germany, where he served as Head of Data & AI before moving back to academia.
Analytics Track
🎥 Recording
It was mid-2019 when the distributed data mesh concept developed by Zhamak Dehghani arrived at BIC, and in it we were able to identify some of the problems we were suffering from while building analytics data for central teams.
From that moment on, we slowly started to shift central teams' analytics data development to this new paradigm. It has been a long road, as it is not only about engineering or data; it has a big impact on data culture.
In this session we will get an introduction to data mesh, see how BIC is enabling it, and hear some learnings and failures from the journey.
🎥 Recording
An A-to-Z time series forecasting methodology including all steps from data preparation and exploratory analysis to model fitting, validation and selection (currently in the form of an R script; building a Shiny app is a work in progress). The idea is to help end users with no coding skills generate robust forecasts and to achieve standardisation and transparency of forecasting methods across the organisation. Tested on Vibrancy, SEO and Paid Digital use cases with promising results.
🎥 Recording
Drawing on the experience of Q3 and Q4 at Subito, integrating the Houston tool and starting the upskilling programme, we will show how challenging but rewarding it is to move an entire organisation towards a more data-driven product process and to create an experimentation culture.
🎥 Recording
The goal of self-service analytics is to enable everyone in Adevinta to leverage available data in making business decisions - not an easy task considering thousands of Adevintans are making decisions every day!
This presentation will give an overview of what weβve learned so far about self-service analytics, share best practices and showcase some of our most successful tools.
Analytics Track
TBC
Share your cool projects, interesting findings, or just your experience with some tooling with the rest of your colleagues.
We need your content to make Data Days great again. Do you want to speak?
Submit a proposal
Data Days is a globally managed event to bring all local marketplaces together. If you are an Adevintan, you are free to join.
We will cover the conference costs for you, but we ask for your commitment to be active and participate, and to book time to attend.
Infinite Tickets Available
Data Days events are community events intended for networking and collaboration as well as learning. We value the participation of every member of the data community and want all attendees to have an enjoyable and fulfilling experience. Accordingly, all attendees are expected to show respect and courtesy to other attendees throughout the event and in interactions online associated with the event.