Trust me,

I'm a Data Scientist

Thoughts by Sarah Diot-Girard

Aug 15, 2022

My favorite talks of PyData London 2022

In-person conferences are coming back in 2022! I had been meaning to attend PyData London for quite some time, and I finally got the opportunity this year. The venue is great: it is right by the Tower of London, with a magnificent view over the Thames to enjoy while nibbling scones between talks.

I also had the great pleasure of giving a talk at the conference, covering architecture best practices for handling I/O.

Here are some of my highlights of the conference.

Kishan Manani gave a great primer on feature engineering for time series, and on the common pitfalls to avoid when applying “regular” Machine Learning models to time series data - yes, the kind of models that expect i.i.d. data points. A very nice and clear presentation.
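To make the pitfall concrete, here is a minimal sketch of my own (not code from the talk): it turns a series into lag features so that each row only sees past values, and it holds out the most recent rows instead of shuffling. The function names, the toy series and the choice of lags are all assumptions made for illustration.

```python
def make_lag_features(series, lags=(1, 2, 3)):
    """Build (features, target) rows where each feature is a strictly past value."""
    max_lag = max(lags)
    rows, targets = [], []
    # Start at max_lag so that every requested lag exists for the first row.
    for t in range(max_lag, len(series)):
        rows.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return rows, targets


def chronological_split(rows, targets, test_fraction=0.25):
    """Keep the most recent rows for testing; a random shuffle would leak the future."""
    cut = len(rows) - int(len(rows) * test_fraction)
    return rows[:cut], targets[:cut], rows[cut:], targets[cut:]


# Hypothetical toy series, ordered from oldest to newest observation.
series = [10, 12, 13, 15, 18, 21, 25, 30, 36, 43, 51]

X, y = make_lag_features(series)
X_train, y_train, X_test, y_test = chronological_split(X, y)
```

Any “regular” regressor could then be fitted on `X_train` and evaluated on `X_test`, with the evaluation mimicking how the model would actually be used: predicting values that come after everything it was trained on.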

Next, Marysia Winkels shared her thoughts on data-centric AI. As models in Natural Language Processing and Computer Vision are becoming so huge that it makes little sense to try to improve them, the role of the data scientist shifts from building models to improving the quality of the datasets.

I found the concept of data-centric AI very neat: it helped me organise some random thoughts that had been floating around for quite some time. Marysia’s talk went beyond the purely theoretical, presenting very concrete use cases and showing how the data-centric lens helps to view those problems in a new light.

And last but not least, Adrin Jalali brought to the table very interesting questions around fairness in ML: how we can define it, why it is context-dependent, and why it is relevant to more than the “typical” applications one might think of when considering ethics (bail release, credit loans, job applications and the like).

I appreciated that the talk was accessible even to a non-technical audience, and I will definitely share it widely.