Homepage of Sarah Diot-Girard

PyData London 2022

“Off with their I/Os!” - or how to contain madness by isolating your code

Engulfed in a tedious refactoring of your code, you’re adding the 7th layer of mocks to a test when you realise something must have gone wrong somewhere, but what? You’ve written readable code, split into functions and classes to avoid long chunks of code, and yet, every time, you end up with code that is hard to test, a test suite that runs for hours, functions with seventeen arguments, and you wonder if it’s you mocking the code or the code mocking you.
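As a rough illustration of what isolating I/O can look like (a minimal sketch, not taken from the talk itself): the computation lives in pure functions, and a single thin function touches the filesystem. The function names and the one-number-per-line file format are assumptions made up for this example.

```python
from pathlib import Path


def parse_amounts(text: str) -> list[float]:
    """Pure logic: turn raw text (one number per line) into floats."""
    return [float(line) for line in text.splitlines() if line.strip()]


def summarize(amounts: list[float]) -> dict[str, float]:
    """Pure logic: no files, no network, so it can be tested without mocks."""
    return {"count": len(amounts), "total": sum(amounts)}


def summarize_file(path: Path) -> dict[str, float]:
    """Thin I/O shell: the only function that reads from disk (hypothetical)."""
    return summarize(parse_amounts(path.read_text()))
```

With this split, the tests for `parse_amounts` and `summarize` only need plain strings and lists, and the I/O shell is small enough to cover with a single integration test.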

PyConDE 2019

Privacy-preserving text analysis

Data privacy is probably one of the most important challenges we are facing in Data Science. Applications are collecting more and more personal data and it is paramount to ensure anonymity. Privacy cannot be solved just by removing personal identifiers, and concepts such as k-anonymity have been developed to help with structured data. But what if you are working with unstructured text data? Things can get even trickier… This talk aims at presenting a few tips and tricks to ensure privacy when working with text, as well as identifying research questions that remain open. No silver bullet here, but hopefully a step in the right direction.
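For readers unfamiliar with k-anonymity on structured data, here is a minimal sketch, assuming a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k rows. The column names and records are invented for illustration and do not come from the talk.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows sharing the same
    quasi-identifier values (0 for an empty dataset)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Hypothetical records: names are already removed, yet a rare combination
# of age band and postcode can still single someone out.
rows = [
    {"age_band": "30-39", "postcode": "750**", "diagnosis": "A"},
    {"age_band": "30-39", "postcode": "750**", "diagnosis": "B"},
    {"age_band": "40-49", "postcode": "690**", "diagnosis": "A"},
]

k = k_anonymity(rows, ["age_band", "postcode"])
```

Here the third record is alone in its group, so the dataset is only 1-anonymous even though it contains no direct identifier.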

DjangoCong 2019

De la polysémie en milieu hybride - a glossary to facilitate communication between developers and data scientists

This short talk in French aims at highlighting in a fun way that many technical words are used both in data science and in web development, but with very different meanings - which can lead to misunderstandings when collaborating.

EuroSciPy 2019

Introduction to tests for data scientists

We all know that we should test our code more, but somehow, we never seem to find the time. Test writing is sometimes perceived as tedious, boring, and unappealing, but it doesn’t have to be that way!

PyConLt 2019

From ML experiments to production: versioning and reproducibility with MLV-tools

This talk, given in collaboration with my coworker Stéphanie Bracaloni, introduces our open-source project MLV-tools.

PyData Amsterdam 2019

How to easily set up and version your Machine Learning pipelines, using DVC and MLV-tools

Have you ever heard about Machine Learning versioning solutions? Have you ever tried one of them? And what about automation? Learn how to easily build versionable pipelines! This tutorial explains through small exercises how to set up a project using DVC and MLV-tools.

PyConFr 2018

Advanced NumPy usages

Python is known as a slow programming language. It is nonetheless very popular in the scientific community, and is used to perform massive numerical computations. How can that be? In a word: NumPy.

PyConFr 2018

Machine Learning workshop for beginners

This workshop is intended for Python developers with no previous experience in Machine Learning.

Company internal meetup

Introduction to Data Privacy

This talk is a high-level overview of the principles of data privacy. Featuring toucans, koalas and lemurs.

EuroPython 2018

Trust me, I'm a Data Scientist - ethics for builders of data-based applications

Data Science is gonna save the world, right? Or is it? Machine Learning epic fails are being widely discussed. It’s easy to convince ourselves that they are due to the careless misuse of Data Science. But is it really so? Is it possible for innocuous choices to lead an honest team to disaster?

I'm a Data Scientist

Thoughts by Sarah Diot-Girard
