Trust me,

I'm a Data Scientist

Thoughts by Sarah Diot-Girard

PyConLt 2019

From ML experiments to production: versioning and reproducibility with MLV-tools

This talk, given in collaboration with my coworker Stéphanie Bracaloni, introduces our open-source project MLV-tools.

You’re a data scientist. You have a bunch of analyses you performed in Jupyter Notebooks, but anything older than two months is useless because the notebook never runs correctly when you open it again.

You’re working with software engineers. They can’t imagine life without Git, reviews on readable files, tests, code analysis, CI. They are aghast that you cannot reproduce your Machine Learning analysis seamlessly. And when your team wants to bring anything into production, it’s a nightmare.

We had these kinds of issues in our company. Building on existing open-source solutions, we developed a set of open-source tools and designed a process that works for us. It helps us version Machine Learning experiments and smooth the path to production. We are thrilled to present our project and we hope to spark a discussion with the community.