Sarah DIOT-GIRARD

sdg@jlbl.net

github.com/SdgJlbl

Software Engineer - Machine Learning

With proven expertise in Machine Learning, I advocate for and implement Software Engineering best practices, and get companies ready to unlock the full potential of their Data Science models using the latest MLOps tools.

TL;DR:

12 years' experience · Python · Data Science · Tech Lead · MLOps · Kubernetes · Technical Mentoring · Reproducible Pipelines · Software Architect · Open-Source · NLP · Data privacy · Ethical AI

Since January 2023

Senior Machine Learning Engineer

Owkin

Developed Substra, an open-source Federated Learning software.

  • Contributed to the Python libraries, the Python / Django backend, the Go orchestration component and the Kubernetes packaging.
  • Improved the backend architecture by moving towards more isolated components.
  • Optimised performance when working on large datasets.
  • Contributed to various user-facing features (experiment dependency management, supporting flexible compute plans to pave the way for advanced federated analytics).
  • Restructured user documentation (following Diátaxis guidelines).
  • Worked on Kubernetes hardening of the application components.

Co-authored an internal white paper on Data Privacy and Federated Learning.

Delivered internal training for data scientists (on pipeline reproducibility, writing efficient tests…) and one-to-one technical mentoring.

Delivered an MVP for a malware scanner component on a critical timeline.

April 2021 - December 2022

Tech Lead - Credit modelling

Hokodo

Led the credit modelling team, implementing B2B short-term credit scores (BNPL).

  • Maintained dozens of distinct models in production.
  • Automated and streamlined modelling processes to reduce time-to-production.
  • Prototyped, then delivered improved algorithms and feature selection techniques, while satisfying the industry requirements on explainability.

Set up modern MLOps and software engineering practices.

  • Designed reproducible pipelines with DVC, model monitoring, and end-to-end ownership of model implementation and deployment.
  • Enforced code quality tools and processes (CI, unit tests, linting, reviews).
  • Led the refactoring effort of the legacy codebase, all while delivering value for the business.
  • Introduced the team to important software engineering principles (SOLID principles, hexagonal architecture...).

Supervised a team of 4 Data Scientists and ML Engineers.

  • Co-constructed the team technical roadmap, balancing technical and business constraints.
  • Mentored the junior members in the team through code reviews, pair programming and one-to-one technical coaching.
  • Organised coordination with the web engineering team and other stakeholders.

November 2017 - March 2021

Senior Data Scientist

PeopleDoc / UKG

Built an automated classification pipeline for HR documents, both text and scans, from scratch.

  • Led the project from the POC phase to production-grade deployment, focusing on MLOps practices (execution performance, model monitoring, ...) and codebase maintainability.
  • Designed the prediction API for integrating with other internal applications.
  • Worked with the UI/UX team to improve results interpretability for end users.
  • Created a POC to anonymise text data.

Built up and led a team of 5 people.

  • Hired, onboarded and mentored teammates (product owner, DevOps, ...).
  • Wrote technical specifications and coordinated technical tasks.
  • Championed a Machine Learning culture inside the company, including best practices around data privacy.

Developed MLV-tools, an open-source MLOps toolkit for easy Machine Learning pipeline versioning.

June 2016 - September 2017

Lead Data Scientist

WayKonect

Developed data-driven algorithms for connected cars.

Implemented the data pipeline from scratch, with a focus on code quality and reproducibility of results.

Designed the corporate data strategy to improve data gathering in the long term.

Developed a personalised coaching algorithm to improve driver safety and promote eco-driving.

May 2012 - June 2016

Machine Learning Research Engineer

Dassault Systèmes

Research and development on a rule inference engine (quality analysis on manufacturing processes).

Refactored legacy code, updated documentation and tests, improved rule intelligibility.

Improved predictive power of the rule engine using boosting techniques.

Added a prescriptive module for correcting poor quality outcomes.

6-month internship: Inference of gene regulatory networks from DNA chip data

October 2011 - April 2012

Research assistant

TU München / DLR

Implemented optimisation algorithms for Deep Neural Networks in Python.

Tech Stack

  • ML: pandas · NumPy · scikit-learn · PyTorch · skorch · Jupyter
  • Code quality: pytest · Hypothesis · black · Data Version Control (DVC)
  • DevOps: bash · git · Linux · Docker · Kubernetes · FluxCD · Argo Workflows
  • Other programming languages: Go · Rust

Education & languages

  • Engineering degree, ENSTA ParisTech, 2012
  • Master of Science "Robotics, Cognition, Intelligence", TU München, 2012
  • French (native speaker), English (C2), German (B2+)

Interests

  • Open-source contributor
  • Speaker at tech conferences (EuroPython, PyData, EuroSciPy, national PyCons)
  • Mentoring in data science and public speaking