Sarah DIOT-GIRARD

sdg@jlbl.net

github.com/SdgJlbl

Machine Learning Specialist

TL;DR:

Experience: 8 years · Data scientist, with a strong interest in DevOps · Python · NLP · MLOps · Data privacy · Ethical AI · Open-source

Since November 2017

Senior Data Scientist

PeopleDoc / UKG

Built an automated classification pipeline for HR documents, all formats, from scratch.

  • Improved HR users' day-to-day workflow and reduced error rates.
  • Created a POC model working on both text and scanned documents.
  • Wrote the production code, improved execution performance and codebase maintainability.
  • Designed the prediction API for integrating with other internal applications.
  • Worked with the UI/UX team to improve results interpretability for end users.
  • Set up feedback mechanisms to monitor ML model performance in production.
  • Created a POC to anonymise text data.

Built up and led a team of 5 people.

  • Hired and onboarded teammates (product owner, DevOps, ...).
  • Wrote technical specifications and coordinated technical tasks.
  • Mentored a teammate toward switching to a Data Scientist role.

Set up the Machine Learning team at PeopleDoc.

  • Promoted a Machine Learning culture inside the company, through demos, talks and workshops.
  • Supervised the development of ML-compatible processes and tools.
  • Initiated an ML mindset in different departments (hardware requirements, data access, ...).
  • Promoted best practices around data privacy.
  • Collaborated with other Data Science teams across the UKG group after the acquisition.

Developed MLV-tools, an open-source MLOps toolkit for easy Machine Learning pipeline versioning.

June 2016 - September 2017

Lead Data Scientist

WayKonect

Developed data-driven algorithms for connected cars.

Implemented the data pipeline from scratch, with a focus on code quality and reproducibility of results.

Designed the corporate data strategy to improve data gathering in the long term.

Developed a personalised coaching algorithm to improve driver safety and promote eco-driving.

May 2012 - June 2016

Machine Learning Research Engineer

Dassault Systèmes

Research and development on a rule inference engine (quality analysis on manufacturing processes).

Refactored legacy code, updated documentation and tests, improved rule intelligibility.

Improved predictive power of the rule engine using boosting techniques.

Added a prescriptive module for correcting poor quality outcomes.

6-month internship: Inference of gene regulatory networks from DNA chip data

October 2011 - April 2012

Research assistant

TU München / DLR

Implemented optimisation algorithms for deep neural networks in Python.

Education & languages

  • Master of Science "Robotics, Cognition, Intelligence", TU München, 2012
  • Engineering degree, ENSTA ParisTech, 2012
  • French (native speaker), English (C2), German (B2+)

Tech Stack

  • ML: NumPy · pandas · scikit-learn · PyTorch · skorch · spaCy · DVC
  • Code quality: pytest · black · type hints
  • DevOps: Ansible · Flask · (aiohttp) · (SQL) · bash · Linux

Interests

  • Open-source contributor
  • Speaker at tech conferences (EuroPython, PyData, EuroSciPy, national PyCons)
  • Mentoring in data science and public speaking
  • Horseback archer competing at international level