Trust me, I'm a Data Scientist

Thoughts by Sarah Diot-Girard

EuroSciPy 2019

Introduction to tests for data scientists

We all know that we should test our code more, but somehow, we never seem to find the time. Test writing is sometimes perceived as tedious, boring, and unappealing, but it doesn’t have to be that way!

Fortunately, we can rely on testing practices developed by software engineers to help us test our code efficiently and with minimal hassle: splitting tests into levels (unit, component, end-to-end), using a Continuous Integration environment, automatically triggering test suites after changes…

Machine Learning algorithms also have specific characteristics that make them harder to test. We will discuss these and propose some solutions to mitigate the problems encountered when trying to test ML code.

This talk will focus on how to easily write tests and testable code, and will show the benefits of running tests on unrealistic data in a Machine Learning project.

(Tests on real data are also really important but they are not the main purpose of this talk.)
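
As an illustration of what a test on unrealistic data might look like, here is a minimal sketch. The function `min_max_scale` and both test names are hypothetical, chosen for this example rather than taken from the talk; the tests follow the pytest naming convention but use only plain `assert` statements, so they need no extra dependency.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] (hypothetical helper for illustration).

    A constant input has no spread, so it is mapped to all zeros
    instead of triggering a division by zero.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_scales_to_unit_interval():
    # Tiny hand-written input: the expected output can be checked by eye.
    assert min_max_scale([0, 5, 10]) == [0.0, 0.5, 1.0]


def test_constant_feature_does_not_divide_by_zero():
    # A constant feature is unlikely in real data, but easy to build by hand.
    assert min_max_scale([3, 3, 3]) == [0.0, 0.0, 0.0]
```

The inputs are deliberately tiny and artificial: the expected result is obvious, and an edge case such as a constant feature can be exercised without waiting for it to show up in a real dataset.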