Trust me,

I'm a Data Scientist

Thoughts by Sarah Diot-Girard

Substra

Substra is a Federated Learning framework, built for cross-silo use cases.

Substra was created to enable federated learning in cancer treatment research, allowing multiple hospitals to cooperate in building Machine Learning models while keeping their sensitive data private. It can be used in similar setups where a few big actors want to run algorithms on each other's data without accessing the data themselves.
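To make the idea concrete, here is a minimal sketch of federated averaging in plain Python. It does not use Substra or substrafl: the function names (`local_step`, `federated_average`, `train`) and the toy one-parameter model are assumptions made for illustration. Each silo takes a training step on its own data and shares only the resulting weight, never the data.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # one (x, y) observation held by a silo


def local_step(w: float, data: Sequence[Sample], lr: float = 0.05) -> float:
    """Run one gradient-descent step for the model y = w * x on a silo's private data."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def federated_average(updates: Sequence[Tuple[float, int]]) -> float:
    """Combine local weights into a global one, weighted by each silo's sample count."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(silos: Sequence[Sequence[Sample]], rounds: int = 50) -> float:
    """Alternate local steps and aggregation; only weights leave the silos."""
    w = 0.0
    for _ in range(rounds):
        updates = [(local_step(w, data), len(data)) for data in silos]
        w = federated_average(updates)
    return w


if __name__ == "__main__":
    # Two hypothetical hospitals whose data both follow y = 2 * x.
    hospital_a = [(1.0, 2.0), (2.0, 4.0)]
    hospital_b = [(3.0, 6.0)]
    print(train([hospital_a, hospital_b]))
```

In a real cross-silo deployment the local step would run inside each organisation's own infrastructure, and the aggregation would run on a node that only ever receives model updates.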

Substra is a Python/Django + Go + React application, deployed on a network of connected Kubernetes clusters. It's a toolbox providing solutions for the many problems that come up when developing such applications, from sharing the training code via Docker images to validating identities between clusters using certificate authorities and mTLS. It powers the substrafl library, which provides ready-to-use abstractions for implementing cutting-edge Machine Learning algorithms in a federated context.

Substra is a member of the Linux Foundation for Artificial Intelligence.

You can find the documentation here.

I joined the Substra project in 2023 as part of my work at Owkin, and have contributed to its maintenance and improvement since then.

MLV-tools (deprecated)

MLV-tools was deprecated in 2021, following the significant improvements brought by DVC 1.0.

MLV-tools is a toolbox to help with versioning and reproducibility of Machine Learning experiments. Its main features include:

You can learn more about the project on GitHub. Check out the tutorials!

MLV-tools is also available on PyPI.