*Ethics for builders of data-based applications*

- Sarah Diot-Girard
- Working in Machine Learning since 2012
- Currently at
- Interested in ethics but not an expert


We want to help high schoolers find the perfect major for them.

Measuring academic performance

- Let's use grades!
- Let's use grades with a weight depending on the high school!
- Let's use grades...?

Learning from past **biased** data

Sampling bias

**Bag-of-words**

['nurse', 'physician', 'math teacher']

'nurse': [1, 0, 0]

'physician': [0, 1, 0]

'math teacher': [0, 0, 1]

**Word2Vec**

['nurse', 'physician', 'math teacher']

'nurse': [.91, .87, .2, ...]

'physician': [.85, .86, .35, ...]

'math teacher': [.53, .64, .78, ...]
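With dense embeddings, distances between vectors become meaningful, which is exactly how human-like biases can slip in. A minimal sketch comparing the toy vectors above with cosine similarity (the numbers are the slide's made-up values truncated to three dimensions, not real Word2Vec output):

```python
import math

# Toy embedding vectors from the slide (illustrative only, 3 dimensions).
embeddings = {
    'nurse':        [0.91, 0.87, 0.20],
    'physician':    [0.85, 0.86, 0.35],
    'math teacher': [0.53, 0.64, 0.78],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# In this toy space, 'nurse' lands closer to 'physician' than to 'math teacher'.
print(cosine_similarity(embeddings['nurse'], embeddings['physician']))
print(cosine_similarity(embeddings['nurse'], embeddings['math teacher']))
```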

*Semantics derived automatically from language corpora contain human-like biases*, Caliskan et al.

Don't hoard data!

Work on anonymized data if you can!
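One common first step toward working on less sensitive data is pseudonymizing direct identifiers with a keyed hash. A hedged sketch (the field names and key handling are hypothetical; note that pseudonymization is weaker than anonymization, since quasi-identifiers can still allow re-identification):

```python
import hashlib
import hmac

# Hypothetical secret kept outside the dataset; discarding it breaks the link
# between the pseudonym and the original identifier.
SECRET_KEY = b'change-me'

def pseudonymize(identifier):
    """Replace a direct identifier with a keyed SHA-256 hash."""
    return hmac.new(SECRET_KEY, identifier.encode('utf-8'), hashlib.sha256).hexdigest()

record = {'student_id': 'jane.doe@example.com', 'grade': 14.5}
record['student_id'] = pseudonymize(record['student_id'])
```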

P(Y = grad | S = s, G = b) = P(Y = grad | S = s, G = r)

For the same score, one has the same probability of graduating regardless of which subgroup one belongs to.

P(Y = grad | S > s₀, G = b) = P(Y = grad | S > s₀, G = r)

For a score higher than the threshold, one has the same probability of graduating regardless of which subgroup one belongs to.

P(S <= s₀ | Y = grad, G = b) = P(S <= s₀ | Y = grad, G = r)

If one will graduate, one has the same probability of getting a too low score regardless of which subgroup one belongs to.

E(S | Y = grad, G = b) = E(S | Y = grad, G = r)

The average score of graduating students is the same regardless of the subgroup.

P(S > s₀ | Y = fail, G = b) = P(S > s₀ | Y = fail, G = r)

If one will fail, one has the same probability of getting a too high score regardless of which subgroup one belongs to.

E(S | Y = fail, G = b) = E(S | Y = fail, G = r)

The average score of failing students is the same regardless of the subgroup.

P(S > s₀ | G = b) = P(S > s₀ | G = r)

The probability of having a score higher than the threshold is the same regardless of which subgroup one belongs to.

All those criteria seem fair and reasonable.

Bad news, you cannot have all of them!

*Inherent Trade-Offs in the Fair Determination of Risk Scores*, Kleinberg et al.

*Fair prediction with disparate impact: A study of bias in recidivism prediction instruments*, A. Chouldechova

*21 fairness definitions and their politics*, A. Narayanan
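Any one of the criteria above is easy to check empirically. An illustrative check of the last one (equal selection rates, i.e. demographic parity) on made-up scores, where `'b'` and `'r'` are the two subgroups from the formulas:

```python
# Toy (score, group) pairs and a decision threshold -- all values invented.
scores = [(0.9, 'b'), (0.4, 'b'), (0.7, 'b'), (0.8, 'r'), (0.3, 'r'), (0.6, 'r')]
threshold = 0.5

def selection_rate(scores, group, threshold):
    """Fraction of a subgroup whose score exceeds the threshold."""
    group_scores = [s for s, g in scores if g == group]
    return sum(s > threshold for s in group_scores) / len(group_scores)

rate_b = selection_rate(scores, 'b', threshold)
rate_r = selection_rate(scores, 'r', threshold)
# Both rates are 2/3 on this toy sample, so demographic parity happens to hold.
print(rate_b, rate_r)
```

The impossibility results cited above say that, outside degenerate cases, satisfying one such check generally forces another to fail.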

when you can have deep learning?

A cautionary tale


**Before building a model:**

visualisation (PCA, t-SNE),

exploratory analysis (clustering)
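As a minimal sketch of the visualisation step, PCA can be done directly with NumPy's SVD (in practice one would likely use scikit-learn's `PCA` or t-SNE; the dataset here is random toy data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))     # toy dataset: 100 samples, 5 features

Xc = X - X.mean(axis=0)           # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_2d = Xc @ Vt[:2].T              # project onto the first two principal components

print(X_2d.shape)  # (100, 2)
```

Plotting `X_2d` gives a quick look at cluster structure and outliers before any model is built.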


**While building a model:**

sparsity, rule-based,

prototype-based

From ELI5 documentation


**After building a model:**

surrogate models,

sensitivity analysis
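Sensitivity analysis in its simplest form means perturbing one input and watching how much the prediction moves. A hedged sketch on a stand-in black box (the `model` and feature names are hypothetical, not the talk's system):

```python
def model(features):
    """Stand-in black box: a fixed linear score over (grade, school_rank)."""
    grade, school_rank = features
    return 0.8 * grade + 0.2 * school_rank

def sensitivity(model, features, index, delta=1.0):
    """Change in output when feature `index` is increased by `delta`."""
    perturbed = list(features)
    perturbed[index] += delta
    return model(perturbed) - model(features)

features = [14.0, 3.0]
print(sensitivity(model, features, 0))  # the grade dominates the prediction
print(sensitivity(model, features, 1))  # school rank matters much less
```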

The less data you have, the less accurate you are.

Minority subconcepts get treated as noise.

- Evaluate on a separate dataset
- Fit preprocessing on the whole dataset
- Select the best algorithm on test data
- Use inappropriate performance metrics
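The preprocessing pitfall above is a form of data leakage: statistics fitted on the whole dataset let test information contaminate training. A minimal sketch of doing it correctly, with a hypothetical standardization step in plain Python:

```python
# Fit the scaler on training data ONLY, then reuse those parameters on test data.
train = [10.0, 12.0, 14.0, 16.0]
test = [11.0, 20.0]

mean = sum(train) / len(train)                                   # train-only mean
std = (sum((x - mean) ** 2 for x in train) / len(train)) ** 0.5  # train-only std

train_scaled = [(x - mean) / std for x in train]
test_scaled = [(x - mean) / std for x in test]  # same statistics, no refit
```

Refitting `mean` and `std` on train + test combined would make the held-out evaluation optimistically biased.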

- Apophenia
- Illusory causation
- Confirmation bias

Beware of feedback loops!

What if our algorithm is used to match all students with their "chosen" major, nation-wide?

- Data is not neutral
- Algorithms are not objective
- Data scientists are not exempt from bias

Thanks for your attention!

sarah.diot-girard@people-doc.com

We're hiring!

*Semantics derived automatically from language corpora contain human-like biases*, Caliskan et al.

*Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings*, Bolukbasi et al.

"How to make a racist AI without really trying", Rob Speer

*Inherent Trade-Offs in the Fair Determination of Risk Scores*, Kleinberg et al.

*Fair prediction with disparate impact: A study of bias in recidivism prediction instruments*, A. Chouldechova

*21 fairness definitions and their politics*, A. Narayanan

The ELI5 library

The FairML project

*“Why Should I Trust You?” Explaining the Predictions of Any Classifier*, Ribeiro et al.

*Weapons of Math Destruction*, Cathy O'Neil