Fairness, Part II
TRANSCRIPT
Giulia Fanti
Fall 2019
Based in part on slides by Anupam Datta
Administrative
• HW4 out today
  • Fairness + anonymous communication (next unit)
  • You will have ~3 weeks
• Presentations starting on Wednesday
  • Upload your slides to Canvas by midnight the night before so we can download them in the morning
  • Sign up for groups on Canvas so that we can assign group grades
  • Presentation rubric on Canvas!
  • Volunteer in SV to share their laptop on Wednesday?
In-class Quiz
• On Canvas
Last time
• Group fairness
  • Statistical parity / demographic parity
  • Ensures that the same ratio of people from each group get the "desirable" outcome
• Individual fairness
  • Ensures that similar individuals are treated similarly
  • Can learn a fair classifier by solving a linear program
Today
• When does individual fairness imply group fairness?
• Connections to differential privacy
• How do we take already-trained classifiers and make them fair?
Paper from last time:

[Diagram] V: individuals, O: outcomes, A: actions.
• The classifier (e.g., a tracking network) maps each individual x to an outcome M(x):
  M : V → O
• The vendor (e.g., Capital One) maps outcomes to actions:
  f : O → A
• S and T denote two groups of individuals, x ∈ V
• Define distributions over each set:
  M(x) = μ_x, a distribution over outcomes O
Individual fairness formulation:

max_{μ_x}  E_{x∼V} E_{a∼μ_x} U(x, a)          ← maximize utility
s.t.  ‖μ_x − μ_y‖ ≤ d(x, y)  ∀ x, y ∈ V       ← subject to fairness constraint

Q: What are the downsides to this formulation?
- Need a similarity metric between users
- Very high-dimensional LP may be difficult to solve
- Classifier must be trained a priori with fairness
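For intuition, here is a minimal sketch of this LP in the binary-outcome case. All utilities and metric values are made-up illustrative numbers; the decision variable p[i] stands for μ_{x_i}(outcome = 1), so the TV-distance Lipschitz constraint reduces to the linear condition |p[i] − p[j]| ≤ d(x_i, x_j).

```python
# Sketch of the individual-fairness LP for a binary outcome (toy data).
# Variable p[i] = mu_{x_i}(outcome = 1); the TV-distance Lipschitz
# constraint becomes |p[i] - p[j]| <= d(x_i, x_j), which is linear.
import numpy as np
from scipy.optimize import linprog

u = np.array([1.0, 0.8, -0.5])           # vendor's utility of outcome 1 per individual
d = np.array([[0.0, 0.1, 0.9],
              [0.1, 0.0, 0.8],
              [0.9, 0.8, 0.0]])          # assumed similarity metric d(x, y)

n = len(u)
A_ub, b_ub = [], []
for i in range(n):
    for j in range(n):
        if i != j:
            row = np.zeros(n)
            row[i], row[j] = 1.0, -1.0
            A_ub.append(row)
            b_ub.append(d[i, j])         # p_i - p_j <= d(i, j), both directions
res = linprog(-u, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0.0, 1.0)] * n)   # linprog minimizes, so negate utility
p = res.x
print(p)  # individuals 0 and 1 are similar (d = 0.1), so their outcomes stay close
```

Note that even though individual 2 has negative utility to the vendor, the Lipschitz constraints force the solution to keep some probability of the positive outcome for it (here p[2] = 0.2, pinned by its distance to the others).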
When does Individual Fairness imply Group Fairness?
Suppose we enforce a metric d.
Question: Which groups of individuals receive (approximately) equal outcomes?
The answer is given by the earthmover distance (w.r.t. d) between the two groups.
How different are S and T?
Earthmover distance: the "cost" of transforming one distribution into another by "moving" probability mass ("earth") across the metric space (V, d).
h(x, y): how much of the probability mass of x in S to move to y in T
Example: Compute Earth-Moverโs Distance
• On document cam
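The worked example itself was done on the document cam; as a stand-in, here is a tiny computation using scipy's 1-D Wasserstein distance, which coincides with the earthmover distance for uniformly weighted samples. The support points are made up.

```python
# Stand-in worked example: earthmover distance between the empirical
# distributions of two groups on a 1-D feature. scipy's 1-D Wasserstein
# distance equals the EMD here. Support points are made up.
from scipy.stats import wasserstein_distance

S = [0.0, 1.0, 2.0]   # group S support (uniform weights)
T = [1.0, 2.0, 3.0]   # group T support (uniform weights)

emd = wasserstein_distance(S, T)
print(emd)  # each third of the mass moves distance 1, so EMD = 1.0
```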
bias(S, T, d) = max_{M : Lipschitz w.r.t. d}  P(M(x) = 1 | x ∈ S) − P(M(x) = 1 | x ∈ T)

Theorem: Any Lipschitz mapping M satisfies group fairness up to bias(S, T, d).
Further, bias(S, T, d) ≤ EMD(S, T).
Some observations

bias(S, T, d) = max_{M : Lipschitz w.r.t. d}  P(M(x) = 1 | x ∈ S) − P(M(x) = 1 | x ∈ T)

Theorem: Any Lipschitz mapping M satisfies group fairness up to bias(S, T, d).

• By definition, the bias is the maximum deviation from group fairness that can be achieved!
• Indeed, for TV distance between distributions and binary classification, bias(S, T, d) = EMD(S, T)
• Takeaway message: if your groups are very far apart (in EMD), the Lipschitz condition can only get you so far in terms of group fairness!
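A quick numeric illustration of the bound bias(S, T, d) ≤ EMD(S, T), with d(x, y) = |x − y| and made-up group supports: a 1-Lipschitz map into [0, 1] (read as a randomized classifier's acceptance probability) cannot separate the two groups' average outcomes by more than their earthmover distance.

```python
# Numeric illustration of bias(S, T, d) <= EMD(S, T) with d(x, y) = |x - y|.
# f is a 1-Lipschitz map into [0, 1], read as a randomized classifier's
# acceptance probability; the group supports are made up.
from scipy.stats import wasserstein_distance

S = [0.0, 0.0, 1.0]
T = [2.0, 3.0, 3.0]

f = lambda x: min(max(x - 1.0, 0.0), 1.0)   # 1-Lipschitz, values in [0, 1]
bias = abs(sum(f(x) for x in S) / len(S) - sum(f(x) for x in T) / len(T))
emd = wasserstein_distance(S, T)
print(bias, emd)  # bias = 1.0 <= EMD = 7/3
```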
Connections to Differential Privacy
max_{μ_x}  E_{x∼V} E_{a∼μ_x} U(x, a)
s.t.  ‖μ_x − μ_y‖ ≤ d(x, y)  ∀ x, y ∈ V

What if we don't use TV distance for ‖μ_x − μ_y‖?

‖μ − μ′‖_∞ ≜ sup_{o ∈ O} log max( μ(o)/μ′(o), μ′(o)/μ(o) )

A mapping M satisfies ε-differential privacy iff it satisfies the Lipschitz property under this metric!
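To make the connection concrete, here is a sketch (with an assumed ε) checking that ε-randomized response, a classic ε-DP mechanism, puts its two output distributions at distance exactly ε under the ‖·‖_∞ metric above, i.e. it is Lipschitz when adjacent inputs are at distance ε.

```python
# Sketch: epsilon-randomized response as a Lipschitz mapping under the
# metric above. For the two adjacent inputs, the distance between the
# output distributions is exactly epsilon (an assumed value here).
import math

eps = 0.5
p = math.exp(eps) / (1 + math.exp(eps))   # probability of reporting the true bit

mu0 = {0: p, 1: 1 - p}   # output distribution when the true bit is 0
mu1 = {0: 1 - p, 1: p}   # output distribution when the true bit is 1

d_inf = max(abs(math.log(mu0[o] / mu1[o])) for o in (0, 1))
print(d_inf)  # equals eps, i.e. the Lipschitz constant matches the DP parameter
```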
Summary: Individual Fairness
• Formalized fairness property based on treating similar individuals similarly
• Incorporated vendor's utility
• Explored relationship between individual fairness and group fairness
  • Earthmover distance
Individual fairness formulation (recap):

max_{μ_x}  E_{x∼V} E_{a∼μ_x} U(x, a)          ← maximize utility
s.t.  ‖μ_x − μ_y‖ ≤ d(x, y)  ∀ x, y ∈ V       ← subject to fairness constraint

Q: What are the downsides to this formulation?
- Need a similarity metric between users
- Very high-dimensional LP may be difficult to solve
- Classifier must be trained a priori with fairness
Lots of open problems/directions
• Metric
  • Social aspects: who will define the metric?
  • How to generate a metric (semi-)automatically?
• Earthmover characterization when the probability metric is not statistical distance
• Next: How can we compute a fair classifier from an already-computed unfair one?
More definitions of fair classifiers
• NeurIPS 2016 (Hardt, Price & Srebro, "Equality of Opportunity in Supervised Learning")

[Diagram] An individual's data x = (x₁, …, xₙ), with protected attribute A and true label Y, feeds a learned classifier that outputs Ŷ; from (Ŷ, A) we build a new (fair) classifier Ỹ. The two groups are A = 0 and A = 1.
Equalized odds
• Consider binary classifiers
• We say a classifier Ŷ has equalized odds if for all true labels y,
  P(Ŷ = 1 | A = 0, Y = y) = P(Ŷ = 1 | A = 1, Y = y)

Q: How would this definition look if we only wanted to enforce group fairness?
A: P(Ŷ = 1 | A = 0) = P(Ŷ = 1 | A = 1)
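The definition is easy to check empirically. Below is a sketch on toy arrays (not lecture data): for each true label y we compare P(Ŷ = 1 | A = a, Y = y) across the two groups.

```python
# Empirical check of equalized odds on toy arrays (not lecture data):
# for each true label y, P(Yhat = 1 | A = a, Y = y) must match across groups.
import numpy as np

A    = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # protected attribute
Y    = np.array([0, 0, 1, 1, 0, 0, 1, 1])   # true label
Yhat = np.array([0, 1, 1, 1, 0, 1, 1, 1])   # classifier's predictions

def rate(a, y):
    mask = (A == a) & (Y == y)
    return Yhat[mask].mean()

for y in (0, 1):
    print(y, rate(0, y), rate(1, y))  # equalized odds: the two rates match per y
```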
Equal opportunity
• Suppose Y = 1 is the desirable outcome
  • E.g., getting a loan
• We say a classifier Ŷ has equal opportunity if
  P(Ŷ = 1 | A = 0, Y = 1) = P(Ŷ = 1 | A = 1, Y = 1)

Interpretation: the true positive rate is the same for both groups.
Weaker notion of fairness → can enable better utility
How can we create a predictor that meets these definitions?
• Key property: it should be oblivious
• A predictor Ỹ is oblivious if it depends only on the joint distribution of (Y, A, Ŷ)
• What does this mean?
  • It does not depend on the training data X
Need 4 parameters to define Ỹ from (Ŷ, A):

                A = 0                              A = 1
Ŷ = 0    p₀₀ = P(Ỹ = 1 | A = 0, Ŷ = 0)    p₀₁ = P(Ỹ = 1 | A = 1, Ŷ = 0)
Ŷ = 1    p₁₀ = P(Ỹ = 1 | A = 0, Ŷ = 1)    p₁₁ = P(Ỹ = 1 | A = 1, Ŷ = 1)

(rows: predicted label Ŷ, columns: protected attribute A)
Once our p_{ŷa}'s are defined…
• How do we check that equalized odds is satisfied?
  γ_a(Ỹ) ≜ ( P(Ỹ = 1 | A = a, Y = 0), P(Ỹ = 1 | A = a, Y = 1) )
• Compute γ₀(Ỹ) and γ₁(Ỹ). (These depend on the joint distribution of (Y, A, Ŷ).)
• They should be equal (to satisfy equalized odds).

Q: What condition do we need for an equal opportunity classifier?
A: The 2nd entries of γ₀(Ỹ) and γ₁(Ỹ) should match.
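As a sketch of this computation: given the four p_{ŷa} parameters and the base classifier's conditional rates q[a, y] = P(Ŷ = 1 | A = a, Y = y) (all numbers below are illustrative assumptions), γ_a(Ỹ) follows by conditioning on Ŷ: P(Ỹ = 1 | A = a, Y = y) = p₀ₐ(1 − q[a, y]) + p₁ₐ q[a, y].

```python
# Sketch: compute gamma_a(Ytilde) from the four mixing parameters and the
# base classifier's conditional rates. All numbers are illustrative.
import numpy as np

# p[yhat, a] = P(Ytilde = 1 | A = a, Yhat = yhat)
p = np.array([[0.1, 0.2],
              [0.9, 0.8]])
# q[a, y] = P(Yhat = 1 | A = a, Y = y), from the base classifier
q = np.array([[0.3, 0.7],
              [0.4, 0.9]])

def gamma(a):
    # condition on Yhat: P(Ytilde=1|A=a,Y=y) = p[0,a](1-q[a,y]) + p[1,a]q[a,y]
    return tuple(p[0, a] * (1 - q[a, y]) + p[1, a] * q[a, y] for y in (0, 1))

print(gamma(0), gamma(1))  # equalized odds holds iff these tuples coincide
```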
Geometric Interpretation via ROC curves
Write equalized odds as an optimization
• Let's define a loss function ℓ(Ỹ, Y) describing the loss of utility from using Ỹ instead of Y
• Now we can optimize:
  min_p  E[ℓ(Ỹ, Y)]   s.t.   γ₀(Ỹ) = γ₁(Ỹ),   0 ≤ p_{ŷa} ≤ 1
• Objective and constraints are both linear in the vector of p values!
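A minimal sketch of that linear program with scipy, under assumed numbers: q[a, y] = P(Ŷ = 1 | A = a, Y = y), pi[a, y] = P(A = a, Y = y), and unit false-positive/false-negative costs standing in for ℓ.

```python
# Minimal sketch of the equalized-odds linear program via scipy.
# Assumed numbers: q[a, y] = P(Yhat = 1 | A = a, Y = y) and
# pi[a, y] = P(A = a, Y = y); unit false-positive/false-negative costs.
import numpy as np
from scipy.optimize import linprog

q  = np.array([[0.3, 0.7], [0.4, 0.9]])
pi = np.array([[0.3, 0.2], [0.3, 0.2]])
c_fp, c_fn = 1.0, 1.0

# Variable vector x = (p00, p01, p10, p11), with p_{yhat, a} as in the table.
def coef(a, y):
    # Linear map from x to P(Ytilde = 1 | A = a, Y = y).
    v = np.zeros(4)
    v[0 + a] = 1 - q[a, y]   # weight on p_{0, a}
    v[2 + a] = q[a, y]       # weight on p_{1, a}
    return v

# Expected loss: pay c_fp when (Y = 0, Ytilde = 1), c_fn when (Y = 1, Ytilde = 0).
c = np.zeros(4)
for a in (0, 1):
    c += pi[a, 0] * c_fp * coef(a, 0)    # false-positive term
    c -= pi[a, 1] * c_fn * coef(a, 1)    # false negatives: constant minus this term

# Equalized odds: gamma_0(y) = gamma_1(y) for y in {0, 1}.
A_eq = np.array([coef(0, 0) - coef(1, 0), coef(0, 1) - coef(1, 1)])
res = linprog(c, A_eq=A_eq, b_eq=np.zeros(2), bounds=[(0.0, 1.0)] * 4)
print(res.x)  # optimal mixing parameters satisfying equalized odds
```

The objective vector and the equality constraints are both linear in x, exactly as the slide claims.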
What about continuous values?
• E.g., suppose we use a numeric credit score R to predict the binary value Y
• We can threshold the score to get a comparable definition of equalized odds
• If R satisfies equalized odds, then so does any predictor Ŷ = 𝟙[R > t], where t is some threshold
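A quick synthetic check of the thresholding claim (seed and distributions are assumptions): if the score's distribution given Y does not depend on A, so that R satisfies equalized odds, then 𝟙[R > t] has matching group-conditional rates for each true label.

```python
# Synthetic check of the thresholding claim: R's distribution given Y
# does not depend on A (so R satisfies equalized odds), and the derived
# predictor 1{R > t} inherits the property. Seed and shapes are assumed.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
A = rng.integers(0, 2, n)                 # protected attribute
Y = rng.integers(0, 2, n)                 # true label
R = rng.normal(loc=Y, scale=1.0)          # score depends on Y only, not on A

t = 0.5
Yhat = (R > t).astype(int)
gaps = []
for y in (0, 1):
    r0 = Yhat[(A == 0) & (Y == y)].mean()
    r1 = Yhat[(A == 1) & (Y == y)].mean()
    gaps.append(abs(r0 - r1))
    print(y, round(r0, 3), round(r1, 3))  # near-equal rates for each y
```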
Case study: FICO Scores
• Credit scores R range from 300 to 850
• Binary variable Y = whether someone will default on a loan
Experiment
• False positive → giving a loan to someone who will default
• False negative → not giving a loan to someone who will not default
• Loss function: false positives are 4.5x as expensive as false negatives
Baselines
• Max profit → no fairness constraint
• Race blind → uses the same FICO threshold for all groups
• Group fairness → picks for each group a threshold such that the fraction of group members that qualify for loans is the same
• Equal opportunity → picks a threshold for each group s.t. the fraction of non-defaulting group members that qualify for loans is the same
• Equalized odds → requires both the fraction of non-defaulters that qualify for loans and the fraction of defaulters that qualify for loans to be constant across groups
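As a sketch of how the equal-opportunity baseline can pick its per-group thresholds (synthetic scores and an assumed target rate; Y = 1 here means the person repays, the desirable outcome): choose each group's threshold as the empirical quantile of the non-defaulters' scores that hits the same true positive rate.

```python
# Sketch of the equal-opportunity baseline's threshold choice on
# synthetic scores: per group, pick the empirical quantile of the
# non-defaulters' scores that hits an assumed target TPR.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
A = rng.integers(0, 2, n)
Y = rng.integers(0, 2, n)                        # Y = 1: repays (desirable outcome)
score = rng.normal(loc=Y + 0.3 * A, scale=1.0)   # group 1's scores run higher

target_tpr = 0.8
thresholds, tprs = {}, {}
for a in (0, 1):
    s = np.sort(score[(A == a) & (Y == 1)])
    thresholds[a] = s[int((1 - target_tpr) * len(s))]   # 20th-percentile cutoff
    tprs[a] = (score[(A == a) & (Y == 1)] > thresholds[a]).mean()

print(thresholds, tprs)  # different thresholds, matching true positive rates
```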
ROC Curve Results
Profit Results
Method Profit (% relative to max profit)
Max profit 100
Race blind 99.3
Equal opportunity 92.8
Equalized odds 80.2
Group fairness (demographic parity) 69.8
Fair by some definition