JUDGE, JURY AND CLASSIFIER An Introduction to Trees
15.071x – The Analytics Edge
The American Legal System
15.071x – Judge, Jury and Classifier: An Introduction to Trees 1
• The legal system of the United States operates at the state level and at the federal level
• Federal courts hear cases beyond the scope of state law
• Federal courts are divided into:
  • District Courts
    • Makes initial decision
  • Circuit Courts
    • Hears appeals from the district courts
  • Supreme Court
    • Highest level – makes final decision
The Supreme Court of the United States
• Consists of nine judges (“justices”), appointed by the President
  • Justices are distinguished judges, professors of law, state and federal attorneys
• The Supreme Court of the United States (SCOTUS) decides the most difficult and controversial cases
  • Often involve interpretation of the Constitution
  • Significant social, political and economic consequences
Notable SCOTUS Decisions
• Wickard v. Filburn (1942)
  • Congress allowed to intervene in industrial/economic activity
• Roe v. Wade (1973)
  • Legalized abortion
• Bush v. Gore (2000)
  • Decided outcome of presidential election!
• National Federation of Independent Business v. Sebelius (2012)
  • Upheld the requirement of the Patient Protection and Affordable Care Act (“ObamaCare”) that individuals must buy health insurance
Predicting Supreme Court Cases
• Legal academics and political scientists regularly make predictions of SCOTUS decisions from detailed studies of cases and individual justices
• In 2002, Andrew Martin, a professor of political science at Washington University in St. Louis, decided to instead predict decisions using a statistical model built from data
• Together with his colleagues, he decided to test this model against a panel of experts
Predicting Supreme Court Cases
• Martin used a method called Classification and Regression Trees (CART)
• Why not logistic regression?
  • Logistic regression models are generally not interpretable
  • Model coefficients indicate importance and relative effect of variables, but do not give a simple explanation of how decision is made
Data
• Cases from 1994 through 2001
• In this period, the same nine justices presided over SCOTUS
  • Breyer, Ginsburg, Kennedy, O’Connor, Rehnquist (Chief Justice), Scalia, Souter, Stevens, Thomas
  • Rare data set – longest period of time with the same set of justices in over 180 years
• We will focus on predicting Justice Stevens’ decisions
  • Started out moderate, but became more liberal
  • Self-proclaimed conservative
Variables
• Dependent Variable: Did Justice Stevens vote to reverse the lower court decision? 1 = reverse, 0 = affirm
• Independent Variables: Properties of the case
  • Circuit court of origin (1st – 11th, DC, FED)
  • Issue area of case (e.g., civil rights, federal taxation)
  • Type of petitioner, type of respondent (e.g., US, an employer)
  • Ideological direction of lower court decision (conservative or liberal)
  • Whether petitioner argued that a law/practice was unconstitutional
Logistic Regression for Justice Stevens
• Some significant variables and their coefficients:
  • Case is from 2nd circuit court: +1.66
  • Case is from 4th circuit court: +2.82
  • Lower court decision is liberal: -1.22
• This is complicated…
  • Difficult to understand which factors are more important
  • Difficult to quickly evaluate what prediction is for a new case
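One common way to read coefficients like these is to exponentiate them: in a logistic regression, a coefficient b multiplies the odds of the outcome by e^b when the other variables are held fixed. The sketch below applies that to the three coefficients quoted on the slide. It is written in Python rather than the R used in the course, and the dictionary keys are illustrative labels, not names from the data set.

```python
import math

# Coefficients quoted on the slide (log-odds scale).
# The dictionary keys are illustrative labels, not column names from the data.
coefficients = {
    "case from 2nd circuit": 1.66,
    "case from 4th circuit": 2.82,
    "lower court decision liberal": -1.22,
}

def odds_multiplier(coefficient):
    """Factor by which the odds of 'reverse' change when the indicator is 1."""
    return math.exp(coefficient)

for label, coefficient in coefficients.items():
    print(f"{label}: odds of reversal multiplied by {odds_multiplier(coefficient):.2f}")
```

Even with this reading, turning the numbers into a prediction for a new case still requires the intercept and every other coefficient, which is the difficulty the slide points to.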
Classification and Regression Trees
• Build a tree by splitting on variables
• To predict the outcome for an observation, follow the splits and at the end, predict the most frequent outcome
• Does not assume a linear model
• Interpretable
Splits in CART
[Figure: scatter plot of Independent Variable Y (axis from 13 to 25) against Independent Variable X (axis from 25 to 115). Split 1, Split 2 and Split 3 divide the plane into regions labeled Predict Red or Predict Gray.]
Final Tree
X < 60?
  Yes: Red
  No: Y < 20?
    Yes: X < 85?
      Yes: Red
      No: Gray
    No: Gray
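Following the splits of a tree like this one amounts to a few nested conditions. The sketch below is a minimal Python rendering (the course itself uses R). The Yes/No orientation of each branch is read from the flattened slide text, so treat it as an assumption; the function name `predict_color` is made up for illustration.

```python
def predict_color(x, y):
    """Follow the three splits of the example tree and return the predicted class."""
    if x < 60:          # Split 1
        return "Red"
    if y < 20:          # Split 2 (reached only when x >= 60)
        if x < 85:      # Split 3
            return "Red"
        return "Gray"
    return "Gray"

# One made-up point for each of the four leaves.
for point in [(40, 22), (70, 15), (100, 15), (70, 23)]:
    print(point, predict_color(*point))
```

Each path from the root to a leaf reads as a plain rule, which is what makes the tree easy to interpret.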
When Does CART Stop Splitting?
• There are different ways to control how many splits are generated
  • One way is by setting a lower bound for the number of points in each subset
• In R, a parameter that controls this is minbucket
  • The smaller it is, the more splits will be generated
  • If it is too small, overfitting will occur
  • If it is too large, model will be too simple and accuracy will be poor
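To see why a larger lower bound produces fewer splits, the sketch below lists the thresholds on a single variable that would leave at least a given number of points on each side. It is a plain Python illustration of the idea only, not the algorithm used by R's tree-building code; the function name `valid_splits` and the data values are made up.

```python
def valid_splits(values, min_bucket):
    """Return the thresholds on one variable that leave at least
    min_bucket points on each side of the split."""
    ordered = sorted(values)
    thresholds = []
    for i in range(1, len(ordered)):
        if ordered[i - 1] == ordered[i]:
            continue  # cannot split between identical values
        left_size, right_size = i, len(ordered) - i
        if left_size >= min_bucket and right_size >= min_bucket:
            # Use the midpoint between neighbouring values as the threshold.
            thresholds.append((ordered[i - 1] + ordered[i]) / 2)
    return thresholds

# Ten made-up observations of one independent variable.
x = [25, 35, 45, 55, 65, 75, 85, 95, 105, 115]
for min_bucket in (1, 3, 5):
    print(min_bucket, valid_splits(x, min_bucket))
```

With ten points, a bound of 1 allows nine candidate thresholds, a bound of 5 allows only one, and a bound of 6 allows none, so the tree cannot split at all.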
Predictions from CART
• In each subset, we have a bucket of observations, which may contain both outcomes (i.e., affirm and reverse)
• Compute the percentage of data in a subset of each type
  • Example: 10 affirm, 2 reverse → 10/(10+2) ≈ 0.83
• Just like in logistic regression, we can threshold to obtain a prediction
  • Threshold of 0.5 corresponds to picking most frequent outcome
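The bucket arithmetic can be sketched in a few lines of Python (not the R used in the course). For a bucket with 10 affirm and 2 reverse cases the fraction of affirm cases is 10/12, about 0.83. The function name `leaf_prediction` is made up, and treating a fraction exactly equal to the threshold as "affirm" is an assumption of this sketch.

```python
def leaf_prediction(n_affirm, n_reverse, threshold=0.5):
    """Return (fraction of affirm cases in the bucket, predicted outcome)."""
    fraction_affirm = n_affirm / (n_affirm + n_reverse)
    # Assumption: a fraction equal to the threshold counts as "affirm".
    outcome = "affirm" if fraction_affirm >= threshold else "reverse"
    return fraction_affirm, outcome

fraction, outcome = leaf_prediction(10, 2)
print(round(fraction, 2), outcome)  # 0.83 affirm
```

Raising the threshold above the bucket's fraction, for example to 0.9, flips the prediction for this bucket to "reverse".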
ROC curve for CART
• Vary the threshold to obtain an ROC curve
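A minimal Python sketch of that procedure follows: for each threshold, count true and false positives and record one point of the curve. The probabilities and outcomes are made up, the function name `roc_points` is illustrative, and the sketch assumes both outcomes appear in the data.

```python
def roc_points(probabilities, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    # Start with a threshold above every score, then use each distinct score in turn.
    thresholds = [float("inf")] + sorted(set(probabilities), reverse=True)
    for t in thresholds:
        tp = sum(1 for p, y in zip(probabilities, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probabilities, labels) if p >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Made-up bucket probabilities of "reverse" and true outcomes (1 = reverse).
probs = [0.9, 0.9, 0.6, 0.6, 0.3, 0.3]
truth = [1, 1, 1, 0, 0, 1]
print(roc_points(probs, truth))
```

Because every observation in a bucket shares the same probability, a tree yields only as many distinct thresholds as it has buckets, so its ROC curve has few corner points.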
Random Forests
• Designed to improve prediction accuracy of CART
• Works by building a large number of CART trees
  • Makes model less interpretable
• To make a prediction for a new observation, each tree “votes” on the outcome, and we pick the outcome that receives the majority of the votes
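The voting step can be sketched as follows in Python (not the R used in the course). The three "trees" are hypothetical stand-in functions and the field names in `case` are made up; a real forest would hold fitted trees.

```python
from collections import Counter

def forest_predict(trees, observation):
    """Ask every tree for a prediction and return the most common one."""
    votes = Counter(tree(observation) for tree in trees)
    # With two outcomes and an odd number of trees, the most common vote is a majority.
    return votes.most_common(1)[0][0]

# Three hypothetical "trees", written as plain functions for illustration.
trees = [
    lambda obs: 1 if obs["lower_court_liberal"] else 0,
    lambda obs: 1 if obs["circuit"] in {"2nd", "4th"} else 0,
    lambda obs: 1,
]
case = {"lower_court_liberal": False, "circuit": "4th"}
print(forest_predict(trees, case))  # votes are 0, 1, 1
```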
Building Many Trees
• Each tree can split on only a random subset of the variables
• Each tree is built from a “bagged”/“bootstrapped” sample of the data
  • Select observations randomly with replacement
  • Example – original data: 1 2 3 4 5
  • New “data”:
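Sampling with replacement can be sketched in Python as below. The sample it prints is only one possible draw and is not the example from the slide; the function name `bootstrap_sample` and the fixed seed are choices made for this illustration.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) observations at random, with replacement."""
    return [rng.choice(data) for _ in data]

original = [1, 2, 3, 4, 5]
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
sample = bootstrap_sample(original, rng)
print(sample)
```

A sample drawn this way has the same size as the original data, but some observations typically appear more than once and others not at all.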
Random Forest Parameters
• Minimum number of observations in a subset
  • In R, this is controlled by the nodesize parameter
  • Smaller nodesize may take longer in R
• Number of trees
  • In R, this is the ntree parameter
  • Should not be too small, because bagging procedure may miss observations
  • More trees take longer to build
Parameter Selection
• In CART, the value of “minbucket” can affect the model’s out-of-sample accuracy
• How should we set this parameter?
• We could select the value that gives the best testing set accuracy
  • This is not right!
K-fold Cross-Validation
• Given training set, split into k pieces (here k = 5)
• Use k-1 folds to estimate a model, and test model on remaining one fold (“validation set”) for each candidate parameter value
• Repeat for each of the k folds
[Figure: the whole training set divided into Fold 1 through Fold 5. Examples shown: predict Fold 5 from Folds 1, 2, 3, 4; predict Fold 4 from Folds 1, 2, 3, 5; predict Fold 3 from Folds 1, 2, 4, 5.]
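The procedure above can be sketched in Python as follows (the course does this in R). The helper names, the interleaved fold assignment and the placeholder scoring function are all assumptions made for illustration; in practice the scorer would fit a tree on the training folds and measure accuracy on the held-out fold, and the rows would usually be shuffled first.

```python
def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k folds of nearly equal size."""
    return [list(range(i, n, k)) for i in range(k)]

def cross_validate(n, k, candidates, fit_and_score):
    """Return the average validation accuracy for each candidate value.

    fit_and_score(train_idx, valid_idx, value) is a caller-supplied function
    that fits a model on train_idx and returns its accuracy on valid_idx.
    """
    folds = k_fold_indices(n, k)
    averages = {}
    for value in candidates:
        scores = []
        for i, valid_idx in enumerate(folds):
            # Train on every fold except the one held out for validation.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(fit_and_score(train_idx, valid_idx, value))
        averages[value] = sum(scores) / k
    return averages

# Placeholder scorer for illustration only: pretends accuracy peaks at value 5.
def dummy_score(train_idx, valid_idx, value):
    return 0.7 - 0.01 * abs(value - 5)

averages = cross_validate(n=20, k=5, candidates=[1, 5, 10], fit_and_score=dummy_score)
best = max(averages, key=averages.get)
print(best)
```

The parameter value with the highest average validation accuracy is chosen, and the testing set is left untouched until the final model is evaluated.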
Output of k-fold Cross-Validation
[Figure: accuracy (axis from 0 to 0.8) plotted against parameter value (1 to 20), with one curve for each of Fold 1 through Fold 5 and one for the average across folds.]
Cross-Validation in R
• Before, we limited our tree using minbucket
• When we use cross-validation in R, we’ll use a parameter called cp instead
  • Complexity Parameter
• Like Adjusted R² and AIC
  • Measures trade-off between model complexity and accuracy on the training set
• Smaller cp leads to a bigger tree (might overfit)
Martin’s Model
• Used 628 previous SCOTUS cases between 1994 and 2001
• Made predictions, before the term started, for the 68 cases that would be decided in the October 2002 term
• Two stage approach based on CART:
  • First stage: one tree to predict a unanimous liberal decision, other tree to predict unanimous conservative decision
    • If conflicting predictions or predict no, move to next stage
  • Second stage consists of predicting decision of each individual justice, and using majority decision as prediction
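The control flow of the two stages can be sketched in Python as below. This is only a reading of the bullet points, not Martin's actual implementation: the function name, the boolean inputs standing in for the two first-stage trees, and the "liberal"/"conservative" labels for the justices' predicted votes are all assumptions.

```python
def two_stage_prediction(unanimous_liberal, unanimous_conservative, justice_votes):
    """Combine the two first-stage trees with the per-justice second stage.

    unanimous_liberal / unanimous_conservative: booleans from the first-stage trees.
    justice_votes: list of "liberal"/"conservative" predictions, one per justice.
    """
    # Stage 1: accept a unanimous prediction only if exactly one tree says yes.
    if unanimous_liberal and not unanimous_conservative:
        return "liberal"
    if unanimous_conservative and not unanimous_liberal:
        return "conservative"
    # Stage 2: the trees conflict or both say no, so use the majority of the justices.
    liberal_votes = justice_votes.count("liberal")
    return "liberal" if liberal_votes > len(justice_votes) / 2 else "conservative"

votes = ["liberal"] * 5 + ["conservative"] * 4
print(two_stage_prediction(True, True, votes))  # conflicting first stage, 5 of 9 liberal
```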
Tree for Justice O’Connor
[Figure: decision tree for Justice O’Connor with Yes/No branches. Splits: Is the lower court decision liberal? Is the case from the 2nd, 3rd, DC or Federal Circuit Court? Is the Respondent the US? Is the primary issue civil rights, First Amendment, econ. activity or federalism? Leaves: Reverse or Affirm.]
Tree for Justice Souter
[Figure: decision tree for Justice Souter with Yes/No branches. Splits: Is Justice Ginsburg’s predicted decision liberal? Is the lower court decision liberal? (asked on both branches). Branch annotations: “Make a liberal decision” and “Make a conservative decision”. Leaves: Reverse or Affirm.]
The Experts
• Martin and his colleagues recruited 83 legal experts
  • 71 academics and 12 attorneys
  • 38 previously clerked for a Supreme Court justice, 33 were chaired professors and 5 were current or former law school deans
• Experts were only asked to predict within their area of expertise; more than one expert was assigned to each case
• Allowed to consider any source of information, but not allowed to communicate with each other regarding predictions
The Results
• For the 68 cases in October 2002:
• Overall case predictions:
  • Model accuracy: 75%
  • Experts accuracy: 59%
• Individual justice predictions:
  • Model accuracy: 67%
  • Experts accuracy: 68%
The Analytics Edge
• Predicting Supreme Court decisions is very valuable to firms, politicians and non-governmental organizations
• A model that predicts these decisions is both more accurate and faster than experts
  • CART model based on very high-level details of case beats experts who can process much more detailed and complex information