Decision Model Validation

• Discriminant functions are trained on a finite set of data
• How much fitting should we do?
• What should the model’s dimension be?
• The model must be used to identify a piece of evidence (data) it was not trained with
• Accurate estimates of the error rates of a decision model are critical in forensic science applications
• The simplest is the apparent error rate: the error rate on the training set
  • A lousy estimate, but better than nothing
• Cross-validation: systematically hold out chunks of the data set for testing
• Most common: hold-one-out
  1. Omit a data vector from X
  2. Train the model
  3. Classify the held-out observation
  4. Repeat for all data vectors
• Simple, but gives a good estimate
• Lots of literature to back up its efficacy
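The four steps above can be sketched in a few lines. This is a minimal Python illustration (the deck’s exercises are R scripts); `train_fn`, `classify_fn`, and the toy nearest-mean classifier are hypothetical stand-ins for whatever discriminant model is actually used.

```python
import numpy as np

def hold_one_out_error(X, y, train_fn, classify_fn):
    """Hold-one-out CV estimate of a decision model's error rate.

    train_fn(X, y) -> fitted model; classify_fn(model, x) -> predicted label.
    """
    n = len(y)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i                # 1. omit data vector i from X
        model = train_fn(X[keep], y[keep])      # 2. train the model
        if classify_fn(model, X[i]) != y[i]:    # 3. classify the held-out obs.
            errors += 1
    return errors / n                           # 4. repeated for all vectors

# Toy nearest-mean (centroid) classifier, just to exercise the loop
def train_nm(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify_nm(model, x):
    return min(model, key=lambda c: np.linalg.norm(x - model[c]))
```

On a cleanly separated toy set the hold-one-out error is 0; on real data it gives the near-unbiased error estimate the slide describes.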
• C-fold cross-validation: hold out data chunks of size c
  • Can become time-consuming
  • Typically performance is not much better than simple HOO-CV
• Caution! If the decision model is sensitive to group sizes (e.g. CVA), cross-validation may not work well
  • You should have, at the very least, 5 replicates per group
DON’T ARGUE WITH ME!!!!!!!!!!
• Bootstrap: make up data sets from randomly selected observation vectors (drawn with replacement)
  • A bootstrap sample is the same size as X
  • You’ll get repeats
  1. Train a decision model with the bootstrapped set
    • The model should not be sensitive to repeated observations!
    • CVA is out!!!!
  2. Test the model with the original X and compute the error:
Decision rules built with the bootstrapped data set
3. Test the model with the bootstrapped data set X* and compute the error:
4. Repeat steps 1–3, B times
  • B should be at least 200
5. Compute the average “optimism”:  omega = (1/B) * sum_b [ err_b(X) - err_b(X*_b) ]
6. Compute the “refined” bootstrap error rate:  err_refined = err_apparent + omega
Number of times obs. vect. occurs in X*
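Steps 1–6 can be sketched as follows. This is a hedged Python sketch (the deck’s exercises use `boostrap.R` / `cv_boot_testset.R`); `train_fn` and `err_fn` are hypothetical stand-ins for the actual decision model and its misclassification-rate function.

```python
import numpy as np

def refined_bootstrap_error(X, y, train_fn, err_fn, B=200, seed=0):
    """'Refined' (optimism-corrected) bootstrap estimate of the error rate.

    err_fn(model, X, y) -> misclassification rate; B >= 200 recommended.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = err_fn(train_fn(X, y), X, y)       # apparent error rate
    optimism = 0.0
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # sample rows with replacement
        model = train_fn(X[idx], y[idx])          # 1. train on bootstrap set X*
        e_orig = err_fn(model, X, y)              # 2. error on the original X
        e_star = err_fn(model, X[idx], y[idx])    # 3. error on X* itself
        optimism += (e_orig - e_star)             # 4.-5. accumulate optimism
    return apparent + optimism / B                # 6. refined error rate
```

The per-replicate difference `e_orig - e_star` is the “optimism” of training-set error; averaging it over B replicates and adding it back to the apparent error gives the refined estimate.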
*Now Exercise: Explore some data sets with:
boostrap.R cv_boot_testset.R
Probabilities

• t is a test for association between:
  • x_unk, data from an unknown
    • Could be from a crime scene
    • Could be from a suspect
  • A group of data from a source
    • Could be from a suspect
    • Could be from a crime scene
• ANY decision rule output by a pattern recognition program can be considered a test for association
• Codes:
  • t+ / t-: the test indicates inclusion / exclusion
  • S+ / S-: the evidence is / is not associated with a source
• Four probabilities are of interest:
• Probability that a test yields a positive association given that there is truly an association between evidence and a source:

  Pr(t+ | S+) = probability of a true positive (TP)
              = true positive rate (TPR)
              = probability of a true inclusion
              = sensitivity

• TPR is very important for forensic applications!
• Probability that a test yields a positive association given that there is truly no association between evidence and a source:

  Pr(t+ | S-) = probability of a false positive (FP)
              = false positive rate (FPR)
              = probability of a false inclusion

• FPR is very important for forensic applications!
• In traditional hypothesis testing, the FPR is sometimes called the Type I error rate
• 1 - FPR = specificity (TNR): the rate at which true exclusions are correctly excluded
• Probability that a test yields a negative association given that there is truly no association between evidence and a source:

  Pr(t- | S-) = probability of a true negative (TN)
              = true negative rate (TNR)
              = probability of a true exclusion
              = specificity

• TNR estimates may be the most useful (and trustworthy) numbers that come out of applications of probability to physical evidence...
• Probability that a test yields a negative association given that there is truly an association between evidence and a source:

  Pr(t- | S+) = probability of a false negative (FN)
              = false negative rate (FNR)
              = probability of a false exclusion

• In traditional hypothesis testing, the FNR is sometimes called the Type II error rate
• 1 - FNR = sensitivity (TPR): the rate at which true inclusions are correctly included
• Summary:

                                     An association          An association truly
                                     truly exists, S+        does not exist, S-
  Test indicates an inclusion, t+    True Positive Rate      False Positive Rate
                                                             (Type I error)
  Test indicates an exclusion, t-    False Negative Rate     True Negative Rate
                                     (Type II error)

• 1 - FNR (the TPR) is called the test’s power
• Remember, these are all only ESTIMATES!
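The four rates in the summary table are estimated from 2×2 confusion-table counts. A minimal helper (hypothetical Python, not from the deck’s R scripts):

```python
def rates(tp, fp, fn, tn):
    """TPR, FPR, TNR, FNR estimated from 2x2 confusion-table counts."""
    tpr = tp / (tp + fn)   # sensitivity: Pr(t+ | S+)
    fpr = fp / (fp + tn)   # Type I error rate: Pr(t+ | S-)
    tnr = tn / (fp + tn)   # specificity: Pr(t- | S-) = 1 - FPR
    fnr = fn / (tp + fn)   # Type II error rate: Pr(t- | S+) = 1 - TPR
    return tpr, fpr, tnr, fnr
```

For example, 90 true inclusions, 10 false exclusions, 5 false inclusions, and 95 true exclusions give TPR = 0.9, FPR = 0.05, TNR = 0.95, FNR = 0.1.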
• Much more difficult to objectively estimate, but of more interest in Law applications:
• Probability that an association exists given a test indicates an association (Bayes’ Rule again):

  Pr(S+ | t+) = Pr(t+ | S+) Pr(S+) / [ Pr(t+ | S+) Pr(S+) + Pr(t+ | S-) Pr(S-) ]

  Pr(S+) is the prior probability that there is an association between evidence and a source; Pr(S+ | t+) is also called the positive predictive value (PV+)

• Probability that no association exists given a test indicates an association:

  Pr(S- | t+) = Pr(t+ | S-) Pr(S-) / [ Pr(t+ | S+) Pr(S+) + Pr(t+ | S-) Pr(S-) ]
• Dividing these, we get the “famous” (positive) likelihood ratio LR+:

  LR+ = Pr(t+ | S+) / Pr(t+ | S-) = TPR / FPR

• LR+ can be expressed via the odds form of Bayes’ Rule:

  Pr(S+ | t+) / Pr(S- | t+) = LR+ × Pr(S+) / Pr(S-)

  (posterior odds in favor of association given the test indicates inclusion = likelihood ratio × prior odds in favor of association)
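Numerically the odds form is a one-line multiplication. A hypothetical Python sketch of the two quantities above:

```python
def lr_plus(tpr, fpr):
    """Positive likelihood ratio: Pr(t+ | S+) / Pr(t+ | S-) = TPR / FPR."""
    return tpr / fpr

def posterior_odds(prior_odds, tpr, fpr):
    """Odds form of Bayes' Rule: posterior odds = LR+ x prior odds."""
    return lr_plus(tpr, fpr) * prior_odds
```

With TPR = 0.9 and FPR = 0.05, LR+ = 18; prior odds of 0.5 in favor of association become posterior odds of 9.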
• LR+ interpretations:
  • The ratio of the probability the test indicates inclusion given a true association vs. the probability the test indicates inclusion given a true exclusion
  • LR+ serves as a multiplier for the prior odds in favor of an association
  • LR+ gives the relative effect on the same-source-origin odds given a positive test result
• Note: in building a decision model, TPR, TNR, FPR, FNR and LR+ are computed on a per-group basis
  • There is no overall TPR, TNR, FPR, FNR or LR+!
• Value comes into forensic science if one of the groups is a known suspect or crime-scene group, AND:
  • Unknowns are tested against the suspect/crime-scene group
  • The confidence measures in the results are the TPR, FPR and LR+ computed on the suspect/crime-scene group
• How can these be used/stated in court?
• A striation pattern found at a crime scene (CS)
  • Same class characteristics as the CS
  • Subclass characteristics eliminated from the data
• Many striation patterns generated by a tool associated with a suspect (SP)
• Include the SP set in a database (DB) and compute/test the discrimination model
  • Get TP, FP and LR+ for SP w.r.t. the DB
• I.D. the CS pattern with the discrimination model
  • The result is an inclusion or exclusion
  • The TP, FP and LR+ for SP apply to the result
  • State them in court along with the size of the DB
Receiver Operating Characteristic

• In general, a classification rule t applied to a data point x yields a score, t(x)
• For two groups, consider the score distributions
  • The two groups can be right vs. wrong, pos vs. neg, assoc. vs. no assoc., one vs. rest, one vs. one, etc.

[Figure: two overlapping score distributions separated by an adjustable cut-off score]
• The cut-off score is adjustable
  • Different choices give different TPR and FPR
• The cut-off is related to the prior
• Changing the cut-off traces out a curve on a graph of TPR vs. FPR: the ROC curve
[Figure: ROC curve, TPR (0 to 1) vs. FPR (0 to 1); AUC = Mann-Whitney U; the “chance” diagonal runs corner to corner]
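Tracing the curve and computing its area are both short computations. A Python sketch (the deck’s own ROC code is in `roc_utilities.R`/`roc.R`); the function names here are hypothetical:

```python
import numpy as np

def roc_points(pos_scores, neg_scores, cutoffs):
    """Trace out (FPR, TPR) pairs by sweeping the cut-off score."""
    pos = np.asarray(pos_scores)
    neg = np.asarray(neg_scores)
    return [(float(np.mean(neg >= c)), float(np.mean(pos >= c))) for c in cutoffs]

def auc(pos_scores, neg_scores):
    """AUC as the normalized Mann-Whitney U statistic:
    Pr(a random positive scores above a random negative), ties counting 1/2."""
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))
```

Perfectly separated scores give AUC = 1; identical score distributions give AUC = 0.5, the chance diagonal.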
*Now Exercise: Source roc_utilities.R
play with roc.R for PLS-DA
• “Chance” diagonal: if your ROC curve looks like this
  • The score distributions for the two groups are right on top of each other
  • 50/50 chance of assigning an unknown to the correct group
• Area under the curve (AUC): the probability that a randomly chosen member of one group scores higher than a randomly chosen member of the other
  • AUC range = 0 to 1 (really 0.5 to 1)*
• Gini coefficient: degree of inequality of the ROC curve from the chance diagonal, = 2·AUC - 1
How good of a “match” is it? Conformal Prediction (Vovk)

• Data should be IID, but that’s it

[Figure: cumulative # of errors vs. sequence of unknown obs. vects.; 80% confidence = 20% error (slope 0.2); 95% confidence = 5% error (slope 0.05); 99% confidence = 1% error (slope 0.01)]

• Can give a judge or jury an easy-to-understand measure of the reliability of a classification result
• This is an orthodox “frequentist” approach
  • Roots in Algorithmic Information Theory
• Confidence on a scale of 0%–100%
• Testable claim: the long-run I.D. error rate should be the chosen significance level
How Conformal Prediction works for us (Vovk)

• Given a “bag” of observations with known identities and one observation of unknown identity
• Estimate how “wrong” labelings are for each observation with a non-conformity score (“wrong-iness”)
• Looking at the “wrong-iness” of the known observations in the bag: does labeling i for the unknown have an unusual amount of “wrong-iness”?
  • For us: one-vs-one SVMs
• If not:
  • p_possible-ID_i ≥ the chosen level of significance α
  • Put ID_i in the (1 - α)·100% confidence interval
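The p-value and prediction-set construction above are a few lines each. A Python sketch under the stated assumptions (nonconformity scores already computed, e.g. from one-vs-one SVMs; function names hypothetical):

```python
def conformal_pvalue(bag_scores, unk_score):
    """Conformal p-value of a candidate labeling: the fraction of
    nonconformity scores (bag plus the unknown) at least as large
    as the unknown's score under that labeling."""
    scores = list(bag_scores) + [unk_score]
    return sum(s >= unk_score for s in scores) / len(scores)

def prediction_set(pvalues, alpha):
    """(1 - alpha)*100% conformal prediction set: every candidate
    label whose p-value meets the chosen significance level alpha."""
    return {label for label, p in pvalues.items() if p >= alpha}
```

A labeling whose “wrong-iness” exceeds everything in the bag gets a small p-value and is excluded from the confidence set; by conformal validity the true label is excluded at most an alpha fraction of the time in the long run.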
Conformal Prediction

14D PCA-SVM decision model for screwdriver striation patterns:
• Theoretical (long-run) error rate: 5%
• Empirical error rate: 5.3%
• For the 95%-CPT (PCA-SVM), confidence intervals will not contain the correct I.D. 5% of the time in the long run
• Straightforward validation/explanation picture for court
Conformal Prediction Drawbacks

• CPT is an interval method
  • Can (and does) produce multi-label I.D. intervals
  • A “correct” I.D. is an interval with all labels
  • Doesn’t happen often in practice…
• Empty intervals count as “errors”
  • Well…, what if the “correct” answer isn’t in the database?
  • An “open-set” problem, which Champod, Gantz and Saunders have pointed out
• Must be run in “on-line” mode for LRG
  • In practice we noticed it can be run in “off-line” mode after 500+ I.D. attempts
How good of a “match” is it? Efron’s Empirical Bayes

• An I.D. is output for each questioned toolmark
  • This is a computer “match”
• What’s the probability it is truly not a “match”?
• There is a similar problem in genomics: detecting disease from microarray data
  • They use data and Bayes’ theorem to get an estimate
  • “No disease” in genomics = not a true “match” in toolmarks
Random Match Probability

[Figure: the distribution of nDs from fragments at the crime scene overlaid on the distribution of nDs from fragments in the population; 99% of the nDs from crime-scene fragments defines the RMP “window”; the shaded area ≈ the probability that a random fragment from the population would be I.D.’d as a crime-scene fragment]
Random Match Probability

• Example: the distribution of nDs from glass fragments at the crime scene vs. the distribution of nDs from glass fragments in the population

  RMP ≈ (0.26 + 0.14 + 0.06) × 100% = 46%
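The arithmetic in the example is just the population-distribution mass falling inside the crime-scene window. A hypothetical Python sketch with made-up bin labels:

```python
def rmp(pop_mass, window_bins):
    """Random match probability: total population-distribution
    probability mass falling inside the crime-scene RMP 'window' bins."""
    return sum(pop_mass[b] for b in window_bins)
```

With population masses 0.26, 0.14, and 0.06 in the three window bins, RMP ≈ 0.46, matching the 46% in the example.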
Random Match Probability

• Problems with random match probability computations:
  • To get reliable probabilities, you need accurate probability density functions (pdfs)
  • Higher-dimensional pdfs require exponential amounts of data to fit accurately (the curse of dimensionality)
    • Overlap in higher dimensions??
  • How wide should the RMP “windows” be?
  • Use distributions of univariate “similarity” measures?
    • Different measures correspond to different RMPs!
    • No natural choice!
Empirical Bayes

• We use Efron’s machinery for the “empirical Bayes two-groups model” (Efron)
• Surprisingly simple!
• Use binned data to do a Poisson regression
• Some notation:
  • S-: truly no association; the Null hypothesis
  • S+: truly an association; the Non-null hypothesis
  • z: a score derived from a machine learning task to I.D. an unknown pattern with a group
    • z is a Gaussian random variate for the Null
Empirical Bayes

• From Bayes’ Theorem we can get (Efron):

  Pr(S- | z) = estimated probability of not a true “match”, given the algorithm’s output z-score associated with its “match”

• Names: posterior error probability (PEP) (Kall); local false discovery rate (lfdr) (Efron)
• Suggested interpretation for casework: 1 - Pr(S- | z) = estimated “believability” of a machine-made association
• We agree with Gelman and Shalizi (Gelman): “…posterior model probabilities …[are]… useful as tools for prediction and for understanding structure in data, as long as these probabilities are not taken too seriously.”
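In the two-groups model, Pr(S- | z) has the closed form lfdr(z) = π0·f0(z)/f(z). A minimal Python sketch, assuming a Gaussian Null density f0 and a caller-supplied mixture density f (in practice the fit from the Poisson regression on binned z-scores):

```python
import math

def lfdr(z, pi0, mixture_density, null_mu=0.0, null_sd=1.0):
    """Two-groups local false discovery rate: Pr(S- | z).

    lfdr(z) = pi0 * f0(z) / f(z), with f0 a Gaussian Null density and
    f the fitted mixture density. Capped at 1 since it is a probability.
    """
    f0 = math.exp(-0.5 * ((z - null_mu) / null_sd) ** 2) / (null_sd * math.sqrt(2.0 * math.pi))
    return min(1.0, pi0 * f0 / mixture_density(z))
```

When the mixture is pure Null (π0 = 1 and f = f0), lfdr is 1 everywhere: no z-score is believable evidence of an association, as expected.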
Empirical Bayes

• Bootstrap procedure to get an estimate of the KNM (known non-match) distribution of “Platt scores” (Platt, e1071)
  • Use an SVM to get KM and KNM “Platt score” distributions on a “Training” set
  • Use this to get p-values/z-values on a “Validation” set
  • Inspired by Storey and Tibshirani’s Null estimation method (Storey)
• From the fit to the z-score histogram, Efron’s method gives:
  • the “mixture” density f(z)
  • the z-density given KNM, f0(z) (should be Gaussian)
  • an estimate of the prior for KNM, π0
• What’s the point?? We can test the fits to f0 and π0!
Bootstrap algorithm to estimate the KNM distribution (the Null):
1. Draw a bootstrap sample; train an SVM; get Platt scores on the whole set
2. Toss the KM Platt scores; toss the observations that appear in the bootstrap sample; randomly select a KNM score from each remaining observation; collect the scores
3. Repeat

[Table: Platt scores for 10 observations illustrating the winnowing; after dropping the KM scores and the starred bootstrap-sample rows (obs. 3, 5, 8, 9), one randomly selected KNM score is kept per remaining observation: obs. 1: 0.033, obs. 2: 0.023, obs. 4: 0.021, obs. 6: 0.006, obs. 7: 0.012, obs. 10: 0.015]
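The resampling loop above can be sketched as follows. This is a hedged Python sketch: `knm_scores_fn` is a hypothetical callback standing in for “train an SVM on the bootstrap sample and return each observation’s KNM Platt scores”.

```python
import random

def knm_null_scores(n_obs, knm_scores_fn, B=200, seed=0):
    """Sketch of the KNM Null-estimation bootstrap: per replicate, draw a
    bootstrap sample, refit (inside knm_scores_fn), drop observations that
    appear in the sample, and keep one random KNM Platt score per survivor.

    knm_scores_fn(boot_idx) -> {obs index: list of that obs.' KNM Platt
    scores under the model trained on boot_idx}  (hypothetical callback)
    """
    rng = random.Random(seed)
    collected = []
    for _ in range(B):
        boot = [rng.randrange(n_obs) for _ in range(n_obs)]  # with replacement
        scores = knm_scores_fn(boot)           # train SVM, score the whole set
        in_boot = set(boot)
        for i in range(n_obs):
            if i in in_boot:                   # toss obs. in the bootstrap sample
                continue
            collected.append(rng.choice(scores[i]))  # one KNM score per survivor
    return collected
```

Because held-out observations never touched the replicate’s training set, the collected scores approximate an IID sample from the KNM (Null) Platt-score distribution.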
Estimate of the log KNM Platt-score distribution

• A fit of the log(KNM) scores to a parametric form helps us avoid a plethora of 0 p-values for the KM validation set
• “Problem” p-values now
Validation Set

• Lump together as the “validation set”
• Sample to get a set of IID simulated log(KNM scores) (“reusing data” less too…??)
• Compute p-values for the validation set from the fit null
• Check assumptions on the Null:
  • Null p-values should be Uniform
  • Null z-values should be close to N(0,1)
Fit local-fdr models

• Use locfdr (locfdr): fit the classic Poisson regression for f(z)
• Use modified locfdr/JAGS (JAGS, Plummer) or Stan (Stan): fit Bayesian hierarchical Poisson regressions
Posterior Association Probability: Believability Curve

[Figure: 12D PCA-SVM locfdr fit for Glock primer shear patterns, shown with +/- 2 standard error bands]
[Figure panels: Poisson (Efron) on test set; Bayesian Poisson on test set; Bayesian Poisson with intercept on test set; Bayesian over-dispersed Poisson with intercept on test set]
Bayes Factors/Likelihood Ratios

• In the “Forensic Bayesian Framework”, the Likelihood Ratio is the measure of the weight of evidence
  • LRs are called Bayes Factors by most statisticians
• LRs give the measure of support the “evidence” lends to the “prosecution hypothesis” vs. the “defense hypothesis”
• From Bayes’ Theorem:

  Pr(S+ | z) / Pr(S- | z) = LR × Pr(S+) / Pr(S-)
Bayes Factors/Likelihood Ratios

• Once the “fits” for the Empirical Bayes method are obtained, it is easy to compute the corresponding likelihood ratios
  o Using the identity Pr(S+ | z) = 1 - Pr(S- | z) = 1 - lfdr(z), the likelihood ratio can be computed as:

  LR(z) = [ Pr(S+ | z) / Pr(S- | z) ] / [ Pr(S+) / Pr(S-) ] = [ (1 - lfdr(z)) / lfdr(z) ] × [ π0 / (1 - π0) ]

  (π0 = Pr(S-), the fit prior probability of no true association)
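As a computation, the LR is just posterior odds divided by prior odds. A hypothetical Python sketch of that identity:

```python
def likelihood_ratio(post_assoc, prior_assoc):
    """LR from the fit posterior and prior association probabilities:
    LR = posterior odds / prior odds.  With post_assoc = 1 - lfdr(z)
    and prior_assoc = 1 - pi0, this matches the identity in the text."""
    post_odds = post_assoc / (1.0 - post_assoc)
    prior_odds = prior_assoc / (1.0 - prior_assoc)
    return post_odds / prior_odds
```

For example, a posterior association probability of 0.9 against an even prior (0.5) gives LR = 9: the evidence multiplied the odds in favor of association by nine.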
Bayes Factors/Likelihood Ratios

• Using the fit posteriors and priors we can obtain the likelihood ratios (Tippett, Ramos)

[Figure: distributions of known-match and known non-match LR values]
Empirical Bayes: Some Things That Bother Me

• Need a lot of z-scores
  • Big data sets in forensic science largely don’t exist
• z-scores should be fairly independent
  • Especially necessary for interval estimates around the lfdr (Efron)
• Requires “binning” in an arbitrary number of intervals
• Also suffers from the “open-set” problem
• Interpretation of the prior probability for this application
  • Should Pr(S-) be 1 or very close to it? How close?