Demystifying Predictive Coding Technology
Date: Wednesday, August 13, 2014
Time: 1 p.m. ET / Noon CT / 11 a.m. MT / 10 a.m. PT
Anita Engles, VP Products and Marketing, Daegis
Doug Stewart, VP Sales Support, Daegis
TAR Defined
"A process for prioritizing or coding a collection of electronic documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of documents and then extrapolates those judgments to the remaining document population."*
*Grossman & Cormack, 2012
The TAR Frontlines
• Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery (2014)
• Maura R. Grossman and Gordon V. Cormack
• http://cormack.uwaterloo.ca/cormack/calstudy/
Key Findings
• Non-Random Selection Methods Work Best for Seed Set
• Active Learning Better than Passive Learning
• Senior Level Subject Matter Experts are NOT Required to Train System
TAR Steps
Process Overview
[Flow diagram: Identifying the Population → Relatedness Scoring → Keyword Searching → Creating the Seed Set → Training → Assessing Results → Producing]
Building the Map
• Step: Build the Map
• Purpose: Measure Relationships
• Variations: Algorithms
• Why It Matters: Core to Predictive Functionality
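The deck does not disclose which algorithm builds the map; a common way to measure document relationships is vector-space similarity. Here is a minimal sketch, assuming TF-IDF vectors and cosine similarity (an illustrative choice, not necessarily the vendor's):

```python
# Minimal relatedness-map sketch: TF-IDF vectors + cosine similarity.
# Both choices are assumptions for illustration; the actual TAR product
# may use different algorithms to measure document relationships.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "motorcycle throttle stuck during acceleration",
    "bike throttle cable replacement guide",
    "quarterly marketing budget review",
]
vectors = TfidfVectorizer().fit_transform(docs)
relatedness = cosine_similarity(vectors)   # pairwise scores in [0, 1]
print(relatedness.round(2))                # docs 0 and 1 relate; doc 2 stands apart
```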
Keyword Searching
Tried and True
• Validated & Iterative Keyword Searching
Step
• Inexpensive TrainingPurpose
• Not used in All ApproachesVariations
• Drives EfficiencyWhy It Matters
(motorcycle OR bike) AND ((throttle OR accel*) w/10 stick)
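The w/10 connector means "within 10 words." As a rough illustration of how such a proximity search evaluates (review platforms use indexed search engines; the tokenizer here is deliberately crude):

```python
import re

def near(tokens, pattern, word, window):
    """True if a token matching `pattern` occurs within `window` words of `word`."""
    a = [i for i, t in enumerate(tokens) if re.fullmatch(pattern, t)]
    b = [i for i, t in enumerate(tokens) if t == word]
    return any(abs(i - j) <= window for i in a for j in b)

def matches(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    vehicle = "motorcycle" in tokens or "bike" in tokens
    # accel* is a prefix wildcard: accelerate, acceleration, accelerator, ...
    return vehicle and near(tokens, r"throttle|accel[a-z]*", "stick", 10)

print(matches("my bike throttle would stick wide open"))  # True
print(matches("sold the motorcycle at a sticker price"))  # False: "sticker" != "stick"
```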
Seed Set
Building the Seed Set
• Step: Review Strategically Sampled Docs
• Purpose: Generates a High-Level Relevancy "Heat Map"
• Variations: Random, Strategic, Judgmental Samples
• Why It Matters: Drives Efficiency
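A sketch of how random and judgmental samples might be blended into one seed set; the helper and sample size are illustrative, not a prescribed protocol:

```python
import random

def build_seed_set(population, keyword_hits, n_random=50, seed=1):
    """Blend a random sample with judgmental picks (e.g., keyword hits).
    The 50-doc size is a placeholder; real protocols size samples statistically."""
    rng = random.Random(seed)
    random_part = rng.sample(population, min(n_random, len(population)))
    return list(dict.fromkeys(random_part + keyword_hits))  # dedupe, keep order
```

Reviewers then code the blended set, producing the high-level relevancy heat map described above.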
Predicting Responsiveness
The Prediction Engine
[Diagram: the Relatedness Map, the Seed Set / Search results, and Training feed the Prediction Engine, whose predictive calls place each document on a spectrum from Definitely Responsive to Definitely Not Responsive.]
These three categories of known information are fed into the system's algorithm, which evaluates the data and scores the likelihood that each document is responsive.
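A minimal sketch of that scoring step, assuming a TF-IDF plus logistic-regression model (the webinar does not name the actual algorithm) and illustrative 0.80 / 0.20 cut-offs for the "definitely" bins:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

seed_docs   = ["throttle stuck on the bike", "friday lunch menu",
               "motorcycle accelerator jammed", "office holiday schedule"]
seed_labels = [1, 0, 1, 0]   # SME judgments: 1 = responsive, 0 = not
population  = ["customer says the throttle sticks", "budget meeting notes"]

vec = TfidfVectorizer().fit(seed_docs + population)
model = LogisticRegression().fit(vec.transform(seed_docs), seed_labels)
scores = model.predict_proba(vec.transform(population))[:, 1]

for doc, p in zip(population, scores):
    call = "definitely" if p > 0.8 else "definitely not" if p < 0.2 else "uncertain"
    print(f"{p:.2f}  {call:14s}  {doc}")
```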
Assessing the Results
Building the Answer Key
• Step: Assess Accuracy Based on Industry-Standard Metrics
• Purpose: Informs the Decision to Stop TAR
• Variations: Simple and Stratified Sampling; Sample Once or Multiple Times
• Why It Matters: Defensibility
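A sketch of how a sample-based "answer key" can inform the stop decision; the counts and the 75% recall target are illustrative only:

```python
def estimated_recall(sample):
    """sample: (predicted_responsive, truly_responsive) pairs drawn at random
    and coded by reviewers: the answer key."""
    tp = sum(p and t for p, t in sample)
    fn = sum(t and not p for p, t in sample)
    return tp / (tp + fn)

# Illustrative 100-doc sample: 42 responsive docs found, 8 missed.
sample = [(True, True)] * 42 + [(False, True)] * 8 + [(False, False)] * 50
r = estimated_recall(sample)
print(f"estimated recall {r:.0%} -> {'stop TAR' if r >= 0.75 else 'keep training'}")
```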
Training / Learning
Continual Refinement
Refining keyword searches and manually reviewing the documents with the highest levels of uncertainty moves documents from the middle of the spectrum toward the endpoints.
• Step: Reviewers Train and System Learns
• Purpose: Transfer Subject-Matter Expertise to the TAR System
• Variations: Active Learning, Passive Learning
• Why It Matters: Dramatic Cost Savings
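A toy sketch of one active-learning round under the same assumed TF-IDF plus logistic-regression model; the reviewer's call is simulated with a hard-coded label:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled = {"throttle stuck on the bike": 1, "friday lunch menu": 0}
unlabeled = ["the accelerator felt sticky", "parking garage notice"]
vec = TfidfVectorizer().fit(list(labeled) + unlabeled)

def fit(judgments):
    """Retrain on the current reviewer judgments (dict: doc text -> label)."""
    return LogisticRegression().fit(vec.transform(list(judgments)),
                                    list(judgments.values()))

model = fit(labeled)

# Uncertainty sampling: the score nearest 0.50 marks the most informative doc.
scores = model.predict_proba(vec.transform(unlabeled))[:, 1]
next_doc = min(zip(unlabeled, scores), key=lambda pair: abs(pair[1] - 0.5))[0]

labeled[next_doc] = 1   # reviewer codes it responsive (simulated judgment)
model = fit(labeled)    # the system learns from the new call
```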
Post-TAR
Producing the Responsive Documents
• Terminate TAR Review: Decision Based on Accuracy and Cost Metrics ("Stabilization")
• Harvest Predicted Calls
• Review Responsive Docs
• Sample Non-Responsive Docs
• Document Entire Process
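One way to operationalize "stabilization" is to stop when successive training rounds flip only a trivial share of predictive calls; a sketch, with an illustrative 2% threshold:

```python
def is_stable(prev_calls, curr_calls, threshold=0.02):
    """prev_calls / curr_calls: dicts of doc id -> predicted-responsive bool
    from consecutive training rounds. The threshold is illustrative, not a standard."""
    flipped = sum(prev_calls[d] != curr_calls[d] for d in prev_calls)
    return flipped / len(prev_calls) <= threshold

prev = {1: True, 2: False, 3: True, 4: False}
curr = {1: True, 2: False, 3: True, 4: False}
print(is_stable(prev, curr))   # True -> harvest predicted calls
```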
Accuracy Metrics
How Accuracy Is Measured
TAR improves the F1 score by moving documents from the false (incorrect) bins into the true bins where they belong.
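For reference, F1 is the harmonic mean of precision and recall. A quick sketch with made-up counts shows how emptying the false bins lifts the score:

```python
def f1(tp, fp, fn):
    """F1 = harmonic mean of precision (tp/(tp+fp)) and recall (tp/(tp+fn))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: training moves 250 docs from false to true bins.
print(round(f1(tp=400, fp=300, fn=300), 3))  # 0.571 before training
print(round(f1(tp=650, fp=50,  fn=50),  3))  # 0.929 after training
```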
Selected TAR Bibliography
TAR Resources
1. Search, Forward: Will Manual Document Review and Keyword Searches Be Replaced by Computer-Assisted Coding? (2011)
• Judge Andrew Peck
• http://www.law.com/jsp/lawtechnologynews/PubArticleLTN.jsp?id=1202516530534
2. Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review (2011)
• Maura R. Grossman and Gordon V. Cormack
• http://jolt.richmond.edu/v17i3/article11.pdf
3. Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (2012)
• RAND Institute for Civil Justice: Nicholas M. Pace, Laura Zakaras
• http://www.rand.org/pubs/monographs/MG1208.html#abstract
Thank You!
Q&A