Demystifying Predictive Coding Technology
Date: Wednesday, August 13, 2014
Time: 1 p.m. ET / Noon CT / 11 a.m. MT / 10 a.m. PT
Anita Engles, VP Products and Marketing, Daegis
Doug Stewart, VP Sales Support, Daegis
TAR Defined
"A process for prioritizing or coding a collection of electronic documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of documents and then extrapolates those judgments to the remaining document population."*
*Grossman & Cormack, 2012
The TAR Frontlines
• Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery (2014)
• Maura R. Grossman and Gordon V. Cormack
• http://cormack.uwaterloo.ca/cormack/calstudy/
Key Findings
• Non-Random Selection Methods Work Best for Seed Set
• Active Learning Better than Passive Learning
• Senior Level Subject Matter Experts are NOT Required to Train System
TAR Steps
Process Overview
[Flow diagram: Identifying the Population → Relatedness Scoring → Keyword Searching → Creating the Seed Set → Training → Assessing Results → Producing]
Building the Map
• Step: Build the Map
• Purpose: Measure Relationships
• Variations: Algorithms
• Why It Matters: Core to Predictive Functionality
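The deck does not disclose which algorithm builds the map; a common way to measure document relationships is vector-space similarity. Here is a minimal sketch, assuming TF-IDF vectors and cosine similarity (an illustrative choice, not necessarily the vendor's):

```python
# Minimal relatedness-map sketch: TF-IDF vectors + cosine similarity.
# Both choices are assumptions for illustration; the actual TAR product
# may use different algorithms to measure document relationships.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "motorcycle throttle stuck during acceleration",
    "bike throttle cable replacement guide",
    "quarterly marketing budget review",
]
vectors = TfidfVectorizer().fit_transform(docs)
relatedness = cosine_similarity(vectors)   # pairwise scores in [0, 1]
print(relatedness.round(2))                # docs 0 and 1 relate; doc 2 stands apart
```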
Keyword Searching
Tried and True
• Validated & Iterative Keyword Searching
Step
• Inexpensive TrainingPurpose
• Not used in All ApproachesVariations
• Drives EfficiencyWhy It Matters
(motorcycle OR bike) AND ((throttle OR accel*) w/10 stick)
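The w/10 connector means "within 10 words." As a rough illustration of how such a proximity search evaluates (review platforms use indexed search engines; the tokenizer here is deliberately crude):

```python
import re

def near(tokens, pattern, word, window):
    """True if a token matching `pattern` occurs within `window` words of `word`."""
    a = [i for i, t in enumerate(tokens) if re.fullmatch(pattern, t)]
    b = [i for i, t in enumerate(tokens) if t == word]
    return any(abs(i - j) <= window for i in a for j in b)

def matches(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    vehicle = "motorcycle" in tokens or "bike" in tokens
    # accel* is a prefix wildcard: accelerate, acceleration, accelerator, ...
    return vehicle and near(tokens, r"throttle|accel[a-z]*", "stick", 10)

print(matches("my bike throttle would stick wide open"))  # True
print(matches("sold the motorcycle at a sticker price"))  # False: "sticker" != "stick"
```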
Seed Set
Building the Seed Set
• Step: Review Strategically Sampled Docs
• Purpose: Generates a High-Level Relevancy "Heat Map"
• Variations: Random, Strategic, Judgmental Samples
• Why It Matters: Drives Efficiency
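A sketch of how random and judgmental samples might be blended into one seed set; the helper and sample size are illustrative, not a prescribed protocol:

```python
import random

def build_seed_set(population, keyword_hits, n_random=50, seed=1):
    """Blend a random sample with judgmental picks (e.g., keyword hits).
    The 50-doc size is a placeholder; real protocols size samples statistically."""
    rng = random.Random(seed)
    random_part = rng.sample(population, min(n_random, len(population)))
    return list(dict.fromkeys(random_part + keyword_hits))  # dedupe, keep order
```

Reviewers then code the blended set, producing the high-level relevancy heat map described above.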
Predicting Responsiveness
The Prediction Engine
[Diagram: the Relatedness Map, the Seed Set / Search results, and Training feed the Prediction Engine, whose predictive calls place each document on a spectrum from Definitely Responsive to Definitely Not Responsive.]
These three categories of known information are fed into the system's algorithm, which evaluates the data and scores the likelihood that each document is responsive.
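A minimal sketch of that scoring step, assuming a TF-IDF plus logistic-regression model (the webinar does not name the actual algorithm) and illustrative 0.80 / 0.20 cut-offs for the "definitely" bins:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

seed_docs   = ["throttle stuck on the bike", "friday lunch menu",
               "motorcycle accelerator jammed", "office holiday schedule"]
seed_labels = [1, 0, 1, 0]   # SME judgments: 1 = responsive, 0 = not
population  = ["customer says the throttle sticks", "budget meeting notes"]

vec = TfidfVectorizer().fit(seed_docs + population)
model = LogisticRegression().fit(vec.transform(seed_docs), seed_labels)
scores = model.predict_proba(vec.transform(population))[:, 1]

for doc, p in zip(population, scores):
    call = "definitely" if p > 0.8 else "definitely not" if p < 0.2 else "uncertain"
    print(f"{p:.2f}  {call:14s}  {doc}")
```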
Assessing the Results
Building the Answer Key
• Step: Assess Accuracy Based on Industry-Standard Metrics
• Purpose: Informs the Decision to Stop TAR
• Variations: Simple and Stratified Sampling; Sample Once or Multiple Times
• Why It Matters: Defensibility
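A sketch of how a sample-based "answer key" can inform the stop decision; the counts and the 75% recall target are illustrative only:

```python
def estimated_recall(sample):
    """sample: (predicted_responsive, truly_responsive) pairs drawn at random
    and coded by reviewers: the answer key."""
    tp = sum(p and t for p, t in sample)
    fn = sum(t and not p for p, t in sample)
    return tp / (tp + fn)

# Illustrative 100-doc sample: 42 responsive docs found, 8 missed.
sample = [(True, True)] * 42 + [(False, True)] * 8 + [(False, False)] * 50
r = estimated_recall(sample)
print(f"estimated recall {r:.0%} -> {'stop TAR' if r >= 0.75 else 'keep training'}")
```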
Training / Learning
Continual Refinement
Refining keyword searches and manually reviewing the documents with the highest levels of uncertainty moves documents from the middle of the spectrum toward the endpoints.
• Step: Reviewers Train and System Learns
• Purpose: Transfer Subject-Matter Expertise to the TAR System
• Variations: Active Learning, Passive Learning
• Why It Matters: Dramatic Cost Savings
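A toy sketch of one active-learning round under the same assumed TF-IDF plus logistic-regression model; the reviewer's call is simulated with a hard-coded label:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled = {"throttle stuck on the bike": 1, "friday lunch menu": 0}
unlabeled = ["the accelerator felt sticky", "parking garage notice"]
vec = TfidfVectorizer().fit(list(labeled) + unlabeled)

def fit(judgments):
    """Retrain on the current reviewer judgments (dict: doc text -> label)."""
    return LogisticRegression().fit(vec.transform(list(judgments)),
                                    list(judgments.values()))

model = fit(labeled)

# Uncertainty sampling: the score nearest 0.50 marks the most informative doc.
scores = model.predict_proba(vec.transform(unlabeled))[:, 1]
next_doc = min(zip(unlabeled, scores), key=lambda pair: abs(pair[1] - 0.5))[0]

labeled[next_doc] = 1   # reviewer codes it responsive (simulated judgment)
model = fit(labeled)    # the system learns from the new call
```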
Post-TAR
Producing the Responsive Documents
• Terminate TAR Review: Decision Based on Accuracy and Cost Metrics ("Stabilization")
• Harvest Predicted Calls
• Review Responsive Docs
• Sample Non-Responsive Docs
• Document Entire Process
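One way to operationalize "stabilization" is to stop when successive training rounds flip only a trivial share of predictive calls; a sketch, with an illustrative 2% threshold:

```python
def is_stable(prev_calls, curr_calls, threshold=0.02):
    """prev_calls / curr_calls: dicts of doc id -> predicted-responsive bool
    from consecutive training rounds. The threshold is illustrative, not a standard."""
    flipped = sum(prev_calls[d] != curr_calls[d] for d in prev_calls)
    return flipped / len(prev_calls) <= threshold

prev = {1: True, 2: False, 3: True, 4: False}
curr = {1: True, 2: False, 3: True, 4: False}
print(is_stable(prev, curr))   # True -> harvest predicted calls
```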
Accuracy Metrics
How Accuracy Is Measured
TAR improves the F1 score by moving documents from the false (incorrect) bins into the true bins where they belong.
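For reference, F1 is the harmonic mean of precision and recall. A quick sketch with made-up counts shows how emptying the false bins lifts the score:

```python
def f1(tp, fp, fn):
    """F1 = harmonic mean of precision (tp/(tp+fp)) and recall (tp/(tp+fn))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: training moves 250 docs from false to true bins.
print(round(f1(tp=400, fp=300, fn=300), 3))  # 0.571 before training
print(round(f1(tp=650, fp=50,  fn=50),  3))  # 0.929 after training
```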
Selected TAR Bibliography
TAR Resources
1. Search, Forward: Will Manual Document Review and Keyword Searches Be Replaced by Computer-Assisted Coding? (2011)
• Judge Andrew Peck
• http://www.law.com/jsp/lawtechnologynews/PubArticleLTN.jsp?id=1202516530534
2. Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review (2011)
• Maura R. Grossman and Gordon V. Cormack
• http://jolt.richmond.edu/v17i3/article11.pdf
3. Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (2012)
• RAND Institute for Civil Justice: Nicholas M. Pace, Laura Zakaras
• http://www.rand.org/pubs/monographs/MG1208.html#abstract
Thank You!
Q&A