Streamlining Automation in eDiscovery Wednesday, November 9, 2016 12:00 pm ET Sandra Serkes, President & CEO Valora Technologies, Inc.

Upload: valora-technologies

Post on 12-Jan-2017



Why Automate?

• Cost
• Efficiency (value/cost)
• Time
• Consistency
• Accuracy
• Repeatability/Best practices/Knowledge transfer
• Defensibility
• Return on Investment
• Economies of Scale

Forces pushing us towards automation

• Technological advances
• Focus on value, particularly cost
• Ever larger document/file volumes
• Ever more complex data analysis needs
• Decreasing time frames
• Limited/finite resources
• Competition for services
• Security concerns
• Integration with other business/legal practices

We are entering a time of Routine vs. Specialty eDiscovery Practice

20 GBs of email
2 GB of tweets
45 boxes of paper
Foreign language docs
Personal data

• Automating Tasks
• Automating Workflow

Routine v. Specialty eDiscovery

• Paper productions
• Foreign Language
• Audio & Video files
• ESI with no metadata
• Email attachments
• Shared files, loose files
• Databases & Repositories
• Special application files
• Personnel documents
• Contracts & Agreements
• Stored records
• Social media
• Multi-party litigation
• Sensitive material
• And much more….

Moving beyond email messages!

Can we automate specialty eDiscovery?

Specialty eDiscovery Client Use Cases

• AutoIndex 400,000 files per day for 4 months

• AutoRedact SSN & TID from credit applications

• Host online “Bidder’s Library” of 100 years of scanned records

• AutoBusiness Rules for document retention & compliance

• Convert paper medical records to digital format with embedded indexing

• AutoReview 1.5M files for responsiveness, privilege, & hotdocs

• AutoIndex 3M FOIA request documents

• AutoTranslate Japanese, Spanish, French & German docs to English

• Oversee & manage 6-city simultaneous data collection & conversion

• AutoRedact personally identifying information (PII)

Typical Specialty eDiscovery Services

• AutoUnitization Ability to distinguish the beginning & end of documents, as well as determine which documents incorporate other documents as attachments

• AutoCoding Identify and label documents by type (balance sheet, tax form, memo, etc.), relevant people (authors, recipients, cc/bcc), date and subject/title.

• AutoReview Identify and label documents by groupings (dupes/near dupes, conversation threads, issues/clustering) and disposition (responsive, privileged, “hot,” etc.)

• AutoRedaction Ability to identify & mark up documents to “black out” select information (such as PII – personally identifiable information, patient data or privileged information)

• AutoTranslation Automatic translation of non-English documents to English text. Supports dozens of originating languages.

• AutoTranscription for Audio & Video Files Automatic transcription of audio & video files to corresponding text files. Multiple file type support.

• Hosting, Database Creation & Data Visualization Hosting of pre/post-processed documents in BlackCat or other (iConect, Relativity, etc.). Intuitive, graphical presentation of data with easy navigation, understanding and manipulation of document subsets. Good for Early Case Assessment.

• AutoBusinessRules Identify and label documents by workflow treatment, retention plans, compliance audit or other groupings. Useful for DocReview, retention and compliance disposition.

• Electronic File Processing (EFP) File Conversion to TIF/PDF format, text and metadata extraction, de-NISTing, cross-custodian de-duplication, filtering/culling, analytics

• OCR Optical Character Recognition for converting images to searchable text

• NearDuplicateDetection Identify documents that are highly similar, if not identical across custodians and the entire population. Includes cross-correlation of paper & electronic documents

• EmailThreading /Dethreading Join separated email conversation threads into a consistent stream from start to finish. Separate threaded emails into component threads

• Scanning Image conversion for paper documents into electronic image format (TIF, PDF, JPEG, etc.)

• Professional Services Options for Project Management, Technical data/file manipulation, Subject Matter Expertise, Resources & Workflow Design & Management
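To make the NearDuplicateDetection service listed above more concrete, here is a minimal sketch of one common way to score how similar two documents are: word shingles compared with Jaccard similarity. This is an illustration only, not a description of how Valora's software works; the function names, the shingle size `k=3` and the `0.8` threshold are all assumptions chosen for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short texts are treated as a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.8, k=3):
    """True when the shingle sets overlap at or above the (assumed) threshold."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold
```

A single changed word alters up to `k` shingles, so short documents need a lower threshold than long ones. In practice the threshold would be tuned per collection.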

Valora Suggested Workflow Process

Participants: Client, Valora

Document Intake & Submission
• FTP, SFTP
• Drag & drop
• Email bounce
• Media (drive/DVD)
• Send boxes

Submission to Valora
• Files/Docs arrive
• Log COC, inventory
• Tracking closed
• Email acknowledgement

PH Receipt & Pre-Process
• Load data to systems
• Autotriage (reject/alert)
• AutoTranslate to English
• AutoTranscribe
• AutoLDD
• ND Store check
• Tracking ID assignment

Manual 1: LDD & Triage
• Error Handling
• LDD QC
• Special Instructions
• Q&A

PowerHouse Automation
• AutoCoding
• Rules
• NearDupe
• AutoReview
• QC Assignment

Manual 2: QC
• Coding QC
• GQC & Audit
• Work Assignment
• Template/Rules ID
• PH Tools
• Ready to Ship

PH Post-process & Ship
• Number Assignment
• Data Integrity Checks
• ND Store
• Export & Ship

Export & Delivery
• Prep load file(s)
• Load to BlackCat
• Prep shipment package
• Ship & track
• Confirm receipt

How does Automation Work?

• Processing (aka Intake) is the process of “ingesting” data into an analytics engine
– Creating OCR for scanned images
– Extracting text for native files & email
– Speech to text for audio/video files
– Translating content to English
– Re-ordering or re-aligning pages
– Applying redactions
• Tagging (aka Coding, Indexing, Sequencing) is the process of extracting key information and attributes about each document
– Document Type, Important Dates
– Key Names & Phrases
– Topics, Keywords & Themes
– File, Content and DocType attributes
– Relation to other documents (duplicate, related, attached, contradictory, etc.)
• Disposition (rules) is the process of creating a destination or status for each document
– Retention status & duration
– Folder (taxonomy) location
– Labelling & keywords display

Processing: native → text

Tagging: text → fielded data

Disposition: fielded data → disposition
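The three stages described above (Processing, Tagging, Disposition) can be pictured as a simple pipeline in which each stage consumes the previous stage's output. The sketch below is a toy illustration under stated assumptions: the `Document` class, the stage functions, the date pattern and the folder names are all hypothetical, and real processing would involve OCR, text extraction and far richer rules.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    raw: str                                      # stand-in for native file content
    text: str = ""                                # output of Processing
    fields: dict = field(default_factory=dict)    # output of Tagging
    disposition: str = ""                         # output of Disposition


def process(doc):
    """Processing/intake: produce searchable text (here, only whitespace cleanup)."""
    doc.text = " ".join(doc.raw.split())
    return doc


def tag(doc):
    """Tagging: extract fielded data from the text."""
    match = re.search(r"\b(\d{1,2}/\d{1,2}/\d{4})\b", doc.text)
    doc.fields["Date"] = match.group(1) if match else None
    doc.fields["DocType"] = "Memo" if "memo" in doc.text.lower() else "Unknown"
    return doc


def dispose(doc):
    """Disposition: apply a rule to the fielded data to choose a destination."""
    if doc.fields["DocType"] == "Memo":
        doc.disposition = "Retain/Memos"          # hypothetical folder name
    else:
        doc.disposition = "Review/Unclassified"   # hypothetical folder name
    return doc


def run_pipeline(doc):
    """Run native -> text -> fielded data -> disposition in order."""
    for stage in (process, tag, dispose):
        doc = stage(doc)
    return doc
```

Keeping each stage as a separate function mirrors the slide's separation of concerns: any stage can be swapped or re-run without touching the others.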

Intake – PowerHouse – Output

PH Web Portal

Folder Taxonomy

Hosted Repository

Shared Server Poll

OCR/Text Extraction

Translation/Transcription

Unitization

Coding/Tagging

Rules/Disposition

Redaction

Exceptions

PowerHouse Portal

Users drag & drop files into the portal for immediate, automatic loading into PowerHouse.

PowerHouse responds with an automatic acknowledgement email.

Automating eDiscovery & Beyond

INDEXING/TAGGING: Date, Author, Patent # …

ANALYSIS/RULES: Year Total, Hot Doc, Priv…

PRESENTATION: BlackCat, Relativity, .CSV …

AutoIndexing

AutoBusinessRules

Analytics

Database Prep

How AutoCoding Works

Docs enter the system as extracted or OCR’ed text

Data is extracted from each document into a database table

DocType = Patent Application

Date = 10/18/2007

Date Format = US

Author = Patent Authors, Author City, Author Country

Assignee = RIM

Tone = Neutral to slightly positive

Embedded Graphic with Title

Other Capturable Data Elements:

• Patent Number
• Filing Date
• Key Phrases & Terms
• Managing PTO
• Implied/Attached Docs
• Bar Code Present
• And many more . . .
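As a rough picture of "data is extracted from each document into a database table," the sketch below pulls a few fields out of extracted text with regular expressions and stores them as one row in SQLite. It is an assumption-laden toy, not Valora's AutoCoding: the patterns, the `coding` table, its column names and the sample text are all made up for illustration, and only the field values DocType = Patent Application, Date = 10/18/2007 and Assignee = RIM come from the slide.

```python
import re
import sqlite3

# Hypothetical patterns; real documents would need many more variants.
PATTERNS = {
    "doc_date": re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b"),
    "assignee": re.compile(r"Assignee:\s*(.+)"),
    "patent_number": re.compile(r"Patent No\.?\s*([\d,]+)"),
}


def autocode(text):
    """Return a dict of fielded data extracted from one document's text."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1).strip() if match else None
    is_application = "patent application" in text.lower()
    row["doc_type"] = "Patent Application" if is_application else "Unknown"
    return row


def store(conn, doc_id, row):
    """Write one document's fielded data as a row in the (hypothetical) coding table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS coding ("
        "doc_id TEXT PRIMARY KEY, doc_type TEXT, doc_date TEXT, "
        "assignee TEXT, patent_number TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO coding VALUES (?, ?, ?, ?, ?)",
        (doc_id, row["doc_type"], row["doc_date"],
         row["assignee"], row["patent_number"]),
    )


if __name__ == "__main__":
    sample = "United States Patent Application\nFiled: 10/18/2007\nAssignee: RIM\n"
    conn = sqlite3.connect(":memory:")
    store(conn, "DOC-0001", autocode(sample))
    print(conn.execute("SELECT * FROM coding").fetchall())
```

Fields that cannot be found are stored as `NULL` rather than guessed, which keeps missing data visible for later QC.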

INDEXING/TAGGING for eDiscovery

AutoIndexing

AutoUnitization

AutoBusinessRules

Analytics

Database Prep

How AutoUnitization Works

Docs enter the system with physical (or no) boundaries

Documents are separated down to the unit document level

AutoReview Defined

• AutoReview is the iterative application of software and technique to capture information about documents
– “Protective” Fields: Privilege, IP/Trade Secret, Confidential, Non-responsive/Irrelevant, Work Product/Attorney Notes, Suppressed
– “Producible” Fields: Responsive, Issue/Category/Filter/Cluster, Duplicate/Near Duplicate
– Categorizing/Grouping Fields: Duplicate/NearDuplicate, Conversation Thread, Issue/Category, Hot
– Privacy & Protection (Redaction): Privileged portions, customer/patient data, financial info, personally identifiable information (PII)
• Emerging flavors of AutoReview, Technology-Assisted Review (TAR)
– Valora one of a handful of true Service Providers
– Uses software & OCR/extracted text, metadata and Statistical Pattern-Matching
• Generally accepted that AutoReview is faster and lower cost than manual review, with higher quality

ANALYSIS/RULES for eDiscovery

Litigation Document Review Manual

Determining Responsiveness

The document should be marked responsive if any of the following conditions are present:
• Mentions or discusses the specific protocol for handling simultaneous voice and data actions
• Is a design document or graphic that shows the specific protocol for handling simultaneous voice and data actions
• Discusses or is related to patent ‘009
• Mentions Apple Inc. or Apple Computers, Inc. or is a communication from/to anyone at Apple Computer, Inc., or apple.com.
• And so on…


Rule: Responsive for Protocol Discussion
When: [FullText] contains any of <Voice protocol key phrases 12> and [FullText] contains any of <Data protocol key phrases 25> and [DocType] is not any of [Brochure, Press Release, Website], ...

Indexing/Tagging

Rule: Responsive for Patent ‘009
When: Any document in the Attachment Family matches: [FullText] contains any of <Patent '009 key phrase list 4>, or Parent of Attachment Family matches: Any of [Author, Recipient, CCs] contains any of <Patent '009 experts contact list 23>, …

Rule: Responsive for Apple
When: [FullText] contains (fuzzy match) any of <Apple key phrase list 7>, or Any of [Authors, Recipients, CCs] contains any of <Apple contact list 15>, or [Author] matches "*@apple.com" …
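The rules above read naturally as boolean tests over each document's tagged fields. The sketch below shows how two of them might be evaluated. It is only an illustration: the slides name the phrase lists but do not show their contents, so `VOICE_PHRASES` and `DATA_PHRASES` are placeholder values, the function names are hypothetical, and only the conditions visible on the slide are implemented (each rule ends in an ellipsis, so further conditions exist that are not shown).

```python
from fnmatch import fnmatchcase

# Placeholder phrase lists; the real lists are not given in the slides.
VOICE_PHRASES = ["voice call", "voice channel"]
DATA_PHRASES = ["data session", "packet data"]
EXCLUDED_DOCTYPES = {"Brochure", "Press Release", "Website"}


def contains_any(text, phrases):
    """Case-insensitive test: does text contain at least one of the phrases?"""
    lowered = text.lower()
    return any(phrase.lower() in lowered for phrase in phrases)


def responsive_for_protocol(doc):
    """Visible part of 'Responsive for Protocol Discussion'."""
    return (
        contains_any(doc["FullText"], VOICE_PHRASES)
        and contains_any(doc["FullText"], DATA_PHRASES)
        and doc["DocType"] not in EXCLUDED_DOCTYPES
    )


def responsive_for_apple_author(doc):
    """Only the '[Author] matches *@apple.com' clause of 'Responsive for Apple'."""
    return fnmatchcase(doc["Author"].lower(), "*@apple.com")


def is_responsive(doc):
    """A document is responsive if any implemented rule fires."""
    return responsive_for_protocol(doc) or responsive_for_apple_author(doc)
```

Writing each rule as its own function keeps the rule auditable: a reviewer can see exactly which rule marked a document responsive, which supports the defensibility goal named earlier in the deck.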

AutoTranslation Defined

• Universal Translation to/from 65 languages
– Software performs the translation per Google’s licensed translation engine
– Ex: Non-English converted to English
• Multiple choice presentation
– Original language, translated language(s), or both
– Presentation can include Redactions
• Available for all kinds of further processing
– Convert to English, then:
– Apply AutoCoding, AutoBusinessRules or AutoRedaction
– Perform NearDuplicates, Filtering and Culling, or Content Clustering
• Save on expensive manual translation hours!

PowerHouse

Intake

AutoTranslation

AutoIndexing

AutoBusinessRules

Analytics

How AutoTranslation Works

Docs enter the system in their native language

Docs convert to searchable English (or other target)

What it looks like

Original text | AutoTranslated text

DINHEIRO DIGITAL
05-01-2015 as 07:53

China deixa de controlar precos dotabacoA China aboliu o controlo de precos da folha de tabaco, o ultimo produto agricola a ter limites, anunciou este fim de semana a Comissao Nacional de Desenvolvimento e Reforma, o principal organismo de planeamento economico da nacao asiatica.O prego da folha do tabaco e, no entanto, apenas urn pequeno fator no custo total dos cigarros - urnmonopolio estatal na China -, o que torna improvavel que haja efeitos significativos para os fumadores.O Governo chines tern tentado reduzir o consumo de tabaco mas as medidas tern tido urn impacto limitado.O tabaco esta antra os 24 produtos e servigos cujo controlo de custo foi removido, incluindo tambem transporte ferroviario de carga a granel, do envio de encomendas por correio, transporte de passageiros e fabrico de explosivos para use civil.A empresa estatal China Tobacco tern o monopolio da produgao de cigarros mas o prego do tabaco sera determinado de acordo corn o <<oferta e procura industrial e corn os custos e lucros da empresa>>, disse a Comissao, ern comunicado.De acordo corn declaragoes, hoje publicadas, do dirigente da Comissao Wang Shengmin ao jornal China Daily, a China produz cerca de 2,5 milhoes de toneladas de tabaco por ano.

#==============================================================#
#== Valora Technologies, Inc. AutoTranslation ==#
#== The following text has been auto-translated to English ==#
#== From Portuguese ==#
#==============================================================#

DIGITAL MONEY
05/01/2015 at 07:53

China no longer controls prices oftobaccoChina abolished the control of tobacco leaf prices, the lastagricultural product to have limits, announced this weekend the CommissionNational Development and Reform Commission, the main bodyeconomic planning of the Asian nation.The tobacco leaf of the nail and, however, only a small factor in the total cost of cigarettes - aState monopoly in China - which makes it unlikely that there are significant effects on theSmoking.The Chinese government tried to reduce the consumption of tobacco but the measures taken are an impactlimited.Tobacco this and 24 products and services whose cost control was removed, includingalso transport by rail bulk cargo, shipping, mail order, transportpassengers and manufacture of explosives for civil use.The state-owned China Tobacco tern the monopoly of production cigarettes but tobacco nailwill be determined according the corn << supply and industrial demand and corn costs and profits>> company, said the Commission, in a statement.Of corn declarations agreement, published today, the head of the Commission Wang Shengmin the newspaperChina Daily, China produces about 2.5 million tons of tobacco per year.

Why bother with AutoTranslation?

• Far more cost-effective than manual translation

• Often the “gist” of a document is good enough to make decisions

• AT text is strong enough for automated processing
– AutoCoding
– Rules & Analytics
– Classification & Workflow

• Speed! 10,000 pages/hr

• Easily tag documents that must be manually translated, leave the rest as AT.

• Analogous to Early Case Assessment, similar to OCR & other tagging technologies

• Note: there will be errors and omissions.

AutoTranscription Defined

• Automated transcription of captured speech (audio)
– Software performs the transcription per IBM’s Watson licensed transcription engine
– Technology commonly known as speech-to-text, similar to OCR (image-to-text)
• Multiple choice presentation
– Simple text
– Standard legal deposition transcript format
– Time stamps option
– Presentation can include Redactions
– Video stills option
• Available for all kinds of further processing
– Convert to text, then:
– Apply AutoCoding, AutoBusinessRules or AutoRedaction
– Perform NearDuplicates, Filtering and Culling, or Content Clustering

• Save on expensive manual transcription hours!

PowerHouse

Intake

AutoTranscription

AutoIndexing

AutoBusinessRules

Analytics

How AutoTranscription Works

Audio (& video) files enter the system in their native format

Docs convert to searchable text format. With time stamps, stills, formatting, redactions, as needed.


What it looks like

Original video | AutoTranscribed text

Cigarette_Smuggling.mp4

ABC agents are joining the fight to try and crack down on cigarette smuggling. We're not talking about one or two packs but hundreds of cartons out of Virginia. We're told it's a crime, it's big business now and criminals are cashing in. Their new abode takes us into the world of cigarette smuggling. Cigarettes aren't legal but what kind like this may cost you anywhere from thirty to forty five dollars in Virginia, in New York City it brings that's nearly one hundred and fifty. Criminals are making a lot of money by buying cigarettes here and then selling them illegally up north. It's become such big business. It's become a money-making game for them. I figured I'd give it a whirl. Cigarette smuggling according to the Virginia state crime Commission has become more profitable than cocaine heroin marijuana and guns.

Why bother with AutoTranscription?

• Far more cost-effective than manual transcription

• Often the “gist” of a document is good enough to make decisions

• AT text is strong enough for automated processing
– AutoCoding
– Rules & Analytics
– Classification & Workflow

• Speed! 10,000 pages/hr

• Easily tag documents that must be manually transcribed, leave the rest as AT.

• Analogous to Early Case Assessment, similar to OCR & other tagging technologies

• Note: there will be errors and omissions.

AutoRedaction Defined

• Automated redaction of “offending” text or phrases
– Software performs the redaction based on Rules
• Multiple choice presentation
– Image, text or both
– Solid Black, Black with white writing, Translucent Yellow, Translucent Gray
• Available for all kinds of information
– List provided or “derived” from tags
– Ex: SSN, DOB, Name, Age, Address, Account Number, Product Name/ID…
• Unlimited redactions in a single document

[REDACTED]

What kind of redaction makes sense?

Serkes Sandra 123-45-6789 226-588-98

• Should redactions be visible: always, sometimes or never?
• Does someone need to approve or override system redactions?

What kinds of information can be AutoRedacted?

• PII – names, addresses, DOB, SSN
• Financial – account number, credit card info, mortgage files
• Non-class action personnel & info
• Product names, brand names, makes/models
• Organizational names & information
• Locations, addresses, lat/lon, IP addresses
• Concepts & issues

Best bets are formulaic data or lists of info
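The point that formulaic data and lists are the best bets for redaction can be shown with a short sketch: a pattern catches anything shaped like a US Social Security number, and a supplied list catches specific terms. This is a minimal text-only illustration under assumptions; the `redact` function, its parameters and the marker text are hypothetical, and image-level redaction or approval workflows are out of scope.

```python
import re

# Formulaic data: three digits, two digits, four digits, separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text, terms=(), marker="[REDACTED]"):
    """Replace SSN-shaped strings and any listed terms with the marker."""
    redacted = SSN_PATTERN.sub(marker, text)
    for term in terms:
        # re.escape treats each listed term as literal text, not as a pattern.
        redacted = re.sub(re.escape(term), marker, redacted, flags=re.IGNORECASE)
    return redacted


if __name__ == "__main__":
    line = "Serkes Sandra 123-45-6789 226-588-98"
    print(redact(line))
    print(redact(line, terms=["Serkes"]))
```

On the sample line from the slide, only `123-45-6789` fits the SSN shape; `226-588-98` has a different digit grouping and is left alone unless it is added to the term list. That is the practical difference between pattern-driven and list-driven redaction.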

BlackCat Screen Shots

Now that we’ve covered Automating the Tasks of eDiscovery, let’s Automate the Process of Specialty eDiscovery

Why have a strong relationship with a Specialty Provider?

• When the crisis comes, you want us to be
– Pre-vetted (Preferred Services Provider)
– Familiar with your workflow, processes, load files, terminology, etc.
– Ready to go quickly
• Best to have all “regular” workflow locked down and provided from same place. Same is true with specialty work.
– 1 go-to source
– All specialty services, no matter how oddball
– Able to adapt to unique (specialty) circumstances
• Ability to control and predict costs
– Options for preferred pricing, onsite/offsite/SaaS
• Ability to completely customize the workflow to your clients’ needs

Specialty eDiscovery Pricing Models

• One-off Projects
– Standard transactional services price list
– Bulk Discounts available for high volume & resale
• Regular, Monthly Usage
– Custom, discounted pricing at multiple tier options
• Product Licensing
– PowerHouse
– BlackCat
– Professional Services

One-off or Subscription?

• More than 1-2 small specialty matters per month?
• More than 1 “whopper” specialty case per year?
• Repeat specialty cases (or tasks)?
• Distributed offices and clients?
• Lean litigation support team?
• Prior subscription or on premise product purchases?

If you answered “yes” to any of the above, it’s time to think about streamlining specialty matters with subscription-based pricing models.

Subscription or On Premise (or Cloud) Product?

• More than 3-4 small specialty matters per month?
• More than 3 “whopper” specialty cases per year?
• Non-litigation uses? (Records, Info Gov, Knowledge Mgmt)
• Integration with other systems? (iManage, Aderant, Conflicts DB, case management)
• Strong IT support?
• Lit Support/eDiscovery as a profit center?

If you answered “yes” to any of the above, it’s time to think about streamlining specialty matters with on premise models.

Conclusions

• Important to make distinctions between Routine and Specialty eDiscovery

• Many, many capabilities can now be mostly or fully automated

• eDiscovery is converging with other document/file-centric disciplines

• Important to evaluate your technical tool needs in advance of your budget cycle
– Goes against case-by-case, one-off utilization

• Increasing role of consultants and non-attorney/counsel players

Valora Technologies

• Bedford, MA software firm specializing in machine-assisted document processing capabilities (aka analytics)
– World experts in the automated analysis, indexing, mining and presentation of documents, data & content
– 20 staff, 200+ clients, 1,500,000+ pages every week
• Customers: corporate legal departments, government agencies, and their professional advisory colleagues (law firms & consultancies)
• Target market: those who wish to harness and profit from the 2.5 quintillion bytes of document & content data being created each day, aka “Big Data”
• Objective: to overtake traditional information repository creation (manual data entry), management, analysis (search, review) and workflow (retention, production, routing) with high quality, low cost, scalable technology & best practices in analytics.
– Provide cost competitive document analytics solutions in the United States
– Provide efficient, world-class, targeted solutions to data, document & content utilization problems

“The power of Big Data is the story about the ability to compete and win with few resources and limited dollars.” - Forbes, March 2012

(this is Valora’s story, too)

Typical Problems Valora Solves

Legal/Litigation/eDiscovery Problems
• Too many documents to review, cull & produce by hand
• Cost-effective alternative solutions to contract attorney & offshore labor “armies”
• Missing, poor, or ineffective metadata
• Re-unitization, organization, indexing & redacting of documents
• Bridging multi-language document populations to English

Records Management Problems
• Help automate defensible deletion efforts for IG
• Organize & control loose documents on shared drives, desktops, networks & devices
• Eliminate expensive and information-poor storage options
• Serve as automated intake for multiple content generation sources

Business Intelligence Problems
• Organize & control decades of contracts & agreements
• Provide brand integrity/protection data mining of public/private documents
• Forecast & trending of topics, people & locations over time
• Loose, shared files analysis & control

Health Care Problems
• Heavy expense & time converting hardcopy medical records to EMRs/EHRs
• Cannot keep up with fax server data collection
• Cost effective alternative solutions to “armies” of temp data entry coders

Who We Serve

• Corporate Legal Departments with complex document/data/content management needs
– Litigation
– Risk Exposure
– Compliance
– Records
– Information Governance
• Government Agencies with limited resources for document/data/content monitoring, analysis, management
– Litigation
– Investigations
– Compliance
– Records
• Their Advisory Counsel: The law firms, consultancies and service providers who support these entities

Thank You!

For More Information:

Valora Technologies, Inc.
101 Great Road, Suite 220
Bedford, MA 01730
781.229.2265

[email protected]