Best Practices in Multi-Language Discovery and Investigations
TRANSCRIPT
877.557.4273 catalystsecure.com
Presenters
David Sannar, Jonathan Hiroshi Rossi, Chihiro Suzuki, Thomas Gricks
JULY WEBINAR
Speakers
David Sannar, VP of International Development, Catalyst
A veteran e-discovery executive with extensive experience in Asia and the Pacific, Dave Sannar is responsible for all Catalyst operations and business growth throughout Japan and Asia, including our operations and data center in Tokyo.
Jonathan Hiroshi Rossi, Founder and CEO, The CJK Group
Jonathan founded The CJK Group as an Asian language resource for law firms and corporate legal departments handling Asian-language discovery. He is experienced in a variety of large-scale matters and in managing multi-language review teams.
Chihiro Suzuki, Manager, Asia Project Consultants, Catalyst
Chihiro is an invaluable bridge between Catalyst’s clients in Asia, their U.S. counsel, and internal teams in Asia and the United States. Chihiro is bilingual and has a gift for breaking down language and cultural barriers.
Thomas Gricks, Managing Director, Professional Services, Catalyst
A prominent e-discovery attorney and one of the leading authorities on the use of TAR in litigation, Tom advises our clients on best practices for applying Catalyst's TAR engine, Insight Predict, to reduce the total time and cost of discovery.
Agenda
Preparing for a multi-language review or investigation
Five steps to setting up a multi-language review project
1. Collections
2. Processing
3. ECA and Search
4. Review Workflows
5. Technology
What We See Happening
Far too often, insufficient planning goes into a multi-language review.
Even for a single-language review, millions of dollars are spent on the linear and manual review portion alone.
Most, if not all, reviews are managed by English-speaking project managers.
Corporations are turning to e-discovery tools for investigations.
For FCPA cases, multi-language document handling is a critical capability.
Initial Review Planning Questions
How many languages exist in the review population?
Which language is the most dominant?
How many reviewers can you or your vendor mobilize per language?
How should the documents flow?
What kind of review tags are required for your multi-language review?
What documents should be translated, and when?
Should you dedicate a bilingual attorney project manager?
Best Practices
Have a core team and system in place so you’re prepared ahead of time.
Dedicate a bilingual attorney project manager.
Use the proper technology to make the process more efficient.
An English-only project manager cannot accurately evaluate reviewers for QC or resolve delays effectively.
Data Collection
In a multi-language matter, the collections process raises technical, legal and linguistic issues.
Decide WHAT to collect before you decide HOW to collect.
Custodian interviews help identify the right data to collect.
Understand Data Privacy and Cross Border Data Transfer laws for the jurisdiction.
Scheduling and coordinating the data collection takes time. Expect delays.
Areas of Challenge in Multi-Language Data Collection
Whose Data to Collect:
Decide what to collect before you decide how to collect!
Who are your priority custodians?
Which custodians will you interview?
What to Collect:
PCs, smartphones, tablets, external drives, backups, servers, social media, audio (especially with financial matters).
Check email clients and applications: Are any proprietary or unique applications being used?
Check encryption: Many Asian companies use encryption. Check in advance and plan decryption prior to data collection.
How to Collect:
Using a forensic tool is best to avoid potential problems such as garbled characters (@#$??).
Employing forensic methodologies provides defensible data collection.
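To make the garbled-character problem concrete, here is a minimal sketch of how text becomes unreadable when bytes are decoded with the wrong character encoding. The sample string and the choice of Shift_JIS and Latin-1 are assumptions for illustration only; they are not taken from the webinar, and this is not a substitute for a forensic collection tool.

```python
# Illustration only: the sample text and encodings are assumptions.
original = "会議の資料"  # Japanese text as it might appear in a file name or subject line

# Bytes as written by a system that stores text in Shift_JIS.
raw = original.encode("shift_jis")

# A tool that assumes a Western encoding still decodes every byte,
# but into the wrong characters.
garbled = raw.decode("latin-1")

# Latin-1 maps each byte to exactly one character, so here the mistake
# is reversible as long as the underlying bytes were left untouched.
recovered = garbled.encode("latin-1").decode("shift_jis")

print(garbled != original)    # the displayed text no longer matches
print(recovered == original)  # the preserved bytes still hold the original
```

The point of the sketch is that the damage is recoverable only while the original bytes survive, which is one reason a collection method that preserves data exactly is preferable to copying text through intermediate tools.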
Legal Issues in Data Collection
Be aware of any local data privacy laws.
Be aware of cross border data transfer laws and customs.
Is there any data that you need to keep out of U.S. jurisdiction?
Are there local vendors you can trust?
How are you vetting third-party data collection vendors?
Great Resources
DLA Piper’s Data Protection Laws of the World
Linklaters Data Protected
EU General Data Protection Regulation
Processing and Language Identification
Language Identification
Can be challenging. Use the right technology to identify the primary and secondary language of each document.
Essential to preparing for a multi-language review, as language ID can help direct workflow.
Tokenization
Chinese, Japanese, Korean and other languages that do not use spaces between words need to be “tokenized.”
Precise tokenization is important—some tools only group two or three characters at a time without considering the correct sequencing of characters.
Tokenized words are used for accurate searches and correctly ranking documents using TAR.
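The difference between grouping characters two at a time and segmenting into actual words can be sketched as follows. This is a toy comparison under stated assumptions: `bigrams`, `longest_match`, and the small `LEXICON` are hypothetical names and data introduced here, and greedy dictionary matching stands in for the statistical or morphological analysis that production tokenizers typically use. It does not represent any vendor's implementation.

```python
def bigrams(text):
    """Overlapping two-character groups, with no knowledge of word boundaries."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def longest_match(text, lexicon, max_len=6):
    """Greedy forward longest-match segmentation against a word list."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy lexicon; a real one would hold many thousands of entries.
LEXICON = {"東京", "都", "会議", "資料", "打ち合わせ"}

text = "東京都の会議資料"  # "Tokyo Metropolis meeting materials"
print(bigrams(text))
print(longest_match(text, LEXICON))
```

In this example the bigram output contains 京都 (Kyoto) even though the text is about Tokyo (東京), so a search for Kyoto would hit this document. The dictionary-based segmentation yields 東京, 都, の, 会議, 資料 and avoids that false match.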
Some vendors claim that 60% accuracy is sufficient in language identification. This should be a red flag.
You—or your vendor—must correctly identify the languages in your project. It’s critical to accurate processing.
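As a rough illustration of assigning a primary and secondary language to a document, the sketch below counts characters by Unicode script. It is a simplification under loud assumptions: `script_of` and `identify_languages` are hypothetical names, Latin letters are treated as English, and Han characters are attributed to Japanese only when kana is also present. Real language identification tools generally rely on statistical models rather than script counts alone.

```python
from collections import Counter


def script_of(ch):
    """Map a character to a coarse script label, or None if not counted."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF:      # Hiragana and Katakana blocks
        return "kana"
    if 0xAC00 <= cp <= 0xD7A3:      # Hangul syllables
        return "hangul"
    if 0x4E00 <= cp <= 0x9FFF:      # CJK Unified Ideographs
        return "han"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None


def identify_languages(text):
    """Return (primary, secondary) language codes based on character counts."""
    counts = Counter(s for s in map(script_of, text) if s)
    langs = Counter()
    langs["en"] = counts["latin"]   # assumption: Latin script means English
    langs["ko"] = counts["hangul"]
    # Han characters are shared across languages: count them as Japanese
    # only when kana is present, otherwise as Chinese.
    if counts["kana"]:
        langs["ja"] = counts["kana"] + counts["han"]
    else:
        langs["zh"] = counts["han"]
    ranked = [lang for lang, n in langs.most_common() if n > 0]
    primary = ranked[0] if ranked else None
    secondary = ranked[1] if len(ranked) > 1 else None
    return primary, secondary


print(identify_languages("Please review. 詳細は添付の契約書をご確認ください。"))
```

Counting how often each primary language appears across a collection would give the kind of language distribution that helps plan reviewer staffing.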
ECA and Search
Can be used to assess the actual languages present and their distribution.
Growing demand for internal investigations, especially with TAR and analytics tools for time and cost savings.
Use bilingual project managers and iterative searching: check for slang, product names, cultural and linguistic factors, etc.
Benefits
Use for identifying hot keywords, getting familiar with the terminology and nuances used in the data (in the language).
This knowledge can be used to refine keywords in the language (rather than translating).
Making a responsiveness judgment while learning the data will prepare training seeds for TAR.
This can help you plan the review workflow and logistics.
Avoid the Pitfalls
Do not simply translate keywords from English into another language.
Cultural and linguistic nuances
Variations of terms (e.g. meeting could be: ミーティング , 会合 , 打ち合わせ , 打合せ , 会議 , etc.)
Slang, jargon, uncommon terminology
Get a bilingual review manager and project manager involved in an early stage to help the case team refine the keywords in the targeted language.
Engage a search expert at your e-discovery vendor so that any tweaks required in searches can be discussed and refined.
Wildcards? Single-byte v. double-byte? Variations of the same term? Simplified characters?
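Two of the points above, term variations and single-byte versus double-byte forms, can be sketched in a few lines. The example uses Unicode NFKC normalization from Python's standard library to fold full-width and half-width characters, and searches for all of the "meeting" variants at once. The helper names `normalize`, `build_pattern`, and `matches` are hypothetical, the sample sentences are invented for illustration, and a review platform's own search syntax will differ.

```python
import re
import unicodedata


def normalize(text):
    """Fold full-width/half-width forms and case so width variants compare equal."""
    return unicodedata.normalize("NFKC", text).casefold()


# Variants of "meeting" taken from the example above.
MEETING_TERMS = ["ミーティング", "会合", "打ち合わせ", "打合せ", "会議"]


def build_pattern(terms):
    """One alternation pattern covering every listed variant."""
    return re.compile("|".join(re.escape(normalize(t)) for t in terms))


def matches(text, pattern):
    """True if the normalized text contains any of the variants."""
    return pattern.search(normalize(text)) is not None


pattern = build_pattern(MEETING_TERMS)
print(matches("明日のﾐｰﾃｨﾝｸﾞは10時です", pattern))  # half-width katakana
print(matches("ＡＢＣ社との打合せ", pattern))          # full-width Latin letters
print(matches("契約書の確認", pattern))               # no meeting term
```

NFKC does not convert between Simplified and Traditional Chinese characters; that would need a separate mapping, which is one more reason to have a bilingual reviewer confirm the final term list.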
Multi-Language Document Review
How to leverage a bilingual review project manager on multi-language reviews
A primer on review workflow and quality control
Servicing case teams no matter what the language
Select the right vendor and ask the right questions
Review Workflows
Establishing an index
FAQs
Decision log
Translation team
The experts
Always be QC’ing
Can the e-discovery platform’s UI be localized? It’s much more efficient for native speakers / nationals to use an interface in their own language.
Best Practice: Dedicate a Bilingual Project Manager
Benefits
Directly make targeted searches for quality control.
The decision-making cycle is faster.
Defensible and consistent relevancy calls and issue coding.
Quality privilege applied uniformly.
Hot and significant documents identified properly.
Ethics in multi-language document review.
Marshal the facts quickly and guard against inadvertent production of critical client ESI.
For a TAR workflow, the project manager should be able to interface with the technical team, the review team, and the case team, and should be fluent not just in English and CJK languages but also in non-linguistic areas.
Characteristics of a Bilingual Review Project Manager
Licensed attorney
Certified Foreign Language Project Manager
Fluent in at least 2 languages
Trained in multiple review platforms
Has reviewed documents in Chinese, Japanese or Korean
Experience conducting document review in many practice areas such as antitrust, IP, FCPA, contract dispute, etc.
Often a former practicing lawyer
Experienced managing people, budgets, and deadlines; deep insight into the culture of the foreign language; and familiar with the proper use of e-discovery technology
Typically has project-managed 500+ hours of foreign-language review
Technology
TAR works with multi-language documents and CJK languages; you just need to do your homework.
TAR 2.0 (continuous active learning) can be used with machine-translated training docs.
Okay to combine multi-language review into one TAR project IF (1) using CAL, and (2) reviewing to high recall. Cf. Cormack & Grossman SIGIR 2015.
By properly preparing documents with correct language identification and intelligent tokenization, Catalyst TAR 2.0 using continuous active learning has been proven to accurately and effectively rank multi-language documents with CJK languages.
TAR 2.0: Start immediately with whatever and whomever; review = training
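The "review = training" idea behind continuous active learning can be sketched as a loop: rank the unreviewed documents, review the top batch, feed those decisions back in, and rank again. The code below is a toy under stated assumptions. The function names, the simple per-token log-odds scoring, the sample documents, and the simulated reviewer are all hypothetical, documents are assumed to be already tokenized, and none of this represents how Insight Predict or any other TAR engine works internally.

```python
import math
from collections import Counter


def train(reviewed):
    """Per-token log-odds weights from reviewed (tokens, is_relevant) pairs."""
    pos, neg = Counter(), Counter()
    for tokens, relevant in reviewed:
        (pos if relevant else neg).update(set(tokens))
    vocab = set(pos) | set(neg)
    return {t: math.log((pos[t] + 1) / (neg[t] + 1)) for t in vocab}


def score(tokens, weights):
    """Sum of learned weights for the distinct tokens in a document."""
    return sum(weights.get(t, 0.0) for t in set(tokens))


def cal_review(docs, reviewer, batch_size=2):
    """Rank, review the top batch, retrain, and repeat until nothing is left."""
    reviewed, remaining, order = [], dict(docs), []
    while remaining:
        weights = train(reviewed)  # empty on the first pass: no seed set needed
        ranked = sorted(remaining, key=lambda d: score(remaining[d], weights),
                        reverse=True)
        for doc_id in ranked[:batch_size]:
            tokens = remaining.pop(doc_id)
            reviewed.append((tokens, reviewer(doc_id)))  # every decision is training
            order.append(doc_id)
    return order


# Hypothetical, already-tokenized documents in two languages.
DOCS = {
    "d1": ["価格", "調整", "会議"],
    "d2": ["lunch", "menu"],
    "d3": ["価格", "合意", "競合"],
    "d4": ["holiday", "schedule"],
    "d5": ["price", "agreement", "価格"],
    "d6": ["parking", "notice"],
}
RELEVANT = {"d1", "d3", "d5"}  # stands in for human reviewer decisions

print(cal_review(DOCS, lambda doc_id: doc_id in RELEVANT))
```

In this toy run, the first batch is effectively arbitrary, but once one relevant Japanese document has been coded, the remaining relevant documents (including the mixed-language one) move to the front of the queue ahead of the non-relevant English ones.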
Multi-Language Workflows: Parallel Projects
Ranking the same population
Separate ranking projects for each language
Multi-language training documents are shared
Multi-Language Workflows: Shared Projects
Ranking the same population
One project for both languages
Primary language filters route top-ranked docs to appropriate team
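The shared-project routing described above can be pictured as taking the top of a single ranking and splitting it into per-language review queues. This is a minimal sketch: the function name `route_top_ranked`, the field names, and the sample scores are hypothetical and are not drawn from any particular platform.

```python
from collections import defaultdict


def route_top_ranked(ranked_docs, batch_size):
    """Take the top of one shared ranking and split it by primary language."""
    queues = defaultdict(list)
    for doc in ranked_docs[:batch_size]:
        queues[doc["primary_language"]].append(doc["id"])
    return dict(queues)


# Hypothetical output of a single shared ranking, best first.
ranked = [
    {"id": "d7", "primary_language": "ja", "score": 0.97},
    {"id": "d2", "primary_language": "en", "score": 0.91},
    {"id": "d9", "primary_language": "ja", "score": 0.88},
    {"id": "d4", "primary_language": "ko", "score": 0.42},
]

print(route_top_ranked(ranked, batch_size=3))
```

With a batch size of three, the Japanese team receives d7 and d9, the English team receives d2, and the lower-ranked Korean document waits for a later batch. Routing of this kind depends on the primary language having been identified correctly during processing.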
Q&A
Jonathan Hiroshi Rossi
Thomas Gricks
Chihiro Suzuki
David Sannar