November 17, 2005 TREC 2005 TREC-2006 Legal Track Planning Session Jason Baron Dave Lewis Doug Oard


November 17, 2005 TREC 2005

TREC-2006 Legal Track Planning Session

Jason Baron

Dave Lewis

Doug Oard

Welcome to TREC-MOOT

Representing the plaintiff (“Benzo” Pyrene): – Jason Baron, J.D.

Representing the defendant (Philip Norris, Inc.): – David D. Lewis, Ph.D.

1. Complaint

2. Production Request

3. Query negotiation

INTRODUCTION

1. Benjamin A. Pyrene, on behalf of a class of individuals injured in childhood and adulthood by the effects of second hand smoke, brings this action to enjoin defendant tobacco companies from continuing to make false and misleading statements regarding the health consequences of second hand smoke, in violation of the Commonwealth of TREC’s Fraud Statute, as well as provisions of the tobacco Master Settlement Agreement (“MSA”), and the Consent Decree and Final Judgment (“Consent Decree”) entered into by the Commonwealth of TREC and approved by the Court on October 12, 2005. Plaintiffs request a finding of contempt for violation of the Consent Decree and imposition of monetary sanctions, civil penalties, and costs, including those of investigating and pursuing this action.

PARTIES

2. Plaintiff Benjamin (“Benzo”) A. Pyrene brings this action on behalf of a nationwide class of individuals injured in childhood and adulthood by defendants’ actions. Mr. Pyrene resides at 12 Combustible Way, Commonwealth of TREC.

3. Defendant Philip Norris Inc. is a Commonwealth of TREC corporation with its principal place of business in the city of Kendall, County of Tau, Commonwealth of TREC. Other corporations are as identified in the attachments to this Complaint.

JURISDICTION

4. This Court has jurisdiction pursuant to 1 Comm. Trec Sec. 1956, the Consent Decree, and under the MSA, section VII(a).

BACKGROUND

5. According to information and belief, second-hand smoke ranks third as a major preventable cause of death behind only active smoking and alcohol. Second-hand smoke is the smoke that individuals breathe when they are located in the same air space as smokers. Second-hand smoke is a mixture of exhaled …

Complaint

TOBACCO COMPANIES’ ACTIONS

12. Philip Norris has made numerous representations since the filing of the MSA and Consent Decree regarding the lack of danger from secondhand smoke. A complete listing of these misrepresentations is provided in a table at imaginary Attachment 1, showing the time and place of each misrepresentation as well as a summary of its content.

COUNT I (Consumer Fraud – Deception)

13. Defendants have engaged in a pattern or practice of deceptive acts or practices in violation of the above-referenced statutes, by making false or misleading representations about the reduced health risks associated with second hand smoking.

COUNT II (MSA)

14. Defendants’ actions in misrepresenting the effects of second hand smoke violate the MSA at section III(r), because they are material misrepresentations of fact regarding the health consequences of using a tobacco product.

COUNT III (Consent Decree)

15. Defendants’ actions in misrepresenting the effects of second hand smoke violate the Consent Decree, because they are material misrepresentations of fact regarding the health consequences of using a tobacco product.

RELIEF REQUESTED

Wherefore, Plaintiffs request that this Court enter the following relief:

Declare that Philip Norris et al. violated the MSA and Consent Decree by making statements that are false regarding the effects of second hand smoke, which in turn have created a substantial risk of harm to consumers.

Permanently enjoin defendants, their officers, agents, servants, employees and attorneys, and those persons in active concert or participation with them who receive actual notice of the injunction, from representing in any manner, expressly or implicitly, directly or indirectly, in connection with the manufacturing, advertising, packaging, labeling, promotion, offering for sale, sale, or distribution of cigarettes or any other tobacco product for which it does not possess competent and reliable scientific information sufficient to support such representation, that exposure to second hand smoke is perfectly safe.

Enter an order imposing monetary sanctions and a Civil Contempt Order for violations of the Consent Decree and MSA.

Impose a civil penalty of $1,000 for each violation of State law.

Order Defendants to pay costs and expenses, including attorneys’ fees, in connection with the investigation and litigation of this matter. …

TREC-MOOT

Representing the plaintiff (“Benzo” Pyrene): – Jason Baron, J.D.

Representing the defendant (Philip Norris, Inc.): – David D. Lewis, Ph.D.

1. Complaint

2. Production Request

3. Query negotiation

BENZO A. PYRENE v. PHILIP NORRIS – REQUESTS TO PRODUCE PROPOUNDED BY PLAINTIFFS PURSUANT TO FED. R. CIV. P. 34

1. All documents referencing scientific research on the effects of second hand smoking

2. All documents that expressly link second hand smoke to being a medical health hazard.

3. All documents that discuss one of the following topics plus expressly reference second hand smoking:

• Sidestream smoke
• Platelet activation
• Abnormalities of vasodilation
• Injury to the arterial lining
• Atherosclerosis
• Benzo(a)pyrene
• Butadiene

4. All documents showing that senior level management at Philip Norris were aware of the dangers of second hand smoking.

5. All documents referencing asthma in children.

6. All documents referencing smoke free ordinances governing public places. …

Production Request

Welcome to TREC-MOOT

Representing the plaintiff (Mr. “Benzo” Pyrene): – Jason Baron, J.D.

Representing the defendant (Philip Norris, Inc.): – David D. Lewis, Ph.D.

1. Complaint

2. Production Request

3. Query negotiation

Shifting the Rules of the Game

• Classic IR
  – Goal: satisfy a visceral information need
  – Understanding of need evolves during search
  – Personal view of relevance

• E-Discovery
  – Goal: identify a set of responsive documents
  – Negotiated information need
  – Agreed / defensible / explainable process

Key Stakeholders

• E-Discovery participants (Sedona Conference)
  – Judges
  – Law firms
  – Regulatory agencies
  – Technology providers

• IR research teams (TREC)
  – Negotiated information needs
  – Different genre (document images, metadata, …)

Primary-Source Search Use Cases in Legal Applications

• Two-party [Legal Track focus]
  – “Discovery”
    • Negotiate relevance definition, search process
    • Contract lawyers identify relevant documents
    • Partners review relevant docs for “privilege”
  – Regulatory / Oversight investigation
  – Freedom of Information Act (FOIA)

• One-party [not our focus]
  – Risk assessment

Process Requirements (in order of decreasing importance)

• Two-party
  – Negotiated (not personal) information needs

• Recall-oriented
  – “Smoking gun detection”

• Explainable
  – Quantifiable comparison to present best practice

• Affordable
  – Minimize amount of human review

Current E-Discovery Process

[Flow diagram. Labels: Acquisition, Collection, Indexing, Index, Query Formulation, Query, Boolean Retrieval, Result Set, Review, Responsive Documents, Delivery]
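The indexing, query, Boolean retrieval, and result set stages named on the "Current E-Discovery Process" slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-document collection, the whitespace tokenizer, and the function names `build_index` and `boolean_and_not` are hypothetical and do not describe any system used in practice or planned for the track.

```python
from collections import defaultdict


def build_index(collection):
    """Indexing step: map each lowercase token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def boolean_and_not(index, required, excluded=()):
    """Boolean retrieval: documents with every required term and no excluded term."""
    if not required:
        return set()
    result = set.intersection(*(index.get(term, set()) for term in required))
    for term in excluded:
        result -= index.get(term, set())
    return result


# Hypothetical collection, not taken from the tobacco documents.
collection = {
    "d1": "memo on second hand smoke research",
    "d2": "smoke free ordinance for public places",
    "d3": "second hand furniture sale",
}
index = build_index(collection)

# Query: second AND hand NOT furniture
result_set = boolean_and_not(index, required=["second", "hand"], excluded=["furniture"])
print(sorted(result_set))
```

The result set is unordered: every document either satisfies the query or does not, so the whole set goes to review.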

Possible E-Discovery Process

[Flow diagram. Labels: Acquisition, Collection, Indexing, Index, Source Selection, Query Formulation, Query, IR System, Ranked Retrieval, Ranked List, Selection, Incremental Review, Result Set, Responsive Documents, Delivery]

Collection Options

• Tobacco (IIT/UCSF) [Discovery]
  – 3-7 million scanned documents, diverse genre
    • Good OCR available for >1 million documents
  – Expert judges and assessment system exist

• Enron (FERC) [Regulatory Oversight]
  – ~100,000 emails, attachments, phone transcripts
  – Sample topics exist, judgment will be hard

• State Department (National Archives) [FOIA]
  – ~500,000 “cables” (messages)

IIT/UCSF Tobacco Collection

• 7 million scanned documents
  – Distributed in “standard” TREC XML format
  – Probably on a few DVDs

• Some form of OCR for half the collection
  – Possibly from two systems

• Metadata fields for the full collection
  – People, date, source, …
  – 7 company-specific DTDs

• Goal is to search the full collection
  – OCR-subset results will also be reported

IIT/UCSF Metadata Example

<A ID="BVW63A00">
<br t="p">504110499-0499</br>
<YR>19000000</YR>
<L>CARLSON TN; LRD</L>
<rn t="m">MINNESOTA 1RFP128; MINNESOTA COURT ORDER;US COMPREHENSIVE REQUEST 343;US COMPREHENSIVE REQUEST 175;US COMPREHENSIVE REQUEST 179;MISSOURI COURT ORDER 19980814</rn>
<r>COLBY FG</r>
<K>CORRESPONDENCE REFLECTING RESULTS OF LITERATURE SEARCH PREPARED BY LRD EMPLOYEE ENGAGED TO ASSIST ATTORNEYS AND TRANSMITTED TO RJR SCIENTIST WORKING ON BEHALF OF THE LEGAL DEPARTMENT.</K>
<st>R PRIV:WP;JD</st>
<dm>20031215</dm>
</A>

Field labels on the slide: DOCID, Bates number, Date, Author, Mentioned, Recipient, Title
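A record like the one on this slide can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is illustrative only: the tag-to-label mapping follows the slide's field labels and the tag descriptions in the DTD backup slide, the readable names such as `bates_number` are hypothetical, and the sample is the slide's record with the long `rn` and `K` elements left out for brevity.

```python
import xml.etree.ElementTree as ET

# The slide's record, with the long <rn> and <K> elements omitted for brevity.
RECORD = (
    '<A ID="BVW63A00"><br t="p">504110499-0499</br><YR>19000000</YR>'
    '<L>CARLSON TN; LRD</L><r>COLBY FG</r>'
    '<st>R PRIV:WP;JD</st><dm>20031215</dm></A>'
)

# Hypothetical readable names for the short tags (meanings taken from the DTD comments).
FIELD_NAMES = {
    "br": "bates_number",
    "YR": "date",
    "L": "author",
    "m": "mentioned",
    "r": "recipient",
    "K": "title",
}


def parse_record(xml_text):
    """Return a dict of the labeled fields; tags without a label are skipped."""
    root = ET.fromstring(xml_text)
    fields = {"docid": root.get("ID")}
    for child in root:
        name = FIELD_NAMES.get(child.tag)
        if name is not None:
            # Keep the first occurrence if a tag repeats within one record.
            fields.setdefault(name, (child.text or "").strip())
    return fields


record = parse_record(RECORD)
print(record["docid"], record["author"], record["recipient"])
```

Fields absent from a record (here `mentioned` and the omitted `title`) are simply missing from the returned dictionary, which matches the optional elements in the DTD.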

Academic Researcher’s Topic

Title: Firesafe Cigarettes

Relevance Criteria: Relevant documents provide information on 1) firesafe cigarettes, or 2) tobacco industry responses to legislation and interest around firesafe cigarettes.

Relevance Judges and/or Sources of Relevant Documents: [4 names]

Keywords: firesafe, firesafe cigarettes, reduced flammability, self-extinguishing, fire safety education, accidental fires, fire prevention, furniture flammability, reduced ignition

Key Attributes: Most relevant documents will have been created between 1980 and the present.

Some Issues

• Evaluation measures
  – N-recall (recall at N that Boolean query found)

• Topic Generation
  – Modeling a representative process
  – Question typology

• Minimizing the cost of entry
  – Format as superset of standard TREC style

• Outreach to E-Discovery technology groups
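One reading of the N-recall bullet is: take N to be the number of documents the negotiated Boolean query returned, then measure the recall of a ranked run cut off at that same depth, so the two approaches are compared at equal review cost. The sketch below assumes that reading; the document ids, relevance judgments, and the function name `recall_at` are hypothetical.

```python
def recall_at(ranked, relevant, depth):
    """Fraction of the relevant documents found in the top `depth` of a ranked list."""
    if not relevant:
        return 0.0
    retrieved = set(ranked[:depth])
    return len(retrieved & relevant) / len(relevant)


# Hypothetical data for one topic.
boolean_result = {"d1", "d4", "d7"}           # unordered Boolean result set
ranked_list = ["d4", "d2", "d7", "d1", "d9"]  # ranked run, best first
relevant = {"d2", "d4", "d7", "d8"}           # judged relevant documents

n = len(boolean_result)                       # depth set by the Boolean result size
boolean_recall = len(boolean_result & relevant) / len(relevant)
n_recall = recall_at(ranked_list, relevant, n)

print(f"N = {n}, Boolean recall = {boolean_recall:.2f}, N-recall = {n_recall:.2f}")
```

Because both figures are computed over the same number of reviewed documents, a higher N-recall would indicate that ranking found more responsive material for the same review effort.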

Legal Track Topic Generation

• Develop 30 topics for 2006 (losing a few)

• Start with a complaint for simulated lawsuit
  – Requestor defines specific information needs

• 2 lawyers negotiate Boolean query
  – Responding party can do preliminary searches
  – Result set size defines ranked list depth

• Same lawyers negotiate Ranked query
  – Optionally, providing more information

Possible Query Fields

• Title / Description / Narrative

• Negotiated Boolean

• Boolean “limit-by” suggestion

• “Rank by” cues
  – Metadata
  – Free text

Focus Conditions

• Ranked retrieval from TD (“required”?)

• Ranked retrieval from everything

• Ranked retrieval from Boolean only

Strawman Schedule

Jan 1 Commit to a collection

Mar 1 Guidelines ready

Apr 1 Collection release

Jul 1 Topic release

Aug 1 Runs submitted by sites

Aug 5 Pools ready for judgment

Sep 20 Judgments completed

Oct 1 Results release

Mid-Nov TREC-2006

Questions for Participants

• Who wants to participate?

• Is Aug 1 submission OK?
  – Would very early data release help?

• Do we want tasks other than doc retrieval?
  – Social networks?

• Should we create a robust subtrack?

Questions for NIST

• Can NIST help with:
  – Distributing the collection?
  – Building specialized scoring scripts?
  – Accepting runs and creating judgment pools?
  – Scoring official runs?

• Who will be our primary NIST contact?

• Do our dates match your constraints?

Next Steps

• Mailing list

Backup Slides

Incremental Ranked Review

Limit to: Marlboro NOT “Upper Marlboro”

Limit to: Marlboro NOT “Upper Marlboro”
Rank by: tobacco, {policy staff}

Limit to: Marlboro NOT “Upper Marlboro”
Rank by: regulation, manipulation
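The "Limit to" / "Rank by" pairs above suggest a two-stage process: a Boolean limit fixes which documents are eligible, and the rank-by cues only reorder that fixed set for review. A minimal sketch under that assumption follows. The four documents, the term-count scoring, and the function names are hypothetical, and the `{policy staff}` cue is left out because the slides do not define it.

```python
import re


def tokens(text):
    """Lowercase alphabetic tokens of a document."""
    return re.findall(r"[a-z]+", text.lower())


def passes_limit(text):
    """Limit to: Marlboro NOT "Upper Marlboro"."""
    toks = tokens(text)
    padded = " " + " ".join(toks) + " "
    return "marlboro" in toks and " upper marlboro " not in padded


def rank_score(text, cues):
    """Assumed scoring: total occurrences of the rank-by cue terms."""
    toks = tokens(text)
    return sum(toks.count(cue) for cue in cues)


def limit_then_rank(docs, cues):
    """Apply the Boolean limit, then order the survivors by cue score (ties by id)."""
    kept = [doc_id for doc_id, text in docs.items() if passes_limit(text)]
    return sorted(kept, key=lambda d: (-rank_score(docs[d], cues), d))


# Hypothetical documents.
docs = {
    "d1": "Marlboro tobacco blend notes; tobacco leaf supply",
    "d2": "Meeting held in Upper Marlboro on tobacco regulation",
    "d3": "Marlboro advertising regulation summary",
    "d4": "Tobacco regulation overview",
}

print(limit_then_rank(docs, ["tobacco"]))
print(limit_then_rank(docs, ["regulation", "manipulation"]))
```

Changing the rank-by cues changes only the order in which the same eligible documents reach the reviewer, which is what makes the review incremental.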

<!--DTD for rjr-->

<!--"Document". This element contains all of the meta-data about a given document. It has an "ID" attribute used to uniquely identify a given document. The values of this attribute are referred to as TIDs (Tobacco IDs). They are generally of the format "AAAddAdd", in which "A" stands for any letter and "d" stands for any digit. Sometimes, due to old errors, a TID of the form "AAAdAAdd" will occur. These values are allocated and assigned to documents in the Relational Database.-->
<!ELEMENT A (br, r?, L?, pb?, b?, c?, rn?, YR, dt?, co?, m?, br?, dp, p?, re?, ag?, rn, sc?, sh?, s?, te?, K?, DS, PV)>
<!ATTLIST A ID CDATA #REQUIRED>

<!--"Bates Range". The source data contains two Bates range fields: "Document ID" (which is abbreviated here as "num") and "Other Number" (abbreviated as "onm"). This element has a type attribute "t" used to indicate which field the element represents. A value of "p" (for "primary") means that it represents "Document ID". A value of "o" (for "other") means that it represents "Other Number".-->
<!ELEMENT br (#PCDATA)>
<!ATTLIST br t CDATA>

<!--"Recipient" here abbreviated as "add"-->
<!ELEMENT r (#PCDATA)>

<!--"Author" here abbreviated as "aut"-->
<!ELEMENT L (#PCDATA)>

<!--"Production Box" here abbreviated as "box"-->
<!ELEMENT pb (#PCDATA)>

<!--"Brand" here called "brd"-->
<!ELEMENT b (#PCDATA)>

<!--"Copied" here abbreviated as "cpy"-->
<!ELEMENT c (#PCDATA)>

<!--"Request Number". The source data contains two request number fields: "Request Number" (abbreviated as "req") and "Possible Minnesota Requests" (abbreviated as "crs"). The element has a type attribute "t" used to indicate which of these two fields it represents. A value of "p" indicates that it represents the first. A value of "m" indicates that it represents the second.-->
<!ELEMENT rn (#PCDATA)>
<!ATTLIST rn t CDATA>

<!--"Document Date" here abbreviated as "docdt"-->
<!ELEMENT YR (#PCDATA)>

<!--"Document Type" here abbreviated as "dtp"-->
<!ELEMENT dt (#PCDATA)>

<!--"Characteristics" here abbreviated as "mar" for "marginalia"-->
<!ELEMENT co (#PCDATA)>

<!--"Mentioned" here abbreviated as "men"-->
<!ELEMENT m (#PCDATA)>

<!--"Date Produced" here abbreviated as "dpt"-->
<!ELEMENT dp (#PCDATA)>

<!--"Page Count" here abbreviated as "pglen"-->
<!ELEMENT p (#PCDATA)>

<!--"Redacted Information" here abbreviated as "red"-->
<!ELEMENT re (#PCDATA)>

<!--"Attachment Group" here abbreviated as "ref" for "reference document"-->
<!ELEMENT ag (#PCDATA)>

<!--"Special Collections" here abbreviated as "scoll"-->
<!ELEMENT sc (#PCDATA)>

<!--"Date Shipped" here abbreviated as "ship"-->
<!ELEMENT sh (#PCDATA)>

<!--"Source" here abbreviated as "src"-->
<!ELEMENT s (#PCDATA)>

<!--"Trial Exhibit" here abbreviated as "texh"-->
<!ELEMENT te (#PCDATA)>

<!--"Title" here abbreviated as "ttl"-->
<!ELEMENT K (#PCDATA)>

<!--"Data Source". This is a single-letter code indicating that this document is from the American Tobacco set.-->
<!ELEMENT DS (#PCDATA)>

<!--"Provenance". This element contains a two-letter abbreviation for the data set, followed by a space, followed by the TID of the current record.-->
<!ELEMENT PV (#PCDATA)>