
TDT 2003 Evaluation Workshop, NIST, November 17-18, 2003

Creating the Annotated TDT-4 Y2003 Evaluation Corpus

Stephanie Strassel, Meghan Glenn

Linguistic Data Consortium - University of Pennsylvania

{strassel, [email protected]}


Data Collection/Preparation

Collection
• Multiple sources, languages
• October 2000 – July 2001

TDT-4 Corpus V1.0
• Arabic, Chinese, English only; October 2000 – January 2001
• Collection subsampled for annotation
  • Goal: reduce licensing, transcription, and segmentation costs
  • Broadcast sources: select 4 of 7 or 3 of 5 days, staggering the selection across sources to maximize coverage by day (see the sketch after this list)
  • Newswire sources: sampling consistent with previous years
  • No down-sampling of Arabic newswire

Reference transcripts
• Closed-caption text where available; commercial transcription agencies otherwise
  • Spell-check names in English commercial transcripts
• Provide initial story boundaries & timestamps

ASR Output & Machine Translation

TDT-4 Corpus V1.1
• Incorporates patches to Mandarin ASR data to fix encoding; removes empty files
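
A minimal sketch of one way to implement the staggered 4-of-7 day selection; the weekly cycle, offsets, and source names here are illustrative assumptions, not LDC's documented procedure.

```python
from datetime import date, timedelta

def staggered_days(start: date, weeks: int, keep: int = 4, cycle: int = 7, offset: int = 0):
    """Select `keep` of every `cycle` days, beginning `offset` days into
    each cycle, so different sources can cover different days."""
    days = []
    for week in range(weeks):
        base = start + timedelta(days=week * cycle)
        for i in range(keep):
            days.append(base + timedelta(days=(offset + i) % cycle))
    return days

# Stagger each source's window so that, collectively, every day of the
# collection period is covered by at least one broadcast source.
for k, src in enumerate(["CNN", "ABC", "NBC", "MNB"]):   # illustrative subset
    picked = staggered_days(date(2000, 10, 1), weeks=2, offset=k)
    print(src, [d.strftime("%a %m-%d") for d in picked])
```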


TDT-4 Corpus Overview

language  source      data type   # documents  total audio (hours)  reference transcripts
Arabic    AFA         Newswire    19126        ---                  ---
          ALH         Newswire    10656        ---                  ---
          ANN         Newswire    9682         ---                  ---
          VAR         Radio       2378         68                   commercial
          NTV (Web)   Television  871          20.5                 commercial
English   APW         Newswire    10268        ---                  ---
          NYT         Newswire    4842         ---                  ---
          VOA         Radio       2694         70                   commercial + spell check
          PRI         Radio       1965         62                   commercial + spell check
          CNN         Television  4698         64.5                 closed-caption
          ABC         Television  1692         38.5                 closed-caption
          NBC         Television  1234         35                   closed-caption
          MNB         Television  997          43                   closed-caption
Mandarin  XIN         Newswire    9837         ---                  ---
          ZBN         Newswire    8114         ---                  ---
          VOM         Radio       1780         64                   commercial
          CNR (Web)   Radio       2259         43                   commercial
          CTV         Television  1483         32.5                 commercial
          CTS (Web)   Television  2221         44                   commercial
          CBS (Web)   Television  1451         34                   commercial

ASR and machine translation by language:
language  ASR    machine translation
Arabic    BBN    IBM TJ Watson Research Center
English   LIMSI  N/A
Mandarin  BBN    Systran (run at LDC)


TDT Concepts

STORY
• In TDT2, a story is “a section containing at least two independent declarative clauses on the same topic”
• In TDT3, the definition was modified to capture annotators’ intuitions about what constitutes a story
  • Distinction between “preview/teaser” and complete news story
• TDT4 preserves this content-based story definition
  • Greater emphasis on consistent application of the story definition among the annotation crew

EVENT
• A specific thing that happens at a specific time and place, along with all necessary preconditions and unavoidable consequences

TOPIC
• An event or activity along with all directly related events and activities


Topics for 2003

• 40 new topics selected, defined, and annotated for the 2003 evaluation
  • 20 from Arabic seed stories
  • 10 each from Mandarin and English seed stories
• Topic selection strategy same as in 2002
• Arabic topics are somewhat different
  • Despite the same selection strategy
  • First time we’ve had Arabic seed stories
• The “topic well” is running dry
  • 80 news topics with a high likelihood of cross-language hits, from a 4-month span!


Selection Strategy

• Team leaders examine a randomly-selected seed story
  • Potential seeds balanced across the corpus (source/date/language); a balanced-sampling sketch follows this list
• Identify a TDT-style seminal event within the story
• Apply a rule of interpretation to convert the event to a topic
  • 13 rules state, for each type of seminal event, what other types of events should be considered related
• No requirement that selected topics have cross-language hits
  • But team leaders use knowledge of the corpus to select stories likely to produce hits in other-language sources
• Handful of “easily confusable” topics
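
One plausible way to balance seed candidates across source, date, and language is round-robin sampling over those strata; this sketch assumes that reading, and the story dictionary keys are invented for illustration.

```python
import random
from collections import defaultdict

def balanced_seed_sample(stories, n_seeds):
    """Round-robin over (source, date, lang) strata, taking one randomly
    chosen story from each stratum in turn, so that no single source,
    date, or language dominates the seed pool."""
    strata = defaultdict(list)
    for story in stories:
        strata[(story["source"], story["date"], story["lang"])].append(story)
    buckets = list(strata.values())
    for bucket in buckets:
        random.shuffle(bucket)
    seeds = []
    while buckets and len(seeds) < n_seeds:
        for bucket in list(buckets):
            if len(seeds) >= n_seeds:
                break
            seeds.append(bucket.pop())
            if not bucket:          # stratum exhausted; drop it
                buckets.remove(bucket)
    return seeds
```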


Rules of Interpretation

1. Elections, e.g., 30030: Taipei Mayoral Elections
   • Seminal events include: a specific political campaign, election day coverage, inauguration, voter turnouts, election results, protests, reaction.
   • Topic includes: the entire process, from announcements of a candidate’s intention to run, through the campaign, nominations, and election process, and through the inauguration and formation of a newly-elected official’s cabinet or government.
2. Scandals/Hearings, e.g., 30038: Olympic Bribery Scandal
3. Legal/Criminal Cases, e.g., 30003: Pinochet Trial
4. Natural Disasters, e.g., 30002: Hurricane Mitch
5. Accidents, e.g., 30014: Nigerian Gas Line Fire
6. Acts of Violence or War, e.g., 30034: Indonesia/East Timor Conflict
7. Science and Discovery News, e.g., 31019: AIDS Vaccine Testing Begins
8. Financial News, e.g., 30033: Euro Introduced
9. New Laws, e.g., 30009: Anti-Doping Proposals
10. Sports News, e.g., 31016: ATP Tennis Tournament
11. Political and Diplomatic Meetings, e.g., 30018: Tony Blair Visits China
12. Celebrity/Human Interest News, e.g., 31036: Joe DiMaggio Illness
13. Miscellaneous News, e.g., 31024: South Africa to Buy $5 Billion in Weapons


Topic Research

• Provides context
• Annotators specialize in particular topics (of their choosing)
• Includes timelines, maps, keywords, named entities, and links to online resources for each topic
• Feeds into annotation queries


Topic Definition

• Fixed format to enhance consistency
• Seminal event lists basic facts: who/what/when/where
• Topic explication spells out the scope of the topic and potential difficulties
• Rule of interpretation link
• Link to additional resources
• Feeds directly into topic annotation (a sketch of such a record follows)
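
A minimal sketch of the fixed-format topic definition as a record; the field names are assumptions for illustration, not LDC's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class TopicDefinition:
    """Illustrative fixed-format topic definition record."""
    topic_id: int                  # e.g. 30030
    title: str                     # e.g. "Taipei Mayoral Elections"
    # Seminal event: the basic who/what/when/where facts
    who: str
    what: str
    when: str
    where: str
    explication: str               # scope of the topic and known difficulties
    rule_of_interpretation: int    # 1-13, the rule used to define the topic
    resources: list[str] = field(default_factory=list)  # links from topic research
```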


Annotation Strategy Overview

• Search-guided complete annotation
  • Work with one topic at a time
  • Multiple stages for each topic; multiple iterations of each stage
• Two-way topic labeling decision
  • YES: story discusses the topic in a substantial way
  • NO: story does not discuss the topic at all, or only mentions the topic in passing without giving any information about it
  • No BRIEF label in TDT-4
• “Not Easy” label for tricky decisions (sketched below)
  • Triggers additional QC
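
A minimal sketch of the two-way labeling scheme plus the “Not Easy” flag; the record fields are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class TopicLabel(Enum):
    YES = "discusses the topic in a substantial way"
    NO = "off topic, or only a passing mention with no real information"

@dataclass
class Judgment:
    story_id: str
    topic_id: int
    label: TopicLabel
    not_easy: bool = False   # "Not Easy" flag: routes the decision to extra QC

def qc_queue(judgments):
    """Judgments flagged Not Easy receive additional quality-control review."""
    return [j for j in judgments if j.not_easy]
```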


Annotation Search Stages

Stage 1: Initial query
• Submit seed story or keywords as a query to the search engine
• Read through the resulting relevance-ranked list, labeling each story YES/NO
• Stop after finding 5-10 on-topic stories, or after reaching the “off-topic threshold” (sketched after this list):
  • At least 2 off-topic stories read for every 1 on-topic story read, AND
  • The last 10 consecutive stories are off-topic

Stage 2: Improved query using on-topic stories from Stage 1
• Issue a new query using the concatenation of all known on-topic stories
• Read and annotate stories in the resulting relevance-ranked list until reaching the off-topic threshold

Stage 3: Text-based queries
• Issue a new query drawn from the topic research & topic definition documents, plus any additional relevant text
• Read and annotate stories in the resulting relevance-ranked list until reaching the off-topic threshold

Stage 4: Creative searching
• Annotators instructed to use specialized knowledge and to think creatively to find novel ways of identifying additional on-topic stories
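
A minimal sketch of the off-topic threshold as stated above; whether the 2:1 ratio is computed over all stories read in the pass is an assumption.

```python
def off_topic_threshold(labels, ratio=2, tail=10):
    """True once a search pass should stop: at least `ratio` off-topic
    stories read per on-topic story, AND the last `tail` consecutive
    stories were all off-topic. `labels` is the list of 'YES'/'NO'
    judgments made so far, in relevance-rank order."""
    hits = labels.count("YES")
    misses = len(labels) - hits
    ratio_met = hits == 0 or misses >= ratio * hits
    tail_met = len(labels) >= tail and all(l == "NO" for l in labels[-tail:])
    return ratio_met and tail_met

def stage1_done(labels, target_hits=5):
    """Stage 1 also stops early once 5-10 on-topic stories are found;
    the exact cutoff within that range was an annotator judgment call."""
    return labels.count("YES") >= target_hits or off_topic_threshold(labels)
```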


Additional Annotation & QC

Top-Ranked Off-Topic Stories (TROTS)
• Define the search epoch: the first 4 on-topic stories, chronologically sorted
• Find two highly-ranked off-topic documents for each topic-language pair

Precision
• All on-topic (YES) stories reviewed by a senior annotator to identify false alarms
• All “not easy” off-topic stories reviewed

Adjudication
• Review pooled site results and adjudicate cases of disagreement with LDC annotators’ judgments (a sketch of the review rule follows)
  • Pooled 3 sites’ tracking results
  • Reviewed all purported LDC false alarms
  • For purported LDC misses:
    • English and Arabic: reviewed cases where all 3 sites disagreed with LDC
    • Mandarin: reviewed cases where 2 or more sites disagreed with LDC
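
A minimal sketch of the adjudication triage rule above; treating a “purported false alarm” as any site labeling NO where LDC labeled YES is an assumption about the bookkeeping.

```python
def needs_review(language, ldc_label, site_labels):
    """Decide whether a story/topic decision goes to human adjudication.
    `site_labels` holds the three sites' 'YES'/'NO' tracking decisions."""
    disagree = sum(1 for s in site_labels if s != ldc_label)
    if ldc_label == "YES":
        # Purported LDC false alarm: all such cases were reviewed
        return disagree >= 1
    # Purported LDC miss: language-dependent disagreement threshold
    return disagree >= (2 if language == "Mandarin" else 3)
```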