MITRE
TIDES IFE-Bio Kickoff Meeting
David Anderson, Laurie Damianos, David Day, Lynette Hirschman, Robyn Kozierok, Scott Mardis, Tom McEntee, Chad McHenry, Michael Merideth,
Keith Miller, Bev Nunan, Jay Ponte, George Wilson, Flo Reeder, Steve Wohlever
October 17, 2001
[Figure: number of cases over time, weekly from 10/13/2000 to 11/24/2000; y-axis 0 to 400; series: Cases, New_cases, Dead]
Track_id  Date      Disease  Country   City_name  Cases  New_cases  Dead
Ebola     10/30/00  Ebola    Uganda    Gulu         224         19    73
Ebola     10/31/00  Ebola    Uganda    Gulu         239         15    75
Ebola     11/01/00  Ebola    Uganda    Gulu         251         12    80
Ebola     11/11/00  Ebola    Uganda    Gulu         269          4    87
Ebola     11/13/00  Ebola    Uganda    Gulu         321          1   104
Ebola     11/17/00  Ebola    Uganda    Gulu         329          4   107
Ebola     11/17/00  Ebola    Uganda    Masindi        4          0     4
Ebola     11/19/00  Ebola    Uganda    Mbarara       12          2     9
Ebola     11/20/00  Ebola    Tanzania  Mwanza         7          2     0
Ebola     11/21/00  Ebola    Kenya     Busia          3          3     0

Agenda
• Current Status and Experiments (Laurie)
• User Feedback on MiTAP and Exercise (Eric)
• Lessons Learned (Laurie)
• Architecture Briefing (Jay & Scott)
• Geospatial Processing (George)
• Schedule (Jay)
• Issues and Discussion (All)

Status of MiTAP
• Availability: excellent
  - Available ~100% to users inside and outside the firewall
  - 12 individual user accounts, 6 group accounts
  - 8 daily users on average, mostly repeat users
• Data capture: rich & dynamic
  - ~70 working sources, new source added in 30 min
  - Average 5.8K msgs/day, 1 min latency
  - 250K msgs total in system
• Analysis tools: improving
  - Messages in 6 languages (with COTS translation)
  - Sorted into 173 newsgroups
  - Color-coded tagging (pers/org/loc/disease)
  - Popup summarization
• Product: need to understand how system is being used

MiTAP Activity: Messages and Users Over Time
[Figure: # messages (left axis, 0 to 14,000) and # users (right axis, 0 to 14) by date, biweekly from 7/1 to 10/21; annotated events: Aug Experiment, Attack on America]

Performance Summary: Sudan 1999 vs Attack on America 2001
Measure | Sudan Incident (July 1999) | Attack on America (September 2001) | Comments
Availability | NA | 95% | Security via IP filtering
Users | 5 | 10 |
Capture
  Msgs | 1000 | 40,000 | 250,000 msgs total
  Sources | 20 | 70 | 29 new sources added; 30 min/source
  Throughput | NA | 8000 msgs/day | Latency for feeds: < 1 min
  Languages | 1 | 6 | French, Spanish, Portuguese, Russian, Chinese, English
Analysis
  News groups | NA | 173 | 89 new groups
  Tagging | No | Yes | People, organizations, locations, date, diseases
  Translation | No | Yes | 5 languages, variable quality
  Search | No (web only) | Yes | Boolean, sort by date/relevance

Disease of the Month Experiments
Who
  - August: MiTAP Team: control vs test
  - September: UMass/NYU: no control
  - October: MiTAP Team: control vs test
What
  - August: dengue fever
  - September: dengue fever
  - October: bio threats
Why
  - August: debug experiment, underlying processes
  - September: stimulate thinking re info extraction, IR
  - October: see what system collected since exercise
Findings
  - August: MiTAP report had more detail, more up-to-date, poorer coverage
  - September: (nothing evaluated)
  - October: MiTAP user wrote report with 1/5 searches, 1/2 docs, more up-to-date
Lessons Learned
  - August: useless for report writing, search difficult, online capture config hard
  - September: search more difficult with more docs, search poorly integrated, need better viz tools
  - October: summaries useless, duplicates hard to distinguish
Outcomes
  - August: improved source integration (faster, easier)
  - September: (brainstorming session cancelled due to change in priorities)
  - October: improvements on search

Feedback from Eric
• Report on Bio-Threats
• Deployment for N2
• MiTAP Status
  - Utility
  - Usability
  - Accessibility

Lessons Learned
Availability
  - User accounts for production system
  - No training needed (instructions available on website)
  - Stronger security (e.g., intrusion detection)
  - Better back-up, monitoring of throughput
  - More processing power
Capture
  - Reduced latency on scheduled downloads and spidering, hourly capture of headlines
  - Distributed capture processing
  - Better capture of formatted sources
  - Some badly filtered, excess volume causes backlog
  - Poor zoning/formatting/decoding of some sources

Lessons Learned (2)
Analysis
  - Improved search (e.g., by date/relevance, popups, integrated with news server)
  - Improved “normalization” of names, regions
  - Too much data! Need better filtering, topic detection & clustering, summarization
  - Better MT, support for Arabic
  - Q&A
  - Geospatial & temporal visualization
  - Advanced search
  - Better information extraction

Lessons Learned (3)
Product
  - No environment for preparing reports
  - Workspace
    - Drag & drop repository
    - Editing capabilities
    - Multidoc summarization
    - Collaboration feature (chat & shared workspace)

Catalyst Update: Recent work
• Usability for developers
  - Logger
  - Configuration file refinements
• Improvements for distributed systems
  - Redesign of I/O polling procedures
  - Explicit synchronization feature for Language Processor developers

Logger
[Diagram: Catalyst pipeline with components Documents, Tokenize, Tagger, Sentence, Entity Extraction, and Entities; stream labels MetaData, Word.Text, Word.POS, and Sentence; two catlogger processes attached]

In progress
• Usability for developers
  - Monitor (system status capability)
  - Native XML I/O! (for ease of debugging & for lightweight Catalyst)
• Information retrieval
  - Integration between Catalyst and new IR engine
  - Pushing stream filters toward archived streams
• Documentation

Monitor
[Diagram: Catalyst pipeline with components Documents, Tokenize, Tagger, Sentence, Entity Extraction, and Entities; stream labels MetaData, Word.Text, Word.POS, and Sentence; two Monitor processes attached]

XML I/O
[Diagram: "Present": XML doc, XML to Catalyst, Event Extraction, Catalyst to XML, XML doc. "With XML I/O feature": XML doc, Event Extraction, XML doc.]
Easier to debug!

XML I/O
[Diagram, "With XML I/O feature"; labels: Non-Catalyst Process, Wrapper Process, XML, Catalyst Processes (twice)]
Easier path to integrate existing language processing systems!
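A wrapper of the kind this slide describes might look roughly like the sketch below. It is only an illustration: the `<doc>`, `<text>`, and `<annotation>` element names, the `wrap` function, and the `find_diseases` stand-in tool are all hypothetical and are not taken from Catalyst.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable

Span = tuple[str, int, int]  # (annotation type, start offset, end offset)


def wrap(xml_in: str, process: Callable[[str], Iterable[Span]]) -> str:
    """Run a non-XML-aware tool on the document text and return annotated XML.

    The element and attribute names are assumptions made for this sketch.
    """
    root = ET.fromstring(xml_in)
    text = root.findtext("text", default="")
    for kind, start, end in process(text):
        # Record each result as a stand-off annotation next to the text.
        ET.SubElement(
            root,
            "annotation",
            {"type": kind, "start": str(start), "end": str(end)},
        )
    return ET.tostring(root, encoding="unicode")


def find_diseases(text: str) -> list[Span]:
    """Stand-in for an existing tool: flags every occurrence of 'Ebola'."""
    spans = []
    start = text.find("Ebola")
    while start != -1:
        spans.append(("disease", start, start + len("Ebola")))
        start = text.find("Ebola", start + 1)
    return spans
```

The wrapped tool only ever sees plain text, so an existing system would not need to change in order to exchange XML with its neighbours.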

Archived streams
[Diagram: Question Answering Application with stages Index, Candidate Selection, Refinement, Coreference, and Answer Extraction over XML docs; filter criteria flow from the Origination point toward the Indices]
Filter criteria must be pushed upstream from their origination point toward the indices so that processing is reduced to little more than is absolutely necessary.
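The idea of pushing filter criteria toward the indices can be sketched as follows. This is a toy illustration under stated assumptions: `ArchivedIndex`, `expensive_stage`, and `answer` are hypothetical names, and a simple inverted index stands in for whatever index structure the archived streams actually use.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedIndex:
    """Toy inverted index over archived documents (hypothetical structure)."""

    postings: dict[str, set[int]] = field(default_factory=dict)
    docs: dict[int, str] = field(default_factory=dict)

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for term in set(text.lower().split()):
            self.postings.setdefault(term, set()).add(doc_id)

    def select(self, required_terms: list[str]) -> list[int]:
        """Apply the filter at the index: only matching doc ids leave the archive."""
        if not required_terms:
            return sorted(self.docs)
        sets = [self.postings.get(t.lower(), set()) for t in required_terms]
        return sorted(set.intersection(*sets))


def expensive_stage(text: str) -> str:
    """Stand-in for a costly downstream stage such as answer extraction."""
    return text.upper()


def answer(index: ArchivedIndex, required_terms: list[str]) -> tuple[list[str], int]:
    """Filter upstream first, then run the expensive stage on the survivors only."""
    candidates = index.select(required_terms)
    results = [expensive_stage(index.docs[d]) for d in candidates]
    return results, len(candidates)
```

The second value returned by `answer` is the number of documents that reached the expensive stage, which is the quantity the pushdown is meant to minimize.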

For the Midterm - 12/12/2001
• Monitor
• XML I/O support in the Catalyst library
• Lightweight Catalyst design
• Documentation

Catalyst collaborations
• Qanda
  - Catalyst-based Qanda used for TREC
  - Catalyst-based Qanda deployed at AFIWC
• Information retrieval
  - Archived annotation streams (for creating IR indexes)
  - Seekable streams (for processing IR queries)
• Other projects
  - ACE/Alembic (Information Extraction)
  - Audio hot-spotting (Speech Retrieval)
  - Reading-comp (Question Answering)

Document Management
• Process scheduling
• System linkage
• Inter-site cooperation support
• User features

Process Scheduling
• Problem: MiTAP needs the ability to prioritize sources
  - ‘Catching up’ on a new source shouldn’t prevent timely processing of an important existing source
• Solution:
  - Preprocessing daemon will notify scheduler of incoming content
  - Scheduler assigns jobs to available resources based on priority
• Status:
  - Prototype scheduler delivered (Ponte)
  - Preprocessing daemon rewrite in mid-November (Wohlever)
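The notify-then-assign behaviour described on this slide can be illustrated with a small priority queue. This is a minimal sketch, not the delivered prototype: the `Scheduler` class, its method names, and the convention that a lower number means higher priority are all assumptions.

```python
import heapq
import itertools
from typing import Optional


class Scheduler:
    """Hands out jobs in priority order; a lower number means more urgent."""

    def __init__(self) -> None:
        self._queue: list[tuple[int, int, str]] = []
        # The counter breaks ties so that equal-priority jobs stay first-in, first-out.
        self._counter = itertools.count()

    def notify(self, source: str, priority: int) -> None:
        """Called by the (hypothetical) preprocessing daemon when content arrives."""
        heapq.heappush(self._queue, (priority, next(self._counter), source))

    def next_job(self) -> Optional[str]:
        """Return the most urgent pending source, or None if nothing is waiting."""
        if not self._queue:
            return None
        _, _, source = heapq.heappop(self._queue)
        return source
```

With this arrangement a backlog from a newly added source queues up at low priority, while a notification from an important existing source is still served first.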

System Linkage
• Problem: Ever notice how new features tend to only apply to new content?
  - MiTAP is not flexible; it is difficult to:
    - Reprocess and repost a message that has errors
    - Find the original source document
    - Etc.
  - Currently, retroactive changes require 11th hour hacking (or sometimes 12th hour hacking)
• Solution: Keep database of linkage information to make the system more flexible
• Status:
  - Additional information currently being logged
  - Linkage database - March
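One way to picture the linkage information is a small table that ties each posted message back to its origin. The sketch below uses SQLite purely for illustration; the table layout, column names, and helper functions are hypothetical and do not describe the planned MiTAP database.

```python
import sqlite3
from typing import Optional


def open_linkage_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) and open a linkage table; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS linkage (
               message_id TEXT PRIMARY KEY,  -- id of the posted message
               source_url TEXT NOT NULL,     -- where the original document came from
               raw_path   TEXT NOT NULL,     -- archived copy of the original document
               newsgroup  TEXT NOT NULL      -- where the processed message was posted
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, message_id: str, source_url: str,
           raw_path: str, newsgroup: str) -> None:
    """Log the linkage for one message; reposting overwrites the earlier row."""
    conn.execute(
        "INSERT OR REPLACE INTO linkage VALUES (?, ?, ?, ?)",
        (message_id, source_url, raw_path, newsgroup),
    )
    conn.commit()


def find_original(conn: sqlite3.Connection,
                  message_id: str) -> Optional[tuple[str, str]]:
    """Return (source_url, raw_path) for a posted message, or None if unknown."""
    return conn.execute(
        "SELECT source_url, raw_path FROM linkage WHERE message_id = ?",
        (message_id,),
    ).fetchone()
```

Given such a table, reprocessing a message with errors reduces to looking up its archived original and running it through the pipeline again.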

Inter-site Cooperation Support
• Problem: Collaboration with other TIDES contractors who have large legacy systems
  - Issue of communication more than scalability
• Solution:
  - Linkage database for annotations, similar to the one used for system maintenance
  - Web client-server communication
  - Path to scalable solution w/richer interactions
• Status:
  - Data management - January
  - Communications: investigation of relevant protocols and preliminary design - completed
  - Native XML support for Catalyst - December

User features
• Problem: MiTAP helps you find good information, then what?
• Solution:
  - Web accessible support for user views and data organization to assist in reporting and analysis
  - Automated view construction/feedback incorporating additional TIDES technologies
• Status:
  - Schema for v.1 of workspace developed (Ponte, Anderson)
  - Supporting code in progress (Ponte)
  - Prototype - December

Geo-Spatial Normalization - Goal
Goal:
  - We have: Text containing place names
  - We want: Points on maps
Process:
  - Extract place names
  - Look up places on a list
  - Determine Lat-Long
  - Display
Example: Seattle, 47.6 N 122.317 W
Problems:
• Place name not on list
• More than one place with same name
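The lookup step and its two failure modes can be sketched with a toy gazetteer. Only the Seattle coordinates come from the slide; the other entries are approximate values added for illustration, and the `GAZETTEER`, `Status`, and `lookup` names are hypothetical.

```python
from enum import Enum

LatLong = tuple[float, float]  # (latitude, longitude); west and south are negative

# Toy place list. Seattle is from the slide; the rest are approximate and illustrative.
GAZETTEER: dict[str, list[LatLong]] = {
    "Seattle": [(47.6, -122.317)],
    "Paris": [(48.86, 2.35), (33.66, -95.56)],       # France; Texas
    "Washington": [(38.9, -77.04), (47.5, -120.5)],  # D.C.; the state
}


class Status(Enum):
    RESOLVED = "resolved"    # exactly one entry: safe to plot
    AMBIGUOUS = "ambiguous"  # more than one place with the same name
    UNKNOWN = "unknown"      # place name not on the list


def lookup(name: str,
           gazetteer: dict[str, list[LatLong]] = GAZETTEER) -> tuple[Status, list[LatLong]]:
    """Look a place name up and report which of the three cases applies."""
    entries = gazetteer.get(name, [])
    if not entries:
        return Status.UNKNOWN, []
    if len(entries) == 1:
        return Status.RESOLVED, entries
    return Status.AMBIGUOUS, entries
```

Only the `RESOLVED` case yields a point that can go straight onto a map; the other two correspond to the problems listed on the slide.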

Geo-Spatial Normalization - Solution
Solution, Part 1: A significant portion of the references can be resolved using easy methods.
  - Unambiguous: Seattle, Toulouse
  - Ambiguous: Paris, Washington
  - Disambiguated: Paris, Texas; The State of Washington
Solution, Part 2: Use the “easily resolved” references as training data for a machine learning classifier which will distinguish the rest.
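Part 2 can be illustrated with a very small classifier trained on contexts whose place reference was already resolved by the easy methods. The slide does not name a learning method, so the Naive Bayes model below is a swapped-in choice for illustration, and the `ContextClassifier` name and the training sentences in the usage note are made up.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, Optional


class ContextClassifier:
    """Tiny multinomial Naive Bayes over context words, with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.label_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, examples: Iterable[tuple[str, str]]) -> None:
        """Each example is (context text, resolved place) from an easy reference."""
        for context, label in examples:
            words = context.lower().split()
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, context: str) -> Optional[str]:
        """Pick the most probable place for an ambiguous mention, or None if untrained."""
        words = context.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For example, after training on "rodeo in paris texas draws ranchers" (labelled Paris, Texas) and "louvre museum in paris france reopens" (labelled Paris, France), the context "ranchers gather in paris for rodeo" is assigned to Paris, Texas because it shares more words with the Texas training context.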

Geo-Spatial Normalization - Plans
For MidTerm (Dec. 12, 2001)
• Detect a significant portion of the “easily resolvable” references
• Display with some map tool
  - Web delivery desirable
After MidTerm (May, 2002)
• Try to find more “easily resolvable” references
• Do the machine learning part
• Integrate with other mapping tools

IFE-Bio Schedule
What | Why | When
Availability
  Add user accounts | Widen access to system | by request
Data Capture
  Improve quality of online capture | Improve system utility | as sources are added
  Build new message processing daemon | Increase throughput, decrease posting latency | mid-November
  Replace tides2000 with more powerful machine | Increase throughput, decrease posting latency | November
  Simplify document processing scripts & improve logging and error detection | Simplify admin duties | December
Analysis
  Augment search page functionality | Simplify finding relevant data | ongoing
  Handle zoning & encoding issues better | Improve translations | ongoing
  Add MT for other languages | Support Arabic, others | as available
  Add question answering | Simplify finding relevant data | December
  Improve sorting, filtering, thumbnail "key entity" list | Provide better filtering (e.g., FBIS, ReliefWeb), provide better name tagging to be used for better sorting into newsgroups | soon
Product
  (see architecture schedule, following)
Evaluation
  Disease of the Month Experiments | Assess utility, evaluate usability, measure progress | monthly

Architecture Schedule
What | Why | When
Data Capture
  Scheduler Prototype | Support of new message capture daemon | Delivered, support ongoing
  DB Tools | Prerequisite for system linkage and intersite cooperation | January
  System Linkage DB | Enable addition of new features; ease system administration | March
Analysis
  Architecture support for Q&A | | December
Product
  User Workspace Prototype | Support for report construction | December
Infrastructure
  Catalyst Monitor | Ease development and debugging | December
  Native XML Support | Support for legacy systems | December
  Documentation | Usability | Ongoing

Issues and Discussion
• How is MiTAP currently being used?
  - Who are the users?
  - What are the users doing?
  - What do users want?
• Prioritization of issues
  - Integrated feasibility experiment versus operational prototype:
    - Possible deployment vs integration of other TIDES technologies (Do we need to adjust our priorities?)
  - Along what dimensions should we optimize?
    - Availability, capture, analysis, presentation