Impact of different relation extraction methods on network analysis results
Jana Diesner
Motivation
Text Data Network Data Applications
• Need: scalable, reliable, robust methods & tools
• Unstructured
• At any scale
• Network Analysis
– Answer substantive and graph-theoretic questions
– Develop and test hypotheses and theories
• Visualizations
• Populate databases
• Input to further computations, e.g. simulations, machine learning
Research Questions and Relevance
• How do network data and analysis results obtained by using different relation extraction methods compare to each other?
• Why does it matter?
– Increased comparability, generalizability, transparency of methods and tools
– Increased control and power for developers and users
– Supports drawing of reasonable and valid conclusions
Relation Extraction Methods
[Diagram: four ways of obtaining network data]
• Text, manual (TextM)
• Text, automated (TextA)
• Meta-data (META)
• Subject Matter Experts (SME)
Other diagram labels: codebook; database query; meta-data; proximity-based linkage of nodes (shown three times)
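The slide names proximity-based linkage of nodes without giving details. A minimal sketch of one common reading, assuming a token window: two entity mentions are linked when they are at most `window` tokens apart. The function name, the `window` parameter, and the toy sentence are illustrative assumptions, not the author's implementation.

```python
from collections import Counter
from itertools import combinations


def proximity_links(tokens, entities, window=3):
    """Count links between distinct entities whose mentions are
    at most `window` token positions apart (assumed definition)."""
    links = Counter()
    # Positions of tokens that were recognized as entities (nodes).
    mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in entities]
    for (i, a), (j, b) in combinations(mentions, 2):
        if a != b and j - i <= window:
            # Undirected link: store the pair in sorted order.
            links[tuple(sorted((a, b)))] += 1
    return links


# Toy input, not taken from any of the corpora in the talk.
tokens = "alice met bob in paris while carol visited rome".split()
entities = {"alice", "bob", "paris", "carol", "rome"}
links = proximity_links(tokens, entities, window=3)
```

A larger window yields denser networks, so the window size is one of the settings that would need to be reported when comparing extraction methods.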
Data
                     Sudan Corpus     Funding Corpus                 Enron Corpus
Genre                Newswire         Scientific Writing             Emails
Size                 80,000 articles  56,000 proposals               53,000 emails
Source               LexisNexis       Cordis                         FERC/SEC
Time span            8 years          22 years                       4 years
Text-based networks  Article bodies   Project description            Email bodies
Meta-data network    Index terms      Index terms and collaborators  Email headers
• Large-scale, over-time, open source data from different domains
Results I
1. Text automated vs. manual: total number of nodes of sub-type “generic” far higher than “specific”
– Rethink focus of network analysis: collectives vs. individuals
– Importance of detecting unnamed entities
2. Networks built from text bodies hardly resemble the ground truth data (SME); meta-data networks do not resemble it at all
– In the best case: 50% of nodes and 20% of links
3. Agreement in structure and key entities depends on type of network
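Node and link agreement with a reference network can be expressed as simple set overlap. The sketch below assumes undirected links and measures the share of reference items that also occur in the extracted network; the helper names and the toy networks are hypothetical and are not the study's data or its exact metric.

```python
def overlap(reference, candidate):
    """Share of items in `reference` that also occur in `candidate`."""
    if not reference:
        return 0.0
    return len(reference & candidate) / len(reference)


def undirected(links):
    """Represent each link as a frozenset so (a, b) equals (b, a)."""
    return {frozenset(link) for link in links}


# Toy reference network (standing in for SME data) and toy extracted network.
sme_nodes = {"a", "b", "c", "d"}
sme_links = undirected([("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("a", "c")])
text_nodes = {"a", "b", "e"}
text_links = undirected([("b", "a"), ("a", "e")])

node_overlap = overlap(sme_nodes, text_nodes)  # 2 of 4 reference nodes found
link_overlap = overlap(sme_links, text_links)  # 1 of 5 reference links found
```

Link overlap can never exceed what the shared nodes allow, which is one reason link agreement tends to be lower than node agreement.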
Results II
3. Agreement among text-based networks, and between text-based and meta-data networks, depends on the type of network
Social networks
• Text-based networks
– Substantial overlap between manual and automated, esp. w.r.t. key players
– Localized view on geo-political entities and culture
• Meta-data network
– Major international key players
– Small overlap in key entities with text-based networks
Knowledge networks
• Text-based networks
– Gist of information in terms of common sense entities
– Minimal overlap between manual and automated
• Meta-data network
– Seem more informative (mini-summaries)
– Fewer coreference resolution issues
– Minimal overlap with text-based networks
For a more complete view, combine the automated text-based network with the meta-data network
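Overlap in key entities between two networks can be illustrated as follows. The talk does not say how key entities were identified; this sketch assumes they are the top-k nodes by degree and compares the two sets with the Jaccard index. All names and the toy link lists are hypothetical.

```python
from collections import Counter


def top_k_by_degree(links, k):
    """Return the k nodes with the highest degree in an undirected link list."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    # Sort by degree (descending), then by name, so ties are deterministic.
    ranked = sorted(degree, key=lambda n: (-degree[n], n))
    return set(ranked[:k])


def jaccard(x, y):
    """Size of the intersection divided by size of the union."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


# Toy link lists standing in for a text-based and a meta-data network.
text_links = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
meta_links = [("a", "x"), ("x", "y"), ("x", "z"), ("a", "y")]

text_key = top_k_by_degree(text_links, k=2)
meta_key = top_k_by_degree(meta_links, k=2)
key_entity_agreement = jaccard(text_key, meta_key)
```

Other centrality measures would fit the same pattern; only `top_k_by_degree` would change.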
Acknowledgements
• This work was supported by the National Science Foundation (NSF) IGERT 9972762, the Army Research Institute (ARI) W91WAW07C0063, the Army Research Laboratory (ARL/CTA) DAAD19-01-2-0009, the Air Force Office of Scientific Research (AFOSR) MURI FA9550-05-1-0388, the Office of Naval Research (ONR) MURI N00014-08-11186, and a Siebel Scholarship. Additional support was provided by the CASOS Center at Carnegie Mellon University. The views and conclusions contained in this talk are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the NSF, ARI, ARL, AFOSR, ONR, or the United States Government.
Thank You! Questions, Comments, Feedback: [email protected]