Supporting clinical trial data curation and integration with table mining
Nikola Milosevic¹, Cassie Gregson³, Robert Hernandez³, Goran Nenadic¹,²
¹ School of Computer Science, University of Manchester
² The Farr Institute @HeRC
³ AstraZeneca
Clinical trial publications
• Around 800 000 clinical trials in PubMed
• Difficult to digest/search
• Text mining approaches
• But tables and figures are often not processed
Tables in publications
• Present factual information
• Usually:
  • Experimental settings (e.g. demographics)
  • Findings and results (e.g. drug-drug interactions (DDI), side effects, adverse events…)
  • Background information (previous research, datasets, etc.)
• Examples
• Important information about trials
Extraction and curation of table data
Challenges
• Complex structure
• Table dimensionality (1, 2, multi-dimensional)
• Visual relationships
• Dense content
• Ambiguous short text
• Lack of context
• Acronyms and abbreviations
• Incomplete information
Table analysis overview
Table types (1)
• Four types: list, matrix, super-row and multi-tables
• List table
Table types (2)
• Matrix table
Table types (3)
• Super-row table
Table types (4)
• Multi-table
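As a rough illustration of the four table types, the sketch below names them in an enum and applies a toy rule to a grid of strings. The descriptions in the comments, the names `TableType`, `is_super_row` and `guess_type`, the rule itself and the sample values are all assumptions made for this example; the slides do not describe how the system actually tells the types apart.

```python
from enum import Enum


class TableType(Enum):
    LIST = "list"                # assumed: items listed without header/stub structure
    MATRIX = "matrix"            # assumed: column headers plus a stub (row-label) column
    SUPER_ROW = "super-row"      # assumed: rows grouped under spanning label rows
    MULTI_TABLE = "multi-table"  # assumed: several tables merged into one grid


def is_super_row(row):
    """Toy rule: the first cell has a label and every other cell is empty."""
    return bool(row[0].strip()) and all(not cell.strip() for cell in row[1:])


def guess_type(rows, n_header_rows=1):
    """Very rough heuristic over a grid of strings (equal-length rows).

    Multi-tables are not detected here; that would need a way to spot
    a second header block inside the grid.
    """
    if len(rows[0]) == 1:
        return TableType.LIST
    body = rows[n_header_rows:]
    if any(is_super_row(row) for row in body):
        return TableType.SUPER_ROW
    return TableType.MATRIX


# Made-up values, for illustration only.
grid = [
    ["Characteristic", "Drug", "Placebo"],
    ["Sex", "", ""],
    ["Male", "40", "38"],
    ["Female", "35", "37"],
]
print(guess_type(grid))  # TableType.SUPER_ROW
```

Here the "Sex" row carries a label but no values, so the toy rule treats it as a super-row that groups the two rows beneath it.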
Example of decomposition
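A minimal sketch of what decomposing a table into cells might look like, assuming the simplest case: a matrix table whose first row holds column headers and whose first column holds the stub (row labels). The function name `decompose`, the record keys and the sample values are hypothetical, and this is not the system's actual algorithm.

```python
def decompose(rows):
    """Turn a simple matrix table into one record per data cell.

    Assumes the first row holds column headers and the first column
    holds the stub (row labels); real tables need more careful handling.
    """
    headers = rows[0][1:]
    records = []
    for row in rows[1:]:
        stub, values = row[0], row[1:]
        for header, value in zip(headers, values):
            # Each data cell keeps a link to its row label and column header.
            records.append({"stub": stub, "header": header, "value": value})
    return records


# Made-up values, for illustration only.
table = [
    ["", "Drug", "Placebo"],
    ["Headache", "12", "9"],
    ["Nausea", "7", "4"],
]
for record in decompose(table):
    print(record)
```

Storing each data cell together with its stub and header keeps the context that a bare cell value such as "12" would otherwise lose.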
Results
Next steps
• Add semantic annotations
• Link patterns in data cells with their meaning
• Build/expand knowledge bases
• Relate to existing knowledge on the semantic web
Annotation schema
• Meta-data
• Paper (name, abstract, authors, publisher)
• Authors (names, emails, affiliations)
• Table (caption, footers)
• Cells (content, role)
• Inter-cell relationships
• Semantics (links to ontologies, dictionaries, knowledge bases)
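The schema above could be represented along the following lines. This is only a sketch: the class and field names follow the bullet points where possible, while the `row`/`col` positions, the `related_to` index list, the single `ontology_link` string and the role labels are assumptions added to make the example concrete.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cell:
    content: str
    role: str   # assumed labels, e.g. "header", "stub", "super-row", "data"
    row: int    # assumed: grid position, not named in the schema
    col: int
    related_to: List[int] = field(default_factory=list)  # inter-cell relationships
    ontology_link: Optional[str] = None                  # semantics (assumed form)


@dataclass
class Author:
    name: str
    email: str = ""
    affiliation: str = ""


@dataclass
class Table:
    caption: str
    footer: str = ""
    cells: List[Cell] = field(default_factory=list)


@dataclass
class Paper:
    name: str
    abstract: str
    publisher: str
    authors: List[Author] = field(default_factory=list)
    tables: List[Table] = field(default_factory=list)


# Hypothetical content, for illustration only.
header = Cell(content="Placebo", role="header", row=0, col=1)
value = Cell(content="9", role="data", row=1, col=1, related_to=[0])
table = Table(caption="Adverse events", cells=[header, value])
paper = Paper(name="Example trial", abstract="", publisher="", tables=[table])
print(paper.tables[0].cells[1].related_to)
```

In this sketch a data cell points to the cells it depends on by their index in `Table.cells`, which is one simple way to record inter-cell relationships.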
Summary
• Tables contain valuable information such as settings or results
• System for extraction and curation of table data
• Decomposition and annotation of the tables
• Accuracy of 85%
• Semantic analysis and information extraction