Introducing ODIE: NCBO Seminar Series, February 18, 2009
![Page 1: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/1.jpg)
Introducing ODIE
NCBO Seminar Series
February 18, 2009
![Page 2: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/2.jpg)
Example
![Page 3: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/3.jpg)
IE using ontologies
Diagnosis: Malignant Melanoma
Breslow Depth: 0.72 mm
Lateral Margin: Positive
Regression: Probable
Ulceration: Negative
TIL: Focally Brisk
![Page 4: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/4.jpg)
OE using documents
punch biopsy
junctional component
pagetoid spread
dermal melanocytes
Breslow depth
lymphocytic infiltrates
regression
microscopic satellites
vascular invasion
tumor infiltrating lymphocytes
Spitz nevus
epithelioid nevus
![Page 5: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/5.jpg)
Two Tasks ~ One problem
Ontology
Text
Ontology Enrichment: Uses concepts as source of concepts and relationships to enrich and validate ontology
Information Extraction: Uses concepts as source of concepts and relationships to enrich and validate ontology
Specific Aims 2, 3, 4
Specific Aims 1, 3, 5
![Page 6: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/6.jpg)
Specific Aims
Specific Aim 1: Develop and evaluate methods for information extraction (IE) tasks using existing OBO ontologies, including:
Named Entity Recognition (NER)
Co-reference Resolution (CR)
Discourse Reasoning (DR)
Attribute Value Extraction (AVE)
Specific Aim 2: Develop and evaluate general methods for clinical-text mining to assist in ontology development, including:
Concept Discovery (CD)
Concept Clustering (CC)
Taxonomic Positioning (TP)
Specific Aim 3: Develop reusable software for performing information extraction and ontology development leveraging existing NCBO tools and compatible with NCBO architecture.
Specific Aim 4: Enhance National Cancer Institute Thesaurus Ontology using the ODIE toolkit.
Specific Aim 5: Test the ability of the resulting software and ontologies to address important translational research questions in hematologic cancers.
![Page 7: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/7.jpg)
Ontology Enrichment
• Machine assisted
- Extraction
- Filtering and Organization
- Visualization
- Suggestions
• Human decision-maker (developer, curator)
• Feedback and improvement of OE
![Page 8: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/8.jpg)
Project Organization

| Project | Team | Goals |
| --- | --- | --- |
| Concept Discovery | Kaihong Liu, Rebecca Crowley, Wendy Chapman, Kevin Mitchell | Study and compare methods for ontology enrichment; design methods for evaluation |
| Coreference Resolution | Wendy Chapman, Guergana Savova, Melissa Castine | Develop annotation scheme; create Reference Standard; consider and test existing algorithms; design, implement & test new algorithms |
| ODIE 0.5 | Rebecca Crowley, Kevin Mitchell, Girish Chavan, Eugene Tseytlin | Develop and implement architecture and UI; create framework for using results of research; implement work of research groups |
![Page 9: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/9.jpg)
Domain
Will attempt to develop general tools whenever possible
• Priorities for evaluation of components in:
Radiology and pathology reports
NCIT as well as clinically relevant OBO ontologies (e.g. RadLex, FMA)
Cancer domains (including hematologic oncology)
![Page 10: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/10.jpg)
Progress
• ODIE 0.5 pre-release on NCBO SourceForge
• Annotation software and document sets
• Res Proj #1: LSP annotation project
• Res Proj #2: Coreference resolution annotation
• Starting Res Proj #3: Discourse Reasoning
![Page 11: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/11.jpg)
ODIE Software
• Toolkit for developers of NLP applications and ontologies
• Pre-released on NCBO SourceForge as ODIE 0.5
• Current release focuses on NER and CD
• Support interaction and experimentation
• Package systems at the conclusion of working with ODIE
• Foster cycle of enrichment and extraction needed to advance development of NLP systems
• Ontology enrichment as opposed to de novo development
• Human-machine collaboration as opposed to fully automated learning
![Page 12: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/12.jpg)
ODIE Download/Info
ODIE Installer: http://caties.cabig.upmc.edu/ODIE/odieinstaller.exe
GForge Site: https://bmir-gforge.stanford.edu/gf/project/odie/
User Forums: https://bmir-gforge.stanford.edu/gf/project/odie/forum/
ODIE on NCBO Tools Page: http://bioontology.org/tools/ODIE.html
![Page 13: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/13.jpg)
Users/Workflow
ODIE is intended for:
• users who want to use NCBO ontologies to perform various NLP tasks (+/- may need to add concepts locally to achieve sufficient performance)
• users who want to enrich ontologies using concepts derived from documents (very early in process of ontology development)
![Page 14: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/14.jpg)
Plans for ODIE 1.0
Ability to import additional ontologies from BioPortal or from OWL files
Ability to export proposal/enriched ontologies.
Ability to add and configure new processing resources (UIMA or GATE based)
Ability to build processing pipelines using processing resources
Will come out of the box with a processing pipeline and processing resources for NER, CD and COREF.
![Page 15: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/15.jpg)
Research Project 1: Ontology Enrichment
Nearly completed survey of lexical, statistical and hybrid methods for ontology enrichment
Methodology to study “utility” of various approaches (Liu, PhD Thesis in progress)
First project underway involves the simplest of the methods to be studied – Lexicosyntactic Patterns (LSP) – regular expressions over POS
Concept Discovery
Kaihong Liu, Rebecca Crowley, Wendy Chapman, Kevin Mitchell
Study and compare methods for ontology enrichment; design methods for evaluation
![Page 16: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/16.jpg)
LSP Patterns
The presence of certain “lexico-syntactic patterns” can indicate a particular semantic relationship between two nouns
Example:
DIFFERENTIAL DIAGNOSIS INCLUDES, BUT IS NOT LIMITED TO, SPINDLE CELL NEOPLASM OF PERINEURIAL ORIGIN (SUCH AS SCHWANNOMA) AND SPINDLE CELL MALIGNANT MELANOMA
“such as” indicates a hyponym relationship between two noun phrases
![Page 17: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/17.jpg)
Technique 1 - LSP
PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)
COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA
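As a rough illustration of how such patterns can pull candidate term pairs out of report text, here is a minimal Python sketch. It is not the ODIE implementation: the slides describe LSPs as regular expressions over part-of-speech tags, whereas this sketch approximates a noun phrase as a run of upper-case words. The names `PHRASE`, `LSP_REGEXES` and `extract_pairs` are hypothetical.

```python
import re

# Assumption: phrases are approximated as runs of upper-case words; a real LSP
# matcher would work over part-of-speech tags, as the slides describe.
PHRASE = r"[A-Z][A-Z ]*[A-Z]"

# Hypothetical pattern table covering two of the LSPs named in the slides.
LSP_REGEXES = {
    "such as": re.compile(rf"({PHRASE}),? \(?SUCH AS ({PHRASE})"),
    "aka": re.compile(rf"({PHRASE}) \(?AKA ({PHRASE})"),
}


def extract_pairs(sentence):
    """Return (lsp, phrase_before, phrase_after) triples found in a sentence."""
    text = " ".join(sentence.upper().split())  # normalize case and spacing
    pairs = []
    for lsp, regex in LSP_REGEXES.items():
        for match in regex.finditer(text):
            pairs.append((lsp, match.group(1), match.group(2)))
    return pairs


for example in (
    "PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)",
    "COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA",
):
    print(extract_pairs(example))
```

Without part-of-speech information the left-hand phrase in the second example comes back as "COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA", which over-captures the noun phrase. Trimming such candidates is one reason a human reviewer stays in the loop.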
![Page 18: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/18.jpg)
LSP distribution result

Pathology Corpus: 852764 reports, 16157608 sentences
Radiology Corpus: 209997 reports, 4057228 sentences

| Pattern | Pathology: # sentences | Pathology: unique # of sentences | Radiology: # sentences | Radiology: unique # of sentences |
| --- | --- | --- | --- | --- |
| NP especially NP | 14 | 11 | 19 | 10 |
| NP also called NP | 48 | 37 | 29 | 22 |
| NP such as NP | 98 | 95 | 906 | 251 |
| NP's NP | 202 | 45 | 5 | 2 |
| NP in NP | 4851 | 1689 | 106 | 47 |
| NP aka NP | 5396 | 460 | 2 | 2 |
| NP including NP | 6291 | 4952 | 1403 | 747 |
| NP other NP | 6940 | 2251 | 10622 | 1407 |
| NP like NP | 7649 | 2267 | 410 | 235 |
| NP, NP | 8211 | 5351 | 7385 | 3889 |
| NP of NP | 14275 | 4032 | 2906 | 607 |
| NP in the NP | 47124 | 23178 | 64044 | 29285 |
| NP is NP | 92374 | 25024 | 7349 | 2896 |
| NP of the NP | 246798 | 70735 | 173016 | 54895 |

Number of sentences containing lexico-syntactic patterns
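The two counts reported per pattern (sentences and unique sentences) could be tallied along the following lines. This is only a sketch: it assumes "unique" means distinct sentence strings after case and whitespace normalization, which the slides do not state, and it matches on the literal LSP words rather than on NP boundaries. The function name `count_lsp_sentences` and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def count_lsp_sentences(sentences, lsp_regexes):
    """Return {pattern: (matching sentences, distinct matching sentences)}."""
    totals = Counter()
    distinct = {name: set() for name in lsp_regexes}
    for sentence in sentences:
        # Assumption: two sentences count as the same if they are equal
        # after upper-casing and collapsing whitespace.
        normalized = " ".join(sentence.upper().split())
        for name, regex in lsp_regexes.items():
            if regex.search(normalized):
                totals[name] += 1
                distinct[name].add(normalized)
    return {name: (totals[name], len(distinct[name])) for name in lsp_regexes}


# Hypothetical mini-corpus: report boilerplate often repeats verbatim.
corpus = [
    "COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA",
    "Compatible with benign eccrine neoplasia,  such as nodular hidroadenoma",
    "PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)",
    "NO PATTERN HERE",
]
patterns = {
    "such as": re.compile(r"\bSUCH AS\b"),
    "aka": re.compile(r"\bAKA\b"),
}
print(count_lsp_sentences(corpus, patterns))
```

A large gap between the two counts, as for "NP aka NP" in the pathology corpus, suggests that much of the matching text is repeated template language.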
![Page 19: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/19.jpg)
Step 1 - Domain Expert Annotation
• Annotation tasks:
1. Meaningful medical phrases (MMP) that can stand alone before LSP and after LSP.
2. The phrases before and after LSP have to be related

Example sentences:
PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)
COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA

| Term1 (before LSP) | LSP | Term2 (after LSP) |
| --- | --- | --- |
| PRURIGO NODULE | aka | LICHEN SIMPLEX CHRONICUS |
| BENIGN ECCRINE NEOPLASIA | such as | NODULAR HIDROADENOMA |
| ….. | | ……. |

• Calculate: total # of MMP, # of MMP per LSP
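The "# of MMP per LSP" figure reads as a simple ratio of annotated phrases to pattern instances. The sketch below assumes that definition. The sample numbers are taken from the pathology results reported later in the deck for "such as" (25 LSP instances, 27 phrases before and 55 after the pattern). The function name `mmp_per_lsp` is hypothetical.

```python
def mmp_per_lsp(total_mmp, lsp_count):
    """Ratio of meaningful medical phrases to LSP instances (assumed definition)."""
    if lsp_count <= 0:
        raise ValueError("lsp_count must be positive")
    return total_mmp / lsp_count


# Pathology reports, "such as", 25 LSP instances.
print(f"before LSP: {mmp_per_lsp(27, 25):.0%}")  # before LSP: 108%
print(f"after LSP:  {mmp_per_lsp(55, 25):.0%}")  # after LSP:  220%
```

A ratio above 100% means that, on average, more than one meaningful phrase was annotated on that side of each pattern instance.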
![Page 20: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/20.jpg)
Step 2 - Curator Judgment
For each term:
1. Is the concept in the ontology?
2. If not, should it be added into the ontology?
3. If not, what is the reason?
For each pair of terms:
1. What is the relationship between them?
2. Does this relationship exist in the ontology?
3. If not, should it be added into the ontology?
4. If not, what is the reason?

| Term1 | Term2 |
| --- | --- |
| PRURIGO NODULE | LICHEN SIMPLEX CHRONICUS |
| BENIGN ECCRINE NEOPLASIA | NODULAR HIDROADENOMA |
| ….. | ……. |

New Concept and Relationship Suggestion Rates
New Concept and Relationship Acceptance Rates
![Page 21: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/21.jpg)
First experiment result – concept enrichment

Radiology Reports

| LSP | Preceding the LSP: total # of meaningful medical phrases | Preceding the LSP: # of meaningful medical phrases / # of LSP | Following the LSP: total # of meaningful medical phrases | Following the LSP: # of meaningful medical phrases / # of LSP |
| --- | --- | --- | --- | --- |
| such as | 17 | 100% | 31 | 124% |
| including | 27 | 159% | 66 | 264% |

Pathology Reports (# of LSP: 25)

| LSP | Preceding the LSP: total # of meaningful medical phrases | Preceding the LSP: # of meaningful medical phrases / # of LSP | Following the LSP: total # of meaningful medical phrases | Following the LSP: # of meaningful medical phrases / # of LSP |
| --- | --- | --- | --- | --- |
| such as | 27 | 108% | 55 | 220% |
| including | 24 | 96% | 35 | 233% |
| aka | 25 | 100% | 28 | 112% |
![Page 22: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/22.jpg)
First experiment result – concept enrichment (NCIT)
![Page 23: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/23.jpg)
First experiment – extracted relationships
![Page 24: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/24.jpg)
First experiment – extracted relationships
[Figure: bar chart. X axis: LSPs (such as, including, aka). Y axis: Percentage (0 to 100). Series: "Hyponym relationship is not in the NCIT" and "Hyponym relationship should be added into the NCIT".]
![Page 25: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/25.jpg)
First experiment – Concept Enrichment for RadLex

| Position | # of Terms | Not in RadLex | In RadLex | Blank | Should be added to RadLex | Suggestion rate | Acceptance rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Preceding LSP | 29 | 11 | 16 | 2 | 10 | 38% | 91% |
| Following LSP | 68 | 24 | 41 | 3 | 10 | 35% | 42% |
| Total | 97 | 35 | 57 | 5 | 20 | 36% | 57% |
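The slides do not spell out how the two rates are computed. One reading that reproduces every figure in the RadLex table is: suggestion rate = terms not in RadLex / all terms, and acceptance rate = terms that should be added / terms not in RadLex. The sketch below works under that assumption. The class name `CurationCounts` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CurationCounts:
    terms: int            # all extracted terms reviewed by the curator
    not_in_ontology: int  # terms the curator did not find in the ontology
    should_be_added: int  # missing terms the curator would add

    @property
    def suggestion_rate(self):
        # Assumed definition: share of extracted terms that are new to the ontology.
        return self.not_in_ontology / self.terms

    @property
    def acceptance_rate(self):
        # Assumed definition: share of new terms the curator accepts.
        return self.should_be_added / self.not_in_ontology


# Counts copied from the RadLex table.
rows = {
    "Preceding LSP": CurationCounts(29, 11, 10),
    "Following LSP": CurationCounts(68, 24, 10),
    "Total": CurationCounts(97, 35, 20),
}
for label, counts in rows.items():
    print(f"{label}: suggestion {counts.suggestion_rate:.0%}, "
          f"acceptance {counts.acceptance_rate:.0%}")
```

Under these definitions the phrases preceding the LSP are accepted far more often than those following it (91% against 42%), even though the suggestion rates are similar.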
![Page 26: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/26.jpg)
Research Project 2: Coreference Resolution
Anaphoric relations are relations between linguistic expressions where the interpretation of one linguistic expression (the anaphor) relies on the interpretation of another linguistic expression (the antecedent)
Examples of types of anaphoric relations:
Identity (or coreference)
Set/subset
Part/whole
Anaphora resolution is a computational technique for the discovery of anaphoric relations
Coreference Resolution
Wendy Chapman, Guergana Savova, Melissa Castine
Develop annotation scheme; create Reference Standard, consider and test existing algorithms; design, implement & test new algorithms
![Page 27: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/27.jpg)
Definitions
Anaphoric relations are relations between linguistic expressions where the interpretation of one linguistic expression (the anaphor) relies on the interpretation of another linguistic expression (the antecedent)
Types of anaphoric relations:
Identity (or coreference)
Set/subset
Part/whole
Other
Anaphora resolution is a computational technique for the discovery of anaphoric relations
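To make the definitions concrete, the sketch below shows one possible way to represent annotated anaphoric relations and to group identity links into coreference chains. It is an illustration only, not the project's annotation schema. The class and function names and the sample note are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    IDENTITY = "identity (coreference)"
    SET_SUBSET = "set/subset"
    PART_WHOLE = "part/whole"
    OTHER = "other"


@dataclass(frozen=True)
class Mention:
    text: str
    start: int  # character offset of the mention in the document
    end: int


@dataclass(frozen=True)
class AnaphoricRelation:
    anaphor: Mention     # expression whose interpretation depends on the antecedent
    antecedent: Mention
    relation: RelationType


def find_mention(document, phrase):
    """Build a Mention from the first occurrence of phrase in document."""
    start = document.index(phrase)
    return Mention(phrase, start, start + len(phrase))


def coreference_chains(relations):
    """Merge mentions connected by identity relations into sets (chains)."""
    chains = []
    for rel in relations:
        if rel.relation is not RelationType.IDENTITY:
            continue
        linked = [c for c in chains if rel.anaphor in c or rel.antecedent in c]
        merged = {rel.anaphor, rel.antecedent}.union(*linked)
        chains = [c for c in chains if c not in linked] + [merged]
    return chains


# Hypothetical note text, not taken from the project corpora.
note = "A mass was seen in the left lung. The lesion measures 2 cm. It was biopsied."
mass = find_mention(note, "A mass")
lesion = find_mention(note, "The lesion")
it = find_mention(note, "It")
relations = [
    AnaphoricRelation(lesion, mass, RelationType.IDENTITY),
    AnaphoricRelation(it, lesion, RelationType.IDENTITY),
]
print(coreference_chains(relations))
```

Only identity relations are merged into chains here; set/subset and part/whole links would be kept as separate typed pairs.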
![Page 28: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/28.jpg)
Progress
Completed and Ongoing:
Annotation schema Development
Guidelines
Training of annotators
4 training sessions
IAA: after session 1 – in the 40’s
IAA: after session 3 – in the 60’s
Planned:
Complete Reference Standard (RS)
Algorithm testing and further development
![Page 29: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/29.jpg)
Data Sets for RS
50 clinical notes (named entities annotated)
50 Pathology (disorders, tumors)
20 Pathology (conditions)
20 Radiology (conditions)
20 Discharge summaries (conditions)
20 ED (conditions)
20 ED (respiratory conditions)
• Mayo
• Pitt
![Page 30: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/30.jpg)
QUESTIONS?
![Page 31: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/31.jpg)
Visualization of document set
![Page 32: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/32.jpg)
NER – viewing concepts
![Page 33: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/33.jpg)
Multiple Ontologies
![Page 34: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/34.jpg)
OE – Concept Suggestion
![Page 35: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/35.jpg)
Ranked Suggestions
![Page 36: Introducing ODIE NCBO Seminar Series February 18, 2009](https://reader036.vdocuments.site/reader036/viewer/2022081513/56649eb15503460f94bb6f47/html5/thumbnails/36.jpg)
Adding Proposals