mayo clinic - porting a natural language processing...
TRANSCRIPT
-
Porting a Natural Language Processing Algorithmto Extract Findings from Colonoscopy Pathology
Reports
Jennifer A. Pacheco1; Andrew J. Gawron, MD, PhD2; KennethBorthwick3; Peggy L. Peissig, PhD4; David S Carrell, PhD5; Luke
V. Rasmussen1; Huan Mo, MD6; William K. Thompson, PhD1
1Northwestern University, Chicago, IL; 2University of Utah, Salt Lake City, UT; 3GeisingerHealth System, Danville, PA; 4Marshfield Clinic, Marshfield, WI; 4Group Health Research
Institute, Seattle, WA; 6Vanderbilt University, Nashville, TN
March 22, 2016
-
Disclosure
I Will Thompson is co-founder of Textractor Technologies LLC
-
Funding
This work has been supported in part by funding from PhEMA (R01GM105688) and eMERGE (U01 HG006379, U01 HG006378 and U01HG006388).
-
Electronic Medical Records and Genomics Network
I NHGRI funded project (currentlyin 3rd funding cycle)
I Development of methods andbest practices for using the EHRas a tool for genomic research
I Network members link EHR datato DNA biobanks, pool resultsacross sites
I Several dozen electronicphenotypes completed and inprogress (www.PheKB.org)
Example Phenotypes: Type 2 Diabetes,Peripheral Arterial Disease, QRS Duration,Colon Polyps, Resistant Hypertension
http://www.phekb.org/
-
Phenotype: Colon Polyps
I Colorectal cancer (CRC) is the 2nd leading cause ofcancer-related mortality in the United States
I Colonoscopy is a widely used CRC screening modality, results inpolyp biopsies with histologies described in pathology notes
I Serrated vs. adenoma cancer pathway
I Goal: EHR-based phenotype and GWAS across eMERGE sites
-
Developing and Porting the NLP-based Algorithm
[A] NLP pipeline; [B] Document annotator tool; [C] KNIME workflow forexecuting NLP; [D] KNIME workflow for evaluation of results againstgold standard
-
NLP Modules
-
Groovy Script UIMA Annotator
Groovy script UIMA annotators are very easily configured, and scriptscan be updated and re-loaded without recompiling the entire NLPlibrary. Built-in functionality for Groovy scripts includes:
I Support for cTAKES common type system
I Selecting and filtering annotations
I Creating annotations
I Matching patterns of annotations and text
-
Groovy Script Annotator Implementation
@Overridevoid initialize(UimaContext context) {super.initialize(context)config = new CompilerConfiguration()config.setScriptBaseClass("org.northshore.cbri.UIMAUtil")
shell = new GroovyShell(config)// load in script file contentsthis.script = shell.parse(scriptContents)
}
@Overridevoid process(JCas jcas) {
UIMAUtil.setJCas(jcas)this.script.run()
}
-
Annotation Selection
// select all Sentences containing// an EntityMentionsents = select
type:Sentence,filter:contains(EntityMention)
I Built-in support for cTAKES common type system
I Pre-defined and on-the-fly filters
I Compositional filters w. boolean functions
I First class support for regex strings & operators
-
Annotation Selection (More Complex Example)
// select all Sentence annotations contained in// "Findings" Segment that also contains an// EntityMention annotation and ends with// text "tubular adenoma"select(type:Segment).grep { seg ->seg.id == "FINDINGS"}.each {select(type:Sentence, filter:(and (coveredBy(seg),{it.coveredText==∼/.+tubular\s+adenoma},contains(EntityMention)))
}}
-
Annotation Creation
create(type:EntityMention,begin:0, end:10,polarity:1, uncertainty:0,ontologyConcepts:[create(type:UmlsConcept, cui:"C01"),create(type:UmlsConcept, cui:"C02")]
)
I Built-in support for cTAKES common types
I Can nest calls to create on the fly
-
Annotation Matching
sents = select(type:Sentence)patterns = [∼/(?i)(tubular|villous)\s+adenoma/]
match(sents, patterns,{ create(type:EntityMention,
begin:it.start(1), end:it.end(1),polarity:1, uncertainty:0,ontologyConcepts:[create(type:UmlsConcept, cui:"C01")]
)})
I Patterns can be specified over text and/or annotations
I Specified functions are applied to every match
I Action taken can be anything (create annotation one possibility)
-
Annotations Matching (More Complex Example)
pat = (∼/(?s)(?@Head)(?=(?@Head)|\Z)/)AnnotationMatcher matcher =pat.matcher(includeText:false)
matcher.each{ Map binding ->create(type:Segment,begin:binding.get("h1").begin,end:(binding.get("h2") ?
binding.get("h2").begin: jcas.documentText.length()))}
-
Pathology Note Concept Annotator
-
Abstractor Tool
https://github.com/lrasmus/DocumentAbstraction
https://github.com/lrasmus/DocumentAbstraction
-
KNIME Execution Workflow
https://www.knime.org
https://www.knime.org
-
KNIME Integration with NLP Pipeline
-
KNIME Validation Workflow
-
Sharing KNIME Workflows
Exporting a KNIME workflow Importing a KNIME workflow
-
Results
I Using the annotator tool at a single site, a corpus of 200randomly selected pathology notes was created. At the polypfinding level (histology + location match), the algorithm achievedsensitivity of 0.95 and PPV 0.95.
I The algorithm and tools were subsequently shared with threeadditional eMERGE sites
I Porting to new sites required modifications to script filesregulating sectionizing and relation extraction.
I After running the algorithm across all four sites, we were able toextract a genotyped cohort of 5839 patients.
I Our qualitative evaluation indicated that using the toolssignificantly sped up the porting process, and is a promisingapproach to similar tasks involving NLP-based phenotypealgorithms.
-
Conclusion
I Porting NLP-based phenotype algorithms across sites presentstime-consuming challenges, limiting cross-site collaborationopportunities.
I We developed a set of tools to help make it easier to share,execute, modify, and validate NLP algorithms.
I In future work we will peform a quantitive evaluation of thebenefits of using these tools for porting NLP-based phenotypealgorithms.
I Pathology notes are relatively uniform and lend themselves topattern-based rules; more complex notes or tasks may not lendthemselves as well to this approach.
I The integration of NLP with KNIME can be improved bydeveloping extension nodes that allow for direct editing of scriptfiles without requiring re-generation of NLP libraries.
-
Thanks!