Information Extraction
Group Health
David Carrell, PhD
Group Health Research Institute
June 29, 2010
David’s background ...
[Timeline figure. Milestones: BA Phil. & Relig.; MA Pol. Science; PhD Pol. Science; Pew Health Policy; Research, IT; IT, Research. Years: 1979, 1983, 1990, 1992, 1993, 2005. Institutions: NNC, SUNY, U of Washington, UCSF, U of Wash, Group Health.]
Group Health Research Institute (GHRI)
● Group Health (www.ghc.org)
●Founded 1947, Seattle, WA
●Integrated delivery system (“HMO”)
●~600K patients in WA (some OR, ID)
●Comprehensive EMR & patient portal (2004+)
● GHRI (www.grouphealthresearch.org)
●Founded 1983
●300 staff (50 investigators)
●2009: >250 active grants ($39M)
Group Health Research Institute (GHRI)
● Applied research
●Epidemiology, health systems, clinical trials, economics ...
●Limited bio-informatics expertise
● Collaborative
●HMO-Research Network, Cancer-RN, ... MH-RN
●Federated data systems
● NLP vision
●NLP expertise through collaboration
●Bring NLP to the text—locally ... other network sites
GHRI & Research Consortia
HMO Research Network
• Large data repositories
• Common EMR platforms
• Virtual Data Warehouse (VDW)
GHRI & Virtual Data Warehouse (VDW)
Diagnoses: MRN, provider, adate, enctype, dx, pdx, diagprovider, origdx
Tumor: MRN, dxdate, staging vars…, tumor vars…, treatment vars…, etc.
Encounters: MRN, provider, adate, enctype, ddate, encounter_subtype, facility_code, discharge_disposition, discharge_status, DRG, admitting_source, department
Demographics: MRN, birth_date, gender, race1-5, hispanic
Pharmacy: MRN, ndc, rxdate, rxsup, rxamt, rxmd
Vital Signs: MRN, measure_date, ht, wt, bmi, days_diff, diastolic, systolic, position
Census: MRN, block, blockgp, county, state, tract, zip, education vars..., income var..., race vars...
Procedures: MRN, provider, adate, enctype, px, codetype, performingprovider, pxcnt, origpx
NDC: ndc, GenericName, BrandName
Provider Specialty: Provider
Enrollment: MRN, enr_start, enr_end, ins_medicare, ins_medicaid, ins_commercial, ins_privatepay, ins_other, drugcov
• Structured data (legacy + Epic/EMR)
• Minimum 1990+
• Integrated care delivery (some claims)
• Diagnoses, procedures, pharmacy, tumor, vitals, census/geocode, etc.
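Because the patient-level VDW tables on this slide all carry an MRN column, a cohort query largely comes down to joining tables on MRN. The following is a minimal sketch of that idea, not the actual VDW implementation: it assumes an in-memory SQLite database as a stand-in for the site's warehouse, uses only column names shown on the slide, and all data values (MRNs, dates, the `AV` encounter type, and the diagnosis code strings) are made up for illustration.

```python
import sqlite3


def build_demo_vdw() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for two VDW tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE demographics (mrn TEXT PRIMARY KEY, birth_date TEXT, gender TEXT);
        CREATE TABLE diagnoses (mrn TEXT, adate TEXT, enctype TEXT, dx TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO demographics VALUES (?, ?, ?)",
        [("M001", "1950-03-14", "F"), ("M002", "1962-11-02", "M")],
    )
    conn.executemany(
        "INSERT INTO diagnoses VALUES (?, ?, ?, ?)",
        [
            ("M001", "2008-05-01", "AV", "174.9"),
            ("M001", "2009-01-20", "AV", "250.00"),
            ("M002", "2009-02-11", "AV", "250.00"),
        ],
    )
    return conn


def patients_with_dx(conn: sqlite3.Connection, dx_code: str):
    """Return (mrn, gender, earliest adate) for patients with the given dx code."""
    return conn.execute(
        """
        SELECT d.mrn, p.gender, MIN(d.adate)
        FROM diagnoses AS d
        JOIN demographics AS p ON p.mrn = d.mrn
        WHERE d.dx = ?
        GROUP BY d.mrn, p.gender
        ORDER BY d.mrn
        """,
        (dx_code,),
    ).fetchall()


conn = build_demo_vdw()
print(patients_with_dx(conn, "250.00"))
```

The same join pattern would extend to the other MRN-keyed tables (Encounters, Pharmacy, Procedures, and so on); table and column casing here is a choice made for the sketch.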
GHRI & NLP Adoption
Structured Information from Text
Pathology: MRN, accession_number, collection_date, coding_date, thesaurus_version
Imaging: MRN, image_number, image_date, provider, coding_date, thesaurus_version
Pathology Concepts: accession_number, concept_code, concept_type, negated
Imaging Concepts: image_number, concept_code, concept_type, negated
Clinical Notes Concepts: MRN, provider, adate, enctype, concept_code, concept_type, negated
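One way to read the concept tables: each row records a coded concept found in a document, with a `negated` flag, and the document table links the report back to the patient by MRN. The sketch below illustrates how such tables might be queried for patients whose pathology reports assert (rather than negate) a concept. It is an assumption-laden illustration: SQLite stands in for the real database, `negated` is assumed to be stored as 0/1, and the accession numbers and concept codes (`C100`, `C200`) are invented.

```python
import sqlite3


def build_demo_nlp_tables() -> sqlite3.Connection:
    """Create in-memory Pathology and Pathology Concepts tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pathology (
            mrn TEXT, accession_number TEXT PRIMARY KEY, collection_date TEXT);
        CREATE TABLE pathology_concepts (
            accession_number TEXT, concept_code TEXT, concept_type TEXT, negated INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO pathology VALUES (?, ?, ?)",
        [("M001", "P-1", "2009-01-05"), ("M002", "P-2", "2009-02-07")],
    )
    conn.executemany(
        "INSERT INTO pathology_concepts VALUES (?, ?, ?, ?)",
        [
            ("P-1", "C100", "finding", 0),  # concept asserted
            ("P-2", "C100", "finding", 1),  # concept mentioned but negated
            ("P-2", "C200", "finding", 0),
        ],
    )
    return conn


def patients_asserting(conn: sqlite3.Connection, concept_code: str):
    """Return sorted MRNs with at least one non-negated mention of the concept."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.mrn
        FROM pathology_concepts AS c
        JOIN pathology AS p ON p.accession_number = c.accession_number
        WHERE c.concept_code = ? AND c.negated = 0
        ORDER BY p.mrn
        """,
        (concept_code,),
    ).fetchall()
    return [mrn for (mrn,) in rows]


conn = build_demo_nlp_tables()
print(patients_asserting(conn, "C100"))
```

Filtering on `negated` matters because a report stating that a finding is absent still produces a concept row; without the filter, patient M002 would be returned for `C100` as well.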
GHRI & NLP Adoption
• caBIG TBPT adoption proposal, Jun 2006
• caTIES for pathology & radiology text, ~2007
• Chart note text, May 2007
• GWAS (eMERGE) proposal, Aug 2007
• GATE experimentation, Feb 2008
• Strategic planning conference, Dec 2008
• ARRA Challenge Grant, Apr 2009
• UIMA/cTAKES adoption, Aug 2009
• Proposals... e.g., HMORN multi-site, Jan 2010
GHRI & NLP Adoption
● How to bring NLP capacity to clinical text?
●“Cookbooks” (SAS Java programmers)
●“Parachuted” hardware
●Parachuted virtual machine (?)
●Cloud-based processing
● Security issues
●Other?
GHRI & NLP Adoption
Challenges of Cloud-based Solutions:
Unfamiliar technologies
Responsibility sharing (e.g., security)
Patient privacy
Institutional risk
De-identification
Graduated adoption?
GHRI & NLP Adoption
SHARP Cloud Security Workshop
Spring 2011
Educational focus
Challenges of processing clinical text in a novel security space (virtual firewall?)
Security best practices
IRB engagement
Graduated adoption strategies
SHARP -- Exploring deployment strategies
NLP Challenge Grant
Natural Language Processing for Cancer Research Network Surveillance Studies
• Aim 1:
  Deploy open-source NLP software
  Develop ETL connective tissue
  Build “human capital” (Java, NLP)
• Aim 2:
  NLP algorithm boot camp: Recurrent breast cancer diagnoses
  >3000 existing gold standard cases (human reviewed)
• Approach:
  Local deployment/programming support
  High-level NLP/bioinformatics expertise via external collaboration
• Participants: GHRI (Carrell, Buist, Chubak), Mayo Clinic/Harvard (Savova), Pittsburgh (Chapman), Vanderbilt (Xu).
NLP Challenge Grant – Aim 1
[Diagram. Components: Epic/Clarity Chart Notes, Radiology Reports, Pathology Reports; Document Manager; UIMA/cTAKES NLP; NLP SQL Server Database. Data stages labeled: Raw, Rich, Normalized.]
Document_Identifier       Concept_Code
Radiology_Report_000001   2877143
Radiology_Report_000001   8600231
Radiology_Report_000001   3134988
Radiology_Report_000001   5287109
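The sample rows on this slide pair a document identifier with one concept code per row. As a rough illustration of the kind of "ETL connective tissue" this implies, the sketch below parses such whitespace-delimited rows and groups the codes by document. The function names are hypothetical, and treating concept codes as strings (to preserve any leading zeros) is an assumption; the slide does not show how the real pipeline loads its SQL Server database.

```python
from collections import defaultdict

# Rows copied from the slide; the first line is the header.
SAMPLE = """\
Document_Identifier Concept_Code
Radiology_Report_000001 2877143
Radiology_Report_000001 8600231
Radiology_Report_000001 3134988
Radiology_Report_000001 5287109
"""


def parse_concept_rows(text):
    """Yield (document_identifier, concept_code) pairs, skipping the header row."""
    lines = [line for line in text.splitlines() if line.strip()]
    for line in lines[1:]:
        doc_id, code = line.split()
        yield doc_id, code


def concepts_by_document(text):
    """Group concept codes by document, preserving the order of appearance."""
    grouped = defaultdict(list)
    for doc_id, code in parse_concept_rows(text):
        grouped[doc_id].append(code)
    return dict(grouped)


print(concepts_by_document(SAMPLE))
```

Keeping one row per (document, concept) pair, as on the slide, makes the output easy to join against document-level tables by identifier.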
NLP Challenge Grant – Aim 1
Document Type   Available Documents   Percent NLP Concept-Coded
Chart Notes     20M                   25%
Radiology       4M                    33%
Pathology       1.2M                  2%
NLP Challenge Grant – Aim 2
[Diagram: AEs applied per document type, feeding the question “Rec Br Ca?”]
Progress Notes: AE1 AE2 AE3
Oncology Notes: AE1 AE2
Radiology Reports: AE1 AE2 AE3
Pathology Reports: AE1
eMERGE consortium
• Vanderbilt, Mayo, Northwestern, Marshfield, Group Health
• Can EMRs from multiple institutions provide comparable phenotype data for GWAS?
• 14 phenotypes
• Group Health
•structured data
•Adoption of NLP algorithms developed by others
•“Low-tech” NLP
• Text explorer, Assisted chart abstraction
Clinical Text Explorer
• Select text source (chart notes, radiology, pathology, etc.)
• Search: recurrent NEAR breast NEAR cancer
• Date range
• Sample spec’s
• N documents, N patients found
• Search terms highlighted
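The example search, recurrent NEAR breast NEAR cancer, is a proximity query. The slide does not say how the tool evaluates NEAR, so the sketch below is only one plausible reading: a chained NEAR matches when each consecutive pair of terms occurs within `window` tokens of each other, in either order. The function name `near_match` and the default window of 5 tokens are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near_match(text, terms, window=5):
    """Return True if consecutive terms each occur within `window` tokens."""
    tokens = tokenize(text)
    positions = [
        [i for i, tok in enumerate(tokens) if tok == term.lower()]
        for term in terms
    ]
    if not positions or any(not found for found in positions):
        return False
    # Walk the chain: keep only the positions of each term that lie within
    # `window` tokens of a still-reachable position of the previous term.
    reachable = positions[0]
    for candidates in positions[1:]:
        reachable = [
            j for j in candidates
            if any(abs(j - i) <= window for i in reachable)
        ]
        if not reachable:
            return False
    return True


query = ["recurrent", "breast", "cancer"]
print(near_match("History of recurrent left breast cancer, now stable.", query))
print(near_match("No recurrent infection. Family history: mother had breast cancer.", query))
```

A proximity match like this does not detect negation: a note reading "no evidence of recurrent breast cancer" would still match, which is one reason search hits are reviewed by a person rather than taken as findings.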
Assisted Chart Abstraction
[Architecture diagram. Components shown:]
• SQL Server: chart notes (550K pts, 17M notes, 0.8B lines), pre-processed
• Indexes (A-Z): full-text, ID, date, etc.
• Cohort lists
• Data warehouse
• NLP concept codes
• Assisted chart abstraction GUI: point-and-click, outside EMR
• Data
• Text capture
Assisted Chart Abstraction
[Workflow diagram comparing traditional chart abstraction with assisted chart abstraction]
Identify Cohort
• Selection criteria applied to the patient: Pt Dx/Px/Rx, Pt Visits, Pt Demog
• Selection criteria applied to the notes: Note Date, Note By, Note Type, Note Text
Assign note priority
Assisted Chart Abstraction
Data
Assisted Chart Abstraction
Stage                                        Patients         Chart Notes
Initial cohort identification                137,019 (100%)   2903 (100%)
Inclusion criteria (demog., dx, px, etc.)    70,119 (51%)     671 (23%)
Electronic text                              28,186 (21%)     228 (8%)
Pre-processed text                           284 (0.2%)       122 (4%)
• Text: “CATARACT”
• Note: Op/Ophthal exam
• Near: Cataract procedure
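The three note-level criteria in the cataract example (Text, Note, Near) suggest a simple way to rank notes for review. The sketch below is a guess at how such a rule could be expressed, not the tool's actual logic: it assumes "Near" means the note date falls within a fixed number of days of a cataract procedure date (the 30-day window is hypothetical), that note types are stored as the strings `Op` and `Ophthal exam`, and that priority is simply the count of criteria met. The `Note` class and its field names are also invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Note:
    note_date: date
    note_type: str
    text: str


def note_priority(note, procedure_dates, window_days=30):
    """Count how many of the three example criteria a note meets (0 to 3)."""
    has_text = "cataract" in note.text.lower()
    right_type = note.note_type in {"Op", "Ophthal exam"}
    near_procedure = any(
        abs((note.note_date - px_date).days) <= window_days
        for px_date in procedure_dates
    )
    return int(has_text) + int(right_type) + int(near_procedure)


# Hypothetical patient: one cataract procedure and three notes.
procedures = [date(2009, 3, 1)]
notes = [
    Note(date(2008, 1, 1), "Progress", "Diabetes follow-up"),
    Note(date(2009, 3, 20), "Progress", "s/p CATARACT surgery, doing well"),
    Note(date(2009, 3, 1), "Op", "Cataract extraction, right eye"),
]

# Present the highest-priority notes to the abstractor first.
ranked = sorted(notes, key=lambda n: note_priority(n, procedures), reverse=True)
for n in ranked:
    print(note_priority(n, procedures), n.note_type, n.note_date)
```

Ranking rather than filtering keeps lower-scoring notes available to the abstractor while surfacing the most likely sources first.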
Potential SHARP synergy ...
National Cancer Institute FOA: Tools for Electronic Data Extraction
• Funding: NCI Contract for software development
• Aim: Enhance/automate existing SEER cancer case identification (largely manual abstraction of EHR/paper charts)
• Approach: Assess, propose, test, modify, develop, deploy technologies that leverage NLP to automate some aspects of SEER workflow
• Participants: IMS, Inc., SEER sites (4), Group Health, Harvard
SHARP – NLP research lab
Questions – Discussion