Information Extraction Group Health David Carrell, PhD Group Health Research Institute June 29, 2010


Page 1: Information Extraction Group Health

Information Extraction
Group Health

David Carrell, PhD
Group Health Research Institute
June 29, 2010

Page 2: Information Extraction Group Health

David’s background ...

(Timeline figure; recoverable labels, in slide order)
● BA, Phil. & Relig. (NNC)
● MA, Pol. Science (SUNY)
● PhD, Pol. Science (U of Washington)
● Pew Health Policy (UCSF)
● Research, IT (U of Wash)
● IT, Research (Group Health)

Timeline years: 1979, 1983, 1990, 1992, 1993, 2005

Page 3: Information Extraction Group Health

Group Health Research Institute (GHRI)

● Group Health (www.ghc.org)
    ● Founded 1947, Seattle, WA
    ● Integrated delivery system (“HMO”)
    ● ~600K patients in WA (some OR, ID)
    ● Comprehensive EMR & patient portal (2004+)

● GHRI (www.grouphealthresearch.org)
    ● Founded 1983
    ● 300 staff (50 investigators)
    ● 2009: >250 active grants ($39M)

Page 4: Information Extraction Group Health

Group Health Research Institute (GHRI)

● Applied research
    ● Epidemiology, health systems, clinical trials, economics ...
    ● Limited bio-informatics expertise

● Collaborative
    ● HMO-Research Network, Cancer-RN, ... MH-RN
    ● Federated data systems

● NLP vision
    ● NLP expertise through collaboration
    ● Bring NLP to the text: locally ... other network sites

Page 5: Information Extraction Group Health

GHRI & Research Consortia

HMO Research Network
• Large data repositories
• Common EMR platforms
• Virtual Data Warehouse (VDW)

Page 6: Information Extraction Group Health

GHRI & Virtual Data Warehouse (VDW)

VDW tables and their fields:

Diagnoses: MRN, provider, adate, enctype, dx, pdx, diagprovider, origdx
Tumor: MRN, dxdate, staging vars…, tumor vars…, treatment vars…, etc.
Encounters: MRN, provider, adate, enctype, ddate, encounter_subtype, facility_code, discharge_disposition, discharge_status, DRG, admitting_source, department
Demographics: MRN, birth_date, gender, race1-5, hispanic
Pharmacy: rxdate, rxsup, rxamt, rxmd, MRN, ndc
Vital Signs: MRN, measure_date, ht, wt, bmi, days_diff, diastolic, systolic, position
Census: MRN, block, blockgp, county, state, tract, zip, education vars..., income var..., race vars...
Procedures: MRN, provider, adate, enctype, px, codetype, performingprovider, pxcnt, origpx
NDC: ndc, GenericName, BrandName
Provider: Specialty, Provider
Enrollment: MRN, enr_start, enr_end, ins_medicare, ins_medicaid, ins_commercial, ins_privatepay, ins_other, drugcov

• Structured data (legacy + Epic/EMR)

• Minimum 1990+

• Integrated care delivery (some claims)

• Diagnoses, procedures, pharmacy, tumor, vitals, census/geocode, etc.

Page 7: Information Extraction Group Health

HMO Research Network

GHRI & Virtual Data Warehouse (VDW)

Page 8: Information Extraction Group Health

GHRI & NLP Adoption

(VDW structured tables, repeated from Page 6: Diagnoses, Tumor, Encounters, Demographics, Pharmacy, Vital Signs, Census, Procedures, NDC, Provider, Enrollment)

Structured Information from Text

Pathology: MRN, accession_number, collection_date, coding_date, thesaurus_version
Imaging: MRN, image_number, image_date, provider, coding_date, thesaurus_version
PathologyConcepts: accession_number, concept_code, concept_type, negated
ImagingConcepts: image_number, concept_code, concept_type, negated
Clinical NotesConcepts: MRN, provider, adate, enctype, concept_code, concept_type, negated
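As a rough illustration of how a concept table with a negated flag might be queried, the sketch below loads a few rows into an in-memory SQLite table shaped like PathologyConcepts and returns the reports that assert a concept rather than negate it. The table and column names come from the slide; the sample rows, the concept codes, the 0/1 encoding of negated, and the function name affirmed_reports are assumptions made for the example. The slides mention SQL Server, and SQLite is used here only to keep the sketch self-contained.

```python
import sqlite3


def affirmed_reports(rows, concept_code):
    """Return accession numbers whose reports assert (do not negate) a concept."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PathologyConcepts ("
        "accession_number TEXT, concept_code TEXT, "
        "concept_type TEXT, negated INTEGER)"
    )
    conn.executemany("INSERT INTO PathologyConcepts VALUES (?, ?, ?, ?)", rows)
    cur = conn.execute(
        "SELECT DISTINCT accession_number FROM PathologyConcepts "
        "WHERE concept_code = ? AND negated = 0 "
        "ORDER BY accession_number",
        (concept_code,),
    )
    result = [r[0] for r in cur.fetchall()]
    conn.close()
    return result


# Hypothetical rows: accession numbers, codes, and types are invented.
ROWS = [
    ("P-001", "C0001", "diagnosis", 0),
    ("P-002", "C0001", "diagnosis", 1),  # concept mentioned but negated
    ("P-003", "C0002", "procedure", 0),
]

print(affirmed_reports(ROWS, "C0001"))  # ['P-001']
```

Keeping negation as a column, rather than dropping negated mentions at extraction time, leaves the choice to each study: the same table can answer both "who has it" and "who was explicitly ruled out".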

Page 9: Information Extraction Group Health

HMO Research Network

GHRI & NLP Adoption

Page 10: Information Extraction Group Health

• caBIG TBPT adoption proposal, Jun 2006
• caTIES for pathology & radiology text, ~2007
• Chart note text, May 2007
• GWAS (eMERGE) proposal, Aug 2007
• GATE experimentation, Feb 2008
• Strategic planning conference, Dec 2008
• ARRA Challenge Grant, Apr 2009
• UIMA/cTAKES adoption, Aug 2009
• Proposals... e.g., HMORN multi-site, Jan 2010

GHRI & NLP Adoption

Page 11: Information Extraction Group Health

GHRI & NLP Adoption

● How to bring NLP capacity to clinical text?
    ● “Cookbooks” (SAS Java programmers)
    ● “Parachuted” hardware
    ● Parachuted virtual machine (?)
    ● Cloud-based processing

● Security issues
    ● Other?

Page 12: Information Extraction Group Health

GHRI & NLP Adoption

Page 13: Information Extraction Group Health

Challenges of Cloud-based Solutions:

Unfamiliar technologies

Responsibility sharing (e.g., security)

Patient privacy

Institutional risk

De-identification

Graduated adoption?

GHRI & NLP Adoption

Page 14: Information Extraction Group Health

SHARP Cloud Security Workshop

Spring 2011

Educational focus

Challenges of processing clinical text in a novel security space (virtual firewall?)

Security best practices

IRB engagement

Graduated adoption strategies

SHARP -- Exploring deployment strategies

Page 15: Information Extraction Group Health

NLP Challenge Grant

Natural Language Processing for Cancer Research Network Surveillance Studies

• Aim 1:
    Deploy open-source NLP software
    Develop ETL connective tissue
    Build “human capital” (Java, NLP)

• Aim 2:
    NLP algorithm boot camp: Recurrent breast cancer diagnoses
    >3000 existing gold standard cases (human reviewed)

• Approach:
    Local deployment/programming support
    High-level NLP/bioinformatics expertise via external collaboration

• Participants: GHRI (Carrell, Buist, Chubak), Mayo Clinic/Harvard (Savova), Pittsburgh (Chapman), Vanderbilt (Xu).

Page 16: Information Extraction Group Health

NLP Challenge Grant – Aim 1

(Pipeline diagram; recoverable labels only)
• Sources: Epic/Clarity Chart Notes, Radiology Reports, Pathology Reports
• Components: Document Manager, UIMA/cTAKES NLP, NLP SQL Server Database
• Data stages: Raw, Rich, Normalized

Example of normalized output:

Document_Identifier Concept_Code
Radiology_Report_000001 2877143
Radiology_Report_000001 8600231
Radiology_Report_000001 3134988
Radiology_Report_000001 5287109
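The "Normalized" stage on this slide can be pictured as reducing rich NLP annotations to one row per document and concept code. The sketch below is a minimal illustration under that reading, not the project's actual ETL code. The Annotation type, its covered_text field, and the normalize function are hypothetical names. The document identifier and concept codes are the ones shown on the slide.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    """One concept mention, as an NLP pipeline might emit it (hypothetical shape)."""
    document_id: str
    concept_code: str
    covered_text: str  # hypothetical: the text span that produced the concept


def normalize(annotations: Iterable[Annotation]) -> List[Tuple[str, str]]:
    """Collapse annotations to unique (Document_Identifier, Concept_Code) rows.

    First-seen order is preserved so the output is deterministic.
    """
    seen = set()
    rows = []
    for ann in annotations:
        key = (ann.document_id, ann.concept_code)
        if key not in seen:
            seen.add(key)
            rows.append(key)
    return rows


# Codes are taken from the slide; the covered_text values are placeholders.
SAMPLE = [
    Annotation("Radiology_Report_000001", "2877143", "span A"),
    Annotation("Radiology_Report_000001", "8600231", "span B"),
    Annotation("Radiology_Report_000001", "2877143", "span C"),  # repeat mention
]

for doc_id, code in normalize(SAMPLE):
    print(doc_id, code)
```

Rows in this shape load directly into a two-column database table and join to other tables on the document identifier.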

Page 17: Information Extraction Group Health

NLP Challenge Grant – Aim 1

Document Type | Available Documents | Percent NLP Concept-Coded
Chart Notes   | 20M                 | 25%
Radiology     | 4M                  | 33%
Pathology     | 1.2M                | 2%
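The percentages on this slide can be turned into approximate document counts with simple arithmetic. The sketch below does that. Because the slide's totals are rounded (20M, 4M, 1.2M), the results are order-of-magnitude estimates only, and the names DOCUMENTS and coded_counts are invented for the example.

```python
# (available documents, percent concept-coded), as rounded on the slide.
DOCUMENTS = {
    "Chart Notes": (20_000_000, 25),
    "Radiology": (4_000_000, 33),
    "Pathology": (1_200_000, 2),
}


def coded_counts(documents):
    """Estimate how many documents of each type have been concept-coded."""
    return {
        name: total * percent // 100
        for name, (total, percent) in documents.items()
    }


for name, count in coded_counts(DOCUMENTS).items():
    print(f"{name}: ~{count:,} documents coded")
```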

Page 18: Information Extraction Group Health

NLP Challenge Grant – Aim 2

Page 19: Information Extraction Group Health

NLP Challenge Grant – Aim 2

Page 20: Information Extraction Group Health

Rec Br Ca? (recurrent breast cancer?)

Progress Notes: AE1, AE2, AE3
Oncology Notes: AE1, AE2
Radiology Reports: AE1, AE2, AE3
Pathology Reports: AE1

NLP Challenge Grant – Aim 2

Page 21: Information Extraction Group Health

eMERGE consortium

• Vanderbilt, Mayo, Northwestern, Marshfield, Group Health

• Can EMRs from multiple institutions provide comparable phenotype data for GWAS?

• 14 phenotypes

• Group Health
    • Structured data
    • Adoption of NLP algorithms developed by others
    • “Low-tech” NLP
        • Text explorer, Assisted chart abstraction

Page 22: Information Extraction Group Health

Clinical Text Explorer

• Select text source (chart notes, radiology, pathology, etc.)
• Search: recurrent NEAR breast NEAR cancer
• Date range
• Sample spec’s
• N documents, N patients found
• Search terms highlighted
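To make the NEAR search concrete, here is a minimal sketch of one possible proximity rule: all terms must appear, in any order, within a fixed window of consecutive tokens. This is an assumption for illustration. The slides do not say how the Text Explorer defines NEAR, and a database full-text engine may use different semantics. The function name near_match and the default window of 10 tokens are invented.

```python
import re


def near_match(text, terms, window=10):
    """Return True if every term occurs within `window` consecutive tokens.

    Matching is case-insensitive and on whole tokens, in any order.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    wanted = {t.lower() for t in terms}
    if not wanted:
        return False
    # Slide a window over the tokens and test whether it covers all terms.
    return any(
        wanted <= set(tokens[i:i + window]) for i in range(len(tokens))
    )


TERMS = ["recurrent", "breast", "cancer"]
print(near_match("History of recurrent cancer of the left breast.", TERMS))  # True
print(near_match("No evidence of breast cancer.", TERMS))  # False
```

A rule this simple still matches negated or historical mentions, which is consistent with the slide's "low-tech" framing: the search narrows what a human reviewer reads rather than replacing the review.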

Page 23: Information Extraction Group Health

Assisted Chart Abstraction

Page 24: Information Extraction Group Health

Assisted Chart Abstraction

(Architecture diagram; recoverable labels only)
• SQL Server: chart notes (550K pts, 17M notes, 0.8B lines), pre-processed
• Indexes (A-Z): full-text, ID, date, etc.
• Cohort lists
• Data warehouse
• NLP concept codes
• Assisted Chart Abstraction GUI: point-and-click, outside EMR, text capture
• Data

Page 25: Information Extraction Group Health

Assisted Chart Abstraction

(Workflow diagram; recoverable labels only)
• Identify cohort
    • Selection criteria applied to the patient: Pt Dx/Px/Rx, Pt Visits, Pt Demog
    • Selection criteria applied to the notes: Note Date, Note By, Note Type, Note Text
• Assign note priority
• Assisted chart abstraction
• Data

Diagram labels: Traditional chart abstraction; Assisted chart abstraction
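The workflow on this slide applies criteria at two levels, first to patients and then to their notes, before assigning each note a priority. The sketch below shows one way that could look. It is a hypothetical illustration: the Note fields, the select_notes function, and the ranking rule (keyword hits first, then newest first) are assumptions, not the tool's actual logic.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Note:
    mrn: str          # patient identifier
    note_date: date
    note_type: str
    text: str


def select_notes(notes, cohort_mrns, note_types, keyword):
    """Filter notes by patient cohort and note type, then rank for review.

    Notes containing the keyword come first; ties are broken newest first.
    """
    kept = [
        n for n in notes
        if n.mrn in cohort_mrns and n.note_type in note_types
    ]
    needle = keyword.lower()
    return sorted(
        kept,
        key=lambda n: (needle not in n.text.lower(), -n.note_date.toordinal()),
    )


# Invented example data.
NOTES = [
    Note("A", date(2009, 1, 5), "Ophthal exam", "Cataract noted in right eye."),
    Note("A", date(2009, 3, 1), "Progress", "Follow-up visit."),
    Note("A", date(2009, 6, 1), "Ophthal exam", "Routine check."),
    Note("B", date(2009, 2, 1), "Ophthal exam", "No complaints."),
    Note("C", date(2009, 4, 1), "Ophthal exam", "Cataract surgery planned."),
]

for n in select_notes(NOTES, {"A", "B"}, {"Ophthal exam"}, "cataract"):
    print(n.mrn, n.note_date, n.text)
```

Patient C is excluded at the patient level even though the note text matches, which is the point of applying the two sets of criteria separately.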

Page 26: Information Extraction Group Health

Assisted Chart Abstraction

Stage                                     | Patients    | Chart Notes
Initial cohort identification             | 2903 (100%) | 137,019 (100%)
Inclusion criteria (demog., dx, px, etc.) | 671 (23%)   | 70,119 (51%)
Electronic text                           | 228 (8%)    | 28,186 (21%)
Pre-processed text                        | 122 (4%)    | 284 (0.2%)

• Text: “CATARACT”
• Note: Op/Ophthal exam
• Near: Cataract procedure
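The percentages on this slide follow from dividing each stage's count by the initial cohort. The sketch below recomputes them, assuming the smaller figure at each stage counts patients and the larger counts chart notes, and that the stages run from broadest to narrowest. The names FUNNEL and retention are invented for the example.

```python
# (stage, patients, chart notes), counts as given on the slide.
FUNNEL = [
    ("Initial cohort identification", 2903, 137_019),
    ("Inclusion criteria", 671, 70_119),
    ("Electronic text", 228, 28_186),
    ("Pre-processed text", 122, 284),
]


def retention(funnel):
    """Express each stage as a percentage of the first stage."""
    _, base_patients, base_notes = funnel[0]
    return [
        (stage, 100 * patients / base_patients, 100 * notes / base_notes)
        for stage, patients, notes in funnel
    ]


for stage, pct_patients, pct_notes in retention(FUNNEL):
    print(f"{stage}: patients {pct_patients:.0f}%, notes {pct_notes:.1f}%")
```

Under that reading, the final stage leaves 284 notes for 122 patients, a little over two notes per patient for an abstractor to read, down from 137,019 notes at the start.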

Page 27: Information Extraction Group Health

Potential SHARP synergy ...

National Cancer Institute FOA: Tools for Electronic Data Extraction

• Funding: NCI Contract for software development

• Aim: Enhance/automate existing SEER cancer case identification (largely manual abstraction of EHR/paper charts)

• Approach: Assess, propose, test, modify, develop, deploy technologies that leverage NLP to automate some aspects of SEER workflow

• Participants: IMS, Inc., SEER sites (4), Group Health, Harvard

Page 28: Information Extraction Group Health

SHARP – NLP research lab

Page 29: Information Extraction Group Health

Questions – Discussion