
integrating Data for Analysis, Anonymization, and SHaring

Natural Language Processing

Wendy Chapman, Danielle Mowery

   


NLP Tools & Services for iDASH

Goals: increase access to text through NLP; decrease the burden of developing NLP.

Tools & Services
•  Collaborative Knowledge Authoring
•  Visualization of NLP Annotations
•  Evaluation Workbench
•  De-Identification
•  Classifier Development
•  Annotation Environment
•  Surveillance from Twitter
•  NLP App Customization


Overview  

•  How can we encourage sharing of clinical data?
   »  Creating an iDASH de-identification application

•  How can we decrease the burden of creating training cases and annotating?
   »  Developing an iDASH annotation environment
   »  Demo of the iDASH annotation environment

•  De-identification use case

7/19/12   Supported by the NIH Grant U54 HL108460 to the University of California, San Diego


Enabling  Data  Sharing  

•  Kawasaki Disease DBP has patient data
   »  Images
   »  Structured data
   »  Clinical reports

•  Sharing this clinical data with other researchers
   »  Offers opportunities for research advances
   »  Presents many challenges

•  How can we enable sharing of Kawasaki Disease and other clinical data?
   »  Informed consent
   »  Customizable data use agreement (DUA) for data providers
   »  HIPAA-compliant storage



De-identification of Clinical Data

•  Missing link: a tool for removing the 18 HIPAA identifiers

•  Headers: fairly straightforward
•  Text: more difficult


Example report:

NAME: Yongsan Wong   MRN: 5238492   DOB: 06.06.2006
------------------------------------------------------------
This is a 14-month-old baby boy (Yongsan) who was transferred from Children's Community with presumptive diagnosis of Kawasaki with fever for more than 5 days and conjunctivitis, mild arthritis with edema, rash, resolving and with elevated neutrophils and thrombocytosis, elevated CRP and ESR. When he was sent to the hospital, he had a fever of 102.

Identifiers appear in both the headers and the text: patient names, hospital names, medical record numbers, …
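The following minimal Python sketch illustrates the header/text contrast. It assumes a header laid out like the example report, with `NAME:`, `MRN:`, and `DOB:` fields. The names `HEADER_RE` and `deidentify` are hypothetical and do not belong to any iDASH tool. A single pattern masks the labeled header fields. The pass over the text can only reuse the names it already found in the header.

```python
import re

# Assumed header layout, modeled on the example report's NAME/MRN/DOB fields.
HEADER_RE = re.compile(
    r"NAME:\s*(?P<name>.+?)\s+MRN:\s*(?P<mrn>\d+)\s+DOB:\s*(?P<dob>[\d.]+)"
)


def deidentify(report: str) -> str:
    """Mask header fields, then reuse the header's name parts to mask the text."""
    match = HEADER_RE.search(report)
    if match is None:
        return report  # no recognizable header, so nothing to anchor on
    name_parts = match.group("name").split()
    # Headers: fixed, labeled fields, so one pattern masks them reliably.
    report = report.replace(match.group(0), "NAME: [NAME] MRN: [MRN] DOB: [DOB]")
    # Text: identifiers can appear anywhere; this only catches names seen in the header.
    for part in name_parts:
        report = re.sub(rf"\b{re.escape(part)}\b", "[NAME]", report)
    return report


example = (
    "NAME: Yongsan Wong   MRN: 5238492   DOB: 06.06.2006\n"
    "This is a 14-month-old baby boy (Yongsan) who was transferred "
    "from Children's Community."
)
print(deidentify(example))
```

When run, the sketch masks the name, MRN, and date. It leaves "Children's Community" in place, because that hospital name never appears in the header. Catching it requires additional resources such as dictionaries or statistical models.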


Customizable De-identification Service

Goal: enable sharing of clinical data. Starting from BoB, a site can run the de-id tool locally, retrain it on local data, evaluate de-id on local data, and produce de-identified texts.

1. Pre-trained de-id application
2. Interface for corrections & retraining
3. Support for evaluation of output

Danielle Mowery, Brett South, Anurag Nara, Liqin Wang, Mingyuan Zhang, Shazia Ashfaq, Melissa Tharp
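The run / retrain / evaluate / produce cycle can be sketched with a deliberately tiny stand-in model. This is a hypothetical illustration and does not reflect how BoB works. `ToyDeidModel` simply memorizes identifier tokens, while the real tool relies on statistical and other techniques. The point is the shape of the loop: run locally, feed reviewer corrections back as training data, re-measure on local data, and only then produce de-identified text.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeidModel:
    """Hypothetical stand-in for a pre-trained de-id model: a set of known identifier tokens."""
    known: set[str] = field(default_factory=set)

    def run(self, tokens: list[str]) -> set[int]:
        # Run locally: return the positions flagged as identifiers.
        return {i for i, tok in enumerate(tokens) if tok in self.known}

    def retrain(self, tokens: list[str], gold: set[int]) -> None:
        # Retrain on local data: here, simply memorize every token a reviewer marked.
        self.known.update(tokens[i] for i in gold)


def recall(flagged: set[int], gold: set[int]) -> float:
    # Evaluate on local data: share of true identifiers that were caught.
    return len(flagged & gold) / len(gold) if gold else 1.0


def produce(tokens: list[str], flagged: set[int]) -> str:
    # Produce de-identified text by masking every flagged token.
    return " ".join("[PHI]" if i in flagged else tok for i, tok in enumerate(tokens))


tokens = "Yongsan Wong was transferred from Children's Community".split()
gold = {0, 1, 5, 6}  # positions a reviewer marked in the correction interface
model = ToyDeidModel(known={"Wong"})
print(recall(model.run(tokens), gold))  # 0.25 before retraining
model.retrain(tokens, gold)
print(recall(model.run(tokens), gold))  # 1.0 after retraining
print(produce(tokens, model.run(tokens)))
```

This sketch measures recall because a missed identifier is a privacy leak. A real evaluation would also track precision, since over-masking degrades the text for research use.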



1. Build a Shareable De-identified Corpus

•  MT Samples
   »  Website with thousands of medical transcriptions
   »  Minimally de-identified
   »  Freely available

•  Pilot annotation phase
   »  6 annotators
   »  350 reports

•  Distributed annotation phase
   »  Recruit community annotators
   »  2,000 reports

Research questions:

-  What is the best way to train many annotators?

-  How does pre-annotation help?

-  Does clustering data improve speed?

Danielle Mowery, Brett South, Liqin Wang, Mingyuan Zhang, Anurag Narra, Shazia Ashfaq
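One way to make "How does pre-annotation help?" measurable is to compare the machine's pre-annotations with what each annotator finally submits. The sketch below is an assumption about how such a comparison could work, using exact `(start, end, label)` triples as spans. It is not the study's published method, and the function name is hypothetical.

```python
Span = tuple[int, int, str]  # (start offset, end offset, label); an assumed representation


def preannotation_effect(machine: set[Span], final: set[Span]) -> dict[str, int]:
    """Count how an annotator's final spans relate to the machine pre-annotations."""
    return {
        "accepted": len(machine & final),  # kept unchanged: effort saved
        "deleted": len(machine - final),   # false positives the annotator removed
        "added": len(final - machine),     # misses the annotator created from scratch
    }


machine = {(0, 7, "NAME"), (8, 12, "NAME"), (20, 24, "DATE")}
final = {(0, 7, "NAME"), (8, 12, "NAME"), (30, 40, "HOSPITAL")}
print(preannotation_effect(machine, final))
```

Under exact matching, a span whose boundary the annotator adjusted counts as one deletion plus one addition. A fuller analysis would also give credit for partial overlaps.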


BoB:  Best  of  Breed  


Initial De-id Tool: BoB
•  Developed at the Salt Lake City VA
•  Incorporates the main techniques used in other de-identification applications
   »  Statistical
   »  Regular expressions
   »  Dictionaries

Eventually, add other open-source tools for users to select from
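This rough sketch shows how rule-based detectors like the regular-expression and dictionary components listed for BoB can be combined. It assumes the simplest possible merge: take the union of whatever each detector flags. The patterns, `NAME_DICTIONARY`, and the function names are hypothetical. The statistical component is omitted, and the sketch is not BoB's actual pipeline.

```python
import re

# Hypothetical resources; a real tool would use far richer patterns and dictionaries.
PATTERNS = {
    "DATE": re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b"),
    "MRN": re.compile(r"\bMRN:\s*(\d+)"),
}
NAME_DICTIONARY = {"Yongsan", "Wong"}


def regex_spans(text: str) -> set[tuple[int, int, str]]:
    spans = set()
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # When the pattern has a capture group (MRN), flag only the captured value.
            start, end = m.span(1) if pattern.groups else m.span()
            spans.add((start, end, label))
    return spans


def dictionary_spans(text: str) -> set[tuple[int, int, str]]:
    return {
        (m.start(), m.end(), "NAME")
        for m in re.finditer(r"\b\w+\b", text)
        if m.group() in NAME_DICTIONARY
    }


def combined_spans(text: str) -> list[tuple[int, int, str]]:
    # Union of detector outputs: anything any component flags gets masked.
    return sorted(regex_spans(text) | dictionary_spans(text))


header = "NAME: Yongsan Wong MRN: 5238492 DOB: 06.06.2006"
for start, end, label in combined_spans(header):
    print(label, header[start:end])
```

Taking the union favors recall over precision, which suits de-identification. Other designs could have the components vote or let a classifier filter the candidates.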



2. Interface for Correction & Retraining

eHOST

University of Utah: Brett South, Chris Leng



3. Evaluation Workbench

[Screenshot with panels: document & annotations; outcome measures for selected annotations; select classifications to view; report list; attributes for selected annotation; relationships for selected annotation]

Christensen, Murphy, Frabetti, Rodriguez, Savova


Creating a Training Set


•  Time-consuming
   »  Recruiting & training annotators for high agreement

•  Expensive
   »  Domain experts are especially expensive
   »  Need annotation by multiple people

•  Logistically challenging
   »  Managing files and batches of reports
   »  Setting up the annotation tool

•  Redundant
   »  Hasn't someone created a schema for this before?
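Annotation by multiple people is only useful if agreement is measured. Below is a minimal sketch of Cohen's kappa. It assumes both annotators labeled the same pre-tokenized report with one label per token, where `O` means "not an identifier". For de-identification, span-level F-measure between annotators is a common alternative; kappa is used here only because it is compact.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same tokens."""
    if len(a) != len(b) or not a:
        raise ValueError("annotators must label the same non-empty token sequence")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label everywhere; kappa is undefined
    return (observed - expected) / (1 - expected)


a = ["O", "NAME", "NAME", "O", "DATE", "O"]
b = ["O", "NAME", "O", "O", "DATE", "O"]
print(round(cohens_kappa(a, b), 3))  # 0.714
```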



iDASH Annotation Environment

Goal: provide an environment to decrease the burden of annotation

Components: Annotator Registry, Annotation Admin, eHOST, Evaluation Workbench (iDASH web services and client apps on the local computer)

S Duvall, B South, G Savova, N Elhadad, H Hochheiser


Annotator Registry

•  Enlist for annotation
•  Certify for annotation tasks
   »  Personal health information
   »  Part-of-speech tagging
   »  UMLS mapping
•  Set pay rate
•  Searchable
•  Available for inclusion in new annotation tasks

http://idash.ucsd.edu/nlp-annotator-registry
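The registry's core operation, finding annotators who are certified for a task and fit the budget, can be sketched as a filter over records. The `Annotator` fields and the `search` function are hypothetical. The slide does not describe the registry's data model or how it is queried.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotator:
    # Hypothetical registry record; field names are illustrative only.
    name: str
    certifications: frozenset[str]  # e.g. {"PHI", "POS", "UMLS"} for the three task types
    pay_rate: float                 # rate the annotator set; currency and unit are assumptions
    available: bool = True


def search(registry: list[Annotator], task: str, max_rate: float) -> list[Annotator]:
    """Find available annotators certified for a task within budget, cheapest first."""
    matches = [
        a for a in registry
        if a.available and task in a.certifications and a.pay_rate <= max_rate
    ]
    return sorted(matches, key=lambda a: a.pay_rate)


registry = [
    Annotator("A", frozenset({"PHI"}), 20.0),
    Annotator("B", frozenset({"PHI", "UMLS"}), 15.0),
    Annotator("C", frozenset({"POS"}), 10.0),
    Annotator("D", frozenset({"PHI"}), 12.0, available=False),
]
print([a.name for a in search(registry, "PHI", max_rate=25.0)])  # ['B', 'A']
```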


Annotation Admin

1.  Assign annotators to a task


2.  Create  a  Schema  
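A schema defines what annotators are allowed to mark. The sketch below assumes a simple model in which each class allows a fixed set of attribute values, echoing the classes and attributes that the Evaluation Workbench panels display. The `Schema` class and its `validate` method are hypothetical and do not represent Annotation Admin's schema format.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical annotation schema: class name -> attribute name -> allowed values."""
    classes: dict[str, dict[str, set[str]]] = field(default_factory=dict)

    def validate(self, label: str, attributes: dict[str, str]) -> list[str]:
        # Return a list of problems; an empty list means the annotation fits the schema.
        if label not in self.classes:
            return [f"unknown class {label!r}"]
        allowed = self.classes[label]
        problems = []
        for attr, value in attributes.items():
            if attr not in allowed:
                problems.append(f"{label} has no attribute {attr!r}")
            elif value not in allowed[attr]:
                problems.append(f"{attr}={value!r} not allowed for {label}")
        return problems


deid = Schema({"NAME": {"role": {"patient", "provider"}}, "DATE": {}, "MRN": {}})
print(deid.validate("NAME", {"role": "patient"}))
print(deid.validate("NAME", {"role": "family"}))
```

A schema stored this way could be saved and reused across projects, which speaks to the "Hasn't someone created a schema for this before?" problem.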


3.  Assign users and set time expectations


4.  Keep track of progress
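Assigning users with a time expectation and then keeping track of progress comes down to batching reports and counting completions. This hypothetical sketch assumes round-robin assignment and a single due date per task. The slides do not describe Annotation Admin's actual assignment logic.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Assignment:
    # Hypothetical per-annotator batch with a due date (the "time expectation").
    annotator: str
    reports: list[str]
    due: date
    done: set[str] = field(default_factory=set)

    def progress(self) -> float:
        return len(self.done) / len(self.reports) if self.reports else 1.0


def assign(reports: list[str], annotators: list[str], due: date) -> list[Assignment]:
    """Split reports round-robin so each annotator gets a similar-sized batch."""
    if not annotators:
        raise ValueError("need at least one annotator")
    batches: dict[str, list[str]] = {name: [] for name in annotators}
    for i, report in enumerate(reports):
        batches[annotators[i % len(annotators)]].append(report)
    return [Assignment(name, batch, due) for name, batch in batches.items()]


def overdue(assignments: list[Assignment], today: date) -> list[str]:
    # Annotators past the due date who still have unfinished reports.
    return [a.annotator for a in assignments if today > a.due and a.progress() < 1.0]


tasks = assign([f"r{i}" for i in range(5)], ["ann1", "ann2"], due=date(2012, 8, 1))
tasks[0].done.update({"r0", "r2"})
tasks[1].done.update({"r1", "r3"})
print([round(t.progress(), 2) for t in tasks], overdue(tasks, date(2012, 8, 2)))
```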


eHOST

Syncs with Annotation Admin
   »  Download schema to annotate with
   »  Download batch of reports to annotate
   »  Upload annotated reports
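The three sync steps form a simple client round trip. The sketch below runs against an in-memory stand-in server. `FakeAnnotationAdmin`, its methods, and the JSON payloads are all hypothetical and do not represent the real eHOST or Annotation Admin interface, which the slides do not document.

```python
import json


class FakeAnnotationAdmin:
    """In-memory stand-in for the server side; NOT the real Annotation Admin/eHOST API."""

    def __init__(self, schema: dict, batches: dict[str, list[str]]):
        self.schema = schema
        self.batches = batches               # annotator -> report ids
        self.submitted: dict[str, str] = {}  # report id -> annotations as JSON

    def download_schema(self) -> str:
        return json.dumps(self.schema)

    def download_batch(self, annotator: str) -> list[str]:
        return list(self.batches.get(annotator, []))

    def upload(self, report_id: str, annotations_json: str) -> None:
        self.submitted[report_id] = annotations_json


def sync(server: FakeAnnotationAdmin, annotator: str, annotate) -> int:
    """One client sync: fetch schema and batch, annotate locally, upload results."""
    schema = json.loads(server.download_schema())
    uploaded = 0
    for report_id in server.download_batch(annotator):
        annotations = annotate(report_id, schema)  # local work happens in the client
        server.upload(report_id, json.dumps(annotations))
        uploaded += 1
    return uploaded


server = FakeAnnotationAdmin({"classes": ["NAME", "DATE"]}, {"ann1": ["r1", "r2"]})
count = sync(server, "ann1", lambda rid, schema: [{"report": rid, "label": schema["classes"][0]}])
print(count, server.submitted["r1"])
```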


Evaluation Workbench

•  Compare annotations from two sources
•  Drill down to understand differences
•  Calculate outcome measures
•  Perform error analysis
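Comparing annotations from two sources typically reduces to matching spans and computing outcome measures. This minimal sketch assumes exact-match `(start, end, label)` spans, with one source treated as the reference. The function names are hypothetical, and the Evaluation Workbench may match spans differently, for example by allowing partial overlap.

```python
Span = tuple[int, int, str]  # (start offset, end offset, label); assumed representation


def outcome_measures(reference: set[Span], system: set[Span]) -> dict[str, float]:
    """Exact-match precision, recall and F1 of one annotation source against another."""
    tp = len(reference & system)
    fp = len(system - reference)
    fn = len(reference - system)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def disagreements(reference: set[Span], system: set[Span]) -> dict[str, list[Span]]:
    # Drill-down for error analysis: spans each source has that the other lacks.
    return {"missed": sorted(reference - system), "spurious": sorted(system - reference)}


reference = {(6, 13, "NAME"), (14, 18, "NAME"), (24, 31, "MRN")}
system = {(6, 13, "NAME"), (24, 31, "MRN"), (37, 47, "DATE")}
print(outcome_measures(reference, system))
print(disagreements(reference, system))
```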



Demo of iDASH Annotation Environment

Components: Annotator Registry, Annotation Admin, eHOST, Evaluation Workbench (iDASH web services and client apps on the local computer)

Danielle Mowery


Conclusion  

•  iDASH NLP Ecosystem goals
   »  Decrease barriers to sharing of clinical data
   »  Enhance clinical data use for research

•  Leveraging the iDASH secure cloud

•  Future work
   »  Evaluate and extend the annotation environment for crowdsourcing
   »  Create a customizable de-id application for iDASH users




Thank  you!    

Questions?