Predicting Hospital Readmission Using Cascading

Predicting Hospital Readmission using Cascading by H. Michael Covert and Victoria Loewengart September 3, 2015 Proprietary and Confidential

Upload: cascading

Post on 16-Apr-2017

Category:

Technology


TRANSCRIPT

Page 1: Predicting Hospital Readmission Using Cascading

Predicting Hospital Readmission using Cascading

by H. Michael Covert and Victoria Loewengart

September 3, 2015

Proprietary and Confidential

Page 2: Predicting Hospital Readmission Using Cascading

Agenda
• Analytics Inside
• Predictive Analytics Use Case: Hospital Readmissions
• Technical Solution Overview
• Why Cascading and Alternatives Explored
• Implementation Considerations
• Best Practices for Operational Readiness

Page 3: Predicting Hospital Readmission Using Cascading

Analytics Inside
• A Westerville, Ohio-based company founded in late 2011.
• Self-funded, profitable, and growing.
• An outgrowth of business intelligence and advanced analytics consulting (2007), big data consulting (2010), and academic research.
• Our mission is to:
  – Be data scientists and develop and deliver innovative advanced technologies to vertical industries.
    • Health Care
    • Cyber-security
    • Intelligence and Law Enforcement
  – Engage and focus on endeavors that deliver social value.
  – Make some money and have some fun.
  – Partner with organizations that have delivery muscle and vertical industry expertise, and that share common values.
• We provide products, services, and training.

Page 4: Predicting Hospital Readmission Using Cascading

Analytics Inside
• We are Concurrent business partners
  – We deliver Concurrent Cascading training services
  – We also have an advanced Cascading course
  – We now have 18 Big Data Technology modules that can be assembled into customized training curriculums

Phase / Course Title / Description

Planning
• Hadoop Planning: Steps involved in identifying need, planning for introduction of Hadoop into your environment, systems integration and architecture.

Comprehension / Development
• Introduction to Linux: A comprehensive review of the Linux operating system.
• Introduction to Java: A comprehensive review of the Java technology stack.
• Introduction to Hadoop: A detailed overview of the Hadoop technology stack.
• Hadoop Software Development: MapReduce programming, NoSQL (HBase) programming, Sqoop.
• Hadoop Quality Assurance: Overview of Hadoop QA processes and how they differ from standard QA.
• Introduction to YARN: Development of Hadoop version 2 YARN applications.
• Introduction to Spark: Development of Spark systems.
• Introduction to Storm: Development of Storm systems.
• Introduction to Kafka: Development of Kafka message queueing systems.
• Natural Language Processing: Overview of computational linguistics.
• Introduction to Graph Theory: Overview of using Hadoop and Spark to do advanced graph theory.
• Advanced Text Analytics: Text analytics using big data frameworks.
• Real Time Big Data: Overview of next generation big data solutions that utilize real-time (non-batch) technology.
• Cascading: Developer training for the Concurrent Cascading product set.
• Advanced Cascading: Developer training for advanced usages of the Cascading framework.
• Advanced Analytics: Using Mahout to do predictive analytics.

Administration
• Hadoop Systems Administration: Installing, configuring, and maintaining a Hadoop cluster.

Page 5: Predicting Hospital Readmission Using Cascading

Analytics Inside Solutions
• Our solution family, collectively known as RelMiner™, consists of a set of software components that are designed to be integrated into domain-specific products
• Specialization in:
  – Big data solutions
    • Hadoop, YARN, Tez, and Spark
    • NoSQL databases
    • Streaming – Kafka, Storm, and Spark Streaming
  – Machine learning
    • Predictive and prescriptive analytics
  – Natural language processing (computational linguistics, text analytics, and text mining)
  – Graph theory and graph databases
• Cascading is core to all of our development
  – We started with 1.3 and have now seamlessly migrated to 3.0 using Tez

Page 6: Predicting Hospital Readmission Using Cascading

Predictive Analytics Use Case: Hospital Readmissions
• A hospital patient readmission is a costly event that health care providers are attempting to reduce.
  – A readmission is defined as ANY reentry to a hospital 30 days or less from a prior discharge.
• The US Affordable Care Act mandates lower readmission rates
  – If not achieved, providers face fines or reduced government reimbursement.
  – Specifically, US Medicare and Medicaid will either not pay or will reduce the payment made to hospitals for expenses incurred when readmissions occur.
  – By the end of 2014, over 2,600 hospitals had incurred in excess of $24B of losses from Medicare and Medicaid expenses, now expected to rise to $50B by the end of this year.
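The 30-day definition above can be sketched in a few lines. This is only an illustration: the function name, the tuple layout, and the sample dates are assumptions made for the example, not part of MedPredict.

```python
from datetime import date

# Window taken from the slide's definition: reentry 30 days or less after a prior discharge.
READMISSION_WINDOW_DAYS = 30

def flag_readmissions(stays):
    """Flag each stay of ONE patient as a readmission or not.

    `stays` is a list of (admit_date, discharge_date) tuples (hypothetical layout).
    Returns one boolean per stay, in chronological order.
    """
    flags = []
    previous_discharge = None
    for admit, discharge in sorted(stays):
        is_readmission = (
            previous_discharge is not None
            and 0 <= (admit - previous_discharge).days <= READMISSION_WINDOW_DAYS
        )
        flags.append(is_readmission)
        previous_discharge = discharge
    return flags

# Made-up sample data: the second stay starts 23 days after the first discharge.
stays = [
    (date(2015, 1, 5), date(2015, 1, 9)),
    (date(2015, 2, 1), date(2015, 2, 3)),
    (date(2015, 6, 1), date(2015, 6, 4)),
]
print(flag_readmissions(stays))  # [False, True, False]
```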

Page 7: Predicting Hospital Readmission Using Cascading

Predictive Analytics Use Case: Hospital Readmissions
• Predictive analytics is now being used to categorize and prioritize those patients with the highest likelihood of readmission
  – Impact is both clinical (better outcome for the patient) AND financial
    • And better financially performing hospitals generally have better outcomes!
• Many such predictive systems exist. One is the LACE score.
  – Invented by the Ottawa Hospital Research Institute, Institute for Clinical Evaluative Sciences, University of Toronto, University of Ottawa and University of Calgary
  – It is a calculation that predicts the probability of readmission or death following a hospital discharge based on:
    • Length of Stay (days in hospital)
    • Acuity (acuteness or severity of condition)
    • Comorbidity (all of a patient's diagnosed conditions)
    • Emergency Visits

Page 8: Predicting Hospital Readmission Using Cascading

Why This is a Big Data Problem
• Typical patient can have several gigabytes of data
  – Much information is hidden in unstructured chart data and clinical notes
  – 68,000 diagnosis codes and a very large number of modifiers, and new diagnosis codes are constructed to contain a lot of information
    • Site – where on the body the diagnosis applies
    • Combination codes – encode two or more related conditions
  – 87,000 procedure codes, again now highly encoded with information
• Machine learning uses wide and variable vector lengths
• Multiple models are desirable
  – Possibly trained by segmentation (neonatal, geriatric, etc.)
• Data variation is quite large – HCPCS, ICD-10, CPT, and more
  – And researchers want to add more data!
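To make the "wide and variable vector" point concrete, here is a small sketch of one common way to encode a patient's diagnosis codes as a sparse vector. The vocabulary-based encoding and the function names are assumptions chosen for illustration; the slides do not say how MedPredict builds its diagnosis vectors. The ICD-10 codes are used only as sample values.

```python
def build_vocabulary(patients_codes):
    """Assign each distinct code a column index, in order of first appearance."""
    vocabulary = {}
    for codes in patients_codes:
        for code in codes:
            vocabulary.setdefault(code, len(vocabulary))
    return vocabulary

def to_sparse_vector(codes, vocabulary):
    """Return {column_index: count}; only codes the patient actually has are stored."""
    vector = {}
    for code in codes:
        index = vocabulary.get(code)
        if index is not None:  # codes outside the vocabulary are ignored in this sketch
            vector[index] = vector.get(index, 0) + 1
    return vector

# Sample data: two patients with different numbers of diagnoses.
patients = [
    ["E11.9", "I10"],
    ["I10", "I50.9", "I10"],
]
vocabulary = build_vocabulary(patients)
print(vocabulary)                                  # {'E11.9': 0, 'I10': 1, 'I50.9': 2}
print(to_sparse_vector(patients[1], vocabulary))   # {1: 2, 2: 1}
```

With tens of thousands of possible codes, the full vector is very wide, but each patient touches only a handful of columns, which is why a sparse representation is the usual choice.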

Page 9: Predicting Hospital Readmission Using Cascading

Hospital Readmissions

LACE Subassembly

LACE Score = Length of Stay Score + Acuity Score + Emergency Visits Score + Comorbidity Score

• Length of Stay Score: length of stay in days
• Acuity Score: acuity (serious condition)
• Emergency Visits Score: number of visits to the emergency room over some period of time
• Comorbidity Score = Charlson Comorbidity Index = Age Score + ∑ Diagnosis Scores

Inputs: Patient Demographic Records, Patient Admission Records, Patient Diagnosis Records
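The LACE sum can be sketched as below. The structure (four component scores added together, with the Charlson index built from an age score plus diagnosis scores) follows the slide. The point thresholds inside each function are assumptions modeled on the commonly published LACE index table and should be verified against the clinical source before any real use; the function names are hypothetical.

```python
def length_of_stay_points(days):
    """Points for length of stay in whole days (assumed thresholds)."""
    if days < 1:
        return 0
    if days <= 3:
        return days
    if days <= 6:
        return 4
    if days <= 13:
        return 5
    return 7

def acuity_points(emergent_admission):
    """Points for an acute (emergent) admission (assumed value)."""
    return 3 if emergent_admission else 0

def charlson_index(age_score, diagnosis_scores):
    """Charlson Comorbidity Index as described on the slide: age score plus the sum of diagnosis scores."""
    return age_score + sum(diagnosis_scores)

def comorbidity_points(charlson):
    """Points for comorbidity, capped for high Charlson values (assumed mapping)."""
    return charlson if charlson <= 3 else 5

def emergency_visit_points(visits):
    """Points for emergency visits in the look-back period (assumed cap of 4)."""
    return min(visits, 4)

def lace_score(days, emergent_admission, age_score, diagnosis_scores, er_visits):
    """LACE Score = Length of Stay + Acuity + Comorbidity + Emergency Visits scores."""
    return (
        length_of_stay_points(days)
        + acuity_points(emergent_admission)
        + comorbidity_points(charlson_index(age_score, diagnosis_scores))
        + emergency_visit_points(er_visits)
    )

# Made-up patient: 5-day emergent stay, age score 2, two scored diagnoses, 2 ER visits.
print(lace_score(5, True, 2, [1, 2], 2))  # 4 + 3 + 5 + 2 = 14
```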

Page 10: Predicting Hospital Readmission Using Cascading

MedPredict™
Advanced Analytics for Health Care

Fundamentals

Classification and prediction of care outcome. It provides dashboard level reporting, early identification of potential undesirable outcomes, and actionable suggestions for remediation. It combines a variety of available data sources including:

• Patient biometric data and historic data
• Value Based Purchasing scores
• ICD-9 and ICD-10 diagnosis and procedures
• DRG and MDC classifications
• NCQA metric categories and HEDIS standard data
• Pharmaceutical usage
• Patient and facility demographic data
• Patient psychographic data from chart data
• Re-admissions and mortality data
• Inpatient and emergency department discharge data
• Patient satisfaction survey data
• Meaningful Use Summary of Care records
• Incorporates LACE score through a sophisticated LACE calculation engine

Advanced. Analytical. Intelligence.

Page 11: Predicting Hospital Readmission Using Cascading

MedPredict™ LACE Engine (diagram)

• Inputs:
  – Patient Records: Name/ID, DOB/Age, Gender, Race, Ethnicity, …
  – Patient Admission Records: Admittance and Discharge Date, Admission Source and Type, Admission Type, Discharge Disposition, Hospital, Expense, Insurer, …
  – Patient Diagnosis Records: Date of Diagnosis, Location of Diagnosis, Diagnosis Code, Physician Notes, DRG and MDC, CC/MCC, …
• Processing: ETL stages with look back time, producing Age Tier Scores, ER Buckets, and Diagnosis Scores
• Patient Summary Record: Diagnosis Vector, LACE Metrics, Patient Data
• Prediction Engine outputs: Length of Stay, Expected Expense, Expected Outcome, Readmission Probabilities

Page 12: Predicting Hospital Readmission Using Cascading

MedPredict™ (diagram)

• Inputs:
  – Patient Records: Name/ID, DOB/Age, Gender, Race, Ethnicity
  – Patient Admission Records: Admittance and Discharge Date, Admission Source and Type, Admission Type, Discharge Disposition, Hospital, Expense, Insurer
  – Patient Diagnosis Records: Date of Diagnosis, Location of Diagnosis, Diagnosis Code, Physician Notes, DRG and MDC, CC/MCC
  – Patient Procedure Records: Date, Procedure Codes, Pharmaceuticals, Patient Chart Data
• Processing: ETL and Transform and Score stages with look back time, producing Age Tier Score, ER Buckets, Diagnosis Score, and Procedure Score
• Patient Prediction Record: Diagnosis Vector, Procedure Vector, LACE Metrics, LACE, Patient Data
• Prediction Engine outputs: Length of Stay, Expected Expense, Expected Outcome, Readmission Probabilities, Appropriateness of Care Index, Profitability

Page 13: Predicting Hospital Readmission Using Cascading

MedExtract™
Advanced Analytics for Health Care

Fundamentals

Clinical information extraction from unstructured text.

The Problem: Even with the advent of data management technology, most patient information is recorded as unstructured text. That includes health care plans, prescriptions, doctors' observations, and patients' communications with their health care providers. A wealth of information is buried within these documents, yet it is difficult to find because, unlike a database, it cannot be queried with a uniform method.

The Solution: MedExtract utilizes advanced Text Analytics techniques and technologies for effective information extraction from unstructured text. From the free-form text it can extract:

• Diseases, diagnoses, and procedures
• Names, addresses, phone numbers, locations
• Drugs, dosages, and usage
• Patient observations of sentiment (depression, anger, etc.)

Advanced. Analytical. Intelligence.

Page 14: Predicting Hospital Readmission Using Cascading

Technology Underpinnings

Page 15: Predicting Hospital Readmission Using Cascading

Technical Solution Overview
• MedPredict™ contains several parts:
  – ETL
    • Ingestion and cleansing using subassemblies to trap and record errors
    • Creation of the patient diagnosis record from patient diagnosis history
    • Creation of the patient admittance record from hospital discharge records
    • Creation of full patient scoring record
    • Text extraction from unstructured sources
  – LACE calculation
    • Multiple subassemblies
  – Predictive modeling
    • Training predictive models
    • Model testing
    • Model deployment
  – Nightly run
    • Compute patient metrics and insert new record
    • Sorting and prioritization -> Reporting and alerting
    • Trend analysis
  – Analysis and Refinement
    • Adding new data
    • Adding new calculations
    • Adjusting parameters
    • Retraining…
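The "sorting and prioritization -> reporting and alerting" step of the nightly run can be pictured roughly as follows. This is a minimal sketch: the record fields (`patient_id`, `readmission_probability`), the 0.5 alert threshold, and the sample values are all assumptions made for illustration, not details from the slides.

```python
def prioritize(scored_patients, alert_threshold=0.5):
    """Rank patients by predicted readmission probability and pick those to alert on.

    `scored_patients` is a list of dicts with hypothetical keys
    'patient_id' and 'readmission_probability'.
    """
    ranked = sorted(
        scored_patients,
        key=lambda patient: patient["readmission_probability"],
        reverse=True,  # highest risk first
    )
    alerts = [
        patient["patient_id"]
        for patient in ranked
        if patient["readmission_probability"] >= alert_threshold
    ]
    return ranked, alerts

# Made-up nightly scores.
scored = [
    {"patient_id": "P-001", "readmission_probability": 0.18},
    {"patient_id": "P-002", "readmission_probability": 0.72},
    {"patient_id": "P-003", "readmission_probability": 0.55},
]
ranked, alerts = prioritize(scored)
print([patient["patient_id"] for patient in ranked])  # ['P-002', 'P-003', 'P-001']
print(alerts)                                          # ['P-002', 'P-003']
```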

Pipeline diagram. Elements shown: data sources (Discharge, Chart, Diagnosis, Procedure, Patient History), Kafka pub/sub, ETL, Training, Models, Scoring, Reporting, Alerting, Analysis.

Page 16: Predicting Hospital Readmission Using Cascading

Technology Underpinnings

Diagram. Elements shown: Cascading; MedPredict and MedExtract; Workflow control; Technology migration; Driven; Monitoring; Performance and tuning; MedMiner infrastructure; Mahout; OpenNLP.

Page 17: Predicting Hospital Readmission Using Cascading

Why Cascading?
• We had already been using Cascading!
  – We started here in 2011, so we had four years under our belt
  – Reliability and stability were an issue. Cascading is mature, and unlike other systems, we understand how it works.
    • We literally "wrote the book" on Cascading :)
  – Cascading has been cost-effective and has preserved a large initial investment for us.
    • Core product set was written in version 1.3 in 2011.
    • We moved to version 2 in 2013.
    • Since June 2015, we have been running under version 3 using Tez
  – Cascading Test Driven Design principles have made development easier.
  – Cascading subassemblies and cascades have provided us with tremendous code reuse.

Page 18: Predicting Hospital Readmission Using Cascading

Why Cascading?
• We have one (portable) language and framework to learn and maintain.
• We use Cascading dynamic control.
  – We heavily instrument our flows, and these metrics control the number of iterations that we use
  – We tried to do this in Pig and MapReduce, but found it to be very difficult to impossible
• RelPredict was written in Mahout. Cascading wraps this functionality and gives us "hyper-parallelism"
• RelExtract uses a heavily augmented OpenNLP and Cascading also wraps this functionality in a customizable "pipeline" much like UIMAfit.
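The idea of metrics controlling the number of iterations can be illustrated with a toy loop. This is not Cascading code and not the authors' implementation: it is a plain Python stand-in in which each "flow run" returns a counter, and the driver keeps re-running until that counter reaches zero. The label-propagation step and all names are assumptions picked only to give the loop something to converge on.

```python
def propagate_labels(labels, edges):
    """One pass of a toy 'flow': linked nodes take the smaller of their two labels.

    Returns the new labels plus a counter of how many labels changed,
    standing in for a metric reported by an instrumented flow.
    """
    new_labels = dict(labels)
    for a, b in edges:
        smallest = min(new_labels[a], new_labels[b])
        new_labels[a] = smallest
        new_labels[b] = smallest
    changed = sum(1 for node in labels if labels[node] != new_labels[node])
    return new_labels, changed

def run_until_stable(labels, edges, max_iterations=20):
    """Driver: re-run the pass until the 'changed' counter is zero or the cap is hit."""
    iterations = 0
    while iterations < max_iterations:
        labels, changed = propagate_labels(labels, edges)
        iterations += 1
        if changed == 0:  # the metric says nothing moved, so stop iterating
            break
    return labels, iterations

# A four-node chain; every node ends up with the smallest label in its component.
final_labels, iterations = run_until_stable({1: 1, 2: 2, 3: 3, 4: 4}, [(3, 4), (2, 3), (1, 2)])
print(final_labels)  # {1: 1, 2: 1, 3: 1, 4: 1}
```

The number of passes is not fixed in advance; it falls out of the counter, which is the property the slide describes as hard to express in Pig or raw MapReduce.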

Page 19: Predicting Hospital Readmission Using Cascading

Why Cascading?
• Modules in MedPredict seemed naturally to fit into the Cascading model
  – Several discrete steps must be formed and fed forward
  – Pipe metaphor is ideal manner of expression
  – HashJoins give huge performance benefits during calculation phase
  – Checkpointing is used extensively due to expected high error rate in data
  – Cascading Local mode allows smaller hospitals to use the system without requiring a full Hadoop stack
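The performance benefit of a hash join comes from holding the small side of the join in memory and streaming the large side past it, so no sort or shuffle of the large side is needed. The sketch below shows that general technique in plain Python; it is not the Cascading HashJoin API, and the field names and score values are hypothetical.

```python
def hash_join(large_stream, small_table, key):
    """Inner join: build an in-memory lookup from the small side, then stream the large side."""
    lookup = {row[key]: row for row in small_table}  # small side fits in memory
    for record in large_stream:
        match = lookup.get(record[key])
        if match is not None:
            # Merge the lookup row into the streamed record.
            yield {**match, **record}

# Hypothetical small lookup table: diagnosis code -> score.
diagnosis_scores = [
    {"code": "I50.9", "score": 1},
    {"code": "E11.9", "score": 1},
]
# Hypothetical large stream of patient diagnosis records.
diagnosis_records = [
    {"patient_id": "P-001", "code": "I50.9"},
    {"patient_id": "P-002", "code": "Z00.0"},  # no score row, dropped by the inner join
    {"patient_id": "P-003", "code": "E11.9"},
]
joined = list(hash_join(diagnosis_records, diagnosis_scores, "code"))
print(joined)
```

This only pays off when one side is small enough to keep in memory, which fits lookup-style joins such as attaching scores to diagnosis records.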

Page 20: Predicting Hospital Readmission Using Cascading

Implementation Considerations
• Occasionally, we find that disk spills impact performance because of two large multi-dataset joins that the LACE calculation performs.
• We make extensive use of Buffer operations, so we need to monitor these phases closely due to memory and compute requirements.
• Usually at some predefined trigger point, we have to run a very large predictive retraining job. It is very resource intensive. Data lineage is a big issue since MedPredict's data originates from many sources.
• ETL is complex and is customized relative to the data that the app is provided.
  – It must be transformed into a common format before MedPredict can consume it.
  – Errors in incoming data streams can be quite baffling at times.
• MedExtract and Natural Language Processing are very resource intensive.
  – Named Entity extraction uses both machine learning and dictionaries
  – We produce indices of searchable terms (usually consumed by Solr/Lucene).
  – We produce records that augment the other ETL streams.

You have to manage and monitor all these things.

Page 21: Predicting Hospital Readmission Using Cascading

MedMiner Deep Learning

Diagram. Labels shown:
• Inpatient Associations and Relationships
• Health Care Team Model, Chart Model, Pharmacy Model, VBP Model
• Diagnosis, Treatment, and Outcome; Comparative Performance; Financial Impact; Status and Discharge Disposition
• Financial Cost; Emergency Room; VBP Metrics; Hospitals and Facilities; Fees and Services; Pharmacy; Diagnostic and Procedural; Chart and Notes; NLP
• Data Mgmt; RelMiner Router
• Outcome; Length of Stay; Readmittance
• Neonatal, Juvenile, Adult, Geriatric
• Learning Feedback Loop

Page 22: Predicting Hospital Readmission Using Cascading

Best Practices for Operational Readiness

• We monitor end-to-end performance and use Driven to tune some of the larger flows.
• We use flow skipping extensively, and we monitor when steps have not been skipped (indicating that a data refresh has occurred).
• Errors in incoming data streams can be quite baffling at times. We use Traps and Checkpoints extensively to help here.
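The trap idea, diverting records that make an operation fail into a side output instead of failing the whole run, can be sketched in plain Python as below. This illustrates the pattern only; it is not Cascading's trap API, and the record layout, function names, and sample lines are assumptions.

```python
from datetime import date

def parse_admission(line):
    """Parse 'patient_id,admit_date,discharge_date' (hypothetical layout); raises ValueError on bad input."""
    patient_id, admit, discharge = line.split(",")
    return {
        "patient_id": patient_id,
        "admit": date.fromisoformat(admit),
        "discharge": date.fromisoformat(discharge),
    }

def run_with_trap(lines, operation):
    """Apply `operation` to every line; failures go to the trap with the error message."""
    good, trapped = [], []
    for line in lines:
        try:
            good.append(operation(line))
        except (ValueError, TypeError) as error:
            trapped.append((line, str(error)))  # keep the raw record for later inspection
    return good, trapped

# Made-up input: one clean record, one impossible date, one missing field.
lines = [
    "P-001,2015-01-05,2015-01-09",
    "P-002,2015-02-30,2015-03-01",
    "P-003,2015-03-01",
]
good, trapped = run_with_trap(lines, parse_admission)
print(len(good), "parsed;", len(trapped), "trapped")
for raw, reason in trapped:
    print("trapped:", raw, "->", reason)
```

Keeping the raw line alongside the error message is what makes baffling input errors diagnosable after the run has finished.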

Driven, from Concurrent, is used to track the performance of MedPredict™ to solve these operational problems.

We are now testing our system using Cascading version 3 and Tez.

Page 23: Predicting Hospital Readmission Using Cascading

Best Practices for Operational Readiness

• Visualize your pipelines to make sure your applications are executing as expected in dev, test and prod environments.
• If you are in a regulated industry, like healthcare, track data lineage. You will have to report on it for internal and external audits.

Page 24: Predicting Hospital Readmission Using Cascading

Technical Overview - Driven
• MedExtract and Natural Language Processing are very resource intensive.
  – Named Entity extraction uses both machine learning and dictionaries
  – We produce indices of searchable terms (usually consumed by Solr/Lucene).
  – We produce records that augment the other ETL streams. Driven is used here for performance tuning and to monitor the overall NLP pipes.
• We are now testing a Tez port of our system. Driven plays a role here as well.

Page 25: Predicting Hospital Readmission Using Cascading

Questions and Answers

[email protected]@AnalyticsInside.us

http://www.AnalyticsInside.us