addressing users’ healthcare needs through personal health messages presenter : jason h.d. cho...

Addressing Users’ Healthcare Needs through Personal Health Messages

Presenter : Jason H.D. Cho

Department of Computer Science,

University of Illinois at Urbana-Champaign, Urbana, IL

• Healthcare is an emerging area.

• Lots of data readily available!
  • Medical web forums, which we have used in this paper, typically span millions of posts.

• Traditionally, health informatics utilizes Electronic Medical Records (EMR).
  • In 2006, less than 10% of hospitals used EMR. By 2009, almost 50% of hospitals had started using EMR.

Health Informatics and Data Science

• Electronic Medical Records are traditionally used in health informatics.
  • Privacy issues.
  • Data not readily available.

• In this talk, I’ll discuss how personal health messages, in our case medical web forums, can be used to address problems similar to those that EMRs address.

• I’ll talk about two works, each from a different perspective: Macro and Micro.

EMR and Medical Web Forums

Why bother with medical web forums?

• Macro Perspective
  • Learn what the vast majority of users are saying (Chee, 2011).

• Micro Perspective
  • Users would like to conduct their own research.

Addressing Users’ Health Needs

• I’ll present two works that address the macro and micro perspectives.
  • Macro Perspective : Comparative Effectiveness Research
  • Micro Perspective : Case Retrieval System (Under submission)

In this talk…

Jason H.D. Cho1,4, Vera Q.Z. Liao1,4, Yunliang Jiang1,4,5, Bruce R. Schatz1,2,3,4

1Department of Computer Science, 2Institute of Genomic Biology, 3Department of Medical Information Science, 4University of Illinois at Urbana-Champaign, Urbana, IL

5Twitter, Inc., San Francisco, CA

Aggregating Personal Health Messages for Scalable Comparative Effectiveness Research

• Generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor clinical conditions or to improve the delivery of care.

• The American Recovery and Reinvestment Act of 2009 (ARRA) allotted $1.1 billion to support CER.

• Existing Approaches :
  • Randomized trials – precise, but expensive to conduct, generally not scalable.
  • Research reviews – scalable, but only utilize works done in existing literature.

• Our approach:
  • Low cost, can generate hypotheses quickly, scalable.
  • MedHelp (1 million health messages), Yahoo! Answers (10 million health messages), HealthBoards (1 million health messages)

Comparative Effectiveness Research

General Technique & Terminologies

Treatment Sentence

Context

Attitude of Context : sgn(Positive – Negative) = -1

So negative attitude

• We have determined that users’ sentiment towards a treatment is a good indicator of effectiveness.
  • Users’ sentiment towards a treatment : the summation of the context attitudes the user expresses towards the treatment of interest.
  • A preference between two treatments is defined if more people have net positive sentiment towards one treatment than towards the other.

• We introduce three different approaches to determine effectiveness based on users’ sentiments.

Our Approach
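The definitions above can be sketched as follows. This is a minimal illustration, assuming each context has already been reduced to counts of positive and negative sentiment words; the function names and the toy data are hypothetical and are not taken from the original system.

```python
from collections import defaultdict


def context_attitude(positive: int, negative: int) -> int:
    """Attitude of a context: sgn(Positive - Negative), i.e. -1, 0 or +1."""
    diff = positive - negative
    return (diff > 0) - (diff < 0)


def user_sentiment(contexts):
    """Sum context attitudes per (user, treatment) pair.

    contexts: iterable of (user, treatment, positive_count, negative_count).
    """
    totals = defaultdict(int)
    for user, treatment, positive, negative in contexts:
        totals[(user, treatment)] += context_attitude(positive, negative)
    return dict(totals)


# Hypothetical toy data: two users writing about one treatment.
contexts = [
    ("u1", "chemotherapy", 1, 2),  # attitude -1
    ("u1", "chemotherapy", 0, 1),  # attitude -1
    ("u2", "chemotherapy", 3, 0),  # attitude +1
]
sentiment = user_sentiment(contexts)
```

Here `u1` ends with a net sentiment of -2 towards chemotherapy and `u2` with +1.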

• Compare authors who explicitly compare two treatments.

• This approach is more precise since the person is directly comparing two treatments against each other.

• However, not many patients compare two treatments directly.
  • We can relax the definition of effectiveness.
  • The new approach should be consistent with the individual effectiveness comparison study.

Individual Effectiveness Comparison Study

Chemotherapy : -, Hormonal Therapy : +

Chemotherapy : +, Hormonal Therapy : -

Chemotherapy is preferred over hormonal therapy.
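A minimal sketch of the individual comparison, assuming each author's net sentiment per treatment is already known. Only authors who mention both treatments are counted. The function name, the tie handling, and the toy data are assumptions made for illustration.

```python
def individual_preference(sentiment, a, b):
    """Return the treatment preferred by more authors, or None on a tie.

    sentiment: {user: {treatment: net_sentiment}}
    """
    prefer_a = prefer_b = 0
    for per_treatment in sentiment.values():
        # Skip authors who did not discuss both treatments.
        if a not in per_treatment or b not in per_treatment:
            continue
        if per_treatment[a] > per_treatment[b]:
            prefer_a += 1
        elif per_treatment[b] > per_treatment[a]:
            prefer_b += 1
    if prefer_a == prefer_b:
        return None
    return a if prefer_a > prefer_b else b


# Hypothetical toy data: u4 is ignored because only one treatment is mentioned.
toy_sentiment = {
    "u1": {"chemotherapy": 2, "hormonal therapy": -1},
    "u2": {"chemotherapy": 1, "hormonal therapy": 0},
    "u3": {"chemotherapy": -1, "hormonal therapy": 1},
    "u4": {"chemotherapy": 1},
}
preferred = individual_preference(toy_sentiment, "chemotherapy", "hormonal therapy")
```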

• Compare groups of people who prefer a treatment over those who do not.

• This approach allows leveraging a bigger cohort pool.

• Both individual comparison and population comparison gave similar preference results in the experiments we ran.
  • Allows us to run the population effectiveness comparison in lieu of the individual effectiveness comparison!
  • Increases the size of the cohort pool by an order of magnitude.

Population Effectiveness Comparison Study

Chemotherapy : -, Hormonal Therapy : -

Chemotherapy : +, Hormonal Therapy : +

Chemotherapy is preferred over hormonal therapy
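A minimal sketch of the population comparison, assuming each treatment has a list of per-user net sentiment scores. Comparing treatments by positive fraction minus negative fraction is an assumption made here for illustration; the names and the toy scores are hypothetical.

```python
def sentiment_fractions(user_scores):
    """Fractions of users with positive and with negative net sentiment."""
    n = len(user_scores)
    positive = sum(1 for s in user_scores if s > 0) / n
    negative = sum(1 for s in user_scores if s < 0) / n
    return positive, negative


def population_preference(scores_by_treatment, a, b):
    """Prefer the treatment with the larger (positive - negative) margin."""
    pos_a, neg_a = sentiment_fractions(scores_by_treatment[a])
    pos_b, neg_b = sentiment_fractions(scores_by_treatment[b])
    margin_a, margin_b = pos_a - neg_a, pos_b - neg_b
    if margin_a == margin_b:
        return None
    return a if margin_a > margin_b else b


# Hypothetical toy scores; the two groups need not contain the same users.
scores = {
    "chemotherapy": [1, 2, -1, 1],
    "hormonal therapy": [1, -1, -2, 0],
}
winner = population_preference(scores, "chemotherapy", "hormonal therapy")
```

Unlike the individual comparison, the two score lists can come from different users, which is what enlarges the cohort pool.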

• Different demographics may react differently to a given treatment.

• We conducted the population effectiveness comparison study on each demographic group of interest.

• Two types of comparison :
  • Cross-group Comparison : Compares two different demographic groups on one treatment.
  • Within-group Comparison : Compares two treatments on one demographic group.

• Q : How do we extract patients’ demographics?

Demographics Effectiveness Comparison

Young : Beta Blocker : -, Beta Blocker : +

Old : Beta Blocker : +, Beta Blocker : -

Older people prefer beta blockers more than younger people do.
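A minimal sketch of a cross-group comparison on one treatment, assuming each user record carries an age and net sentiment scores. The age cutoff of 50 that separates "young" from "old" is a hypothetical value chosen for illustration, as are the field names and toy records.

```python
def positive_fraction_by_group(users, treatment, cutoff=50):
    """Fraction of users with positive net sentiment, split by age group."""
    groups = {"young": [], "old": []}
    for user in users:
        if treatment not in user["sentiment"]:
            continue
        key = "old" if user["age"] >= cutoff else "young"
        groups[key].append(user["sentiment"][treatment])
    return {
        group: (sum(1 for s in scores if s > 0) / len(scores) if scores else None)
        for group, scores in groups.items()
    }


# Hypothetical toy records.
users = [
    {"age": 28, "sentiment": {"beta blocker": -1}},
    {"age": 35, "sentiment": {"beta blocker": 1}},
    {"age": 61, "sentiment": {"beta blocker": 2}},
    {"age": 70, "sentiment": {"beta blocker": 1}},
    {"age": 66, "sentiment": {"beta blocker": -1}},
]
by_group = positive_fraction_by_group(users, "beta blocker")
```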

• Approach 1 : Utilize users’ profiles.

• What if a user did not list demographic information?
  • We implemented a rule-learning demographic extraction algorithm to solve this problem.

Demographic Extraction

We introduce a rule-learning algorithm to extract age.

1. Extract all phrases in health messages that mention the demographic information listed on the user’s profile page.

2. Run a frequent sequence pattern mining algorithm (PrefixSpan) to mine frequent patterns.

3. Remove low-precision frequent patterns.

Demographic Extraction

I am 30 years old, ...
…, a 30 year old …
He is 30 years old.
Day 30 for me.
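The three steps can be sketched roughly as below. This is a simplified illustration, not the original algorithm: it counts short contiguous patterns instead of running PrefixSpan, and the window size, thresholds, function names, and toy messages are all assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def candidate_patterns(message, age, window=2):
    """Step 1: phrases where the profile age appears, with the age masked."""
    tokens = tokenize(message)
    return [
        " ".join(["<AGE>"] + tokens[i + 1:i + 1 + window])
        for i, tok in enumerate(tokens)
        if tok == str(age)
    ]


def apply_pattern(pattern, message):
    """Return every number that the pattern would extract from a message."""
    parts = pattern.split()
    tokens = tokenize(message)
    found = []
    for start in range(len(tokens) - len(parts) + 1):
        span = tokens[start:start + len(parts)]
        if all((p == "<AGE>" and t.isdigit()) or p == t for p, t in zip(parts, span)):
            found.append(int(span[parts.index("<AGE>")]))
    return found


def learn_rules(labeled, min_support=1, min_precision=0.9):
    """Steps 2 and 3: count patterns, then drop the low-precision ones."""
    support = Counter()
    for message, age in labeled:
        support.update(candidate_patterns(message, age))
    rules = []
    for pattern, count in support.items():
        if count < min_support:
            continue
        hits = correct = 0
        for message, age in labeled:
            for value in apply_pattern(pattern, message):
                hits += 1
                correct += value == age
        if hits and correct / hits >= min_precision:
            rules.append(pattern)
    return sorted(rules)


# Hypothetical messages paired with the age listed on the author's profile.
labeled = [
    ("I am 30 years old and was just diagnosed.", 30),
    ("Day 30 for me. I am 30 years old.", 30),
    ("I am 45 years old. Day 12 for me on this drug.", 45),
]
rules = learn_rules(labeled)
```

On this toy data the pattern `<AGE> for me` is mined but then discarded, because it also fires on "Day 12 for me", which does not match that author's profile age.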

Demographic Extraction Performance Evaluation

[Table: demographic extraction results for the Breast Cancer and Heart Disease forums. Columns: # Users, Users w Age, # Inferred, # Inferred & has Age, # Inferred & no age, Precision.]

Our approach effectively removed most of the irrelevant inferred ages compared to the baseline approach.

This approach doubled the number of people with demographic information.

• We used MedHelp forums as our data source, and selected forum categories based on diseases of interest.

• We chose the diseases and treatments for our experiments from the Institute of Medicine’s 100 CER priority list.

• A statistical test was used to determine preference significance.

• Many of the findings were consistent with existing medical literature, such as Cochrane Reviews, the Agency for Healthcare Research and Quality (AHRQ), and the New England Journal of Medicine.

• We show some of the results that were statistically significant.
  • On the population effectiveness comparison study, 50% of our findings were consistent with existing literature. For the rest, we were not able to find literature that verified our claims.

Our Findings
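The slides do not name the significance test. As one plausible illustration only, the sketch below runs a Pearson chi-square test on a 2x2 table of positive and negative user counts for two treatments. The choice of test, the function name, and the counts are assumptions.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Rows are treatments; columns are positive and negative user counts.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs a non-zero total")
    return n * (a * d - b * c) ** 2 / denominator


# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_95 = 3.841

# Hypothetical counts: treatment A has 10 positive and 20 negative users,
# treatment B has 20 positive and 10 negative users.
statistic = chi_square_2x2(10, 20, 20, 10)
significant = statistic > CRITICAL_95
```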

• Population Effectiveness Comparison :
  • Generally each treatment had thousands of patients.
  • For breast cancer : Radiation (2,393), Chemotherapy (2,878), Hormonal Therapy (1,680) – approximately 7,000 patients
  • For heart disease : Anticoagulants (2,162), Inhibitor (2,422), Blocker (7,257), Device (2,457) – almost 15,000 patients

How big is the cohort pool?

Population Effectiveness Comparison

[Figure: Breast Cancer Treatment Comparison. Positive and negative values (y-axis, 0 to 0.5) for Radiation, Hormonal, and Chemotherapy; data labels: 0.41, 0.39, 0.44, 0.26, 0.34, 0.29.]

Cochrane Review : Chemotherapy is advantageous over hormonal therapy in reducing tumor response rate.

New England Journal of Medicine : Patients who had radiation therapy showed lower post-treatment side effects than those who had hormonal treatments.

Population Effectiveness Comparison

[Figure: Heart Disease Treatments Comparison. Positive and negative values (y-axis, 0 to 0.6) for Blocker, Anticoagulants, and Device; data labels: 0.27, 0.35, 0.34, 0.54, 0.4, 0.43.]

New England Journal of Medicine : Patients using devices (pacemakers, ICDs) often take Warfarin (anticoagulant).

New England Journal of Medicine : Warfarin is at least as effective as beta blockers, and is often more cost-effective.

[Figure: Gender comparison on Inhibitors. Positive and negative values (y-axis, 0 to 0.7) for Male and Female; data labels: 0.21, 0.2, 0.56, 0.59.]

Demographic Effectiveness Comparison

[Figure: Age comparison on beta blockers. Positive and negative values (y-axis, 0 to 0.7) for Young and Old; data labels: 0.21, 0.26, 0.62, 0.56.]

Agency for Healthcare Research and Quality : ACE inhibitors reduce composite efficacy endpoints similarly in males and females.

Archives of Internal Medicine : Younger people tend to be more impacted by cognitive impairment than older people.

• We introduced how CER hypotheses can be generated using health messages.
  • We showed that preference, as measured by sentiment, can be a good indicator of treatment effectiveness.
  • We also introduced a high-precision demographic extraction algorithm to broaden the cohort pool.
  • Personal health messages are scalable. MedHelp was used as our data source, but other forums can be aggregated to further broaden the cohort pool.

• The results from our algorithm were consistent with existing medical literature.

Conclusion

• Investigate signals that can be good indicators of effectiveness (depth).
  • Entity relation semantics extraction to analyze the relation between a treatment and its aspects (effectiveness, side effects, etc.).
  • A Shallow Information Extraction approach can be utilized to determine whether a subsection of forum text is about symptoms or treatments.

• Merge multiple sources to leverage a bigger cohort pool (breadth).
  • Other medical web forums, such as WebMD and HealthBoards.
  • Social networks and micro-blogs such as Facebook and Twitter, and other sources.

Future Works

Jason H.D. Cho1,4, Parikshit Sondhi1,4, Chengxiang Zhai1,4, Bruce R. Schatz1,2,3,4

* Slides Courtesy of Parikshit Sondhi

1Department of Computer Science, 2Institute of Genomic Biology, 3Department of Medical Information Science, 4University of Illinois at Urbana-Champaign, Urbana, IL

Resolving Healthcare Forum Posts via Similar Thread Retrieval

• Users may often want to conduct research by themselves.
  • They may be curious about what disease they have, or which medications they may take.

• Macro-tasks cannot take care of this, since they assume users already know what they want.

Case Retrieval Task

Query Characteristics

• Queries meant for human experts not automated systems

• Simple non-technical language

• Presence of emotional statements

Document Characteristics

Envisioned Response

The following threads discuss similar problems:
  Doritos Allergy Very Severe and New
  Certain Foods + Beer = Flushing and Head Pounding…Help!
  Peanut/Food Allergies
  ……………………

Method Overview

• Baseline Weighing
  • First Post BM-25
  • Thread BM-25

• Semantic Weighing
  • Medical term extraction
  • Shallow Information Extraction

• Post Weighing
  • Monotonic Weighing
  • Parabolic Weighing

• Forum Category Weighing
  • Uniform Weighing (FCUW)
  • Feedback Weighing (FCFW)

Background (BKG) Neither PE nor MED

I am severly allergic to some product that is found in both Tostitos and Doritos, as well as random other types of chips. I know the solution is "don't eat chips" but what could the product be? I don't want to accidentally consume it. When I eat this, I get very bad stomach cramps and it ruins the rest of my day/night - the only solution is to go to sleep so I can't feel it. Help! Any ideas on this?

Shallow Information Extraction

Physical Examination (PE) Disease, Symptoms

Medication (MED) Treatment, Prevention

Sondhi, 2010

Medical Entity Extraction

• Applied ADEPT toolkit (MacLean and Heer 2013)

• High precision but low recall

Not all posts are equally representative

$c'(w,t)$

Post Weighing

Sondhi, 2013

Post Weighing

$c'(w,t) = f(1,3)\,c(w,p_1) + \dots + f(3,3)\,c(w,p_3)$

$f(i,K)$ : gives the weight of post $i$ in a thread with $K$ posts

Monotonic Post Weighing

[Figure: relative post weight versus post position $i$ for $K=10$, with parameters $m_1$, $m_2$, $m_3$.]

Parabolic Post Weighing
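A minimal sketch of the weighted count $c'(w,t) = \sum_i f(i,K)\,c(w,p_i)$. The linearly decreasing weight used for the monotonic case is a hypothetical choice, since the slides do not give the exact functional form. A parabolic scheme would simply be another function with the same `(i, K)` signature.

```python
from collections import Counter


def uniform_weight(i, K):
    """Every post counts equally."""
    return 1.0


def monotonic_weight(i, K):
    """Hypothetical linearly decreasing weight: 1 for the first post, 1/K for the last."""
    return (K - i + 1) / K


def modified_thread_count(word, posts, weight):
    """c'(w, t): word count over the thread, with each post scaled by f(i, K)."""
    K = len(posts)
    return sum(
        weight(i, K) * Counter(post.lower().split())[word]
        for i, post in enumerate(posts, start=1)
    )


# Hypothetical three-post thread.
posts = ["allergy chips", "allergy allergy", "thanks"]
uniform_count = modified_thread_count("allergy", posts, uniform_weight)
monotonic_count = modified_thread_count("allergy", posts, monotonic_weight)
```

With these toy posts the uniform count is 3, while the monotonic count is 1 + (2/3)(2) + (1/3)(0) = 7/3, so later posts contribute less.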

Post Weighing Methods Evaluation

[Figure: accuracy (y-axis, 0.4 to 0.8) of Uniform, Monotonic, and Parabolic post weighing by forum used (x-axis): FF, UF, LQ, Cross Forum.]

Forum Categories

• Relevance feedback based on top k retrieved documents.
  • Forum Category Uniform Weighing (FCUW) : Weighs top-k forum categories equally.
  • Forum Category Feedback Weighing (FCFW) : Weighs forum categories based on how frequently they appeared in the retrieved documents.

Forum Category Weighing

Randomly selecting forum ID

Ratio of current forum ID amongst retrieved documents
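A minimal sketch of how the two category weights could be computed from the forum IDs of the top retrieved threads. How the weights are folded into the final score is not shown in the slides, so only the weights are computed here; the function names and the toy forum IDs are hypothetical.

```python
from collections import Counter


def fcfw_weights(retrieved_forum_ids):
    """Feedback weighing: ratio of each forum ID amongst the retrieved documents."""
    counts = Counter(retrieved_forum_ids)
    total = len(retrieved_forum_ids)
    return {forum: count / total for forum, count in counts.items()}


def fcuw_weights(retrieved_forum_ids, k):
    """Uniform weighing: the k most frequent forum categories get equal weight."""
    top = [forum for forum, _ in Counter(retrieved_forum_ids).most_common(k)]
    return {forum: 1 / len(top) for forum in top}


# Hypothetical forum IDs of the top four retrieved threads.
retrieved = ["allergy", "allergy", "diet", "allergy"]
feedback = fcfw_weights(retrieved)
uniform = fcuw_weights(retrieved, k=2)
```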

State of the Art Baseline

• Baseline BM-25 formula:

• c(w,t): Count of word w in thread t

• c(w,q): Count of word w in query q

• FPBM-25: Consider only the content of first post to represent the thread document

• TBM-25: Consider content of entire thread to represent the thread document
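The baseline formula itself did not survive in the slides. The sketch below uses one common Okapi BM25 variant as a stand-in, with the query count $c(w,q)$ as a plain multiplier. The parameter values `k1` and `b`, the whitespace tokenization, and the toy threads are assumptions.

```python
import math
from collections import Counter


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Score one document against a query with a standard BM25 variant."""
    query_counts = Counter(query.lower().split())   # c(w, q)
    doc_counts = Counter(doc.lower().split())       # c(w, t)
    n_docs = len(docs)
    avg_len = sum(len(d.split()) for d in docs) / n_docs
    doc_len = sum(doc_counts.values())
    score = 0.0
    for word, c_wq in query_counts.items():
        c_wt = doc_counts[word]
        doc_freq = sum(1 for d in docs if word in d.lower().split())
        if c_wt == 0 or doc_freq == 0:
            continue
        idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
        tf = c_wt * (k1 + 1) / (c_wt + k1 * (1 - b + b * doc_len / avg_len))
        score += c_wq * idf * tf
    return score


# Hypothetical threads, each a list of posts.
threads = [
    ["severe allergy to corn chips", "try seeing an allergist"],
    ["knee pain after running", "rest and ice"],
]
first_posts = [thread[0] for thread in threads]           # FPBM-25 documents
whole_threads = [" ".join(thread) for thread in threads]  # TBM-25 documents

query = "chips allergy"
fp_scores = [bm25_score(query, d, first_posts) for d in first_posts]
t_scores = [bm25_score(query, d, whole_threads) for d in whole_threads]
```

The only difference between the two baselines in this sketch is which text represents the thread document.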

ShallowEx: Relevance Scoring

Give higher importance to PE and MED sentences

Modified Query Count

Word count in PE sentences

Word count in MED sentences

Word count in BKG sentences
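A minimal sketch of the modified query count, assuming it is a weighted sum of the word's counts in PE, MED, and BKG sentences. The linear form, the weight values, and the sentence labels in the toy query are assumptions, since the slide formula is not recoverable.

```python
from collections import Counter


def modified_query_count(word, labeled_sentences, w_pe=2.0, w_med=2.0, w_bkg=1.0):
    """Weighted count of a word across sentences labeled PE, MED or BKG."""
    weights = {"PE": w_pe, "MED": w_med, "BKG": w_bkg}
    total = 0.0
    for label, sentence in labeled_sentences:
        total += weights[label] * Counter(sentence.lower().split())[word]
    return total


# Hypothetical labeled query sentences, loosely based on the example post.
query_sentences = [
    ("BKG", "i know the solution is do not eat chips"),
    ("PE", "i get very bad stomach cramps after chips"),
]
chips_count = modified_query_count("chips", query_sentences)
cramps_count = modified_query_count("cramps", query_sentences)
```

Setting the PE and MED weights above the BKG weight is what gives higher importance to PE and MED sentences.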

MedicalEx: Relevance Scoring

Count of occurrences labeled as med entity

Count of occurrences not labeled as med entity

Modified query frequency

Post Weighing: Relevance Scoring

Modified Thread Frequency, Post Weight, Post Frequency

Forum Category Weighing Scoring

New Score

Forum Category Feedback Weighing

Weights for forum category weighing

Method Summary

• Baseline Weighing
  • First Post BM-25
  • Thread BM-25

• Semantic Weighing
  • Medical term extraction
  • Shallow Information Extraction

• Post Weighing
  • Monotonic Weighing
  • Parabolic Weighing

• Forum Category Weighing
  • Uniform Weighing (FCUW)
  • Feedback Weighing (FCFW)

Evaluation via Pooling

• 350K threads and 20 queries from HealthBoards

• 2 judges first judged 100 query-thread pairs
  • 88% agreement (κ=0.76)

• 730 total judged query-thread pairs
  • 324 relevant
  • 406 irrelevant
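For reference, the metrics reported in the results (P@5, Recall@30, MAP) can be computed from a ranked list and a set of judged-relevant threads as sketched below. The thread IDs are hypothetical; MAP is the mean of the average precision over all queries.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical ranking for one query; t9 is relevant but never retrieved.
ranked = ["t1", "t2", "t3", "t4", "t5"]
relevant = {"t1", "t3", "t9"}
p5 = precision_at_k(ranked, relevant, 5)
r5 = recall_at_k(ranked, relevant, 5)
ap = average_precision(ranked, relevant)
```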

Results: Semantic Methods

Run  Method             P@5            Recall@30        MAP
B1   Baseline TBM-25    0.3000         0.2846           0.1977
B2   Baseline FPBM-25   0.4700         0.4975           0.3316
S1   B2+MedEx           0.4600         0.4283           0.2918
S2   B2+ShallowEx       0.53 (12.7%)   0.4847 (-2.5%)   0.3481 (4.9%)

Shallow extraction is better than medical entity extraction

Results: Post Weighing


Run  Method             P@5             Recall@30       MAP
B2   Baseline FPBM-25   0.4700          0.4975          0.3316
P1   Monotonic          0.5100 (8.5%)   0.5240 (5.3%)   0.3631 (9.5%)
P2   Parabolic          0.5100 (8.5%)   0.5040          0.3494

Both post weighing schemes outperform the baseline

Results: Forum Category Weighing


Run  Method               P@5              Recall@30        MAP
B2   Baseline FPBM-25     0.4700           0.4975           0.3316
P1   Uniform Weighing     0.5200 (10.6%)   0.4678 (-7.0%)   0.3334 (0.5%)
P2   Feedback Weighing    0.5100 (8.5%)    0.4610 (-7.3%)   0.3389 (2.2%)

Uniform Weighing and Feedback Weighing perform similarly, but FCFW has fewer parameters to tune.

Results: Method Combinations


Run  Method                          P@5              Recall@30        MAP
B2   Baseline FPBM-25                0.4700           0.4975           0.3316
S2   Baseline FPBM-25 + ShallowEx    0.53             0.4847           0.3481
C2   Monotonic + ShallowEx           0.5400 (14.9%)   0.5354 (7.6%)    0.3745 (12.9%)
C3   Parabolic+ShallowEx             0.5100           0.5155           0.3573
C4   Monotonic + ShallowEx + FCFW    0.5200           0.5625 (13.1%)   0.3702

Monotonic + ShallowEx performs the best

What we Learnt

• Fairly high P@5 accuracy is achievable

• Shallow information extraction is better for query understanding

• Utility of posts drops steadily with position

• Easy extension of baseline method

Conclusion

• It is possible to address health problems from both the macro and micro perspectives using health messages.
  • Macro : Comparative Effectiveness Research
  • Micro : Case retrieval task

• Health informatics is an emerging area: lots of work has been done, and lots remains to be done.

• Utilizing medical web forums

• Phones can be used to measure health as well!
  • Many fitness apps are out on the market.
  • Gait patterns are known to be indicative of health.

• If this line of work sounds interesting, please feel free to talk to me!

Future works

Questions?

Thank you!

• It is possible that only a subset of demographics posts on medical forums.
  • For example, people who are severely sick are less likely to post, and those who are more educated are more likely to use the web.
  • People who have had a negative experience with a treatment are more likely to post.

• However, these forums are not limited by geography, while many randomized trials tend to be limited to a particular region, i.e., the hospitals that conducted the study.

• Furthermore, we expect these sampling biases to be evenly distributed across treatments.

• Finally, while not utilized in our approach, patients often post symptoms or diagnosis results in their thread posts. This allows us to later sift based on symptoms.

• People tend to post negative symptoms. These can be expected to be evenly spread across treatments.

Demographics?

• Demographic Effectiveness Comparison :
  • Generally hundreds of patients in the cohort for each treatment and each demographic group. Examples are for the older population.
  • For breast cancer : Radiation (739), Hormonal (525), Chemotherapy (770)
  • For heart disease : Anticoagulants (153), Device (166), Inhibitor (217), Blocker (414)

• We have only used one source, MedHelp. We can broaden this pool by
  • Aggregating multiple sources (WebMD, HealthBoards, disease-specific forums, or even micro-blogs such as Twitter).
  • Coming up with a treatment-agnostic supervised demographic inference algorithm.

How big is the cohort pool?

• We used a top-down approach, drawing on various reliable sources (such as the Mayo Clinic’s website and those of various government-sponsored agencies) to extract keywords.

• We also used a bottom-up approach that utilized the UMLS thesaurus to generate keywords.
  • MetaMap was initially used to extract treatments from forum threads.
  • These words were then queried against the MedlinePlus Connect API to determine whether they indeed belong to the treatment class.

Treatment Lists?

• J. H. D. Cho, V. Q. Liao, Y. Jiang, and B. Schatz. Aggregating Personal Health Messages for Scalable Comparative Effectiveness Research. ACM BCB, 2013.

• J. H. D. Cho, P. Sondhi, C. Zhai, and B. Schatz. Resolving Healthcare Forum Posts via Similar Thread Retrieval. WWW, 2014.

• K. Pattabiraman, P. Sondhi, and C. Zhai. Exploiting Forum Thread Structures to Improve Thread Clustering. ICTIR, 2013.

• P. Sondhi, M. Gupta, C. Zhai, and J. Hockenmaier. Shallow Information Extraction from Medical Forum Data. COLING, 2010.

• B. W. Chee, R. Berlin, and B. Schatz. Predicting Adverse Drug Events from Personal Health Messages. AMIA, 2011.

• D. L. MacLean and J. Heer. Identifying medical terms in patient-authored text: a crowdsourcing-based approach. Journal of the American Medical Informatics Association, pages amiajnl–2012–001110+, May 2013.

References

[Figure: Performance results for different feature sets. Percentage accuracy (y-axis, 60 to 76) of Order-1 CRF and SVM by feature set (x-axis): unigrams, +semantic, +position, +morphological, +wordcount, +threadcreator, +bigrams, Feat. Selection.]

We use the best performing SVM-based classifier (Posts: 175, Sentences: 1494).

ShallowEx: Extraction Model

• We thank the anonymous reviewers for their insightful comments. This research was supported in part by a Health Information Technology Center (HITC) Fellowship at the University of Illinois at Urbana-Champaign and a State Farm Doctoral Scholarship. We would also like to thank Sean Massung for helping the authors with the revision.

Acknowledgements
