Addressing Users’ Healthcare Needs through Personal Health Messages
Presenter : Jason H.D. Cho
Department of Computer Science,
University of Illinois at Urbana-Champaign, Urbana, IL
• Healthcare is an emerging area.
• Lots of data is readily available!
  • Medical web forums, which we used in this paper, typically span millions of posts.
• Traditionally, health informatics has relied on Electronic Medical Records (EMRs).
  • In 2006, fewer than 10% of hospitals used EMRs. By 2009, almost 50% of hospitals had started using them.
Health Informatics and Data Science
• Electronic Medical Records are traditionally used in health informatics.
  • Privacy issues.
  • Data not readily available.
• In this talk, I’ll discuss how personal health messages (in our case, medical web forums) can be used to address problems similar to those that EMRs address.
• I’ll talk about two works, each from a different perspective:
  • Macro and Micro.
EMR and Medical Web Forums
Why bother with medical web forums?
• Macro Perspective
  • Learn what the vast majority of users are saying.
Addressing Users’ health needs
Chee, 2011
• I’ll present two works that address both the macro and micro perspective.
  • Macro Perspective : Comparative Effectiveness Research
  • Micro Perspective : Case Retrieval System (Under submission)
In this talk…
Jason H.D. Cho1,4, Vera Q.Z. Liao1,4, Yunliang Jiang1,4,5, Bruce R. Schatz1,2,3,4
1Department of Computer Science, 2Institute of Genomic Biology, 3Department of Medical Information Science, 4University of Illinois at Urbana-Champaign, Urbana, IL
5Twitter, Inc., San Francisco, CA
Aggregating Personal Health Messages for Scalable
Comparative Effectiveness Research
• Generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor clinical conditions or to improve the delivery of care.
• The American Recovery and Reinvestment Act of 2009 (ARRA) allotted $1.1 billion to support CER.
• Existing Approaches :
  • Randomized trials: precise, but expensive to conduct and generally not scalable.
  • Research reviews: scalable, but only utilize work already in the existing literature.
• Our approach:
  • Low cost, can generate hypotheses quickly, scalable.
  • MedHelp (1 million health messages), Yahoo! Answers (10 million health messages), HealthBoards (1 million health messages)
Comparative Effectiveness Research
General Technique & Terminologies
Treatment Sentence
Context
Attitude of Context : sgn(Positive - Negative) = -1
So negative attitude
• We have determined that users’ sentiment towards a treatment is a good indicator of effectiveness.
  • Users’ sentiment towards treatment : Summation of the context attitudes the user expresses towards the treatment of interest.
  • Preference between two treatments is defined when more people have net positive sentiment towards one treatment than towards the other.
• We introduce three different approaches to determine effectiveness based on users’ sentiments.
Our Approach
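A minimal sketch of the sentiment terminology above, in Python. The function names, the (positive, negative) word-count input format, and the rule that a user counts in favour of a treatment when their summed attitude is above zero are assumptions made for illustration, not the authors' implementation.

```python
def sgn(x):
    """Sign function: returns -1, 0, or +1."""
    return (x > 0) - (x < 0)


def context_attitude(positive, negative):
    """Attitude of one context: sgn(#positive words - #negative words)."""
    return sgn(positive - negative)


def user_sentiment(contexts):
    """Sum of context attitudes one user expresses towards one treatment.

    contexts: list of (positive_count, negative_count) pairs (assumed format).
    """
    return sum(context_attitude(p, n) for p, n in contexts)


def preference(sentiments_a, sentiments_b):
    """Assumed rule: the treatment with more net-positive users is preferred.

    sentiments_a / sentiments_b: per-user summed sentiment for each treatment.
    Returns "A", "B", or None when neither treatment leads.
    """
    fans_a = sum(1 for s in sentiments_a if s > 0)
    fans_b = sum(1 for s in sentiments_b if s > 0)
    if fans_a == fans_b:
        return None
    return "A" if fans_a > fans_b else "B"
```

For the slide's example, a context with one more negative word than positive words gives `context_attitude(1, 2) == -1`, i.e. a negative attitude.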
• Compare authors who explicitly compare two treatments.
• This approach is more precise since the person is directly comparing two treatments against each other.
• However, not many patients compare two treatments directly.
  • We can relax the definition of effectiveness.
  • The new approach should be consistent with the individual effectiveness comparison study.
Individual Effectiveness Comparison Study
Chemotherapy : -   Hormonal Therapy : +
Chemotherapy : +   Hormonal Therapy : -
Chemotherapy is preferred over hormonal therapy.
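The individual comparison can be sketched as a vote among authors who discuss both treatments. The data layout (a dict of per-user net sentiments) and the function name are hypothetical; only the idea of restricting to users who mention both treatments comes from the slides.

```python
def individual_comparison(sentiments, treatment_a, treatment_b):
    """Count users who favour each treatment, using only users who discuss both.

    sentiments: {user_id: {treatment_name: net_sentiment}} (assumed layout).
    Returns (votes_for_a, votes_for_b); ties are not counted.
    """
    votes_a = votes_b = 0
    for per_treatment in sentiments.values():
        if treatment_a in per_treatment and treatment_b in per_treatment:
            if per_treatment[treatment_a] > per_treatment[treatment_b]:
                votes_a += 1
            elif per_treatment[treatment_b] > per_treatment[treatment_a]:
                votes_b += 1
    return votes_a, votes_b


# Toy data, invented for illustration only.
toy = {
    "u1": {"chemotherapy": 2, "hormonal therapy": -1},
    "u2": {"chemotherapy": 1, "hormonal therapy": 0},
    "u3": {"chemotherapy": -1, "hormonal therapy": 1},
    "u4": {"chemotherapy": 3},  # ignored: mentions only one treatment
}
votes = individual_comparison(toy, "chemotherapy", "hormonal therapy")
```

Note how `u4` drops out of the comparison; this is the cohort-size limitation that motivates the population-level study.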
• Compare the group of people who prefer a treatment against the group who do not.
• This approach allows leveraging a bigger cohort pool.
• Both individual comparison and population comparison gave similar preference results in the experiments we ran.
  • Allows us to run population effectiveness comparison in lieu of individual effectiveness comparison!
  • Increases the size of the cohort pool by an order of magnitude.
Population Effectiveness Comparison Study
Chemotherapy : - Hormonal Therapy : -
Chemotherapy is preferred over hormonal therapy
Chemotherapy : + Hormonal Therapy : +
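A rough sketch of the population-level comparison: every user who mentions a treatment contributes, and each treatment is summarised by the share of users with positive and with negative net sentiment. The data layout and names are assumptions; the significance test used in the study is not shown here.

```python
def population_summary(sentiments, treatment):
    """Share of users with positive / negative net sentiment for one treatment.

    sentiments: {user_id: {treatment_name: net_sentiment}} (assumed layout).
    Users with a net sentiment of zero count towards n but towards neither share.
    """
    values = [s[treatment] for s in sentiments.values() if treatment in s]
    n = len(values)
    if n == 0:
        return {"n": 0, "positive": 0.0, "negative": 0.0}
    return {
        "n": n,
        "positive": sum(1 for v in values if v > 0) / n,
        "negative": sum(1 for v in values if v < 0) / n,
    }


# Toy data, invented for illustration only.
toy = {
    "u1": {"chemotherapy": 2, "hormonal therapy": -1},
    "u2": {"chemotherapy": 1, "hormonal therapy": 0},
    "u3": {"chemotherapy": -1, "hormonal therapy": 1},
    "u4": {"chemotherapy": 3},
}
chemo = population_summary(toy, "chemotherapy")
hormonal = population_summary(toy, "hormonal therapy")
```

Unlike the individual comparison, a user who mentions only one treatment still contributes to that treatment's summary.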
• Different demographics may react differently to a given treatment.
• We conducted a population effectiveness comparison study for each demographic group of interest.
• Two types of comparison :
  • Cross-group Comparison : Compares two different demographic groups on one treatment.
  • Within-group Comparison : Compares two treatments within one demographic group.
• Q : How do we extract patient’s demographics?
Demographics Effectiveness Comparison
Beta Blocker : -
Beta Blocker : +
Young
Beta Blocker : +
Beta Blocker : -
Old
Older people prefer beta blockers more than younger people do.
• Approach 1 : Utilize users’ profiles
• What if the user did not list demographic information?
  • We implemented a rule-learning demographic extraction algorithm to solve this problem.
Demographic Extraction
We introduce a rule-learning algorithm to extract age.
1. Extract all phrases in health messages whose mentions match the demographic information on the user’s profile page.
2. Run a frequent sequence pattern mining algorithm (PrefixSpan) to mine frequent patterns.
3. Remove low-precision frequent patterns.
Demographic Extraction
I am 30 years old, ...
…, a 30 year old …
He is 30 years old.
Day 30 for me.
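The three steps can be illustrated with a much simplified sketch. It does not use PrefixSpan: as a stand-in, it counts fixed one-token context windows around each number, then keeps only windows that are frequent and precise when checked against profile ages. The tokenizer, the window shape, the thresholds, and all names are assumptions.

```python
import re
from collections import Counter

TOKEN = re.compile(r"\d+|[a-z]+")


def candidate_patterns(message):
    """Yield (pattern, number) pairs for every number in the message.

    The pattern is (token before, "<NUM>", token after); this fixed window is a
    simplification standing in for mined frequent sequences.
    """
    tokens = TOKEN.findall(message.lower())
    for i, tok in enumerate(tokens):
        if tok.isdigit():
            before = tokens[i - 1] if i > 0 else "<s>"
            after = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
            yield (before, "<NUM>", after), int(tok)


def learn_rules(labelled, min_support=2, min_precision=0.8):
    """Keep patterns that are frequent and usually agree with the profile age.

    labelled: list of (message, profile_age) pairs from users with a profile.
    """
    matched, correct = Counter(), Counter()
    for message, age in labelled:
        for pattern, number in candidate_patterns(message):
            matched[pattern] += 1
            if number == age:
                correct[pattern] += 1
    return {
        p for p, n in matched.items()
        if n >= min_support and correct[p] / n >= min_precision
    }


def infer_age(message, rules):
    """Return the first number that sits inside a learned pattern, else None."""
    for pattern, number in candidate_patterns(message):
        if pattern in rules:
            return number
    return None


# Toy training data, invented for illustration only.
labelled = [
    ("I am 30 years old.", 30),
    ("I am 52 years old and on day 12 of treatment.", 52),
    ("Day 30 for me, I am 41 years old.", 41),
    ("Day 3 for me.", 28),
]
rules = learn_rules(labelled)
```

On this toy data the "Day N for me" window is frequent but never matches a profile age, so the precision filter removes it, which mirrors step 3.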
Demographic Extraction Performance Evaluation
[Figure: for Breast Cancer and Heart Disease, chart of Precision, # Users, Users w Age, # Inferred, # Inferred & has Age, and # Inferred & no age.]
Our approach effectively removed most of the irrelevant inferred ages compared to the baseline approach.
This approach doubled the number of people with demographic information.
• We used MedHelp forums as our data source, and selected forum categories based on diseases of interest.
• We chose the diseases and treatments for our experiments from the Institute of Medicine’s 100 CER priority list.
• We used a statistical test to determine preference significance.
• Many of the findings were consistent with existing medical literature, such as those from Cochrane Reviews, the Agency for Healthcare Research and Quality (AHRQ), and the New England Journal of Medicine.
• We show some of the results that were statistically significant.
  • In the population effectiveness comparison study, 50% of our findings were consistent with existing literature. For the rest, we were not able to find literature that verified our claim.
Our Findings
• Population Effectiveness Comparison : Generally each treatment had thousands of patients.
  • For breast cancer : Radiation (2,393), Chemotherapy (2,878), Hormonal Therapy (1,680); approximately 7,000 patients
  • For heart disease : Anticoagulants (2,162), Inhibitor (2,422), Blocker (7,257), Device (2,457); almost 15,000 patients
How big is the cohort pool?
Population Effectiveness Comparison
[Figure: Breast Cancer Treatment Comparison. Bar chart of Positive and Negative sentiment for Radiation, Hormonal, and Chemotherapy; data labels as extracted: 0.41, 0.39, 0.44, 0.26, 0.34, 0.29.]
Cochrane Review : Chemotherapy is advantageous over hormonal therapy in reducing tumor response rate.
New England Journal of Medicine : Patients who had radiation therapy showed lower post-treatment side effects than those who had hormonal treatments.
Population Effectiveness Comparison
[Figure: Heart Disease Treatments Comparison. Bar chart of Positive and Negative sentiment for Blocker, Anticoagulants, and Device; data labels as extracted: 0.27, 0.35, 0.34, 0.54, 0.4, 0.43.]
New England Journal of Medicine : Patients using devices (pacemakers, ICDs) often take Warfarin (anticoagulant).
New England Journal of Medicine : Warfarin is at least as effective as beta blockers, and is often more cost-effective.
Demographic Effectiveness Comparison
[Figure: Gender comparison on Inhibitors. Bar chart of Positive and Negative sentiment for Male and Female; data labels as extracted: 0.21, 0.2, 0.56, 0.59.]
[Figure: Age comparison on beta blockers. Bar chart of Positive and Negative sentiment for Young and Old; data labels as extracted: 0.21, 0.26, 0.62, 0.56.]
Agency for Healthcare Research and Quality : ACE inhibitors reduce composite efficacy endpoints similarly in males and females.
Archives of Internal Medicine : Younger people tend to be more impacted by cognitive impairment than older people.
• We introduced how CER hypotheses can be generated using health messages.
  • We introduced how preference, as measured by sentiment, can be a good indicator of treatment effectiveness.
  • We also introduced a high-precision demographic extraction algorithm to broaden the cohort pool.
  • Personal health messages are scalable. MedHelp was used as our data source, but other forums can be aggregated to further broaden the cohort pool.
• The results from our algorithm were consistent with existing medical literature.
Conclusion
• Investigate signals that can be good indicators of effectiveness (depth).
  • Entity relation semantics extraction to analyze the relation between a treatment and its aspects (effectiveness, side effects, etc.)
  • A Shallow Information Extraction approach can be utilized to determine whether a subsection of forum text is about symptoms or treatments.
• Merge multiple sources to leverage a bigger cohort pool (breadth).
  • Other medical web forums, such as WebMD and HealthBoards.
  • Social networks and micro-blogs such as Facebook and Twitter, and other sources.
Future Works
Jason H.D. Cho1,4, Parikshit Sondhi1,4, Chengxiang Zhai1,4, Bruce R. Schatz1,2,3,4
* Slides Courtesy of Parikshit Sondhi
1Department of Computer Science, 2Institute of Genomic Biology, 3Department of Medical Information Science, 4University of Illinois at Urbana-Champaign, Urbana, IL
Resolving Healthcare Forum Posts via Similar Thread Retrieval
• Users may often want to conduct research by themselves.
  • They may be curious about what disease they have, or which medications they may take.
• Macro-tasks cannot take care of this, since they assume users already know what they want.
Case Retrieval Task
Query Characteristics
• Queries meant for human experts not automated systems
• Simple non-technical language
• Presence of emotional statements
Envisioned Response
The following threads discuss similar problems:
Doritos Allergy Very Severe and New
Certain Foods + Beer = Flushing and Head Pounding…Help!
Peanut/Food Allergies
……………………
Method Overview
• Baseline Weighing
  • First Post BM-25
  • Thread BM-25
• Semantic Weighing
  • Medical term extraction
  • Shallow Information Extraction
• Post Weighing
  • Monotonic Weighing
  • Parabolic Weighing
• Forum Category Weighing
  • Uniform Weighing (FCUW)
  • Feedback Weighing (FCFW)
Background (BKG) Neither PE nor MED
I am severly allergic to some product that is found in both Tostitos and Doritos, as well as random other types of chips. I know the solution is "don't eat chips" but what could the product be? I don't want to accidentally consume it. When I eat this, I get very bad stomach cramps and it ruins the rest of my day/night - the only solution is to go to sleep so I can't feel it. Help! Any ideas on this?
Shallow Information Extraction
Physical Examination (PE) Disease, Symptoms
Medication (MED) Treatment, Prevention
Sondhi, 2010
Medical Entity Extraction
• Applied ADEPT toolkit (MacLean and Heer 2013)
• High precision but low recall
Post Weighing
$f(1,3)\,c(w,p_1)$
$f(3,3)\,c(w,p_3)$
$f(i,K)$ : gives the weight of post $i$ in a thread with $K$ posts
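To make the post weight f(i, K) concrete, here is a hedged sketch of a position-weighted word count. The slides name monotonic and parabolic schemes but this transcript does not give their formulas, so both function shapes below (a linear decay, and a parabola that emphasises the first and last posts) are assumptions chosen only for illustration.

```python
def uniform(i, K):
    """Every post counts equally."""
    return 1.0


def monotonic(i, K):
    """Assumed shape: weight decays linearly from 1 (first post) to 1/K (last)."""
    return (K - i + 1) / K


def parabolic(i, K, floor=0.25):
    """Assumed shape: first and last posts weigh 1, the middle post weighs `floor`."""
    if K == 1:
        return 1.0
    x = (i - 1) / (K - 1)          # position rescaled to [0, 1]
    return floor + (1 - floor) * (2 * x - 1) ** 2


def weighted_count(word, posts, f):
    """Position-weighted count of `word` in a thread: sum_i f(i, K) * c(w, p_i)."""
    K = len(posts)
    return sum(
        f(i, K) * post.lower().split().count(word)
        for i, post in enumerate(posts, start=1)
    )


thread = ["rash rash after eating", "no idea", "rash went away"]
score = weighted_count("rash", thread, monotonic)
```

With three posts, the monotonic scheme gives weights 1, 2/3, and 1/3, so the two occurrences in the first post dominate the count.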
Post Weighing Methods Evaluation
[Figure: Accuracy (y-axis, 0.4 to 0.8) by Forum Used (FF, UF, LQ, Cross Forum) for Uniform, Monotonic, and Parabolic post weighing.]
• Relevance feedback based on top k retrieved documents.
  • Forum Category Uniform Weighing (FCUW) : Weighs top-k forum categories equally.
  • Forum Category Feedback Weighing (FCFW) : Weighs forum categories based on how frequently they appeared in the retrieved documents.
Forum Category Weighing
Randomly selecting forum ID
Ratio of current forum ID amongst retrieved documents
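A small sketch of the two category weighting ideas, under stated assumptions. FCUW is read here as equal weights over the most frequent categories among the retrieved threads, and FCFW as the ratio of each category among the retrieved threads. How these weights enter the final score is not recoverable from this transcript, so the multiplicative `rescore` rule and the `lam` parameter are purely illustrative.

```python
from collections import Counter


def fcuw_weights(retrieved_categories, top_categories=2):
    """FCUW sketch: equal weight for the most frequent categories in the feedback set."""
    top = [c for c, _ in Counter(retrieved_categories).most_common(top_categories)]
    return {c: 1.0 / len(top) for c in top}


def fcfw_weights(retrieved_categories):
    """FCFW sketch: weight = ratio of the category amongst retrieved documents."""
    counts = Counter(retrieved_categories)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def rescore(results, weights, lam=0.5):
    """Hypothetical combination rule: boost each score by its category weight.

    results: list of (thread_id, category, score) tuples.
    Returns the results sorted by the boosted score, highest first.
    """
    boosted = [
        (tid, cat, score * (1.0 + lam * weights.get(cat, 0.0)))
        for tid, cat, score in results
    ]
    return sorted(boosted, key=lambda r: r[2], reverse=True)


# Categories of the top retrieved threads (toy data).
feedback = ["allergies", "allergies", "allergies", "digestive", "digestive", "skin"]
uniform_w = fcuw_weights(feedback)
feedback_w = fcfw_weights(feedback)
reranked = rescore([("t1", "skin", 1.0), ("t2", "allergies", 0.9)], feedback_w)
```

In the toy run, the thread from the dominant category overtakes a slightly higher-scoring thread from a rare category.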
State of the Art Baseline
• Baseline BM-25 formula:
• c(w,t): Count of word w in thread t
• c(w,q): Count of word w in query q
• FPBM-25: Consider only the content of first post to represent the thread document
• TBM-25: Consider content of entire thread to represent the thread document
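The slide's formula is not reproduced in this transcript. As a stand-in, the sketch below uses the textbook Okapi BM25 form with a query-term-frequency component, which is consistent with the c(w,t) and c(w,q) counts defined above; the parameter values (k1, b, k3), the smoothed IDF variant, and the whitespace tokenizer are assumptions and may differ from what the authors used.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75, k3=1000.0):
    """Score each document against the query with textbook Okapi BM25.

    docs: non-empty list of non-empty strings, one per thread representation.
    """
    doc_tokens = [d.lower().split() for d in docs]
    n_docs = len(doc_tokens)
    avgdl = sum(len(t) for t in doc_tokens) / n_docs
    df = Counter(w for t in doc_tokens for w in set(t))   # document frequency
    q_counts = Counter(query.lower().split())              # c(w, q)

    scores = []
    for tokens in doc_tokens:
        counts = Counter(tokens)                           # c(w, t)
        norm = k1 * ((1 - b) + b * len(tokens) / avgdl)    # length normalisation
        score = 0.0
        for w, cq in q_counts.items():
            cd = counts.get(w, 0)
            if cd == 0:
                continue
            idf = math.log((n_docs - df[w] + 0.5) / (df[w] + 0.5) + 1.0)
            score += idf * ((k1 + 1) * cd / (norm + cd)) * ((k3 + 1) * cq / (k3 + cq))
        scores.append(score)
    return scores


# Each thread is a list of posts (toy data).
threads = [
    ["severe stomach cramps after eating chips", "try an allergy test"],
    ["knee pain when running", "rest and ice"],
]
query = "stomach cramps chips"
fp_scores = bm25_scores(query, [t[0] for t in threads])        # FPBM-25: first post only
t_scores = bm25_scores(query, [" ".join(t) for t in threads])  # TBM-25: whole thread
```

The only difference between the two baselines in this sketch is which text represents the thread document.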
ShallowEx: Relevance Scoring
Give higher importance to PE and MED sentences
Modified Query Count
Word count in PE sentences
Word count in MED sentences
Word count in BKG sentences
MedicalEx: Relevance Scoring
Count of occurrences labeled as med entity
Count of occurrences not labeled as med entity
Modified query frequency
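Both semantic scoring slides replace the plain query count c(w,q) with a weighted count. The sketch below shows one way this could look; the weight values are hypothetical, and the sentence labels (PE, MED, BKG) and medical-entity flags are taken as given inputs rather than produced by a classifier or toolkit.

```python
def shallow_ex_count(word, labelled_sentences, weights=None):
    """ShallowEx-style modified query count.

    labelled_sentences: list of (label, sentence) with label in {"PE", "MED", "BKG"}.
    weights: assumed values that give PE and MED sentences higher importance.
    """
    weights = weights or {"PE": 2.0, "MED": 2.0, "BKG": 1.0}
    return sum(
        weights[label] * sentence.lower().split().count(word)
        for label, sentence in labelled_sentences
    )


def medical_ex_count(word, tagged_tokens, med_weight=2.0, other_weight=1.0):
    """MedicalEx-style modified query frequency.

    tagged_tokens: list of (token, is_medical_entity) pairs for the query.
    Occurrences labeled as a medical entity get `med_weight` (assumed value).
    """
    return sum(
        med_weight if is_med else other_weight
        for token, is_med in tagged_tokens
        if token == word
    )


query_sentences = [
    ("BKG", "i do not want cramps again"),
    ("PE", "i get very bad stomach cramps"),
]
shallow = shallow_ex_count("cramps", query_sentences)

query_tokens = [("stomach", True), ("cramps", True), ("bad", False), ("cramps", False)]
medical = medical_ex_count("cramps", query_tokens)
```

Either modified count can then take the place of c(w,q) in the baseline scoring function.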
Forum Category Weighing Scoring
New Score
Forum Category Feedback Weighing
Weights for forum category weighing
Method Summary
• Baseline Weighing
  • First Post BM-25
  • Thread BM-25
• Semantic Weighing
  • Medical term extraction
  • Shallow Information Extraction
• Post Weighing
  • Monotonic Weighing
  • Parabolic Weighing
• Forum Category Weighing
  • Uniform Weighing (FCUW)
  • Feedback Weighing (FCFW)
Evaluation via Pooling
• 350K threads and 20 queries from HealthBoards
• 2 judges first judged 100 query-thread pairs
  • 88% agreement (κ=0.76)
• 730 total judged query-thread pairs
  • 324 relevant
  • 406 irrelevant
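For reference, the measures used in this evaluation (P@5, Recall@30, MAP, and Cohen's κ for judge agreement) can be computed as in the sketch below. These are the standard definitions; the function names and toy inputs are illustrative, and the authors' evaluation scripts may handle edge cases differently.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def cohen_kappa(labels_a, labels_b):
    """Agreement between two judges, corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


ranked = ["t1", "t2", "t3", "t4", "t5"]
relevant = {"t1", "t3", "t9"}
p5 = precision_at_k(ranked, relevant, 5)
r5 = recall_at_k(ranked, relevant, 5)
ap = average_precision(ranked, relevant)
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

MAP is the mean of `average_precision` over all queries. As a consistency check on the reported agreement, 88% observed agreement with 50% chance agreement gives κ = (0.88 - 0.5) / (1 - 0.5) = 0.76.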
Results: Semantic Methods
Run Method P@5 Recall@30 MAP
B1 Baseline TBM-25 0.3000 0.2846 0.1977
B2 Baseline FPBM-25 0.4700 0.4975 0.3316
S1 B2+MedEx 0.4600 0.4283 0.2918
S2 B2+ShallowEx 0.53 (12.7%) 0.4847 (-2.5%) 0.3481 (4.9%)
Shallow extraction is better than medical entity extraction
Results: Post Weighing
Run Method P@5 Recall@30 MAP
B2 Baseline FPBM-25 0.4700 0.4975 0.3316
P1 Monotonic 0.5100 (8.5%) 0.5240 (5.3%) 0.3631 (9.5%)
P2 Parabolic 0.5100 (8.5%) 0.5040 0.3494
Both post weighing schemes outperform the baseline
Results: Forum Category Weighing
Run Method P@5 Recall@30 MAP
B2 Baseline FPBM-25 0.4700 0.4975 0.3316
P1 Uniform Weighing 0.5200 (10.6%) 0.4678 (-7.0%) 0.3334 (0.5%)
P2 Feedback Weighing 0.5100 (8.5%) 0.4610 (-7.3%) 0.3389 (2.2%)
Uniform Weighing and Feedback Weighing perform similarly, but FCFW has fewer parameters to tune.
Results: Method Combinations
Run Method P@5 Recall@30 MAP
B2 Baseline FPBM-25 0.4700 0.4975 0.3316
S2 Baseline FPBM-25 + ShallowEx 0.53 0.4847 0.3481
C2 Monotonic + ShallowEx 0.5400 (14.9%) 0.5354 (7.6%) 0.3745 (12.9%)
C3 Parabolic + ShallowEx 0.5100 0.5155 0.3573
C4 Monotonic + ShallowEx + FCFW 0.5200 0.5625 (13.1%) 0.3702
Monotonic + ShallowEx performs the best
What we Learnt
• Fairly high P@5 accuracy is achievable
• Shallow information extraction is better for query understanding
• Utility of posts drops steadily with position
• Easy extension of baseline method
Conclusion
• It is possible to address health problems from both the macro and micro perspectives using health messages.
  • Macro : Comparative Effectiveness Research
  • Micro : Case retrieval task
• Health informatics is an emerging area: lots of work has been done, and lots remains to be done.
• Utilizing medical web forums
• Phones can be used to measure health as well!
  • Many fitness apps are out on the market.
  • Gait patterns are known to be indicative of health.
• If this line of work sounds interesting, please feel free to talk to me!
Future works
• It is possible that only a subset of demographics may be posting on medical forums.
  • For example, people who have severe sickness are less likely to post, and those who are more educated are more likely to use the web.
  • People who have had a negative experience with a treatment are more likely to post.
• However, these forums have no geographic limitations, while many randomized trials tend to be limited to a particular region, i.e., the hospitals that conducted the study.
• Furthermore, we expect these sampling biases to be evenly distributed across treatments.
• Finally, while not utilized in our approach, patients often post symptoms or diagnosis results in the thread. This allows us to filter by symptoms later on.
• People tend to post negative symptoms. These can be expected to be evenly spread out across treatments.
Demographics?
• Demographic Effectiveness Comparison : Generally hundreds of patients in the cohort for each treatment and each demographic group. Examples are for the older population.
  • For breast cancer : Radiation (739), Hormonal (525), Chemotherapy (770)
  • For heart disease : Anticoagulants (153), Device (166), Inhibitor (217), Blocker (414)
• We have only used one source, MedHelp. We can broaden this pool by
  • Aggregating multiple sources (WebMD, HealthBoards, disease-specific forums, or even micro-blogs such as Twitter).
  • Coming up with a treatment-agnostic supervised demographic inference algorithm, which can broaden the pool as well.
How big is the cohort pool?
• We used a top-down approach using various reliable sources (such as Mayo Clinic’s website and those from various government-sponsored agencies) to extract keywords.
• We also used a bottom-up approach that utilized the UMLS thesaurus to generate keywords.
  • MetaMap was initially used to extract treatments from forum threads.
  • These words were then queried against the MedlinePlus Connect API to determine whether they indeed belong to the treatment class.
Treatment Lists?
• J. H. D. Cho, V. Q. Liao, Y. Jiang, and B. Schatz. Aggregating Personal Health Messages for Scalable Comparative Effectiveness Research. ACM BCB, 2013.
• J. H. D. Cho, P. Sondhi, C. Zhai, and B. Schatz. Resolving Healthcare Forum Posts via Similar Thread Retrieval. WWW, 2014.
• K. Pattabiraman, P. Sondhi, and C. Zhai. Exploiting Forum Thread Structures to Improve Thread Clustering. ICTIR, 2013.
• P. Sondhi, M. Gupta, C. Zhai, and J. Hockenmaier. Shallow Information Extraction from Medical Forum Data. COLING, 2010.
• B. W. Chee, R. Berlin, and B. Schatz. Predicting Adverse Drug Events from Personal Health Messages. AMIA, 2011.
• D. L. MacLean and J. Heer. Identifying medical terms in patient-authored text: a crowdsourcing-based approach. Journal of the American Medical Informatics Association, pages amiajnl–2012–001110+, May 2013.
References
[Figure: Performance results for different feature sets. Percentage Accuracy (y-axis, 60 to 76) of Order-1 CRF and SVM by Feature Set: unigrams, +semantic, +position, +morphological, +wordcount, +threadcreator, +bigrams, Feat. Selection.]
We use the best performing SVM based classifier (Posts: 175, Sentences: 1494)
ShallowEx: Extraction Model
• We thank the anonymous reviewers for their insightful comments. This research was supported in part by Health Information Technology Center (HITC) Fellowship at the University of Illinois at Urbana-Champaign, and State Farm Doctoral Scholarship. We would also like to thank Sean Massung for helping the authors with the revision.
Acknowledgements