MatrixFlow: Temporal Network Visual Analytics to Track Symptom Evolution during Disease Progression
DESCRIPTION
Our paper was presented at AMIA 2012 in Chicago in November 2012.
Citation: Adam Perer, Jimeng Sun. MatrixFlow: Temporal Network Visual Analytics to Track Symptom Evolution during Disease Progression. American Medical Informatics Association Annual Symposium (AMIA 2012). Chicago, Illinois. (2012).
Objective: To develop a visual analytic system to help medical professionals improve disease diagnosis by providing insights for understanding disease progression.
Methods: We develop MatrixFlow, a visual analytic system that takes clinical event sequences of patients as input, constructs time-evolving networks and visualizes them as a temporal flow of matrices. MatrixFlow provides several interactive features for analysis: 1) one can sort the events by similarity in order to accentuate underlying cluster patterns among those events; 2) one can compare co-occurring events over time and across cohorts through an additional line graph visualization.
Results: MatrixFlow is applied to visualize heart failure (HF) symptom events extracted from a large cohort of HF cases and controls (n=50,625), which allows medical experts to reach insights involving temporal patterns and clusters of interest, and to compare cohorts in novel ways that may lead to improved disease diagnoses.
Conclusions: MatrixFlow is an interactive visual analytic system that allows users to quickly discover patterns in clinical event sequences. By unearthing hidden patterns and displaying them to medical experts, MatrixFlow empowers users to make decisions informed by historical patterns.
TRANSCRIPT
MatrixFlow
Temporal Network Visual Analytics to Track Symptom Evolution during Disease Progression
Adam Perer, Jimeng Sun
Healthcare Analytics Research Group, IBM T.J. Watson Research Center
Overview
•Visual Analytics for Data-Driven Healthcare
Text Mining
•To extract symptoms from clinical notes
Social Network Analysis
•To model symptom co-occurrence networks
Visualization
•To enable clinicians to reach insights
Challenge
•Difficult to Diagnose Diseases
•Co-morbidities may mask the presence of a disease.
•Physicians often use clinical knowledge without quantitative data from Electronic Medical Records.
•There are few analytical tools to extract meaningful insights.
Heart Failure
•Potentially fatal disease that affects 2% of adults in developed countries
•Difficult to manage
•No systematic diagnostic criteria
•Framingham HF Diagnosis based on
•2 major criteria, or
•1 major criterion and 2 minor criteria
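The decision rule on this slide can be written as a small predicate. This is a minimal sketch: the function name is hypothetical, and the major/minor grouping of the symptoms is an assumption taken from the commonly published Framingham list, since the slides do not say which criteria are major and which are minor.

```python
# Assumed grouping of the criteria named in this talk (commonly published
# Framingham list; the slides do not state the major/minor split).
MAJOR = {"rales", "radiographic cardiomegaly", "acute pulmonary edema",
         "hepatojugular reflex"}
MINOR = {"bilateral ankle edema", "nocturnal cough",
         "dyspnea on ordinary exertion", "hepatomegaly", "pleural effusion"}


def meets_framingham(symptoms):
    """Return True for 2 major criteria, or 1 major plus 2 minor criteria."""
    present = {s.lower() for s in symptoms}
    n_major = len(present & MAJOR)
    n_minor = len(present & MINOR)
    return n_major >= 2 or (n_major >= 1 and n_minor >= 2)
```

Names that are not in either set are simply ignored, so the function can be fed every symptom extracted for a patient.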
Heart designed by Catherine Please from The Noun Project
Symptoms
Framingham Criteria
Rales
Radiographic Cardiomegaly
Acute Pulmonary Edema
Hepatojugular Reflex
Bilateral Ankle Edema
Nocturnal Cough
Dyspnea on Ordinary Exertion
Hepatomegaly
Pleural Effusion
Population
•50,625 Patients (Geisinger Clinic PCPs)
•4,644 case patients
• 1,200 with preserved ejection fraction
• 1,615 with reduced ejection fraction
•45,981 control patients
Text Mining
•3.3 million Clinical Notes (4 GB of Text)
•NLP used to identify Framingham criteria
•900,000 positive
•3.6 million negative
•97% of HF cases met criteria extracted
•8% of controls met criteria extracted
screenshot of part of a clinical note.
Jimeng Sun, PhD
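The positive and negative counts on this slide imply that each mention of a criterion is labelled by polarity. The toy matcher below only illustrates that idea. It is not the NLP pipeline used in the study, and the cue list, function name, and sentence splitting are all assumptions.

```python
import re

# Hypothetical negation cues; a real clinical NLP pipeline would be far richer.
NEGATION_CUES = ("no", "denies", "without", "negative for")


def classify_mentions(note, criterion):
    """Label each sentence that mentions `criterion` as positive or negative."""
    labels = []
    for sentence in re.split(r"[.!?]\s*", note.lower()):
        idx = sentence.find(criterion.lower())
        if idx == -1:
            continue  # this sentence does not mention the criterion
        before = sentence[:idx]
        # A mention is negative if a cue word appears earlier in the sentence.
        negated = any(re.search(r"\b" + re.escape(cue) + r"\b", before)
                      for cue in NEGATION_CUES)
        labels.append("negative" if negated else "positive")
    return labels


note = "Patient denies nocturnal cough. Rales present bilaterally. No rales today."
rales_labels = classify_mentions(note, "rales")
cough_labels = classify_mentions(note, "nocturnal cough")
```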
Sequence
Ankle Edema
Medication 1
DO Exertion
Medication 2
DO Exertion
AP Edema
Medication 3
HF Diagnosis
Sequence
Year 1 Year 2 Year 3
Ankle Edema
Medication 1
DO Exertion
Medication 2
DO Exertion
AP Edema
Medication 3
HF Diagnosis
Network
Year 1 Year 2 Year 3
Ankle Edema
Medication 1
DO Exertion
Medication 2
DO Exertion
AP Edema
Medication 3
HF Diagnosis
Clustering
•Visualization reveals clusters of clinical events
•Hierarchical Clustering (Newman’s modularity matrix)
•Determines sort order for matrices
window, such as months or weeks.
Once MatrixFlow calculates the diagnosis alignment and the intervals are specified, the system models the clinical networks across all patients. The system extracts a list of all unique clinical events, and these events become the nodes of the network. It then computes, for each event pair, the number of distinct patients who exhibited both events during the same time interval. For instance, if 500 patients out of a population of 10,000 experienced Symptom1 and Symptom2 together in the same interval, then those two nodes would be connected by an edge with a weight of 500/10,000, or 5%. However, if zero patients experienced Symptom1 and Symptom3 in the same time interval, then there would be no edge connecting those two events. This co-occurrence network is computed using our network modeling framework, Orion [10]. Because the edges carry varying weights, our matrix visualization colors each cell according to a sequential color scale representing the edge weight (such as the color scale shown at the bottom of Figure 6).
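The edge-weight computation described above can be sketched in a few lines. This is an illustrative stand-in and not the Orion framework: the function name and the input layout (one set of events per patient for a single interval) are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(patient_events, n_patients):
    """Build edge weights for one time interval.

    patient_events maps a patient id to the set of events that patient
    exhibited in the interval. The weight of an edge is the fraction of the
    population that exhibited both events.
    """
    counts = defaultdict(int)
    for events in patient_events.values():
        # Sorting gives every pair one canonical (a, b) key.
        for a, b in combinations(sorted(set(events)), 2):
            counts[(a, b)] += 1  # each patient is counted once per pair
    return {pair: n / n_patients for pair, n in counts.items()}


interval = {
    "p1": {"Symptom1", "Symptom2"},
    "p2": {"Symptom1", "Symptom2", "Symptom3"},
    "p3": {"Symptom3"},
    "p4": set(),
}
weights = cooccurrence_network(interval, n_patients=4)
```

Pairs that never co-occur get no key at all, which mirrors the "no edge" case in the text.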
Visualizing Clinical Event Networks
When visualizing matrices, it is important to choose an effective node ordering that reveals as many patterns as possible [4]. As we wish to reveal clusters of clinical events, we employ a greedy hierarchical clustering that optimizes Newman’s modularity metric [17]. From this algorithm, we obtain a sort order that minimizes the distance among connected nodes by ordering the nodes according to the cluster tree produced by the hierarchical clustering. Figure 4 shows an example of a matrix visualization before and after sorting. In the sorted matrix, the well-connected nodes (APEdema, DOExertion, and Medication 3) assemble into a large box-like structure, which makes the cluster more apparent than in the unsorted version on the left.
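One way to derive such a sort order is to merge clusters greedily by modularity gain and read the node order off the resulting merge tree. The sketch below is a simple, unoptimized version of that idea, written under assumptions: the function name is hypothetical, edge weights are given as a dict keyed by node pairs, and it is not claimed to match the authors' implementation.

```python
from itertools import combinations


def modularity_order(nodes, weights):
    """Order nodes by the leaf order of a greedy modularity merge tree.

    weights maps an (a, b) node pair to an undirected edge weight.
    """
    total = sum(weights.values())
    if total == 0:
        return list(nodes)
    # Each cluster is kept as an ordered list of its member nodes.
    clusters = {i: [n] for i, n in enumerate(nodes)}
    home = {n: i for i, n in enumerate(nodes)}
    # a[i]: fraction of edge ends attached to cluster i.
    a = {i: 0.0 for i in clusters}
    # e[(i, j)] with i < j: half the fraction of edge weight between i and j.
    e = {}
    for (u, v), w in weights.items():
        i, j = sorted((home[u], home[v]))
        a[i] += w / (2 * total)
        a[j] += w / (2 * total)
        if i != j:
            e[(i, j)] = e.get((i, j), 0.0) + w / (2 * total)

    while len(clusters) > 1:
        # The modularity gain of a merge is 2 * (e_ij - a_i * a_j); the
        # constant factor does not change which pair is best.
        i, j = max(combinations(sorted(clusters), 2),
                   key=lambda p: e.get(p, 0.0) - a[p[0]] * a[p[1]])
        clusters[i] += clusters.pop(j)
        a[i] += a.pop(j)
        # Re-attach the edges of cluster j to cluster i.
        merged = {}
        for (x, y), w in e.items():
            x, y = (i if x == j else x), (i if y == j else y)
            if x != y:
                key = (min(x, y), max(x, y))
                merged[key] = merged.get(key, 0.0) + w
        e = merged
    return next(iter(clusters.values()))


nodes = ["APEdema", "AnkleEdema", "DOExertion", "Medication1", "Medication3"]
edges = {
    ("APEdema", "DOExertion"): 1.0,
    ("APEdema", "Medication3"): 1.0,
    ("DOExertion", "Medication3"): 1.0,
    ("AnkleEdema", "Medication1"): 1.0,
}
order = modularity_order(nodes, edges)
```

Merging continues until one cluster remains, even when the gain turns negative, so that every node receives a position. In this toy input the three densely connected events end up next to each other in `order`, which is what produces the box-like block in a sorted matrix.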
Figure 4. An illustrative example of revealing clusters by using hierarchical clustering. The matrix on the left is unsorted, whereas the matrix on the right is sorted to minimize distance among connected nodes in the clinical event network.
In addition to visualizing clinical event networks as matrices, one of MatrixFlow’s main features is the ability to compare networks as they change over time. As alluded to in Figure 3, MatrixFlow supports this by visualizing matrices side by side along a horizontal axis. These small multiples provide an overview of the disease progression over time. However, to make the temporal trends clearer, users can interact with the matrices to focus on a particular trend of interest. Moving the mouse over an edge cell presents a line graph that plots the evolution of the co-occurring events across all time intervals. For instance, in Figure 5, the user selected the co-occurrence of DOExertion and Medication 2, and the temporal line graph is visualized.
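The data behind such a line graph is just one edge's weight read out of every interval's network. A minimal sketch follows, assuming each interval's network is stored as a dict from event pairs to weights; the function name and the sample values are hypothetical.

```python
def edge_series(interval_networks, event_a, event_b):
    """Return the weight of one co-occurrence edge in every time interval.

    interval_networks is a list of dicts, one per interval in time order,
    each mapping an (a, b) event pair to its edge weight.
    """
    series = []
    for weights in interval_networks:
        # Accept either key order; a missing edge means no co-occurrence.
        w = weights.get((event_a, event_b),
                        weights.get((event_b, event_a), 0.0))
        series.append(w)
    return series


# Made-up weights for three consecutive intervals.
networks = [
    {("DOExertion", "Medication2"): 0.02},
    {},
    {("Medication2", "DOExertion"): 0.07},
]
trend = edge_series(networks, "DOExertion", "Medication2")
```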
Patient 1
Patient 2
Patient 3
Patient 4
January 2012 November 2012
Population
•Align by diagnosis date
•Aggregate patient event networks
•Split by user-chosen intervals
BA C Diagnosis
CB E Diagnosis
BA D Diagnosis
DC E Diagnosis
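The alignment and splitting steps on this slide can be sketched as follows. This is a hedged illustration: the function name, the tuple layout, the 365-day default, and the choice to drop events recorded after diagnosis are all assumptions and do not come from the talk.

```python
from collections import defaultdict
from datetime import date


def split_by_interval(events, diagnosis_dates, interval_days=365):
    """Group each patient's events into intervals counted back from diagnosis.

    events is a list of (patient_id, event_name, event_date) tuples.
    Interval 0 is the window ending on the diagnosis date, 1 the one before.
    Returns {interval_index: {patient_id: set(event_names)}}.
    """
    intervals = defaultdict(lambda: defaultdict(set))
    for patient, name, when in events:
        days_before = (diagnosis_dates[patient] - when).days
        if days_before < 0:
            continue  # assumption: ignore events recorded after diagnosis
        intervals[days_before // interval_days][patient].add(name)
    return intervals


diagnosis = {"p1": date(2012, 11, 1), "p2": date(2011, 6, 1)}
events = [
    ("p1", "A", date(2012, 1, 15)),
    ("p1", "B", date(2012, 10, 1)),
    ("p2", "B", date(2011, 5, 1)),
    ("p2", "C", date(2009, 7, 1)),
]
by_interval = split_by_interval(events, diagnosis)
```

Because every patient is measured relative to their own diagnosis date, patients diagnosed in different calendar years land in the same interval index. Each interval's per-patient event sets can then be aggregated into one network for the population.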
Cohorts
Positive Mentions of Framingham Symptoms
Cohorts
•Show temporal trend graphic (e.g. Figure 7 here, but with all 3 matrix
Evaluation
•4 Medical Scientists
•Cardiologist, ER, ER, Epidemiologist
• “Help clinicians make earlier diagnoses”
• “Help prioritize preventative strategies to avoid onset of HF”
Insights
• “Symptoms of HF are documented months to years preceding clinical diagnosis but are also present in non-HF patients.”
• “Rapid increase in HF symptoms are more prominent in cases rather than in matched controls.”
• “MatrixFlow can contribute to the further refinement of diagnostic criteria and may allow for earlier prediction of HF in a primary care population.”
MatrixFlow
Thanks!
[email protected]
www.research.ibm.com/healthcare
Adam Perer and Jimeng Sun
Healthcare Analytics Research Group
IBM T.J. Watson Research Center