Giving Raw Data a Chance to Talk: A Demonstration of Exploratory Visual Analytics with a Pediatric Research Database Using Microsoft Live Labs Pivot to Promote Cohort Discovery, Research, and Quality Assessment

by Teeradache Viangteeravat, PhD, and Naga Satya V. Rao Nagisetty, MS

Abstract

Secondary use of large and open data sets provides researchers with an opportunity to address high-impact questions that would otherwise be prohibitively expensive and time consuming to study. Despite the availability of data, generating hypotheses from huge data sets is often challenging, and the lack of complex analysis of data might lead to weak hypotheses. To overcome these issues and to assist researchers in building hypotheses from raw data, we are working on a visual and analytical platform called PRD Pivot. PRD Pivot is a de-identified pediatric research database designed to make secondary use of rich data sources, such as the electronic health record (EHR). The development of visual analytics using Microsoft Live Labs Pivot makes the process of data elaboration, information gathering, knowledge generation, and complex information exploration transparent to tool users and provides researchers with the ability to sort and filter by various criteria, which can lead to strong, novel hypotheses.

Keywords: clinical research; translational research; visual analytics; research data warehouse; medical informatics; biomedical informatics

Introduction

Using a clinical research database to facilitate potential cohort discovery and recruit patients for possible future studies is not a new concept, but forming hypotheses from data sets consisting of hundreds to thousands of variables and analyzing them in an intuitive way is a challenging and complex process. Visual analytics can facilitate the discourse between the user and the data by providing the opportunity for visual interaction with the data in a way that can support analytical reasoning and the exploration of data from multiple perspectives. Visual analytics not only permits users to detect expected events, such as those that might be predicted by models, but also helps users discover the unexpected: surprising anomalies, changes, patterns, and relationships that can then be examined and assessed to develop new insights.

In this article, we demonstrate exploratory analysis techniques using Microsoft Live Labs Pivot technology,1 a visual analytics tool that offers a fresh way to visually browse and arrange massive amounts of data (and images) online. As we show, it can be used to classify data by characteristics, such as in demographic, geographic, and neuroimaging classifications.

Perspectives in Health Information Management, Winter 2014

Literature Review

Visual analytics enhances the concept of information visualization and can be seen as an integrated
approach combining visualization, human factors, and data analysis.2 The goal of visual analytics is to permit people to draw conclusions that lead to better decisions by visually representing information in a way that allows direct interaction with the data and can provide new insights. The synergy among computation, visual representation, and interactive thinking supports intensive analysis by harnessing the human visual system to support information collection, organization, and analysis, that is, the process of making sense of information. Visual analytics is a multidisciplinary field3–5 that combines the methods and strengths of various research areas, including human-computer interaction, cognitive and perceptual science, decision science, information visualization, scientific visualization, geospatial visualization and analytics, databases, data mining and management, statistics, knowledge discovery and representation, and graphics and rendering. It takes advantage of humans’ ability to optically process large amounts of information at once, allowing them to apply analytical reasoning and assess, plan, and make decisions. The benefits of visual data exploration over automatic data mining techniques that use statistics or machine learning are as follows:

1. Visual analytics can easily deal with extremely heterogeneous and noisy data; it is intuitive and does not require understanding complex mathematical or statistical algorithms or parameters, and it is of great value when little is known about the data.

2. It can be used to analyze problems and find effective and efficient solutions that might elude either a machine or a human working alone.6

Geospatial visual analytics is a specialized subtype of visual analytics that supports spatial analysis and decision making through interactive visual interfaces, such as maps and other visual artifacts.7 Good online resources for learning about geospatial visual analytics include GeoAnalytics.net,8 Web GIS in Practice IX,9 and a tutorial provided by the Commission on GeoVisualization of the International Cartographic Association.10 A number of software applications and tools that can be useful in various geospatial visual analytics tasks are offered by the GeoVISTA Center at Pennsylvania State University.11 Examples of geospatial visual analytics applications for human health, surveillance and emergency management, and epidemiology can be found in a web-based data system for infectious disease surveillance and management that uses movable timelines and line-list querying, in addition to other tools for aggregating and stratifying data.12 Google Public Data Explorer is a powerful visualization tool for exploring, visualizing, and sharing data in a Gapminder-like manner.13, 41 Data sets from providers such as the World Bank and the US Centers for Disease Control and Prevention (CDC), including data sets that are directly related to human health, such as infectious disease outbreaks, sexually transmitted diseases, mortality, and cancer, are available to explore in Google Public Data Explorer.14

With the recent expansion of web technologies and increased network performance,15 it has become feasible to deliver massive image collections to translational researchers and clinician-scientists, who can analyze, interpret, and possibly even make diagnoses from these distributed, networked image collections. Given the recent advancements in web service technologies, a basic component to be considered when developing distributed image portals for viewing massive image collections is the ability to efficiently interact with and effectively search large amounts of data to answer multidimensional analytical queries, along with the ability to augment the data with pertinent experiential knowledge. Several centers and the National Institutes of Health (NIH) have invested heavily in individual image-data storage and retrieval systems. The Biomedical Informatics Research Network (BIRN) system extracts or retrieves and then transmits images from a source,16 while the cancer Biomedical Informatics Grid (caBIG) managed oncology and radiology images from multiple sources through its web servers.17 Notable projects using the National Biomedical Imaging Archive (NBIA) database are the Reference Image Database to Evaluate Response (RIDER),18 the Lung Image Database Consortium (LIDC),19 and the Virtual Colonoscopy Collection.20 The NBIA is an online image repository tool that aims to improve the use of imaging to increase the efficiency of cancer detection, diagnosis, and therapeutic response, and to improve clinical decision support.21 Waxholm Space is a conceptual and physical atlas space developed by the International Neuroinformatics Coordinating Facility to serve as a framework for registering and spatially relating neuroanatomical and physiological data, as well as to facilitate data sharing in neuroscience. Currently, researchers can use Waxholm Space to query the spatial location of their own images and retrieve structure names, gene expression, and other data associated with the user-defined point of interest in resources such as the Allen Brain Atlas.22 The Allen Brain Atlas, a genome-wide image database collection, uses an interactive, web-based platform to present a comprehensive online resource for the exploration of mouse and human brain research.23 BrainMaps is an NIH-funded project developed to serve as an online, interactive digital atlas of massive, high-resolution scanned brain structure images for research and didactic purposes.24 The Mouse Brain Library and the WebQTL databases provide huge collections of mouse brain structure data for studies of function behavior and genetic control.25, 26

Here, we demonstrate the addition of a visual analytics layer called PRD Pivot to our clinical research database using Microsoft Live Labs Pivot technology,27 a free tool that offers a novel way to examine and arrange huge amounts of clinical data online. This added layer enables data visualization and the ability to drill down (moving from summary to detailed data) by filtering and sorting information in electronic databases, leading to the discovery of patterns and relationships that would otherwise not be apparent. The visual analytics layer serves as a research tool for users to explore a massive amount of clinical data at the characteristics level (e.g., demographics) and at observational levels (e.g., number of admissions, medication orders, type of admission or readmission, high-resolution clinical images, and time-frequency signal analyses). In addition, PRD Pivot has been designed to interoperate with the existing Informatics for Integrating Biology and the Bedside (i2b2) framework; thus, a star schema28 has been used for the database structure of PRD Pivot. As a result, future transport of metadata and data sets between these systems will not require the duplicate effort of writing a new extract-transform-load (ETL) process, a hurdle that has been encountered by others.29, 30

PRD Pivot Collection Requirements

We deployed PRD Pivot on a Linux CentOS server with dual-core 2.26-GHz Xeon processors running the Apache web server. The minimum requirements needed to run Pivot collections are as follows:

• Collection.cxml: The collection extensible markup language (CXML) file consists of a set of rules to describe structured data to be displayed in the PRD Pivot collection. The CXML file contains the set of categories and the type associated with each category. The types are String, LongString, Number, DateTime, and Link, which describe the majority of the information associated with the individual images in the collection.

• Collection.xml: This extensible markup language (XML) file contains the unique set of identifications (IDs) and image sizes (width and height) that are assigned to an individual image along with zoom levels of information. Collection.xml is automatically created when the “deep zoom” function in Python is run to subdivide images into various zoom levels. The Python deep zoom function can be downloaded at the Python website.31

• Python Imaging Library (PIL):32 PIL provides image processing functionality and supports many file formats. PIL works in conjunction with Python Deep Zoom Tools for Pivot collections.

• Python Deep Zoom Tools: The deep zoom tool33 version 0.1.0 was adopted to subdivide images into the various zoom levels described in Collection.xml.

• Collection files: These files contain the image collections in various zoom levels indicated as dzi format (the deep zoom file format obtained from Python Deep Zoom Tools).

• MNE with Python:34 This Python toolbox provides PRD Pivot with the ability to perform magnetoencephalography (MEG) and electroencephalography (EEG) data analysis and visualization.
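The zoom levels recorded in Collection.xml follow the Deep Zoom image pyramid, in which each level halves the resolution of the level above it. The sketch below is not the Python Deep Zoom Tools code; it only illustrates the level arithmetic. The function name, the 256-pixel tile size, and the example image size are assumptions chosen for illustration.

```python
def deep_zoom_levels(width, height, tile_size=256):
    """Return (level, width, height, tile_count) for each pyramid level.

    Level 0 is a 1 x 1 image; the highest level is the full-resolution image.
    The tile size of 256 is an illustrative assumption, not a PRD Pivot setting.
    """
    # Highest level index = ceil(log2(longest side)), computed with integers.
    max_level = (max(width, height) - 1).bit_length()
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)          # downsampling factor at this level
        w = max(1, (width + scale - 1) // scale)  # ceiling division
        h = max(1, (height + scale - 1) // scale)
        tiles_across = (w + tile_size - 1) // tile_size
        tiles_down = (h + tile_size - 1) // tile_size
        levels.append((level, w, h, tiles_across * tiles_down))
    return levels


if __name__ == "__main__":
    # Hypothetical 1024 x 768 source image.
    for level, w, h, tiles in deep_zoom_levels(1024, 768):
        print(f"level {level:2d}: {w:4d} x {h:4d} px, {tiles} tile(s)")
```

Because every level is precomputed, a viewer only needs to fetch the tiles for the level and region currently on screen, which is what makes browsing very large images over the web practical.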

Pivot collections can be created by either automated or manual processes. A demonstration of the automated generation of Pivot collections is available.35 Automated generation would enable investigators of clinical research projects to easily and quickly analyze massive neurological images and very high-resolution clinical images.

Development of the PRD Pivot System: Metadata

Data sources include files marked for inclusion in the Pediatric Health Information System that are
extracted from electronic health records (EHRs). The Pediatric Health Information System is an administrative database containing hospital data from 43 leading North American children’s hospitals. We initially built the database to accommodate the data fields from the “Clinician” and “Physician” files that would be the most useful to clinicians conducting research. Appendix A lists variables suggested for the initial phase of the database. Each file contains data for the encounters discharged during a calendar quarter. Data are available from these sources back through 2009.

The database uses All Patient Refined Diagnosis Related Groups (APR-DRGs), which are normed on a pediatric patient population. APR-DRGs, which are a proprietary grouping methodology developed in a joint effort between 3M Health Information Systems and the National Association of Children’s Hospitals and Related Institutions, provide the most comprehensive and complete classification of any severity-of-illness system for pediatric patients.36 Among the 316 APR-DRGs, common APR-DRG codes include 138 (Bronchiolitis/RSV pneumonia), 141 (Asthma), 160 (Major repair of heart anomaly), 225 (Appendectomy), 420 (Diabetes), 440 (Kidney transplant), 662 (Sickle cell anemia crisis), and 758 (Childhood behavioral disorder). Each group has four severity levels of illnesses and four mortality risk levels, whereas the standard DRGs and the Medicaid Severity Diagnosis Related Groups (MS-DRGs) have only a single severity and mortality risk level per group. As an example of the use of APR-DRGs, there are multiple diagnosis codes for asthma, and an encounter might have asthma as the principal diagnosis or a secondary diagnosis. If the encounter was primarily for asthma treatment, the APR-DRG code will be 141. All asthma encounters will be assigned the same APR-DRG code. In our EMR (Electronic Medical Record) system, we code inpatient encounters to APR-DRGs as well as diagnosis related groups (DRGs).

Data are available in PRD Pivot back through 2009, including emergency room, ambulatory surgery, and observation encounters. In this article, we demonstrate analysis of data from all patient visits in 2012. The total number of observations includes 92,175 admissions and 59,868 distinct patient records. Among all patients, 53.5 percent (32,037) were male, 22.8 percent (13,656) were white, and 67.5 percent (40,383) were black or African American. Each source file is put into a “loading zone,” which is a secure database server (MySQL) located behind the University of Tennessee Health Science Center (UTHSC) firewall and protected with an authentication mechanism. To prevent data type failure, all of the columns in the “loading zone” table are defined as variable characters. After all of the flat files are successfully loaded, the loading table is converted into a table with the same structure, but with valid data types. The information is aggregated and cleaned before a customized ETL process is executed to place data into the PRD Pivot. The ETL process restructures the de-identified data into the format required by PRD Pivot. The PRD Pivot programmer examines all quality assurance, usability, and feasibility needs in a highly secured staging environment before pushing the database forward to a production environment. The production database will be able to be queried through a web-based interface with appropriate controlled access and login authentication.
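The two-step load described above (stage everything as text, then convert to valid data types) can be sketched as follows. The production system uses MySQL; this illustration substitutes Python's built-in sqlite3 module so that it is self-contained, and the table names, column names, and sample rows are hypothetical.

```python
import datetime
import sqlite3


def load_and_convert(rows):
    """Stage raw rows as text, then copy rows that parse cleanly into a typed table.

    rows: iterable of (patient_id, admit_date, charges) tuples of strings.
    Returns (connection, number_of_rejected_rows).
    """
    conn = sqlite3.connect(":memory:")

    # Step 1: the "loading zone" accepts any value because every column is text.
    conn.execute(
        "CREATE TABLE loading_zone (patient_id TEXT, admit_date TEXT, charges TEXT)"
    )
    conn.executemany("INSERT INTO loading_zone VALUES (?, ?, ?)", rows)

    # Step 2: a table with the same structure but with the intended data types.
    conn.execute(
        "CREATE TABLE encounters (patient_id INTEGER, admit_date TEXT, charges REAL)"
    )
    rejected = 0
    staged = conn.execute("SELECT * FROM loading_zone").fetchall()
    for pid, admit, charges in staged:
        try:
            datetime.date.fromisoformat(admit)  # validate the date format
            conn.execute(
                "INSERT INTO encounters VALUES (?, ?, ?)",
                (int(pid), admit, float(charges)),
            )
        except ValueError:
            rejected += 1  # a real pipeline would log the row for cleaning
    return conn, rejected


if __name__ == "__main__":
    conn, rejected = load_and_convert(
        [("1", "2012-01-15", "1200.50"), ("2", "not-a-date", "300")]
    )
    print("rejected rows:", rejected)
```

Staging as text means a single malformed value cannot abort the bulk load; type problems surface in the conversion step, where they can be counted and cleaned.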

De-identification of Data in the PRD Pivot System

Only de-identified data are available from PRD Pivot to researchers. We maintain the identified data in a separate server room with very limited access to meet Health Insurance Portability and Accountability Act (HIPAA) privacy and security standards. Individual-level data are not accessible without appropriate Institutional Review Board (IRB) approval. We guard against potential re-identification attempts by following a best-practice de-identification process. A series of inclusion and exclusion rules (see Table 1) were developed according to HIPAA specifications to ensure that patient de-identification was sufficient to protect confidentiality. The ETL process enforces these rules. An auditing component embedded in the ETL process provides the ability to review and monitor the status of the PRD Pivot database’s compliance. All patients’ medical record numbers are replaced by an arbitrarily generated sequence number to prevent re-identification. The crosswalk between the medical record numbers and the newly assigned identification value is retained on a separate secure server. This link is accessible only to authorized users in the Children’s Foundation Research Institute Biomedical Informatics Core. All data that could potentially identify an individual are shifted such that an individual cannot be identified (the degree of shifting varies from patient to patient). To further protect against patient re-identification, all queries involving fewer than five patients will generate the response “less than 5 patients” without providing any information.
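Two of the safeguards described above, replacing medical record numbers with arbitrary sequence numbers and masking small query results, can be sketched as follows. This is a minimal illustration and not the PRD Pivot ETL code; the function names, the shuffling approach, and the sample record numbers are assumptions.

```python
import random


def pseudonymize(mrns, seed=None):
    """Map each medical record number (MRN) to an arbitrary sequence number.

    Returns the crosswalk dict, which in practice would be stored on a
    separate secure server, apart from the de-identified data.
    """
    unique = sorted(set(mrns))
    rng = random.Random(seed)
    rng.shuffle(unique)  # so sequence numbers do not follow MRN order
    return {mrn: seq for seq, mrn in enumerate(unique, start=1)}


def suppressed_count(n, threshold=5):
    """Return a patient count, masking results below the threshold."""
    if n < threshold:
        return f"less than {threshold} patients"
    return n


if __name__ == "__main__":
    # Hypothetical record numbers.
    crosswalk = pseudonymize(["A17", "B22", "A17", "C03"])
    print(len(crosswalk), "distinct patients")
    print(suppressed_count(3))
    print(suppressed_count(42))
```

Keeping the crosswalk out of the research database means that the sequence numbers alone carry no information about the original record numbers.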

PRD Pivot System Security and Information Technology Infrastructure

The data are stored on a MySQL database37 server, housed in a HIPAA-class room and protected by
the UTHSC firewall. Access to the server room is protected by the use of key cards and personal security passcodes. The server uses a RAID configuration on the hard drives to ensure that one failed drive will not cause any data loss. Spare drives are available so that a drive can be replaced immediately in the event of a drive failure. All data are moved to a backup server on a nightly basis. To mitigate data loss in the event of a disaster, the UTHSC disaster recovery plan and/or business continuity plan will be followed. The de-identified database is only available inside the UTHSC virtual private network (VPN). The VPN encrypts all traffic between the firewall and the user’s machine. All communication with the database server is controlled by the firewall because the server resides on a private segment of the UTHSC network. The firewall has an active intrusion detection and protection feature. A user is assigned a unique username and password, which are required for authentication. The credentials are issued after the recipient has provided proper identification and certifications of IRB and HIPAA training. In addition, each user is required to sign a confidentiality agreement prior to accessing the data. Each user must sign a System Rules of Behavior document, including expectations for genuine user-individual identification and password security, prior to being granted PRD Pivot access.

Results

Figure 1, Figure 2, and Figure 3 demonstrate the potential patient counts and visual view of patients
who had primary diagnoses of asthma conditions and were readmitted to the hospital within 30 days of discharge. Simple pie charts provide a brief summary of the patient characteristic classifications, such as gender, race, ethnicity, and patient type, for the data retrieved using the PRD Pivot search. Among all patient admission types, 71.4 percent (2,453 admissions) were emergency room, 15.1 percent (518 admissions) were inpatient, 13.5 percent (463 admissions) were observation, and 0.1 percent (2 admissions) were ambulatory surgery. Figure 3 shows a tree view of possible asthma cases, in which black or African American patients demonstrate a higher risk of being readmitted to the hospital than patients of other races. For single patients (by patient ID), we calculate the time interval between the last discharge date and the next admission date using the “Patient Type” status (0 = inpatient, 1 = emergency room, 2 = ambulatory surgery, 3 = observation) for specific readmission types (inpatient to inpatient, inpatient to emergency room, etc.). Table 2 shows examples of how readmission intervals are calculated. This information would be useful to support the evaluation of potential causes for hospital readmission, including reducing unnecessary readmissions. Among these patients, a possible study may be conducted on the accessibility of allergen immunotherapy and asthma outcomes in the African American population (a patient-centered outcome research interest).
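The readmission interval calculation described above can be sketched as follows. The patient type codes come from the text; the function name, the tuple layout, the inclusive 30-day cutoff, and the sample encounters are assumptions made for illustration, and this is not the code behind Table 2.

```python
from datetime import date

# Patient type codes as given in the text.
PATIENT_TYPES = {
    0: "inpatient",
    1: "emergency room",
    2: "ambulatory surgery",
    3: "observation",
}


def readmission_intervals(encounters, window_days=30):
    """Find readmissions within window_days of the previous discharge.

    encounters: list of (patient_id, patient_type, admit_date, discharge_date).
    Returns a list of (patient_id, "type to type" label, interval_in_days).
    """
    by_patient = {}
    for enc in encounters:
        by_patient.setdefault(enc[0], []).append(enc)

    results = []
    for pid, visits in by_patient.items():
        visits.sort(key=lambda e: e[2])  # chronological by admission date
        for prev, nxt in zip(visits, visits[1:]):
            # Interval from the last discharge to the next admission.
            days = (nxt[2] - prev[3]).days
            if 0 <= days <= window_days:
                label = f"{PATIENT_TYPES[prev[1]]} to {PATIENT_TYPES[nxt[1]]}"
                results.append((pid, label, days))
    return results


if __name__ == "__main__":
    # Hypothetical encounters for two de-identified patients.
    sample = [
        (1, 0, date(2012, 1, 1), date(2012, 1, 5)),
        (1, 1, date(2012, 1, 20), date(2012, 1, 20)),
        (2, 3, date(2012, 2, 1), date(2012, 2, 2)),
    ]
    for row in readmission_intervals(sample):
        print(row)
```

Labeling each interval by its pair of patient types makes it possible to count specific readmission pathways, such as inpatient to emergency room, separately.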

Figure 4, Figure 5, and Figure 6 show additional applications of the PRD Pivot system for analysis of the patient data. Figure 4 shows the connection between symptoms of asthma and symptoms of bronchitis (ICD-9-CM diagnosis code 466.0). When the two conditions coexist, bronchitis can cause asthma symptoms to worsen (cause an asthma attack). Figure 5 shows a spatial analysis using geocoding. Geocoding is the process of finding associated geographic coordinates from geographic data, such as addresses or zip codes. With the integration of web service technology, geographic data sets (five-digit zip codes) stored in PRD Pivot can be securely mapped into a geographic information system (GIS) for an analysis of health disparities. Figure 6 demonstrates the ability to track the 10th most common ICD-9-CM diagnosis, Acute upper respiratory infections of unspecified site (465.9). This feature could be used to promote efficient resource management during peak seasons.
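Tracking a single diagnosis over time, as in Figure 6, amounts to counting encounters per period for one ICD-9-CM code. A minimal sketch under that assumption follows; the function name, the monthly granularity, and the sample records are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_counts(encounters, icd9_code):
    """Count encounters per (year, month) for one ICD-9-CM diagnosis code.

    encounters: iterable of (admit_date, icd9_code) pairs.
    Returns a dict ordered chronologically.
    """
    counts = Counter(
        (admit.year, admit.month)
        for admit, code in encounters
        if code == icd9_code
    )
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical encounters; 465.9 is the code discussed in the text.
    sample = [
        (date(2012, 1, 3), "465.9"),
        (date(2012, 1, 20), "465.9"),
        (date(2012, 2, 1), "466.0"),
        (date(2012, 12, 5), "465.9"),
    ]
    for (year, month), n in monthly_counts(sample, "465.9").items():
        print(f"{year}-{month:02d}: {n}")
```

Months with unusually high counts would indicate the peak seasons for which resources need to be planned.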

Discussion

The results presented in this article should not be taken as an accurate representation of our patient data because the results do not include all the data records. These results are meant to demonstrate the potential of the PRD Pivot database and the feasibility of a full-scale data exploration tool using visual analytics like Pivot, an idea introduced by Gary Flake at a TED conference.38 Visual analytics (in which a user asks a question via an interactive visual interface, a data query is performed, and the resulting data are transformed into a visual presentation) takes the process of data analysis many steps further than conventional visualizations (in which data are transformed into a presentation) by focusing on what a researcher wants to know, rather than merely on what data are available.39

In addition to the visual analytics layer, we still have much work to do to enhance the PRD Pivot metadata. Using ICD-9-CM codes for cohort identification has limitations because these codes are generally used for billing purposes and not for clinical research. We plan to add a feature that will allow users to use a text-mining technique to query free-text fields (physician and clinician notes). Additional sociodemographic variables such as income, type of insurance, comorbidity covariates, severity and mortality risk levels from APR-DRGs, and financial charges will be added to the PRD Pivot database to support the evaluation of potential causes of readmission and the development of prediction models.

Future Functionality

The visual analytics layer gives the user a dynamic ability to analyze the data items and also serves as a visualization tool. However, the secondary analytic and statistical functions that would typically be used to test hypotheses have yet to be integrated into PRD Pivot; once added, they will support dynamic output of group statistics, including tests for differences among groups such as ANOVA and t-tests. In addition, test runs of the HTML5 (reference 40) version of PRD Pivot on tablets with iOS (iPad) and Android operating systems have begun. Further studies in clinical research at the Department of Pediatrics and UTHSC will be evaluated to guide the future expansion of PRD Pivot. We plan to integrate image data of various modalities, such as radiology and neurology images, into PRD Pivot. Figure 7 and Figure 8 demonstrate the Pivot collections for neurological images. Figure 7 shows the forest view of human brain atlas images. The user can view magnetic resonance imaging (MRI) sections of a living human brain in coronal, horizontal, and sagittal views. Figure 8 demonstrates the development of automated brain segmentation using a structured hierarchical clustering method. This example illustrates the possibility of building a broad collection of adult or child brain MRI scans or other medical imaging with a content-based image retrieval technique that would allow retrieval of relevant or similar case studies, thereby enabling comparative study of similar cases (autism, brain damage, abnormalities, etc.) that would otherwise be too complex and time consuming for researchers to perform. Access to a broad range of child brain images will provide important information regarding early brain development, which may eventually result in the development of early behavioral, cognitive, and pharmaceutical treatments to improve outcomes for children with brain abnormalities.

PRD Pivot can also compute independent component analysis (ICA) on raw MEG data using source analysis (see Figure 9 and Figure 10). We demonstrate the use of 64 principal component analysis (PCA) components supplied to ICA plus additional PCA components up to rank 64 of the MEG data. The source matching the electrocardiography and electro-oculography is found automatically and displayed. MEG is a functional neuroimaging technique for mapping brain activity. Clinical uses of MEG include detecting and localizing pathological activity in patients with epilepsy. The MEG technique can be used to localize the eloquent cortex for surgical planning in patients with brain tumors. This application of the PRD Pivot database would conveniently combine clinical text and image data in one centralized location.

Evaluation of the PRD Pivot System and Outcome Measures

Pilot users recruited from the Department of Pediatrics and other UTHSC departments’ pool of faculty investigators and researchers participated in PRD Pivot training and completed a survey regarding their experience. Figure 11 shows the average results for each survey question. Overall, users were satisfied with the system (average answer of 5 on a seven-point scale ranging from “strongly disagree” to “strongly agree”). Respondents identified the most positive aspects as the visual representations, the fact that the grouping abilities are easy to use once one gets used to it, and the fact that the graphs are helpful. The most negative aspect identified was that not enough clinical data were available (no vital signs or lab results, only what was ordered in terms of medications, radiology, etc.). The most negative aspects will be addressed in future versions of the PRD Pivot system. Indeed, one of the goals is to evaluate whether the uses of PRD Pivot support retrospective studies using de-identified data, generation of study hypotheses, and prospective decision-making in clinical care. To further assess PRD Pivot, we are conducting posttest assessments to evaluate the average number of publications, average impact score, usability and feasibility, and number of clinical studies related to the database. The results will be compared to pretest information that we collected and will be presented in future studies.

Conclusions

Development of a clinical research database that can handle a massive amount of information in one centralized database (data warehouse) was challenging. Although such a database is not a new concept, making it intuitive for physicians and researchers to use and extracting meaningful information with a powerful informatics tool is a rewarding task. We strongly believe that using visual analytical techniques will promote a solid foundation for clinical research studies, facilitate the ability to link clinical diagnosis data to treatment outcomes, and help to support clinical decisions. From the usability side, visual analytics allows users to interact directly with the underlying data to gain insights, draw conclusions, and ultimately make better decisions. This study of the PRD Pivot database demonstrated the first stage of validating the rich content within the recorded data that are extracted from EHRs. The integration of Microsoft Live Labs Pivot, an innovative web service architecture that provides a unique alternative means of viewing massive clinical and neurological imaging information, into the PRD database allows users to discover recognizable patterns, thus offering the potential for exploring new ideas that can be used to test research hypotheses and promote translational research. Successful implementation of the PRD Pivot database would greatly advance progress toward the short-term goal of increasing the number of clinical studies and the long-term goals of reducing readmission rates and decreasing the physical, emotional, and financial costs of disease management in children.


8 Perspectives in Health Information Management, Winter 2014

Acknowledgments

The authors thank the University of Tennessee Health Science Center (UTHSC) Department of Information Technology Services Computing Systems division and the UTHSC Office of Biomedical Informatics for the use of informatics resources and collaborations. The authors gratefully acknowledge Rae Shell and Grady Wade for proofreading and providing helpful comments. The authors would also like to thank the Michigan State University Brain Biodiversity Bank for permission to use the human brain atlas images, reproduced (or adapted) with permission from http://www.brains.rad.msu.edu and http://brainmuseum.org, supported by the US National Science Foundation. Finally, this work could not have been accomplished without the clinical expertise and support of our colleague, Eunice Huang. This work was supported by the Children’s Foundation Research Institute (CFRI).

Teeradache Viangteeravat, PhD, is the technical director of the Biomedical Informatics Core at the Children’s Foundation Research Institute and assistant professor of biomedical informatics in the Department of Pediatrics at the University of Tennessee Health Science Center in Memphis, TN.

Naga Satya V. Rao Nagisetty, MS, is a bioinformatics research specialist at the Children’s Foundation Research Institute at Le Bonheur Children’s Hospital in Memphis, TN.


Notes

1. Flake, G. “Is Pivot a Turning Point for Web Exploration?” TED.com video. February/March 2010. Available at http://www.ted.com/talks/gary_flake_is_pivot_a_turning_point_for_web_exploration.html.

2. Keim, D., G. Robertson, J. Thomas, and J. van Wijk. “Guest Editorial: Special Section on Visual Analytics.” IEEE Transactions on Visualization and Computer Graphics 12, no. 6 (2006): 1361–62.

3. Keim, D., G. Andrienko, J. Fekete, G. Carsten, J. Kohlhammer, and G. Melancon. “Visual Analytics: Definition, Process, and Challenges.” In A. Kerren, J. T. Stasko, J.-D. Fekete, and C. North (Editors), Information Visualization: Human-centered Issues and Perspectives. Konstanz, Germany: Springer-Verlag, 2008, 154–75.

4. Keim, D. A., F. Mansmann, J. Thomas, and H. Ziegler. “Visual Analytics: Scope and Challenges.” In S. J. Simoff, M. H. Böhlen, and A. Mazeika (Editors), Visual Data Mining: Theory, Techniques and Tools for Visual Analytics. Berlin Heidelberg: Springer-Verlag, 2008, 76–90.

5. Thomas, J. J., and K. A. Cook (Editors). Illuminating the Path: Research and Development Agenda for Visual Analytics. Pacific Northwest National Laboratory, Richland Washington, USA: IEEE Press, 2005.

6. Keim, D. “Information Visualization and Visual Data Mining.” IEEE Transactions on Visualization and Computer Graphics 8, no. 1 (2002): 1–8.

7. De Amicis, R., R. Stojanovic, and G. Conti (Editors). GeoSpatial Visual Analytics: Geographical Information Processing and Visual Analytics for Environmental Security. Dordrecht, The Netherlands: Springer, 2009.

8. Web Portal for GeoSpatial Visual Analytics. Available at http://geoanalytics.net/.

9. Kamel Boulos, M., T. Viangteeravat, M. Anyanwu, V. Nagisetty, and E. Kuscu. “Web GIS in Practice IX: A Demonstration of Geospatial Visual Analytics Using Microsoft Live Labs Pivot Technology and WHO Mortality Data.” International Journal of Health Geographics 10 (2011): 19.

10. Andrienko, G., and N. Andrienko. Geospatial Visual Analytics Tutorial. 2007 (updated 2009). Available at http://www.peer.eu/fileadmin/user_upload/opportunities/metier/course4/c4_visual_analytics_geospatial.pdf.

11. “GeoVISTA Software.” Penn State GeoVISTA Center. Available at http://www.geovista.psu.edu/software/index.html.

12. Guo, D. “Visual Analytics of Spatial Interaction Patterns for Pandemic Decision Support.” International Journal of Geographical Information Science 21, no. 8 (2007): 859–77. doi:10.1080/13658810701349037.

13. Google Public Data Explorer. Available at http://www.google.com/publicdata/home.

14. Google Public Data Explorer Dataset Directory. Available at http://www.google.com/publicdata/directory#!st=DATASET&q=cancer.

15. CERN. “World Wide Web@20.” 2009. http://info.cern.ch/hypertext/WWW/TheProject.html (accessed July 12, 2013).

16. Keator, D., J. Grethe, D. Marcus, B. Ozyurt, S. Gadde, S. Murphy, S. Pieper, D. Greve, R. Notestine, H. Bockholt, P. Papadopoulos, et al. “A National Human Neuroimaging Collaboratory Enabled by the Biomedical Informatics Research Network (BIRN).” IEEE Transactions on Information Technology in Biomedicine 12, no. 2 (2008): 162–72.

17. National Cancer Institute. “Cancer Biomedical Informatics Grid caBIG®.” 2009. https://cabig.nci.nih.gov/ (accessed July 12, 2013).

18. National Cancer Institute. “RIDER [Reference Image Database to Evaluate Therapy Response].” Available at https://wiki.nci.nih.gov/display/CIP/RIDER (accessed July 12, 2013).

19. Clarke, L. P., B. Y. Croft, E. Staab, H. Baker, and D. C. Sullivan. “National Cancer Institute Initiative: Lung Image Database Resource for Imaging Research.” Academic Radiology 8, no. 5 (2001): 447–50. Abstract at http://www.ncbi.nlm.nih.gov/pubmed/11345275 (accessed July 12, 2013).


20. Slomka, P., E. Elliott, and A. Driedger. “Java-based Remote Viewing and Processing of Nuclear Medicine Images: Towards ‘The Imaging Department Without Walls.’” The Journal of Nuclear Medicine 41 (2000): 111–18.

21. National Cancer Institute. “National Biomedical Imaging Archive.” Available at http://ncia.nci.nih.gov/ (accessed July 12, 2013).

22. INCF. “International Neuroinformatics Coordinating Facility.” Available at http://www.incf.org/ (accessed July 12, 2013).

23. Allen Institute for Brain Science. Available at http://www.alleninstitute.org/ (accessed June 12, 2013).

24. Mikula, S., J. M. Stone, and E. G. Jones. “BrainMaps.org—Interactive High-Resolution Digital Brain Atlases and Virtual Microscopy.” Brains, Minds, and Media (2008).

25. Rosen, G. P., A. Williams, J. A. Capra, M. T. Connolly, B. L. L. Cruz, D. Airey, K. Kulkarni, and R. W. Williams. “The Mouse Brain Library @ http://www.mbl.org.” International Mouse Genome Conference 14, no. 166 (2000).

26. Williams, R. W., L. Yan, X. Zhou, L. Lu, A. Centeno, L. Kuan, M. Hawrylycz, and G. D. Rosen. “Global Exploratory Analysis of Massive Neuroimaging Collections using Microsoft Live Labs Pivot and Silverlight.” Neuroinformatics 2010: INCF Japan Node Session Abstracts. 2010. Available at http://neuroinformatics2010.org/incf-japan-node-special-symposium/incf-japan-node-session-abstracts.

27. Flake, G. “Is Pivot a Turning Point for Web Exploration?”

28. Murphy, S. N., M. E. Mendis, D. A. Berkowitz, I. Kohane, and H. C. Chueh. “Integration of Clinical and Genetic Data in the i2b2 Architecture.” AMIA Annual Symposium Proceedings (2006): 1040.

29. Deshmukh, V. G., S. M. Meystre, and J. A. Mitchell. “Evaluating the Informatics for Integrating Biology and the Bedside System for Clinical Research.” BMC Medical Research Methodology 9 (2009): 70.

30. Hruby, G., J. McKiernan, S. Bakken, and C. Weng. “A Centralized Research Data Repository Enhances Retrospective Outcomes Research Capacity: A Case Report.” Journal of the American Medical Informatics Association 20 (2012): 563–67.

31. Open Zoom. “Python Deep Zoom Tools (0.1.0).” http://open-zoom.googlecode.com/files/deepzoom-tools-0.1.0.zip (accessed July 9, 2013).

32. Python. “Python Imaging Library (PIL).” Available at http://www.pythonware.com/products/pil/ (accessed July 9, 2013).

33. Open Zoom. “Python Deep Zoom Tools (0.1.0).” http://open-zoom.googlecode.com/files/deepzoom-tools-0.1.0.zip (accessed July 9, 2013).

34. MNE Developers. “Tutorial: MEG and EEG Data Processing with MNE and Python.” Available at http://martinos.org/mne/python_tutorial.html (accessed July 12, 2013).

35. Viangteeravat, T., M. Anyanwu, V. Nagisetty, and E. Kuscu. “Automated Generation of Massive Image Knowledge Collections Using Microsoft Live Labs Pivot to Promote Neuroimaging and Translational Research.” Journal of Clinical Bioinformatics 1, no. 1 (2011): 18.

36. Sedman, A. B., V. Bahl, E. Bunting, K. Bandy, S. Jones, S. Z. Nasr, K. Schulz, and D. A. Campbell. “Clinical Redesign Using All Patient Refined Diagnosis Related Groups.” Pediatrics 114, no. 4 (2004): 965–69.

37. Oracle Corporation. “MySQL Database Engine.” Available at http://www.mysql.com (accessed June 2009).

38. Flake, G. “Is Pivot a Turning Point for Web Exploration?”

39. Kamel Boulos, M., T. Viangteeravat, M. Anyanwu, V. Nagisetty, and E. Kuscu. “Web GIS in Practice IX: A Demonstration of Geospatial Visual Analytics Using Microsoft Live Labs Pivot Technology and WHO Mortality Data.”

40. W3C. “HTML5: A Vocabulary and Associated APIs for HTML and XHTML.” Available at http://www.w3.org/TR/html5/ (accessed July 12, 2013).

41. Gapminder. Available at http://www.gapminder.org/.


Appendix A

Descriptions of PRD Pivot Metadata

| Variable Name | Description | Valid Values |
| --- | --- | --- |
| Demographic variables | | |
| Gender | Patient’s gender | F = Female; M = Male; U = Unknown |
| Race | Patient’s race | American Indian or Alaska Native; Asian; Black or African American; Native Hawaiian or Other Pacific Islander; White |
| Age | Patient’s age | Age at last birthday |
| Birth weight | Patient’s birth weight, in grams | 1 to 9,999 |
| Gestational age | Patient’s gestational age at birth, in weeks | 1 to 99 |
| Encounter/admission variables | | |
| Admission type | Patient’s admission type | 1 = Emergency; 2 = Urgent; 3 = Elective; 4 = Newborn; 5 = Trauma; 9 = Information not available |
| LOS | Patient’s length of stay | Time interval between Admit date and Discharge date |
| Admit date | Patient’s admission date | Enter as YYYYMMDD (YYYY = year; MM = month [with leading zero]; DD = day [with leading zero]) |
| Discharge date | Patient’s discharge date | Enter as YYYYMMDD (YYYY = year; MM = month [with leading zero]; DD = day [with leading zero]) |
| Disposition | Patient’s discharge status | 1 = Discharged to home; 2 = Transferred to short-term facility; 3 = Transferred to skilled nursing facility; 4 = Transferred to intermediate care facility; 5 = Transferred to other healthcare facility; 6 = Transferred to home health care; 7 = Left AMA (against medical advice); 20 = Expired/Mortality |
| Age at admission | Patient’s age at admission | Time interval between date of birth and Admission date |
| AAP age group (American Academy of Pediatrics) | AAP age group code | 1 = 0–30 days; 2 = 1–23 months; 3 = 2+ years; 0 = missing |
| Admitting diagnosis | The ICD-9-CM diagnosis code describing the patient’s diagnosis at the time of admission | Valid ICD-9-CM diagnosis code |
| Physician profiles | | |
| Attending Physician NPI | Attending physician’s National Provider Identifier | A sequence of exactly 10 characters, each character restricted to 0–9 |
| Principal Px Physician NPI | The physician National Provider Identifier associated with the surgeon of the principal procedure | A sequence of exactly 10 characters, each character restricted to 0–9 |
| Diagnoses | | |
| Admitting diagnosis | The ICD-9-CM diagnosis code describing the patient’s diagnosis at the time of admission | Valid ICD-9-CM diagnosis code |
| Principal diagnosis | The ICD-9-CM diagnosis code describing the condition established after study to be chiefly responsible for occasioning the admission of the patient for care | Valid ICD-9-CM diagnosis code |
| Secondary diagnosis | An ICD-9-CM diagnosis code corresponding to a condition that existed at the time of admission, that developed subsequently, or that affected the treatment received and/or the length of stay | Valid ICD-9-CM diagnosis code |
| Procedures | | |
| Principal procedure code | An ICD-9-CM procedure code corresponding to the principal procedure of the patient’s admission | Valid ICD-9-CM procedure code |
| Derived flags | | |
| NICU flag | A flag indicating whether the patient spent time in a neonatal intensive care unit during the admission | 0 = No; 1 = Yes |
| PICU flag | A flag indicating whether the patient spent time in a pediatric intensive care unit during the admission | 0 = No; 1 = Yes |
| Vent flag | A flag indicating whether the patient spent time on mechanical ventilation during the admission | 0 = No; 1 = Yes |
| ECMO flag | A flag indicating whether the patient spent time on extracorporeal membrane oxygenation during the admission | 0 = No; 1 = Yes |
| Hyperal flag | A flag indicating whether the patient spent time on hyperalimentation during the admission | 0 = No; 1 = Yes |
| Readmissions/returns | | |
| IP to IP | Inpatient to Inpatient | 0 = Inpatient; 1 = Emergency Department; 2 = Ambulatory Surgery; 3 = Observation; 4 = Clinic Visit; 5 = All Other |
| IP to ER | Inpatient to Emergency | |
| IP to AS | Inpatient to Ambulatory Surgery | |
| OBS to IP | Observation to Inpatient | |
| OBS to AS | Observation to Ambulatory Surgery | |
| OBS to OBS | Observation to Observation | |
| OBS to ER | Observation to Emergency | |
| AS to ER | Ambulatory Surgery to Emergency | |
| AS to IP | Ambulatory Surgery to Inpatient | |
| ER to ER | Emergency to Emergency | |
| ER to AS | Emergency to Ambulatory Surgery | |
| ER to IP | Emergency to Inpatient | |
| ER to OBS | Emergency to Observation | |
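The valid-value rules in Appendix A lend themselves to simple automated checks. The following is a minimal sketch, not the PRD Pivot implementation: the record field names (`gender`, `attending_npi`, `birth_weight_g`, and so on) are hypothetical, and expressing LOS in whole days is an assumption, since the appendix specifies only a time interval.

```python
import re
from datetime import datetime

# Valid values transcribed from Appendix A. The record keys used below
# (e.g. "attending_npi") are hypothetical names chosen for illustration.
GENDERS = {"F", "M", "U"}
ADMISSION_TYPES = {1, 2, 3, 4, 5, 9}


def parse_yyyymmdd(value):
    """Parse a YYYYMMDD string, insisting on leading zeros (exactly 8 digits)."""
    if not re.fullmatch(r"[0-9]{8}", value):
        raise ValueError(f"not in YYYYMMDD form: {value!r}")
    return datetime.strptime(value, "%Y%m%d").date()


def length_of_stay(admit, discharge):
    """LOS as the interval between admit and discharge date (days assumed)."""
    return (parse_yyyymmdd(discharge) - parse_yyyymmdd(admit)).days


def validate(record):
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    if record.get("gender") not in GENDERS:
        problems.append("gender must be F, M, or U")
    if record.get("admission_type") not in ADMISSION_TYPES:
        problems.append("admission type must be 1-5 or 9")
    if not re.fullmatch(r"[0-9]{10}", str(record.get("attending_npi", ""))):
        problems.append("attending NPI must be exactly 10 digits")
    weight = record.get("birth_weight_g")
    if weight is not None and not 1 <= weight <= 9999:
        problems.append("birth weight must be 1 to 9,999 grams")
    try:
        if length_of_stay(record["admit_date"], record["discharge_date"]) < 0:
            problems.append("discharge date precedes admit date")
    except (KeyError, TypeError, ValueError) as exc:
        problems.append(f"bad or missing date: {exc}")
    return problems
```

A record that satisfies every rule yields an empty list, so the function can be used as a filter before loading rows into a collection.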


Table 1 Data De-identification and Exclusion Rules

| Identifier | Included in PRD Pivot Database | Included in Source Data |
| --- | --- | --- |
| Name | No | No |
| Postal address | No | Yes |
| Social security number | No | No |
| Telephone and fax numbers | No | No |
| Birth date | Age will be provided | Yes |
| Admission date | No | Yes |
| Discharge date | No | Yes |
| Date of death | No | No |
| Medical record number | No | Yes |
| Certificate/license numbers | No | No |
| E-mail addresses | No | No |
| Ages over 89 and all elements of dates indicative of such age | No | No |
| Health plan beneficiary numbers | No | No |
| Vehicle identifiers and serial numbers, including license plate numbers | No | No |
| Device identifiers and serial numbers | No | No |
| Web URLs | No | No |
| Internet protocol (IP) address | No | No |
| Biometric identifiers, including fingerprints and voice prints | No | No |
| Full-face photographic images and any comparable images | No | No |
| Any other unique identifying numbers about the individual | No* | No* |

Note: Asterisks indicate items that are not believed to be captured in any field of the PRD Pivot database, but they may be added without our knowledge. We do not intend to use such identifiers.
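The exclusion rules in Table 1 can be illustrated with a short sketch. This is not the authors' extraction code: the field names are hypothetical, age is computed as of the admission date (an assumption), and length of stay is derived before the dates are dropped so that it remains available, in line with the LOS definition in Appendix A.

```python
from datetime import date

# Source fields that Table 1 marks as present in the source data but
# excluded from PRD Pivot. The field names are hypothetical.
DROPPED_FIELDS = (
    "postal_address",
    "birth_date",
    "admission_date",
    "discharge_date",
    "medical_record_number",
)


def age_at_last_birthday(birth, on):
    """Whole years completed between `birth` and the reference date `on`."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1  # birthday has not yet occurred in the reference year
    return years


def deidentify(record):
    """Return a copy of `record` with Table 1 identifiers removed."""
    out = {k: v for k, v in record.items() if k not in DROPPED_FIELDS}
    age = age_at_last_birthday(record["birth_date"], record["admission_date"])
    # Table 1: ages over 89 are not included in the database.
    out["age"] = age if age <= 89 else None
    # Derived before the dates are dropped (an assumption of this sketch).
    out["los_days"] = (record["discharge_date"] - record["admission_date"]).days
    return out
```

The function returns a new dictionary and leaves the source record untouched, which keeps the identified and de-identified copies clearly separate.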


Table 2 Readmission Calculation Examples

| Patient ID | Admit Date | Discharge Date | Patient Type | Time Interval (Any) | Time Interval (IP to IP) | Time Interval (IP to ER) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 08/13/2012 | 08/21/2012 | IP | 0 | 0 | 0 |
| 1 | 08/24/2012 | 08/24/2012 | AS | 3 | 0 | 0 |
| 1 | 08/27/2012 | 08/27/2012 | AS | 3 | 0 | 0 |
| 1 | 08/31/2012 | 09/07/2012 | IP | 4 | 10 | 0 |
| 2 | 08/21/2012 | 08/31/2012 | IP | 0 | 0 | 0 |
| 2 | 09/07/2012 | 09/07/2012 | ER | 7 | 0 | 7 |

Abbreviations: IP, inpatient; AS, ambulatory surgery; ER, emergency.
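The intervals in Table 2 can be reproduced with a short sketch. The rule used here is inferred from the table rather than taken from the PRD Pivot source: the "any" interval counts days from the previous encounter's discharge to the current admission, and a typed interval such as IP to ER counts days from the most recent discharge of the first type when the current encounter is of the second type. A value of 0 is used when no qualifying prior encounter exists, as in the table. The function and key names are hypothetical.

```python
from datetime import datetime


def _parse(text):
    """Parse an MM/DD/YYYY date string as used in Table 2."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def readmission_intervals(encounters, pairs=(("IP", "IP"), ("IP", "ER"))):
    """Compute readmission intervals in days.

    encounters: iterable of (patient_id, admit, discharge, patient_type)
    Returns one dict per encounter, ordered by patient and admit date.
    """
    ordered = sorted(encounters, key=lambda e: (e[0], _parse(e[1])))
    prev_discharge = {}   # patient_id -> discharge date of previous encounter
    last_by_type = {}     # (patient_id, type) -> latest discharge of that type
    rows = []
    for pid, admit_s, discharge_s, ptype in ordered:
        admit, discharge = _parse(admit_s), _parse(discharge_s)
        row = {"patient": pid, "admit": admit_s, "type": ptype, "any": 0}
        if pid in prev_discharge:
            row["any"] = (admit - prev_discharge[pid]).days
        for src, dst in pairs:
            prior = last_by_type.get((pid, src))
            matches = ptype == dst and prior is not None
            row[f"{src} to {dst}"] = (admit - prior).days if matches else 0
        prev_discharge[pid] = discharge
        last_by_type[(pid, ptype)] = discharge
        rows.append(row)
    return rows


TABLE_2 = [
    (1, "08/13/2012", "08/21/2012", "IP"),
    (1, "08/24/2012", "08/24/2012", "AS"),
    (1, "08/27/2012", "08/27/2012", "AS"),
    (1, "08/31/2012", "09/07/2012", "IP"),
    (2, "08/21/2012", "08/31/2012", "IP"),
    (2, "09/07/2012", "09/07/2012", "ER"),
]

for row in readmission_intervals(TABLE_2):
    print(row)
```

Because 0 stands for both "no prior encounter" and a same-day return in this convention, a production version might prefer an explicit missing value; the sketch keeps 0 only to match the table.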


Figure 1 PRD Pivot Cohort Search Demonstrates the Ability to Search for Possible Asthma Cases (APR-DRG Code 141)


Figure 2 PRD Pivot Cohort Search Demonstrates the Ability to Search for Possible Asthma Cases (APR-DRG Code 141) with Patients Readmitted within 30 Days of Discharge (Tree View Sorted by Patient’s Gender)

Figure 3 PRD Pivot Cohort Search Demonstrates the Ability to Search for Possible Asthma Cases (APR-DRG Code 141) with Patients Readmitted within 30 Days of Discharge (Tree View Sorted by Patient’s Race)


Figure 4 PRD Pivot Search Shows Coexisting Conditions: Asthma (ICD-9-CM Code 493.92) and Bronchitis (ICD-9-CM Code 466.0)


Figure 5 PRD Pivot Integrates with ESRI Geographic Information Systems (GIS) for Spatial Analysis


Figure 6 PRD Pivot Demonstrates the Ability to Track the 10th Most Common ICD-9-CM Diagnosis (Acute upper respiratory infections of unspecified site, 465.9)


Figure 7 PRD Pivot Collection for Neurological Images: Forest View of the Human Brain Atlas


Figure 8 PRD Pivot Collection for Neurological Images Demonstrating the Feasibility of Integrating Image Analysis: Example Showing MRI Brain Segmentation Using Structured Hierarchical Clustering

Figure 9 PRD Pivot Computes Independent Component Analysis (ICA) Components on Raw Magnetoencephalography (MEG) Data


Figure 10 Source Analysis: PRD Pivot Automatically Finds and Displays the Source Matching the Electrocardiography and Electro-oculography


Figure 11 Average Responses of PRD Pivot Assessment Using REDCap Survey Tool