Sentara Linked Data Workshop - Sept 10, 2012
Posted on 07-May-2015
Integrated Data for Improved Personal Health Delivery
10-September-2012
Presenters: Bernadette Hyland, David Wood & Luke Ruth
Email: bhyland@3roundstones.com
Twitter: @BernHyland
This presentation: http://slideshare.net/3roundstones
• 9.00-9.20 - (All) Introductions
• 9.20-9.45 - (Phil) Goals & objectives
• 9.45-10.30 - (Bernadette) Value proposition of Linked Data, update on government data publishing initiatives, Health Datapalooza
• 10.30-11.10 - (David) Intro to enterprise linked data, a resource oriented approach to interoperability
• 11.10-11.30 - Break
• 11.30-12.00 - (Luke) Review of Weather Health app development
• 12.00-12.45 - Lunch
• 12.45-1.30 - (David) Web of data architecture, Callimachus
• 1.30-2.15 - (All) Building support within Sentara, use cases for Weather Health (Phase I), Q&A
Today’s Agenda
• Sentara team
• 3 Round Stones team
• Dave Wood, PhD - Enterprise Architect
• Bernadette Hyland - Sr. Solutions Architect
• Luke Ruth - Software Engineer
• ... All specialists in Web architecture & Linked Data
Introductions ...
Customers & Affiliations
Environmental Protection Agency
Government Printing Office
Health & Human Services
• Linked Data is about publishing and consuming data using international data standards
• Based on a 20-year-old idea
• A system of linked information systems
Why am I speaking on Linked Data and sharing today? I’m here in my role as co-chair of the W3C Government Linked Data (GLD) Working Group. I’m a serial entrepreneur in this space, having founded several companies that led some of the most widely used Open Source projects for Linked Data, including Mulgara, OpenRDF/Sesame, PURLs 2.0 and Callimachus. I’ve authored a couple of peer-reviewed chapters in these books, which are available in hardcopy or for free via the Web.
What ideas involving data access, sharing & re-use can we help nurture?
Businesses are in future shock
• Needs are changing at a faster pace
• The Affordable Care Act, new regulations and changes in the global economy are accelerating change
• Information is increasingly central to the operation of any business
Jeff Pollock, Oracle
In a dynamic economy, we have to adapt quickly. We cannot change people or hardware fast enough, so we have to take a new approach in software to deal with this. This quote is from Jeff Pollock, a director at Oracle.
Credits: (c) Random House
"If information systems are to
keep up with business,
we need to change more than technology -
we need to change how people deal with technology."
- Jeff Pollock
Of course, Jeff also said "Changes in behavior have to be well-motivated and show some visible value immediately."
Goal for improved health delivery ...
• Harness larger & more complex datasets to evaluate the potential for health impacts
• More accurately predict factors that contribute to illness or diagnose disease
DATA drives every decision we make daily & every decision others make on our behalf
What is happening to data? We are sharing it. The Web is a natural place to publish information for public dissemination. The modern Web is an information system owned by no one and yet open to vendors, governments and private citizens. The Web of documents has been a great place to share HTML pages and PDFs. However, we are now entering the Web of Data, which is how we’ll share most open data in the next decade.
“We’re moving from managing documents to managing discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information.”
-- Report on Digital Government: Building a 21st Century Platform to Better Serve the American People
Governments around the world are defining detailed digital services plans based on open data and open APIs to deliver government and private digital services. At the highest level, government executives in the UK, EU, US, India and Brazil are committed to managing open data and content in a way that is useful for the consumer of that content. The question is HOW?
Sharing Worldwide
We are sharing documents and data worldwide, routinely with people we don’t know. If this sharing succeeds, it will transform how governments interact with one another and how they serve their citizens in the 21st Century. Using the Web to solicit input and inform decision making, and ultimately to create a more transparent and accessible government, is a very, very worthwhile goal.
Who is sharing their data ... ? Small and large commercial and government organizations, NGOs, Non-profits ... plus many universities. Governments in the last few years have been responding to Open Government initiatives that mandate publishing open government data. Some are careful, slow-moving entities who simply needed to find real solutions to real problems.
Retailers
Goal: Improve click-throughs on search results
Book Publishers
Goals: Improve internal manuscript pipelines, expose additional ways of finding and using content
New Media
Governments
Goals: Governmental transparency and/or improved internal efficiencies (data warehouses)
Common business need ...
• The ability to integrate & manage large amounts of data in a rigorous & transparent manner
• Discovery through interaction of scientific communities, including biomedical informatics & evidence-based medicine
How many are doing it ...
The Web of Data
• No one vendor owns it
• It scales ... to Web-scale
• Doesn’t require a super model
• Based on International Data Exchange Standards (RDF, SPARQL)
Scope: Bigger than any other deployed system
Infinitely adaptable: Changes piecemeal and allows for ad hoc additions & changes
Ownership: Nobody owns it
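Because every RDF statement is a subject-predicate-object triple, no shared "super model" is needed up front, and SPARQL answers questions by matching triple patterns that share variables. The sketch below is a toy, in-memory illustration of that basic graph pattern matching in plain Python, not a SPARQL engine; the example.org URIs, predicate names and AQI values are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern: every pattern must match with consistent variables."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data: two patients and made-up AQI readings per ZIP code.
EX = "http://example.org/"
graph = [
    (EX + "patient/1", EX + "livesIn", EX + "zip/23510"),
    (EX + "patient/2", EX + "livesIn", EX + "zip/23451"),
    (EX + "zip/23510", EX + "airQualityIndex", "112"),
    (EX + "zip/23451", EX + "airQualityIndex", "48"),
]

# Roughly: SELECT ?p ?aqi WHERE { ?p ex:livesIn ?zip . ?zip ex:airQualityIndex ?aqi }
results = query(graph, [("?p", EX + "livesIn", "?zip"), ("?zip", EX + "airQualityIndex", "?aqi")])
for r in results:
    print(r["?p"], r["?aqi"])
```

A real deployment would send the equivalent SPARQL query to a triple store; the nested loops here only show how a shared variable such as `?zip` joins two patterns.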
Let’s look at some ‘versions’ of the Web. It should be said here that Tim Berners-Lee, the recognized “father” of the WWW, doesn’t like the idea of versioning the Web. I happen to agree, but I understand why people do it.
As we talk about these versions of the Web, you may want to think of this as a continuum with significant waves; each with its own benchmark technologies rather than specific versions with distinct start and end points.
Nova Spivack of Radar Networks and Twine.com created this.
RDF is a lingua franca for data exchange
Not all open government content is Linked Data. A relatively small percentage of open data is 4-5 star Linked Data; however, it is growing exponentially. Use of structured data is actively promoted by international standards groups like the W3C and by major search engines such as Google, Yahoo!, Bing and Yandex.
[Figure: Linked Data shown as a subset of the Semantic Web, itself a subset of Semantic Technologies]
Linked Open Data is a small, pragmatic portion of the greater body of Semantic Technologies & international standards for data.
Credit: http://www.w3.org/DesignIssues/LinkedData.html
★ Make your stuff available on the web (any format)
★★ make it available as structured data (e.g. Excel instead of image scan of a table)
★★★ Use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ Use URLs to identify things, so that people can point at your stuff
★★★★★ Link your data to other people’s data to provide context
The 5 Stars of Open Linked Data
Guidance per Tim Berners-Lee, W3C
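The five stars are cumulative: each level assumes the ones below it. A minimal sketch of that rule, assuming a hypothetical `Dataset` record with one boolean flag per criterion (the flag names are illustrative, not part of Berners-Lee's guidance):

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (flag names are illustrative)."""
    on_web: bool           # ★ available on the Web, any format
    structured: bool       # ★★ structured data, e.g. Excel rather than a scanned table
    non_proprietary: bool  # ★★★ non-proprietary format, e.g. CSV
    uses_urls: bool        # ★★★★ URLs identify things
    linked: bool           # ★★★★★ links to other people's data


def star_rating(ds: Dataset) -> int:
    """Count stars, stopping at the first unmet criterion because the scheme is cumulative."""
    criteria = [ds.on_web, ds.structured, ds.non_proprietary, ds.uses_urls, ds.linked]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


scanned_table = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
linked_rdf = Dataset(True, True, True, True, True)
print(star_rating(scanned_table), star_rating(csv_file), star_rating(linked_rdf))  # 1 3 5
```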
★ Publish your vocabulary on the Web at a stable URI
★★ Provide human-readable documentation and basic metadata (e.g. creator, publisher, date of creation, last modification, version number)
★★★ Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes
★★★★ Make your vocabulary available via its namespace URI, both as a formal file and human-readable documentation, using content negotiation
★★★★★ Link to other vocabularies by re-using elements rather than re-inventing
5 Stars of Open Linked Vocabularies
Bernard Vatant (Mondeca) Guidance
Credit: http://blog.hubjects.com/2012/02/is-your-linked-data-vocabulary-5-star_9588.html
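The four-star vocabulary criterion relies on HTTP content negotiation: one namespace URI returns a formal RDF file to machines and human-readable documentation to people, chosen from the request's `Accept` header. The sketch below is simplified. It ignores partial wildcards such as `text/*` and other finer points of HTTP matching, and the mapping from media types to file names is a hypothetical example.

```python
from typing import Dict, List, Tuple

# Hypothetical representations served from one vocabulary namespace URI.
REPRESENTATIONS: Dict[str, str] = {
    "text/turtle": "vocab.ttl",
    "application/rdf+xml": "vocab.rdf",
    "text/html": "vocab.html",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        ranges.append((media_type, q))
    return ranges


def negotiate(accept: str, default: str = "text/html") -> str:
    """Pick the file for the highest-q media type we can serve.

    Falls back to the HTML documentation when nothing matches
    (a stricter server might answer 406 Not Acceptable instead).
    """
    best_type, best_q = default, 0.0
    for media_type, q in parse_accept(accept):
        if media_type == "*/*" and q > best_q:
            best_type, best_q = default, q
        elif media_type in REPRESENTATIONS and q > best_q:
            best_type, best_q = media_type, q
    return REPRESENTATIONS[best_type]


print(negotiate("text/turtle"))                                 # vocab.ttl
print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8"))   # vocab.html
print(negotiate("application/rdf+xml;q=0.9, text/turtle"))      # vocab.ttl
```

A browser asking for HTML gets the documentation page, while an RDF client asking for Turtle gets the formal vocabulary. Both requests go to the same namespace URI.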
Why is RDF important?
• It is an international standard for publishing data on the Web (public and private)
• Data exchange model
• It is the future of the Web
• ... because it is how we share and reuse data
Leading publishers, HCLS scientists, library scientists, new media, old media, retailers have all committed to structured data for improved search & access.
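As a data exchange model, RDF reduces every statement to a triple, and simple line-based syntaxes such as N-Triples let any system write and read the same data. A minimal sketch that writes and re-reads N-Triples for IRIs and plain string literals only. It deliberately skips language tags, datatypes and blank nodes, treats any term starting with `http://` or `https://` as an IRI, and uses hypothetical example.org IRIs.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject IRI, predicate IRI, object IRI or literal)

LINE = re.compile(r'^<([^>]*)> <([^>]*)> (?:<([^>]*)>|"((?:[^"\\]|\\.)*)") \.$')
UNESCAPES = {"\\": "\\", '"': '"', "n": "\n", "r": "\r", "t": "\t"}


def is_iri(term: str) -> bool:
    # Simplification for this sketch: http(s) strings are IRIs, everything else is a literal.
    return term.startswith("http://") or term.startswith("https://")


def escape(text: str) -> str:
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t"))


def unescape(text: str) -> str:
    return re.sub(r"\\(.)", lambda m: UNESCAPES.get(m.group(1), m.group(0)), text)


def to_term(term: str) -> str:
    return f"<{term}>" if is_iri(term) else f'"{escape(term)}"'


def serialize(triples: List[Triple]) -> str:
    """One triple per line, each terminated by ' .' as N-Triples requires."""
    return "".join(f"{to_term(s)} {to_term(p)} {to_term(o)} .\n" for s, p, o in triples)


def parse(text: str) -> List[Triple]:
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"not a supported N-Triples line: {line!r}")
        s, p, o_iri, o_lit = m.groups()
        triples.append((s, p, o_iri if o_iri is not None else unescape(o_lit)))
    return triples


EX = "http://example.org/"
graph = [
    (EX + "station/norfolk", EX + "label", 'Norfolk "downtown" monitor'),
    (EX + "station/norfolk", EX + "measures", EX + "pollutant/ozone"),
]
text = serialize(graph)
print(text, end="")
```

Because the format is one self-contained statement per line, any consumer can reload the data exactly, with no knowledge of the producer's database schema.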
WE’VE SEEN THIS BEFORE
Like HTML and RDF, credit cards have a human-readable side and a machine-readable side.
Each HTML page is paired with a machine-readable data representation.
Open Government Data: 3 brief years ...
• Starting in 2008, a few heads of state directed open government data to be published on the Web
• In September 2011, Presidents Obama (USA) and Rousseff (Brazil) endorsed the Open Government Partnership
• 7 other nations launched their government’s National Plans during the meeting of the UN General Assembly
Beginning in 2008, a couple of heads of state directed that open government data be published on the Web. In September 2011, President Obama and President Dilma Rousseff stood with other heads of state to endorse the principles of the Open Government Partnership and launch their governments’ Open Government National Plans during the meeting of the UN General Assembly. In addition to Brazil and the US, nations that have made commitments include Indonesia, Mexico, Norway, the Philippines, South Africa and the UK.
• Structured data on the Web is rapidly becoming mainstream
• Government authorities are funding more Linked Open Data projects, especially for weather, human health and scientific research
• In 2012 we’re seeing Apps Challenges, hack-a-thons, funding ($1M-$200M)
What is next for Data?
What’s next? We are already seeing signs of things to come. Structured data on the Web is quickly becoming mainstream. There have been many well-publicized triple challenges, hack-a-thons and apps challenges; they are popping up everywhere. Organizations with mission-critical applications based on relational technologies are creating a layer above their traditional architectures and building Linked Data-driven Web apps. Web apps based on Linked Data are beginning to replace data warehouses.
Publishing data in 2012 & beyond ...
• Good = Use Data Standards (RDF) to publish metadata about data and models
• Better = Use a Linked Data approach to publish all your open data on the Web
• Best = Link your data + models using a Linked Data approach
• Web architecture, Web-scale
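The "Best" tier links your data to other publishers' data. Because an RDF graph is a set of triples, merging two sources is a set union, and a shared URI or an explicit `owl:sameAs` link is what connects them. A minimal sketch in which hypothetical example.org and example.gov URIs stand in for an internal dataset and an open government dataset. The ozone figure is made up, and the code follows `owl:sameAs` only one hop in one direction.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical internal data, including a link to another publisher's URI.
internal: Set[Triple] = {
    ("http://example.org/clinic/7", "http://example.org/locatedIn", "http://example.org/county/norfolk"),
    ("http://example.org/county/norfolk", SAME_AS, "http://example.gov/county/51710"),
}

# Hypothetical open government data published by someone else (made-up value).
external: Set[Triple] = {
    ("http://example.gov/county/51710", "http://example.gov/ozoneDaysPerYear", "9"),
}

graph = internal | external  # merging RDF graphs is a set union


def describe(graph: Set[Triple], subject: str) -> List[Tuple[str, str]]:
    """Predicate/object pairs for `subject` plus those of resources it is owl:sameAs."""
    aliases = {subject} | {o for s, p, o in graph if s == subject and p == SAME_AS}
    return sorted((p, o) for s, p, o in graph if s in aliases and p != SAME_AS)


county = next(o for s, p, o in graph
              if s == "http://example.org/clinic/7" and p == "http://example.org/locatedIn")
print(describe(graph, county))
```

Without the external graph, the county has no facts beyond the link itself. The context comes entirely from following the link into someone else's data.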
[Figure: data sources for the application. EMR Data (clinical, condition specific); Internal Portal Data (physicians, services, locations); Linked Data Cloud (DBpedia, PubMed, NLM); Open Government Data (CDC, EPA, US Census); Social Media (Facebook, Twitter); Clinical Ontology; Business Ontology]
Methodology
1. Define target population and clinical data from electronic medical record
2. Identify sources of open government data related to environmental, weather, and other variables related to chronic pulmonary disease exacerbations
3. Combine open content from NLM, PubMed, Medline to support education
4. Leverage a Linked Data approach, using Open Source and international data exchange standards (RDF)
5. Alert patient of possible hazardous conditions and recommend appropriate actions
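Steps 1, 2 and 5 amount to joining a patient list with air quality readings and alerting when conditions are hazardous for people with chronic pulmonary disease. A minimal sketch, assuming hypothetical anonymized patient records keyed by ZIP code and made-up daily Air Quality Index (AQI) readings. The category bounds follow the EPA AQI scale, and the threshold of 101 is the start of its "Unhealthy for Sensitive Groups" band, but the alert wording and data shapes are illustrative, not the project's actual design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Patient:
    patient_id: str  # anonymized identifier (hypothetical)
    zip_code: str
    condition: str   # e.g. "asthma" or "COPD"


# EPA AQI category lower bounds, ordered from worst to best for lookup.
AQI_CATEGORIES = [
    (301, "Hazardous"),
    (201, "Very Unhealthy"),
    (151, "Unhealthy"),
    (101, "Unhealthy for Sensitive Groups"),
    (51, "Moderate"),
    (0, "Good"),
]
SENSITIVE_THRESHOLD = 101  # people with lung disease are a sensitive group


def aqi_category(aqi: int) -> str:
    for lower_bound, name in AQI_CATEGORIES:
        if aqi >= lower_bound:
            return name
    return "Unknown"


def alerts(patients: List[Patient], aqi_by_zip: Dict[str, int]) -> List[str]:
    """Join patients to today's AQI by ZIP code and build a message for each at-risk patient."""
    messages = []
    for p in patients:
        aqi = aqi_by_zip.get(p.zip_code)
        if aqi is None or aqi < SENSITIVE_THRESHOLD:
            continue  # no reading, or air quality acceptable for sensitive groups
        messages.append(
            f"{p.patient_id} ({p.condition}): AQI {aqi} ({aqi_category(aqi)}) "
            f"in ZIP {p.zip_code}. Consider limiting outdoor activity."
        )
    return messages


patients = [
    Patient("anon-001", "23510", "asthma"),
    Patient("anon-002", "23451", "COPD"),
    Patient("anon-003", "23320", "asthma"),
]
today = {"23510": 112, "23451": 48}  # made-up readings; 23320 has none
for msg in alerts(patients, today):
    print(msg)
```

In practice, the delivery channel (SMS, Web) and the recommended actions would come from clinicians rather than from a hard-coded string.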
Iterative Approach
• Initial POC delivered May 2012 (60 day sprint)
• EMR (anonymized)
• EPA air quality
• Doctors listing (spreadsheet)
• Demo’d at Health Datapalooza, Washington DC in June
Using EMR and Linked Open Data to Manage Chronic Asthma and COPD
Health Datapalooza, Health Data Initiative Forum III
Conceptual Model
Patients with chronic pulmonary disease that are educated and notified of adverse environmental, weather, and geographic conditions are better able to respond and proactively manage their condition.
Value Proposition
• Decrease in costly Emergency Department visits
• Reduce hospital re-admissions after treatment
• Improve self-care and medication compliance
• Awareness of triggers and disease management
Big data ecosystem includes complex data
A phased approach to delivering a successful Weather Health Explorer application begins with selecting data sources that are both available and reliable. For these reasons, authoritative government sources, including the National Library of Medicine (NLM), the National Oceanic and Atmospheric Administration (NOAA) and the US Environmental Protection Agency (EPA), have been selected for use in this project.
[Figure: Leverage Linked Data. Sources shown: CDC, EPA, US Census, DBpedia, PubMed, NLM, the Web of Data and EMR; also shown: SMS, Web; layers: Semantic Framework; Data, Open Source & Standards]
Callimachus is a Linked Data Management platform that takes full advantage of RDF and data-driven navigation, created with Web 2.0 developers in mind. Example uses: governments providing citizens access to open government data; corporations providing timely information about themselves to the public, customers, suppliers and regulators; research portals; and more.
[Figure: Weather Health screens: Current EPA Data; Patient Admission Data by Date; Historic EPA Data at Admission; Today’s Asthma Forecast; Anticipate and Prevent]
Progress Update
• June - Sept 2012
• Designed Weather Health Web application
• Identified data sources (NIH, NOAA, EPA)
• Created a Web based application with live data feeds from NIH, NOAA & EPA
• Hosted on the cloud using a linked data management system, Callimachus
[Figure: data feeds from NOAA, EPA AQS, EPA UV and NIH, and the user]
The NLM will function as the primary source for drug-related information. The NLM publishes multiple APIs that could be of use to this project, but the most immediately beneficial will probably be DailyMed, an API that offers access to current Structured Product Labeling (SPL) information for drugs.
http://demo.3roundstones.net/sentara/home.docbook?view
Drug information may also be taken from a service called MedlinePlus, which is organized and distributed by the National Library of Medicine, the National Institutes of Health and the Department of Health and Human Services. MedlinePlus is currently being upgraded to include the ability to return an XML document instead of a search results page. This feature would be extremely useful and, if fully functional, may make MedlinePlus the logical choice for primary drug information.
[Figure: Weather Health hosting architecture, hosted on cloud. Components shown: Callimachus (application); M2.2XLarge instance; EBS - 50 GB; S3 - 50 GB; additional attached storage; periodic snapshots (backup); off-site backups; monitoring service with application-level (HTTP/HTTPS) and system-level monitoring; Email/SMS notifications (SNS); administration; public users]
In summary, Weather Health ...
• Leverages internal and external structured data on the Web
• All data from authoritative sources
• Involves a combination of static and dynamic data
• Hosted on the cloud using AWS
• Created using a linked data management system
• Callimachus enables Web 2.0 internal or contract developers to combine data sources & quickly build a web UI for Web or mobile devices
The Weather Health application can also serve to warn patients about drug interactions or advise them on dosage. There is also an opportunity for smaller modules within the application, such as pill identification using imprint data. This application was built using Callimachus, a platform for data-driven applications. Callimachus allows Web 2.0 developers within Sentara, or external developers, to combine multiple data sources and quickly build a Web UI.
The basic architecture for the Weather Health solution involves a combination of both static (or pseudo-static) and dynamic data.
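Mixing static (or pseudo-static) and dynamic data suggests a different refresh policy per source. For example, a doctors listing could be loaded once, while a live air quality feed is re-fetched after a short interval. A minimal sketch of a per-source time-to-live (TTL) cache under that assumption. The source names, loader functions and one-hour interval are hypothetical, and the clock is injectable so the behaviour can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SourceCache:
    """Per-source cache: ttl=None means static (load once), otherwise refresh after ttl seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._loaders: Dict[str, Tuple[Callable[[], Any], Optional[float]]] = {}
        self._values: Dict[str, Tuple[Any, float]] = {}  # name -> (value, fetched_at)

    def register(self, name: str, loader: Callable[[], Any], ttl: Optional[float] = None) -> None:
        self._loaders[name] = (loader, ttl)

    def get(self, name: str) -> Any:
        loader, ttl = self._loaders[name]
        now = self._clock()
        cached = self._values.get(name)
        if cached is not None:
            value, fetched_at = cached
            if ttl is None or now - fetched_at < ttl:
                return value  # static data, or dynamic data that is still fresh
        value = loader()
        self._values[name] = (value, now)
        return value


# Hypothetical loaders that count how often each source is actually fetched.
fake_now = [0.0]
calls = {"doctors": 0, "air_quality": 0}


def load_doctors():
    calls["doctors"] += 1
    return ["Dr. A", "Dr. B"]  # stands in for a doctors spreadsheet


def load_air_quality():
    calls["air_quality"] += 1
    return {"23510": 112}  # stands in for a live EPA feed (made-up value)


cache = SourceCache(clock=lambda: fake_now[0])
cache.register("doctors", load_doctors)                    # static
cache.register("air_quality", load_air_quality, ttl=3600)  # refresh hourly

cache.get("doctors")
cache.get("air_quality")
fake_now[0] = 4000.0  # more than an hour later
cache.get("doctors")
cache.get("air_quality")
print(calls)  # {'doctors': 1, 'air_quality': 2}
```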
LUNCH BREAK!
Web of Data
• Resource oriented approach to data interoperability
• Callimachus Overview
• Maturity of ecosystem
• Development environments, reporting tools, databases, hosting, commercial support & training
• Next steps, an iterative approach
A History of Silos
1970s: A neat little package ($ cat foo.txt | grep blah | sort)
1980s: Client-Server
1990s: The Early Web
Web of Data
• Extending the Universal Client
• Expanding the Universal Connection
• Providing the Universal Database
• Explaining the Logic
• Ubiquitous, reusable applications
The Next Great Leap (timeline: 1970s, 1980s, 1990s, 2000s)
Writing Business Applications
[Figure labels: Code written; Data formatted; R&D | RDI]
Requirements of The Informatics Landscape
• Must span the entire drug development lifecycle
  - and back (post-market surveillance to discovery)
• Must support large and very heterogeneous data
  - single nucleotide polymorphisms to countries
• Will change as new science emerges & new regulations come into play
  - Medline: just under 1M articles/year
• Must be able to work with multiple, international regulatory bodies
  - Emerging markets
• Partners, customers and collaborators will change
  - and will have divergent technical aptitudes
• Must be able to interoperate with pre-competitive consortia
  - Can they perform common tasks for the community?
• Must be able to work with legacy data
  - Lots of unmined gems here!
Maximal Agility
Slide credit: Tom Plaster, PhD, AstraZeneca
Improving Internal Interoperability
Scientists, Clinicians, Informaticists can now freely interoperate as:
• The PURL server provides a central identity management authority for resources that are of value (need to persist) across the enterprise. The Persistent URLs are used to connect resources found in multiple locations.
• The vocabulary server provides a way of harmonizing concepts across different domains
  - Where possible, public vocabularies are used
  - Where not, they’re extended
  - We don’t want to develop and maintain vocabularies
Slide credit: Tom Plaster, PhD, AstraZeneca
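A PURL server gives each valuable resource a persistent URL that never changes and answers requests for it with an HTTP redirect to wherever the resource currently lives, so links survive when back-end systems move. A minimal in-memory sketch of that idea. The `purl.example.org` paths and targets are hypothetical, and a real PURL server also handles administration, prefix-based redirects and other response types.

```python
from typing import Dict, Optional, Tuple


class PurlRegistry:
    """Maps persistent URLs to current locations (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def register(self, purl: str, target: str) -> None:
        self._targets[purl] = target

    def move(self, purl: str, new_target: str) -> None:
        """The resource moved: update its target while the PURL itself stays the same."""
        if purl not in self._targets:
            raise KeyError(purl)
        self._targets[purl] = new_target

    def resolve(self, purl: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, Location header): a 302 redirect if known, else 404."""
        target = self._targets.get(purl)
        return (302, target) if target is not None else (404, None)


registry = PurlRegistry()
purl = "http://purl.example.org/enterprise/compound/42"
registry.register(purl, "http://lims-old.example.org/records?id=42")
print(registry.resolve(purl))  # (302, 'http://lims-old.example.org/records?id=42')

registry.move(purl, "https://data.example.org/compound/42")
print(registry.resolve(purl))  # (302, 'https://data.example.org/compound/42')
```

Every document or dataset that cites the PURL keeps working after the move, because only the registry entry changed.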
• Callimachus is a framework for data-driven applications based on Linked Data principles
• Callimachus allows Web developers to easily create data-driven applications for the Web
• It is Open Source (FLOSS)
• http://callimachusproject.org
• Large and small vendors are involved in Linked Data
• From Oracle, IBM to 3 Round Stones
• Listing of active research projects & deployments: see http://dir.w3.org/
• Best practices, see http://www.w3.org/2011/gld/charter
Tools & best practices?
W3C HCLS
• Activities:
  - Continue to develop high level (e.g. TMO) and architectural (e.g. SWAN) vocabularies.
  - Implement proof-of-concept demonstrations and industry-ready code.
  - Document guidelines to accelerate the adoption of the technology.
  - Disseminate information about the group's work at government, industry and academic events and by participating in community initiatives.
• Use Cases/Domains
  - Drug Discovery
  - Electronic Lab Notebooks
  - Comparator Arm Data
  - Patient Data Ownership
  - Biotech Acquisition
  - Supply Chain Automation
  - Web Integration
  - Bio-surveillance
  - Co-development
Reference: http://www.w3.org/blog/hcls/
The mission of the Semantic Web Health Care and Life Sciences Interest Group (HCLS IG) is to develop, advocate for, and support the use of Semantic Web technologies across health care, life sciences, clinical research and translational medicine.
Slide credit: Tom Plaster, PhD, AstraZeneca
This work is Copyright © 2011-2012 3 Round Stones Inc.
It is licensed under the Creative Commons Attribution 3.0 Unported License.
Full details at: http://creativecommons.org/licenses/by/3.0/
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
This presentation is licensed under a Creative Commons BY-SA license, allowing you to share and remix its contents as long as you give us attribution and share alike.