cs 157b: database management systems ii may 1 class meeting department of computer science san jose...
TRANSCRIPT
CS 157B: Database Management Systems IIMay 1 Class Meeting
Department of Computer ScienceSan Jose State University
Spring 2013Instructor: Ron Mak
www.cs.sjsu.edu/~mak
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
2
Managing Scientific Data
Four different case studies from my personal experiences during the past dozen years.
Collaborative Information Portal (CIP) NASA, 2002-2004 Mars Exploration Rovers Mission (MER)
System Health Information Portal (SHIP) NASA, 2004-2005
Shot Data Management Lawrence Livermore National Laboratory, 2006-2007 National Ignition Facility (NIF)
Smarter Planet Platform for Analysis and Simulation of Health (SPLASH) IBM Almaden Research Center, 2010-2012
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
3
Case Study 1:The Collaborative Information Portal (CIP)
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
4
Case Study 1: The Collaborative Information Portal
My role: Senior ScientistResearch Institute for Advanced Computer Science (RIACS) Architect and lead developer of the CIP
middleware, 2002-2003 Mission support at the NASA Ames
Research Center and JPL, 2004
Mission Control Center, JPLMars Exploration Rovers Mission
The Collaborative Information Portal (CIP) was a key ground-based application used by the NASA’s Mars Exploration Rovers (MER) mission.
Mission scientists, engineers, and researchers at JPL and around the world used CIP to access mission data in a secure and organized manner over the Internet.
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
5
Mars Exploration Rover Mission Twin robot geologists search for liquid water on Mars in the past.
Launched: June 10 and July 7, 2003 Landed: January 3 and 24, 2004 Duration: 90 days
Mission Center:Jet Propulsion Laboratory Pasadena, CA
But after over eightEarth years, one is stilloperating.
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
6
MER Mission Requirements
Time management
Data management
Personnelmanagement
Mission Geology Lab, JPL
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
7
Schedule viewer
Event horizon
Broadcast messages
Clocks
Tool tabs
The Collaborative Information Portal (CIP)
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
8
Data Product Navigator
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
9
Three–Tiered Enterprise Architecture
Client Java application (Swing)
Middleware Server Web services, Enterprise JavaBeans (EJB),
Java Message Service (JMS) Service-oriented architecture (SOA)
Data Repository Mission file servers (Unix) File monitor (Java application) Data loader (Java application) Database (Oracle)
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
10
Architecture Overview
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
11
CIP Middleware Services
User management Metadata Schedules
Mars and Earth time File and directory Message
Mission Control Center, JPL
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
12
Middleware Technologies
Enterprise JavaBeans (EJBs) to achieve reliability, scalabilty, security, platform independence, and standards. Stateless session beans Stateful session beans Message–driven beans
Web services to expose the remote methods of the Service Provider EJBs to the client applications.
Java Message Service (JMS) for synchronous and asynchronous messaging.
BEA WebLogic application server.
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
13
Case Study 2:The Systems Health Information Portal (SHIP)
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
14
Case Study 2: The Systems Health Information Portal
The Systems Health Information Portal (SHIP) constantly monitors and analyzes sensor data gathered from a manned NASA space vehicle.
My role: Project ScientistUniversity of California at Santa Cruz Architect, lead developer, and systems
integrator, 2004-2005
If there is a fault with the vehicle, SHIP quickly analyzes the situation and recommends corrective actions for the astronauts on board, even if contact with ground control is lost.
Life on board the International Space Station
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
15
SHIP Requirements
Access to disparate data sources Databases, files, live instrument streams, web services, web pages, programs
Multiple data formats Relational tables XML, text, and
binary files Proprietary and
legacy data formats
Generate and manage reports Fault analyses Prognostications Procedure manuals
Test bed International Space
Station (ISS) Rendezvous in low Earth orbit (LEO) of a manned space capsule with the main rocket engines
in preparation for a trip to the Moon or to Mars.
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
16
SHIP Architecture
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
17
SHIP Architecture
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
18
SHIP Architecture
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
19
SHIP Architecture
MatchingEngine
Casebase
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
20
Case-BasedReasoning
SHIP Architecture
MatchingEngine
Casebase
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
21
Case-BasedReasoning
SHIP Architecture
MatchingEngine
Casebase
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
22
Case-BasedReasoning
SHIP Architecture
MatchingEngine
Casebase
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
23
Case-BasedReasoning
SHIP Architecture
MatchingEngine
Casebase
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
24
Case-BasedReasoning
SHIP Architecture
Raw data Integrated information
Knowledge Analysis
MatchingEngine
Casebase
Action
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
25
Composite: Disparate Data Sources
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
26
Composite: Relational Table Joins
Join virtual tables created from disparate data sources.
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
27
SHIP Screen Shot
ISSInternational Space Station
PUIProgram Unique Identifier(part number)
PRACAProblem Report and Corrective Action
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
28
SHIP Summary
Problem Resolution and Corrective Action (PRACA) system for manned space vehicles Disparate data sources: live sensor feeds, archived legacy data, XML files,
Word documents, web pages, web services, databases, etc. Case-based reasoning and rules-based diagnoses Semantics-based knowledge management
Service-oriented architecture (SOA) Web-based and desktop client applications J2EE technologies Web services Enterprise information integration (EII) Application integration JavaServer Faces, BEA WebLogic, Composite Software Open source software
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
29
Case Study 3:Shot Data Management
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
30
Case Study 3: Shot Data Management
The National Ignition Facility (NIF) is a major laser-based fusion energy research project at the Lawrence Livermore National Laboratory.
My role: Enterprise Software Strategist Architect, catalyst, lead developer,
and systems integrator, 2006-2007 High security clearance (Level P)
Each simultaneous firing of its 192 powerful lasers at a BB-sized target generates gigabytes of data that Shot Data Management routes in near real time to the project scientists and researchers.
Target positioner
NIFThe National Ignition Facility
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
31
Shot Data Management Primary Services
Data provisioning Integration of disparate and
heterogeneous data sources Secure access and downloads Metadata-based queries
Data management Version management
and revision control Archive data from NIF’s
expected 30-year lifespan
Data marts Historical and trend analysis of
specific subject areas Support ad hoc data reporting
The target: A polished 2 mm capsulefilled with cryogenic hydrogen fuel.
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
32
Requirements
Application insulation Eliminate duplicate data Near real time
data access Reusable data services Decision support Quality control Hierarchical storage
management Data security Integrate legacy data silos Workflow management NIF target chamber
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
33
Architecture Overview
Industry standards Web services XML data sources JDBC and ODBC (database
interfaces) Java Message Service (JMS) BPEL (Business Process Execution
Language)
Java programming language SOA (Service-Oriented Architecture) Linux server clusters
Oracle RDBMS Oracle CMS
(Content Management System)
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
34
Visualizing Workflows
Large display of many nodes in multiple “swim lanes” Each node represents a unit of work to be performed. Each swim lane represents a category of work. Results flow from one node to the next based on rules.
The display is maintained in real time. Currently active nodes are highlighted.
SXI Hohlraum LEH Monitor Analysis – Rev 6-4-07
Instrument Analysis Diagnostic Analysis Campaign Analysis
Hoh
lraum
(TD
|Por
t|SX
I|???
)S
XI I
nteg
ratin
g C
amer
a(T
D|P
ort*
|SX
I|SC
CD
)
Feature Mapping (superimpose LEH
and perform line-outs)
Bad Pixel(Hot Pixels)
(Saturated Regions)
Correct CCD Instrument
Written by IPTType: IPT Production Code
Flat Field
Written by SchneiderType: RS Desktop Code
Separate Regions
Background Correction
Region Separation
Written by SchneiderType: RS Desktop Code
Map LEH Outline
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
35
Creating a Workflow
Create workflow diagrams using an IDE.
Compile workflow diagrams into BPEL scripts. Business Process Execution
Language (XML)
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
36
Shot Data Management Summary
Large, complex distributed system based on SOA Integration of legacy data silos
data warehouses relational databases content managers
Data archiving HSM 30 years of data
Workflows Shot approval and setup Configuration and calibration Results analysis and visualization
Installation of the target chamber
“This system saved 6 months off the software development schedule.”
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
37
Case Study 4
SPLASH(Smarter Planet Platform for Analysis and Simulation of Health)
A Platform for IntegratingHeterogeneous Simulation Models
My role: Research Staff Member, 2010-2012
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
38
Obesity
It’s an example of a REALLY hard problem in healthcare
Divide and conquer: Eat less! Exercise more! Obesity
Eatless!
Exercisemore!
What’s to decide?
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
39
It’s not that simple …
Lots of components, all interrelated!
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
40
Lever1 Lever2 Lever3
Policy “Flight Simulator”
Disease Progression Models
Personalized Medicine(Targeted interventions)
Healthcare Ecosystem
(Society)
Healthcare Ecosystem
(Society)
System Structure(Organizations)
System Structure(Organizations)
Delivery Operations(Processes)
Delivery Operations(Processes)
Clinical Practices(People)
Clinical Practices(People)
Careflow Models(Flow of Patients, Money,
Information)
Business Models
Socio-Economic Models
1
2
3
4
5
6
Multi-Level, End-to-End Modeling
Rouse, W. B. & Cortese, D. A. (2010). Introduction, in W. B. Rouse & D. A. Cortese (Eds.), Engineering the System of Healthcare Delivery. IOS Press.
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
41
The SPLASH Platform
SPLASH REPOSITORYSPLASH REPOSITORY
Model and Data Registration
Model and Data Registration
Model, Data, and Mapping DiscoveryModel, Data, and
Mapping Discovery
Model and Data Composition
Model and Data Composition
Model ExecutionModel Execution
Experiment ManagerExperiment Manager
SPLASH MODULES
Collaborative Reporting and Visualization
Collaborative Reporting and Visualization
Providemodels and
data
Usemodels and
data
Multi-disciplinary users
Multi-disciplinary users
Metadata- Model inputs and outputs- Access and execution- Data schemas- Model and data locations- Model and data semantics
Metadata- Model inputs and outputs- Access and execution- Data schemas- Model and data locations- Model and data semantics
contains
DataData
Composite Model
Composite Model
DataData
ModelModel
ModelModel
Mapping
Mapping
describes
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
42
Hypothetical Obesity Model Integration
Buying and Eating(Agent-based simulation
model)
Buying and Eating(Agent-based simulation
model)
Transportation(VISUM simulation model)
Transportation(VISUM simulation model)
BMI(Differential equation
model)
BMI(Differential equation
model)
Exercise(Stochastic discrete-event
simulation model)
Exercise(Stochastic discrete-event
simulation model)
Time alignmentand data mergeTime alignmentand data merge
GeospatialalignmentGeospatialalignment
Demographic data
Demographic data
GIS dataGIS data
Facility dataFacility data
Dataflow
Data source
Simulation model
Data Transformation
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
43
How Can We Integrate Models?
Pre-integrated Models already integrated
within a shared framework
Simulation model
Data transformer
Statistical model
Decision/optimization model
Dataset
Tightly coupled Models share a common API All data use common predetermined
formats
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
44
How Can We Integrate Models?
Tightly coupled with data transformations Output from one model is
transformed to be the input for the next model
Simulation model
Data transformer
Statistical model
Decision/optimization model
Dataset
Loose coupling through data exchange Independently developed models File and database I/O, web service calls Leverage existing work Facilitate collaboration SPLASH
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
45
Model Integration as a Scientific Workflow
Kepler Scientific Workflow System Drag-and-drop graphical editor to design workflows Automatically runs simulation models and data transformations Extensible open-source software developed by U.C. Davis and U.C. Berkeley
Obesity Workflow
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
46
SPLASH ActorsObesity Workflow
Data actor
Model actor
Mapping actor
Visualization actor
Data actors: input and output files, databases, web services, etc. Model actors: simulation, optimization, statistical models Mapping actors: data transformations, time and space alignment Visualization actors: graphs, reports, etc.
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
47
The Clio++ Mapping Actor
SPLASH automatically checks the mapping actor’s source and destination links at design time when you open the mapping actor SPLASH loads the source and target schemas into Clio++ Ready for you to map source fields to target fields
Clio++
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
48
Clio++ Schema MappingClio++ New_Job0.mapjob (*)
Manually map output fields from the source schemas to the corresponding input fields of the target schema
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
49
Execution and Results
Distributed, parallel execution Run simulation models and data transformation scripts
Final results as reports and graphs for analysis
Obesity Workflow
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
50
Sample Final Results
© 2011 IBM Corporation
At tick 20:Open a new healthy foods store in a neighborhood.
At tick 20:Open a new healthy foods store in a neighborhood.
Without traffic modelWithout traffic model Including traffic modelIncluding traffic model
* Many assumptions, sample only, your mileage may vary …* Many assumptions, sample only, your mileage may vary …
BMI by rich/poor BMI by rich/poor
richpoor
richpoor
Department of Computer ScienceSpring 2013: May 1
CS 157B: Database Management Systems II© R. Mak
51
It’s All about Collaboration
The SPLASH platform brings domain experts together Each expert contributes a model for the integration Designing workflows forces system-level thinking
No decision silos! SPLASH provides a common platform
Experts can talk to each other!
Discover new insights into a really hard problem What models do we need for the workflow? What experiments should we run?
How does changing a model’s input affect the final result? Make better-informed policy decisions
A platform for many domains,not just healthcare!