computational infrastructure for policy informatics
Computational Infrastructure for Policy Informatics. Policy Informatics in an Interdependent World Workshop, Washington DC, September 13 2007. Geoffrey Fox, Computer Science, Informatics, Physics, Pervasive Technology Laboratories, Indiana University, Bloomington IN 47401.
1
Computational Infrastructure for Policy Informatics
Policy Informatics in an Interdependent World Workshop
Washington DC, September 13 2007
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
http://grids.ucs.indiana.edu/ptliupages/presentations/
[email protected]
http://www.infomall.org
2
e-moreorlessanything
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology
e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research
Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
This generalizes to e-moreorlessanything including presumably e-Policyinformatics
A deluge of data of unprecedented and inevitable size must be managed and understood.
People (see Web 2.0), computers, data and instruments must be linked.
On demand assignment of experts, computers, networks and storage resources must be supported
3
Role of Cyberinfrastructure
Cyberinfrastructure is infrastructure that supports distributed science (e-Science): data, people, computers
Exploits Internet technology (Web 2.0) adding (via Grid technology) management, security, supercomputers etc.
It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with highish latency (milliseconds) between nodes
Parallel needed to get high performance on individual large simulations, data analysis etc.; must decompose problem
Distributed aspect integrates already distinct components – especially natural for data
Cyberinfrastructure is in general a distributed collection of parallel systems
Cyberinfrastructure is made of services (originally Web services) that are “just” programs or data sources packaged for distributed access
4
Structure of Cyberinfrastructure
Distributed software systems are being “revolutionized” by developments from e-commerce, e-Science and the consumer Internet. There is rapid progress in technology families termed “Web services”, “Grids” and “Web 2.0”
The emerging distributed system picture is of distributed services with advertised interfaces but opaque implementations communicating by streams of messages over a variety of protocols
• Complete systems are built by combining either services or predefined/pre-existing collections of services together to achieve new capabilities
As well as Internet/Communication revolutions (distributed systems), multicore chips will likely be hugely important (parallel systems)
Industry, not academia, is leading innovation in these technologies
5
Policy Informatics Infrastructure
The Party Line approach is clear: one creates a Cyberinfrastructure consisting of distributed services accessed by portals/gadgets/gateways/RSS feeds
Services include:
• “Original data”
• Transformations or filters implementing the DIKW (Data Information Knowledge Wisdom) pipeline
• Final “Decision Support” step converting wisdom into action
• Generic services such as security, profiles etc.
Some filters could correspond to large simulations
Infrastructure will be set up as a System of Systems (Grids of Grids)
• Services and/or Grids just accept some form of DIKW and produce another form of DIKW
• “Original data” has no explicit input; just output
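The service structure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message shape, the made-up readings, the "ppm" unit, the threshold of 30.0 and all function names are hypothetical and do not come from the talk. It shows an “original data” service with output only, filters that each accept one form of DIKW and produce another, and a final decision step.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical message shape: {"level": <DIKW stage>, "payload": <content>}
Message = Dict[str, Any]
Filter = Callable[[Message], Message]


def original_data() -> Iterable[Message]:
    """An "original data" service: no explicit input, only output."""
    for reading in [12.0, 15.5, 31.0, 14.2]:  # made-up sensor readings
        yield {"level": "data", "payload": reading}


def to_information(msg: Message) -> Message:
    """Filter: attach an assumed unit to the raw number (data -> information)."""
    return {"level": "information", "payload": {"value": msg["payload"], "unit": "ppm"}}


def to_knowledge(msg: Message) -> Message:
    """Filter: compare against an assumed threshold (information -> knowledge)."""
    value = msg["payload"]["value"]
    return {"level": "knowledge", "payload": {"value": value, "exceeds_limit": value > 30.0}}


def decide(msg: Message) -> Message:
    """Final "Decision Support" step: turn the judgement into an action."""
    action = "issue alert" if msg["payload"]["exceeds_limit"] else "no action"
    return {"level": "decision", "payload": action}


def run_pipeline(source: Iterable[Message], filters: List[Filter]) -> List[Message]:
    """Pass every source message through each filter in turn."""
    results = []
    for msg in source:
        for apply_filter in filters:
            msg = apply_filter(msg)
        results.append(msg)
    return results


decisions = run_pipeline(original_data(), [to_information, to_knowledge, decide])
```

In a real deployment each function would be a separate networked service exchanging messages; plain function calls are used here only to keep the sketch self-contained.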
6
[Figure: a Grid of Grids. Sensor Services (SS), Filter Services (FS), Other Services (OS), MetaData (MD), a Database and a Portal, together with other Grids and other services, are linked by Inter-Service Messages. Flow across the diagram: Raw Data, Data, Information, Knowledge, Wisdom, Decisions.]
7
Information Management/Processing
Diagram describes e-Science, Military Command and Control and perhaps Policy Informatics
(SOAP or just RSS) messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information so that the complete system provides the Data Information Knowledge Wisdom transformation
• Semantic Web technologies like RDF and OWL might help us to have rich expressivity but they might be too complicated
We are meant to build application specific information management/transformation systems for each domain
• Each domain has specific services/standards (for APIs and Information) and will use generic services (like R for datamining) and standards (RDF, WSDL)
• What is PIML (Policy Informatics Markup Language)?
• Standards made before consensus or not observant of technology progress are dubious (cf. HLA in simulation or many grid standards)
8
Too much Computing?
Historically one has tried to increase computing capabilities by
• Optimizing performance of codes
• Exploiting all possible CPUs such as graphics co-processors and “idle cycles”
• Making central computers available such as NSF/DoE/DoD supercomputer networks
The next crisis in the technology area will be the opposite problem: commodity chips will be 32-128 way parallel in 5 years' time and we currently have no idea how to use them, especially on clients
• Only 2 releases of standard software (e.g. Office) in this time span
Gaming and generalized decision support (data mining) are two obvious ways of using these cycles
• Intel RMS analysis
• Note even cell phones will be multicore
“Too much data” is matched to “Too much computing” but the implications involved are rather different
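One way to picture data mining soaking up many cores is a decompose, map, combine pattern. The Python sketch below is illustrative only: the term-counting task, the function names and the `mapper` parameter are assumptions, not anything from the talk or from Intel's RMS work. Each chunk is mined independently, so the per-chunk step could be handed to separate cores.

```python
from collections import Counter
from typing import Callable, List


def count_terms(chunk: List[str]) -> Counter:
    """Mine one chunk on its own; no shared state, so chunks can run on separate cores."""
    counts = Counter()
    for document in chunk:
        counts.update(document.lower().split())
    return counts


def split(items: List[str], parts: int) -> List[List[str]]:
    """Decompose the problem into `parts` roughly equal chunks."""
    return [items[i::parts] for i in range(parts)]


def mine(documents: List[str], parts: int = 4, mapper: Callable = map) -> Counter:
    """Map count_terms over the chunks, then combine the partial counts."""
    total = Counter()
    for partial in mapper(count_terms, split(documents, parts)):
        total.update(partial)  # Counter.update adds counts together
    return total


sample = ["tax policy data", "policy data", "data"]
term_counts = mine(sample, parts=2)
```

The built-in `map` runs the chunks one after another. Passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `mapper` should spread the same chunks over several processes without changing `mine`.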
9
Intel’s Projection
10
Pradeep K. Dubey, [email protected]
RMS: Recognition Mining Synthesis
[Figure: Intel RMS diagram comparing Today and Tomorrow]
Recognition (“What is …?”): Model
• Today: Model-less
• Tomorrow: Model-based multimodal recognition
Mining (“Is it …?”): Find a model instance
• Today: Real-time streaming and transactions on static, structured datasets
• Tomorrow: Real-time analytics on dynamic, unstructured, multimodal datasets
Synthesis (“What if …?”): Create a model instance
• Today: Very limited realism
• Tomorrow: Photo-realism and physics-based animation
11
Pradeep K. Dubey, [email protected]
Recognition: What is a tumor?
Mining: Is there a tumor here?
Synthesis: What if the tumor progresses?
It is all about dealing efficiently with complex multimodal datasets
Images courtesy: http://splweb.bwh.harvard.edu:8000/pages/images_movies.html
12
Intel’s Application Stack
13
What should we do?
There will be high quality parallel data mining algorithms
• Speech Recognition, Text and multimedia search and browsers
• New generation of desktop aides
• What are synergies to “Personal aides in an information rich world” (future of PC?) and Policy Informatics?
What filters (data mining) does policy informatics need?
As computing is free, focus on identifying the information/knowledge/wisdom needed (there is probably too much data but not so much wisdom in the DIKW pipeline)
• We should use supercomputer/computer services but Information services are more important and less “controversial”
Identify standards for data and data-mining APIs
Set up distributed Policy Informatics Services
Use Web 2.0 (as it makes things easier) not current Grids (which make things harder)
• Build a “Programmable Policy Informatics Web”
• Emphasize Simplicity
• Is “Secrecy” important and in fact viable?
Should we care just about “original data” or also about the whole DIKW pipeline?
14
Web 2.0 Mashups and APIs
http://www.programmableweb.com/apis lists (Sept 12 2007) 2312 Mashups and 511 Web 2.0 APIs, with GoogleMaps the most often used in Mashups
Mashups are called workflow in the Grid arena
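A mashup in this sense is a short script that feeds the output of one API into another. The Python sketch below uses two in-memory stand-ins rather than real Web 2.0 APIs: `geocode_api`, `events_api`, the coordinates and the second event are all made up for illustration, and a real mashup would call the services over HTTP.

```python
from typing import Dict, List, Tuple


def geocode_api(place: str) -> Tuple[float, float]:
    """Hypothetical geocoding service: place name -> (latitude, longitude)."""
    table = {"Washington DC": (38.9, -77.0), "Bloomington IN": (39.2, -86.5)}
    return table[place]


def events_api() -> List[Dict[str, str]]:
    """Hypothetical events feed; the entries are illustrative only."""
    return [
        {"title": "Policy Informatics Workshop", "place": "Washington DC"},
        {"title": "Grid Seminar", "place": "Bloomington IN"},
    ]


def map_mashup() -> List[Dict[str, object]]:
    """The mashup: join the two services into markers a map widget could plot."""
    markers = []
    for event in events_api():
        lat, lon = geocode_api(event["place"])
        markers.append({"label": event["title"], "lat": lat, "lon": lon})
    return markers


markers = map_mashup()
```

A Grid workflow engine would express the same two-step composition declaratively instead of as a script.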
15
The List of Web 2.0 APIs
Each site has API and its features
Divided into broad categories
Only a few used a lot (49 APIs used in 10 or more mashups)
RSS feed of new APIs
Amazon S3 growing in popularity
16
Spare Slides
17
Grid Service Philosophy I
Services receive data in SOAP messages, manipulate it and produce transformed data as further messages
Knowledge is created from information by services
• Information is created from data by services
Semantic Grid comes from building metadata rich systems of services
Meta-data is carried in SOAP messages
The Grid enhances Web services with semantically rich system and application specific management
One must exploit and work around the different approaches to meta-data (state) and their manipulation in Web Services
18
Grid Service Philosophy II
There are a horde of support services supplying security, collaboration, database access, user interfaces
The support services are either associated with system or application where the former are WS-* and GS-* which implicitly or explicitly define many support services
There are generalized filter services which are applications that accept messages and produce new messages with some data derived from that in input
• Simulations (including PDEs and reactive systems)
• Data-mining
• Transformations
• Agents
• Reasoning
• Decision making
Tools are all termed filters here
Agent Systems are a special case of Grids
Peer-to-peer systems can be built as a Grid with particular discovery and messaging strategies
19
Grid Service Philosophy III
Filters can be a workflow which means they are “just collections of other simpler services”
Grids are distributed systems that accept distributed messages and produce distributed result messages
A service or a workflow is a special case of a Grid
A collection of services on a multi-core chip is a Grid
Sensors or Instruments are “managed” by services; they may accept non SOAP control messages and produce data as messages (that are not usually SOAP)
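The idea that a workflow is itself a filter can be sketched as a composite in Python. The `Workflow` class and the three toy filters below are hypothetical names chosen for illustration; the point is only that a collection of simpler services exposes the same message-in, message-out interface as a single service, so workflows nest.

```python
from typing import Callable, Dict

Message = Dict[str, object]  # assumed message shape for this sketch


class Workflow:
    """A collection of simpler services that is itself callable like one service."""

    def __init__(self, *stages: Callable[[Message], Message]) -> None:
        self.stages = list(stages)

    def __call__(self, message: Message) -> Message:
        # Feed the message through each stage in order.
        for stage in self.stages:
            message = stage(message)
        return message


def clean(message: Message) -> Message:
    """Toy filter: normalise whitespace and case."""
    return {**message, "text": str(message["text"]).strip().lower()}


def tokenize(message: Message) -> Message:
    """Toy filter: split the text into tokens."""
    return {**message, "tokens": str(message["text"]).split()}


def count(message: Message) -> Message:
    """Toy filter: record how many tokens were found."""
    return {**message, "n_tokens": len(message["tokens"])}


preprocess = Workflow(clean, tokenize)   # a workflow built from two services
analysis = Workflow(preprocess, count)   # a workflow that contains another workflow
result = analysis({"text": "  Grids of Grids "})
```

Because `analysis` treats `preprocess` exactly like a plain filter, the same structure extends to Grids of Grids.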
20
Virtual Observatory Astronomy Grid
Integrate Experiments
[Figure: sky images combined across experiments: Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map, Galaxy Density Map]
21
Service or Web service Approach
One uses GML, CML etc. to define the data in a system and one uses services to capture “methods” or “programs”
In eScience, important services fall in three classes
• Simulations
• Data access, storage, federation, discovery
• Filters for data mining and manipulation
Services use something like WSDL (Web Services Description Language) to define interoperable interfaces (see OPAL talk!)
WSDL establishes a “contract” independent of implementation between two services or a service and a client
Services should be loosely coupled which normally means they are coarse grain
Services will be composed (linked together) by mashups (typically scripts) or workflow (often XML – BPEL)
Software Engineering and Interoperability/Standards are closely related
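The notion of a contract that is independent of implementation can be illustrated without any WSDL tooling. The Python sketch below uses `typing.Protocol` as a stand-in for an interface description; it is an analogy, not WSDL, and `SearchService`, both implementations and `first_hit` are hypothetical names. The client depends only on the contract, so either implementation can be swapped in.

```python
from typing import Dict, List, Protocol


class SearchService(Protocol):
    """The contract: what a client may rely on, with no implementation details."""

    def search(self, query: str) -> List[str]:
        ...


class SubstringSearch:
    """One implementation: scan every document for the query text."""

    def __init__(self, documents: List[str]) -> None:
        self._documents = list(documents)

    def search(self, query: str) -> List[str]:
        return [d for d in self._documents if query.lower() in d.lower()]


class KeywordIndexSearch:
    """Another implementation: look the query up in a prebuilt word index."""

    def __init__(self, documents: List[str]) -> None:
        self._index: Dict[str, List[str]] = {}
        for document in documents:
            for word in document.lower().split():
                self._index.setdefault(word, []).append(document)

    def search(self, query: str) -> List[str]:
        return list(self._index.get(query.lower(), []))


def first_hit(service: SearchService, query: str) -> str:
    """Client code written against the contract only."""
    hits = service.search(query)
    return hits[0] if hits else ""


docs = ["Grid services", "Web 2.0 mashups"]
hit_a = first_hit(SubstringSearch(docs), "grid")
hit_b = first_hit(KeywordIndexSearch(docs), "grid")
```

A coarse-grain interface like this single `search` call is also what keeps the coupling loose: the client never sees how the documents are stored.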
22
Philosophy of Web Service Grids
Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines
This leads to the distributed object (DO) model represented by Java and CORBA
• RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java
Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly
• Distributed Objects Replaced by Services
Note CORBA was considered too complicated in both organization and proposed infrastructure
• and Java was considered as “tightly coupled to Sun”
• So there were other reasons to discard
Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages
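The difference between “one-way” messages and request-response can be sketched with two in-process queues standing in for the network. Everything here is an assumption made for illustration: the queue-based transport, the `job_id` field and the upper-casing “service” are not from the talk. The sender posts a message and returns at once; results come back later as separate one-way messages.

```python
import queue
from typing import Dict


def submit_job(outbox: "queue.Queue[Dict[str, str]]", job_id: str, payload: str) -> None:
    """Sender: post a one-way message and return immediately, without waiting for a reply."""
    outbox.put({"job_id": job_id, "payload": payload})


def run_service(inbox: "queue.Queue[Dict[str, str]]", outbox: "queue.Queue[Dict[str, str]]") -> None:
    """Service: drain the inbox and post each result as a new one-way message."""
    while True:
        try:
            message = inbox.get_nowait()
        except queue.Empty:
            break
        outbox.put({"job_id": message["job_id"], "result": message["payload"].upper()})


requests: "queue.Queue[Dict[str, str]]" = queue.Queue()
results: "queue.Queue[Dict[str, str]]" = queue.Queue()

submit_job(requests, "job-1", "radar scan")   # neither call blocks on the service
submit_job(requests, "job-2", "simulation")
run_service(requests, results)                # the service runs on its own schedule

first_result = results.get_nowait()
second_result = results.get_nowait()
```

The `job_id` carried in each message is what lets the sender match a result to its request, since there is no call stack tying the two together.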
23
Web services
Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.
Web Services interact by exchanging messages in SOAP format
The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
[Figure: a Web Service in layers. Resources (Databases, Humans, Programs, Computational resources, Devices) are wrapped by service logic (BPEL, Java, .NET), which sits behind message processing (SOAP and WSDL) and exchanges SOAP messages of the form:]
<env:Envelope>
  <env:Header> ... </env:Header>
  <env:Body> ... </env:Body>
</env:Envelope>
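The envelope structure can be built and read back with Python's standard library. This is a minimal sketch, assuming the SOAP 1.2 envelope namespace; the `source` header field and the `payload` body element are hypothetical names, and a real service would use a SOAP toolkit and a WSDL-described body rather than hand-built XML.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace


def build_envelope(header_fields: Dict[str, str], body_text: str) -> str:
    """Wrap metadata in the Header and application data in the Body."""
    ET.register_namespace("env", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
    for name, value in header_fields.items():
        ET.SubElement(header, name).text = value  # hypothetical metadata fields
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    ET.SubElement(body, "payload").text = body_text  # hypothetical body element
    return ET.tostring(envelope, encoding="unicode")


def read_body(xml_text: str) -> str:
    """Message processing on the receiving side: extract the payload text."""
    root = ET.fromstring(xml_text)
    payload = root.find(f"{{{SOAP_ENV}}}Body/payload")
    if payload is None or payload.text is None:
        raise ValueError("envelope has no payload")
    return payload.text


message = build_envelope({"source": "sensor-1"}, "42")
received = read_body(message)
```

Keeping metadata in the Header and data in the Body is what lets intermediate handlers (security, reliable messaging) act on a message without understanding its payload.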
24
A typical Web Service
In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, totally compiled away (inlining)
The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python
[Figure: Web Services (Payment, Credit Card, Warehouse, Shipping control, Security, Catalog, Portal Service) connected through WSDL interfaces]
25
The Grid and Web Service Institutional Hierarchy
Must set standards to get interoperability
1: Container and Run Time (Hosting) Environment (Apache Axis, .NET etc.)
2: System Services and Features (WS-* from OASIS/W3C/Industry): Handlers like WS-RM, Security, UDDI Registry
3: Generally Useful Services and Features (OGSA and other GGF, W3C) such as “Collaborate”, “Access a Database” or “Submit a Job”. Standards: OGSA GS-* and some WS-*, GGF/W3C/…, XGSP (Collab)
4: Application or Community of Interest (CoI) Specific Services such as “Map Services”, “Run BLAST” or “Simulate a Missile”. Standards: XBML, XTCE, VOTABLE, CML, CellML
26
The Ten areas covered by the 60 core WS-* Specifications
WS-* Specification Area | Examples
1: Core Service Model | XML, WSDL, SOAP
2: Service Internet | WS-Addressing, WS-MessageDelivery; Reliable Messaging WSRM; Efficient Messaging MTOM
3: Notification | WS-Notification, WS-Eventing (Publish-Subscribe)
4: Workflow and Transactions | BPEL, WS-Choreography, WS-Coordination
5: Security | WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation
6: Service Discovery | UDDI, WS-Discovery
7: System Metadata and State | WSRF, WS-MetadataExchange, WS-Context
8: Management | WSDM, WS-Management, WS-Transfer
9: Policy and Agreements | WS-Policy, WS-Agreement
10: Portals and User Interfaces | WSRP (Remote Portlets)
27
Activities in Global Grid Forum Working Groups
GGF Area | GS-* and OGSA Standards Activities
1: Architecture | High Level Resource/Service Naming (level 2 of slide 6), Integrated Grid Architecture
2: Applications | Software Interfaces to Grid, Grid Remote Procedure Call, Checkpointing and Recovery, Interoperability to Job Submittal services, Information Retrieval
3: Compute | Job Submission, Basic Execution Services, Service Level Agreements for Resource use and reservation, Distributed Scheduling
4: Data | Database and File Grid access, Grid FTP, Storage Management, Data replication, Binary data specification and interface, High-level publish/subscribe, Transaction management
5: Infrastructure | Network measurements, Role of IPv6 and high performance networking, Data transport
6: Management | Resource/Service configuration, deployment and lifetime, Usage records and access, Grid economy model
7: Security | Authorization, P2P and Firewall Issues, Trusted Computing
28
Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
– C++, Java or Fortran Monte Carlo module
– Data streaming from a sensor or Satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users, files and databases
• The Grid is built by coordinating such services assuming we have solved the problem of programming the service
[Figure: Service and Data]
29
Two-level Programming II
The Grid is discussing the composition of distributed services with the runtime interfaces to Grid as opposed to UNIX pipes/data streams
Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs
Such interpretative environments are the single processor analog of Grid Programming
Some projects like GrADS from Rice University are looking at integration between service and composition levels but dominant effort looks at each level separately
[Figure: Service1, Service2, Service3 and Service4 linked together]
30
Grid Workflow Data Assimilation in Earth Science
Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts
Typical graphical interface to service composition