computational infrastructure for policy informatics

Computational Infrastructure for Policy Informatics. Policy Informatics in an Interdependent World Workshop, Washington DC, September 13 2007. Geoffrey Fox, Computer Science, Informatics, Physics, Pervasive Technology Laboratories, Indiana University, Bloomington IN 47401. http://grids.ucs.indiana.edu/ptliupages/presentations/


Page 1: Computational Infrastructure for Policy Informatics


Computational Infrastructure for Policy Informatics

Policy Informatics in an Interdependent World Workshop

Washington DC September 13 2007

Geoffrey Fox

Computer Science, Informatics, Physics

Pervasive Technology Laboratories

Indiana University Bloomington IN 47401

http://grids.ucs.indiana.edu/ptliupages/presentations/ [email protected] http://www.infomall.org

Page 2: Computational Infrastructure for Policy Informatics

e-moreorlessanything

‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology

e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research

Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.

This generalizes to e-moreorlessanything including presumably e-Policyinformatics

A deluge of data of unprecedented and inevitable size must be managed and understood.

People (see Web 2.0), computers, data and instruments must be linked.

On demand assignment of experts, computers, networks and storage resources must be supported

Page 3: Computational Infrastructure for Policy Informatics

Role of Cyberinfrastructure

Cyberinfrastructure is infrastructure that supports distributed science (e-Science) – data, people, computers

Exploits Internet technology (Web 2.0) adding (via Grid technology) management, security, supercomputers etc.

It has two aspects: parallel – low latency (microseconds) between nodes and distributed – highish latency (milliseconds) between nodes

Parallel needed to get high performance on individual large simulations, data analysis etc.; must decompose problem

Distributed aspect integrates already distinct components – especially natural for data

Cyberinfrastructure is in general a distributed collection of parallel systems

Cyberinfrastructure is made of services (originally Web services) that are “just” programs or data sources packaged for distributed access
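The "must decompose problem" point for the parallel aspect can be sketched as follows. This is only an illustration under stated assumptions: the analysis (a sum of squares), the `split` helper and the worker count are hypothetical, and a thread pool from the Python standard library is used just to show the decompose, compute, combine structure. In CPython, threads do not speed up CPU-bound work, so a real code would use processes or a message-passing library.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Decompose a list into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Work done independently on one piece of the decomposed problem."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Run the pieces concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, split(data, workers)))
    return sum(partials)


result = parallel_sum_of_squares(list(range(1000)))
```

The only communication between pieces is the final combination step, which is what makes this kind of problem easy to run on low latency parallel nodes.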

Page 4: Computational Infrastructure for Policy Informatics

Structure of Cyberinfrastructure

Distributed software systems are being “revolutionized” by developments from e-commerce, e-Science and the consumer Internet. There is rapid progress in technology families termed “Web services”, “Grids” and “Web 2.0”

The emerging distributed system picture is of distributed services with advertised interfaces but opaque implementations communicating by streams of messages over a variety of protocols

• Complete systems are built by combining either services or predefined/pre-existing collections of services together to achieve new capabilities

As well as Internet/Communication revolutions (distributed systems), multicore chips will likely be hugely important (parallel systems)

Industry not academia is leading innovation in these technologies

Page 5: Computational Infrastructure for Policy Informatics

Policy Informatics Infrastructure

The Party Line approach is clear – one creates a Cyberinfrastructure consisting of distributed services accessed by portals/gadgets/gateways/RSS feeds

Services include:

• “original data”

• Transformations or filters implementing DIKW (Data Information Knowledge Wisdom) pipeline

• Final “Decision Support” step converting wisdom into action

• Generic services such as security, profiles etc.

Some filters could correspond to large simulations

Infrastructure will be set up as a System of Systems (Grids of Grids)

• Services and/or Grids just accept some form of DIKW and produce another form of DIKW

• “Original data” has no explicit input; just output
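The service structure on this slide can be sketched as a chain of filters. The sketch below is hypothetical throughout: the regional case counts, the threshold and all function names are invented for illustration, and the "services" are plain local functions rather than distributed ones.

```python
def original_data():
    """An "original data" service: no explicit input, just output."""
    # Hypothetical raw readings: (region, reported cases)
    yield from [("north", 12), ("south", 40), ("north", 8), ("south", 55)]


def to_information(records):
    """Filter: raw data -> information (total cases per region)."""
    totals = {}
    for region, cases in records:
        totals[region] = totals.get(region, 0) + cases
    return totals


def to_knowledge(totals, threshold=50):
    """Filter: information -> knowledge (regions above a threshold)."""
    return sorted(r for r, total in totals.items() if total > threshold)


def decision_support(hot_regions):
    """Final step: turn the distilled result into a recommended action."""
    return [f"send resources to {region}" for region in hot_regions]


def run_pipeline(source, *filters):
    """Feed the source output through each filter in turn."""
    value = source()
    for f in filters:
        value = f(value)
    return value


actions = run_pipeline(original_data, to_information, to_knowledge, decision_support)
```

Each filter accepts one form of DIKW and produces another, so filters can be swapped or reordered without touching the rest of the chain.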

Page 6: Computational Infrastructure for Policy Informatics

[Figure: Grid-of-Grids diagram. Sensor Services (SS), Filter Services (FS), Other Services (OS) and MetaData (MD), together with a Database, a Portal, other Services and other Grids, are linked by Inter-Service Messages. Stages shown: Raw Data, Data, Information, Knowledge, Wisdom, Decisions.]

Page 7: Computational Infrastructure for Policy Informatics

Information Management/Processing

Diagram describes e-Science, Military Command and Control and perhaps Policy Informatics

Data Information Knowledge Wisdom transformation

(SOAP or just RSS) messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information so that complete system provides

• Semantic Web technologies like RDF and OWL might help us to have rich expressivity but they might be too complicated

We are meant to build application specific information management/transformation systems for each domain

• Each domain has specific services/standards (for API’s and Information) and will use generic services (like R for datamining) and standards (RDF, WSDL)

• What is PIML Policy Informatics Markup Language?

• Standards made before consensus or not observant of technology progress are dubious (cf. HLA in simulation or many grid standards)

Page 8: Computational Infrastructure for Policy Informatics

Too much Computing?

Historically one has tried to increase computing capabilities by

• Optimizing performance of codes

• Exploiting all possible CPU’s such as Graphics co-processors and “idle cycles”

• Making central computers available such as NSF/DoE/DoD supercomputer networks

Next Crisis in technology area will be the opposite problem – commodity chips will be 32-128 way parallel in 5 years’ time and we currently have no idea how to use them – especially on clients

• Only 2 releases of standard software (e.g. Office) in this time span

Gaming and Generalized decision support (data mining) are two obvious ways of using these cycles

• Intel RMS analysis

• Note even cell phones will be multicore

“Too much data” is matched by “Too much computing” but the implications are rather different

Page 9: Computational Infrastructure for Policy Informatics


Intel’s Projection

Page 10: Computational Infrastructure for Policy Informatics

RMS: Recognition Mining Synthesis

Pradeep K. Dubey, [email protected]

Recognition (What is …?): Model

• Today: Model-less

• Tomorrow: Model-based multimodal recognition

Mining (Is it …?): Find a model instance

• Today: Real-time streaming and transactions on static – structured datasets

• Tomorrow: Real-time analytics on dynamic, unstructured, multimodal datasets

Synthesis (What if …?): Create a model instance

• Today: Very limited realism

• Tomorrow: Photo-realism and physics-based animation

Page 11: Computational Infrastructure for Policy Informatics

Pradeep K. Dubey, [email protected]

It is all about dealing efficiently with complex multimodal datasets

Recognition: What is a tumor?

Mining: Is there a tumor here?

Synthesis: What if the tumor progresses?

Images courtesy: http://splweb.bwh.harvard.edu:8000/pages/images_movies.html

Page 12: Computational Infrastructure for Policy Informatics

Intel’s Application Stack

Page 13: Computational Infrastructure for Policy Informatics

What should we do?

There will be high quality parallel data mining algorithms

• Speech Recognition, Text and multimedia search and browsers

• New generation of desktop aides

• What are synergies to “Personal aides in an information rich world” (future of PC?) and Policy Informatics?

What filters (data mining) does policy informatics need?

As computing is free, focus on identifying the information/knowledge/wisdom needed (there is probably too much data but not so much wisdom in DIKW pipeline)

• We should use supercomputer/computer services but Information services are more important and less “controversial”

Identify standards for data and data-mining API’s

Set up distributed Policy Informatics Services

Use Web 2.0 (as it makes things easier) not current Grids (which make things harder)

• Build a “Programmable Policy Informatics Web”

• Emphasize Simplicity

• Is “Secrecy” important and in fact viable?

Should we care just about “original data” or also about the whole pipeline DIKW?

Page 14: Computational Infrastructure for Policy Informatics


Web 2.0 Mashups and APIs

http://www.programmableweb.com/apis lists (as of Sept 12 2007) 2312 Mashups and 511 Web 2.0 APIs, with GoogleMaps the most often used in Mashups

Mashups are called workflow in Grid arena
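A mashup in this sense is a short script that links independently provided services. The sketch below assumes two hypothetical services, a geocoder and an event feed, both replaced by local stand-in functions with made-up data (the coordinates are rough illustrative values). A real mashup would call the providers' published APIs instead.

```python
def geocode(place):
    """Stand-in for a mapping API call; returns (lat, lon) or None."""
    known = {"Washington DC": (38.9, -77.0), "Bloomington IN": (39.2, -86.5)}
    return known.get(place)


def workshop_feed():
    """Stand-in for an RSS-like feed of events."""
    return [
        {"title": "Policy Informatics Workshop", "place": "Washington DC"},
        {"title": "Grid Seminar", "place": "Bloomington IN"},
        {"title": "Online Meeting", "place": "Virtual"},
    ]


def mashup():
    """Combine the two services: attach map coordinates to each event."""
    markers = []
    for event in workshop_feed():
        position = geocode(event["place"])
        if position is not None:  # skip events the map service cannot place
            markers.append(
                {"title": event["title"], "lat": position[0], "lon": position[1]}
            )
    return markers


markers = mashup()
```

The same linking of services would be written as a workflow description in the Grid arena; here it is a few lines of script.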

Page 15: Computational Infrastructure for Policy Informatics


The List of Web 2.0 API’s

Each site has API and its features

Divided into broad categories

Only a few used a lot (49 API’s used in 10 or more mashups)

RSS feed of new APIs

Amazon S3 growing in popularity

Page 16: Computational Infrastructure for Policy Informatics


Spare Slides

Page 17: Computational Infrastructure for Policy Informatics

Grid Service Philosophy I

Services receive data in SOAP messages, manipulate it and produce transformed data as further messages

Knowledge is created from information by services

• Information is created from data by services

Semantic Grid comes from building metadata rich systems of services

Meta-data is carried in SOAP messages

The Grid enhances Web services with semantically rich system and application specific management

One must exploit and work around the different approaches to meta-data (state) and their manipulation in Web Services

Page 18: Computational Infrastructure for Policy Informatics

Grid Service Philosophy II

There is a horde of support services supplying security, collaboration, database access, user interfaces

The support services are either associated with system or application where the former are WS-* and GS-* which implicitly or explicitly define many support services

There are generalized filter services which are applications that accept messages and produce new messages with some data derived from that in input

• Simulations (including PDE’s and reactive systems)

• Data-mining

• Transformations

• Agents

• Reasoning

• Decision making tools

All of these are termed filters here

Agent Systems are a special case of Grids

Peer-to-peer systems can be built as a Grid with particular discovery and messaging strategies

Page 19: Computational Infrastructure for Policy Informatics

Grid Service Philosophy III

Filters can be a workflow which means they are “just collections of other simpler services”

Grids are distributed systems that accept distributed messages and produce distributed result messages

A service or a workflow is a special case of a Grid

A collection of services on a multi-core chip is a Grid

Sensors or Instruments are “managed” by services; they may accept non SOAP control messages and produce data as messages (that are not usually SOAP)
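The idea that a workflow is itself a filter can be sketched with two small classes. This is a minimal illustration under assumptions: messages are plain dictionaries, and the `Filter` and `Workflow` classes, the `path` field and the three example filters are hypothetical names, not part of any Grid toolkit.

```python
class Filter:
    """A service that accepts a message (a dict) and produces a new message."""

    def __init__(self, name, transform):
        self.name = name
        self.transform = transform

    def handle(self, message):
        out = dict(message)                       # never mutate the input message
        out.update(self.transform(message))
        out["path"] = message.get("path", []) + [self.name]
        return out


class Workflow(Filter):
    """A workflow is "just a collection of other simpler services"."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps

    def handle(self, message):
        for step in self.steps:                   # a step may itself be a Workflow
            message = step.handle(message)
        return message


clean = Filter("clean", lambda m: {"values": [v for v in m["values"] if v is not None]})
mean = Filter("mean", lambda m: {"mean": sum(m["values"]) / len(m["values"])})
flag = Filter("flag", lambda m: {"alert": m["mean"] > 10})

analysis = Workflow("analysis", [clean, mean])
pipeline = Workflow("pipeline", [analysis, flag])   # workflows nest like services

result = pipeline.handle({"values": [4, None, 20, 12]})
```

Because `Workflow` exposes the same `handle` method as `Filter`, a caller cannot tell whether it is talking to one service or to a collection of them, which is the sense in which a service or a workflow is a special case of a Grid.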

Page 20: Computational Infrastructure for Policy Informatics

Virtual Observatory Astronomy Grid: Integrate Experiments

[Figure: images from different experiments: Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map, Galaxy Density Map]

Page 21: Computational Infrastructure for Policy Informatics

Service or Web service Approach

One uses GML, CML etc. to define the data in a system and one uses services to capture “methods” or “programs”

In eScience, important services fall in three classes

• Simulations

• Data access, storage, federation, discovery

• Filters for data mining and manipulation

Services use something like WSDL (Web Services Description Language) to define interoperable interfaces (see OPAL talk!)

WSDL establishes a “contract” independent of implementation between two services or a service and a client

Services should be loosely coupled which normally means they are coarse grain

Services will be composed (linked together) by mashups (typically scripts) or workflow (often XML – BPEL)

Software Engineering and Interoperability/Standards are closely related
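The notion of a “contract” independent of implementation can be illustrated in a single process. The sketch below uses Python's `typing.Protocol` as a loose analogue of a WSDL interface; it is not WSDL, and the `ForecastService` interface, both backends and their numbers are hypothetical.

```python
from typing import Protocol


class ForecastService(Protocol):
    """The advertised interface: the only thing a client may rely on."""

    def forecast(self, region: str, days: int) -> list[float]: ...


class SimulationBackend:
    """One implementation: pretends to run a simulation."""

    def forecast(self, region: str, days: int) -> list[float]:
        return [100.0 * 1.1 ** d for d in range(days)]


class ArchiveBackend:
    """Another implementation: looks values up in stored data."""

    _stored = {"north": [90.0, 95.0, 99.0]}

    def forecast(self, region: str, days: int) -> list[float]:
        return self._stored.get(region, [])[:days]


def peak(service: ForecastService, region: str, days: int) -> float:
    """Client code written against the contract, not an implementation."""
    values = service.forecast(region, days)
    return max(values) if values else 0.0


peaks = [peak(backend, "north", 3) for backend in (SimulationBackend(), ArchiveBackend())]
```

The client function `peak` works unchanged with either backend, which is the loose coupling the slide asks for; a real deployment would express the same contract in WSDL so that the two sides need not share a language.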

Page 22: Computational Infrastructure for Policy Informatics

Philosophy of Web Service Grids

Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines

This leads to the distributed object (DO) model represented by Java and CORBA

• RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java

Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly

• Distributed Objects Replaced by Services

Note CORBA was considered too complicated in both organization and proposed infrastructure

• and Java was considered as “tightly coupled to Sun”

• So there were other reasons to discard

Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages
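The difference between “one-way” messages and request-response calls can be sketched with in-process queues standing in for the network. Everything here is an assumption made for illustration: the `logging_service` function, the message fields and the use of `None` as a shutdown sentinel.

```python
import queue
import threading


def logging_service(inbox, outbox):
    """A service loop: consume one-way messages, emit new ones, never reply."""
    while True:
        message = inbox.get()
        if message is None:            # sentinel: shut the service down
            break
        outbox.put({"kind": "logged", "text": message["text"].upper()})


inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=logging_service, args=(inbox, outbox))
worker.start()

# The sender posts messages and moves on; it does not block waiting for a
# return value, as it would with an RPC or RMI call.
for text in ("sensor online", "reading received"):
    inbox.put({"kind": "event", "text": text})
inbox.put(None)
worker.join()

results = [outbox.get() for _ in range(outbox.qsize())]
```

The sender and the service share only the message format, not object references or call stacks, which is what keeps the two sides loosely coupled.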

Page 23: Computational Infrastructure for Policy Informatics

Web services

Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.

Web Services interact by exchanging messages in SOAP format

The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.

[Figure: Web service layers. Resources (Databases, Humans, Programs, Computational resources, Devices); service logic (BPEL, Java, .NET); message processing (SOAP and WSDL); SOAP messages of the form <env:Envelope> <env:Header> ... </env:Header> <env:Body> ... </env:Body> </env:Envelope>]
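The envelope structure named on this slide can be built and read with Python's standard `xml.etree.ElementTree` module. The sketch uses the SOAP 1.2 envelope namespace; the application namespace `urn:example:policy` and the `source` and `reading` elements are hypothetical, and no WSDL processing or network transport is attempted.

```python
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"   # SOAP 1.2 envelope namespace
APP = "urn:example:policy"                         # hypothetical application namespace

ET.register_namespace("env", ENV)


def build_message(source, payload):
    """Wrap metadata in the Header and application data in the Body."""
    envelope = ET.Element(f"{{{ENV}}}Envelope")
    header = ET.SubElement(envelope, f"{{{ENV}}}Header")
    ET.SubElement(header, f"{{{APP}}}source").text = source
    body = ET.SubElement(envelope, f"{{{ENV}}}Body")
    ET.SubElement(body, f"{{{APP}}}reading").text = payload
    return ET.tostring(envelope, encoding="unicode")


def read_message(xml_text):
    """Message processing on the receiving side: pull out header and body."""
    envelope = ET.fromstring(xml_text)
    ns = {"env": ENV, "app": APP}
    source = envelope.find("env:Header/app:source", ns).text
    reading = envelope.find("env:Body/app:reading", ns).text
    return source, reading


wire = build_message("sensor-17", "42.5")
source, reading = read_message(wire)
```

Keeping metadata in the Header and data in the Body lets intermediate handlers (security, reliable messaging) act on a message without understanding its application content.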

Page 24: Computational Infrastructure for Policy Informatics

A typical Web Service

In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, or be totally compiled away (inlining)

The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python

[Figure: Web Services linked through WSDL interfaces: Portal Service, Security, Catalog, Payment (Credit Card), Warehouse (Shipping control)]

Page 25: Computational Infrastructure for Policy Informatics


The Grid and Web Service Institutional Hierarchy

Must set standards to get interoperability

4: Application or Community of Interest (CoI) Specific Services such as “Map Services”, “Run BLAST” or “Simulate a Missile” (XBML, XTCE, VOTABLE, CML, CellML)

3: Generally Useful Services and Features (OGSA and other GGF, W3C) such as “Collaborate”, “Access a Database” or “Submit a Job” (OGSA GS-* and some WS-*; GGF/W3C/…; XGSP (Collab))

2: System Services and Features (WS-* from OASIS/W3C/Industry): Handlers like WS-RM, Security, UDDI Registry

1: Container and Run Time (Hosting) Environment (Apache Axis, .NET etc.)

Page 26: Computational Infrastructure for Policy Informatics


The Ten areas covered by the 60 core WS-* Specifications

WS-* Specification Area: Examples

1: Core Service Model: XML, WSDL, SOAP

2: Service Internet: WS-Addressing, WS-MessageDelivery; Reliable Messaging WSRM; Efficient Messaging MTOM

3: Notification: WS-Notification, WS-Eventing (Publish-Subscribe)

4: Workflow and Transactions: BPEL, WS-Choreography, WS-Coordination

5: Security: WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation

6: Service Discovery: UDDI, WS-Discovery

7: System Metadata and State: WSRF, WS-MetadataExchange, WS-Context

8: Management: WSDM, WS-Management, WS-Transfer

9: Policy and Agreements: WS-Policy, WS-Agreement

10: Portals and User Interfaces: WSRP (Remote Portlets)

Page 27: Computational Infrastructure for Policy Informatics


Activities in Global Grid Forum Working Groups

GGF Area: GS-* and OGSA Standards Activities

1: Architecture: High Level Resource/Service Naming (level 2 of slide 6), Integrated Grid Architecture

2: Applications: Software Interfaces to Grid, Grid Remote Procedure Call, Checkpointing and Recovery, Interoperability to Job Submittal services, Information Retrieval

3: Compute: Job Submission, Basic Execution Services, Service Level Agreements for Resource use and reservation, Distributed Scheduling

4: Data: Database and File Grid access, Grid FTP, Storage Management, Data replication, Binary data specification and interface, High-level publish/subscribe, Transaction management

5: Infrastructure: Network measurements, Role of IPv6 and high performance networking, Data transport

6: Management: Resource/Service configuration, deployment and lifetime, Usage records and access, Grid economy model

7: Security: Authorization, P2P and Firewall Issues, Trusted Computing

Page 28: Computational Infrastructure for Policy Informatics

Two-level Programming I

• The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model

• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies

– C++ Java or Fortran Monte Carlo module

– Data streaming from a sensor or Satellite

– Specialized (JDBC) database access

• Such services accept and produce data from users, files and databases

• The Grid is built by coordinating such services assuming we have solved problem of programming the service

Service Data

Page 29: Computational Infrastructure for Policy Informatics

Two-level Programming II

The Grid is discussing the composition of distributed services with the runtime interfaces to Grid as opposed to UNIX pipes/data streams

Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs

Such interpretative environments are the single processor analog of Grid Programming

Some projects like GrADS from Rice University are looking at integration between service and composition levels but dominant effort looks at each level separately

Service1 Service2

Service3 Service4

Page 30: Computational Infrastructure for Policy Informatics

Grid Workflow Data Assimilation in Earth Science

Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts

Typical graphical interface to service composition
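The event-triggered pattern described on this slide can be sketched in a few lines. The code is purely illustrative: the readings, the threshold, the severity scale and every function name are invented, and the “simulation” is a placeholder for what would be a large remote computation.

```python
def detect_abnormal(readings, threshold):
    """Yield (index, value) for readings that exceed the threshold."""
    for i, value in enumerate(readings):
        if value > threshold:
            yield i, value


def high_resolution_simulation(index, value):
    """Placeholder for an expensive simulation launched on demand."""
    return {"trigger_index": index, "severity": round(value / 10.0, 1)}


def issue_forecast(simulation_result):
    """Placeholder for the final forecast step of the workflow."""
    level = "warning" if simulation_result["severity"] >= 7 else "watch"
    return f"{level} from reading {simulation_result['trigger_index']}"


def run_triggered_workflow(readings, threshold=60):
    """Run the downstream steps only for abnormal events."""
    return [
        issue_forecast(high_resolution_simulation(i, v))
        for i, v in detect_abnormal(readings, threshold)
    ]


forecasts = run_triggered_workflow([20, 35, 72, 41, 65])
```

Only the abnormal readings start the expensive downstream steps, so the costly resources are assigned on demand rather than held permanently.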