Autonomic Grid Computing: Concepts, Infrastructure and Applications


Page 1: Autonomic Grid Computing: Concepts, Infrastructure and Applications

Autonomic Grid Computing: Concepts, Infrastructure and Applications

The Applied Software Systems Laboratory, ECE/CAIP, Rutgers University

http://www.caip.rutgers.edu/TASSL

(Ack: NSF, DoE, NIH)

Outline

• Pervasive Grid Environments - Unprecedented Opportunities

• Pervasive Grid Environments - Unprecedented Challenges

• Autonomic Grid Computing

• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments

• An Illustrative Application

• Concluding Remarks

Page 2: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Grid Computing – The Hype!

"Grid Computing," by M. Mitchell Waldrop, May 2002

Hook enough computers together and what do you get? A new kind of utility that offers supercomputer processing on tap.

Is Internet history about to repeat itself?

Defining Grid Computing …

• “… a concept, a network, a work in progress, part hype and part reality, and it’s increasingly capturing the attention of the computing community …” A. Applewhite, IEEE DS-Online

• "… grids are networks for computation – they are thinking, number-crunching entities. Like a decentralized nervous system, grids consist of high-end computers, servers, workstations, storage systems, and databases that work in tandem across private and public networks …" O. Malik, Red Herring

• “… a kind of hyper network that links computers and data storage owned by different groups so that they can share computing power…” USA Today

• "'The Matrix' crossed with 'Minority Report' …" D. Metcalfe (quoted in Newsweek)

• "… use clusters of personal computers, servers or other machines. They link together to tackle complex calculations. In part, grid computing lets companies harness their unused computing power, or processing cycles, to create a type of supercomputer …" J. Bonasia, Investor's Business Daily

• “… grid computing links far-flung computers, databases, and scientific instruments over the public internet or a virtual private network and promises IT power on demand…… All a user has to do is submit a calculation to a network of computers linked by grid-computing middleware. The middleware polls a directory of available machines to see which have the capacity to handle the request fastest…” A. Ricadela, Information Week

Page 3: Autonomic Grid Computing: Concepts, Infrastructure and Applications


The Grid Vision

• Imagine a world
– in which computational power (resources, services, data, etc.) is as readily available as electrical power
– in which computational services make this power available to users with differing levels of expertise in diverse areas
– in which these services can interact to perform specified tasks efficiently and securely with a minimum of human intervention
• on-demand, ubiquitous access to computing, data, and services
• new capabilities constructed dynamically and transparently from distributed services
– a large part of this vision was originally proposed by Fernando Corbato (The Multics Project, 1965, www.multicians.org)

Grid Idea By A Simple Analogy

Power stations dispersed everywhere produce electrical power. The produced power is distributed over a power network. A consumer who wants access to that power comes to an agreement with the electrical utility; the utility provides a new socket into which the user can plug.

• The user:
– Does not need to know anything about what lies beyond the socket.
– Can draw all the power he wants according to the agreement.

• The power utility:
– Can modify production technologies at any moment.
– Manages the power network as it wants.
– Defines the terms and conditions of the agreement.

Ack. F. Scibilia

Page 4: Autonomic Grid Computing: Concepts, Infrastructure and Applications


In the same way . . .

Computing farms produce computing power, which is made available over the Internet. A user who wants access to intensive computational power comes to an agreement with a company that offers grid services. The company provides grid facilities that allow the user to access its grid resources, along with the proper tools; the user then accesses those facilities as a grid user.

• The user:
– Does not need to know what lies beyond the user interface.
– Can access a massive amount of computational power through a simple terminal.

• The company:
– Can extend grid facilities at any moment.
– Manages the architecture of the grid.
– Defines policies and rules for accessing grid resources.

Ack. F. Scibilia

What about Grid Computing

The Grid Computing paradigm is an emerging way of thinking about distributed environments as a global-scale infrastructure to:

• Share data
• Distribute computation
• Coordinate work
• Access remote instrumentation

Ack. F. Scibilia

Page 5: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Key Enablers of Grid Computing - Exponentials

• Network vs. computer performance
– Computer speed doubles every 18 months
– Storage density doubles every 12 months
– Network speed doubles every 9 months
– Difference = order of magnitude per 5 years (over 60 months, computers gain roughly 2^(60/18) ≈ 10x while networks gain 2^(60/9) ≈ 100x)
• 1986 to 2000
– Computers: x 500
– Networks: x 340,000
• 2001 to 2010
– Computers: x 60
– Networks: x 4,000

Scientific American (Jan-2001)

"When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances" (George Gilder)

Ack: I. Foster

Why Computing Grids now?

• Because the amount of computational power needed by many applications is getting very large
– Thousands of CPUs working at the same time on the same task
• Because the amount of data requires massive and complex distributed storage systems
– From hundreds of gigabytes to petabytes (10^15 bytes) produced by the same application
• To make cooperation easier among people and resources belonging to different organizations
– People from several organizations working together to achieve a common goal
• To access particular instrumentation that is not easily reachable in any other way
– Because it cannot be moved or replicated, or its cost is too high
• Because it is the next step in the evolution of distributed computation
– To create a marketplace of computational power and storage over the Internet

Ack. F. Scibilia

Page 6: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Who is interested in Grids?

• The research community, to carry out important results from experiments that involve many people and massive amounts of resources

• Enterprises, which can obtain huge computation without the need to extend their current IT infrastructure

• Businesses, which can provide computational power and data storage under contract or for rental

Ack. F. Scibilia

Properties of Grids

• Transparency
– The complexity of the Grid architecture is hidden from the final user
– The user must be able to use a Grid as if it were a single virtual supercomputer
– Resources must be accessible regardless of their location
• Openness
– Each subcomponent of the Grid is accessible independently of the other components
• Heterogeneity
– Grids are composed of many different resources
• Scalability
– Resources can be added to and removed from the Grid dynamically
• Fault Tolerance
– Grids must be able to work even if a component fails or a system crashes
• Concurrency
– Different processes on different nodes must be able to work at the same time

Ack. F. Scibilia

Page 7: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Challenging Issues in Grids (i)

• Security
– Authentication and authorization of users
– Confidentiality and non-repudiation
• Information Services
– To discover and monitor Grid resources
– To check the health status of resources
– As a basis for decision-making processes
• File Management
– Creation, modification and deletion of files
– Replication of files to improve access performance
– Ability to access files without the need to move them locally to the code
• Administration
– Systems to administer Grid resources while respecting local administration policies

Ack. F. Scibilia

Challenging Issues in Grids (ii)

• Resource Brokering
– To schedule tasks across different resources
– To make optimal or suboptimal decisions
– To reserve (in the future) resources and network bandwidth
• Naming Services
– To name resources in an unambiguous way within the Grid scope
• Friendly User Interfaces
– Because most Grid users have nothing to do with computer science (physicians, chemists, …)
– Graphical User Interfaces (GUIs)
– Grid Portals (very similar to classical Web Portals)
– Command Line Interfaces (CLIs) for experts

Ack. F. Scibilia

Page 8: Autonomic Grid Computing: Concepts, Infrastructure and Applications


The Grid

“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”

1. Enable integration of distributed resources
2. Using general-purpose protocols & infrastructure
3. To achieve better-than-best-effort service

Ack: I. Foster

Virtual Organizations (VOs)

• A Virtual Organization is a collection of people and resources that work in a coordinated way to achieve a common goal

• To use Grid facilities, any user MUST subscribe to a Virtual Organization as a member

• Each person or resource can be a member of more than one VO at the same time
• Each VO can contain people or resources belonging to different administrative domains

Ack. F. Scibilia

Page 9: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Virtual Laboratory

• A new way of cooperating in experiments

• A platform that allows scientists to work together in the same "Virtual" Laboratory

• Strictly correlated with Grids and Virtual Organizations

[Diagram: a Grid infrastructure connecting people, devices, instruments, computing resources and data]

Ack. F. Scibilia

Profound Technical Challenges

How do we, in dynamic, scalable, multi-institutional, computationally & data-rich settings:

• Negotiate & manage trust
• Access & integrate data
• Construct & reuse workflows
• Plan complex computations
• Detect & recover from failures
• Capture & share knowledge
• Support collaborative work
• Define primitive protocols
• Build reusable software
• Package & deliver software
• Deploy & operate services
• Operate infrastructure
• Represent & enforce policies
• Achieve end-to-end QoX
• Move data rapidly & reliably
• Upgrade infrastructure
• Perform troubleshooting
• Etc., etc., etc.

Ack: I. Foster

Page 10: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Globus Alliance

• The Globus Alliance
– A community of people and organizations involved in the design and development of Grid technologies
– University of Illinois, Argonne National Laboratory, University of Edinburgh, EPCC, etc.

• The Globus Toolkit (GT)
– A de facto standard
– A bag of services
– At its fourth release (GT4)
– Now adopts Web Services interfaces

• The Global Grid Forum
– A forum of grid researchers
– Works to define standards and protocols for grid technologies
– Divided into Working Groups (WGs)
– http://www.ggf.org

Ack. F. Scibilia

Globus Services

GT4 component families (WS and non-WS components):

• Security: Credential Management; Delegation; Authentication Authorization; Pre-WS Authentication Authorization
• Data Management: GridFTP; Reliable File Transfer; Replica Location; Data Replication; OGSA-DAI
• Execution Management: Grid Resource Allocation & Management; Pre-WS Grid Resource Allocation & Management; Community Scheduler Framework; Workspace Management; Grid Telecontrol Protocol
• Information Services: Monitoring & Discovery (MDS2); WebMDS; Index; Trigger
• Common Runtime: Java WS Core; C WS Core; Python WS Core; C Common Libraries; eXtensible IO (XIO)

Component status labels:
– Core GT Component: public interfaces frozen between incremental releases; best-effort support
– Contribution/Tech Preview: public interfaces may change between incremental releases
– Deprecated Component: not supported; will be dropped in a future release

Ack. F. Scibilia

Page 11: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Hourglass Reference Model

• Fabric layer:
– Manages resources locally
• Connectivity:
– Network communications (IP, DNS, etc.)
– Security: authentication, authorization, certification
– Single Sign-On
• Resource:
– Allocation, reservation and monitoring of resources
– Data access and transport
– Gathering of information on resources
• Collective:
– View of services as collections
– Discovery and allocation
– Replica and catalogue of data
– Management of workflows
• Application:
– User applications
– Tools and interfaces

Ack. F. Scibilia

Application Areas (i)

• Physical Science Applications
– GriPhyN, http://www.griphyn.org/
– Particle Physics DataGrid (PPDG), http://grid.fnal.gov/ppdg/
– GridPP, http://www.gridpp.ac.uk/
– AstroGrid, http://www.astrogrid.org/

• Life Science Applications
– Protein Data Bank (PDB), http://www.rcsb.org/pdb/Welcome.do
– Biomedical Informatics Research Network (BIRN), http://www.nbirn.net/
– Telemicroscopy, http://ncmir.ucsd.edu/
– myGrid, http://www.mygrid.org.uk/

Ack. F. Scibilia

Page 12: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Application Areas (ii)

• Engineering-Oriented Applications
– NASA Information Power Grid (IPG), http://www.ipg.nasa.gov/
– Grid Enabled Optimization and Design Search for Engineering (GEODISE), http://www.geodise.org/

• Commercial Applications
– Butterfly Grid, http://www.butterfly.net/
– Everquest, http://www.everquest.com/

• E-Utility
– ClimatePrediction experiment, http://www.climateprediction.net/

Ack. F. Scibilia

An Example: SETI@Home

• The SETI@Home project
– Searches for Extra-Terrestrial Intelligence (SETI)
• Collecting samples of microwaves coming from the Universe through a telescope
• Scheduling tasks spread over Grid nodes to analyse these samples
– Uses desktop computers as Grid nodes
– Working nodes are dynamically added to and removed from the grid
– The owner of the desktop machine decides how to contribute to the project, offering its computational power

• To contribute to the project
– http://setiathome.berkeley.edu/
– Download and install the client
– Your machine will work as a Grid node when it is idle (in place of your screensaver)

Ack. F. Scibilia

Page 13: Autonomic Grid Computing: Concepts, Infrastructure and Applications


A Global Grid Community

Ack: I. Foster

The Grid

“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”

The original Grid concept has moved on!

Ack: I. Foster

Page 14: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Emerging Pervasive Information/Computation Infrastructures

• Explosive growth in computation, communication, information and integration technologies
– Computing, communication, data are ubiquitous

• Pervasive ad hoc "anytime-anywhere" access environments
– Ubiquitous access to information
– Peers capable of producing/consuming/processing information at different levels and granularities
– Embedded devices in clothes, phones, cars, mile-markers, traffic lights, lamp posts, medical instruments …

• “On demand” computational/storage resources, services

Pervasive Computational Ecosystems and Dynamic Information Driven Applications

[Diagram: scientists and computer scientists interact and collaborate through ubiquitous and pervasive portals (laptop, PDA) to query and configure resources - computers, storage, instruments. Applications & services and coupled models (Model A, Model B) are dynamically composed; "Web Services" are discovered & invoked; resources are discovered, negotiated and co-allocated on-the-fly and components deployed. Components write into a data archive; real-time data is assimilated/injected from sensors, instruments, experiments, data archives and non-traditional data sources; experts mine the archive and match real-time data with history (automated mining & matching), and monitor/interact with/interrogate/steer the models ("what if" scenarios, …) while the application notifies experts of interesting events.]

Page 15: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Pervasive Grid Environments - Unprecedented Opportunities

• Pervasive Grid Environments
– Seamless, secure, on-demand access to, and aggregation of, geographically distributed computing, communication and information resources
• Computers, networks, data archives, instruments, observatories, experiments, sensors/actuators, ambient information, etc.
– Context, content, capability, capacity awareness
– Ubiquity and mobility

• Knowledge-based, information/data-driven, context/content-aware, computationally intensive, pervasive applications
– Symbiotically and opportunistically combine services/computations, real-time information, experiments and observations to manage, control, predict, adapt, optimize, …
• Crisis management; monitoring and predicting natural phenomena; monitoring and managing engineering systems; optimizing business processes

• A new paradigm?
– seamless access
• resources, services, data, information, expertise, …
– seamless aggregation
– seamless (opportunistic) interactions/couplings

Information-driven Management of Subsurface Geosystems: The Instrumented Oil Field (with UT-CSM, UT-IG, OSU, UMD, ANL)

• Detect and track changes in data during production
• Invert data for reservoir properties
• Detect and track reservoir changes
• Assimilate data & reservoir properties into the evolving reservoir model
• Use simulation and optimization to guide future production

Data Driven + Model Driven

Page 16: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Vision: Diverse Geosystems – Similar Solutions

[Diagram: models, simulation, data and control applied alike to underground pollution, undersea reservoirs, landfills and oilfields]

Management of the Ruby Gulch Waste Repository (with UT-CSM, INL, OU)

• Ruby Gulch Waste Repository/Gilt Edge Mine, South Dakota
– ~20 million cubic yards of waste rock
– AMD (acid mine drainage) impacting drinking water supplies

• Monitoring System
– Multi-electrode resistivity system (523)
• One data point every 2.4 seconds from any 4 electrodes
– Flowmeter at bottom of dump
– Weather station
– Manually sampled chemical/air ports in wells
– Temperature & moisture sensors in four wells
– Approx. 40K measurements/day

“Towards Dynamic Data-Driven Management of the Ruby Gulch Waste Repository,” M. Parashar, et al, DDDAS Workshop, ICCS 2006, Reading, UK, LNCS, Springer Verlag, Vol. 3993, pp. 384 – 392, May 2006.

Page 17: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Data-Driven Forest Fire Simulation (U of AZ)

• Predict the behavior and spread of wildfires (intensity, propagation speed and direction, modes of spread)
– based on both dynamic and static environmental and vegetation conditions
– factors include fuel characteristics and configurations, chemical reactions, balances between different modes of heat transfer, topography, and fire/atmosphere interactions

"Self-Optimizing of Large Scale Wild Fire Simulations," J. Yang*, H. Chen*, S. Hariri and M. Parashar, Proceedings of the 5th International Conference on Computational Science (ICCS 2005), Atlanta, GA, USA, Springer-Verlag, May 2005.

Many Application Areas ….

• Hazard prevention, mitigation and response
– Earthquakes, hurricanes, tornados, wild fires, floods, landslides, tsunamis, terrorist attacks
• Critical infrastructure systems
– Condition monitoring and prediction of future capability
• Transportation of humans and goods
– Safe, speedy, and cost-effective transportation networks and vehicles (air, ground, space)
• Energy and environment
– Safe and efficient power grids; safe and efficient operation of regional collections of buildings
• Health
– Reliable and cost-effective health care systems with improved outcomes
• Enterprise-wide decision making
– Coordination of dynamic distributed decisions for supply chains under uncertainty
• Next generation communication systems
– Reliable wireless networks for homes and businesses
• … … … …

• Report of the Workshop on Dynamic Data Driven Applications Systems, F. Darema et al., March 2006, www.dddas.org

Source: M. Rotea, NSF

Page 18: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Outline

• Pervasive Grid Environments - Unprecedented Opportunities

• Pervasive Grid Environments - Unprecedented Challenges
– System, Information, Application Uncertainty

• Autonomic Grid Computing

• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments

• An Illustrative Application

• Concluding Remarks

Pervasive Grid Applications – Unprecedented Challenges: Uncertainty

• System Uncertainty
– Very large scales
– Ad hoc structures/behaviors
• p2p, hierarchical, etc., architectures
– Dynamic
• entities join, leave, move, change behavior
– Heterogeneous
• capability, connectivity, reliability, guarantees, QoS
– Lack of guarantees
• components, communication

• Information Uncertainty
– Availability, resolution, quality of information
– Device capability, operation, calibration
– Trust in data, data models
– Semantics

• Application Uncertainty
– Dynamic behaviors
• space-time adaptivity
– Lack of common/complete knowledge (LOCK)
• number, type, location, availability, connectivity, protocols, semantics, etc.
– Dynamic and complex couplings
• multi-physics, multi-model, multi-resolution, …
– Dynamic and complex (ad hoc, opportunistic) interactions
– Software/systems engineering issues
• Emergent rather than by design

Page 19: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Pervasive Grid Computing – Research Issues, Opportunities

• Programming systems/models for data integration and runtime self-management
– components and compositions capable of adapting behavior, interactions and information
– correctness, consistency, performance, quality-of-service constraints
• Content-based asynchronous and decentralized discovery and access services
– semantics, metadata definition, indexing, querying, notification
• Data management mechanisms for data acquisition and transport with real-time, space and data quality constraints
– high data volumes/rates, heterogeneous data qualities and sources
– in-network aggregation, integration, assimilation, caching
• Runtime execution services that guarantee correct, reliable execution with predictable and controllable response time
– data assimilation, injection, adaptation
• Security, trust, access control, data provenance, audit trails, accounting

Outline

• Pervasive Grid Environments - Unprecedented Opportunities

• Pervasive Grid Environments - Unprecedented Challenges

• Autonomic Grid Computing

• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments

• An Illustrative Application

• Concluding Remarks

Page 20: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Integrating Biology and Information Technology: The Autonomic Computing Metaphor

• Current programming paradigms, methods, management tools are inadequate to handle the scale, complexity, dynamism and heterogeneity of emerging systems

• Nature has evolved to cope with scale, complexity, heterogeneity, dynamism, unpredictability and lack of guarantees
– self-configuring, self-adapting, self-optimizing, self-healing, self-protecting, highly decentralized, heterogeneous architectures that work!!!

• Goal of autonomic computing is to build a self-managing system that addresses these challenges using high level guidance

“Autonomic Computing: An Overview,” M. Parashar, and S. Hariri, Hot Topics, Lecture Notes in Computer Science, Springer Verlag, Vol. 3566, pp. 247-259, 2005.

Adaptive Biological Systems

• The body's internal mechanisms continuously work together to maintain essential variables within the physiological limits that define the viability zone

• Two important observations:
– The goal of the adaptive behavior is directly linked with survivability
– If the external or internal environment pushes the system outside its physiological equilibrium state, the system will always work towards coming back to the original equilibrium state

Page 21: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Ashby’s Ultrastable System

[Diagram: the environment and essential variables are coupled to a reacting part R through sensor and motor channels, with step mechanisms / input parameter S closing the loop]

Autonomic Computing Characteristics (IBM)


Page 22: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Autonomic Computing Architecture

• Autonomic elements (components/services)
– Responsible for policy-driven self-management of individual components

• Relationships among autonomic elements
– Based on agreements established/maintained by autonomic elements
– Governed by policies
– Give rise to resiliency, robustness, self-management of the system

Autonomic Computing – Conceptual Architecture (IBM)

Page 23: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Autonomic Elements: Structure

• Fundamental atom of the architecture
– Managed element(s)
• Database, storage system, server, software app, etc.
– Plus one autonomic manager, organized as a monitor-analyze-plan-execute loop over shared knowledge (MAPE-K)
• Responsible for:
– Providing its service
– Managing its own behavior in accordance with policies
– Interacting with other autonomic elements

Ack. IBM
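The monitor-analyze-plan-execute (MAPE-K) loop can be made concrete with a small sketch. The Python below is illustrative only: the class and method names, and the managed element's read_sensors/actuate and the policies' violated/corrective_action interfaces, are assumptions, not part of IBM's or AutoMate's APIs.

# Minimal sketch of a MAPE-K autonomic manager loop (illustrative names only).
class AutonomicManager:
    def __init__(self, managed_element, policies):
        self.element = managed_element           # e.g., a server, database, service
        self.knowledge = {"policies": policies, "history": []}

    def monitor(self):                           # M: collect state via sensors
        return self.element.read_sensors()

    def analyze(self, state):                    # A: find violated policies
        return [p for p in self.knowledge["policies"] if p.violated(state)]

    def plan(self, violations):                  # P: derive corrective actions
        return [p.corrective_action() for p in violations]

    def execute(self, actions):                  # E: apply actions via actuators
        for action in actions:
            self.element.actuate(action)

    def step(self):                              # one pass of the control loop
        state = self.monitor()
        self.knowledge["history"].append(state)  # K: shared knowledge
        self.execute(self.plan(self.analyze(state)))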

Autonomic Elements: Interactions

• Relationships
– Dynamic, ephemeral, opportunistic
– Defined by rules and constraints
– Formed by agreement
• May be negotiated
– Full spectrum
• Peer-to-peer
• Hierarchical
– Subject to policies

Ack. IBM

Page 24: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Autonomic Systems: Composition of Autonomic Elements

Ack. IBM

[Diagram: an autonomic system composed from interacting autonomic elements - reputation authority, event correlator, monitors, workload managers, negotiator, broker, provisioner, registries, sentinels, arbiter, planner and aggregator - managing servers, databases, storage and the network]

Autonomic Computing: Research Issues and Challenges

Scale, Complexity, Dynamism, Heterogeneity, Unreliability, Uncertainty

• Defining autonomic elements
– Programming paradigms and development models/frameworks
• Autonomic element definition and construction
• Rule definition, representation, and enforcement

• Constructing autonomic systems/applications
– Composition, coordination, interaction models and infrastructures
• Dynamic (rule-based) configuration, execution and optimization
• Dynamic (opportunistic) interactions, coordination, negotiation

• Execution and management
– Runtime and middleware services
• Discovery, Coordination, Messaging, Security, Management, …
– Security, protection
– Fault tolerance, reliability, availability, …

• Policies, Learning, AI, …

Page 25: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Outline

• Pervasive Grid Environments - Unprecedented Opportunities

• Pervasive Grid Environments - Unprecedented Challenges, Opportunities

• Autonomic Grid Computing

• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments

• An Illustrative Application

• Concluding Remarks

Programming Pervasive Grid Systems

[Diagram: applications run on a programming system (programming model + abstract machine) over the Grid middleware infrastructure, which provides abstraction (support for abstract behaviors/assumptions) and virtualization (creating and managing VOs and virtual machines) across organizations federated into a virtual organization]

– Programming System
• programming model, languages/abstractions – syntax + semantics
– entities, operations, rules of composition, models of coordination/communication
• abstract machine, execution context and assumptions
• infrastructure, middleware and runtime

– Hide vs. expose uncertainty?
• virtual homogeneity, ease of programming
• robustness
• cross-layer adaptation and optimization
– the inverted stack …

"Conceptual and Implementation Models for the Grid," M. Parashar and J.C. Browne, Proceedings of the IEEE, Special Issue on Grid Computing, IEEE Press, Vol. 93, No. 3, March 2005.

Page 26: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Autonomic Grid Computing – A Holistic Approach

• Computing has evolved and matured to provide specialized solutions that satisfy relatively narrow and well-defined requirements in isolation
– performance, security, dependability, reliability, availability, throughput, pervasive/amorphous, automation, reasoning, etc.

• In the case of pervasive Grid applications/environments, requirements, objectives and execution contexts are dynamic and not known a priori
– requirements, objectives and the choice of specific solutions (algorithms, behaviors, interactions, etc.) depend on runtime state, context, and content
– applications should be aware of changing requirements and execution contexts and respond to these changes at runtime

• Autonomic Grid computing - systems/applications that self-manage
– use appropriate solutions based on current state/context/content and on specified policies
– address uncertainty at multiple levels
– asynchronous algorithms, decoupled interactions/coordination, content-based substrates

Project AutoMate: Enabling Autonomic Applications

[Layered architecture: Autonomic Grid Applications sit on a programming system (autonomic components, dynamic composition, opportunistic interactions, collaborative monitoring/control) provided by the Accord Programming Framework. Beneath it, the Rudder coordination middleware offers a decentralized coordination engine, an agent framework and a decentralized reactive tuple space; semantic middleware services provide content-based discovery and associative messaging; and the Meteor/Squid content-based middleware supplies the content overlay (content-based routing engine, self-organizing overlay). Ontologies/taxonomies and the Sesame/DAIS protection service cut across the layers.]

• Conceptual models and implementation architectures
– programming systems based on popular programming models
• object, component and service based prototypes
– content-based coordination and messaging middleware
– amorphous and emergent overlays

• http://automate.rutgers.edu

Page 27: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Project AutoMate: Core Components

• Accord – A Programming System for Autonomic Grid Applications
• Squid – Decentralized Information Discovery and Content-based Routing
• Meteor – Content-based Interactions/Messaging Middleware
• Rudder/Comet – Decentralized Coordination Middleware
• ACE – Autonomic Composition Engine
• SESAME – Context-Aware Access Management
• DAIS – Cooperative Protection against Network Attacks

• More information/Papers – http://automate.rutgers.edu
"AutoMate: Enabling Autonomic Grid Applications," M. Parashar et al, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Special Issue on Autonomic Computing, Kluwer Academic Publishers, Vol. 9, No. 2, pp. 161 – 174, 2006.

Accord: Rule-Based Programming System

• Accord is a programming system that supports the development of autonomic applications.
– Enables definition of autonomic components with programmable behaviors and interactions.
– Enables runtime composition and autonomic management of these components using dynamically defined rules.

• Dynamic specification of adaptation behaviors using rules.

• Enforcement of adaptation behaviors by invoking sensors and actuators.

• Runtime conflict detection and resolution.

• 3 prototypes: object-based, component-based (CCA), service-based (Web Services)

"Accord: A Programming Framework for Autonomic Applications," H. Liu* and M. Parashar, IEEE Transactions on Systems, Man and Cybernetics, Special Issue on Engineering Autonomic Systems, IEEE Press, Vol. 36, No. 3, pp. 341 – 352, 2006.
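To make the rule mechanism concrete, here is a minimal condition-action sketch in Python. It mimics the if-condition-then-action style of Accord rules, with a crude priority scheme standing in for conflict resolution; the representation is an assumption for illustration, not Accord's actual rule syntax.

# Illustrative condition-action rule engine, loosely modeled on Accord's
# if-condition-then-action rules; NOT Accord's actual rule syntax.
class Rule:
    def __init__(self, condition, actions, priority=0):
        self.condition = condition   # predicate over current sensor readings
        self.actions = actions       # actuator invocations to perform
        self.priority = priority     # used for simple conflict resolution

def fire_rules(rules, sensors):
    # Evaluate all rules against sensor values; if several match (a
    # conflict), keep only the highest-priority one -- a simple stand-in
    # for runtime conflict detection and resolution.
    matched = [r for r in rules if r.condition(sensors)]
    return max(matched, key=lambda r: r.priority).actions if matched else []

# Example: replace a component when the solution error degrades.
rules = [Rule(lambda s: s["error"] > 0.1, ["replace_with_high_order_solver"], 1)]
print(fire_rules(rules, {"error": 0.25}))   # -> ['replace_with_high_order_solver']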

Page 28: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Autonomic Element/Behaviors in Accord

[Diagram: an Accord autonomic element pairs a computational element with an element manager and exposes functional, control and operational ports. A composition manager uses the application workflow, application strategies and application requirements to inject interaction rules and behavior rules into the element managers. Each element manager evaluates its rules over internal and contextual state, invoking interfaces and actuators and generating events.]

LLC Based Self Management within Accord

[Diagram: a self-managing element augments the element manager with an LLC controller driven by an optimization function and a model of the computational element; internal and contextual state flow in, and the controller's advice flows back to the element manager]

• Element/Service Managers are augmented with LLC controllers
– the manager monitors the state/execution context of elements
– enforces adaptation actions determined by the controller
– controller decisions augment human-defined rules

Page 29: Autonomic Grid Computing: Concepts, Infrastructure and Applications


The Self-managing Shock Simulation: Self-optimizing Via Component Replacement

The Self-managing Shock Simulation: Self-optimizing Via Component Adaptation

Page 30: Autonomic Grid Computing: Concepts, Infrastructure and Applications


The Self-managing Shock Simulation: Self-healing Via Component Replacement

Decentralized (Decoupled/Asynchronous) Content-based Middleware Services

Pervasive Grid Environment

Self-Managing Overlay (SquidTON)

Page 31: Autonomic Grid Computing: Concepts, Infrastructure and Applications


SquidTON: Reliable & Fault Tolerant Overlay

• Pervasive Grid systems are dynamic, with nodes joining, leaving and failing relatively often
• => data loss and a temporarily inconsistent overlay structure
• => the system cannot offer guarantees
– Build redundancy into the overlay network
– Replicate the data

• SquidTON = Squid Two-tier Overlay Network
– Consecutive nodes form unstructured groups, and at the same time are connected by a global structured overlay (e.g., Chord)
– Data is replicated in the group

[Figure: node identifiers on the structured ring - e.g., 9, 17, 22, 25, 33, 82, 86, 91, 94, 102, 128, 132, 141, 150, 157, 161, 185, 192, 231, 240, 249, 250 - partitioned into groups of consecutive nodes; each group has a group identifier, is internally unstructured, and replicates the data mapped to it]
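A minimal sketch of the two-tier idea in Python: identifiers on the ring are partitioned into groups of consecutive nodes, and a data item is replicated on every member of the group that owns its key, so single node departures do not lose data. The ring size, grouping rule and node identifiers below are invented for illustration; the actual SquidTON protocol maintains these groups over a structured overlay such as Chord.

# Sketch of SquidTON-style two-tier replication on an identifier ring.
# Ring size, group size and the grouping rule are illustrative only.
RING = 256          # identifier space [0, 256)
GROUP_SPAN = 32     # consecutive identifiers forming one unstructured group

def group_of(node_id):
    return node_id // GROUP_SPAN

def store(nodes, key, value, table):
    # Replicate (key, value) on all live members of the group that owns
    # the key; any surviving member can later answer queries for it.
    owners = [n for n in nodes if group_of(n) == group_of(key % RING)]
    for n in owners:
        table.setdefault(n, {})[key] = value
    return owners

nodes = [9, 17, 22, 25, 33, 82, 86, 91, 94, 128, 132, 141, 150, 157]
table = {}
print(store(nodes, 20, "data-item", table))   # -> replicas on nodes 9, 17, 22, 25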

Content Descriptors and Information Space

• Data element = a piece of information that is indexed and discovered
– Data, documents, resources, services, metadata, messages, events, etc.

• Each data element has a set of keywords associated with it, which describe its content => data elements form a keyword space

[Examples: in a 2D keyword space for a P2P file-sharing system (axes: keyword1, keyword2), a document is described by keywords such as (computer, network), and a complex query such as (comp*, *) selects a region of the space. In a 3D keyword space for resource sharing with the attributes storage space (MB), base bandwidth (Mbps) and cost, a computational resource is a point such as (30, 100, 9), and a complex query such as (10, 20-25, *) selects the matching region.]

Page 32: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Content Indexing: Hilbert SFC

• f: N^d → N, recursive generation

[Figure: first- and second-order 2D Hilbert curves, with cells labeled by their 1-bit (00, 01, 10, 11) and 2-bit (0000 … 1111) indices along the curve]

• Properties:
– Digital causality
– Locality preserving
– Clustering

Cluster: a group of cells connected by a segment of the curve
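For d = 2 the mapping can be computed iteratively, bit by bit. Below is the classic public-domain formulation of the 2D Hilbert index in Python, included only to illustrate the locality-preserving index that Squid builds on; Squid itself uses the d-dimensional generalization.

# Classic 2D Hilbert curve index: map cell (x, y) of an n-by-n grid
# (n a power of two) to its distance d along the curve. Nearby cells
# tend to get nearby indices -- the locality property Squid exploits.
def xy_to_d(n, x, y):
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                 # rotate the quadrant so recursion lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# The four cells of the first-order curve, visited in curve order:
print([xy_to_d(2, x, y) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]])  # [0, 1, 2, 3]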

Content Indexing, Routing & Querying

[Figure: content profiles - e.g., profile 1: (2, 1); profile 2: (4-7, 0-3); profile 3: (*, 4) - mapped onto a 2D (longitude, altitude) index space; matching messages are routed to the nodes that store the corresponding clusters]

• Query processing: generate the clusters associated with the content profile; route to the nodes that store the clusters; send the results to the requesting node

• Demonstrated analytically and experimentally that
– for large systems, queries with p% coverage will query p% of the nodes, independent of the data distribution
– the system scales with the number of nodes and data
– the optimization significantly reduces the number of clusters generated and messages sent
– it slightly increases the number of nodes queried – only a small number of "intermediary" nodes are involved

• Note:
– More than one cluster is typically stored at a node
– Not all clusters that are generated for a query exist in the network
– SFC cluster generation is recursive, i.e., a prefix tree (trie)
• Optimization: embed the tree into the overlay and prune nodes during construction

Page 33: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Squid Content Routing/Discovery Engine: Experimental Evaluation

• System size: 10^3 to 10^6 nodes
• Data:
– Uniformly distributed, synthetically generated data
– 4×10^5 CiteSeer data entries
• Experiments:
– Number of clusters generated for a query
– Number of nodes queried
• Load-balanced system
• All results are plotted on a logarithmic scale

Squid Content Routing/Discovery Engine: Optimization

• Number of clusters generated for queries with coverage 1%, 0.1% and 0.01%, with and without optimization
• The results are normalized against the clusters that the query defines on the curve (i.e., without optimization)

[Plots: normalized number of clusters vs. system size (10^3 to 10^5) for 3D uniformly distributed data and 3D CiteSeer data]

Page 34: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Squid Content Routing/Discovery Engine – Nodes Queried

• Percentage of nodes queried for queries with coverage 1%, 0.1% and 0.01%, with and without optimization

[Plots: % of nodes queried vs. system size (10^3 to 10^6 for 3D uniformly distributed data; 10^3 to 10^5 for 3D CiteSeer data)]

Project Meteor: Associative Rendezvous

• Content-based decoupled interaction with programmable reactive behaviors
– Messages = (header, action, data); the header carries a profile, credentials, message context and a TTL (time to live)
– Symmetric post primitive: does not differentiate between interests and data
– Associative selection
• match between interest and data
– Reactive behavior
• execute the action field (store, retrieve, notify, delete) upon matching

• Decentralized in-network aggregation
– Tries for back-propagating and aggregating matching data items
• Supports the WS-Notification standard

Profile = list of (attribute, value) pairs. Example: <(sensor_type, temperature), (latitude, 10), (longitude, 20)>

[Example: (1) client C1 posts (<p1, p2>, store, data); (2) client C2 posts (<p1, *>, notify_data(C2)); (3) the profiles match at the rendezvous node and C2 is notified]
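A toy sketch of associative selection and reactive behavior in Python: profiles are dicts of (attribute, value) pairs, "*" is a wildcard, and when newly stored data matches a registered interest, the interest's action fires. The matching semantics are deliberately simplified (one-directional, in-memory); Meteor's real engine is symmetric and routes profiles through Squid.

# Toy associative-rendezvous node; profiles are (attribute, value) dicts.
rendezvous = []          # stored ("interest" | "data", profile, action/payload)

def matches(interest, data):
    # Every attribute named by the interest must agree; "*" matches anything.
    return all(val == "*" or data.get(attr) == val for attr, val in interest.items())

def post(profile, action, payload=None):
    # Symmetric post primitive: the same call registers interests and data.
    if action == "store":
        rendezvous.append(("data", profile, payload))
        for kind, p, a in rendezvous:        # fire matching reactive behaviors
            if kind == "interest" and matches(p, profile):
                print("matched:", profile, "->", a)
    else:                                    # e.g., notify_data(...), retrieve, delete
        rendezvous.append(("interest", profile, action))

post({"sensor_type": "temperature", "latitude": "*"}, "notify_data(C2)")
post({"sensor_type": "temperature", "latitude": 10}, "store", 42.0)
# -> matched: {'sensor_type': 'temperature', 'latitude': 10} -> notify_data(C2)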

Page 35: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Heterogeneity Management

• Heterogeneity management and adaptations at AR nodes using reactive behaviors
– Policy-based adaptations based on capabilities, preferences, resources

Comet Coordination Space

• A virtual global shared space constructed from a semantic multi-dimensional information space, which is deterministically mapped onto the system peer nodes

• The space is associatively accessible by all system peer nodes; access is independent of the physical locations of tuples or hosts
– Tuple distribution
• A tuple/template (XML) is associated with k keywords
• The Squid content-based routing engine is used for exact and approximate tuple distribution and retrieval
– Transient spaces
• Enable applications to explicitly exploit context locality

“COMET: A Scalable Coordination Space in Decentralized Distributed Environments,” Z. Li* and M. Parashar, Proceedings of the 2nd International Workshop on Hot Topics in Peer-to-Peer Systems (HOT-P2P 2005), San Diego, CA, USA, IEEE Computer Society Press, pp. 104 – 111, July 2005.

Page 36: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Supporting the Rudder Agent Framework

• Agent communication
– associatively reading, writing, and extracting tuples

• Agent coordination protocols
– Decentralized election protocol
• Based on wait-free consensus protocols
– Resilient to node/link failures
– Discovery protocol
• Registry implemented using XML tuples
• Elements registered using Out
• Elements unregistered using In
• Elements discovered using the Rd/RdAll operations
– Interaction protocol
• Contract-Net protocol
• Two-agent bargaining protocol

• Workflow engine

“Enabling Dynamic Composition and Coordination of Autonomic Applications using the Rudder Agent Framework,” Z. Li* and M. Parashar, The Knowledge Engineering Review, Cambridge University Press (also SAACS 2005).
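The Out/In/Rd operations above are classic Linda-style primitives over the Comet space. A purely local, in-memory sketch of their semantics in Python (Comet itself distributes tuples across peers via Squid; None stands in for a wildcard field, and the tuple contents are made up for illustration):

# Local sketch of Linda-style tuple-space semantics used by Rudder agents.
space = []

def out(tup):                         # Out: insert a tuple (e.g., register an element)
    space.append(tup)

def _match(template, tup):            # None in a template is a wildcard field
    return len(template) == len(tup) and all(
        t is None or t == v for t, v in zip(template, tup))

def rd(template):                     # Rd: non-destructive associative read
    return next((t for t in space if _match(template, t)), None)

def in_(template):                    # In: destructive read (extract); "in" is reserved
    t = rd(template)
    if t is not None:
        space.remove(t)
    return t

out(("register", "element-42", "endpoint-A"))      # register an element
print(rd(("register", "element-42", None)))        # discover it
in_(("register", "element-42", None))              # unregister it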

Implementation/Deployment Overview

• Current implementation builds on JXTA
– SquidTON, Squid, Comet and Meteor layers are implemented as event-driven JXTA services

• Deployments include
– Campus Grid @ Rutgers
– Orbit wireless testbed (400 nodes)
– PlanetLab wide-area testbed
• At least one node selected from each continent

Page 37: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Outline

• Pervasive Grid Environments - Unprecedented Opportunities

• Pervasive Grid Environments - Unprecedented Challenges, Opportunities

• Autonomic Grid Computing

• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments

• An Illustrative Application

• Concluding Remarks

The Instrumented Oil Field of the Future (UT-CSM, UT-IG, RU, OSU, UMD, ANL)

• Production of oil and gas can take advantage of installed sensors that monitor the reservoir's state as fluids are extracted

• Knowledge of the reservoir's state during production can result in better engineering decisions
– economical evaluation; physical characteristics (bypassed oil, high-pressure zones); production techniques for safe operating conditions in complex and difficult areas

• Detect and track changes in data during production
• Invert data for reservoir properties
• Detect and track reservoir changes
• Assimilate data & reservoir properties into the evolving reservoir model
• Use simulation and optimization to guide future production and future data acquisition strategy

"Application of Grid-Enabled Technologies for Solving Optimization Problems in Data-Driven Reservoir Studies," M. Parashar, H. Klie, U. Catalyurek, T. Kurc, V. Matossian, J. Saltz and M. Wheeler, FGCS, The International Journal of Grid Computing: Theory, Methods and Applications, Elsevier Science Publishers, Vol. 21, Issue 1, pp. 19-26, 2005.

Page 38: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Effective Oil Reservoir Management: Well Placement/Configuration

• Why is it important
– Better utilization/cost-effectiveness of existing reservoirs
– Minimizing adverse effects on the environment

[Illustration: bad management leaves much bypassed oil; better management leaves less bypassed oil]

An Autonomic Well Placement/Configuration Workflow

[Workflow: an optimization service (SPSA, VFSA, or exhaustive search) generates guesses and sends them to the IPARS factory. If a guess is not in the MySQL database, a parallel IPARS instance is started with the guess as a parameter; the instance connects to DISCOVER, which notifies clients, and clients interact with IPARS. If the guess is in the database, the response is sent to the clients and a new guess is obtained from the optimizer. The loop runs over the AutoMate programming system/Grid middleware, drawing on history/archived data, sensor/context data, and context such as oil prices and weather.]
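Stripped of the Grid plumbing, this workflow is an optimize-evaluate loop with memoization of already-simulated guesses. A schematic Python sketch, where run_simulation, next_guess and the dict stand in for IPARS, the VFSA/SPSA optimizers and the MySQL database; the toy objective and perturbation step are invented for illustration:

# Schematic of the autonomic well-placement loop: an optimizer proposes
# guesses, simulations are launched only for guesses not already cached,
# and responses feed the next guess. Stand-ins for VFSA/IPARS/MySQL.
import random

def run_simulation(guess):
    # Stand-in for launching a parallel IPARS instance; toy objective.
    y, z = guess
    return -((y - 3) ** 2 + (z - 5) ** 2)

def next_guess(best):
    # Stand-in for a VFSA/SPSA step: perturb the current best placement.
    y, z = best
    return (y + random.choice([-1, 0, 1]), z + random.choice([-1, 0, 1]))

db = {}                                         # guess -> objective cache
guess, best = (0, 0), ((0, 0), float("-inf"))
for _ in range(50):
    if guess not in db:                         # only simulate new guesses
        db[guess] = run_simulation(guess)
    if db[guess] > best[1]:
        best = (guess, db[guess])
    guess = next_guess(best[0])
print("best placement:", best)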

Page 39: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Data-Flow for Autonomic Well Placement/Configuration

[Diagram: seismic simulation instances execute distributed over seismic and reservoir datasets; data conversion and data manipulation tools feed sensor data through a filter network (RD, DIFF, SUM, AVG) with transparent copies (one per node, e.g., nodes 1-20); results flow to visualization portals together with context data]

Autonomic Oil Well Placement/Configuration

[Plots: permeability and pressure contours, 3 wells, 2D profile; contours of NEval(y,z,500). Exhaustive search requires NY×NZ (450) evaluations, and the minimum appears in the contour plot; the VFSA solution "walk" finds it after 20 (81) evaluations]

Page 40: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Autonomic Oil Well Placement/Configuration (VFSA)

"An Autonomic Reservoir Framework for the Stochastic Optimization of Well Placement," V. Matossian, M. Parashar, W. Bangerth, H. Klie, M.F. Wheeler, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Kluwer Academic Publishers, Vol. 8, No. 4, pp. 255 – 269, 2005.
"Autonomic Oil Reservoir Optimization on the Grid," V. Matossian, V. Bhat, M. Parashar, M. Peszynska, M. Sen, P. Stoffa and M. F. Wheeler, Concurrency and Computation: Practice and Experience, John Wiley and Sons, Vol. 17, Issue 1, pp. 1 – 26, 2005.

Wide Area Data Streaming in the Fusion Simulation Project (FSP)

[Diagram: GTC runs on teraflop/petaflop supercomputers; an end-to-end system with monitoring routines streams data over a 40 Gbps link to large-scale data analysis, data archiving, data replication, post-processing, user monitoring and visualization]

• Wide-area data streaming requirements
– Enable high-throughput, low-latency data transfer to support near-real-time access to the data
– Minimize the related overhead on the executing simulation
– Adapt to network conditions to maintain the desired QoS
– Handle network failures while eliminating data loss

Page 41: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Autonomic Data Streaming for Coupled Fusion Simulation Workflows

[Diagram: a coupled fusion simulation workflow spanning NERSC, CA (simulation), PPPL, NJ (analysis/visualization) and ORNL, TN (storage), connected over a backbone by data grid middleware & logistical networking. Services: Simulation Service (SS); Data Analysis Service (DAS); Data Storage Service (DSS); and the Autonomic Data Streaming Service (ADSS), a managed service comprising a Buffer Manager Service (BMS) and a Data Transfer Service (DTS)]

• Grid-based coupled fusion simulation workflow
– Runs for days between ORNL (TN), NERSC (CA), PPPL (NJ) and Rutgers (NJ)
– Generates multi-terabytes of data
– Data coupled, analyzed and visualized at runtime

ADSS Implementation

[Diagram: the ADSS couples an LLC controller with an element/service manager. The LLC model predicts the data rate and buffer size and feeds an optimization step; the element/service manager combines service state, contextual state (bandwidth, buffer-size and data-rate measurements), an optimization function and a rule base to drive rule-based adaptation of the BMS and DTS, which stream data from the simulation service (SS) to the backbone via the grid middleware {LN} or to the local hard disk]

Page 42: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Design of the ADSS Controller

• Dynamics of the data streaming model captured using a queuing model

[Diagram: "m" simulation processors feed "n" data streaming processors; at each node n_i a queue manager holds q_i(k) data blocks, with generation rate λ_i(k) into the queue, transfer rate μ_i(k) over the network to the analysis clusters, and rate ω_i(k) to the local hard disk]

• Key operating parameters
– State variables
• q_i(k): current average queue size at n_i
– Environment variables
• λ_i(k): data generation rate into the queue at n_i
• B(k): effective bandwidth of the network link
– Control and decision variables
• μ_i(k): data-transfer rate over the network
• ω_i(k): data-transfer rate to the hard disk

• LLC controller problem
– The controller aims to maintain each node's (n_i) queue around a desired value q*

System dynamics at each data streaming processor n_i:
q̂_i(k+1) = q_i(k) + (λ̂_i(k) − μ_i(k) − ω_i(k)) · T, with the predicted generation rate λ̂_i(k) = φ(λ_i(k−1), k)
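Given this model, each controller step forecasts λ̂_i(k) and picks the transfer rate that keeps the predicted queue near q*. A one-node sketch in Python under the dynamics above; the forecast function, candidate rates and cost weights are illustrative assumptions, not the actual ADSS controller:

# One-node sketch of the LLC idea: pick the network transfer rate mu(k)
# that keeps the predicted queue near q_star under the model
#   q(k+1) = q(k) + (lambda_hat(k) - mu(k) - omega(k)) * T
# Forecast, candidate rates and cost weights are illustrative only.
T, q_star = 1.0, 50.0

def forecast(lam_prev):
    return lam_prev                     # trivial phi(.): last value carries over

def llc_step(q, lam_prev, omega, candidates=(0, 20, 40, 60, 80)):
    lam_hat = forecast(lam_prev)
    def cost(mu):
        q_next = q + (lam_hat - mu - omega) * T
        return (q_next - q_star) ** 2 + 0.01 * mu   # tracking error + effort
    return min(candidates, key=cost)

q, lam = 120.0, 60.0
mu = llc_step(q, lam, omega=0.0)
print("chosen transfer rate:", mu)      # drains the backlog toward q*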

Adaptive Data Transfer

[Plot: data transferred by the DTS (MB) to the WAN and to the LAN, and bandwidth (Mb/sec), over controller intervals 0-24, with a congestion episode marked]

• No congestion in intervals 1-9
– Data is transferred over the WAN
• Congestion at intervals 9-19
– The controller recognizes the congestion and advises the "Element Manager", which in turn adapts the DTS to transfer data to local storage (LAN)
• Adaptation continues until the network is no longer congested
– Data sent to local storage by the DTS falls to zero at the 19th controller interval

Page 43: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Adaptive Buffer Management

[Plot: blocks grouped by the BMS (1 block = 4 MB) and bandwidth (Mb/sec) over controller intervals 0-26; uniform buffer management gives way to aggregate buffer management during a congestion episode]

• Uniform buffer management is used when the data generation rate is constant
• Aggregate buffer management is triggered when congestion increases

Self Optimizing Behavior: Buffer Management Service (BMS)

[Plot: number of blocks sent (10 MB/block) vs. simulation time (0-400 sec) under uniform buffer management (50 Mbps) and aggregate buffer management (60-400 Mbps)]

• Uniform buffer management: the buffer divides data blocks into fixed sizes (50 Mbps)
• Aggregate buffer management: aggregates blocks of data across iterations (60-400 Mbps)
• Priority buffer management: orders blocks based on the nature of the data (2D and 3D data items)

• BMS adaptations are based on:
– data generation rates
– network connectivity
– the nature of the data being transmitted

Sample rule:

IF DataGenerationRate <= 50 AND DataType = 2D

THEN Aggregate Buffer Management

ELSE Uniform Buffer Management

Page 44: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Adaptation of the Workflow

• Create multiple instances of the Autonomic Data Streaming Service (ADSS) when
– the effective network transfer rate dips below a threshold (in our case around 100 Mbps)
– the maximum number of instances is kept within a certain limit

[Plot: % network throughput and number of ADSS instances vs. data generation rate (0-160 Mbps); % network throughput is the difference between the maximum and current network transfer rates. Diagram: the simulation fans out to ADSS-0, ADSS-1 and ADSS-2 instances, each with its own buffer and data transfer stage]

Overhead of Self Optimizing in BMS

• Overhead = |time required with autonomic data streaming − time required without|

[Plot: % overhead on the simulation vs. data generation rate (0-160 Mbps), with and without autonomic management]

Observations:
• Autonomic data streaming is slightly costly at the start
• Without autonomic management, the overhead grows to about 10%
• With autonomic management, the overhead is limited to 5%

Goals: minimize the overhead on the executing simulation; maximize the data streamed from the simulation.

Page 45: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Self Healing behavior

[Diagram: the simulation at PPPL, NJ streams through the ADSS (BMS + DTS) to the DAS/DSS; grid middleware and logistical networking over the backbone can redirect data to a DSS at ORNL, TN]

Sample rule for self-healing:

IF bufferUsage > 90%

THEN DSS at ORNL

ELSE DSS at PPPL

[Plot: % buffer occupancy and data sent to the local DSS at ORNL (MB) vs. simulation time (0-2000 sec); each time the buffer fills, the local storage service is triggered]

• ADSS continuously observes the buffer occupancy to consider a switch
• Avoid writing to disk at the simulation end
• Temporarily switch to a faster Data Storage Service, such as the one at ORNL

Heuristic (Rule Based) vs. Control Based Adaptations in ADSS

[Plots: % buffer vacancy vs. time (0-1000 sec) using heuristic rule-based adaptation vs. control-based self-management, with the mean % buffer vacancy and controller intervals marked]

• % buffer vacancy refers to the empty space in the buffer
– Higher buffer vacancy leads to reduced overheads and data loss
• The pure reactive scheme was based on heuristics: the element manager was not aware of the current and future data generation rates or the network bandwidth
– Average buffer vacancy in this case was around 16%
• When the model-based controller was used in conjunction with the "Element Manager" for rule-based adaptations, the average buffer vacancy was around 75%

Page 46: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Overhead of the Self-Managing Framework

• Overheads of the framework are primarily due to two factors:
– Activities of the controller during a controller interval
– Overhead of data streaming on the simulation

• Overhead due to controller activities
– Controller interval of 80 seconds
• The average controller decision time is 2.5% at the start, reduced to 0.15% due to the local search methods used
• Network measurement cost was 18.8 sec (23.5%)
• Operating cost of the BMS and DTS was 0.2 sec (0.25%) and 18.8 sec (23.5%), respectively
• Rule execution for triggering adaptations required less than 0.01 sec

• Overhead of data streaming
– (T's − Ts)/Ts, where T's and Ts denote the simulation execution time with and without data streaming, respectively
– % overhead of data streaming on the GTC simulation was less than 9% for 16-64 processors
– % overhead reduced to about 5% for 128-256 processors

Outline

• Pervasive Grid Environments - Unprecedented Opportunities

• Pervasive Grid Environments - Unprecedented Challenges, Opportunities

• Autonomic Grid Computing

• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments

• An Illustrative Application

• Concluding Remarks

Page 47: Autonomic Grid Computing: Concepts, Infrastructure and Applications


Conclusion

• Pervasive Grid Environments
– Unprecedented opportunity
• can enable a new generation of knowledge-based, data- and information-driven, context-aware, computationally intensive, pervasive applications
– Unprecedented research challenges
• scale, complexity, heterogeneity, dynamism, reliability, uncertainty, …
• applications, algorithms, measurements, data/information, software

• Autonomic Grid Computing
– Addressing the complexity of pervasive Grid environments

• Project AutoMate: Autonomic Computational Science on the Grid
– Semantics + Autonomics
– Accord, Rudder/Comet, Meteor, Squid, Topos, …