autonomic grid computing: concepts, infrastructure and applications
TRANSCRIPT
Autonomic Grid Computing: Concepts, Infrastructure and Applications
The Applied Software Systems Laboratory (TASSL), ECE/CAIP, Rutgers University
http://www.caip.rutgers.edu/TASSL
(Ack: NSF, DoE, NIH)
Outline
• Pervasive Grid Environments - Unprecedented Opportunities
• Pervasive Grid Environments - Unprecedented Challenges
• Autonomic Grid Computing
• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments
• An Illustrative Application
• Concluding Remarks
Grid Computing – The Hype!
“Grid Computing,” by M. Mitchell Waldrop, May 2002
Hook enough computers together and what do you get? A new kind ofutility that offers supercomputer processing on tap.
Is Internet history about to repeat itself?
Defining Grid Computing …
• “… a concept, a network, a work in progress, part hype and part reality, and it’s increasingly capturing the attention of the computing community …” A. Applewhite, IEEE DS-Online
• “… grids are networks for computation – they are thinking, number-crunching entities. Like a decentralized nervous system, grids consist of high-end computers, servers, workstations, storage systems, and databases that work in tandem across private and public networks …” O. Malik, Red Herring
• “… a kind of hyper network that links computers and data storage owned by different groups so that they can share computing power…” USA Today
• “‘The Matrix’ crossed with ‘Minority Report’ …” D. Metcalfe (quoted in Newsweek)
• “… use clusters of personal computers, servers or other machines. They link together to tackle complex calculations. In part, grid computing lets companies harness their unused computing power, or processing cycles, to create a type of supercomputer …” J. Bonasia, Investor’s Business Daily
• “… grid computing links far-flung computers, databases, and scientific instruments over the public internet or a virtual private network and promises IT power on demand…… All a user has to do is submit a calculation to a network of computers linked by grid-computing middleware. The middleware polls a directory of available machines to see which have the capacity to handle the request fastest…” A. Ricadela, Information Week
The Grid Vision
• Imagine a world
– in which computational power (resources, services, data, etc.) is as readily available as electrical power
– in which computational services make this power available to users with differing levels of expertise in diverse areas
– in which these services can interact to perform specified tasks efficiently and securely with a minimum of human intervention
• on-demand, ubiquitous access to computing, data, and services
• new capabilities constructed dynamically and transparently from distributed services
– a large part of this vision was originally proposed by Fernando Corbato (The Multics Project, 1965, www.multicians.org)
Grid Idea By A Simple Analogy
Some power stations dispersed everywhere produce electrical power. The produced power is distributed over a power network. One consumer wants to access that power. He/she comes to an agreement with the power company, which provides a new socket into which the user can plug. Now the user is able to access the power grid.
• The user:
– Does not need to know anything about what lies beyond the socket
– Can draw all the power he/she wants according to the agreement
• The power company:
– Can modify production technologies at any moment
– Manages the power network as it wants
– Defines the terms and conditions of the agreement
Ack. F. Scibilia
In the same way . . .
Some computing farms produce computing power, which is made available over the Internet. One user wants access to intensive computational power. He/she comes to an agreement with a company that offers grid services; the company provides grid facilities, along with the proper tools, allowing the user to access its grid resources. Now the user accesses grid facilities as a grid user.
• The user:
– Does not need to know what lies beyond the user interface
– Can access a massive amount of computational power through a simple terminal
• The company:
– Can extend grid facilities at any moment
– Manages the architecture of the grid
– Defines policies and rules for accessing grid resources
Ack. F. Scibilia
What about Grid Computing
The Grid Computing paradigm is an emerging way of thinking about distributed environments on a global-scale infrastructure, to:
• Share data
• Distribute computation
• Coordinate work
• Access remote instrumentation
Ack. F. Scibilia
Key Enablers of Grid Computing - Exponentials
• Network vs. computer performance
– Computer speed doubles every 18 months
– Storage density doubles every 12 months
– Network speed doubles every 9 months
– Difference = order of magnitude per 5 years
• 1986 to 2000
– Computers: x 500
– Networks: x 340,000
• 2001 to 2010
– Computers: x 60
– Networks: x 4,000
(Scientific American, Jan-2001)
“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special-purpose appliances” (George Gilder)
Ack: I. Foster
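The "order of magnitude per 5 years" claim above follows directly from the quoted doubling times; a quick back-of-the-envelope check (doubling periods taken from the slide, everything else plain arithmetic):

```python
# Growth factor over 5 years (60 months) for a quantity that doubles
# every `period` months: 2 ** (60 / period).
def growth_5yr(doubling_months: float) -> float:
    return 2 ** (60 / doubling_months)

computer = growth_5yr(18)  # CPU speed: doubles every 18 months
storage = growth_5yr(12)   # storage density: every 12 months
network = growth_5yr(9)    # network speed: every 9 months

print(f"computer x{computer:.0f}, storage x{storage:.0f}, network x{network:.0f}")
print(f"network/computer gap over 5 years: x{network / computer:.0f}")
```

The gap works out to 2^(60/9 − 60/18) ≈ 10, i.e. roughly an order of magnitude per 5 years, matching the slide.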
Why Computing Grids now?
• Because the amount of computational power needed by many applications is getting huge
– Thousands of CPUs working at the same time on the same task
• Because the amount of data requires massive and complex distributed storage systems
– From hundreds of gigabytes to petabytes (10^15 bytes) produced by the same application
• To make the cooperation of people and resources belonging to different organizations easier
– People from several organizations working together to achieve a common goal
• To access particular instrumentation that is not easily reachable in any other way
– Because it cannot be moved or replicated, or its cost is too high
• Because it is the next step in the evolution of distributed computation
– To create a marketplace of computational power and storage over the Internet
Ack. F. Scibilia
Who is interested in Grids?
• The research community, to carry out important results from experiments that involve many people and massive amounts of resources
• Enterprises, which can obtain huge computation without the need to extend their current IT infrastructure
• Businesses, which can provide computational power and data storage under a contract or for rental
Ack. F. Scibilia
Properties of Grids
• Transparency
– The complexity of the Grid architecture is hidden from the final user
– The user must be able to use a Grid as if it were a single virtual supercomputer
– Resources must be accessible regardless of their location
• Openness
– Each subcomponent of the Grid is accessible independently of the other components
• Heterogeneity
– Grids are composed of many different resources
• Scalability
– Resources can be added to and removed from the Grid dynamically
• Fault Tolerance
– Grids must be able to work even if a component fails or a system crashes
• Concurrency
– Different processes on different nodes must be able to work at the same time
Ack. F. Scibilia
Challenging Issues in Grids (i)
• Security
– Authentication and authorization of users
– Confidentiality and non-repudiation
• Information Services
– To discover and monitor Grid resources
– To check the health status of resources
– As a basis for decision-making processes
• File Management
– Creation, modification and deletion of files
– Replication of files to improve access performance
– Ability to access files without the need to move them locally to the code
• Administration
– Systems to administer Grid resources while respecting local administration policies
Ack. F. Scibilia
Challenging Issues in Grids (ii)
• Resource Brokering
– To schedule tasks across different resources
– To make optimal or suboptimal decisions
– To reserve (in the future) resources and network bandwidth
• Naming Services
– To name resources in an unambiguous way within the Grid scope
• Friendly User Interfaces
– Because most Grid users have nothing to do with computing science (physicists, chemists, …)
– Graphical User Interfaces (GUIs)
– Grid Portals (very similar to classical Web Portals)
– Command Line Interfaces (CLIs) for experts
Ack. F. Scibilia
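The brokering idea above — consult a directory of available machines and schedule each task on the resource expected to serve it soonest — can be sketched as follows. This is an illustrative greedy heuristic, not any particular broker's algorithm, and all names are hypothetical:

```python
# Minimal sketch of a grid resource broker: pick, from a directory of
# available machines, the resource with the smallest estimated
# completion time for a new task -- a greedy, suboptimal heuristic.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    flops: float        # compute speed
    queued_work: float  # work already scheduled, in FLOPs

def submit(task_flops: float, directory: list[Resource]) -> Resource:
    # Estimated completion = (backlog + new task) / speed.
    best = min(directory, key=lambda r: (r.queued_work + task_flops) / r.flops)
    best.queued_work += task_flops
    return best

directory = [Resource("fast-busy", 100.0, 900.0),
             Resource("slow-idle", 10.0, 0.0)]
print(submit(50.0, directory).name)  # slow-idle: 5.0 vs 9.5 time units
```

A real broker would also weigh reservations, bandwidth and policy constraints, as the slide notes.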
The Grid
“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
1. Enable integration of distributed resources2. Using general-purpose protocols & infrastructure3. To achieve better-than-best-effort service
Ack: I. Foster
Virtual Organizations (VOs)
• A Virtual Organization is a collection of people and resources that work in a coordinated way to achieve a common goal
• To use Grid facilities, any user MUST subscribe to a Virtual Organization as a member
• Each person or resource can be a member of more than one VO at the same time
• Each VO can contain people or resources belonging to different administration domains
Ack. F. Scibilia
Virtual Laboratory
• A new way of cooperating in experiments
• A platform that allows scientists to work together in the same “Virtual” Laboratory – a Grid infrastructure connecting people, devices, instruments, computing resources and data
• Strictly correlated to Grids and Virtual Organizations
Ack. F. Scibilia
Profound Technical Challenges
How do we, in dynamic, scalable, multi-institutional, computationally & data-rich settings:
• Negotiate & manage trust
• Access & integrate data
• Construct & reuse workflows
• Plan complex computations
• Detect & recover from failures
• Capture & share knowledge
• Represent & enforce policies
• Achieve end-to-end QoX
• Move data rapidly & reliably
• Support collaborative work
• Define primitive protocols
• Build reusable software
• Package & deliver software
• Deploy & operate services
• Operate infrastructure
• Upgrade infrastructure
• Perform troubleshooting
• Etc., etc., etc.
Ack: I. Foster
Globus Alliance
• The Globus Alliance
– Is a community of people and organizations involved in the design and development of Grid technologies
– University of Illinois, Argonne National Laboratory, University of Edinburgh, EPCC, etc.
• The Globus Toolkit (GT)
– It is a de facto standard
– It is a bag of services
– At its fourth release (GT4)
– Now adopts Web Services interfaces
• The Global Grid Forum– It is a forum of grid researchers– Works to define standards and protocols on grid technologies– It is divided in Working Groups (WGs)– http://www.ggf.org
Ack. F. Scibilia
Globus Services
(GT4 component diagram, grouped by family; components span WS and non-WS implementations:)
• Security: Authentication Authorization, Pre-WS Authentication Authorization, Credential Management, Delegation
• Data Management: GridFTP, Reliable File Transfer, Replica Location, Data Replication, OGSA-DAI
• Execution Management: Grid Resource Allocation & Management, Pre-WS Grid Resource Allocation & Management, Community Scheduler Framework, Workspace Management, Grid Telecontrol Protocol
• Information Services: Monitoring & Discovery (MDS2), Index, Trigger, WebMDS
• Common Runtime: Java WS Core, C WS Core, Python WS Core, C Common Libraries, eXtensible IO (XIO)
Legend – Core GT component: public interfaces frozen between incremental releases, best-effort support; Contribution/Tech Preview: public interfaces may change between incremental releases; Deprecated component: not supported, will be dropped in a future release.
Ack. F. Scibilia
Hourglass Reference Model
• Fabric layer
– Manages resources locally
• Connectivity layer
– Network communications (IP, DNS, etc.)
– Security: authentication, authorization, certification
– Single Sign-On
• Resource layer
– Allocation, reservation and monitoring of resources
– Data access and transport
– Gathering of information on resources
• Collective layer
– View of services as collections
– Discovery and allocation
– Replica and catalogue of data
– Management of workflow
• Application layer
– User applications
– Tools and interfaces
Ack. F. Scibilia
Application Areas (i)
• Physical Science Applications
– GriPhyN, http://www.gryphin.org/
– Particle Physics DataGrid (PPDG), http://grid.fnal.gov/ppdg/
– GridPP, http://www.gridpp.ac.uk/
– AstroGrid, http://www.astrogrid.org/
• Life Science Applications
– Protein Data Bank (PDB), http://www.rcsb.org/pdb/Welcome.do
– Biomedical Informatics Research Network (BIRN), http://www.nbirn.net/
– Telemicroscopy, http://ncmir.ucsd.edu/
– myGrid, http://www.mygrid.org.uk/
Ack. F. Scibilia
Application Areas (ii)
• Engineering Oriented Applications– NASA Information Power Grid (IPG), http://www.ipg.nasa.gov/
– Grid Enabled Optimization and Design Search for Engineering (GEODISE), http://www.geodise.org/
• Commercial Applications– Butterfly Grid, http://www.butterfly.net/– Everquest, http://www.everquest.com/
• E-Utility– ClimatePrediction experiment, http://www.climateprediction.net/
Ack. F. Scibilia
An Example: SETI@Home
• The SETI@Home project
– Searches for Extra-Terrestrial Intelligence (SETI)
• Collecting samples of microwaves coming from the Universe through a telescope
• Scheduling tasks spread over Grid nodes to analyse these samples
– Uses desktop computers as Grid nodes
– Working nodes are dynamically added to and removed from the grid
– The owner of the desktop machine decides how to contribute to the project, offering its computational power
• To contribute to the project
– http://setiathome.berkeley.edu/
– Download and install the client
– Your machine will work as a Grid node when it is idle (in place of your screensaver)
Ack. F. Scibilia
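The cycle-scavenging model above — contribute cycles only while the desktop is idle — reduces to a small control loop. The sketch below shows only that logic, not the real SETI@Home/BOINC client:

```python
# Toy sketch of SETI@Home-style cycle scavenging (control logic only,
# not the real client): a node analyses work units only during ticks
# in which the host machine reports itself idle.
def scavenge(work_units, idle_by_tick):
    """idle_by_tick: one boolean per tick (True = user away, host idle)."""
    done, queue = [], list(work_units)
    for idle in idle_by_tick:
        if idle and queue:
            done.append(queue.pop(0) * 2)  # stand-in for the real analysis
    return done

print(scavenge([1, 2, 3], [True, False, True]))  # [2, 4]; unit 3 waits for idle time
```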
A Global Grid Community
Ack: I. Foster
The Grid
“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
The original Grid concept has moved on!
Ack: I. Foster
Emerging Pervasive Information/Computation Infrastructures
• Explosive growth in computation, communication, information and integration technologies
– Computing, communication, and data are ubiquitous
• Pervasive ad hoc “anytime-anywhere” access environments– Ubiquitous access to information – Peers capable of producing/consuming/processing information at
different levels and granularities– Embedded devices in clothes, phones, cars, mile-markers, traffic
lights, lamp posts, medical instruments …
• “On demand” computational/storage resources, services
Pervasive Computational Ecosystems and Dynamic Information Driven Applications
(Diagram: a pervasive computational ecosystem in which —
• resources (computers, storage, instruments, …) are discovered, negotiated and co-allocated on the fly, and components deployed;
• experts query and configure resources, and interact and collaborate using ubiquitous and pervasive portals (laptop, PDA, …);
• components are dynamically composed, “Web Services” discovered & invoked, and components write into the data archive;
• experts monitor, interact with, interrogate and steer models (“what if” scenarios, …), while the application notifies experts of interesting events;
• experts mine the archive, matching real-time data with history, with automated mining & matching;
• real-time data is assimilated/injected from sensors, instruments, experiments, data archives and non-traditional data sources.)
Pervasive Grid Environments - Unprecedented Opportunities
• Pervasive Grid Environments
– Seamless, secure, on-demand access to, and aggregation of, geographically distributed computing, communication and information resources
• Computers, networks, data archives, instruments, observatories, experiments, sensors/actuators, ambient information, etc.
– Context, content, capability, capacity awareness
– Ubiquity and mobility
• Knowledge-based, information/data-driven, context/content-aware, computationally intensive, pervasive applications
– Symbiotically and opportunistically combine services/computations, real-time information, experiments, and observations to manage, control, predict, adapt, optimize, …
• Crisis management; monitoring and prediction of natural phenomena; monitoring and management of engineering systems; optimization of business processes
• A new paradigm?
– seamless access
• resources, services, data, information, expertise, …
– seamless aggregation
– seamless (opportunistic) interactions/couplings
Information-driven Management of Subsurface Geosystems: The Instrumented Oil Field (with UT-CSM, UT-IG, OSU, UMD, ANL)
Detect and track changes in data during production. Invert data for reservoir properties. Detect and track reservoir changes. Assimilate data & reservoir properties into the evolving reservoir model. Use simulation and optimization to guide future production. (A closed loop between the data-driven and model-driven sides.)
Vision: Diverse Geosystems – Similar Solutions
(Diagram: models, simulation, data and control applied alike to underground pollution such as landfills, to oilfields, and to undersea reservoirs.)
Management of the Ruby Gulch Waste Repository (with UT-CSM, INL, OU)
• Ruby Gulch Waste Repository/Gilt Edge Mine, South Dakota
– ~20 million cubic yards of waste rock
– AMD (acid mine drainage) impacting drinking water supplies
• Monitoring System
– Multi-electrode resistivity system (523)
• One data point every 2.4 seconds from any 4 electrodes
– Flowmeter at the bottom of the dump
– Weather station
– Manually sampled chemical/air ports in wells
– Approx. 40K measurements/day
– Temperature & moisture sensors in four wells
“Towards Dynamic Data-Driven Management of the Ruby Gulch Waste Repository,” M. Parashar, et al, DDDAS Workshop, ICCS 2006, Reading, UK, LNCS, Springer Verlag, Vol. 3993, pp. 384 – 392, May 2006.
Data-Driven Forest Fire Simulation (U of AZ)
• Predict the behavior and spread of wildfires (intensity, propagation speed and direction, modes of spread)
– based on both dynamic and static environmental and vegetation conditions
– factors include fuel characteristics and configurations, chemical reactions, balances between different modes of heat transfer, topography, and fire/atmosphere interactions
“Self-Optimizing of Large Scale Wild Fire Simulations,” J. Yang*, H. Chen*, S. Hariri and M. Parashar, Proceedings of the 5th International Conference on Computational Science (ICCS 2005), Atlanta, GA, USA, Springer-Verlag, May 2005.
Many Application Areas ….
• Hazard prevention, mitigation and response– Earthquakes, hurricanes, tornados, wild fires, floods, landslides, tsunamis, terrorist
attacks• Critical infrastructure systems
– Condition monitoring and prediction of future capability• Transportation of humans and goods
– Safe, speedy, and cost effective transportation networks and vehicles (air, ground, space)
• Energy and environment– Safe and efficient power grids, safe and efficient operation of regional collections
of buildings• Health
– Reliable and cost-effective health care systems with improved outcomes
• Enterprise-wide decision making
– Coordination of dynamic distributed decisions for supply chains under uncertainty• Next generation communication systems
– Reliable wireless networks for homes and businesses• … … … …
• Report of the Workshop on Dynamic Data Driven Applications Systems, F. Darema et al., March 2006, www.dddas.org
Source: M. Rotea, NSF
Outline
• Pervasive Grid Environments - Unprecedented Opportunities
• Pervasive Grid Environments - Unprecedented Challenges
– System, Information, Application Uncertainty
• Autonomic Grid Computing
• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments
• An Illustrative Application
• Concluding Remarks
Pervasive Grid Applications – Unprecedented Challenges: Uncertainty
• System Uncertainty
– Very large scales
– Ad hoc structures/behaviors
• p2p, hierarchical, etc., architectures
– Dynamic
• entities join, leave, move, change behavior
– Heterogeneous
• capability, connectivity, reliability, guarantees, QoS
– Lack of guarantees
• components, communication
• Information Uncertainty
– Availability, resolution, quality of information
– Device capability, operation, calibration
– Trust in data, data models
– Semantics
• Application Uncertainty
– Dynamic behaviors
• space-time adaptivity
– Lack of common/complete knowledge (LOCK)
• number, type, location, availability, connectivity, protocols, semantics, etc.
– Dynamic and complex couplings
• multi-physics, multi-model, multi-resolution, …
– Dynamic and complex (ad hoc, opportunistic) interactions
– Software/systems engineering issues
• Emergent rather than by design
Pervasive Grid Computing – Research Issues, Opportunities
• Programming systems/models for data integration and runtime self-management
– components and compositions capable of adapting behavior, interactions and information
– correctness, consistency, performance, quality-of-service constraints
• Content-based asynchronous and decentralized discovery and access services
– semantics, metadata definition, indexing, querying, notification
• Data management mechanisms for data acquisition and transport with real-time, space and data-quality constraints
– high data volumes/rates; heterogeneous data qualities and sources
– in-network aggregation, integration, assimilation, caching
• Runtime execution services that guarantee correct, reliable execution with predictable and controllable response time
– data assimilation, injection, adaptation
• Security, trust, access control, data provenance, audit trails, accounting
Outline
• Pervasive Grid Environments - Unprecedented Opportunities
• Pervasive Grid Environments - Unprecedented Challenges
• Autonomic Grid Computing
• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments
• An Illustrative Application
• Concluding Remarks
Integrating Biology and Information Technology: The Autonomic Computing Metaphor
• Current programming paradigms, methods, management tools are inadequate to handle the scale, complexity, dynamism and heterogeneity of emerging systems
• Nature has evolved to cope with scale, complexity, heterogeneity, dynamism, unpredictability, and lack of guarantees
– self-configuring, self-adapting, self-optimizing, self-healing, self-protecting, highly decentralized, heterogeneous architectures that work!
• Goal of autonomic computing is to build a self-managing system that addresses these challenges using high level guidance
“Autonomic Computing: An Overview,” M. Parashar, and S. Hariri, Hot Topics, Lecture Notes in Computer Science, Springer Verlag, Vol. 3566, pp. 247-259, 2005.
Adaptive Biological Systems
• The body’s internal mechanisms continuously work together to maintain essential variables within physiological limits that define the viability zone
• Two important observations:
– The goal of the adaptive behavior is directly linked with survivability
– If the external or internal environment pushes the system outside its physiological equilibrium state, the system will always work towards coming back to the original equilibrium state
Ashby’s Ultrastable System
(Diagram: the Environment is coupled to the Reacting Part R through motor and sensor channels; the Essential Variables, sensed from the environment, drive the Step Mechanisms/Input Parameter S, which reconfigure R.)
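Ashby's mechanism can be caricatured in a few lines: the reacting part R keeps applying its current setting, and the step mechanism S jumps to a new setting whenever the essential variable leaves the viability zone. The numbers below are illustrative, not from Ashby:

```python
# Caricature of Ashby's ultrastable system (illustrative numbers): the
# reacting part R applies a feedback gain to an essential variable;
# whenever the variable leaves the viability zone, the step mechanism S
# switches to the next candidate setting.
import itertools

LIMITS = (-1.0, 1.0)  # viability zone for the essential variable

def ultrastable(disturbance: float, steps: int = 20) -> float:
    params = itertools.cycle([-0.5, +0.5])  # step mechanism S: settings to try
    gain = next(params)
    variable = disturbance
    for _ in range(steps):
        if not (LIMITS[0] <= variable <= LIMITS[1]):
            gain = next(params)       # step change: try the next setting
        variable += gain * -variable  # reacting part R acts on the variable
    return variable

print(abs(ultrastable(4.0)) <= 1.0)  # True: driven back inside the zone
```

Once a stabilizing setting is found, no further steps occur and the variable decays toward equilibrium, mirroring the slide's second observation.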
Autonomic Computing Characteristics (IBM)
Autonomic Computing Architecture
• Autonomic elements (components/services)
– Responsible for policy-driven self-management of individual components
• Relationships among autonomic elements
– Based on agreements established/maintained by autonomic elements
– Governed by policies
– Give rise to resiliency, robustness, and self-management of the system
Autonomic Computing – Conceptual Architecture (IBM)
Autonomic Elements: Structure
• Fundamental atom of the architecture
– Managed element(s)
• Database, storage system, server, software app, etc.
– Plus one autonomic manager (Monitor, Analyze, Plan, Execute over shared Knowledge)
• Responsible for:
– Providing its service
– Managing its own behavior in accordance with policies
– Interacting with other autonomic elements
Ack. IBM
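The monitor/analyze/plan/execute loop around shared knowledge can be sketched as follows; the class names, the load-threshold policy, and the "add replica" action are all hypothetical, chosen only to make the loop concrete:

```python
# Minimal sketch of an autonomic element's MAPE-K loop (names and the
# threshold policy are invented, not IBM's reference implementation).
class AutonomicManager:
    def __init__(self, managed, policy_max_load=0.8):
        self.managed = managed
        self.knowledge = {"policy_max_load": policy_max_load}  # K: shared knowledge

    def monitor(self):                 # M: collect sensor data
        return {"load": self.managed["load"]}

    def analyze(self, symptoms):       # A: detect policy violations
        return symptoms["load"] > self.knowledge["policy_max_load"]

    def plan(self):                    # P: choose an adaptation
        return {"action": "add_replica"}

    def execute(self, change):         # E: act through effectors
        if change["action"] == "add_replica":
            self.managed["replicas"] += 1
            self.managed["load"] /= 2  # toy model: load halves per replica

    def loop_once(self):
        if self.analyze(self.monitor()):
            self.execute(self.plan())

server = {"load": 0.9, "replicas": 1}  # the managed element
AutonomicManager(server).loop_once()
print(server)  # {'load': 0.45, 'replicas': 2}
```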
Autonomic Elements: Interactions
• Relationships
– Dynamic, ephemeral, opportunistic
– Defined by rules and constraints
– Formed by agreement
• May be negotiated
– Full spectrum
• Peer-to-peer
• Hierarchical
– Subject to policies
Ack. IBM
Autonomic Systems: Composition of Autonomic Elements
Ack. IBM
(Diagram: an autonomic system as a composition of interacting autonomic elements — registries, brokers, negotiators, arbiters, planners, provisioners, sentinels, monitors, aggregators, an event correlator, a reputation authority, and workload managers — collectively managing servers, storage, databases, and the network.)
Autonomic Computing: Research Issues and Challenges
Scale, Complexity, Dynamism, Heterogeneity, Unreliability, Uncertainty
• Defining autonomic elements
– Programming paradigms and development models/frameworks
• Autonomic element definition and construction
• Rule definition, representation, and enforcement
• Constructing autonomic systems/applications
– Composition, coordination, and interaction models and infrastructures
• Dynamic (rule-based) configuration, execution and optimization
• Dynamic (opportunistic) interactions, coordination, negotiation
• Execution and management
– Runtime and middleware services
• Discovery, Coordination, Messaging, Security, Management, …
– Security, protection
– Fault tolerance, reliability, availability, …
• Policies, Learning, AI, …
Outline
• Pervasive Grid Environments - Unprecedented Opportunities
• Pervasive Grid Environments - Unprecedented Challenges, Opportunities
• Autonomic Grid Computing
• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments
• An Illustrative Application
• Concluding Remarks
Programming Pervasive Grid Systems
– Programming System
• programming model, languages/abstractions – syntax + semantics
– entities, operations, rules of composition, models of coordination/communication
• abstract machine, execution context and assumptions
– infrastructure, middleware and runtime
(Stack diagram: Applications → Programming System (Programming Model + Abstract Machine) → Grid Middleware Infrastructure, providing Virtualization (create and manage VOs and VMs) and Abstraction (support abstract behaviors/assumptions) over Virtual Machines → a Virtual Organization spanning multiple Organizations.)
– Hide vs. expose uncertainty?
• virtual homogeneity, ease of programming
• robustness
• cross-layer adaptation and optimization
– the inverted stack …
“Conceptual and Implementation Models for the Grid,” M. Parashar and J.C. Browne, Proceedings of the IEEE, Special Issue on Grid Computing, IEEE Press, Vol. 93, No. 3, March 2005.
Autonomic Grid Computing – A Holistic Approach
• Computing has evolved and matured to provide specialized solutions that satisfy relatively narrow and well-defined requirements in isolation
– performance, security, dependability, reliability, availability, throughput, pervasive/amorphous, automation, reasoning, etc.
• In the case of pervasive Grid applications/environments, requirements, objectives and execution contexts are dynamic and not known a priori
– requirements, objectives and the choice of specific solutions (algorithms, behaviors, interactions, etc.) depend on runtime state, context, and content
– applications should be aware of changing requirements and execution contexts and respond to these changes at runtime
• Autonomic Grid computing - systems/applications that self-manage
– use appropriate solutions based on current state/context/content and on specified policies
– address uncertainty at multiple levels
– asynchronous algorithms, decoupled interactions/coordination, content-based substrates
Project AutoMate: Enabling Autonomic Applications
(Architecture diagram: Autonomic Grid Applications run on a Programming System — autonomic components, dynamic composition, opportunistic interactions, collaborative monitoring/control — built on the Accord programming framework, the Rudder coordination middleware (decentralized coordination engine; agent framework; decentralized reactive tuple space), semantic middleware services (content-based discovery, associative messaging), and a content overlay (content-based routing engine, self-organizing overlay) provided by the Meteor/Squid content-based middleware, with the Sesame/DAIS protection service and ontologies/taxonomies alongside.)
• Conceptual models and implementation architectures
– programming systems based on popular programming models
• object, component and service based prototypes
– content-based coordination and messaging middleware
– amorphous and emergent overlays
• http://automate.rutgers.edu
Project AutoMate: Core Components
• Accord – A Programming System for Autonomic Grid Applications
• Squid – Decentralized Information Discovery and Content-based Routing
• Meteor – Content-based Interactions/Messaging Middleware
• Rudder/Comet – Decentralized Coordination Middleware
• ACE – Autonomic Composition Engine
• SESAME – Context-Aware Access Management
• DAIS – Cooperative Protection against Network Attacks
• More information/Papers – http://automate.rutgers.edu
“AutoMate: Enabling Autonomic Grid Applications,” M. Parashar et al, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Special Issue on Autonomic Computing, Kluwer Academic Publishers, Vol. 9, No. 2, pp. 161 – 174, 2006.
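As a rough illustration of Squid-style content-based discovery, the sketch below hashes sorted keyword tuples onto overlay nodes so that publish and lookup route by content rather than by resource name. The flat hash is an assumption for brevity: the actual Squid system maps keyword spaces with a Hilbert space-filling curve to support range queries, which this sketch does not show.

```python
# Rough sketch of content-based discovery: resources described by
# keyword tuples are hashed onto overlay nodes, so publish and lookup
# route by content rather than by resource name. (Flat hashing only;
# not Squid's actual space-filling-curve mapping.)
import hashlib

N_NODES = 8
index = {n: [] for n in range(N_NODES)}  # per-node stores

def canon(keywords):
    return tuple(sorted(keywords))       # keyword order must not matter

def node_for(keywords):
    digest = hashlib.sha1("/".join(canon(keywords)).encode()).hexdigest()
    return int(digest, 16) % N_NODES

def publish(keywords, resource):
    index[node_for(keywords)].append((canon(keywords), resource))

def lookup(keywords):
    key = canon(keywords)
    return [r for kw, r in index[node_for(keywords)] if kw == key]

publish(("linux", "x86", "64GB"), "cluster-17")
print(lookup(("x86", "64GB", "linux")))  # ['cluster-17'] -- order-insensitive
```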
Accord: Rule-Based Programming System
• Accord is a programming system that supports the development of autonomic applications
– Enables definition of autonomic components with programmable behaviors and interactions
– Enables runtime composition and autonomic management of these components using dynamically defined rules
• Dynamic specification of adaptation behaviors using rules
• Enforcement of adaptation behaviors by invoking sensors and actuators
• Runtime conflict detection and resolution
• 3 Prototypes: Object-based, Components-based (CCA), Service-based (Web Service)“Accord: A Programming Framework for Autonomic Applications,” H. Liu* and M. Parashar, IEEE Transactions on Systems, Man and Cybernetics, Special Issue on Engineering Autonomic Systems, IEEE Press, Vol. 36, No 3, pp. 341 – 352, 2006.
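An Accord-style adaptation rule pairs a condition over sensor readings with actuator invocations, and the element manager enforces injected rules at runtime. The encoding below is illustrative only (Accord defines its own rule syntax), and the sensors and actuators are hypothetical:

```python
# Illustrative encoding of an Accord-style adaptation rule -- IF a
# condition over sensor readings holds, THEN invoke named actuators.
class Rule:
    def __init__(self, condition, actions):
        self.condition = condition  # predicate over sensor readings
        self.actions = actions      # actuator names to invoke

class Element:
    def __init__(self):
        self.sensors = {"error": 0.2, "step": 1.0}
        self.log = []

    def refine_mesh(self):          # actuator exposed by the component
        self.sensors["error"] /= 2
        self.log.append("refine_mesh")

    def halve_step(self):           # another actuator
        self.sensors["step"] /= 2
        self.log.append("halve_step")

    def enforce(self, rules):
        # element manager: evaluate injected rules, fire matching actuators
        for rule in rules:
            if rule.condition(self.sensors):
                for name in rule.actions:
                    getattr(self, name)()

elem = Element()
elem.enforce([Rule(lambda s: s["error"] > 0.1, ["refine_mesh", "halve_step"])])
print(elem.log, elem.sensors)  # ['refine_mesh', 'halve_step'] {'error': 0.1, 'step': 0.5}
```

Because rules are plain data injected into the manager, they can be replaced at runtime, which is the point of Accord's dynamically defined rules.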
Autonomic Element/Behaviors in Accord
[Figure: An autonomic element wraps a computational element exposed through functional, control and operational ports. Its element manager monitors internal and contextual state, evaluates rules, invokes interfaces and actuators, and generates events. A composition manager derives interaction rules and behavior rules from the application workflow, application strategies and application requirements.]
LLC-Based Self-Management within Accord

[Figure: A self-managing element — the element manager observes the computational element's internal state and contextual state; the LLC controller combines an optimization function with a model of the computational element and returns advice to the manager.]
• Element/Service Managers are augmented with LLC Controllers
– monitor the state/execution context of elements
– enforce adaptation actions determined by the controller
– augment human-defined rules
The Self-managing Shock Simulation: Self-optimizing Via Component Replacement
The Self-managing Shock Simulation: Self-optimizing Via Component Adaptation
The Self-managing Shock Simulation: Self-healing Via Component Replacement
Decentralized (Decoupled/Asynchronous) Content-based Middleware Services
Pervasive Grid Environment
Self-Managing Overlay (SquidTON)
SquidTON: Reliable & Fault Tolerant Overlay
• Pervasive Grid systems are dynamic, with nodes joining, leaving and failing relatively often
– => data loss and a temporarily inconsistent overlay structure
– => the system cannot offer guarantees
• Solution:
– Build redundancy into the overlay network
– Replicate the data
• SquidTON = Squid Two-tier Overlay Network
– Consecutive nodes form unstructured groups, and at the same time are connected by a global structured overlay (e.g. Chord)
– Data is replicated in the group
[Figure: SquidTON overlay — nodes with identifiers (e.g. 22, 33, 82, …, 249) form groups of consecutive nodes, each with a group identifier, and the groups are connected by the global structured overlay.]
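As a toy sketch of the two-tier idea (the identifier space, group size and hash function are illustrative choices, not SquidTON's actual parameters): consecutive node IDs on a Chord-like ring are grouped, and each data item is replicated across its owner's group so lookups survive node failures.

```python
# Toy sketch of SquidTON's two-tier overlay (assumed parameters): node IDs on
# a Chord-like ring, consecutive nodes grouped, data replicated in the group.
import bisect
import hashlib

RING = 2 ** 8  # tiny identifier space for illustration

def node_id(name):
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

class SquidTON:
    def __init__(self, names, group_size=3):
        ids = sorted(node_id(n) for n in names)
        # Tier 1: consecutive nodes form unstructured groups.
        self.groups = [ids[i:i + group_size]
                       for i in range(0, len(ids), group_size)]
        self.ids = ids
        self.store = {i: {} for i in ids}

    def responsible(self, key):
        # Tier 2: structured overlay routing (Chord-style successor).
        i = bisect.bisect_left(self.ids, key % RING) % len(self.ids)
        return self.ids[i]

    def put(self, key, value):
        owner = self.responsible(key)
        group = next(g for g in self.groups if owner in g)
        for n in group:            # replicate within the owner's group
            self.store[n][key] = value

    def get(self, key, failed=()):
        owner = self.responsible(key)
        group = next(g for g in self.groups if owner in g)
        for n in group:            # any live group member can answer
            if n not in failed and key in self.store[n]:
                return self.store[n][key]
        return None
```

With group replication, a lookup still succeeds when the responsible node has failed, which is the redundancy the slide motivates.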
Content Descriptors and Information Space
• Data element = a piece of information that is indexed and discovered
– Data, documents, resources, services, metadata, messages, events, etc.
• Each data element has a set of keywords associated with it, which describe its content => data elements form a keyword space
[Figure: 2D keyword space for a P2P file-sharing system — a document tagged (computer, network) is a point in the (keyword1, keyword2) space; a complex query (comp*, *) defines a region.]
[Figure: 3D keyword space for resource sharing, using the attributes storage space (MB), base bandwidth (Mbps) and cost — a computational resource (100, 30, 9) is a point in the space; a complex query (10, 20–25, *) defines a region.]
Content Indexing: Hilbert SFC
• f: N^d → N, recursive generation
[Figure: Recursive generation of the Hilbert SFC — a 2×2 grid with cells 00, 01, 10, 11 is refined into a 4×4 grid with cells 0000–1111; the curve is extended so that locality is preserved across refinement levels.]
• Properties:
– Digital causality
– Locality preserving
– Clustering
Cluster: group of cells connected by a segment of the curve
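The recursive mapping f: N^d → N can be computed iteratively. As a minimal sketch, here is the widely used 2D Hilbert index/coordinate conversion (orientation conventions vary between presentations, so cell orderings may differ from the figure):

```python
# Standard iterative 2D Hilbert curve conversions for an n x n grid
# (n a power of two): xy2d maps a cell to its curve index, d2xy inverts it.

def _rot(n, x, y, rx, ry):
    # rotate/flip a quadrant so sub-curves join up correctly
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y

def xy2d(n, x, y):
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(s, x, y, rx, ry)
        s //= 2
    return d

def d2xy(n, d):
    x = y = 0
    t, s = d, 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

The locality property from the slide is directly checkable: consecutive curve indices always map to grid-adjacent cells, which is what makes a contiguous curve segment (a cluster) correspond to a connected group of cells.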
Content Indexing, Routing & Querying
[Figure: Content indexing, routing and querying example over a 2D (longitude, altitude) space — content profile 1: (2, 1); content profile 2: (4–7, 0–3); content profile 3: (*, 4); matching messages are routed to the nodes (e.g. 13, 33, 40, 47, 51) that store the corresponding clusters.]

• Query resolution:
– Generate the clusters associated with the content profile
– Route to the nodes that store the clusters
– Send the results to the requesting node
• Demonstrated analytically and experimentally that
– for large systems, queries with p% coverage query p% of the nodes, independent of the data distribution
– the system scales with the number of nodes and data
– the optimization significantly reduces the number of clusters generated and the messages sent
– it slightly increases the number of nodes queried – only a small number of “intermediary” nodes are involved
• Note:
– More than one cluster is typically stored at a node
– Not all clusters that are generated for a query exist in the network
– SFC cluster generation is recursive, i.e., a prefix tree (trie)
• Optimization: embed the tree into the overlay and prune nodes during construction
Squid Content Routing/Discovery Engine: Experimental Evaluation
• System size: 10^3 to 10^6 nodes
• Data:
– Uniformly distributed, synthetically generated data
– 4×10^5 CiteSeer data
• Experiments:
– Number of clusters generated for a query
– Number of nodes queried
• Load-balanced system
• All results are plotted on a logarithmic scale
Squid Content Routing/Discovery Engine: Optimization

• Number of clusters generated for queries with coverage 1%, 0.1% and 0.01%, with and without optimization
• The results are normalized against the number of clusters that the query defines on the curve (i.e. without optimization)
[Figure: Normalized number of clusters vs. system size (10^3–10^5, log scale) for 1%, 0.1% and 0.01% queries, for 3D uniformly distributed data and 3D CiteSeer data.]
Squid Content Routing/Discovery Engine – Nodes Queried
• Percentage of nodes queried for queries with coverage 1%, 0.1%, 0.01%, with and without optimization
[Figure: Percentage of nodes queried vs. system size (10^3–10^6 for 3D uniformly distributed data, 10^3–10^5 for 3D CiteSeer data; log scale) for 1%, 0.1% and 0.01% queries, with and without optimization.]
Project Meteor: Associative Rendezvous
• Content-based decoupled interaction with programmable reactive behaviors
– Messages = (header, action, data); the header carries a profile, credentials, message context and TTL (time to live)
– Symmetric post primitive: does not differentiate between interest and data
– Associative selection: match between interest and data
– Reactive behavior: execute the action field upon matching
– Actions: store, retrieve, notify, delete
• Decentralized in-network aggregation
– Tries for back-propagating and aggregating matching data items
• Supports the WS-Notification standard
• Profile = list of (attribute, value) pairs, e.g. <(sensor_type, temperature), (latitude, 10), (longitude, 20)>

[Figure: Client C1 posts (<p1, p2>, store, data) (1); client C2 posts (<p1, *>, notify_data(C2)) (2); the profiles match at the rendezvous point and C2 is notified (3).]
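As a single-node sketch of the Associative Rendezvous semantics (class and method names are simplifications, and the matching rule is an assumption; the real Meteor runs over Squid's content-based routing): the symmetric post() carries (profile, action, data), and a profile match triggers the notify action regardless of which side arrived first.

```python
# Sketch of Associative Rendezvous matching at one node (names assumed).
# post() is symmetric: it carries a profile, an action and optional data,
# and matching profiles trigger the reactive notify behavior.

def matches(a, b):
    # two profiles match if every shared attribute agrees or is a wildcard '*'
    keys = set(a) | set(b)
    return all(a.get(k, '*') == '*' or b.get(k, '*') == '*' or a[k] == b[k]
               for k in keys)

class RendezvousNode:
    def __init__(self):
        self.posts = []         # stored (profile, action, data) triples
        self.notifications = []

    def post(self, profile, action, data=None):
        for (p, act, d) in self.posts:
            if matches(profile, p):
                if act.startswith('notify'):     # stored interest, new data
                    self.notifications.append((act, data))
                if action.startswith('notify'):  # stored data, new interest
                    self.notifications.append((action, d))
        self.posts.append((profile, action, data))

node = RendezvousNode()
# C2 registers interest in any message whose p1 attribute is 'temp':
node.post({'p1': 'temp', 'p2': '*'}, 'notify_data(C2)')
# C1 stores data with a concrete profile; the interest matches, C2 is notified:
node.post({'p1': 'temp', 'p2': 'room3'}, 'store', data=21.5)
```

Because both branches of post() are checked, it does not matter whether the interest or the data arrives first, which mirrors the symmetric, decoupled interaction on the slide.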
Heterogeneity Management
• Heterogeneity management and adaptations at AR nodes using reactive behaviors
– Policy-based adaptations based on capabilities, preferences and resources
Comet Coordination Space
• A virtual global shared space constructed from a semantic multi-dimensional information space, which is deterministically mapped onto the system peer nodes
• The space is associatively accessible by all system peer nodes; access is independent of the physical locations of tuples or hosts
– Tuple distribution
• A tuple/template (XML) is associated with k keywords
• The Squid content-based routing engine is used for exact and approximate tuple distribution and retrieval
– Transient spaces
• Enable applications to explicitly exploit context locality
“COMET: A Scalable Coordination Space in Decentralized Distributed Environments,” Z. Li* and M. Parashar, Proceedings of the 2nd International Workshop on Hot Topics in Peer-to-Peer Systems (HOT-P2P 2005), San Diego, CA, USA, IEEE Computer Society Press, pp. 104 – 111, July 2005.
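A minimal in-memory sketch of the Out/Rd/In access pattern (class and method names are assumptions; Comet's actual tuples are XML and are distributed and retrieved via Squid across peer nodes):

```python
# Illustrative single-process stand-in for Comet-style coordination-space
# primitives: out inserts a tuple, rd associatively reads a matching tuple,
# inp (non-blocking In) reads and removes it.

class CometSpace:
    def __init__(self):
        self.tuples = []

    def out(self, tup):
        self.tuples.append(tup)

    def _match(self, tup, template):
        # a template field of None acts as a wildcard
        return len(tup) == len(template) and all(
            t is None or t == v for t, v in zip(template, tup))

    def rd(self, template):
        # associative read: location of the tuple is irrelevant to the caller
        return next((t for t in self.tuples if self._match(t, template)), None)

    def inp(self, template):
        t = self.rd(template)
        if t is not None:
            self.tuples.remove(t)
        return t

space = CometSpace()
space.out(("worker", "node7", "idle"))
```

The point of the sketch is the access model: callers name content via templates, never hosts, which is the location-independence property the slide describes.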
Supporting the Rudder Agent Framework
• Agent communication – associatively reading, writing, and extracting tuples
• Agent coordination protocols
– Decentralized election protocol
• Based on wait-free consensus protocols
• Resilient to node/link failures
– Discovery protocol
• Registry implemented using XML tuples
• Elements registered using Out, unregistered using In, and discovered using the Rd/RdAll operations
– Interaction protocol
• Contract-Net protocol
• Two-agent bargaining protocol
• Workflow engine
“Enabling Dynamic Composition and Coordination of Autonomic Applications using the Rudder Agent Framework,” Z. Li* and M. Parashar, The Knowledge Engineering Review, Cambridge University Press (also SAACS 2005).
Implementation/Deployment Overview
• Current implementation builds on JXTA
– The SquidTON, Squid, Comet and Meteor layers are implemented as event-driven JXTA services
• Deployments include
– Campus Grid @ Rutgers
– Orbit wireless testbed (400 nodes)
– PlanetLab wide-area testbed
• At least one node selected from each continent
Outline
• Pervasive Grid Environments - Unprecedented Opportunities
• Pervasive Grid Environments - Unprecedented Challenges, Opportunities
• Autonomic Grid Computing
• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments
• An Illustrative Application
• Concluding Remarks
The Instrumented Oil Field of the Future (UT-CSM, UT-IG, RU, OSU, UMD, ANL)
• Production of oil and gas can take advantage of installed sensors that will monitor the reservoir’s state as fluids are extracted
• Knowledge of the reservoir’s state during production can result in better engineering decisions
– economical evaluation; physical characteristics (bypassed oil, high-pressure zones); production techniques for safe operating conditions in complex and difficult areas
• The resulting closed loop:
– Detect and track changes in data during production
– Invert data for reservoir properties
– Detect and track reservoir changes
– Assimilate data & reservoir properties into the evolving reservoir model
– Use simulation and optimization to guide future production and the future data acquisition strategy
“Application of Grid-Enabled Technologies for Solving Optimization Problems in Data-Driven Reservoir Studies,” M. Parashar, H. Klie, U. Catalyurek, T. Kurc, V. Matossian, J. Saltz and M. Wheeler, Future Generation Computer Systems (FGCS): The International Journal of Grid Computing: Theory, Methods and Applications, Elsevier Science Publishers, Vol. 21, Issue 1, pp. 19–26, 2005.
Effective Oil Reservoir Management: Well Placement/Configuration
• Why is it important
– Better utilization/cost-effectiveness of existing reservoirs
– Minimizing adverse effects to the environment
[Figure: Bad management results in much bypassed oil; better management leaves less bypassed oil.]
An Autonomic Well Placement/Configuration Workflow
[Figure: Autonomic well placement/configuration workflow — an Optimization Service (SPSA, VFSA, exhaustive search) generates guesses; if a guess is not in the MySQL database, the IPARS Factory starts a parallel IPARS instance with the guess as a parameter; the instance connects to DISCOVER, which notifies clients; clients interact with IPARS; if the guess is in the database, the response is sent to the clients and a new guess is obtained from the optimizer. The workflow runs on the AutoMate programming system/Grid middleware, drawing on history/archived data, sensor/context data, and external data such as oil prices and weather.]
Data-Flow for Autonomic Well Placement/Configuration
[Figure: Data flow for autonomic well placement/configuration — seismic simulations execute in a distributed fashion to produce seismic datasets; data conversion and manipulation tools combine these with reservoir datasets and sensor data; filters (RD, DIFF, SUM, AVG) run as transparent copies (one per node) across nodes 1–20, feeding visualization portals together with context data.]
Autonomic Oil Well Placement/Configuration

[Figure: Permeability and pressure contours for 3 wells (2D profile). Contours of NEval(y,z,500): exhaustive evaluation requires NY×NZ (450) evaluations, and the minimum appears in the contour plot; the VFSA solution “walk” finds it after 20 (81) evaluations.]
Autonomic Oil Well Placement/Configuration (VFSA)

“An Autonomic Reservoir Framework for the Stochastic Optimization of Well Placement,” V. Matossian, M. Parashar, W. Bangerth, H. Klie, M.F. Wheeler, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Kluwer Academic Publishers, Vol. 8, No. 4, pp. 255–269, 2005.
“Autonomic Oil Reservoir Optimization on the Grid,” V. Matossian, V. Bhat, M. Parashar, M. Peszynska, M. Sen, P. Stoffa and M. F. Wheeler, Concurrency and Computation: Practice and Experience, John Wiley and Sons, Vol. 17, Issue 1, pp. 1–26, 2005.
Wide Area Data Streaming in the Fusion Simulation Project (FSP)
[Figure: End-to-end FSP data streaming — GTC runs on teraflop/petaflop supercomputers; a 40 Gbps link feeds large-scale data analysis, data archiving, data replication, post-processing, visualization and user monitoring, with end-to-end monitoring routines.]

• Wide-area data streaming requirements
– Enable high-throughput, low-latency data transfer to support near real-time access to the data
– Minimize the related overhead on the executing simulation
– Adapt to network conditions to maintain the desired QoS
– Handle network failures while eliminating data loss
Autonomic Data Streaming for Coupled Fusion Simulation Workflows
[Figure: Coupled fusion simulation workflow spanning NERSC (CA), PPPL (NJ) and ORNL (TN) over a backbone with Data Grid middleware and logistical networking. Services: Simulation Service (SS), Data Analysis Service (DAS), Data Storage Service (DSS), and the managed Autonomic Data Streaming Service (ADSS), which comprises a Buffer Manager Service (BMS) and a Data Transfer Service (DTS) around a data transfer buffer.]

• Grid-based coupled fusion simulation workflow
– Runs for days between ORNL (TN), NERSC (CA), PPPL (NJ) and Rutgers (NJ)
– Generates multiple terabytes of data
– Data is coupled, analyzed and visualized at runtime
ADSS Implementation
[Figure: ADSS implementation — the LLC controller combines an LLC model, an optimization step, and data-rate and buffer-size prediction; the element/service manager maintains a rule base, service state and contextual state, fed by bandwidth, buffer-size and data-rate measurements and an optimization function. The ADSS (BMS + DTS) sits between the simulation service (SS), the local hard disk, and the Grid middleware/logistical networking backbone, applying rule-based adaptation.]
Design of the ADSS Controller
• Dynamics of the data streaming model captured using a queuing model

[Figure: LLC model for the ADSS controller — “m” simulation processors generate data blocks at rates λ1(k)…λn(k) into queues q1(k)…qn(k) handled by queue manager threads on “n” data transfer processors; each queue drains over the network at rate μi(k) toward the analysis clusters and to the local hard disk at rate ωi(k).]

• Key operating parameters
– State variables
• qi(k): current average queue size at ni
– Environment variables
• λi(k): data generation rate into the queue at ni
• B(k): effective bandwidth of the network link
– Control and decision variables
• μi(k): data transfer rate over the network
• ωi(k): data transfer rate to the hard disk
• LLC controller problem
– The controller aims to maintain each node’s (ni) queue around a desired value q*
• System dynamics at each data streaming processor ni:

q̂i(k+1) = qi(k) + (λ̂i(k) − (μi(k) + ωi(k))) · T,  where λ̂i(k) = φ(λi(k−1), k)
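The queue dynamics above can be exercised with a toy one-step look-ahead controller standing in for the LLC optimization (the set-point, rates and the greedy split between μ and ω are illustrative, not the paper's actual optimization):

```python
# Toy simulation of the data-streaming queue dynamics with a one-step
# look-ahead controller: pick mu (capped by bandwidth B) and omega so the
# predicted queue tracks the set-point q*.

def step(q, lam, mu, omega, T=1.0):
    # q(k+1) = q(k) + (lam(k) - (mu(k) + omega(k))) * T, queues cannot go negative
    return max(0.0, q + (lam - (mu + omega)) * T)

def controller(q, lam_hat, B, q_star, T=1.0):
    drain = max(0.0, (q - q_star) / T + lam_hat)  # total service rate needed
    mu = min(drain, B)           # prefer the network link...
    omega = drain - mu           # ...spill the excess to the local hard disk
    return mu, omega

q, q_star, T = 0.0, 10.0, 1.0
lams = [30, 30, 80, 80, 30, 30]   # bursty generation rate (MB/s)
B = 50                            # effective network bandwidth (MB/s)
trace = []
for lam in lams:
    mu, omega = controller(q, lam, B, q_star, T)
    q = step(q, lam, mu, omega, T)
    trace.append(round(q, 1))
```

With perfect rate predictions the queue locks onto q* even when λ exceeds the bandwidth B (the disk absorbs the excess); the real LLC instead uses the forecast λ̂i(k) = φ(λi(k−1), k) and optimizes over a look-ahead horizon.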
Adaptive Data Transfer
[Figure: Data transferred by the DTS (MB) and bandwidth (Mb/sec) vs. controller interval (0–24), showing DTS-to-WAN and DTS-to-LAN transfers across a congestion episode.]

• No congestion in intervals 1–9
– Data transferred over the WAN
• Congestion at intervals 9–19
– The controller recognizes the congestion and advises the “Element Manager”, which in turn adapts the DTS to transfer data to local storage (LAN)
• Adaptation continues until the network is no longer congested
– Data sent to local storage by the DTS falls to zero at the 19th controller interval
Adaptive Buffer Management
[Figure: Blocks grouped by the BMS (1 block = 4 MB) and bandwidth (Mb/sec) vs. controller interval (0–26), showing the switch from uniform to aggregate buffer management as congestion sets in.]

• Uniform buffer management is used when data generation is constant
• Aggregate buffer management is triggered when congestion increases
Self-Optimizing Behavior: Buffer Management Service (BMS)

[Figure: Number of blocks sent (10 MB/block) vs. simulation time (0–400 sec) at 50, 100 and 120 Mbps, contrasting uniform and aggregate buffer management in the autonomic data streaming pipeline.]

• Uniform buffer management: the buffer divides data into fixed-size blocks (50 Mbps)
• Aggregate buffer management: aggregates blocks of data across iterations (60–400 Mbps)
• Priority buffer management: orders blocks based on the nature of the data (2D and 3D data items)
• BMS adaptations are based on:
– Data generation rates
– Network connectivity
– Nature of the data being transmitted

Sample rule:
IF DataGenerationRate <= 50 AND DataType = 2D
THEN Aggregate Buffer Management
ELSE Uniform Buffer Management
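The sample rule can be written directly as a small decision function (the threshold and type names are taken from the slide; restricting the outcome to the rule's two branches is a simplification, since the BMS also has a priority strategy):

```python
# Direct encoding of the slide's sample BMS rule:
#   IF DataGenerationRate <= 50 AND DataType = 2D
#   THEN Aggregate Buffer Management
#   ELSE Uniform Buffer Management

def select_buffer_strategy(data_generation_rate_mbps, data_type):
    if data_generation_rate_mbps <= 50 and data_type == "2D":
        return "aggregate"
    return "uniform"
```

Keeping the policy as a pure function of the monitored quantities is what lets the element manager swap rules at runtime without touching the streaming code.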
Adaptation of the Workflow
• Create multiple instances of the Autonomic Data Streaming Service (ADSS) when
– the effective network transfer rate dips below the threshold (in our case around 100 Mbps)
– the maximum number of instances is contained within a certain limit
• % network throughput is the difference between the maximum and the current network transfer rate

[Figure: % network throughput and number of ADSS instances (0–5) vs. data generation rate (0–160 Mbps); the simulation feeds parallel instances ADSS-0, ADSS-1 and ADSS-2, each with its own buffer and data transfer stage.]
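The instance-creation policy above can be sketched as a small function (the 100 Mbps threshold comes from the slide; the maximum instance count of 3 is an illustrative assumption):

```python
# Sketch of the workflow-level adaptation: spawn another ADSS instance when
# the effective network transfer rate dips below the threshold, capped at a
# maximum instance count (threshold per the slide, cap assumed).

def adapt_instances(current, effective_rate_mbps,
                    threshold_mbps=100, max_instances=3):
    if effective_rate_mbps < threshold_mbps and current < max_instances:
        return current + 1
    return current

n = 1
for rate in [140, 95, 80, 60, 120]:
    n = adapt_instances(n, rate)
```

The cap keeps the adaptation bounded, matching the slide's requirement that the number of instances stay within a certain limit.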
Overhead of Self-Optimization in the BMS

• Overhead = Abs(time required with and without autonomic data streaming)

[Figure: % overhead on the simulation vs. data generation rate (0–160 Mbps), with and without autonomic management.]

• Observations:
– Autonomic data streaming is slightly costly at the start
– Without autonomic management, the overhead grows to about 10%
– With autonomic management, the overhead is limited to 5%
• Goals: minimize the overhead on the executing simulation; maximize the data streamed from the simulation
Self-Healing Behavior

• The ADSS continuously observes the buffer occupancy to decide when to consider a switch
• Avoid writing to disk at the simulation end
• Temporarily switch to a faster Data Storage Service, such as the one at ORNL

Sample rule for self-healing:
IF bufferUsage > 90%
THEN DSS at ORNL
ELSE DSS at PPPL

[Figure: % buffer occupancy and data sent to the local DSS at ORNL (MB) vs. simulation time (0–2000 sec); each time the buffer fills, the local storage service is triggered. The workflow spans PPPL (NJ) and ORNL (TN) over the Grid middleware and logistical networking backbone, with the ADSS (BMS + DTS) feeding the DAS and DSS.]
Heuristic (Rule-Based) vs. Control-Based Adaptations in ADSS

[Figure: % buffer vacancy vs. time (0–1000 sec) using heuristically based rules and using control-based self-management, with the mean % buffer vacancy and controller interval marked.]

• % buffer vacancy refers to the empty space in the buffer
– Higher buffer vacancy leads to reduced overheads and data loss
• The purely reactive scheme was based on heuristics; the element manager was not aware of the current and future data generation rates and the network bandwidth
– Average buffer vacancy in this case was around 16%
• When the model-based controller was used in conjunction with the “Element Manager” for rule-based adaptations, average buffer vacancy was around 75%
Overhead of the Self-Managing Framework
• Overheads of the framework are primarily due to two factors:
– Activities of the controller during a controller interval
– Overhead of data streaming on the simulation
• Overhead due to controller activities (controller interval of 80 seconds)
– Average controller decision time is 2.5% at the start, reduced to 0.15% by the local search methods used
– Network measurement cost was 18.8 sec (23.5%)
– Operating cost of the BMS and DTS was 0.2 sec (0.25%) and 18.8 sec (23.5%)
– Rule execution for triggering adaptations required less than 0.01 sec
• Overhead of data streaming
– (T′s − Ts)/Ts, where T′s and Ts denote the simulation execution time with and without data streaming, respectively
– % overhead of data streaming on the GTC simulation was less than 9% for 16–64 processors
– % overhead reduced to about 5% for 128–256 processors
Outline
• Pervasive Grid Environments - Unprecedented Opportunities
• Pervasive Grid Environments - Unprecedented Challenges, Opportunities
• Autonomic Grid Computing
• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments
• An Illustrative Application
• Concluding Remarks
Conclusion
• Pervasive Grid Environments
– Unprecedented opportunity
• can enable a new generation of knowledge-based, data- and information-driven, context-aware, computationally intensive, pervasive applications
– Unprecedented research challenges
• scale, complexity, heterogeneity, dynamism, reliability, uncertainty, …
• applications, algorithms, measurements, data/information, software
• Autonomic Grid Computing
– Addressing the complexity of pervasive Grid environments
• Project AutoMate: Autonomic Computational Science on the Grid
– Semantic + Autonomics
– Accord, Rudder/Comet, Meteor, Squid, Topos, …