Computing Outside the Box
DESCRIPTION
The past decade has seen increasingly ambitious and successful methods for outsourcing computing. Approaches such as utility computing, on-demand computing, grid computing, software as a service, and cloud computing all seek to free computer applications from the limiting confines of a single computer. Software that thus runs "outside the box" can be more powerful (think Google, TeraGrid), dynamic (think Animoto, caBIG), and collaborative (think Facebook, myExperiment). It can also be cheaper, due to economies of scale in hardware and software. The combination of new functionality and new economics inspires new applications, reduces barriers to entry for application providers, and in general disrupts the computing ecosystem. I discuss the new applications that outside-the-box computing enables, in both business and science, and the hardware and software architectures that make these new applications possible.
TRANSCRIPT
Ian Foster
Computation Institute
Argonne National Lab & University of Chicago
1890
1953
“Computation may someday be organized as a public utility … The computing utility could become the basis for a new and important industry.”
John McCarthy (1961)
I-WAY, 1995
The grid, 1998
“Dependable, consistent, pervasive access to resources”
Dependable: Performance and functionality guarantees
Consistent: Uniform interfaces to a wide variety of resources
Pervasive: Ability to “plug in” from anywhere
Application
Infrastructure
Application
Infrastructure: service-oriented infrastructure
Layered grid architecture
Application
Fabric: “Controlling things locally”: access to, & control of, resources
Connectivity: “Talking to things”: communication (Internet protocols) & security
Resource: “Sharing single resources”: negotiating access, controlling use
Collective: “Managing multiple resources”: ubiquitous infrastructure services
User: “Specialized services”: user- or application-specific distributed services
(Compare the Internet Protocol architecture: Application, Transport, Internet, Link)
Initially custom … later Web Services
www.opensciencegrid.org
Bennett Bertenthal et al., www.sidgrid.org
Brian Tieman
Simplified example workflows
Genome sequence analysis
Physics data analysis
Sloan Digital Sky Survey
www.opensciencegrid.org
“Sine” workload, 2M tasks, 10MB:10ms ratio, 100 nodes, GCC policy, 50GB caches/node
Ioan Raicu
Same scenario, but with dynamic resource provisioning
Data diffusion, sine-wave workload: summary
GPFS: 5.70 hrs, ~8 Gb/s, 1138 CPU hrs
DD+SRP: 1.80 hrs, ~25 Gb/s, 361 CPU hrs
DD+DRP: 1.86 hrs, ~24 Gb/s, 253 CPU hrs
Application
Infrastructure: service-oriented infrastructure
Application: service-oriented applications
Infrastructure: service-oriented infrastructure
Creating services in 2008: Introduce and gRAVI
Introduce: define service, create skeleton, discover types, add operations, configure security
gRAVI (Grid Remote Application Virtualization Infrastructure): wrap executables
Service lifecycle (diagram): create an application service; transfer the GAR and deploy it into a container; advertize it to an index service and store it in a repository service; clients then discover the service, invoke it, and get results.
Ohio State University and Argonne/U.Chicago
Globus
As of Oct 19, 2008: 122 participants, 105 services (70 data, 35 analytical)
Microarray clustering using Taverna
1. Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub
2. Normalize microarray data using a GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService
3. Hierarchical clustering using a geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage
Legend: workflow in/output; caGrid services; “shim” services; others
Wei Tan
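The three-step pipeline above can be sketched as a simple sequential workflow with "shim" adapters between stages. This is an illustrative sketch only: the endpoints are taken from the slide, but the `call_service` helper, the `shim` adapter, and the payload shapes are hypothetical stand-ins for the actual Taverna/caGrid client machinery.

```python
# Sketch of a three-stage analysis workflow with "shim" adapters between
# services. Endpoints are from the slide; the client functions and data
# shapes are hypothetical, not the real caGrid APIs.

CAARRAY = "cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub"
GENEPATTERN = "node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService"
GEWORKBENCH = "cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage"

def call_service(endpoint, operation, payload):
    # Hypothetical stand-in for a remote service invocation.
    return {"endpoint": endpoint, "operation": operation, "result": payload}

def shim(response):
    # "Shim" services adapt one service's output format to the next's input.
    return response["result"]

def microarray_clustering(query):
    raw = call_service(CAARRAY, "query", query)                        # 1. retrieve
    normalized = call_service(GENEPATTERN, "preprocess", shim(raw))    # 2. normalize
    clusters = call_service(GEWORKBENCH, "cluster", shim(normalized))  # 3. cluster
    return clusters
```

The point of the sketch is the shape of the workflow: each stage's output must pass through an adapter before it is valid input for the next service, which is exactly the role the "shim" services play in the Taverna workflow.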
The Globus-based LIGO data grid
LIGO Gravitational Wave Observatory
Replicating >1 terabyte/day to 8 sites (map labels include Birmingham, Cardiff, AEI/Golm)
>100 million replicas so far
MTBF = 1 month
Data replication service
Goal: pull “missing” files to a storage system, given a list of required files.
Components (diagram): Data Replication Service; Reliable File Transfer Service; Local Replica Catalogs; Replica Location Indexes; GridFTP servers.
Layers: data replication, built on data movement and data location.
“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
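The pull model above reduces to a simple loop: compare the required file list against the local replica catalog, look up sources for the missing files in a replica location index, and queue transfers. The sketch below is a minimal illustration of that logic; the catalog and index are plain dictionaries standing in for the Local Replica Catalog and Replica Location Index, not the actual RLS/GridFTP APIs.

```python
def missing_files(required, local_catalog):
    """Files on the required list that have no local replica."""
    return [f for f in required if f not in local_catalog]

def plan_transfers(required, local_catalog, replica_index):
    """Map each missing file to a remote site that holds a replica."""
    plan = []
    for f in missing_files(required, local_catalog):
        sources = replica_index.get(f, [])
        if sources:
            plan.append((f, sources[0]))  # pick the first known source
    return plan

# Illustrative catalogs (stand-ins for the real replica catalogs).
local = {"a.gwf"}
index = {"b.gwf": ["cardiff"], "c.gwf": ["aei-golm", "birmingham"]}
required = ["a.gwf", "b.gwf", "c.gwf"]
print(plan_transfers(required, local, index))
# [('b.gwf', 'cardiff'), ('c.gwf', 'aei-golm')]
```

In the real service the resulting plan would be handed to the Reliable File Transfer Service, which drives GridFTP transfers and updates the local replica catalog as files arrive.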
Why not leverage dynamic deployment capabilities?
Physical machine: procure hardware
Hypervisor/OS: deploy hypervisor/OS
VM: deploy virtual machine
JVM: deploy container
VO services (DRS, GridFTP, LRC): deploy service
State exposed & accessed uniformly at all levels; provisioning, management, and monitoring at all levels
Maybe we need to specialize further …
User to service provider: “Provide access to data D at S1, S2, S3 with performance P”
Service provider to resource provider: “Provide storage with performance P1, network with P2, …”
(Diagram: data D replicated at sites S1, S2, S3; implemented with a replica catalog, user-level multicast, …)
Infrastructure
Applications
Energy
Progress of adoption
US$3
Credit: Werner Vogels
Animoto EC2 image usage (chart: 0 to 4000 instances, Day 1 to Day 8)
Software: Salesforce.com, Google, Animoto, …, caBIG, TG gateways
Platform: Amazon, GoGrid, Sun, Microsoft, …
Infrastructure: Amazon, Google, Microsoft, …
Dynamo: Amazon’s highly available key-value store (DeCandia et al., SOSP’07)
Simple query model
Weak consistency, no isolation
Stringent SLAs (e.g., 300 ms for 99.9% of requests at a peak of 500 requests/sec)
Incremental scalability
Symmetry, decentralization, heterogeneity
Technologies used in Dynamo (problem: technique; advantage):
Partitioning: consistent hashing; incremental scalability.
High availability for writes: vector clocks with reconciliation during reads; version size is decoupled from update rates.
Handling temporary failures: sloppy quorum and hinted handoff; provides high availability and durability guarantees when some of the replicas are not available.
Recovering from permanent failures: anti-entropy using Merkle trees; synchronizes divergent replicas in the background.
Membership and failure detection: gossip-based membership protocol and failure detection; preserves symmetry and avoids a centralized registry for storing membership and node liveness information.
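The consistent hashing used for partitioning is easy to illustrate. The sketch below is a minimal hash ring with virtual nodes, written under the usual textbook assumptions (MD5 as the hash, first point clockwise owns the key); it is not Dynamo's actual implementation, which adds replication and preference lists on top of this idea.

```python
import hashlib
from bisect import bisect_right

def h(key: str) -> int:
    """Hash a string to a point on the ring (MD5, as in the Dynamo paper)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each node owns the arc of hash space preceding its virtual points,
    so adding or removing a node remaps only a small fraction of keys."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) points
        for n in nodes:
            self.add(n)

    def add(self, node):
        # Place several virtual points per node to smooth the load.
        for i in range(self.vnodes):
            self.ring.append((h(f"{node}#{i}"), node))
        self.ring.sort()

    def lookup(self, key):
        # First virtual point clockwise from the key's hash (wraps around).
        i = bisect_right(self.ring, (h(key),)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.lookup("some-key")  # deterministic: same key, same owner
```

This directly gives the "incremental scalability" advantage from the table: calling `add("node-d")` moves only the keys whose closest clockwise point is now one of node-d's virtual points, rather than rehashing everything.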
Application: service-oriented applications
Infrastructure: service-oriented infrastructure
Energy Internet
The Shape of Grids to Come?
Killer apps for COTB?
Biomedical informatics/Evidence-based medicine
Human responses to global climate disruption
Using IaaS in biomedical informatics
(Diagram: my servers and an IaaS provider, with handle.net, BIRN, and Chicago sites)
“The computer revolution hasn’t happened yet.”
Alan Kay, 1997
(Chart: connectivity, on a log scale, vs. time, for science, enterprise, and consumer computing; eras labeled Grid, Cloud, ????)
“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder, 2001)
Computation Institute, www.ci.uchicago.edu
Thank you!