“Here comes the Grid” Mark Hayes Technical Director - Cambridge eScience Centre NIEeS Summer School 2003

“Here comes the Grid”

Mark Hayes Technical Director - Cambridge eScience Centre

NIEeS Summer School 2003

In the beginning…

"The collection of people, hardware, and software... will become a node in a geographically distributed computer network…. Through the network... all the large computers can communicate with one another. And through them, all the members of the community can communicate with other people, with programs, with data, or with a selected combination of those resources.”

J.C.R. Licklider, “The Computer as a Communication Device”, Science and Technology, April 1968

The ARPAnet in 1970

International connectivity - 1991

International connectivity - 1997

International bandwidth

From “3D geographic network displays” - Cox et al., ACM SIGMOD Record - December 1996

What does the Internet look like?

http://www.cybergeography.org/

The World Wide Web

Invented at CERN by Tim Berners-Lee in 1989 as a tool for collaboration and information sharing in the particle physics community.

Early distributed computing

1.2 million CPU years so far...

Brute force attempt to crack strong encryption

Protein folding

The Grid - 1998

Editors: Foster & Kesselman

700 pages, 22 chapters, 40 authors

Analogy with the electrical power grid - just plug in.

The Grid - 2003

Editors: Berman, Hey, Fox

1000 pages, 43 chapters, 116 authors

Applications, data sharing and virtual communities.

It’s not just compute cycles...

An exponential growth in data from many areas of science.

4 types of Grid

• CPU intensive cycle scavenging (SETI@home)

• Data sharing

• Application provision

• Human-human interaction (e.g. Access Grid)

SETI@home

The world’s most powerful computer delivered 52 Tflop/s yesterday (Earth Simulator is 35 Tflop/s; sum of top 2-10 is 60 Tflop/s)

Latest stats (6th July 2003): http://setiathome.ssl.berkeley.edu/totals.html

| | Total | Last 24 hours |
|---|---|---|
| Users | 4,570,474 | 1,226 |
| Results received | 944 M | 1.1 M |
| Total CPU time | 1.5 M years | 1,226 years |
| Floating point operations | 3 E+21 ops (3 zetta ops) | 4.5 E+18 flops/day (52 Teraflops/s) |
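The 52 Teraflops/s figure follows from the last-24-hours count of 4.5 E+18 floating point operations. A minimal sketch of that conversion, assuming the daily count is spread evenly over 86,400 seconds (the function name is illustrative, not from the slides):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def sustained_tflops(ops_per_day: float) -> float:
    """Average rate in Tflop/s for a given number of operations per day."""
    ops_per_second = ops_per_day / SECONDS_PER_DAY
    return ops_per_second / 1e12  # 1 Tflop/s = 10**12 operations per second


if __name__ == "__main__":
    # Daily operation count quoted on the SETI@home statistics slide.
    print(f"{sustained_tflops(4.5e18):.0f} Tflop/s")
```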

The data explosion - some big numbers

FTP and GREP are not adequate (Jim Gray)

• CFD turbulence simulations - 100TB

• BaBar particle physics experiment - 1TB/day

• CERN LHC will generate 1GB/s or 10PB/year

• VLBA radio telescope generates 1GB/s today

• NCBI/EMBL database is “only 0.5TB” but doubling each year

• brain imaging - 4TB/brain at full colour, 10 µm resolution (4PB/brain at 1 µm, i.e. cellular resolution)

• Pixar - 100TB/movie

Application provision

• Google - 10K cpus, 2PB database (2 years ago)

• free email services - HotMail, Yahoo! 2-10PB storage

• netsolve - numerical algorithms on demand with Matlab & Mathematica plugins

• renderfarm.net - graphics rendering on demand

The Access Grid

“...one of the most compelling glimpses into the future I’ve seen since I first saw NCSA Mosaic.” Larry Smarr

[Figure: Access Grid node layout, labelled: ambient mic (tabletop), presenter mic, presenter camera, audience camera]

High-end video conferencing and collaboration technology.

O(100) nodes worldwide.

£1 buys...

• 1 day of cpu time

• 4 GB ram for a day

• 1 GB of network bandwidth

• 1 GB of disk storage

• 10 M database accesses

• 10 TB of disk access (sequential)

• 10 TB of LAN bandwidth (bulk)

How do you move a terabyte?

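| Context | Speed (Mbps) | Rent ($/month) | $/Mbps | $/TB sent | Time/TB |
|---|---|---|---|---|---|
| Home phone | 0.04 | 40 | 1,000 | 3,086 | 6 years |
| Home DSL | 0.6 | 70 | 117 | 360 | 5 months |
| T1 | 1.5 | 1,200 | 800 | 2,469 | 2 months |
| T3 | 43 | 28,000 | 651 | 2,010 | 2 days |
| OC3 | 155 | 49,000 | 316 | 976 | 14 hours |
| OC 192 | 9600 | 1,920,000 | 200 | 617 | 14 minutes |
| 100 Mbps | 100 | | | | 1 day |
| Gbps | 1000 | | | | 2.2 hours |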

Source: TeraScale SneakerNet, Jim Gray et al.
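The Time/TB and $/TB columns can be reproduced from the speed and monthly rent alone. The sketch below is a reconstruction rather than the authors' own calculation: it assumes 1 TB = 10^12 bytes, a 30-day month, and a link running flat out for the whole transfer. Under those assumptions it matches the quoted figures (for example $3,086 per TB over a home phone line).

```python
BITS_PER_TB = 8e12                      # assumption: 1 TB = 10**12 bytes
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: 30-day billing month


def seconds_per_tb(speed_mbps: float) -> float:
    """Time to push one terabyte through a fully used link."""
    return BITS_PER_TB / (speed_mbps * 1e6)


def dollars_per_tb(rent_per_month: float, speed_mbps: float) -> float:
    """Share of the monthly rent consumed by sending one terabyte."""
    return rent_per_month * seconds_per_tb(speed_mbps) / SECONDS_PER_MONTH


if __name__ == "__main__":
    # (speed in Mbps, rent in $/month) taken from the table rows.
    links = {
        "Home phone": (0.04, 40),
        "T1": (1.5, 1_200),
        "OC 192": (9_600, 1_920_000),
    }
    for name, (mbps, rent) in links.items():
        hours = seconds_per_tb(mbps) / 3600
        print(f"{name}: {hours:,.1f} hours/TB, ${dollars_per_tb(rent, mbps):,.0f}/TB")
```

Even on the fastest link the cost per terabyte stays in the hundreds of dollars.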

Compute cycles are (almost) free...

by comparison with network costs.

The cheapest and fastest way to move 1TB of data out from CERN is still by FedEx.

Though this considers only bandwidth,

low latency networks are even more expensive!

(MPI over WAN doesn’t work well.)

Some consequences

A distributed community of users.

Tiny network input & output, huge compute requirement.

Database access & storage is also expensive,

therefore put the computation near the data.
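As a rough illustration of these consequences, the sketch below prices a job using two of the figures from the “£1 buys...” slide (1 day of cpu time, 1 GB of network bandwidth). Treating those as exact unit prices is an assumption, and the two example jobs are hypothetical, chosen only to contrast a SETI@home-style workload with a data-heavy one.

```python
# Unit prices taken from the "£1 buys..." slide; exact values are an assumption.
CPU_DAYS_PER_POUND = 1.0
NETWORK_GB_PER_POUND = 1.0


def job_costs(cpu_days: float, transfer_gb: float) -> tuple[float, float]:
    """Return (compute cost, network cost) in pounds for one job."""
    return cpu_days / CPU_DAYS_PER_POUND, transfer_gb / NETWORK_GB_PER_POUND


def network_share(cpu_days: float, transfer_gb: float) -> float:
    """Fraction of the job's total cost that is spent moving data."""
    compute, network = job_costs(cpu_days, transfer_gb)
    total = compute + network
    return network / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical jobs: (cpu days, GB moved over the wide-area network).
    jobs = {
        "cycle scavenging work unit": (0.5, 0.0005),
        "bulk data analysis": (0.01, 100.0),
    }
    for name, (cpu_days, gb) in jobs.items():
        print(f"{name}: {network_share(cpu_days, gb):.1%} of cost is network")
```

A job whose network share is close to zero is a good candidate for wide-area distribution; one dominated by network cost is better run next to its data.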

What makes a good Grid application?

Questions?