
May 2005 1 Richard P. Mount, SLAC

Advanced Computing Technology Overview

Richard P. Mount

Director: Scientific Computing and Computing Services

Stanford Linear Accelerator Center

May 25, 2005

May 2005 2 Richard P. Mount, SLAC

Advanced Computing Technology: My Viewpoint

• Decades of computing for experimental HEP;

• Decades of data-intensive computing;

• Belief that the future will be even more data-intensive for HEP;

• Belief that many other sciences are also facing a data-intensive future.

May 2005 3 Richard P. Mount, SLAC

My History

• Circa 1980 – The EMC experiment

– World’s largest collaboration (99 physicists)

– 10,000 tapes/year

– No advance planning for computing resources

– I had to invent a data-handling system to be able to do physics

• 1982–1997 – The L3 experiment at LEP

– Responsible for L3 computing at CERN

• 1997 – now – BaBar at SLAC

– Future HEP, Particle-astro and X-ray science at SLAC/Stanford

May 2005 4 Richard P. Mount, SLAC

CPU, Disk and Network History: i.e. what I (and Harvey) have bought

[Chart: price/performance on a log scale (1 to 10,000,000) versus year, 1983 to 2005, for the three series listed below.]

Farm CPU box MIPS/$M – doubling in 1.2 years

RAID disk GB/$M – doubling in 1.4 years

Transatlantic WAN kB/s per $M/yr – doubling in 8.4 months (0.7 years)
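These doubling times are plain exponentials, so they translate directly into price/performance forecasts. A minimal sketch (not from the talk) of the extrapolation value(t) = value(t0) x 2^((t - t0) / T_double), with a normalized 2005 baseline:

```python
# Hedged sketch (not from the slides): extrapolating price/performance
# from a doubling time, value(t) = value(t0) * 2 ** ((t - t0) / T_double).
# The 2005 baseline of 1.0 is a normalized placeholder, not a measured value.

def extrapolate(value_t0, t0, t, doubling_years):
    """Price/performance at year t, given the value at t0 and a doubling time."""
    return value_t0 * 2 ** ((t - t0) / doubling_years)

# Example: how much each quantity grows from 2005 to 2015 at its doubling time.
for name, doubling in [("Farm CPU MIPS/$M", 1.2),
                       ("RAID disk GB/$M", 1.4),
                       ("WAN kB/s per $M/yr", 0.7)]:
    factor = extrapolate(1.0, 2005, 2015, doubling)
    print(f"{name}: x{factor:,.0f} over 10 years")
```

A 1.2-year doubling time corresponds to roughly a factor of 320 per decade, which is the scale of improvement behind the Intel prediction quoted on the next slide.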

May 2005 5 Richard P. Mount, SLAC

What about the future?

• CPU

– The clock-speed ramp-up has run out of steam

– Intel/AMD response: multicore chips

• Fairly easy to use for HEP data processing (see the sketch after this list)

– Intel predicts 25 Tflops/chip in 2015 (100 cores)

• Close to the “doubling every 1.2 years” extrapolation (costs are very dependent on memory)

• Disk

– “increasing requirements for disk drive improvements provides an unending challenge to extend GMR technology to its limits, and then to look beyond” (Hitachi GST)

– It seems to be working – no end to capacity growth is in sight yet.
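A note on why multicore chips are “fairly easy to use for HEP data processing”: events are independent, so a job can simply fan them out across cores with no shared state. A minimal sketch under that assumption; process_event and the fake event list are hypothetical placeholders, not BaBar code:

```python
# Hedged sketch (not from the slides): event-level parallelism on a multicore box.
# Events are independent, so they can be processed one per core with no sharing.
from multiprocessing import Pool

def process_event(event):
    # Placeholder for per-event reconstruction/analysis.
    return sum(event) / len(event)

if __name__ == "__main__":
    events = [[float(i), float(i + 1)] for i in range(1000)]  # fake events
    with Pool() as pool:                 # one worker per core by default
        results = pool.map(process_event, events)
    print(f"processed {len(results)} events on all available cores")
```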

May 2005 6 Richard P. Mount, SLAC

Generic HEP Computing Fabric

[Diagram: thousands of farm CPUs (clients) connect over an IP network to hundreds of disk servers, which connect over a second IP network to tens of tape servers driving tens of tape drives. The diagram labels two software layers: HEP-specific software (e.g. Root/Xrootd) and mass storage system software.]

May 2005 7 Richard P. Mount, SLAC

Some Issues

• While CPU power per $ has been doubling every 1.2 years, Watts per $ have been increasing too.

– Infrastructure (power, cooling, space) now costs as much per year as the computers

• Boxes per $ are also increasing

– 10,000 – 100,000 box systems are in sight

– Scalability is vital

– Fault tolerance is a requirement (see the sketch after this list)

• The raised-floor switched network (= system backplane) is a potential bottleneck

– But if you have a few times $10M, a Cisco CRS-1 can provide about 10,000 non-blocking 10 Gbit Ethernet ports on a single switch fabric.
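On why fault tolerance is a requirement at this scale: even reliable hardware fails daily once you have tens of thousands of boxes. A back-of-the-envelope sketch; the 50,000-hour per-box MTBF is an assumed round number, not a figure from the talk:

```python
# Hedged sketch: expected hardware failures per day in large farms.
# The per-box MTBF of 50,000 hours is an assumed illustrative value.
MTBF_HOURS = 50_000

for boxes in (1_000, 10_000, 100_000):
    failures_per_day = boxes * 24 / MTBF_HOURS
    print(f"{boxes:>7} boxes -> ~{failures_per_day:.1f} failures per day")
```

At 100,000 boxes this works out to roughly 48 failures per day, so the software fabric has to treat node loss as a routine event.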

May 2005 8 Richard P. Mount, SLAC

Disks: It’s not just about capacity (1)

Ed Grochowski: Hitachi Global Storage Technologies

May 2005 9 Richard P. Mount, SLAC

Disks: It’s not just about capacity (2)

Ed Grochowski: Hitachi Global Storage Technologies

10% CGR (compound growth rate) in speed

May 2005 10 Richard P. Mount, SLAC

Disks: Gloom and Doom

• Giora Tarnopolski (TarnoTek)

– “I do not believe that there will be much shorter access times in the future”

• Ed Grochowski (Hitachi GST)

– “While rotation rates beyond 15K are possible in the future, these will likely occur at longer product time intervals”

• BaBar reality:

– Micro DST events are replicated about tenfold on disk

– Millions of $$$ per year, and many months of delay, are spent on data reorganization to allow efficient access by thousands of concurrent jobs

May 2005 11 Richard P. Mount, SLAC

Technology Issues in Data Access

• Latency

• Speed/Bandwidth

• (Cost)

• (Reliability)

May 2005 12 Richard P. Mount, SLAC

Latency and Speed – Random Access

[Chart: Random-Access Storage Performance. Retrieval rate (MBytes/s, log scale from 10^-9 to 10^4) versus log10(object size in bytes), 0 to 11, for DRAM (675 MHz), a Seagate 400 GB SATA disk, and an STK 9940B tape drive. An annotation marks the roughly 10^5 gap between the memory and disk curves at small object sizes.]

May 2005 13 Richard P. Mount, SLAC

Latency and Speed – Random Access

[Chart: the same retrieval-rate versus object-size plot, adding curves for storage as it was a decade earlier (series: DRAM 675 MHz, Seagate 400 GB SATA, STK 9940B, RAM 10 years ago, Disk 10 years ago, Tape 10 years ago).]
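The shape of these curves follows from a simple model: a single random read of S bytes takes latency + S/bandwidth, so the effective rate is S / (latency + S/bandwidth). Small objects are latency-dominated; large objects approach the streaming rate. A sketch with illustrative order-of-magnitude device parameters (assumptions, not the measured numbers behind the charts):

```python
# Hedged sketch: retrieval rate vs. object size for a single random access.
# rate(S) = S / (latency + S / bandwidth); device numbers below are
# illustrative order-of-magnitude assumptions, not figures from the charts.
DEVICES = {
    "DRAM": (100e-9, 5e9),           # ~100 ns latency, ~5 GB/s streaming
    "SATA disk": (10e-3, 60e6),      # ~10 ms seek + rotate, ~60 MB/s
    "Tape (mounted)": (30.0, 30e6),  # ~30 s positioning, ~30 MB/s
}

def retrieval_rate(size_bytes, latency_s, bandwidth_Bps):
    """Effective MB/s for one random read of size_bytes."""
    return size_bytes / (latency_s + size_bytes / bandwidth_Bps) / 1e6

for exponent in range(0, 12):        # object sizes from 1 B to 100 GB
    size = 10 ** exponent
    rates = ", ".join(f"{name}: {retrieval_rate(size, *p):.3g} MB/s"
                      for name, p in DEVICES.items())
    print(f"10^{exponent:<2} bytes  {rates}")
```

With these assumptions the small-object gap between DRAM and disk comes out near a factor of 10^5, the gap the next slide argues will eventually be intolerable.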

May 2005 14 Richard P. Mount, SLAC

Death to Disks

• The 10^5 latency gap with respect to memory will be intolerable, eventually even in consumer applications;

• Storage-class memory is in development (see Jai Menon’s talk at CHEP 2004);

• In the meantime we can use DRAM or even Flash memory in latency-critical or throughput-critical applications;

• For example, a May 23, 2005 news item:

– “Samsung develops flash-based 'disk' for PCs”

• “It uses memory chips instead of a mechanical recording system”

• http://www.computerworld.com/hardwaretopics/storage/story/0,10801,101946,00.html?source=NLT_AM&nid=101946

• Market forces are not yet aligned with scientific needs for massive, random access storage-class memory.

May 2005 15 Richard P. Mount, SLAC

Storage-Class Memory Architecture: SLAC Strategy (PetaCache)

• There is significant commercial interest in architectures including massive data-cache memory

• But: from interest to delivery will take 3-4 years

• And: applications will take time to adapt not just codes, but their whole approach to computing, to exploit the new random-access architecture

• Hence: two phases

1. Development phase (years 1, 2, 3)

– Commodity hardware taken to its limits

– BaBar as principal user, adapting existing data-access software to exploit the configuration

– BaBar/SLAC contribution to hardware and manpower

– Publicize results

– Encourage other users

– Begin collaboration with industry to design the leadership-class machine

2. Operational Facility (years 3, 4, 5)

– New architecture

– Strong industrial collaboration

– Wide applicability

May 2005 16 Richard P. Mount, SLAC

Development Machine Deployment – Currently Funded

Cisco Switch

Data servers: 64-128 nodes, each a Sun V20z with 2 Opteron CPUs and 16 GB of memory

Up to 2 TB total memory; Solaris

Cisco Switch

Clients: up to 2000 nodes, each with 2 CPUs and 2 GB of memory

Linux

May 2005 17 Richard P. Mount, SLAC

Development Machine Deployment – Possible Next Step

Cisco Switch

Data servers: 80 nodes, each with 8 Opteron CPUs and 128 GB of memory

Up to 10 TB total memory; Solaris

Cisco Switch

Clients: up to 2000 nodes, each with 2 CPUs and 2 GB of memory

Linux
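A quick check of the memory totals quoted on these two deployment slides (a minimal sketch; the node counts and per-node memory come from the slides above):

```python
# Aggregate data-cache memory implied by the two deployments described above.
configs = {
    "Currently funded (Sun V20z)": (128, 16),   # up to 128 nodes x 16 GB
    "Possible next step":          (80, 128),   # 80 nodes x 128 GB
}

for name, (nodes, gb_per_node) in configs.items():
    total_tb = nodes * gb_per_node / 1024
    print(f"{name}: {nodes} x {gb_per_node} GB = {total_tb:.0f} TB")
```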

May 2005 18 Richard P. Mount, SLAC

Scalable Object-Serving Software Example

• Xrootd (Andy Hanushevsky/SLAC)

– Optimized for read-only access

– Contributes ~29 microseconds to server latency

– Makes 1000s of servers transparent to user code

– Load balancing

– Automatic staging from tape

– Failure recovery

• Can allow BaBar to start benefiting from a new data-access architecture within months, without changes to user code (see the sketch below)

• Minimizes impact of hundreds of separate address spaces in the data-cache memory
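A hedged illustration of what “transparent to user code” means: with a ROOT build that includes xrootd support, user code opens an ordinary root:// URL and everything behind the redirector stays invisible. The host name and file path below are hypothetical:

```python
# Hedged sketch (hypothetical host/path): reading a file through xrootd
# from PyROOT. TFile::Open dispatches root:// URLs to the xrootd client,
# so the same user code works whether one server or thousands sit behind
# the redirector.
import ROOT

url = "root://xrootd-redirector.example.org//store/babar/microdst/run12345.root"
f = ROOT.TFile.Open(url)          # the redirector chooses an actual data server
if f and not f.IsZombie():
    f.ls()                        # list the file's contents
    f.Close()
```

The same few lines work whether the URL resolves to one disk server, hundreds of disk servers, or a memory-resident data cache, which is what lets BaBar benefit without changing user code.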

May 2005 19 Richard P. Mount, SLAC

Summary

• Moore’s Law (at least the generalized version) is alive and well for CPU throughput and disk capacity;

• Moore’s Law seems dead for single-threaded CPU power;

• Moore’s Law never applied to random access to data;

• At constant cost, computing is getting hotter!

• Disks are now playing the same role in HEP that tapes played in 1990 (i.e. they are not random-access devices);

• Prepare for a random-access future;

• Prepare for a 10,000 to 100,000 box future;

• Scalability and fault tolerance are the challenges we must address.