Google and Large Scientific Datasets or How To Move 100TB Jon Trowbridge Google Space Telescope Science Institute March 15, 2007


Page 1

Google and Large Scientific Datasets

or

How To Move 100TB

Jon Trowbridge

Google

Space Telescope Science Institute

March 15, 2007

Page 2

Organize the world’s information and make it universally accessible and useful.

Page 3

Motivating Problem

What if a piece of information is too large to efficiently transmit across the Internet as it exists today?

Page 4

“Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” - Andrew Tanenbaum (?)

Page 5

Large Dataset Archive

• Move data by shipping hard drives

• Centralized repository stored on Google’s infrastructure

• Accepting data from all disciplines, but the data must be open and free

• Ultimate goal: promiscuous distribution

Page 6

Nice Properties of Physical HD Shipment

• Uses commodity technologies: Linux, SATA, ext2

• High throughput

• Trivially scalable

• Cheap and easy: $2400 for 3 TB

• Rapidly getting cheaper

Page 7

Real-World Throughputs

Method                          MiB/s      GiB/hr     TB/day     hrs/TB
1200 baud modem                 1.14E-04   4.02E-04   9.43E-06   2545166
My Home DSL (downstream)        0.3        1.41       0.03       728.18
Ethernet: 10baseT               0.8        2.81       0.07       364.09
Ethernet: 100baseT              8          28.13      0.66       36.41
End-to-end physical shipment    -          -          0.88       27.42
HD Transfer                     30         105.47     2.47       9.71
FedEx phase of shipment         -          -          3.00       8.00
Ethernet: Gigabit               60         210.94     4.94       4.85
LBNL, 2002: 10.6 GiB/s          10854      38160      894.38     0.03
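The columns of the throughput table are unit conversions of one another. The Python sketch below reproduces them from the MiB/s column and extends them to the 100 TB in the title. It assumes binary units throughout (1 MiB = 2^20 bytes, 1 GiB = 2^30 bytes, and "TB" = 2^40 bytes), which is the reading under which the table's columns agree; the function names are illustrative and not from the talk.

```python
# Sketch: reproduce the unit conversions in the "Real-World Throughputs" table.
# Assumption: binary units throughout, with "TB" taken as 2**40 bytes (a TiB);
# under that reading the table's columns are consistent with each other.

MIB = 2**20
GIB = 2**30
TB = 2**40  # assumed binary terabyte


def gib_per_hour(mib_per_s: float) -> float:
    """Convert a sustained rate in MiB/s to GiB per hour."""
    return mib_per_s * MIB * 3600 / GIB


def tb_per_day(mib_per_s: float) -> float:
    """Convert a sustained rate in MiB/s to TB per day."""
    return mib_per_s * MIB * 86400 / TB


def hours_per_tb(mib_per_s: float) -> float:
    """Hours needed to move one TB at a sustained rate in MiB/s."""
    return TB / (mib_per_s * MIB) / 3600


def days_for(terabytes: float, mib_per_s: float) -> float:
    """Days needed to move `terabytes` TB at a sustained rate in MiB/s."""
    return terabytes * hours_per_tb(mib_per_s) / 24


# Rates copied from the table's MiB/s column.
rates = {
    "Ethernet: 10baseT": 0.8,
    "Ethernet: 100baseT": 8,
    "HD Transfer": 30,
    "Ethernet: Gigabit": 60,
}

for name, r in rates.items():
    print(f"{name:20s} {gib_per_hour(r):8.2f} GiB/hr  {tb_per_day(r):5.2f} TB/day  "
          f"{hours_per_tb(r):7.2f} hrs/TB  {days_for(100, r):7.1f} days for 100 TB")
```

Under these assumptions, 100 TB over a sustained 100baseT link takes about 152 days, and even the table's Gigabit figure of 60 MiB/s needs about 20 days. Python's `.2f` formatting rounds exact halves to even, so a printed value can differ from the table in the last digit (28.12 vs 28.13 GiB/hr for 100baseT).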

Page 8

The Cost of 1GB of Storage

• 1986: $100,000

• 1990: $10,000

• 1994: $1,000

• 1997: $100

• 2000: $10

• 2004: $1

• Today: About 40¢

[Image: Creative Computing, February 1980]
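The price list above implies a roughly steady exponential decline. This Python sketch computes, for each consecutive pair of data points, how many years it took the price of 1 GB to fall tenfold. Treating "Today" as 2007, the year of the talk, is an assumption.

```python
import math

# Sketch: how fast did the cost of 1 GB fall, using the slide's data points?
# Assumption: "Today: About 40¢" is dated 2007, the year of the talk.
COST_PER_GB = {
    1986: 100_000.0,
    1990: 10_000.0,
    1994: 1_000.0,
    1997: 100.0,
    2000: 10.0,
    2004: 1.0,
    2007: 0.40,
}


def years_per_tenfold_drop(y0: int, y1: int) -> float:
    """Years for the price to fall 10x, assuming steady exponential decline from y0 to y1."""
    orders_of_magnitude = math.log10(COST_PER_GB[y0] / COST_PER_GB[y1])
    return (y1 - y0) / orders_of_magnitude


years = sorted(COST_PER_GB)
for y0, y1 in zip(years, years[1:]):
    print(f"{y0}-{y1}: 10x cheaper every {years_per_tenfold_drop(y0, y1):.1f} years")

# A 10x drop every T years means the price halves every T * log10(2) years.
overall = years_per_tenfold_drop(1986, 2004)
halving = overall * math.log10(2)
print(f"1986-2004: 10x every {overall:.1f} years, price halves every {halving:.2f} years")
```

Between 1986 and 2004 the endpoints imply a tenfold drop every 3.6 years, which works out to the price halving roughly every 13 months.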

Page 9

Not-So-Nice Properties of Physical HD Shipment

• Physical objects break, get stolen, and occasionally explode

• HD copying is the bottleneck: copying on and off the drives takes longer than the FedEx leg

• Customs/duties make international shipments more complicated
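The "End-to-end physical shipment" row of the throughput table is consistent with copying data onto the drives, shipping them, and copying the data off again, one step after another: 9.71 + 8.00 + 9.71 = 27.42 hrs/TB. That decomposition is inferred from the numbers, not stated on the slides. A minimal Python sketch under that assumption:

```python
# Sketch: decompose the "End-to-end physical shipment" row of the throughput table.
# Assumption (inferred from the table, not stated in the talk): end-to-end time
# per TB = copy onto drives + FedEx transit + copy off drives, done sequentially.

HD_TRANSFER_HRS_PER_TB = 9.71  # "HD Transfer" row
FEDEX_HRS_PER_TB = 8.00        # "FedEx phase of shipment" row (3 TB per day in transit)


def end_to_end_hours_per_tb(copy_hrs: float, ship_hrs: float) -> float:
    """Hours per TB when data is copied on, shipped, then copied off."""
    return copy_hrs + ship_hrs + copy_hrs


total = end_to_end_hours_per_tb(HD_TRANSFER_HRS_PER_TB, FEDEX_HRS_PER_TB)
copying_share = 2 * HD_TRANSFER_HRS_PER_TB / total

print(f"end-to-end: {total:.2f} hrs/TB = {24 / total:.2f} TB/day")
print(f"share of time spent copying: {copying_share:.0%}")
```

Under this reading, roughly 71% of the end-to-end time is spent copying rather than in transit, which is the HD copying bottleneck listed above.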

Page 10

The Big Question

What happens when every astronomer has the complete Hubble Legacy Archive on the computer in their office?

Page 11

The Big Question

What happens when every high-school student has the complete Hubble Legacy Archive on the computer in their bedroom?

Page 12