in search of petabyte databases

GigaByte

TeraByte

PetaByte

ExaByte

In Search of PetaByte Databases

Jim GrayTony Hey

GigaByte

TeraByte

PetaByte

ExaByte

The Cost of Storage(heading for 1K$/TB soon)

y = 6.7x

y = 17.9x

0100200300400500600700800900

1000

0 20 40 60GB

$ IDE

SCSI

Price vs disk capacity

6

0

5

10

15

20

25

30

35

40

0 10 20 30 40 50 60GB

$

IDE

SCSI

k$/TB

12/1/1999

y = 3.8x

y = 13x

0100200300400500600700800900

1000

0 20 40 60 80Raw Disk unit Size GB

$

SCSI

IDE


0

5

10

15

20

25

30

35

40

0 20 40 60 80Disk unit size GB

$

SCSI

IDE

raw k$/TB

9/1/2000

y = 2.0x

y = 7.2x

0

200

400

600

800

1000

1200

1400

0 50 100 150 200Raw Disk unit Size GB

$ SCSI

IDE


0.0

1.0

2.0

3.0

4.0

5.0

6.0

7.0

8.0

9.0

10.0

0 50 100 150 200Disk unit size GB

$ SCSI

IDE

raw k$/TB

9/1/2001

GigaByte

TeraByte

PetaByte

ExaByteSummary• DBs own the sweet-spot:

– 1GB to 100TB• Big data is not in databases

HPTS does not own high performance storage (BIG DATA)

• We should• Cost of storage is people:

–Performance goal:1 Admin per PB

GigaByte

TeraByte

PetaByte

ExaByteState is Expensive

• Stateless clones are easy to manage– App servers are middle tier

• Cost goes to zero with Moore’s law.– One admin per 1,000 clones.– Good story about scaleout.

• Stateful servers are expensive to manage– 1TB to 100TB per admin – Storage cost is going to zero(2k$ to 200k$).

• Cost of storage is management cost

GigaByte

TeraByte

PetaByte

ExaBytePersonal 100 GB today

The Personal Petabyte (someday)• It’s coming (2M$ today…2K$ in 10 years)

• Today the pack rats have ~ 10-100GB– 1-10 GB in text (eMail, PDF, PPT, OCR…)– 10GB – 50GB tiff, mpeg, jpeg,…– Some have 1TB (voice + video).

• Video can drive it to 1PB.• Online PB affordable in 10 years.• Get ready: tools to capture, manage,

organize, search, display will be big app.

GigaByte

TeraByte

PetaByte

ExaByte

10 TBAn Image Database: TerraServer

• Snapshot of the USA (1 meter granularity)– 10,000,000,000,000 (=10^13) sq meters– == 15TB raw (some duplicates)– == 5 TB cooked

• 5x compression• + Image pyramid• + gazetteer

• Interesting things: – Its all in the Database– Clustered (allows flaky hardware, online upgrade)

– Triplexed – snapshot each night

GigaByte

TeraByte

PetaByte

ExaByteDatabases (== SQL)

• VLDB survey (Winter Corp).• 10 TB to 100TB DBs.

– Size doubling yearly– Riding disk Moore’s law– 10,000 disks at 18GB is 100TB cooked.

• Mostly DSS and data warehouses.• Some media managers

GigaByte

TeraByte

PetaByte

ExaByteDB iFS

• DB2: leave the files where they live– Referential integrity between DBMS and FS.

• Oracle: put the files in the DBMS– One security model– One storage management model– One space manager– One recovery manger– One replication system– One thing to tune.– Features: transactions,….

GigaByte

TeraByte

PetaByte

ExaByteInteresting facts

• No DBMSs beyond 100TB.• Most bytes are in files.• The web is file centric• eMail is file centric.• Science (and batch) is file centric.• But….• SQL performance is better than CIFS/NFS..

– CISC vs RISC

GigaByte

TeraByte

PetaByte

ExaByte BarBar: the biggest DB

• 350 TB• Uses Objectivity™• SLAC events• Linux cluster scans DB looking for patterns

GigaByte

TeraByte

PetaByte

ExaByte

300 TB (cooked)Hotmail / Yahoo

• Clone front ends ~10,000@hotmail.

• Application servers– ~100 @ hotmail – Get mail box– Get/put mail– Disk bound

• ~30,000 disks

• ~ 20 admins

GigaByte

TeraByte

PetaByte

ExaByteAOL (msn)

(1PB?)• 10 B transactions per day (10% of that)• Huge storage• Huge traffic• Lots of eye candy• DB used for security/accounting.• GUESS AOL is a petabyte

– (40M x 10MB = 400 x 1012)

http://go.msn.com/0/3/

http://r.aol.com/cgi/redir-complex?url=http://www.aol.com&sid=n0

GigaByte

TeraByte

PetaByte

ExaByteGoogle

1.5PB as of last spring• 8,000 no-name PCs

– Each 1/3U, 2 x 80 GB disk, 2 cpu 256MB ram

• 1.4 PB online.• 2 TB ram online• 8 TeraOps • Slice-price is 1K$ so 8M$.• 15 admins (!) (== 1/100TB).

GigaByte

TeraByte

PetaByte

ExaByte

ComputationalScience

• Traditional Empirical Science – Scientist gathers data by direct observation– Scientist analyzes data

• Computational Science– Data captured by instruments

Or data generated by simulator– Processed by software– Placed in a database– Scientist analyzes database– tcl scripts

• on C programs – on ASCII files

http://es.rice.edu/ES/humsoc/Galileo/Images/Astro/Instruments/hevelius_telescope.gif

GigaByte

TeraByte

PetaByte

ExaByteAstronomy

• I’ve been trying to apply DB to astronomy• Today they are at 10TB per data set• Heading for Petabytes• Using Objectivity• Trying SQL (talk to me offline)

GigaByte

TeraByte

PetaByte

ExaByteFast Moving Objects

• Find near earth asteroids:SELECT r.objID as rId, g.objId as gId, r.run, r.camcol, r.field as field, g.field as gField,

r.ra as ra_r, r.dec as dec_r, g.ra as ra_g, g.dec as dec_g,sqrt( power(r.cx -g.cx,2)+ power(r.cy-g.cy,2)+power(r.cz-g.cz,2) )*(10800/PI()) as distance

FROM PhotoObj r, PhotoObj g WHERE

r.run = g.run and r.camcol=g.camcol and abs(g.field-r.field)<2 -- the match criteria-- the red selection criteriaand ((power(r.q_r,2) + power(r.u_r,2)) > 0.111111 )and r.fiberMag_r between 6 and 22 and r.fiberMag_r < r.fiberMag_g and r.fiberMag_r < r.fiberMag_iand r.parentID=0 and r.fiberMag_r < r.fiberMag_u and r.fiberMag_r < r.fiberMag_zand r.isoA_r/r.isoB_r > 1.5 and r.isoA_r>2.0-- the green selection criteriaand ((power(g.q_g,2) + power(g.u_g,2)) > 0.111111 )and g.fiberMag_g between 6 and 22 and g.fiberMag_g < g.fiberMag_r and g.fiberMag_g < g.fiberMag_iand g.fiberMag_g < g.fiberMag_u and g.fiberMag_g < g.fiberMag_zand g.parentID=0 and g.isoA_g/g.isoB_g > 1.5 and g.isoA_g > 2.0-- the matchup of the pairand sqrt(power(r.cx -g.cx,2)+ power(r.cy-g.cy,2)+power(r.cz-g.cz,2))*(10800/PI())< 4.0and abs(r.fiberMag_r-g.fiberMag_g)< 2.0

• Finds 3 objects in 11 minutes• Ugly,

but consider the alternatives (c programs an files and…)

–

GigaByte

TeraByte

PetaByte

ExaByte

Particle Physics – Hunting the Higgs and Dark Matter

• April 2006: First pp collisions at TeV energies at the Large Hadron Collider in Geneva

• ATLAS/CMS Experiments involve 2000 physicists from 200 organizations in US, EU, Asia

• Need to store,access, process, analyse 10 PB/yr with 200 TFlop/s distributed computation

• Building hierarchical Grid infrastructure to distribute data and computation

• Many 10’s of million $ funding – GryPhyN, PPDataGrid, iVDGL, DataGrid, DataTag, GridPP

ExaBytes and PetaFlop/s by 2015

GigaByte

TeraByte

PetaByte

ExaByteAstronomy: Past and

Future of the Universe• Virtual Observatories – NVO, AVO, AstroGrid

– Store all wavelengths, need distributed joins– NVO 500 TB/yr from 2004

• Laser Interferometer Gravitational Observatory– Search for direct evidence for gravitational waves– LIGO 250 TB/yr, random streaming from 2002

• VISTA Visible and IR Survey Telescope in 2004– 250 GB/night, 100 TB/yr, Petabytes in 10 yrs

New phase of astronomy, storing, searching and analysing Petabytes of data

GigaByte

TeraByte

PetaByte

ExaByte

Engineering, Environment and

Medical Applications• Real-Time Health Monitoring– UK DAME project for Rolls Royce Aero Engines– 1 GB sensor data/flight, 100,000 engine hours/day

• Earth Observation – ESA satellites generate 100 GB/day– NASA 15 PB by 2007

• Medical Images to Information– UK IRC Project on mammograms and MRIs– 100 MB/mammogram, UK 3M/yr, US 26M/yr– 200 MB/patient, Oxford 500 women/yr

Many Petabytes of data of real commercial interest

GigaByte

TeraByte

PetaByte

ExaByte

Grids, Databases and Cool Tools

• Scientists:– will build Grids based on Globus Open Source m/w– will have instruments generating Petabytes of data– will annotate their data with XML-based metadata

Realize a version of Licklider and Taylor’s original vision of resource sharing and the ARPANET

• TP and DB community:- Should assist in developing Grid Interfaces to DBMS- Should develop ‘Cool Tools’ for Grid Services

There will be commercial Grid applications and viable business opportunities

GigaByte

TeraByte

PetaByte

ExaByteSummary• DBs own the sweet-spot:

– 1GB to 100TB• Big data is not in databases• HPTS crowd is not really high

performance storage (BIG DATA)• Cost of storage is people:

–Performance goal:1 Admin per PB

in search of petabyte databases

Documents