Life and Work of Jim Gray | Turing100@Persistent

Life and Work of Jim Gray, January 5, 2013, Turing 100 @ Persistent

Upload: persistent-systems-ltd

Post on 12-Jun-2015


DESCRIPTION

Dr. Anand Deshpande, Chairman, Managing Director & CEO, Persistent Systems Ltd., talks about the life and work of Jim Gray (1998 Turing Award recipient) during the 6th Turing Session.

TRANSCRIPT

Page 1: Life and Work of Jim Gray | Turing100@Persistent

Life and Work of Jim Gray, January 5, 2013

Turing 100 @Persistent


JAMES ("JIM") NICHOLAS GRAY

United States – 1998

CITATION

For fundamental contributions to database and transaction processing research and technical leadership in system implementation from research prototypes to commercial products. The transaction is the fundamental abstraction underlying database system concurrency and failure recovery. Gray’s work [defined] the key transaction properties: atomicity, consistency, isolation and durability, and his locking and recovery work demonstrated how to build … systems that exhibit these properties.

E. F. Codd invented the relational database model in 1970 and created what is today a 100+ billion dollar per year industry.

Codd’s Relational Model

● Simple model
● Data stored in relational tables
● Data independence – separation of data storage and data access
● Declarative queries
● Algebra to mathematically reason about data objects – made query optimization possible
● Ad hoc queries through SQL
● Embedded in operational systems

● Jim Gray defined ACID properties to guarantee database transactions are processed reliably.

ACID properties are fundamental to relational systems and necessary for on-line transaction processing (OLTP) systems.

Atomicity, Consistency, Isolation, Durability

From Transactions to Transaction Processing Systems - II

[Figure labels: Reality, Abstraction, Change, Transaction, Query, Answer, DB, DB']

The real state is represented by an abstraction, called the database, and the transformation of the real state is mirrored by the execution of a program, called a transaction, that transforms the database.

Gray defined Data Manipulation Actions as

• Unprotected: transient and internal state
• Protected: grouped into transactions and reflected in the state of transaction outcome
• Real: involve sensors, actuators, etc. They cannot be undone, but they can be compensated.

Definitions

● A transaction is a sequence of operations that form a single unit of work
● A transaction is often initiated by an application program
– begin a transaction: START TRANSACTION
– end a transaction: COMMIT (if successful) or ROLLBACK (if errors)
● Either the whole transaction must succeed or the effect of all operations has to be undone (rollback)
● To achieve durable transaction atomicity, the transition to the “committed” state must be accomplished by a single write to non-volatile storage.
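The begin / commit / rollback pattern in the definitions above can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the `account` table, the names and the `transfer` helper are assumptions made up for illustration and do not come from the talk.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work (hypothetical helper)."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if an exception escapes it.
        with conn:
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            (balance,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers ROLLBACK
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates become visible
failed = transfer(conn, "alice", "bob", 500)  # both updates are undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

The second transfer leaves no trace in the table: either every operation of the unit of work takes effect, or none does.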

Structure of a Transaction Program

[Figure labels: BEGIN WORK(), WORK, COMMIT WORK(), ROLLBACK WORK()]

While at IBM San Jose Research Laboratory, October 1972 to December 1980

● Jim Gray developed three key ideas related to transaction concurrency control:
– The notion of transaction
– Serializability; degrees of consistency
– Multi-granularity locking
● There are two main transaction issues:
– concurrent execution of multiple transactions
– recovery after hardware failures and system crashes
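As a rough illustration of multi-granularity locking, the sketch below encodes the standard compatibility matrix for the lock modes IS, IX, S, SIX and X, plus the rule that a lock on a fine-grained item needs intention locks on its ancestors. The function names and the database/table/row path are hypothetical; this is a minimal sketch, not a lock manager.

```python
# For each held mode, the set of modes another transaction may be granted.
COMPATIBLE = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}

def can_grant(requested, held_modes):
    """True if `requested` is compatible with every mode already held by others."""
    return all(requested in COMPATIBLE[held] for held in held_modes)

def lock_plan(path, mode):
    """Locks to take from root to leaf: intention locks on ancestors, `mode` on the leaf."""
    intention = "IS" if mode in ("IS", "S") else "IX"
    return [(node, intention) for node in path[:-1]] + [(path[-1], mode)]

print(can_grant("IS", ["IX"]))   # readers and writers of different rows can coexist
print(can_grant("S", ["IX"]))    # a whole-table read must wait for row writers
print(lock_plan(["db", "table", "row"], "X"))
```

The intention modes are what let a transaction locking a single row and a transaction locking the whole table detect their conflict at the table level without inspecting every row lock.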

Write Ahead Log (WAL) protocol

● The WAL protocol records the old and new states induced by protected actions separately from the actual state changes.
● The logged changes are written to stable storage before the actual changes are written back to stable storage (that’s the “Write Ahead” part).
● Transactions are committed by simply appending and writing a ‘commit’ record to the recovery log. Logged changes are used to undo protected actions of aborted transactions and of transactions in progress at the time of a system failure.

● Log records are also used to redo committed actions whose actual changes have not been written back to stable storage at the time of a system failure.
● The WAL protocol allows changed data to be written to their stable storage home at any time after the log records describing the changes have been written into the stable log.
● This gives the Database Manager great flexibility in managing the contents of its volatile data buffer pools.
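The undo/redo behaviour described for the WAL protocol can be sketched in a few lines. The `TinyWAL` class below is a toy made up for illustration: Python lists and dicts stand in for the stable log, the stable data pages and the volatile buffer pool, and there is no concurrency control, so it assumes that transactions never touch the same key concurrently.

```python
class TinyWAL:
    """Toy write-ahead log: in-memory stand-ins for stable and volatile storage."""

    def __init__(self):
        self.log = []    # stand-in for the stable log
        self.disk = {}   # stand-in for the stable home of the data
        self.cache = {}  # volatile buffer pool, lost on crash

    def write(self, txn, key, new):
        old = self.cache.get(key, self.disk.get(key))
        self.log.append(("update", txn, key, old, new))  # log old and new state first
        self.cache[key] = new                            # then change the volatile copy

    def commit(self, txn):
        self.log.append(("commit", txn))  # appending the commit record commits txn

    def flush(self, key):
        # Permitted at any time once the describing log record is stable.
        self.disk[key] = self.cache[key]

    def crash(self):
        self.cache = {}

    def recover(self):
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:                      # redo committed work, in log order
            if rec[0] == "update" and rec[1] in committed:
                self.disk[rec[2]] = rec[4]
        for rec in reversed(self.log):            # undo uncommitted work, newest first
            if rec[0] == "update" and rec[1] not in committed:
                if rec[3] is None:
                    self.disk.pop(rec[2], None)
                else:
                    self.disk[rec[2]] = rec[3]

wal = TinyWAL()
wal.write("T1", "a", 1)
wal.commit("T1")          # committed, but never flushed to its stable home
wal.write("T2", "b", 2)
wal.flush("b")            # flushed, but T2 never commits
wal.crash()
wal.recover()
print(wal.disk)
```

After recovery the committed but unflushed change of T1 has been redone and the flushed but uncommitted change of T2 has been undone, which is the flexibility in buffer management that the last bullet refers to.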

ACID Properties: First Definition

● Atomicity: A transaction’s changes to the state are atomic: either all happen or none happen. These changes include database changes, messages, and actions on transducers.

● Consistency: A transaction is a correct transformation of the state. The actions taken as a group do not violate any of the integrity constraints associated with the state. This requires that the transaction be a correct program.

● Isolation: Even though transactions execute concurrently, it appears to each transaction T, that others executed either before T or after T, but not both.

● Durability: Once a transaction completes successfully (commits), its changes to the state survive failures.


[Gray 1993] Jim Gray and Andreas Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann, San Mateo, CA (1993).


In 1985, Jim and a number of other senior leaders in the field of transaction processing started the HPTS (High Performance Transaction Systems) Workshop [HPTS]. This is a biennial gathering of folks interested in transaction systems (and things related to scalable systems). It includes people from competing companies in industry and also from academia. Over the last 22 years, it has evolved to include many different topics as high-end computing morphed from the mainframe to the Internet.

The early years …

● Born January 12, 1944

● 1961 graduated from Westmoor High School in San Francisco.

● 1966 graduated from the University of California at Berkeley with a bachelor’s degree in mathematics and engineering.

James Nicholas Gray was born in San Francisco, California on 12 January 1944.

● In 1961 Gray graduated from Westmoor High School in San Francisco.

● He graduated from the University of California at Berkeley with a bachelor’s degree in mathematics and engineering in 1966.

● After spending a year in New Jersey working at Bell Laboratories in Murray Hill and attending classes at the Courant Institute in New York City, he returned to Berkeley and enrolled in the newly-formed computer science department, earning a Ph.D. in 1969 for work on context-free grammars and formal language theory.

5-minute rule for Memory vs. Disk Access (1987)

When does it make economic sense to hold pages in memory versus doing IO every time data from the page is accessed?

THE FIVE MINUTE RULE

Pages referenced every five minutes should be memory resident.

From Tandem Report 1987: Jim Gray and Gianfranco Putzolu

● The argument goes as follows: A Tandem disc, and half a controller comfortably deliver 15 accesses per second and are priced at 15K$ for a small disc and 20K$ for a large disc (180Mb and 540Mb respectively).

● So the price per access per second is about 1K$. The extra CPU and channel cost for supporting a disc are 1K$/a/s. So one disc access per second costs about 2K$ on a Tandem system.

● A megabyte of Tandem main memory costs 5K$, so a kilobyte costs 5$.

● If making a 1Kb record resident saves 1 a/s, then it saves about 2K$ worth of disc accesses at a cost of 5$, a good deal. If it saves 0.1 a/s then it saves about 200$, still a good deal. Continuing this, the break-even point is an access every 2000/5 = 400 seconds.

● So, any 1KB record accessed more frequently than every 400 seconds should live in main memory. 400 seconds is "about" 5 minutes, hence the name: the Five Minute Rule.
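The Tandem arithmetic above can be checked with a few lines of Python. The prices are the ones quoted from the 1987 report; the function and variable names are made up for this sketch.

```python
COST_PER_ACCESS_PER_SECOND = 2000.0  # $ per (access/second), disc plus CPU and channel
COST_PER_KB_OF_MEMORY = 5.0          # $ per kilobyte, from 5K$ per megabyte

def saving_if_resident(accesses_per_second):
    """Dollars of disc capacity freed by keeping a 1KB record in memory."""
    return accesses_per_second * COST_PER_ACCESS_PER_SECOND

def break_even_seconds():
    """Access interval at which the memory cost equals the disc cost saved."""
    return COST_PER_ACCESS_PER_SECOND / COST_PER_KB_OF_MEMORY

print(saving_if_resident(1.0))     # 2000.0 dollars saved for 5 dollars of memory
print(saving_if_resident(0.1))     # 200.0 dollars saved, still a good deal
print(break_even_seconds())        # 400.0 seconds
print(break_even_seconds() / 60)   # a little under 7 minutes, "about" five
```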

5-minute rule

● The five-minute rule is based on the tradeoff between the cost of RAM and the cost of disk accesses.

$$\text{BreakEvenReferenceInterval (seconds)} = \frac{\text{PagesPerMBofRAM}}{\text{AccessesPerSecondPerDisk}} \times \frac{\text{PricePerDiskDrive}}{\text{PricePerMBofRAM}}$$

The first factor is the Technology Ratio and the second is the Economic Ratio.


1997 – Ten years later

New Storage Metrics: Kaps, Maps, SCAN

● Kaps: How many kilobyte objects served per second
– The file server, transaction processing metric
– This is the OLD metric.
● Maps: How many megabyte objects served per sec
– The Multi-Media metric
● SCAN: How long to scan all the data
– the data mining and utility metric
● And
– Kaps/$, Maps/$, TBscan/$

Disk Changes

● Disks got cheaper: 20k$ -> 1K$ (or even 200$)
– $/Kaps etc improved 100x (Moore’s law!) (or even 500x)
– One-time event (went from mainframe prices to PC prices)
● Disk data got cooler (10x per decade):
– 1990 disk ~ 1GB and 50Kaps and 5 minute scan
– 2000 disk ~ 70GB and 120Kaps and 45 minute scan
● So
– 1990: 1 Kaps per 20 MB
– 2000: 1 Kaps per 500 MB
– disk scans take longer (10x per decade)
● Backup/restore takes a long time (too long)
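The "data got cooler" figures follow directly from capacity divided by access rate. The short sketch below redoes that division with the disk sizes and Kaps values from the slide; the helper name is hypothetical, and decimal units (1 GB = 1,000 MB) are assumed.

```python
def mb_per_kaps(capacity_mb, kaps):
    """Megabytes of stored data that share one kilobyte-access per second."""
    return capacity_mb / kaps

disk_1990 = mb_per_kaps(1_000, 50)     # ~1 GB disk, 50 Kaps
disk_2000 = mb_per_kaps(70_000, 120)   # ~70 GB disk, 120 Kaps

print(disk_1990)             # 20.0 MB per Kaps
print(round(disk_2000))      # roughly 583, which the slide rounds to 500 MB per Kaps
print(round(disk_2000 / disk_1990))  # data is about 25x to 30x "cooler"
```

Capacity grew much faster than the access rate, so each byte on the disk can be touched far less often, which is why scans and backups take longer.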

Storage Ratios Changed

● 10x better access time
● 10x more bandwidth
● 100x more capacity
● Data 25x cooler (1Kaps/20MB vs 1Kaps/500MB)
● 4,000x lower media price
● 20x to 100x lower disk price
● Scan takes 10x longer (3 min vs 45 min)
● DRAM/disk media price ratio changed
– 1970-1990 100:1
– 1990-1995 10:1
– 1995-1997 50:1
– today ~ 0.03$/MB disk, 3$/MB DRAM: 100:1

The Five Minute Rule

● Trade DRAM for Disk Accesses
● Cost of an access (DriveCost / Access_per_second)
● Cost of a DRAM page ($/MB / pages_per_MB)
● Break even has two terms:
● Technology term and an Economic term
● Grew page size to compensate for changing ratios.
● Still at 5 minute for random, 1 minute sequential

From his presentations in 2000
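The two-term break-even calculation can be written as a small function: the cost of an access divided by the cost of a DRAM page, regrouped into a technology term and an economic term. The input values in the example (8KB pages, 64 accesses per second, a 2,000$ drive, 15$ per MB of DRAM) are illustrative assumptions and are not taken from the slides.

```python
def break_even_interval(pages_per_mb, accesses_per_second_per_disk,
                        price_per_disk, price_per_mb_ram):
    """Seconds between references at which caching a page costs the same as re-reading it."""
    technology_term = pages_per_mb / accesses_per_second_per_disk
    economic_term = price_per_disk / price_per_mb_ram
    return technology_term * economic_term

# Illustrative late-1990s style inputs (assumptions for this sketch).
with_8kb_pages = break_even_interval(128, 64, 2000, 15)
with_1kb_pages = break_even_interval(1024, 64, 2000, 15)

print(round(with_8kb_pages))   # a few minutes
print(round(with_1kb_pages))   # eight times longer for pages one eighth the size
```

With these inputs the larger page keeps the interval in the range of minutes, which is the sense in which page size grew to compensate for the changing ratios.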

Data on Disk Can Move to RAM in 10 years

[Figure: Storage Price vs Time, in megabytes per kilo-dollar (MB/k$, log scale from 0.1 to 10,000), years 1980 to 2000. Annotations: 100:1, 10 years.]

Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs

[Figure, two panels plotted against access time (seconds, 10^-9 to 10^3). "Size vs Speed": typical system size in bytes, 10^3 to 10^15. "Price vs Speed": $/MB, 10^-6 to 10^2. Storage levels shown: Cache, Main, Secondary, Disc, Online Tape, Nearline Tape, Offline Tape.]


5-minute rule holds in 1997

● In summary, the five-minute rule still seems to apply to randomly accessed pages, primarily because page sizes have grown from 1KB to 8KB to compensate for changing technology ratios.

Storage Latency: How Far Away is the Data?

Storage level (relative access time): where the data is, and how long it takes to get there

● Registers (1): My Head, 1 min
● On Chip Cache (2): This Room
● On Board Cache (10): This Hotel, 10 min
● Memory (100): Olympia, 1.5 hr
● Disk (10^6): Pluto, 2 Years
● Tape / Optical Robot (10^9): Andromeda, 2,000 Years

From Jim Gray’s Rules of Thumb in Data Engineering Presentation

What’s a TeraByte?

● 1 Terabyte:
– 1,000,000,000 business letters: 150 miles of book shelf
– 100,000,000 book pages: 15 miles of book shelf
– 50,000,000 FAX images: 7 miles of book shelf
– 10,000,000 TV pictures (mpeg): 10 days of video
– 4,000 LandSat images: 16 earth images (100m)
– 100,000,000 web pages: 10 copies of the web HTML
● Library of Congress (in ASCII) is 25 TB
– 1980: $200 million of disc (10,000 discs), $5 million of tape silo (10,000 tapes)
– 1997: 200 k$ of magnetic disc (48 discs), 30 k$ nearline tape (20 tapes)

Terror Byte !

Jim Gray’s presentations 1995

How Much Information Is There?

● Soon everything can be recorded and indexed
● Most data will never be seen by humans
● Precious Resource: Human attention
– Auto-Summarization and Auto-Search are key technologies.

http://www.lesk.com/mlesk/ksg97/ksg.html

[Figure: scale of prefixes Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with markers: A Book, A Photo, Movie, All LoC books (words), All Books MultiMedia, Everything Recorded!]

Small prefixes (negative powers of ten): 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli


2007: Twenty Years Later

The 5-minute rule holds in 2007

● The old five-minute rule for RAM and disk now applies to 64KB page sizes (334 seconds).
– Five minutes had been the approximate break-even interval for 1KB in 1987 and for 8KB in 1997.
● The five-minute break-even interval also applies to RAM and the expensive flash memory of 2007 for page sizes of 64KB and above (365 seconds and 339 seconds).
– As the price premium for flash memory decreases, so does the break-even interval (146 seconds and 136 seconds).


Flash memory falls between traditional RAM and persistent mass storage based on rotating disks in terms of acquisition cost, access latency, transfer bandwidth, spatial density, power consumption, and cooling costs.

20 years out: Summary and Conclusion

● The 20-year-old five-minute rule for RAM and disks still holds, but for ever-larger disk pages.
● It should be augmented by two new five-minute rules:
– for small pages moving between RAM and flash memory and
– for large pages moving between flash memory and traditional disks.

● For small pages moving between RAM and disk, Gray and Putzolu were amazingly accurate in predicting a five-hour break-even point 20 years into the future.


Data Cube

Aggregates in SQL

● The SQL standard [Melton, Simon] provides five aggregate functions: COUNT, SUM, MIN, MAX, AVG

SELECT [DISTINCT] AVG(Temp)
FROM Weather;

● Aggregate functions return a single value. In addition, SQL allows aggregation over distinct values.

● Using GROUP BY, SQL can create a table of aggregate values indexed by a set of attributes.

SELECT Time, Altitude, AVG(Temp)
FROM Weather
GROUP BY Time, Altitude;

[Figure: a table whose rows are grouped by attribute value (A, B, C, D), with SUM() applied to each group to produce one row per value.]
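The two Weather queries can be run as written against SQLite from Python. The table contents below are invented sample readings, and the ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Weather (Time TEXT, Altitude INTEGER, Temp REAL)")
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?)",
    [("06:00", 0, 10.0), ("06:00", 0, 12.0), ("06:00", 1000, 4.0), ("12:00", 0, 20.0)],
)

# An aggregate function collapses the whole table into a single value.
overall = conn.execute("SELECT AVG(Temp) FROM Weather").fetchone()[0]

# Aggregation over distinct values.
distinct_times = conn.execute("SELECT COUNT(DISTINCT Time) FROM Weather").fetchone()[0]

# GROUP BY yields one aggregate row per (Time, Altitude) combination.
by_group = conn.execute(
    "SELECT Time, Altitude, AVG(Temp) FROM Weather "
    "GROUP BY Time, Altitude ORDER BY Time, Altitude"
).fetchall()

print(overall, distinct_times)
for row in by_group:
    print(row)
```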

Problems With This Design

● Users want histograms
● Users want sub-totals and totals
– drill-down & roll-up reports
● Users want CrossTabs
● Conventional wisdom
– These are not relational operators
– They are in many report writers and query engines

[Figure: a cross-tab of expense categories (AIR, HOTEL, FOOD, MISC) by day of the week (M T W T F S S) with row and column sums; histogram functions F(), G(), H().]

Other Variants – Illustra

● init(&handle):
– Allocates the handle and initializes the aggregate computation.
● iter(&handle, value):
– Aggregates the next value into the current aggregate.
● value = final(&handle):
– Computes and returns the resulting aggregate by using data saved in the handle. This invocation deallocates the handle.
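The init / iter / final protocol maps naturally onto a small class. The sketch below is a Python analogy for a user-defined AVG aggregate, not Illustra's actual API: the handle is a plain dict, and the `aggregate` driver plays the role of the query engine.

```python
class Average:
    """User-defined aggregate following the init / iter / final pattern."""

    def init(self):
        # Allocate the handle and initialize the running state.
        return {"sum": 0.0, "count": 0}

    def iter(self, handle, value):
        # Fold the next value into the current aggregate.
        handle["sum"] += value
        handle["count"] += 1

    def final(self, handle):
        # Compute the result from the state saved in the handle.
        if handle["count"] == 0:
            return None
        return handle["sum"] / handle["count"]

def aggregate(agg, values):
    """Stand-in for the engine: one init, one iter per value, one final."""
    handle = agg.init()
    for value in values:
        agg.iter(handle, value)
    return agg.final(handle)

print(aggregate(Average(), [10.0, 12.0, 4.0, 20.0]))
```

Because the engine only ever calls these three entry points, new aggregate functions can be added without changing the query processor.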

DATA CUBE and ROLLUP

SELECT Model, Year, Color, SUM(Sales) AS total,
       SUM(Sales) / total(ALL, ALL, ALL)
FROM Sales
WHERE Model IN {‘Ford’, ‘Chevy’}
  AND Year BETWEEN 1990 AND 1992
GROUP BY CUBE(Model, Year, Color);

[Figure: The Data Cube and The Sub-Space Aggregates. Panels: Aggregate (Sum); Group By (with total) by Color (RED, WHITE, BLUE); Cross Tab by Make (Chevy, Ford) and by Color; Data Cube over Make (CHEVY, FORD), Year (1990 to 1993) and Color, with sub-aggregates By Make, By Year, By Color, By Make & Year, By Make & Color, By Color & Year, and Sum.]
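What GROUP BY CUBE computes can be sketched in plain Python: one SUM for every subset of the grouping attributes, with the marker ALL standing in for each attribute that has been aggregated away. The sales rows are invented sample data, and `cube` is a naive illustration that makes no attempt at the optimizations a real engine would use.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"

def cube(rows, dims, measure):
    """Sum `measure` for every combination of kept and rolled-up dimensions."""
    totals = defaultdict(int)
    for row in rows:
        for k in range(len(dims) + 1):
            for kept in combinations(dims, k):
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)

sales = [
    {"Model": "Chevy", "Year": 1990, "Color": "red", "Sales": 5},
    {"Model": "Chevy", "Year": 1990, "Color": "blue", "Sales": 3},
    {"Model": "Ford", "Year": 1991, "Color": "red", "Sales": 4},
]
result = cube(sales, ["Model", "Year", "Color"], "Sales")

print(result[(ALL, ALL, ALL)])        # grand total
print(result[("Chevy", ALL, ALL)])    # by make
print(result[(ALL, ALL, "red")])      # by color
print(result[("Chevy", 1990, ALL)])   # by make and year
```

With three attributes each row contributes to 2^3 = 8 groupings, which is why the cube contains the plain GROUP BY, every cross-tab and the grand total at once.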


A Dozen Information Technology Research Goals

1. Scalability: Devise a software and hardware architecture that scales up by a factor of 10^6. That is, an application’s storage and processing capacity can automatically grow by a factor of a million, doing jobs faster (10^6x speedup) or doing larger jobs in the same time (10^6x scale-up), just by adding more resources.

2. The Turing Test: Build a computer system that wins the imitation game at least 30% of the time.

3. Speech to text: Hear as well as a native speaker.

4. Text to speech: Speak as well as a native speaker.

5. See as well as a person: Recognize objects and motion.

6. Personal Memex: Record everything a person sees and hears and quickly retrieve any item on request.

7. World Memex: Build a system that, given a text corpus, can answer questions about and summarize the text as precisely and quickly as a human expert in that field. Do the same for music, images, art and cinema.

8. Telepresence: Simulate being some other place retrospectively as an observer (Teleobserver): hear and see as well as actually being there and as well as a participant. Simulate being some other place as a participant (Telepresent): interacting with others and with the environment as though you are actually there.

9. Trouble-Free Systems: Build a system used by millions of people each day and yet administered and managed by a single part-time person.

10. Secure System: Assure that the system of problem 9 services only authorized users, service cannot be denied by unauthorized users and information cannot be stolen (and prove it).

11. Always Up: Assure that the system is unavailable for less than one second per hundred years – eight 9s of availability (and prove it).

12. Automatic Programmer: Devise a specification language or user interface that
– Makes it easy for people to express designs (1,000x easier),
– Computer can compile, and
– Can describe all applications (is complete).
The system should reason about the application, asking questions about exception cases and incomplete specification. But it should not be onerous to use.

Computer Industry Laws (Rules of thumb)

● Metcalfe’s law
● Moore’s first law
● Bell’s computer classes (7 price tiers)
● Bell’s platform evolution
● Bell’s platform economics
● Bill’s law
● Software economics
● Grove’s law
● Moore’s second law
● Is info-demand infinite?
● The death of Grosch’s law

Gordon Bell’s Seven Price Tiers

10$: wrist watch computers
100$: pocket/palm computers
1,000$: portable computers
10,000$: personal computers (desktop)
100,000$: departmental computers (closet)
1,000,000$: site computers (glass house)
10,000,000$: regional computers (glass castle)

Super server: costs more than $100,000
“Mainframe”: costs more than $1 million

Must be an array of processors, disks, tapes, comm ports

Information at your fingertips.

Bill Gates is known for his long-standing belief that, as he once put it, “any piece of information you want should be available to you” (Putting Information at Your Fingertips).

Gates championed it as early as 1989, and he was in a position to do something about it. It remained his overriding goal for the next two decades.

The Vision: Global Data Federation

● Massive datasets live near their owners:
– Near the instrument’s software pipeline
– Near the applications
– Near data knowledge and curation
● Each Archive publishes a (web) service
– Schema: documents the data
– Methods on objects (queries)
● Scientists get “personalized” extracts
● Uniform access to multiple Archives
– A common global schema

Gray and Bell worked closely at Digital and at Microsoft’s Bay Area Research Center since 1994

● MyLifeBits

● Terra Server

Gordon Bell’s MyLifeBits

● MyLifeBits is a lifetime store of everything. It is the fulfillment of Vannevar Bush’s 1945 Memex vision including full-text search, text and audio annotations, and hyperlinks.

● The experiment: Gordon Bell has captured a lifetime's worth of articles, books, cards, CDs, letters, memos, papers, photos, pictures, presentations, home movies, videotaped lectures, and voice recordings and stored them digitally. He is now paperless, and is beginning to capture phone calls, IM transcripts, television, and radio.


TerraServer

In late spring of 1996, Paul Flessner, the General Manager of the SQL Server team asked our lab to build a database application that would test and demonstrate the scalability of the next release of SQL Server code named “Sphinx”.

One of Jim’s greatest abilities was to clearly define and articulate the problem. The SQL team gave us two goals:

1. Test SQL’s ability to scale up to support a database of one terabyte or larger.
2. An internet application where SQL marketing could demonstrate Windows and SQL Server’s scalability.


About moving research to production

“ideas don’t transfer, people transfer…”

TerraServer Requirements

● BIG: 1 TB of data including catalog, temporary space, etc.
● PUBLIC: available on the world wide web
● INTERESTING: to a wide audience
● ACCESSIBLE: using standard browsers (IE, Netscape)
● REAL: a LOB application (users can buy imagery)
● FREE: cannot require an NDA or money from a user for access
● FAST: usable on low-speed (56kbps) and high-speed (T-1+) connections
● EASY: we do not want a large group to develop, deploy, or maintain the application
● CHEAP: an unwritten requirement (1) because TerraServer was only a prototype, test, and free demonstration; and (2) Jim Gray was a very frugal person!


United States Geological Survey (USGS)

An Interesting Internet Server

 SOVINFORMSPUTNIK (the Russian Space Agency) and Aerial Images

http://msdn.microsoft.com/en-us/library/aa226316(v=sql.70).aspx

Thesis: Scaleable Servers

● Scaleable Servers
– Commodity hardware allows new applications
– New applications need huge servers
– Clients and servers are built of the same “stuff”
• Commodity software and
• Commodity hardware
● Servers should be able to
– Scale up (grow node by adding CPUs, disks, networks)
– Scale out (grow by adding nodes)
– Scale down (can start small)
● Key software technologies
– Objects, Transactions, Clusters, Parallelism


Scaleable Servers: BOTH SMP And Cluster

● Grow up with SMP; 4xP6 is now standard
● Grow out with cluster
● Cluster has inexpensive parts

[Figure labels: Personal system, Departmental server, SMP super server, Cluster of PCs]

SMPs Have Advantages

● Single system image: easier to manage, easier to program threads in shared memory, disk, Net
● 4x SMP is commodity
● Software capable of 16x
● Problems:
– >4 not commodity
– Scale-down problem (starter systems expensive)
● There is a BIGGEST one

[Figure labels: Personal system, Departmental server, SMP super server]

Grow UP and OUT

1 billion transactions per day

1 Terabyte DB

Cluster:
• a collection of nodes
• as easy to program and manage as a single node

[Figure labels: Personal system, Departmental server, SMP super server]

Clusters Have Advantages

● Clients and servers made from the same stuff
● Inexpensive:
– Built with commodity components
● Fault tolerance:
– Spare modules mask failures
● Modular growth
– Grow by adding small modules
● Unlimited growth: no biggest one

Windows NT Clusters

● Microsoft & 60 vendors defining NT clusters
– Almost all big hardware and software vendors involved
● No special hardware needed - but it may help
● Fault-tolerant first, scaleable second
– Microsoft, Oracle, SAP giving demos today
● Enables
– Commodity fault-tolerance
– Commodity parallelism (data mining, virtual reality…)
– Also great for workgroups!

Parallelism: The OTHER aspect of clusters

● Clusters of machines allow two kinds of parallelism
– Many little jobs: online transaction processing
• TPC-A, B, C…
– A few big jobs: data search and analysis
• TPC-D, DSS, OLAP
● Both give automatic parallelism

Kinds of Parallel Execution

Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey

[Figure: two kinds of parallel execution, each built from "Any Sequential Program" boxes. Pipeline; Partition (outputs split N ways, inputs merge M ways).]

Data Rivers: Split + Merge Streams

Producers add records to the river, consumers consume records from the river.
Purely sequential programming.
River does flow control and buffering, and does partition and merge of data records.
River = Split/Merge in Gamma = Exchange operator in Volcano.

[Figure: N producers feed the River, which feeds M consumers over N x M data streams.]
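A very small sketch of the river idea: N producers write records without knowing who will read them, the river splits the records across M consumer streams, and each consumer stays an ordinary sequential program. The routing rule (record value modulo M) and all names are assumptions for illustration; a real river would also handle buffering and flow control, which this sketch omits.

```python
def river(producers, m, partition_of):
    """Split every producer's records across m consumer streams and merge per stream."""
    streams = [[] for _ in range(m)]
    for producer in producers:            # N independent sequential producers
        for record in producer:
            streams[partition_of(record) % m].append(record)
    return streams

def consumer(stream):
    """An ordinary sequential program; here it just sums its input."""
    total = 0
    for record in stream:
        total += record
    return total

producers = [range(0, 5), range(5, 10)]            # N = 2 producers
streams = river(producers, 3, lambda record: record)  # M = 3 consumers
results = [consumer(stream) for stream in streams]

print(streams)
print(results)
```

Neither the producers nor the consumers contain any parallel code; all knowledge of the N x M fan-out lives in the river.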

Partitioned Execution

[Figure: a table partitioned by key range (A...E, F...J, K...N, O...S, T...Z); a Count runs on each partition and the partial counts feed a final Count.]

Spreads computation and IO among processors

Partitioned data gives NATURAL parallelism
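The partitioned count can be sketched as follows: rows are split into the five key ranges named on the slide, each partition is counted independently, and the partial counts are combined. The names in the sample table are invented, and a thread pool is used only to show the structure; in CPython, threads illustrate the decomposition rather than deliver real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

RANGES = [("A", "E"), ("F", "J"), ("K", "N"), ("O", "S"), ("T", "Z")]

def partition(names):
    """Assign each row to the key range that contains its first letter."""
    parts = [[] for _ in RANGES]
    for name in names:
        first = name[0].upper()
        for i, (low, high) in enumerate(RANGES):
            if low <= first <= high:
                parts[i].append(name)
                break
    return parts

def parallel_count(names):
    """Count each partition independently, then combine the partial counts."""
    parts = partition(names)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partial = list(pool.map(len, parts))
    return partial, sum(partial)

table = ["Adams", "Baker", "Gray", "Kim", "Olsen", "Putzolu", "Turing", "Zhang"]
partial_counts, total = parallel_count(table)
print(partial_counts, total)
```

Each partial count touches only its own partition, so the work and the IO spread across workers without any coordination until the final sum.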

N x M way Parallelism

[Figure: data partitioned by key range (A...E, F...J, K...N, O...S, T...Z); each of the five partitions has its own Sort and Join, and the results flow into three Merge operators.]

N inputs, M outputs, no bottlenecks.

Partitioned Data

Partitioned and Pipelined Data Flows

Year 2000 4B Machine

The Year 2000 commodity PC

● Billion Instructions/Sec
● .1 Billion Bytes RAM
● Billion Bits/s Net
● 10 B Bytes Disk
● Billion Pixel display
– 3000 x 3000 x 24
● 1,000 $

[Figure labels: 1 Bips Processor, .1 B byte RAM, 10 GB Disk, 1 B bits/sec LAN/WAN]

Jim Gray & Gordon Bell: 1997 presentations

Super Server: 4T Machine

● Array of 1,000 4B machines
– 1 Bips processors
– 1 BB DRAM
– 10 BB disks
– 1 Bbps comm lines
– 1 TB tape robot
● A few megabucks
● Challenge:
– Manageability
– Programmability
– Security
– Availability
– Scaleability
– Affordability
● As easy as a single system

Future servers are CLUSTERS of processors, discs

Distributed database techniques make clusters work

[Figure: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB Disc]

Jim Gray & Gordon Bell: 1997 presentations

Jim Gray’s quest for real problems and real data … led to a collaboration with Astronomers.

Alex Szalay

Why Astronomy Data?

● It has no commercial value
– No privacy concerns
– Can freely share results with others
– Great for experimenting with algorithms
● It is real and well documented
– High-dimensional data (with confidence intervals)
– Spatial data
– Temporal data
● Many different instruments from many different places and many different times
● Federation is a goal
● There is a lot of it (petabytes)

Availability and ability to handle very large volumes of storage and complex computing is redefining how we do Science

First Paradigm: For thousands of years, Science was about empirically describing natural phenomena

Galileo and his telescope

Second Paradigm: Theoretical Science using models and generalization

Newton

Kepler

Maxwell

Third Paradigm: Computational Science: Simulating Complex Phenomena

Over the last 25 years, scientists have used computer simulation to validate theories.

[Figure: a hurricane computer simulation.]

Fourth Paradigm: Data Intensive Science

The scientific method was traditionally driven by hypothesis.

Scientists first predict a response, then collect experimental data to validate the predictions.

However, in the new data-driven approach, researchers start by collecting data and analyze it later.

Scientists are collecting data. How to codify data and extract insights and knowledge?

[Figure labels: Experiments and Instruments, Simulations, Literature, Other Archives, Question, Answer]

Astronomy

● Help build world-wide telescope
– All astronomy data and literature online and cross indexed
– Tools to analyze the data
● Built SkyServer.SDSS.org
● Built Analysis system
– MyDB
– CasJobs (batch job)
● Results:
– It works and is used every day
– Spatial extensions in SQL 2005
– A good example of Data Grid
– Good examples of Web Services.


World Wide Telescope
Virtual Observatory
http://www.us-vo.org/ http://www.ivoa.net/

● Premise: Most data is (or could be) online
● So, the Internet is the world’s best telescope:
– It has data on every part of the sky
– In every measured spectral band: optical, x-ray, radio..
– As deep as the best instruments (2 years ago).
– It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no..).
– It’s a smart telescope: links objects and data to literature on them.


SkyServer.SDSS.org
● A modern archive
– Access to Sloan Digital Sky Survey Spectroscopic and Optical surveys
– Raw Pixel data lives in file servers
– Catalog data (derived objects) lives in Database
– Online query to any and all
● Also used for education
– 150 hours of online Astronomy
– Implicitly teaches data analysis
● Interesting things
– Spatial data search
– Client query interface via Java Applet
– Query from Emacs, Python, ….
– Cloned by other surveys (a template design)
– Web services are core of it.


SkyServer
SkyServer.SDSS.org

● Like the TerraServer, but looking the other way: a picture of ¼ of the universe

● Sloan Digital Sky Survey Data: Pixels + Data Mining

● About 400 attributes per “object”

● Spectrograms for 1% of objects


SkyQuery


SkyQuery (http://skyquery.net/)

● Distributed Query tool using a set of web services
● Many astronomy archives from Pasadena, Chicago, Baltimore, Cambridge (England)
● Has grown from 4 to 15 archives, now becoming international standard
● WebService Poster Child
● Allows queries like:
    SELECT o.objId, o.r, o.type, t.objId
    FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
    WHERE XMATCH(o,t)<3.5
    AND AREA(181.3,-0.76,6.5)
    AND o.type=3 and (o.I - t.m_j)>2
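The XMATCH clause in the query above pairs objects from two archives that sit at the same place on the sky. The slide does not define XMATCH or the units of its 3.5 threshold, so the following Python sketch substitutes a plain angular-distance cut as a stand-in; it is not SkyQuery's actual matching rule. The function names, the catalog layout, and the sample coordinates are all hypothetical.

```python
import math


def angular_separation_arcsec(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Great-circle distance between two sky positions, in arcseconds (haversine formula)."""
    ra1, dec1, ra2, dec2 = (math.radians(v) for v in (ra1_deg, dec1_deg, ra2_deg, dec2_deg))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin() out of its domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def cross_match(catalog_a, catalog_b, max_sep_arcsec):
    """Return (id_a, id_b) pairs whose positions lie within max_sep_arcsec of each other."""
    matches = []
    for a in catalog_a:
        for b in catalog_b:
            sep = angular_separation_arcsec(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= max_sep_arcsec:
                matches.append((a["objId"], b["objId"]))
    return matches


# Made-up positions, for illustration only (assumed to be in degrees).
optical = [{"objId": 1, "ra": 181.3, "dec": -0.76}]
infrared = [
    {"objId": 10, "ra": 181.3, "dec": -0.76 + 1.0 / 3600},  # about 1 arcsecond away
    {"objId": 11, "ra": 181.4, "dec": -0.76},                # about 6 arcminutes away
]
print(cross_match(optical, infrared, max_sep_arcsec=3.5))  # prints [(1, 10)]
```

The nested loop compares every pair, which is fine for a toy example; an archive-scale service would need a spatial index to avoid that quadratic cost.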


SkyServer/SkyQuery Evolution MyDB and Batch Jobs

Problem: need multi-step data analysis (not just single query).

Solution: Allow personal databases on portal

Problem: some queries are monsters

Solution: “Batch schedule” on portal. Deposits answer in personal database.
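The two solutions above can be illustrated together: a queue of jobs runs one at a time, each job deposits its answer as a table in a personal database, and a later job can query an earlier job's output. This is a minimal sketch, not the CasJobs implementation. SQLite stands in for the portal's database, and `run_batch_jobs`, `make_portal`, the `photo` table, the `mydb` schema name, and the rows are all hypothetical.

```python
import sqlite3
from collections import deque


def run_batch_jobs(conn, jobs):
    """Run queued (output_table, select_sql) jobs in order, saving each result as a table."""
    queue = deque(jobs)
    finished = []
    while queue:
        output_table, select_sql = queue.popleft()
        # Table names cannot be bound as parameters; only trusted names belong here.
        conn.execute(f"CREATE TABLE {output_table} AS {select_sql}")
        finished.append(output_table)
    conn.commit()
    return finished


def make_portal():
    """Build a toy portal: a shared catalog plus an attached, initially empty personal database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS mydb")
    conn.execute("CREATE TABLE photo (objId INTEGER, r REAL, type INTEGER)")
    conn.executemany(
        "INSERT INTO photo VALUES (?, ?, ?)",
        [(1, 17.2, 3), (2, 21.5, 3), (3, 16.8, 6)],  # made-up rows
    )
    return conn


conn = make_portal()
jobs = [
    # Step 1 reads the shared catalog and deposits its answer in the personal database.
    ("mydb.galaxies", "SELECT objId, r FROM photo WHERE type = 3"),
    # Step 2 builds on step 1's output instead of re-running the first query.
    ("mydb.bright_galaxies", "SELECT objId FROM mydb.galaxies WHERE r < 18"),
]
print(run_batch_jobs(conn, jobs))  # prints ['mydb.galaxies', 'mydb.bright_galaxies']
print(conn.execute("SELECT objId FROM mydb.bright_galaxies").fetchall())  # prints [(1,)]
```

Keeping intermediate results next to the data means only the final, small answer has to leave the portal.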


Ecosystem Sensor Net
LifeUnderYourFeet.Org

● Small sensor net monitoring soil
● Sensors feed to a database
● Helping build system to collect & organize data.
● Working on data analysis tools
● Prototype for other LIMS (Laboratory Information Management Systems)


RNA Structural Genomics
● Goal: Predict secondary and tertiary structure from sequence. Deduce tree of life.

● Technique: Analyze sequence variations sharing a common structure across tree of life

● Representing structurally aligned sequences is a key challenge

● Creating a database-driven alignment workbench accessing public and private sequence data


VHA Health Informatics
● VHA: largest standardized electronic medical records system in US.
● Design, populate and tune a ~20 TB Data Warehouse and Analytics environment
● Evaluate population health and treatment outcomes
● Support epidemiological studies
– 7 million enrollees
– 5 million patients
– Example Milestones:

• 1 Billionth Vital Sign loaded in April ‘06

• 30-minutes to population-wide obesity analysis (next slide)

• Discovered seasonality in blood pressure -- NEJM fall ‘06


Wt/Ht | 5ft 0in | 5ft 1in | 5ft 2in | 5ft 3in | 5ft 4in | 5ft 5in | 5ft 6in | 5ft 7in | 5ft 8in | 5ft 9in | 5ft 10in | 5ft 11in | 6ft 0in | 6ft 1in | 6ft 2in | 6ft 3in | 6ft 4in | 6ft 5in
100 | 230 | 211 | 334 | 276 | 316 | 364 | 346 | 300 | 244 | 172 | 114 | 73 | 58 | 16 | 11 | 3 | 1 | 1
105 | 339 | 364 | 518 | 532 | 558 | 561 | 584 | 515 | 436 | 284 | 226 | 144 | 102 | 25 | 13 | 4 | 4 | 1
110 | 488 | 489 | 836 | 815 | 955 | 972 | 1,031 | 899 | 680 | 521 | 395 | 256 | 161 | 70 | 23 | 10 | 6 | 4
115 | 526 | 614 | 1,018 | 1,098 | 1,326 | 1,325 | 1,607 | 1,426 | 1,175 | 903 | 598 | 451 | 264 | 84 | 59 | 17 | 6 | 4
120 | 644 | 714 | 1,419 | 1,583 | 1,964 | 2,153 | 2,612 | 2,374 | 1,933 | 1,450 | 1,085 | 690 | 501 | 153 | 95 | 38 | 13 | 9
125 | 672 | 855 | 1,682 | 1,933 | 2,628 | 3,005 | 3,521 | 3,405 | 2,929 | 2,197 | 1,538 | 1,144 | 756 | 253 | 114 | 46 | 32 | 8
130 | 753 | 944 | 1,984 | 2,392 | 3,462 | 3,968 | 5,039 | 4,827 | 4,285 | 3,223 | 2,378 | 1,765 | 1,182 | 429 | 214 | 81 | 41 | 12
135 | 753 | 1,062 | 2,173 | 2,852 | 4,105 | 4,912 | 6,535 | 6,535 | 5,797 | 4,500 | 3,393 | 2,467 | 1,668 | 596 | 309 | 108 | 70 | 15
140 | 754 | 1,073 | 2,300 | 3,177 | 4,937 | 6,286 | 8,769 | 8,750 | 7,939 | 6,303 | 4,837 | 3,493 | 2,534 | 977 | 513 | 144 | 106 | 22
145 | 748 | 1,053 | 2,254 | 3,389 | 5,412 | 7,334 | 10,485 | 11,004 | 10,576 | 8,084 | 6,511 | 4,686 | 3,344 | 1,207 | 680 | 221 | 140 | 41
150 | 730 | 1,077 | 2,361 | 3,596 | 6,152 | 8,665 | 12,772 | 14,335 | 13,866 | 11,255 | 9,250 | 6,545 | 4,796 | 1,792 | 979 | 350 | 162 | 48
155 | 683 | 923 | 2,178 | 3,391 | 6,031 | 8,891 | 14,181 | 15,899 | 16,594 | 13,517 | 11,489 | 8,056 | 5,741 | 2,155 | 1,203 | 472 | 249 | 70
160 | 671 | 872 | 2,106 | 3,532 | 6,184 | 9,580 | 15,493 | 18,869 | 19,939 | 17,046 | 14,650 | 10,366 | 7,708 | 2,831 | 1,618 | 615 | 341 | 100
165 | 627 | 772 | 1,894 | 3,074 | 5,773 | 9,549 | 16,332 | 20,080 | 22,507 | 19,692 | 17,729 | 12,588 | 9,558 | 3,548 | 2,032 | 716 | 399 | 117
170 | 596 | 750 | 1,716 | 2,900 | 5,428 | 9,080 | 16,633 | 21,550 | 25,051 | 22,568 | 21,198 | 15,552 | 12,093 | 4,548 | 2,626 | 944 | 489 | 124
175 | 493 | 674 | 1,521 | 2,551 | 4,816 | 8,417 | 15,900 | 21,420 | 26,262 | 24,277 | 23,756 | 18,194 | 13,817 | 5,361 | 3,178 | 1,152 | 586 | 144
180 | 486 | 599 | 1,411 | 2,323 | 4,584 | 7,855 | 15,482 | 20,873 | 26,922 | 26,067 | 26,313 | 20,358 | 16,459 | 6,451 | 3,848 | 1,441 | 737 | 207
185 | 420 | 546 | 1,195 | 1,985 | 3,905 | 6,918 | 13,406 | 19,362 | 25,818 | 25,620 | 27,037 | 21,799 | 18,172 | 7,206 | 4,458 | 1,548 | 867 | 247
190 | 424 | 495 | 1,073 | 1,729 | 3,383 | 5,909 | 11,918 | 17,640 | 24,277 | 25,263 | 27,398 | 22,697 | 19,977 | 8,344 | 4,937 | 1,858 | 963 | 287
195 | 341 | 463 | 913 | 1,474 | 2,803 | 5,207 | 10,584 | 15,727 | 22,137 | 23,860 | 26,373 | 22,513 | 20,163 | 8,754 | 5,683 | 2,178 | 1,120 | 309
200 | 315 | 384 | 763 | 1,338 | 2,602 | 4,551 | 9,413 | 14,149 | 20,608 | 22,541 | 25,452 | 23,358 | 21,548 | 9,284 | 6,221 | 2,294 | 1,295 | 372
205 | 265 | 338 | 633 | 1,026 | 1,993 | 3,736 | 7,765 | 11,940 | 17,501 | 19,944 | 23,065 | 21,094 | 20,354 | 9,270 | 6,350 | 2,597 | 1,322 | 376
210 | 275 | 284 | 543 | 853 | 1,794 | 3,148 | 6,804 | 10,540 | 15,647 | 18,129 | 21,862 | 20,540 | 20,271 | 9,566 | 6,816 | 2,786 | 1,509 | 418
215 | 205 | 244 | 501 | 746 | 1,389 | 2,645 | 5,747 | 8,712 | 13,064 | 15,560 | 19,089 | 18,191 | 19,063 | 9,019 | 6,675 | 2,798 | 1,509 | 454
220 | 168 | 208 | 415 | 652 | 1,231 | 2,326 | 4,950 | 7,751 | 11,645 | 13,900 | 17,577 | 17,239 | 17,583 | 8,896 | 6,818 | 2,948 | 1,635 | 484
225 | 156 | 160 | 325 | 522 | 968 | 1,873 | 4,015 | 6,340 | 9,794 | 11,890 | 14,898 | 15,097 | 15,741 | 8,332 | 6,441 | 2,915 | 1,647 | 452
230 | 141 | 160 | 259 | 486 | 880 | 1,653 | 3,334 | 5,410 | 8,657 | 10,500 | 13,532 | 13,488 | 14,815 | 7,901 | 6,258 | 2,859 | 1,701 | 496
235 | 115 | 119 | 244 | 373 | 738 | 1,251 | 2,795 | 4,570 | 7,192 | 8,784 | 11,489 | 11,857 | 12,796 | 7,113 | 5,544 | 2,744 | 1,617 | 465
240 | 72 | 116 | 214 | 313 | 562 | 1,099 | 2,422 | 3,861 | 6,044 | 7,652 | 9,982 | 10,692 | 11,825 | 6,496 | 5,392 | 2,606 | 1,581 | 449
245 | 71 | 76 | 169 | 253 | 509 | 888 | 1,858 | 3,167 | 5,076 | 6,446 | 8,312 | 8,647 | 9,910 | 5,638 | 4,742 | 2,263 | 1,479 | 469
250 | 70 | 55 | 152 | 226 | 452 | 753 | 1,647 | 2,826 | 4,505 | 5,509 | 7,569 | 8,064 | 8,900 | 5,183 | 4,319 | 2,177 | 1,451 | 469
255 | 59 | 61 | 128 | 174 | 316 | 599 | 1,289 | 2,130 | 3,468 | 4,540 | 5,957 | 6,451 | 7,438 | 4,320 | 3,741 | 1,903 | 1,271 | 443
260 | 50 | 64 | 117 | 167 | 281 | 493 | 1,107 | 1,929 | 2,963 | 3,947 | 5,190 | 5,797 | 6,725 | 3,900 | 3,429 | 1,828 | 1,218 | 481
265 | 37 | 34 | 88 | 122 | 234 | 454 | 894 | 1,449 | 2,457 | 3,152 | 4,374 | 4,818 | 5,729 | 3,350 | 2,984 | 1,539 | 1,028 | 406
270 | 47 | 42 | 67 | 119 | 203 | 367 | 800 | 1,291 | 2,110 | 2,740 | 3,878 | 4,133 | 5,075 | 2,934 | 2,685 | 1,468 | 918 | 403
275 | 22 | 34 | 44 | 85 | 184 | 291 | 662 | 1,064 | 1,767 | 2,235 | 3,113 | 3,412 | 4,267 | 2,598 | 2,362 | 1,247 | 837 | 334
280 | 21 | 20 | 51 | 69 | 139 | 286 | 548 | 903 | 1,513 | 1,955 | 2,770 | 3,126 | 3,604 | 2,273 | 2,020 | 1,152 | 763 | 300
285 | 12 | 12 | 36 | 68 | 118 | 201 | 451 | 720 | 1,318 | 1,613 | 2,208 | 2,394 | 3,132 | 1,924 | 1,780 | 994 | 677 | 241
290 | 16 | 14 | 47 | 38 | 92 | 182 | 387 | 667 | 1,050 | 1,301 | 1,904 | 2,150 | 2,655 | 1,749 | 1,529 | 881 | 688 | 252
295 | 9 | 12 | 22 | 53 | 92 | 127 | 341 | 493 | 838 | 1,162 | 1,577 | 1,823 | 2,338 | 1,445 | 1,333 | 813 | 533 | 202
300 | 12 | 10 | 30 | 43 | 59 | 117 | 309 | 434 | 764 | 988 | 1,428 | 1,588 | 1,989 | 1,255 | 1,212 | 709 | 479 | 205

Legend
BMI < 18: Underweight
BMI 18-24.9: Healthy Weight
BMI 25-29.9: Overweight
BMI 30+: Obese

VHA Patients in BMI Categories (Based upon vitals from FY04)

DRAFT

HDR Vitals Based Body Mass Index Calculation on VHA FY04 Population
Source: VHA Corporate Data Warehouse

Total Patients
Underweight: 23,876 (0.7%)
Healthy Weight: 701,089 (21.6%)
Overweight: 1,177,093 (36.2%)
Obese: 1,347,098 (41.5%)
All categories: 3,249,156 (100%)
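Each cell of the weight-by-height table falls into one of the legend's four BMI bands. The sketch below shows one way such a classification could be computed. It assumes weights are in pounds and heights in inches (the slide states no units) and uses the standard imperial formula BMI = 703 × lb / in². Treating each row label as an exact weight is a simplification, since the slide does not say how weights were binned, and the function names are hypothetical.

```python
def bmi_from_imperial(weight_lb, height_in):
    """Body mass index from pounds and inches: 703 * lb / in^2."""
    return 703.0 * weight_lb / (height_in ** 2)


def bmi_category(bmi):
    """Map a BMI value to the legend's four categories."""
    if bmi < 18:
        return "Underweight"
    if bmi < 25:
        return "Healthy Weight"
    if bmi < 30:
        return "Overweight"
    return "Obese"


def tally_by_category(cells):
    """Sum patient counts per category; cells maps (weight_lb, height_in) to a count."""
    totals = {}
    for (weight_lb, height_in), count in cells.items():
        category = bmi_category(bmi_from_imperial(weight_lb, height_in))
        totals[category] = totals.get(category, 0) + count
    return totals


# Three cells copied from the table: (weight, height in inches) -> patients.
# 6ft 5in = 77 in, 5ft 8in = 68 in, 5ft 0in = 60 in.
sample_cells = {(100, 77): 1, (150, 68): 13866, (300, 60): 12}
print(tally_by_category(sample_cells))
# prints {'Underweight': 1, 'Healthy Weight': 13866, 'Obese': 12}
```

Running the same tally over every cell of the table is the kind of population-wide aggregation the slide summarizes in its "Total Patients" figures.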


Jim Gray’s work on Fourth Paradigm and eScience has had a profound impact on the scientific community.

This work continues …


Jim Gray eScience Award

Each year, Microsoft Research presents the Jim Gray eScience Award to a researcher who has made an outstanding contribution to the field of data-intensive computing. The award recognizes innovators whose work truly makes science easier for scientists.


Jim Gray’s Legacy

● The Prolific Writer
– Jim Gray’s two rules for authorship:
• The person who types puts their name first, and
• It’s easier to add a name to the list of authors than deal with someone’s hurt feelings.
● The Masterful Presenter
● The Sense of Community
● The Patient Listener

Ideas

People

Community


Jim’s Life was a Textbook on Mentoring

● Making time
● Simply Listening
● Inspiring Self-Confidence
● Lighting the Way
● Nurturing and Pushing
● Following the Muse
● Connecting Good People and Good Ideas Without Boundaries
● Promoting the Young
● Sharing Knowledge Selflessly
● Displaying Professional Integrity
● Advocating for the Field
● Keeping things in Perspective
● Being a friend


Lost at Sea …. January 28, 2007


The Search for Jim Gray


The University of California, Berkeley and Gray's family hosted a tribute to him on May 31, 2008.
http://www.youtube.com/user/UCBerkeleyEvents/videos?query=jim+gray
