Potential use of Cloud computing for streamlining the processing of MT data Prof J Craig Mudge FTSE Collaborative Cloud Computing Lab (C3L)



New eScience Lab enabled by cloud computing

Seed funding from
-- minerals and geothermal research at www.pir.sa.gov.au
-- Microsoft Research USA Jim Gray Seed Grant

JCM 30 Sept 2010

Prof Graham Heinson Prof J Craig Mudge FTSE

Stephan Thiel Pinaki Chan

Jared Peacock Wei Wang

Andrew Wendelborn

Acknowledgements: David Giles, Richard Lane, Tim Baker, Tristan Wurst

Magnetotelluric (MT) imaging

1. Using the magnetic and electric fields of the earth, MT imaging determines the resistivity structure of a sub-surface area of interest.

2. It goes deeper (a hundred or so km) than seismic (<2 km) but does not have the same resolution.

3. Applications
   1. mineral exploration
   2. water management in mining
   3. geothermal exploration
   4. carbon storage
   5. aquifer research and management
   6. earthquake and volcano studies

CO2 in depleted gas field (Heinson and Mudge, 2010)

Overview of cloud computing


Ahead for research in minerals and energy

1. Data deluge
   - Gene sequencers: 25 Terabytes per day
   - Large Hadron Collider: 700 MB of data per second, 60 TB/day, 20 PB/year
   - Square Kilometre Array: Petabytes per day

2. Computation, e.g., rapid inversion

3. Data and experiments: curation, provenance, sharing, reuse
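The figures of 700 MB per second, 60 TB/day and 20 PB/year quoted on this slide are roughly consistent with one another, which a quick calculation shows. This is a minimal sketch, assuming decimal units (1 TB = 1,000,000 MB, 1 PB = 1,000 TB) and continuous running all year; the function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def daily_and_yearly_volume(mb_per_second):
    """Convert a sustained rate in MB/s to (TB/day, PB/year), decimal units."""
    tb_per_day = mb_per_second * SECONDS_PER_DAY / 1_000_000
    pb_per_year = tb_per_day * 365 / 1_000
    return tb_per_day, pb_per_year

tb_day, pb_year = daily_and_yearly_volume(700)
print(f"{tb_day:.1f} TB/day, {pb_year:.1f} PB/year")
```

Under these assumptions the daily volume comes out close to 60 TB and the yearly volume a little above 20 PB, so the slide's yearly figure is plausibly a rounded value or allows for downtime.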


Approx 100,000 PCs in Google data centres

Google Goose-Creek Google Dalles Oregon

From www.cloudinnovation.com.au


Essence of cloud

1. software as a service – applications are delivered over the Internet with a common-or-garden browser

2. significant cost savings, factors of 5x – 7x

3. presented as a utility with a matching business model, namely pay-per-use

4. a new data-parallel programming framework

Cost savings in warehouse-sized data centres

1. resources in massive warehouse-sized data centres are pooled at scale,

2. built from low-cost commodity chips and disks (run time environment of MapReduce, Dryad takes care of fault tolerance, scheduling, and load balancing)

3. share the overhead of cooling, refrigeration, physical security, and backup power,

Execution of MapReduce

The Map step is shown as M in the following slides

(Dean and Ghemawat, 2004)
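To make the Map and Reduce steps concrete, here is a single-process sketch of the programming model, using the word-count example from Dean and Ghemawat. It is not a distributed implementation: the scheduling, fault tolerance and load balancing that a real MapReduce or Dryad run time provides are left out, and the function names are illustrative.

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Run map, group-by-key (shuffle), then reduce, all in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):      # Map: emit (key, value) pairs
            groups[key].append(value)          # Shuffle: group values by key
    return {key: reduce_fn(key, values)        # Reduce: one call per key
            for key, values in sorted(groups.items())}

def count_map(line):
    """Emit (word, 1) for every word in a line of text."""
    return [(word, 1) for word in line.split()]

def count_reduce(word, counts):
    """Add up the ones emitted for a single word."""
    return sum(counts)

result = map_reduce(["map reduce map", "reduce map"], count_map, count_reduce)
print(result)
```

Because each call to the map function sees only one record, and each call to the reduce function sees only one key, both phases can be spread over many machines without changing the user's code.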

Decomposition

• Task decomposition
  – How can a problem be decomposed into tasks that can execute concurrently?
• Data decomposition
  – How can a problem's data be decomposed into units that can be operated on relatively independently?

then dependencies among the tasks
  – Group tasks, Order tasks, and Data Sharing
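Grouping and ordering tasks by their dependencies can be sketched with the standard-library `graphlib` module (Python 3.9 or later). The task names below are hypothetical; each group that comes out contains tasks that depend only on earlier groups and so could run concurrently.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical names).
dependencies = {
    "process_s1": {"clean_s1"},
    "process_s2": {"clean_s2"},
    "invert": {"process_s1", "process_s2"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()
groups = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())   # tasks whose dependencies are all done
    groups.append(ready)                 # these can execute concurrently
    sorter.done(*ready)

print(groups)
```

Here the two stations can be cleaned and processed side by side, while the inversion has to wait for both.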

Parallel execution of MT data - one per station

[Figure: Station 1 ... Station n each feed a Map task (M); the Map outputs are sorted by key and passed to Reduce tasks (R)]
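One Map task per station can be sketched with a worker pool. This is an illustration only: the per-station work is a placeholder (a mean), the station names and samples are made up, and threads on one machine stand in for separate cloud instances.

```python
from concurrent.futures import ThreadPoolExecutor

def process_station(item):
    """Map step: reduce one station's samples to (station, mean). Placeholder work."""
    station, samples = item
    return station, sum(samples) / len(samples)

stations = {
    "station_2": [4.0, 6.0],
    "station_1": [1.0, 2.0, 3.0],
    "station_3": [10.0],
}

with ThreadPoolExecutor(max_workers=3) as pool:
    mapped = list(pool.map(process_station, stations.items()))

by_key = sorted(mapped)                                  # "Sort by key"
summary = {station: mean for station, mean in by_key}    # trivial Reduce step
print(summary)
```

For CPU-bound processing in Python, a process pool or separate machines would be the more realistic choice; the structure of the code stays the same.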

Parallel execution of gridded exploration data by using sub grids when the original is too big to do as one grid

[Figure: Form sub grids, then Map tasks (M), then Reduce tasks (R), then Re-combine]

Concrete example: Map step is an existing MatLab program running on Amazon EC2
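Forming sub grids, processing each one, and re-combining can be sketched as follows. The per-cell operation is a stand-in for the real Map program, and splitting by rows is an assumption made to keep the example short.

```python
def split_rows(grid, rows_per_block):
    """Form sub grids: cut a 2-D list into horizontal strips."""
    return [grid[i:i + rows_per_block] for i in range(0, len(grid), rows_per_block)]

def process_block(block):
    """Map step placeholder: double every cell (stands in for the real program)."""
    return [[2 * cell for cell in row] for row in block]

def recombine(blocks):
    """Re-combine: stack the processed strips back in their original order."""
    return [row for block in blocks for row in block]

grid = [[r * 4 + c for c in range(4)] for r in range(5)]    # 5 x 4 toy grid
blocks = split_rows(grid, rows_per_block=2)                 # strips of 2, 2 and 1 rows
result = recombine([process_block(b) for b in blocks])
print(result)
```

This simple split works because the placeholder operation treats each cell independently. An operation that looks at neighbouring cells would need sub grids that overlap at their edges.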

Water: Data collection, management, and analysis in the cloud

Organisations (water, government, regulators, market operators, and researchers) will mine this data.

[Figure: data pipeline with stages labelled Data collection, aggregation (high volumes of complex heterogeneous data); Data clean; Data integration / data fusion; Data analysis; Data repurpose; Data use; Visualisation]

Wireless ad-hoc networks - mesh networked motes with sensors

Sensors -- 10 years on 2 AA batteries. Potentially energy scavenging, too

[Figure labels: 40 mm; gateway; satellite]

Metadata and databases of interest
- River data: from sensors (both mobile and moored)
- Existing data bases: Weather, Aquifer, River, Irrigation
- Remote sense (satellite)
- Historical photos
- etc

www.pacific-challenge.com Craig Mudge 13.9.2010 24.9.2010

Academy Working Group
Cloud computing at peta-scale

1. Alex Zelinsky, CSIRO Group Executive, 17 May 2010: “The Academy project has been a real catalyst for getting the cloud computing agenda moving forward in Australia.”

2. Summer internships – cloud computing
   $1,000 prize won by Jinhui Yao for his security project in an internship hosted by CSIRO

3. Report to be launched October 14 in Canberra

www.cloudinnovation.com.au

NBN: fiber/wireless net connecting mobile and fixed clients to a cloud computing infrastructure for applications & content

[Figure: NBN at the centre, linking Cloud Computing (Services & Content) and Television Content to Mobile Clients and to Fixed Clients & Client Nets]

Computer person’s view of NBN: “Continuous Services i.e. apps & Client Connected Devices”

[Figure: the same diagram, with Connected Devices added alongside Mobile Clients]

Our cloud service providers

Google Amazon Microsoft Azure

Drop box

Document sharing

Computation Computation Document sharing

Calendar Storage Code re-use

Search Search


Our first application domain, magnetotellurics



[Figure: MT processing workflow, applied to each station (Station 1, Station 2, ... Station n)]

MT Station data from logging in the field

Workflow boxes:
- Clean
- Broadband processing
- E field conversion to standard units
- BIRRP
- Convert to EDI (one per station)
- inspect with GMT plots
- Forward Modelling and Inversion

Outputs from BIRRP are (a) impedance Z, where E=ZB (b) coherence data (c) Apparent resistivity and phase
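The relation E=ZB leads directly to apparent resistivity and phase. A minimal sketch, assuming SI units throughout (E in V/m, B in tesla), a single scalar impedance rather than the full tensor, and an e^{+iωt} time dependence; field data are often logged in other units and would need converting first. The function name is illustrative and is not part of BIRRP.

```python
import cmath
import math

MU_0 = 4e-7 * math.pi   # magnetic permeability of free space, H/m

def apparent_resistivity_and_phase(z, period_s):
    """z = E/B in SI units; returns (apparent resistivity in ohm.m, phase in degrees)."""
    omega = 2 * math.pi / period_s
    rho_a = MU_0 * abs(z) ** 2 / omega
    phase_deg = math.degrees(cmath.phase(z))
    return rho_a, phase_deg

# Check against a uniform half-space of 100 ohm.m, where E/H = sqrt(i*omega*mu0*rho)
period = 10.0
omega = 2 * math.pi / period
z_half_space = cmath.sqrt(1j * omega * MU_0 * 100.0) / MU_0   # divide by mu0 to get E/B
rho_a, phase = apparent_resistivity_and_phase(z_half_space, period)
print(rho_a, phase)
```

For a uniform half-space the apparent resistivity equals the true resistivity and the phase is 45 degrees, which makes a convenient sanity check.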

[Figure labels: Time series; Apparent resistivity]

Forward model and inversion

[Flowchart]
Start
-> Compute MT response of new model
-> Compare model response and MT observed data
-> Required misfit? (Y: exit)
-> Exceeded max # of iterations? (Y: exit)
-> N to both: Update model, then compute the MT response again
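The loop in the flowchart can be sketched with a toy problem. Everything specific here is an assumption made for illustration: the forward model is a straight line rather than an MT response, the misfit is a plain RMS, and the update is a fixed-step gradient move. A real inversion code would differ in each of these, but the control flow is the same.

```python
import math

def forward(model, xs):
    """Toy forward model (a straight line); a real code would compute the MT response."""
    a, b = model
    return [a + b * x for x in xs]

def rms_misfit(predicted, observed):
    """Root-mean-square difference between model response and observed data."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

def invert(xs, observed, start, target_misfit=1e-3, max_iterations=500, step=0.1):
    model = list(start)
    for iteration in range(1, max_iterations + 1):
        predicted = forward(model, xs)                  # compute response of current model
        misfit = rms_misfit(predicted, observed)        # compare with observed data
        if misfit <= target_misfit:                     # required misfit reached: stop
            return model, misfit, iteration
        residuals = [p - o for p, o in zip(predicted, observed)]
        grad_a = sum(residuals) / len(xs)               # update model (gradient step)
        grad_b = sum(r * x for r, x in zip(residuals, xs)) / len(xs)
        model = [model[0] - step * grad_a, model[1] - step * grad_b]
    # maximum number of iterations exceeded: return the last model anyway
    return model, rms_misfit(forward(model, xs), observed), max_iterations

xs = [0.0, 1.0, 2.0, 3.0]
observed = forward([1.0, 2.0], xs)        # synthetic data from a known model
model, misfit, iterations = invert(xs, observed, start=[0.0, 0.0])
print(model, misfit, iterations)
```

The two exits correspond to the two decision boxes: the early return is the required-misfit test, and falling out of the loop is the iteration limit.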

MT Processing

1. Time series data from stations
   - Remove outliers
   - To frequency domain
   - Apply BIRRP (Chave, Thomson 1989; robust method)
   Produces apparent resistivity and phase, by frequency

2. Inversion to produce subsurface image (Siripunvaraporn et al. 2005)

Currently: ~24 hours; ~3 to 4 weeks for 3D

Chave and Thomson. Bounded influence magnetotelluric response function estimation. Geophys. Jnl. Int. 1989.
Siripunvaraporn, Egbert, Lenbury, and Uyeshima. Three dimensional magnetotelluric inversion: data-space method. Physics of the Earth and Planetary Interiors 150. 2005.
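The first two processing steps, removing outliers and moving to the frequency domain, can be illustrated in a few lines. This is not BIRRP and does not reproduce its robust estimator: the outlier rule (median absolute deviation with a fixed threshold) and the plain discrete Fourier transform are simple stand-ins chosen for the sketch, and the sample series is made up.

```python
import cmath
import statistics

def remove_outliers(samples, threshold=5.0):
    """Replace samples far from the median (in MAD units) with the median."""
    med = statistics.median(samples)
    mad = statistics.median([abs(s - med) for s in samples])
    if mad == 0:
        return list(samples)
    return [s if abs(s - med) <= threshold * mad else med for s in samples]

def dft(samples):
    """Plain discrete Fourier transform, O(n^2); fine for a short illustration."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * cmath.pi * k * t / n) for t, s in enumerate(samples))
            for k in range(n)]

series = [1.0, -1.0, 1.0, -1.0, 250.0, -1.0, 1.0, -1.0]   # one spike at index 4
cleaned = remove_outliers(series)
spectrum = dft(cleaned)
peak_bin = max(range(len(spectrum)), key=lambda k: abs(spectrum[k]))
print(cleaned, peak_bin)
```

With the spike replaced, the strongest frequency bin is the one matching the alternating signal rather than the spike's broadband energy.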

Reflections – September 2010
Value of cloud for PIRSA, our MT processing, and CRC DET

1. Access to cheap flexible computing
   1. Amazon runs Fortran, Matlab, Python, etc. E.g., T Dhu’s gridded execution
   2. On-demand purchase of a couple of hours of a more powerful computer (generally in memory – 8 Gbytes, for example); pricing is growing in sophistication – spot pricing, micro-instances, etc.

2. Parallel execution
   1. Easy to get concurrent execution of steps, e.g., 45 stations
   2. Parallel within a step (Google’s MapReduce and Dryad/LINQ) is hard work, but have made a little progress

3. Our future work on integration in multi-layered data bases has been strongly endorsed

Disappointments
Honours student gave up on Visualisation of sub-surface layers using Bing/Google Earth

eScience workflow was a major contribution (unexpected)
1. Less human interaction, repeatable, provenance, sharing of workflows internationally
2. Increasingly important, as volume of data grows

No machines Lab: “built first cloud based server, which is the SVN server for C3 Lab in the Amazon EC2 cloud.”

Craig Mudge 29.9.2010
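Pay-per-use is what makes concurrent execution of 45 stations attractive: the bill depends on instance-hours used, not on how many instances run at once. A sketch under stated assumptions: the price and hours per station are hypothetical numbers, and billing is taken to be exactly proportional to hours used, which ignores minimum charges and the spot and micro-instance pricing mentioned above.

```python
def run_cost(n_stations, hours_per_station, price_per_instance_hour, n_instances):
    """Return (wall-clock hours, total cost) when stations are spread over instances."""
    rounds = -(-n_stations // n_instances)             # ceiling division
    wall_clock = rounds * hours_per_station
    cost = n_stations * hours_per_station * price_per_instance_hour
    return wall_clock, cost

serial = run_cost(45, hours_per_station=2, price_per_instance_hour=0.5, n_instances=1)
parallel = run_cost(45, hours_per_station=2, price_per_instance_hour=0.5, n_instances=45)
print(serial, parallel)
```

Under these assumptions the cost is identical in both cases, while the elapsed time drops from 90 hours to 2.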

3/12/09 Bill Howe, UW

Scientific Workflow Systems

• Value proposition: More time on science, less time on code, admin

• How: By providing language emphasizing sharing, reuse, reproducibility, rapid prototyping, efficiency
  – Provenance
  – Visual programming
  – Integration with domain-specific tools
  – Scheduling
  – Data curation
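Provenance, in its simplest form, means recording what each step of a workflow consumed and produced so that a run can be audited and repeated. The sketch below is a toy illustration, not the API of any particular workflow system; the step names and the fingerprinting scheme are assumptions.

```python
import hashlib
import json

def digest(value):
    """Stable short fingerprint of a JSON-serialisable value."""
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()[:12]

def run_workflow(steps, data):
    """Apply each (name, function) step in order and record what went in and out."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append({"step": name, "input": digest(data), "output": digest(result)})
        data = result
    return data, provenance

steps = [
    ("drop_negatives", lambda xs: [x for x in xs if x >= 0]),
    ("scale", lambda xs: [10 * x for x in xs]),
]
output, log = run_workflow(steps, [3, -1, 2])
print(output)
for entry in log:
    print(entry)
```

Because each step's output fingerprint matches the next step's input fingerprint, the log forms a chain that links the final result back to the original data.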

2010: Honours project in Geophysics – Tristan Wurst – Steps in MT processing

Porosity Joint Inversion

Invert for a single parameter, to which both techniques are sensitive

(Rachel Maier, 2010)

[Flowchart]
Start
-> porosity model
-> density model (via Porosity-Density relationship) -> gravity model response
-> conductivity model (via Archie’s Law) -> MT model response
-> compare with gravity observed data and MT observed data
-> Is: a) required misfit obtained, or b) max. number of iterations exceeded?
   Yes: Stop
   No: update model and repeat
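The two links from the porosity model to the other models can be sketched as below. Archie's Law is written in its common fully saturated form; the porosity-density relationship used on the slide is not stated, so a linear volume-weighted mix is assumed here. All parameter values (cementation exponent m, tortuosity factor a, water conductivity, matrix and fluid densities) are hypothetical.

```python
def archie_conductivity(porosity, water_conductivity, m=2.0, a=1.0):
    """Archie's Law for a fully saturated rock: sigma = sigma_w * phi**m / a."""
    return water_conductivity * porosity ** m / a

def bulk_density(porosity, matrix_density, fluid_density=1000.0):
    """Assumed relationship: volume-weighted mix of matrix and pore fluid (kg/m^3)."""
    return (1.0 - porosity) * matrix_density + porosity * fluid_density

porosity = 0.2
sigma = archie_conductivity(porosity, water_conductivity=5.0)   # S/m
rho = bulk_density(porosity, matrix_density=2650.0)             # kg/m^3
print(sigma, rho)
```

Because both conductivity and density follow from the same porosity value, a single porosity model can be tested against MT and gravity data at once, which is the point of the joint inversion.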

Renmark Trough

[Figure: sections across the Renmark Trough, NE to SW. Panels: Seismic constrained Gravity (Devonian D=2360, Basement D=2800; Depth (km) against Distance (km)); MT Inversion (resistivity (ohm.m), depth (km) against distance (km); RMS MT = 2.3); Joint Inversion (resistivity (ohm.m), depth (km) against distance (km); RMS JI = 5.3, RMS MT = 5.3, RMS GV = 4.5)]

[Figure: data fit. Gravity g z (mGals) against distance (km); apparent resistivity (ohm.m) and phase against period (s), showing TE data, TE model response, TM data and TM model response]

[Figure labels: Data; Compute and geologist’s data integrations; Data logging with near real-time feedback; Sub-surface]

Future areas

1. Seismic
2. Inversion and forward modelling in general
3. Rapid inversion, too
4. Data integration or data fusion across multiple layers
5. Data mining

Vision: A geologist steering a drill in real time, using real-time sensing of the sub-surface and updating geological models, while referring to her cloud-based data sets and collaborating with her team back home

[Figure labels: Geologist in field; drilling machine control system; steering; Sensing – a dozen or more sensors (Seismic, XRF, Resistivity, etc); Sub-surface; Data (Seismic, Satellite, MT, Petrophysical, Cores, Density, etc); Compute and geologist’s data integrations; Collaboration]

Searching the Deep Earth: sustaining your wealth for the next century

High Flyers Think Tank

Canberra, 19–20 Aug 2010

from draft report

... nationally coordinated program to deploy new geophysical tools (magneto telluric, passive seismic) and methods (geochemical) integrated with a comprehensive drilling program. ... next, using petascale computing, Storage, and network resources these data will be integrated into multi-dimensional databases ...


The Power Wall

www.pacific-challenge.com