Potential use of cloud computing for streamlining the processing of MT data
Prof J Craig Mudge FTSE
Collaborative Cloud Computing Lab (C3L)
New eScience Lab enabled by cloud computing
Seed funding from
-- minerals and geothermal research at www.pir.sa.gov.au
-- Microsoft Research USA Jim Gray Seed Grant
JCM 30 Sept 2010
Prof Graham Heinson Prof J Craig Mudge FTSE
Stephan Thiel Pinaki Chan
Jared Peacock Wei Wang
Andrew Wendelborn
Acknowledgements: David Giles, Richard Lane, Tim Baker, Tristan Wurst
Magnetotelluric (MT) imaging
1. Using the magnetic and electric fields of the earth, MT imaging determines the resistivity structure of a sub-surface area of interest.
2. It goes deeper (hundred or so km) than seismic (<2 km) but does not have the same resolution
3. Applications
   1. mineral exploration
   2. water management in mining
   3. geothermal exploration
   4. carbon storage
   5. aquifer research and management
   6. earthquake and volcano studies
CO2 in depleted gas field (Heinson and Mudge, 2010)
Ahead for research in minerals and energy
1. Data deluge: 25 terabytes per day; 700 MB of data per second, 60 TB/day, 20 PB/year; petabytes per day
2. Computation, e.g., rapid inversion
3. Data and experiments: curation, provenance, sharing, reuse
Gene sequencers, Large Hadron Collider, Square Kilometre Array
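The data deluge figures can be sanity-checked with a few lines of arithmetic. The sketch below converts the 700 MB per second rate into daily and yearly volumes. It assumes decimal units (1 MB = 10^6 bytes) and continuous operation all year; neither assumption is stated on the slide, and the function name is illustrative only.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365


def daily_and_yearly_volume(mb_per_second):
    """Convert a sustained rate in MB/s to (TB per day, PB per year).

    Assumes decimal units and a source that runs continuously.
    """
    bytes_per_day = mb_per_second * 10**6 * SECONDS_PER_DAY
    tb_per_day = bytes_per_day / 10**12
    pb_per_year = tb_per_day * DAYS_PER_YEAR / 1000
    return tb_per_day, pb_per_year


tb_day, pb_year = daily_and_yearly_volume(700)
print(f"{tb_day:.2f} TB/day, {pb_year:.2f} PB/year")
```

Under these assumptions 700 MB/s comes to about 60.5 TB per day and about 22 PB per year, in line with the rounded 60 TB/day and 20 PB/year on the slide.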
Approx 100,000 PCs in Google data centres
Google Goose Creek; Google The Dalles, Oregon
From www.cloudinnovation.com.au
Essence of cloud
1. software as a service – applications are delivered over the Internet with a common-or-garden browser
2. significant cost savings, factors of 5x – 7x
3. presented as a utility with a matching business model, namely pay-per-use
4. a new data-parallel programming framework
Cost savings in warehouse-sized data centres
1. resources in massive warehouse-sized data centres are pooled at scale
2. built from low-cost commodity chips and disks (run time environment of MapReduce, Dryad takes care of fault tolerance, scheduling, and load balancing)
3. share the overhead of cooling, refrigeration, physical security, and backup power
Execution of MapReduce
The Map step is shown as M in the following slides
(Dean and Ghemawat, 2004)
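To make the execution model concrete, here is a minimal single-machine sketch of the map, group-by-key, and reduce phases. It runs serially in one process. A real MapReduce or Dryad runtime distributes the same phases over many machines and handles fault tolerance, scheduling, and load balancing. The function name `map_reduce` and the station records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run map, shuffle (group by key), and reduce serially in one process."""
    groups = defaultdict(list)
    for record in records:
        # Map: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect values that share a key.
            groups[key].append(value)
    # Reduce: collapse each key's values to a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


# Illustrative use: mean reading per station.
readings = [("station1", 2.0), ("station2", 5.0), ("station1", 4.0)]
means = map_reduce(
    readings,
    mapper=lambda rec: [(rec[0], rec[1])],
    reducer=lambda key, values: sum(values) / len(values),
)
print(means)
```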
Decomposition
• Task decomposition
– How can a problem be decomposed into tasks that can execute concurrently?
• Data decomposition
– How can a problem's data be decomposed into units that can be operated on relatively independently?
then dependencies among the tasks
– Group tasks, Order tasks, and Data Sharing
Parallel execution of gridded exploration data by using sub grids when the original is too big to do as one grid
[Diagram: Form sub grids; five M steps; three R steps; Re-combine]
Concrete example: Map step is an existing MATLAB program running on Amazon EC2
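The sub grid idea can be sketched in a few lines. The example below splits a grid into row bands, applies a Map step to each band concurrently, and re-combines the results in order. It is a sketch under assumptions: `process_subgrid` is a hypothetical placeholder for the existing program, local threads stand in for cloud instances, and the approach is only valid when the Map step treats each cell independently (a step that looks at neighbouring cells would need overlapping sub grids).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def process_subgrid(subgrid):
    """Hypothetical Map step: a placeholder per-cell operation."""
    return subgrid * 2.0


def process_grid_in_subgrids(grid, n_subgrids, map_step=process_subgrid,
                             max_workers=4):
    # Form sub grids: split into row bands (bands may differ in size).
    subgrids = np.array_split(grid, n_subgrids, axis=0)
    # Map: process the bands concurrently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(map_step, subgrids))
    # Re-combine: stitch the processed bands back together.
    return np.concatenate(results, axis=0)


grid = np.arange(20.0).reshape(5, 4)
combined = process_grid_in_subgrids(grid, n_subgrids=3)
print(combined.shape)
```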
Water: Data collection, management, and analysis in the cloud
Data collection, aggregation - high volumes of complex heterogeneous data
Data integration / data fusion
Data clean, Data analysis, Data repurpose, Data use
Visualisation
Organisations (water, government, regulators, market operators, and researchers) will mine this data.
Wireless ad-hoc networks - mesh networked motes with sensors
Sensors -- 10 years on 2 AA batteries. Potentially energy scavenging, too
[Diagram labels: 40 mm; gateway; satellite]
Metadata and databases of interest
River data: from sensors (both mobile and moored)
Existing databases: Weather, Aquifer, River, Irrigation
Remote sense (satellite)
Historical photos
etc
www.pacific-challenge.com Craig Mudge 13.9.2010 24.9.2010
Academy Working Group: Cloud computing at peta-scale
1. Alex Zelinsky, CSIRO Group Executive, 17 May 2010: “The Academy project has been a real catalyst for getting the cloud computing agenda moving forward in Australia.”
2. Summer internships – cloud computing: $1,000 prize won by Jinhui Yao for his security project in an internship hosted by CSIRO
3. Report to be launched October 14 in Canberra
NBN: fiber/wireless net connecting mobile and fixed clients to a cloud computing infrastructure for applications & content
[Diagram: Cloud Computing: Services & Content; NBN; Mobile Clients; Fixed Clients & Client Nets; Television Content]
Computer person’s view of NBN: “Continuous Services i.e. apps & Client Connected Devices”
[Diagram: Cloud Computing: Services & Content; NBN; Mobile Clients; Connected Devices; Fixed Clients & Client Nets; Television Content]
Our cloud service providers
Google Amazon Microsoft Azure
Dropbox
Document sharing
Computation Computation Document sharing
Calendar Storage Code re-use
Search Search
MT Station data from logging in the field: Station 1, Station 2, ..., Station n
[Workflow diagram, per station: Clean; Broadband processing; E field conversion to standard units; BIRRP; Convert to EDI; inspect with GMT plots; Forward Modelling and Inversion]
Outputs from BIRRP are (a) impedance Z, where E=ZB (b) coherence data (c) apparent resistivity and phase
[Figure panels: Time series; Apparent resistivity]
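The relation E=ZB and the derived apparent resistivity and phase can be illustrated as follows. This is a sketch under stated assumptions, not BIRRP: it uses ordinary least squares, whereas BIRRP is a robust bounded-influence estimator. It also assumes SI units (E in V/m, B in tesla), for which the standard apparent resistivity is rho_a = (mu0 / omega) |Z|^2. Function names are illustrative only.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # magnetic permeability of free space (H/m)


def estimate_impedance(E, B):
    """Least-squares estimate of the 2x2 tensor Z in E = Z B at one frequency.

    E, B: complex arrays of shape (n, 2) holding n spectral estimates of
    [Ex, Ey] and [Bx, By]. Plain least squares; NOT the robust BIRRP method.
    """
    # Row-wise, E_k = Z B_k becomes E = B @ Z.T, so solve for Z.T.
    Z_transposed, *_ = np.linalg.lstsq(B, E, rcond=None)
    return Z_transposed.T


def apparent_resistivity_and_phase(Z, period_s):
    """Apparent resistivity (ohm.m) and phase (degrees) for Z = E/B in SI units."""
    omega = 2.0 * np.pi / period_s
    rho_a = MU0 / omega * np.abs(Z) ** 2
    phase_deg = np.degrees(np.angle(Z))
    return rho_a, phase_deg
```

For a uniform half-space the off-diagonal element Zxy recovers the true resistivity with a phase of 45 degrees, which makes a convenient check on the units.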
Forward model and inversion
1. Start
2. Compute MT response of new model
3. Compare model response and MT observed data
4. Required misfit? If Y, stop
5. Exceeded max # of iterations? If Y, stop
6. If N to both, update model and return to step 2
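The loop in this flowchart can be sketched generically. The slide does not say how the model is updated; the version below assumes a Gauss-Newton style least-squares step, and `forward` and `jacobian` are hypothetical callables supplied by the user. Real MT inversion codes add regularisation and are far more elaborate.

```python
import numpy as np


def invert(forward, jacobian, observed, m0, target_rms=1.0, sigma=1.0,
           max_iter=20):
    """Iterate: compute response, compare with data, stop or update model.

    Returns (model, rms, iterations_used, reason_for_stopping).
    The update rule (a Gauss-Newton step) is an assumption for illustration.
    """
    m = np.array(m0, dtype=float)
    for iteration in range(max_iter + 1):
        # Compute the response of the current model and compare with data.
        residual = observed - forward(m)
        rms = float(np.sqrt(np.mean((residual / sigma) ** 2)))
        if rms <= target_rms:
            return m, rms, iteration, "required misfit obtained"
        if iteration == max_iter:
            break
        # Update model: least-squares step on the linearised problem.
        step, *_ = np.linalg.lstsq(jacobian(m), residual, rcond=None)
        m = m + step
    return m, rms, max_iter, "max iterations exceeded"
```

With a linear toy forward model the loop reaches the required misfit after one update; a nonlinear forward model would typically take several passes.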
MT Processing
1. Time series data from stations
- Remove outliers
- To frequency domain
- Apply BIRRP (Chave and Thomson, 1989), a robust method
- Produces apparent resistivity and phase, by frequency
2. Inversion to produce subsurface image (Siripunvaraporn et al., 2005)
Currently: ~24 hours; ~3 to 4 weeks for 3D
Chave and Thomson. Bounded influence magnetotelluric response function estimation. Geophys. Jnl. Int. 1989.
Siripunvaraporn, Egbert, Lenbury, and Uyeshima. Three dimensional magnetotelluric inversion: data-space method. Physics of the Earth and Planetary Interiors 150. 2005.
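The first two sub-steps of the time series processing, removing outliers and moving to the frequency domain, might look like the sketch below. The slides do not say which outlier rule or windowing is used; the median and MAD threshold and the Hann window here are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def remove_outliers(x, threshold=5.0):
    """Replace samples far from the median (in robust sigmas) by the median.

    The MAD-based rule and the threshold are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    robust_sigma = 1.4826 * np.median(np.abs(x - median))
    cleaned = x.copy()
    if robust_sigma > 0:
        cleaned[np.abs(x - median) > threshold * robust_sigma] = median
    return cleaned


def to_frequency_domain(x, sample_rate_hz):
    """Remove the mean, apply a Hann window, and return (freqs, spectrum)."""
    x = np.asarray(x, dtype=float)
    x = x - np.mean(x)
    spectrum = np.fft.rfft(x * np.hanning(len(x)))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate_hz)
    return freqs, spectrum
```

A typical call chain is `freqs, spectrum = to_frequency_domain(remove_outliers(samples), sample_rate_hz)`, with the spectra from the E and B channels then passed to the response function estimation.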
Reflections – September 2010
Value of cloud for PIRSA, our MT processing, and CRC DET
1. Access to cheap flexible computing
   1. Amazon runs Fortran, Matlab, Python, etc. E.g., T Dhu’s gridded execution
   2. On-demand purchase of a couple of hours of a more powerful computer (generally in memory – 8 Gbytes, for example); pricing is growing in sophistication – spot pricing, micro-instances, etc.
2. Parallel execution
   1. Easy to get concurrent execution of steps, e.g., 45 stations
   2. Parallel within a step (Google’s MapReduce and Dryad/LINQ) is hard work, but have made a little progress
3. Our future work on integration in multi-layered data bases has been strongly endorsed
Disappointments
Honours student gave up on Visualisation of sub-surface layers using Bing/Google Earth
eScience workflow was a major contribution (unexpected)
1. Less human interaction, repeatable, provenance, sharing of workflows internationally
2. Increasingly important, as volume of data grows
No machines Lab: “built first cloud based server, which is the SVN server for C3 Lab in the Amazon EC2 cloud.”
Craig Mudge 29.9.2010
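Concurrent execution across stations is the easy kind of parallelism because each station is processed independently. The sketch below runs a per-station function for 45 station identifiers and keeps one station's failure from stopping the rest. It is illustrative only: `process_one` is a hypothetical stand-in for the real per-station step, and local threads stand in for separate cloud instances.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_stations(station_ids, process_one, max_workers=8):
    """Run process_one for every station concurrently.

    Returns (results, failures), both keyed by station id.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, sid): sid for sid in station_ids}
        for future in as_completed(futures):
            sid = futures[future]
            try:
                results[sid] = future.result()
            except Exception as error:  # record the failure and carry on
                failures[sid] = error
    return results, failures
```

Calling `process_stations(range(1, 46), process_one)` covers the 45-station case; the failures dictionary shows which stations need to be re-run.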
3/12/09 Bill Howe, UW
Scientific Workflow Systems
• Value proposition: More time on science, less time on code, admin
• How: By providing language emphasizing sharing, reuse, reproducibility, rapid prototyping, efficiency
– Provenance
– Visual programming
– Integration with domain-specific tools
– Scheduling
– Data curation
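Provenance is the easiest of these features to show in miniature. The sketch below runs a list of named steps in order and records, for each step, a short fingerprint of its input and output, so a run can be checked for repeatability. It is a toy under assumptions: real workflow systems record far more (code versions, parameters, timestamps), data must be JSON-serialisable here, and all names are hypothetical.

```python
import hashlib
import json


def fingerprint(value):
    """Short, stable hash of a JSON-serialisable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_workflow(steps, data):
    """Apply (name, function) steps in order and log provenance per step."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append({
            "step": name,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        data = result
    return data, provenance


steps = [
    ("drop_negative", lambda xs: [x for x in xs if x >= 0]),
    ("double", lambda xs: [2 * x for x in xs]),
]
final, log = run_workflow(steps, [1, -2, 3])
print(final, [entry["step"] for entry in log])
```

Because each step's output fingerprint is the next step's input fingerprint, the log forms a chain that a collaborator can compare against their own re-run.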
2010: Honours project in Geophysics – Tristan Wurst – Steps in MT processing
Porosity Joint Inversion (Rachel Maier, 2010)
Invert for a single parameter, to which both techniques are sensitive
[Flowchart: Start; porosity model; Porosity-Density relationship gives density model, Archie’s Law gives conductivity model; gravity model response and MT model response; compare with gravity observed data and MT observed data; Is: a) required misfit obtained or b) max. number of iterations exceeded? Yes: Stop. No: update model]
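The two links from porosity to the quantities each technique senses can be written down directly. The sketch below uses Archie's Law for a fully saturated rock and, as an assumption, a linear volumetric mixing rule for the porosity-density relationship (the slide does not say which relationship was used). The exponent m, factor a, and the default densities are illustrative values, not taken from the source.

```python
def archie_conductivity(porosity, fluid_conductivity, m=2.0, a=1.0):
    """Bulk conductivity (S/m) from Archie's Law, fully saturated rock.

    sigma = sigma_fluid * porosity**m / a; m and a are assumed values.
    """
    return fluid_conductivity * porosity ** m / a


def bulk_density(porosity, matrix_density=2650.0, fluid_density=1000.0):
    """Bulk density (kg/m^3) from linear volumetric mixing (an assumption)."""
    return porosity * fluid_density + (1.0 - porosity) * matrix_density


porosity = 0.2
print(archie_conductivity(porosity, fluid_conductivity=5.0))
print(bulk_density(porosity))
```

A single porosity model therefore yields both a conductivity model for the MT response and a density model for the gravity response, which is what lets one parameter be constrained by both data sets.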
Renmark Trough
[Figure: depth sections, depth (km) against distance (km), NE to SW, coloured by resistivity (ohm.m). Panels: Seismic constrained Gravity (Devonian D=2360, Basement D=2800); MT Inversion (RMS MT = 2.3); Joint Inversion (RMS JI = 5.3, RMS MT = 5.3, RMS GV = 4.5)]
[Figure: gravity gz (mGals) against distance (km); apparent resistivity (ohm.m) and phase against period (s), showing TE data, TE model response, TM data, and TM model response]
[Diagram: Data; Compute and geologist’s data integrations; Data logging with near real-time feedback; Sub-surface]
Future areas
1. Seismic
2. Inversion and forward modelling in general
3. Rapid inversion, too
4. Data integration or data fusion across multiple layers
5. Data mining
Vision: A geologist steering a drill in real time, using real-time sensing of the sub-surface and updating geological models, while referring to her cloud-based data sets and collaborating with her team back home
[Diagram: Data (Seismic, Satellite, MT, Petrophysical Cores, Density, etc); Compute and geologist’s data integrations; Collaboration; Geologist in field; steering; drilling machine control system; Sensing – a dozen or more sensors (Seismic, XRF, Resistivity, etc); Sub-surface]
www.cloudinnovation.com.au
0417 679 266
Searching the Deep Earth: sustaining your wealth for the next century
High Flyers Think Tank
Canberra, 19–20 Aug 2010
from draft report
... nationally coordinated program to deploy new geophysical tools (magneto telluric, passive seismic) and methods (geochemical) integrated with a comprehensive drilling program....next, using petascale computing, Storage, and network resources these data will be integrated into multi-dimensional databases ...