mining virtual universes simulations in a relational database

50
Mining Virtual Universes Simulations in a relational database

Upload: anne-walton

Post on 06-Jan-2018

226 views

Category:

Documents


3 download

DESCRIPTION

Simple observations

TRANSCRIPT

Page 1: Mining Virtual Universes Simulations in a relational database

Mining Virtual Universes

Simulations in a relational database

Page 2: Mining Virtual Universes Simulations in a relational database

Computer simulations.

Why?

Page 3: Mining Virtual Universes Simulations in a relational database

Simple observations

Page 4: Mining Virtual Universes Simulations in a relational database

Simple model

Page 5: Mining Virtual Universes Simulations in a relational database

Simple, analytical solution

Page 6: Mining Virtual Universes Simulations in a relational database

Complex observations

Page 7: Mining Virtual Universes Simulations in a relational database

Galaxy merger

John Hibbard http://www.cv.nrao.edu/~jhibbard/n4038/n4038.html NASA/CXC/SAO/G. Fabbiano et al.

Page 8: Mining Virtual Universes Simulations in a relational database

X-Ray cluster

8

electron density gas temperature

gas pressure

Courtesy Alexis Finoguenov, Ulrich Briel, Peter Schuecker, (MPE)

Page 9: Mining Virtual Universes Simulations in a relational database

Galaxy survey

Page 10: Mining Virtual Universes Simulations in a relational database

N-Body simulations

Page 11: Mining Virtual Universes Simulations in a relational database

Simple dynamics• Newton’s law of gravity for N particles

= -()

Page 12: Mining Virtual Universes Simulations in a relational database

Complex solutions• Only analytical solution for N=2• 3 body not in general• Let alone 10 billion bodies• Need computer simulations

• approximations• scaling like N^2,

Page 13: Mining Virtual Universes Simulations in a relational database
Page 14: Mining Virtual Universes Simulations in a relational database

14Di Matteo, Springel and Hernquist, 2005

Courtesy Volker Springel

Adding hydrodynamics and gas physics

Page 15: Mining Virtual Universes Simulations in a relational database

CMU, CosmoMLStat 15

Millennium-II Simulation

2015-06-03

• 100 Mpc/h

• 1010 particles

• 6.9 106 Msun/h

• ~10 million halos

• ~300GB/snapshot

Boylan-Kolchin etal 2009

Page 16: Mining Virtual Universes Simulations in a relational database

CMU, CosmoMLStat 16

Millennium Simulation

2015-06-03

MRII

• 500 Mpc/h

• 1010 particles

• 8.6 108 Msun/h

• ~18 million halos

• ~300GB/snapshot

Springel etal 2005

Page 17: Mining Virtual Universes Simulations in a relational database

CMU, CosmoMLStat 17

MR-XXL

2015-06-03

MR

• 3Gpc/h• 3x1011 particles• 750 million

halos/snapshot

• 9TB/snapshot

• browse

Page 18: Mining Virtual Universes Simulations in a relational database

FOF groups, (sub)halos and galaxies

Page 19: Mining Virtual Universes Simulations in a relational database
Page 20: Mining Virtual Universes Simulations in a relational database

CMU, CosmoMLStat 202015-06-03

Raw data:Particles

FOF groups and Subhalos

Density fields

Subhalo merger trees

Synthetic galaxies (SAM)Mock catalogues

Page 22: Mining Virtual Universes Simulations in a relational database

millimil@CasJobs

Page 23: Mining Virtual Universes Simulations in a relational database

Revisit relational

databases again

http://www.sdss.jhu.edu/~szalay/class/2015/gl/IntroRDB.html indexing: trees and spatial

Page 24: Mining Virtual Universes Simulations in a relational database
Page 25: Mining Virtual Universes Simulations in a relational database

INDEX-ing• Performance: disk IO is bottleneck• Avoid it as much as possible, but can not store whole DB in

memory• To find rows of interest, avoid having to scan complete tables

• sequential scan ~ O(N)• ~10 min for galaxy tables (109 rows, 250 GB)

• Binary search speed up: requires ordering• ~ O(log(N))• B-Trees

• Can only order in one way: create external data structure, INDEX, ordered according to >=1 columns, with direct pointer to row.

Page 26: Mining Virtual Universes Simulations in a relational database

snapnum, stellarMass, galaxyid

Indexes

mag_b snapnum, x

Page 27: Mining Virtual Universes Simulations in a relational database

B-tree

Page 28: Mining Virtual Universes Simulations in a relational database

Special indexes• trees• spatial

Page 29: Mining Virtual Universes Simulations in a relational database

Time evolution: merger trees

Page 30: Mining Virtual Universes Simulations in a relational database
Page 31: Mining Virtual Universes Simulations in a relational database

Formation histories:Subhalo and Galaxy merger trees

• Tree structure• halos have single descendant• halos have main progenitor

• Hierarchical structures usually handled using recursive code• inefficient for data access• not (well) supported in RDBs

• Tree indexes• depth first ordering of trees• label by rank in order• pointer to “last progenitor” below each node• all progenitors have label BETWEEN label of root AND that of last progenitor• cluster table on label

Page 32: Mining Virtual Universes Simulations in a relational database
Page 33: Mining Virtual Universes Simulations in a relational database

select prog.snapnum, prog.x, prog.y, prog.np from millimil..mpahalo des , millimil..mpahalo prog where prog.haloId between des.haloId and des.lastProgenitorId and des.haloId = 0

Page 34: Mining Virtual Universes Simulations in a relational database
Page 35: Mining Virtual Universes Simulations in a relational database

select prog.snapnum, prog.x, prog.y, prog.mag_b-prog.mag_v as color from millimil..delucia2006a des , millimil..delucia2006a prog where prog.galaxyId between des.galaxyId and des.lastProgenitorId and des.galaxyId = 0

(See topcat)

Galaxies

Page 36: Mining Virtual Universes Simulations in a relational database

Millennium DB Tutorial2007-01-17/19 Leiden

Some more features of the merger tree data model

Leaves :

select galaxyId as leaf from galaxies des where galaxyId

= lastProgenitorId

Branching points :

select descendantId from galaxies des where descendantId != -1 group by descendantId having count(*) > 1

Page 37: Mining Virtual Universes Simulations in a relational database

Millennium DB Tutorial2007-01-17/19 Leiden

Main branches • Roots and leaves:

select des.galaxyId as rootId, min(prog.lastprogenitorid) as leafId into rootLeaf from mpagalaxies..delucia2006a des , mpagalaxies.. delucia2006a prog where des.galaxyId = 0 and prog.galaxyId between des.galaxyId and des.lastProgenitorId

• Main branchselect rl.rootId, b.* from rootLeaf rl , mpagalaxies..delucia2006a b where b.galaxyId between rl.rootId and rl.leafId

Page 38: Mining Virtual Universes Simulations in a relational database

38

Query particles in volume

Page 39: Mining Virtual Universes Simulations in a relational database

Find all halos in a subvolume of space:

10 <= x < 2020 <= y < 300 <= z < 10

Page 40: Mining Virtual Universes Simulations in a relational database

Inefficient, even when indexed

select x,y,z from mpahalotrees..mhalo where snapnum = 63 and x between 10 and 20 and y between 20 and 30 and z between 0 and 10

Page 41: Mining Virtual Universes Simulations in a relational database

Why inefficient x y z 15.001083 42.471325 24.673561

15.001247 58.420914 42.722874

15.002215 38.042484 29.557423

15.002735 50.487785 57.716877

15.002753 20.000177 8.21466

15.005095 13.637599 16.135191

15.006593 22.170828 48.242783

15.011488 24.824438 19.773285

15.011741 48.099907 11.500685

15.011868 23.312265 27.858799

15.013065 23.969515 18.883507

15.013158 56.041866 40.82894

15.014361 59.503357 45.31733

15.017322 46.257664 44.37695

15.018202 27.333895 9.441319

Page 42: Mining Virtual Universes Simulations in a relational database

Spatial indexes• Performance of finding things is improved if

those things are co-located on disk: ordering, indices

• Co-locating a 3D configuration of points on a 1D disk can only be done approximately

• Space filling curves: Peano-Hilbert • requires user defined functions

to use

• Simpler: Zones

Page 43: Mining Virtual Universes Simulations in a relational database

43

Query particles in volume

Page 44: Mining Virtual Universes Simulations in a relational database

44

Index cells using space filling curve

Page 45: Mining Virtual Universes Simulations in a relational database

CMU, CosmoMLStat 45

Query particles in sphere/box• Calculate overlap space filling curve with query volume

• Decide (from index table) which files to query

• And where to seek, how far to scan

• Implement as SQLCLR table-valued-function• Run from database

2015-06-03

Page 46: Mining Virtual Universes Simulations in a relational database

Simpler: Zones

Page 47: Mining Virtual Universes Simulations in a relational database

Zone index• Coarse sampling of points in multiple dimensions

allows simple multi-dimensional ordering• ix = floor(x/10Mpc)

iy = floor(y/10Mpc)iz = floor(z/10Mpc)

• index on (snapnum,ix,iy,iz,x,y,z,galaxyId)

Page 48: Mining Virtual Universes Simulations in a relational database

IX IY IZ X Y Z

1 2 0 15.061804 20.891907 4.4156647

1 2 0 15.069336 23.437601 9.812217

1 2 0 15.100678 20.905642 4.613036

1 2 0 15.173968 22.36883 8.01832

1 2 0 15.194122 20.67583 4.8034463

1 2 0 15.2500305 24.246683 1.6651521

1 2 0 15.365576 23.290754 9.404872

1 2 0 15.372606 20.203691 2.0006201

1 2 0 15.524696 21.03997 4.280077

1 2 0 15.583943 22.344622 9.421347

1 2 0 15.6358385 26.785904 9.881406

1 2 0 15.66383 22.829983 7.137772

1 2 0 15.673803 26.918291 3.302736

1 2 0 15.717824 22.365341 9.221828

1 2 0 15.847992 24.700747 1.389664

1 2 0 15.883896 22.593819 7.277129

1 2 0 15.91041 26.531118 2.5693457

1 2 0 15.916905 27.137867 4.289855

1 2 0 16.047333 28.93811 5.414605

Page 49: Mining Virtual Universes Simulations in a relational database

Using zonesselect x,y,z from mpahalo where snapnum = 63 and ix = 1 and iy = 2 and iz = 0

NB does NOT include galaxies with x=20 exactly!

Page 50: Mining Virtual Universes Simulations in a relational database

“20 questions”1. Return the (B-band luminosity function of) galaxies residing in halos of mass

between 10^13 and 10^14 solar masses. 2. Return the galaxy content at z=3 of the progenitors of a halo identified at z=0 3. Return all the galaxies within a sphere of radius 3Mpc around a particular halo 4. Return the complete halo merger tree for a halo identified at z=0 5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour

and bulge-to-disk ratio within given intervals. 6. Find properties of all galaxies in haloes of mass 10**14 at redshift 1 which have had a

major merger (mass-ratio < 4:1) since redshift 1.5. 7. Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V>0.8 B/T > 0.5) 8. Find the descendents at z=1 of all LBG's (i.e. galaxies with SFR>10 Msun/yr) at z=3 9. Make a list of all haloes at z=3 which contain a galaxy of mass >10**9 Msun which is a

progenitor of BCG's in z=0 cluster of mass >10**14.5 10.Find all z=3 galaxies which have NO z=0 descendant. 11.Return the complete galaxy merging history for a given z=0 galaxy. 12.Find all the z=2 galaxies which were within 1Mpc of a LBG (i.e. SFR>10Msun/yr) at

some previous redshift. 13.Find the multiplicity function of halos depending on their environment (over density

of density field smoothed on certain scale)14.Find the dependency of halo formation times on environment