2007-05-15 TIG session 3+Millennium database

Millennium Database

Overview and some first usage experiences

Gerard Lemson and the Virgo Consortium

astro-ph/0608019

The Virgo consortium’s Millennium simulation

• Millennium simulation
  – 10 billion particles, dark matter only
  – 500 Mpc (~2 Gly) periodic box
  – “concordance model” (as of 2004) initial conditions
  – 64 snapshots
  – 350000 CPU hours
  – O(30 TB) raw + post-processed data
• Postprocessing:
  – dark matter density fields smoothed at various scales (45 * 256^3 grid cells)
  – dark matter cluster merger trees (~750 million)
  – galaxy merger trees (~1 billion/catalogue)
    • De Lucia & Blaizot, 2006
    • Bower et al, 2006

Dark matter and galaxies

Halos and galaxies

Database design

Database design: “20 queries”

1. Return the galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
2. Return the galaxy content at z=3 of the progenitors of a halo identified at z=0.
3. Return the complete halo merger tree for a halo identified at z=0.
4. Find properties of all galaxies in halos of mass 10^14 at redshift 1 which have had a major merger (mass-ratio < 4:1) since redshift 1.5.
5. Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V>0.8, B/T>0.5).
6. Find the descendants at z=1 of all LBGs (i.e. galaxies with SFR>10 Msun/yr) at z=3.
7. Find all z=3 galaxies which have NO z=0 descendant.
8. Return all the galaxies within a sphere of radius 3 Mpc around a particular halo.
9. Find all the z=2 galaxies which were within 1 Mpc of an LBG (i.e. SFR>10 Msun/yr) at some previous redshift.
10. Find the multiplicity function of halos depending on their environment (overdensity of density field smoothed on certain scale).
11. Find the dependency of halo formation times on environment.

Time evolution: merger trees

Merger trees:
    select prog.*
      from galaxies des, galaxies prog
     where des.galaxyId = 0
       and prog.galaxyId between des.galaxyId and des.lastProgenitorId

Leaves:
    select galaxyId as leaf
      from galaxies des
     where galaxyId = lastProgenitorId

Branching points:
    select descendantId
      from galaxies des
     where descendantId != -1
     group by descendantId
    having count(*) > 1
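
The three queries above only work if galaxy IDs are assigned so that every progenitor of a galaxy falls in the range galaxyId .. lastProgenitorId. The sketch below assumes a depth-first numbering of a small invented tree (seven galaxies, not real Millennium data) and runs the same three queries against an in-memory SQLite table. The table layout is reduced to the four columns the slide uses; everything else about the real schema is left out.

```python
import sqlite3

# Toy merger tree, numbered depth-first (assumed ID layout, invented data).
# Columns: galaxyId, descendantId, lastProgenitorId, snapnum
ROWS = [
    (0, -1, 6, 3),  # root at the latest snapshot
    (1, 0, 4, 2),
    (2, 1, 3, 1),
    (3, 2, 3, 0),   # leaf
    (4, 1, 4, 1),   # leaf
    (5, 0, 6, 2),
    (6, 5, 6, 1),   # leaf
]


def make_db():
    """Create an in-memory table holding the toy tree."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table galaxies (galaxyId integer primary key, "
        "descendantId integer, lastProgenitorId integer, snapnum integer)"
    )
    con.executemany("insert into galaxies values (?, ?, ?, ?)", ROWS)
    return con


def merger_tree(con, root_id):
    """IDs of the root and all its progenitors, via a single range scan."""
    sql = """select prog.galaxyId
               from galaxies des, galaxies prog
              where des.galaxyId = ?
                and prog.galaxyId between des.galaxyId and des.lastProgenitorId
              order by prog.galaxyId"""
    return [row[0] for row in con.execute(sql, (root_id,))]


def leaves(con):
    """Galaxies with no progenitors: their ID range contains only themselves."""
    sql = """select galaxyId from galaxies
              where galaxyId = lastProgenitorId
              order by galaxyId"""
    return [row[0] for row in con.execute(sql)]


def branching_points(con):
    """Galaxies that are the descendant of more than one progenitor."""
    sql = """select descendantId from galaxies
              where descendantId != -1
              group by descendantId
             having count(*) > 1
              order by descendantId"""
    return [row[0] for row in con.execute(sql)]
```

With this numbering a whole subtree is one contiguous ID range, so fetching a tree needs no recursion and can be served by an ordinary index on galaxyId.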

More database design features

• Spatial indices
  – Peano-Hilbert index links to field (256^3)
  – Z-curve index (bit interleaved, 256^3)
    • SQLServer2005 CLR integration with C# for range queries
  – Zone index (ix/iy/iz, 50^3)
      select * from galaxies where snapnum = 63 and ix = 1 and iy = 5 and iz = 20
• Random sampling
      select * from galaxies where snapnum = 63 and random between 1000 and 2000
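
As an illustration of the two simpler spatial indices, here is a minimal sketch of a bit-interleaved Z-curve key on a 256^3 grid and of a zone index on a 50^3 grid. The bit order (x in the lowest bit of each triple), the function names, and the use of the 500 Mpc box length as the coordinate range are assumptions made for the example; the database's actual conventions are not given on the slide.

```python
def z_curve_key(ix, iy, iz, bits=8):
    """Interleave the bits of three cell indices into one Z-curve key.

    bits=8 corresponds to a 256^3 grid. Bit b of ix, iy, iz lands at
    positions 3b, 3b+1, 3b+2 of the key (assumed ordering).
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def zone_index(x, y, z, box=500.0, zones=50):
    """Map a position in the periodic box to integer zone indices (ix, iy, iz)."""
    size = box / zones  # 10 length units per zone for a 500-unit box
    return tuple(int(c // size) % zones for c in (x, y, z))
```

For example, zone_index(15.0, 55.0, 205.0) gives (1, 5, 20), the zone selected by the query on the slide. Cells that are close in space tend to have close Z-curve keys, which is what lets a range query on one integer column stand in for a three-dimensional box search.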

The Millennium database web server

• Web application (Java in Apache Tomcat web server)
  – portal: http://www.mpa-garching.mpg.de/millennium/
  – public DB access: http://www.g-vo.org/Millennium
    • 30sec/1000rows | 30sec/unlimited rows
  – private access: http://www.g-vo.org/MyMillennium
    • 30sec/1000rows | 420sec/unlimited rows
  – MyDB, 1 GB, sometimes more
• Access methods
  – browser with plotting capabilities through VOPlot applet
  – wget + IDL, R
  – TOPCAT plugin
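
Scripted access with wget amounts to sending the SQL text to the server as part of an HTTP GET request. The sketch below only builds such a URL and makes no network call. The query-string parameter names "action" and "SQL" and the value "doQuery" are assumptions for illustration; check the server's documentation for the real interface before relying on them.

```python
from urllib.parse import urlencode

# Public access URL taken from the slide.
PUBLIC_URL = "http://www.g-vo.org/Millennium"


def build_query_url(sql, base_url=PUBLIC_URL):
    """Return a GET URL carrying the SQL statement, URL-encoded.

    The parameter names below are assumed, not taken from the slides.
    """
    return base_url + "?" + urlencode({"action": "doQuery", "SQL": sql})
```

The resulting string can be handed to wget, or to IDL or R, which is how the scripted access methods listed above would typically be driven.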

Usage statistics

• Up since Aug 2006

• Community notified via preprint server http://xxx.lanl.gov/abs/astro-ph/0608019

• Obtained from DB-based log with SQL

• > 130 registered users

• almost 1.7 million queries (not all correct)

• since March 3, >5 billion rows handled

Usage patterns

• Start with milli-Millennium (1/512 of full)
• Some download complete set
• Mainly to test approach, SQL
• Ask for account on full Millennium
• Run into timeout
  – either ask me
  – cut query in pieces
  – execute via script, using wget (good for hit rate count of site!)
• MyDB usage
  – small projects collaborate via results,
  – upload own data (when local at MPA, or via me)
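
One way to "cut a query in pieces" is to split it along the zone index, so that each piece touches only a slab of the box and stays under the timeout. The sketch below generates such sub-queries as strings. The helper name and the choice of the ix column as the splitting key are illustrative assumptions; splitting on snapshot number or on ID ranges would work the same way.

```python
def split_by_ix(where, n_pieces, zones=50, table="galaxies"):
    """Return at most n_pieces queries that together cover ix = 0 .. zones-1.

    `where` is the original filter, e.g. "snapnum = 63".
    """
    step = -(-zones // n_pieces)  # ceiling division: slab width in zones
    queries = []
    for lo in range(0, zones, step):
        hi = min(lo + step, zones) - 1
        queries.append(
            f"select * from {table} where {where} and ix between {lo} and {hi}"
        )
    return queries
```

Calling split_by_ix("snapnum = 63", 5) yields five statements covering ix 0-9, 10-19, and so on up to 40-49, which a script can submit one after another and concatenate.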

Conclusions

• If you have valuable data (and “if you build it”), “they will come”
• PR helps
  – astro-ph/
  – presentations by owners (Simon White, Volker Springel, Carlos Frenk)
• Users are not stupid
  – can and will learn SQL
  – don’t mind learning SQL (especially when relatively young)
  – come up with interesting solutions on their own
• Documentation important
  – not optimal yet: indexes, internal relationships
• Help desk (i.e. me) helps and is much appreciated
• Possible/planned improvements
  – full upload facility into MyDB
  – mirror machine with CAS jobs
    • longer timeouts
    • batch querying
    • collaboration easier