VOStat - HEAD 2004, Ashish Mahabal

http://www.vostat.org NSF DMS-0101360

VOStat: Arming Astronomers with Advanced Statistics

Caltech: A. Mahabal, M. Graham, S. G. Djorgovski, R. Williams

Penn State: J. Babu (PI), E. Feigelson

CMU: R. Nichol, D. Vanden Berk, L. Wasserman

Use of statistics

• 15000 astronomical studies per year

• 5% have “statistics” in their abstract

• 20% treat variable objects or multivariate datasets

Traditional methods

• Fourier transform (Fourier 1807)

• Least squares and chi-squared (Legendre 1805, Pearson 1900)

• Kolmogorov-Smirnov test (Kolmogorov 1933)

• Principal Component Analysis (Hotelling 1936)
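To make one of these methods concrete, here is a minimal Python sketch of the two-sample Kolmogorov-Smirnov statistic D, the largest vertical gap between two empirical distribution functions. The function name is hypothetical and this is not VOStat code; it returns only D, leaving out the conversion to a p-value that R's ks.test also performs.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Hypothetical helper: two-sample KS statistic D = max |F_a(x) - F_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d_max = 0.0
    # Both empirical CDFs are step functions that only jump at observed
    # values, so the largest gap is attained at one of the pooled points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max
```

For example, ks_two_sample_statistic([1, 2, 3], [10, 11, 12]) returns 1.0, the largest possible value, because the two samples do not overlap.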

VOStat

• Web based service

• Simple and sophisticated statistical routines

• Large datasets

• Public-domain (R) and specially written routines

• General-purpose and Virtual Observatory (VO) use

VOStat

• ASCII or VOTable input (so VOStat can act as an intermediate block in a VO-based pipeline)

• CGI routines as prototypes (inputs of a few thousand lines)

• Web services (Java GUI): inputs of hundreds of thousands of lines (limited by R’s capabilities); distributed, multi-OS, multi-language
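The VOTable input path can be sketched with Python's standard-library XML parser. This is a minimal illustration, not VOStat's reader: it assumes the plain TABLEDATA serialization (one TR per row, one TD per cell), ignores the BINARY and FITS encodings, and leaves datatype conversion to the caller. read_votable_tabledata is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    """Drop an XML namespace prefix: '{http://...}FIELD' -> 'FIELD'."""
    return tag.rsplit("}", 1)[-1]


def read_votable_tabledata(xml_text):
    """Hypothetical reader: {column name: [cell strings]} for the first TABLE.

    Sketch only: handles the TABLEDATA serialization, not BINARY or FITS,
    and leaves datatype conversion to the caller.
    """
    root = ET.fromstring(xml_text)
    table = next(el for el in root.iter() if _local_name(el.tag) == "TABLE")
    # Column names come from the FIELD elements, in document order.
    names = [el.get("name") for el in table.iter() if _local_name(el.tag) == "FIELD"]
    columns = {name: [] for name in names}
    for row in table.iter():
        if _local_name(row.tag) != "TR":
            continue
        cells = [(td.text or "").strip() for td in row if _local_name(td.tag) == "TD"]
        for name, cell in zip(names, cells):
            columns[name].append(cell)
    return columns
```

Stripping namespaces keeps the sketch indifferent to which VOTable schema version a service declares.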

Examples of available functions

• Descriptive statistics (e.g. boxplot)

• Two- and k-sample tests (e.g. Wilcoxon rank-sum test)

• Density estimation (e.g. kernel smoothing)

• Correlation and regression (e.g. PCA)

• Censored data (e.g. survival analysis)

• Multivariate classification (e.g. hierarchical clustering)

• External functions (e.g. K-density)
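The Wilcoxon rank-sum test named above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not VOStat's implementation: it uses the large-sample normal approximation and omits the tie and continuity corrections. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Hypothetical helper: (W, two-sided p) via the normal approximation.

    W is the rank sum of sample x within the pooled data. The variance
    ignores the tie correction, so p is only approximate when ties exist.
    """
    n_x, n_y = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n_x])
    mean_w = n_x * (n_x + n_y + 1) / 2
    var_w = n_x * n_y * (n_x + n_y + 1) / 12
    z = (w - mean_w) / math.sqrt(var_w)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return w, p_two_sided
```

R's wilcox.test reports W as this rank sum minus n_x(n_x + 1)/2 (the Mann-Whitney form) and can compute exact p-values for small samples, so its numbers will differ from this sketch.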

User-friendly GUI

• Columns are autoselected (and can be deselected)

• Parameter choices for functions are conveniently placed

• Can be used from your own web pages on tables residing elsewhere
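The slide does not say how columns are autoselected. One plausible rule, sketched here in Python purely as an assumption, keeps every column of a whitespace-delimited ASCII table whose entries all parse as numbers. autoselect_numeric_columns is a hypothetical name, and treating the first non-comment line as the header is also an assumption.

```python
def autoselect_numeric_columns(text, comment="#"):
    """Hypothetical rule: keep columns whose every entry parses as a float.

    The first non-comment, non-blank line is assumed to be the header.
    """
    lines = [ln.split() for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith(comment)]
    header, rows = lines[0], lines[1:]
    selected = {}
    for j, name in enumerate(header):
        values = []
        for row in rows:
            try:
                values.append(float(row[j]))
            except (ValueError, IndexError):
                break  # non-numeric or missing entry: skip this column
        else:
            selected[name] = values  # runs only if no entry failed
    return selected
```

Deselecting a column is then just removing its key, e.g. cols.pop("B-V").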

Toy Demos

• Rediscovering the HR diagram

• Rediscovering the fundamental plane (FP) of globular clusters

• Looking for outliers in color-color space

Rediscovering the HR diagram

• Hyades stars (Hipparcos main catalog)

• Mean/median/boxplot

• Density estimation (Histogram)

• Kernel smoothing

• Correlation matrix

• X-Y plot

• Multivariate clustering

• An X-Y plot of Vmag against B-V reveals the famous structure in the dataset: the color-magnitude diagram of bright stars, showing the main sequence, the giant branch (with red clump stars), and a few Hyades white dwarfs.
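Kernel smoothing, one of the steps listed for this demo, replaces a histogram's bins with a smooth sum of kernels centred on the data points. A minimal Python sketch with a Gaussian kernel follows. The default bandwidth rule is an assumption chosen for simplicity (R's density() defaults to bw.nrd0, 0.9 * min(sd, IQR/1.34) * n^(-1/5)), and gaussian_kde is a hypothetical name.

```python
import math
import statistics


def gaussian_kde(data, grid, bandwidth=None):
    """Hypothetical helper: Gaussian KDE of `data` evaluated at each grid point.

    Default bandwidth (an assumption): the normal-reference rule
    h = 1.06 * s * n**(-1/5), s = sample standard deviation (needs >= 2 points).
    """
    n = len(data)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(data) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    # Each data point contributes one Gaussian bump of width `bandwidth`.
    return [norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
            for x in grid]
```

Passing an explicit bandwidth lets one compare under- and over-smoothed views of the same column, much as one would vary the bin width of a histogram.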

FP of globular clusters

• Matrix of pairwise correlation coefficients

• Pairwise plots

• Principal Component Analysis

• Core parameters as a group tend to be highly correlated, unlike the half-light parameters. This indicates dynamical evolution driven by core collapse.
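The correlation-matrix and PCA steps of this demo can be sketched in plain Python. This is an illustration only: it builds the Pearson correlation matrix and extracts just the first principal component by power iteration, whereas R's prcomp() returns all components. The function names are hypothetical, and a column with zero variance would cause a division by zero.

```python
import math
import statistics


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n, p = len(columns[0]), len(columns)
    # Standardize each column (population s.d.), so r_ij = mean(z_i * z_j).
    z = []
    for col in columns:
        mean, sd = statistics.fmean(col), statistics.pstdev(col)
        z.append([(v - mean) / sd for v in col])
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)]
            for i in range(p)]


def first_principal_component(corr, iterations=500):
    """Hypothetical helper: leading eigenvector of `corr` by power iteration.

    Returns (loadings, fraction of total variance explained). For a
    correlation matrix the total variance equals the number of variables.
    """
    p = len(corr)
    vec = [1.0 + 0.1 * i for i in range(p)]  # arbitrary non-symmetric start

    def times(v):
        return [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]

    for _ in range(iterations):
        w = times(vec)
        norm = math.sqrt(sum(x * x for x in w))
        vec = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(vec, times(vec)))  # Rayleigh quotient
    return vec, eigenvalue / p
```

A large fraction from first_principal_component means one combination of parameters carries most of the variance, which is the sense in which a "fundamental plane" shows up in PCA output.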

Exploring outliers

• Palomar-QUEST synoptic sky survey

• 9 mix-and-match colors from 8 filters

• Aim: finding outliers in color-color space for spectroscopic follow-up

• 1000 random objects

Boxplot

• Reveals relationships between colors (mean, median, overlap, outliers)
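The individual points drawn beyond a boxplot's whiskers usually come from Tukey's rule: values more than 1.5 interquartile ranges outside the quartiles. A minimal Python sketch, assuming that rule, follows. R's boxplot() computes its box from Tukey's hinges rather than quantile(), so its fences can differ slightly for small samples. boxplot_outliers is a hypothetical name.

```python
import statistics


def boxplot_outliers(values, coef=1.5):
    """Hypothetical helper: (lower_fence, upper_fence, outliers) by the IQR rule."""
    # method="inclusive" interpolates like R's default quantile() (type 7).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - coef * iqr, q3 + coef * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers
```

Applied to each color column in turn, this gives a first, one-dimensional list of candidates before any multivariate method is used.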

Clustering

• K-means provides the cluster centers, the within-cluster sum of squares (withinss), and a list of possible outliers
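A plain-Python sketch of k-means (Lloyd's algorithm) that also returns withinss, the within-cluster sum of squared distances. It is an illustration under stated assumptions: R's kmeans() defaults to the Hartigan-Wong algorithm, and how VOStat builds its list of possible outliers is not stated, so the hypothetical farthest_from_center helper simply ranks points by distance from their own cluster center.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means. Returns (centers, labels, withinss).

    withinss[c] is the sum of squared distances from the members of
    cluster c to its center.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]

    def assign(current):
        # Assignment step: index of the nearest center for every point.
        return [min(range(k), key=lambda c: squared_distance(p, current[c]))
                for p in points]

    for _ in range(iterations):
        labels = assign(centers)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(axis) / len(members) for axis in zip(*members)])
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center alone
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(centers)  # labels consistent with the final centers
    withinss = [sum(squared_distance(p, centers[c])
                    for p, lab in zip(points, labels) if lab == c)
                for c in range(k)]
    return centers, labels, withinss


def farthest_from_center(points, centers, labels, top=3):
    """Hypothetical outlier list: the `top` points farthest from their own center."""
    dist = [squared_distance(p, centers[lab]) for p, lab in zip(points, labels)]
    return sorted(range(len(points)), key=lambda i: dist[i], reverse=True)[:top]
```

A cluster with an unusually large withinss relative to its size is itself a hint that it contains stragglers worth inspecting.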

K-density

• Probability-density association for outliers
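The slide does not describe how the external K-density routine works. One common way to associate a density with each object is a k-nearest-neighbour estimate, with density proportional to k / (n * r_k^d). The Python sketch below assumes that approach and flags the lowest-density points as outlier candidates. It is not VOStat's code, and the function names are hypothetical.

```python
import math


def knn_density_scores(points, k=5):
    """Hypothetical helper: relative k-nearest-neighbour density per point.

    density ~ k / (n * r_k**d), r_k = distance to the k-th nearest neighbour,
    d = dimension. The d-ball volume constant is dropped, so only the
    ranking of the scores is meaningful.
    """
    n, d = len(points), len(points[0])
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r_k = dists[k - 1]
        scores.append(k / (n * r_k ** d) if r_k > 0 else math.inf)
    return scores


def lowest_density(points, k=5, top=3):
    """Indices of the `top` points in the sparsest regions (outlier candidates)."""
    scores = knn_density_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i])[:top]
```

The brute-force neighbour search is quadratic in the number of objects, which is adequate for a sample of about a thousand but would call for a spatial index on larger catalogs.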

Visual confirmation (outliers found among 1000 random objects)

Summary

• Web-based

• VO compatible

• Public-domain and specialized routines