statistical inference for complex data in the physical ...annlee/files/annlee_openhouse... ·...

25
Statistical Inference for Complex Data in the Physical Sciences Ann B. Lee Department of Statistics & Data Science Carnegie Mellon University

Upload: others

Post on 10-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Statistical Inference for Complex Data in the Physical Sciences

Ann B. Lee Department of Statistics & Data Science

Carnegie Mellon University

Page 2: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

2

Statistics --- Close Ties to Many Fields

Machine learning (ML) Computer science Engineering

Physical Sciences Astronomy, Earth Sciences Chemistry

Biology Genetics, Medicine Neuroscience (CNBC)

Public policy (Heinz) Social Sciences Finance

STATISTICS AT CMU

Page 3: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

STAMPS@CMU

In 2018, we started the STAtistical Methods for the Physical Sciences (STAMPS) research group.

Problems in the physical sciences have similar statistical challenges —- involving massive data sets from different physical probes (e.g. images from satellites), large simulations, and complex measurement errors.

Page 4: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

STAMPS@CMU

In 2018, we started the STAtistical Methods for the Physical Sciences (STAMPS) research group.

Problems in the physical sciences have similar statistical challenges —- involving massive data sets from different physical probes (e.g. images from satellites), large simulations, and complex measurement errors.

Page 5: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Commonalities in Physical Sciences: ``The Matrix’’ behind STAMPS…

Page 6: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Astronomy/Cosmology Context

Timeline of the Universe

Page 7: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Project 1 with Nic (4th Year): Likelihood-Free Inference and Validation of Complex Simulation Models

as a result of the morphology-density relation(10), narrow-field observations such as thoseundertaken with HST may be significantlybiased by galaxy environment. For example,elliptical galaxies are expected to be over-abundant in those parts of the line-of-sightthat pass through, or close to, clusters ofgalaxies.

A final complication is introduced by thefact that distant galaxies are seen at a range ofredshifts. Therefore, observations made in asingle filter probe each galaxy at a wave-length different from that of the same filter inthe rest frame (i.e., the frame of reference inwhich the galaxy is stationary with respect tothe expansion of the Universe). This poses a

problem for morphologists, because the ap-pearance of a galaxy can vary strongly withwavelength. In the ultraviolet (UV), a gal-axy’s light comes mainly from hot youngstars, which are often distributed in clumpyirregular knots of star formation. At opticalwavelengths, stars on the stellar main se-quence dominate, and galaxies take on theirmost familiar appearance. In the infrared(IR), most of the flux is usually from evolvedold stars, which are quite uniformly distrib-uted, so galaxies appear smoother than atoptical wavelengths. As a result, identicalgalaxies appear different at different red-shifts. The importance of this effect (knownas the “morphological K-correction”) de-pends rather strongly on the distance at whichthe galaxy is seen. For galaxies with redshiftsz ! 0.8 [i.e., for galaxies seen at look-backtimes less than about halfway back to the BigBang (11)], observations made with HST us-ing the F814W filter (commonly used formorphological work) correspond to observa-tions made at familiar optical wavelengths inthe galaxy’s frame of reference. Thus, theeffects of morphological K-corrections arequite benign at z ! 0.8, and much work ongalaxy morphology has focused on this“safe” redshift range. At redshifts z " 0.8,galaxies viewed with the F814W filter arebeing seen at UV wavelengths in the restframe. Unfortunately, rather little is knownabout the appearance of nearby galaxies inthe UV, because our atmosphere is opaque toUV radiation, and only rather small UV im-aging surveys have been undertaken fromspace. It is clear that some nearby galaxiesthat appear optically normal appear strange inthe UV (12, 13). Consequently, galaxies atz " 0.8 are best studied in the IR at wave-lengths greater than 1 #m in order to under-take fair comparisons with local studies madeat optical wavelengths (14). HST had only arather limited capability to undertake suchobservations, using the short-lived Near In-fra-Red Camera and Multi-Object Spectrom-eter (NICMOS).

Morphological Classifications ofDistant GalaxiesThe existence of a population of morphologi-cally peculiar galaxies at high redshifts wasfirst suspected on the basis of HST observa-tions as early as 1994 (15–19). Much of thisearly work proceeded hand in hand with im-proved attempts to quantify galaxy morphol-ogy using automated morphological classifi-cation schemes, as fast and robust replace-ments for traditional visual classifications.Present state-of-the-art automatic classifica-tion systems can group galaxies into broadcategories (such as spirals, ellipticals, andpeculiars) as reliably as can a visual classifier(7, 20, 21), and these systems are in principleextensible to encompass wholly new classes

Fig. 1. Schematic illustration of the formation of galaxies in the hierarchical formation model. Thetime scale for the various phases shown in the illustration depends on cosmological parameters(i.e., the value of the Hubble constant, and the total mass-energy of the Universe contained in bothdark matter and dark energy). For example, in an accelerating Universe most elliptical galaxies format redshifts z " 1, while in a decelerating Universe most ellipticals form at z ! 1. It is worth notingthat morphological transformations are cyclical in this model. Some elliptical galaxies formed bythe collision of spirals will slowly accrete gas from the field, thus growing a new disk, and ultimatelybeing transformed back into spiral galaxies with prominent nuclear bulges.

S C I E N C E ’ S C O M P A S S

17 AUGUST 2001 VOL 293 SCIENCE www.sciencemag.org1274

on

Mar

ch 6

, 201

2w

ww

.sci

ence

mag

.org

Dow

nloa

ded

from

Theory In(Simulation)

Ensemble ofGalaxies Out

Observed DataAbraham & van den Bergh (2001)

How, Exactly?

Comparing distributions in high dimensions [with Ilmun Kim and Jing Lei]

``Likelihood-free inference’’: Attach meaningful measures of uncertainty to estimates. (Statistical and computational efficiency.)

Model validation. Assessing “emulators” and approximate likelihood models.

Page 8: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Hurricanes Edouard (2014; 109mph) and Nicole (2016; 54mph)

Project 2 with Trey (3rd Year): Modeling Hurricane Intensity Change Using GOES Satellite Imagery

H

Page 9: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Project 3 with Addison (2nd Year ADA), Mikael and Trey: Joint Analysis of TC Intensity Change by

Integrating Satellite and In Situ Observations

Page 10: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Project 4 with Lorenzo (1st Year ADA) and Coty: Chemical Fingerprinting of Wildfire Smoke

Knowing what fuels burned during a wildfire is critical to predicting impact of wildfire smoke on air quality

WildlandFireSmokeChemicalSignature

SantaRosa,CAFires(Oct2018),NASA

>1000uniquecompounds

Page 11: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Key Points

Tons of data and interesting science/methodology/algorithmic problems in the physical sciences.

Look on the Arxiv for my recent papers.

Contact: [email protected]

Page 12: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

References

Page 13: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

https://arxiv.org/abs/1911.11089

Page 14: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

https://arxiv.org/abs/1905.11505 (to appear in AISTATS’2020)

Page 15: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

https://arxiv.org/abs/2002.10399

Page 16: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Statistical Tools and Software for CDE in Python and R https://doi.org/10.1016/j.ascom.2019.100362

Estimate p(θ|x), where θ∈ℝd and x∈ℝp (d≤3, p large)

Page 17: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Extra Slides Start Here

Page 18: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Trey McNeely Rapid Weakening in Tropical Cyclones

TC intensity forecasts have fallen behind trajectory forecasts.

18

Page 19: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)
Page 20: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Leverage the axisymmetric structure of a mature, intense storm

Left: Hurricane Edouard (95 kt) at 18 UTC 16 Sept 2014

Right: Hurricane Nicole (~47 kt) at 1 UTC 9 Oct 2016

Page 21: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Hurricanes and Ocean Heat Content [Hu/Kuusela/Lee/Giglio/Wood]

https://argo.ucsd.edu

Page 22: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Statistical Tools for Comparing and Analyzing Distributions of Images

[Freeman/Kim/Lee 2017, Kim/Lee/Lei 2018, Dalmasso et al 2019]

Can we answer the question if, and if so, how two populations are different without just looking at histograms of just a few individual features?

Page 23: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Statistical Tools for Comparing and Analyzing Distributions of Images

[Freeman/Kim/Lee 2017, Kim/Lee/Lei 2018, Dalmasso et al 2019]

We have developed methods that — in an automated way — can identify differences that are statistically significant (that is, unlikely to occur by chance).

Page 24: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

Visualizing the Results

Page 25: Statistical Inference for Complex Data in the Physical ...annlee/files/AnnLee_OpenHouse... · shifts. The importance of this effect (known as the “morphological K-correction”)

WildlandFireSmokeChemicalSignature

IndividualFuelBurns

Goals:1) Identifythetypesoffuelsthatburned2) Ballparktheamountoffuelthatburned