

Teaching Astronomical Data Analysis in Python

Prof. Joseph Harrington
Department of Physics
University of Central Florida
Orlando, Florida

Outline
  Teaching how to do research
  My class
  NumPy doc project
  My (better) class
  Examples of early assignments
  Future thoughts on teaching research students

Grad Apps and Career Prep
  What do we need to do to prepare students for a research career?
  Want them to:
    Be clear on research as a career
    Show career direction
      Field area
      Experiment or theory?
    Demonstrate research capacity, creativity, initiative
      “Nice grades, but can she work?” YES!
    Know enough to make good school choice
    Contact future grad advisors or employers

The best way to prepare to do something is to do it!

Undergrad Research “Sequence”

  Y1-2: General tasks, 1-2 projects
  Y3: Specific task leads
  Y3-4: Talks at conferences, paper co-authorship
  Y4: Top students lead a paper (advisor's holy grail)

Preparing Students For Research
  Need some basic skills to start:
    Computer programming
    Data handling
    Statistics & error analysis
    Modeling
    Data presentation
  Need to start before most classes taken!
  One-on-one training by faculty is time-consuming
  Many research supervisors not current on computational techniques
  Need an intro course!

Joe's Pet Theory of Learning
  Expert = knowledge base, not talent
    10 years' study
  What is accumulated? What's needed/important:
    Survival, food, protection (big, now)
    Create envisioned future (big, later)
    Complete task (small, now)
    Repeated (must be important)
    Related to existing knowledge (more are better)
  Educator's job:
    Create sparse but firm knowledge base spanning field
    All new knowledge encountered connects, retained

Why do Classes Work?

  Equate class success to survival
    Learn best, but very risky
  Know material needed for career/dream/success
  Know material needed for test/project (task)
  They don't! It's what they do, not what you say.

Class Design
  Intro to programming (the right way!)
  Intro to stats, error analysis, modeling (Bevington)
    Monte Carlo coding exercises teach programming
  Intro to astronomical data (Howell)
    Data formation – photon emission, sky “bg”, PSF
    Detectors, measurements, file formats
    Photons to electrons, detector systematics
  Photometry coding homework sequence
  Project in photometry, real dataset, 100% own code
  Intro to spectroscopy
  Prerequisite: multivariate calculus, Astro 101
  Challenge: Get programming in early for HW!
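
The Monte Carlo and photometry items above can be illustrated together. The following is only a sketch of the kind of exercise such a course might assign, not one of the actual homework problems: it simulates a star as a Gaussian PSF on a flat sky with Poisson photon noise, measures it with simple aperture photometry, and repeats the experiment many times to estimate the measurement uncertainty. The function names, image size, aperture radii, flux, and sky level are all assumptions chosen for illustration.

```python
import numpy as np

def make_star_image(flux, sky, size=25, sigma=2.0, rng=None):
    """Simulate a star (Gaussian PSF) on a flat sky with Poisson photon noise.

    All parameter values are illustrative assumptions, not course data.
    Returns the noisy image and each pixel's distance from the star center.
    """
    rng = np.random.default_rng() if rng is None else rng
    y, x = np.indices((size, size))
    center = (size - 1) / 2.0
    r2 = (x - center) ** 2 + (y - center) ** 2
    psf = np.exp(-r2 / (2.0 * sigma ** 2))
    psf /= psf.sum()                       # PSF sums to 1 over the image
    expected = flux * psf + sky            # expected counts in each pixel
    image = rng.poisson(expected).astype(float)
    return image, np.sqrt(r2)

def aperture_photometry(image, r, r_ap=6.0, r_in=8.0, r_out=12.0):
    """Sum the counts inside a circular aperture after subtracting the sky.

    The sky level per pixel is estimated as the median of an annulus.
    """
    aperture = r <= r_ap
    annulus = (r >= r_in) & (r <= r_out)
    sky_per_pixel = np.median(image[annulus])
    return image[aperture].sum() - sky_per_pixel * aperture.sum()

# Monte Carlo: repeat the simulated observation and look at the scatter.
rng = np.random.default_rng(42)
true_flux, sky = 10000.0, 50.0
measured = np.array([
    aperture_photometry(*make_star_image(true_flux, sky, rng=rng))
    for _ in range(500)
])

print("mean measured flux:     ", measured.mean())
print("Monte Carlo uncertainty:", measured.std(ddof=1))
```

The mean comes out slightly below the true flux because a finite aperture misses the wings of the PSF. Explaining that kind of systematic effect is a natural follow-up question for students.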

Graduate Students
  Also need this as a grad class
    Many come to planetary sciences from other fields
    Many not prepared in data analysis or detectors
    Most have programmed some
    Most have done error analysis (few understand it)
    Few understand computational methods
  BUT, we don't have enough students for 2 classes
  Grad class shares undergrad lecture
  Additional readings (papers, Numerical Recipes)
  Grads do harder project in spectroscopy

V 1.0: Cornell, IDL
  Excellent students, 4-6 per year
  Worked like dogs, up to 15 hours/week HW
  All who didn't drop in first week did very well
    Top students at summer schools
    Immediate, high-level work on, e.g., Mars Rovers
  IDL well documented, widely used in astronomy
  Books available
  Expensive!
  Quirky, encourages bad programming
  Lacking many modern features
  Unable to cut-n-paste a loop to the command line

Cornell->UCF, IDL->NumPy
  I got a job!
  New PhD program in Planetary Sciences at UCF
  51,000 undergrads, poorer, many FTIC (first time in college), awful high schools
  IDL expensive, encourages bad programming
  NumPy
    Open: you can look at the source (educational)
    Free – students can use it forever, anywhere
    Enforces good programming!
    Code shorter than pseudocode (implicit loops)
    NOT a data language, a real language doing data!
    Designed for wrapping, and almost everything is
    Can be used in cloud computing, clusters, etc.
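
To make the "implicit loops" point concrete, here is a minimal sketch that computes count rates two ways: with an explicit loop, and with NumPy array arithmetic. The count and exposure-time values are made up for illustration.

```python
import numpy as np

# Hypothetical detector counts and exposure times (seconds).
counts = np.array([1200.0, 950.0, 1810.0, 40.0])
exptime = np.array([10.0, 10.0, 20.0, 5.0])

# Explicit loop, the way it would be written in most languages.
rates_loop = np.empty_like(counts)
for i in range(len(counts)):
    rates_loop[i] = counts[i] / exptime[i]

# Implicit loop: the arithmetic applies element by element.
rates = counts / exptime
errors = np.sqrt(counts) / exptime    # Poisson uncertainty on each rate

# Boolean indexing replaces a loop with an if-test inside it.
bright = rates[rates > 50.0]

print(rates)
print(errors)
print(bright)
```

The array versions read almost like the equations they implement, which is the sense in which the code can be shorter than pseudocode.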

V 1.99: UCF 2007, NumPy

  Students mixed, some Cornell-level, 6 students
  Removed 30% of class assignments
  Switched to NumPy
  NumPy docs not good, few books, poor reference
  Much larger codebase than IDL, but not integrated
  Students worked like dogs, 15 hours/week HW
  No undergrad finished the project (all grads did)
  Rescoped class in final weeks
  Graded on basis of 2008 plan
  All students well prepared for research (!)

Lessons
  The better students were similar! All were capable.
  Given Guide to NumPy, web site, mailing lists, me, sources, astro tutorial, encouraged to work together
  BUT:
    Often spent 2 hours to find and learn a simple routine
    Students (and I) put heroic effort into class
    All bombed, even experienced programmers
    Function docs not even poor; mostly nonexistent
    Grads and I learned NumPy by induction from IDL
    Undergrads lost
  What to do for 2008?
  Must either have docs or go back to IDL (Nooooooo...)

If I can't find a reindeer, I'll make one instead.

-T. Grinch, 1957
  Except I'm a pre-tenure prof
  Research very active, time very constrained
  Saw what happened to Travis O., expected same
  Resources:
    Community
    Some surplus funding
    Experience and skill as a communicator
    About 8 months

Requirements - General
  Reference pages and manual
  Tutorial user guide
  Quick “Getting Started” guide
  Topic-specific user guides
  Examples that work
  “Live” collection of worked examples (cookbook)
  Tools to view and search docs interactively
  Variety of access methods (help(), web, PDF, etc.)
  Mechanisms for community input and management
  NumPy, then SciPy, then others
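
As a small illustration of the help() access method listed above, the sketch below reads a NumPy reference page from the function's docstring, which is the same text that help() pages at an interactive prompt. The choice of `np.median` is arbitrary.

```python
import inspect
import numpy as np

# At an interactive prompt, help(np.median) pages this same text.
doc = inspect.getdoc(np.median)       # docstring with indentation cleaned up
summary = doc.splitlines()[0]         # the first line is a one-line summary
has_examples = ">>>" in doc           # worked examples use the >>> prompt

print(summary)
print("Has worked examples:", has_examples)
```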

Requirements - Reference
  Reference page per function/class (“docstrings”)
    Thorough, complete, professional text
    Accessible one level below expected user
    Examples, crossrefs, math, images (limited)
  Pages for general topics
    Glossary
    Slicing, broadcasting, indexing
  Reference manual
    Organized by topic areas
    Index
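
A reference page of the kind described above might look like the following sketch. The function `subtract_sky` is hypothetical and is not part of NumPy; its docstring follows the sectioned layout used in NumPy's reference pages (Parameters, Returns, See Also, Examples), and its body shows broadcasting and indexing. The final lines run the docstring's example with the standard `doctest` module, in the spirit of "examples that work".

```python
import doctest
import numpy as np

def subtract_sky(frames, sky):
    """
    Subtract a per-frame sky level from a stack of images.

    Parameters
    ----------
    frames : ndarray, shape (nframes, ny, nx)
        Stack of images.
    sky : ndarray, shape (nframes,)
        Sky level for each frame, in the same units as `frames`.

    Returns
    -------
    ndarray, shape (nframes, ny, nx)
        Sky-subtracted copy of `frames`.

    See Also
    --------
    numpy.subtract : Element-wise subtraction with broadcasting.

    Examples
    --------
    >>> import numpy as np
    >>> frames = np.ones((2, 2, 2))
    >>> subtract_sky(frames, np.array([0.25, 1.0]))[:, 0, 0].tolist()
    [0.75, 0.0]
    """
    frames = np.asarray(frames, dtype=float)
    sky = np.asarray(sky, dtype=float)
    # Indexing with None adds two trailing axes, so sky of shape (nframes,)
    # becomes (nframes, 1, 1) and broadcasts against (nframes, ny, nx).
    return frames - sky[:, None, None]

# Run the docstring's example to confirm that it works as written.
parser = doctest.DocTestParser()
test = parser.get_doctest(subtract_sky.__doc__,
                          {"subtract_sky": subtract_sky},
                          "subtract_sky", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)
```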

General Timeline
  Infrastructure
    Doc wiki (and SVN link, stats, reviews, etc.)
    Templates and standards
    Indexing, math, image mechanisms in ReST
    Produce text, PDF, HTML automatically
  NumPy
    Reference (functions/objects, topic pages)
    Getting Started Guide
    Tutorial
  Reading tools beyond help(), browsers, PDF readers
  SciPy (yikes)

Milestones

NumPy is big – page triage, UCF course funcs first!

2008 Fall semester: NumPy 50%+ to first draft

2009 Summer end: NumPy 100% to first draft

Early 2010: NumPy Proofed

2010: SciPy...

Organization

Leerless Feeder: jh – organization, plans, standards and processes, funding, whip, person who will die if this fails...

Reliable and available expert (on UCF contract): Stéfan van der Walt – interface with SVN, PDF and HTML, standards and processes, wiki, chief writer, the one person I could dump on mercilessly...

Emmanuelle Gouillart – wiki, wiki hosting
Pauli Virtanen – wiki, writing
Perry Greenfield – numpy.doc pages
About 40 grad/PhD-level volunteer doc writers!

How to pay? Shirts!
  Write 1000 words or do equivalent work
    Wiki
    Reviewing
    Proofing
  Now or later (we'll mail)
  Offer good until we withdraw it
  Don't be an “expert”
    Tell us the size you want
    Not the list of all possible sizes you could wear!

Results – Summer 2008

  Quality is high!
  Lots of examples, and they work.
  PDF doc has 309 printed pages
  Doc wiki is your best source for docstrings!
  docs.scipy.org

V 2.0: UCF, NumPy

  Now have docs!
  Simpler undergrad project
  Grads have old undergrad project
  Again mix of levels, 6 students
  Worked up to 9 hours/week on HW, most less
  All who tried, finished
  All did decent to excellent projects
  Major epiphany occurred in class!
  Students excited to learn, positive attitude
  Students well prepared to do research

Examples


Observations
  NumPy (w/ docs) learned much faster than IDL
  Students who already program
    Do best in course
    Work least in course
    Hardest to teach good programming habits to
  HUGE grading load: 15-60 min/student/week
  How to scale to larger class?
    Only fully grade some work
    Peer evaluation (as HW assignment?)
  No matter what happened, ultimate goal met
  Doing is learning

What's Next?
  Course requires calculus
  Success predictor is programming
    1980s/90s students knew much more than today's
  Freshmen need to do research, can't yet take course
  Why not teach programming for science in year 1?
    Basic interactive command line
    Plot, visualize equations or data
    Run routines on data
    Basics of programming
  Tell all their profs to assign vis. and calc probs
  Students will build their knowledge in all classes
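
As a rough idea of what a first-year "calc prob" at the interactive command line could look like, the sketch below evaluates Kepler's third law for a few planets. The problem is an assumption chosen for illustration and is not taken from the course; the semi-major axes are rounded textbook values.

```python
import numpy as np

# Semi-major axes in AU for Mercury, Earth, and Jupiter (rounded values).
a_au = np.array([0.387, 1.0, 5.203])

# Kepler's third law for bodies orbiting the Sun:
# P^2 = a^3, with P in years and a in AU.
period_yr = a_au ** 1.5

for a, p in zip(a_au, period_yr):
    print(f"a = {a:6.3f} AU  ->  P = {p:6.2f} yr")
```

Plotting the same relation would be a natural next step, but it needs a plotting library, which is left out here to keep the sketch self-contained.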