Experiment Design - Colorado State University
Experiment Design
CS 510 Lecture #26
April 28th, 2014
PA4
• Task:
  – Conduct an experiment
    • Ablation
    • Replacement
    • Sensitivity
  – Write a report
    • 8 pages IEEE conference format max
• Due May 9th
4/30/14 CS 510, Image Computation, ©Ross Beveridge & Bruce Draper 2
PA4 Details: Report Outline
• Six sections:
  1. Abstract
  2. Introduction
  3. Prior Work
  4. Methodology
  5. Experimental Results
  6. Conclusion & Future Work
• Note: this is the basic outline of (almost) every CS paper
Introduction
• Hardest part of the paper to write. It covers:
  – What question are you trying to answer
  – Why is this question important
  – What is the context
  – How will the question be answered
  – What (briefly) will the reader learn
• Assume the reader is a computer scientist
• Total length: 1 page or less
Prior Work
• Your experiment depends on your PA3 system
  – Describe it
  – Focus on the issues relevant to the experiment
• In a “real” paper, you would also cover other related work
• Describe how your paper adds to the field
Methodology
• Describe your experiment design
  – Goal: what question is being answered
  – Input/Output
    • Training Data
    • Test Data
    • Output
  – Performance Metric(s)
Experimental Results
• Describe results of experiment
  – In text
  – In figures/plots
  – With numbers (tables)
• Interpret results for the reader
  – Present your conclusions
  – Link them to data
  – Hypothesize reasons, if appropriate
Conclusion & Future Work
• Conclusion
  – Remind reader of goal
  – Remind reader why it's important
  – Briefly restate conclusion with key supporting data
  – 1 paragraph (sometimes 2)
• Future Work
  – Describe the next experiment
  – 1 paragraph
Abstract
• Comes first, but write it last
• Max 2 paragraphs
  – 1 paragraph is better
• Summarizes the whole paper
  – What is the goal
  – Why is it important
  – How is it tested
  – What are the results
  – What is the conclusion
• OK to grab sentences from introduction & conclusion
Experiment Design
• Experiment design is goal driven
  – What are you trying to show?
  – Formally: what is your hypothesis?
• In support of that, choose
  – Training data
  – Test data
  – Ground truth data
  – Methodology
  – Performance metrics
Rules of Design
• Select data
  – Enough to prove your point
  – Not more than you can process
• Never test on training data
  – Partition training & test data
    • No overlapping data
    • Overlapping actions?
Data Analysis
• Quantitative over qualitative
• Never make a statement that is not supported by data
• Keep context in mind
  – You experimented with a single system
  – How far do the results extend?
  – You can expand the reach with small sensitivity studies
    • E.g. we repeated the experiment for multiple codebook sizes…
Review: ROC Curve
[Figure: ROC curve. x-axis: False Positive Percentage (0% to 100%); y-axis: True Positive Percentage (0% to 100%); annotation: "Better than random".]
Review: Computing ROCs
• For every test sample, generate a (score, label) pair
  – Score is similarity score
  – Label is true/false based on ground truth
• Sort pairs based on scores
  – Descending order for similarity scores
  – Ascending for distance scores
• Put a threshold between every pair of adjacent non-equal scores
• For every threshold,
  – compute true positive percent
  – compute false positive percent
  – plot point
Score Label
0.12 T
0.13 T
0.37 F
1.01 T
1.04 F
1.05 T
1.27 F
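The steps above can be sketched in Python using the (score, label) pairs from the table. This is only an illustration, not the course's reference code: the function name `roc_points` is hypothetical, and the scores in the table are assumed to be distances (smaller means more similar), so they are sorted in ascending order.

```python
def roc_points(pairs, distance=True):
    """Return (false positive %, true positive %) points, one per threshold.

    pairs: list of (score, label); label is True for a ground-truth match.
    distance: True if low scores mean "similar" (an assumption for the table).
    """
    # Ascending for distance scores, descending for similarity scores.
    ordered = sorted(pairs, key=lambda p: p[0], reverse=not distance)
    n_pos = sum(1 for _, label in ordered if label)
    n_neg = len(ordered) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    points = [(0.0, 0.0)]  # threshold before the first score: nothing accepted
    tp = fp = 0
    for i, (score, label) in enumerate(ordered):
        if label:
            tp += 1
        else:
            fp += 1
        # Place a threshold only between non-equal scores (and after the last).
        is_last = i == len(ordered) - 1
        if is_last or ordered[i + 1][0] != score:
            points.append((100.0 * fp / n_neg, 100.0 * tp / n_pos))
    return points


table = [(0.12, True), (0.13, True), (0.37, False), (1.01, True),
         (1.04, False), (1.05, True), (1.27, False)]
for fp_pct, tp_pct in roc_points(table):
    print(f"FP {fp_pct:5.1f}%  TP {tp_pct:5.1f}%")
```

Each printed pair is one point to plot; connecting them in order traces the ROC curve from (0%, 0%) to (100%, 100%).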
Data Analysis: Uncertainty
• If you repeated the experiment with different data, would you get the same result?
• Basic Approaches:
  – Run the experiment multiple times
  – Significance testing
This is a complex topic. I am just going to present two simple methods. There are many more.
N-fold Cross Validation
• Divide your data into N partitions
• For each run:
  – Train on N-1 partitions
  – Test on the remaining partition
• Repeat N times with different test partitions
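As a rough sketch of the partitioning described above, the following Python generator yields one (train, test) split per fold. The name `n_fold_splits` and the integer sample IDs are placeholders for illustration; real data would usually be shuffled, or grouped so that related samples stay in the same partition, before splitting.

```python
def n_fold_splits(samples, n):
    """Yield (train, test) lists for each of n folds."""
    if not 2 <= n <= len(samples):
        raise ValueError("n must be between 2 and the number of samples")
    # Deal samples round-robin into n partitions of near-equal size.
    folds = [samples[i::n] for i in range(n)]
    for k in range(n):
        test = folds[k]
        # Train on the other n-1 partitions.
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield train, test


data = list(range(10))  # placeholder sample IDs
for train, test in n_fold_splits(data, 5):
    assert not set(train) & set(test)  # never test on training data
    print("test:", test, "train size:", len(train))
```

Every sample appears in exactly one test partition, so the N runs together test on all of the data without ever testing on a training sample.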
Analyzing Cross Validation Results
• AUC (area under the ROC curve) is a scalar
  – Compute mean, st. dev., min, max.
• ROCs are curves
  – Compute bounding curve
  – Min & max score for every false positive %
• Remember
  – You are not comparing one cross-validation run to another
  – You are comparing sets of runs for two experimental conditions
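A minimal sketch of these summaries in Python follows. The helper names `summarize_auc` and `bounding_curves` are hypothetical, the numbers are invented placeholders rather than real results, and the bounding curves assume every fold's ROC has been sampled at the same false positive percentages.

```python
import statistics


def summarize_auc(aucs):
    """Mean, sample standard deviation, min and max of per-fold AUC values."""
    return {
        "mean": statistics.mean(aucs),
        "stdev": statistics.stdev(aucs),  # sample st. dev.; needs >= 2 folds
        "min": min(aucs),
        "max": max(aucs),
    }


def bounding_curves(rocs):
    """Lower and upper bounding curves for a set of ROC curves.

    rocs: one list of true positive % per fold, all sampled at the same
    false positive % values (an assumption of this sketch).
    """
    lower = [min(col) for col in zip(*rocs)]
    upper = [max(col) for col in zip(*rocs)]
    return lower, upper


# Placeholder numbers, invented purely to show the calls; not real results.
baseline_aucs = [0.80, 0.82, 0.78, 0.81, 0.79]
ablated_aucs = [0.70, 0.74, 0.69, 0.72, 0.70]
print("baseline:", summarize_auc(baseline_aucs))
print("ablated: ", summarize_auc(ablated_aucs))

fold_rocs = [[0, 40, 70, 100], [0, 50, 65, 100], [0, 45, 80, 100]]
print(bounding_curves(fold_rocs))
```

The comparison of interest is between the two summaries (one per experimental condition), not between individual folds.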
Significance Testing
• This is a big topic in statistics
• One simple example: McNemar’s test
  – Two algorithms
  – Run on the same data
  – Returning true/false for every sample
    • E.g. pick an operating point on the ROC curve
  – Answers whether one is significantly better than another, based on sample size
McNemar’s Test
                      Algorithm A
                      True    False
Algorithm B   True     A       B
              False    C       D

$$\chi^2 = \frac{(b - c)^2}{b + c}$$

where b and c are the counts in cells B and C, the samples on which the two algorithms disagree.

χ² → p
• χ² increases if the algorithms make different mistakes from each other
• χ² is smaller if the algorithms make similar mistakes
• p is the probability of seeing a difference at least this large by chance if the two algorithms actually perform equally well
• Statistics calculators convert χ² and N to p
  – http://www.socscistatistics.com/pvalues/chidistribution.aspx
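The calculation can also be sketched directly in Python. The function name `mcnemar` and the outcome lists are illustrative placeholders, not real results. The sketch uses the uncorrected statistic given above (some texts add a continuity correction) and relies on the fact that McNemar's statistic has 1 degree of freedom, in which case the chi-squared tail probability reduces to the complementary error function.

```python
import math


def mcnemar(results_a, results_b):
    """McNemar's chi-squared statistic and p-value for paired true/false results.

    results_a, results_b: equal-length lists of booleans, one per test sample,
    giving each algorithm's true/false outcome on that sample.
    """
    if len(results_a) != len(results_b):
        raise ValueError("both algorithms must be run on the same samples")
    # b: A False, B True.  c: A True, B False.  Only disagreements count.
    b = sum(1 for ra, rb in zip(results_a, results_b) if not ra and rb)
    c = sum(1 for ra, rb in zip(results_a, results_b) if ra and not rb)
    if b + c == 0:
        return 0.0, 1.0  # the algorithms never disagree
    chi2 = (b - c) ** 2 / (b + c)
    # Chi-squared with 1 degree of freedom: p = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Placeholder outcomes: 20 agreements (True), 2 with only B True,
# 10 with only A True, 5 agreements (False).
algo_a = [True] * 20 + [False] * 2 + [True] * 10 + [False] * 5
algo_b = [True] * 20 + [True] * 2 + [False] * 10 + [False] * 5
chi2, p = mcnemar(algo_a, algo_b)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

Samples on which both algorithms agree do not affect the statistic, which is why only b and c appear in the formula.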
Let’s Design an Experiment
• Let’s get started:
  – Describe your system
  – Brainstorm hypotheses
• Now let’s design an experiment…