introduction to statistics session 1 · = fh;tg. i dice: = f1;2;3;4;5;6g i yes/no poll,...

70
Introduction to statistics Session 1 Grzegorz Chrupala Saarland University October 9, 2012 Chrupala (UdS) Statistics October 9, 2012 1 / 28

Upload: others

Post on 19-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Introduction to statisticsSession 1

Grzegorz Chrupa la

Saarland University

October 9, 2012

Chrupala (UdS) Statistics October 9, 2012 1 / 28

Page 2: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Key concepts

Axioms of probability

Chain rule and Bayes theorem

Random variables

Entropy

Hypothesis testing

Binomial and normal distributions

Linear and logistic regression

If you are familiar with these, you don’t need this course!

Chrupala (UdS) Statistics October 9, 2012 2 / 28

Page 3: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Key concepts

Axioms of probability

Chain rule and Bayes theorem

Random variables

Entropy

Hypothesis testing

Binomial and normal distributions

Linear and logistic regression

If you are familiar with these, you don’t need this course!

Chrupala (UdS) Statistics October 9, 2012 2 / 28

Page 4: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Key concepts

Axioms of probability

Chain rule and Bayes theorem

Random variables

Entropy

Hypothesis testing

Binomial and normal distributions

Linear and logistic regression

If you are familiar with these, you don’t need this course!

Chrupala (UdS) Statistics October 9, 2012 2 / 28

Page 5: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Key concepts

Axioms of probability

Chain rule and Bayes theorem

Random variables

Entropy

Hypothesis testing

Binomial and normal distributions

Linear and logistic regression

If you are familiar with these, you don’t need this course!

Chrupala (UdS) Statistics October 9, 2012 2 / 28

Page 6: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Key concepts

Axioms of probability

Chain rule and Bayes theorem

Random variables

Entropy

Hypothesis testing

Binomial and normal distributions

Linear and logistic regression

If you are familiar with these, you don’t need this course!

Chrupala (UdS) Statistics October 9, 2012 2 / 28

Page 7: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Key concepts

Axioms of probability

Chain rule and Bayes theorem

Random variables

Entropy

Hypothesis testing

Binomial and normal distributions

Linear and logistic regression

If you are familiar with these, you don’t need this course!

Chrupala (UdS) Statistics October 9, 2012 2 / 28

Page 8: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Key concepts

Axioms of probability

Chain rule and Bayes theorem

Random variables

Entropy

Hypothesis testing

Binomial and normal distributions

Linear and logistic regression

If you are familiar with these, you don’t need this course!

Chrupala (UdS) Statistics October 9, 2012 2 / 28

Page 9: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Key concepts

Axioms of probability

Chain rule and Bayes theorem

Random variables

Entropy

Hypothesis testing

Binomial and normal distributions

Linear and logistic regression

If you are familiar with these, you don’t need this course!

Chrupala (UdS) Statistics October 9, 2012 2 / 28

Page 10: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Course structure

Oct 8 - Oct 9Basic concepts of Probability and Informationtheory with Grzegorz Chrupa [email protected]

Oct 10 - Oct 12Statistics for experimental science with FrancescaDelogu

Oct 15 - Oct 17Statistics for NLP – reading group

Oct 18 - 19Linear models: Grzegorz Chrupa la

Chrupala (UdS) Statistics October 9, 2012 3 / 28

Page 11: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Course structure

Oct 8 - Oct 9Basic concepts of Probability and Informationtheory with Grzegorz Chrupa [email protected]

Oct 10 - Oct 12Statistics for experimental science with FrancescaDelogu

Oct 15 - Oct 17Statistics for NLP – reading group

Oct 18 - 19Linear models: Grzegorz Chrupa la

Chrupala (UdS) Statistics October 9, 2012 3 / 28

Page 12: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Course structure

Oct 8 - Oct 9Basic concepts of Probability and Informationtheory with Grzegorz Chrupa [email protected]

Oct 10 - Oct 12Statistics for experimental science with FrancescaDelogu

Oct 15 - Oct 17Statistics for NLP – reading group

Oct 18 - 19Linear models: Grzegorz Chrupa la

Chrupala (UdS) Statistics October 9, 2012 3 / 28

Page 13: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Course structure

Oct 8 - Oct 9Basic concepts of Probability and Informationtheory with Grzegorz Chrupa [email protected]

Oct 10 - Oct 12Statistics for experimental science with FrancescaDelogu

Oct 15 - Oct 17Statistics for NLP – reading group

Oct 18 - 19Linear models: Grzegorz Chrupa la

Chrupala (UdS) Statistics October 9, 2012 3 / 28

Page 14: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Textbook and topics

Foundations of Statistical NLP, Manning andSchutze

I For each of the first three sessions next week everybodyreads a section of the book.

I A group of students will present the material (45-60min).

I Follow up with exercises and discussion.

Chrupala (UdS) Statistics October 9, 2012 4 / 28

Page 15: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Suggested topics

Collocations (5)

Lexical acquisition (8)

Clustering (14)

Other topic possible (talk to me!)

Organize yourselves into three groups and agree ontopics by tomorrow

Within each group, coordinate and decide whopresents which subsection

Chrupala (UdS) Statistics October 9, 2012 5 / 28

Page 16: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Today: Basic concepts in probabilitytheory

Probability notation P (X|Y )I What does this expression mean?I How can we manipulate it?I How can we estimate its value in practice?

Chrupala (UdS) Statistics October 9, 2012 6 / 28

Page 17: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Three aspects of statistics

Descriptive. Mean or median grade at a university.Distribution of heights among a population ofcountry

Confirmatory. Are the results statisticallysignificant?

Predictive. Learn from past data to predict futureevents

Another dimension: Frequentist vs Bayesian(philosophical underpinnings)

Chrupala (UdS) Statistics October 9, 2012 7 / 28

Page 18: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Three aspects of statistics

Descriptive. Mean or median grade at a university.Distribution of heights among a population ofcountry

Confirmatory. Are the results statisticallysignificant?

Predictive. Learn from past data to predict futureevents

Another dimension: Frequentist vs Bayesian(philosophical underpinnings)

Chrupala (UdS) Statistics October 9, 2012 7 / 28

Page 19: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Three aspects of statistics

Descriptive. Mean or median grade at a university.Distribution of heights among a population ofcountry

Confirmatory. Are the results statisticallysignificant?

Predictive. Learn from past data to predict futureevents

Another dimension: Frequentist vs Bayesian(philosophical underpinnings)

Chrupala (UdS) Statistics October 9, 2012 7 / 28

Page 20: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Three aspects of statistics

Descriptive. Mean or median grade at a university.Distribution of heights among a population ofcountry

Confirmatory. Are the results statisticallysignificant?

Predictive. Learn from past data to predict futureevents

Another dimension: Frequentist vs Bayesian(philosophical underpinnings)

Chrupala (UdS) Statistics October 9, 2012 7 / 28

Page 21: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Experiments and Sample Spaces

Consider an experiment or process

Set of possible basic outcomes: sample space ΩI Coin toss: Ω = H,T.I Dice: Ω = 1, 2, 3, 4, 5, 6I Yes/no poll, correct/incorrect: Ω = 0, 1I Number of traffic accidents in an area per year: Ω = NI Misspelling of a word. Ω = Z∗ where Z is an alphabet,

and Z∗ the set of strings over this alphabetI Guess a missing word: |Ω| = vocabulary size

Chrupala (UdS) Statistics October 9, 2012 8 / 28

Page 22: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Experiments and Sample Spaces

Consider an experiment or processSet of possible basic outcomes: sample space Ω

I Coin toss:

Ω = H,T.I Dice: Ω = 1, 2, 3, 4, 5, 6I Yes/no poll, correct/incorrect: Ω = 0, 1I Number of traffic accidents in an area per year: Ω = NI Misspelling of a word. Ω = Z∗ where Z is an alphabet,

and Z∗ the set of strings over this alphabetI Guess a missing word: |Ω| = vocabulary size

Chrupala (UdS) Statistics October 9, 2012 8 / 28

Page 23: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Experiments and Sample Spaces

Consider an experiment or processSet of possible basic outcomes: sample space Ω

I Coin toss: Ω = H,T.I Dice:

Ω = 1, 2, 3, 4, 5, 6I Yes/no poll, correct/incorrect: Ω = 0, 1I Number of traffic accidents in an area per year: Ω = NI Misspelling of a word. Ω = Z∗ where Z is an alphabet,

and Z∗ the set of strings over this alphabetI Guess a missing word: |Ω| = vocabulary size

Chrupala (UdS) Statistics October 9, 2012 8 / 28

Page 24: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Experiments and Sample Spaces

Consider an experiment or processSet of possible basic outcomes: sample space Ω

I Coin toss: Ω = H,T.I Dice: Ω = 1, 2, 3, 4, 5, 6I Yes/no poll, correct/incorrect:

Ω = 0, 1I Number of traffic accidents in an area per year: Ω = NI Misspelling of a word. Ω = Z∗ where Z is an alphabet,

and Z∗ the set of strings over this alphabetI Guess a missing word: |Ω| = vocabulary size

Chrupala (UdS) Statistics October 9, 2012 8 / 28

Page 25: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Experiments and Sample Spaces

Consider an experiment or processSet of possible basic outcomes: sample space Ω

I Coin toss: Ω = H,T.I Dice: Ω = 1, 2, 3, 4, 5, 6I Yes/no poll, correct/incorrect: Ω = 0, 1I Number of traffic accidents in an area per year:

Ω = NI Misspelling of a word. Ω = Z∗ where Z is an alphabet,

and Z∗ the set of strings over this alphabetI Guess a missing word: |Ω| = vocabulary size

Chrupala (UdS) Statistics October 9, 2012 8 / 28

Page 26: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Experiments and Sample Spaces

Consider an experiment or processSet of possible basic outcomes: sample space Ω

I Coin toss: Ω = H,T.I Dice: Ω = 1, 2, 3, 4, 5, 6I Yes/no poll, correct/incorrect: Ω = 0, 1I Number of traffic accidents in an area per year: Ω = NI Misspelling of a word.

Ω = Z∗ where Z is an alphabet,and Z∗ the set of strings over this alphabet

I Guess a missing word: |Ω| = vocabulary size

Chrupala (UdS) Statistics October 9, 2012 8 / 28

Page 27: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Experiments and Sample Spaces

Consider an experiment or processSet of possible basic outcomes: sample space Ω

I Coin toss: Ω = H,T.I Dice: Ω = 1, 2, 3, 4, 5, 6I Yes/no poll, correct/incorrect: Ω = 0, 1I Number of traffic accidents in an area per year: Ω = NI Misspelling of a word. Ω = Z∗ where Z is an alphabet,

and Z∗ the set of strings over this alphabetI Guess a missing word:

|Ω| = vocabulary size

Chrupala (UdS) Statistics October 9, 2012 8 / 28

Page 28: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Experiments and Sample Spaces

Consider an experiment or processSet of possible basic outcomes: sample space Ω

I Coin toss: Ω = H,T.I Dice: Ω = 1, 2, 3, 4, 5, 6I Yes/no poll, correct/incorrect: Ω = 0, 1I Number of traffic accidents in an area per year: Ω = NI Misspelling of a word. Ω = Z∗ where Z is an alphabet,

and Z∗ the set of strings over this alphabetI Guess a missing word: |Ω| = vocabulary size

Chrupala (UdS) Statistics October 9, 2012 8 / 28

Page 29: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Events

An event A is a set of basic outcomes. Event Atakes place if the outcome of the experiment ∈ A

A ⊆ Ω, and any A ∈ 2Ω (all subsets of Ω).I Ω

is the certain eventI ∅ is the impossible event

Experiment, toss 3 coinsI Ω =HHH,HHT,HTH,HTT, THH, THT, TTH, TTT

I Event A: there were exactly two tails.F A = HTT, THT, TTH

I Event B: there were three heads.F B = HHH

Chrupala (UdS) Statistics October 9, 2012 9 / 28

Page 30: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Events

An event A is a set of basic outcomes. Event Atakes place if the outcome of the experiment ∈ A

A ⊆ Ω, and any A ∈ 2Ω (all subsets of Ω).I Ω is the certain eventI ∅

is the impossible event

Experiment, toss 3 coinsI Ω =HHH,HHT,HTH,HTT, THH, THT, TTH, TTT

I Event A: there were exactly two tails.F A = HTT, THT, TTH

I Event B: there were three heads.F B = HHH

Chrupala (UdS) Statistics October 9, 2012 9 / 28

Page 31: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Events

An event A is a set of basic outcomes. Event Atakes place if the outcome of the experiment ∈ A

A ⊆ Ω, and any A ∈ 2Ω (all subsets of Ω).I Ω is the certain eventI ∅ is the impossible event

Experiment, toss 3 coins

I Ω =HHH,HHT,HTH,HTT, THH, THT, TTH, TTT

I Event A: there were exactly two tails.F A = HTT, THT, TTH

I Event B: there were three heads.F B = HHH

Chrupala (UdS) Statistics October 9, 2012 9 / 28

Page 32: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Events

An event A is a set of basic outcomes. Event Atakes place if the outcome of the experiment ∈ A

A ⊆ Ω, and any A ∈ 2Ω (all subsets of Ω).I Ω is the certain eventI ∅ is the impossible event

Experiment, toss 3 coinsI Ω =

HHH,HHT,HTH,HTT, THH, THT, TTH, TTTI Event A: there were exactly two tails.

F A = HTT, THT, TTHI Event B: there were three heads.

F B = HHH

Chrupala (UdS) Statistics October 9, 2012 9 / 28

Page 33: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Events

An event A is a set of basic outcomes. Event Atakes place if the outcome of the experiment ∈ A

A ⊆ Ω, and any A ∈ 2Ω (all subsets of Ω).I Ω is the certain eventI ∅ is the impossible event

Experiment, toss 3 coinsI Ω =HHH,HHT,HTH,HTT, THH, THT, TTH, TTT

I Event A: there were exactly two tails.

F A = HTT, THT, TTHI Event B: there were three heads.

F B = HHH

Chrupala (UdS) Statistics October 9, 2012 9 / 28

Page 34: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Events

An event A is a set of basic outcomes. Event Atakes place if the outcome of the experiment ∈ A

A ⊆ Ω, and any A ∈ 2Ω (all subsets of Ω).I Ω is the certain eventI ∅ is the impossible event

Experiment, toss 3 coinsI Ω =HHH,HHT,HTH,HTT, THH, THT, TTH, TTT

I Event A: there were exactly two tails.F A = HTT, THT, TTH

I Event B: there were three heads.

F B = HHH

Chrupala (UdS) Statistics October 9, 2012 9 / 28

Page 35: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Events

An event A is a set of basic outcomes. Event Atakes place if the outcome of the experiment ∈ A

A ⊆ Ω, and any A ∈ 2Ω (all subsets of Ω).I Ω is the certain eventI ∅ is the impossible event

Experiment, toss 3 coinsI Ω =HHH,HHT,HTH,HTT, THH, THT, TTH, TTT

I Event A: there were exactly two tails.F A = HTT, THT, TTH

I Event B: there were three heads.F B = HHH

Chrupala (UdS) Statistics October 9, 2012 9 / 28

Page 36: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Events and probability

P (Germany wins the game|no rain) = 0.9

Past performance. Germany won 90% of gameswith no rain

Hypothetical performance. If they played the gamein many parallel universes

Subjective strength of belief. Would bet up to 90cents for a chance to win 1 euro.

Output of some computable formula

Chrupala (UdS) Statistics October 9, 2012 10 / 28

Page 37: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Events and probability

P (Germany wins the game|no rain) = 0.9

Past performance. Germany won 90% of gameswith no rain

Hypothetical performance. If they played the gamein many parallel universes

Subjective strength of belief. Would bet up to 90cents for a chance to win 1 euro.

Output of some computable formula

Chrupala (UdS) Statistics October 9, 2012 10 / 28

Page 38: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Events and probability

P (Germany wins the game|no rain) = 0.9

Past performance. Germany won 90% of gameswith no rain

Hypothetical performance. If they played the gamein many parallel universes

Subjective strength of belief. Would bet up to 90cents for a chance to win 1 euro.

Output of some computable formula

Chrupala (UdS) Statistics October 9, 2012 10 / 28

Page 39: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Events and probability

P (Germany wins the game|no rain) = 0.9

Past performance. Germany won 90% of gameswith no rain

Hypothetical performance. If they played the gamein many parallel universes

Subjective strength of belief. Would bet up to 90cents for a chance to win 1 euro.

Output of some computable formula

Chrupala (UdS) Statistics October 9, 2012 10 / 28

Page 40: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Events and probability

P (Germany wins the game|no rain) = 0.9

Past performance. Germany won 90% of gameswith no rain

Hypothetical performance. If they played the gamein many parallel universes

Subjective strength of belief. Would bet up to 90cents for a chance to win 1 euro.

Output of some computable formula

Chrupala (UdS) Statistics October 9, 2012 10 / 28

Page 41: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Probability notation

P ( Germany wins the game | no rain )Event A Event B

Given that event B happens, how likely is event A?

Germany wins the game is a predicate whichselects the outcomes that are members of event A

Chrupala (UdS) Statistics October 9, 2012 11 / 28

Page 42: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Frequentist probability

For series iI Repeat experiment many timesI Record how many times event A occured: counti(A)

The ratios counti(A)Ti

, where Ti is the number ofexperiments in series i, are close to some unknownbut constant value

We can call this constant P (A)

Chrupala (UdS) Statistics October 9, 2012 12 / 28

Page 43: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Estimating probabilities

The constant P (A) is unknown, but we canestimate it:

I From a single series i: P (A) = countiATi

(the common

case)I Or take the weighted average of all series i

Chrupala (UdS) Statistics October 9, 2012 13 / 28

Page 44: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

ExampleToss three coins.

I Ω =HHH,HHT,HTH,HTT, THH, THT, TTH, TTT

A: there were exactly three tailsI A = HTT, THT, TTH

Run 1000 times

Got one of HTT, THT, TTH 386 times out of 1000

P (A) = 0.386

Run several times: 373, 399, 355, 372, 406, 359

P (A) = 0.379

If each outcome in Ω is equally likelyP (A) = 3/8 = 0.375

Chrupala (UdS) Statistics October 9, 2012 14 / 28

Page 45: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

P as a function of sets of outcomes

P (Germany wins|no rain) =P (Germany wins, no rain)

P (no rain)

no rain

Germany wins

Chrupala (UdS) Statistics October 9, 2012 15 / 28

Page 46: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

P as a function of sets of outcomes

P (A|B) = P( A , B ) / P( B )notation conjunction predicate

Chrupala (UdS) Statistics October 9, 2012 16 / 28

Page 47: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Axioms of probability

P (∅) =

0

P (Ω) = 1

P (A) ≤ P (B) for any A ⊆ B

P (A) + P (B) = P (A ∪B) provided A ∩B = ∅

Chrupala (UdS) Statistics October 9, 2012 17 / 28

Page 48: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Axioms of probability

P (∅) = 0

P (Ω) =

1

P (A) ≤ P (B) for any A ⊆ B

P (A) + P (B) = P (A ∪B) provided A ∩B = ∅

Chrupala (UdS) Statistics October 9, 2012 17 / 28

Page 49: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Axioms of probability

P (∅) = 0

P (Ω) = 1

P (A) ≤ P (B) for any A ⊆ B

P (A) + P (B) = P (A ∪B) provided A ∩B = ∅

Chrupala (UdS) Statistics October 9, 2012 17 / 28

Page 50: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Axioms of probability

P (∅) = 0

P (Ω) = 1

P (A) ≤ P (B) for any A ⊆ B

P (A) + P (B) = P (A ∪B) provided

A ∩B = ∅

Chrupala (UdS) Statistics October 9, 2012 17 / 28

Page 51: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Axioms of probability

P (∅) = 0

P (Ω) = 1

P (A) ≤ P (B) for any A ⊆ B

P (A) + P (B) = P (A ∪B) provided A ∩B = ∅

Chrupala (UdS) Statistics October 9, 2012 17 / 28

Page 52: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Joint and conditional probability

Joint probability and the meaning of commasI P (A,B) = P (A ∩B)I P (Germany wins, no rain) = P (Germany wins ∧ no rain)

P (A|B) = P (A,B)/P (B)I Estimate from counts

P (A|B) =P (A,B)

P (B)(1)

=count(A ∩B)/T

count(B)/T(2)

=count(A ∩B)

count(B)(3)

Chrupala (UdS) Statistics October 9, 2012 18 / 28

Page 53: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Joint and conditional probability

Joint probability and the meaning of commasI P (A,B) = P (A ∩B)I P (Germany wins, no rain) = P (Germany wins ∧ no rain)

P (A|B) = P (A,B)/P (B)I Estimate from counts

P (A|B) =P (A,B)

P (B)(1)

=count(A ∩B)/T

count(B)/T(2)

=count(A ∩B)

count(B)(3)

Chrupala (UdS) Statistics October 9, 2012 18 / 28

Page 54: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Chain rule

P (A|B) = P (A,B)P (B)

Therefore P (A,B) = P (A|B)P (B)

Generalization:P (A1, A2, . . . , An)

= P (A1|A2, . . . , An)P (A2, . . . , An)

= P (A1|A2, . . . , An)P (A2|A3, . . . , An)P (A3, . . . , An)

=n∏

i=1

P (Ai|Ai+1, . . . , An)

Chrupala (UdS) Statistics October 9, 2012 19 / 28

Page 55: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Chain rule

P (A|B) = P (A,B)P (B)

Therefore P (A,B) = P (A|B)P (B)

Generalization:P (A1, A2, . . . , An)

= P (A1|A2, . . . , An)P (A2, . . . , An)

= P (A1|A2, . . . , An)P (A2|A3, . . . , An)P (A3, . . . , An)

=n∏

i=1

P (Ai|Ai+1, . . . , An)

Chrupala (UdS) Statistics October 9, 2012 19 / 28

Page 56: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Independence

Two events A and B are independent ifP (A,B) = P (A)P (B)

For independent A, B, does P (A|B) = P (A) hold?

A and B are conditionally independent ifP (A,B|C) = P (A|C)P (B|C)

Chrupala (UdS) Statistics October 9, 2012 20 / 28

Page 57: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Independence

Two events A and B are independent ifP (A,B) = P (A)P (B)

For independent A, B, does P (A|B) = P (A) hold?

A and B are conditionally independent ifP (A,B|C) = P (A|C)P (B|C)

Chrupala (UdS) Statistics October 9, 2012 20 / 28

Page 58: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Independence

Two events A and B are independent ifP (A,B) = P (A)P (B)

For independent A, B, does P (A|B) = P (A) hold?

A and B are conditionally independent ifP (A,B|C) = P (A|C)P (B|C)

Chrupala (UdS) Statistics October 9, 2012 20 / 28

Page 59: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

There are two urns:

A B

Suppose we pick an urn uniformly at random andthen select a random ball from that urn. What isprobability that you pick urn A, and take a blue ballfrom it?

Chrupala (UdS) Statistics October 9, 2012 21 / 28

Page 60: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

There are two urns:

A B

Suppose we pick an urn uniformly at random andthen select a random ball from that urn. What isprobability that you pick urn A, and take a blue ballfrom it?

Chrupala (UdS) Statistics October 9, 2012 21 / 28

Page 61: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Marginal probability

Given P (A,Bi) for disjoint events Bi, find outP (A).

Use last axiom

P (A) = P ((A ∩B1) ∪ (A ∩B2) ∪ · · · ∪ (A ∩Bn))

= P (A ∩B1) + P (A ∩B2) + · · ·+ P (A ∩Bn)

=n∑

i=1

P (A ∩Bi)

Chrupala (UdS) Statistics October 9, 2012 22 / 28

Page 62: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

A B

Given the same ball-picking procedure as before,what is the probability of picking the blue ball?

Chrupala (UdS) Statistics October 9, 2012 23 / 28

Page 63: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Bayes rule

P (A,B) = P (B,A) since A ∩B = B ∩ A

P (B,A) = P (B|A)P (A)

Therefore

P (A|B) =P (A,B)

P (B)

=P (B,A)

P (B)

=P (B|A)P (A)

P (B)

Chrupala (UdS) Statistics October 9, 2012 24 / 28

Page 64: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Bayes rule

P (A,B) = P (B,A) since A ∩B = B ∩ A

P (B,A) = P (B|A)P (A)

Therefore

P (A|B) =P (A,B)

P (B)

=P (B,A)

P (B)

=P (B|A)P (A)

P (B)

Chrupala (UdS) Statistics October 9, 2012 24 / 28

Page 65: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Bayes ruleIf we are interested in comparing the probability ofevents A1, A2, . . . given B, we can ignore P (B) since it’sthe same for all Ai

argmaxi

P (Ai|B) = argmaxi

P (B|Ai)P (Ai)

P (B)

= argmaxi

P (B|Ai)P (Ai)

This idea is sometimes expressed as

P (A|B) ∝ P (B|A)P (A)

Chrupala (UdS) Statistics October 9, 2012 25 / 28

Page 66: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Bayes ruleIf we are interested in comparing the probability ofevents A1, A2, . . . given B, we can ignore P (B) since it’sthe same for all Ai

argmaxi

P (Ai|B) = argmaxi

P (B|Ai)P (Ai)

P (B)

= argmaxi

P (B|Ai)P (Ai)

This idea is sometimes expressed as

P (A|B) ∝ P (B|A)P (A)

Chrupala (UdS) Statistics October 9, 2012 25 / 28

Page 67: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Example

Suppose we are interested in a test to detect a diseasewhich affects one in 100,000 people on average. A labhas developed a test which works but is not perfect.

If a person has the disease it will give a positiveresult with probability 0.97

if they do not, the test will be positive withprobability 0.007.

You took the test, and it gave a positive result. What isthe probability that you actually have the disease?

Chrupala (UdS) Statistics October 9, 2012 26 / 28

Page 68: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Chrupala (UdS) Statistics October 9, 2012 27 / 28

Page 69: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Chrupala (UdS) Statistics October 9, 2012 28 / 28

Page 70: Introduction to statistics Session 1 · = fH;Tg. I Dice: = f1;2;3;4;5;6g I Yes/no poll, correct/incorrect: = f0;1g I Number of tra c accidents in an area per year: = N I Misspelling

Credits

Some material adapted from:

Foundations of Statistical NLP

Intro to NLP slides by Jan Hajic

How to use probabilities slides by Jason Eisner

Chrupala (UdS) Statistics October 9, 2012 29 / 28