Mathematics for Language Technology: Introduction to Probability Theory

Introduction to Probability

Last Updated: 20 March 2015

Slideshare: http://www.slideshare.net/marinasantini1/introduction-to-probability-theory

Mathematics for Language Technology http://stp.lingfil.uu.se/~matsd/uv/uv15/mfst/

Marina Santini [email protected]

Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden

Spring 2015

Upload: marina-santini

Post on 15-Jul-2015



Acknowledgements
• Several slides borrowed from Prof. Joakim Nivre.
• Practical Activity by Mats Dahllöf.

Required Reading:
• E&G (2013): Ch. 5 (pp. 105-110)
• Compendium (4): 9.1
• E&G (2013): Ch. 5.2-5.3 (self-study)

Recommended Reading:
• Sections 1-3 in Goldsmith J. (2007) Probability for Linguists. The University of Chicago. The Department of Linguistics: http://hum.uchicago.edu/~jagoldsm/Papers/probability.pdf

Outline

• The Notion of Probability
• Events
• Axioms and Theorems of Probability
• Addition Rule

Why study probability and statistics?

Developments in NLP have led to the exploitation of language corpora to refine and develop computational models of language. Many of these models exploit basic axioms, theorems and approximations from the field of probability theory and statistical inference.


Deterministic vs Non-Deterministic

Generally speaking, a deterministic system is a system in which no randomness is involved in the development of future states of the system. That is, a deterministic model will always produce the same behaviour from a given state. In automata theory, a deterministic finite automaton (DFA) is a finite state machine that accepts/rejects finite strings of symbols and only produces a unique computation (or run) of the automaton for each input string. A nondeterministic finite automaton (NFA), or nondeterministic finite state machine, needn't obey these restrictions.
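As a rough illustration of the deterministic case, the sketch below runs a small DFA over an input string. The automaton itself (a "b", then two or more "a", then "!") and the names TRANSITIONS, START, ACCEPTING and accepts are assumptions chosen for illustration, not taken from the slides.

```python
# Hypothetical DFA for strings of the form b a a+ ! (e.g. "baa!", "baaaa!").
# Each (state, symbol) pair maps to exactly one next state: that is what
# makes the automaton deterministic.
TRANSITIONS = {
    ("q0", "b"): "q1",
    ("q1", "a"): "q2",
    ("q2", "a"): "q3",
    ("q3", "a"): "q3",
    ("q3", "!"): "q4",
}
START = "q0"
ACCEPTING = {"q4"}


def accepts(string):
    """Return True if the DFA ends in an accepting state after reading string."""
    state = START
    for symbol in string:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no transition defined for this symbol: reject
        state = TRANSITIONS[key]
    return state in ACCEPTING


print(accepts("baaaa!"))
print(accepts("abababbbab"))
```

Because the transition table has at most one entry per (state, symbol) pair, every input string yields exactly one run, so the same input always gives the same verdict.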


Deterministic vs Non-Deterministic

[Figures omitted. Example input strings: baaaa! and abababbbab]

Probability Theory

• Probability theory is the branch of mathematics concerned with probability, i.e. the analysis of random (non-deterministic) phenomena.

Statistics

• Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data.

Probability Theory and Statistics


We use probability theory to build models of uncertainty and we can use statistics to ground these models in empirical data.

Probability, Event and Sample Space

Ex 1: We have a sample space consisting of words. An event in that sample space can be the set of nouns, i.e. all the words that belong to the category NOUN. One way of describing this subset is to say that the property PartOfSpeech has the value NOUN.

Ex 2: We have a sample space of sentences and we are interested in the length of these sentences. A relevant event would be the set of all sentences that contain exactly 8 words. Again, we can describe this set as the set of outcomes for which the variable "numberOfWords" takes the value 8.

Notation: ∈ = "is an element of"
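The idea of an event as a subset picked out by a property can be sketched in a few lines. The word list, its tags and the equal-likelihood assumption below are made up for illustration; only the property name PartOfSpeech and the value NOUN come from the example.

```python
from fractions import Fraction

# Hypothetical toy sample space: (word, part of speech) pairs.
SAMPLE_SPACE = [
    ("dog", "NOUN"),
    ("runs", "VERB"),
    ("house", "NOUN"),
    ("quickly", "ADV"),
    ("tree", "NOUN"),
    ("green", "ADJ"),
]

# The event "PartOfSpeech has the value NOUN" is the subset of outcomes
# for which that property holds.
nouns = {word for word, pos in SAMPLE_SPACE if pos == "NOUN"}

# Assuming every word is equally likely, P(event) = |event| / |sample space|.
p_noun = Fraction(len(nouns), len(SAMPLE_SPACE))

print(sorted(nouns))
print(p_noun)
```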

Operators


Venn diagrams or Set diagrams to represent logical relations
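The usual set operators that such diagrams depict map directly onto Python's built-in set type. A minimal sketch, in which the sample space omega and the events A and B are assumptions made up for illustration:

```python
# Hypothetical sample space and events, chosen only for illustration.
omega = {"a", "b", "c", "d", "e", "f"}
A = {"a", "b", "c"}
B = {"c", "d"}

union = A | B              # outcomes in A or B (or both)
intersection = A & B       # outcomes in both A and B
complement_A = omega - A   # outcomes in the sample space but not in A
is_subset = A <= omega     # every event is a subset of the sample space
is_element = "a" in A      # "a" is an element of A

print(sorted(union), sorted(intersection), sorted(complement_A))
```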


Axioms (Statements that are always accepted as true)
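Assuming the standard (Kolmogorov) formulation, which may differ in wording from the slide, the axioms for a sample space $\Omega$ and events $A, B \subseteq \Omega$ can be written as:

```latex
\begin{align*}
& P(A) \ge 0 \quad \text{for every event } A \subseteq \Omega \\
& P(\Omega) = 1 \\
& P(A \cup B) = P(A) + P(B) \quad \text{if } A \cap B = \emptyset
\end{align*}
```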


Formula & Calculations

If A is an event, and x1 to xn are its individual outcomes, then the probability of A can be computed by summing the probabilities of the outcomes, because they are disjoint (mutually exclusive):

P(A) = P(x1) + P(x2) + ... + P(xn) = Σ P(xi), for i = 1 to n

Read as: "sum from i = 1 to n" or "sum over all the elements of the set".

Example: the probability that a three-letter string consists of three vowels.

There are 26 ways of choosing the first letter, 26 ways of choosing the second letter and 26 ways of choosing the third letter, i.e. 26 * 26 * 26 = 26^3 strings in total.

Since we assume that all strings are equally probable, the probability of each string is simply 1 over the total number of strings.

But there are only 6 ways of choosing the first vowel, and likewise for the second and the third, i.e. 6 * 6 * 6 = 6^3 three-vowel strings.

In order to get the probability of a 3-vowel string, we can simply add the probabilities of the strings that contain exactly 3 vowels. So 6 to the power of 3 divided by 26 to the power of 3 gives us approximately 0.012.

Calculations: 6 x 6 x 6 = 216; 26 x 26 x 26 = 17576; 216 / 17576 = 0.01228949
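The three-vowel calculation can be checked both by formula and by brute-force enumeration. This is only a sketch: the choice of "aeiouy" as the 6 vowels is an assumption (the slides give the count, not the letters), and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product
import string

ALPHABET = string.ascii_lowercase  # 26 letters
VOWELS = set("aeiouy")             # assumption: 6 vowels, "y" included

# Closed form: 6^3 favourable strings out of 26^3 equally likely strings.
p_formula = Fraction(len(VOWELS) ** 3, len(ALPHABET) ** 3)

# Brute force: every string has probability 1/26^3; sum over the event,
# i.e. over all strings whose three letters are all vowels.
p_each = Fraction(1, len(ALPHABET) ** 3)
p_enumerated = sum(
    p_each
    for letters in product(ALPHABET, repeat=3)
    if all(letter in VOWELS for letter in letters)
)

print(p_formula, float(p_formula))
print(p_enumerated == p_formula)
```

Using Fraction keeps the arithmetic exact, so the enumerated sum and the closed form can be compared with plain equality.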

In sum

• The probability of an event is the SUM of the probabilities of its outcomes
• An event is represented as a variable

Theorems


A theorem is a statement that has been proven on the basis of previously established statements, such as axioms

Addition Rule: A method for finding the probability that either or both of two events occur


In other words:
If events A and B are mutually exclusive (disjoint), then: P(A or B) = P(A) + P(B)
Otherwise: P(A or B) = P(A) + P(B) - P(A and B)

Say that A is the set of people who have glasses and B is the set of people who are blond. We are interested in the set of people who are blond OR have glasses. If we simply add the probabilities of the two simple events, we count blond people with glasses twice. Therefore, in order to get the correct probability, we have to subtract the probability of the intersection, P(A and B), once.

Think of the axiom about disjoint events as a special case where the intersection is empty. In that case nothing is counted twice in the first place, so nothing has to be subtracted.
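The double counting is easy to see on a small example. The population of ten numbered people below is entirely hypothetical, as is the helper name prob; each person is assumed equally likely.

```python
from fractions import Fraction

# Hypothetical population of 10 people, numbered 0-9.
people = set(range(10))
glasses = {0, 1, 2, 3}   # event A: has glasses
blond = {2, 3, 4}        # event B: is blond


def prob(event):
    """Probability of an event when every person is equally likely."""
    return Fraction(len(event), len(people))


# Naive sum counts persons 2 and 3 (blond with glasses) twice.
naive = prob(glasses) + prob(blond)

# Addition rule: subtract the probability of the intersection once.
corrected = prob(glasses) + prob(blond) - prob(glasses & blond)

print(naive, corrected, prob(glasses | blond))
```

The corrected value matches the probability of the union computed directly, while the naive sum overshoots by exactly P(A and B).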

Quiz 1: only one answer is correct


Quiz 1: Solution

1. 0.01 - incorrect. The probability of an event and its complement must sum to 1.
2. 0.99 - correct. The complement of A has probability 1 - P(A).
3. Impossible to tell - incorrect. The complement of A must have probability 1 - P(A).
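The reasoning behind the solution can be spelled out in one step, assuming the usual axioms (total probability 1, additivity for disjoint events) and writing $\bar{A}$ for the complement of A. The value P(A) = 0.01 is inferred from the answer options rather than stated in the transcript:

```latex
\begin{align*}
A \cup \bar{A} = \Omega, \quad A \cap \bar{A} = \emptyset
  &\;\Rightarrow\; P(A) + P(\bar{A}) = P(\Omega) = 1 \\
  &\;\Rightarrow\; P(\bar{A}) = 1 - P(A) = 1 - 0.01 = 0.99
\end{align*}
```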


Quiz 2: more than one answer can be correct

Quiz 2: Solutions

1. P(A or B) < P(A and B) - incorrect. Since the union includes the intersection, it can never have lower probability.

2. P(A or B) = P(A and B) - correct. This is possible as a limiting case, for example, when A = B.

3. P(A or B) > P(A and B) - correct. This holds as soon as there is some outcome with a positive probability in A or B that is not in the intersection.
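Both correct cases can be reproduced on a toy example. The sample space (one roll of a die) and the events below are assumptions chosen for illustration.

```python
from fractions import Fraction


def prob(event, omega):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event), len(omega))


omega = set(range(1, 7))  # hypothetical sample space: one die roll

# Limiting case A = B: union and intersection coincide.
A = {1, 2}
B = {1, 2}
equal_case = prob(A | B, omega) == prob(A & B, omega)

# Some outcome lies outside the intersection: the union is strictly larger.
C = {1, 2}
D = {2, 3}
strict_case = prob(C | D, omega) > prob(C & D, omega)

print(equal_case, strict_case)
```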


Practical Activity

We have a regular die. We cast the die twice and we get a two and a four. Therefore, A = {2,4}. Calculate:

1. The probability of the event A = {2,4}
2. The probability that the first number is a 6
3. The probability that the second number is a 5 or a 6
4. The probability that the first and the second number are the same
5. The probability that the first number is an odd number
6. The probability that the first and the second number are both odd numbers

Practical Activity: Solutions

1. The probability of the event A = {2,4} [2/36 = 1/18 ≈ 0.056 if the order does not matter, i.e. (2,4) or (4,2); 1/36 ≈ 0.028 for a 2 followed by a 4]
2. The probability that the first number is a 6 [1/6 ≈ 0.17]
3. The probability that the second number is a 5 or a 6 [1/3 ≈ 0.33]
4. The probability that the first and the second number are the same [1/6 ≈ 0.17]
5. The probability that the first number is an odd number [1/2 = 0.5]
6. The probability that the first and the second number are both odd numbers [1/4 = 0.25]
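Each of these probabilities can be checked by enumerating the 36 equally likely ordered outcomes of two casts. This is a brute-force sketch; the helper prob and the variable names are illustrative, and item 1 is computed under both readings (a specific order, or either order).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first cast, second cast).
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(predicate):
    """Probability of the event made up of the outcomes satisfying predicate."""
    favourable = [o for o in OUTCOMES if predicate(o)]
    return Fraction(len(favourable), len(OUTCOMES))


p1_ordered = prob(lambda o: o == (2, 4))          # a 2, then a 4
p1_any_order = prob(lambda o: set(o) == {2, 4})   # a 2 and a 4, either order
p2 = prob(lambda o: o[0] == 6)                    # first number is a 6
p3 = prob(lambda o: o[1] in (5, 6))               # second number is 5 or 6
p4 = prob(lambda o: o[0] == o[1])                 # both numbers the same
p5 = prob(lambda o: o[0] % 2 == 1)                # first number is odd
p6 = prob(lambda o: o[0] % 2 == 1 and o[1] % 2 == 1)  # both numbers odd

for label, p in [("1 (ordered)", p1_ordered), ("1 (any order)", p1_any_order),
                 ("2", p2), ("3", p3), ("4", p4), ("5", p5), ("6", p6)]:
    print(label, p, round(float(p), 3))
```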


The End
