Computer Simulations


Page 1:

Computer Simulations

Page 2:

Decision Trees

1. Decision trees (classification trees): designed to find the combination of variables that accounts for most of the data.

(The measured data are nominal.)

A. Splits are made that maximize one category in one branch and minimize it in the other.

B. All combinations of independent variables are tried.

C. Branches are added until adding more doesn’t give you a better fit.

D. No statistical significance is calculated.
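Points A to C can be made concrete with a minimal sketch, not from the original slides, of a single greedy split: every predictor/value combination is tried, and the split whose branches are most dominated by a single outcome is kept. The rows below are made up, in the format of the department store data introduced on the next slides.

```python
from collections import Counter

# Made-up rows: (store, word, try, outcome)
rows = [
    ("Kleins", "floor", "emphatic", "no-R"),
    ("Kleins", "fourth", "emphatic", "R"),
    ("Saks", "floor", "non-emphatic", "no-R"),
    ("Saks", "floor", "emphatic", "R"),
]

def purity(branch):
    """Share of the branch taken up by its most common outcome."""
    if not branch:
        return 0.0
    counts = Counter(outcome for *_, outcome in branch)
    return max(counts.values()) / len(branch)

def best_split(rows, n_predictors=3):
    """Try every predictor/value split; keep the one whose branches are,
    on average, most dominated by a single outcome (no significance test)."""
    best = None
    for col in range(n_predictors):
        for value in {r[col] for r in rows}:
            left = [r for r in rows if r[col] == value]
            right = [r for r in rows if r[col] != value]
            score = (purity(left) * len(left) + purity(right) * len(right)) / len(rows)
            if best is None or score > best[0]:
                best = (score, col, value)
    return best

print(best_split(rows))   # (score, predictor index, split value)
```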

Page 3:

Decision Trees

2. Labov’s Department Store Study

• How was it done?

• What does it show?

• How do you judge the results?

e.g. what if there is 25% deletion in Saks and 30% in Macy’s?

• What if there is interaction? (2 or more variables working together)

• Logistic regression is one method, decision trees are another.

Page 4:

Decision Trees

• For the department store study:

Dependent Variable: pronunciation of /r/

Independent Variables:

Store: Klein’s, Macy’s, Saks

Word: fourth, floor

Try #: 1st-normal or 2nd-emphatic

Question: How do the independent variables affect the dependent variable?

Page 5:

Decision Trees

• Lines of data look like this:

Kleins, floor, emphatic, no-R.

Kleins, fourth, emphatic, R.

Saks, floor, non-emphatic, no-R.
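As a sketch of how rows in this format could be fed to an off-the-shelf classification tree: the slides do not name the software, so scikit-learn is assumed here, and the rows below are made up.

```python
# Sketch only: scikit-learn's DecisionTreeClassifier is an assumption,
# not the program used in the course.
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

data = [                                   # hypothetical rows in the slide's format
    ("Kleins", "floor", "emphatic", "no-R"),
    ("Kleins", "fourth", "emphatic", "R"),
    ("Saks", "floor", "non-emphatic", "no-R"),
    ("Saks", "fourth", "non-emphatic", "R"),
]
X_raw = [row[:3] for row in data]          # independent variables: store, word, try
y = [row[3] for row in data]               # dependent variable: /r/ pronounced or not

enc = OneHotEncoder()                      # nominal predictors -> 0/1 indicator columns
X = enc.fit_transform(X_raw)

tree = DecisionTreeClassifier().fit(X, y)

# Print the splits as readable if/else "rules", one line per branch
print(export_text(tree, feature_names=list(enc.get_feature_names_out())))
```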

Page 6:

1. Clerks in Klein’s do not pronounce /r/ (195/216, 90.3% correct).

2. Fourth is pronounced without an /r/ (192/270, 71.1% correct).

3. Clerks in Saks pronounce the /r/ in floor (52/82, 63.4% correct).

4. Clerks in Macy’s pronounce /r/ in floor as an emphatic second response (31/51, 60.8% correct).

Page 7:

Decision Trees

2. Oprah Winfrey’s pronunciation of [aj] as [aj] or [a] (monophthongization) (Mendoza-Denton, Hay, Jannedy)

• What social factors affect it? The researchers included:

A. The person Oprah was talking about

B. The race of that person

C. The gender of the person

D. Class of word (I is a pronoun, light is a noun (or verb))

E. Frequency of the word

G. What sound precedes [aj]

Page 8:

Decision Trees

• Lines of data look like this: (“So, I talked to Tina the other day.”)

Tina, black, female, I, 5443.7, [ow]

Page 9:

• The decision tree only found a good fit with two variables:

A. The person Oprah was talking about

B. The kind of sound that precedes [aj]

• What are the “rules” this tree gives?

• Decision trees are good at making sense of messy data.

Page 10:

Analogical Modeling

Page 11:

Generative Approach

• We need to conserve storage space in our brain

• Store only what is unpredictable: sang, thought

• Use rules to derive or parse the predictable
  • Add –ed to regular verbs: walked, formed

• Connections between irregulars are OK (rang~sang, which causes *brang; drove~dove, which causes *arrove), but not connections between regulars (steam~seam, truck~luck~tuck)
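A toy sketch, not from the slides, of the "store only what is unpredictable" idea: irregular past tenses are listed in storage, and everything else is derived by the –ed rule.

```python
# Toy illustration of the generative view: irregulars stored, regulars derived by rule.
IRREGULAR_PAST = {"sing": "sang", "think": "thought"}   # stored because unpredictable

def past_tense(verb):
    """Return a stored irregular form if there is one; otherwise apply the -ed rule."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    return verb + "ed"            # walked, formed (spelling adjustments ignored)

print(past_tense("walk"))         # walked  (derived by rule)
print(past_tense("sing"))         # sang    (retrieved from storage)
```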

Page 12:

Spanish Stress

• Generative approach
  • Most stress is predictable so it isn’t stored
  • Rules applied in production

  • If a word ends in a consonant (except –s or –n), stress the final syllable: /animal/ > [animál], /motor/ > [motór]
  • If a word ends in a vowel, or in –s or –n, stress the penult: /tisa/ > [tísa], /komen/ > [kómen]
  • Antepenultimate stress is unpredictable, so it must be stored or marked somehow: /periódiko/ > [periódiko], /depósito/ > [depósito]
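The first two rules can be sketched as a small function over pre-syllabified phonemic forms (syllabification itself is assumed, and the code is an illustration, not the study's program). Antepenultimate words fall outside these rules and would have to be listed.

```python
def assign_stress(syllables):
    """Rule-based stress for a pre-syllabified word; returns the index
    of the stressed syllable (0 = first syllable)."""
    last = syllables[-1]
    if last[-1] not in "aeiousn":        # ends in a consonant other than -s or -n
        return len(syllables) - 1        # final stress:  a-ni-MAL, mo-TOR
    return len(syllables) - 2            # penult stress: TI-sa, KO-men
    # Antepenultimate stress (periódiko, depósito) is not derivable here;
    # in the generative view those words must be stored or marked.

print(assign_stress(["a", "ni", "mal"]))   # 2 -> [animál]
print(assign_stress(["ti", "sa"]))         # 0 -> [tísa]
```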

Page 13:

Analogical Approach

• What evidence is there that we need to conserve storage space in our brain?

• Lots of evidence we store many details of words

• Store everything, not just what is unpredictable: sang, thought, walked, formed

• Word forms connect to others that are semantically, phonetically, morphologically, or relationally similar
  • Semantically: cut/tear, break/breach
  • Phonetically: tribe/bribe
  • Morphologically: sit/sat, reveal/revelation
  • Relationally (collocates): homework/school, nurse/shot

Page 14:

Spanish Stress

• Analogical approach
  • All words are stored with their stress
  • No rules needed
  • Find similar words and apply the stress they have
  • Animál has final stress due to its neighbors with final stress (tamál, formár, ...)
  • Antepenultimate stress IS predictable, based on other words with this stress; lots of them end in –iko and –ito: /periódiko/ > [periódiko], /depósito/ > [depósito]
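Here is a rough stand-in for the analogical idea, not Skousen's actual AM algorithm: predict stress from the stored words that share the longest ending with the target, and let them vote. The lexicon below is a made-up fragment.

```python
from collections import Counter

# Hypothetical stored lexicon fragment: (word, stress category)
LEXICON = [("tamal", "final"), ("formar", "final"), ("motor", "final"),
           ("tisa", "penult"), ("komen", "penult"),
           ("periodiko", "antepenult"), ("deposito", "antepenult")]

def shared_suffix(a, b):
    """Length of the ending the two words share."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def predict_stress(word, k=3):
    """Vote among the k stored words most similar to the target word."""
    neighbours = sorted(LEXICON, key=lambda entry: shared_suffix(word, entry[0]), reverse=True)[:k]
    return Counter(stress for _, stress in neighbours).most_common(1)[0][0]

print(predict_stress("animal"))    # 'final', by analogy with tamal, formar, ...
print(predict_stress("politiko"))  # 'antepenult', by analogy with periodiko
```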

Page 15:

Analogical modeling

• How do you test the theory that analogy explains stress?
  • You need a model

Page 16:

Analogical modeling


ANALOGICAL MODELING OF LANGUAGE (Royal Skousen)

Page 17:

Analogical modeling

• You need a database that approximates what speakers know
  • For Spanish it’s the 5000 most frequent words

• You need information about the words that can be used to find similar words
  • Phonemic and morphological information about Spanish words

• Relatando
  • Phones by syllable
    • Re= la= tan= do=
  • Morphology
    • Re= la= tan= do= Gerund
  • Stress placement
    • Penult
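For illustration only, one such entry could be stored as a small record like the one below; the field names are mine, not Skousen's actual file format.

```python
# Illustrative encoding of one training item; the outcome is what the model predicts.
relatando = {
    "syllables": ("re", "la", "tan", "do"),   # phones grouped by syllable
    "morphology": "gerund",                   # morphological information
    "outcome": "penult",                      # stress placement
}

variables = (*relatando["syllables"], relatando["morphology"])
print(variables, "->", relatando["outcome"])  # ('re', 'la', 'tan', 'do', 'gerund') -> penult
```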

Page 18:

Analogical modeling

• Tool question
  • Where would you get the 5000 most frequent words?

Page 19:

Analogical modeling

• Outcome is probabilistic

Page 20:

Analogical modeling

• Results (based on majority rules)

Page 21:

Analogical modeling

• Leave-one-out simulation
  • Take each word out, one by one
  • Pretend you don’t know where the stress is
  • Use analogy to predict it
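A sketch of the leave-one-out loop, using the same toy suffix-based predictor as the earlier analogy sketch so that it runs on its own; the real simulation uses Skousen's AM program and the 5000-word database.

```python
from collections import Counter

def shared_suffix(a, b):
    """Length of the ending the two words share."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def predict_stress(word, lexicon, k=3):
    """Toy analogical predictor: vote among the k most similar stored words."""
    neighbours = sorted(lexicon, key=lambda e: shared_suffix(word, e[0]), reverse=True)[:k]
    return Counter(stress for _, stress in neighbours).most_common(1)[0][0]

def leave_one_out(lexicon):
    """Hold each word out in turn, predict it from the rest, report accuracy."""
    correct = 0
    for i, (word, stress) in enumerate(lexicon):
        rest = lexicon[:i] + lexicon[i + 1:]       # pretend this word is unknown
        if predict_stress(word, rest) == stress:
            correct += 1
    return correct / len(lexicon)

# Made-up miniature lexicon; the 94.4% figure below comes from the real database.
lexicon = [("tamal", "final"), ("formar", "final"), ("motor", "final"),
           ("tisa", "penult"), ("komen", "penult"), ("mesa", "penult")]
print(leave_one_out(lexicon))
```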

Page 22:

Analogical modeling


• Outcome 94.4% correct

Page 23:

Analogical modeling


• How does this compare to applying rules?

Page 24:

Analogical modeling


• Rules get 86.6% correct

Page 25:

Analogical modeling

• Is AM doing it the way people do?
  • Hochberg taught kids made-up words, then observed their stress errors
  • AM made regularization and irregularization errors in the same direction

Page 26:

Page 27:

Analogy in phonology

• càpitalístic    capi[ɾ]alistic

• mìlitarístic    mili[tʰ]aristic

• Same prosodic structure, but different realizations of /t/; the same rule should apply to both.

Page 28:

Analogy in phonology

• Steriade’s (2000) rule predicts flap in both cases

• capi[ɾ]al: the rule explains the flap in capi[ɾ]alistic

• mili[t]ary: analogy with this base messes up the rule, giving the stop in mili[tʰ]aristic

Page 29:

Analogy in phonology

• Analogy says there is no rule and it’s all analogy

• Let’s determine the outcome (e.g. [ɾ] or [t]) based on the similarity of the test form to a database of stored instances.

Page 30:

Analogy in phonology


• 3,719 instances of allophones of /t/ taken from TIMIT (a tool from LDC!)
  • 630 speakers read 10 sentences.
  • Utterances transcribed
  • 644 [ɾ], 234 [ʔ], 284 [Ø], 760 [t], 860 [t˭], 969 [tʰ], and 48 [d]

Page 31:

Analogy in phonology

• Each instance of /t/ is encoded to include its allophonic realization and the context it appears in.

• The phones or boundaries three slots to the left and right of /t/, and stress, are encoded.
  • e.g. “I know I didn't meet her”
  • 1) [ɾ], 2) word boundary, 3) [m], 4) [i], 5) word boundary, 6) [ɚ], 7) pause, 8) primary stress, 9) unstressed
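Purely as an illustration, one token could be stored as a nine-slot tuple mirroring the list above; the field comments are my reading of the slide, not the study's actual format.

```python
# The /t/ of "meet" in "I know I didn't meet her", encoded with the nine slots above.
token = (
    "ɾ",               # 1) the allophone actually produced (the outcome)
    "#",               # 2) word boundary
    "m",               # 3) [m]
    "i",               # 4) [i]
    "#",               # 5) word boundary
    "ɚ",               # 6) [ɚ]
    "pause",           # 7) pause
    "primary stress",  # 8) stress value from the slide
    "unstressed",      # 9) stress value from the slide
)
outcome, context = token[0], token[1:]   # outcome to predict vs. context variables
```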

Page 32:

Analogy in phonology

• Test words: capitalistic, negativistic, positivistic, primitivistic, relativistic, habitability, irritability, immutability, dissatisfaction

• Two simulations:
  • Base words of the test words contain [ɾ] in the database.
  • Base words of the test words contain [tʰ] in the database.

Page 33:

Test Word        Simulation Type         tʰ    ɾ    t   ∅   ʔ   t˭   d
capitalistic     Flapping Simulation     12    78   0   3   3   4    0
                 Aspiration Simulation   90    0    0   3   3   4    0
negativistic     Flapping Simulation     1     98   0   0   0   1    0
                 Aspiration Simulation   93    6    0   0   0   1    0
positivistic     Flapping Simulation     0     99   0   0   1   0    0
                 Aspiration Simulation   96    3    0   0   1   0    0
primitivistic    Flapping Simulation     0     96   1   1   0   1    0
                 Aspiration Simulation   94    2    1   1   0   1    0
relativistic     Flapping Simulation     10    90   0   0   0   0    0
                 Aspiration Simulation   86    14   0   0   0   0    0
habitability     Flapping Simulation     8     80   0   0   0   14   0
                 Aspiration Simulation   80    0    0   0   0   20   0
irritability     Flapping Simulation     3     95   0   0   0   2    0
                 Aspiration Simulation   96    3    0   0   0   1    0
immutability     Flapping Simulation     4     93   0   1   0   3    0
                 Aspiration Simulation   83    12   0   1   0   4    0
dissatisfaction  Flapping Simulation     0     100  0   0   0   0    0
                 Aspiration Simulation   100   0    0   0   0   0    0

Page 34:

ANALOGY IN PHONOLOGY

• The pronunciation of the base form influences that of the derived form, by analogy.

• The base form is not the only word influencing the derived form.
  • capi[tʰ]alistic is predicted at 90%, yet capital only accounts for 30% of this.

Words such as appetite, hepatitis, and particular also influence the outcome.

Page 35:


Ambisyllabicity

• Common in English
  • Merriam-Webster: si.lly, ho.llow, ba.lance
  • Cambridge: sill.y, ho.llow or holl.ow, bal.ance

• People vary in their perceptions, practices

• This has implications for doubled consonants (ambisyllabicity)

• Frequently observed in the data
  • Hessari / Hesaari