GRS LX 865 Topics in Linguistics

Week 6. Statistics etc.

Upload: marlo

Post on 18-Mar-2016

TRANSCRIPT

Page 1: GRS LX 865 Topics in Linguistics

Week 6. Statistics etc.

GRS LX 865 Topics in Linguistics

Page 2: GRS LX 865 Topics in Linguistics

Update on our sentence processing experiment…

Quick graph of reaction time per region

[Figure: reaction time (axis 0 to 2500) by region (1 to 8); series: NP, I, He, John]

Page 3: GRS LX 865 Topics in Linguistics

Update

Seems nice; there’s a difference in region 5 (where the NP, I, they, John were) and also in region 6. From slowest to fastest: John, he, NP, I.

Something like what we expected, but wait…

Page 4: GRS LX 865 Topics in Linguistics

Update

Two further things that this didn’t account for:

Different people read at different speeds.

“I” is a lot shorter than “the photographer”. Might it go faster?

To take account of people’s reading speeds, I tried using each subject’s average RT per character on the fillers.
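The per-character adjustment described above can be sketched roughly as follows. This is only an illustration and not necessarily the procedure used for the experiment: the data layout (tuples of subject, text, RT in ms), the function names, and all the numbers are made up.

```python
from collections import defaultdict

def rt_per_char_by_subject(filler_trials):
    """Average RT per character for each subject, estimated from filler trials.

    filler_trials: iterable of (subject, text, rt_ms) tuples (hypothetical layout).
    """
    totals = defaultdict(lambda: [0.0, 0])  # subject -> [sum of RT/char, trial count]
    for subject, text, rt_ms in filler_trials:
        totals[subject][0] += rt_ms / len(text)
        totals[subject][1] += 1
    return {subj: total / count for subj, (total, count) in totals.items()}

def residual_rt(subject, text, rt_ms, rate):
    """RT minus what this subject's reading rate predicts for a text of this length."""
    return rt_ms - rate[subject] * len(text)

# Made-up filler data: two subjects who read at different speeds.
fillers = [
    ("s1", "the dog", 700), ("s1", "a long sentence", 1500),
    ("s2", "the dog", 1400), ("s2", "a long sentence", 3000),
]
rate = rt_per_char_by_subject(fillers)

# A short and a long critical region for the same (fast) subject.
short_resid = residual_rt("s1", "I", 150, rate)
long_resid = residual_rt("s1", "the photographer", 1500, rate)
```

Note that an adjusted value of this kind can come out negative whenever a region is read faster than the subject's per-character rate predicts.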

Page 5: GRS LX 865 Topics in Linguistics

Subject RT/c

Average RT per character was pretty much all over the map.

So at least it seemed worth factoring out.

Overhead?

[Figure: subject RT per character (x-axis 70 to 220) against N (y-axis 0 to 4)]

Page 6: GRS LX 865 Topics in Linguistics

Items?

It’s also important to look at the items. Were any always incorrect? Those might have been too hard or had something else wrong with them.

(Not clear that we actually care whether the answer was right.)

[Figure: %corr (0 to 1) per item (1 to 19)]

Page 7: GRS LX 865 Topics in Linguistics

End result so far?

So, taking that all into account, I ended up with this… Not what we were going for.

[Figure: adjusted reaction time (axis -500 to 2500) by region (1 to 8); series: np, I, they, name]

Page 8: GRS LX 865 Topics in Linguistics

So…

There’s still work to be done. Since I’m not sure exactly what work that is, once again… no lab work to do.

Instead, we’ll talk about statistics generally…

Places to go:
http://davidmlane.com/hyperstat/
http://www.stat.sc.edu/webstat/

Page 9: GRS LX 865 Topics in Linguistics

Measuring things

When we go out into the world and measure something like reaction time for reading a word, we’re trying to investigate the underlying phenomenon that gives rise to the reaction time.

When we measure reaction time of reading “I” vs. “they”, we are trying to find out if there is a real, systematic difference between them (such that “I” is generally faster).

Page 10: GRS LX 865 Topics in Linguistics

Measuring things

So, suppose for any given person, it takes A ms to read “I” and B ms to read “they”.

If our measurement worked perfectly, we’d get A whenever we measure for “I” and B whenever we measure for “they”.

But it’s a noisy world.

Page 11: GRS LX 865 Topics in Linguistics

Measuring things

Measurement never works perfectly.

There is always additional noise of some kind or another. You’re likely to get a value near A when you measure “I”, but you’re not guaranteed to get A.

Similarly, there are differences between subjects, differences between items, differences of still other sorts…

Page 12: GRS LX 865 Topics in Linguistics

A common goal

Commonly what we’re after is an answer to the question: are these two things that we’re measuring actually different?

So, we measure for “I” and for “they”. Of the measurements we’ve gotten, “I” seems to be around A, “they” seems to be around B, and B is a bit longer than A. The question is: given the inherent noise of measurement, how likely is it that we got that difference just by chance?
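One way to make "just by chance" concrete is a permutation test: shuffle the labels on the measurements many times and count how often a difference at least as large as the observed one turns up. This is a swapped-in illustration, not a method named in these slides; the reaction times below are made up, and the function names are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Fraction of random relabelings whose mean difference is at least as
    large (in absolute value) as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(mean(b) - mean(a))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        # Treat the first len(a) shuffled values as group a, the rest as group b.
        diff = abs(mean(pooled[len(a):]) - mean(pooled[:len(a)]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles

rt_I = [310, 295, 330, 305, 320, 300]      # made-up RTs (ms) for "I"
rt_they = [345, 360, 335, 350, 370, 340]   # made-up RTs (ms) for "they"
p = permutation_p_value(rt_I, rt_they)
```

With these made-up numbers every "they" measurement is slower than every "I" measurement, so very few shuffles reproduce a gap that large and the resulting fraction is small.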

Page 13: GRS LX 865 Topics in Linguistics

Some stats talk

There are two major uses for statistics:

Describing a set of data in some comprehensible way

Drawing inferences from a sample about a population.

The second one is the useful one for us; by picking some random representative sample of the population, we can estimate characteristics of the whole population by measuring things in our sample.

Page 14: GRS LX 865 Topics in Linguistics

Normally…

Many things we measure, with their noise taken into account, can be described (at least to a good approximation) by this “bell-shaped” normal distribution.

Often as we do statistics, we implicitly assume that this is the case…

Page 15: GRS LX 865 Topics in Linguistics

First some descriptive stuff

Central tendency: What’s the usual value for this thing we’re measuring?

Various ways to do it; the most common way is by using the arithmetic mean (“average”).

Average is determined by adding up the measurements and dividing by the number of measurements.

Page 16: GRS LX 865 Topics in Linguistics

Descriptive stats

Spread: How often is the measurement right around the mean? How far out does it get?

Range (maximum - minimum), kind of basic.

Variance, standard deviation: a more sophisticated measure of the width of the measurement distribution.

You describe a normal distribution in terms of two parameters, mean and standard deviation.
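The descriptive measures above can be computed in a few lines. A minimal sketch using Python's standard `statistics` module; the reaction times are made up for illustration.

```python
import statistics

rts = [310, 295, 330, 305, 320, 300]   # made-up reaction times in ms

mean_rt = sum(rts) / len(rts)          # arithmetic mean: add up, divide by count
rt_range = max(rts) - min(rts)         # range: maximum - minimum
variance = statistics.variance(rts)    # sample variance (divides by n - 1)
stdev = statistics.stdev(rts)          # sample standard deviation = sqrt(variance)

print(mean_rt, rt_range, variance, round(stdev, 2))
```

Here `statistics.variance` and `statistics.stdev` are the sample versions; `statistics.pvariance` and `statistics.pstdev` divide by n instead and are meant for a whole population.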

Page 17: GRS LX 865 Topics in Linguistics

Interesting facts about stdev

About 68% of the observations will be within one standard deviation of the mean.

About 95% of the observations will be within two standard deviations of the mean.

Percentile example (mean 80, score 75, stdev 5): 15.9. That is, a score one standard deviation below the mean sits at about the 15.9th percentile.
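These figures can be checked against the normal distribution directly. A small sketch using `statistics.NormalDist` from the Python standard library (available from Python 3.8), assuming the measurements really are normally distributed.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

within_one = z.cdf(1) - z.cdf(-1)   # share within one stdev, about 0.683
within_two = z.cdf(2) - z.cdf(-2)   # share within two stdevs, about 0.954

# Percentile of a score of 75 when the mean is 80 and the stdev is 5
percentile = NormalDist(mu=80, sigma=5).cdf(75) * 100   # about 15.9

print(round(within_one, 3), round(within_two, 3), round(percentile, 1))
```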

Page 18: GRS LX 865 Topics in Linguistics

So, more or less, …

If we knew the actual mean of the variable we’re measuring and the standard deviation, we could be 95% sure that any given measurement we do will land within two standard deviations of that mean, and 68% sure that it will be within one.

Of course, we can’t know the actual mean. But we’d like to.

Page 19: GRS LX 865 Topics in Linguistics

Confidence intervals

It turns out that you can run this logic in reverse as well, coming up with a confidence interval (I won’t tell you how precisely, but here’s the idea):

Given where you see the measurements coming up, they must be 68% likely to be within one standard deviation of the mean, and 95% likely to be within two, so the more measurements you have, the better guess you can make about where the mean is.

A 95% CI like 209.9 < µ < 523.4 means “we’re 95% confident that the real mean is in there”.
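As a rough sketch of how such an interval can be computed: take the sample mean plus or minus about 1.96 standard errors, where the standard error is the sample standard deviation divided by the square root of the number of measurements. This uses a normal approximation, which is an assumption made here for simplicity; with small samples a t-based interval would be somewhat wider. The function name and the data are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def approx_ci(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))           # standard error of the mean
    return m - z * se, m + z * se

rts = [310, 295, 330, 305, 320, 300]   # made-up reaction times in ms
low, high = approx_ci(rts)

# Pretend we had four times as many measurements with the same spread:
low_more, high_more = approx_ci(rts * 4)
```

The second interval is narrower than the first, which is the sense in which more measurements give a better guess.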

Page 20: GRS LX 865 Topics in Linguistics

Hypothesis testing

Testing to see if the means generating two distributions are actually different.

The idea is to determine how likely it is that we could get the difference we observe by chance. After all, you could roll 25 6s in a row, it’s just very unlikely: (1/6)^25. (Null hypothesis = chance.)

Once you estimate the sample means and standard deviations, this is something you basically look up (t-test, based on the number of observations you make). This is what you see reported as p.

“p < 0.05” means that, if only chance were at work, there would be less than a 5% chance of seeing a difference this large.
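To make the "look up" step a little more concrete, here is a sketch of the number that gets looked up: a t statistic, the difference between the sample means divided by its standard error. The form below is Welch's version, chosen here as an assumption since the slides do not say which variant is meant; the data and function name are made up. The last line just evaluates the dice example.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """t statistic for the difference between two sample means (Welch's form)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se

rt_I = [310, 295, 330, 305, 320, 300]      # made-up RTs (ms) for "I"
rt_they = [345, 360, 335, 350, 370, 340]   # made-up RTs (ms) for "they"
t = welch_t(rt_I, rt_they)                 # about 5.31 for these numbers

p_25_sixes = (1 / 6) ** 25                 # about 3.5e-20: possible, just very unlikely
```

The t value is then compared against a t distribution (a table or a stats program) to get p; the larger the t, the smaller the p.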

Page 21: GRS LX 865 Topics in Linguistics

Significance

Generally, 0.05 is taken to be the level of “significance”: if the difference you measure only has a 5% chance of having arisen by pure accident, then that difference is significant.

There’s no real magic about 0.05, it’s just a convention. Hard to say that 0.055 and 0.045 are seriously qualitatively different.

Page 22: GRS LX 865 Topics in Linguistics

ANOVA

Analysis of variance: same as the t-test, except for more than two means at once. Still trying to discover if there are differences in the underlying distributions of several means that are unlikely to have arisen just by chance.

I hope to come back to this. Perhaps it can be tacked on to a different lab.

Page 23: GRS LX 865 Topics in Linguistics

Statistical power

In general, the more samples you get, the better off you are: the more statistical power your analysis has. Power also depends on the variance (the lower the variance, the more power) and on the significance level you’ve chosen.

Technically, statistical power has to do with how likely it is that you will correctly reject a false null hypothesis.

                   H0 true        H0 false
Reject H0          Type I error   Correct
Do not reject H0   Correct        Type II error
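Power can be estimated by simulation: generate many fake experiments in which H0 really is false, and see what share of them reject it. The sketch below is only illustrative. The effect size, the noise level, and the use of a fixed cutoff of 1.96 on a t-like statistic (a rough stand-in for a proper t-test lookup) are all assumptions, and the function name is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance

def estimated_power(n, true_diff, sd, n_sims=500, seed=1):
    """Share of simulated experiments that reject H0 when H0 is in fact false."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, sd) for _ in range(n)]
        b = [rng.gauss(true_diff, sd) for _ in range(n)]
        se = sqrt(variance(a) / n + variance(b) / n)
        if abs(mean(b) - mean(a)) / se > 1.96:   # rough two-sided 0.05 cutoff
            rejections += 1
    return rejections / n_sims

power_small = estimated_power(n=5, true_diff=30, sd=30)       # few samples, noisy
power_large = estimated_power(n=50, true_diff=30, sd=30)      # more samples
power_low_noise = estimated_power(n=5, true_diff=30, sd=10)   # lower variance
```

Under these assumptions both the larger sample and the lower variance give noticeably more power than the small, noisy case. One minus the power is the Type II error rate from the table.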

Page 24: GRS LX 865 Topics in Linguistics

Correlation and Chi square

Correlation between two measured variables is often measured in terms of (Pearson’s) r.

If r is close to 1 or -1, the value of one variable can predict quite accurately the value of the other.

If r is close to 0, predictive power is low.

Chi-square test is supposed to help us decide if two conditions/factors are independent of one another or not. (Does knowing one help predict the effect of the other?)
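Both quantities can be computed by hand in a few lines. This is a sketch with made-up numbers; the function names are hypothetical, and the chi-square function returns only the test statistic, which would still need to be compared against a chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)

def chi_square(table):
    """Chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two factors were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

word_length = [1, 3, 4, 7, 12]        # made-up word lengths in characters
rt = [250, 290, 310, 380, 470]        # made-up reaction times in ms
r = pearson_r(word_length, rt)        # close to 1: length predicts RT well here

counts = [[30, 10], [10, 30]]         # made-up counts: condition x response
chi2 = chi_square(counts)             # 20.0; a table with no association gives 0
```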

Page 25: GRS LX 865 Topics in Linguistics

Much more to it…

Mainly I just wanted you to see some terminology. I hope to get some workable data from some experiment or lab we do that we can put into a stats program, perhaps just WebStat.

…

Page 26: GRS LX 865 Topics in Linguistics