understand statistical symbols

Learning the Language of the Statistician

Upload: udaysk

Post on 09-Jul-2016



Page 1: Understand Statistical Symbols

Learning the Language of the Statistician

Page 2: Understand Statistical Symbols

• The following slides contain many of the symbols we will be using in this class. These are the symbols we will be using in formulas. While I do not require you to memorize all of the formulas, it is important that you know what these symbols mean. You will be expected to memorize a few of the simpler formulas for the departmental final.

• To do responsible research, you must assimilate, integrate and apply. This PowerPoint presentation concentrates on assimilating this basic information.

Page 3: Understand Statistical Symbols

Symbol | Sample | Sampling Distribution | Population
--------------------------------------------------------------------------------------------------------
Individual Score | yi | | yi
Sample Size | n | | N
Mean | ӯ | | µ
Standard Deviation | s = √s² | σ/√n, estimated by s/√n | σ
Variance | s² | | σ²
Sum | ∑ | | ∑
Proportion | p | | π
Hypothesized Mean | ӯ0 | | µ0
Hypothesized Proportion | p0 | | π0

Greek letter names: σ = Sigma, π = Pi, µ = Mu

Page 4: Understand Statistical Symbols

Stating Hypotheses with Symbols

• One Sample Hypothesis Test for a Proportion

o Null hypothesis
• P = π The sample proportion is the same as the population proportion.

o Research hypothesis
• P ≠ π The sample proportion is NOT the same as the population proportion. If you have a theory, you can use a one-tailed test and indicate that it is greater or less than the population proportion.

• One Sample Hypothesis Test for a Mean

o Null hypothesis
• ӯ = µ The sample mean is the same as the population mean.

o Research hypothesis
• ӯ ≠ µ The sample mean is not the same as the population mean. If you have a theory, you can use a one-tailed test and indicate that it is greater or less than the population mean.

Page 5: Understand Statistical Symbols

Stating Hypotheses with Symbols

• Chi Square

o Null hypothesis
• H0: E = O, The expected value equals the observed value
• The dependent variable is NOT contingent on the independent variable in the population

o Research hypothesis
• H1: E ≠ O, The expected value does not equal the observed value
• The dependent variable is contingent on the independent variable in the population

NOTE – For an Elaborated Chi Square you simply state that E = O for all of the independent/dependent combinations for the null hypothesis. For the research hypothesis you state that E ≠ O for at least one of the combinations. You would actually test each dependent/independent combination separately.

Page 6: Understand Statistical Symbols

Stating Hypotheses with Symbols

• One-Way Anova - with 2 groups

o Null hypothesis
• H0: µ1 = µ2, The Means are equal OR the Mean of Group 1 is the same as the Mean of Group 2 in the population

o Research hypothesis
• Two Tailed – the one the computer uses
• H1: µ1 ≠ µ2, The Means are not equal OR the Mean of Group 1 is not the same as the Mean of Group 2 in the population
• One Tailed – state a direction
• H1: µ1 < µ2 or µ1 > µ2, The Mean of Group 1 is lower than the Mean of Group 2 in the population, or the Mean of Group 1 is higher than the Mean of Group 2 in the population.

Page 7: Understand Statistical Symbols

Stating Hypotheses with Symbols

• One-Way Anova - with more than 2 groups*

o Null hypothesis
• H0: µ1 = µ2 = … = µk, The Means of all the groups are equal.

o Research hypothesis
• Two Tailed – the one the computer uses
• H1: µ1, µ2, …, µk are not all equal. The Mean of at least one group is not equal to the Mean of at least one other group.

o * This is still bi-variate. You don’t have more variables – only more categories in the categorical variable.

Page 8: Understand Statistical Symbols

Stating Hypotheses with Symbols

• Bi-Variate Regression

o Null hypothesis
• H0: β1 = 0, The regression slope is not different from 0 in the population
• There is no relationship between the independent and dependent variables in the population.

o Research hypothesis
• H1: β1 ≠ 0, The slope is different from 0 in the population
• There is a relationship between the independent and dependent variables in the population.

• Multi-Variate Regression

o Null hypothesis
• H0: β1 = β2 = … = βk = 0, None of the regression slopes is different from 0 in the population
• There is no relationship between the independent variables and the dependent variable in the population.

o Research hypothesis
• H1: at least one of β1, …, βk ≠ 0, At least one of the slopes is different from 0 in the population.
• There is a relationship between the dependent variable and at least one of the independent variables in the population.

Page 9: Understand Statistical Symbols

Matching Variables with Types of Analysis

• Chi-square (2 categorical variables)
o type of car you drive by gender
o race by political preference
o race by eye color
o gender by YES/NO questions

• Anova (1 categorical and 1 continuous variable)
o gender by yearly income
o gender by score on self esteem index
o race by yearly income
o political preference by yearly income
o age by whether or not you have children

• Bi-Variate Regression (2 continuous variables)
o yearly income by years of education
o years married by marital satisfaction (scale score)
o age by number of children

• Multiple Regression (continuous/dummy independent and continuous dependent)
o number of dates per year by yearly income, age, height, gender (dummy variable)
o poverty rates by sex ratio, percent single headed household, percent employed

Page 10: Understand Statistical Symbols

Statistics That Do Not Use Hypotheses

• Confidence Intervals

o We generally do not state a hypothesis for a Confidence Interval. Confidence Intervals are used to estimate a population mean or proportion based on a sample mean or proportion. Opinion polls use Confidence Intervals to predict election results etc.

• Pearson Correlation (correlation co-efficient or r)

o We generally do not associate Pearson Correlation Matrixes with hypotheses. We generally use Pearson Correlation Matrixes for diagnostic purposes and to test the strength of bi-variate relationships.

Page 11: Understand Statistical Symbols

Equations/Formulas Z Tests

• Z scores

o $Z = \dfrac{y_i - \mu}{\sigma}$

o Where yi = individual’s score
o µ = population mean
o σ = population standard deviation

o Information needed
• Population mean and standard deviation

o Example of when we would use this
• If you knew an individual’s SAT/ACT score, you could determine what percentile they scored in (e.g., the 95th percentile)
• OR if you know what percentile they are in, you can determine their score.
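The two uses of the Z score described above (score to percentile, and percentile back to score) can be sketched in Python. This is a minimal illustration, assuming the scores are normally distributed; the function names and the test scale (mean 500, standard deviation 100) are made up for the example and are not from the slides.

```python
from statistics import NormalDist


def z_score(y, mu, sigma):
    """Z = (y - mu) / sigma for one individual's score."""
    return (y - mu) / sigma


def percentile_from_score(y, mu, sigma):
    """Share of the population scoring below y (assumes a normal distribution)."""
    return NormalDist().cdf(z_score(y, mu, sigma))


def score_from_percentile(p, mu, sigma):
    """Score located at percentile p, with 0 < p < 1 (assumes a normal distribution)."""
    return mu + sigma * NormalDist().inv_cdf(p)


# Hypothetical test scale: population mean 500, standard deviation 100.
z = z_score(650, 500, 100)
pct = percentile_from_score(650, 500, 100)
cutoff = score_from_percentile(0.95, 500, 100)
print(z, round(pct, 3), round(cutoff, 1))
```

Both directions need the population mean and standard deviation, which is why the slide lists them under "Information needed".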

Page 12: Understand Statistical Symbols

Equations for Inferential Statistics

• Summary Statistics

o Mean
• $\bar{y} = \dfrac{\sum y_i}{n}$

o Median
• $\dfrac{n+1}{2}$ Order values and count up this far

o Variance
• $s^2 = \dfrac{\sum (y_i - \bar{y})^2}{n - 1}$

o Standard Deviation
• $s = \sqrt{s^2}$
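The summary statistics on this slide can be checked by hand with a short Python sketch. The data values are invented for illustration, and averaging the two middle values when n is even is an assumption about how to read a position of (n + 1) / 2 that falls between two values.

```python
from math import sqrt


def mean(values):
    """y-bar: sum of the scores divided by n."""
    return sum(values) / len(values)


def median(values):
    """Order the values and count up to position (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even n: the position falls between two values, so average them (assumption).
    return (ordered[mid - 1] + ordered[mid]) / 2


def sample_variance(values):
    """s squared: sum of squared deviations from the mean, divided by n - 1."""
    y_bar = mean(values)
    return sum((y - y_bar) ** 2 for y in values) / (len(values) - 1)


def sample_sd(values):
    """s: square root of the sample variance."""
    return sqrt(sample_variance(values))


scores = [2, 4, 4, 5, 7, 8]  # made-up scores
print(mean(scores), median(scores), sample_variance(scores), sample_sd(scores))
```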

Page 13: Understand Statistical Symbols

Inferring a Population Mean or Proportion Based on Sample Mean or Proportion

• The following Slides Focus on How to Estimate a Population Mean or Proportion if we ONLY have a random sample.

• In these cases we estimate one point in the population (i.e., the mean IQ of USU students)

• BUT we build a confidence interval around this single point – generally a 95% confidence interval
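A minimal Python sketch of building a 95% confidence interval around a point estimate. It assumes the usual large-sample form, estimate plus or minus 1.96 standard errors, with the standard error of the mean estimated by s/√n as in the symbol table; the multiplier 1.96 and the proportion standard error √(p(1 − p)/n) are standard textbook choices rather than values taken from these slides, and the scores are invented.

```python
from math import sqrt
from statistics import mean, stdev


def ci95_mean(sample):
    """95% interval for a population mean: y-bar +/- 1.96 * s / sqrt(n)."""
    n = len(sample)
    y_bar = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    margin = 1.96 * standard_error
    return y_bar - margin, y_bar + margin


def ci95_proportion(p, n):
    """95% interval for a population proportion: p +/- 1.96 * sqrt(p(1-p)/n)."""
    standard_error = sqrt(p * (1 - p) / n)
    margin = 1.96 * standard_error
    return p - margin, p + margin


iq_scores = [98, 102, 100, 104, 96]  # hypothetical random sample
print(ci95_mean(iq_scores))
print(ci95_proportion(0.5, 400))
```

With a sample as small as the one shown, a t critical value would normally replace 1.96; the sketch keeps 1.96 only to show the structure of the interval.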

Page 14: Understand Statistical Symbols


Page 15: Understand Statistical Symbols
Page 16: Understand Statistical Symbols
Page 17: Understand Statistical Symbols

A One or Large Sample Hypothesis Test

• In the following slides we compare a sample mean or proportion with a population mean or proportion.

• We want to know if our sample mean or proportion is different from the population mean or proportion

• The population mean or proportion could actually be a mean/proportion that is specified by a theory or by past research (rather than a number computed from a population data set)

Page 18: Understand Statistical Symbols

Equations/Formulas for One Sample Hypotheses Tests

• The equations are outlined in red
• What do the symbols mean

o One sample hypothesis test for Proportion
o P = proportion in the sample
o π0 = proportion or hypothesized proportion in the population
o n = sample size
o Z = computed statistic

o One sample hypothesis test for Mean
o ӯ = mean in the sample
o µ0 = mean or hypothesized mean in the population
o n = sample size
o sӯ = standard error, an estimate of the standard deviation of the sampling distribution
o s/√n = computation for estimating the standard error: the standard deviation of the sample divided by the square root of the sample size.
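The symbols above can be put to work in a short Python sketch. It assumes the standard textbook forms Z = (p − π0) / √(π0(1 − π0)/n) for a proportion and t = (ӯ − µ0) / (s/√n) for a mean; the function names and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sample_z_proportion(p, pi0, n):
    """Computed Z and two-tailed p-value for a sample proportion against pi0."""
    standard_error = sqrt(pi0 * (1 - pi0) / n)
    z = (p - pi0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def one_sample_t_mean(sample, mu0):
    """Computed t for a sample mean against mu0, using s / sqrt(n) as standard error."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - mu0) / standard_error


# Hypothetical poll: 56% of 400 respondents agree; hypothesized proportion is 50%.
z, p_value = one_sample_z_proportion(0.56, 0.50, 400)
# Hypothetical scores compared with a hypothesized mean of 97.
t = one_sample_t_mean([98, 102, 100, 104, 96], 97)
print(round(z, 2), round(p_value, 4), round(t, 3))
```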

Page 19: Understand Statistical Symbols
Page 20: Understand Statistical Symbols
Page 21: Understand Statistical Symbols

Symbols for Statistics that Infer the Relationship in the Sample to the Population

Symbol(s) | Interpretation

Chi Square
X² | Chi Square statistic

Regression
β | beta – slope in population
b | slope in sample
a | alpha – intercept or constant in prediction formula
X1…X | value of the X variables
Ŷ | y-hat or predicted Y
ӯ | Y bar or the mean of Y

Anova
µ | Mu or mean in population
yi − ӯ |

Page 22: Understand Statistical Symbols

Chi-Square Equation
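A minimal Python sketch of a Chi Square computation, assuming the standard Pearson form: the sum over all cells of (O − E)² / E, with each expected count E taken as row total × column total / grand total. The function name and the 2 × 2 table of counts are made up for illustration.

```python
def chi_square(observed):
    """Return (Chi Square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if the two variables were unrelated.
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Hypothetical counts: rows are one categorical variable, columns the other.
table = [[30, 20],
         [20, 30]]
stat, df = chi_square(table)
print(stat, df)
```

The larger the gap between observed and expected counts, the larger the statistic, which matches the E = O versus E ≠ O hypotheses stated earlier.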

Page 23: Understand Statistical Symbols

Equations/Formulas for Inferential Statistics

o Pearson Correlation Coefficient and R2

• Formula

o $r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
o R2 = r squared

o Multiple Regression

o Prediction Equation
• Ŷ = a + b1x1 + b2x2 + b3x3 + …
• Ŷ = predicted score for the dependent variable
• a = intercept or constant
• b = slope or parameter estimate for independent variables – change in the Y variable for every 1 unit increase in X
• X = value of the X variables – taken from the codebook
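The correlation formula and the prediction equation can be sketched in Python as follows. The data (years of education and yearly income in thousands) and the intercept and slopes passed to `predict` are invented for illustration; estimating a and b from data is not shown.

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = sum of cross-products of deviations / sqrt(product of sums of squares)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def predict(a, slopes, x_values):
    """Y-hat = a + b1*x1 + b2*x2 + ... for one case."""
    return a + sum(b * x for b, x in zip(slopes, x_values))


education = [10, 12, 14, 16, 18]  # made-up years of education
income = [20, 26, 30, 38, 41]     # made-up yearly income, in thousands
r = pearson_r(education, income)
print(round(r, 4), round(r ** 2, 4))

# Hypothetical intercept and slopes, as if read from regression output.
print(predict(2.0, [1.5, -0.5], [10, 4]))
```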

Page 24: Understand Statistical Symbols

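Equations/Formulas for Inferential Statistics

• Anova

o Formula

o $TSS = \sum y_{ij}^2 - \dfrac{G^2}{n}$

o $SSB = \sum \dfrac{T_i^2}{n_i} - \dfrac{G^2}{n}$

o $SSW = TSS - SSB$

o $s_B^2 = \dfrac{SSB}{k-1}$

o $s_W^2 = \dfrac{SSW}{n-k}$

o $F = \dfrac{s_B^2}{s_W^2}$ = F statistic

df between = k-1
df within = n-k

G = grand total of all scores, Ti = total of the scores in group i, ni = number of cases in group i, k = number of groups

TSS = Total Sum of Squares
SSB = Sum of Squares Between
SSW = Sum of Squares Within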
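A short Python sketch of the Anova computation on this slide. It reads G as the grand total of all scores and T as the total of the scores within one group, which is an assumption about symbols the slide does not define, and it treats SSB as the between-groups sum of squares and SSW as the within-groups sum of squares. The three groups of scores are made up.

```python
def one_way_anova(groups):
    """Return TSS, SSB, SSW, F and the degrees of freedom for a list of groups."""
    all_scores = [y for group in groups for y in group]
    n = len(all_scores)   # total number of cases
    k = len(groups)       # number of groups
    grand_total = sum(all_scores)                        # G
    correction = grand_total ** 2 / n                    # G squared over n
    tss = sum(y ** 2 for y in all_scores) - correction   # total
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - correction  # between groups
    ssw = tss - ssb                                      # within groups
    ms_between = ssb / (k - 1)
    ms_within = ssw / (n - k)
    return {
        "TSS": tss,
        "SSB": ssb,
        "SSW": ssw,
        "F": ms_between / ms_within,
        "df": (k - 1, n - k),
    }


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up scores for three groups
result = one_way_anova(groups)
print(result)
print(result["SSB"] / result["TSS"])  # share of TSS explained by group membership
```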

Page 25: Understand Statistical Symbols

Anova and Regression Sums of Squares

• Anova
o TSS = Total Sum of Squares
o SSW = Sum of Squares within each group
o SSB = Sum of Squares between the groups

SSB/TSS = R square or the proportion of the total sum of squares that is explained by group membership

• Regression
o TSS – Total Sum of Squares
o SSM – Sum of Squares Model
o SSE – Sum of Squares Error

Page 26: Understand Statistical Symbols

Equations/Formulas for Inferential Statistics

• Two Sample T-test

o Formula
• $t = \dfrac{\bar{y}_1 - \bar{y}_2}{s_{\bar{y}_1 - \bar{y}_2}}$

The denominator is the estimated standard error of the difference between the two means, computed as follows:

$s_{\bar{y}_1 - \bar{y}_2} = s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

The pooled standard deviation sp combines the standard deviation of sample 1 (s1) and the standard deviation of sample 2 (s2):

$s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

o What symbols mean
• t = computed test statistic, which is compared with the critical value
• ӯ1 = mean of sample one
• ӯ2 = mean of sample two
• n1 = size of sample 1 and n2 = size of sample 2
• Degrees of freedom = df = n1 + n2 – 2

• Uses a T distribution
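The two-sample T-test steps (pooled standard deviation, standard error of the difference, then t) can be sketched in Python. The function name and the two small samples are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Return (t, df) for a two-sample t-test with a pooled standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = variance(sample1)  # sample variance of group 1 (n - 1 denominator)
    s2_sq = variance(sample2)  # sample variance of group 2
    # Pooled standard deviation: variances weighted by their degrees of freedom.
    sp = sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    # Estimated standard error of the difference between the two means.
    standard_error = sp * sqrt(1 / n1 + 1 / n2)
    t = (mean(sample1) - mean(sample2)) / standard_error
    return t, n1 + n2 - 2


t, df = pooled_t([1, 2, 3], [5, 6, 7])  # made-up scores
print(round(t, 3), df)
```

The computed t is then compared with the critical value from a t distribution with n1 + n2 − 2 degrees of freedom.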

Page 27: Understand Statistical Symbols

Equations/Formulas for Inferential Statistics

• Mann Whitney
o Focuses on ranks (medians) rather than on means
o Two Groups
o Formula

• $Z = \dfrac{T_1 - E(T_1)}{\sqrt{\mathrm{Var}(T_1)}}$, where $E(T_1) = \dfrac{n_1 (n+1)}{2}$
• Rank values from smallest to largest
• Sum ranks in smaller group = T1
• Compute E(T1)
• Compute Variance: $\mathrm{Var}(T_1) = \dfrac{n_1 n_2}{n} s^2$, where $s^2 = \dfrac{\sum (Y_i - \bar{Y})^2}{n-1}$
• n = n1 + n2 = total number of cases

Uses a Z distribution.
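The Mann Whitney steps listed above can be sketched in Python. Two assumptions are worth flagging: n is read as the combined size of both groups, and s² is read as the variance of the pooled ranks, which reduces to the familiar n1·n2·(n + 1)/12 when there are no ties. Tied values share their average rank. The function names and data are made up.

```python
from math import sqrt


def average_ranks(values):
    """Rank from smallest to largest; tied values share the average of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def mann_whitney_z(smaller_group, larger_group):
    """Z = (T1 - E(T1)) / sqrt(Var(T1)), with T1 the rank sum of the smaller group."""
    n1, n2 = len(smaller_group), len(larger_group)
    n = n1 + n2
    ranks = average_ranks(list(smaller_group) + list(larger_group))
    t1 = sum(ranks[:n1])
    expected_t1 = n1 * (n + 1) / 2
    # Variance of the pooled ranks (assumed reading of s squared on the slide).
    rank_mean = sum(ranks) / n
    s_sq = sum((r - rank_mean) ** 2 for r in ranks) / (n - 1)
    var_t1 = n1 * n2 * s_sq / n
    return (t1 - expected_t1) / sqrt(var_t1)


z = mann_whitney_z([1, 2, 3], [4, 5, 6, 7])  # made-up scores
print(round(z, 4))
```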

Page 28: Understand Statistical Symbols

Equations/Formulas for Inferential Statistics

• Kruskal Wallis

o Focuses on ranks (medians) rather than on means
o More than Two Groups
o Formula

$\chi^2 = \dfrac{12}{n(n+1)} \sum_k \dfrac{T_k^2}{n_k} - 3(n+1)$

Tk = total sum of ranks for the k sample
n = total number of cases
nk = number of cases for the k sample

Uses X2 Distribution
Degrees of Freedom = k-1 (where k is number of groups)
Use when you want to compare more than two groups, and the distribution is not normal.
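A minimal Python sketch of the Kruskal Wallis statistic, reading the formula as 12 / (n(n + 1)) times the sum over groups of T² / n for that group, minus 3(n + 1). Values are ranked across all groups combined, tied values share their average rank, and no further tie correction is applied (a simplification). The function names and data are invented.

```python
def average_ranks(values):
    """Rank from smallest to largest; tied values share the average of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def kruskal_wallis(groups):
    """Return (statistic, degrees of freedom) for a list of groups of scores."""
    pooled = [y for group in groups for y in group]
    n = len(pooled)  # total number of cases
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        # T for this group: the sum of the ranks of its own cases.
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    statistic = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return statistic, len(groups) - 1


h, df = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # made-up scores
print(round(h, 3), df)
```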

Page 29: Understand Statistical Symbols

Equations/Formulas for Inferential Statistics

• Formulas for Sample Size

Sample size (n) = 𝑁𝑑𝑑

.9604 (𝑁+1)

d = margin of error (usually .05)
N = population size
.9604 = a constant related to being at least 95% sure

This sample size is large enough that we can be at least 95% sure we can generalize to the population with a margin of error of .05

Page 30: Understand Statistical Symbols

• Prepared by Dr. Carol Albrecht