Download - Introduction to Inferential Statistics and Hypothesis Testingsnu-dhpm.ac.kr/pds/files/Introduction to Inferential... · 2004-11-18 · Introduction to Inferential Statistics and Hypothesis

Introduction to Inferential Statistics and Hypothesis Testing

Presenter: 박재현

Normal Curve

Mean, median, mode: center

Symmetrical bell-shaped curve

(fig. 3-1)

E.g., intelligence, attitude, personality

1 z = 1 SD

95% of scores fall within ±1.96 z of the mean

99% of scores fall within ±2.58 z of the mean
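The two cutoffs above can be checked with Python's standard-library `statistics.NormalDist`. This is a minimal sketch. The helper names `two_sided_z` and `coverage` are illustrative and do not come from the source.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def two_sided_z(confidence):
    """z value that leaves (1 - confidence) / 2 of the area in each tail."""
    return std_normal.inv_cdf(1 - (1 - confidence) / 2)

def coverage(z):
    """Proportion of the normal curve lying within ±z of the mean."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

print(round(two_sided_z(0.95), 2))  # 1.96
print(round(two_sided_z(0.99), 2))  # 2.58
print(round(coverage(1.0), 4))      # 0.6827 (about 68% within ±1 SD)
```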

Percentiles

Shows the relative standing of a given score

Used to compare scores on tests that have different means and SDs

Percentile rank = (number of scores below the given score / total number of scores) × 100
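The percentile-rank formula above can be written as a one-line function. This is a sketch with hypothetical scores. Conventions vary in practice; some definitions also count half of any tied scores.

```python
def percentile_rank(scores, score):
    """(number of scores below `score` / total number of scores) * 100."""
    below = sum(1 for s in scores if s < score)
    return below / len(scores) * 100

scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]  # hypothetical test scores
print(percentile_rank(scores, 75))  # 50.0 (5 of the 10 scores are below 75)
```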

25th percentile = first quartile

50th percentile = second quartile (median)

75th percentile = third quartile

A percentile is a point, not a range (Table 3-1)

If z > 0: percentile = 50 + (area between the mean and z) %

If z < 0: percentile = 50 - (area between the mean and z) % (area from Appendix A)

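The closer a score is to the mean, the easier it is to raise one's rank, because scores are most densely packed near the mean.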
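The following sketch shows the 50 ± (area) rule and why rank changes fastest near the mean. It assumes that Appendix A tabulates the area between the mean and z. In that case `NormalDist().cdf(abs(z)) - 0.5` gives the same value as the table. The function name is hypothetical.

```python
from statistics import NormalDist

def percentile_from_z(z):
    """Percentile of a z score: 50 + area if z >= 0, else 50 - area."""
    area_mean_to_z = NormalDist().cdf(abs(z)) - 0.5  # area between the mean and z
    if z >= 0:
        return 50 + 100 * area_mean_to_z
    return 50 - 100 * area_mean_to_z

# The same 0.5-SD gain, near the mean versus out in the tail
near_mean = percentile_from_z(0.5) - percentile_from_z(0.0)
in_tail = percentile_from_z(2.5) - percentile_from_z(2.0)
print(round(near_mean, 1), round(in_tail, 1))  # 19.1 1.7
```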

Standard scores

Express a score's relative distance from the mean in SD units

E.g., a score of 600 that lies 1 SD above the mean -> z score = 1

Page 56

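z scores: mean = 0, SD = 1

Allow comparison across different distributions

Drawback: z scores include decimals and negative values

Transformed standard scores

T scores: mean 50, SD 10 (T = 50 + 10z)

Other scales (with any chosen mean and SD) are also possible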
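Here is a short sketch of the z-to-T conversion. It assumes a hypothetical test with mean 500 and SD 100, which matches the "600 = 1 SD" example. The general `rescale` helper covers the "other scales" case. All function names are illustrative.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in SD units."""
    return (x - mean) / sd

def rescale(z, new_mean, new_sd):
    """Map a z score onto a scale with any chosen mean and SD."""
    return new_mean + new_sd * z

def t_score(z):
    """T score: mean 50, SD 10 (no decimals or negatives for typical z values)."""
    return rescale(z, 50, 10)

# Hypothetical test scored with mean 500 and SD 100
for x in (350, 500, 600):
    z = z_score(x, 500, 100)
    print(x, z, t_score(z))
# 350 -1.5 35.0
# 500 0.0 50.0
# 600 1.0 60.0
```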

Correcting failures in normality through data transformations

Applies to data that do not meet the assumptions of normality, linearity, or homoscedasticity (equal variances)

Recalculate the skewness measure after transforming

Moderately positively skewed -> square root

Substantially positively skewed -> log

Severely positively skewed -> inverse

Negatively skewed -> reflect (reverse-score) the variable so the skew becomes positive, then apply one of the transformations above

If transformation does not succeed -> recode the variable as categorical
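The sketch below follows the transformation ladder above. It uses the moment coefficient of skewness g1 = m3 / m2^1.5. That is one common definition; statistics packages may apply small-sample corrections, so their values can differ. The data are hypothetical. All three transforms require positive values. The inverse also reverses the order of scores.

```python
import math

def skewness(xs):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5 (one common definition)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def reflect(xs):
    """Reverse-score: the largest value becomes 1, turning negative skew into positive skew."""
    top = max(xs)
    return [top + 1 - x for x in xs]

# Ordered from mildest to strongest; all require positive values
TRANSFORMS = {
    "square root (moderate skew)": math.sqrt,
    "log10 (substantial skew)": math.log10,
    "inverse (severe skew)": lambda x: 1 / x,
}

data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 10, 18, 30]  # hypothetical, positively skewed
print(f"original: {skewness(data):.2f}")
for name, f in TRANSFORMS.items():
    # Recalculate skewness after each candidate transformation
    print(f"{name}: {skewness([f(x) for x in data]):.2f}")
```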

Central limit theorem

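Does the sample mean equal the population mean?

As the sample size increases -> the distribution of sample means approaches a normal distribution

SD of the sample means = standard error of the mean = σ/√n; from a single sample it is calculated as s/√n (SD of the sample / square root of n)

The larger the SD, the larger the sample needed to estimate the mean with the same precision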
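The central limit theorem can be illustrated with a small simulation. This sketch draws from a hypothetical, strongly skewed population (exponential with mean 1 and SD 1). It then compares the SD of many sample means with σ/√n. The simulation settings are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(1)
# Exponential(rate=1) population: mean 1, SD 1, strongly right-skewed (hypothetical choice)
POP_SD = 1.0

def sd_of_sample_means(n, reps=5000):
    """Draw `reps` samples of size n and return the SD of their means."""
    means = [mean([random.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]
    return stdev(means)

for n in (4, 16, 64):
    # simulated SD of sample means vs. the theoretical sigma / sqrt(n)
    print(n, round(sd_of_sample_means(n), 3), round(POP_SD / n ** 0.5, 3))

# In practice only one sample is available, so SE is estimated as s / sqrt(n)
one_sample = [random.expovariate(1.0) for _ in range(64)]
print("estimated SE:", round(stdev(one_sample) / 64 ** 0.5, 3))
```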

Probability

E.g., mammography (assuming that Patient X is similar to the group studied)

Fig. 3-2, Table 3-2

Probability of an event = 100% - probability of its opposite (complement)

Definitions of probability

Frequency probability

Obtained through empirical observation

Formula: p. 62

Assumes that Patient X is similar to the group

Must be a random process (in mathematical theory):

Each member has an equal chance of being chosen

Each choice must be independent

As the number of observations increases -> the sample proportion approaches the population probability
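The last point can be shown with a simulation. This sketch assumes a hypothetical event with true probability 0.5, such as a fair coin landing heads. It tracks how the observed relative frequency settles toward that value as observations accumulate.

```python
import random

random.seed(42)
TRUE_P = 0.5  # hypothetical event probability (e.g., a fair coin landing heads)

hits = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
proportions = {}
for i in range(1, 100_001):
    hits += random.random() < TRUE_P  # True counts as 1
    if i in checkpoints:
        proportions[i] = hits / i
        print(i, round(hits / i, 4))  # relative frequency after i observations
```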

Limitations of the study above

A priori probability

Determined logically; this is also objective

Table 3-3

Subjective probability

A rational assessment, not arbitrary belief

Is 17.9% close to 0 or not?

The choice of cutoff is partly intuitive and partly historical

4 heads in a row: about 6%; 5 heads in a row: about 3%

5% cutoff (Fisher, 1926): chosen for convenience

Depending on the nature of the study, other levels such as 1%, 10%, or 20% can be used.
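The coin-run figures above follow from (1/2)^k for a fair coin. This sketch computes them exactly with `fractions.Fraction` and compares them with the 5% cutoff. The function name is illustrative.

```python
from fractions import Fraction

def p_run_of_heads(k):
    """Probability that a fair coin lands heads k times in a row: (1/2)**k."""
    return Fraction(1, 2) ** k

for k in (4, 5):
    p = p_run_of_heads(k)
    print(f"{k} heads in a row: {p} = {float(p) * 100}%")
# 4 heads in a row: 1/16 = 6.25%
# 5 heads in a row: 1/32 = 3.125%
```

Four heads in a row is above the 5% cutoff. Five heads in a row falls below it.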

Probability rules

Conditional probability

Used when events are dependent: P(B | A) = P(A and B) / P(A)

Table 3-4

Multiplication rule

If events are independent: P(A and B) = P(A) × P(B)

Independence means that knowing whether event A occurred does not change the probability of event B

Addition rule

If events are mutually exclusive: P(A or B) = P(A) + P(B)
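The three rules can be verified by exact counting on dice. This sketch uses a hypothetical dice example, not the example in Table 3-4. "Sum of two dice is at least 10" and "first die shows 6" are dependent events. The two dice themselves are independent.

```python
from fractions import Fraction
from itertools import product

ONE_DIE = list(range(1, 7))
TWO_DICE = list(product(ONE_DIE, repeat=2))

def prob(event, space):
    """Classical probability: favorable outcomes / equally likely outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def conditional(b, a, space):
    """P(B | A) = P(A and B) / P(A)."""
    return prob(lambda o: a(o) and b(o), space) / prob(a, space)

def first_six(o):
    return o[0] == 6

def second_six(o):
    return o[1] == 6

def high_sum(o):
    return o[0] + o[1] >= 10

# Dependent events: knowing the sum is high changes P(first die = 6)
print(conditional(first_six, high_sum, TWO_DICE), prob(first_six, TWO_DICE))  # 1/2 1/6

# Multiplication rule for independent events
both = prob(lambda o: first_six(o) and second_six(o), TWO_DICE)
print(both == prob(first_six, TWO_DICE) * prob(second_six, TWO_DICE))  # True

# Addition rule for mutually exclusive events (a 1 or a 2 on one die)
print(prob(lambda x: x in (1, 2), ONE_DIE)
      == prob(lambda x: x == 1, ONE_DIE) + prob(lambda x: x == 2, ONE_DIE))  # True
```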

Hypothesis testing

One study cannot prove anything!

Null hypothesis H0: there is no difference

No significant difference found -> H0 is retained (not rejected)

Significant difference found -> H0 is rejected

Types of error: Type I and Type II (p. 69); the two trade off against each other

Probability of making a Type I error = α (alpha)

α ↓ -> power of test ↓, risk of Type II error ↑

α ↑ -> power of test ↑, risk of Type II error ↓

Sample size ↑, SD ↓, effect size ↑ -> risk of Type II error ↓ (effect size = difference between groups / SD)
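These relationships can be checked numerically. This sketch computes the power of a two-sided one-sample z test. It assumes σ is known and takes effect size d as (true mean - H0 mean) / SD, which is a one-sample analogue of the group-difference definition. The risk of a Type II error is 1 - power.

```python
from statistics import NormalDist

Z = NormalDist()

def power_two_sided(d, n, alpha=0.05):
    """Power of a two-sided one-sample z test; d = (true mean - H0 mean) / SD."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

print(round(power_two_sided(0.5, 20, alpha=0.01), 3))  # smaller alpha -> lower power
print(round(power_two_sided(0.5, 20, alpha=0.05), 3))
print(round(power_two_sided(0.5, 40, alpha=0.05), 3))  # larger n -> higher power
print(round(power_two_sided(0.8, 20, alpha=0.05), 3))  # larger effect -> higher power
```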

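Genetic defect test: reducing Type II error (missing a real defect) is most important

Preschool preparation: reducing Type I error (concluding the program works when it does not) is most important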

Power of a test

Powerful test: more likely to reject H0 when a difference actually exists

A one-tailed test is more powerful

but it must have a sound theoretical basis for predicting the direction

Fig. 3-3, fig. 3-4
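This sketch compares one-tailed and two-tailed z tests under the same assumptions as a one-sample z test with known σ. The one-tailed critical value is smaller, so the test is more powerful in the predicted direction. It has almost no power if the true effect goes the other way, which is why a sound theoretical basis matters. The function name and settings are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()

def power_z(d, n, alpha=0.05, tails=2):
    """Power of a one-sample z test; the one-tailed version predicts true mean > H0 mean."""
    shift = d * n ** 0.5
    if tails == 1:
        return Z.cdf(shift - Z.inv_cdf(1 - alpha))
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

print(round(Z.inv_cdf(0.95), 3), round(Z.inv_cdf(0.975), 3))  # 1.645 1.96
print(round(power_z(0.5, 20, tails=1), 3), round(power_z(0.5, 20, tails=2), 3))
print(round(power_z(-0.5, 20, tails=1), 4))  # effect in the unpredicted direction
```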

cf. Degrees of freedom: related to the number of scores, items, or other units in a data set and to the idea of their freedom to vary

If the sample mean is known -> only n - 1 scores are free to vary, so df = n - 1
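Here is a small illustration of why df = n - 1 once the mean is known. With hypothetical numbers, four of five scores are chosen freely. The fifth is then forced by the mean.

```python
from statistics import mean

def last_score_forced(known_mean, free_scores, n):
    """Once the mean of n scores is fixed, the n-th score is determined by the other n - 1."""
    return known_mean * n - sum(free_scores)

free = [4, 7, 10, 3]                  # n - 1 = 4 scores chosen freely (hypothetical)
last = last_score_forced(6, free, 5)
print(last, mean(free + [last]))      # 6 6
```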

Statistical Inference and Statistical Estimation

Statistical inference

For the sample mean to be representative -> random selection (every member has the same probability of being selected)

Statistical estimates

Sample mean = point estimate

CI = a range or interval of values (used to infer the true value of an unknown population parameter)

Confidence Intervals

Page 74

Sampling error = the difference between the sample mean and the population mean (the CI tells us about its likely size)

Question 1

Question 2

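Question 3

Sampling error occurs because we calculate from a sample, not the population

The CI differs from sample to sample

We may not conclude that there is a 95% probability that the population mean lies within "mean ± margin" of one particular CI

The 95% refers to the long-run accuracy of the procedure: about 95% of intervals constructed this way contain the population mean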
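The long-run meaning of "95%" can be simulated. This sketch assumes a hypothetical normal population with known σ and uses the z-based interval x̄ ± 1.96σ/√n. Each sample gives a different interval. About 95% of them cover μ.

```python
import random
from statistics import NormalDist, mean

random.seed(7)
MU, SIGMA, N = 100.0, 15.0, 25          # hypothetical population and sample size
Z95 = NormalDist().inv_cdf(0.975)       # about 1.96
HALF_WIDTH = Z95 * SIGMA / N ** 0.5

def interval_from_one_sample():
    """Draw one sample and return its 95% CI for the mean (sigma treated as known)."""
    xbar = mean([random.gauss(MU, SIGMA) for _ in range(N)])
    return xbar - HALF_WIDTH, xbar + HALF_WIDTH

intervals = [interval_from_one_sample() for _ in range(10_000)]
coverage = sum(lo <= MU <= hi for lo, hi in intervals) / len(intervals)
print(intervals[:3])   # each sample gives a different interval
print(coverage)        # close to 0.95: the long-run accuracy of the procedure
```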

The relationship between confidence intervals and significance test

Significance test: see the table on p. 76

(t-test)

CI: Fig. 3-5

Consistency check when evaluating research reports: cross-check the CI against the significance test!

Value of CIs

A CI contains more information because it is equivalent to performing a significance test for all values of the parameter, not just a single value
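The equivalence can be checked directly in the simplest case. This sketch uses a two-sided z test with known σ and hypothetical summary numbers. For every candidate H0 value μ0, "p < .05" holds exactly when μ0 lies outside the 95% CI. The t-test case in the source behaves the same way with t critical values.

```python
from statistics import NormalDist

Z = NormalDist()

def z_test_and_ci(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test p-value for H0: mu = mu0, plus the (1 - alpha) CI for mu."""
    se = sigma / n ** 0.5
    z_crit = Z.inv_cdf(1 - alpha / 2)
    p = 2 * (1 - Z.cdf(abs(xbar - mu0) / se))
    return p, (xbar - z_crit * se, xbar + z_crit * se)

XBAR, SIGMA, N = 104.0, 15.0, 36   # hypothetical sample summary; CI is about (99.1, 108.9)
for mu0 in (95, 100, 104, 108, 110):
    p, (lo, hi) = z_test_and_ci(XBAR, mu0, SIGMA, N)
    # significant at .05  <->  mu0 falls outside the 95% CI
    print(mu0, round(p, 4), p < 0.05, not lo <= mu0 <= hi)
```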

Caution: random sample > sample from an explicitly defined population > non-representative sample (in decreasing order of how far the CI can be trusted)

Sample size

Sample size is related to power, effect size, and significance level

Power = the probability of rejecting the null hypothesis when it is false (80%: adequate level)

Effect size (Cohen, 1987)

0.2 SD = small effect size

0.5 SD = moderate effect size

0.8 SD = large effect size

Significance level

Probability of rejecting a true H0 (Type I error) = α

Small sample size -> low power -> likely non-significant result, even when a real effect exists
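This sketch ties sample size to power, effect size, and α. It uses the common normal-approximation formula n = 2((z_{1-α/2} + z_{1-β}) / d)² per group for a two-sided, two-group comparison of means. Calculations based on the t distribution give slightly larger numbers (about 64 per group for d = 0.5). The function name is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group, two-sided two-sample comparison of means.
    d = Cohen's effect size (difference between group means / SD)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

for label, d in (("small", 0.2), ("moderate", 0.5), ("large", 0.8)):
    print(label, d, n_per_group(d))
# small 0.2 393
# moderate 0.5 63
# large 0.8 25
```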

Homoscedasticity


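Assumptions of the linear regression model

The values of the independent variable X are fixed (non-random, mathematical variables)

The values of X contain no error

For each value of X there is a subpopulation of Y values, and each subpopulation is assumed to be normally distributed

The variances of the Y subpopulations are all equal (homoscedasticity)

The means of the Y subpopulations lie on a straight line (linearity assumption)

The Y values are statistically independent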

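Diagram of the simple linear regression model

Checking homoscedasticity and linearity

• Before starting the analysis, always draw a scatter plot and identify any problems.

• In this case, there are problems with both linearity and homoscedasticity.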

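Remedying heteroscedasticity

Heteroscedasticity

It is common for the variance to increase as the independent variable increases

Example

X: annual income of individual households

Y: annual consumption expenditure

Consumption generally varies less among low-income households than among high-income households.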

Remedy for heteroscedasticity

Transform the independent variable by taking the log, ln, square root, etc.

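Sample size calculation and power

Sample size calculation for comparing means under simple random sampling

Sample size calculation for comparing proportions under simple random sampling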
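The handout's formula is not reproduced in this text. This sketch uses one commonly used normal-approximation formula for a two-sided comparison of two proportions, p1 and p2, with equal groups. Other variants, such as those with continuity corrections, give somewhat different numbers. The function name and example proportions are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group, two-sided comparison of two proportions."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

print(n_per_group_proportions(0.6, 0.4))  # 97
```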

Power and effect size

See the handout

Interpreting confidence intervals

박재현

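Estimating the mean

Assumptions

Population: mean $\mu$, variance $\sigma^2$ (usually unknown)

Sample: mean $\bar{x}$, variance $s^2$, sample size $n$

Estimating the population mean

There can be many values of $\bar{X}$ (consider every possible sample of size $n$ that could be drawn)

That is, there can be $\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4, \ldots$

The mean of these sample means $\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4, \ldots$ (each from a sample of size $n$) approaches $\mu$ as more samples are taken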

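Standard error of the mean

Standard error of the mean (SE) = the standard deviation of $\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4, \ldots$ = $\sigma/\sqrt{n} \approx s/\sqrt{n}$ ($\sigma$ is very rarely known)

$s$ can stand in for $\sigma$ because the average of the many $s^2$ values is $\sigma^2$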
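Both claims can be checked by simulation. This sketch uses a hypothetical normal population with σ = 2 and small samples of n = 5. The average of many s² values (n - 1 divisor, `statistics.variance`) lands near σ² = 4. The n-divisor version (`pvariance`) lands near σ²(n - 1)/n = 3.2. The SD of the sample means lands near σ/√n.

```python
import random
from statistics import mean, pvariance, stdev, variance

random.seed(11)
SIGMA, N = 2.0, 5      # hypothetical population SD and a small sample size
samples = [[random.gauss(0.0, SIGMA) for _ in range(N)] for _ in range(20_000)]

avg_s2 = mean([variance(s) for s in samples])      # divisor n - 1
avg_pvar = mean([pvariance(s) for s in samples])   # divisor n
se_sim = stdev([mean(s) for s in samples])         # SD of the sample means

print(round(avg_s2, 2))                             # near SIGMA**2 = 4.0
print(round(avg_pvar, 2))                           # near 4.0 * (N - 1) / N = 3.2
print(round(se_sim, 3), round(SIGMA / N ** 0.5, 3)) # simulated SE vs. sigma / sqrt(n)
```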

Interval estimation

When $\sigma$ is known, the 95% confidence interval for $\mu$ is $(\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n})$
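The interval formula translates directly into a small function. This sketch uses hypothetical numbers: x̄ = 52, σ = 10, n = 100. The `level` parameter is an illustrative generalization; level = 0.95 reproduces the 1.96 multiplier.

```python
from statistics import NormalDist

def ci_known_sigma(xbar, sigma, n, level=0.95):
    """(xbar - z*sigma/sqrt(n), xbar + z*sigma/sqrt(n)); z is about 1.96 for a 95% interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half

lo, hi = ci_known_sigma(xbar=52.0, sigma=10.0, n=100)
print(round(lo, 2), round(hi, 2))  # 50.04 53.96
```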

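Interpretation of the confidence interval: if samples of size $n$ are drawn repeatedly from the population and a confidence interval is computed from each, 95% of those intervals contain the parameter $\mu$.

Incorrect interpretation: the probability that the parameter $\mu$ lies within one particular 95% confidence interval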

