StatProb 2014 Tutorial Session (Pre-Final-Exam Review)

StatProb 01, 02, and 03
Friday, 12 June 2014

Definitions and Basic Concepts: General Procedure for Hypothesis Testing

1. The null hypothesis (H0) is the assumption to be tested and is provisionally taken to be true. It is stated with an equality relation (=; in one-tailed tests also ≥ or ≤).

2. The alternative hypothesis (H1) is the hypothesis that contradicts the null hypothesis. The alternative (research) hypothesis is stated with one of the relations >, <, or ≠.

3. The level of significance (α) states the risk of making an error by rejecting the null hypothesis when it is in fact true. (Commonly used values: α = 0.05 or 0.01.)

One-Population Test: Hypothesis Test for a Mean/Proportion

Two-tailed test

One-tailed test

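One-Population Test: Hypothesis Test for a Mean

1. If the population standard deviation σ is known, use the population standard error: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$

2. If the population standard deviation is unknown, use the estimated standard error: $\hat{\sigma}_{\bar{x}} = s/\sqrt{n}$

3. Test ratio: $RU_z = z_{test} = \dfrac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}}$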

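Test statistics and rejection regions (the subscript "hit" stands for hitung, the computed value)

n small, standard deviation unknown. Test statistic for all three cases: $t_{hit} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$

1. H0: μ = μ0, H1: μ ≠ μ0. Reject H0 if $|t_{hit}| > t_{n-1;\,\alpha/2}$
2. H0: μ ≥ μ0, H1: μ < μ0. Reject H0 if $t_{hit} < -t_{n-1;\,\alpha}$
3. H0: μ ≤ μ0, H1: μ > μ0. Reject H0 if $t_{hit} > t_{n-1;\,\alpha}$

n large, standard deviation known. Test statistic for all three cases: $z_{hit} = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

1. H0: μ = μ0, H1: μ ≠ μ0. Reject H0 if $|z_{hit}| > z_{\alpha/2}$
2. H0: μ ≥ μ0, H1: μ < μ0. Reject H0 if $z_{hit} < -z_{\alpha}$
3. H0: μ ≤ μ0, H1: μ > μ0. Reject H0 if $z_{hit} > z_{\alpha}$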

Teknik Mesin – FTUI © Dr. Ir. Harinaldi, M.Eng

• Second method: using the p-value (observed significance level). From the Z table, z = 2.5 corresponds to a tail area of 0.0062. For the two-tailed test the p-value is 0.0062 × 2 = 0.0124, which is larger than the chosen α = 0.01, so H0 is accepted (not rejected).

• Third method: using a confidence interval

The 99% CI for µ is 1450 ± (2.575)(120/√36) = 1450 ± 51.5, i.e. (1398.5 ; 1501.5). The null-hypothesis value 1500 lies inside this interval, so H0 is accepted (not rejected).

A hypothesis can be tested in three ways:
1. Comparing the test statistic with the critical value from the table
2. Using the p-value
3. Using a confidence interval
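As a minimal sketch, the three methods can be run side by side in Python on the two-tailed example whose numbers appear in the confidence-interval calculation (sample mean 1450, hypothesized mean 1500, standard deviation 120, n = 36, α = 0.01). The function name and the dictionary keys are illustrative and not part of the slides. The critical value is computed with the standard library instead of being read from a table, so it is 2.5758 rather than the rounded 2.575.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(x_bar, mu0, sd, n, alpha):
    """Two-tailed one-sample z test, evaluated with all three methods."""
    std_normal = NormalDist()               # standard normal N(0, 1)
    se = sd / sqrt(n)                       # standard error of the mean
    z = (x_bar - mu0) / se                  # test statistic
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * std_normal.cdf(-abs(z))   # two-tailed p-value
    ci = (x_bar - z_crit * se, x_bar + z_crit * se)
    return {
        "z": z,
        "p_value": p_value,
        "ci": ci,
        "reject_by_table": abs(z) > z_crit,          # method 1
        "reject_by_p": p_value < alpha,              # method 2
        "reject_by_ci": not ci[0] <= mu0 <= ci[1],   # method 3
    }


result = z_test_two_sided(x_bar=1450, mu0=1500, sd=120, n=36, alpha=0.01)
print(result)
```

All three `reject_*` flags come out `False` for these numbers, so the three methods agree that H0 is not rejected.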

Testing the mean of one population

a. Two-tailed test: H0: μ = μ0 vs H1: μ ≠ μ0

b. One-tailed test: H0: μ ≤ μ0 vs H1: μ > μ0

c. One-tailed test: H0: μ ≥ μ0 vs H1: μ < μ0


• Second method: using the p-value. From the table the p-value is 0.15, which is larger than the chosen α, so the statistic falls in the acceptance region. H0 is accepted (not rejected).

• Third method: using a confidence interval. 4460 ± (2.575)(250/√40) = 4460 ± 101.79, i.e. (4358.21 ; 4561.78). The value 4500 lies inside the interval, so H0 is accepted.

Alternatively, using a one-sided CI: (4460 − (2.325)(250/√40) ; ∞) = (4460 − 91.90 ; ∞) = (4368.1 ; ∞). The value 4500 lies inside this interval, so H0 is accepted.

• With the p-value: the value 1.02 corresponds to a tail area of 0.15, which is larger than the chosen α, so H0 is accepted.

Testing the difference between the means of two independent populations

a. Two-tailed test: H0: μ1 = μ2 vs H1: μ1 ≠ μ2

b. One-tailed test: H0: μ1 ≤ μ2 vs H1: μ1 > μ2

c. One-tailed test: H0: μ1 ≥ μ2 vs H1: μ1 < μ2

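Test statistic

If σ1 and σ2 are known and their values differ, then:

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

If σ1 and σ2 are unknown, then:

$$T = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad \text{where } s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$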
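A short Python sketch of the two statistics may help when checking hand calculations. The function names and all sample numbers below are hypothetical, chosen only so the arithmetic is easy to follow. The pooled version assumes the two populations share a common variance.

```python
from math import sqrt


def z_two_sample(x1, x2, sigma1, sigma2, n1, n2):
    """Z statistic when both population standard deviations are known."""
    return (x1 - x2) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def pooled_t(x1, x2, s1, s2, n1, n2):
    """Pooled-variance t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance
    t = (x1 - x2) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    return t, df


# Hypothetical summary data, for illustration only
z = z_two_sample(x1=52, x2=50, sigma1=3, sigma2=4, n1=36, n2=64)
t, df = pooled_t(x1=10, x2=8, s1=2, s2=2, n1=8, n2=8)
print(z, t, df)
```

The computed `z` or `t` is then compared with the tabulated critical value for the chosen α, exactly as in the one-sample case.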

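Test statistics and rejection regions

n small, standard deviations unknown. Test statistic for all three cases: $t_{hit} = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$, with $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

1. H0: μ1 = μ2, H1: μ1 ≠ μ2. Reject H0 if $|t_{hit}| > t_{n_1+n_2-2;\,\alpha/2}$
2. H0: μ1 ≥ μ2, H1: μ1 < μ2. Reject H0 if $t_{hit} < -t_{n_1+n_2-2;\,\alpha}$
3. H0: μ1 ≤ μ2, H1: μ1 > μ2. Reject H0 if $t_{hit} > t_{n_1+n_2-2;\,\alpha}$

n large, standard deviations known. Test statistic for all three cases: $z_{hit} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$

1. H0: μ1 = μ2, H1: μ1 ≠ μ2. Reject H0 if $|z_{hit}| > z_{\alpha/2}$
2. H0: μ1 ≥ μ2, H1: μ1 < μ2. Reject H0 if $z_{hit} < -z_{\alpha}$
3. H0: μ1 ≤ μ2, H1: μ1 > μ2. Reject H0 if $z_{hit} > z_{\alpha}$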

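Single-Sample Test: Hypothesis Test for a Proportion

1. Test ratio: $RU_z = z_{test} = \dfrac{p - \pi_{H_0}}{\sigma_p}$

2. Standard error: $\sigma_p = \sqrt{\dfrac{\pi_{H_0}(100 - \pi_{H_0})}{n}}$

Here p is the sample proportion and $\pi_{H_0}$ is the hypothesized population proportion, both expressed in percent.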
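The proportion test works on the percent scale, which is easy to get wrong by a factor of 100. The sketch below spells out the two formulas; the function name and the numbers (56% observed against a hypothesized 50% with n = 100) are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p_percent, pi0_percent, n):
    """One-sample z test for a proportion, with proportions in percent."""
    # Standard error under H0, on the percent scale
    se = sqrt(pi0_percent * (100 - pi0_percent) / n)
    z = (p_percent - pi0_percent) / se
    p_two_tailed = 2 * NormalDist().cdf(-abs(z))
    return z, se, p_two_tailed


# Hypothetical data: 56% observed in a sample of 100, H0: pi = 50%
z, se, p_value = proportion_z_test(p_percent=56, pi0_percent=50, n=100)
print(z, se, p_value)
```

With these numbers the standard error is 5 percentage points and z = 1.2, which would not be significant at α = 0.05 in a two-tailed test.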

The Analysis of Variance

• The name “analysis of variance” stems from a partitioning of the total variability in the response variable into components that are consistent with a model for the experiment

• The basic single-factor ANOVA model is

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$

where μ is an overall mean, $\tau_i$ is the ith treatment effect, and $\varepsilon_{ij}$ is the experimental error, NID(0, σ²)

Models for the Data

There are several ways to write a model for the data:

$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ is called the effects model

Let $\mu_i = \mu + \tau_i$, then $y_{ij} = \mu_i + \varepsilon_{ij}$ is called the means model

Regression models can also be employed

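The Analysis of Variance

• Total variability is measured by the total sum of squares:

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2$$

• The basic ANOVA partitioning is:

$$\sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n}[(\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})]^2 = n\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2$$

$$SS_T = SS_{Treatments} + SS_E$$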
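The partitioning skips one step: expanding the square also produces a cross-product term. As a sketch of the missing reasoning, writing $\bar{y}_{i.}$ for the ith treatment mean and $\bar{y}_{..}$ for the grand mean, that term vanishes because the deviations within each treatment sum to zero:

```latex
\begin{align*}
\sum_{i=1}^{a}\sum_{j=1}^{n}\bigl[(\bar{y}_{i.}-\bar{y}_{..})+(y_{ij}-\bar{y}_{i.})\bigr]^2
  &= n\sum_{i=1}^{a}(\bar{y}_{i.}-\bar{y}_{..})^2
   + \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2 \\
  &\quad + 2\sum_{i=1}^{a}(\bar{y}_{i.}-\bar{y}_{..})\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.}), \\
\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})
  &= \sum_{j=1}^{n}y_{ij} - n\,\bar{y}_{i.} = 0 .
\end{align*}
```

Only the two squared terms survive, which are the treatment and error sums of squares.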

The Analysis of Variance

$$SS_T = SS_{Treatments} + SS_E$$

• A large value of $SS_{Treatments}$ reflects large differences in treatment means

• A small value of $SS_{Treatments}$ likely indicates no differences in treatment means

• Formal statistical hypotheses are:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a \qquad H_1: \text{At least one mean is different}$$

The Analysis of Variance

• While sums of squares cannot be directly compared to test the hypothesis of equal means, mean squares can be compared.

• A mean square is a sum of squares divided by its degrees of freedom:

$$df_{Total} = df_{Treatments} + df_{Error}: \quad an - 1 = (a - 1) + a(n - 1)$$

$$MS_{Treatments} = \frac{SS_{Treatments}}{a - 1}, \quad MS_E = \frac{SS_E}{a(n - 1)}$$

• If the treatment means are equal, the treatment and error mean squares will be (theoretically) equal.

• If treatment means differ, the treatment mean square will be larger than the error mean square.

The Analysis of Variance is Summarized in a Table

• Computing…see text, pp 66-70
• The reference distribution for F0 is the $F_{a-1,\,a(n-1)}$ distribution
• Reject the null hypothesis (equal treatment means) if

$$F_0 > F_{\alpha,\,a-1,\,a(n-1)}$$
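The quantities that go into the ANOVA table can be computed directly from the definitions. The sketch below assumes a balanced design (a treatments, n observations each) and takes F0 to be the ratio of the treatment mean square to the error mean square, which is the usual definition. The data set and the function name are hypothetical, and the critical value 4.26 is the tabulated F value for α = 0.05 with 2 and 9 degrees of freedom.

```python
def one_way_anova(groups):
    """Balanced single-factor ANOVA: a treatments, n observations each."""
    a = len(groups)
    n = len(groups[0])
    grand_mean = sum(sum(g) for g in groups) / (a * n)
    means = [sum(g) / n for g in groups]

    # Sums of squares from the partitioning SS_T = SS_Treatments + SS_E
    ss_treatments = n * sum((m - grand_mean) ** 2 for m in means)
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)

    # Mean squares: sum of squares divided by its degrees of freedom
    ms_treatments = ss_treatments / (a - 1)
    ms_error = ss_error / (a * (n - 1))
    return {
        "ss_treatments": ss_treatments,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "ms_treatments": ms_treatments,
        "ms_error": ms_error,
        "f0": ms_treatments / ms_error,
    }


# Hypothetical data: a = 3 treatments, n = 4 observations each
data = [[18, 20, 22, 20], [25, 27, 26, 26], [30, 29, 31, 30]]
table = one_way_anova(data)
f_crit = 4.26  # F(0.05; 2, 9), read from an F table
reject_h0 = table["f0"] > f_crit
print(table, reject_h0)
```

For this made-up data set F0 is far above the critical value, so the hypothesis of equal treatment means would be rejected.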

The data in the scatterplot are a random sample from a population that may exhibit a linear relationship between x and y. A different sample gives a different plot.

ŷ = 0.125x − 41.4

Now we want to describe the population mean response μy as a function of the explanatory variable x. One simple description is μy = β0 + β1x.

We also want to assess whether the observed relationship is statistically significant (not entirely explained by chance).

Simple linear regression model

We assume that in the population the mean response is μy = β0 + β1x.

Sample data then fit the model:

Data = fit + residual
yi = (β0 + β1xi) + (εi)

where the εi are independent and Normally distributed N(0, σ).

The ε's are called errors and represent the differences between y and the mean of y for each value of x. Linear regression assumes equal variance of y (σ is the same for all values of x).

The intercept β0, the slope β1, and the standard deviation σ of y are the unknown parameters in the regression model. We rely on the data to provide unbiased estimates of these parameters.

The value of ŷ from the least-squares regression line is really a prediction of the mean value of y (μy) for a given value of x.

The regression line (ŷ = b0 + b1x) obtained from sample data is the best estimate of the true population regression line (μy = β0 + β1x).

ŷ is an unbiased estimate for mean response μy

b0 is an unbiased estimate for intercept β0

b1 is an unbiased estimate for slope β1

The regression standard error, s, for n sample data points is calculated from the residuals (yi – ŷi):

$$s = \sqrt{\frac{\sum \text{residual}^2}{n - 2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$

s estimates the regression standard deviation σ (strictly, s² is an unbiased estimate of σ²).

The population standard deviation σ for y at any given value of x represents the standard deviation of the normal distribution of the εi around the mean μy.
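As a small illustration of the formula for s, the sketch below computes the regression standard error from the residuals of a fitted line. The five data points are hypothetical, and b0 = 2.2, b1 = 0.6 are the least-squares estimates for exactly these points; the function name is illustrative.

```python
from math import sqrt


def regression_standard_error(xs, ys, b0, b1):
    """s = sqrt(sum of squared residuals / (n - 2)) for the line b0 + b1*x."""
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))


# Hypothetical data; b0 and b1 are the least-squares fit for these points
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s = regression_standard_error(xs, ys, b0=2.2, b1=0.6)
print(s)
```

The divisor is n − 2 rather than n − 1 because two parameters, the intercept and the slope, were estimated from the data.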

Conditions for inference

The x-y pairs are independent (of other x-y pairs).

The relationship is linear.

The standard deviation of y, σ, is the same for all values of x.

The response y varies normally around its mean.

Using residual plots to check for regression validity

The residuals (y − ŷ) give useful information about the validity of some of our assumptions.

We view the residuals in a residual plot:

If residuals are scattered randomly around 0 with uniform variation, it indicates that the data fit a linear model, have randomly distributed residuals for each value of x, and constant standard deviation σ.

Residuals are randomly scattered → good!

Curved pattern → the relationship is not linear.

Change in variability across plot → σ not equal for all values of x.

Confidence interval for regression parameters

Estimating the regression parameters β0, β1 is a case of one-sample inference with unknown population variance.

We rely on the t distribution, with n – 2 degrees of freedom.

A level C confidence interval for the slope, β1, has a width that is proportional to the standard error of the estimate of the slope:

b1 ± t* SEb1

A level C confidence interval for the intercept, β0, has a width proportional to the standard error of the estimate of the intercept:

b0 ± t* SEb0

t* is the critical value for the t (n – 2) distribution with area C between –t* and +t*.

Significance test for the slope

We can test the hypothesis H0: β1 = 0 versus a one- or two-sided alternative.

We calculate t = b1 / SEb1

Under H0, t has the t (n – 2) distribution. From Table D we can get the p-value of the test.
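The slides do not give a formula for SEb1. The sketch below assumes the standard one for simple linear regression, SEb1 = s / sqrt(Σ(xi − x̄)²), and uses it for both the confidence interval and the t statistic. The data are hypothetical, the function name is illustrative, and t* = 3.182 is the tabulated critical value for 95% confidence with n − 2 = 3 degrees of freedom.

```python
from math import sqrt


def slope_inference(xs, ys, t_star):
    """Least-squares slope, its standard error, t statistic and level-C CI."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b1 = sxy / sxx                 # slope estimate
    b0 = y_bar - b1 * x_bar        # intercept estimate
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))        # regression standard error

    se_b1 = s / sqrt(sxx)          # assumed standard formula for SE of slope
    t = b1 / se_b1                 # test statistic for H0: beta1 = 0
    ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
    return b1, se_b1, t, ci


# Hypothetical data; t* = 3.182 is t(0.025; 3) from a t table
b1, se_b1, t, ci = slope_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_star=3.182)
print(b1, se_b1, t, ci)
```

Here |t| is smaller than t* and the interval contains 0, so for this small made-up sample the slope would not be judged significantly different from zero at the 5% level.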

Testing the hypothesis of no relationship

We may look for evidence of a significant linear relationship between variables x and y in the population from which our data were drawn.

For that, we can test the hypothesis that the regression slope parameter β1 is equal to zero.

H0: β1 = 0 vs. Ha: β1 ≠ 0

Testing H0: β1 = 0 also allows us to test the hypothesis of no correlation between x and y in the population, because the slope and the correlation r are linked:

$$b_1 = r\,\frac{s_y}{s_x}$$

Note: A test of hypothesis for β0 is easy to do but usually of no interest.
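To see why a test on the slope is also a test on the correlation, it helps to write both in terms of the same sums. The notation $S_{xx}$, $S_{yy}$, $S_{xy}$ for the centered sums of squares and products is introduced here for illustration and does not appear in the slides:

```latex
\begin{align*}
b_1 &= \frac{S_{xy}}{S_{xx}}, \qquad
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad
\frac{s_y}{s_x} = \sqrt{\frac{S_{yy}}{S_{xx}}}
\;\Longrightarrow\;
r\,\frac{s_y}{s_x} = \frac{S_{xy}}{S_{xx}} = b_1, \\
t &= \frac{b_1}{SE_{b_1}} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} .
\end{align*}
```

The slope estimate is zero exactly when r is zero, and the t statistic depends on the data only through r and n, so the two tests give the same conclusion.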

Confidence interval for µy

We can also calculate a confidence interval for the population mean

μy of all responses y when x takes the value x*.

This interval is centered on ŷ, the unbiased estimate of μy.

The true value of the population mean μy at a given

value of x, will indeed be within our confidence

interval in C% of all intervals calculated

from many different random samples.

The level C confidence interval for the mean response μy at a given

value x* of x is centered on ŷ (unbiased estimate of μy):

$\hat{y} \pm t^{*}_{n-2}\, SE_{\hat{\mu}}$

A separate confidence interval is

calculated for μy along all the values

that x takes.

Graphically, the series of confidence

intervals is shown as a continuous

interval on either side of ŷ.

t* is the t critical for the t (n – 2) distribution with area C between –t* and +t*.
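The slides leave the standard error of the mean response undefined. The sketch below assumes the standard formula SE = s · sqrt(1/n + (x* − x̄)² / Σ(xi − x̄)²) and evaluates the interval at two values of x*. The data, the function name, and the choice t* = 3.182 (95% confidence, 3 degrees of freedom) are illustrative.

```python
from math import sqrt


def mean_response_ci(xs, ys, x_star, t_star):
    """Level-C confidence interval for the mean response at x = x_star."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    s = sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))

    y_hat = b0 + b1 * x_star
    # Assumed standard formula for the standard error of the mean response
    se_mu = s * sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat, se_mu, (y_hat - t_star * se_mu, y_hat + t_star * se_mu)


# Hypothetical data; t* = 3.182 is t(0.025; 3) from a t table
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_mid, se_mid, ci_mid = mean_response_ci(xs, ys, x_star=3, t_star=3.182)
y_end, se_end, ci_end = mean_response_ci(xs, ys, x_star=5, t_star=3.182)
print(ci_mid, ci_end)
```

The standard error is smallest at x* = x̄ and grows as x* moves away from it, which is why the band of intervals drawn around the regression line is narrowest in the middle.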

95% confidence interval for μy
