Statprob (Statistics and Probability) Tutorial Session 2014: Pre-Final-Exam Review
Definitions and Basic Concepts: General Procedure for Hypothesis Testing
1. The null hypothesis (H0) is the assumption to be tested and is provisionally taken to be true. It is stated with a relation that contains equality (=; in one-sided tests, ≥ or ≤).
2. The alternative hypothesis (H1) is the hypothesis that contradicts the null hypothesis. The alternative hypothesis (the research hypothesis) is stated with one of the relations >, <, or ≠.
3. The level of significance (α) is the risk of error accepted when rejecting a null hypothesis that is in fact true. (Commonly used values are α = 0.05 or α = 0.01.)
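One-Population Tests: Hypothesis Test for the Mean
1. If the population standard deviation is known, use the population standard error: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
2. If the population standard deviation is unknown, use the estimated standard error: $\hat{\sigma}_{\bar{x}} = s / \sqrt{n}$
3. Test ratio (z test): $RU_z = \dfrac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}}$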
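t test (small n, population standard deviation unknown)
Test statistic: $t_{hit} = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$, where "hit" marks the computed value
Hypotheses and rejection regions:
1. H0 : μ = μ0 vs H1 : μ ≠ μ0. Reject H0 if $|t_{hit}| > t_{n-1;\,\alpha/2}$
2. H0 : μ ≥ μ0 vs H1 : μ < μ0. Reject H0 if $t_{hit} < -t_{n-1;\,\alpha}$
3. H0 : μ ≤ μ0 vs H1 : μ > μ0. Reject H0 if $t_{hit} > t_{n-1;\,\alpha}$
z test (large n, population standard deviation known)
Test statistic: $z_{hit} = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
Hypotheses and rejection regions:
1. H0 : μ = μ0 vs H1 : μ ≠ μ0. Reject H0 if $|z_{hit}| > z_{\alpha/2}$
2. H0 : μ ≥ μ0 vs H1 : μ < μ0. Reject H0 if $z_{hit} < -z_{\alpha}$
3. H0 : μ ≤ μ0 vs H1 : μ > μ0. Reject H0 if $z_{hit} > z_{\alpha}$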
• Second approach: the p-value (observed significance level). From the Z table, z = 2.5 corresponds to a tail area of 0.0062. For a two-sided test the p-value is 2 × 0.0062 = 0.0124, which is larger than the chosen α = 0.01, so H0 is not rejected.
• Third approach: the confidence interval.
The 99% CI for µ is 1450 ± (2.575)(120/√36) = 1450 ± 51.5, that is (1398.5 ; 1501.5). The null-hypothesis value 1500 lies inside this interval, so H0 is not rejected.
A hypothesis can be tested in three ways:
1. Comparing the test statistic with the value from the table
2. Using the p-value
3. Using a confidence interval
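The three approaches always lead to the same decision, which can be checked numerically. The sketch below is only an illustration. It assumes the worked example uses x̄ = 1450, μ0 = 1500, σ = 120, n = 36 and α = 0.01 with a two-sided alternative, as the numbers on the slides suggest. The function name `z_test_two_sided` is invented for this example.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(xbar, mu0, sigma, n, alpha):
    """Two-sided one-sample z test, decided in three equivalent ways."""
    std_normal = NormalDist()                       # standard normal N(0, 1)
    se = sigma / sqrt(n)                            # standard error of the mean
    z = (xbar - mu0) / se                           # computed statistic z_hit
    z_crit = std_normal.inv_cdf(1 - alpha / 2)      # table value z_{alpha/2}
    p_value = 2 * std_normal.cdf(-abs(z))           # two-sided p-value
    ci = (xbar - z_crit * se, xbar + z_crit * se)   # (1 - alpha) confidence interval
    return {
        "z": z,
        "z_crit": z_crit,
        "p_value": p_value,
        "ci": ci,
        "reject_by_statistic": abs(z) > z_crit,
        "reject_by_p_value": p_value < alpha,
        "reject_by_ci": not (ci[0] <= mu0 <= ci[1]),
    }


# Numbers assumed from the worked example on the slides.
result = z_test_two_sided(xbar=1450, mu0=1500, sigma=120, n=36, alpha=0.01)
for key, value in result.items():
    print(key, value)
```

With these inputs all three `reject_by_...` flags are `False`, matching the conclusion that H0 is not rejected at α = 0.01.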
Testing the mean of one population
a. Two-sided test: H0 : μ = μ0 vs H1 : μ ≠ μ0
b. One-sided test: H0 : μ ≥ μ0 vs H1 : μ < μ0
c. One-sided test: H0 : μ ≤ μ0 vs H1 : μ > μ0
• Second approach: the p-value. From the table the p-value is 0.15, which is larger than the chosen α, so the statistic falls in the acceptance region and H0 is not rejected.
• Third approach: the confidence interval. 4460 ± (2.575)(250/√40) = 4460 ± 101.7858, that is (4358.21 ; 4561.79). The value 4500 lies inside the interval, so H0 is not rejected.
Alternatively, use a one-sided CI: (4460 − (2.325)(250/√40) ; ∞) = (4460 − 91.90 ; ∞) = (4368.1 ; ∞). The value 4500 lies inside this interval, so H0 is not rejected.
• Using the p-value: the value 1.02 corresponds to a tail area of 0.15, which is larger than the chosen α, so H0 is not rejected.
Testing the difference between the means of two independent populations
a. Two-sided test: H0 : μ1 = μ2 vs H1 : μ1 ≠ μ2
b. One-sided test: H0 : μ1 ≥ μ2 vs H1 : μ1 < μ2
c. One-sided test: H0 : μ1 ≤ μ2 vs H1 : μ1 > μ2
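Test statistic
If σ1 and σ2 are known and their values differ, then:
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
If σ1 and σ2 are unknown, then:
$T = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Here $s_p^2$ is the pooled variance, which treats the two unknown population variances as equal.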
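t test (small n, population standard deviations unknown)
Test statistic: $t_{hit} = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$
Hypotheses and rejection regions (with pooled variance the degrees of freedom are n1 + n2 − 2):
1. H0 : μ1 = μ2 vs H1 : μ1 ≠ μ2. Reject H0 if $|t_{hit}| > t_{n_1+n_2-2;\,\alpha/2}$
2. H0 : μ1 ≥ μ2 vs H1 : μ1 < μ2. Reject H0 if $t_{hit} < -t_{n_1+n_2-2;\,\alpha}$
3. H0 : μ1 ≤ μ2 vs H1 : μ1 > μ2. Reject H0 if $t_{hit} > t_{n_1+n_2-2;\,\alpha}$
z test (large n, population standard deviations known)
Test statistic: $z_{hit} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
Hypotheses and rejection regions:
1. H0 : μ1 = μ2 vs H1 : μ1 ≠ μ2. Reject H0 if $|z_{hit}| > z_{\alpha/2}$
2. H0 : μ1 ≥ μ2 vs H1 : μ1 < μ2. Reject H0 if $z_{hit} < -z_{\alpha}$
3. H0 : μ1 ≤ μ2 vs H1 : μ1 > μ2. Reject H0 if $z_{hit} > z_{\alpha}$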
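As a rough illustration of the two-sample statistics, the sketch below computes the z statistic for known standard deviations and the pooled-variance t statistic for unknown ones. All numbers and the function names `two_sample_z` and `pooled_t` are hypothetical; they do not come from the slides. The critical t value would still have to be looked up in a t table with n1 + n2 − 2 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z(x1bar, x2bar, sigma1, sigma2, n1, n2):
    """z statistic for the difference of two means, sigma1 and sigma2 known."""
    return (x1bar - x2bar) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def pooled_t(sample1, sample2):
    """Pooled-variance t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    # variance() is the sample variance (divides by n - 1)
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical summary statistics for the z test.
z = two_sample_z(x1bar=52, x2bar=50, sigma1=4, sigma2=3, n1=64, n2=36)
p_two_sided = 2 * NormalDist().cdf(-abs(z))

# Hypothetical small samples for the t test.
t, df = pooled_t([10, 12, 14], [7, 9, 11])

print("z =", z, "two-sided p-value =", p_two_sided)
print("t =", t, "with", df, "degrees of freedom")
```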
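Single-Sample Test: Hypothesis Test for a Proportion
1. Test ratio (z test): $RU_z = \dfrac{p - \pi_{H_0}}{\sigma_p}$
2. Standard error: $\sigma_p = \sqrt{\dfrac{\pi_{H_0}(100 - \pi_{H_0})}{n}}$, with p and $\pi_{H_0}$ expressed in percent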
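A small sketch of the proportion test follows, keeping the percentage scale of the standard-error formula. The sample values (56% observed, 50% hypothesized, n = 100) and the function name `proportion_z_test` are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p_percent, pi0_percent, n):
    """Test ratio for one proportion, with proportions given in percent."""
    se = sqrt(pi0_percent * (100 - pi0_percent) / n)   # standard error under H0
    z = (p_percent - pi0_percent) / se                 # test ratio RU_z
    p_value = 2 * NormalDist().cdf(-abs(z))            # two-sided p-value
    return se, z, p_value


# Hypothetical data: 56% observed in a sample of 100, H0 says 50%.
se, z, p_value = proportion_z_test(p_percent=56, pi0_percent=50, n=100)
print("standard error =", se)
print("z =", z)
print("two-sided p-value =", p_value)
```

Working in percent means the standard error is also in percentage points. Dividing every percentage by 100 gives the same z value.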
The Analysis of Variance
• The name “analysis of variance” stems from a partitioning of the total variability in the response variable into components that are consistent with a model for the experiment
• The basic single-factor ANOVA model is
$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$
where $\mu$ is an overall mean, $\tau_i$ is the ith treatment effect, and $\varepsilon_{ij}$ is the experimental error, $\varepsilon_{ij} \sim NID(0, \sigma^2)$
Models for the Data
There are several ways to write a model for the data:
$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ is called the effects model
Let $\mu_i = \mu + \tau_i$, then $y_{ij} = \mu_i + \varepsilon_{ij}$ is called the means model
Regression models can also be employed
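The Analysis of Variance
• Total variability is measured by the total sum of squares:
$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2$
• The basic ANOVA partitioning is:
$\sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a} \sum_{j=1}^{n} [(\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})]^2$
$= n \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$
$SS_T = SS_{Treatments} + SS_E$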
The Analysis of Variance
$SS_T = SS_{Treatments} + SS_E$
• A large value of $SS_{Treatments}$ reflects large differences in treatment means
• A small value of $SS_{Treatments}$ likely indicates no differences in treatment means
• Formal statistical hypotheses are:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$
$H_1$: At least one mean is different
The Analysis of Variance
• While sums of squares cannot be directly compared to test the hypothesis of equal means, mean squares can be compared.
• A mean square is a sum of squares divided by its degrees of freedom:
$df_{Total} = df_{Treatments} + df_{Error}$
$an - 1 = (a - 1) + a(n - 1)$
$MS_{Treatments} = \dfrac{SS_{Treatments}}{a - 1}, \quad MS_E = \dfrac{SS_E}{a(n - 1)}$
• If the treatment means are equal, the treatment and error mean squares will be (theoretically) equal.
• If treatment means differ, the treatment mean square will be larger than the error mean square.
The Analysis of Variance is Summarized in a Table
• Computing…see text, pp 66-70
• The reference distribution for $F_0$ is the $F_{a-1,\, a(n-1)}$ distribution
• Reject the null hypothesis (equal treatment means) if
$F_0 > F_{\alpha,\, a-1,\, a(n-1)}$
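The partition of the sums of squares and the resulting F ratio can be traced on a small balanced data set. The sketch below is illustrative only. The data are invented, the function name `one_way_anova` is hypothetical, and it assumes the usual ratio F0 = MS_Treatments / MS_E, since the summary table itself did not survive in this transcript. The critical value F_{α, a−1, a(n−1)} still has to come from an F table.

```python
def one_way_anova(groups):
    """Balanced single-factor ANOVA: a treatments with n observations each."""
    a = len(groups)
    n = len(groups[0])
    grand_mean = sum(sum(g) for g in groups) / (a * n)       # y-bar..
    treat_means = [sum(g) / n for g in groups]               # y-bar i.

    ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)
    ss_treat = n * sum((m - grand_mean) ** 2 for m in treat_means)
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, treat_means) for y in g)

    ms_treat = ss_treat / (a - 1)          # treatment mean square
    ms_error = ss_error / (a * (n - 1))    # error mean square
    return {
        "ss_total": ss_total, "ss_treat": ss_treat, "ss_error": ss_error,
        "df_treat": a - 1, "df_error": a * (n - 1),
        "ms_treat": ms_treat, "ms_error": ms_error,
        "f0": ms_treat / ms_error,
    }


# Invented data: a = 3 treatments, n = 3 observations per treatment.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in table.items():
    print(name, value)
```

In this toy data set SS_T = 32 splits into SS_Treatments = 26 and SS_E = 6, so the identity SS_T = SS_Treatments + SS_E can be checked directly from the printed values.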
The data in the scatterplot are a random sample from a population that may exhibit a linear relationship between x and y. A different sample gives a different plot.
$\hat{y} = 0.125x - 41.4$
Now we want to describe the population mean response μy as a function of the explanatory variable x. One simple description is μy = β0 + β1x.
We also want to assess whether the observed relationship is statistically significant (not entirely explained by chance).
Simple linear regression model
We assume that in the population, the equation is μy = β0 + β1x.
Sample data then fit the model:
Data = fit + residual
yi = (β0 + β1xi) + (εi)
where the εi are independent and Normally distributed N(0, σ).
The εi are called errors and represent the differences between y and the mean of y for each value of x.
Linear regression assumes equal variance of y (σ is the same for all values of x).
The intercept β0, the slope β1, and the standard deviation σ of y are unknown parameters in the regression model. We rely on the data to provide unbiased estimates of these parameters.
The value of ŷ from the least-squares regression line is really a prediction of the mean value of y (μy) for a given value of x.
The regression line (ŷ = b0 + b1x) obtained from sample data is the best estimate of the true population regression line (μy = β0 + β1x).
ŷ is an unbiased estimate for mean response μy
b0 is an unbiased estimate for intercept β0
b1 is an unbiased estimate for slope β1
The regression standard error, s, for n sample data points is calculated from the residuals (yi – ŷi):
$s = \sqrt{\dfrac{\sum \text{residual}^2}{n - 2}} = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$
s is the estimate of the regression standard deviation σ (s² is an unbiased estimate of σ²).
The population standard deviation σ for y at any given value of x represents the standard deviation of the normal distribution of the εi around the mean μy.
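To make the estimates b0, b1 and s concrete, the sketch below fits a least-squares line to a handful of invented points and computes the regression standard error from the residuals. The least-squares formulas b1 = Sxy / Sxx and b0 = ȳ − b1·x̄ are the standard ones and are not spelled out on the slides. The function name `fit_line` and the data are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares line y-hat = b0 + b1*x and regression standard error s."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # estimated slope
    b0 = ybar - b1 * xbar               # estimated intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))   # n - 2 degrees of freedom
    return b0, b1, s, residuals


# Invented sample of n = 5 points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, s, residuals = fit_line(x, y)
print("b0 =", b0, "b1 =", b1, "s =", s)
print("residuals =", residuals)
```

The divisor is n − 2 rather than n − 1 because two parameters, the intercept and the slope, are estimated before the residuals are formed.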
Conditions for inference
The x-y pairs are independent (of other x-y pairs).
The relationship is linear.
The standard deviation of y, σ, is the same for all values of x.
The response y varies normally around its mean.
Using residual plots to check for regression validity
The residuals (y − ŷ) give useful information about the validity of some of our assumptions.
We view the residuals in a residual plot:
If residuals are scattered randomly around 0 with uniform variation, it indicates that the data fit a linear model, have randomly distributed residuals for each value of x, and have constant standard deviation σ.
Residuals are randomly scattered → good!
Curved pattern → the relationship is not linear.
Change in variability across plot → σ not equal for all values of x.
Confidence interval for regression parameters
Estimating the regression parameters β0, β1 is a case of one-sample inference with unknown population variance.
We rely on the t distribution, with n – 2 degrees of freedom.
A level C confidence interval for the slope, β1, has a width that is proportional to the standard error of the estimate of the slope:
b1 ± t* SEb1
A level C confidence interval for the intercept, β0, has a width proportional to the standard error of the estimate of the intercept:
b0 ± t* SEb0
t* is the critical value for the t(n – 2) distribution with area C between –t* and +t*.
Significance test for the slope
We can test the hypothesis H0: β1 = 0 versus a one-sided or two-sided alternative.
We calculate t = b1 / SEb1
Under H0, t has the t(n – 2) distribution. From Table D we can get the p-value of the test.
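The slope interval and the slope test use the same ingredients, as the following sketch shows. It assumes the standard formula SEb1 = s / √Σ(xi − x̄)², which the slides do not state. The critical value t* is passed in by hand because it would normally be read from a t table. The data, the value t* = 3.182 (95% confidence, 3 degrees of freedom) and the function name `slope_inference` are illustrative choices.

```python
from math import sqrt


def slope_inference(x, y, t_star):
    """Slope estimate, its standard error, t statistic and level-C interval."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))             # regression standard error
    se_b1 = s / sqrt(sxx)               # assumed standard formula for SE of b1
    t = b1 / se_b1                      # statistic for H0: beta1 = 0
    ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
    return b1, se_b1, t, ci


# Invented data with n = 5, so the t distribution has 3 degrees of freedom.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se_b1, t, ci = slope_inference(x, y, t_star=3.182)
print("b1 =", b1, "SE =", se_b1)
print("t =", t)
print("95% CI for the slope =", ci)
```

For this sample |t| is smaller than t*, and correspondingly the 95% interval for the slope contains 0. The test and the interval give the same answer.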
Testing the hypothesis of no relationship
We may look for evidence of a significant linear relationship between variables x and y in the population from which our data were drawn.
For that, we can test the hypothesis that the regression slope parameter β1 is equal to zero.
H0: β1 = 0 vs. H1: β1 ≠ 0
Testing H0: β1 = 0 also allows us to test the hypothesis of no correlation between x and y in the population, because the slope and the correlation r are linked:
$b_1 = r \dfrac{s_y}{s_x}$
Note: A test of hypothesis for β0 is easy to do but usually of no interest.
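The link between the slope and the correlation can be verified numerically. The sketch below uses invented data. It checks that the least-squares slope equals r·sy/sx and that the slope t statistic equals r√(n − 2)/√(1 − r²), a standard identity that is not shown on the slides but explains why the two tests coincide.

```python
from math import sqrt

# Invented sample.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)            # sample correlation
s_x = sqrt(sxx / (n - 1))            # sample standard deviation of x
s_y = sqrt(syy / (n - 1))            # sample standard deviation of y

b1_least_squares = sxy / sxx         # slope from the least-squares formula
b1_from_r = r * s_y / s_x            # slope from b1 = r * s_y / s_x

# t statistic computed from the slope and its standard error ...
b0 = ybar - b1_least_squares * xbar
sse = sum((yi - (b0 + b1_least_squares * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(sse / (n - 2))
t_slope = b1_least_squares / (s / sqrt(sxx))

# ... and the same statistic computed from the correlation alone.
t_from_r = r * sqrt(n - 2) / sqrt(1 - r ** 2)

print("b1 =", b1_least_squares, "r*s_y/s_x =", b1_from_r)
print("t from slope =", t_slope, "t from r =", t_from_r)
```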
Confidence interval for µy
We can also calculate a confidence interval for the population mean μy of all responses y when x takes the value x*.
This interval is centered on ŷ, the unbiased estimate of μy.
Over many different random samples, C% of the intervals calculated this way will contain the true value of the population mean μy at the given value of x.
The level C confidence interval for the mean response μy at a given value x* of x is:
$\hat{y} \pm t^* \, SE_{\hat{\mu}}$
t* is the critical value for the t(n – 2) distribution with area C between –t* and +t*.
A separate confidence interval is calculated for μy at each value that x takes.
Graphically, the series of confidence intervals is shown as a continuous band on either side of ŷ.
Figure: 95% confidence interval for μy
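A short sketch of the interval for the mean response follows. It assumes the standard formula SE = s·√(1/n + (x* − x̄)² / Σ(xi − x̄)²), which the slides do not spell out, and takes t* as an input from a t table. The data, t* = 3.182 (95% confidence, 3 degrees of freedom) and the function name `mean_response_ci` are hypothetical.

```python
from math import sqrt


def mean_response_ci(x, y, x_star, t_star):
    """Level-C confidence interval for the mean response at x = x_star."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_star                                 # centre of the interval
    se_mu = s * sqrt(1 / n + (x_star - xbar) ** 2 / sxx)     # assumed standard formula
    return y_hat, se_mu, (y_hat - t_star * se_mu, y_hat + t_star * se_mu)


# Invented data, n = 5.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
centre_mid, se_mid, ci_mid = mean_response_ci(x, y, x_star=3, t_star=3.182)
centre_end, se_end, ci_end = mean_response_ci(x, y, x_star=5, t_star=3.182)
print("x* = 3:", centre_mid, se_mid, ci_mid)
print("x* = 5:", centre_end, se_end, ci_end)
```

The standard error is smallest at x* = x̄ and grows as x* moves away from it, which is why the band of intervals around the fitted line is narrowest in the middle of the data.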