STK643 NONPARAMETRIC MODELING
Introduction
TOPICS
1. Introduction
• Why nonparametric modeling
• Applications of nonparametric modeling (data exploration and inference)
2. Univariate density estimation
• Histogram method
• Kernel method
3. Multivariate density estimation
4. Applications of density estimation
5. Nonparametric modeling
• Scatterplot smoothing
• Kernel smoothing method
6. Multivariate nonparametric modeling
7. Spline regression
8. Additive models
REFERENCES
1) Bowman AW, Azzalini A. 1997. Applied Smoothing Techniques for Data Analysis: the Kernel
Approach With S-Plus Illustrations. Oxford University Press. London.
2) Eubank RL. 1999. Nonparametric Regression and Spline Smoothing. Marcel Dekker. New York.
3) Fan J. Prospects of nonparametric modeling. http://escholarship.org/uc/item/38w9t0km.
(10 February 2016)
4) Hastie T, Tibshirani R. 1990. Generalized Additive Models. Chapman & Hall/CRC. London.
5) Hastie T, Tibshirani R, Friedman J. 2008. The Elements of Statistical Learning: Data Mining,
Inference, and Prediction. Second Edition. Springer-Verlag. New York.
6) Scott DW. 1992. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley &
Sons, Inc. New York.
7) Silverman BW. 1986. Density Estimation for Statistics and Data Analysis. Vol. 26 of Monographs on
Statistics and Applied Probability. Chapman & Hall/CRC. London.
8) Simonoff JS. 1996. Smoothing Methods in Statistics. Springer. New York.
9) Rizzo ML. 2008. Statistical Computing with R. Chapman & Hall/CRC. London.
SOFTWARE
1) The R programming language (www.r-project.org)
Packages: stats, graphics, ash, GenKern (KernSec, KernSur), kerdiest, KernSmooth, ks, np,
plugdensity, sm
2) SAS (www.sas.com)
KDE Procedure (density estimation)
LOESS Procedure (estimating regression surfaces)
TPSPLINE Procedure (thin-plate smoothing splines)
GAM Procedure (generalized additive models)
GAMPL Procedure (generalized additive models based on low-rank regression splines)
ADAPTIVEREG Procedure (multivariate adaptive regression splines)
INTRODUCTION
• Statistics: the collection, summarization, presentation, and interpretation of data
• Data are the key to making inferences
• Ideally, no assumptions are made about the underlying process that generated the data
• In practice a model is assumed, either parametric (such as Gaussian with mean µ and variance σ²) or nonparametric
• If the assumed model is not the correct one, inferences can be poor and interpretations of the data misleading
INTRODUCTION: WHY NONPARAMETRIC?
• Parametric:
  - strict assumptions that are often violated by real data
  - strict hypotheses → accurate and precise estimates if correct, otherwise very misleading
  - linear relationships between the dependent variable and the predictor variables (normality and linearity assumed)
• Nonparametric:
  - less strict assumptions that are violated less frequently by data
  - fewer conditions → estimates are freed from hypotheses
  - a wide range of relationships between the dependent variable and the predictor variables (linear, moderately nonlinear, or highly nonlinear)
INTRODUCTION: NONPARAMETRIC (SMOOTHING)
• a bridge between making no assumptions on formal structure (a purely nonparametric
approach) and making very strong assumptions (a parametric approach)
• to identify potentially unexpected structure in more complicated data analysis problems
• to extract more information from the data than is possible purely nonparametrically, as
long as the (weak) assumption of smoothness is reasonable
• to provide flexible and robust analyses
INTRODUCTION: PURPOSE OF NONPARAMETRIC
• Exploratory data analysis
  - Smoothing highlights important structure clearly
• Model building
  - Choosing the appropriate model as the basis of analysis
• Goodness of fit
  - Smoothed curves can be used to test formally the adequacy of fit of a hypothesized model
  - Smoothed density estimates and regression curves can be used to construct confidence intervals and regions for true densities and regression functions, with similar avoidance of restrictive parametric assumptions
• Parametric estimation
  - Compared with maximum likelihood, estimators based on density and regression estimates are often fully efficient and more robust (less sensitive to outliers)
(Simonoff 1996)
INTRODUCTION: PURPOSE OF NONPARAMETRIC (continued)
• Explores the general relationship between two variables
• Gives predictions of observations without reference to a fixed parametric model
• Provides a tool for finding spurious observations by studying the influence of isolated points
• Offers a flexible method of substituting for missing values or interpolating between adjacent X-values
(Härdle 1994)
An aim of nonparametric techniques is to reduce the possible modeling biases of parametric models.
HISTOGRAM
52 34 12 69 44 22 36 41 77 39 73 21 37 38 32
22 21 41 41 11 25 22 63 48 18 28 22 70 27 51
44 38 20 20 53 80 30 56 46 32 15 72 92 13 22
34 25 45 30 39 24 18 49 16 10 36 26 19 64 33
37 95 14 26 22 41 83 19 34 24 27 37 46 13 17
56 28 32 53 21 33 58 47 33 28 16 65 30 38 31
53 38 21 23 83 64 49 36 64 18
Stem-and-leaf display (leaf unit = 1)
Depth  Stem  Leaf
    6     1  012334
   15     1  566788899
   30     2  001111222222344
   39     2  556677888
 (13)     3  0001222333444
   48     3  666777888899
   36     4  111144
   30     4  5667899
   23     5  12333
   18     5  668
   15     6  3444
   11     6  59
    9     7  023
    6     7  7
    5     8  033
    2     8
    2     9  2
    1     9  5
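The stem-and-leaf rows above correspond to a histogram with bins of width 5 starting at 10. The following is a minimal sketch of that correspondence, using the 100 values listed on the slide. The function name `histogram` and the density normalization (count divided by n times bin width) are illustrative choices and are not taken from the slides.

```python
# Data from the slide (n = 100).
data = [
    52, 34, 12, 69, 44, 22, 36, 41, 77, 39, 73, 21, 37, 38, 32,
    22, 21, 41, 41, 11, 25, 22, 63, 48, 18, 28, 22, 70, 27, 51,
    44, 38, 20, 20, 53, 80, 30, 56, 46, 32, 15, 72, 92, 13, 22,
    34, 25, 45, 30, 39, 24, 18, 49, 16, 10, 36, 26, 19, 64, 33,
    37, 95, 14, 26, 22, 41, 83, 19, 34, 24, 27, 37, 46, 13, 17,
    56, 28, 32, 53, 21, 33, 58, 47, 33, 28, 16, 65, 30, 38, 31,
    53, 38, 21, 23, 83, 64, 49, 36, 64, 18,
]


def histogram(values, origin, width, n_bins):
    """Counts and density heights for bins [origin + k*width, origin + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - origin) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    n = len(values)
    # Dividing by n * width makes the bar areas sum to 1.
    densities = [c / (n * width) for c in counts]
    return counts, densities


counts, densities = histogram(data, origin=10, width=5, n_bins=18)
for k, (c, d) in enumerate(zip(counts, densities)):
    lo = 10 + 5 * k
    print(f"[{lo:2d}, {lo + 5:3d})  count={c:2d}  density={d:.3f}  " + "*" * c)
```

Each printed row has as many stars as the matching stem-and-leaf row has leaves, so the display can be read as a histogram turned on its side.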
PROBLEMS WITH THE HISTOGRAM
• Definition of classes: the choice of intervals and of the truncation points influences the estimate (boundary dependence)
• Not smooth: the estimate is a step function even if the density function is continuous
Solution:
• local histogram: frees the histogram from the definition of classes
• kernel density estimates: tackle the lack of smoothness
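The dependence on the class definition can be illustrated with the 100 integer values from the HISTOGRAM slide. This is a sketch under assumed settings: the bin width of 10 and the two origins (10 and 5) are chosen only for illustration, and `bin_counts` is a hypothetical helper name.

```python
# Data from the HISTOGRAM slide (n = 100).
data = [
    52, 34, 12, 69, 44, 22, 36, 41, 77, 39, 73, 21, 37, 38, 32,
    22, 21, 41, 41, 11, 25, 22, 63, 48, 18, 28, 22, 70, 27, 51,
    44, 38, 20, 20, 53, 80, 30, 56, 46, 32, 15, 72, 92, 13, 22,
    34, 25, 45, 30, 39, 24, 18, 49, 16, 10, 36, 26, 19, 64, 33,
    37, 95, 14, 26, 22, 41, 83, 19, 34, 24, 27, 37, 46, 13, 17,
    56, 28, 32, 53, 21, 33, 58, 47, 33, 28, 16, 65, 30, 38, 31,
    53, 38, 21, 23, 83, 64, 49, 36, 64, 18,
]


def bin_counts(values, origin, width, n_bins):
    """Counts for bins [origin + k*width, origin + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - origin) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


# Same data, same bin width, two different bin origins.
counts_a = bin_counts(data, origin=10, width=10, n_bins=9)   # [10,20), [20,30), ...
counts_b = bin_counts(data, origin=5, width=10, n_bins=10)   # [5,15), [15,25), ...

mode_a = 10 + 10 * counts_a.index(max(counts_a))
mode_b = 5 + 10 * counts_b.index(max(counts_b))
print("origin 10:", counts_a, "-> modal bin starts at", mode_a)
print("origin  5:", counts_b, "-> modal bin starts at", mode_b)
```

With the origin at 10 the tallest bar is [30, 40), while with the origin at 5 it is [15, 25), although the data and the bin width are unchanged.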
HISTOGRAM VS KERNEL ESTIMATOR
• bias vs variance
• bin width
HISTOGRAM WITH A SMOOTHER
9.868 7.724 7.552 8.481 10.756 11.886 10.506 4.954
7.454 7.802 8.562 12.82 9.886 9.847 12.976 7.452
10.624 9.203 6.164 12.08 6.625 7.612 13.990 6.198
8.221 7.312 8.93 9.199 10.305 8.196 8.761 10.057
8.842 7.538 9.24 10.117 5.893 8.865 8.782 9.286
8.448 5.198 10.349 10.454 9.114 5.179 5.883 11.902
5.631 3.959 3.169 4.982 3.148 2.342 3.787 3.269
5.133 5.143 4.293 4.392 5.343 4.591 4.273 6.001
5.932 4.193 3.669 4.738 4.114 3.626 4.165 2.544
4.096 5.312 2.12 3.732 2.936 4.009 4.5 3.411
6.167 3.496 3.009 6.439 4.366 3.653 4.16 3.528
4.192 5.432 4.839 4.217 5.058 2.488 5.05 3.943
3.874 3.509 5.018 4.563
Stem-and-leaf display (leaf unit = 0.1)
Depth  Stem  Leaf
    5     2  13459
   21     3  0112445566677899
   40     4  0011111222335557800
 (14)     5  00011113346889
   46     6  011146
   40     7  34455678
   32     8  1244577889
   22     9  11222888
   14    10  01334567
    6    11  89
    4    12  0899
\[
w(u) =
\begin{cases}
1 & \text{for } |u| < \tfrac{1}{2} \\
0 & \text{for } |u| \ge \tfrac{1}{2}
\end{cases}
\]
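One way the weight function w can replace fixed classes is sketched below: a window of width h is centred on each evaluation point x, and the points falling inside it are counted. The normalization 1/(nh), the window width h = 1.0, and the function names are assumptions made for this illustration and are not stated on the slide. The data are the 100 values listed above.

```python
# Data from the slide (n = 100).
data = [
    9.868, 7.724, 7.552, 8.481, 10.756, 11.886, 10.506, 4.954,
    7.454, 7.802, 8.562, 12.82, 9.886, 9.847, 12.976, 7.452,
    10.624, 9.203, 6.164, 12.08, 6.625, 7.612, 13.990, 6.198,
    8.221, 7.312, 8.93, 9.199, 10.305, 8.196, 8.761, 10.057,
    8.842, 7.538, 9.24, 10.117, 5.893, 8.865, 8.782, 9.286,
    8.448, 5.198, 10.349, 10.454, 9.114, 5.179, 5.883, 11.902,
    5.631, 3.959, 3.169, 4.982, 3.148, 2.342, 3.787, 3.269,
    5.133, 5.143, 4.293, 4.392, 5.343, 4.591, 4.273, 6.001,
    5.932, 4.193, 3.669, 4.738, 4.114, 3.626, 4.165, 2.544,
    4.096, 5.312, 2.12, 3.732, 2.936, 4.009, 4.5, 3.411,
    6.167, 3.496, 3.009, 6.439, 4.366, 3.653, 4.16, 3.528,
    4.192, 5.432, 4.839, 4.217, 5.058, 2.488, 5.05, 3.943,
    3.874, 3.509, 5.018, 4.563,
]


def w(u):
    """Weight function: 1 if |u| < 1/2, otherwise 0."""
    return 1.0 if abs(u) < 0.5 else 0.0


def naive_density(x, sample, h):
    """Local-histogram estimate (1 / (n h)) * sum_i w((x - x_i) / h)."""
    n = len(sample)
    return sum(w((x - xi) / h) for xi in sample) / (n * h)


h = 1.0  # window width: an illustrative choice, not a value from the slides
for x in [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]:
    print(f"x = {x:4.1f}   f_hat = {naive_density(x, data, h):.3f}")
```

Because the window moves with x, the estimate no longer depends on where class boundaries are placed, but it is still a step function.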
DENSITY FUNCTION WITH A KERNEL SMOOTHER
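A kernel density estimate of the kind named in this heading can be sketched as follows, using the 100 values from the preceding slide. The Gaussian kernel and the normal-reference bandwidth 1.06 s n^(-1/5) are assumptions made for this illustration, since the slides do not state which kernel or bandwidth produced the original figure.

```python
import math
import statistics

# Data from the preceding slide (n = 100).
data = [
    9.868, 7.724, 7.552, 8.481, 10.756, 11.886, 10.506, 4.954,
    7.454, 7.802, 8.562, 12.82, 9.886, 9.847, 12.976, 7.452,
    10.624, 9.203, 6.164, 12.08, 6.625, 7.612, 13.990, 6.198,
    8.221, 7.312, 8.93, 9.199, 10.305, 8.196, 8.761, 10.057,
    8.842, 7.538, 9.24, 10.117, 5.893, 8.865, 8.782, 9.286,
    8.448, 5.198, 10.349, 10.454, 9.114, 5.179, 5.883, 11.902,
    5.631, 3.959, 3.169, 4.982, 3.148, 2.342, 3.787, 3.269,
    5.133, 5.143, 4.293, 4.392, 5.343, 4.591, 4.273, 6.001,
    5.932, 4.193, 3.669, 4.738, 4.114, 3.626, 4.165, 2.544,
    4.096, 5.312, 2.12, 3.732, 2.936, 4.009, 4.5, 3.411,
    6.167, 3.496, 3.009, 6.439, 4.366, 3.653, 4.16, 3.528,
    4.192, 5.432, 4.839, 4.217, 5.058, 2.488, 5.05, 3.943,
    3.874, 3.509, 5.018, 4.563,
]


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h):
    """Kernel density estimate (1 / (n h)) * sum_i K((x - x_i) / h)."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth 1.06 * s * n**(-1/5) (an assumed default)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)


h = rule_of_thumb_bandwidth(data)
print(f"bandwidth h = {h:.3f}")
for k in range(33):  # evaluation grid 0.0, 0.5, ..., 16.0
    x = 0.5 * k
    f = kde(x, data, h)
    print(f"{x:5.1f}  {f:.4f}  " + "#" * int(round(400 * f)))
```

Unlike the histogram, the resulting curve is smooth. A smaller h gives a rougher curve with more variance, and a larger h gives a smoother curve with more bias.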
NONPARAMETRIC REGRESSION
[Figure: scatterplot of y against X (X from 3 to 9, y from 5 to 25) with the fitted line y = -1.07 + 2.74 X]
\[ f(x) = b_0 + b_1 x \]
\[ f(x) = b_0 + b_1 x + b_2 x^2 \]
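The two parametric forms above, a straight line and a quadratic, can both be fitted by least squares. The sketch below solves the normal equations directly in plain Python. The data points are hypothetical (the values behind the original figure are not given in the slides), and `polyfit`, `predict` and `rss` are illustrative helper names.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [b0, b1, ..., b_degree]."""
    m = degree + 1
    # Normal equations (X'X) b = X'y for the design matrix X[i][j] = xs[i]**j.
    a = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    rhs = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    b = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(a[r][c] * b[c] for c in range(r + 1, m))
        b[r] = (rhs[r] - s) / a[r][r]
    return b


def predict(b, x):
    """Evaluate b0 + b1*x + b2*x**2 + ... at x."""
    return sum(bj * x ** j for j, bj in enumerate(b))


def rss(b, xs, ys):
    """Residual sum of squares of the fitted polynomial."""
    return sum((y - predict(b, x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data for illustration only (not the data behind the slide's figure).
xs = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [6.8, 10.4, 12.9, 15.1, 17.8, 21.2, 23.5]

b_linear = polyfit(xs, ys, 1)
b_quadratic = polyfit(xs, ys, 2)
print("linear   :", [round(v, 3) for v in b_linear], "RSS =", round(rss(b_linear, xs, ys), 4))
print("quadratic:", [round(v, 3) for v in b_quadratic], "RSS =", round(rss(b_quadratic, xs, ys), 4))
```

Both fits commit to a functional form before the data are seen. The nonparametric estimators in this course relax that step.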
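\[
Q_\lambda(f) = \sum_{i=1}^{n} \bigl( Y_i - f(x_i) \bigr)^2 + \lambda \int_{x_1}^{x_n} \bigl( f''(x) \bigr)^2 \, dx
\]
\[
\hat{f}_h(x) = \frac{\sum_{i=1}^{n} y_i \, K\!\left( \frac{x - x_i}{h} \right)}{\sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right)}
\]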
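The second formula above is a kernel-weighted average of the responses, and it can be sketched in a few lines. The Gaussian kernel, the bandwidth h = 0.7, and the data are assumptions made for this illustration. The slides do not specify K or h, and the points below are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x, xs, ys, h):
    """Kernel regression estimate sum_i y_i K((x - x_i)/h) / sum_i K((x - x_i)/h)."""
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; x is too far from the data for this h")
    return sum(wi * yi for wi, yi in zip(weights, ys)) / total


# Hypothetical data for illustration only.
xs = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0]
ys = [7.2, 8.9, 9.4, 11.8, 12.1, 14.6, 15.0, 16.9, 18.8, 19.1, 21.3, 22.0, 23.9]

h = 0.7  # bandwidth: an illustrative choice
for x in [3.0, 4.5, 6.0, 7.5, 9.0]:
    print(f"x = {x:3.1f}   f_hat = {nadaraya_watson(x, xs, ys, h):.2f}")
```

Each fitted value is a weighted mean of the observed responses, with weights that decay as x_i moves away from x. The estimate therefore always lies between the smallest and largest observed y, and h controls how local the averaging is.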
AREAS OF RESEARCH
• Nonparametric inferences
• High-dimensional nonparametric modeling
• Functional data analysis
• Information engineering and signal processing
• Nonlinear time series and finance modeling
• Nonparametric modeling in biostatistics