estimating distribution functions with the · 2018. 9. 25. · estimating distribution functions...

16
1:: .. ' "';- .... ; , .. ,. -;; :: \\ .;, C ? f. . . lj .'. 'i f\ f . , l ;:- k 1.; F t i:" , }. l! ; j'-' , f; k $' '\ i , \ « .. ;. .{ '9 ": ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS®SYSTEM author: Werner Bundschuh, Uhiversity of Heidelberg, Rechenzentrutn coauthor: Joachim Hefner, now SAS Institute Abstract In tIlis poster we are showing how to to estiniate and construct confidence intervalls and confidence bands for the cumulative distribution function. We are offering two solutions: the first one is a. classical approach using asymtotic results, the second one is using the Bootstrap method. Furthermore we have written a "procedure macro" for density estimation. We have integra.ted a FORTRAN subroutine (subroutine DDESJ{N of the IMSL library)whlcll computes kemel estimates for the density function. Introd tiction A simple but common need arises in data analysis when we have a single set of numbers, and we want to unterstand their basic characteristics as a collection of data. Graphical methods. described in the book of Chambers and Tukey are now implemented in some software packages (e.g. in JMP running on Macintosh - JMP isa software package for statistical graphics developed by SAS Institute). Beyond this we are interested to estimate the density function and the cumulative density function (CDF). In data analysis we could be interested in confidenceintervalls and confidence bands for the CDF, too. When having collected data to. analyze we nQrmally don't knQwanythiltg abQut the underlying CDF F. So we cannQt build a parametricmQdel to. mQdel parameters. A. natural nonparametric estimate of the probability F(a;) is defined by: . . Fn(a;) = [number of Xi:::; a;]/n Fn is called the "empirical cumulative density fUlIction" (ECDF) of the sample Xl"" , X n• For demQn- stratiQn purpQses we have chQsen a data set which cQnsists Qf the elapsed time abQve a certain high level for a series of 66 wave records at San Francisco. bay. The examples show the ECDF for this wave data. This plot reveal that about 50 percent of the waves are above the level criteria for abQut 4 time units or less . Looking at the estimate only we don't have any information about the quality of the estimate. To circum- vent this problem we are cQnstructing cQnfidence intervalls locally and confidence bands glQbally. LQcal confidence intervalls can be easily computed fQr all situatiQns using Normal approximatiQns, However the classicalasymptoticmethQd fo.r cQnstructing bands has the assumption Qf a continuo.us CDF. In o.thercases this metho.d is nQt valid. But there is ano.ther meth?d so.lving this pro.blem: The BQQtstrap methQd. In the case o.f a smo.o.th CDF - the first derivative exists and is co.ntinuQus - we want to. study the distri- butio.nal shape of our po.pulatio.n. So. we try to. estimate the density funciion. In the last years the kernel estimatio.n technique has beco.me an impo.rtant tQo.l in curVe fitting. Therefore we tried to. integrate such a method ill the SASSystem. TheIMSL progiam library contents the FORTRAN subrQutineDDESKN 574

Upload: others

Post on 24-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

J:~ 1:: .. ' ~:

~ ~~ "';-....

~~. ; , .. ,. -;;

:: \\ .;,

C

.~'.

?

f. ~: . . lj .'. 'i

.~~

f\

.~ ~} ~ :~,

f ~,.

~. . , l ;:-k ~-j 1.; 7~ F t i:" , }. l! ~ ; j'-'

~ , f; ~.

k ~, $' ~ '\ ~;

i , ~

\

~ « .. r· 1· :~ ;.

.{ '9

":

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS®SYSTEM

author: Werner Bundschuh, Uhiversity of Heidelberg, Rechenzentrutn

coauthor: Joachim Hefner, now SAS Institute

Abstract

In tIlis poster we are showing how to to estiniate and construct confidence intervalls and confidence bands for the cumulative distribution function. We are offering two solutions: the first one is a. classical approach using asymtotic results, the second one is using the Bootstrap method. Furthermore we have written a "procedure macro" for density estimation. We have integra.ted a FORTRAN subroutine (subroutine DDESJ{N of the IMSL progran~ library)whlcll computes kemel estimates for the density function.

Introd tiction

A simple but common need arises in data analysis when we have a single set of numbers, and we want to unterstand their basic characteristics as a collection of data. Graphical methods. described in the book of Chambers and Tukey are now implemented in some software packages (e.g. in JMP running on Macintosh - JMP isa software package for statistical graphics developed by SAS Institute). Beyond this we are interested to estimate the density function and the cumulative density function (CDF). In data analysis we could be interested in confidenceintervalls and confidence bands for the CDF, too. When having collected data to. analyze we nQrmally don't knQwanythiltg abQut the underlying CDF F. So we cannQt build a parametricmQdel to. ~stimate mQdel parameters. A. natural nonparametric estimate of the probability F(a;) is defined by: . .

Fn(a;) = [number of Xi:::; a;]/n

Fn is called the "empirical cumulative density fUlIction" (ECDF) of the sample Xl"" , X n • For demQn­stratiQn purpQses we have chQsen a data set which cQnsists Qf the elapsed time abQve a certain high level for a series of 66 wave records at San Francisco. bay. The examples show the ECDF for this wave data. This plot reveal that about 50 percent of the waves are above the level criteria for abQut 4 time units or less . Looking at the estimate only we don't have any information about the quality of the estimate. To circum­vent this problem we are cQnstructing cQnfidence intervalls locally and confidence bands glQbally. LQcal confidence intervalls can be easily computed fQr all situatiQns using Normal approximatiQns, However the classicalasymptoticmethQd fo.r cQnstructing confidell~ce bands has the assumption Qf a continuo.us CDF. In o.thercases this metho.d is nQt valid. But there is ano.ther meth?d so.lving this pro.blem: The BQQtstrap methQd.

In the case o.f a smo.o.th CDF - the first derivative exists and is co.ntinuQus - we want to. study the distri­butio.nal shape of our po.pulatio.n. So. we try to. estimate the density funciion. In the last years the kernel estimatio.n technique has beco.me an impo.rtant tQo.l in curVe fitting. Therefore we tried to. integrate such a method ill the SASSystem. TheIMSL progiam library contents the FORTRAN subrQutineDDESKN

574

Page 2: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS SYSTEM Bundschuh, W., Hefner, J.

for kernel estimation of. the density function. Using the version 5 interface to FORTRAN functions we developed a "procedure macro". The user of this macro can bring special paramters necessary for the kernel estimation to the subroutine DDESKN. These parameters - bandwidth, kernel and data window - will heavily influence the estimation results. We will show some examples about that subject and give hints how to get good results. A sim.ular implementation of this "procedure macro" should be possible in version 6, too.

Confidence Intervalls for the CDF

The statistical aspects of Fn are well known. Here are some important asymptotic results which can be used to construct confidence intervalls for the CDF: Fn (.) is a consistent estimate of F(.):

Fn(x) --->p F(x) for each x

Fn(x) is asymptotical normal for each x:

Where <I> is the Standard Normal CDF. Using this results we are able to construct a~ asymptotic confidence intervall for F(x):

tn this formula An is the a-quantile of the. Standard Normal distribution. To get a 95%confidellce intervall you have to choose A = 1.96. The value 1.96 is the 97.5% quantile of the Standard Normal distribution.

Confidence Bands for the CDF: Classical Method

Furthermore there are some global results for the ECDF. An important one is concerning the Kolmogorov statistic DR for testing the hypothesis F = Fo:

DR = sup v'nIFn(x) - Fol -oo<:r<oo

Hint: With Fo(.) = <I> C-!) we get the Kohuogorov statistic for testing normality which is used in SAS procedure UNIVARIATE'. The process DR is well st.udied and the limiting process (Browniaa Bridge process) is well known. Therefore we can construct confidence bands for the CDF:

( ,An n ' ( ) - An n ()) lim P FR(X) - c::; F(x)::; Fn x + r::.', for all XI': -00,00 = 1- a n-l'OO yn v,n

An,n is the a-quantile of the limiting process depending on n because of the continuity correct.ion- s. [2].

575

Page 3: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J.

Confidence Bands for the CDF: Bootstrap Method

The class~cal asymtotic construction of fixed width confidence bands for F shown above is feasible only when F is assumed continuous. One alternative method is bootstrap and the bootstrap method is valid whether or not F is continuous. To construct a bootstrap confidence band for F we have to estimate the CDF of the Kolmogorov statistic Dn.We are doing this by computing m bootstrap samples from the original sample. For each bootstrap sample we get a bootstrap Kolmogorov statistic:

D~ = sup VnIF~(a;)':- Fn(a;)1 , i = 1, ... , m -00<:':<00

where F~ is the ECDF of the ithbootstrap sample; The critical value for the booLstrap confidence band can be obtained by calculating the (1 - a)-quantile >'~.n of the sample D!, ... , Dr;:. The (1 - a) confidence band now lookS like follows:

Kernel Estimation of the Density Function

When we are interested in the sha:pe of the distribution we usually look at the density function f of F. The simplest way to get a non parametric density estimate is to smooth histograms. However this is a somewhat heuristic solution. A relative new and thoroughly studied method for density approximation is the kernel estimation. The kernel estimate in(t) for f smoothes the "density" of the ECDF Fn:

Here hn is called the band~idth .andK is the kernel fUlicti~n normally being a density function with fillite or infinite ( e.g. Normal density) support. K determines the influence of the data points on the estimation. hn is strongly connected with the smoothness of the estimate. The asymptotic properties of kernel estimation has been published in many papers and are beyond this paper because of the special notation necessary. In the finite sample case the estimate is heavily influenced by the bandwidth. The choose of the kernel fl,1nction is not so important. But a special high bandwidth depending on n and the range of your data will give you. an extreme smooth estimate vealing interesting information. On the other hand a special low value of hn will show a noisy estimate. We will denlOnstrate these properties in the exaulples.

576

Page 4: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

" , ~.,

i

~ !~ ~'

~ ~,

~ ~,;

~,

, t, '} ,

l: i r j: i! ;;:;

f ~~ ~i

tl ~: !: )"

~ ~ ~~ ~: 'k ~~

~

[ F'

~ l:

~ ~,

~. ~;

~ j' ,~

I .. ,\..

'\

I

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS SYSTEM Bundschuh, W.o Hefner, J.

Notes about Realisation with the SAS System

Tosimplify tbe use of tbe programs for estimating distribution functions we bave written SAS macros; Tbe SAS macros CDFCON and CDFBCON will work under SAS version 6. There are necessary slight modifications for CDFCON to run under SAS version 5. In the moment the SAS macro KOENS is runnipg only under SAS version 5. We think to get it work under version 6 when SAS Institute will

'jntegtate tlte fORTRAN interface in v~rsioIi. 6, too; " , ' " Forthe const.ruction of confidence intervalls and confidence bands derived from the asymptotic properties of the ECDF we worked out the SAS macro CDFCON. In this macro tbe ECDF can be estimated by the SAS procedure RANK. The critical value can be calculated for the intervall case just by using SAS function pro bit which is the inverse n6rmal distribution function. If the macro user wishes to construct simultaneous confidence bands he just has two alternatives for the confidence level: 0.05 or 0.01. The corresponding critical values are implemented in the data step. The SAS macro CDFCON has the following parameters:

VAR=

DATA = OUT=XRANK TITLE= ALPHA=0.05 WAY=GLOBAL

specify the variable to be analyzed; this is an necessary option the data set to be used; default is _LAST_ specify the name of the output data set; default is XRANK the TITLE text; default is none the confidence level ;default is 0.05 do you w';'nt confidence intervalls - specify LOCAL; in the other case use the default GLOBAL

The examples of the classical case are both of 95% confidence. The confidence illtervalls are much narrower than the confidence bands. But this is to be expected because the illtervalls can only locally be interpreted for ,iI- fixed x while the confidence bands are Y1l.lid for all x simultaneously.

In the ,bootstrap case,we d,evefopedaSASmacro, namedCDFBCON. There the bootstrap sarq.ples can be obtained by resampling with repl(i.cement(s. [6]) in refer tothe ECDF. The ECDF'sof the original sample and the bootstrap samples can be calculated by the SAS procedure RANK, too. The empirical a quantile of the Kolmogorov statistics D!, ... , D';: can be derived in a data step and then be used to construct the confidence bands. The SAS macro CDFBCON has the following parameters:

VAR=

DATA= "OUT=XRAN-K M;'200 ALPHA=0.05

, specify the variable to be analyzed; , this is annecessaryopti~n the data set to be used; default is _LAST_ specify the name of theoutput data'set; default is XRANK , the number of bootstrap samples; default is 200 tIie confidence level ;default is 0.05

- ,

The confidence bil.lld in t~e bootstrap case is slighty wider than in the classical case. This should

577

: .

Page 5: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS SYSTEM Bundschuh, W., Hefner, J.

become clear because the bootstrap method is va.Iid for a greater class of distribution functions.

For some time we had the idea to implement the IMSL FORTRAN subroutine DDESI<:N for kernel estimation of the density function in the SAS system. Therefore we have written a SAS function DICHTE. This function is our "bridge" to the subroutine DDESKN. The data to be analyzed are read in by the SAS function DICHTE from a temporary file and are passed to the subroutine DDESKN. The parameters necessary for the density estimation (e.g. the kernel type or the bandwidth) are the arguments of the SAS fmiction DICHTE. The SAS function DICHTE transfers the necessary parameters and the data vector which wall read before. At the end the results of the density function are written back to the temporary file (see the diagrams Flow chart and Sketch of the SAS Function DICHTE below). . . To simplify the use of the·" de~istypr~gia:lh" we Iiave written theSAS· macrO K])E:NS whiCh performs the data exchange with the function in r~al binary format (SAS Format RB8.). The parameters of the macro are the arguments ofthe function DICHTE (see the diagrams Sketch of the SAS Macro KDENS and the macro code below). Note that the SAS functionDICHTE has to be called only once. The macro KDENShas the following parameters:

VAR

DATA= OUT=XXXX4 WIN::::: 1 XMAX=-l

NXPT=20

KERN=2

PLOT=O

specify the variable to be analyzed; the nUinber of observation must be less than 100000 this parameter must be specified tile data set to be used; default is _LAST ~ specify the name of the output data set; default is XXXX4 the bandwidth for the kernel function

. Cutoff value for the kernel; there is no cutoff by default: XMAX=-I nUinber of points at which a density estimated is desired the range of data is devided in NXPT equal spaced intervals you can choose the kernel fundion; there are implemented five kernel functions; specify not 0 if you wish a plot

The four examples shown above demonstrate the heavy influence of the bandwidth on the smoothness of the estimate. For bandwidth 0.5 we see a very noisy function. On the ot!ler side a bandwidth of 2 extremly smoothes the curve. So a bandwidth of 1 could be the best possible. The demonstratioil shows that we should try several estimates with different bandwidths. Hint: We have choosen the triangle kernel (K(.:) = 1-1.:1, 1.:1 ~ 1) for. the examples. The dashed lines in the plots demonstrate the density of the Standard Normal distribution with the same mean value and variance as the .sample. The density estimates sho~ a depart ure from nornlality. A possible distribution class.with simulat shape of the densities is the family of Gamma distributions.

578

Page 6: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS SYSTEM Bundschuh, W., Hefner, J.

The SAS Macro CDFCON

%macro CDFCON(VAR=,DATA=_LAST,OUT=XRANK,ALPHA=O.OS,TITLE=, WAY=GLOBAL)j

%local stopj options nonotes proc rank data=&data f out=&outj ranks cumfjvar &varj proc sort data=&outj-by &varj

%if %upcase(&way)=LOCAL %then %doj

data &out;set tout nobs=n;retain c; If _n_=1 then c=probit(1-&alpha/2)/sqrt(n)j 10w=max(0,cumf-c*sqrt(cumf*(1~cumf»);

high=min(1,cumf+c*sqrt(cumf*(1-cumf»); run;

%endj %else %if %upcase(&way)=GLOBAL %then %do;

data &out;set tout nobs=nj retain c 0; If _n_=1 and &alpha=.OS then c=1.3S8/(sqrt(n)+.12+.11/sqrt(n»; else if _n_=1 and &alpha=.01 then c=1.628/(sqrt(n)+.12+.11/sqrt(n»; else if _n_=1 then do;_ call symput ( I Stop I , I STOP I ) ;

stop;end; 10w=max(0,cumf-c); high=min(1 ,cumf+C);

run; %end-; %else %goto ~1;

%if &.stop=STOP %then %goto AL; proc gplot; plot (low cumf high)*&:var!overlay frj symbol1 v=none i=steplj c=white 1=1; symbo12 v=none i=steplj c=white 1=2; symbo13 v=none i=steplj c=white 1=1; Title &title; run; %goto ende; %1-1: %put ERROR: WAY=&WAY is ,NOT defined; %GOTO ende; %AL: %put ERROR: ALPHA=&ALPHA is not possible for WAY=GLOBAL; %ende: %mend CDFCON;

579

Page 7: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

\

\

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS SYSTEM Bundschuh, W., Hefner, J.

The SAS Macro CDFBCON

'l.~IACRO CDFBCON (VAR, DATA= _LASL, M=200 ,ALPHA=O. 05, OUT=XRANK) ;

proc rank data;::&:data f ties=high out=sample; var &:var; ranks fn;

proc sort; by &:var;

d.ata bsample; drop i; do bsample=l to &:In;

do i=l to n; iobs = int(ranuni(O) * n) + 1; set sample(keep=&:var) point=iobs nobs=n; output; end; end;

call symput('stpn', left(n»; stop;

proc sort; by bsample &:var;

proc rank f ties=high out=bsample; var &:var; ranks fnhat; by bsample;

proc sort nodup; bybsample levari

data helps amp; set sample; do bsample=l to &:m; .output; end;

proc sort; by bsample &:var;

data bsample; merge bsample helpsamp; by bsample &:var; if first.bsample then help=O; if fnhat ne . then help = fnhat;

. retain help ; fnhat=help; drop help;

580·

Page 8: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS SYSTEM Bundschuh, W~, Hefner, J.

data bsamplej set bsamplej dfn=abs(fnhat-fn)j keep dfn bsamplej

proc summaryj var dfnj by bsample; output out=bsample(drop=_type __ freq_) max=fnstarj

proc sort; by fnstar;

data _null_j i = max( floor( &m * (1 - &alpha», 1 )j set bsample point = ij callsymput ('bound',fnstar); stop;

data &out; set sample(keep=&var fn); lower = max(O, fn - &bound ); upper = min(1, ·fn + &bound );

proc gplot data=&out; plot (fn lower upper) * &var / vaxis=axis1 overlaYj axis1order=0 to 1 by 0.2 label=(f=simplex c=blue 'p'); symbol1 1=1 i=steplj value=none c=black; symbo12 1'=2 i=steplj value=none c=black; symbo13 1=1 i=steplj value=none c=b1ack;

Y.mend;

581

Page 9: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS SY.STEM Bundschuh, W., Hefner, J.

The SAS Macro KDENS

%/·IACRO KDENS(VAR, DATEI=_LAST_,WIN=1,XMAX=-1,NXPT=20,KERN=3,OUT=xxx4,PLOT=O);

%* VERSION 2.0 OF KDENS. COPYRIGHT - W. BUNDSCHUH, J. HEFNER;

OPTIONS NOSOURCE l·IPRINT NONOTES;

* ; * CHECK THE PARANETERS *. , %LET A=l; DATA XXXO; SET &DATEI END=EOF;

IF &VAR ~= . THEN DO; N+l; OUTPUT;END; IF EOF THEN CALL SYl·IPUT('XXXXN',N);

KEEP &VAR; RUN; PROC SORT DATA=xxxO OUT=XXX1(KEEP= &VAR);

BY &VAR; PROC CONTENTS DATA=XXXl NOSOURCE NOPRINT OUT=XXX2(KEEP=TYPE NOBS); DATA XXX2; SET XXX2;

IF TYPE = 2 THEN DO; PUT "ERROR: DIE VARIABLE &VAR 1ST KEINE NU1·IERISCHE VARIABLE"; END;

CALL SY1·IPUT( 'YTYPE', TYPE); DATA XXX3; SET XXX2(KEEP=NOBS);

V_WIN = SYMGET('WIN') + 0;

*. ,

V _Xl-lAX SYl-IGET ( , Xl·IAX ,) + 0; LKERN SY1·IGET( , KERN') + 0; V_NXPT = SYl-IGET('NXPT') + 0;

ERR = _ERROR_; CALL SYNPUT( 'CID' ,ERR); IF _ERROR_ = 1 THEN DO;

PUT "ERROR: DIE UEBERGABEPATA/·IETER WIN ,Xl-lAX,KERN UND NXPT " "MUESSEN NUl-lERISCH SEIN";

END;

* CHECK THE P ARAl-!ETERS *. , DATA XXX3; SET XXX3;

IF NOBS > 10.000 THEN DO; PUT "ERROR: ES SIND l-lAXHIAL 10000 BEOBACHTUNGEN ZUGELASSEN"; END;

CALL SYIIPUT('LN',NOBS); IF V_NXPT > 1000 THEN DO;

582

, . ,

Page 10: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

RUNj

*. ,

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS SYSTEM Bundschuh, W., Hefner, J.

PUT "ERROR: ES SIND ~uxnlAL 1000 STUETZSTELLEN ZUGELASSEN" j ENDj

IF V_WIN < 0 THEN DOj PUT "ERROR: DIE FENSTERBREITE ~IUSS >= 0 SEIN"j ENDj

IF NOT(V_KERN=l 1 V_KERN=2 1 V~KERN=3 1 V_KERN=4 1 V_KERN=5) THEN DOj

PUT "ERROR: DIE ZULAESSIGEN WERTE FUER KERN SIND 1,2,3,4,5"j ENDj

%IF &YTYPE ~ 2 %THEN %GOTO ABBRUCHj %IF &CID = 1 %THEN %GOTO ABBRUCHj %IF &LN > 10000 %THEN %GOTO ABBRUCHj %IF &NXPT > 1000 %THEN %GOTO ABBRUCHj %IF &WIN < 0 %THEN %GOTO ABBRUCHj %IF NOT( &KERN = 1 1 &KERN = 2 1 &KERN = 3 1

&KERN = 41 &KERN = 5 ) %THEN %GOTO ABBRUCHj

* WRITING THE DATA IN THE TE.1PORARY FILEj *j DATA _~ruLL_j SET XXXlj

*. ,

FILE FT16FOOlj PUT &VAR RB8. j

* CALL THE FUNCTIONj * . , DATA _NULL_j SET XXX3j

CALL DICHTE(V_WIN,V_XMAX,V_NXPT,NOBS,V_KERN)j *. , * READ THE RESULTS j * . , DATA &OUTj INFILE FT16FOOlj

INPUT&VAR RB8. / FIT RB8.j

* * SHOULD WE PLOT? *j %if &plot ne 0 %then %doj PROC FREQ DATA=XXXO j TABLES &VAR/ OUT=XXXXF NOPRINTj DATA XXXXFjSET XXXXFj XXXXF=-.1+(COUNT/&XXXXN)*.lj %DCLANNOj %SYSTEM(2,2,4)j

583

Page 11: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

[: I, f; }~

t :; r t D ','

" " [>: I' (.

t, Fi: ~: ,j;

~ := ~~ ~ ..

f ~; I,' 1: i· t,

f f ;

i ~ f "

I "

I:'

~ \ ~

\

f: 1.1"

~ i j {

~

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS SY,STEM Bundschuh, W., Hefner, J.

%LINE(&VAR,-.l,&VAR,XXXXF,BLACK,1,.2)j PROC 11EANS DATA=XXXl NOPRINTj VAR &VARj OUTPUT OUT=XXXXP MEAN=XXXXM STD=XXXXSj DATA XXXXPjSET XXXXPj SET &OUT END=ENDj

XXXXNF=EXP(-. 5*«&VAR-XXXXN)/XXXXS)**2)/(SQRT(6. 28)*XXXX S)j OUTPUTj DO WHILE(END=O)j SET &OUT END=ENDj

XXXXNF=EXP (-.5* «&VAR-XXXXI4) /XXXXS)**2)/ (SQRT(6. 28) *XXXXS) j OUTPUTjENDj PROC NEANS NOPRINT DATA=XXXXPj OUTPUT OUT=XXXXM MAX=MAXF MAXMjVAR FIT XXXXNFj DATA XXXXNjSET XXXXMj max=max(maxf,maxm); x=log10(max); if x <0 then do; x=int(-x)+l; MAX =(INT(MAX*10**x)+1)/10**x; endj else MAX=(INT(NAX*10)+1)/10j XXXXB=11AX/5j CALL SYI1PUT('XXXXB' ,XXXXB); CALL SYr1PUT (' lUX , ,~IAX) j RUNj *j * PLOT OF THE DENSITY WITH SPLINE INTERPOLATIONj *;

PROC GPLOT DATA=XXXXP; PLOT FIT * &VAR= 1 XXXXNF*&VAR=2/VREF=0 LVREF=2 ANNO=XXXXF

OVERLAY HAIlS =AXISl VAXIS = AXIS2j SY1-iBOL1 V == none I = SPLINE C=BLACKj SY1·iBOL2 V=NONE 1=S1>LINE C=IiLACK L=2 j AXIS1 label = none minor = none; AXIS2 LABEL = (F=SIl1PLEX 'FIT') ~IINOR = NONE VALUE= (F=SHIPLEX

,T:l H=O) ORDER=-.1 0 TO &MAX BY &XXXXBj

title f=S~lissl "kernel estimate of &var"; RUN; *DELETE TE1·\PORARY WORK FILES;

, Proe datasets nolist nofs ddname=work; delete xxxxp xxxxm xxxxf;

%ENDj Proedat,asets nofs nolist ddname=workj delete xxxO xxx1 xxx2 xxx3 ; %ABBRUCH:

584

t· 0"

I:,',

! ....

Page 12: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

~'.

')",

::." ~'r

t f" l~ " r." :2 '. '0 ~.,

l' tO~

~i

~ '. ~ ;.

! R }:" ~:

',. i:. ~i

K ~'

t:

I f;

[ I·

~. V ',:

F

~ ;i ~ ~

\ .. \

ESTIMATING DISTRIBUTION FUNCTIONS WITH THESAS SYSTEM Bundschuh, W., Hefner, J.

OPTIONS SOURCE NOTES; RUN; %l~END KDENS;

author address: Werner Bundschuh Universitaet Heidelberg ReChenzentrum Im.Neuenheimer Feld 293 6900 Heidelberg

Electronic Mail (Bitnet): X16 at DHDURZ1

585

:_ 0"

Page 13: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

ESTIMATING DISTRIBUTION FUNCTIONS WITH. THE SAS SYSTEM Bundschuh, W., Hefner, J.

ClassIcal Confidence Intervalls .

~ • <>

0."

0 ....

0 ....

0.0

0.0

0 ....

0.'"

o . 2

o • ~

0.0

0 .. .. ... a .. ... .. .. "10 ~ "I

><

ClassIcal Confidence Bands

".0

<> ...

0 ....

0.7

0 ....

O.D

0 ......

0.'" 0.2

o ...

o. o· 0 .. "" '" ... '" .. "7 .. .. "10 ."1 ..

><

Boot;Strap Confidence Bands

... 0

o ...

C> ...

0 ....

0."

<>.0

0· .. .. '" .... .. .. "7 .. .. ~o .... ><

586

Page 14: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

",'.:

c ~/ L r .' (;.

" !~ I-,. ! f c ~~ i~

~ ?" ~ r h r r'-S:

b :~ i' j;

~ w: ~ T.'. r,;' ;&'

~-~i r! f ~

i \ , ~-

~ \.

~;

t ~ ~

r~

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS SYSTEM _Bundschuh, W., Hefner. J.

kernel estimate with bandwidth 0.5 kemel estimate with bandwidth 1.0

FIT 0.30

FIT 0.30

0.24

o~oo+---~----~---===-

I I 2 J f 5 & 1 B J 10 II

kernel estimate with bandwidth 1.5 kernel-estimate with bandwidth 2.0

FIT

0.2°1 0.16

0.12

o:~o-·---

o 1 2 J f 5 6 1 I 9 10 11

FIT 0.20

0.16

0.04

0.001-1------

..... _ •• _. - '''_e ... _ .0_. • •..•.

I I 2 J I 5 6 7 B J 10 II

587

Page 15: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS SYSTEM Bunjschuh, W., Hefner, J.

Flow Chart

%MACRO name(parameter);

syntax check logical check '

CALL fortran;

Read resul ts

Stop %MEND name;

B

Sketch of the MACRO KDENS

%MACRO KDENS(parameter); syntax check logical check

transfer of data to temporary file

DATA _NULL_; CALL DICHTE(arguments);

read the data from temporary file

Q real binary Format

call the FORTRAN function

<;=n RBS. Format

II further code it necessary

%MEND KOENS;

Sketch of the'SAS Function DICHTE

read the data

call DDESKN

Write the results

temporary file

There Is no printer out of the SAS function

588

Page 16: ESTIMATING DISTRIBUTION FUNCTIONS WITH THE · 2018. 9. 25. · ESTIMATING DISTRIBUTION FUNCTIONS WITH iHE SAS SYSTEM Bundschuh, W., Hefner, J. Confidence Bands for the CDF: Bootstrap

" ~'

f;:

:::

~: l~

f~ ;'. F

t ~;

" ,. {;I

t· ~.

[ ~ t ~. ~~ r.~

f " ~'

~. t r ~. , ':l:,

~ r i:

~ ~:

;\

I ~,

, ~.

\, "

. References

ESTIMATING DISTRIBUTION FUNCTIONS WITH THE SAS SYSTEM Bundschuh, W., Hefner, J •

[I] R. Beran, BOOTSTRAP METHODS IN STATISTICS, Preprint Nr. 203, Sonderforschungsbereich 123, Universitaet Heidelberg, 1983

[2} Peter J . Bickel and Kjell A. Doksum, MATHEMATICAL .sTATISTICS: Basic Ideas and Selected Topics, Holden Day 1977.

[3] J.M.Chambers, W.S. Cleveland,B. Kleiner,P.A. Tukey, GRAPHICAL METHODS FOR DATA ANALYSIS, Duxbury Press 1983

[4] R.A. Tapia and J.R. Thompson, NONPARAMETRIC PROBABILITY DENSITY ESTIMATION, Johns Hopkins University Press,Baltimore 1978

[5] USER'S GUIDE of the IMSL Stat Libtary,.FORTRAN Subroutines for Statistical Analysis

[6] SAS Applications Guide, 1987 ECiition

[7] SAS User's Guide: Basics, Version 5 Edition

[8] SAS Guide to Macro Processing

[9] SAS User's Guide: Statistics, Version 5 and Version 6

[10] SAS Programmer's Guide, Version 5 Edition

589