
Package ‘smovie’

April 21, 2018

Type Package

Title Some Movies to Illustrate Concepts in Statistics

Version 1.0.1

Date 2018-04-21

Description Provides movies to help students to understand statistical concepts. The 'rpanel' package <https://cran.r-project.org/package=rpanel> is used to create interactive plots that move to illustrate key statistical ideas and methods. There are movies to: visualise probability distributions (including user-supplied ones); illustrate sampling distributions of the sample mean (central limit theorem), the sample maximum (extremal types theorem) and (the Fisher transformation of the) Pearson product moment correlation coefficient; examine the influence of an individual observation in simple linear regression; illustrate key concepts in statistical hypothesis testing. Also provided are dpqr functions for the distribution of the Fisher transformation of the correlation coefficient under sampling from a bivariate normal distribution.

Depends R (>= 3.3.0), rpanel (>= 1.1-3)

License GPL (>= 2)

Encoding UTF-8

LazyData TRUE

RoxygenNote 6.0.1

Imports graphics, methods, revdbayes (>= 1.1.0), stats, SuppDists

Suggests knitr, numDeriv, tkrplot, testthat, rmarkdown

VignetteBuilder knitr

URL http://github.com/paulnorthrop/smovie

BugReports http://github.com/paulnorthrop/smovie/issues

NeedsCompilation no

Author Paul J. Northrop [aut, cre, cph]

Maintainer Paul J. Northrop <[email protected]>

Repository CRAN

Date/Publication 2018-04-21 18:10:09 UTC


R topics documented:

clt
continuous
correlation
discrete
ett
FPearson
lev_inf
movies
shypo
smovie
wws

Index

clt Central Limit Theorem (CLT)

Description

A movie to illustrate the ideas of the sampling distribution of a mean and the central limit theorem.

Usage

clt(n = 20, distn, params = list(), panel_plot = TRUE, hscale = NA,
    vscale = hscale, n_add = 1, delta_n = 1, arrow = TRUE,
    leg_cex = 1.25, ...)

Arguments

n An integer scalar. The size of the samples drawn from the distribution chosen using distn.

distn A character scalar specifying the distribution from which observations are sampled. Distributions "beta", "binomial", "chisq", "chi-squared", "exponential", "f", "gamma", "geometric", "gev", "gp", "hypergeometric", "lognormal", "log-normal", "negative binomial", "normal", "poisson", "t", "uniform" and "weibull" are recognised, case being ignored.
If distn is not supplied then distn = "exponential" is used.
The "gev" and "gp" cases use the gev and gp distributional functions in the revdbayes package.
The other cases use the distributional functions in the stats package. If distn = "gamma" then the (shape, rate) parameterisation is used. If scale is supplied via params then rate is inferred from this. If distn = "negative binomial" then the (size, prob) parameterisation is used. If mu is supplied via params then prob is inferred from this (and size). If distn = "beta" then ncp is forced to be zero.


params A named list of additional arguments to be passed to the density function associated with distribution distn. The (shape, rate) parameterisation is used for the gamma distribution (see GammaDist) even if the value of the scale parameter is set using params.
If a parameter value is not supplied then the default values in the relevant distributional function set using distn are used, except for "beta" (shape1 = 2, shape2 = 2), "chisq" (df = 4), "f" (df1 = 4, df2 = 8), "gev" (shape = 0.2), "gamma" (shape = 2), "gp" (shape = 0.1), "poisson" (lambda = 5), "t" (df = 4) and "weibull" (shape = 2).

panel_plot A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

n_add An integer scalar. The number of simulated datasets to add to each new frame of the movie.

delta_n A numeric scalar. The amount by which n is increased (or decreased) after one click of the + (or -) button in the parameter window.

arrow A logical scalar. Should an arrow be included to show the simulated sample mean from the top plot being placed into the bottom plot?

leg_cex The argument cex to legend. Allows the size of the legend to be controlled manually.

... Additional arguments to the rpanel functions rp.button and rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

Loosely speaking, a consequence of the Central Limit Theorem is that the mean of a large number of independent and identically distributed random variables, each with mean µ and finite standard deviation σ, has approximately a normal distribution, even if these original variables are not normally distributed.

This movie considers examples where this limiting result holds and illustrates graphically the closeness of the limiting approximation provided by the relevant normal limit to the true finite-n distribution. Of course, when distn = "normal" this result is exact.

Samples of size n are repeatedly simulated from the distribution chosen using distn. These samples are summarized using a plot that appears at the top of the movie screen. For each sample the mean of these n values is calculated, stored and added to another plot, situated below the first plot. This plot is either a histogram or an empirical c.d.f., chosen using a radio button. A rug is added to a histogram provided that it contains no more than 1000 points.

The p.d.f. (for a continuous variable) or p.m.f. (for a discrete variable) of the original variables is added to the top plot. There is a checkbox to add to the bottom plot the approximate (large n) normal p.d.f./c.d.f. (with mean µ and standard deviation σ/√n), implied by the CLT.
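The simulation that the movie animates can be sketched outside the panel in base R. The sketch below is illustrative only and does not use any smovie function; the seed, the number of simulated samples (n_sims) and the choice of the exponential distribution with rate 1 are assumptions made for the example. It compares the mean and standard deviation of the simulated sample means with the values µ and σ/√n implied by the CLT.

```r
# Sketch only: base R, no smovie functions are called.
set.seed(1)        # arbitrary seed, for reproducibility
n      <- 20       # sample size, matching the default clt(n = 20)
n_sims <- 10000    # hypothetical number of simulated samples
mu     <- 1        # mean of the exponential distribution with rate 1
sigma  <- 1        # standard deviation of the same distribution

# Each column is one sample of size n; colMeans gives one mean per sample
samples      <- matrix(rexp(n * n_sims, rate = 1), nrow = n)
sample_means <- colMeans(samples)

# Standard deviation of the sample mean implied by the CLT
clt_sd <- sigma / sqrt(n)

# Simulated values alongside the CLT values
c(sim_mean = mean(sample_means), clt_mean = mu,
  sim_sd = sd(sample_means), clt_sd = clt_sd)
```

With a skewed parent distribution such as this one, the first two moments agree closely even for small n; it is the shape of the histogram of means that approaches the normal p.d.f. more slowly.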

Once it starts, three aspects of this movie are controlled by the user.

• There are buttons to increase (+) or decrease (-) the sample size, that is, the number of values over which a mean is calculated.

• Each time the button labelled "simulate another n_add samples of size n" is clicked, n_add new samples are simulated and their sample means are added to the bottom histogram.

• There is a button to switch the bottom plot from displaying a histogram of the simulated means and the limiting normal p.d.f. to the empirical c.d.f. of the simulated data, the exact c.d.f. and the limiting normal c.d.f.

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# Exponential data
clt()

# Uniform data
clt(distn = "uniform")

# Poisson data
clt(distn = "poisson")

continuous Univariate Continuous Distributions: p.d.f and c.d.f.

Description

A movie to illustrate how the probability density function (p.d.f.) and cumulative distribution function (c.d.f.) of a continuous random variable depend on the values of its parameters.

Usage

continuous(distn, var_range = NULL, params = list(), param_step = list(),
    param_range = list(), p_vec = NULL, smallest = 0.01,
    plot_par = list(), panel_plot = TRUE, hscale = NA, vscale = hscale,
    ...)

Arguments

distn Either a character string or a function to choose the continuous random variable. Strings "beta", "cauchy", "chisq", "chi-squared", "exponential", "f", "gamma", "gev", "gp", "lognormal", "log-normal", "normal", "t", "uniform" and "weibull" are recognised, case being ignored. The relevant distributional functions dxxx and pxxx in the stats package are used. The abbreviations xxx are also recognised. The "gev" and "gp" cases use the gev and gp distributional functions in the revdbayes package. If distn = "gamma" then the (shape, rate) parameterisation is used, unless a value for scale is provided via the argument params, in which case the (shape, scale) parameterisation is used.
Valid functions are set up like a standard distributional function dxxx, with first argument x, last argument log and with arguments to set the parameters of the distribution in between. See the CRAN task view on distributions.
If distn is not supplied then distn = "normal" is used.

var_range A numeric vector of length 2. Can be used to set a fixed range of values over which to plot the p.d.f. and c.d.f., in order better to see the effects of changing the parameter values. If var_range is set then it overrides p_vec (see below).

params A named list of initial parameter values with which to start the movie. If distn is a string and a particular parameter value is not supplied then the following values are used. "beta": shape1 = 2, shape2 = 2, ncp = 0; "cauchy": location = 0, scale = 1; "chi-squared": df = 4, ncp = 0; "exponential": rate = 1; "f": df1 = 4, df2 = 8, ncp = 0; "gamma": shape = 2, rate = 1; "gev": loc = 0, scale = 1, shape = 0.1; "gp": loc = 0, scale = 1, shape = 0.1; "lognormal": meanlog = 0, sdlog = 1; "normal": mean = 0, sd = 1; "t": df = 4, ncp = 0; "uniform": min = 0, max = 1; "weibull": shape = 2, scale = 1.
If distn is a function then params must set any required parameters.
If a parameter value is outside the corresponding range specified by param_range then it is set to the closest limit of the range.

param_step A named list of the amounts by which the respective parameters in params are increased/decreased after one click of the +/- button. If distn is a function then the default is 0.1 for all parameters. If distn is a string then a sensible distribution-specific default is set internally.

param_range A named list of the ranges over which the respective parameters in params are allowed to vary. Each element of the list should be a vector of length 2: the first element gives the lower limit of the range, the second element the upper limit. Use NA to impose no limit. If distn is a function then all parameters are unconstrained.

p_vec A numeric vector of length 2. The p.d.f. and c.d.f. are plotted between the 100p_vec[1]% and 100p_vec[2]% quantiles of the distribution. If p_vec is not supplied then a sensible distribution-specific default is used. If distn is a function then the default is p_vec = c(0.001, 0.999).

smallest A positive numeric scalar. The smallest value to be used for any strictly positive parameters when distn is a string.

plot_par A named list of graphical parameters (see par) to be passed to plot. This may be used to alter the appearance of the plots of the p.d.f. and c.d.f.

panel_plot A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

... Additional arguments to be passed to rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

The movie starts with a plot of the p.d.f. of the distribution for the initial values of the parameters. Buttons increase (+) or decrease (-) each parameter. There are radio buttons to switch the plot from the p.d.f. to the c.d.f. and back.
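The convention described under distn for user-supplied functions (first argument x, last argument log, parameters in between) can be illustrated with a small density written in base R. The function dlaplace below, its parameter names location and scale, and the commented-out call to continuous are all hypothetical and are not part of smovie; the manual does not say whether a matching c.d.f. function would also be needed for the c.d.f. view, so that call is shown only as an untested guess.

```r
# Hypothetical user-supplied density: the Laplace (double exponential)
# distribution. First argument x, last argument log, parameters in between.
dlaplace <- function(x, location = 0, scale = 1, log = FALSE) {
  log_dens <- -log(2 * scale) - abs(x - location) / scale
  if (log) log_dens else exp(log_dens)
}

# Sanity check: the density should integrate to (approximately) 1
total_prob <- integrate(dlaplace, lower = -Inf, upper = Inf,
                        location = 0, scale = 2)$value
total_prob

# Untested guess at how such a function might be passed to the movie:
# continuous(distn = dlaplace, params = list(location = 0, scale = 1),
#            param_range = list(scale = c(0, NA)))
```

Setting a lower limit of 0 for scale via param_range mirrors the dnorm example in the Examples section, because parameters are otherwise unconstrained when distn is a function.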

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# Normal example
continuous()
# Fix the range of values over which to plot
continuous(var_range = c(-10, 10))

# The same example, but using a user-supplied function and setting manually
# the initial parameters, parameter step size and range
continuous(distn = dnorm, params = list(mean = 0, sd = 1),
           param_step = list(mean = 1, sd = 1),
           param_range = list(sd = c(0, NA)))

# Gamma distribution. Show the use of var_range
continuous(distn = "gamma", var_range = c(0, 15))

correlation Sampling distribution of the Pearson correlation coefficient movie

Description

A movie to illustrate how the sampling distribution of the Pearson product moment sample correlation coefficient r depends on the sample size n and on the true correlation ρ.

Usage

correlation(n = 30, rho = 0, panel_plot = TRUE, hscale = NA,
    vscale = hscale, delta_n = 1, delta_rho = 0.1, ...)

Arguments

n An integer scalar. The initial value of the sample size. Must not be less than 2.

rho A numeric scalar. The initial value of the true correlation ρ. Must be in [-1, 1].

panel_plot A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

delta_n An integer scalar. The amount by which the value of the sample size is increased/decreased after one click of the +/- button.

delta_rho A numeric scalar. The amount by which the value of rho is increased/decreased after one click of the +/- button.

... Additional arguments to the rpanel functions rp.button and rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

Random samples of size n are simulated from a bivariate normal distribution in which each of the variables has a mean of 0 and a variance of 1 and the correlation ρ between the variables is chosen by the user.

The movie contains two plots. On the top is a scatter plot of the simulated sample, illustrating the strength of the association between the simulated values of the variables. A new sample is produced by clicking "simulate another sample". For each simulated sample the sample (Pearson product moment) correlation coefficient r is calculated and displayed in the title of the top plot.

The values of the sample correlation coefficients are stored and are plotted in a histogram in the bottom plot. A rug displays the individual values, with the most recent value coloured red. As we accumulate a large number of values in this histogram the shape of the sampling distribution of r emerges. The exact p.d.f. of r is superimposed on this histogram, as is the value of ρ.

The bottom plot can be changed in two ways: (i) a radio button can be pressed to replace the histogram and p.d.f. with a plot of the empirical c.d.f. and exact c.d.f.; (ii) the variable can be changed from r to Fisher’s z-transformation F(r) = arctanh(r) = [ln(1+r) − ln(1−r)]/2. For sufficiently large values of n, F(r) has approximately a normal distribution with mean F(ρ) and variance 1/(n−3).

The values of the sample size n or true correlation coefficient ρ can be changed using the respective +/- buttons. If one of these is changed then the bottom plot is reset using the sample correlation coefficient of the first sample simulated using the new combination of n and ρ.
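The behaviour of Fisher’s z-transformation can be checked by simulation in base R. The sketch below is illustrative only and does not call correlation or the FPearson functions; the seed, the number of simulated samples (n_sims) and the helper sim_r are assumptions made for the example. It relies on the standard large-sample result that arctanh(r) is approximately normal with mean arctanh(ρ) and variance 1/(n−3).

```r
# Sketch only: base R, no smovie functions are called.
set.seed(2)        # arbitrary seed, for reproducibility
n      <- 30       # sample size, matching the default correlation(n = 30)
rho    <- 0.8      # true correlation
n_sims <- 5000     # hypothetical number of simulated samples

# Hypothetical helper: simulate one sample of size n from a bivariate normal
# distribution with means 0, variances 1 and correlation rho, and return the
# sample correlation coefficient r
sim_r <- function(n, rho) {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  cor(x, y)
}

r <- replicate(n_sims, sim_r(n, rho))
z <- atanh(r)      # Fisher's z-transformation of each simulated r

# Simulated mean and variance of z alongside the large-n approximation
c(mean_z = mean(z), approx_mean = atanh(rho),
  var_z = var(z), approx_var = 1 / (n - 3))
```

For ρ far from 0 the distribution of r itself is noticeably skewed at this sample size, which is one reason the transformed scale is convenient.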

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.


Examples

correlation(rho = 0.8)
correlation(n = 10)

discrete Univariate Discrete Distributions: p.m.f and c.d.f.

Description

A movie to illustrate how the probability mass function (p.m.f.) and cumulative distribution function (c.d.f.) of a discrete random variable depend on the values of its parameters.

Usage

discrete(distn, var_support = NULL, params = list(), param_step = list(),
    param_range = list(), p_vec = NULL, smallest = 0.01,
    plot_par = list(), panel_plot = TRUE, hscale = NA, vscale = hscale,
    observed_value = NA, ...)

Arguments

distn Either a character string or a function to choose the discrete random variable. Strings "binomial", "geometric", "hypergeometric", "negative binomial" and "poisson" are recognised, case being ignored. The relevant distributional functions dxxx and pxxx in the stats package are used. The abbreviations xxx are also recognised. If distn = "negative binomial" then the (size, prob) parameterisation is used, unless a value for mu is provided via the argument params, in which case the (size, mu) parameterisation is used.
Valid functions are set up like a standard distributional function dxxx, with first argument x, last argument log and with arguments to set the parameters of the distribution in between. See the CRAN task view on distributions. It is assumed that the support of the random variable is a subset of the integers, unless var_support is set to the contrary.
If distn is not supplied then distn = "binomial" is used.

var_support A numeric vector. Can be used to set a fixed set of values for which to plot the p.m.f. and c.d.f., in order better to see the effects of changing the parameter values or to set a support that isn’t a subset of the integers. If var_support is set then it overrides p_vec (see below).

params A named list of initial parameter values with which to start the movie. If distn is a string and a particular parameter value is not supplied then the following values are used. "binomial": size = 10, prob = 0.5; "geometric": prob = 0.5; "hypergeometric": m = 10, n = 7, k = 8; "negative binomial": size = 10, prob = 0.5; "poisson": lambda = 5.
If distn is a function then params must set any required parameters.
If a parameter value is outside the corresponding range specified by param_range then it is set to the closest limit of the range.

param_step A named list of the amounts by which the respective parameters in params are increased/decreased after one click of the +/- button. If distn is a function then the default is 0.1 for all parameters. If distn is a string then a sensible distribution-specific default is set internally.

param_range A named list of the ranges over which the respective parameters in params are allowed to vary. Each element of the list should be a vector of length 2: the first element gives the lower limit of the range, the second element the upper limit. Use NA to impose no limit. If distn is a function then all parameters are unconstrained.

p_vec A numeric vector of length 2. The p.m.f. and c.d.f. are plotted between the 100p_vec[1]% and 100p_vec[2]% quantiles of the distribution. If p_vec is not supplied then a sensible distribution-specific default is used. If distn is a function then the default is p_vec = c(0.001, 0.999).

smallest A positive numeric scalar. The smallest value to be used for any strictly positive parameters when distn is a string.

plot_par A named list of graphical parameters (see par) to be passed to plot. This may be used to alter the appearance of the plots of the p.m.f. and c.d.f.

panel_plot A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

observed_value A non-negative integer. If observed_value is supplied then the corresponding line in the plot of the p.m.f. is coloured in red.

... Additional arguments to be passed to rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

The movie starts with a plot of the p.m.f. of the distribution for the initial values of the parameters. Buttons increase (+) or decrease (-) each parameter. There are radio buttons to switch the plot from the p.m.f. to the c.d.f. and back.

Owing to a conflict with the argument size of the function rp.control, the parameter size of, for example, the binomial and negative binomial distributions is labelled as n.

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.


Examples

# Binomial example
discrete()

# The same example, but using a user-supplied function and setting manually
# the initial parameters, parameter step size and range
discrete(distn = dbinom, params = list(size = 10, prob = 0.5),
         param_step = list(size = 1),
         param_range = list(size = c(1, NA), prob = c(0, 1)))

# Poisson distribution. Show the use of var_support
discrete(distn = "poisson", var_support = 0:20)

ett Extremal Types Theorem (ETT)

Description

A movie to illustrate the extremal types theorem, that is, convergence of the distribution of the maximum of a random sample of size n from certain distributions to a member of the Generalized Extreme Value (GEV) family, as n tends to infinity. Samples of size n are simulated repeatedly from the chosen distribution. The distributions (simulated empirical and true) of the sample maxima are compared to the relevant GEV limit.

Usage

ett(n = 20, distn, params = list(), panel_plot = TRUE, hscale = NA,
    vscale = hscale, n_add = 1, delta_n = 1, arrow = TRUE,
    leg_cex = 1.25, ...)

Arguments

n An integer scalar. The size of the samples drawn from the distribution chosen using distn. n must be no smaller than 2.

distn A character scalar specifying the distribution from which observations are sampled. Distributions "beta", "cauchy", "chisq", "chi-squared", "exponential", "f", "gamma", "gp", "lognormal", "log-normal", "ngev", "normal", "t", "uniform" and "weibull" are recognised, case being ignored.
If distn is not supplied then distn = "exponential" is used.
The "gp" case uses the gp distributional functions in the revdbayes package.
The "ngev" case is a negated GEV(1 / ξ, 1, ξ) distribution, for ξ > 0, and uses the gev distributional functions in the revdbayes package. If ξ = 1 then this coincides with Example 1.7.5 in Leadbetter, Lindgren and Rootzen (1983).
The other cases use the distributional functions in the stats package. If distn = "gamma" then the (shape, rate) parameterisation is used. If scale is supplied via params then rate is inferred from this. If distn = "beta" then ncp is forced to be zero.

params A named list of additional arguments to be passed to the density function associated with distribution distn. The (shape, rate) parameterisation is used for the gamma distribution (see GammaDist) even if the value of the scale parameter is set using params.
If a parameter value is not supplied then the default values in the relevant distributional function set using distn are used, except for "beta" (shape1 = 2, shape2 = 2), "chisq" (df = 4), "f" (df1 = 4, df2 = 8), "ngev" (shape = 0.2), "gamma" (shape = 2), "gp" (shape = 0.1), "t" (df = 4) and "weibull" (shape = 2).

panel_plot A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

n_add An integer scalar. The number of simulated datasets to add to each new frame of the movie.

delta_n A numeric scalar. The amount by which n is increased (or decreased) after one click of the + (or -) button in the parameter window.

arrow A logical scalar. Should an arrow be included to show the simulated sample maximum from the top plot being placed into the bottom plot?

leg_cex The argument cex to legend. Allows the size of the legend to be controlled manually.

... Additional arguments to the rpanel functions rp.button and rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

Loosely speaking, a consequence of the Extremal Types Theorem is that, in many situations, the maximum of a large number n of independent random variables has approximately a GEV(µ, σ, ξ) distribution, where µ is a location parameter, σ is a scale parameter and ξ is a shape parameter. See Coles (2001) for an introductory account and Leadbetter et al. (1983) for greater detail and more examples. The Extremal Types Theorem is an asymptotic result that considers the possible limiting distribution of linearly normalised maxima as n tends to infinity. This movie considers examples where this limiting result holds and illustrates graphically the closeness of the limiting approximation provided by the relevant GEV limit to the true finite-n distribution.

Samples of size n are repeatedly simulated from the distribution chosen using distn. These samples are summarized using a histogram that appears at the top of the movie screen. For each sample the maximum of these n values is calculated, stored and added to another plot, situated below the first plot. This plot is either a histogram or an empirical c.d.f., chosen using a radio button. A rug is added to a histogram provided that it contains no more than 1000 points.

The probability density function (p.d.f.) of the original variables is superimposed on the top histogram. There is a checkbox to add to the bottom plot the exact p.d.f./c.d.f. of the sample maxima and an approximate (large n) GEV p.d.f./c.d.f. implied by the ETT. The GEV shape parameter ξ that applies in the limiting case is used. The GEV location µ and scale σ are set based on constants used to normalise the maxima to achieve the GEV limit. Specifically, µ is set at the 100(1-1/n)% quantile of the distribution distn and σ at (1 / n) / f(µ), where f is the density function of the distribution distn.
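The choice of µ and σ described above can be worked through for the exponential case in base R. The sketch below is illustrative only and does not call ett or any revdbayes function; the helper names p_max_exact and p_max_gev and the grid of x values are assumptions made for the example. For the exponential distribution with rate 1 the limiting shape parameter is ξ = 0, so the GEV c.d.f. reduces to the Gumbel form exp(−exp(−(x − µ)/σ)).

```r
# Sketch only: base R, no smovie or revdbayes functions are called.
n     <- 20                    # sample size, matching the default ett(n = 20)
mu    <- qexp(1 - 1 / n)       # 100(1 - 1/n)% quantile; equals log(n) here
sigma <- (1 / n) / dexp(mu)    # (1 / n) / f(mu); equals 1 here

# Exact c.d.f. of the maximum of n independent Exp(1) variables
p_max_exact <- function(x) pexp(x)^n

# GEV c.d.f. with shape 0 (Gumbel), location mu and scale sigma
p_max_gev <- function(x) exp(-exp(-(x - mu) / sigma))

# Largest absolute difference between the two c.d.f.s over a grid
x <- seq(0, 10, by = 0.01)
max_abs_diff <- max(abs(p_max_exact(x) - p_max_gev(x)))
c(mu = mu, sigma = sigma, max_abs_diff = max_abs_diff)
```

Re-running the sketch with a larger n shrinks max_abs_diff, which is the convergence that the movie shows graphically when the + button is clicked.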


Once it starts, four aspects of this movie are controlled by the user.

• There are buttons to increase (+) or decrease (-) the sample size, that is, the number of values over which a maximum is calculated.

• Each time the button labelled "simulate another n_add samples of size n" is clicked, n_add new samples are simulated and their sample maxima are added to the bottom histogram.

• There is a button to switch the bottom plot from displaying a histogram of the simulated maxima, the exact p.d.f. and the limiting GEV p.d.f. to the empirical c.d.f. of the simulated data, the exact c.d.f. and the limiting GEV c.d.f.

• There is a box that can be used to display only the bottom plot. This option is selected automatically if the sample size n exceeds 100000.

For further detail about the examples specified by distn see Chapter 1 of Leadbetter et al. (1983) and Chapter 3 of Coles (2001). In many of these examples ("exponential", "normal", "gamma", "lognormal", "chi-squared", "weibull", "ngev") the limiting GEV distribution has a shape parameter that is equal to 0. In the "uniform" case the limiting shape parameter is -1 and in the "beta" case it is -1 / shape2, where shape2 is the second parameter of the Beta distribution. In the other cases the limiting shape parameter is positive, with respective values shape ("gp", see gp), 1 / df ("t", see TDist), 1 ("cauchy", see Cauchy), 2 / df2 ("f", see FDist).

Value

Nothing is returned, only the animation is produced.

References

Coles, S. G. (2001) An Introduction to Statistical Modeling of Extreme Values, Springer-Verlag, London. http://dx.doi.org/10.1007/978-1-4471-3675-0_3

Leadbetter, M., Lindgren, G. and Rootzen, H. (1983) Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, New York. http://dx.doi.org/10.1007/978-1-4612-5449-2

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# Exponential data: xi = 0
ett()

# Uniform data: xi = -1
ett(distn = "uniform")

# Student t data: xi = 1 / df
ett(distn = "t", params = list(df = 5))


FPearson Fisher’s transformation of the Pearson product moment correlation coefficient

Description

Density, distribution function, quantile function and random generator for the distribution of Fisher’s transformation of Pearson’s product moment correlation, based on a random sample from a bivariate normal distribution.

Usage

dFPearson(x, N, rho = 0, log = FALSE)

pFPearson(q, N, rho = 0, lower.tail = TRUE, log.p = FALSE)

qFPearson(p, N, rho = 0, lower.tail = TRUE, log.p = FALSE)

rFPearson(n, N, rho = 0, lower.tail = TRUE, log.p = FALSE)

Arguments

x, q Numeric vectors of quantiles.

N Numeric vector. Number of observations, (N > 3).

rho Numeric vector. Population correlations, (-1 < rho < 1).

log, log.p A logical scalar; if TRUE, probabilities p are given as log(p).

lower.tail A logical scalar. If TRUE (default), probabilities are P[X <= x], otherwise, P[X > x].

p A numeric vector of probabilities in [0,1].

n Numeric scalar. The number of observations to be simulated. If length(n) > 1 then length(n) is taken to be the number required.

References

Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population. Biometrika, 10(4), 507-521. http://dx.doi.org/10.2307/2331838

Fisher, R. A. (1921). On the "probable error" of a coefficient of correlation deduced from a small sample. Metron, 1, 3-32. https://digital.library.adelaide.edu.au/dspace/bitstream/2440/15169/1/14.pdf

See Also

Pearson in the SuppDists package for dpqr functions for the untransformed Pearson product moment correlation coefficient.

correlation: correlation sampling distribution movie.


Examples

dFPearson(-1:1, N = 10)
dFPearson(0, N = 11:20)

pFPearson(0.5, N = 10)
pFPearson(0.5, N = 10, rho = c(0, 0.3))

qFPearson((1:9)/10, N = 10, rho = 0.2)
qFPearson(0.5, N = c(10, 20), rho = c(0, 0.3))

rFPearson(6, N = 10, rho = 0.6)
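To see what is being transformed, the sketch below simulates sample correlations from a bivariate normal distribution using base R only and applies Fisher's transformation, atanh(r). It does not call the FPearson functions, which are based on the distribution described above; it compares the simulated values with the familiar large-sample normal approximation (mean atanh(rho), standard deviation 1 / sqrt(N - 3)). The helper sim_r and the settings are illustrative assumptions.

```r
set.seed(1)
N <- 10        # sample size
rho <- 0.6     # population correlation
nsim <- 5000   # number of simulated samples

# Simulate one sample correlation from a bivariate normal distribution
# with correlation rho (hypothetical helper, not an smovie function).
sim_r <- function(N, rho) {
  x <- rnorm(N)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(N)
  cor(x, y)
}

r <- replicate(nsim, sim_r(N, rho))
z <- atanh(r)  # Fisher's transformation: 0.5 * log((1 + r) / (1 - r))

# Large-sample normal approximation to the distribution of z.
approx_mean <- atanh(rho)
approx_sd <- 1 / sqrt(N - 3)

print(c(sim_mean = mean(z), approx_mean = approx_mean))
print(c(sim_sd = sd(z), approx_sd = approx_sd))
```

With N as small as 10 the approximation is only rough, which is one reason for working with the distribution provided by dFPearson, pFPearson, qFPearson and rFPearson instead.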

lev_inf Leverage and influence in simple linear regression movie

Description

A movie to examine the influence of a single outlying observation on a least squares regression line.

Usage

lev_inf(association = c("positive", "negative", "none"), n = 25,
  panel_plot = TRUE, hscale = NA, vscale = hscale)

Arguments

association A character scalar. Determines the type of association between (non-outlying) observations: "positive" for positive linear association; "negative" for negative linear association; "none" for no association.

n An integer scalar. The size of the sample of (non-outlying) observations.

panel_plot A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

Details

n pairs of observations are simulated with the property that the mean of the response variable y is a linear function of the values of the explanatory variable x. These pairs of observations are plotted using filled black circles. An extra observation is plotted using a filled red circle. Initially this observation is placed in the middle of the plot.

Superimposed on the plot are two least squares regression lines: one based on all the data (‘with observation’) and one in which the ‘red’ observation has been removed (‘without observation’). Initially these lines coincide.

The location of the ‘red’ observation can be changed using the +/- buttons so that the effect of the position of this observation on the ‘with observation’ line can be seen.

We see that if the red observation is outlying, that is, it is far from the least squares line fitted to the other observations, then its influence on the least squares regression line depends on its x-coordinate. If its x-coordinate is much larger or smaller than the x-coordinates of the other observations (high leverage) then the influence is higher than if it has a similar x-coordinate to the other observations (low leverage). An observation with high leverage does not necessarily have high influence: if its y-coordinate falls very close to the regression line fitted to the other observations then its influence will be low.
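The behaviour described above can be reproduced numerically with lm. The sketch below is a base-R illustration rather than the code behind lev_inf: the simulated data, the helpers refit, line_at and slope, and the chosen positions of the extra observation are all assumptions made for this example. It adds one extra observation in three positions and records the change in the fitted slope and the leverage (hat value) of the extra point.

```r
set.seed(1)
n <- 25
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
fit_without <- lm(y ~ x)          # the 'without observation' line

# Refit after adding one extra ('red') observation at (x0, y0).
refit <- function(x0, y0) {
  lm(y ~ x, data = data.frame(x = c(x, x0), y = c(y, y0)))
}
# Height of the 'without observation' line at x0.
line_at <- function(x0) unname(predict(fit_without, data.frame(x = x0)))
slope <- function(fit) unname(coef(fit)["x"])

x_low <- mean(x)       # similar x to the other observations: low leverage
x_high <- mean(x) + 6  # far from the other observations: high leverage
offset <- 5            # vertical distance from the 'without' line

fit_low <- refit(x_low, line_at(x_low) + offset)     # outlying, low leverage
fit_high <- refit(x_high, line_at(x_high) + offset)  # outlying, high leverage
fit_on_line <- refit(x_high, line_at(x_high))        # high leverage, on the line

change_low <- slope(fit_low) - slope(fit_without)
change_high <- slope(fit_high) - slope(fit_without)
change_on_line <- slope(fit_on_line) - slope(fit_without)

# Leverage of the extra observation, which is the last row of the data.
lev_low <- unname(hatvalues(fit_low)[n + 1])
lev_high <- unname(hatvalues(fit_high)[n + 1])

print(round(c(change_low = change_low, change_high = change_high,
              change_on_line = change_on_line), 4))
print(round(c(lev_low = lev_low, lev_high = lev_high), 3))
```

The same vertical offset moves the slope substantially only when the extra observation has high leverage, and a high-leverage observation that lies on the ‘without observation’ line leaves the fit unchanged.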

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# Positive association
lev_inf()

# No association
lev_inf(association = "none")

movies Main menu for smovie movies

Description

Uses the template rp.cartoons function to produce a menu panel from which any of the movies in the smovie package can be launched. For greater control of an individual example call the relevant function directly.

Usage

movies(fixed_range = TRUE, hscale = NA, vscale = hscale)

Arguments

fixed_range A logical scalar. Only relevant to the Discrete and Continuous menus. If TRUE then in the call to discrete or continuous the argument var_support (discrete) or var_range (continuous) is set so that the values on the horizontal axes are fixed at values that enable the movie to show the effects of changing the parameters of the distribution, at least locally to the default initial values for the parameters. For greater control call discrete or continuous directly.

hscale, vscale Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

See Also

discrete, continuous, clt, ett, correlation, lev_inf, wws, shypo.

smovie: general information about smovie.

Examples

movies()

shypo Testing simple hypotheses

Description

A movie to illustrate statistical concepts involved in the testing of one simple hypothesis against another. The example used is a random sample from a normal distribution whose variance is assumed to be known. The simple hypotheses relate to the value of the mean µ.

Usage

shypo(mu0 = 0, sd = 6, eff = sd, n = 10, a = mu0 + eff/2,
  target_alpha = 0.05, target_beta = 0.1, panel_plot = TRUE,
  hscale = NA, vscale = hscale, delta_n = 1,
  delta_a = sd/(10 * sqrt(n)), delta_eff = sd, delta_mu0 = 1,
  delta_sd = 1)

Arguments

mu0 A numeric scalar. The value of µ under the null hypothesis H0 with which to start the movie.

sd A positive numeric scalar. The (common) standard deviation σ of the normal distributions of the data under the two hypotheses.

eff A numeric scalar. The effect size. The amount by which the value of µ under the alternative hypothesis is greater than the value mu0 under the null hypothesis. That is, mu1 = eff + mu0. eff must be non-negative.

n A positive integer scalar. The sample size with which to start the movie.

a A numeric scalar. The critical value of the test with which to start the movie. H0 is rejected if the sample mean is greater than a.

target_alpha A numeric scalar in (0,1). The target value of the type I error to be achieved by setting a and/or n if the user asks for this using a radio button.

target_beta A numeric scalar in (0,1). The target value of the type II error to be achieved by setting a and/or n if the user asks for this using a radio button.

panel_plot A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

delta_mu0, delta_eff, delta_a, delta_n, delta_sd Numeric scalars. The respective amounts by which the values of mu0, eff, a, n and sd are increased (or decreased) after one click of the + (or -) button in the parameter window.

Details

The movie is based on two plots.

The top plot shows the (normal) probability density functions of the sample mean under the null hypothesis H0 (mean mu0) and the alternative hypothesis H1 (mean mu1, where mu1 > mu0), with the values of mu0 and mu1 indicated by vertical dashed lines. H0 is rejected if the sample mean exceeds the critical value a, which is indicated by a vertical black line.

The bottom plot shows how the probabilities of making a type I or type II error (alpha and beta respectively) depend on the value of a, by plotting these probabilities against a.

A parameter window enables the user to change the values of n, a, mu0, eff = mu1 - mu0 or sd by clicking the +/- buttons.

Radio buttons can be used either to:

• set a to achieve the target type I error probability target_alpha, based on the current value of n;

• set a and (integer) n to achieve (or better) the respective target type I and type II error probabilities of target_alpha and target_beta, based on the current value of n.
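The quantities shown in the two plots follow from the normal distribution of the sample mean, which has standard deviation sd / sqrt(n) under both hypotheses. The sketch below reproduces them in base R for the settings used in the example call further down. It is the standard textbook calculation, written here as an assumption about what the radio buttons compute rather than a copy of the package's internals; the names sigma, type1 and type2 are illustrative.

```r
mu0 <- 0; eff <- 5; sigma <- 6; n <- 16; a <- 2.3
mu1 <- mu0 + eff
target_alpha <- 0.05
target_beta <- 0.1

# Error probabilities when H0 is rejected if the sample mean exceeds a.
type1 <- function(a, n) pnorm(a, mean = mu0, sd = sigma / sqrt(n), lower.tail = FALSE)
type2 <- function(a, n) pnorm(a, mean = mu1, sd = sigma / sqrt(n))

alpha_start <- type1(a, n)
beta_start <- type2(a, n)

# 1. For fixed n, the critical value a that gives alpha = target_alpha.
a_alpha <- mu0 + qnorm(1 - target_alpha) * sigma / sqrt(n)

# 2. The smallest integer n, with matching a, for which
#    alpha = target_alpha and beta <= target_beta.
z_sum <- qnorm(1 - target_alpha) + qnorm(1 - target_beta)
n_req <- ceiling((z_sum * sigma / eff)^2)
a_req <- mu0 + qnorm(1 - target_alpha) * sigma / sqrt(n_req)

print(round(c(alpha_start = alpha_start, beta_start = beta_start), 4))
print(round(c(a_alpha = a_alpha, n_req = n_req, a_req = a_req,
              beta_req = type2(a_req, n_req)), 4))
```

Because n must be an integer, the type II error probability at n_req is generally a little below target_beta, which is why the second option is described as achieving the targets "or better".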

Value

Nothing is returned, only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# 1. Change a (for fixed n) to achieve alpha = 0.05
# 2. Change a and n to achieve alpha <= 0.05 and beta <= 0.1
shypo(mu0 = 0, eff = 5, n = 16, a = 2.3, delta_a = 0.01)

smovie smovie: some movies to illustrate concepts in statistics

Description

These movies are animations used to illustrate key statistical ideas. They are produced using the rpanel package, which has BWidget as a system requirement. BWidget is included in the R installers for Windows and macOS. For other platforms please see Section 1.1.7 of Writing R Extensions for installation advice.

Details

When one of these functions is called R opens up a small parameter window containing clickable buttons that can be used to change parameters underlying the plot. For the effects of these buttons see the documentation of the individual functions.

See vignette("smovie-vignette", package = "smovie") for an overview of the package and the user-friendly menu panel.

There are movies on the following topics.

Probability distributions

• Discrete distributions

• Continuous distributions

Sampling distributions

• Central Limit Theorem: sampling distribution of a sample mean

• Extremal Types Theorem: sampling distribution of a sample maximum

• Pearson product moment correlation coefficient

Regression

• Leverage and influence in simple linear regression

Hypothesis testing

• Wald, Wilks and Score tests

• Testing simple hypotheses

References

Bowman, A., Crawford, E., Alexander, G. and Bowman, R. W. (2007). rpanel: Simple Interactive Controls for R Functions Using the tcltk Package. Journal of Statistical Software, 17(9), 1-18. http://www.jstatsoft.org/v17/i09/.

wws Wald, Wilks and Score tests

Description

A movie to illustrate the nature of the Wald, Wilks and score likelihood-based test statistics, for a model with a scalar unknown parameter θ. The user can change the value of the parameter under a simple null hypothesis and observe the effect on the test statistics and (approximate) p-values associated with the tests of this hypothesis against the general alternative. The user can specify their own log-likelihood or use one of two in-built examples.

Usage

wws(model = c("norm", "binom"), theta_range = NULL, mult = 3,
  theta0 = if (!is.null(theta_range)) sum(c(0.25, 0.75) * theta_range) else NULL,
  panel_plot = TRUE, hscale = NA, vscale = hscale,
  delta_theta0 = if (!is.null(theta_range)) abs(diff(theta_range))/20 else NULL,
  theta_mle = NULL, loglik = NULL, alg_score = NULL, alg_obs_info = NULL,
  digits = 3, ...)

Arguments

model A character scalar. Name of the distribution on which one of two in-built examples is based.

If model = "norm" then the setting is a random sample of size n from a normal distribution with unknown mean mu = θ and known standard deviation sigma.

If model = "binom" then the setting is a random sample from a Bernoulli distribution with unknown success probability θ.

The behaviour of these examples can be controlled using arguments supplied via .... In particular, the data can be supplied using data. If model = "norm" then n, mu, and sigma can also be chosen. The default cases for these examples are:

• model = "norm": n = 10, mu = 0, sigma = 1 and data contains a sample of size n simulated, using rnorm (see Normal), from a normal distribution with mean mu and standard deviation sigma.

• model = "binom": data = c(7, 13), that is, 7 successes and 13 failures observed in 20 trials. For the purposes of this movie there must be at least one success and at least one failure.

theta_range A numeric vector of length 2. The range of values of θ over which to plot the log-likelihood. If theta_range is not supplied then the argument mult is used to set the range automatically.

mult A positive numeric scalar. If theta_range is not supplied then an interval of width 2 x mult standard errors centred on theta_mle is used. If model = "binom" then theta_range is truncated to (0,1) if necessary.

theta0 A numeric scalar. The value of θ under the null hypothesis to use at the start of the movie.

panel_plot A logical parameter that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot library is required.

hscale, vscale Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

delta_theta0 A numeric scalar. The amount by which the value of theta0 is increased (or decreased) after one click of the + (or -) button in the parameter window.

theta_mle A numeric scalar. The user may use this to supply the value of the maximum likelihood estimate (MLE) of θ. Otherwise, optim is used to search for the MLE, using theta0 as the initial value and theta_range as bounds within which to search.

loglik An R function, vectorised with respect to its first argument, that returns the value of the log-likelihood (up to an additive constant). The movie will not work if the observed information is not finite at the maximum likelihood estimate.

alg_score An R function that returns the score function, that is, the derivative of loglik with respect to θ.

alg_obs_info An R function that returns the observed information, that is, the negated second derivative of loglik with respect to θ.

digits An integer indicating the number of significant digits to be used in the displayed values of the test statistics and p-values. See signif.

... Additional arguments to be passed to loglik, alg_score and alg_obs_info if loglik is supplied, or to functions relating to the in-built examples otherwise. See the description of model above for details.

Details

The Wald, Wilks (or likelihood ratio) and Score tests are asymptotically equivalent tests of a simple hypothesis that a parameter of interest θ is equal to a particular value θ0. The test statistics are all based on the log-likelihood l(θ) for θ but they differ in the way that they measure the distance between the maximum likelihood estimate (MLE) of θ and θ0. The Wilks statistic is based on the amount by which the log-likelihood evaluated at θ0 is smaller than the log-likelihood evaluated at the MLE. The Wald statistic is based on the absolute difference between the MLE and θ0. The score test is based on the gradient of the log-likelihood (the score function) at θ0. For details see Azzalini (1996).

This movie illustrates the differences between the test statistics for simple models with a single scalar parameter. In the (default) normal example the three test statistics coincide. This is not true in general, as shown by the other in-built example (model = "binom").
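Why the three statistics coincide in the normal example can be seen from their usual textbook definitions. These definitions are an assumption made here, since the exact forms used by wws are not spelled out in this section. Write \bar{x} for the sample mean of n observations from a normal distribution with mean θ and known standard deviation σ, U for the score function and J for the observed information. The log-likelihood is exactly quadratic in θ, so each measure of distance reduces to the same quantity:

```latex
\begin{aligned}
\ell(\theta) &= -\frac{n(\bar{x}-\theta)^2}{2\sigma^2} + \text{const}, \qquad
\hat{\theta} = \bar{x}, \qquad
U(\theta) = \ell'(\theta) = \frac{n(\bar{x}-\theta)}{\sigma^2}, \qquad
J(\theta) = -\ell''(\theta) = \frac{n}{\sigma^2}, \\
W_{\text{Wilks}} &= 2\{\ell(\hat{\theta}) - \ell(\theta_0)\}
  = \frac{n(\bar{x}-\theta_0)^2}{\sigma^2}, \\
W_{\text{Wald}} &= (\hat{\theta}-\theta_0)^2 \, J(\hat{\theta})
  = \frac{n(\bar{x}-\theta_0)^2}{\sigma^2}, \\
W_{\text{score}} &= \frac{U(\theta_0)^2}{J(\theta_0)}
  = \frac{n(\bar{x}-\theta_0)^2}{\sigma^2}.
\end{aligned}
```

For a log-likelihood that is not quadratic, such as the Bernoulli one, the curvature changes with θ and the three expressions no longer agree exactly.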

A user-supplied log-likelihood can be provided via loglik.
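As a rough numerical companion to a user-supplied log-likelihood, the sketch below evaluates the three statistics for a binomial experiment with 7 successes and 13 failures, testing θ0 = 0.5. It uses base R only and the common textbook forms (Wilks: twice the drop in log-likelihood; Wald and score: scaled by the observed information at the MLE and at θ0 respectively). Whether wws scales the statistics in exactly this way is an assumption; the function names below are illustrative and take no extra arguments, unlike functions passed to wws.

```r
n_success <- 7
n_failure <- 13
theta0 <- 0.5

# Log-likelihood (up to an additive constant), score and observed information.
loglik <- function(p) n_success * log(p) + n_failure * log(1 - p)
score <- function(p) n_success / p - n_failure / (1 - p)
obs_info <- function(p) n_success / p^2 + n_failure / (1 - p)^2

theta_mle <- n_success / (n_success + n_failure)

# Assumed textbook forms of the three test statistics.
wilks <- 2 * (loglik(theta_mle) - loglik(theta0))
wald <- (theta_mle - theta0)^2 * obs_info(theta_mle)
score_stat <- score(theta0)^2 / obs_info(theta0)

stats <- c(Wald = wald, Wilks = wilks, Score = score_stat)
# Approximate p-values from the chi-squared distribution with 1 degree of freedom.
p_values <- pchisq(stats, df = 1, lower.tail = FALSE)

print(round(rbind(statistic = stats, p_value = p_values), 3))
```

Under these assumptions the statistics are close but not equal (roughly 1.98, 1.83 and 1.80), in line with the statement that they coincide only in special cases such as the normal example.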

Value

Nothing is returned, only the animation is produced.

References

Azzalini, A. (1996) Statistical Inference Based on the Likelihood, Chapman & Hall / CRC, London.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

Examples

# N(theta, 1) example, test statistics equivalent
wws(theta0 = 0.8)

# binomial(20, theta) example, test statistics similar
wws(theta0 = 0.5, model = "binom")

# binomial(20, theta) example, test statistic rather different
# for theta0 distant from theta_mle
wws(theta0 = 0.9, model = "binom", data = c(19, 1), theta_range = c(0.1, 0.99))

# binomial(2000, theta) example, test statistics very similar
wws(theta0 = 0.5, model = "binom", data = c(1000, 1000))

# Normal example with user-supplied data (x is passed via the data argument)
set.seed(47)
x <- rnorm(10)
wws(theta0 = 0.2, model = "norm", theta_range = c(-1, 1), data = x)

# Log-likelihood for a binomial experiment (up to an additive constant)
bin_loglik <- function(p, n_success, n_failure) {
  return(n_success * log(p) + n_failure * log(1 - p))
}

wws(loglik = bin_loglik, theta0 = 0.5, theta_range = c(0.1, 0.7),
    theta_mle = 7 / 20, n_success = 7, n_failure = 13)

bin_alg_score <- function(p, n_success, n_failure) {
  return(n_success / p - n_failure / (1 - p))
}
bin_alg_obs_info <- function(p, n_success, n_failure) {
  return(n_success / p ^ 2 + n_failure / (1 - p) ^ 2)
}
wws(loglik = bin_loglik, theta0 = 0.5, theta_range = c(0.1, 0.7),
    theta_mle = 7 / 20, n_success = 7, n_failure = 13,
    alg_score = bin_alg_score, alg_obs_info = bin_alg_obs_info)
