Elementary statistics for foresters
Lecture 3
Socrates/Erasmus Program @ WAU
Spring semester 2005/2006
Statistical distributions
• Empirical distributions
• Why distributions?
• Variable types
• Sample theoretical distributions
– Normal distribution
– Binomial distribution
Empirical distributions
• Graphical representation of the data in the form of a frequency distribution, histogram, polygon, etc.
Graphical description of data
[Figure: histogram for dk. x-axis: dk, 0 to 18; y-axis: frequency, 0 to 100]
Graphical description of data
[Figure: frequency polygon for dk. x-axis: dk, 0 to 18; y-axis: frequency, 0 to 100]
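A frequency distribution like the one plotted above can be built by counting observations per class. The sketch below is only an illustration: the measurements are invented, the class width of 3 is an assumption read off the axis ticks, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(values, class_width, start=0.0):
    """Count how many values fall into each class [lower, lower + class_width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // class_width)
        counts[index] += 1
    table = []
    for index in sorted(counts):
        lower = start + index * class_width
        table.append((lower, lower + class_width, counts[index]))
    return table

# Hypothetical measurements (illustrative only, not data from the lecture).
dk_values = [1.2, 2.8, 3.1, 4.5, 4.9, 5.0, 6.3, 7.7, 7.9, 8.4, 10.1, 13.6]

table = frequency_table(dk_values, class_width=3)
for lower, upper, count in table:
    # A row of '#' characters gives a crude text histogram.
    print(f"[{lower:4.1f}, {upper:4.1f})  {count:2d}  {'#' * count}")
```

Joining the class midpoints and their counts with straight lines would give the corresponding frequency polygon.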
Why distributions?
• In some cases it is necessary to formulate hypotheses about the specific distribution of the investigated variable.
– For example, we can assume that wood density follows the normal distribution, and use this information for modeling and statistical inference.
Why distributions?
• When using distributions for predictive purposes it is often desirable to understand the shape of the underlying distribution of the population.
• To determine this distribution, it is common to fit the observed distribution to a theoretical distribution by comparing the observed frequencies to the expected frequencies of the theoretical distribution.
Why distributions?
• To estimate the parameters of the theoretical distribution, the maximum likelihood method or the method of moments is used.
• Another common application of theoretical distributions is verifying the assumption of normality before using a parametric test.
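The fitting procedure described above can be sketched as follows. This is a minimal illustration, assuming an invented wood-density sample and arbitrarily chosen class boundaries; it fits a normal distribution by the method of moments and compares observed with expected class frequencies.

```python
from statistics import NormalDist, fmean, pstdev

# Hypothetical wood-density sample in g/cm^3 (illustrative only).
sample = [0.41, 0.44, 0.45, 0.47, 0.48, 0.48, 0.50, 0.51, 0.53, 0.56]

# Method of moments: match the sample mean and standard deviation.
# For the normal distribution these are also the maximum likelihood
# estimates when the standard deviation uses the divisor n (pstdev).
mu_hat = fmean(sample)
sigma_hat = pstdev(sample)
fitted = NormalDist(mu_hat, sigma_hat)

# Class boundaries chosen for illustration.
edges = [0.40, 0.45, 0.50, 0.55, 0.60]
n = len(sample)
rows = []
for lower, upper in zip(edges, edges[1:]):
    observed = sum(lower <= x < upper for x in sample)
    expected = n * (fitted.cdf(upper) - fitted.cdf(lower))
    rows.append((lower, upper, observed, expected))
    print(f"[{lower:.2f}, {upper:.2f})  observed={observed}  expected={expected:.2f}")
```

Large gaps between the observed and expected columns would suggest that the chosen theoretical distribution describes the data poorly.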
Variable types
• Variables can be qualitative (describing membership in a group or category, e.g. sex, hair color, tree species) or quantitative (measured on a numerical scale, so that addition and averaging make sense, e.g. DBH, height, crown ratio, ...).
Variable and distribution types
• If a variable can take only a finite (or countable) set of values, it is a discrete variable (e.g. age in years, DBH class, ...), and it is described by a probability distribution (probability function).
• If a variable can take any value (or any value from a given interval), it is a continuous variable (e.g. height, DBH, ...), and it is described by a probability density function.
Variable and distribution types
• In many cases, due to measurement limitations or simplifications, continuous variables can be treated as discrete (e.g. when DBH is recorded rounded to the nearest 1 mm).
Sample distributions
• Beta distribution is used to model the distribution of order statistics, and to represent processes with natural lower and upper limits.
• Binomial distribution is used for describing binomial events, such as the number of males and females in a random sample, or the number of defective components in samples of n units taken from a production process.
Sample distributions
• Chi-square distribution is most frequently used in modeling random variables representing frequencies.
• Exponential distribution is frequently used to model the time interval between successive random events.
• Logistic distribution is used to model binary responses.
Sample distributions
• Normal distribution is a theoretical function commonly used in inferential statistics as an approximation to sampling distributions.
• Poisson distribution is used to model rare events.
• Weibull distribution is often used as a model of failure time or in reliability testing.
• ...
Normal distribution
• The most frequently used distribution in statistics
• A basic assumption of many statistical methods, such as estimation, hypothesis testing, regression and correlation, analysis of variance, ...
Normal distribution
• Variables whose values result from the sum of a very large number of small, independent random effects usually follow the normal distribution, at least approximately.
• The normal distribution is an example of a distribution of a continuous variable. Its probability density function is:
Normal distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
• where:
– x is the variable of interest
– µ is the arithmetic mean
– σ is the standard deviation
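The density of the normal distribution, f(x) = 1/(σ√(2π)) · exp(-(x-µ)²/(2σ²)), can be evaluated directly. The sketch below is illustrative only; the values of µ and σ are assumptions, and `normal_pdf` is a hypothetical helper name.

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of the normal distribution N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 25.0, 4.0   # hypothetical mean and standard deviation
for x in (17.0, 21.0, 25.0, 29.0, 33.0):
    print(f"f({x}) = {normal_pdf(x, mu, sigma):.5f}")
```

The printed values rise up to x = µ and fall afterwards, and points at equal distance from µ give the same density.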
Normal distribution
Normal distribution properties:
• the probability density function rises for x < µ and falls for x > µ
• the probability density function has its maximum at x = µ
• the expected value of the variable X: E(X) = µ
• the variance of the variable X: D²(X) = σ²
Normal distribution properties
• at x = µ the probability density function has a value of $f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$
• the distribution has 2 inflection points (where the function changes from concave to convex or from convex to concave) at x = µ - σ and x = µ + σ
• the normal distribution is symmetric, and the axis of symmetry is the line x = µ
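These properties can be checked numerically. The sketch below is an illustration under assumed parameters (µ = 0, σ = 2). It uses `statistics.NormalDist` from the Python standard library and a finite-difference estimate of the second derivative, whose sign changes at an inflection point.

```python
import math
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=2.0)   # hypothetical parameters

def second_difference(x, h=1e-3):
    """Central finite-difference approximation of f''(x)."""
    return (dist.pdf(x + h) - 2.0 * dist.pdf(x) + dist.pdf(x - h)) / (h * h)

# Value at the maximum, compared with 1 / (sigma * sqrt(2 * pi)).
peak = dist.pdf(dist.mean)
peak_formula = 1.0 / (dist.stdev * math.sqrt(2.0 * math.pi))

# The curvature changes sign when x passes mu + sigma.
inside = second_difference(dist.mean + 0.9 * dist.stdev)    # concave part
outside = second_difference(dist.mean + 1.1 * dist.stdev)   # convex part

print(f"peak = {peak:.6f}, formula = {peak_formula:.6f}")
print(f"f'' just inside mu + sigma:  {inside:.6f}")
print(f"f'' just outside mu + sigma: {outside:.6f}")
```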
Normal distribution properties:
• the smaller the variance (standard deviation), the narrower and taller the probability density function
• the cumulative distribution function of the normal distribution is the integral of the probability density function: $F(x) = \int_{-\infty}^{x} f(t)\, dt$
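The relation between the density and the cumulative distribution function can be illustrated by integrating the density numerically. This is a rough sketch under assumed parameters; `cdf_by_integration` is a hypothetical helper, and the trapezoidal rule starting 10σ below the mean is just one simple choice.

```python
from statistics import NormalDist

def cdf_by_integration(dist, x, steps=20000):
    """Approximate F(x) by the trapezoidal rule, starting 10 sigma below the mean."""
    lower = dist.mean - 10.0 * dist.stdev
    h = (x - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(x))
    for i in range(1, steps):
        total += dist.pdf(lower + i * h)
    return total * h

dist = NormalDist(mu=25.0, sigma=4.0)   # hypothetical parameters
approx = cdf_by_integration(dist, 29.0)
exact = dist.cdf(29.0)
print(f"numerical integral: {approx:.6f}   library cdf: {exact:.6f}")
```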
Normal distribution properties:
Standardized normal distribution
• Every normal distribution can be standardized, i.e. transformed into the distribution with mean equal to 0 and standard deviation equal to 1: N(0,1).
• The expected value of the standardized normal distribution equals zero (E(Z) = 0) and its variance equals 1 (D²(Z) = 1).
Standardized normal distribution
• Standardization is simply a change of variable from x to z, where:
$$z = \frac{x - \mu}{\sigma}$$
• The probability density function of the standardized distribution is:
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$$
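A short sketch of standardization, z = (x - µ)/σ, with assumed values of µ, σ and x. It shows that the probability of falling below x under N(µ, σ) equals the probability of falling below z under N(0, 1).

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Convert a value x from N(mu, sigma) to the standard score z."""
    return (x - mu) / sigma

mu, sigma = 25.0, 4.0   # hypothetical population parameters
x = 31.0
z = standardize(x, mu, sigma)

p_original = NormalDist(mu, sigma).cdf(x)   # P(X <= x) on the original scale
p_standard = NormalDist(0.0, 1.0).cdf(z)    # P(Z <= z) on the standardized scale
print(f"z = {z}, P(X <= x) = {p_original:.6f}, P(Z <= z) = {p_standard:.6f}")
```

This is why a single table of N(0, 1) is enough to answer questions about any normal distribution.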
Standardized normal distribution
Normal distribution properties:
• About 68% of all values of the variable fall between µ - σ and µ + σ
• About 95% of all values fall between µ - 2σ and µ + 2σ
• About 99.7% of all values fall between µ - 3σ and µ + 3σ
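The three percentages above can be reproduced from the cumulative distribution function of the standardized normal distribution. A minimal check using the Python standard library:

```python
from statistics import NormalDist

std_normal = NormalDist(0.0, 1.0)
coverage = {}
for k in (1, 2, 3):
    # Probability of falling within k standard deviations of the mean.
    coverage[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} sigma: {coverage[k]:.4f}")
```

Because of standardization, the same proportions hold for any µ and σ.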
Cumulative distribution
[Figure: cumulative histogram for dk. x-axis: dk, 0 to 18; y-axis: frequency, 0 to 250]
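Cumulative frequencies are running totals of the class frequencies. The sketch below uses invented class counts purely for illustration.

```python
from itertools import accumulate

# Hypothetical class frequencies for six dk classes (illustrative only).
frequencies = [12, 45, 90, 60, 30, 8]

cumulative = list(accumulate(frequencies))    # running totals
total = cumulative[-1]
relative = [c / total for c in cumulative]    # empirical cumulative distribution

for f, c, r in zip(frequencies, cumulative, relative):
    print(f"frequency={f:3d}  cumulative={c:3d}  relative={r:.3f}")
```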
Cumulative distribution
Cumulative normal distribution
Binomial distribution
• An example of a discrete probability distribution
• Describes the probability of getting k successes in n independent, repeated trials, where the probability of success in a single trial equals p
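Binomial distribution
$$P(X = k) = \binom{n}{k}\, p^{k} q^{n-k}, \qquad k = 0, 1, \dots, n$$
• where q = 1 - p is the probability of failure in a single trial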
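The probability of exactly k successes in n independent trials with success probability p is C(n, k) · p^k · (1 - p)^(n - k). The sketch below evaluates it for assumed values n = 10 and p = 0.3 (for instance, each of 10 seedlings surviving independently with probability 0.3); `binomial_pmf` is a hypothetical helper name.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

n, p = 10, 0.3   # hypothetical number of trials and success probability
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, pk in enumerate(pmf):
    print(f"P(X = {k:2d}) = {pk:.4f}")
```

The probabilities over k = 0, ..., n sum to 1, as they must for a probability distribution.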
Binomial distribution properties
• the graph of the distribution is symmetric for p = 0.5
• for p < 0.5 the distribution is positively skewed
• for p > 0.5 the distribution is negatively skewed
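The direction of skewness can be verified by computing the third standardized moment directly from the probability function. This is an illustrative sketch with an assumed n = 20; `binomial_skewness` is a hypothetical helper name.

```python
import math

def binomial_skewness(n, p):
    """Skewness (third standardized moment) computed from the probability function."""
    pmf = [math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    third = sum((k - mean) ** 3 * pk for k, pk in enumerate(pmf))
    return third / var ** 1.5

skew = {p: binomial_skewness(20, p) for p in (0.2, 0.5, 0.8)}
for p, s in skew.items():
    print(f"p = {p}: skewness = {s:+.4f}")
```

The sign of the result is positive for p < 0.5, essentially zero for p = 0.5, and negative for p > 0.5.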
Binomial distribution properties
• Expected value: E(X) = n p
• Variance: D²(X) = n p q, where q = 1 - p
• Standard deviation: $\sigma = \sqrt{n p q}$
• Sample exercises using the binomial distribution
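As one possible exercise, the formulas for the expected value, variance and standard deviation can be compared with the values computed directly from the probability function. The parameters n = 10 and p = 0.3 are assumptions chosen for illustration.

```python
import math

n, p = 10, 0.3   # hypothetical number of trials and success probability
q = 1.0 - p

pmf = [math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

# Moments computed from the definition, using the probability function.
mean_from_pmf = sum(k * pk for k, pk in enumerate(pmf))
var_from_pmf = sum((k - mean_from_pmf) ** 2 * pk for k, pk in enumerate(pmf))

# Closed-form expressions: n p, n p q, sqrt(n p q).
mean_formula = n * p
var_formula = n * p * q
sd_formula = math.sqrt(var_formula)

print(f"mean:     {mean_from_pmf:.4f} vs {mean_formula:.4f}")
print(f"variance: {var_from_pmf:.4f} vs {var_formula:.4f}")
print(f"standard deviation: {sd_formula:.4f}")
```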