Post on 28-Dec-2015
6.1 What is Statistics?
• Definition: Statistics – science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively evaluated.
3 Phases:
1. Collecting data
2. Analyzing data
3. Interpreting data
• Descriptive Statistics – summarize and describe a characteristic of a group. Example: batting average.
• Inferential Statistics – used to estimate, infer, or conclude something about a larger group. Example: polls.
• Sample – subset of the group of data available for analysis
• Population – the entire set
• Bias – favoring of certain outcomes over others
• Census – collects data from all members of the population
• Parameter – characteristic value of a population
• Statistic – characteristic value of a sample
6.2 Organizing Data
• Stem and Leaf Diagram
Data: 35, 52, 37, 44, 51, 48, 45, 12
Stem | Leaves
5 | 1 2
4 | 4 5 8
3 | 5 7
2 |
1 | 2
• Frequency Table
Data: 35, 52, 37, 44, 51, 48, 45, 12
Range Frequency
50-59 2
40-49 3
30-39 2
20-29 0
10-19 1
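As a rough illustration, the stem and leaf diagram and the frequency table for the same eight values could be built as follows. This is only a sketch: the function names `stem_and_leaf` and `frequency_table` are made up for this example, and the class width of 10 is assumed from the ranges shown in the table.

```python
from collections import defaultdict

data = [35, 52, 37, 44, 51, 48, 45, 12]  # the data set used on the slides


def stem_and_leaf(values):
    """Map each tens digit (stem) to its sorted units digits (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lo, hi = min(stems), max(stems)
    # List stems from largest to smallest, keeping empty stems in between.
    return {s: stems.get(s, []) for s in range(hi, lo - 1, -1)}


def frequency_table(values, width=10):
    """Count the values in each class of the given width (assumed: 10)."""
    lo, hi = min(values) // width, max(values) // width
    counts = {(k * width, k * width + width - 1): 0
              for k in range(hi, lo - 1, -1)}
    for v in values:
        k = v // width
        counts[(k * width, k * width + width - 1)] += 1
    return counts


for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(map(str, leaves)))

for (low, high), count in frequency_table(data).items():
    print(f"{low}-{high}", count)
```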
6.3 Displaying Data
• Ways to display data:
– Frequency histogram
– Relative frequency histogram
– Multiple bar graph
– Stacked bar graph
– Line graph
– Pie chart
Frequency Histogram (figure: bars for classes 1 to 8; vertical axis from 0 to 30)
Relative Frequency Histogram (figure: bars for classes 1 to 8; vertical axis "Relative Frequency" from 0 to 0.3)
Multiple Bar Graph (figure: side-by-side bars labeled lower, upper, and graduate; vertical axis from 0 to 5000)
Stacked Bar Graph (figure: stacked bars labeled lower, upper, and graduate; vertical axis from 0 to 8000)
Line Graph (figure: lines labeled lower, upper, and graduate; vertical axis from 0 to 5000)
Pie Chart (figure: slices labeled Comm, Edu, Natsci, Socsci, Nurs, and Arts)
6.4 Measures of Central Tendency
• Central Tendency – the propensity of data to be located or clustered about some point.
• Arithmetic Mean – sum of the values of all the observations divided by the total number of observations
• For sample data, the mean is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• For population data, the mean is $\mu = \frac{\sum_{i=1}^{n} x_i}{n}$
• Median – the median is the middle value of a set of data when data is arranged in ascending order
• Finding the median:
1. Arrange the data in increasing order or decreasing order.
2. Determine if n is even or odd.
a. If n is odd, pick the middle value
b. If n is even, take the average of the two middle values
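The steps for the median, together with the arithmetic mean, can be sketched in a few lines. The function names are chosen for this example only, and the data are the eight values from section 6.2.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)          # step 1: arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # step 2a: n is odd, pick the middle value
        return ordered[mid]
    # step 2b: n is even, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [35, 52, 37, 44, 51, 48, 45, 12]
print(mean(data))    # 40.5
print(median(data))  # 44.5
```

Here the single low value 12 pulls the mean below the median.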
• Mode – the value or values that occur most frequently.
Note: If all values occur with the same frequency, then there is no mode.
• Symmetric Distribution – the mean, median, and mode coincide at the center
• Distribution skewed to the left – typically mean < median < mode
• Distribution skewed to the right – typically mode < median < mean
6.5 Measures of Variability
• Definition: The range of a set of n measurements x1, x2, x3, …, xn is the difference between the largest and the smallest values.
• Variance – the population variance is $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Problem with the variance: the units are the original units squared.
• Standard deviation – population standard deviation is the square root of the population variance.
• Sample variance – $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• s = square root of the sample variance
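To make the difference between the two divisors concrete, here is a minimal sketch that computes the population and sample versions directly from their definitions. The function names are illustrative, and the data are the values from section 6.2, treated first as a whole population and then as a sample.

```python
import math


def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    big_n = len(values)
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, dividing by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


data = [35, 52, 37, 44, 51, 48, 45, 12]

sigma = math.sqrt(population_variance(data))  # population standard deviation
s = math.sqrt(sample_variance(data))          # sample standard deviation

print(population_variance(data))  # 148.25
print(round(sigma, 2), round(s, 2))
```

Taking the square root returns the measure to the original units, which addresses the problem with the variance noted above.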
• Short cut formulas for $s^2$ and $\sigma^2$ are given on page 495 (provided with test).
• Short cut formula for frequency data is given on page 499 (provided with test).
• Short cut formulas are genuinely easier to calculate.
• Approximating the standard deviation: $s \approx R/4$, where R is the range.
6.6 Measures of Relative Position
• pth percentile – for data arranged in increasing order, the value such that p% of the data are less than that value and (100 – p)% of the data are greater than that value.
• Z-scores – The sample z-score for a measure x is: $z = \frac{x - \bar{x}}{s}$
The population z-score for a measure x is: $z = \frac{x - \mu}{\sigma}$
The z-score represents the number of standard deviations a value lies away from the mean.
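A small sketch of the sample z-score, using Python's standard `statistics` module for the sample mean and sample standard deviation. The helper name `z_score` is made up for this example, and the data are the values from section 6.2.

```python
from statistics import mean, stdev  # stdev is the sample standard deviation (n - 1)


def z_score(x, center, spread):
    """Number of standard deviations that x lies from the mean."""
    return (x - center) / spread


data = [35, 52, 37, 44, 51, 48, 45, 12]
xbar = mean(data)
s = stdev(data)

# z-score of each observation relative to the sample mean and standard deviation
for x in data:
    print(x, round(z_score(x, xbar, s), 2))
```

The same function gives the population z-score when the population mean and standard deviation are passed instead.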
6.7 Normal Distribution
• Definition: Standardizing – converting data to z-scores.
• Some empirical rules:
1. About 68% of data is within one $\sigma$ of the mean.
2. About 95% of data is within two $\sigma$ of the mean.
3. About 99.7% of data is within three $\sigma$ of the mean.
• The normal distribution looks like:
1. Bell-shaped
2. Symmetric
3. Mean = median = mode
• Definition: Standard normal distribution – normal distribution with $\sigma = 1$ and $\mu = 0$.
The standard normal distribution table (page 511 or in appendix page 647) can be used to determine probabilities for a range of z-values.
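In place of the printed table, the same probabilities can be approximated with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch rather than a substitute for the table on the test; `prob_between` is a name invented for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0, sigma=1)


def prob_between(a, b):
    """P(a < Z < b) for a standard normal variable Z."""
    return std_normal.cdf(b) - std_normal.cdf(a)


# Probability of falling within 1, 2, and 3 standard deviations of the mean
for k in (1, 2, 3):
    print(k, round(prob_between(-k, k), 4))
```

The three printed values correspond to the empirical rules for one, two, and three standard deviations.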
6.8 Confidence Intervals
• Central Limit Theorem: For a large sample size, the random variable $\bar{x}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, where $\mu$ is the population mean of the x’s and $\sigma$ is the population standard deviation of the x’s.
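• Confidence interval for the population mean: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
• $\sigma$ may be replaced by s
• Common levels of confidence ($n \ge 30$):
Level of Confidence (%)   $z_{\alpha/2}$
80   1.28
90   1.645
95   1.96
99   2.575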
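The interval $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$ can be sketched as below. The critical value is computed with `NormalDist.inv_cdf` rather than read from the table, so it agrees with the tabulated values only up to rounding. The sample figures in the usage lines (mean 50, standard deviation 10, n = 100) are hypothetical and chosen only to keep the arithmetic easy.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """z value with area alpha/2 in each tail, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def confidence_interval(xbar, s, n, confidence=0.95):
    """Large-sample interval for the population mean, with sigma replaced by s."""
    half_width = z_critical(confidence) * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical sample summary: mean 50, standard deviation 10, n = 100
low, high = confidence_interval(50, 10, 100, confidence=0.95)
print(round(low, 2), round(high, 2))
```

Raising the confidence level widens the interval, while a larger n narrows it.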
• Margin of Error: the margin of error of an estimate of a sample proportion is given by: $\frac{z_{\alpha/2}}{2\sqrt{n}}$
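As a quick numerical check of $z_{\alpha/2} / (2\sqrt{n})$, the sketch below looks up the critical value in the table of common confidence levels from this section. The sample size of 1000 in the usage line is a hypothetical poll size, and `margin_of_error` is a name chosen for this example.

```python
import math

# Critical values from the table of common confidence levels (percent -> z)
Z_TABLE = {80: 1.28, 90: 1.645, 95: 1.96, 99: 2.575}


def margin_of_error(n, confidence_percent=95):
    """Margin of error for a sample proportion: z / (2 * sqrt(n))."""
    z = Z_TABLE[confidence_percent]
    return z / (2 * math.sqrt(n))


# Hypothetical poll of 1000 respondents at the 95% level
print(round(margin_of_error(1000), 3))  # 0.031
```

Because n appears under a square root, halving the margin of error requires about four times as many respondents.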
6.9 Regression and Correlation
• Scatter Plot – a plot of data consisting of 2 variables
• Linear Regression – modeling the data with the line that “best fits” – usually a “least squares” line or regression line
• Least Squares Line – is the line that minimizes the sum of the squared errors for a set of data points (formulas given on page 531 and shortcut formulas are on page 532 – formulas to be provided on test)
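The textbook formulas on pages 531 and 532 are not reproduced in these slides, so the sketch below uses the standard closed form for the least squares line y = a + bx, with slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept a = ȳ − b·x̄. It may differ in form from the book's version. The function name and the data points are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared errors."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical points lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = least_squares_line(xs, ys)
print(a, b)  # 1.0 2.0
```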
• Correlation Coefficient r – is a measure of the strength of the linear relationship between the 2 random variables x and y.
Note: The closer the correlation is to 1 or –1, the stronger the relationship between the x and y variables. A correlation of zero means there is no evidence of a linear pattern.
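One common way to compute r is the Pearson formula, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below assumes that definition, and the small data sets are invented to show the three cases in the note.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r between paired observations."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data sets
print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))   # points on a rising line
print(correlation([1, 2, 3], [3, 2, 1]))         # points on a falling line
print(correlation([-1, 0, 1], [1, 0, 1]))        # V-shaped pattern, no linear trend
```

The last case gives r = 0 even though y clearly depends on x, which is why a zero correlation indicates only the absence of a linear pattern.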