INEN 270 Engineering Statistics, Fall 2011: Introduction

Upload: huraish-almakaeel

Post on 18-Jan-2018

DESCRIPTION

Lecture 1: Introduction to Statistics. Topics: Purpose; Statistical Concepts; Descriptive Statistics and Some of Their Graphs; Inferential Statistics; Lecture Summary.

TRANSCRIPT

INEN 270 ENGINEERING STATISTICS, Fall 2011
Introduction

Agenda
Lecture 1: Introduction to Statistics
  Purpose
  Statistical Concepts
  Descriptive Statistics and Some of Their Graphs
  Inferential Statistics
  Lecture Summary

Section: Purpose

Why Study Statistics?
You need to know how to evaluate published numerical facts.
Your profession or employment may require you to interpret the results of sampling or to employ statistical methods of analysis to make inferences in your work.

What Is the Purpose of Statistics?
One purpose of statistics is to make sense of your data. Statistics provide information about your data so you can answer questions and make informed business decisions.

Section: Statistical Concepts

Objectives
Explain the use of statistics.
Define population and sample.
Describe the processes involved in statistical analysis.
Compare descriptive and inferential statistics.
Discuss the sampling plan.

Defining the Problem
Before you begin any analysis, you should complete certain tasks.
1. Outline the purpose of the study.
2. Document the study questions.
3. Define the population of interest.
4. Determine the need for sampling.
5. Define the data collection protocol.

Example: Speeding Data
[Figure: observed speeds of 65 mph, 50 mph, and 48 mph shown against the posted speed limit.]

Population and Sample: Basic Definition
STATISTICS: the area of science concerned with the extraction of information from numerical data and its use in making inferences about a population from data that are obtained from a sample.
[Diagram: a sample (set of measurements selected from the population) is drawn from the population (set of all measurements); information is extracted from the sample and used to make inferences about the population.]

Population and Parameter
Population: the set representing all measurements of interest to the investigator.
Parameter: an unknown population characteristic of interest to the investigator.

Sample and Statistic
Sample: a subset of measurements selected from the population of interest.
Statistic: a sample characteristic of interest to the investigator.

Descriptive Statistics
Center of location: mean, median, mode
Variability: variance, standard deviation
Distribution

Examples of Population and Sample
Selecting the proper diet for shrimp or other sea animals is an important aspect of sea farming. A researcher wishes to estimate the average weight of shrimp maintained on a specific diet for a period of 6 months. One hundred shrimp are randomly selected from an artificial pond and each is weighed.
Identify the population.
Identify the sample.
Identify the parameter.
Identify the statistic.

Simple Random Sampling
Convenience Sampling

Process of Statistical Data Analysis
[Diagram: a sampling plan yields a random sample from the population; the sample is described by sample statistics, which are used to make inferences about the population.]

Section: Descriptive Statistics and Some of Their Graphs

Objectives
Compute and interpret statistics describing the location of a set of values, such as the mean, median, and mode.
Compute and interpret statistics describing the variability in a set of values, such as the range and standard deviation.
Compute and interpret the measures of shape: skewness and kurtosis.
Produce graphical displays of data.

Some Frequently Used Statistics and Parameters

Measure of Location
Descriptive statistics that locate the center of your data are called measures of central tendency.

Sample Mean
The sample mean of a set of n measurements (x_1, x_2, ..., x_n) is equal to the sum of the measurements divided by n:
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

Sample Median
Median: the middle value (also known as the 50th percentile).
The median of a set of n measurements (x_1, x_2, ..., x_n) is the value that falls in the middle position when the measurements are ordered from the smallest to the largest.
Here x_1, ..., x_n are arranged in increasing order of magnitude.

Measure of Location: RULE FOR CALCULATING THE MEDIAN
1. Order the measurements from the smallest to the largest.
2. A) If the sample size is odd, the median is the middle measurement.
   B) If the sample size is even, the median is the average of the two middle measurements.
[Figure: position of the median in a small ordered sample (n = 3).]

Percentiles
[Figure: percentiles of a data set. 50th percentile = 80; 25th percentile = 59; one further percentile, whose label is missing, = 91. The first quartile and third quartile are marked.]
Quartiles break your data up into quarters.

Example
A random sample of six values was taken from a population. These values were: x_1 = 7, x_2 = 1, x_3 = 10, x_4 = 8, x_5 = 4, and x_6 = 12. What are the sample mean and sample median for these data?

Sample Mean
\bar{x} = (7 + 1 + 10 + 8 + 4 + 12) / 6 = 42 / 6 = 7

CALCULATIONS FOR THE SAMPLE MEDIAN
1. Order the sample: x_2 = 1, x_5 = 4, x_1 = 7, x_4 = 8, x_3 = 10, x_6 = 12
2. Median: the sample size is even, so MEDIAN = (7 + 8) / 2 = 7.5

Example
Given a set of data: 1.7, 2.2, 3.11, 3.9, and 14.7
Sample mean = 25.61 / 5 = 5.122
Sample median = 3.11 (the middle of the five ordered values)

Example
Consider the following sample: [sample values missing from the transcript]
Which measure of central tendency best describes the central location of the data: THE SAMPLE MEAN OR THE SAMPLE MEDIAN? Why?
Answer: the median. Why? Because there is an outlier (extreme value), 4, in the data set, and the mean is heavily influenced by this single outlier.
Solution: a trimmed mean drops the highest and lowest extreme values and averages the rest. For example, a 5% trimmed mean drops the highest 5% and the lowest 5% of the values and averages the rest.

Sample Mode
What is the mode for the previous example? 44 (occurs twice) and 49 (occurs twice).

Measures of Central Tendency (Mode, Mean, and Median)
How are they related to a given data set? It depends on the skewness of the population.
[Figure: three distributions.
 (a) A bell-shaped distribution.
 (b) A distribution skewed to the left, with marked points A: mean, B: median, C: mode.
 (c) A distribution skewed to the right, with marked points A: mode, B: median, C: mean.]

Suppose the IRS wants to measure the central tendency of the income of the American population. Which measure would you recommend, and why?
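The two worked samples above can be checked with a few lines of Python. This is only a sketch: `trimmed_mean` is a hypothetical helper written for this illustration (it is not from the lecture), and it assumes the trimming proportion is applied to each end of the ordered sample and rounded down to a whole number of values.

```python
import statistics


def trimmed_mean(values, proportion):
    """Average what is left after dropping the lowest and highest
    `proportion` of the ordered values (hypothetical helper)."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # values cut from each end, rounded down
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


six_values = [7, 1, 10, 8, 4, 12]          # first worked example
five_values = [1.7, 2.2, 3.11, 3.9, 14.7]  # second example, with one extreme value

print("mean of six values:   ", statistics.mean(six_values))
print("median of six values: ", statistics.median(six_values))
print("mean of five values:  ", statistics.mean(five_values))
print("median of five values:", statistics.median(five_values))
# With 5 values, a 20% trim removes one value from each end (1.7 and 14.7).
print("20% trimmed mean:     ", trimmed_mean(five_values, 0.20))
```

In the second sample the single large value 14.7 pulls the mean well above the median, while the trimmed mean stays close to the median. The same effect is why the median is usually preferred for strongly right-skewed quantities.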
Hint: Bill Gates. The income distribution is skewed to the right.

Other Measures of Location
Trimmed means
Computed by trimming away a certain percent of both the largest and the smallest values.
Less sensitive to outliers than the mean, but more sensitive than the median.
What is the relationship between the trimmed mean and the median?

The Spread of a Distribution: Variation
Measure: Definition
range: the difference between the maximum and minimum data values
interquartile range (IR or IQR): the difference between the 25th and 75th percentiles
variance: a measure of dispersion of the data around the mean
standard deviation: a measure of dispersion expressed in the same units of measurement as your data (the square root of the variance)
coefficient of variation: the standard deviation as a percentage of the mean

Typical Variation: Standard Deviation
The variance is a measure of variation. The square root of the variance, or standard deviation, is a measure of variation in terms of the original linear scale.
\sigma is the population standard deviation.
s is an estimate of the population standard deviation.

Typical Variation: Average Squared Deviation
Consider the data {3, 4, 8}.

Obs      Data  Deviation  (Deviation)^2
1        3     -2         4
2        4     -1         1
3        8      3         9
Sum      15     0         14
Average  5      0         14/3

Measures of Variability
Sample Range: X_max - X_min
Sample Variance: s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
Sample Standard Deviation: s = \sqrt{s^2}

Sample Variance: Unbiased Estimate of Population Variance
Calculate the unbiased estimate of the population variance by averaging with n-1 instead of n. This estimator is unbiased because, on average, it equals the population variance.

Discrete and Continuous Data
Discrete data are counted: number of defective items, number of accidents.
Continuous data are measured: all possible heights, weights, distances, etc.

Distributions
When you examine the distribution of values for speed, you can determine
the range of possible data values
the frequency of data values
whether the data values accumulate in the middle of the distribution or at one end.
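The {3, 4, 8} example from the variation slides can be worked step by step in Python. This is a small sketch: the variable names are chosen for this illustration, and the only inputs are the three data values given in the lecture.

```python
import math

data = [3, 4, 8]
n = len(data)

mean = sum(data) / n                           # 15 / 3
deviations = [x - mean for x in data]          # distance of each value from the mean
squared = [d ** 2 for d in deviations]         # squared deviations

avg_squared_dev = sum(squared) / n             # divides by n
sample_variance = sum(squared) / (n - 1)       # divides by n - 1 (unbiased estimate)
sample_std = math.sqrt(sample_variance)        # same units as the data
sample_range = max(data) - min(data)           # X_max - X_min
coeff_var_pct = 100 * sample_std / mean        # standard deviation as % of the mean

print("deviations:", deviations, "sum:", sum(deviations))
print("squared deviations:", squared, "sum:", sum(squared))
print("average squared deviation:", avg_squared_dev)
print("sample variance (n - 1):", sample_variance)
print("sample standard deviation:", sample_std)
print("range:", sample_range)
print("coefficient of variation (%):", coeff_var_pct)
```

The deviations always sum to zero, which is why they are squared before averaging. Dividing by n - 1 rather than n gives the larger of the two variance values.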
Graphical Methods and Data Description
Stem and Leaf Plot
Relative Frequency Distribution
Relative Frequency Histogram

Construction of a Stem-Leaf Display
1. List the stem values, in order, in a vertical column.
2. Draw a vertical line to the right of the stem values.
3. For each observation, record the leaf portion of the observation in the row corresponding to the appropriate stem.
4. Reorder the leaves from lowest to highest within each stem row.
5. If the number of leaves appearing in each stem is too large, divide the stems into two groups, the first corresponding to leaves 0 through 4 and the second corresponding to leaves 5 through 9. (This subdivision can be increased to five groups if necessary.)

Car Battery Life
[Table: Stem and Leaf Plot of Battery Life, with columns STEM, LEAF, Frequency. Entries missing from the transcript.]
[Table: Double-Stem and Leaf Plot of Battery Life, with columns STEM, LEAF, Frequency. Entries missing from the transcript.]

Relative Frequency Distribution
Group the data into different classes or intervals.
Count the leaves belonging to each stem.
Each stem defines a class interval.
Dividing each class frequency by the total number of observations gives the proportion of the set of observations in each of the classes.

Relative Frequency Distribution of Battery Life
[Table with columns: Class Interval, Class Midpoint, Frequency f, Relative Frequency. Entries missing from the transcript.]

Relative Frequency Histogram of Battery Life
[Figure: relative frequency histogram.]

Picturing Distributions: Histogram
[Figure: histogram with PERCENT on the vertical axis and bins on the horizontal axis.]
Each bar in the histogram represents a group of values (a bin).
The height of the bar is the percent of values in the bin.

Measures of Shape: Skewness
Measures of Shape: Kurtosis

Box and Whisker Plot or Boxplot

Pth Percentile
The Pth percentile is the value X_p such that p% of the measurements fall below that value and (100-p)% of the measurements fall above it.

Quartile
Quartiles divide the measurements into four parts such that 25% of the measurements are contained in each part.
The first quartile (Lower Quartile) is denoted by Q_1, the second by Q_2, and the third (Upper Quartile) by Q_3.

Data Displays and Graphical Methods
[Figure: a distribution with P% of the measurements below X_p and (100-P)% above it, with Q_1, Q_2, and Q_3 marked.]

InterQuartile Range (IQR)
IQR = Q_3 - Q_1

Outlier
Outliers are observations that are considered to be unusually far removed from the bulk of the data. We label an observation as an outlier when its distance from the box exceeds 1.5 times the interquartile range (in either direction).

Box and Whiskers Plot or Boxplot
The box encloses the interquartile range of the data.
The whiskers show the extreme observations in the sample.

Calculating Fence Values
Lower Inner Fence: Q_1 - 1.5(IQR)
Upper Inner Fence: Q_3 + 1.5(IQR)
Lower Outer Fence: Q_1 - 3(IQR)
Upper Outer Fence: Q_3 + 3(IQR)
[Figure: boxplot labeled, from top to bottom: Maximum, Upper Quartile, Median, Lower Quartile, Minimum.]

A Quick Method
1. Order the data from the smallest to the largest value.
2. Divide the ordered data set into two data sets using the median as the dividing value.
3. Let the lower quartile be the median of the set consisting of the smaller values.
4. Let the upper quartile be the median of the set consisting of the larger values.

Example
Nicotine content was measured in a random sample of 40 cigarettes. [Data table missing from the transcript.]
1. Order the data from the smallest to the largest.
2. Divide the ordered data set into two data sets using the median as the dividing value.
Q_2 = ?  Q_1 = ?  Q_3 = ?  IQR = Q_3 - Q_1 = ?
Each quartile is the average of two adjacent ordered values:
Q_1 = 1.635
Q_2 = 1.77
Q_3 = 2.000
IQR = Q_3 - Q_1 = 0.365

Box-whisker Plot
[Figure: box-and-whisker plot with an outlier marked.]
1. The center of the distribution is indicated by the median line in the box.
2. A measure of the variability is given by the interquartile range, the length of the box.
3. The relative position of the median line indicates the symmetry of the middle 50% of the data.
4. The skewness can be judged from the lengths of the whiskers.
5. The presence of outliers can be examined.
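The quick method and the fence formulas above can be sketched in Python as follows. The function names are hypothetical, chosen for this illustration. The sketch assumes that when the sample size is odd the median itself is left out of both halves (the slides do not say which convention is used), and that the sample has at least two values. The sample is the six-value example from earlier in the lecture, since the nicotine data are not available in this transcript.

```python
import statistics


def quick_quartiles(values):
    """Return (Q1, Q2, Q3) using the quick method: split the ordered data
    at the median and take the median of each half."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n the middle value is excluded from both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


def fences(q1, q3):
    """Inner and outer fence values from the quartiles."""
    iqr = q3 - q1
    return {
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3 * iqr,
        "upper_outer": q3 + 3 * iqr,
    }


def outliers(values):
    """Values lying beyond the inner fences (more than 1.5 IQR from the box)."""
    q1, _, q3 = quick_quartiles(values)
    f = fences(q1, q3)
    return [x for x in values if x < f["lower_inner"] or x > f["upper_inner"]]


sample = [7, 1, 10, 8, 4, 12]
q1, q2, q3 = quick_quartiles(sample)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", q3 - q1)
print("fences:", fences(q1, q3))
print("outliers:", outliers(sample))
```

Statistical packages use several different quartile conventions, so their results can differ slightly from the quick method on small samples.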
(The five points above summarize the information drawn from a boxplot.)

Quantile Plot
A quantile plot simply plots the data values on the vertical axis against an empirical assessment of the fraction of observations exceeded by the data value.
[Formula for the fraction missing from the transcript], where i is the order of the observations when they are ranked from low to high.
[Figure: Quantile Plot for paint data (table 8.2, page 238).]

Normal Quantile Plots
The normal quantile-quantile plot is a plot of y_(i) (the ordered observations) against [expression missing from the transcript].

Section: Inferential Statistics

Objectives
Understand the importance of making inferences.
Understand the steps in conducting a statistical study.

Statistical Inference
Making an "INFORMED GUESS" about a parameter based on a statistic. (This is the main objective of statistics.)
[Diagram: STATISTICAL INFERENCE. Gather data by drawing a SAMPLE from the POPULATION; use the SAMPLE STATISTICS to make inferences about the population PARAMETERS.]

Variable
A VARIABLE is a characteristic of an individual or object that may vary for different observations.
A QUANTITATIVE VARIABLE measures a variable on some sort of scale.
A QUALITATIVE VARIABLE categorizes the values of the variable.

RAISIN BRAN EXAMPLE
A cereal company claims that the average amount of raisins in its boxes of raisin bran is two scoops. A random sample of five boxes was taken off the production line, and an analysis revealed an average of 1.9 scoops per box.

Components of the Problem
Identify the population.
Identify the sample.
Identify the symbol for the parameter.
Identify the symbol for the statistic.

Five Steps in a Statistical Study
1. Stating the problem
2. Gathering the data
3. Summarizing the data
4. Analyzing the data
5. Reporting the results

Stating the Problem
Specifically identifying the population to be sampled
Identifying the parameter(s) being studied

Gathering the Data
SURVEYS
  Random Sampling
  Stratified Sampling
  Cluster Sampling
  Systematic Sampling
EXPERIMENTS
  Completely Randomized Design
  Randomized Block Design
  Factorial Design

Section: Lecture Summary

Summary
Basics of statistics
Descriptive statistics and graphs
Inferential statistics

Textbook
Chapter 1 (pages 1-28)
Chapter 8
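Three of the survey sampling schemes listed under Gathering the Data can be sketched with Python's standard `random` module. Everything here is illustrative: the function names, the pond of 500 numbered shrimp, and the two-stratum split are assumptions made for the example and do not come from the lecture. The systematic version assumes the simple "every k-th unit after a random start" rule with k = N // n.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement, each subset equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Take every k-th unit after a random start, with k = len(population) // n."""
    rng = random.Random(seed)
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}


# Hypothetical population: shrimp in a pond, identified by number.
pond = list(range(1, 501))

srs = simple_random_sample(pond, 100, seed=270)
systematic = systematic_sample(pond, 100, seed=270)
by_zone = stratified_sample({"north": pond[:250], "south": pond[250:]}, 50, seed=270)

print(len(srs), len(systematic), {zone: len(ids) for zone, ids in by_zone.items()})
```

Fixing the seed makes a draw reproducible, which helps when the data collection protocol has to be documented. A convenience sample, by contrast, has no random mechanism behind it and cannot be generated this way.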