d2 basic stat
Post on 31-Jan-2016
1
Basic Statistics
1. Introduction to Statistics
2. Probability distributions
- Binomial distribution
- Poisson Distribution
- Normal distribution
3. Sampling distributions and Estimation.
2
1. The concept of Statistics
Why Statistics?
Through the advancement of electronics and computers, today’s society is inundated with vast amounts of data. In its raw form, this data is of little use. But with statistical analysis, the data can be transformed into valuable information. This knowledge is vital for drawing conclusions and making decisions.
“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”
H.G. Wells
3
The field of statistics can be broken into two major areas: Descriptive Statistics and Inferential Statistics
• Descriptive statistics: It describes some of the fundamental features of a set of data (population or sample), such as the mean, median, standard deviation,…
• Inferential statistics: It deals with drawing conclusions about a population based on information from a sample drawn from that population.
4
Probability and Statistics
[Diagram relating Population, Sample, Probability, Descriptive Statistics and Inferential Statistics]
5
Data Collection
A decision can be no better than the data upon which it was based.
Why do we need to collect data?
1. To identify and/or verify a problem.
2. To analyze a problem.
3. To understand, describe, or monitor a process.
4. To test a hypothesis.
5. To find a relationship between inputs and outputs of a process.
6
Two kinds of Numerical Data
• Continuous data: length, height, volume,….
• Discrete data: number of defects, number of failures,….
7
Population and Sample
• Population is a set or collection of all possible objects or individuals of interest.
• Finite population: ex) The number of employees in Samsung Electro-Mechanics as of January 1, 2001.
• Infinite population: ex) MLCC chips coming from the production line.
8
Population and Sample
• A Sample is any subset or sub-collection of a population.
• A Random Sample of size n is a sample chosen in such a way that every possible sample of size n has an equal chance of being chosen (unbiased).
The true population parameters are rarely known, so conclusions have to be drawn from sample statistics.
9
Characteristics of distribution
Statistical analysis detects the characteristics of a data distribution and expresses those characteristics as numbers.
Central tendency (mean, median, mode) - It shows the location where the data is centered.
Variation (range, variance, standard deviation) - Degree of data scattering around the arithmetic mean
Shape
- In what direction is the data skewed?
10
Central tendency
Mode
Most frequently occurring value in a data set.
Median
Value at the 50% rank of a sorted set of values.
1) Odd number of data points: the value in the middle
2) Even number of data points: (sum of the two values in the middle)/2
Mean (arithmetic mean)
Population mean: $\mu = \frac{\sum X_i}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$
Sample mean: $\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$
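As a rough illustration of the three measures of central tendency, the sketch below uses Python's standard `statistics` module. The data values are hypothetical and are not taken from the slides.

```python
import statistics

# Hypothetical measurements (an assumption for illustration only)
data = [4, 7, 7, 8, 10, 12]

mode = statistics.mode(data)      # most frequently occurring value
median = statistics.median(data)  # even count: average of the two middle values
mean = statistics.mean(data)      # sum of the values divided by n

print("mode:", mode, "median:", median, "mean:", mean)
```

With an even number of data points, the median is the average of the two middle values (7 and 8 here), matching rule 2) above.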
11
Variability
Range
Numerical distance between the highest and the lowest values in a data set.
Variance and Standard deviation
Population variance: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
Population standard deviation: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
Sample variance: $S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$
Sample standard deviation: $S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$
The arithmetic mean is expressed in the units of the data, while the variance is expressed in squared units. Taking the square root of the variance gives the standard deviation, which is back in the original units. For a sample, one degree of freedom is lost because the sample mean is estimated from the same data, so the divisor is n-1 instead of N.
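The difference between the divisor N (population) and n-1 (sample) can be checked with a short sketch. It assumes Python's standard `statistics` module and hypothetical data values that do not come from the slides.

```python
import statistics

# Hypothetical measurements (an assumption for illustration only)
data = [4, 7, 7, 8, 10, 12]

data_range = max(data) - min(data)      # highest value minus lowest value

pop_var = statistics.pvariance(data)    # population variance, divisor N
samp_var = statistics.variance(data)    # sample variance, divisor n-1
pop_sd = statistics.pstdev(data)        # square root of the population variance
samp_sd = statistics.stdev(data)        # square root of the sample variance

print("range:", data_range)
print("population variance:", pop_var, "sample variance:", samp_var)
print("population st. dev.:", pop_sd, "sample st. dev.:", samp_sd)
```

The sample variance is always a little larger than the population variance computed from the same numbers, because the same sum of squares is divided by n-1 rather than N.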
12
Comparison of symbols between parameter and statistics
Value                     Population (parameter)   Sample (statistic)
Number of data            N                        n
Mean                      µ                        X̄
Variance                  σ²                       s²
St. dev                   σ                        s
Correlation coefficient   ρ                        r
Regression coefficient    α, β                     a, b
Error                     ε                        e
13
2. Probability Distribution
The Probability Distribution of a discrete random variable is an assignment of probabilities to each of the possible values that the random variable can take on. Its mathematical model is the probability mass function for a discrete variable and the Probability Density Function for a continuous variable.
It is the major pillar of the bridge that allows us to make inferences about a population based on information obtained from a sample.
14
(1) Binomial distribution
Used for problems of determining the probability of finding a given number of defectives.
A Binomial Distribution needs to satisfy the following conditions:
1) A sequence of n Bernoulli trials.(Only two possible outcomes)
2) Trials are identical.
3) Trials are independent.
4) Probability of success on every trial is the same.
15
Example
<Problem>
In a certain diode manufacturing process, the defective rate is known to be 1%. When the inspector takes a random sample of 50 every hour, what is the probability of finding no more than 1 defective?
<Solution>
The solution can be obtained by adding the probability of finding no defective and the probability of finding exactly one.
First, we find the probability of finding no defectives.
16
From Minitab menu
Calc>Probability Distributions>Binomial
This is the place where all the probability
distributions can be found!
17
Probability of finding none of defectives
Number of Random Sample
Defective rate
No defective
18
Result in Session window
Defective rate of 1%
Number of Random sample
Probability of no defective is 0.6050.
19
Next, probability of one defective
In this case, we put 1 here
Result is 0.3056
Total Probability: 0.6050 + 0.3056 = 0.9106
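The Minitab result can be cross-checked by hand. The sketch below is not part of the slides; it assumes the standard binomial formula P(X=x) = nCx p^x (1-p)^(n-x) and uses only Python's `math` module. The function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x defectives in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n = 50     # sample size taken every hour
p = 0.01   # known defective rate

p0 = binomial_pmf(0, n, p)   # probability of no defective
p1 = binomial_pmf(1, n, p)   # probability of exactly one defective
p_at_most_1 = p0 + p1        # probability of no more than one defective

print(round(p0, 4), round(p1, 4), round(p_at_most_1, 4))
```

The three values should agree with the Minitab session output to four decimal places.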
20
Another way of calculating, using the worksheet.
Prepare the following worksheet.
Enter the number of defectives in C1 (named x)
Prepare a column for the probability (named p)
21
From Minitab Menu Calc>Probability Distribution>Binomial
We use this
22
Probability of no defective
Probability of one defective
The final answer is the sum of the two probabilities.
Result is..
23
To find the cumulative probability in one step
Cumulative Probability
Check here!
24
Understanding of Binomial Distribution
The binomial probability distribution is defined by
$P(X=x) = {}_nC_x \, p^x (1-p)^{n-x}$
${}_nC_x = \binom{n}{x} = \frac{n!}{x!(n-x)!}$
The Binomial distribution is used frequently in quality control. It is the appropriate probability model for sampling from an infinitely large population, where p represents the defective rate and x the number of defectives in a sample of size n.
The control chart of defects is based on the Binomial distribution, with the mean and variance given on the next slide.
25
The property of binomial distribution
[Figure: P(X) versus x for the binomial distribution with n=4, p=1/2 and for the binomial distribution with n=9, p=1/3]
Form of binomial distribution
1) The probability distribution is always symmetric when p=0.5, even if n is small.
2) As n increases, the probability distribution approaches symmetry even when p is not 0.5.
Expectation value, standard deviation, variance of binomial distribution
Expectation value : $\mu = E(X) = np$
Variance : $\sigma^2 = Var(X) = np(1-p) = npq$, where $q = 1-p$
Standard deviation : $\sigma = \sqrt{np(1-p)} = \sqrt{npq}$
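A quick numerical check of the mean and variance formulas: the sketch below tabulates the binomial probabilities for n=4, p=1/2 and computes the mean and variance directly from their definitions. It uses only Python's `math` module, and the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 4, 0.5

# Probabilities for x = 0, 1, ..., n
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed from the definition, not from np and npq
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
std_dev = sqrt(variance)

print("pmf:", pmf)
print("mean:", mean, "expected np:", n * p)
print("variance:", variance, "expected npq:", n * p * (1 - p))
```

For p=0.5 the list of probabilities reads the same forwards and backwards, which is the symmetry described in point 1).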
26
(2) Poisson distribution
The Poisson distribution is characterized by the form “the number of occurrences per unit interval”
Occurrences: a defect, an electrical or mechanical failure, an arrival, a call,..
Unit interval: time, space, area,…
27
Example
<Problem>
Suppose that the number of wire-bonding defects per unit that occur in a semiconductor device is Poisson distributed with mean=4. What is the probability that a randomly selected semiconductor device will contain two or fewer wire-bonding defects?
28
From Minitab menu File>New>Minitab Worksheet
In the worksheet, make one column for the number of defects (x)
and another column for the cumulative probability (p)
29
Calc>Probability Distribution>Poisson
1. Select Cumulative
2. Mean=4
3. Input defect number column and output column
30
Probability of no defect
Cumulative Probability of 0,1
Cumulative Probability of 0, 1, 2
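The cumulative probabilities in the worksheet can be reproduced outside Minitab. The sketch below assumes the standard Poisson formula P(X=x) = e^(-m) m^x / x! with mean m=4 and uses only Python's `math` module. The function name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """Probability of exactly x occurrences when the mean is m."""
    return exp(-m) * m**x / factorial(x)

m = 4  # mean number of wire-bonding defects per unit

# Cumulative probability P(X <= x) for x = 0, 1, 2
cumulative = []
running_total = 0.0
for x in range(3):
    running_total += poisson_pmf(x, m)
    cumulative.append(running_total)

for x, prob in enumerate(cumulative):
    print(f"P(X <= {x}) = {prob:.4f}")
```

The last value, P(X <= 2), is the answer to the wire-bonding problem.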
31
Examples for Poisson Distribution
1. The number of speeding tickets issued in a certain county per week
2. The number of disk drive failures per month for a particular kind of disk drive
3. The number of calls arriving at an emergency dispatch station per hour.
4. The number of flaws per square yard in a certain type of fabric.
32
Relationship with RTY
• $P(X=x) = \frac{e^{-m} m^x}{x!}$
m : average number of occurrences
x : number of occurrences
When x=0 and m = dpu, the probability of zero defects is the RTY:
RTY = $e^{-dpu}$
dpu = -ln(RTY)
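The relationship between RTY and dpu is the Poisson probability of zero defects, and it can be inverted with a logarithm. A minimal sketch, assuming a hypothetical dpu of 0.1 (this value is not from the slides):

```python
from math import exp, log

dpu = 0.1                 # hypothetical defects per unit (assumption)

rty = exp(-dpu)           # Poisson probability of zero defects, P(X=0)
dpu_recovered = -log(rty) # inverting the relationship gives dpu back

print(f"RTY = {rty:.4f}")
print(f"dpu recovered from RTY = {dpu_recovered:.4f}")
```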
33
(3) Normal distribution
The normal distribution is probably the most important distribution in quality control and statistical analysis.
$X \sim N(\mu, \sigma^2)$
X: variable, N: normal distribution, µ: mean, σ: standard deviation
The normal distribution is defined by the mean and standard deviation.
34
The shape of normal distribution?
[Figure: normal curve on an axis from -4 to 4 sigma, with 68.3% of the area within ±1 sigma, 95.5% within ±2 sigma and 99.73% within ±3 sigma]
Symmetric
Unimodal
Bell-shaped
35
What is Sigma?
[Figure: normal curve on an axis from -4 to 4 sigma, with 68.3% of the area within ±1 sigma, 95.5% within ±2 sigma and 99.73% within ±3 sigma]
The distance from the mean to the inflection point.
68.3% of the population values fall between the limits defined by the mean plus and minus one sigma.
36
Probability density function
The probability density function is defined by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
37
Shapes of Normal curve
[Figure: pairs of normal curves for different µ and σ]
µ1 ≠ µ2 , σ1 = σ2
µ1 = µ2 , σ1 ≠ σ2
µ1 ≠ µ2 , σ1 ≠ σ2
38
Standard Normal Distribution
The normal distribution with mean=0 and standard deviation=1, written $N(0, 1^2)$.
$Z = \frac{X - \mu}{\sigma}$
is used for the coordinate transformation.
[Figure: standard normal curve on an axis from -4 to 4, with 68.3% of the area within ±1, 95.5% within ±2 and 99.73% within ±3]
39
Minitab application
Calc>Probability distribution>Normal
Find the area (probability) with known x
Find x with known probability
Minitab treats the left-sided area (to the left of x) as the cumulative probability
40
Normal distribution Example 1
<Problem> The tensile strength of a certain product is an important quality characteristic. It is known that the strength is normally distributed with mean=40 and standard deviation of 2, denoted as $N(40, 2^2)$.
When the customer wants a strength of at least 35, what is the probability of customer satisfaction?
41
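Solution
[Figure: $N(40, 2^2)$ curve with mean 40, standard deviation 2 and the known spec. at 35. The area wanted is to the right of 35; the Minitab solution provides the area to the left of 35.]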
42
Check here
Mean is 40
St. deviation is 2
X is 35
Calc>Probability Distribution>Normal
43
The area we want(probability) is
1-0.0062=0.9938
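The same calculation can be sketched with Python's standard `statistics.NormalDist` class, which, like Minitab, returns the left-sided area as the cumulative probability. This is an independent cross-check and not part of the original slides.

```python
from statistics import NormalDist

# Tensile strength distributed as N(40, 2^2)
strength = NormalDist(mu=40, sigma=2)

p_below_spec = strength.cdf(35)   # left-sided area: P(X < 35)
p_satisfied = 1 - p_below_spec    # area wanted: P(X >= 35)

print(f"P(X < 35)  = {p_below_spec:.4f}")
print(f"P(X >= 35) = {p_satisfied:.4f}")
```

Subtracting from 1 is needed because the customer requirement is "at least 35", which is the right-hand side of the curve.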
44
Example 2
It is known that a quality characteristic of a certain process follows the standard normal distribution (mean=0, st.dev.=1). When the defective rate is 1%, what is the sigma level?
<Solution> The problem is to find the value of z when the cumulative probability is known. In Minitab, the inverse cumulative probability is used.
45
Check here
Input 1-0.01=0.99
46
Z is 2.33
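The inverse cumulative probability can also be sketched with `statistics.NormalDist.inv_cdf` from the Python standard library. As in the Minitab dialog, the input is the left-sided area 1-0.01=0.99.

```python
from statistics import NormalDist

defective_rate = 0.01

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

# z value whose left-sided (cumulative) area is 1 - defective rate
z = standard_normal.inv_cdf(1 - defective_rate)

print(f"z = {z:.2f}")
```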
47
3. Sampling Distributions and Estimation
Question:
When we do not know the mean of the population, we use a sample mean instead. But how accurately does the sample mean represent the population mean?
48
Standard Error of the Mean
Mean of the sample mean: $\mu_{\bar{X}} = \mu$
Variance of the sample mean: $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$
Standard error of the mean: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
49
Central Limit Theorem
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
For almost all populations, the sampling distribution of the mean can be approximated closely by a normal distribution, provided the sample size is sufficiently large.
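A small simulation can make the theorem and the standard error concrete. The sketch below is an illustration under stated assumptions, not part of the slides: it draws samples of size n=30 from a uniform population on [0, 1] (mean 0.5, standard deviation sqrt(1/12)), and the seed and sample counts are arbitrary choices.

```python
import random
import statistics
from math import sqrt

random.seed(0)            # fixed seed so the run is repeatable (arbitrary choice)

n = 30                    # size of each sample
num_samples = 5000        # number of samples drawn

# Uniform population on [0, 1]: a clearly non-normal population
pop_mean = 0.5
pop_sd = sqrt(1 / 12)

# Mean of each sample
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)  # should be close to pop_mean
observed_se = statistics.stdev(sample_means)    # should be close to sigma / sqrt(n)
expected_se = pop_sd / sqrt(n)

print(f"mean of sample means: {observed_mean:.4f} (population mean {pop_mean})")
print(f"st. dev. of sample means: {observed_se:.4f} (sigma/sqrt(n) = {expected_se:.4f})")
```

Plotting a histogram of `sample_means` would show the bell shape even though the population itself is flat.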
50
Estimation
Estimate population parameters from a sample
1) Point Estimation
single number
2) Interval Estimation
estimate confidence interval
51
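Confidence interval for population mean.
[Figure: standard normal curve centered at 0, with tail areas α/2 = 0.025 below $-Z_{0.025} = -1.96$ and above $Z_{0.025} = 1.96$]
Values of $Z_{\alpha/2}$ and $-Z_{\alpha/2}$ when α=0.05, that is, confidence level: 95%
1) Known standard deviation : use Normal distribution
$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1-\alpha$
$P(L < \mu < U) = 1-\alpha$
Solving this for µ:
$\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}$ is the 100(1-α)% confidence interval for µ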
52
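Values of $t_{\alpha/2}$ and $-t_{\alpha/2}$ when α=0.05, that is, confidence level: 95%
2) Unknown standard deviation : use t-distribution
$P\left(-t_{\alpha/2} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{\alpha/2}\right) = 1-\alpha$
$P(L < \mu < U) = 1-\alpha$
Solving this for µ:
$\bar{X} - t_{\alpha/2}\,S/\sqrt{n} < \mu < \bar{X} + t_{\alpha/2}\,S/\sqrt{n}$ is the 100(1-α)% confidence interval for µ
Note) Every t value above means $t_{\alpha/2,\,n-1}$, from the t-distribution with n-1 degrees of freedom.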
53
Example
1. A random sample of 64 customers at a local supermarket showed that their average shopping time was 33 minutes with a sample standard deviation of 16 minutes. Find a 90% confidence interval for the true average shopping time.
2. A test on a random sample of 9 cigarettes yielded an average nicotine content of 15.6 milligrams and a standard deviation of 2.1 milligrams. Construct a 99% confidence interval for the true but unknown average nicotine content of this particular brand of cigarette. Assume that nicotine content is normally distributed.
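One possible way to work the two examples is sketched below; the slides do not give a solution, so this is an illustration under stated assumptions. For example 1 it assumes that n=64 is large enough to use the normal critical value with the sample standard deviation in place of σ. For example 2 it uses the t-distribution with 8 degrees of freedom, and the critical value 3.355 is entered by hand from a t table because the Python standard library has no t-distribution. The helper name `confidence_interval` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(mean, sd, n, critical_value):
    """Return (lower, upper) = mean -/+ critical_value * sd / sqrt(n)."""
    half_width = critical_value * sd / sqrt(n)
    return mean - half_width, mean + half_width

# Example 1: n = 64, mean 33 min, s = 16 min, 90% confidence.
# Assumption: large sample, so the normal value Z_{0.05} is used.
z_90 = NormalDist().inv_cdf(1 - 0.10 / 2)
ci_shopping = confidence_interval(33, 16, 64, z_90)

# Example 2: n = 9, mean 15.6 mg, s = 2.1 mg, 99% confidence.
# t_{0.005, 8} = 3.355 is a table value typed in by hand (assumption).
t_99 = 3.355
ci_nicotine = confidence_interval(15.6, 2.1, 9, t_99)

print("90%% CI for shopping time: %.2f to %.2f minutes" % ci_shopping)
print("99%% CI for nicotine content: %.2f to %.2f mg" % ci_nicotine)
```

Using the t value with 63 degrees of freedom in example 1 instead of the normal value would give a slightly wider interval.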