Introduction to Statistics
By: Dr. Po-Lin Lai
Dr. Po-Lin Lai, PhD (Cardiff University). Research interests: air transportation, air cargo supply chain, airport management and operations, airline finance.
Email: [email protected]
This module is an applied statistics course for first-year students of the Department of International Logistics.
Topics include the main elements of business statistics, supported by tutorials.
This course provides students with a range of statistical approaches and data analysis techniques that are useful in management and business-related activities.
This course aims to familiarise students with statistical concepts, theory, and terminology, and with applying them to practical situations.
In addition, students will learn how to analyse data using SPSS.
Anderson, D., Sweeney, D., Williams, T. Statistics for Business and Economics, 11th Edition, South-Western.
Silver, M. Business Statistics, 2nd Edition, McGraw-Hill.
Pallant, J. SPSS Survival Manual, 10th Edition, McGraw-Hill.
Mid-Term Exam 35%; Final Exam 35%; Assignment 20%; Class Participation & Discussion 10%
Content
• The concepts of statistics
• Types of statistical applications in business
• Types of data
• Collecting data
• Introduce the field of statistics
• Demonstrate how statistics applies to business
• Establish the link between statistics and data
• Identify the different types of data and data-collection methods
• Differentiate between population and sample data
• Differentiate between descriptive and inferential statistics
Statistics is the science of data. It involves collecting, classifying, summarising, organising, analysing, and interpreting numerical information.
Key concepts: population, sample, statistical inference
Population: A population is the group of all items of interest to a statistics practitioner. It is frequently very large and may be infinitely large.
In the language of statistics, population does not necessarily refer to a group of people. It may refer to the population of ball bearings produced at a large plant.
A descriptive measure of a population is called a parameter.
Sample: A sample is a set of data drawn from the studied population. A descriptive measure of a sample is called a statistic. We use statistics to make inferences about parameters.
Some universities have signed agreements with a variety of private companies. These agreements bind the university to sell these companies’ products exclusively on the campus.
Example: CAU, with a total enrollment of about 5,000 students, has offered Pepsi-Cola an exclusivity agreement that would give Pepsi exclusive rights to sell its products at all university facilities for the next year, with an option for future years. In return, the university would receive 35% of the on-campus revenues and an additional lump sum of $200,000 per year.
In the Pepsi case, the statistic we would compute is the mean number of soft drinks consumed in the last week by the 500 students in the sample. We would then use the sample mean to infer the value of the population mean, which is the parameter of interest in this problem.
The parameter of interest in the Pepsi case is the mean number of soft drinks consumed by all the students at the university. In most applications of inferential statistics, the parameter represents the information we need.
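The distinction between a parameter and a statistic can be illustrated with a short Python sketch. The population values below are invented for illustration (a uniform spread of 0 to 14 drinks per week is an assumption, not survey data); only the population size of 5,000 and the sample size of 500 come from the Pepsi case.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Synthetic population: weekly soft-drink counts for 5,000 students.
# These values are invented for illustration; they are not survey data.
population = [random.randint(0, 14) for _ in range(5000)]

# Parameter: a descriptive measure of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same measure computed on a random sample of 500 students.
sample = random.sample(population, 500)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

In practice the population mean is unknown, which is why the sample mean is used to infer it; the sketch computes both only to show how close they tend to be.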
Statistical inference: Statistical inference is the process of making an estimate, prediction, or decision about a population based on sample data.
Because populations are almost always very large, investigating each member of the population would be impractical and expensive. It is far easier and cheaper to take a sample from the population of interest and draw conclusions or make estimates about the population on the basis of information provided by the sample.
However, such conclusions and estimates are not always going to be correct. For this reason, we build into the statistical inference a measure of reliability. There are two such measures: the confidence level and the significance level. The confidence level is the proportion of times that an estimating procedure will be correct.
For example, in the Pepsi case, we will produce an estimate of the average number of soft drinks consumed by all 5,000 students that has a confidence level of 95%. In other words, estimates based on this form of statistical inference will be correct 95% of the time.
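The idea that a 95% confidence level describes how often the estimating procedure is correct can be checked by simulation. The following is a minimal sketch, assuming a synthetic population (invented values) and the common normal approximation with the value 1.96; the function name `confidence_interval_95` is hypothetical and not part of the course material.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Synthetic population (invented for illustration): weekly soft-drink counts.
population = [random.randint(0, 14) for _ in range(5000)]
true_mean = statistics.mean(population)


def confidence_interval_95(sample):
    """Approximate 95% interval for the mean using the normal value 1.96."""
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * standard_error, mean + 1.96 * standard_error


# Repeat the sampling procedure many times and count how often the
# interval contains the true population mean.
trials = 1000
hits = 0
for _ in range(trials):
    low, high = confidence_interval_95(random.sample(population, 500))
    if low <= true_mean <= high:
        hits += 1

coverage = hits / trials
print(f"intervals containing the true mean: {coverage:.1%}")
```

The printed proportion should be close to 95%. Because the sketch samples 500 of only 5,000 students without replacement and ignores the finite population correction, the observed proportion may sit slightly above 95%.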
Applications of statistics in business:
• Accounting: sampling (audit)
• Marketing: consumer preferences (Tesco)
• Finance: financial trends; recommendations for investment (coefficient of variation)
• Economics: forecasting, demographics
Statistical Methods
• Descriptive statistics
  Involves: collecting data, presenting data, characterizing data (e.g., X̄ = 30.5, S² = 113)
  Purpose: describe data
• Inferential statistics
  Involves: estimation, hypothesis testing
  Purpose: make decisions about population characteristics
Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation. By type, data can be classified into quantitative data and qualitative data.
Quantitative data: recorded on a naturally occurring numerical scale, e.g., height, weight, salaries, and distances.
Qualitative data: cannot be measured on a natural numerical scale; they can only be classified into one of a group of categories.
Examples of qualitative data, which are classified into categories:
• College major of each student in a class.
• Gender of each employee at a company.
• Method of payment (cash, check, credit card).
By the way the data are obtained: Primary data are commissioned to solve the problem at hand. Secondary data were commissioned by somebody else.
By the purpose of analysis:
Cross-sectional: a "snapshot" taken at the same point in time.
▪ A market research report of the EU car market in 2009.
▪ The UK temperature in July 2012.
Time series: collected over several periods of time.
▪ The UK temperature from 200 to 2012.
▪ The price of petrol between 2005 and 2009.
But the majority of data are derived from surveys, so we must consider possible sources of error.
Five types of error:
• Miscoding: human error linked with data entry, e.g., the interviewer records a respondent's age as 32 instead of 23.
• Sampling error: occurs naturally and depends on the sampling method. It can be estimated (e.g., +/- 3%) and declines as the sample size rises, but not proportionately.
• Response error: arises because questions are asked in a social context, so question wording matters (e.g., attribute ratings on a 10-point scale versus allocating 100 points). People may answer according to what they think is expected of them.
• Non-response error: low response rates in surveys are normal (40% is excellent). Low response reduces precision (increases sampling error) and can introduce non-response bias when responders differ from non-responders. Scott Armstrong: compare early and late respondents on key questions.
• Design error: arises from inappropriate sampling methods, such as a poor choice of sampling frame (list) or problems with quota sampling, e.g., members of CIM; Cardiff high street on a Tuesday morning.
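The remark that sampling error declines as the sample size rises, but not proportionately, can be made concrete with a small sketch. It assumes a simple random sample, a survey proportion of 0.5, and the usual 95% normal approximation; the function name `margin_of_error` is chosen here for illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)


# Quadrupling the sample size only halves the margin of error,
# because the error shrinks with the square root of n.
for n in (100, 400, 1600):
    print(f"n = {n:>5}: +/- {margin_of_error(n):.3f}")
```

Under these assumptions, going from 100 to 400 respondents halves the margin of error, and a further halving requires 1,600 respondents.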
Data can come from four sources: a published source, a designed experiment, a survey, or an observational study.
• Published source: book, journal, newspaper, Web site.
• Designed experiment: the researcher exerts strict control over the units.
• Survey: a group of people are surveyed and their responses are recorded.
• Observational study: units are observed in their natural setting and the variables of interest are recorded.
A representative sample exhibits characteristics typical of those possessed by the population of interest.
A random sample of n experimental units is a sample selected from the population in such a way that every different sample of size n has an equal chance of selection.
Every sample of size n has an equal chance of selection.
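A random sample in this sense can be drawn in Python with the standard library's `random.sample`, which selects without replacement. The sketch below assumes a hypothetical sampling frame of student ID numbers from 1 to 5,000; the IDs are placeholders, not real records.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: ID numbers for 5,000 students.
sampling_frame = list(range(1, 5001))

# random.sample draws without replacement, so every possible
# sample of size n is equally likely to be selected.
n = 500
selected_ids = random.sample(sampling_frame, n)

print(selected_ids[:10])  # first ten selected IDs
```

The quality of the result still depends on the sampling frame: if the list omits part of the population, the sample is random only with respect to the list.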
1. Typical software: SPSS, MINITAB, Excel
2. Statistical understanding is still needed: assumptions, limitations
Content
• Describing qualitative data
• Graphical methods for describing quantitative data
Learning Objectives
• Describe data using graphs
Key terms
A class is one of the categories into which qualitative data can be classified.
The class frequency is the number of observations in the data set falling into a particular class.
The class relative frequency is the class frequency divided by the total number of observations in the data set.
The class percentage is the class relative frequency multiplied by 100.
Data Presentation
• Qualitative data: summary table, bar graph, pie chart
• Quantitative data: dot plot, stem-and-leaf display, frequency distribution, histogram
Summary table:
1. Lists categories and the number of elements in each category (each row is a category).
2. Obtained by tallying responses in each category.
3. May show frequencies (counts), percentages, or both.

Major        Count
Accounting   130
Economics     20
Management    50
Total        200
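The class frequency, class relative frequency, and class percentage for this summary table can be computed as follows. This is a minimal sketch using the counts of majors given above; the variable names are chosen for illustration.

```python
# Class frequencies from the summary table of majors.
counts = {"Accounting": 130, "Economics": 20, "Management": 50}

total = sum(counts.values())  # total number of observations

rows = []
for major, frequency in counts.items():
    relative_frequency = frequency / total   # class relative frequency
    percentage = relative_frequency * 100    # class percentage
    rows.append((major, frequency, relative_frequency, percentage))
    print(f"{major:<12}{frequency:>6}{relative_frequency:>8.2f}{percentage:>8.1f}%")
```

The relative frequencies sum to 1 and the percentages sum to 100, which is a quick check that no observation was dropped or double counted.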
[Figure: bar graph of frequency (0 to 150) by major (Acct., Econ., Mgmt.)]
• Vertical bars for qualitative variables
• Bar height shows frequency or %
• Zero point
• Percent may also be used
• Equal bar widths
[Figure: pie chart of majors: Acct. 65%, Mgmt. 25%, Econ. 10%]
• Shows the breakdown of a total quantity into categories
• Useful for showing relative differences
• Angle size = (360°)(percent), e.g., (360°)(10%) = 36° for Econ.
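The angle rule for pie slices can be applied to all three majors with a short sketch. The percentages are those shown in the pie chart of majors; the dictionary names are chosen for illustration.

```python
# Class percentages from the pie chart of majors.
percentages = {"Acct.": 65, "Mgmt.": 25, "Econ.": 10}

# Each slice's angle is 360 degrees multiplied by the class percentage.
angles = {major: 360 * pct / 100 for major, pct in percentages.items()}

for major, angle in angles.items():
    print(f"{major:<6} {angle:.0f} degrees")
```

The three angles add up to the full 360° circle, as they must when the percentages add up to 100.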
Bar graph: The categories (classes) of the qualitative variable are represented by bars, where the height of each bar is either the class frequency, class relative frequency, or class percentage.
Pie chart: The categories (classes) of the qualitative variable are represented by slices of a pie (circle). The size of each slice is proportional to the class relative frequency.
Steps for constructing a frequency distribution:
1. Determine the range
2. Select the number of classes (usually between 5 and 15 inclusive)
3. Compute class intervals (width)
4. Determine class boundaries (limits)
5. Compute class midpoints
6. Count observations and assign them to classes

Class (boundaries)   Midpoint   Frequency
15.5 – 25.5          20.5       3
25.5 – 35.5          30.5       5
35.5 – 45.5          40.5       2

Midpoint = (lower boundary + upper boundary) / 2; the width is the distance between the two boundaries.
Relative Frequency Distribution
Class         Prop.
15.5 – 25.5   .3
25.5 – 35.5   .5
35.5 – 45.5   .2

Percentage Distribution
Class         %
15.5 – 25.5   30.0
25.5 – 35.5   50.0
35.5 – 45.5   20.0
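The frequency, relative frequency, and percentage distributions can be rebuilt from raw observations with a few lines of Python. The sketch below uses the ten observations from this lecture's stem-and-leaf example; pairing that data set with these tables is an assumption, although it does reproduce the frequencies 3, 5, and 2.

```python
# Ten observations (assumed to be the data behind the tables).
data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

# Class boundaries chosen so that no observation falls on a boundary.
boundaries = [15.5, 25.5, 35.5, 45.5]

table = []
for lower, upper in zip(boundaries, boundaries[1:]):
    frequency = sum(1 for x in data if lower < x <= upper)  # count per class
    midpoint = (lower + upper) / 2                          # class midpoint
    relative = frequency / len(data)                        # relative frequency
    table.append((lower, upper, midpoint, frequency, relative))
    print(f"{lower} - {upper}  midpoint {midpoint}  "
          f"freq {frequency}  prop {relative:.1f}  pct {relative * 100:.1f}")
```

Using boundaries that end in .5 for whole-number data avoids any ambiguity about which class an observation belongs to.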
© 2011 Pearson Education, Inc
Dot plot: The horizontal axis is a scale for the quantitative variable, e.g., percent. The numerical value of each measurement is located on the horizontal scale by a dot.
Stem-and-leaf display:
• Divide each observation into a stem value and a leaf value (e.g., 26 has stem 2 and leaf 6)
• Stems are listed in order in a column
• Each leaf value is placed in the corresponding stem row, to the right of the bar

Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

2 | 1 4 4 6 7 7
3 | 0 2 8
4 | 1
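The display above can be produced programmatically. This is a minimal sketch for two-digit whole numbers, taking the tens digit as the stem and the units digit as the leaf; other data would need a different split.

```python
from collections import defaultdict

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

# Group the leaves (units digits) under their stems (tens digits).
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# One row per stem, with leaves listed to the right of the bar.
lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```

Sorting the data first keeps the leaves in ascending order within each row, which makes the shape of the distribution easier to read.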
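[Figure: histogram with lower boundaries (15.5, 25.5, 35.5, 45.5, 55.5) on the horizontal axis and count (0 to 5) on the vertical axis; the vertical axis may show frequency, relative frequency, or percent; bars touch]

Class         Freq.
15.5 – 25.5   3
25.5 – 35.5   5
35.5 – 45.5   2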