Post on 15-Jan-2016
Statistics-MAT 150
Chapter 2: Descriptive Statistics
Prof. Felix Apfaltrer
Office: N518
Phone: 7421
Office Hours: Tue/Thu 1:30-3:00pm
Characteristics of data
• Center: middle or average value
• Variation: measures the amount by which data values vary
• Distribution: nature or shape of the distribution of the data
• Outliers: values that are very far out
• Time: changing characteristics of the data over time
Observe data set and give intuitive examples of these characteristics!
Mnemonic: CVDOT - computer viruses destroy or terminate
Organizing Qualitative Data
• Qualitative data values can be organized by a frequency distribution
• A frequency distribution lists
– Each of the categories
– The frequency for each category
• A simple data set is
blue, blue, green, red, red, blue, red, blue
• A frequency table for this qualitative data is
Color   Frequency
Blue    4
Green   1
Red     3
• The most commonly occurring color is blue
• The relative frequencies are the proportions (or percents) of the observations out of the total
Color   Relative Frequency
Blue    .500
Green   .125
Red     .375
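The two tables above can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library; the variable names (colors, frequency, relative_frequency) are illustrative and not part of the course material.

```python
from collections import Counter

# Simple data set from the slide
colors = ["blue", "blue", "green", "red", "red", "blue", "red", "blue"]

# Frequency: how many times each category occurs
frequency = Counter(colors)

# Relative frequency: frequency divided by the total number of observations
total = sum(frequency.values())
relative_frequency = {color: count / total for color, count in frequency.items()}

for color in sorted(frequency):
    print(f"{color:<6} {frequency[color]:>2} {relative_frequency[color]:.3f}")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the table.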
Bar graphs for our simple data (using Excel)
1) Frequency bar graph
2) Relative frequency bar graph
• Good practices in constructing bar graphs
• The horizontal scale
– The categories should be spaced equally apart
– The rectangles should have the same widths
• The vertical scale
– Should begin with 0
– Should be incremented in reasonable steps
– Should go somewhat, but not significantly, beyond the largest frequency or relative frequency
• An example side-by-side bar graph comparing educational attainment in 1990 versus 2003
Type                  Complaints
Rates and services    4473
Marketing             1007
International calls   766
Access charges        614
Operator services     534
Slamming              12478
Cramming              1214
[Bar graph: Complaints to FCC. Horizontal axis: type of complaint; vertical axis: number of complaints (0 to 14000)]
[Pie chart: Complaints to FCC. One sector per type of complaint, each labeled with its count]
• A pie chart is a circle divided into sections, one for each category
• The area (angle) of each sector is proportional to the frequency of that category
• Pie charts are useful to show the relative proportions of each category, compared to the whole
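As a rough illustration of the proportionality rule, the sector angles for the FCC complaints data can be computed as 360 degrees times the relative frequency. The sketch below assumes the counts from the complaints table; the names complaints and angles are made up for this example.

```python
# Complaint counts from the "Complaints to FCC" table
complaints = {
    "Rates and services": 4473,
    "Marketing": 1007,
    "International calls": 766,
    "Access charges": 614,
    "Operator services": 534,
    "Slamming": 12478,
    "Cramming": 1214,
}

total = sum(complaints.values())

# Each sector's angle is its share of the full circle (360 degrees)
angles = {kind: 360 * count / total for kind, count in complaints.items()}

for kind, angle in angles.items():
    print(f"{kind:<20} {complaints[kind]:>6} {angle:6.1f} degrees")
```

Slamming alone takes up more than half of the circle, which is exactly what the pie chart makes visible at a glance.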
Person   SMOKER   ETS   NOETS
1        1        384   0
2        0        0     0
3        131      69    0
4        173      19    0
5        265      1     0
6        210      0     0
7        44       178   0
8        277      2     0
9        32       13    0
10       3        1     0
11       35       4     0
12       112      0     9
13       477      543   0
14       289      17    0
15       227      1     0
16       103      0     0
17       222      51    0
18       149      0     0
19       313      197   244
20       491      3     0
21       130      0     1
22       234      3     0
23       164      1     0
24       198      45    0
25       17       13    90
26       253      3     1
27       87       1     0
28       121      1     309
29       266      1     0
30       290      0     0
31       123      0     0
32       167      551   0
33       250      2     0
34       245      1     0
35       48       1     0
36       86       1     0
37       284      0     0
38       1        74    0
39       208      1     0
40       173      241   0
Average  172      61    16

Cotinine level   Smokers   ETS   NOETS
0-99             11        34    38
100-199          12        2     0
200-299          14        1     1
300-399          1         1     1
400-499          2         0     0
500-599          0         2     0
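The Average row is a measure of center, the first of the CVDOT characteristics. A minimal sketch of how it could be checked in Python follows; the three lists are typed in from the data table, and the variable names are illustrative only.

```python
from statistics import mean

# Cotinine levels of persons 1 to 40, copied from the data table
smoker = [1, 0, 131, 173, 265, 210, 44, 277, 32, 3,
          35, 112, 477, 289, 227, 103, 222, 149, 313, 491,
          130, 234, 164, 198, 17, 253, 87, 121, 266, 290,
          123, 167, 250, 245, 48, 86, 284, 1, 208, 173]
ets = [384, 0, 69, 19, 1, 0, 178, 2, 13, 1,
       4, 0, 543, 17, 1, 0, 51, 0, 197, 3,
       0, 3, 1, 45, 13, 3, 1, 1, 1, 0,
       0, 551, 2, 1, 1, 1, 0, 74, 1, 241]
noets = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 9, 0, 0, 0, 0, 0, 0, 244, 0,
         1, 0, 0, 0, 90, 1, 0, 309, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Average = sum of the values divided by the number of values
averages = {
    "SMOKER": mean(smoker),
    "ETS": mean(ets),
    "NOETS": mean(noets),
}

for group, avg in averages.items():
    print(f"{group:<7} {avg:.1f}")
```

Rounded to whole numbers, the results agree with the Average row of the table.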
[Chart: Smokers, ETS and NOETS cotinine levels. Horizontal axis: person (1 to 40); vertical axis: cotinine level (0 to 600); series: SMOKER, ETS, NOETS]
[Bar graph: Frequency distributions of cotinine. Horizontal axis: cotinine level (classes 0-99 to 400-499); vertical axis: people (0 to 40); series: Smokers, ETS, NOETS]
Organizing Quantitative Data: The Popular Displays
• Learning objectives
– Organize discrete data in tables
– Construct histograms of discrete data
– Organize continuous data in tables
– Construct histograms of continuous data
– Draw stem-and-leaf plots
– Draw dot plots
– Identify the shape of a distribution
Frequency distributions
• Lower class limits: 0, 100, …
• Upper class limits: 99, …
• Class boundaries: numbers used to separate classes without gaps; 99.5, 199.5, …
• Class midpoints: center of each class; 49.5, 149.5, …
• Class width: difference between two consecutive lower (or upper) class limits; 100
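The numbers quoted in these definitions can be derived from the class limits alone. The following is a small sketch for the cotinine classes; the list names are illustrative, and the rule used for boundaries (halfway between an upper limit and the next lower limit) is the usual convention, assumed here.

```python
# Class limits of the cotinine frequency distribution
lower_limits = [0, 100, 200, 300, 400, 500]
upper_limits = [99, 199, 299, 399, 499, 599]

# Class width: difference between two consecutive lower class limits
class_width = lower_limits[1] - lower_limits[0]

# Class midpoint: halfway between the lower and upper limit of a class
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundary: halfway between an upper limit and the next lower limit
boundaries = [(hi + next_lo) / 2
              for hi, next_lo in zip(upper_limits, lower_limits[1:])]

print("width:", class_width)
print("midpoints:", midpoints)
print("boundaries:", boundaries)
```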
Frequency Distribution
Cotinine Level   Smokers
0-99             11
100-199          12
200-299          14
300-399          1
400-499          2
500-599          0
Constructing frequency distribution
1. Decide on the number of classes n: between 5 and 20
2. Class width = (highest value - lowest value)/n, rounded up to a convenient number
3. Starting point: the lowest data value or a convenient smaller value
4. List lower class limits
5. List upper class limits
6. Tally data: count the data values falling in each class
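The six steps can be sketched in Python for the smokers' cotinine data. This is an illustration under stated assumptions, not the only way to do it: n = 5 classes is chosen here, and the raw width is rounded up to the next multiple of 10 as one possible reading of "convenient".

```python
import math

# Smokers' cotinine levels (persons 1 to 40) from the data table
smokers = [1, 0, 131, 173, 265, 210, 44, 277, 32, 3,
           35, 112, 477, 289, 227, 103, 222, 149, 313, 491,
           130, 234, 164, 198, 17, 253, 87, 121, 266, 290,
           123, 167, 250, 245, 48, 86, 284, 1, 208, 173]

# Step 1: number of classes (assumed choice for this sketch)
n_classes = 5

# Step 2: class width, rounded up to a convenient number (here: multiple of 10)
raw_width = (max(smokers) - min(smokers)) / n_classes   # (491 - 0) / 5 = 98.2
class_width = math.ceil(raw_width / 10) * 10            # 100

# Step 3: starting point is the lowest data value
start = min(smokers)

# Steps 4 and 5: lower and upper class limits
lower_limits = [start + i * class_width for i in range(n_classes)]
upper_limits = [lo + class_width - 1 for lo in lower_limits]

# Step 6: tally the data values falling in each class
frequencies = [sum(lo <= x <= hi for x in smokers)
               for lo, hi in zip(lower_limits, upper_limits)]

for lo, hi, f in zip(lower_limits, upper_limits, frequencies):
    print(f"{lo}-{hi}: {f}")
```

The tallies match the Smokers column of the frequency distribution. The table on the slides carries one more class, 500-599, with frequency 0 for smokers.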
Frequency Distribution             Relative Frequency Distribution
Cotinine Level   Smokers           Cotinine Level   Smokers
0-99             11                0-99             27.5%
100-199          12                100-199          30.0%
200-299          14                200-299          35.0%
300-399          1                 300-399          2.5%
400-499          2                 400-499          5.0%
500-599          0                 500-599          0.0%
Q. How do you go from one to the other?
Cumulative Frequency Distribution
Frequency Distribution       Relative Frequency Distribution   Cumulative Frequency Distribution
Cotinine Level   Smokers     Cotinine Level   Smokers          Cotinine Level   Smokers
0-99             11          0-99             27.5%            <100             11
100-199          12          100-199          30.0%            <200             23
200-299          14          200-299          35.0%            <300             37
300-399          1           300-399          2.5%             <400             38
400-499          2           400-499          5.0%             <500             40
500-599          0           500-599          0.0%             <600             40
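Both derived columns follow from the frequency column: a relative frequency is a class frequency divided by the total, and a cumulative frequency is a running sum. A minimal sketch, with illustrative variable names:

```python
from itertools import accumulate

classes = ["0-99", "100-199", "200-299", "300-399", "400-499", "500-599"]
frequencies = [11, 12, 14, 1, 2, 0]   # Smokers column

total = sum(frequencies)

# Relative frequency: class frequency divided by the total number of values
relative = [f / total for f in frequencies]

# Cumulative frequency: running sum of the class frequencies
cumulative = list(accumulate(frequencies))

for label, f, r, c in zip(classes, frequencies, relative, cumulative):
    print(f"{label:<8} {f:>3} {r:6.1%} {c:>3}")
```

The last cumulative frequency always equals the total number of data values, here 40.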
Visualizing data
• Histogram
– A histogram is a bar graph in which the horizontal scale represents classes of data values and the vertical scale represents frequencies
[Histogram: Smokers cotinine level. Horizontal axis: classes 0-99 to 500-599; vertical axis: frequency (0 to 16)]
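For a quick look at the shape without Excel, a histogram can be approximated in plain text. The sketch below is only an illustration: it draws the bars sideways (one row per class, one # per data value), so the class scale runs down the page instead of across, and the name bars is hypothetical.

```python
classes = ["0-99", "100-199", "200-299", "300-399", "400-499", "500-599"]
frequencies = [11, 12, 14, 1, 2, 0]   # Smokers column

# One text bar per class; the bar length equals the class frequency
bars = [f"{label:>7} | {'#' * freq} {freq}"
        for label, freq in zip(classes, frequencies)]

print("\n".join(bars))
```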
(Relative) frequency histograms, polygons and ogives
[Four panels for the smokers' cotinine data:
(Frequency) histogram: classes 0-99 to 500-599; frequency 0 to 16
Relative frequency histogram: classes 0-99 to 500-599; relative frequency 0.0% to 40.0%
Frequency polygon: classes 0-99 to 500-599; frequency 0 to 16
Ogive: <100 to <600; cumulative frequency 0 to 45]
Other ways of representing data
• Dot plot: find out what this is!
• Stem-and-leaf plot
– keep track of all your data
– only works in certain specific cases
– condensed stem-and-leaf plot
Class grades:
82 89 60 51 71
61 60 93 54 73
84 79 60 80 95
77 82 93 82 84
89 57 73 98 71

Stem   Leaf
10
9      3358
8      02224499
7      113379
6      0001
5      147
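A stem-and-leaf plot for the class grades can be built by splitting each grade into its tens digit (stem) and ones digit (leaf). The following is a minimal sketch; the names stems and lines are illustrative, and an empty stem such as 10 is simply not printed here.

```python
from collections import defaultdict

# Class grades from the slide
grades = [82, 89, 60, 51, 71,
          61, 60, 93, 54, 73,
          84, 79, 60, 80, 95,
          77, 82, 93, 82, 84,
          89, 57, 73, 98, 71]

# Stem = tens digit, leaf = ones digit; sorting first keeps the leaves in order
stems = defaultdict(list)
for grade in sorted(grades):
    stems[grade // 10].append(grade % 10)

# One line per stem, largest stem first, as on the slide
lines = [f"{stem:>2} | {''.join(str(leaf) for leaf in stems[stem])}"
         for stem in sorted(stems, reverse=True)]

print("\n".join(lines))
```

Because every leaf corresponds to one grade, the number of leaves should equal the number of grades (25), which is a useful check when drawing the plot by hand.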
…and more ways of representing data
Which of the following are represented in the data sheet given in class?
• Pareto charts
• Pie charts
• Time charts
• Napoleon’s chart
• Scatter plot
[Bar graph: Complaints to FCC. Horizontal axis: type of complaint; vertical axis: number of complaints (0 to 14000)]
[Scatter plot: Waist vs Weight. Horizontal axis: waist (60.0 to 150.0); vertical axis: weight (100.0 to 190.0)]
Napoleon’s campaign chart 1812
Class sheet page 1
[Data table: cotinine levels of persons 1 to 40 in the SMOKER, ETS and NOETS groups, with their frequency distributions, as listed earlier]

           SMOKER   ETS     NOETS
Average    172      61      16
Median     170      2       0
St. Dev    119      138     63
Variance   13507    16318   3903
[Charts on the class sheet: Smokers, ETS and NOETS cotinine levels by person; Frequency distributions of cotinine; Waist vs Weight scatter plot]
Organizing and Summarizing Data
Summary
• Summaries of qualitative data
– Frequency tables
– Bar graphs
• Summaries of quantitative data
– Frequency tables
– Histograms
– Pie graphs, time-series graphs, etc.
– Cumulative frequencies, ogives, etc.