Statistics-MAT 150 Chapter 2 Descriptive Statistics Prof. Felix Apfaltrer fapfaltrer@bmcc.cuny.edu Office:N518 Phone: 7421 Office Hours: Tue/Thu 1:30-3:00pm

Post on 15-Jan-2016


Page 1:

Statistics-MAT 150

Chapter 2 Descriptive Statistics

Prof. Felix Apfaltrer

fapfaltrer@bmcc.cuny.edu

Office: N518

Phone: 7421

Office Hours: Tue/Thu 1:30-3:00pm

Page 2:

Characteristics of data

• Center: middle or average value

• Variation: measures the amount that data values vary

• Distribution: nature or shape of the distribution of the data

• Outliers: values that are very far out

• Time: changing characteristics of data in time

Observe the data set and give intuitive examples of these characteristics!

Mnemonic: CVDOT - computer viruses destroy or terminate

Page 3:

Organizing Qualitative Data

• Qualitative data values can be organized by a frequency distribution

• A frequency distribution lists
– Each of the categories
– The frequency for each category

• Good practices in constructing bar graphs

• The horizontal scale
– The categories should be spaced equally apart
– The rectangles should have the same widths

• The vertical scale
– Should begin with 0
– Should be incremented in reasonable steps
– Should go somewhat, but not significantly, beyond the largest frequency or relative frequency

Page 4:

• A simple data set is

blue, blue, green, red, red, blue, red, blue

• A frequency table for this qualitative data is

Color   Frequency
Blue    4
Green   1
Red     3

• The most commonly occurring color is blue

• The relative frequencies are the proportions (or percents) of the observations out of the total

Color   Relative Frequency
Blue    .500
Green   .125
Red     .375
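The two tables above can be reproduced with a few lines of Python. This is only a sketch: the function names `frequency_table` and `relative_frequency_table` are made up for this example and are not part of the course material.

```python
from collections import Counter

# The simple data set from the slide.
colors = ["blue", "blue", "green", "red", "red", "blue", "red", "blue"]


def frequency_table(values):
    """Count how often each category occurs."""
    return dict(Counter(values))


def relative_frequency_table(values):
    """Divide each category count by the total number of observations."""
    total = len(values)
    return {category: count / total for category, count in Counter(values).items()}


freq = frequency_table(colors)
rel_freq = relative_frequency_table(colors)

# Print one row per category: name, frequency, relative frequency.
for category in sorted(freq):
    print(f"{category:<6} {freq[category]:>2} {rel_freq[category]:.3f}")
```

Each relative frequency is simply the frequency divided by the total (here 8), so the relative frequencies always add up to 1.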

Page 5:

Bar graphs for our simple data (using Excel)

1) Frequency bar graph

2) Relative frequency bar graph

• Good practices in constructing bar graphs

• The horizontal scale
– The categories should be spaced equally apart
– The rectangles should have the same widths

• The vertical scale
– Should begin with 0
– Should be incremented in reasonable steps
– Should go somewhat, but not significantly, beyond the largest frequency or relative frequency

Page 6:

• An example side-by-side bar graph comparing educational attainment in 1990 versus 2003

Page 7:

Complaints to FCC

Type                  Complaints
Rates and services    4473
Marketing             1007
International calls   766
Access charges        614
Operator services     534
Slamming              12478
Cramming              1214

[Bar graph: Complaints to FCC. Horizontal axis: type of complaint; vertical axis: number of complaints, 0 to 14000.]

[Pie chart: Complaints to FCC. One sector per type of complaint, labeled with its number of complaints.]

• A pie chart is a circle divided into sections, one for each category

• The area (angle) of each sector is proportional to the frequency of that category

• Pie charts are useful to show the relative proportions of each category, compared to the whole
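As an illustration of "angle proportional to frequency", the sector angles for the FCC complaints data can be computed as below. The helper name `pie_angles` is an assumption made for this sketch; only the counts come from the slide.

```python
# Complaint counts from the "Complaints to FCC" table.
complaints = {
    "Rates and services": 4473,
    "Marketing": 1007,
    "International calls": 766,
    "Access charges": 614,
    "Operator services": 534,
    "Slamming": 12478,
    "Cramming": 1214,
}


def pie_angles(frequencies):
    """Sector angle in degrees: 360 times the category's share of the total."""
    total = sum(frequencies.values())
    return {category: 360.0 * count / total for category, count in frequencies.items()}


angles = pie_angles(complaints)
total_complaints = sum(complaints.values())

for category, angle in angles.items():
    share = 100.0 * complaints[category] / total_complaints
    print(f"{category:<20} {share:5.1f}%  {angle:6.1f} degrees")
```

The angles add up to 360 degrees, and the largest category (Slamming) takes up more than half of the circle.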

Page 8:

Data

Person   SMOKER   ETS   NOETS
1        1        384   0
2        0        0     0
3        131      69    0
4        173      19    0
5        265      1     0
6        210      0     0
7        44       178   0
8        277      2     0
9        32       13    0
10       3        1     0
11       35       4     0
12       112      0     9
13       477      543   0
14       289      17    0
15       227      1     0
16       103      0     0
17       222      51    0
18       149      0     0
19       313      197   244
20       491      3     0
21       130      0     1
22       234      3     0
23       164      1     0
24       198      45    0
25       17       13    90
26       253      3     1
27       87       1     0
28       121      1     309
29       266      1     0
30       290      0     0
31       123      0     0
32       167      551   0
33       250      2     0
34       245      1     0
35       48       1     0
36       86       1     0
37       284      0     0
38       1        74    0
39       208      1     0
40       173      241   0
Average  172      61    16

Cotinine level   Smokers   ETS   NOETS
0-99             11        34    38
100-199          12        2     0
200-299          14        1     1
300-399          1         1     1
400-499          2         0     0
500-599          0         2     0

[Chart: Smokers, ETS and NOETS cotinine levels. Horizontal axis: Person (1 to 40); vertical axis: Cotinine Level (0 to 600); series: SMOKER, ETS, NOETS.]

[Chart: Frequency distributions of cotinine. Horizontal axis: cotinine level (classes 0-99 through 400-499); vertical axis: people (0 to 40); series: Smokers, ETS, NOETS.]

Page 9:

Organizing Quantitative Data: The Popular Displays

• Learning objectives
– Organize discrete data in tables
– Construct histograms of discrete data
– Organize continuous data in tables
– Construct histograms of continuous data
– Draw stem-and-leaf plots
– Draw dot plots
– Identify the shape of a distribution

Page 10:

Frequency distributions

• Lower class limits: 0, 100, …

• Upper class limits: 99, …

• Class boundaries: numbers used to separate classes without gaps; 99.5, 199.5, …

• Class midpoints: center of class; 49.5, 149.5, …

• Class width: difference between two consecutive lower (or upper) class limits: 100

Frequency Distribution

Cotinine Level   Smokers
0-99             11
100-199          12
200-299          14
300-399          1
400-499          2
500-599          0
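The class boundaries, midpoints and width quoted above follow directly from the class limits. A minimal sketch (the variable names are chosen for this example only):

```python
# Class limits of the cotinine frequency distribution.
lower_limits = [0, 100, 200, 300, 400, 500]
upper_limits = [99, 199, 299, 399, 499, 599]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class boundaries: halfway between the upper limit of one class
# and the lower limit of the next class, so that no gap remains.
boundaries = [
    (upper + next_lower) / 2
    for upper, next_lower in zip(upper_limits[:-1], lower_limits[1:])
]

# Class midpoints: halfway between the lower and upper limit of each class.
midpoints = [(lower + upper) / 2 for lower, upper in zip(lower_limits, upper_limits)]

print("width:", class_width)
print("boundaries:", boundaries)
print("midpoints:", midpoints)
```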

Page 11:

Constructing frequency distribution

1. Decide on the number of classes n: between 5 and 20

2. Class width = (highest value - lowest value)/n, rounded up to a convenient number

3. Starting point: the lowest data value or a convenient smaller value

4. List lower class limits

5. List upper class limits

6. Tally data: count the data values falling in each class

Frequency Distribution

Cotinine Level   Smokers
0-99             11
100-199          12
200-299          14
300-399          1
400-499          2
500-599          0

Relative Frequency Distribution

Cotinine Level   Smokers
0-99             27.5%
100-199          30.0%
200-299          35.0%
300-399          2.5%
400-499          5.0%
500-599          0.0%

Q. How do you go from one to the other?
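One way to check the steps and the question above is to tally the SMOKER column of the data sheet in Python. This is a sketch under stated assumptions: the function name `frequency_distribution` is hypothetical, the data are whole numbers, and n = 5 is picked only to illustrate step 2 (the table itself uses six classes of width 100 so that it also covers the ETS values).

```python
# SMOKER cotinine levels for persons 1 to 40, from the data sheet.
smokers = [
    1, 0, 131, 173, 265, 210, 44, 277, 32, 3,
    35, 112, 477, 289, 227, 103, 222, 149, 313, 491,
    130, 234, 164, 198, 17, 253, 87, 121, 266, 290,
    123, 167, 250, 245, 48, 86, 284, 1, 208, 173,
]


def frequency_distribution(data, start, width, n_classes):
    """Tally integer data into n_classes classes of equal width, beginning at start."""
    counts = [0] * n_classes
    for value in data:
        index = (value - start) // width
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


# Step 2 with n = 5 (an assumption): the raw width is then rounded up to 100.
raw_width = (max(smokers) - min(smokers)) / 5

# Steps 3 to 6: start at 0, use width 100, tally into six classes.
counts = frequency_distribution(smokers, start=0, width=100, n_classes=6)

# From frequency to relative frequency: divide each count by the total.
total = len(smokers)
relative_percent = [round(100 * count / total, 1) for count in counts]

print("raw width:", raw_width)
print("frequencies:", counts)
print("relative frequencies (%):", relative_percent)
```

Going the other way, a relative frequency times the total (40) gives back the frequency.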

Page 12:

Cumulative Frequency distribution

Frequency Distribution      Relative Frequency Distribution   Cumulative Frequency Distribution
Cotinine Level   Smokers    Cotinine Level   Smokers          Cotinine Level   Smokers
0-99             11         0-99             27.5%            <100             11
100-199          12         100-199          30.0%            <200             23
200-299          14         200-299          35.0%            <300             37
300-399          1          300-399          2.5%             <400             38
400-499          2          400-499          5.0%             <500             40
500-599          0          500-599          0.0%             <600             40
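The cumulative column is a running total of the frequency column. A short sketch using the standard library (the variable names are assumptions for this example):

```python
from itertools import accumulate

# Smokers frequencies for the classes 0-99, 100-199, ..., 500-599.
frequencies = [11, 12, 14, 1, 2, 0]

# Each cumulative class is labeled by the lower limit of the next class.
labels = ["<100", "<200", "<300", "<400", "<500", "<600"]

# Running total: each entry adds the current class to everything before it.
cumulative = list(accumulate(frequencies))

for label, value in zip(labels, cumulative):
    print(f"{label:<5} {value}")
```

The last cumulative frequency always equals the total number of data values, here 40.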

Page 13:

Visualizing data

• Histogram
– A histogram is a bar graph in which the horizontal scale represents classes of data values and the vertical scale represents frequencies

[Histogram: Smokers cotinine level. Horizontal axis: classes 0-99 through 500-599; vertical axis: frequency, 0 to 16.]
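Without a plotting tool, a rough text version of this histogram can be printed as below. It is only a sketch: the bars are drawn sideways (classes down the page, frequencies across), and the function name `text_histogram` is made up for this example.

```python
# Classes and Smokers frequencies from the frequency distribution.
classes = ["0-99", "100-199", "200-299", "300-399", "400-499", "500-599"]
frequencies = [11, 12, 14, 1, 2, 0]


def text_histogram(labels, counts, symbol="#"):
    """Return one line per class: the label, a bar of symbols, and the count."""
    lines = []
    for label, count in zip(labels, counts):
        lines.append(f"{label:>7} | {symbol * count} {count}")
    return lines


lines = text_histogram(classes, frequencies)
for line in lines:
    print(line)
```

Every bar has the same thickness (one line), and the bar length starts at 0, in line with the good practices for bar graphs.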

Page 14:

(Relative) frequency histograms, polygons and ogives

[Chart: (Frequency) histogram. Horizontal axis: classes 0-99 through 500-599; vertical axis: frequency, 0 to 16.]

[Chart: Relative Frequency histogram. Horizontal axis: classes 0-99 through 500-599; vertical axis: relative frequency, 0.0% to 40.0%.]

[Chart: Frequency Polygon. Horizontal axis: classes 0-99 through 500-599; vertical axis: frequency, 0 to 16.]

[Chart: Ogive. Horizontal axis: <100 through <600; vertical axis: cumulative frequency, 0 to 45.]

Page 15:

Other ways of representing data

• Dot plot: find out what this is!

• Stem-and-leaf plot
– keep track of all your data
– only works in certain specific cases
– condensed stem-and-leaf plot

Class grades:
82 89 60 51 71
61 60 93 54 73
84 79 60 80 95
77 82 93 82 84
89 57 73 98 71

Stem   Leaf
10
9      3358
8      02224499
7      113379
6      0001
5      147
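A stem-and-leaf plot for the class grades can be built by splitting each grade into its tens digit (stem) and units digit (leaf). The sketch below assumes two-digit whole-number grades; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# The 25 class grades from the slide.
grades = [
    82, 89, 60, 51, 71,
    61, 60, 93, 54, 73,
    84, 79, 60, 80, 95,
    77, 82, 93, 82, 84,
    89, 57, 73, 98, 71,
]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in increasing order."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return {stem: "".join(str(leaf) for leaf in leaves) for stem, leaves in stems.items()}


plot = stem_and_leaf(grades)

# Print the largest stem first, as on the slide.
for stem in sorted(plot, reverse=True):
    print(f"{stem:>2} | {plot[stem]}")
```

Because every leaf is one data value, the leaves add up to the number of grades (25), which is why the plot "keeps track of all your data". Stems with no data, such as 10, are not printed by this sketch.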

Page 16:

…and more ways of representing data

Which of the following are represented in the data sheet given in class?

• Pareto charts
• Pie charts
• Time charts
• Napoleon’s chart
• Scatter plot

[Bar graph: Complaints to FCC. Horizontal axis: type of complaint; vertical axis: number of complaints, 0 to 14000.]

[Scatter plot: Waist vs Weight. Horizontal axis: Waist, 60.0 to 150.0; vertical axis: Weight, 100.0 to 190.0.]

Page 17:

Napoleon’s campaign chart 1812

Page 18:

Class sheet page 1

Person    SMOKER   ETS     NOETS
1         1        384     0
2         0        0       0
3         131      69      0
4         173      19      0
5         265      1       0
6         210      0       0
7         44       178     0
8         277      2       0
9         32       13      0
10        3        1       0
11        35       4       0
12        112      0       9
13        477      543     0
14        289      17      0
15        227      1       0
16        103      0       0
17        222      51      0
18        149      0       0
19        313      197     244
20        491      3       0
21        130      0       1
22        234      3       0
23        164      1       0
24        198      45      0
25        17       13      90
26        253      3       1
27        87       1       0
28        121      1       309
29        266      1       0
30        290      0       0
31        123      0       0
32        167      551     0
33        250      2       0
34        245      1       0
35        48       1       0
36        86       1       0
37        284      0       0
38        1        74      0
39        208      1       0
40        173      241     0
Average   172      61      16
Median    170      2       0
St. Dev   119      138     63
Variance  13507    16318   3903

Cotinine level   Smokers   ETS   NOETS
0-99             11        34    38
100-199          12        2     0
200-299          14        1     1
300-399          1         1     1
400-499          2         0     0
500-599          0         2     0

[Chart: Smokers, ETS and NOETS cotinine levels. Horizontal axis: Person (1 to 40); vertical axis: Cotinine Level (0 to 600); series: SMOKER, ETS, NOETS.]

[Chart: Frequency distributions of cotinine. Horizontal axis: cotinine level (classes 0-99 through 400-499); vertical axis: people (0 to 40); series: Smokers, ETS, NOETS.]

[Scatter plot: Waist vs Weight. Horizontal axis: Waist, 60.0 to 150.0; vertical axis: Weight, 100.0 to 190.0.]

Page 19:

Organizing and Summarizing Data

Summary

• Summaries of qualitative data
– Frequency tables
– Bar graphs

• Summaries of quantitative data
– Frequency tables
– Histograms
– Pie graphs, time-series graphs, etc.
– Cumulative frequencies, ogives, etc.