1. mean - easy to calculate but is affected by extreme values - to calculate use: sum of all values...

27
Statistics

Upload: elwin-harvey

Post on 18-Dec-2015


TRANSCRIPT

Page 1: 1. Mean - easy to calculate but is affected by extreme values - to calculate use: Sum of all values Total number of values e.g. Calculate the mean of

Statistics


Measures of Central Tendency (Averages)

1. Mean

- easy to calculate but is affected by extreme values

- to calculate use:

Mean = Sum of all values / Total number of values

e.g. Calculate the mean of 6, 11, 3, 14, 8

Mean = (6 + 11 + 3 + 14 + 8) / 5 = 42 / 5 = 8.4

Push equals on calculator BEFORE dividing

e.g. Calculate the mean of 6, 11, 3, 14, 8, 100

Mean = (6 + 11 + 3 + 14 + 8 + 100) / 6 = 142 / 6 = 23.7 (1 d.p.)
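The two mean calculations above can be checked with a short Python sketch. The function name `mean` is only an illustrative choice; the second call shows how the single extreme value 100 pulls the mean upwards.

```python
def mean(values):
    """Sum of all values divided by the total number of values."""
    return sum(values) / len(values)

# First example: no extreme values
print(mean([6, 11, 3, 14, 8]))                  # 8.4

# Second example: the extreme value 100 drags the mean up
print(round(mean([6, 11, 3, 14, 8, 100]), 1))   # 23.7
```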


2. Median

- middle number when all are PLACED IN ORDER (two ways)

- harder to calculate but is not affected by extreme values

a) for an odd number of values, median is the middle value

e.g. Find the median of 39, 44, 38, 37, 42, 40, 42, 39, 32

In order: 32, 37, 38, 39, 39, 40, 42, 42, 44

To find placement of median use: (n + 1) / 2, where n = amount of data

(9 + 1) / 2 = 10 / 2 = 5, so the median is the 5th value

OR

Cross off data, one at a time from each end, until you reach the middle value.

Median = 39

b) for an even number of values, median is average of the two middle values

e.g. Find the median of 69, 71, 68, 85, 73, 73, 64, 75

In order: 64, 68, 69, 71, 73, 73, 75, 85

(n + 1) / 2 = (8 + 1) / 2 = 4.5, so the median lies between the 4th and 5th values (OR cross off data from each end)

Median = (71 + 73) / 2 = 144 / 2 = 72
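As a rough sketch, the (n + 1) / 2 placement rule can be written in Python as follows. The function name `median` and the variable names are illustrative assumptions, not part of the original notes.

```python
def median(values):
    """Median using the (n + 1) / 2 placement rule on the ordered data."""
    ordered = sorted(values)            # data must be PLACED IN ORDER first
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position of the median
    if n % 2 == 1:
        # odd number of values: the position is a whole number
        return ordered[int(position) - 1]
    # even number of values: position ends in .5, so average the
    # two values either side of it
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

print(median([39, 44, 38, 37, 42, 40, 42, 39, 32]))   # 39
print(median([69, 71, 68, 85, 73, 73, 64, 75]))       # 72.0
```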


3. Mode

- only useful to find most popular item

- is the most common value (can be none, one or more)

e.g. Find the mode of 188, 93, 4, 93, 15, 0, 100, 15

Mode = 15 and 93

Range

- can show how spread out the data is

- is the difference between the largest and smallest values

e.g. Find the range of 4, 2, 6, 9, 8

Range = highest value – lowest value = 9 – 2 = 7 (2 – 9)

Note: It's a good idea to write in brackets the values that make up the range.
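The mode and range examples can be sketched in Python as below. The function names `modes` and `data_range` are illustrative, and treating data in which nothing repeats as having no mode is an assumption about how "can be none" is meant.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (none, one or more)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                       # assumption: nothing repeats, so no mode
    return sorted(v for v, c in counts.items() if c == highest)

def data_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)

print(modes([188, 93, 4, 93, 15, 0, 100, 15]))   # [15, 93]
print(data_range([4, 2, 6, 9, 8]))               # 7
```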


Ungrouped Frequency Tables

- Useful when dealing with large amounts of discrete data

e.g. Here are the numbers of fundraising tickets sold by 25 members of a hockey team. Place the data on a frequency table.

3, 5, 0, 1, 0, 2, 5, 2, 4, 0, 1, 2, 3, 5, 7, 2, 3, 3, 1, 4, 3, 3, 2, 0, 1

No. of tickets sold (x) | Tally   | Frequency (f) | x.f
0                       | IIII    | 4             | 0 x 4 = 0
1                       | IIII    | 4             | 1 x 4 = 4
2                       | IIIII   | 5             | 2 x 5 = 10
3                       | IIIII I | 6             | 3 x 6 = 18
4                       | II      | 2             | 4 x 2 = 8
5                       | III     | 3             | 5 x 3 = 15
6                       |         | 0             | 0
7                       | I       | 1             | 7 x 1 = 7
Total                   |         | 25            | 62

To find the mean, we need the sum of the ticket numbers multiplied by their frequencies, and divide this by the total frequency.

Mean = sum of x.f / total frequency

= 62 / 25

= 2.48 tickets

Check total frequency matches question!
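A minimal Python sketch of the same frequency table, assuming the standard library's `Counter` for the tallying step. The variable names are illustrative only.

```python
from collections import Counter

tickets = [3, 5, 0, 1, 0, 2, 5, 2, 4, 0, 1, 2, 3, 5, 7,
           2, 3, 3, 1, 4, 3, 3, 2, 0, 1]

frequency = Counter(tickets)                       # tally each ticket number
total_frequency = sum(frequency.values())          # should match the 25 members
sum_xf = sum(x * f for x, f in frequency.items())  # sum of the x.f column

# Print one row per ticket number, including those that never occur
for x in range(min(tickets), max(tickets) + 1):
    print(x, frequency[x], x * frequency[x])

mean = sum_xf / total_frequency
print(total_frequency, sum_xf, mean)
```

Comparing `total_frequency` with the 25 members in the question is the same check the notes recommend.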


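To find the median, determine its position by using the previous formula.

(n + 1) / 2 = (25 + 1) / 2 = 13

Now, by adding down the frequency column, locate the position of the median: the running total is 4 after 0 tickets, 8 after 1 ticket and 13 after 2 tickets, so the 13th value is 2.

Therefore: Median = 2 tickets

To find the mode, look for the highest frequency (here 6, for 3 tickets).

Therefore: Mode = 3 tickets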
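Adding down the frequency column can be sketched in Python as follows. The function names are hypothetical, and the sketch only handles an odd total frequency (as in this example), where the median position is a whole number.

```python
# Frequency table from the ticket example: {tickets sold: frequency}
frequency = {0: 4, 1: 4, 2: 5, 3: 6, 4: 2, 5: 3, 6: 0, 7: 1}

def median_from_table(table):
    """Locate the median by adding down the frequency column."""
    n = sum(table.values())
    if n % 2 == 0:
        raise ValueError("this sketch only handles an odd total frequency")
    position = (n + 1) // 2             # (25 + 1) / 2 = 13
    running_total = 0
    for x in sorted(table):
        running_total += table[x]       # 4, 8, 13, ...
        if running_total >= position:
            return x

def mode_from_table(table):
    """Value with the highest frequency (the first one found if tied)."""
    return max(table, key=table.get)

print(median_from_table(frequency))   # 2
print(mode_from_table(frequency))     # 3
```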


Types of Data

1. Discrete Data
- usually found by counting, usually whole numbers
e.g. Number of cars passing the school

2. Continuous Data
- usually found by measuring
e.g. Weights and heights of students

Data Display

1. Bar Graph
- shows discrete data
- must have GAPS between bars

e.g. The table below shows the number of times 28 students went out for dinner last month. Place data on a bar graph.

Number of dinners | Frequency
0                 | 2
1                 | 6
2                 | 8
3                 | 6
4                 | 4
5                 | 2

[Bar graph: "Students out for Dinner". Horizontal axis: Number of dinners (0 to 5). Vertical axis: Frequency (0 to 8).]

Don’t forget a title or axis labels

Note gaps between bars


2. Dot Plots
- are like a bar graph
- each dot represents one item

e.g. Plot these 15 golf scores on a dot plot

70, 72, 68, 74, 74, 78, 77, 70, 72, 72, 76, 72, 76, 75, 78

[Dot plot: "Golf Scores". Horizontal axis marked from 68 to 78.]

The scale of the plot ranges between the lowest and highest values


3. Pictograms
- use symbols to represent fixed numbers
- key shows the value of the symbol

e.g. Using an appropriate symbol, draw a pictogram displaying the number of hours per week spent completing homework for the following subjects.

[Pictogram: "Hours of Study in a Week", with rows for Science, English and Maths. KEY: one symbol = 1 hour]


4. Pie Graphs
- show comparisons
- slices are called sectors
- use percentages and angles (protractor and compass)

e.g. Students of a class arrived at school in the following manner. Show on a Pie Graph

Walked = 6 Cycled = 5 Car = 4 Bus = 9

To calculate angle of sectors use: (Amount of sector / Total data) x 360

Walked = (6 / 24) x 360 = 90°
Cycled = (5 / 24) x 360 = 75°
Car = (4 / 24) x 360 = 60°
Bus = (9 / 24) x 360 = 135°

[Pie graph: "Student Mode of Transport", with sectors labelled Walked (90°), Cycled (75°), Car (60°) and Bus (135°)]

Note: Instead of labels, a key could also be used.
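The sector-angle rule can be sketched in Python as below. The function name `sector_angles` is an illustrative choice; the multiplication is done before the division so that the angles come out as exact whole numbers here.

```python
def sector_angles(counts):
    """Angle of each sector: amount of sector / total data x 360."""
    total = sum(counts.values())
    return {label: amount * 360 / total for label, amount in counts.items()}

transport = {"Walked": 6, "Cycled": 5, "Car": 4, "Bus": 9}
angles = sector_angles(transport)

for label, angle in angles.items():
    print(label, angle)

# The sectors should always add up to a full turn
print(sum(angles.values()))   # 360.0
```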


5. Strip Graph
- shows the proportion of each part to the whole
- should have a scale
- linked to pie graphs

e.g. Using the Pie Graph example, the Strip Graph drawn could use a scale of 1 cm = 2 students

Quartiles
- are measures of spread which, with the median, split the data into quarters
- method used is similar to that used when finding the median

When the data is in order:
- the lower quartile (LQ) has 25% or ¼ of the data below it.
- the upper quartile (UQ) has 75% or ¾ of the data below it.
- the Interquartile Range (IQR) = UQ – LQ


e.g. Find the LQ, UQ and the interquartile range of the following data

6, 6, 6, 7, 8, 9, 10, 10, 11, 14, 16, 16, 17, 19, 20, 20, 24, 24, 25, 29

Note: always find the median first (or cross off data)

(20 + 1) / 2 = 21 / 2 = 10.5, so the median lies between the 10th and 11th values and each half contains 10 values

(10 + 1) / 2 = 11 / 2 = 5.5, so each quartile lies between the 5th and 6th values of its half

LQ = (8 + 9) / 2 = 17 / 2 = 8.5

UQ = (20 + 20) / 2 = 40 / 2 = 20

IQR = UQ – LQ = 20 – 8.5 = 11.5

e.g. Find the LQ, UQ and the interquartile range of the following data

5, 6, 8, 10, 11, 11, 12, 15, 18, 22, 23, 28, 30

Remember, always find the median first (or cross off data)

(13 + 1) / 2 = 14 / 2 = 7, so the median is the 7th value (12)

As the median is an actual piece of data, it is ignored when finding the LQ and UQ

(6 + 1) / 2 = 7 / 2 = 3.5, so each quartile lies between the 3rd and 4th values of its half

LQ = (8 + 10) / 2 = 18 / 2 = 9

UQ = (22 + 23) / 2 = 45 / 2 = 22.5

IQR = UQ – LQ = 22.5 – 9 = 13.5
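The quartile method used in both examples (find the median first, leave it out of the halves when it is an actual piece of data) can be sketched in Python. The function names are illustrative; note that library routines such as `statistics.quantiles` may use a different convention and give slightly different quartiles.

```python
def median(ordered):
    """Median of data that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (LQ, median, UQ, IQR) using the method in the notes."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # an odd count means the median is an actual value, so it is ignored
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    lq, uq = median(lower), median(upper)
    return lq, median(ordered), uq, uq - lq

first = [6, 6, 6, 7, 8, 9, 10, 10, 11, 14, 16, 16, 17, 19, 20, 20, 24, 24, 25, 29]
second = [5, 6, 8, 10, 11, 11, 12, 15, 18, 22, 23, 28, 30]

print(quartiles(first))    # (8.5, 15.0, 20.0, 11.5)
print(quartiles(second))   # (9.0, 12, 22.5, 13.5)
```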


Stem and Leaf Graphs
- record and organise data
- most significant figures form the stem and the final digits the leaves
- can be in back to back form in order to compare two sets of data

e.g. Place the following heights (in m) onto a back to back stem and leaf plot

BOYS = 1.59, 1.69, 1.47, 1.43, 1.82, 1.70, 1.73, 1.35, 1.76, 1.68, 1.62, 1.84, 1.45, 1.50, 1.54, 1.73, 1.84, 1.71, 1.66

GIRLS = 1.44, 1.46, 1.63, 1.29, 1.48, 1.57, 1.51, 1.42, 1.34, 1.45, 1.57, 1.59, 1.42

Look at the highest and lowest data values to decide the range of the stem

Place the final digits of the data on the graph on the correct side

[Unordered Graph of Heights: the leaves as first entered, in the order the data is listed]

Ordered Graph of Heights

         Boys | Stem | Girls
      4, 4, 2 | 1.8  |
6, 3, 3, 1, 0 | 1.7  |
   9, 8, 6, 2 | 1.6  | 3
      9, 4, 0 | 1.5  | 1, 7, 7, 9
      7, 5, 3 | 1.4  | 2, 2, 4, 5, 6, 8
            5 | 1.3  | 4
              | 1.2  | 9
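As a rough illustration, an ordered back to back stem and leaf plot of these heights can be built in Python. Converting each height to whole centimetres before splitting it into stem and leaf is an assumption made here to avoid decimal rounding problems; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(heights_m):
    """Map each stem (e.g. 15 for 1.5) to its ordered list of leaves."""
    plot = defaultdict(list)
    for h in heights_m:
        cm = round(h * 100)              # 1.59 m -> 159
        plot[cm // 10].append(cm % 10)   # stem 15, leaf 9
    return {stem: sorted(leaves) for stem, leaves in plot.items()}

boys = [1.59, 1.69, 1.47, 1.43, 1.82, 1.70, 1.73, 1.35, 1.76, 1.68,
        1.62, 1.84, 1.45, 1.50, 1.54, 1.73, 1.84, 1.71, 1.66]
girls = [1.44, 1.46, 1.63, 1.29, 1.48, 1.57, 1.51, 1.42, 1.34, 1.45,
         1.57, 1.59, 1.42]

boys_plot, girls_plot = stem_and_leaf(boys), stem_and_leaf(girls)

# Boys' leaves are reversed so they increase moving away from the stem
for stem in range(18, 11, -1):
    left = ", ".join(str(d) for d in reversed(boys_plot.get(stem, [])))
    right = ", ".join(str(d) for d in girls_plot.get(stem, []))
    print(f"{left:>15} | {stem / 10:.1f} | {right}")
```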


Calculating Statistics from Stem and Leaf Graphs

[Graph of Heights: the ordered back to back stem and leaf graph of the boys' and girls' heights, with stems 1.2 to 1.8]

e.g. From the ordered plot state the minimum, maximum, LQ, median, UQ, IQR and range statistics for each side

Statistic | BOYS                 | GIRLS
Minimum   | 1.35 m               | 1.29 m
Maximum   | 1.84 m               | 1.63 m
LQ        | 1.50 m               | 1.42 m
Median    | 1.68 m               | 1.46 m
UQ        | 1.73 m               | 1.57 m
IQR       | 1.73 – 1.50 = 0.23 m | 1.57 – 1.42 = 0.15 m
Range     | 1.84 – 1.35 = 0.49 m | 1.63 – 1.29 = 0.34 m

For each statistic, make sure to write down the whole number, not just the ‘leaf’!

Girls: Median position = (13 + 1) / 2 = 7; LQ/UQ position = (6 + 1) / 2 = 3.5

When finding median, LQ and UQ, make sure you count/cross in the right direction!

Remember: If you find it hard to calculate stats off graph, write out data in a line first!


Box and Whisker Plots
- show the minimum, maximum, LQ, median and UQ
- ideal for comparing two sets of data

e.g. Using the height data from the Stem and Leaf diagrams, draw two box and whisker plots (Boys and Girls)

[Box and whisker plot: "Box and Whisker Plot of Boys and Girls Heights". Two plots, Males and Females, on a Height (m) scale from 1.20 to 1.90, each marking the Minimum, LQ, Median, UQ and Maximum.]

Note: Use the minimum and maximum values to determine length of scale

Question: What is the comparison between the boy and girl heights?

ANSWER?

EVIDENCE?


Grouped Frequency Tables
- used when dealing with a large amount of continuous data and groups are needed

e.g. Listed below are the heights (in cm) of 25 students. Represent the data on a frequency table

167, 173, 171, 149, 162, 174, 185, 165, 160, 170, 173, 161, 158, 172, 168, 168, 178, 170, 180, 166, 183, 150, 164, 161, 164

Interval  | Tally         | Freq. (f) | Midpoint (x)            | x.f
140 – 149 | I             | 1         | (140 + 149) / 2 = 144.5 | 144.5 x 1 = 144.5
150 – 159 | II            | 2         | 154.5                   | 309
160 – 169 | IIIII IIIII I | 11        | 164.5                   | 1809.5
170 – 179 | IIIII III     | 8         | 174.5                   | 1396
180 – 189 | III           | 3         | 184.5                   | 553.5
TOTAL     |               | 25        |                         | 4212.5

To calculate the mean a midpoint is needed and the formula used is:

Mean = sum of (midpoint x frequency) / total frequency

e.g. Calculate the mean from the above data and state the modal interval

Mean = 4212.5 / 25 = 168.5 cm

Modal Interval = 160 – 169 cm

Note: Make sure you have enough groups but don’t make them too small!
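The grouped table and its mean can be reproduced with a short Python sketch. The interval boundaries are taken from the table; the tuple layout and variable names are illustrative assumptions.

```python
heights = [167, 173, 171, 149, 162, 174, 185, 165, 160, 170, 173, 161, 158,
           172, 168, 168, 178, 170, 180, 166, 183, 150, 164, 161, 164]

intervals = [(140, 149), (150, 159), (160, 169), (170, 179), (180, 189)]

rows = []
for low, high in intervals:
    f = sum(1 for h in heights if low <= h <= high)   # frequency of the group
    midpoint = (low + high) / 2                       # e.g. (140 + 149) / 2
    rows.append((low, high, f, midpoint, midpoint * f))

total_f = sum(row[2] for row in rows)
sum_xf = sum(row[4] for row in rows)
mean = sum_xf / total_f

# The modal interval is the group with the highest frequency
modal = max(rows, key=lambda row: row[2])

print(total_f, sum_xf, mean)      # 25 4212.5 168.5
print(modal[0], "-", modal[1])    # 160 - 169
```

Because every height in a group is replaced by the group midpoint, this mean is an estimate and may differ slightly from the mean of the raw data.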


Histograms
- display grouped data
- frequency is along vertical axis, group intervals are along horizontal axis
- there are NO gaps between bars

e.g. Graph the grouped frequency table data about heights onto a histogram

[Histogram: "Student Heights". Horizontal axis: Height (cm), from 140 to 190. Vertical axis: Frequency (no. of students), from 0 to 12.]

Note: The groups from the table form the intervals along the horizontal axis and the highest frequency determines the height of the vertical axis.


- Side by side histograms can also be used to compare data

[Side by side histograms: "Female Heights" and "Male Heights", each with a horizontal axis marked from 140 to 200 and a vertical axis marked from 2 to 8.]

Question: What is the comparison between the female and male heights?

ANSWER?

EVIDENCE?


Scatter Graph/Plot
- looks for a relationship between two measured variables
- points are plotted like co-ordinates

e.g. Below are the heights and weights of Year 7 boys. Place on a scatter plot.

Height (cm) | Weight (kg)
144         | 48
152         | 52
161         | 50
148         | 49
155         | 53
140         | 47
158         | 58
139         | 45
147         | 50
150         | 51
152         | 49
138         | 46
137         | 44
145         | 49

[Scatter plot: "Scatter Diagram for boys heights and weights". Horizontal axis: Height (cm), marked from 135 to 160. Vertical axis: Weight (kg), marked from 45 to 60. A line of best fit is drawn through the points.]

Use the data to determine scale to use on both axes

If points form a line (or close to) we can say there is a relationship between the two variables.

Outliers can generally be ignored

What is the relationship between the boys' height and weight?

ANSWER?

EVIDENCE?
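The notes do not say how the line of best fit is found, and it is probably drawn by eye. As one possible way to check a hand-drawn line, the sketch below fits a least-squares line in Python; the least-squares method and the function name `least_squares_line` are assumptions, not part of the original notes.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # line passes through the mean point
    return slope, intercept

heights = [144, 152, 161, 148, 155, 140, 158, 139, 147, 150, 152, 138, 137, 145]
weights = [48, 52, 50, 49, 53, 47, 58, 45, 50, 51, 49, 46, 44, 49]

slope, intercept = least_squares_line(heights, weights)

# A positive slope supports "taller boys generally weigh more"
print(slope > 0)
```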


Time Series
- a collection of measurements recorded at specific intervals where the quantity changes with time.

Features of Time Series
a) Order is important with all measurements retained to examine trends
b) Long term trends where measurements definitely tend to increase or decrease
c) Seasonal trends resulting in up and down patterns

e.g. Draw a time series graph for the following data:

Season  | Quarterly sales
Sept.   | 9040
Dec.    | 8650
Mar. 96 | 8370
June    | 9250
Sept.   | 9033
Dec.    | 8578
Mar. 97 | 8495
June    | 9407
Sept.   | 9209
Dec.    | 8740
Mar. 98 | 8618
June    | 9504
Sept.   | 9246
Dec.    | 8929
Mar. 99 | 8670

[Time series graph: "Quarterly Sales for Elliots's Fish and Chips Shop". Horizontal axis: Quarter Years (Sept. to Mar. 99). Vertical axis: Sales ($), marked from 8300 to 9900.]

Join up each of the points

What are the short and long term trends?

ANSWER?

EVIDENCE?


Misleading Graphs

Good graphs should have:
- an accurate heading (watch emotive headings)
- scales in even steps
- scales from zero unless a break is shown
- values easy to read
- bar graphs have the same width bars and similar shading

Statistical Investigation Terms

Sample: When part of the group is surveyed

Census: Whole population is surveyed

Population: The entire group of members under consideration

Survey: Collection of information from some or all members of a population

Sampling Frame: A list covering the target population

A Good Sampling Frame:
- should have each unit listed only once
- has each unit distinguishable from others
- is up to date


Investigations

When planning an investigation:

- think carefully about what you are trying to find (question)
- what data is needed
- how will you obtain the data
- is the method practical and convenient
- how will you record the information
- how will you present the data


Choosing a Sample

A sample should:
1) Be large enough to be representative of whole population
2) Have people/items in it that are representative of the population

It is best to choose samples that are large and random but size may be affected by time, money, personnel, equipment etc.

Some Sampling Methods

Simple random sampling:
1- obtain a population list
2- number each member
3- use random table or random number on calculator

Systematic sampling:
1- obtain a population list
2- randomly select a starting point on the list
3- select every nth member until desired sample size is reached

Note: every nth member is found by: n = Population/group size / Size of sample needed
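Both sampling methods can be sketched with Python's `random` module. The function name `systematic_sample` is hypothetical, and choosing the random starting point within the first n members (so the list never has to wrap around) is an assumption about how step 2 is carried out.

```python
import random

def systematic_sample(population, sample_size, rng=random):
    """Select every nth member after a random starting point."""
    step = len(population) // sample_size   # n = population size / sample size
    start = rng.randrange(step)             # assumption: start within first n
    return [population[start + i * step] for i in range(sample_size)]

# A numbered population list of 100 members
population = list(range(1, 101))

# Simple random sampling: pick 10 members at random, without repeats
simple = random.sample(population, 10)

# Systematic sampling: every 10th member from a random start
systematic = systematic_sample(population, 10)

print(sorted(simple))
print(systematic)
```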


Accuracy of Results/Sources of Error
- Biased sample
- Wrong measurements
- Poorly worded, misleading questions
- Mistakes in calculations and/or display

Posing and Answering Questions

For when comparing two sets of data.

1. If the two sets of data are NOT related (have no effect on each other)

Use the words COMPARE OR COMPARISON

e.g. What is the COMPARISON between…

How does … COMPARE to …

THEN: (also if justifying statements)
- Get as many statistics as possible (averages, quartiles, max and min, range etc)
- Draw a STEM and LEAF GRAPH and a BOX and WHISKER PLOT (maybe SIDE BY SIDE HISTOGRAMS)
- Answer your question in one sentence
- Back up your answer with at least 2-3 statements using the data from your statistics/graphs (at least one each on average and spread)

Remember to use “generally/on average”


2. If the two sets of data ARE related (do have an effect on each other)

Use the words RELATE OR RELATIONSHIP

e.g. What is the RELATIONSHIP between…

How does … RELATE to …

THEN:
- Get as many statistics as possible
- Draw a SCATTERPLOT
- Answer your question in one sentence
- Back up your answer with at least 2-3 statements

3. If it is a single set of data taken over time we look for short and long term trends.

- Write your question in the following manner: What are the SHORT and LONG TERM trends in ….

THEN:
- Get as many statistics as possible
- Draw a TIME SERIES GRAPH
- Answer your question and back it up with justifications


Limitations and Improvements

1. In terms of Data Collection

Typical Limitations                | Improvements
- Sample too small                 | - Obtain a bigger sample
- Not random or representative     | - Get a representative sample
- Outliers distort data            | - Ignore extreme outliers
- Taken over too short a time period | - Take data over a longer time period

2. In terms of Your Process

Typical Limitations                                   | Improvements
- Not enough statistics calculated                    | - Calculate more statistics
- Not enough graphs used, data could be compared better | - Use other graphs (e.g. comparative histograms)
- Scales on graphs too large                          | - Change scales on graph (smaller)
- Way graphs are drawn                                | - Alter the way the graphs may be drawn