1 statistical concepts module 1, session 2. 2 objectives from this session participants will be able...


Uploaded by lucy-hicks, posted on 01-Jan-2016



1

Statistical concepts

Module 1, Session 2


2

Objectives

From this session participants will be able to:

- Define statistics
- Enter simple datasets once the data entry form is set up
- Recognise the type of each variable in a dataset
- Know some ways to summarise data of each main type
- Explain how statistical investigations deal with variability
- Differentiate between descriptive and inferential statistics


3

Activities

1. This introduction

2. Entry of the data from the CAST survey

3. Discussion/presentation on statistical concepts

   a. Using the data entered

   b. Using other case studies

4. The statistical glossary, for when you need to remind yourself about terminology


4

What is statistics - 1?

From the RSS (Royal Statistical Society) webpage:

1. Statistics changes numbers into information.

2. Statistics is the art and science of deciding what are the appropriate data to collect, deciding how to collect them efficiently, and then using them to give information, answer questions, draw inferences and make decisions.


5

What is statistics - 2?

3. Statistics is making decisions when there is uncertainty.

We have to make decisions all the time, in everyday life, and as part of our jobs. Statistics helps us make better decisions.

4. Statistics is NOT just collecting a lot of numbers. It is collecting numbers for a purpose.


6

What is statistics - 3?

From Wikipedia:

5. Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data.

6. Statistics are used for making informed decisions, and misused for other reasons, in all areas of business and government.


7

What is statistics - 4?

From the book “Statistics: A guide to the unknown”:

7. Statistics is the science of learning from data.

Question 1 in the practical sheet

From these 7 definitions, either choose the one you think is most appropriate or make your own, and write in the practical sheet:

a) A one-line definition

b) A longer definition


8

Data checking and entry – Question 2

What can we learn from the data you collected?

Work in pairs or small groups. First check the data from the CAST survey; check each other's forms, not your own:

Is it legible? Can it be entered into the computer? Is the response to the open-ended question clear? Can the text be simplified? If there are many points, ask the respondent to state which are the most important 2 or 3.

Make brief notes (as a report) to establish that the data are ready for entry.


9

Data entry into Excel

Just type the number. The label is automatic.


10

Data entry and checking – Question 3

Now the data are entered. This can be a class exercise on a single computer.

Each respondent's data are entered by someone else (never by the respondents themselves).

Then the entry must be checked: read it out, and check by reading it back.

Put the record number from the Excel form on your original sheet, or add your names as another field in the Excel sheet.

Why might it be better to just have a number?


11

Data entry and checking

You should now have completed question 3 on the practical sheet.

How long do you estimate it would take to enter 1000 records?
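One way to answer is to time yourself entering a few records and scale up. A minimal sketch, assuming a hypothetical 45 seconds to type one record and (also an assumption) that checking by reading back takes about as long again; replace both with your own timings.

```python
def estimate_entry_hours(n_records, seconds_per_record, check_factor=2.0):
    """Estimate the total hours needed to enter n_records.

    seconds_per_record: your own measured time to type one record.
    check_factor: hypothetical multiplier for checking by reading back;
    2.0 assumes checking takes about as long as entry itself.
    """
    total_seconds = n_records * seconds_per_record * check_factor
    return total_seconds / 3600

# Hypothetical timing: 45 seconds to type one record.
print(estimate_entry_hours(1000, 45))                    # 25.0 hours, entry plus checking
print(estimate_entry_hours(1000, 45, check_factor=1.0))  # 12.5 hours, entry only
```

The multiplier is kept separate so the cost of checking stays visible rather than being hidden inside the per-record time.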


12

Once the data are entered, remember:

“Statistics is the science of learning from data.” To learn as much as possible, we must have confidence in the data, so they must be entered and checked well.

This is what we have done in the groups. Now the data are ready for the analysis, but before that, look at some other data sets.

Look for the common points that apply to all the sets, and look for differences.


13

Types of data - 1

The analysis depends on the type of data. What are the types in the CAST questionnaire?

For questions 1 to 6, your answer was one of 5 categories, e.g. 1: Strongly agree, 2: Agree, … 5: Strongly disagree. These categories have an ordering, from strongly agree to strongly disagree.

Data of this type are called categorical, factor or qualitative data.

With the ordering, they are sometimes called ordered categorical data.
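Because the categories are ordered, a summary should list them in their natural order (1 to 5) and show categories nobody chose, rather than sorting by frequency. A minimal sketch; the 10 responses below are hypothetical, not real CAST data, and only the codes are used because the slide does not give every label.

```python
from collections import Counter

# Codes for questions 1 to 6 of the CAST questionnaire.
# The order carries meaning: 1 = Strongly agree ... 5 = Strongly disagree.
LEVELS = [1, 2, 3, 4, 5]

def ordered_frequency_table(responses, levels=LEVELS):
    """Return (code, count, percent) rows in the natural order of the levels.

    Every level is listed, even with a count of 0, so the ordering stays visible.
    """
    counts = Counter(responses)
    n = len(responses)
    table = []
    for level in levels:
        count = counts[level]  # a Counter returns 0 for missing keys
        percent = 100 * count / n if n else 0.0
        table.append((level, count, percent))
    return table

# Hypothetical responses to one question from 10 participants.
responses = [1, 2, 2, 1, 3, 2, 1, 2, 5, 2]
for code, count, percent in ordered_frequency_table(responses):
    print(f"{code}: {count:2d} ({percent:.0f}%)")
```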


14

Types of data - 2

The last question in the survey asked for a sentence or two of written text.

This is also an example of qualitative data; it is an open-ended response.

These data can be reported, and reporting the sentences themselves can be very useful, so it is good if they are entered as they stand.

To summarise them, perhaps the responses can be coded?


15

Coding open-ended questions – Question 4

This is question 4 in the practical sheet. Looking at the responses in your groups:

Could you code them? What different codes would you have? How would you enter the codes?

For a quick analysis, could you enter the complete texts, analyse the other columns, and then code later?

What might you lose by coding?


16

Coding and entering open-ended data

Discuss the suggestions for the codes. If some points are made by many students, then prepare a summary of how many made each point, as a frequency and as a percentage.

With the small number of responses there is no need to enter them into the computer, but discuss how it could be done.

It is an example of a multiple response question, because respondents may give no points or more than one point.

If you ask for the most important observation, then it becomes a single qualitative response.
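As one way of discussing how a multiple response question could be entered, a common approach (assumed here, not necessarily the course's own method) is to give each code its own 0/1 column and to report percentages of respondents, which can add up to more than 100%. The code names and the four coded answers are hypothetical.

```python
from collections import Counter

# Hypothetical coded answers: each respondent may make no points,
# one point, or several points (a multiple response).
coded_answers = [
    ["examples", "pace"],
    [],
    ["examples"],
    ["exercises", "examples", "pace"],
]

def multiple_response_summary(answers):
    """Count how many respondents made each point.

    Percentages are of respondents, so they can total more than 100%.
    """
    n_respondents = len(answers)
    # set() makes sure a point repeated by one respondent is counted once.
    counts = Counter(code for answer in answers for code in set(answer))
    return {code: (count, 100 * count / n_respondents)
            for code, count in counts.most_common()}

def indicator_columns(answers, codes):
    """One 0/1 column per code: a common way to enter multiple responses."""
    return [[1 if code in answer else 0 for code in codes] for answer in answers]

summary = multiple_response_summary(coded_answers)
for code, (count, percent) in summary.items():
    print(f"{code}: {count} of {len(coded_answers)} respondents ({percent:.0f}%)")

print(indicator_columns(coded_answers, ["examples", "exercises", "pace"]))
```

Asking only for the most important point would reduce each answer to a single code, and an ordinary frequency table would then be enough.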


17

Other data sets

- Zambia rainfall data
- Tanzania agriculture survey

Look at the layout of the data: is it the same as for the simple CAST survey?

Look at the types of data:
- Which are the qualitative variables? Are they ordered?
- Which are the quantitative variables? Which of them are discrete, and which are continuous?
- Have any been coded to become qualitative?


18

Annual climatic data from Zambia


19

Survey data from Tanzania - 1


20

Survey data from Tanzania - 2


21

Discussion – Question 5

The layout of the data was always the same: a rectangle.

Each row is a record. There are as many records (rows of data) as there were respondents, or students, or units.

Each column is a variable. Variables can be qualitative or quantitative.

Discuss which type they are. For each data set, complete the tables in the practical sheet, question 5.
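The rectangular layout can be sketched directly: a list of records (rows), each holding the same variables (columns). The variable names Q0123 (head of household) and HHsize come from the Tanzania survey; the three records and their values are hypothetical.

```python
# One record (row) per household, one variable (column) per question.
# Variable names follow the Tanzania survey; the values are hypothetical.
records = [
    {"Q0123": "Male", "HHsize": 5},
    {"Q0123": "Female", "HHsize": 3},
    {"Q0123": "Male", "HHsize": 7},
]

def column(rows, name):
    """Extract one variable (a column) from the records (rows)."""
    return [row[name] for row in rows]

def is_rectangular(rows):
    """True if every record has exactly the same variables."""
    variables = set(rows[0])
    return all(set(row) == variables for row in rows)

print(len(records))               # number of records = number of units
print(column(records, "Q0123"))   # a qualitative variable
print(column(records, "HHsize"))  # a quantitative variable
print(is_rectangular(records))
```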


22

Qualitative variables

They are categorical. They may be nominal (which implies there is no ordering). Give some examples from the Tanzania survey.

They may be ordered, as in the CAST survey. Give an ordered example from the Tanzania survey.


23

Examples of analysis – Tanzania survey, Question 6

There are 3223 records, but just take the 18 you can see in the figure.

Count the values for Q0123 (head of household). There were 6 females and 12 males, so 12/18 = 2/3 of the households had a male head. That is about 67%, but percentages are a bit misleading with so few numbers.

Now give a similar summary for Q021 (type of agricultural household).

Also do the same for Q3464 (how often the household had food problems).
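The head-of-household count can be checked with a short sketch. It simply rebuilds the 18 values from the counts given above (6 female, 12 male), so it adds no new data; the order of the values does not matter for a count.

```python
from collections import Counter

# The 18 visible records for Q0123 (head of household), rebuilt from the counts.
heads = ["Female"] * 6 + ["Male"] * 12

counts = Counter(heads)
n = len(heads)
for category in ("Female", "Male"):
    proportion = counts[category] / n
    print(f"{category}: {counts[category]} of {n} = {proportion:.3f} ({100 * proportion:.0f}%)")
# Female: 6 of 18 = 0.333 (33%)
# Male: 12 of 18 = 0.667 (67%)
```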


24

Add a simple chart

A simple chart can also be sketched. Here is one made by Excel, but a sketch can be done “by hand”.

Excel will be used for these tasks from Session 3.
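Until Excel is used, the “by hand” chart can be imitated as a text bar chart: one bar per category, with the count written at the end of the bar. A minimal sketch (not an Excel feature), using the head-of-household counts from the 18 Tanzania records.

```python
def text_bar_chart(counts, symbol="#"):
    """Return a hand-drawn style bar chart as lines of text,
    with the count written at the end of each bar."""
    width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.items():
        lines.append(f"{label:<{width}} | {symbol * count} {count}")
    return lines

# Head-of-household counts for the 18 records in the figure.
for line in text_bar_chart({"Female": 6, "Male": 12}):
    print(line)
```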


25

Examples of analysis – CAST survey, Question 7

Do a similar analysis of the CAST survey. To make it quick, each group could initially process just one question, then report the results to the class.

Include a hand-drawn chart: sketch a simple bar chart and include the numbers on the chart, as shown earlier.


26

Quantitative variables – Question 8

They may be discrete (whole numbers). Give examples from the climatic data and the Tanzania survey.

They may be (conceptually) continuous. Give examples from the data sets.

Also, they may be coded into (ordered) categories. Give an example from the Tanzania survey.


27

Examples of analysis – Tanzania survey

An analysis of the 18 values in Q3462, the number of times meat was eaten last week:

minimum = 0, maximum = 5; adding the values gives total = 31, so the mean = 31/18, about 1.7 times per week.

Note: the mean does not have to be an integer just because the individual values are whole numbers.

Repeat this analysis for Q3463 (times fish eaten last week) and HHsize.
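The same four summaries can be written as a small function. The slide gives only the total (31) and the number of values (18), not the individual values, so the function is demonstrated on a short hypothetical list instead.

```python
def summarise(values):
    """Minimum, maximum, total and mean of a quantitative variable."""
    total = sum(values)
    return {
        "minimum": min(values),
        "maximum": max(values),
        "total": total,
        "mean": total / len(values),
    }

# From the slide: total = 31 over 18 values of Q3462.
print(round(31 / 18, 2))  # 1.72, i.e. about 1.7 times per week

# Hypothetical whole-number values: the mean (1.8) is not a whole number.
print(summarise([0, 2, 1, 5, 1]))
```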


28

Data analysis

As the layout of the data is always the same, once you know how to analyse one data set you will have the principles to analyse them all. And we have just done one analysis!

You have seen that the appropriate analysis depends on the type of data.

So what are the principles of analysing (summarising) data of the different types?


29

The methods of analysis

“How many?” questions are for qualitative variables, for example in the CAST survey and the Tanzania survey. You used summaries like counts, proportions or percentages.

“How large?” and “How variable?” questions are for quantitative variables, for example in the climatic data or the Tanzania survey. You used summaries like averages, extremes and measures of spread.
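For “how variable?”, the extremes can be complemented by measures of spread. A minimal sketch using Python's standard `statistics` module (`quantiles` needs Python 3.8 or later, and its default method is one of several quartile conventions); the eight rainfall totals are hypothetical, not the Zambia data.

```python
import statistics

def spread_summary(values):
    """Average, extremes and measures of spread for a quantitative variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default "exclusive" method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "interquartile range": q3 - q1,
        "standard deviation": statistics.stdev(values),  # sample standard deviation
    }

# Hypothetical annual rainfall totals in mm.
rainfall = [650, 720, 810, 590, 900, 760, 680, 830]
for name, value in spread_summary(rainfall).items():
    print(f"{name}: {value:.1f}")
```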


30

A toolkit for analysis

Different types of graph are also used.

Qualitative data: “how many?”

Quantitative data: “how large?” and “how variable?”


31

Statistics and variation

In the CAST survey, why not just ask one student? In the climatic data, why not just use one year? In the agriculture survey, why not just use one household? Because there is variation between the responses.

Remember this definition? “Statistics is making decisions when there is uncertainty.”


32

Variation is everywhere!

In the book “Statistics: A guide to the unknown”:

“Variation is everywhere. Individuals vary. Repeated measurements on the same individual vary. The science of statistics provides tools for dealing with variation.”

So statistics is concerned with making sense of data when there is variation.


33

Fighting the curse of variation

To do good statistics you must tame variation: fight the curse of variation.

You have 2 main strategies for overcoming variation:

1. Take enough observations. In the Tanzania survey there were 3223 households from this one region alone.

2. Measure characteristics that explain variation. Variation itself is not necessarily the problem; variation you do not understand is the problem.


34

An example: explaining variation

Take the CAST survey and add a new record for an imaginary student. Make it VERY DIFFERENT from the existing records: if most students were positive about CAST, then make this record very negative, and so on.

You have added variation. Now, what could you (or should you) have measured to explain this variation?


35

What you could have measured

This little survey only asked about CAST. It did not ask about you, e.g. male/female, experience, age, computer access, etc.

These measurements could help to explain how this new student differs from the others.

The Tanzania survey also asked about education, possessions, etc.

Why? To be able to understand and explain variation.
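Once an extra characteristic has been measured, comparing summaries between its groups is one simple, descriptive way to see whether it accounts for some of the variation. A minimal sketch: the records are hypothetical, and `computer_access` is a made-up example of the kind of variable listed above, which the real CAST survey did not ask about.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: a CAST-style response (1 = Strongly agree ... 5 = Strongly disagree)
# plus an extra characteristic that the real survey did not measure.
records = [
    {"response": 1, "computer_access": "yes"},
    {"response": 2, "computer_access": "yes"},
    {"response": 1, "computer_access": "yes"},
    {"response": 5, "computer_access": "no"},  # the "very different" imaginary student
]

def mean_by_group(rows, outcome, group):
    """Mean of an outcome variable within each level of a grouping variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group]].append(row[outcome])
    return {level: mean(values) for level, values in groups.items()}

print(mean([r["response"] for r in records]))                  # overall mean: 2.25
print(mean_by_group(records, "response", "computer_access"))  # does the group explain the difference?
```

With one unusual record this proves nothing on its own; it only shows how a measured characteristic gives you something to look at when a response stands out.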


36

Analysis and variation together

For statistical analysis you have summarised columns of data, i.e. summarised individual variables. You did this for qualitative and quantitative variables.

To fight the curse of variation, you take measurements, so you add to the rows of data. That helps you to explain the variation. That's statistics for you!

You analyse the columns, i.e. the variables, and you understand variability by looking at the rows.


37

Types of statistics

Wikipedia says, roughly:

Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics.

In addition, patterns in the data may be modelled and then used to draw inferences about the process or population being studied; this is called inferential statistics.

Both descriptive and inferential statistics comprise applied statistics.


38

Descriptive and inferential statistics

We have just done descriptive statistics, and we will only do descriptive statistics in this module.

The sample in the Tanzania agricultural survey was 3223 households. That is just under 1% of the households in the region.

See the column called WT, with values like 137. So each observation “represents” 137 households.

But with such a large sample, the inferences for the whole region will be quite precise.

So most of what we need now is descriptive tools. In the later modules we add ideas of inferential statistics.
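The WT column links the sample to the region: if each record represents WT households, adding up the weights estimates the number of households, and adding the weights of the records in one category estimates how many households fall in it. If every weight were exactly 137, the 3223 records would represent about 3223 × 137 = 441,551 households, so the sample would be about 0.7% of them, consistent with “just under 1%”. A minimal sketch; the three records and their weights are hypothetical.

```python
# Hypothetical records: Q0123 (head of household) with a survey weight WT.
records = [
    {"Q0123": "Male", "WT": 137},
    {"Q0123": "Female", "WT": 137},
    {"Q0123": "Male", "WT": 140},
]

def weighted_count(rows, variable=None, value=None, weight="WT"):
    """Estimated number of households in the region represented by the rows.

    With no variable given, estimates the total number of households;
    otherwise sums the weights of rows where rows[variable] == value.
    """
    return sum(row[weight] for row in rows
               if variable is None or row[variable] == value)

total = weighted_count(records)
male = weighted_count(records, "Q0123", "Male")
print(total, male, round(100 * male / total, 1))  # 414 277 66.9
```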


39

Glossary of statistical terms

Each subject becomes easier when you understand the terms.

A glossary is supplied, called the SSC Statistical Glossary. It explains most of the terms for the 3 levels of this course, so some terms may be new to you now. An example is on the next slide.

You can print the glossary if you wish, but it is good to look at it online: then all the terms in blue are links, so you can easily move about in the document.


40

Example from the glossary

Descriptive statistics: If you have a large set of data, then descriptive statistics provides graphical (e.g. boxplots) and numerical (e.g. summary tables, means, quartiles) ways to make sense of the data.

The branch of statistics devoted to the exploration, summary and presentation of data is called descriptive statistics.

If you need to do more than descriptive summaries and presentations, the aim is to use the data to make inferences about some larger population. Inferential statistics is the branch of statistics devoted to making generalizations.


41

Learning objectives

Can you now:
- Define statistics
- Enter simple datasets once the data entry form is set up
- Recognise the type of each variable in a dataset
- Know some ways to summarise data of each main type
- Explain how statistical investigations deal with variability
- Differentiate between descriptive and inferential statistics