Post on 23-Dec-2015

1

Data Collection & Sampling Techniques

2

MEANING OF STATISTICS

• Statistics is used to mean either statistical data or statistical methods

• Statistics is a method of collecting, organising and analysing the numerical data for understanding a phenomenon or making wise decisions

3

FUNCTIONS OF STATISTICS

• 1. To present facts in proper form.
• 2. To simplify unwieldy and complex data and make them easily understandable.
• 3. To help classify data according to various characteristics.
• 4. To provide techniques for making comparisons.
• 5. To study relationships between different phenomena.
• 6. To indicate trend behaviour.

4

LIMITATIONS OF STATISTICS

• 1. Statistics does not study individuals.
• 2. Statistics does not study qualitative phenomena.
• 3. Statistical results are true only on average.
• 4. Statistical laws are not exact. Unlike the laws of the physical and natural sciences, statistical laws are only approximations.
• 5. Statistics does not reveal the entire story.
• 6. Statistics is liable to be misused.

5

Uses of Statistics

• Describe data
• Compare two or more data sets
• Determine if a relationship exists between variables
• Make estimates about population characteristics
• Predict past or future behavior of data

6

Misuse of statistics

• “There are three types of lies: lies, damn lies, and statistics” Benjamin Disraeli
• “Figures don’t lie, but liars figure”
• “Statistics can be used to prove anything, especially statisticians” Franklin P. Jones

7

Sources of Misuse

• There are two main sources of misuse of statistics:
– An agenda on the part of a dishonest researcher
– Unintentional errors on the part of a researcher

8

Misuses of Statistics

• Survey Questions
– Loaded questions: wording, intentional or not, that elicits a desired response
– Order of questions
– Nonresponse (refusal): the subject refuses to answer questions
– Self-interest: the sponsor of the survey could enjoy monetary gains from the results

9

Misuses of Statistics

• Missing Data (Partial Pictures)
– Detached statistics: no comparison is made
– Percentages --
• Implied Connections
– Correlation and causality: when we find a statistical association between two variables, we cannot conclude that one of the variables is the cause of (or directly affects) the other variable

Exercise 1

10

Data Collection

• In research, statisticians use data in many different ways.

• Data can be used to describe situations.
• Data can be collected in a variety of ways, BUT if the sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.

11

Data Analysis

12

Course objectives

Trainees will analyze graphs.
a. Analyze data presented in a graph.
b. Compare and contrast multiple graphic representations (circle graphs, line graphs, line plot graphs, pictographs, Venn diagrams, and bar graphs) for a single set of data and discuss the advantages/disadvantages of each.
c. Determine and justify the mean, range, mode, and median of a set of data.

13

Terms

• Mean: The sum of the numbers in a set of data divided by the number of pieces of data. (Used in D+X analysis, scan compliance, delivery percentage, etc. in MNOP KPI, and in workload calculation for a post office.)

• Median: The number in the middle of a set of data when the data are arranged in order from least to greatest. When there are 2 middle numbers, the median is the number that is halfway between the two middle numbers.

14

Terms

• Mode: The number that occurs most frequently in a set of numbers.

• Range: The difference between the largest and smallest values in a numerical data set.

15

Finding the Mean

• Step 1: Add all numbers in your set of data.
• Step 2: Divide the sum by the number of pieces of data.

Example:
Set of Data: 15, 15, 14, 16
Sum: 60
Total number of pieces of data: 4
Mean: 60 ÷ 4 = 15

16

Finding the Median

• Step 1: Put all numbers in order from least to greatest.

• Step 2: Find the middle number.

Example:
Set of Data: 15, 15, 14, 16
Ordered: 14, 15, 15, 16
Middle Numbers: 15 and 15
Median: 15
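The two steps above, together with the two-middle-numbers rule from the Terms slide, can be sketched in Python. The function name `median_of` is chosen here for illustration and does not come from the slides.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # Step 1: order from least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle number
        return ordered[mid]
    # even count: halfway between the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([15, 15, 14, 16]))    # the two middle numbers are 15 and 15
```

With an odd number of values the function returns the single middle value unchanged; with an even number it returns the midpoint as a float.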

17

• Ex: test-check figures for unregistered articles taken on two days. Each day should be a normal working day, e.g. Wednesday or Thursday.

• The assumption is that these days will have normal transactions, so their figures are treated as the median for normal transactions.

• Work on other days may vary between the minimum and the maximum.

18

Finding the Mode

• Step 1: Put all numbers in order from least to greatest.

• Step 2: Find the most popular number.

Example:
Set of Data: 15, 15, 14, 16
Ordered: 14, 15, 15, 16
Mode: 15

19

• Ex: checking the postman on the beat by the PRIP. The PRIP should select a point where the probability of the postman visiting is high. In other words, the PRIP should select the mode, i.e. the point visited most frequently.

20

Finding the Range

• Step 1: Put all numbers in order from least to greatest.

• Step 2: Subtract the lowest number from the highest number.

Example:
Set of Data: 15, 15, 14, 16
Ordered: 14, 15, 15, 16
Range: 16 – 14 = 2

21

Activity

• The number of articles booked in an MPCM in a post office is as follows:
– Monday: 175
– Tuesday: 202
– Wednesday: 180
– Thursday: 130
– Friday: 198
– Saturday: 175

• Find the mean , median , mode and range for the above set of data
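One way to check a hand-worked answer to this activity is Python's standard `statistics` module. This is only a sketch; the variable names are chosen here for illustration.

```python
import statistics

# Articles booked per day, as listed in the activity.
booked = {"Monday": 175, "Tuesday": 202, "Wednesday": 180,
          "Thursday": 130, "Friday": 198, "Saturday": 175}
counts = list(booked.values())

mean = statistics.mean(counts)            # sum divided by number of days
median = statistics.median(counts)        # midpoint of the two middle values
mode = statistics.mode(counts)            # most frequent value
data_range = max(counts) - min(counts)    # largest minus smallest

print(f"mean={mean:.2f} median={median} mode={mode} range={data_range}")
```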

22

Types of Graphs

Bar Graph
Circle Graph
Line Graph
Line Plot Graph
Venn Diagram
Pictograph

Bar Graph

Definition: a graph that shows data using horizontal or vertical bars.

Advantages:

•Easy to read

•Compares multiple sets of data

Disadvantages

•Not best for showing trends

23

24

Exercise
Prepare a bar chart with the given information

Year       Revenue
2009-10       6266
2010-11       6962
2011-12       7899
2012-13       9366
2013-14      10720
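As a rough sketch of what the bar chart should show, the figures can be drawn as text bars with the standard library alone. A plotting library would normally be used for a real chart; the function name `text_bar_chart` and the bar width of 40 characters are assumptions made for this example.

```python
# Revenue figures from the exercise table (units as given in the slides).
revenue = {"2009-10": 6266, "2010-11": 6962, "2011-12": 7899,
           "2012-13": 9366, "2013-14": 10720}


def text_bar_chart(data, width=40):
    """Return one line per category, with bar length proportional to value."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label} | {bar} {value}")
    return lines


for line in text_bar_chart(revenue):
    print(line)
```

The longest bar belongs to the largest value, so the other bars can be compared against it at a glance, which is the advantage of a bar graph noted above.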

Circle (Pie) Graph

Definition: A graph that shows data in the form of a circle.

Advantages:

•Shows percentages

•Shows how a total is divided into parts

Disadvantages

•Not best for showing trends

25

26

Exercise
Prepare a pie chart
Revenue, year 2013-14

Products                  Revenue
Speed Post                 1372.0
Business Post              1029.4
Bill Mail Service           103.0
Express Parcel Post          77.6
Retail Post                  70.2
Sale of Postage Stamps      622.8
Logistic Post                15.3
Money Orders                606.9
Others                      852.5
Revenue from P.O.          4749.6
SBCC                       5971.3
Total Revenue             10720.9
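A pie chart needs each category's share of the total, which sets the angle of its slice. The sketch below assumes the chart covers the nine product lines that make up the post office revenue; the exercise does not say which rows to chart, so that choice and the name `pie_slices` are assumptions.

```python
# Product-wise revenue for 2013-14, copied from the exercise.
revenue = {
    "Speed Post": 1372.0, "Business Post": 1029.4, "Bill Mail Service": 103.0,
    "Express Parcel Post": 77.6, "Retail Post": 70.2,
    "Sale of Postage Stamps": 622.8, "Logistic Post": 15.3,
    "Money Orders": 606.9, "Others": 852.5,
}


def pie_slices(data):
    """Map each category to (percentage share, slice angle in degrees)."""
    total = sum(data.values())
    return {name: (100 * v / total, 360 * v / total) for name, v in data.items()}


for name, (pct, angle) in pie_slices(revenue).items():
    print(f"{name:24s} {pct:5.1f}%  {angle:6.1f} deg")
```

The percentages add up to 100 and the angles to 360 degrees, which is a quick check before drawing the chart.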

Line Graph

Definition: A graph that shows data in the form of a line.

Advantages:

•Shows change over time

•Helps you see trends

Disadvantages

•Not easy to use to compare different categories of data

27

28

Exercise
From the following table prepare a line diagram

Year       Expenditure
2009-10          13346
2010-11          13793
2011-12          14163
2012-13          15481
2013-14          16796

Pictograph

Definition: A graph that displays data using symbols or pictures.

Advantages:

•Compares multiple sets of data

•Visually appealing

Disadvantages

•Hard to read when there are parts of pictures.

29

30

Venn Diagram

Definition: Circles that show relationships among sets.

Advantages:

•Shows comparisons and contrasts easily.

Disadvantages

•Does not show trends

31

• 100 trainees attended PA induction program in your PTC

• 80 trainees attended IP induction program in your PTC

• 20 have attended both IP and PA induction program in the same PTC – make a Venn diagram
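The three regions of the Venn diagram follow from simple subtraction. This sketch assumes that the 100 and 80 totals each include the 20 trainees who attended both programs; the exercise does not state this explicitly.

```python
pa_total, ip_total, both = 100, 80, 20   # figures from the exercise

pa_only = pa_total - both            # PA circle outside the overlap
ip_only = ip_total - both            # IP circle outside the overlap
either = pa_only + ip_only + both    # trainees who attended at least one program

print(pa_only, both, ip_only, either)
```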

Exercise 2

32

Sampling and Sampling Distributions

33

Sample and population (ASW, 15)
• A population is the collection of all the elements of interest (census enumeration).
• A sample is a part of the population.
– Good or bad samples.
– Representative or non-representative samples. A researcher hopes to obtain a sample that represents the population, at least in the variables of interest for the issue being examined.
– Probabilistic samples are samples selected using the principles of probability. This may allow a researcher to determine the sampling distribution of a sample statistic.

34

MEANING OF SAMPLING

Sampling is a method in which only the items included in the sample are observed, for the purpose of drawing conclusions about the population from which the sample is drawn. A measure computed from the sample is called a statistic (i.e. the measures of central tendency and measures of dispersion of the sample are called statistics and are used as a basis for estimating population parameters).

35

NEED FOR SAMPLING

• 1. Savings in time and money
• 2. When the population is infinitely large
• The fact that the characteristics of the sample are able to provide an approximately correct idea about the population parameters is borne out by the theory of probability.

36

Methods of sampling – probabilistic
• Random sampling methods – each member has an equal probability of being selected.
• Systematic – every kth case. Equivalent to random if patterns in the list are unrelated to issues of interest. Eg. inspection of BO by divisional head.
• Stratified samples – sample from each stratum or subgroup of a population. Eg. SB withdrawal verification (more than 10000).
• Cluster samples – sample only certain clusters of members of a population. Eg. city blocks, firms, test cards only on the addressees in the periphery of the jurisdiction, SB withdrawal checked only for C class offices, inspection of bad BOs.
• Multistage samples – combinations of random, systematic, stratified, and cluster sampling. Ex – checking of transaction particulars of selected days during the inspection of BO.
• If probability is involved at each stage, then the distribution of sample statistics can be obtained.

37

Basic Methods of Sampling

• Random Sampling
– Selected by using chance or random numbers
– Each individual subject (human or otherwise) has an equal chance of being selected
– Examples:
  • MO verification by PRIP
  • Drawing names from a hat
  • Random Numbers

38

Basic Methods of Sampling

• Systematic Sampling
– Select a random starting point and then select every kth subject in the population
– Simple to use so it is used often
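A minimal sketch of systematic sampling as described above: a random start among the first k subjects, then every kth subject after it. The function name `systematic_sample` and the list of office labels are hypothetical.

```python
import random


def systematic_sample(population, k, rng=random):
    """Pick a random start among the first k items, then every k-th item."""
    start = rng.randrange(k)
    return population[start::k]


offices = [f"BO-{i:02d}" for i in range(1, 21)]   # hypothetical list of 20 offices
print(systematic_sample(offices, k=5, rng=random.Random(1)))
```

Passing a seeded `random.Random` makes the selection repeatable, which helps when the sample has to be audited later.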

39

Basic Methods of Sampling

Convenience Sampling
Use subjects that are easily accessible
Examples:
Using family members or students in a classroom
Mall shoppers

40

Basic Methods of Sampling

Stratified Sampling
Divide the population into at least two different groups with common characteristic(s), then draw SOME subjects from each group (each group is called a stratum; the plural is strata)
Basically, randomly sample each subgroup or stratum
Results in a more representative sample
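Stratified sampling as described above can be sketched as: group the population by a characteristic, then draw a random sample within each group. The function name, the withdrawal amounts, and the two strata are all hypothetical and only echo the SB withdrawal example from the earlier slide.

```python
import random


def stratified_sample(population, stratum_of, per_stratum, rng=random):
    """Group items by stratum, then draw a simple random sample from each group."""
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    return {name: rng.sample(members, min(per_stratum, len(members)))
            for name, members in strata.items()}


# Hypothetical withdrawals, stratified by amount (above 10000 or not).
withdrawals = [500, 12000, 800, 15000, 300, 20000, 9500, 11000]
picked = stratified_sample(
    withdrawals,
    stratum_of=lambda amt: "high" if amt > 10000 else "low",
    per_stratum=2,
    rng=random.Random(7),
)
print(picked)
```

Every stratum contributes to the sample, which is why the result tends to be more representative than a single draw from the whole list.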

41

Basic Methods of Sampling

Cluster Sampling
Divide the population into groups (called clusters), randomly select some of the groups, and then collect data from ALL members of the selected groups
Used extensively by government and private research organizations
Examples:
Exit Polls
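Cluster sampling differs from stratified sampling in that whole groups are chosen at random and every member of a chosen group is observed. A minimal sketch, with hypothetical beat names and addresses:

```python
import random


def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly choose whole clusters and return every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical clusters: delivery beats and the addresses in each.
beats = {"Beat A": ["a1", "a2", "a3"], "Beat B": ["b1", "b2"],
         "Beat C": ["c1", "c2", "c3", "c4"], "Beat D": ["d1"]}
print(cluster_sample(beats, n_clusters=2, rng=random.Random(3)))
```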

42

Objects of sampling

1. To Obtain information about the population on the basis of sample drawn from such population.

2. To setup the limits of accuracy of the estimates of the population parameters computed on the basis of sample statistic.

43

Some terms used in sampling

• Sampled population – population from which sample drawn (ASW, 258). Researcher should clearly define.

• Frame – list of elements that sample selected from (ASW, 258). Eg. telephone book, city business directory. May be able to construct a frame.

• Parameter – Numerical characteristics of a population (ASW, 259). Eg. total (annual GDP or exports), proportion p of population that votes Liberal in federal election. Also, µ or σ of a probability distribution are termed parameters.

• Statistic – numerical characteristics of a sample. Eg. pre-election polls.

• Sampling distribution of a statistic is the probability distribution of the statistic.

44

Sampling distribution of a sample

• The sampling distribution of a statistic is the distribution of the values that the statistic can take when it is computed from the various samples of the same size randomly drawn from the population. A statistic such as the mean or the standard deviation may be computed for each sample so drawn, and the resulting series of values compiled. These values may then be arranged as a frequency distribution, which is known as the sampling distribution.
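The procedure described above can be simulated: draw many random samples of the same size, compute the mean of each, and tabulate the results as a frequency distribution. The population, the sample size, and the number of draws below are illustrative assumptions.

```python
import random
import statistics
from collections import Counter


def sampling_distribution(population, n, draws, rng):
    """Frequency distribution of the sample mean over repeated random samples."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(draws)]
    return Counter(means)


population = [14, 15, 15, 16]            # small illustrative population
dist = sampling_distribution(population, n=2, draws=1000, rng=random.Random(42))
for value in sorted(dist):
    print(value, dist[value])
```

Each printed row pairs a possible value of the sample mean with the number of samples that produced it.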

45

Selecting a sample (ASW, 259-261)

• N is the symbol given for the size of the population or the number of elements in the population.

• n is the symbol given for the size of the sample or the number of elements in the sample.

• Simple random sample is a sample of size n selected in a manner that each possible sample of size n has the same probability of being selected.

• In the case of a random sample of size n = 1, each element has the same chance of being selected.

46

Selecting a simple random sample

• Sample with replacement – after any element is randomly selected, replace it and randomly select another element. But this could lead to the same element being selected more than once.

• More common is sampling without replacement. Make sure that at each stage, each element remaining in the population has the same probability of being selected.
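The difference between the two schemes can be seen with the standard `random` module: `choices` draws with replacement and `sample` draws without. The population and sample size are illustrative.

```python
import random

rng = random.Random(5)                   # seeded so the run is repeatable
population = ["A", "B", "C", "D"]

# With replacement: each draw is made from the full population,
# so the same element can appear more than once.
with_replacement = rng.choices(population, k=3)

# Without replacement: a drawn element is not available again,
# so all selected elements are distinct.
without_replacement = rng.sample(population, k=3)

print(with_replacement, without_replacement)
```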

47

Simple random sample of size 2 from a population of 4 elements

Population elements are A, B, C, D. N = 4, n = 2.

The 1st element selected could be any one of the 4 elements and this leaves 3, so there are 4 x 3 = 12 possible samples, each equally likely: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC.

If the order of selection does not matter (ie. we are interested only in what elements are selected), then this reduces to 6 combinations. If {AB} is AB or BA, etc., then the equally likely random samples are {AB}, {AC}, {AD}, {BC}, {BD}, {CD}. This is the number of combinations (ASW, 261, note 1).

$$ {}^{N}P_{n} = \frac{N!}{(N-n)!} = \frac{4!}{(4-2)!} = 12 $$

$$ {}^{N}C_{n} = \frac{N!}{n!\,(N-n)!} = \frac{4!}{2!\,(4-2)!} = 6 $$
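The 12 ordered samples and 6 unordered samples listed on this slide can be enumerated directly with `itertools`, and the counts checked against `math.perm` and `math.comb` (available from Python 3.8). This is a sketch for checking the listing, not part of the original slides.

```python
from itertools import combinations, permutations
from math import comb, perm

elements = ["A", "B", "C", "D"]   # N = 4
n = 2

ordered = ["".join(p) for p in permutations(elements, n)]
unordered = ["".join(c) for c in combinations(elements, n)]

print(len(ordered), ordered)      # ordered samples: AB and BA count separately
print(len(unordered), unordered)  # unordered samples: {AB} counted once

# The counts agree with N!/(N-n)! and N!/(n!(N-n)!).
assert len(ordered) == perm(len(elements), n)
assert len(unordered) == comb(len(elements), n)
```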

48

Standard error of a statistic

• When the average amount of variability of the observations of a population is computed, it is known as the standard deviation. When the average amount of variability of the observations of a sampling distribution is computed, it is known as the standard error.
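To make the distinction concrete, the sketch below lists every equally likely sample of size 2 drawn with replacement from a small illustrative population, then compares the standard deviation of the population with the standard deviation of the sample means (the standard error). For sampling with replacement the standard error equals the population standard deviation divided by the square root of n, which the last printed value shows. The population values are an assumption for the example.

```python
import math
import statistics
from itertools import product

population = [14, 15, 15, 16]
n = 2

# Every equally likely ordered sample of size n, drawn with replacement.
sample_means = [statistics.mean(s) for s in product(population, repeat=n)]

sigma = statistics.pstdev(population)             # standard deviation of the population
standard_error = statistics.pstdev(sample_means)  # standard deviation of the sample means

print(sigma, standard_error, sigma / math.sqrt(n))
```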

49

Sampling from a process (ASW, 261)
• Careful design for the sample is especially important.
– Sample production of milk at random times.
– Sample of data of various products in the department, like speed post, logistic post, business post, etc.
– We need to calculate the mean and standard deviation for the observations from the samples.
– How to calculate the mean and standard deviation of the population.
– (The standard deviation is the square root of the average of the squared distances of the observations from the mean.)
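The definition in parentheses translates almost word for word into code. The sketch divides by the number of observations, which matches the definition as stated (the population form); the sample standard deviation conventionally divides by n − 1 instead. The function name `mean_and_std` is chosen here for illustration.

```python
import math


def mean_and_std(observations):
    """Return (mean, standard deviation) using the definition quoted above."""
    n = len(observations)
    mean = sum(observations) / n
    # average of the squared distances of the observations from the mean
    variance = sum((x - mean) ** 2 for x in observations) / n
    return mean, math.sqrt(variance)


print(mean_and_std([15, 15, 14, 16]))
```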
