data collection and presentation

Posted on 20-Jan-2017


Data Collection & Presentation

Presented by:
Nasif Hassan Khan Abir (ID # 61531-24-007)
Md. Ferdaus Alam (ID # 61531-24-010)
Zakir Husain (ID # 61325-18-058)
Md. Faruqul Islam (ID # 61325-18-029)

Data

Data Collection
The collection, organization, and presentation of data are basic background material for learning descriptive and inferential statistics and their applications.

Method of Collecting Data
On the basis of the source of collection, data may be classified as:
- Primary data
- Secondary data

Types of Data
There are two types of data:
- Numerical data
- Categorical data

Collection of Data

Collection of Data
Primary Data
Data that are collected for the first time, for the purpose of the survey at hand, are called primary data. For example, data collected by an investigator on the habit of taking tea or coffee in a village.

Method of Collecting Primary Data
There are several methods for collecting primary data. Some of them are:
- Direct personal investigation
- Indirect investigation
- Through correspondents
- By mailed questionnaires
- Through schedules

Collection of Data (cont’d)
Secondary Data
When we use data that have already been collected by others, the data are called secondary data. Such data are primary for the agency that first collects them and secondary for all other users.

Method of Collecting Secondary Data
- Published reports of newspapers, RBI and periodicals
- Publications from trade associations
- Financial data reported in annual reports
- Information from official publications
- Publications of international bodies such as UNO, World Bank, etc.
- Internal reports of government departments
- Records maintained by institutions
- Research reports prepared by students in universities

Types of Data

Categorical Data
Categorical data consist of categorical variables, or of data that have been converted into that form (for example, grouped data). Examples: marital status, political party, eye color.

Numerical Data
Numerical data are values or observations that can be measured or counted, and they can be placed in ascending or descending order. Numerical data can be divided into two groups:
- Discrete (counted items, such as number of children or defects per hour)
- Continuous (measured characteristics, such as weight or voltage)

Types of Data (cont’d): Level of Measurement / Measurement Scale

Level | Description | Examples
Ratio data | Differences between measurements; a true zero exists | Height, Age, Weekly Food Spending
Interval data | Differences between measurements, but no true zero | Temperature in Fahrenheit, Standardized exam score
Ordinal data | Ordered categories (rankings, order, or scaling) | Service quality rating, Standard & Poor’s bond rating, Student letter grades
Nominal data | Categories (no ordering or direction) | Marital status, Type of car owned

Data Presentation
Presentation of Data
Data collected in the form of schedules and questionnaires are not self-explanatory; they are raw data. To make them meaningful, they must be organized and presented.

Presentation of Categorical Data
Categorical data can be presented in two ways:
- Tabulating data (summary table)
- Graphing data (bar chart, pie chart, Pareto diagram)

The Summary Table

A summary table summarizes data in table form, listing each category together with its frequency, amount, or percentage.

Example: Current Investment Portfolio

Investment Type | Amount (in thousands $) | Percentage (%)
Stocks | 46.5 | 42.27
Bonds | 32.0 | 29.09
CD | 15.5 | 14.09
Savings | 16.0 | 14.55
Total | 110.0 | 100.0
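The percentage column of the summary table is each amount divided by the total, times 100. A minimal Python sketch, assuming the amounts are stored in a dictionary; the names `portfolio` and `summary_table` are hypothetical:

```python
# Amounts invested, in thousands of dollars, from the summary table above.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}


def summary_table(amounts):
    """Hypothetical helper: return (category, amount, percentage) rows and the total."""
    total = sum(amounts.values())
    rows = [
        (name, value, round(100 * value / total, 2))  # share of total, in percent
        for name, value in amounts.items()
    ]
    return rows, total


rows, total = summary_table(portfolio)
for name, amount, pct in rows:
    print(f"{name:<8} {amount:>6.1f} {pct:>7.2f}")
print(f"{'Total':<8} {total:>6.1f} {100:>7.2f}")
```

Rounding to two decimals matches the table; with other data the rounded percentages may not add up to exactly 100.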

Bar Chart

Bar charts are often used for qualitative data (categorical or nominal scale). The height of each bar (or its length, for horizontal bars) shows the frequency, amount, or percentage for each category. The bar chart for the previous summary table:

[Figure: Investor's Portfolio. Horizontal bar chart of Stocks, Bonds, CD, and Savings; axis: Amount in $1000's (0 to 50).]

Pie Chart

Pie charts are often used for qualitative data (categorical or nominal scale). The size of each slice shows the frequency or percentage for each category. In the pie chart for the previous summary table, each investment type appears as a slice sized by its percentage of the portfolio.
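Because a full circle is 360 degrees, the angle of each slice is the category's share of the total multiplied by 360. A minimal sketch using the summary-table amounts; the function name `pie_slice_angles` is hypothetical:

```python
# Amounts invested, in thousands of dollars, from the summary table.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}


def pie_slice_angles(amounts):
    """Hypothetical helper: slice angle in degrees = (share of total) * 360."""
    total = sum(amounts.values())
    return {name: 360 * value / total for name, value in amounts.items()}


angles = pie_slice_angles(portfolio)
for name, angle in angles.items():
    print(f"{name:<8} {angle:6.2f} degrees")
```

For example, Stocks hold 46.5 of 110 thousand dollars, so their slice spans about 152.18 degrees.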

Pareto Diagram

- Used to portray categorical data
- A bar chart where categories are shown in descending order of frequency
- A cumulative polygon is often shown in the same graph
- Used to separate the “vital few” from the “trivial many”

[Figure: Pareto diagram of the Current Investment Portfolio. Bars show the % invested in each category in the order Stocks, Bonds, Savings, CD (axis 0% to 45%); a line shows the cumulative % invested (axis 0% to 100%).]
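The Pareto ordering and its cumulative line can be computed directly from the summary table. A minimal sketch, assuming categories are ranked by amount invested; `pareto_table` is a hypothetical name:

```python
from itertools import accumulate

# Amounts invested, in thousands of dollars, from the summary table.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}


def pareto_table(amounts):
    """Hypothetical helper: sort categories in descending order and
    return (category, % of total, cumulative %) rows."""
    ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)
    total = sum(amounts.values())
    running = accumulate(value for _, value in ordered)  # running totals
    return [
        (name, round(100 * value / total, 2), round(100 * cum / total, 2))
        for (name, value), cum in zip(ordered, running)
    ]


table = pareto_table(portfolio)
for name, pct, cum_pct in table:
    print(f"{name:<8} {pct:>6.2f}% {cum_pct:>7.2f}%")
```

The second column gives the bar heights and the third column gives the points of the cumulative polygon. Cumulative percentages are computed from running amounts, not from the rounded percentages, so the last row is exactly 100.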

Presentation of Numerical Data
Numerical data can be presented in two ways:
- Ordered array (stem-and-leaf display)
- Frequency/cumulative distributions (histogram, polygon, ogive)

Ordered Array
An ordered array is a sequence of data in rank order. It:
- shows the range (minimum to maximum);
- provides some signals about variability within the range;
- may help identify outliers (unusual observations).
If the data set is large, the ordered array is less useful.
Example:
Data in raw form (as collected): 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Data in ordered array from smallest to largest: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
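Building the ordered array amounts to sorting the raw data, after which the minimum and maximum can be read from the two ends. A minimal sketch using the example values; the variable names are illustrative:

```python
# Raw data as collected, from the example above.
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

ordered = sorted(raw)              # rank order, smallest to largest
low, high = ordered[0], ordered[-1]

print("Ordered array:", ordered)
print(f"Range: {low} to {high} (spread of {high - low})")
```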

Stem-and-Leaf Diagram
A stem-and-leaf diagram is a simple way to see distribution details in a data set. To make it, separate the sorted data into leading digits (the stem) and trailing digits (the leaves).
For example, the stems and leaves of 21, 38, and 41 are:
Stem | Leaf
2 | 1
3 | 8
4 | 1
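For two-digit data, the tens digit is the stem and the units digit is the leaf, so `divmod(value, 10)` splits each value. A minimal sketch, assuming non-negative integers; the function name `stem_and_leaf` is hypothetical, and stems with no leaves are omitted here, although a hand-drawn display usually lists them:

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Hypothetical helper: map each stem (value // 10) to its sorted leaves (value % 10)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(sorted(stems.items()))


# Ordered array from the previous example.
display = stem_and_leaf([21, 24, 24, 26, 27, 27, 30, 32, 38, 41])
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```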

Frequency/Cumulative Distributions

What is a Frequency Distribution?
A frequency distribution is a list or a table containing:
- class groupings (ranges within which the data fall);
- the corresponding frequencies with which data fall within each grouping or category.

The reasons for using frequency distributions are:
- It is a way to summarize numerical data.
- It condenses the raw data into a more useful form.
- It allows for a quick visual interpretation of the data.

Frequency/Cumulative Distributions (cont’d)

Class Intervals and Class Boundaries
- Each class grouping has the same width.
- Determine the width of each interval by
$$\text{Width of interval} = \frac{\text{range}}{\text{number of desired class groupings}}$$
- Usually at least 5 but no more than 15 groupings.
- Class boundaries never overlap.
- Round up the interval width to get desirable endpoints.

Frequency Distributions Example

A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

To build the frequency distribution, follow these steps:
1. Sort the raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
2. Find the range: 58 - 12 = 46
3. Select the number of classes: 5 (usually between 5 and 15)
4. Compute the class interval (width): 10 (46/5 = 9.2, then round up)
5. Determine the class boundaries (limits): 10, 20, 30, 40, 50, 60
6. Compute the class midpoints: 15, 25, 35, 45, 55
7. Count the observations and assign them to classes
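The steps above can be sketched in Python. This is a minimal sketch under stated assumptions: each class includes its lower boundary and excludes its upper boundary (10 up to 20, 20 up to 30, ...), the width is rounded up to the next whole number with `math.ceil` (which happens to give the slide's width of 10 here, though other data may call for rounding to a "nicer" value by hand), and the function name `frequency_distribution` and its parameter `first_boundary` (the analyst's chosen lower boundary of the first class) are hypothetical:

```python
import math

# Daily high temperatures for the 20 winter days in the example.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]


def frequency_distribution(data, num_classes, first_boundary):
    """Hypothetical helper: return (lower, upper, midpoint, count) per class.

    Each class includes its lower boundary and excludes its upper boundary.
    """
    ordered = sorted(data)                       # sort raw data
    data_range = ordered[-1] - ordered[0]        # find range
    width = math.ceil(data_range / num_classes)  # class width, rounded up
    last_boundary = first_boundary + num_classes * width
    if first_boundary > ordered[0] or last_boundary <= ordered[-1]:
        raise ValueError("classes do not cover all observations")
    table = []
    for i in range(num_classes):
        lower = first_boundary + i * width       # class boundaries
        upper = lower + width
        midpoint = (lower + upper) / 2           # class midpoint
        count = sum(1 for x in ordered if lower <= x < upper)  # frequency
        table.append((lower, upper, midpoint, count))
    return table


for lower, upper, midpoint, count in frequency_distribution(temps, 5, 10):
    print(f"{lower} up to {upper}: midpoint {midpoint:g}, frequency {count}")
```

With 5 classes starting at 10, the frequencies come out as 3, 6, 5, 4, and 2, which add up to the 20 observed days.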

Frequency Distributions Example (cont’d)

The Histogram

A graph of the data in a frequency distribution is called a histogram.
- The class boundaries (or class midpoints) are shown on the horizontal axis.
- The vertical axis shows frequency, relative frequency, or percentage.
- Bars of the appropriate heights represent the number of observations within each class.

Example: for the previous data, the histogram looks like this. There are no gaps between adjacent bars.

[Figure: Histogram: Daily High Temperature. Horizontal axis: Class Midpoints (5 to 65); vertical axis: Frequency (0 to 7).]
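The three choices for the vertical axis differ only in scaling: relative frequency is the class count divided by the number of observations, and percentage is that value times 100. A minimal sketch, assuming the class frequencies 3, 6, 5, 4, 2 obtained by counting the 20 sorted temperatures into the classes 10 up to 20 through 50 up to 60; the name `histogram_heights` is hypothetical, and the text rendering is only a sideways stand-in for the bars:

```python
# Classes and frequencies from the daily-high-temperature example
# (counted from the sorted data: 3, 6, 5, 4 and 2 observations).
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [3, 6, 5, 4, 2]


def histogram_heights(freqs):
    """Hypothetical helper: (frequency, relative frequency, percentage) per class."""
    n = sum(freqs)
    return [(f, f / n, 100 * f / n) for f in freqs]


# Sideways text histogram: one row per class midpoint, one '*' per observation.
for (lower, upper), f in zip(classes, frequencies):
    print(f"{(lower + upper) / 2:>4g} | {'*' * f}")
```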

The Frequency Polygon

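A frequency polygon plots the frequency of each class at its class midpoint and joins the points with straight lines. In a percentage polygon, the vertical axis instead shows the percentage of observations per class.

Example: for the previous data, the frequency polygon looks like this: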

[Figure: Frequency Polygon: Daily High Temperature. Horizontal axis: Class Midpoints (5 to 65); vertical axis: Frequency (0 to 7).]
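The polygon's vertices are (midpoint, frequency) pairs. A common convention, assumed here, closes the polygon with zero-frequency points one class width beyond the first and last midpoints (5 and 65 for this data, matching the axis range of the figure). A minimal sketch with the example's class frequencies 3, 6, 5, 4, 2; `polygon_points` and its `as_percent` switch are hypothetical:

```python
# Class midpoints and frequencies from the daily-high-temperature example.
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]
width = 10


def polygon_points(mids, freqs, width, as_percent=False):
    """Hypothetical helper: vertices of a frequency (or percentage) polygon,
    closed with zero-height points one class width beyond each end."""
    n = sum(freqs)
    heights = [100 * f / n for f in freqs] if as_percent else list(freqs)
    inner = list(zip(mids, heights))
    return [(mids[0] - width, 0)] + inner + [(mids[-1] + width, 0)]


print(polygon_points(midpoints, frequencies, width))
print(polygon_points(midpoints, frequencies, width, as_percent=True))
```

Passing `as_percent=True` gives the percentage polygon described above, with the same shape but a vertical axis in percent.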

The Ogive

The ogive, also known as the cumulative percentage polygon, plots the cumulative percentage of observations against the class boundaries. Example: for the previous data, the ogive (cumulative percentage polygon) looks like this:

[Figure: Ogive: Daily High Temperature. Horizontal axis: Class Boundaries (Not Midpoints), 10 to 60; vertical axis: Cumulative Percentage (0 to 100).]
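Each ogive point pairs a class boundary with the percentage of observations below it: 0% at the lowest boundary, then running totals of the class frequencies divided by the number of observations. A minimal sketch, assuming the example's class frequencies 3, 6, 5, 4, 2 and boundaries 10 to 60; `ogive_points` is a hypothetical name:

```python
from itertools import accumulate

# Class boundaries and frequencies from the daily-high-temperature example.
boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]


def ogive_points(bounds, freqs):
    """Hypothetical helper: (boundary, cumulative % of observations below it)."""
    if len(bounds) != len(freqs) + 1:
        raise ValueError("need exactly one more boundary than classes")
    n = sum(freqs)
    cumulative = [0] + list(accumulate(freqs))  # observations below each boundary
    return [(b, 100 * c / n) for b, c in zip(bounds, cumulative)]


for boundary, pct in ogive_points(boundaries, frequencies):
    print(f"below {boundary}: {pct:g}%")
```

Reading the curve backwards answers questions such as "what share of days had a high below 40?" (70% here).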

Guidelines for good data presentation

- Do not distort the data.
- Avoid unnecessary adornments (no “chart junk”).
- Use a scale for each axis on a two-dimensional graph.
- Begin the vertical axis scale at zero.
- Label all axes properly.
- Give the graph a title.
- Use the simplest graph for a given set of data.

THANK YOU !!!
