qt1 - 02 - frequency distribution
Tables and Graphs
Frequency Distributions
Quantitative Techniques
Contents
Basics of Data
Samples and Populations
Data Array
Frequency Distributions
Relative Frequency Distributions
Classes
Qualitative versus Quantitative
Discrete versus Continuous
Illustrating Data
Histograms
Polygons
Data Basics
Data are collections of any number of related observations
Number of telephones installed by all workers in one day
Number of telephones installed by one worker in one day
Number of tourists in Finland on every Diwali day ??
Data are useful when they
Reveal some kind of pattern
Temperature in December is less than that in June
Lead to some logical conclusion
Senior citizens avoid investing in equity markets
Data Collection & Sanity Check
Source of Data
Actual observation in the field
Physical records available with source organisation
Third party data sources
Commercial data sellers
Free data sources available on the web
Basic Sanity Check
Is the source trustworthy ?
Is there something missing in the data ?
Do we have enough observations ?
Is the conclusion logical ? Garbage In Garbage Out
Is there double counting ?
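Three of these checks (missing values, double counting, number of observations) can be automated. The following is a minimal sketch, not a complete validation routine; the function name sanity_check, the (key, value) record layout and the worker data are all hypothetical and chosen only for illustration.

```python
def sanity_check(records, min_observations):
    """Run three basic checks on a list of (key, value) observations."""
    keys = [key for key, _ in records]
    return {
        # Is there something missing in the data?
        "missing_values": sum(1 for _, value in records if value is None),
        # Is there double counting? (the same key recorded more than once)
        "duplicate_keys": len(keys) - len(set(keys)),
        # Do we have enough observations?
        "enough_observations": len(records) >= min_observations,
    }


# Hypothetical data: telephones installed per worker in one day.
records = [("worker_1", 4), ("worker_2", None), ("worker_3", 6), ("worker_1", 4)]
report = sanity_check(records, min_observations=5)
print(report)
```

Whether the source is trustworthy and whether the conclusion is logical still need human judgement; code of this kind only flags the mechanical problems.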
Samples and Populations
Population : is a collection of all elements about whom we are trying to draw conclusions
Women in Calcutta with age > 18
Sample : is a collection of some, not all, elements of the population about whom we are in a position to gather data
Statisticians gather data from a sample and then use this data to draw inferences about the population
Representative Sample : it should reflect the characteristics of the underlying population
Selecting a sample of women from Calcutta Club may not be representative of all women in Calcutta !
Organising Data
Organising data enables us to quickly spot some of the characteristics of the data
Range : Highest Value ? Lowest Value ?
Clustering : Are the values grouped around a specific value ?
Popularity : Which value occurs most frequently
Ways of organising data
Simple ascending or descending order
Group by certain characteristic
Age ? Income ? Education Level ?
Colour ? Material ?
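As a rough illustration of the points above, the sketch below sorts a small data set, reads off the lowest and highest values, finds the most frequent value, and groups observations by a characteristic. The ages, names and education levels are made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical observations: ages of eight respondents.
ages = [23, 35, 35, 41, 35, 29, 52, 41]

# Simple ascending order.
ascending = sorted(ages)

# Range: lowest and highest value, and the spread between them.
lowest, highest = ascending[0], ascending[-1]
value_range = highest - lowest

# Popularity: which value occurs most frequently.
most_common_value, occurrences = Counter(ages).most_common(1)[0]

# Group by a certain characteristic (here, a hypothetical education level).
people = [("Asha", "graduate"), ("Ravi", "school"), ("Meena", "graduate")]
groups = defaultdict(list)
for name, level in people:
    groups[level].append(name)

print(ascending, lowest, highest, value_range)
print(most_common_value, occurrences)
print(dict(groups))
```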
Examples of Raw Data
Retail Sales Figures
Forbes500 Company Data
US Cereals Data
Stockmarket Price Data
Examination Marks
Engine Pollution Data
Data Array
The Data Array arranges values in ascending or descending order
Why Create a Data Array ?
We can quickly get the highest and lowest value
In Hydrocarbon : 0.34 .. 1.1
We can divide the data into sections
First 1/3 : Between 0.34 and 0.46
Second 1/3 : Between 0.47 and 0.56
Last 1/3 : Between 0.56 and 1.1
We can see whether some value appears multiple times
We can observe difference between successive values of the data
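The four uses of a data array listed above can be sketched in a few lines. The readings below are hypothetical values, not the engine pollution data from the slides; they were only chosen to lie in a similar range.

```python
from collections import Counter

# Hypothetical hydrocarbon readings (NOT the original data set).
readings = [0.50, 0.34, 1.10, 0.46, 0.56, 0.47, 0.41, 0.56, 0.65]

# The data array: values in ascending order.
data_array = sorted(readings)

# 1. Highest and lowest value.
lowest, highest = data_array[0], data_array[-1]

# 2. Divide the data into sections (here, three equal-sized parts).
third = len(data_array) // 3
sections = [
    data_array[:third],
    data_array[third:2 * third],
    data_array[2 * third:],
]

# 3. Values that appear multiple times.
repeated = [value for value, count in Counter(data_array).items() if count > 1]

# 4. Differences between successive values (rounded to hide float noise).
differences = [round(b - a, 2) for a, b in zip(data_array, data_array[1:])]

print(lowest, highest)
print(sections)
print(repeated)
print(differences)
```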
Limitations of Data Array
Cumbersome to use when the volume of data is very large
Utility goes down as the human mind cannot comprehend so much data at once
There is a need to compress this data and make it more accessible
Frequency Distribution
A frequency distribution is a table that organises data into classes
A class is a group of values describing ONE characteristic of the data
It shows the number of observations from the data that fall into each class
Frequency distribution can be constructed by determining how often ('with what frequency') values occur inside each class of a data set
Fewer classes mean more data compression
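A minimal sketch of building a frequency distribution follows. It assumes equal-width classes and assumes each class includes its lower limit and excludes its upper limit; the function name frequency_distribution and the examination marks are hypothetical.

```python
def frequency_distribution(values, lower, width, n_classes):
    """Count how many values fall into each of n_classes equal-width classes.

    Assumption: a class covers lower_limit <= x < upper_limit.
    Values outside all classes are ignored.
    """
    counts = [0] * n_classes
    for x in values:
        index = int((x - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    classes = [(lower + i * width, lower + (i + 1) * width) for i in range(n_classes)]
    return list(zip(classes, counts))


# Hypothetical examination marks for twelve students.
marks = [35, 47, 52, 58, 61, 64, 66, 71, 73, 78, 85, 92]

# Seven classes of width 10 ...
fine = frequency_distribution(marks, lower=30, width=10, n_classes=7)
# ... versus four classes of width 20: fewer classes, more compression.
coarse = frequency_distribution(marks, lower=20, width=20, n_classes=4)

for (low, high), count in fine:
    print(f"{low}-{high}: {count}")
for (low, high), count in coarse:
    print(f"{low}-{high}: {count}")
```

With fewer, wider classes the table is shorter but hides more detail about where the individual marks lie inside each class.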
Relative Frequency Distribution
Frequency of each value can be expressed as a fraction or percentage of the total number of observations
This could help us compare data from samples that are of different sizes
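The sketch below converts class counts into relative frequencies so that two samples of different sizes can be compared side by side. The class labels and counts are invented for the example, and relative_frequencies is a hypothetical helper name.

```python
def relative_frequencies(counts):
    """Express each class frequency as a fraction of the total observations."""
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical frequency distributions from two samples of different sizes.
sample_a = {"0-49": 5, "50-74": 10, "75-100": 5}      # 20 observations
sample_b = {"0-49": 40, "50-74": 120, "75-100": 40}   # 200 observations

rel_a = relative_frequencies(sample_a)
rel_b = relative_frequencies(sample_b)

for cls in sample_a:
    print(f"{cls}: {rel_a[cls]:.0%} vs {rel_b[cls]:.0%}")
```

The raw counts (10 versus 120 in the middle class) are not directly comparable, while the fractions (50% versus 60%) are.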
Discrete & Continuous Classes
DISCRETE : In this case, the data in a class can take ONE discrete value :
0, 1, 2, ...
CONTINUOUS : In this case, the data in a class can take any value in a range
> 0; 1; 2; ) Lower Class Boundary
Less Than OR Equal to (