Statistical techniques 4 Credits [3-1-0] Course coordinator: Prateek Sharma

Upload: roy-nicholson

Post on 25-Dec-2015

Learning Objectives

• Understand the need for studying environmental statistics

• Become aware of a wide range of applications of statistics in environmental management & decision making

• Define statistics

• Differentiate between descriptive and inferential statistics

Course Domain

• The course is intended to help students develop a comprehensive and understandable framework for applying statistical methods to various types of environmental problems.

• It includes:
– Grasping the language of statistics through descriptive statistics;
– Developing a sampling design to determine "where to sample", "when to sample", "how to sample" and "how much to sample";
– Understanding the theory behind the connection between the sample and the population;
– Making generalisations and inferences about the population from the collected samples;
– Developing statistical models for environmental decision making and data analysis.

• Thus the typical uses of statistical methods are analyzing environmental monitoring data, describing the frequency distribution of population exposures, ascertaining the degree to which environmental monitoring data comply with standards, and predicting the impact of pollutant source reductions on the quality of the environment.

Course outline

Introduction: Relevance of statistics; mathematical models – deterministic and stochastic; random variables; populations and samples; parameters and statistics.

Review of basic concepts: Measurement theory, levels of measurement; numerical measures of data; graphical presentation of data; Chebyshev’s theorem; measurement uncertainty.

Probability theory: Probability concepts; axioms of probability; probability distribution functions and their applications – discrete and continuous distributions.

Data sampling: Methods for selecting sampling locations and times; types of sampling designs – probability and non-probability sampling; sampling theory, sampling distributions; parameter estimation, point and interval estimates; sample size determination for different sampling designs.

Tests of hypothesis: Hypothesis testing – parametric and non-parametric tests.

Quality assurance and quality control: Quality assurance, internal and external quality control; control charts – description and theory, application and limitations; outlier detection – different tests for outlier detection; errors, different types of errors; error propagation.

Data analysis: Exploratory data analysis; techniques for smoothing data; correlation, serial correlation; parameter estimation using the method of least squares; empirical model building by linear regression; coefficient of determination; calibration; trend analysis – detecting and estimating trend, trends and seasonality.

Evaluation criteria

1. Two minor examinations, each of 15% weightage

Tentative dates: First Minor: February 14, 2011; Second Minor: March 28, 2011

2. Assignment of 20% weightage

3. Major examination of 50% weightage

Tentative date: May 09, 2011

Need for studying environmental statistics

Consider the following questions:

• What is the probability of exceedance of the NAAQS for a criteria pollutant at a given receptor location?

• How many soil samples should be collected in an area contaminated by a toxicant to give 95% assurance that a threshold limit is not accidentally overlooked?

• What is the probability that a water quality standard is violated in a given 24-hour period as a result of effluent discharged into a river?

• At one of the National Ambient Air Quality Monitoring Stations (NAAQMS) run by the Central Pollution Control Board (CPCB) in Delhi, the probabilities that the National Ambient Air Quality Standard (NAAQS) for CO, NOx and SOx will be exceeded are 0.24, 0.19 and 0.09 respectively; the probability that the NAAQS is exceeded for both CO and NOx is 0.06, for both CO and SOx is 0.16, and for both NOx and SOx is 0.11; and the probability that the NAAQS is exceeded for all three of CO, NOx and SOx is 0.04.
– Determine the probability that the NAAQS is exceeded for at least one of the pollutants.
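The last question is a direct application of the inclusion-exclusion rule. The Python sketch below applies it to the figures exactly as printed, and also checks each joint probability against the single-pollutant probabilities, because a joint probability can never exceed either of its marginals. As printed, the CO-SOx (0.16) and NOx-SOx (0.11) values both exceed P(SOx) = 0.09, so at least one figure is probably a typo and the numerical result should be read with that in mind. The function names are illustrative only.

```python
# Inclusion-exclusion for three events:
# P(A or B or C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC)

def union_of_three(p_a, p_b, p_c, p_ab, p_ac, p_bc, p_abc):
    """Probability that at least one of three events occurs."""
    return p_a + p_b + p_c - p_ab - p_ac - p_bc + p_abc

def inconsistent_pairs(singles, pairs):
    """Return pairs whose joint probability exceeds the smaller single probability."""
    bad = []
    for (a, b), p_joint in pairs.items():
        if p_joint > min(singles[a], singles[b]):
            bad.append((a, b))
    return bad

singles = {"CO": 0.24, "NOx": 0.19, "SOx": 0.09}
pairs = {("CO", "NOx"): 0.06, ("CO", "SOx"): 0.16, ("NOx", "SOx"): 0.11}
p_all_three = 0.04

p_any = union_of_three(singles["CO"], singles["NOx"], singles["SOx"],
                       pairs[("CO", "NOx")], pairs[("CO", "SOx")],
                       pairs[("NOx", "SOx")], p_all_three)

print(f"P(at least one exceedance) = {p_any:.2f}")
print("Pairs violating P(A and B) <= min(P(A), P(B)):",
      inconsistent_pairs(singles, pairs))
```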

A company investing in renewable energy uses three different brands of hydropower turbines. Of its total installation, 50% are brand 1, 30% are brand 2 and 20% are brand 3. Each manufacturer offers a 1-year warranty on parts and labour. It is known that 25% of brand 1 turbines require repair within the warranty period, whereas the corresponding percentages for brands 2 and 3 are 20% and 10% respectively. If a randomly selected turbine needs repair under warranty, what is the probability that it is a brand 1 turbine?
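The turbine question is an application of Bayes' theorem, with the installation shares as prior probabilities and the warranty-repair rates as likelihoods. A minimal Python sketch (the helper name `posterior` is illustrative, not from the course):

```python
def posterior(priors, likelihoods):
    """Bayes' theorem: P(H_k | E) for every hypothesis H_k, given evidence E."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    total = sum(joint.values())          # total probability P(E)
    return {k: v / total for k, v in joint.items()}

# Share of installed turbines by brand, and P(repair under warranty | brand)
priors = {"brand 1": 0.50, "brand 2": 0.30, "brand 3": 0.20}
p_repair = {"brand 1": 0.25, "brand 2": 0.20, "brand 3": 0.10}

post = posterior(priors, p_repair)
print(f"P(brand 1 | repair) = {post['brand 1']:.4f}")
```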

If the probability of an allergic reaction to a certain drug is 0.001, compute the probability that, out of 2000 individuals, more than 2 will have an allergic reaction.
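With n = 2000 large and p = 0.001 small, the usual approach is the Poisson approximation to the binomial with λ = np = 2. The sketch below computes that approximation and, for comparison, the exact binomial tail using only the Python standard library:

```python
import math

n, p = 2000, 0.001
lam = n * p  # Poisson mean: lambda = n * p = 2

# Poisson approximation: P(X > 2) = 1 - [P(0) + P(1) + P(2)]
p_le2_poisson = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(3))
p_gt2_poisson = 1 - p_le2_poisson

# Exact binomial tail for comparison
p_le2_binom = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))
p_gt2_binom = 1 - p_le2_binom

print(f"Poisson approx: {p_gt2_poisson:.4f}, exact binomial: {p_gt2_binom:.4f}")
```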

A fragment of sandstone has been found in a streambed, and a student field party has to search for the deposit. Unfortunately, the source of the rock cannot be identified with certainty because it was found below the junction of two dried stream tributaries. The drainage basin of the larger stream covers about 18 km², while the basin drained by the smaller stream covers only about 10 km². However, a geologic report and map of the region disclose the additional information that about 35% of the rock outcrops in the larger basin are of marine origin, while almost 80% of the rock outcrops in the smaller basin are of marine origin, the remainder being igneous. What is the probability that the fragment of sandstone came from the smaller basin?
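One way to frame the sandstone question is with Bayes' theorem. The sketch below rests on two assumptions the problem leaves implicit: the prior probability for each basin is proportional to its area, and sandstone (a sedimentary rock) is counted among the marine-origin outcrops rather than the igneous ones.

```python
# Assumption: priors proportional to basin area (every km2 equally likely
# to have supplied the fragment).
area = {"larger": 18.0, "smaller": 10.0}
total_area = sum(area.values())
prior = {k: a / total_area for k, a in area.items()}

# P(marine-origin outcrop | basin); the sandstone is treated as marine in origin.
p_marine = {"larger": 0.35, "smaller": 0.80}

evidence = sum(prior[k] * p_marine[k] for k in prior)          # P(marine)
post_smaller = prior["smaller"] * p_marine["smaller"] / evidence
print(f"P(smaller basin | marine sandstone) = {post_smaller:.3f}")
```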

• In a suburban neighbourhood in Mumbai, 20% of the homes have slab foundations and 80% do not. Research studies suggest that 75% of the homes with slab foundations in this region have indoor radon problems due to intrusion of radon gas from the soil beneath. Of the homes without slab foundations, 15% have indoor radon problems due to the remaining indoor sources of radon.
– What is the probability that a house reporting a radon problem has a slab foundation?
– What is the probability that a house not reporting a radon problem has a slab foundation?

• If 10 in 100 cars in Delhi violate the emission norms, what is the probability that a traffic police inspector, who randomly selects 4 cars for inspection, will catch:
– none of the cars that violate the emission norms;
– one of the cars that violate the emission norms;
– at least three cars that violate the emission norms?
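Both questions on this page can be sketched in a few lines of Python. The radon question is Bayes' theorem with two hypotheses (slab or no slab). The car question is modelled here as binomial with p = 0.1 per car, an assumption that treats the 4 inspected cars as independent draws from a large fleet; if they were drawn without replacement from exactly 100 cars, a hypergeometric model would be slightly more exact.

```python
from math import comb

# --- Radon: Bayes' theorem with two hypotheses (slab / no slab) ---
p_slab = 0.20
p_radon_given_slab = 0.75
p_radon_given_noslab = 0.15

p_radon = p_slab * p_radon_given_slab + (1 - p_slab) * p_radon_given_noslab
p_slab_given_radon = p_slab * p_radon_given_slab / p_radon
p_slab_given_no_radon = p_slab * (1 - p_radon_given_slab) / (1 - p_radon)

# --- Cars: binomial model, each car independently violating with p = 0.1 ---
def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_cars, p_violate = 4, 0.10
p_none = binom_pmf(0, n_cars, p_violate)
p_one = binom_pmf(1, n_cars, p_violate)
p_at_least_three = sum(binom_pmf(k, n_cars, p_violate) for k in (3, 4))

print(f"P(slab | radon) = {p_slab_given_radon:.4f}")
print(f"P(slab | no radon) = {p_slab_given_no_radon:.4f}")
print(f"P(0) = {p_none:.4f}, P(1) = {p_one:.4f}, P(>=3) = {p_at_least_three:.4f}")
```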

• A Quality Control (QC) team from a government agency was assigned to assess a laboratory's measurement process for nitrate concentration. Over a period of one week, the QC team randomly inserted 15 specimens with a known concentration of 10.0 mg/L into the routine work of the laboratory. The work was arranged so that the observed values would be random and independent, and the chemists were unaware that their performance was being assessed. The results, in order of observation (mg/L), were:

8.8, 9.9, 10.7, 11.7, 10.1, 8.6, 11.4, 12.1, 10.7, 6.9, 7.4, 7.2, 12.7, 8.5, 10.3.
– Assuming the replication process averages out the random errors, estimate the bias associated with the measurement process.
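Because replication is assumed to average out the random errors, the bias can be estimated as the mean of the 15 results minus the known value of 10.0 mg/L. A short Python sketch:

```python
from statistics import mean

true_value = 10.0  # mg/L, known concentration of the inserted specimens
observed = [8.8, 9.9, 10.7, 11.7, 10.1, 8.6, 11.4, 12.1,
            10.7, 6.9, 7.4, 7.2, 12.7, 8.5, 10.3]

# Bias = (mean of replicate measurements) - (true value)
bias = mean(observed) - true_value
print(f"mean = {mean(observed):.2f} mg/L, bias = {bias:+.2f} mg/L")
```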

The Department of Agriculture routinely projects future grain production based on estimates from a sample. A sample of 100 plots (50 hectares each) produced a mean yield of 2.16 tonnes/acre. The Department assumes a population standard deviation of 0.278 tonnes. Calculate the 95% confidence interval for the mean yield.
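With the population standard deviation treated as known, the interval is $\bar{x} \pm z_{0.025}\,\sigma/\sqrt{n}$. The sketch below uses the figures as stated and keeps the units given in the problem without converting between acres and hectares:

```python
import math
from statistics import NormalDist

n, x_bar, sigma = 100, 2.16, 0.278   # sample size, sample mean, known population s.d.
conf = 0.95

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
half_width = z * sigma / math.sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```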

It is understood that in a certain area 60% of the population have a secondary source of income besides farming, which is their primary source of income. A random sample of 600 persons from the area indicates that 354 (59%) have off-farm jobs. Using α = 0.01, determine whether the proportion of these farmers holding secondary jobs is truly 60%.
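A sketch of the large-sample z-test for a single proportion, two-sided at α = 0.01 (the two-sided alternative is one reading of "whether the proportion ... is truly 60%"):

```python
import math
from statistics import NormalDist

p0, n, successes, alpha = 0.60, 600, 354, 0.01
p_hat = successes / n

se = math.sqrt(p0 * (1 - p0) / n)              # standard error under H0
z = (p_hat - p0) / se
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided, about 2.576

reject = abs(z) > z_crit
print(f"z = {z:.2f}, critical = +/-{z_crit:.3f}, reject H0: {reject}")
```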

The average expected price of a transformer part over the next year is projected by a journal to be no more than Rs 70. A manufacturer consults 15 market experts and obtains a mean projected price of Rs 75 with a standard deviation of Rs 5. Using α = 0.01, test the hypothesis that the population mean is Rs 70.
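A sketch of the one-sample t-test with 14 degrees of freedom. The critical values are hard-coded from a standard t-table because the Python standard library has no t distribution. The journal's "no more than Rs 70" suggests a one-tailed alternative (μ > 70), but both versions are shown, since the question itself only says "the population mean is Rs 70".

```python
import math

mu0, n, x_bar, s = 70.0, 15, 75.0, 5.0
t = (x_bar - mu0) / (s / math.sqrt(n))

# Student's t with n - 1 = 14 df, from a standard t-table:
# 2.624 (one-tailed, alpha = 0.01) and 2.977 (two-tailed, alpha = 0.01)
t_crit_one_tailed = 2.624
t_crit_two_tailed = 2.977

print(f"t = {t:.3f} with {n - 1} df")
print("Reject H0 (one-tailed):", t > t_crit_one_tailed)
print("Reject H0 (two-tailed):", abs(t) > t_crit_two_tailed)
```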

A leading GIS company arranged a special summer training programme for students of a reputed University. The scores obtained by a random sample of 10 students are given below. Use α = 0.10 to determine whether there is a significant improvement in knowledge of the students after attending the training programme.

• Four students (A-D) each perform an analysis in which exactly 10.00 ml of exactly 0.1 M sodium hydroxide is titrated with exactly 0.1 M hydrochloric acid. Each student performs five replicate titrations, with the results shown in the table below. Comment on the accuracy, bias, and precision of each student.

Student A Student B Student C Student D

10.08 9.88 10.19 10.04

10.11 10.14 9.79 9.98

10.09 10.02 9.69 10.02

10.10 9.80 10.05 9.97

10.12 10.21 9.78 10.04
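Accuracy can be judged from how far each student's mean lies from the true 10.00 ml (the bias), and precision from the spread of the replicates (the standard deviation). A minimal sketch:

```python
from statistics import mean, stdev

true_volume = 10.00  # ml expected when 10.00 ml of 0.1 M NaOH is titrated with 0.1 M HCl
results = {
    "A": [10.08, 10.11, 10.09, 10.10, 10.12],
    "B": [9.88, 10.14, 10.02, 9.80, 10.21],
    "C": [10.19, 9.79, 9.69, 10.05, 9.78],
    "D": [10.04, 9.98, 10.02, 9.97, 10.04],
}

summary = {}
for student, vols in results.items():
    m = mean(vols)
    summary[student] = {
        "mean": m,
        "bias": m - true_volume,   # systematic error, relates to accuracy
        "sd": stdev(vols),         # scatter of replicates, relates to precision
    }
    print(f"{student}: mean = {m:.2f}, bias = {m - true_volume:+.2f}, s = {stdev(vols):.3f}")
```

On these numbers, A's results are tightly clustered but about 0.10 ml high; B's and D's means are close to 10.00 ml, with B far more scattered than D; and C's results are both low on average and widely scattered. Treat this as one reading of the data rather than the course's model answer.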

• The following is a general assessment of the flow and concentration fluctuations in four different streams.

Stream Concentration Fluctuation Flow Fluctuation

A Small Large

B Large Large

C Small Small

D Large Small

Suggest a suitable sampling method/technique for collecting a representative sample from each stream.

• An estimate is required of the average DO in similar streams in a region. What is the minimum number of observations required to estimate the mean DO to within 0.5 mg/L at a given level of confidence, say 95%?
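The standard formula for the sample size needed to estimate a mean to within a margin E at confidence level 1 − α is $n = (z_{\alpha/2}\,\sigma/E)^2$, rounded up. The problem does not give σ, so the value used below is purely hypothetical; in practice it would come from earlier DO surveys of similar streams.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, conf=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (sigma treated as known)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical: sigma is NOT given in the problem; 1.0 mg/L is for illustration only.
sigma_do = 1.0  # mg/L (assumed)
print(sample_size_for_mean(sigma_do, margin=0.5))
```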

• The average annual rainfall at a certain locality is 30.0 inches. This value has been established from a long history of weather data. In recent years, certain climatological changes seem to be affecting, among other things, the annual precipitation. It is hypothesized that the annual rainfall has in fact increased. The past 8 years have yielded the following annual precipitation (inches): 34.1, 33.7, 27.4, 31.1, 30.9, 35.2, 28.4, 32.1. Can we conclude from these data that the annual rainfall has increased?

• The discharge permit for an industry requires the monthly average COD concentration to be less than 50 mg/L, and 20 measurements are taken each month for this purpose. Given the following 20 observations, would the industry be in compliance with the standard? 45, 63, 56, 55, 52, 49, 44, 49, 56, 71, 44, 51, 50, 49, 42, 46, 52, 59, 48, 51.
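Both questions on this page can be sketched as one-sample t-tests against a fixed value. Two assumptions are flagged in the code: α = 0.05 (not stated for the rainfall question) and a one-tailed alternative in each case. For the COD permit, note that the sample mean (51.6 mg/L) already exceeds 50 mg/L; the t-test only asks whether that excess is statistically significant, and which framing applies may be a regulatory choice rather than a statistical one.

```python
import math
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t statistic for H0: mu = mu0, with its n - 1 degrees of freedom."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / math.sqrt(n)), n - 1

rain = [34.1, 33.7, 27.4, 31.1, 30.9, 35.2, 28.4, 32.1]
cod = [45, 63, 56, 55, 52, 49, 44, 49, 56, 71,
       44, 51, 50, 49, 42, 46, 52, 59, 48, 51]

t_rain, df_rain = one_sample_t(rain, 30.0)
t_cod, df_cod = one_sample_t(cod, 50.0)

# One-tailed critical values at alpha = 0.05 (assumed) from a standard t-table
T_CRIT_05 = {7: 1.895, 19: 1.729}

print(f"rainfall: mean = {mean(rain):.3f}, t = {t_rain:.3f}, df = {df_rain}")
print(f"COD: mean = {mean(cod):.2f}, t = {t_cod:.3f}, df = {df_cod}")
print("rainfall increase significant:", t_rain > T_CRIT_05[df_rain])
print("COD mean significantly above 50:", t_cod > T_CRIT_05[df_cod])
```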

• A small lake is fed by streams from a watershed with a high density of commercial land use (CLU) and from a watershed that is mainly residential (RLU). The historical chloride concentrations (in mg/L) below were collected at random intervals over a period of four years. Are the chloride concentrations of the two streams different?

CLU: 140 134 130 132 135 145 118 157
RLU: 120 114 142 100 100 92 122 97 145 130

• Two atomic absorption spectrophotometers (AAS) were used to determine antimony in the atmosphere. For samples from an urban atmosphere, the following results were obtained (in mg/m3):

Sample No.: 1 2 3 4 5 6
AAS No. 1: 22.2 19.2 15.7 20.4 19.6 15.7
AAS No. 2: 25.0 19.5 21.3 20.7 16.6 16.8

Can we conclude that two AASs have the same precision?
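A sketch of the two tests on this page. For the chloride data it uses Welch's two-sample t-test, which does not assume equal variances (the course may instead use the pooled-variance version). For the AAS data it uses the variance-ratio F-test, with the two-sided α = 0.05 critical value taken from a standard F-table.

```python
import math
from statistics import mean, variance

# Chloride (mg/L) in the two streams
clu = [140, 134, 130, 132, 135, 145, 118, 157]
rlu = [120, 114, 142, 100, 100, 92, 122, 97, 145, 130]

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    return t, df

t_cl, df_cl = welch_t(clu, rlu)
print(f"chloride: t = {t_cl:.3f}, df = {df_cl:.1f}")

# Antimony results from the two AAS instruments
aas1 = [22.2, 19.2, 15.7, 20.4, 19.6, 15.7]
aas2 = [25.0, 19.5, 21.3, 20.7, 16.6, 16.8]

# Variance-ratio (F) test: larger sample variance in the numerator
v1, v2 = variance(aas1), variance(aas2)
f_ratio = max(v1, v2) / min(v1, v2)
F_CRIT = 7.15  # F(0.025; 5, 5) from a standard F-table, two-sided alpha = 0.05
print(f"AAS: F = {f_ratio:.2f}, equal precision not rejected: {f_ratio < F_CRIT}")
```

The Welch statistic should then be compared with the two-sided t-table value for the computed degrees of freedom (about 15 here).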

• A large portion of chromium-contaminated water was divided into 20 identical samples. Five samples were sent to each of four laboratories, and the following data were produced. Are the laboratories making consistent measurements?

Laboratory I Laboratory II Laboratory III Laboratory IV

26.1 18.3 19.1 30.7

21.5 19.7 13.9 27.3

22.0 18.0 15.7 20.9

22.6 17.4 18.6 29.0

24.9 22.6 19.1 20.9

• The numbers of glassware breakages reported by five laboratory workers in an Environmental Monitoring Laboratory over a given period are shown below. Is there any evidence that the workers differ in their reliability?

24, 17, 11, 9, 19.
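The two questions on this page can be sketched as a one-way ANOVA (laboratories) and a chi-square goodness-of-fit test against equal breakage counts for all five workers (reliability). The critical values in the printed messages are standard table values at α = 0.05, which is an assumed significance level.

```python
from statistics import mean

# --- One-way ANOVA: are the four laboratories consistent? ---
labs = {
    "I":   [26.1, 21.5, 22.0, 22.6, 24.9],
    "II":  [18.3, 19.7, 18.0, 17.4, 22.6],
    "III": [19.1, 13.9, 15.7, 18.6, 19.1],
    "IV":  [30.7, 27.3, 20.9, 29.0, 20.9],
}

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_b, df_w = len(groups) - 1, len(all_values) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

f_lab, df_b, df_w = one_way_anova(list(labs.values()))
print(f"ANOVA: F = {f_lab:.2f} with ({df_b}, {df_w}) df; table F(0.05; 3, 16) = 3.24")

# --- Chi-square goodness of fit: equal breakage rates for all five workers? ---
breakages = [24, 17, 11, 9, 19]
expected = sum(breakages) / len(breakages)   # 16 per worker under H0
chi2 = sum((o - expected) ** 2 / expected for o in breakages)
print(f"chi-square = {chi2:.2f} with {len(breakages) - 1} df; table value (0.05; 4) = 9.49")
```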

• Two different heating systems (natural gas and cogeneration) are offered for greenhouse use. An agricultural engineer is interested in determining whether there is a difference in operating cost between the two systems. A sample of 16 greenhouses using natural gas gives an average annual cost of Rs 1,750,000 with a standard deviation of Rs 40,000. Another sample of 14 greenhouses, of the same capacity as the first sample and using cogeneration, gives an average annual cost of Rs 1,640,000 with a standard deviation of Rs 50,000. Can we conclude that the average costs of the two systems are different?
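A sketch of the pooled-variance two-sample t-test computed from the summary statistics alone. Equal population variances and α = 0.05 (two-sided) are assumptions, since the problem states neither.

```python
import math

# Summary statistics from the problem (annual costs in Rs)
n1, m1, s1 = 16, 1_750_000, 40_000   # natural gas
n2, m2, s2 = 14, 1_640_000, 50_000   # cogeneration

# Pooled-variance two-sample t (assumes equal population variances)
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

T_CRIT = 2.048  # t(0.025; 28) from a standard t-table, two-sided alpha = 0.05
print(f"t = {t:.2f}, df = {df}, reject equal means: {abs(t) > T_CRIT}")
```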

• The following table gives the logarithms of N = 60 total suspended particulate (TSP) air quality values, collected on five randomly selected days each month at one of the ambient air quality stations. How can we check these data for quality control?

Day i / Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

1 4.1 2.9 2.5 3.2 3.1 2.4 2.1 2.4 2.8 3.2 3.8 3.9

2 3.8 3.0 2.6 3.1 3.2 2.5 2.1 2.4 2.6 3.2 3.7 4.0

3 3.2 3.1 2.4 2.9 2.8 2.6 2.2 2.5 2.9 3.4 3.8 3.8

4 3.5 2.8 2.4 2.7 2.4 2.3 2.4 2.6 3.0 3.3 3.4 3.7

5 3.1 3.0 2.5 3.7 2.9 2.5 2.3 2.7 3.1 3.1 3.6 3.9

• The following values were obtained for the nitrite concentration (mg/L) in a sample of stream water:

0.34, 0.36, 0.32, 0.35, 0.50.

Can we reject the last measurement, which appears to be suspect (an outlier)?
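The two questions on this page can be sketched as a Shewhart X-bar chart and Dixon's Q test. For the chart, each month's five values form one subgroup and the limits are the grand mean ± A2·R-bar, where A2 = 0.577 is the standard constant for subgroups of five. For the outlier test, the critical value 0.710 (n = 5, 95% confidence) is hard-coded from a Dixon Q table. These are common choices, not necessarily the ones the course uses (Grubbs' test is a frequent alternative for outliers).

```python
from statistics import mean

# Log TSP values: five days (rows) for each month (columns)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
rows = [
    [4.1, 2.9, 2.5, 3.2, 3.1, 2.4, 2.1, 2.4, 2.8, 3.2, 3.8, 3.9],
    [3.8, 3.0, 2.6, 3.1, 3.2, 2.5, 2.1, 2.4, 2.6, 3.2, 3.7, 4.0],
    [3.2, 3.1, 2.4, 2.9, 2.8, 2.6, 2.2, 2.5, 2.9, 3.4, 3.8, 3.8],
    [3.5, 2.8, 2.4, 2.7, 2.4, 2.3, 2.4, 2.6, 3.0, 3.3, 3.4, 3.7],
    [3.1, 3.0, 2.5, 3.7, 2.9, 2.5, 2.3, 2.7, 3.1, 3.1, 3.6, 3.9],
]
subgroups = [list(col) for col in zip(*rows)]   # one subgroup of 5 per month

# Shewhart X-bar chart with limits based on the average range
A2 = 0.577                                      # control-chart constant for n = 5
x_bars = [mean(g) for g in subgroups]
r_bar = mean([max(g) - min(g) for g in subgroups])
center = mean(x_bars)
ucl, lcl = center + A2 * r_bar, center - A2 * r_bar
out_of_control = [m for m, x in zip(months, x_bars) if not lcl <= x <= ucl]
print(f"centre = {center:.3f}, LCL = {lcl:.3f}, UCL = {ucl:.3f}")
print("months outside limits:", out_of_control)

# Dixon's Q test for the suspect (largest) nitrite value
def dixon_q_high(values):
    """Q for the largest value: gap to its nearest neighbour divided by the range."""
    x = sorted(values)
    return (x[-1] - x[-2]) / (x[-1] - x[0])

nitrite = [0.34, 0.36, 0.32, 0.35, 0.50]
q = dixon_q_high(nitrite)
Q_CRIT = 0.710   # n = 5, 95% confidence, from a standard Dixon Q table
print(f"Q = {q:.3f}, reject 0.50 as an outlier: {q > Q_CRIT}")
```

Many months fall outside the limits in a seasonal pattern (high in November to January, low in June to August), which suggests the monthly subgroups do not come from a single stable process; on these data a control chart mainly detects seasonality.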

Yields of maize (in tonnes per hectare) from ten field plots and the amount of fertilizer applied (in kilograms per hectare) are given in the table below. Check whether X and Y are linearly related at a significance level of α = 0.05.

Yield Y (tonnes/hectare) 5.0 5.7 6.0 6.2 6.3 6.5 6.8 7.0 6.9 6.6

Fertilizer X (kg/hectare) 5 10 12 18 25 30 36 40 45 48
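A sketch of testing for a linear relationship through the correlation coefficient: under H0: ρ = 0, the statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ has n − 2 degrees of freedom. The least-squares slope and intercept are printed as well; the critical value is taken from a standard t-table.

```python
import math

# Maize yield (tonnes/ha) and fertilizer applied (kg/ha) for ten plots
y = [5.0, 5.7, 6.0, 6.2, 6.3, 6.5, 6.8, 7.0, 6.9, 6.6]
x = [5, 10, 12, 18, 25, 30, 36, 40, 45, 48]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)                      # Pearson correlation coefficient
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)      # test of H0: rho = 0, n - 2 df
slope = sxy / sxx                                   # least-squares slope
intercept = y_bar - slope * x_bar

T_CRIT = 2.306  # t(0.025; 8) from a standard t-table, two-sided alpha = 0.05
print(f"r = {r:.3f}, t = {t:.2f}, linear relation significant: {abs(t) > T_CRIT}")
print(f"fitted line: y = {intercept:.3f} + {slope:.4f} x")
```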

After testing generators that can run on bio-diesel produced at a plant during a demonstration run, the manager wanted to assess whether the number of defects found in the sets follows a Poisson distribution. Apply the χ² goodness-of-fit test to the test results given below.

No. of defects 0 1 2 3 4 5

Observed frequency 6 13 13 8 4 3

Expected frequency (Poisson) 6.24 13.52 13.52 9.01 4.5 1.8

• Wolfer sunspot numbers are an index of activity on the solar surface. They have been investigated for their impact on terrestrial climate and for the resulting environmental effects. Twenty annual observations are listed here for the period 1770-1789:

101 82 66 35 31 7 20 92 154 125 85 68 38 23 10 24 83 132 131 118.

Can we ascertain any trend from the data set?
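One common non-parametric way to look for a monotonic trend is the Mann-Kendall test, sketched below; no tie correction is needed because all twenty values are distinct. Note that sunspot numbers follow a roughly 11-year cycle, which a monotonic-trend test is not designed to detect.

```python
import math

sunspots = [101, 82, 66, 35, 31, 7, 20, 92, 154, 125,
            85, 68, 38, 23, 10, 24, 83, 132, 131, 118]   # 1770-1789

def mann_kendall(x):
    """Mann-Kendall S statistic and its normal-approximation Z (no tie correction)."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)      # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

s, z = mann_kendall(sunspots)
print(f"S = {s}, Z = {z:.2f}, trend significant at 5% (|Z| > 1.96): {abs(z) > 1.96}")
```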

The modelling approach

The fundamental problem in all modelling studies of physical systems is to identify the function “F” that allows prediction of the physical quantity of interest, such as the pollutant concentration C(x, y, z, t), at any point in space (x, y, z) and time (t), given the pollutant loading and the other physical variables of the system.

Three different approaches have been established to identify “F”:

• Deterministic mathematical modelling

– Analytical models

– Numerical models

• Statistical modelling

• Physical modelling

Approaches to analyse any system, phenomenon or process

Any phenomenon, process or system can be analysed through:

• Deterministic approach

• Stochastic approach

• Physical approach

Statistics - Introduction

The processing of statistical information has a history that extends back to the beginning of mankind. In early biblical times, nations compiled statistical data to provide descriptive information on all sorts of things, such as taxes, wars, agricultural crops, and even athletic events.

Today, with the development of probability theory, we are able to use statistical methods that not only describe important features of the data but also allow us to proceed beyond the collected data into the area of decision making through generalisations and predictions.

Statistics was developed to assist in those areas where laws of cause and effect are not apparent to the observer and where an objective approach is needed.

Historically, many environmental studies were qualitative rather than quantitative. However, in recent years the need to develop and use quantitative mathematical analysis has become apparent to environmental researchers and policy makers.

What is Statistics…

• Science of gathering, analyzing, interpreting, and presenting data

• Facts and figures

• Measurements taken on a sample

• The technique of drawing inferences about the population from the collected samples

Statistics is the science of understanding the “order” behind a “disordered array of numbers”. It provides a tool to understand the “process of generation” of “groups of numbers” in an efficient and objective way.

Concept of a random variable

A random process: When the outcome of a phenomenon, process or experiment depends on several causative variables, some of which may not be known to the analyst, the process is known as a random process.

Important note: If all of the causative variables were known and the cause-effect relationships were well understood, the process would be deterministic. In a deterministic process, the outcome is known to the analyst with certainty.

The outcomes of a random process, when represented in terms of numbers, will thus be variable.

A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon.

There are two types of random variables: discrete and continuous.

A discrete random variable may assume either a finite number of values or an infinite sequence of values.

Examples

Discrete random variable with a finite number of values

Let x = number of TV sets sold at the store in one day, where x can take on 5 values (0, 1, 2, 3, 4).

Discrete random variable with an infinite sequence of values

Let x = number of customers arriving in one day, where x can take on the values 0, 1, 2, . . .

We can count the customers arriving, but there is no finite upper limit on the number that might arrive.

A continuous random variable may assume any numerical value in an interval or collection of intervals.

Examples include height, weight, the amount of sugar in an orange, and the time required to run a mile.

Key Definitions

• A population (universe) is the collection of things under consideration

• A sample is a portion of the population selected for analysis

• A parameter is a summary measure computed to describe a characteristic of the population

• A statistic is a summary measure computed to describe a characteristic of the sample

Population and Sample

• Population: use parameters to summarise features

• Sample: use statistics to summarise features

• Inference on the population is drawn from the sample

Statistics

• Descriptive statistics

Used for summarising the information contained in an array of numbers (data set) so that interpretations can be made about the underlying data generation process.

• Inferential statistics

Used for drawing conclusions about the population using a representative sample.

Descriptive statistics

• Encompasses the following:
– Graphical or pictorial display
– Condensation of large masses of data into a form such as tables
– Preparation of summary measures to give a concise description of complex information (e.g. an average figure)
– Exhibition of patterns that may be found in sets of information

Descriptive Statistics

• Collect data
– e.g. Survey

• Present data
– e.g. Tables and graphs

• Characterize data
– e.g. Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

Descriptive statistics

Methods concerned with collecting and describing a set of data so as to yield meaningful information.

• Numerical descriptors (numerical summaries of data), i.e. measures of:
– Central tendency
– Variation (dispersion)
– Position
– Shape

• Pictorial/graphical descriptors
– Graphs for a single variable
– Graphs for two or more variables
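To make the four families of numerical descriptors concrete, the sketch below computes one or two measures of each for the nitrate QC results given earlier in these notes. The skewness line uses a simple moment-based estimate, one of several conventions in use.

```python
from statistics import mean, median, stdev, quantiles

# Nitrate QC results (mg/L) from the laboratory bias example
data = [8.8, 9.9, 10.7, 11.7, 10.1, 8.6, 11.4, 12.1,
        10.7, 6.9, 7.4, 7.2, 12.7, 8.5, 10.3]

n = len(data)
m, s = mean(data), stdev(data)
q1, q2, q3 = quantiles(data, n=4)                      # quartiles, default 'exclusive' method
skew = sum((v - m) ** 3 for v in data) / n / s**3      # simple moment-based skewness

print(f"central tendency: mean = {m:.2f}, median = {median(data):.2f}")
print(f"variation: range = {max(data) - min(data):.1f}, s = {s:.3f}")
print(f"position: Q1 = {q1:.2f}, Q3 = {q3:.2f}")
print(f"shape: skewness = {skew:.3f}")
```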

Inferential statistics

Methods concerned with the analysis of a subset of data (sample) leading to predictions or inferences about the entire set of data (population).

• Theory of estimation

• Theory of hypothesis testing

Inferential Statistics

• Estimation
– e.g. Estimate the population mean weight using the sample mean weight

• Hypothesis testing
– e.g. Test the claim that the population mean weight is 120 pounds

Drawing conclusions and/or making decisions concerning a population based on sample results.

Inferential Statistics (contd.)

• Especially relates to:

– Determining whether characteristics of a situation are unusual or if they have happened by chance

– Estimating values of numerical quantities and determining the reliability of those estimates

– Using past occurrences to attempt to predict the future

Data and Data Sets

• Data are the facts and figures that are collected, summarized, analysed, and interpreted.

• The data collected in a particular study are referred to as the data set.

Qualitative and Quantitative Data

• Data can be further classified as being qualitative or quantitative.

• The statistical analysis that is appropriate depends on whether the data for the variable are qualitative or quantitative.

• In general, there are more alternatives for statistical analysis when the data are quantitative.

Qualitative Data

• Qualitative data are labels or names used to identify an attribute of each element.

• Qualitative data use either the nominal or ordinal scale of measurement.

• Qualitative data can be either numeric or nonnumeric.

• Statistical analyses for qualitative data are rather limited.

Quantitative Data

• Quantitative data indicate either how many or how much.
– Quantitative data that measure how many are discrete.
– Quantitative data that measure how much are continuous because there is no separation between the possible values for the data.

• Quantitative data are always numeric.

• Ordinary arithmetic operations are meaningful only with quantitative data.

Measurement

To bring objectivity into our decision making when solving a problem, we need to “measure” the “process”/“phenomenon” and the “objects”/“observations” within it.

Measurement is the process of assigning numbers to objects or observations.

Each level of measurement is associated with a particular degree of precision.

The level depends on the “rules” under which the numbers are assigned.

Measurement (contd.)

• Measurement is important not only in data analysis but also in the selection of the appropriate statistical/mathematical treatment to which the data can be subjected to extract meaningful information.

• Generally speaking, the differences between the data types affect the choice of statistical technique to be used to analyse the data.

Data types and measurement scales

Data

• Non-metric or qualitative
– Nominal scale
– Ordinal scale

• Metric or quantitative
– Interval scale
– Ratio scale

Measurement scales

• Nominal scale

• Ordinal scale

• Interval scale

• Ratio scale

Nominal scale

• Numbers or symbols are used to identify groups or classes to which various objects belong.

• Provides a convenient way of keeping track of people, objects and events.

• Used to separate samples for analysis.
– e.g. mean pollutant levels from different types of vehicles can be compared to see whether they differ in the amount of lead they emit.

• Can also be used for frequency analysis.
– e.g. examining the number of times flow exceeds the danger level at a river gauging station.

• Frequencies can only assume discrete (integer) values.

Nominal scale (contd.)

• Arithmetic can be performed on the frequencies but not on the group identification.

• The categories are mutually exclusive.

• Although numbers can be used to code each category, these are pure labels and have no numerical value.

• Therefore, no mathematical operators can be used to extract meaning out of the data.

Nominal scale (contd.)

• Least powerful level of measurement

• Indicates no order or distance relationship and has no arithmetic origin.

• Statistical tools that can be employed on nominal data:
– Mode as a measure of central tendency
– Chi-square test
– Contingency coefficient as a measure of correlation

Ordinal scale

• The essential feature is that the relative order of the objects or classes can be identified but not quantified.

• Represents the next higher level of measurement precision.

• Variables can be ordered or ranked on ordinal scales in relation to the amount of the attribute possessed.
– e.g. strength of opinion regarding a particular topic
– e.g. complexity of environment

• Every subclass can be compared with another in terms of a “>” or “<” relationship.

• The numbers used are non-quantitative, since they indicate only relative positions in an ordered series.

Ordinal scale (contd.)

• Ordinal data are on a scale with a defined direction.
– e.g. one point can be described as larger than another

• The scale places events in order, but there is no attempt to make the intervals of the scale equal in terms of some rule.

• Statistical tools that can be employed on ordinal data:
– Median as a measure of central tendency
– Percentiles or quartiles as measures of dispersion
– Non-parametric tests

Interval scale

• In addition to inequalities, the interval sizes (differences) between groups are measurable.

• Valid mathematical operators: “<” and “>”, “+” and “-”

• Multiplication and division are invalid, because the scale has an arbitrary zero, i.e. the 0 on an interval scale does not indicate the complete absence of the quantity being measured.

• The scale is characterised by a “unit of measurement” that assigns a real number to the relationships (distances) between all pairs of objects or groups.

• All statistical tools/computations are applicable except those involving ratios, such as the coefficient of variation, geometric mean and harmonic mean.

Ratio scale

• Measurements with all the characteristics of an interval scale plus a physically definable zero point (absolute zero). The scale must contain a zero value indicating that nothing exists for the variable at the zero point.

• That is, in addition to setting up inequalities and forming differences, we can also form quotients.

• Valid mathematical operators: all customary operators (“<” & “>”, “+” & “-”, “×” & “÷”)

• Most precise of all scales.

• Examples: length, mass, weight, flow, temperature measured on the Kelvin scale, etc.
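A quick numerical check of why ratios are meaningful on a ratio scale but not on an interval scale, using temperature: a difference survives the change from Celsius (arbitrary zero) to Kelvin (absolute zero), but a ratio does not.

```python
def celsius_to_kelvin(t_c):
    """Shift the arbitrary Celsius zero to absolute zero."""
    return t_c + 273.15

t1_c, t2_c = 10.0, 20.0

ratio_celsius = t2_c / t1_c                                         # 2.0, but "twice as hot" is not meaningful
ratio_kelvin = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)    # about 1.035 on the ratio scale
difference_c = t2_c - t1_c                                          # 10 degrees Celsius
difference_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)    # also 10 kelvin

print(f"Celsius ratio = {ratio_celsius:.3f}, Kelvin ratio = {ratio_kelvin:.3f}")
print(f"difference: {difference_c:.1f} degC = {difference_k:.1f} K")
```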

Measurement scales – Final remark

• Proceeding from the nominal scale (the least precise type of scale) to the ratio scale (the most precise), progressively more relevant information is obtained.

• If the nature of the variables permits, the researcher should use the scale that provides the most precise description.

• Generally, measurements in the physical sciences are on ratio scales, whereas in management and the behavioural sciences measurements are restricted to interval scales.

• Finally, the interval scale is the first quantitative measurement scale. The nominal scale names and counts objects or their attributes, and the ordinal scale arranges objects in order.

Types of data

• Primary data

Refers to information obtained firsthand on the variables of interest for the specific purpose of the study.

• Secondary data

Refers to information gathered from sources already existing.

Data Sources

• Primary: data collection
– Observation
– Experimentation
– Survey

• Secondary: data compilation
– Print or electronic