research design

Research Design

Meaning of Research Design• Task of defining the research problem is the preparation of the research

project, popularly known as the “research design". • Decisions regarding what, where, when, how much, by what means

concerning an inquiry or a research study constitute a research design.• A plan or strategy for conducting the research.• A research design is one that minimizes bias and maximizes the reliability

of the data. • It also yields maximum information, gives minimum experimental error,

and provides different aspects of a single problem. • A research design depends on the purpose and nature of the research

problem. Thus, one single design cannot be used to solve all types of research problem, i.e., a particular design is suitable for a particular problem.

Features of a good research designThe means of obtaining information The availability and skills of the researcher and his staff, if

any;The availability of time and money for the research work. It should be flexible enough to consider different aspects

of the study in case of exploratory.The design should be accurate with minimum bias in

case of accurate descriptionControl of extraneous variablesStatistical correctness for testing hypothesis

Research design have following parts

• Sampling design • Observational design• Statistical design• Operational design

..cont..Sampling Design

• Which deals with the methods of selecting items to be observed for the study.

Observational design

• Which relates to the condition under which the observation are to be create.

Statistical design

• Which concern the question of the of How the information and data gathered are to be analyzed ?

Operational design • Which deals with techniques by which the procedures satisfied in

sampling .

Different Research Designs

1) Exploratory research design2) Descriptive research design3) Diagnostic research design4) Hypothesis-testing research design

Exploratory research design

• Termed as formulative research because its main purpose is formulating a problem for more precise investigation.

• It is a type of research conducted for a problem, but the problem itself has not been clearly understood.

• In other words, exploratory research is a process of gathering facts and doing research that later allows for the team to create the best research design or data collection method available for specific subjects.

• This process will draw definitive conclusions only with caution due to the nature of the process. In many cases, this process leads to the understanding that no problem actually exists.

Exploratory research is guided by a set of hypotheses1) Operational definition2) Statistical testing

Descriptive research design• Descriptive research design is a type of research

method that is used when one wants to get information on the current status of a person or an object.

• It is used to describe what is in existence in respect to conditions or variables that are found in a given situation.

• Concerned with describing the characteristics of a particular individual or group .

Diagnostic research design

• Diagnostic research studies determine the frequency of occurrence of something or its association with something else. From the research design point of view, the design of such studies should be rigid and should focus on the following:

1) Objective of the study (what the study is about and why is it being made?)2) Methods of data collection (what techniques of data collection will be

adopted?)3) Sample selection (how much material will be needed?)4) Data collection (where the required data can be found and with what time

frequency should the data be related?)5) Data processing and analysis6) Reporting the findings

Hypothesis-testing research designHypothesis-testing research studies, also known as experimental studies, an experiment is a

study in which a treatment, procedure, or program is intentionally introduced and a result or outcome is observed. The American Heritage Dictionary of the English Language defines an experiment as “A test under controlled conditions that is made to demonstrate a known truth, to examine the validity of a hypothesis, or to determine the efficacy of something previously untried.”

True experiments have four elements:1) manipulation2) control 3) random assignment and 4) random selection.

The most important of these elements are manipulation and control. Manipulation means that something is purposefully changed by the researcher in the environment. Control is used to prevent outside factors from influencing the study outcome

Important Concepts relating to research design

• Dependent and independent variables• Extraneous variable• Control• Experimental and non-experimental

hypothesis- testing research• Experimental and control groups• Treatments

Variables

• Any characteristic which is subject to change and can have more than one value such as age, intelligence, motivation, gender, etc.

• Types of variables• Independent vs. Dependent vs. Controlled Variables

• Categorical vs. Continuous Variables

• Quantitative vs. Qualitative Variables

…cont..Dependent Variable• Variable affected by the independent variable• It responds to the independent variable.• In an experiment that which is supposed to be changed by the

independent.Independent Variable• Variable that is presumed to influence other variable• It is the presumed cause, whereas the dependent variable is the

presumed effect.• In an experiment that which is supposed to be manipulated by you. • The variable manipulated by the experimenter.

Examples (1)You are interested in “How stress affects

mental state of human beings?” Independent variable ----- Stress Dependent variable ---- mental state of human beingsYou can directly manipulate stress levels in your human subjects and measure how those stress levels change mental state.

(2) Promotion affects employees’ motivation Independent variable ----- Promotion Dependent variable ----Employees motivation

Other Names for Dependent and Independent Variables

• Dependent Variable• Explained• Predictand• Regressand• Response• Outcome• Controlled• Independent Variable• Explanatory• Predictor• Regressor• Stimulus• Covariate• Control

Terms related to Research DesignExtraneous variable• Independent variable that are not related to the purpose of the study, but

may affect the dependent variable are termed as extraneous variables.• Whatever effect is notices on dependent variable as a result of extraneous

variable is technically described as an ‘experimental error’.Control• One of the important characteristics of a good research design is to

minimize the influence or effect of extraneous variable. Experimental and non experimental hypothesis testing• Research in which the independent variable is manipulated is termed

experimental hypothesis –testing research.• Research in which an independent variable is not manipulated is called

non-experimental hypothesis.

..cont..Example of experimental and non-experimental• Suppose a researcher wants to study whether intelligence affects reading ability

for a group of students and for this purpose he randomly selects 50 students and tests their intelligence and reading ability by calculating the coefficient of correlation between the two sets of scores . This is an example of non-experimental hypothesis –testing research because herein the independent variable (intelligence)is not manipulated.

• But now suppose that our researcher randomly selects 50 students from a group of students who are to take a course in statistics and then divides them into two groups by randomly assigning 25 to Group A , the usual studies programme, and 25 to group B , the special studies programme. At the end of the course , he administers a test to each group in order to judge the effectiveness of the training programme on the student’s performance level. This is an example of experimental hypothesis testing.

..cont..Experimental and control group• In an experimental hypothesis –testing research when a group

is exposed to usual conditions, it is termed as control group. • When a group is exposed to some novel or special condition it

is termed as experimental group.Treatments• The different conditions under which experimental and

control groups are put are usually referred to as treatments.

Categorical vs. Continuous Variables• Categorical variables are variables that can take on specific values only within a defined range of values

like gender, marital status• consisting of discrete, mutually exclusive categories, such as “male/female,” “White/Black,” etc

• Continuous variables are variables that can theoretically take on any value along a continuum like age, income weight, height etc..

• When compared with categorical variables, continuous variables can be measured with a greater degree of precision.

• The choice of which statistical tests will be used to analyze the data is partially dependent on whether the researcher uses categorical or continuous variables.

• Certain statistical tests are appropriate for categorical variables, while other statistical tests are appropriate for continuous variables.

• As with many decisions in the research-planning process, the choice of which type of variable to use is partially dependent on the question that the researcher is attempting to answer.

Quantitative vs. Qualitative Variables

• Qualitative variables are variables that vary in kind, like “attractive” or “not attractive,” “helpful” or “not helpful,” or “consistent” or “not consistent”

• Quantitative variables are those that vary in amount like height, weight, salary etc

Hypothesis

• The research hypothesis is a predictive statement that relates an independent variable to dependent variable.

• A hypothesis may be defined as a proposition or set of proposition set forth as an explanation for occurrence of some specified group of phenomena either asserted merely as a provisional conjucture to guide some investigation or accepted as highly probable in the light of established facts

Purpose• Guides/gives direction to the study/investigation

• Defines Facts that are relevant and not relevant

• Suggests which form of research design is likely to be the most appropriate

• Provides a framework for organizing the conclusions of the findings

• Limits the research to specific area

• Offers explanations for the relationships between those variables that can be empirically tested

• Furnishes proof that the researcher has sufficient background knowledge to enable her/him to make suggestions in order to extend existing knowledge

• Structures the next phase in the investigation and therefore furnishes continuity to the examination of the problem

Experimental and non-experimental hypothesis testing

• When a group is exposed to usual conditions, it is termed as a control group.

• But when the group is exposed to be some special condition, it is termed as Experimental group

A Hypothesis• must make a prediction• must identify at least two variables• should have an elucidating power• should strive to furnish an acceptable explanation or

accounting of a fact• must be falsifiable meaning hypotheses must be capable of

being refuted based on the results of the study• must be formulated in simple, understandable terms• should correspond with existing knowledge• In general, a hypothesis needs to be unambiguous, specific,

quantifiable, testable and generalizable.

Categorizing Hypotheses

Can be categorized in different ways

1. Based on their formulation • Null Hypotheses and Alternate Hypotheses

2. Based on direction• Directional and Non-directional Hypothesis

3. Based on their derivation• Inductive and Deductive Hypotheses

..cont

1. Null Hypotheses and Alternate Hypotheses• Null hypothesis always predicts that – no differences between the groups being studied (e.g.,

experimental vs. control group) or – no relationship between the variables being studied

• By contrast, the alternate hypothesis always predicts that there will be a difference between the groups being studied (or a relationship between the variables being studied)

Example of Null and alternative hypothesis

• If we are to compare method A with method B about its superiority and if we proceed on the assumption that both methods are equally good, then this assumption is termed as the null hypothesis.

• As against this, we may think that the method A is superior or the method B is inferior , we are then stating what is termed as alternative hypothesis.

..cont..

• Alternate Hypothesis can further be classified as

2. Directional Hypothesis and Non-directional Hypothesis

..cont..2. Directional Hypothesis and Non-directional Hypothesis

• Simply based on the wording of the hypotheses we can tell the difference between directional and non-directional

– If the hypothesis simply predicts that there will be a difference between the two groups, then it is a non-directional hypothesis. It is non-directional because it predicts that there will be a difference but does not specify how the groups will differ.

– If, however, the hypothesis uses so-called comparison terms, such as “greater,”“less,”“better,” or “worse,” then it is a directional hypothesis. It is directional because it predicts that there will be a difference between the two groups and it specifies how the two groups will differ

The level of significance• This is very important concept in the context of hypothesis testing.• It is usually 5% which should be chosen with great care, thought and reason. In

case we take the significance level at 5% , then this implies that H0 will be rejected when the sampling result has a less that .05 probability of occurring if H0 is true.

• In other words , the 5% level of significance means that researcher is willing to take as much as 5% risk of rejecting the null hypothesis when H0 happens to be true. Thus the significance level is the maximum value of the probability of rejecting H0 when it is true and is usually determined in advance before testing the hypothesis.

Type 1 and type 2 errors• We may reject H0 when H0 is true , is type 1 error and is denoted by Alpha.• We may accept H0 when in fact H0 is not true, is type 2 error and is denoted

by beta.• The probability of type 1 error is usually determined in advance and is

understood as the level of significance of testing the hypothesis. • If type 1 error is fixed at 5% , it means that there are about 5 chances in 100

that we will reject H0 when H0 is true. We can control type 1 error just by fixing it at a lower level.

• With a fixed sample size , n, when we try to reduce Type 1 error , the probability of committing type 2 error increases. Both types of errors can not be reduced simultaneously.

• To deal with this trade off in business situations , decision makers decide the appropriate level of Type 1 by examining the cost or penalties attached to both types of errors.

Two-tailed and one-tailed tests• A two-tailed test, also known as a non directional

hypothesis, is the standard test of significance to determine if there is a relationship between variables in either direction. Two-tailed tests do this by dividing the .05 in two and putting half on each side of the bell curve.

• A two-tailed test rejects the null-hypothesis if sample mean is significantly higher or lower than the hypothesised value of the mean of the population.

• Such test is appropriate when the null hypothesis is some specified value and alternative hypothesis is a value not equal to the specified value of the null hypothesis

..cont..• Let's say I do a simple test called a t-test, which compares two averages.

I have the average stress levels from last year compared to the average stress levels of this year.

• After some probability calculations, I learn that there is no significant difference between last year's and this year's stress levels. This tells me that this year's stress levels are neither higher nor lower than last years.

• What if I jump in my time machine again and go back 15 years. The average age of my subjects is currently 26, so I will talk to them when they are about 11. I collect their stress levels and then jump back to the present and do another t-test, and I find out that their stress levels are lower now than when they were younger. The beauty of the two-tailed test is that when you run your numbers, the math will tell you if it's significantly higher or lower.

One-Tailed Test

• A one-tailed test, also known as a directional hypothesis, is a test of significance to determine if there is a relationship between the variables in one direction. A one-tailed test is useful if you have a good idea, usually based on your knowledge of the subject, that there is going to be a directional difference between the variables.

• It is used when we are to test whether the population mean is either lower than or higher than some hypothesised value.

Example for one tailed and two tailed tests

• Because the one-tailed test provides more power to detect an effect, you may be tempted to use a one-tailed test whenever you have a hypothesis about the direction of an effect.

• Before doing so, consider the consequences of missing an effect in the other direction. • Imagine you have developed a new drug that you believe is an improvement over an

existing drug. You wish to maximize your ability to detect the improvement, so you opt for a one-tailed test. In doing so, you fail to test for the possibility that the new drug is less effective than the existing drug. The consequences in this example are extreme, but they illustrate a danger of inappropriate use of a one-tailed test.

• So when is a one-tailed test appropriate? If you consider the consequences of missing an effect in the untested direction and conclude that they are negligible and in no way irresponsible or unethical, then you can proceed with a one-tailed test. For example, imagine again that you have developed a new drug. It is cheaper than the existing drug and, you believe, no less effective. In testing this drug, you are only interested in testing if it less effective than the existing drug. You do not care if it is significantly more effective. You only wish to show that it is not less effective. In this scenario, a one-tailed test would be appropriate.

Forming/Developing a Hypothesis

• Articulating the hypotheses that will be tested is one of the steps in the planning phase of a research study

• A hypothesis is formulated after – the problem has been stated and – the literature study has been conducted

• It is formulated when the researcher is totally aware of the theoretical and empirical background to the problem

The Initial Idea

• The initial idea is the starting point– Often vague or general, it requires refining before research

hypotheses can be generated

• Refinement of the initial idea is based on (1) a search of relevant research literature

(2) initial observations of the phenomenon

• Narrow and formalize the initial idea into a statement of the problem

Statement of the Problem

• In the form of a question that clearly indicates an expected relationship

– The nature of the question will dictate the required level of constraint of a study• Causal questions will require experimental research• Questions about relationships can be answered with lower

constraint research

• Convert into research hypothesis by operationally defining the variables

Hypothesis Testing

• All hypothesis tests are conducted the same way. • The researcher

1. states a hypothesis to be tested,

2. formulates an analysis plan,

3. analyzes sample data according to the plan, and

4. accepts or rejects the null hypothesis, based on results of the analysis.

Hypothesis Testing (Cont..)1. State the hypotheses.

Every hypothesis test requires the analyst to state a null and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.

2. Formulate an analysis plan. The analysis plan describes how to use sample data to accept or reject the null

hypothesis. It should specify the following elements. – Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0

and 1 can be used.– Test method. Typically, the test method involves a test statistic and a sampling distribution. Computed from

sample data, the test statistic might be a mean score, proportion, difference between means, difference between proportions, z-score, t-score, chi-square, etc. Given a test statistic and its sampling distribution, a researcher can assess probabilities associated with the test statistic. If the test statistic probability is less than the significance level, the null hypothesis is rejected.

http://stattrek.com/Help/Glossary.aspx?Target=Null%20hypothesis

http://stattrek.com/Help/Glossary.aspx?Target=Alternative%20hypothesis

http://stattrek.com/Help/Glossary.aspx?Target=Significance%20level

http://stattrek.com/Help/Glossary.aspx?Target=Sampling%20distribution

Hypothesis Testing (Cont..)• Analyze sample data. Using sample data perform computations called for in the analysis plan.

Test statistic. When the null hypothesis involves a mean or proportion, use either of the following equations

to compute the test statistic.• Test statistic = (Statistic - Parameter) / (Standard deviation of statistic)

Test statistic = (Statistic - Parameter) / (Standard error of statistic)• where Parameter is the value appearing in the null hypothesis, and Statistic is the

point estimate of Parameter. As part of the analysis, you may need to compute the standard deviation or standard error of the statistic. Previously, we presented common formulas for the standard deviation and standard error.

When the parameter in the null hypothesis involves categorical data, you may use a chi-square statistic as the test statistic. Instructions for computing a chi-square test statistic are presented in the lesson on the chi-square goodness of fit test.– P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic,

assuming the null hypothesis is true.

http://stattrek.com/Help/Glossary.aspx?Target=Point%20estimate

http://stattrek.com/AP-Statistics-4/Standard-Error.aspx

http://stattrek.com/AP-Statistics-4/Goodness-Of-Fit.aspx

Flow Diagram for Hypothesis testing

Tests Of Hypotheses• Hypothesis testing determines the validity of the assumption with

a view to choose between two conflicting hypotheses about the value of a population parameter.

• Hypothesis testing helps to decide on the basis of a sample data , whether a hypothesis about the population is likely to be true or false.

• Statisticians have developed several tests of hypotheses (also known as tests of significance) for the purpose of testing of hypothesis which can be classified as

a. Parametric tests or standard tests of hypotheses b. Non-Parametric tests or distribution –free tests of

hypothesis.

Tests Of significance

Parametric tests

• If the information about the population is completely known by means of its parameters then statistical test is called parametric test

• Eg: t- test, f-test, z-test, ANOVA

Nonparametric test

• If there is no knowledge about the population or parameters, but still it is required to test the hypothesis of the population. Then it is called non-parametric test

• Sample distribution is unknown.• When the population distribution is abnormal i.e.

too many variables involved.

• Eg: mann-Whitney, rank sum test, Kruskal-Wallis test

scale of measurement

Nominal • define an attribute• e.g. gender, marital status

Ordinal • rank or order the observations as

scores or categories from low to high in terms of «more or less»

• e.g. education, attitude/opinion scales

Interval • interval between observations in

terms of fixed unit of measurement

• e.g. measures of temperature

Ratio •The scale has a fundamental zero point•e.g. age, income

Nonparametric

*Parametric

1. Nominal data synonymous with categorical data, assigned names/ categories based on

characters with out ranking between categories. ex. male/female, yes/no, death /survival.

2. Ordinal data ordered or graded data, expressed as Scores or ranks. ex. pain graded as

mild, moderate and severe

3. Interval data an equal and definite interval between two measurements it can be

continuous or discrete .ex. weight expressed as 20, 21,22,23,24 here,interval between 20 &

21 is same as 23 &24

In addition to scale of measurement, we should look at the population distribution.

• Population is normally distributed-Parametric (may be used)

• Not normally distributed population or no assumption can be made about the population distribution -Nonparametric

(have to be used)

SAMPLE SIZE:

• Large Sample : sample of size is more than 30• Small Sample: sample of size less than or equal to 30• Many statistical test are based upon the assumption

that the data are sampled from a Gaussian distribution.

• Procedures for testing hypotheses about parameters in a population described by a specified distributional form, (normal distribution) are called parametric tests.

Types of Parametric tests

1. Large sample tests Z-test

2. Small sample tests t-test* Independent/ unpaired t-test

* Paired t-test ANOVA (Analysis of variance) * One way ANOVA * Two way ANOVA

Z- Test:• A z-test is used for testing the mean of a population versus a

standard, or comparing the means of two populations, with large (n ≥ 30) samples whether you know the population standard deviation or not.

• It is also used for testing the proportion of some characteristic versus a standard proportion, or comparing the proportions of two populations.Ex. Comparing the average engineering salaries of men versus women.Ex. Comparing the fraction defectives from two production lines.

T- test:• Properties of t distribution:i. It has mean 0ii. It has variance greater than oneiii. It is bell shaped symmetrical distribution about mean• Assumption for t test:i. Sample must be random, observations independentii. Standard deviation is not knowniii. Normal distribution of population

Uses of t test:

iv. The mean of the sample

v. The difference between means or to compare two samples

vi. Correlation coefficient

Types of t test:

a. Paired t test

b. Unpaired t test

Paired t test:

• Consists of a sample of matched pairs of similar units,

or one group of units that has been tested twice (a

"repeated measures" t-test).

• Ex. where subjects are tested prior to a treatment, say

for high blood pressure, and the same subjects are

tested again after treatment with a blood-pressure

lowering medication.

http://en.wikipedia.org/wiki/Unit_(statistics)

Unpaired t test:

• When two separate sets of

independent and identically distributed samples are obtained,

one from each of the two populations being compared.

• Ex: 1. compare the height of girls and boys.

2. compare 2 stress reduction interventions

when one group practiced mindfulness meditation while the

other learned progressive muscle relaxation.

http://en.wikipedia.org/wiki/Independent_and_identically-distributed_random_variables

ANALYSIS OF VARIANCE(ANOVA):

• Analysis of variance (ANOVA) is a collection of

statistical models used to analyze the differences between group

means and their associated procedures (such as "variation" among

and between groups),

• Compares multiple groups at one time

• Developed by R.A. Fisher.

• Two types: i. One way ANOVA

ii. Two way ANOVA

http://en.wikipedia.org/wiki/Statistical_model

http://en.wikipedia.org/wiki/Ronald_Fisher

One Way ANOVA:

It compares three or more unmatched groups when data are categorized in one way

Ex.

1. Compare control group with three different doses of aspirin in rats

2. Effect of supplementation of vit C in each subject before , during and after the treatment.

Two way ANOVA:

• Used to determine the effect of two nominal

predictor variables on a continuous outcome

variable.

• A two-way ANOVA test analyzes the effect of the

independent variables on the expected outcome

along with their relationship to the outcome itself.

Difference between one & two way ANOVA

• An example of when a one-way ANOVA could be used is if we want to determine if there is a difference in the mean height of stalks of three different types of seeds. Since there is more than one mean, we can use a one-way ANOVA since there is only one factor that could be making the heights different.

• Now, if we take these three different types of seeds, and then add the possibility that

three different types of fertilizer is used, then we would want to use a two-way ANOVA.

• The mean height of the stalks could be different for a combination of several reasons:

• The types of seed could cause the change,

the types of fertilizer could cause the change, and/or there is an interaction between

the type of seed and the type of fertilizer.

• There are two factors here (type of seed and type of fertilizer), so, if the assumptions

hold, then we can use a two-way ANOVA.

Parametric tests applied for different type of data

Types of Non-parametric test

1. One sample test• Chi-square test• One sample sign test

2. Two samples test• Median test• Two samples sign test

3. K-samples test• Median tets• Kruskal Wallis test

Types of Non-parametric test• Chi-square test (χ2):

– Used to compare between observed and expected data.1. Test of goodness of fit2. Test of independence3. Test of homogeneity

• Kruskal-Wallis test-– for testing whether samples originate from the same distribution.– used for comparing more than two samples that are independent, or

not related– Alternative to ANOVA.

• Wilcoxon signed-rank-– used when comparing two related samples or repeated measurements

on a single sample to assess whether their population mean ranks differ.

• Median test-– Use to test the null hypothesis that the medians of the populations

from which two samples are drawn are identical.– The data in sample is assigned to two groups, one consisting of data

whose values are higher than the median value in the two groups combined, and the other consisting of data whose values are at the median or below

• Sign test:– can be used to test the hypothesis that there is "no difference in

medians" between the continuous distributions of two random variables X and Y,

• Fisher's exact test:– test used in the analysis of contingency where sample sizes are small

Difference between parametric and non-parametric tests

References

1. Dr J V Dixit’s Principles and practice of biostatistics 5th edition.

2. Rao & Murthy’s applied statistics in health sciences 2nd edition.

3. Sarmukaddam’s fundamentals of biostatistics 1st edition.

4. Internet sources…….

research design

Data & Analytics