Correlation Final

CORRELATION ANALYSIS

MBA “A”

Newish, Jashan, Jotdeep Singh, Yogesh

Introduction

Correlation is a linear association between two random variables.

Correlation analysis shows us how to determine both the nature and the strength of the relationship between two variables.

Correlation can also be applied when the variables depend on time.

The correlation coefficient lies between −1 and +1.

A zero correlation indicates that there is no linear relationship between the variables.

A correlation of –1 indicates a perfect negative correlation

A correlation of +1 indicates a perfect positive correlation

Types of Correlation

Correlation can be classified in three ways.

Type 1: Positive, Negative, No and Perfect correlation

Positive: if two related variables are such that when one increases (decreases), the other also increases (decreases).

Negative: if two variables are such that when one increases (decreases), the other decreases (increases).

No correlation: if the two variables are independent.

Perfect: all points lie exactly on a straight line (r = +1 or r = −1).

Type 2: Linear and Non-linear correlation

Linear: when plotted on a graph, the points tend to lie along a straight line.

Non-linear: when plotted on a graph, the points do not follow a straight line.

Type 3: Simple, Multiple and Partial correlation

Simple: two variables, one independent and one dependent.

Multiple: one dependent variable and more than one independent variable.

Partial: one dependent variable and more than one independent variable, but only one independent variable is considered while the other independent variables are held constant.

Methods of Studying Correlation

Scatter Diagram Method

Karl Pearson’s Coefficient of Correlation Method

Spearman’s Rank Correlation Method

[Figure: scatter diagrams of Symptom Index against Drug A (dose in mg), a very good fit, and against Drug B (dose in mg), a moderate fit]

Correlation: Linear Relationships

Strong relationship = good linear fit

Points clustered closely around a line show a strong correlation, and the line is a good predictor of the data (a good fit). The more spread out the points, the weaker the correlation and the poorer the fit. The line is a REGRESSION line (Y = bX + a).

Coefficient of Correlation

A measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations.

Represented by “r”

r lies between −1 and +1

Magnitude and Direction

−1 ≤ r ≤ +1

The + and – signs are used for positive linear correlations and negative linear correlations, respectively

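r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}

The numerator captures the shared variability of X and Y; the denominator captures the individual variability of X and of Y.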
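A minimal Python sketch of the computational formula above, for illustration only; the function name `pearson_r` and the sample values are assumptions, not taken from the slides. It evaluates the numerator (shared variability) and the denominator (individual variability) exactly as written.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r via the raw-sum (computational) formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 values)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    # Numerator: shared variability of X and Y
    numerator = n * sum_xy - sum_x * sum_y
    # Denominator: individual variability of X and of Y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when X or Y has no variability")
    return numerator / denominator

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
```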

Interpreting Correlation Coefficient r

strong correlation: r > .70 or r < −.70

moderate correlation: r is between .30 and .70, or between −.30 and −.70

weak correlation: r is between 0 and .30, or between 0 and −.30
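The rule-of-thumb bands above can be encoded as a small helper. This is a sketch: `describe_strength` is a hypothetical name, and how values exactly at 0.30 or 0.70 are classified is an assumption, since the slide leaves the boundaries open.

```python
def describe_strength(r):
    """Label r using the slide's bands (cut-offs at 0.30 and 0.70).

    Assumption: |r| of exactly 0.70 or exactly 0.30 is treated as moderate;
    the slide does not say which band the cut-off values belong to.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear correlation"
    size = abs(r)
    if size > 0.70:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

for value in (0.922, -0.45, 0.1):
    print(value, "->", describe_strength(value))
```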

Coefficient of Determination

The coefficient of determination lies between 0 and 1.

Represented by r²

The coefficient of determination is a measure of how well the regression line represents the data.

If the regression line passes exactly through every point on the scatter plot, it would be able to explain all of the variation.

The further the line is from the points, the less it is able to explain.

r² is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable.

It is a measure that allows us to determine how certain one can be in making predictions from a certain model/graph.

The coefficient of determination is the ratio of the explained variation to the total variation.

The coefficient of determination is such that 0 ≤ r² ≤ 1, and denotes the strength of the linear association between x and y.

The coefficient of determination represents the proportion of the total variation in y that is explained by the line of best fit.

For example, if r = 0.922, then r² = 0.850.

This means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation).

The other 15% of the total variation in y remains unexplained
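To illustrate that r² is the ratio of explained to total variation, the sketch below fits a least-squares line and compares that ratio with the square of Pearson's r. The helper names are hypothetical; the data are the driving-experience and road-accident figures used in the regression example of this presentation.

```python
def least_squares_line(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def coefficient_of_determination(x, y):
    """Explained variation divided by total variation of y."""
    a, b = least_squares_line(x, y)
    mean_y = sum(y) / len(y)
    total = sum((v - mean_y) ** 2 for v in y)
    explained = sum((a + b * u - mean_y) ** 2 for u in x)
    return explained / total

def r_squared(x, y):
    """Square of Pearson's r, computed from deviation sums."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    return sxy ** 2 / (sxx * syy)

print(round(0.922 ** 2, 3))  # 0.85, matching r = 0.922 -> r^2 = 0.850

experience = [5, 2, 12, 9, 15, 6, 25, 16]
accidents = [64, 87, 50, 71, 44, 56, 42, 60]
print(coefficient_of_determination(experience, accidents))
print(r_squared(experience, accidents))
```

Both printed values for the road-accident data agree (up to floating-point rounding), since for a least-squares line the explained-to-total ratio equals r².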

Spearman’s Rank Correlation Coefficient

When the data are not available in numerical form, the method of rank correlation is used as an alternative. When the values of the two variables are converted to their ranks and the correlation is obtained from those ranks, the result is known as rank correlation.

Computation of Rank Correlation

Spearman’s rank correlation coefficient ρ can be calculated when:

Actual ranks are given

Ranks are not given but grades are given and are not repeated

Ranks are not given and grades are given and are repeated
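The slides do not show the formula itself. As a sketch, the standard textbook formula ρ = 1 − 6∑d²/(n(n² − 1)), where d is the difference between paired ranks, covers the case where ranks are given without ties. For repeated grades, one common approach (assumed here) is to give tied values the average of the ranks they occupy and then apply Pearson's formula to those ranks. All function names and data below are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman_no_ties(rank_x, rank_y):
    """rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only when no ranks are tied."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))

def spearman(x, y):
    """Rank both variables (averaging ties), then apply Pearson's r to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Actual ranks given, no ties
print(round(spearman_no_ties([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]), 4))  # 0.8
# Grades given with a repeated value: ties receive the average rank
print(average_ranks([70, 85, 85, 60]))  # [2.0, 3.5, 3.5, 1.0]
print(spearman([70, 85, 85, 60], [65, 90, 80, 55]))
```

Ranking from smallest or from largest does not change ρ, as long as both variables are ranked in the same direction.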

BUSINESS STATISTICS

PRESENTATION ON

REGRESSION ANALYSIS

OBJECTIVES OF THE PRESENTATION-

What is regression analysis

Types and methods of regression analysis

Practical aspect of regression analysis with an example

INTRODUCTION-

Regression analysis is a statistical tool employed for forecasting or making estimates.

Here we use various mathematical formulas and assumptions to describe a real-world situation.

In every situation, estimation becomes easy once it is known that the variable to be estimated is related to, and dependent on, some other variable.

To make estimates, we first have to model the relationship between the variables involved.

Models can broadly be classified into:

Linear regression-

Linear regression analysis is a powerful technique used for predicting the unknown value of a variable from the known value of another variable. More precisely, if X and Y are two related variables, then linear regression analysis helps us to predict the value of Y for a given value of X, or vice versa. For example, the age of a human being and maturity are related variables, so linear regression analysis can predict the level of maturity given the age of a human being.

Multiple regression-

Multiple regression analysis is a powerful technique used for predicting the unknown value of a variable from the known values of two or more variables, also called the predictors.

Multiple regression analysis helps us to predict the value of Y for given values of X1, X2, …, Xk.

For example, the yield of rice per acre depends upon the quality of seed, fertility of soil, fertilizer used, temperature and rainfall. If one is interested in studying the joint effect of all these variables on rice yield, one can use this technique.

Dependent and Independent Variables-

By linear regression, we mean models with just one independent and one dependent variable. The variable whose value is to be predicted is known as the dependent variable and the one whose known value is used for prediction is known as the independent variable.

By multiple regression, we mean models with just one dependent and two or more independent variables. The variable whose value is to be predicted is known as the dependent variable and the ones whose known values are used for prediction are known as independent variables.

Methods of solving regression models-

1) GRAPHICAL METHOD-

In the graphical method, the average relationship between the dependent variable and the independent variable is expressed by a line called the “line of best fit”.

Example:

Experience (in years)   Income (in ‘000)
15                      150
10                      120
5                       60
3                       40
8                       70
9                       90

[Figure: line of best fit, income plotted against experience]

2) ALGEBRAIC METHOD-

In this method we make use of the regression equation and regression coefficients.

Regression equation (linear).

The general equation is y = a + bx, where a is the intercept and b is the slope of the line.

Using the general equation above, we find the normal equations.

Summing the general equation over all N observations gives the first normal equation:

∑Y = N.a + b∑X

To find the second normal equation, we multiply the general equation by x and then sum over all observations:

∑XY = a∑X + b∑X²
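The two normal equations form a 2×2 linear system in a and b. A minimal sketch, assuming Cramer's rule as the solution method (the slides do not prescribe one), applied to the experience and income figures from the graphical-method example; `solve_normal_equations` is a hypothetical name.

```python
def solve_normal_equations(x, y):
    """Return (a, b) for y = a + b*x from the two normal equations
        sum(Y)  = N*a      + b*sum(X)
        sum(XY) = a*sum(X) + b*sum(X^2)
    solved by Cramer's rule.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(u * v for u, v in zip(x, y))
    sx2 = sum(u * u for u in x)
    det = n * sx2 - sx * sx  # determinant of [[N, sum X], [sum X, sum X^2]]
    if det == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    a = (sy * sx2 - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

# Experience (years) and income ('000) from the graphical-method example
experience = [15, 10, 5, 3, 8, 9]
income = [150, 120, 60, 40, 70, 90]
a, b = solve_normal_equations(experience, income)
print(f"income = {a:.3f} + {b:.3f} * experience")
```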

Regression equation (multiple).

A statistical technique used to explain or predict the behaviour of a dependent variable from two or more independent variables.

General equation: y = a + b1x1 + b2x2 + … + bnxn

Normal equations for multiple regression (shown for two independent variables) are:

\sum Y = Na + b_1\sum X_1 + b_2\sum X_2

\sum X_1 Y = a\sum X_1 + b_1\sum X_1^2 + b_2\sum X_1 X_2

\sum X_2 Y = a\sum X_2 + b_1\sum X_1 X_2 + b_2\sum X_2^2
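With two independent variables the three normal equations form a 3×3 system. The sketch below assembles the sums and solves the system by Gaussian elimination with partial pivoting, a standard technique chosen here for illustration; the function names and the small data set are hypothetical, constructed so that the true coefficients are a = 1, b1 = 2, b2 = 3.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    # augmented copy so the caller's lists are not modified
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def fit_two_predictors(x1, x2, y):
    """Return (a, b1, b2) for y = a + b1*x1 + b2*x2 via the three normal equations."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    matrix = [
        [n, s1, s2],     # sum(Y)    = N*a       + b1*sum(X1)    + b2*sum(X2)
        [s1, s11, s12],  # sum(X1*Y) = a*sum(X1) + b1*sum(X1^2)  + b2*sum(X1*X2)
        [s2, s12, s22],  # sum(X2*Y) = a*sum(X2) + b1*sum(X1*X2) + b2*sum(X2^2)
    ]
    return tuple(solve_linear_system(matrix, [sy, s1y, s2y]))

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]  # built from a = 1, b1 = 2, b2 = 3
print(fit_two_predictors(x1, x2, y))
```

Because y was built exactly from those coefficients, the printed tuple should be approximately (1.0, 2.0, 3.0), up to floating-point rounding.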

Lines of Regression

There are two lines of regression- that of Y on X and X on Y.

The line of regression of Y on X is given by Y = a + bX where a and b are unknown constants known as intercept and slope of the equation. This is used to predict the unknown value of variable Y when value of variable X is known.

On the other hand, the line of regression of X on Y is given by X = c + dY which is used to predict the unknown value of variable X using the known value of variable Y.

Often, only one of these lines makes sense. Which of them is appropriate for the analysis at hand depends on which variable is labelled dependent and which independent in the problem being analysed.

Regression coefficients-

The two regression coefficients are b_yx and b_xy. Their formulas are:

b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2}

b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - \left(\sum Y\right)^2}

The coefficient of X in the line of regression of Y on X is called the regression coefficient of Y on X and is denoted by b_yx.

It represents the change in the value of the dependent variable (Y) corresponding to a unit change in the value of the independent variable (X).

Similarly, the coefficient of Y in the line of regression of X on Y is called the regression coefficient of X on Y and is denoted by b_xy.
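Both regression coefficients share the same numerator, so they always have the same sign. A sketch built on that observation computes b_yx and b_xy from raw sums and recovers r as the signed square root of their product, which is the relation R² = b_yx · b_xy used in the worked example. Function names are hypothetical; the data are the experience and income figures from the graphical-method example.

```python
import math

def regression_coefficients(x, y):
    """Return (b_yx, b_xy) from raw sums, using the two formulas above."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(u * v for u, v in zip(x, y))
    sx2 = sum(u * u for u in x)
    sy2 = sum(v * v for v in y)
    numerator = n * sxy - sx * sy           # shared by both coefficients
    b_yx = numerator / (n * sx2 - sx ** 2)  # slope of the line of regression of Y on X
    b_xy = numerator / (n * sy2 - sy ** 2)  # slope of the line of regression of X on Y
    return b_yx, b_xy

def r_from_coefficients(b_yx, b_xy):
    """r is the square root of b_yx * b_xy, carrying their common sign."""
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Experience (years) and income ('000) from the graphical-method example
experience = [15, 10, 5, 3, 8, 9]
income = [150, 120, 60, 40, 70, 90]
b_yx, b_xy = regression_coefficients(experience, income)
print(b_yx, b_xy, r_from_coefficients(b_yx, b_xy))
```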

How Good Is the Regression?

Once a regression equation has been constructed, we can check how good it is by examining the coefficient of determination (R²). R² always lies between 0 and 1.

The closer R² is to 1, the better the model fits the data and the better its predictions.

PRACTICAL ASPECT OF REGRESSION ANALYSIS-

Here we will show a linear regression analysis between two variables X and Y.

Variable X is taken as “driving experience” and variable Y as “number of road accidents (in a year)”.

The number of road accidents is taken as the dependent variable, which is related to the independent variable X, i.e. driving experience.

X (driving experience, years):   5    2    12   9    15   6    25   16

Y (no. of road accidents):       64   87   50   71   44   56   42   60

From the data we will show:

The estimated regression line for the data.

The number of road accidents taking place when the driving experience is 10 years and 30 years.

The coefficient of determination (R²), which tells us what percentage of the variation in the dependent variable is explained by the independent variable.

The following is the tabular representation of the data related to driving experience and number of road accidents.

X     Y     X·Y     X²     Y²
5     64    320     25     4096
2     87    174     4      7569
12    50    600     144    2500
9     71    639     81     5041
15    44    660     225    1936
6     56    336     36     3136
25    42    1050    625    1764
16    60    960     256    3600
∑X = 90   ∑Y = 474   ∑XY = 4739   ∑X² = 1396   ∑Y² = 29642

Since the estimated regression line is given by Y = a + bX, we now use the normal equations to calculate the values of a and b.

∑Y = N.a + b∑X

474 = 8a + 90b

8a + 90b = 474   (Eq. 1)

∑XY = a∑X + b∑X²

4739 = 90a + 1396b

90a + 1396b = 4739   (Eq. 2)

Solving both equations simultaneously (b = (8 × 4739 − 90 × 474)/(8 × 1396 − 90²) = −4748/3068, then a = (474 − 90b)/8), we get:

Value of a = 76.66   Value of b = −1.5476

The estimated regression line is

Y = 76.66 – 1.5476 X

[Figure: trend line for Y = 76.66 – 1.5476 X, number of accidents plotted against experience]

Road accidents depend on driving experience: a new driver is considered inexperienced and faces a higher risk of accidents. There is therefore a negative relationship between the two variables, and the trend line slopes downward in this case.

From the value of a (76.66) we can see that a driver with 0 years of experience is predicted to have 76.66 road accidents.

From the value of b we can say that for every extra year of driving experience, the predicted number of road accidents decreases by 1.5476.

Number of accidents with 10 years of experience:

Y = 76.66 – 1.5476X = 76.66 – 1.5476(10) = 61.184

Number of accidents with 30 years of experience:

Y = 76.66 – 1.5476X = 76.66 – 1.5476(30) = 30.232

Now we find the coefficient of determination for the data using the regression coefficients.

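b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2} = \frac{8(4739) - 90(474)}{8(1396) - (90)^2} = \frac{-4748}{3068} \approx -1.547

b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - \left(\sum Y\right)^2} = \frac{8(4739) - 90(474)}{8(29642) - (474)^2} = \frac{-4748}{12460} \approx -0.381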

Now R^2 = b_{yx} \cdot b_{xy} = (-1.547)(-0.381) \approx 0.5894

From the above coefficient of determination, we can say that about 59% of the variance of the dependent variable (number of road accidents) is explained by the independent variable (driving experience).
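The whole worked example can be reproduced in a few lines as a check on the hand calculation. This sketch uses only the table's data; variable names are illustrative. Working with unrounded coefficients gives R² ≈ 0.5897 rather than 0.5894; the small difference comes from rounding b_yx and b_xy to three decimals before multiplying.

```python
# Driving experience (years) and number of road accidents in a year, from the table
X = [5, 2, 12, 9, 15, 6, 25, 16]
Y = [64, 87, 50, 71, 44, 56, 42, 60]

N = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Solve N*a + b*sum_x = sum_y (Eq. 1) and a*sum_x + b*sum_x2 = sum_xy (Eq. 2)
det = N * sum_x2 - sum_x ** 2
b = (N * sum_xy - sum_x * sum_y) / det
a = (sum_y - b * sum_x) / N

def predict(experience_years):
    """Predicted number of road accidents from the fitted line Y = a + bX."""
    return a + b * experience_years

b_yx = b
b_xy = (N * sum_xy - sum_x * sum_y) / (N * sum_y2 - sum_y ** 2)
r_squared = b_yx * b_xy

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 90 474 4739 1396 29642
print(f"a = {a:.2f}, b = {b:.4f}")           # a = 76.66, b = -1.5476
print(f"10 years: {predict(10):.2f}, 30 years: {predict(30):.2f}")  # 61.18, 30.23
print(f"R^2 = {r_squared:.4f}")              # R^2 = 0.5897
```

Note that 30 years lies outside the observed range of driving experience (2 to 25 years), so that prediction is an extrapolation of the fitted line.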

Conceptual Framework of SENSEX and Nifty

Stock Market Indices

Stock market performance is quantified by calculating an index from benchmark scrips. The SENSEX (Sensitive Index) is associated with the Bombay Stock Exchange, and the S&P CNX NIFTY is associated with the National Stock Exchange.

Bombay Stock Exchange

There are 23 stock exchanges in India. The Bombay Stock Exchange is the largest, with over 6,000 stocks listed. The BSE accounts for over two thirds of the total trading volume in the country.

Established in 1875, the exchange is also the oldest in Asia. Among the twenty-two Stock Exchanges recognized by the Government of India under the Securities Contracts (Regulation) Act, 1956, it was the first one to be recognized and it is the only one that had the privilege of getting permanent recognition.

Scrips at BSE

ACC, AIRTEL, BHEL, DLF, GRASIM, GUJARAT AMBUJA, HDFC, HDFC BANK, HINDALCO, HUL, ICICI BANK, INFOSYS, SUN PHARMA IND. LTD, ITC, L&T, MARUTI, MAHINDRA & MAHINDRA, NTPC, ONGC, RANBAXY, RELIANCE COMMUNICATION, RELIANCE INFRASTRUCTURE, RIL, STERLITE INDUSTRIES LTD, SBI, TCS, TATA MOTORS, TATA STEEL, TATA POWER COMPANY LTD, WIPRO


National Stock Exchange

The National Stock Exchange (NSE), located in Bombay, is India's first debt market.

It was set up in 1993 to encourage stock exchange reform through system modernization and competition.

The instruments traded are treasury bills, government securities and bonds issued by public sector companies.

How are the 30 SENSEX stocks selected?

Listing history

Trading frequency

Rank based on market capitalization (should be among the top 100)

Market capitalization weight

Industry/sector they belong to

Historical record

Methodology of SENSEX

SENSEX has been calculated since 1986 and initially it was calculated based on the Total Market Capitalization methodology and the methodology was changed in 2003 to Free Float Market Capitalization.

Hence, these days, the SENSEX is based on the free-float market capitalization of the 30 SENSEX stocks traded on the BSE, relative to a base value of 100 (1978-79), and it is calculated every 15 seconds.

SENSEX is calculated using the "Free-float Market Capitalization" methodology, wherein the level of the index at any point in time reflects the free-float market value of the 30 component stocks relative to a base period.

The market capitalization of a company is determined by multiplying the price of its stock by the number of shares issued by the company.

This market capitalization is further multiplied by the free-float factor to determine the free-float market capitalization.

How SENSEX is calculated?

The formula for calculating the SENSEX is: SENSEX = (sum of free-float market capitalization of the 30 benchmark stocks) × Index Factor

where Index Factor = 100 / market capitalization value in 1978-79, and 100 is the index value in 1978-79.

How NIFTY is calculated?

The National Stock Exchange (NSE) is associated with NIFTY and it is also calculated by the same methodology but with two key differences.

1. Base year is 1995 and base value is 1000.

2. NIFTY is calculated based on 50 stocks.

Formulae for valuation

\text{SENSEX} = \frac{\text{Free-float market capitalization}}{\text{Market capitalization in 1978-79}} \times \text{Base index points of 1978-79}
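The valuation formula can be sketched as a small function: free-float market capitalization is price × shares issued × free-float factor, and the index is the total scaled by base value / base-period market capitalization. This is a simplified illustration under stated assumptions: the function names and the three-stock data are hypothetical, and it ignores any base adjustments an exchange may make over time. Per the slides, base_value is 100 for SENSEX (1978-79) and 1000 for NIFTY (1995).

```python
def free_float_market_cap(price, shares_issued, free_float_factor):
    """Market cap (price x shares issued) scaled by the free-float factor."""
    return price * shares_issued * free_float_factor

def index_level(constituents, base_market_cap, base_value):
    """Index = (sum of free-float market caps / base-period market cap) * base value.

    constituents: iterable of (price, shares_issued, free_float_factor) tuples.
    """
    total = sum(free_float_market_cap(*stock) for stock in constituents)
    index_factor = base_value / base_market_cap  # e.g. 100 / market cap in 1978-79
    return total * index_factor

# Hypothetical three-stock index: (price, shares issued, free-float factor)
stocks = [(250.0, 4_000_000, 0.50), (1_200.0, 1_000_000, 0.75), (80.0, 10_000_000, 0.40)]
print(index_level(stocks, base_market_cap=600_000_000.0, base_value=100))
```

The same function covers NIFTY's variant of the methodology by passing base_value=1000 and that index's own base-period market capitalization.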
