Linear Functions 2. Sociology 5811, Lecture 18. Copyright © 2004 by Evan Schofer. Do not copy or distribute without permission.

TRANSCRIPT

  • Linear Functions 2. Sociology 5811, Lecture 18. Copyright 2004 by Evan Schofer. Do not copy or distribute without permission.

  • Announcements: Proposals due November 15. Today's class: Linear functions as summaries; introduction to the linear regression model.

  • Review: Scatterplots. Question: Can you describe the association? Answer: Negative linear association.

  • Review: Scatterplots. Question: Can you describe the association? Answer: Non-linear positive association.

  • Review: Scatterplots. Question: Can you describe the association? Answer: No association.

  • Review: Scatterplots. No relationship is represented by a cloud of evenly distributed points. Strong linear relationships are reflected by visible diagonal lines on the graph. Non-linear (curved) relationships are reflected by various curved patterns: U-shaped, upside-down U-shaped, S-shaped, J-shaped.

  • Review: Linear Association. The closer points fall to a single line, the higher the linear association. This is measured by the correlation coefficient (r). [Figure panels: Some linear association; Higher linear association.]

  • Review: Linear Functions. The formula Y = a + bX is a linear formula. If you graphed X and Y for any chosen values of a and b, you'd get a straight line. It is a family of functions, like the normal curve: for any value of a and b, you get a particular line. a is referred to as the constant or intercept; b is referred to as the slope. To graph a linear function: pick values for X, compute the corresponding values of Y, then connect the dots to graph the line.
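The graphing recipe on this slide (pick values for X, compute Y, connect the dots) can be sketched in a few lines of Python. The intercept and slope below are hypothetical values chosen only for illustration; they do not come from the lecture.

```python
def linear(x, a, b):
    """Return Y = a + b*X for a single value of X."""
    return a + b * x

# Hypothetical constant (a) and slope (b), for illustration only.
a, b = 1.0, 2.0

# Step 1: pick values for X. Step 2: compute the corresponding values of Y.
xs = [0, 1, 2, 3, 4]
points = [(x, linear(x, a, b)) for x in xs]

# Step 3 would be to connect these dots. Because the function is linear,
# every one-unit step in X changes Y by the same amount (the slope).
steps = [points[i + 1][1] - points[i][1] for i in range(len(points) - 1)]
print(points)
print(steps)
```

The constant step size is what makes the connected dots a straight line; at X = 0 the value of Y equals the constant a.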

  • Linear Functions: Y = a + bX. The constant or intercept (a) determines where the line intersects the Y-axis. If a increases (decreases), the line moves up (down).

  • Linear Functions: Y = a + bX. The slope (b) determines the steepness of the line.

  • Linear Functions: Slopes. The slope (b) is the ratio of change in Y to change in X. The slope tells you how many points Y will increase for any single-point increase in X. Slope: b = 15/5 = 3.
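As a small sketch of "change in Y over change in X", the function below computes a slope from any two points on a line. The two points are assumptions picked to reproduce the slide's 15/5 = 3; the original figure's actual coordinates are not in the transcript.

```python
def slope(p1, p2):
    """Slope between two points: change in Y divided by change in X."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("slope is undefined when X does not change")
    return (y2 - y1) / (x2 - x1)

# Hypothetical points: Y rises by 15 while X rises by 5.
b = slope((0, 0), (5, 15))
print(b)
```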

  • Linear Functions as Summaries. A linear function can be used to summarize the relationship between two variables. Slope: b = 2 / 40,000 = .00005 pts/$. If you change units: b = .05 pts/$1K; b = .5 pts/$10K; b = 5 pts/$100K.
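The unit changes on this slide amount to multiplying the slope by the size of the new unit. A minimal sketch (variable names are illustrative):

```python
import math

# Slope from the slide: 2 happiness points per 40,000 dollars.
b_per_dollar = 2 / 40_000

# Re-expressing income in larger units multiplies the slope by the unit size.
b_per_1k = b_per_dollar * 1_000
b_per_10k = b_per_dollar * 10_000
b_per_100k = b_per_dollar * 100_000

# The implied change in Y is the same whichever unit is used:
# a $20,000 difference in income is 20 units of $1K.
change_using_dollars = b_per_dollar * 20_000
change_using_thousands = b_per_1k * 20

print(b_per_dollar, b_per_1k, b_per_10k, b_per_100k)
```

Only the number attached to the slope changes; the relationship it describes does not.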

  • Linear Functions as Summaries. Slope and constant can be eyeballed to approximate a formula. Slope (b): b = 2 / 40,000 = .00005 pts/$. Constant (a) = value where the line hits the Y-axis: a = 2. Happy = 2 + .00005(Income).

  • Linear Functions as Summaries. Linear functions can powerfully summarize data. Formula: Happy = 2 + .00005(Income). This gives a sense of how the two variables are related: namely, people get a .00005 increase in happiness for every extra dollar of income (or 5 pts per $100K). It also lets you predict values. What if someone earns $150,000? Happy = 2 + .00005($150,000) = 9.5. But be careful: you shouldn't assume that a relationship remains linear indefinitely. Also, negative income or happiness makes no sense.
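The prediction on this slide can be reproduced directly. The function name and the guard against negative income are illustrative choices, added to echo the slide's caution; they are not part of the lecture.

```python
import math

A = 2.0        # constant from the slide
B = 0.00005    # slope from the slide, in points per dollar

def predict_happiness(income_dollars):
    """Predicted happiness from Happy = 2 + .00005(Income)."""
    if income_dollars < 0:
        # The slide notes that negative income makes no sense.
        raise ValueError("income must be non-negative")
    return A + B * income_dollars

pred = predict_happiness(150_000)
print(round(pred, 2))
```

Nothing in the formula itself stops you from plugging in incomes far beyond the observed data, which is why the slide warns against assuming the line holds indefinitely.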

  • Linear Functions as Summaries. Come up with a linear function that summarizes this real data: years of education vs. job prestige. It isn't always easy! The line you choose depends on how much you weight these points.

  • Linear Functions as Summaries. One estimate of the linear function. Formula: Y = 5 + 3X.

  • Linear Functions as Summaries. Question: How much additional job prestige do you get by going to college (an extra 4 years of education)? Formula: Prestige = 5 + 3*Education. Answer: About 12 points of job prestige. The change in X is 4 and the slope is 3, so 3 x 4 = 12 points. If X = 12, Y = 5 + 3*12 = 41; if X = 16, Y = 5 + 3*16 = 53. Question: What is the interpretation of the constant? Answer: It is the predicted job prestige of someone with zero years of education (Prestige = 5).
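The arithmetic in this answer can be checked with a short sketch using the slide's eyeballed formula (the function name is illustrative):

```python
def prestige(education_years):
    """Predicted job prestige from Prestige = 5 + 3*Education."""
    return 5 + 3 * education_years

high_school = prestige(12)
college = prestige(16)
gain = college - high_school   # same as slope * change in X = 3 * 4
constant = prestige(0)         # prediction at zero years of education

print(high_school, college, gain, constant)
```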

  • Linear Functions as Summaries. What do you think happens to the relationship between education and job prestige when education exceeds 20? Would it remain linear? Or would the effect taper off? Answer: Some would argue that the returns from education diminish beyond a certain point.

  • Interpreting Linear Functions. New example: In a society, the relationship between education (years) and income (in 1000s of dollars per year) can be summarized by: Income (in 1000s) = 10 + 3(Education). Questions: What is the general range of salaries? (0 years of education = 10K; 20 years of education = 70K.) What is the economic benefit of college? Would you encourage your child to attend school? What if it were: Income = 30 + .2(Education)?
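One way to explore these questions is to compare the two formulas side by side. The sketch below assumes that "college" means moving from 12 to 16 years of education, as in the earlier prestige example; the function names are illustrative.

```python
import math

def income_steep(education_years):
    """Income (in 1000s) = 10 + 3(Education)."""
    return 10 + 3 * education_years

def income_flat(education_years):
    """Income (in 1000s) = 30 + .2(Education)."""
    return 30 + 0.2 * education_years

# Range of salaries under the first formula, from 0 to 20 years.
low, high = income_steep(0), income_steep(20)

# Benefit of four extra years (assumed: 12 -> 16 years) under each formula.
college_gain_steep = income_steep(16) - income_steep(12)
college_gain_flat = income_flat(16) - income_flat(12)

print(low, high, college_gain_steep, round(college_gain_flat, 2))
```

Under the first formula four more years of school are worth about 12K per year; under the second, everyone starts higher but the same four years add less than 1K.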

  • Interpreting Linear Functions. Example: Income (1000s) = 10 + 3(Education). Questions: How would the society be different if the constant were 0? Provide a possible social interpretation. How would the society be different if the constant were 30? How would the society be different if the slope were zero? If it were negative? How would the society be different if the slope were 8?

  • Linear Functions. Many issues remain: 1. How to test for independence between two interval measures (like a chi-square test), in order to know if a linear relationship exists? 2. How to calculate correlation coefficients (r) to measure linear association? 3. How to calculate the linear formula that best summarizes the relationship between two real variables (i.e., based on actual data)? 4. What kinds of hypothesis tests can be done?

  • Lines: Summaries and Prediction. Recall: lines can be used to summarize data. With income measured in 1000s of dollars: Slope (b): b = 2 / 40 = .05 pts/$1K. Constant (a) = value where the line hits the Y-axis: a = 2. Happy = 2 + .05(Income).

  • Linear Functions as Prediction. Linear functions can summarize the relationship between two variables. Formula: Happy = 2 + .05(Income), with income in 1,000s. Linear functions can also be used to predict (estimate) a case's value of a variable (Yi) based on its value of another variable (Xi), if you know the constant and slope: Yi-hat = a + bYX(Xi). Y-hat indicates an estimation function; bYX denotes the slope of Y with respect to X.

  • Prediction with Linear Functions. If Xi (Income) = 60K, what is our estimate of Yi (Happiness)? Happy = 2 + .05(Income). Happiness-hat = 2 + .05(60) = 5.

  • The Linear Regression Model. To model real data, we must take into account that points will miss the line. Similar to ANOVA, we refer to the deviation of points from the estimated value as error (ei). In ANOVA the estimated value is the group mean, i.e., the grand mean plus the group effect. In regression the estimated value is derived from the formula Y = a + bX. Estimation is based on the value of X, the slope, and the constant (this assumes a linear relationship between X and Y).

  • The Linear Regression Model. The value of any point (Yi) can be modeled as: Yi = a + bYX(Xi) + ei. The value of Y for case (i) is made up of: a constant (a); a sloping function of the case's value on variable X (bYX); and an error term (e), the deviation from the line. By adding error (e), an abstract mathematical function can be applied to real data points.

  • The Linear Regression Model. Visually: Yi = a + bXi + ei.
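To make the error term concrete, the sketch below splits each observed value into a predicted part and an error, using the lecture's formula Happy = 2 + .05(Income in 1000s). The three observed cases are hypothetical values invented for illustration; they are not data from the lecture.

```python
import math

A, B = 2.0, 0.05   # constant and slope from the lecture's example

# Hypothetical cases: (income in 1000s, observed happiness).
cases = [(20, 3.5), (40, 3.0), (60, 6.0)]

rows = []
for income, happy in cases:
    predicted = A + B * income   # Y-hat: the point on the line
    error = happy - predicted    # e: how far the case misses the line
    rows.append((income, happy, predicted, error))

for income, happy, predicted, error in rows:
    # Each observed value is exactly constant + slope*X + error.
    print(income, happy, round(predicted, 2), round(error, 2))
```

A positive error means the case sits above the line, a negative error means it sits below.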

  • Estimating Linear Equations. Question: How do we choose the best line to describe our real data? Previously, we just eyeballed it. Answer: Look at the error. If a given line formula misses points by a lot, the observed error will be large. If the line is as close to all points as possible, observed error will be small. Of course, even the best line has some error, except when all data points are perfectly on a line.

  • Estimating Linear Equations. A poor estimation (big error).

  • Estimating Linear Equations. Better estimation (less error).

  • Estimating Linear Equations. Look at the improvement (reduction) in error: High Error vs. Low Error.

  • Estimating Linear Equations. Idea: The best line is the one that has the least error (deviation from the line). Total deviation from the line can be expressed as the sum of the errors: Σ ei = Σ (Yi - Yi-hat). But, to make all deviations positive, we square them, producing the sum of squares error: Σ ei² = Σ (Yi - Yi-hat)².
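A short sketch of why the deviations are squared. The data and both candidate lines below are hypothetical, chosen so that the raw deviations cancel for each line while the squared error still tells the lines apart.

```python
import math

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def residuals(a, b):
    """Deviation of each point from the line Y = a + bX."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def sum_of_squares_error(a, b):
    """Sum of squared deviations from the line Y = a + bX."""
    return sum(e ** 2 for e in residuals(a, b))

# Two candidate lines (both hypothetical).
raw_closer = sum(residuals(2.2, 0.6))
raw_steeper = sum(residuals(1.0, 1.0))
sse_closer = sum_of_squares_error(2.2, 0.6)
sse_steeper = sum_of_squares_error(1.0, 1.0)

print(round(raw_closer, 6), round(raw_steeper, 6))
print(round(sse_closer, 6), round(sse_steeper, 6))
```

For both lines the positive and negative deviations cancel to roughly zero, so the raw sum cannot rank them. The sum of squares error can: it is smaller for the first line.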

  • Estimating Linear Equations. Goal: Find values of the constant (a) and slope (b) that produce the lowest squared error. This is the least squares regression line. The formula for the slope (b) that yields the least squares error is: bYX = sYX / s²X, where s²X is the variance of X and sYX is the covariance of Y and X.
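The slope formula (covariance of Y and X divided by the variance of X) can be sketched as follows. The data are hypothetical. The intercept formula used here, a = Y-bar minus b times X-bar, is the standard least squares result; it does not appear on this slide and is included as an assumption so the line can be completed.

```python
import math
from statistics import mean, variance

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def covariance(ys, xs):
    """Sample covariance: summed cross-products of deviations over N-1."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1)

def sum_of_squares_error(a, b):
    """Sum of squared deviations from the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

b = covariance(ys, xs) / variance(xs)   # slope: sYX / s²X
a = mean(ys) - b * mean(xs)             # intercept (standard result, assumed)

print(round(a, 4), round(b, 4), round(sum_of_squares_error(a, b), 4))
```

Nudging either a or b away from these values increases the sum of squares error, which is what "least squares" means.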

  • Covariance. Variance: sum of squared deviations about Y-bar, divided by N-1: s²Y = Σ (Yi - Y-bar)² / (N-1). Covariance (sYX): sum of deviation about Y-bar multiplied by deviation about X-bar, divided by N-1: sYX = Σ (Yi - Y-bar)(Xi - X-bar) / (N-1).

  • Covariance. Covariance: a measure of how much deviation of a case in X is accompanied by deviation in Y. It measures whether deviation (from the mean) in X tends to be accompanied by similar deviation in Y, or whether cases with positive deviation in X have negative deviation in Y. This is summed up for all cases in the data. The covariance is one numerical measure that characterizes the extent of linear association, as is the correlation coefficient (r).
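A minimal sketch of how the sign of the covariance tracks the direction of association. Both data sets are hypothetical. The last step uses the standard relation r = sYX / (sX sY), which is not stated on these slides and is included here as an assumption.

```python
import math
from statistics import mean, stdev

def covariance(ys, xs):
    """Sample covariance: summed cross-products of deviations over N-1."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys_rising = [2, 4, 5, 4, 5]    # high X tends to go with high Y
ys_falling = [5, 4, 5, 4, 2]   # high X tends to go with low Y

cov_rising = covariance(ys_rising, xs)
cov_falling = covariance(ys_falling, xs)

# Correlation: covariance rescaled by both standard deviations (assumed formula).
r_rising = cov_rising / (stdev(xs) * stdev(ys_rising))

print(cov_rising, cov_falling, round(r_rising, 4))
```

Unlike the covariance, r does not depend on the units of X and Y and always falls between -1 and 1.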