TRANSCRIPT
CS7267 MACHINE LEARNING
LINEAR REGRESSION
Mingon Kang, Ph.D.
Computer Science, Kennesaw State University
* Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington
Correlation (r)
Linear association between two variables
Shows how to determine both the nature and strength of the relationship between two variables
Correlation lies between -1 and +1
Zero correlation indicates that there is no linear relationship between the variables
Pearson correlation coefficient
The most familiar measure of dependence between two quantities
Correlation (r)
ρ(X, Y) = corr(X, Y) = cov(X, Y) / (σ_X σ_Y) = E[(X − μ_X)(Y − μ_Y)] / (σ_X σ_Y),
where E is the expected value operator, cov(·,·) denotes covariance, and corr(·,·) is a widely used alternative notation for the correlation coefficient
Reference: https://en.wikipedia.org/wiki/Correlation_and_dependence
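(Illustration, not from the slides: a minimal numpy sketch of the Pearson formula above, on made-up data.)

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r: cov(x, y) / (sigma_x * sigma_y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()   # center both variables
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

# Made-up example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
print(pearson_r(x, y))             # close to +1: strong positive association
print(np.corrcoef(x, y)[0, 1])     # numpy's built-in gives the same value
```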
Coefficient of Determination (R²)
Coefficient of determination lies between 0 and 1
Represented by R²
Measure of how well the regression line represents the data
If r = 0.922, then R² = 0.85
Means that 85% of the total variation in y can be explained by the linear relationship between x and y in linear regression
The other 15% of the total variation in y remains unexplained
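(Illustration: a short sketch on made-up data showing that r² matches the explained-variance ratio 1 − SS_res/SS_tot of the least-squares line; np.polyfit does the fit.)

```python
import numpy as np

# Made-up example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 2.3, 2.8, 4.5, 4.9, 6.2])

r = np.corrcoef(x, y)[0, 1]
b1, b0 = np.polyfit(x, y, 1)           # slope, intercept of the least-squares line
y_hat = b0 + b1 * x

ss_res = np.sum((y - y_hat) ** 2)      # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation in y
print(r ** 2, 1 - ss_res / ss_tot)     # identical in simple linear regression
```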
Linear Regression
How to represent the data as a vector/matrix
We assume a model:
y = b0 + b1x + ε,
where b0 and b1 are the intercept and slope, known as coefficients or parameters. ε is the error term (typically assumed ε ~ N(0, σ²)).
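(Illustration: sampling synthetic data from this model, with assumed values b0 = 2.0, b1 = 0.5, σ = 0.3.)

```python
import numpy as np

rng = np.random.default_rng(0)
b0, b1, sigma = 2.0, 0.5, 0.3            # illustrative parameter values
x = rng.uniform(0.0, 10.0, size=100)
eps = rng.normal(0.0, sigma, size=100)   # ε ~ N(0, σ²)
y = b0 + b1 * x + eps                    # observations from the assumed model
print(np.polyfit(x, y, 1))               # recovers roughly (b1, b0)
```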
Linear Regression
Simple linear regression
A single independent variable is used to predict the response
Multiple linear regression
Two or more independent variables are used to predict the response
Linear Regression
How to represent the data as a vector/matrix
Include the bias constant (intercept) in the input vector
X ∈ ℝ^{n×(p+1)}, y ∈ ℝ^n, b ∈ ℝ^{p+1}, and ε ∈ ℝ^n
y = X⋅b + ε
X = [1, x_1, x_2, …, x_p], b = {b0, b1, b2, …, bp}^T
y = {y1, y2, …, yn}^T, ε = {ε1, ε2, …, εn}^T
⋅ is a dot product
Equivalent to y_i = 1⋅b0 + x_i1⋅b1 + x_i2⋅b2 + ⋯ + x_ip⋅bp (1 ≤ i ≤ n)
Linear Regression
Find the optimal coefficient vector b that makes the predictions most similar to the observations
⎡y1⎤   ⎡1  x_11 ⋯ x_1p⎤ ⎡b0⎤   ⎡ε1⎤
⎢ ⋮⎥ = ⎢⋮    ⋮  ⋱    ⋮⎥ ⎢ ⋮⎥ + ⎢ ⋮⎥
⎣yn⎦   ⎣1  x_n1 ⋯ x_np⎦ ⎣bp⎦   ⎣εn⎦
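(Illustration: building the design matrix with the bias column of 1s and evaluating y = X⋅b + ε on made-up numbers.)

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 3
X_raw = rng.normal(size=(n, p))           # n samples, p predictors
X = np.hstack([np.ones((n, 1)), X_raw])   # prepend the bias column of 1s: shape (n, p+1)

b = np.array([2.0, 0.5, -1.0, 0.3])       # hypothetical b0, b1, ..., bp
eps = rng.normal(0.0, 0.1, size=n)
y = X @ b + eps                           # y = X·b + ε, one dot product per row

print(X.shape, y.shape)                   # (5, 4) (5,)
```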
Ordinary Least Squares (OLS)
y = Xb + ε
Estimate the unknown parameters (b) in the linear regression model
Minimizing the sum of the squares of the differences between the observed responses and the values predicted by the linear function
Sum squared error = Σ_{i=1}^{n} (y_i − x_i⋅b)²
Optimization
Need to minimize the error
min_b J(b) = Σ_{i=1}^{n} (y_i − x_i⋅b)²
To obtain the optimal set of parameters (b), the derivative of the error w.r.t. each parameter must be zero.
Optimization
J = ε'ε = (y − Xb)'(y − Xb)
= (y' − b'X')(y − Xb)
= y'y − y'Xb − b'X'y + b'X'Xb
= y'y − 2b'X'y + b'X'Xb   (y'Xb is a scalar, so y'Xb = (y'Xb)' = b'X'y)
∂J/∂b = −2X'y + 2X'Xb = 0
X'Xb = X'y  ⟹  b = (X'X)⁻¹X'y
Matrix Cookbook: https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
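(Illustration: the closed-form OLS estimate on synthetic data; np.linalg.solve handles X'Xb = X'y without forming the inverse, and np.linalg.lstsq is the numerically safer equivalent.)

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])   # bias column included
b_true = np.array([1.0, 2.0, -0.5, 0.8])
y = X @ b_true + rng.normal(0.0, 0.1, size=n)

# Normal equations X'Xb = X'y; solve() is preferred to forming (X'X)^{-1} explicitly
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(b_hat)                               # close to b_true

# Numerically safer equivalent via least squares
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b_lstsq)
```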
Linear regression for classification
For binary classification
Encode class labels as y ∈ {0, 1} or {−1, 1}
Apply OLS
Check which class the prediction is closer to
If class 1 is encoded as 1 and class 2 as −1:
class 1 if f(x) ≥ 0
class 2 if f(x) < 0
where f(x) = x⋅b is the fitted linear function
Linear models are NOT optimized for classification
Logistic regression is better suited for classification
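(Illustration: the OLS-as-classifier recipe above, on two made-up Gaussian blobs; labels are encoded ±1 and predictions thresholded at 0.)

```python
import numpy as np

rng = np.random.default_rng(3)

# Two made-up 2-D classes, encoded as +1 and -1
X1 = rng.normal(loc=[2.0, 2.0], scale=0.7, size=(50, 2))
X2 = rng.normal(loc=[-2.0, -2.0], scale=0.7, size=(50, 2))
X = np.hstack([np.ones((100, 1)), np.vstack([X1, X2])])   # add the bias column
y = np.concatenate([np.ones(50), -np.ones(50)])

b, *_ = np.linalg.lstsq(X, y, rcond=None)   # apply OLS to the ±1 labels
pred = np.where(X @ b >= 0, 1, -1)          # class 1 if f(x) ≥ 0, class 2 otherwise
print((pred == y).mean())                   # training accuracy on this easy example
```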
Assumptions in Linear regression
Linearity of the independent variables in the predictor
normally a good approximation, especially for high-dimensional data
Errors have a normal distribution, with mean zero and constant variance
important for hypothesis tests
Independent variables are independent of each other
Otherwise, a multicollinearity problem arises: two or more predictor variables are highly correlated
Highly correlated predictors should be removed
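(Illustration: one simple check on made-up predictors, flagging highly correlated columns via the pairwise correlation matrix.)

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly a copy of x1 -> highly correlated
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Pairwise correlations among predictors; |r| near 1 off the diagonal
# signals multicollinearity, so one of the offending columns should go.
C = np.corrcoef(X, rowvar=False)
print(np.round(C, 3))

i, j = np.unravel_index(np.argmax(np.abs(C - np.eye(3))), C.shape)
print(f"most correlated pair: columns {i} and {j} (r = {C[i, j]:.3f})")
```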