12/20/2016 climate change - vanier collegesun4.vaniercollege.qc.ca/~iti/proj/irina.pdf · bimodal...

12/20/2016 Climate Change Analysis with Multiple Linear Regression Probability and Statistics Vanier College Irina Moraru


12/20/2016

Climate Change Analysis with Multiple Linear Regression

Probability and Statistics Vanier College

Irina Moraru


Abstract
Introduction, CO2 Emissions
Solar Irradiance
Transmission of Direct Solar Radiation
El Nino/La Nina SOI
Previous Research
Methodology
Multiple Linear Regression, Introduction
Least Squares Estimation of Parameters
Estimating σ2
Test for Significance of Regression
R2 and Adjusted R2
Residual Analysis
Influential Observations
Selection of Variables in Model Building
Results and Discussion
Conclusion
Works Cited


Abstract

It is commonly accepted by the scientific community that global temperatures have increased by about 1.4 degrees Celsius since the beginning of the 20th century. However, there is still controversy about the causes and factors that contribute to climate change. Many argue that the increase in human activities is the main cause of global warming, while others argue that the increase in temperatures is due to natural factors (“Climate”). Nevertheless, periods of increased industrialization coincide with an intensified rise in global temperatures, which suggests that humans do have a strong impact on global warming (Friedrich and Damassa). There are also natural factors, such as volcanic activity and the El Nino phenomenon, that greatly influence global temperatures. In this study we want to find out whether the atmospheric CO2 concentration, the Total Solar Irradiance, the El Nino Southern Oscillation Index and the atmospheric transmission of direct solar radiation predict monthly mean temperature anomalies. Our goal is therefore to investigate the relationship between these factors and the observed increases in monthly temperature anomalies. To do so, a multiple linear regression model was fitted three times: first for the full period from January 1979 to July 2012, then for the first half of this period, from January 1979 to December 1995, and finally for the second half, from January 1996 to July 2012. Applying the model to the full period and then to each half allows us to verify whether the initial model is consistent over time. The model obtained for the first half was slightly different from the original but still largely within its confidence intervals, while the model obtained for the second half was somewhat inconsistent with the original model. This is likely because volcanic activity is present only in the first half and not in the second. Having studied the relationship between the dependent and independent variables, we conclude that the atmospheric CO2 concentration, the Total Solar Irradiance, the El Nino Southern Oscillation Index and the atmospheric transmission of direct solar radiation all significantly predict temperature change, in different proportions.


Introduction

In this study four main predictors were chosen to build a multiple regression model. In the following paragraphs I will introduce each variable present in the model and its significance in predicting the global rise in temperatures.

CO2 Emissions

CO2 is a greenhouse gas that is naturally present in the atmosphere in small quantities. However, since the beginning of industrialization humans have caused its atmospheric concentration to increase, with consequences for global temperatures. Since 1990 atmospheric carbon dioxide levels have increased by 0.4% each year. Greenhouse gases help maintain temperatures by letting the short wavelengths of sunlight pass through while trapping the longer infrared wavelengths emitted by surfaces that have absorbed sunlight. The increase of CO2 due to the burning of fossil fuels and deforestation has amplified this effect, resulting in higher global temperatures. CO2 is one of many greenhouse gases emitted into the atmosphere by human action. Others include methane, nitrous oxide, O3 and chlorofluorocarbons. Below are some diagrams representing the increase of CO2 and other greenhouse gases over the years and the proportions of some of these gases in the atmosphere.

As shown above, the fraction of other greenhouse gases is much smaller than that of CO2, and they are therefore less predictive of the increase in temperatures. For simplicity we included only CO2 in the model, as the gas having the greatest influence on our response. The CO2 data, collected at the Mauna Loa Observatory in Hawaii, are measured as the number of molecules of CO2 divided by the number of molecules of dry air, multiplied by one million (ppm). Data on CO2 can be found at https://www.esrl.noaa.gov/gmd/ccgg/trends/ (Nave).

Solar Irradiance

The Total Solar Irradiance (TSI) measures the total amount of light energy coming from the sun in the form of photons of different wavelengths. The solar irradiance is not constant through time: it varies over a cycle of about 11 years, oscillating between a maximum and a minimum. Even small changes in solar irradiance can have large consequences for the temperatures felt on Earth. It is crucial to include this variable in the model because the sun is the primary source of energy on the planet, and changes in its output can lead to considerable changes in average temperatures. The data used for this model were measured at PMOD/WRC (Physikalisch-Meteorologisches Observatorium Davos/World Radiation Center) in Davos, Switzerland, are expressed in watts per square meter, and can be found at https://www.ngdc.noaa.gov/stp/solar/solarirrad.html (Zell).


Transmission of Direct Solar Radiation

The atmospheric transmission of direct solar radiation, or transparency, is directly related to volcanic activity. Volcanic lava contains a small proportion of sulfur dioxide. During volcanic eruptions, ash and sulfur dioxide are projected into the stratosphere, where the sulfur dioxide combines with water vapor to form H2SO4. These particles are very light but very numerous; they therefore remain in the stratosphere for long periods of time and absorb solar energy, preventing some of this radiation from reaching the Earth's surface. This causes cooling and a drop in mean global temperatures. For example, evidence was found that during the 20th century 3 major volcanic eruptions decreased global temperatures by 1 degree Celsius. Despite their small size, these particles greatly influence temperatures on Earth because they are present in large quantities in the stratosphere. Reduced transmission therefore lowers temperatures, and this variable must be considered in our model to predict temperature changes accurately. The data on the atmospheric transmission of direct solar radiation can be found at https://www.esrl.noaa.gov/gmd/grad/mloapt.html and are measured as the percentage of solar radiation passing through dry air (“Volcanoes”).

El Nino/La Nina SOI

The natural phenomenon of the El Nino Southern Oscillation is characterized by fluctuating temperatures of the Pacific Ocean along the equatorial region. The warm water oscillates back and forth across the Pacific Ocean and affects temperatures on a local and global scale. The pattern oscillates between two phases: warmer than normal central and eastern equatorial Pacific sea surface temperatures, or SSTs (El Niño), and cooler than normal central and eastern equatorial Pacific SSTs (La Niña) (“Global Patterns”). The water temperatures of El Nino/La Nina are accompanied by high and low surface pressure respectively. Indeed, the Southern Oscillation describes a bimodal variation in sea level barometric pressure between observation stations at Darwin, Australia and Tahiti. It is quantified by the Southern Oscillation Index (SOI), which is a standardized difference between the two barometric pressures (“El Niño”). However, in this study we use the SOI actual, which is given by the difference between the standardized Tahiti and standardized Darwin values. The El Nino/La Nina phenomenon should be included in our model because it is highly influential for the variation of global temperatures. The data used in this study can be found at http://www.cpc.ncep.noaa.gov.

Temperature Anomalies

The dependent variable in our multiple linear regression analysis is the global surface temperature anomaly. Temperature anomalies (GISTEMP) measure the deviation in temperature from a reference value that has been established as the average. Positive values indicate that temperatures are higher than average, while negative values indicate that temperatures are lower than average. GISTEMP values are calculated from the combined data files of NOAA GHCN v3 (meteorological stations), ERSST v4 (ocean areas) and SCAR (Antarctic stations), and can be found at http://data.giss.nasa.gov/gistemp/. The temperature anomalies are expressed in degrees Celsius. The independent variables described above are used to predict, in different proportions, the global surface temperature anomalies.

Previous Research

Research similar to the present study was published in 2013 by Gary Witt in the Journal of Statistics Education. In addition to using the multitude of available data to show the decline in Arctic sea ice, he investigated the relationship between the proportion of CO2 in the atmosphere, the total solar irradiance (TSI) and the global surface temperatures (GISTEMP), using yearly data from 1979 to 2010. The model obtained was GISTEMP = -11373 + 8.05 TSI + 1.15 CO2 + Error, with an R2 of 0.764. This value of R2 is high enough to indicate that much of the variation in the output variable, GISTEMP, is explained by the model and that the predictor variables are closely related to it. The p value found for CO2 is very small and therefore shows that the relationship between the predictor CO2 and the response GISTEMP is unlikely to be due to random chance. On the other hand, the two-sided p value found for TSI is 0.104, which is not significant at the usual 5% level: if TSI had no effect, a coefficient at least this far from zero would still arise from random variation about 10.4% of the time. As Witt states, the objective of his model is to show that CO2 and global temperatures have both increased steadily, while TSI followed an oscillating pattern that repeated itself about every 11 years (Witt).

Methodology


A multiple linear regression model was run in Minitab Express three times on different time intervals. The first interval analyzed was from January 1979 to July 2012. This first analysis was repeated without the outliers, which increased R2. The model was then run on the first half of this period, from January 1979 to December 1995, and finally on the second half, from January 1996 to July 2012. The two halves of the total time interval were evaluated to confirm the validity of the first regression model obtained and to test its consistency through time. The data were obtained from reliable sources, NASA and NOAA, whose links can be found in the previous section, and are expressed as monthly averages.

Multiple Linear Regression

Introduction

A multiple linear regression (MLR) analysis was run to find the relationship between our variables in order to predict the temperature increase. MLR is used in situations in which more than one variable affects or predicts the response, as opposed to simple linear regression, in which only one variable influences the outcome. In the case under study, four variables, namely the monthly average concentration of CO2 in the atmosphere, the Total Solar Irradiance, the El Nino Southern Oscillation Index and the atmospheric transmission of direct solar radiation, predict the monthly mean temperature anomalies. It is important to keep in mind, however, that the linear relationship found is only an approximation to the true relationship between these variables, and that each estimated slope is itself uncertain within its confidence interval.

A multiple linear regression model with k variables shows how one variable Y, the predicted variable, relates to the others, x1, x2, ..., xk (the predictor variables), according to the following equation:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$$

The above formula is the equation of a hyperplane in the space of dimensions y, x1, x2, ..., xk, where $\beta_0$ (written B0 in the text) represents the intercept while $\beta_1, \beta_2, \ldots, \beta_k$ (B1, B2, ..., Bk) are the partial regression coefficients. Each coefficient reflects the change in the response variable per unit change of one specific predictor variable when all other variables are kept constant. Y is the predicted dependent variable, x1, x2, ..., xk are the predictor (regressor) independent variables and B0, B1, ..., Bk are the parameters of the linear equation. Moreover, the term $\epsilon$ (E) represents the random error, which is assumed to have mean zero (Montgomery and Runger 12.1.1).

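Least Squares Estimation of the Parameters

The least squares method is employed to estimate the regression coefficients in a multiple regression model. Suppose that there are $n > k$ observations, and let $x_{ij}$ denote the $i$th observation or level of the variable $x_j$. The observations are:

$$(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i), \quad i = 1, 2, \ldots, n$$

The data are usually presented as in the table below.

Data for Multiple Linear Regression

y     x1    x2    ...   xk
y1    x11   x12   ...   x1k
y2    x21   x22   ...   x2k
...   ...   ...   ...   ...
yn    xn1   xn2   ...   xnk

The above observations can be expressed in the following model:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \epsilon_i, \quad i = 1, 2, \ldots, n$$

The least squares function is:

$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \Big)^2$$

The objective is to minimize $L$ with respect to $\beta_0, \beta_1, \ldots, \beta_k$. The least squares estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ must satisfy

$$\frac{\partial L}{\partial \beta_0}\bigg|_{\hat\beta_0, \ldots, \hat\beta_k} = -2 \sum_{i=1}^{n} \Big( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \Big) = 0$$

$$\frac{\partial L}{\partial \beta_j}\bigg|_{\hat\beta_0, \ldots, \hat\beta_k} = -2 \sum_{i=1}^{n} \Big( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \Big) x_{ij} = 0, \quad j = 1, 2, \ldots, k$$

By simplification we obtain the normal equations:

$$n \hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_{i1} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik} = \sum_{i=1}^{n} y_i$$

$$\hat\beta_0 \sum_{i=1}^{n} x_{ij} + \hat\beta_1 \sum_{i=1}^{n} x_{ij} x_{i1} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ij} x_{ik} = \sum_{i=1}^{n} x_{ij} y_i, \quad j = 1, 2, \ldots, k$$

The results obtained by solving these $p = k + 1$ equations are the least squares estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ of the regression coefficients that appear in the multiple linear regression model above (Montgomery and Runger 12.1.2).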
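As an illustration of the least squares step, the sketch below builds X'X and X'y from a small made-up data set and solves the resulting normal equations with plain Gaussian elimination. The function names and the data are hypothetical, chosen only for this example; the paper itself relied on Minitab for the computation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pivot on the largest remaining entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(xs, ys):
    """Return [b0, b1, ..., bk] solving the normal equations (X'X) b = X'y."""
    # Prepend a constant 1 to every row so that b0 is the intercept.
    design = [[1.0] + list(row) for row in xs]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in xs]
coefficients = least_squares(xs, ys)
print([round(b, 6) for b in coefficients])
```

Because the data contain no noise, the recovered coefficients should agree with 1, 2 and -3 up to rounding error. With real climate data the same normal equations apply, only with k = 4 predictors.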

Estimating σ2

Just as in simple linear regression, in multiple linear regression σ2 represents the variance of the random error. However, whereas simple linear regression has only two parameters, so that n - 2 appears in the denominator, multiple linear regression has p parameters (p = k + 1, counting the intercept). Accordingly, σ2 is estimated by:

$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - p} = \frac{SSE}{n - p}$$

The numerator SSE is the residual sum of squares, while the denominator n - p is the residual degrees of freedom (Montgomery and Runger 12.1.3).
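As a quick numerical check of the estimate SSE/(n - p), the sketch below plugs in the residual sum of squares and degrees of freedom printed in the Analysis 1 output of the Results section. The variable names are illustrative only.

```python
import math

# Values read from the Analysis 1 output reported in the Results section.
sse = 6.3280   # residual (error) sum of squares
n = 403        # observations: residual DF (398) plus p
p = 5          # parameters: the intercept plus four slopes

sigma2_hat = sse / (n - p)      # estimate of the error variance
s = math.sqrt(sigma2_hat)       # residual standard deviation
print(f"sigma^2 estimate = {sigma2_hat:.4f}, S = {s:.4f}")
```

The square root of the estimate should reproduce, up to rounding, the value S = 0.126093 that Minitab reports for the same analysis.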


Test for Significance of Regression

The test for significance of regression checks whether a linear relationship exists between the dependent variable y and the independent variables x1, x2, ..., xk, and it involves hypothesis testing. The null hypothesis

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$

states that there is no linear relationship between the predicted variable y and its predictors x1, x2, ..., xk. The alternative hypothesis

$$H_1: \beta_j \neq 0 \text{ for at least one } j$$

on the other hand, states that at least one of the coefficients is different from zero, so that the corresponding variable is important to consider in our analysis. The test uses the following statistic:

$$F_0 = \frac{SSR/k}{SSE/(n - p)} = \frac{MSR}{MSE}$$

with SSR = SST - SSE, which is F-distributed with k and n - p degrees of freedom when H0 is true. H0 is rejected at significance level α when the computed value $f_0$ is greater than or equal to $f_{\alpha, k, n-p}$ (Devore 561, 562).
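To make the statistic concrete, the following sketch recomputes F = (SSR/k) / (SSE/(n - p)) from the sums of squares printed in the Analysis 1 ANOVA table of the Results section. It is only an arithmetic check; the variable names are illustrative.

```python
# Sums of squares and counts from the Analysis 1 ANOVA table.
sst = 17.2267    # total sum of squares
sse = 6.3280     # residual (error) sum of squares
k = 4            # number of predictors
n = 403          # number of monthly observations
p = k + 1        # parameters, including the intercept

ssr = sst - sse                       # regression sum of squares
f0 = (ssr / k) / (sse / (n - p))      # mean square regression / mean square error
print(f"SSR = {ssr:.4f}, F0 = {f0:.2f}")
```

The result should agree with the F value of 171.37 in the Minitab output. The 5% critical value for 4 and 398 degrees of freedom is roughly 2.4, so the null hypothesis is rejected by a wide margin.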

R2 and Adjusted R2

The coefficient R2, also called the coefficient of multiple determination, is an important measure of model fit. R2 is the proportion of the variability in the data explained by the regression model, in other words how well the chosen variables predict the response. R2 can be expressed as:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$


This measure can sometimes be unsuitable, because its value cannot decrease when a variable is added, even if the variable does not predict the response. For this reason R2adj is a useful alternative:

$$R^2_{adj} = 1 - \frac{SSE/(n - p)}{SST/(n - 1)}$$

In the fraction above the denominator SST/(n - 1) is a constant for a given data set, while the numerator SSE/(n - p) is the residual mean square. R2adj will only increase when a variable is added to the model if the new variable reduces the error mean square. Keeping track of R2adj guards against overfitting, that is, including non-significant independent variables (Montgomery and Runger 12.2.1).
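Both measures can be recomputed from an ANOVA table. The sketch below uses R2 = 1 - SSE/SST and R2adj = 1 - [SSE/(n - p)] / [SST/(n - 1)] with the sums of squares from the Analysis 1 output in the Results section; the helper names are hypothetical.

```python
def r_squared(sse, sst):
    """Proportion of total variability explained by the model."""
    return 1.0 - sse / sst


def adjusted_r_squared(sse, sst, n, p):
    """R2 penalized for the number of parameters p (intercept included)."""
    return 1.0 - (sse / (n - p)) / (sst / (n - 1))


# SSE and SST from the Analysis 1 ANOVA table; n = 403 months, p = 5 parameters.
r2 = r_squared(6.3280, 17.2267)
r2_adj = adjusted_r_squared(6.3280, 17.2267, n=403, p=5)
print(f"R2 = {r2:.1%}, adjusted R2 = {r2_adj:.1%}")
```

These values should match the R-Sq = 63.3% and R-Sq(adj) = 62.9% printed by Minitab. The two are close here because n is large relative to p, so the penalty for the four predictors is small.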

Residual Analysis

Residuals, defined by the formula ei = yi - ŷi, are used to check model adequacy. Plotting the residuals against variables that are not part of the model but seem possible candidates shows whether adding the new variable would improve the model: a visible pattern in the plot suggests that it would.

The standardized residuals

$$d_i = \frac{e_i}{\sqrt{MSE}} = \frac{e_i}{\sqrt{\hat\sigma^2}}$$

put the residuals on a common scale so that their magnitudes can be compared. Standardized residuals are commonly used because they are scaled so that their standard deviation is approximately unity (Montgomery and Runger 12.5.1).
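A minimal sketch of the standardization step follows, dividing each residual by the square root of the error mean square SSE/(n - p). The observed and fitted values are hypothetical numbers chosen for illustration, and the cutoff of 2 for flagging a residual is a common convention assumed here, not something stated in the paper.

```python
import math


def standardized_residuals(y, y_hat, p):
    """Return d_i = e_i / sqrt(MSE), with MSE = SSE / (n - p)."""
    residuals = [yi - fi for yi, fi in zip(y, y_hat)]
    sse = sum(e * e for e in residuals)
    mse = sse / (len(y) - p)
    return [e / math.sqrt(mse) for e in residuals]


# Hypothetical observed and fitted values, for illustration only.
y = [0.14, -0.09, 0.19, 0.13, 0.06, 0.14]
y_hat = [0.10, 0.02, 0.12, 0.15, 0.09, 0.08]

d = standardized_residuals(y, y_hat, p=2)
# Flag residuals more than two standard deviations from zero (assumed cutoff).
flagged = [i for i, di in enumerate(d) if abs(di) > 2]
print([round(di, 2) for di in d], flagged)
```

By construction the squared standardized residuals sum to n - p, which is a convenient sanity check on any implementation.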

Influential Observations

Very often when a multiple linear regression is analyzed, some of the data points lie far away from the rest of the points, and they greatly influence quantities such as R2, the regression coefficients and the magnitude of the error mean square. To determine whether these observations are consistent with the model and the rest of the data, we use the distance measure developed by R. Dennis Cook:

$$D_i = \frac{(\hat\beta_{(i)} - \hat\beta)' X'X (\hat\beta_{(i)} - \hat\beta)}{p \hat\sigma^2}, \quad i = 1, 2, \ldots, n$$

where X is the n × p matrix of regressor values with a leading column of ones. Di measures the squared distance between the least squares estimate $\hat\beta$ based on all n observations and the estimate $\hat\beta_{(i)}$ obtained when the ith point is removed. If the Di obtained is large (a value of Di > 1 is a common rule of thumb), then the point is influential and is greatly changing the values of the regression coefficients, R2 and the error mean square. For this reason it is preferable to also build a model that excludes such points (Montgomery and Runger 12.5.2).
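The sketch below illustrates Cook's distance by brute force: the model is refitted with each point removed, and the change in the coefficient vector is measured in the X'X metric and divided by p times the error mean square. To stay short it assumes a single predictor (p = 2), and the data are made up with one deliberately unusual point; in practice Minitab reports these distances directly.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def cooks_distances(x, y):
    """Cook's D_i for simple linear regression via leave-one-out refits."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = sse / (n - p)
    # Entries of X'X for the design matrix with columns [1, x].
    s1, sx, sxx = n, sum(x), sum(xi * xi for xi in x)
    distances = []
    for i in range(n):
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        d0, d1 = c0 - b0, c1 - b1
        # Quadratic form (b_(i) - b)' X'X (b_(i) - b).
        quad = d0 * (s1 * d0 + sx * d1) + d1 * (sx * d0 + sxx * d1)
        distances.append(quad / (p * sigma2))
    return distances


# Made-up data: the last point has an extreme x and falls off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
d = cooks_distances(x, y)
most_influential = max(range(len(d)), key=d.__getitem__)
print([round(di, 2) for di in d], most_influential)
```

The last observation should stand out with a distance well above 1, which is the kind of point one would consider excluding before refitting.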

Selection Of Variables in Model Building

Selection of variables is the first issue the analyst encounters. The aim is to include enough variables to predict the response to a considerable degree, while keeping the number of variables small enough that the analysis remains simple. A compromise between these two goals gives the best subset of regressor variables. There are several ways to select variables.

One way is to compare R2adj. By this criterion we should increase the number of variables until the increase in R2adj becomes negligible. The model that has the greatest R2adj can be considered the best one to select; it also minimizes the mean square error.

Another way of selecting variables is to calculate the Cp statistic, which estimates the total standardized mean square error of a model with p parameters:

$$\Gamma_p = \frac{1}{\sigma^2} \left[ \sum_{i=1}^{n} \{ E(Y_i) - E(\hat Y_i) \}^2 + \sum_{i=1}^{n} V(\hat Y_i) \right] = \frac{1}{\sigma^2} \left[ (\text{bias})^2 + p\sigma^2 \right]$$

The Cp statistic estimates $\Gamma_p$:

$$C_p = \frac{SSE(p)}{\hat\sigma^2} - n + 2p$$

where SSE(p) is the error sum of squares of the p-term model and $\hat\sigma^2$ is the error mean square of the full model. If the p-term model has negligible bias, the following equation holds:

$$E(C_p \mid \text{zero bias}) = p$$

If the bias in the model is almost equal to zero, then Cp is expected to be close to p. If there is bias, then Cp is greater than p. To choose the most suitable model we should select the model with minimum Cp or a value of Cp slightly higher than the minimum (Montgomery and Runger 12.6.3).
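As a hedged worked example of Cp = SSE(p)/σ̂2 - n + 2p, the sketch below uses the sequential sums of squares printed in the Analysis 1 output to obtain SSE for the nested models CO2; CO2 + SO_actual; CO2 + SO_actual + Transp; and the full model. The paper does not report Cp values, so these numbers are an illustration derived from its output, and only this one ordering of the variables is covered.

```python
n = 403                        # monthly observations in Analysis 1
sst = 17.2267                  # total sum of squares
sigma2_full = 6.3280 / 398     # error mean square of the full model

# Sequential sums of squares, in the order CO2, SO_actual, Transp, TSI.
seq_ss = [9.7668, 0.3122, 0.4477, 0.3720]

cp_values = []
explained = 0.0
for terms, ss in enumerate(seq_ss, start=1):
    explained += ss
    p = terms + 1                       # parameters, including the intercept
    sse_p = sst - explained             # error sum of squares of the nested model
    cp = sse_p / sigma2_full - n + 2 * p
    cp_values.append((p, round(cp, 1)))
print(cp_values)
```

The full model gives Cp equal to p = 5 by construction, while the three smaller models give values of roughly 70, 53 and 26, all far above their p. Under this criterion none of the four predictors can be dropped without introducing substantial bias.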

Results and Discussion

In this section we apply the multiple linear regression described in the Methodology section to climate data. Below is a sample of the data used for our regression analysis. The first data column contains the temperature anomalies, which are our response variable. The columns to its right contain the CO2 concentration in ppm, the Southern Oscillation Index (SOI_actual), the transmission of direct solar radiation, or transparency (expressed as a fraction, so that 0.93 means 93%), and the TSI in W/m^2; these are the independent variables, the predictors of temperature anomalies. The data in each column are monthly averages from January 1979 to July 2012. Note that, for illustration, this table shows only the initial portion of the data used.

Year   TempAnom   CO2      SOI_actual   Transp   TSI
1979    0.14      336.21   -0.7         0.9327   1366.285276
       -0.09      336.65    1.6         0.9315   1366.323444
        0.19      338.13    0.2         0.9258   1366.985539
        0.13      338.94   -0.2         0.922    1366.6024
        0.06      339       0.8         0.92     1366.752968
        0.14      339.2     1.1         0.9275   1366.425696
        0.03      337.6     2.2         0.9303   1366.769865
        0.14      335.56   -0.3         0.931    1366.04647
        0.27      333.93    0.2         0.933    1366.175461
        0.25      334.12   -0.2         0.9294   1366.423778
        0.29      335.26   -0.7         0.9261   1366.461964
        0.47      336.78   -1.2         0.9323   1366.859909
1980    0.3       337.8     0.7         0.9337   1366.626291
        0.42      338.28    0.5         0.9339   1366.280895
        0.29      340.04   -0.7         0.9308   1366.962183
        0.32      340.86   -1           0.9257   1366.371066
        0.34      341.47    0           0.923    1366.425257
        0.16      341.26    0           0.9277   1366.72281
        0.28      339.34   -0.1         0.929    1366.7101
        0.24      337.45    0.6         0.9315   1366.767929
        0.21      336.1    -0.8         0.928    1366.596317
        0.2       336.05    0           0.9309   1366.597023
        0.29      337.21   -0.6         0.9309   1366.226588
        0.21      338.29   -0.1         0.9309   1366.794943

Analysis 1

The regression analysis was run on the original data from January 1979 to July 2012 without discarding the outliers. Below are the results obtained:

Regression Analysis: TempAnom versus CO2, SOI_actual, Transp, TSI

The regression equation is
TempAnom = - 107 + 0.0102 CO2 - 0.0253 SO_actual + 1.66 Transp + 0.0752 TSI

Predictor        Coef     SE Coef       T       P
Constant      -107.45       21.29   -5.05   0.000
CO2         0.0101502   0.0004296   23.62   0.000
SO_actual   -0.025326    0.004120   -6.15   0.000
Transp         1.6639      0.3027    5.50   0.000
TSI           0.07515     0.01554    4.84   0.000

S = 0.126093   R-Sq = 63.3%   R-Sq(adj) = 62.9%

Analysis of Variance

Source            DF        SS       MS        F       P
Regression         4   10.8987   2.7247   171.37   0.000
Residual Error   398    6.3280   0.0159
Total            402   17.2267

Source      DF   Seq SS
CO2          1   9.7668
SO_actual    1   0.3122
Transp       1   0.4477
TSI          1   0.3720

From this first analysis we obtained the following regression coefficients

B0=-107

B1=0.0102

B2=-0.0253

B3=1.66

B4=0.0752

These values were calculated by the software using the Least Squares Method.

B0 = -107 is the intercept: the temperature anomaly the model would predict if every predictor were equal to zero. Such values (for instance, zero solar irradiance) lie far outside the range of the data, so the intercept has no physical interpretation and serves only to position the regression plane.

B1 = 0.0102 indicates an increase of 0.0102 degrees Celsius for each increase of 1 ppm of CO2, with the other variables held constant. The fact that B1 is positive is reasonable, since CO2 is a greenhouse gas and we expect temperatures to increase as CO2 increases in the atmosphere.

B2 = -0.0253 indicates a decrease of 0.0253 degrees Celsius for each increase of 1 unit of the Southern Oscillation Index. The negative slope is also consistent with expectation: negative SOI values correspond to El Nino conditions, when the warm equatorial Pacific releases heat to the atmosphere, while positive SOI values correspond to the cooler La Nina phase, so temperature anomalies fall as the index rises.

B3 = 1.66 indicates an increase of 1.66 degrees Celsius for each increase of one unit of transparency. Since transparency is a fraction that stays close to 0.93 in the data, a more practical reading is about 0.017 degrees Celsius per percentage point of transparency. The sign of this coefficient also agrees with expectation: transparency indicates the amount of light that reaches the surface of the Earth, so as transparency increases more light reaches the Earth, resulting in higher temperatures.

B4 = 0.0752 indicates an increase of 0.0752 degrees Celsius for each additional unit (1 W/m^2) of total solar irradiance. The positive slope meets the expectation that temperature anomalies increase when solar irradiance increases.
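To show how the fitted equation is used, the sketch below evaluates the Analysis 1 model for the first row of the data sample and compares the result with the observed anomaly. The coefficients are copied from the Minitab table as printed; because they are rounded and the TSI values are large (about 1366), the fitted value is only accurate to roughly the second decimal place. The function and variable names are illustrative.

```python
# Coefficients as printed in the Analysis 1 output (rounded by Minitab).
coef = {"const": -107.45, "CO2": 0.0101502, "SO_actual": -0.025326,
        "Transp": 1.6639, "TSI": 0.07515}


def predict(co2, soi, transp, tsi):
    """Fitted temperature anomaly in degrees Celsius."""
    return (coef["const"] + coef["CO2"] * co2 + coef["SO_actual"] * soi
            + coef["Transp"] * transp + coef["TSI"] * tsi)


# First row of the data sample (first month of 1979).
observed = 0.14
fitted = predict(co2=336.21, soi=-0.7, transp=0.9327, tsi=1366.285276)
residual = observed - fitted
print(f"fitted = {fitted:.2f}, residual = {residual:.2f}")

# Raising CO2 by 1 ppm with everything else fixed changes the prediction
# by exactly the CO2 coefficient, which is what "partial slope" means.
delta = predict(337.21, -0.7, 0.9327, 1366.285276) - fitted
print(f"change for +1 ppm CO2 = {delta:.4f}")
```

The residual for this month is small compared with the residual standard deviation S = 0.126 reported for the model, so this observation is fitted reasonably well.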

The p values of all the variables are close to 0, which leads us to reject the null hypothesis that the variables do not predict the temperature anomalies. We can say with high confidence that all of the variables chosen significantly predict the response.

The R2 for this regression analysis is 63.3 percent, which is acceptably high. We would like to run the regression analysis again after eliminating all of the outliers, to see whether R2 increases. Below are the results of the model after removing from the dataset all of the data points that were too far from the rest. To identify influential points Minitab uses Cook's distance:

$$D_i = \frac{(\hat\beta_{(i)} - \hat\beta)' X'X (\hat\beta_{(i)} - \hat\beta)}{p \hat\sigma^2}$$

Analysis 2

Regression Analysis: TempAnom versus CO2, SO_actual, Transp, TSI

The regression equation is
TempAnom = - 96.8 + 0.00985 CO2 - 0.0223 SO_actual + 1.51 Transp + 0.0675 TSI

Predictor        Coef     SE Coef       T       P
Constant       -96.80       19.46   -4.97   0.000
CO2         0.0098489   0.0003877   25.40   0.000
SO_actual   -0.022262    0.003734   -5.96   0.000
Transp         1.5126      0.2707    5.59   0.000
TSI           0.06754     0.01420    4.76   0.000

S = 0.111894   R-Sq = 67.6%   R-Sq(adj) = 67.3%

Analysis of Variance

Source            DF        SS       MS        F       P
Regression         4    9.9549   2.4887   198.78   0.000
Residual Error   381    4.7702   0.0125
Total            385   14.7251

Source      DF   Seq SS
CO2          1   9.0893
SO_actual    1   0.2176
Transp       1   0.3647
TSI          1   0.2834

In the new model R2 is equal to 67.6%, which is greater than the R2 obtained for the previous model. By eliminating the outliers we removed data points that might have been affected by measurement error. The regression equation has changed somewhat, but the signs of the coefficients have not, so the direction of the change in the response caused by a one-unit change in any of the variables stays the same. The new regression coefficients are:

B0=-96.8

B1=0.00985

B2=-0.0223

B3=1.51

B4=0.0675

The 95 % confidence intervals for these coefficients are:

-135.40864<B0<-58.19136

0.00908 < B1 < 0.01062

-0.02971 < B2 < -0.01489

0.97293 < B3 < 2.04707

0.03933 < B4 < 0.09567

The p-values for the regression coefficients continue to be close to 0.

The change from the original regression is not pronounced: B1 diverges the most from the initial model relative to its standard error, but the original value (0.0102) is still within the confidence interval.

Analysis 3


We will now run the model on the first half of the previously examined time period, that is, from January 1979 to December 1995, and examine how the model changes and whether it is consistent with the model obtained in Analysis 2 without the outliers.

Regression Analysis: TempAnom versus CO2, SO_actual, Transp, TSI

The regression equation is
TempAnom = - 176 + 0.00935 CO2 - 0.0243 SO_actual + 1.53 Transp + 0.126 TSI

Predictor        Coef    SE Coef       T       P
Constant      -175.90      30.83   -5.71   0.000
CO2          0.009353   0.001211    7.72   0.000
SO_actual   -0.024252   0.006420   -3.78   0.000
Transp         1.5270     0.3300    4.63   0.000
TSI           0.12555    0.02245    5.59   0.000

S = 0.125664   R-Sq = 32.1%   R-Sq(adj) = 30.8%

Analysis of Variance

Source            DF        SS        MS       F       P
Regression         4   1.48865   0.37216   23.57   0.000
Residual Error   199   3.14249   0.01579
Total            203   4.63114

Source      DF    Seq SS
CO2          1   0.70667
SO_actual    1   0.05032
Transp       1   0.23786
TSI          1   0.49379

Comparing the model in Analysis 2 with the present one:

TempAnom = - 176 + 0.00935 CO2 - 0.0243 SO_actual + 1.53 Transp + 0.126 TSI

The coefficients that do not fit the 95% CI calculated for the coefficients of the model in Analysis 2 are the constant (-176) and the TSI coefficient (0.126).

The model obtained here does not change drastically from the one in Analysis 2: all coefficients except B0 and B4 fall within the 95% confidence intervals calculated for the regression coefficients of the model in Analysis 2. Therefore, the model found for the first half of the period under examination is consistent with the model in Analysis 2 for the variables CO2, SO_actual and transparency. Among the predictors, only the coefficient for TSI is not very reliable, since it neither stays constant nor varies within the confidence interval in the first half of the original time period. This might also be an indicator that the dependence on TSI is nonlinear.

R2, however, is 32.1%, which is less than half of the value in Analysis 2. This can be explained by the fact that there is more variability in the whole data set than in half of it: the total variability per observation, SST/(n - 1), is about 0.038 in Analysis 2 (14.7251/385) but only about 0.023 for the first half (4.63114/203), while the residual mean square is similar (0.0125 versus 0.0158). Since R2 is equal to 1 - (SSE/SST), R2 is larger for the whole data set than for half of it. Furthermore, the full original data set captures well the influence of volcanoes while the half data sets do not.

In the next section we will analyze the second half of the full time period, from January 1996 to July 2012.

Analysis 4

Regression Analysis: TempAnom versus CO2, SO_actual, Transp, TSI

The regression equation is
TempAnom = - 16.7 + 0.00832 CO2 - 0.0260 SO_actual + 5.27 Transp + 0.0068 TSI

Predictor        Coef    SE Coef       T       P
Constant       -16.68      29.77   -0.56   0.576
CO2          0.008324   0.001035    8.04   0.000
SO_actual   -0.025953   0.005260   -4.93   0.000
Transp          5.274      2.187    2.41   0.017
TSI           0.00675    0.02182    0.31   0.757

S = 0.121451   R-Sq = 30.4%   R-Sq(adj) = 28.9%

Analysis of Variance

Source            DF        SS        MS       F       P
Regression         4   1.24797   0.31199   21.15   0.000
Residual Error   194   2.86157   0.01475
Total            198   4.10954

Source      DF    Seq SS
CO2          1   0.80939
SO_actual    1   0.34688
Transp       1   0.09029
TSI          1   0.00141

The regression equation for this analysis is:

TempAnom = - 16.7 + 0.00832 CO2 - 0.0260 SO_actual + 5.27 Transp + 0.0068 TSI
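
As a consistency check, the summary statistics of Analysis 4 can be recomputed from its Analysis of Variance table. The sketch below uses only the sums of squares and degrees of freedom printed in the output; the function and variable names are illustrative and are not part of the original analysis.

```python
import math

# Values copied from the Analysis of Variance table of Analysis 4.
SS_REGRESSION = 1.24797   # regression sum of squares (SSR)
SS_ERROR = 2.86157        # residual sum of squares (SSE)
SS_TOTAL = 4.10954        # total sum of squares (SST)
DF_REGRESSION = 4
DF_ERROR = 194
DF_TOTAL = 198


def summary_statistics(ssr, sse, sst, df_reg, df_err, df_tot):
    """Recompute R-Sq, R-Sq(adj), S and F from the ANOVA sums of squares."""
    mse = sse / df_err   # mean square error
    msr = ssr / df_reg   # mean square regression
    return {
        "r_sq": 1 - sse / sst,
        "r_sq_adj": 1 - mse / (sst / df_tot),
        "s": math.sqrt(mse),
        "f": msr / mse,
    }


stats = summary_statistics(SS_REGRESSION, SS_ERROR, SS_TOTAL,
                           DF_REGRESSION, DF_ERROR, DF_TOTAL)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Rounded to the precision of the output, these values agree with the reported R-Sq = 30.4%, R-Sq(adj) = 28.9%, S = 0.121451 and F = 21.15.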

In the model obtained in this analysis, 4 of the 5 regression coefficients do not fit the 95% confidence

intervals of the model found for the full data set. This means that the model as a whole is inconsistent

with the original. This mismatch with the model in Analysis 2 can be explained by the absence

of major volcanic activity in the second half of the time period. Because volcanoes (related to the

transparency variable) are not very active in this half (there are no major volcanic events), the

regression model is very different both from the model obtained for the full data set and

from the one for the half containing all of the volcanic activity. This model is not very representative

of the real model and therefore has to be discarded.
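
The comparison above relies on 95% confidence intervals for the regression coefficients. The Analysis 2 intervals are not reproduced in this part of the report, so the following sketch only illustrates the mechanics using the Coef and SE Coef columns of Analysis 4. The critical value 1.972 is an assumption, rounded from a t-table for 194 residual degrees of freedom, and the helper names are hypothetical.

```python
# Coefficient estimates and standard errors copied from the Analysis 4 output.
COEFFICIENTS = {
    "Constant": (-16.68, 29.77),
    "CO2": (0.008324, 0.001035),
    "SO_actual": (-0.025953, 0.005260),
    "Transp": (5.274, 2.187),
    "TSI": (0.00675, 0.02182),
}

# Assumption: two-sided 95% critical value of the t distribution with
# 194 residual degrees of freedom, rounded from a t-table.
T_CRITICAL = 1.972


def confidence_interval(coef, se, t_crit=T_CRITICAL):
    """Return the (lower, upper) bounds coef -/+ t_crit * se."""
    margin = t_crit * se
    return coef - margin, coef + margin


def fits_interval(value, interval):
    """True when a coefficient from another model lies inside the interval."""
    lower, upper = interval
    return lower <= value <= upper


intervals = {name: confidence_interval(coef, se)
             for name, (coef, se) in COEFFICIENTS.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name:10s} [{lower:.6f}, {upper:.6f}]")
```

The same two functions could be applied to the Coef and SE Coef columns of Analysis 2 to repeat the check described in the text. In this sketch the TSI interval contains zero, which is in line with its large P-value of 0.757.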

An even lower R2 than that of the first half is indicative of the inadequacy of the model. R2 shows

that only 30.4 percent of the variability in the data is explained by the model.

It is important to consider that volcanic activity not only has an immediate effect on global

temperatures but its effects can also be felt with a 2-3 year delay (“Volcanoes”). Indeed, the particles

of H2SO4 can be trapped in the stratosphere for long periods of time and the effects on

Transmission of Direct Solar Radiation, or Transparency, can manifest after the volcanic activity

is no longer occurring. A much more realistic regression model therefore has to include lagged

values, that is, time delays. If we analyzed the data with a vector

autoregressive model, we would obtain an R2 almost equal to 100%. Since the multiple linear

regression model does not take into account time delays, we could only reach an R2 as high as

67.6%.
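
One simple way to introduce time delays, short of a full vector autoregressive model, is to add lagged copies of a predictor to the ordinary multiple linear regression. The sketch below only shows how such lagged columns could be built. It is not the method used in this study, the function names are hypothetical, and the numbers are toy values rather than the study's data. With monthly observations, a 2-3 year delay would correspond to lags of roughly 24 to 36; a lag of 2 is used here only to keep the example short.

```python
def lagged(series, lag):
    """Shift a series so that position t holds the value from t - lag.

    The first `lag` positions have no earlier observation and are set to None.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return ([None] * lag + list(series))[:len(series)]


def design_rows(response, predictor, lags):
    """Pair each response value with the predictor at the requested lags,
    dropping the time points for which a lagged value is unavailable."""
    columns = [lagged(predictor, k) for k in lags]
    rows = []
    for t, y in enumerate(response):
        x = [column[t] for column in columns]
        if None not in x:
            rows.append((y, x))
    return rows


# Hypothetical toy values, NOT the study's data.
temp_anom = [0.40, 0.42, 0.35, 0.30, 0.33, 0.38]
transp = [0.93, 0.93, 0.88, 0.86, 0.90, 0.92]

# Each row: (TempAnom at t, [Transp at t, Transp at t - 2]).
rows = design_rows(temp_anom, transp, lags=[0, 2])
for y, x in rows:
    print(y, x)
```

The resulting rows could then be passed to any least squares routine, with the lagged transparency treated as one more predictor.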

Conclusion

In this statistical analysis we have considered both naturally occurring phenomena and those

affected by human activity to predict temperature increase. Although both types of

variables strongly influence global temperature, there is clearly a strong correlation between the

CO2 emissions produced by the intensification of industrialization and global warming.

Therefore, even if only partly, humanity does have an impact on climate and consequently on

ecosystems. If Earth is naturally increasing in temperature, then it is doing so very slowly. By

increasing the emission of greenhouse gases, human action has sped up this process and the

Earth is increasing in temperature at remarkable rates. Clearly, correlation between two factors is

not causation; however, the relationship between CO2 and temperature increase in this study is an

indicator that cannot be ignored: global temperatures are steadily increasing as CO2 levels

increase. Given that from 2005 to 2014 the average CO2 growth rate was 2.11 ppm per year,

using the multiple linear regression model in Analysis 2 we can predict a yearly

increase of 0.02078 degrees Celsius if the CO2 growth rate stays constant. This means

that over 100 years global temperatures will increase by 2.078 degrees Celsius,

and this is likely a conservative estimate given that the CO2 concentration growth rate is

accelerating (“CO2 Acceleration”). Even if the global warming debate is ongoing and the human

influence in this warming process is not exactly understood, we should adopt preventive

measures and implement more laws to avoid producing more greenhouse gases than the amount

the atmosphere is able to absorb and disintegrate. In the very likely event that we are

causing our planet to change drastically through our exploitative approach to its resources, we should

take a step back to evaluate the consequences of our lifestyle on the rest of the ecosystem and

on our aspirations for the future of our species.
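
The projection quoted in the conclusion is a linear extrapolation, and its arithmetic can be sketched as follows. Only the growth rate of 2.11 ppm per year and the yearly increase of 0.02078 degrees Celsius come from the text. The CO2 coefficient is back-calculated from those two figures and is not quoted from the Analysis 2 output. The function name is hypothetical.

```python
CO2_GROWTH_PPM_PER_YEAR = 2.11   # average growth rate for 2005-2014 (from the text)
YEARLY_WARMING_C = 0.02078       # predicted yearly increase (from the text)

# Coefficient implied by the two figures above. This is a back-calculation,
# not a value quoted from the Analysis 2 regression output.
implied_co2_coefficient = YEARLY_WARMING_C / CO2_GROWTH_PPM_PER_YEAR


def projected_warming(years, growth=CO2_GROWTH_PPM_PER_YEAR,
                      coefficient=implied_co2_coefficient):
    """Temperature change after `years` if CO2 keeps growing at a constant
    rate and all other predictors are held fixed."""
    return coefficient * growth * years


print(f"implied coefficient: {implied_co2_coefficient:.6f} degrees C per ppm")
print(f"100-year projection: {projected_warming(100):.3f} degrees C")
```

The calculation holds the other predictors fixed and assumes that the linear relationship continues to apply outside the observed range of CO2 values, which is an extrapolation.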

Works Cited

Montgomery, Douglas C., and George C. Runger. Applied Statistics and Probability for

Engineers. 5th ed. United States: John Wiley & Sons, 2011. Print.

Devore, Jay L. Probability and Statistics for Engineering and the Sciences. 8th ed. Boston:

Brooks/Cole, Cengage Learning, 2012. Print.

Witt, Gary. "Using Data from Climate Science to Teach Introductory Statistics." Journal of

Statistics Education 21.1 (2013): 1-24. Amstat. 2013. Web. 15 Dec. 2016.

Nave, R. "Greenhouse Effect." The Greenhouse Effect. Hyperphysics, n.d. Web. 17 Dec. 2016.

"Climate Change ProCon.org." ProConorg Headlines. ProCon, 28 June 2016. Web. 17 Dec.

2016.

Zell, Holly. "Solar Irradiance." NASA. NASA, 1 Jan. 2008. Web. 17 Dec. 2016.

"Volcanoes and Global Climate Change." Exploring the Environment. N.p., n.d. Web. 16 Jan.

2016.

"Global Patterns - El Niño-Southern Oscillation (ENSO) | State Climate Office of North

Carolina." Global Patterns - El Niño-Southern Oscillation (ENSO) | State Climate Office

of North Carolina. N.p., n.d. Web. 17 Dec. 2016.

"El Niño/Southern Oscillation (ENSO) Technical Discussion." El Nino/Southern Oscillation

(ENSO) Technical Discussion | Teleconnections | National Centers for Environmental

Information (NCEI). NOAA, n.d. Web. 17 Dec. 2016.

Friedrich, Johannes, and Thomas Damassa. "The History of Carbon Dioxide Emissions." The

History of Carbon Dioxide Emissions | World Resources Institute. World Resources

Institute, 21 May 2014. Web. 20 Dec. 2016.

"Global Surface Temperature Anomalies." Global Surface Temperature Anomalies | Monitoring

References | National Centers for Environmental Information (NCEI). NOAA, n.d. Web.

24 Dec. 2016.

"CO2 Acceleration." CO2.Earth. CO2.Earth, n.d. Web. 20 Dec. 2016.
