12/20/2016
Climate Change Analysis with Multiple Linear Regression
Probability and Statistics Vanier College
Irina Moraru
Abstract
Introduction, CO2 Emissions
Solar Irradiance
Transmission of Direct Solar Radiation
El Nino/La Nina SOI
Previous Research
Methodology
Multiple Linear Regression, Introduction
Least Squares Estimation of Parameters
Estimating σ2
Test for Significance of Regression
R2 and Adjusted R2
Residual Analysis
Influential Observations
Selection of Variables in Model Building
Results and Discussion
Conclusion
Works Cited
Abstract
It has been commonly accepted by the scientific community that the global temperatures have
increased by about 1.4 degrees Celsius since the beginning of the 20th century. However
there still continue to be controversies about the causes and factors that contribute to climate
change. Many argue that the increase of human activities is the main cause of global
warming, others that the increase in temperatures is due to natural factors (“Climate”).
However, over the years the periods of increased industrialization coincide
with an intensified rise in global temperatures, which suggests that humans do have a
strong impact on global warming (Friedrich and Damassa). There are nonetheless
natural factors, like volcanic activity and the El Nino phenomenon, that greatly influence
global temperatures. In this study we want to find out if CO2 emissions in the atmosphere,
the Total Solar Irradiance, the El Nino Southern Oscillation Index and the atmospheric
transmission of direct solar radiation predict monthly mean temperature anomalies. Our
goal is therefore to investigate the relationship between these factors and the observed
increases in monthly temperature anomalies. To do so, a multiple linear regression model
was used and the model was run 3 times. The first time it was run for the period from
January 1979 to July 2012. The second time it was run on the first half of this period,
from January 1979 to December 1995, and the third time it was run on the second half,
from January 1996 to July 2012. Applying the model to the total time period and then to
the individual halves allows us to verify whether the initial model is consistent through
time across the period under study. The model obtained for the first half was slightly
different from the original but still largely within its confidence intervals, while the
model obtained for the second half was somewhat inconsistent with the original model. It
is likely that this is due to the fact that major volcanic activity is present only in the
first half and not in the second. Having studied the relationship between the dependent
and independent variables, we can deduce that CO2 emissions in the atmosphere, the Total
Solar Irradiance, the El Nino Southern Oscillation Index and the atmospheric transmission
of direct solar radiation all significantly predict temperature change, in different
proportions.
Introduction
In this study four main predictors were chosen to build a multiple regression model. In the
following paragraphs I will introduce each variable present in the model and its significance in
predicting the global rise in temperatures.
CO2 Emissions
CO2 is a greenhouse gas that is naturally present in the atmosphere in small quantities. Since
the beginning of industrialization, however, humans have caused this gas to increase in our
atmosphere, with consequences for global temperatures. Since 1990 carbon dioxide atmospheric
levels have increased by 0.4% each year. Greenhouse gases in the atmosphere serve to maintain
temperatures by allowing the short wavelengths of sunlight to pass through them while trapping
the longer infrared wavelengths emitted by the surroundings that have absorbed sunlight.
The increase of CO2 due to the burning of fossil fuels and deforestation has amplified this
effect, resulting in higher global temperatures. CO2 is one of many greenhouse gases emitted
into the atmosphere by human action. Some others are methane, nitrous oxide, O3 and
chlorofluorocarbons. Below are some diagrams representing the increase of CO2 and other
greenhouse gases over the years and the proportions of some of these gases in the
atmosphere.
As shown above, the fraction of other greenhouse gases is much smaller than that of CO2, so
they are less predictive of the increase in temperatures. For simplicity we included only
CO2 in the atmosphere, as it has the greatest influence on our response.
The CO2 data, collected at the Mauna Loa Observatory in Hawaii, are measured as the number of
molecules of CO2 divided by the number of molecules of dry air, multiplied by one million
(ppm). Data on CO2 can be found at https://www.esrl.noaa.gov/gmd/ccgg/trends/ (Nave).
Solar Irradiance
The Total Solar Irradiance measures the total amount of light energy coming from the sun in the
form of photons with different wavelengths. The solar irradiance isn’t constant through time:
it varies over a cycle of about 11 years, oscillating between a maximum and a minimum. Even
small changes in solar irradiance can have great consequences for the temperatures felt on
Earth. It is crucial to include this variable in the model because the sun is the primary
source of energy on the planet, and a change in irradiance can lead to a considerable change
in average temperatures. The data used for this model were measured at PMOD/WRC
(Physikalisch-Meteorologisches Observatorium Davos/World Radiation Center) in Davos,
Switzerland, can be found at https://www.ngdc.noaa.gov/stp/solar/solarirrad.html and are
measured in Watts per square meter (Zell).
Transmission of Direct Solar Radiation
The atmospheric transmission of direct solar radiation, or Transparency, is directly related to
volcanic activity. The lava in volcanoes contains a small portion of sulfur dioxide. During
volcanic eruptions, ash containing sulfur dioxide particles is projected into the
stratosphere, where the sulfur dioxide bonds with water vapor to become H2SO4. These particles
are very light but very numerous; they therefore remain in the stratosphere for long periods
of time and absorb solar energy, preventing some of this radiation from reaching the Earth's
surface. This causes a cooling process and a drop in mean global temperatures. For example,
evidence was found that during the 20th century 3 major volcanic eruptions decreased global
temperatures by 1 degree Celsius. Despite their size, these particles greatly influence
temperatures on Earth because they are present in large quantities in the stratosphere.
Volcanic aerosols thus counteract the increase in temperatures, and the transmission variable
must be considered in our model to accurately predict temperature changes. The atmospheric
transmission of direct solar radiation can be found at the following link
https://www.esrl.noaa.gov/gmd/grad/mloapt.html and is measured as the percentage of solar
radiation passing through dry air (“Volcanoes”).
El Nino/La Nina SOI
The natural phenomenon of the El Nino Southern Oscillation is characterized by fluctuating
temperatures of the Pacific Ocean along the equatorial region. The warm water oscillates in a
back and forth pattern through the Pacific Ocean and affects temperatures on a local and global
scale. There are two phases between which the pattern can oscillate: warmer than normal
central and eastern equatorial Pacific SSTs (El Niño) and cooler than normal central and eastern
equatorial Pacific SSTs (La Niña) (“Global Patterns”). The water temperatures of El Nino/La
Nina are accompanied by high and low surface pressure respectively. Indeed, the Southern
Oscillation describes a bimodal variation in sea level barometric pressure between observation
stations at Darwin, Australia and Tahiti. It is quantified by the Southern Oscillation Index
(SOI), which is a standardized difference between the two barometric pressures (“El Niño”).
However, in this study we use the SOI actual, which is given by the difference between the
Standardized Tahiti and Standardized Darwin values. The El Nino/La Nina phenomenon should
be included in our model as it is highly influential for the variation of global temperatures. The
data used in this study can be found at the following website: http://www.cpc.ncep.noaa.gov.
Temperature Anomalies
The dependent variable in our multiple linear regression analysis is the global surface
temperature anomaly. Temperature anomalies (GISTEMP) measure the deviation in
temperature from a reference value which has been established as the average. Positive values
indicate that temperatures are higher than average, while negative values indicate that
temperatures are lower than average. GISTEMP values are calculated using data files from NOAA
GHCN v3 (meteorological stations), ERSST v4 (ocean areas), and SCAR (Antarctic stations)
combined and can be found at http://data.giss.nasa.gov/gistemp/. The temperature anomalies in
the data used are expressed in degrees Celsius. All of the independent variables predict,
and therefore affect in different proportions, global surface temperature anomalies.
Previous Research
Research similar to the present study was completed in 2013 by Gary Witt in the Journal of Statistics Education. In addition to showing, through the multitude of available data, the decline in Arctic sea ice, he investigated the relationship between the portion of CO2 in the atmosphere, the total solar irradiance (TSI) and the global surface temperatures (GISTEMP) yearly from 1979 to 2010. The obtained model was GISTEMP = -11373 + 8.05 TSI + 1.15 CO2 + Error, with an R2 of 0.764. This value of R2 is high enough to indicate that much of the variation in the output variable is represented by the model and that the predictor variables are closely related to the output variable, that is, GISTEMP. The p value found for CO2 is very small and therefore shows that the relationship between the predictor CO2 and the response GISTEMP is unlikely to be due to random chance. On the other hand, the two-sided p value found for TSI is 0.104, which is not significant at the usual 5% level: if TSI had no effect, a relationship at least this strong would arise from random variation about 10.4% of the time. As Witt states, his model has the objective of showing that CO2 and global temperatures have both increased in a steady fashion while TSI had an oscillating pattern that repeated itself every 11 years (Witt).
Methodology
A multiple linear regression model was run in Minitab Express 3 times on
different time intervals. The first time interval analyzed was from January 1979 to July
2012. This first analysis was repeated without the outliers, with a resulting increase in R2.
The second time the model was run on the first half of this period, from January 1979 to
December 1995, and the third time it was run on the second half, from January 1996 to
July 2012. The two halves of the total time interval were evaluated to confirm the validity
of the first regression model obtained and to test its consistency through time. The data
were obtained from reliable sources, NASA and NOAA, the links to which can be found in the
previous section, and are expressed as monthly averages.
Multiple Linear Regression
Introduction
A multiple linear regression (MLR) analysis was run to find the relationship between our
variables in order to predict the temperature increase. MLR is used in situations in which
more than one variable affects or predicts the response, as opposed to the simple linear
regression model, in which only one variable influences the outcome. In the case under
study 4 different variables, namely the monthly average concentration of CO2 in the
atmosphere, the Total Solar Irradiance, the El Nino Southern Oscillation Index and the
atmospheric transmission of direct solar radiation, predict the monthly mean temperature
anomalies. It is however important to consider that the linear relationship found is merely an
approximation to the true relationship between these variables; the true slope for each
variable is estimated to lie within the confidence interval of the corresponding coefficient.
A multiple linear regression model with k variables shows how one variable (Y), the predicted
variable, relates to the others x1, x2, ..., xk (the predictor variables) according to the
following equation:
$$Y = B_0 + B_1 x_1 + B_2 x_2 + \cdots + B_k x_k + E$$
The above formula is the equation of a hyperplane in the space of dimensions y, x1, x2, ..., xk,
where B0 represents the intercept while B1, B2, ..., Bk are the partial regression coefficients.
Each coefficient reflects the change in the response variable per unit change of one specific
predictor variable when all other variables are kept constant. Y is the predicted dependent
variable, x1, x2, ..., xk are the predictor or regressor independent variables and
B0, B1, ..., Bk are the parameters of the linear equation. Moreover, the term E represents the
random error, which is assumed to have mean zero (Montgomery and Runger 12.1.1).
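Least Squares Estimation of the Parameters
The least squares method is employed to estimate the regression coefficients in a multiple
regression model. Suppose there are n > k observations, and let x_ij denote the ith observation
or level of the variable x_j. The observations are:
$$(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i), \quad i = 1, 2, \ldots, n$$
The data are usually presented as in the table below.
Data for Multiple Linear Regression
y     x1    x2    ...   xk
y1    x11   x12   ...   x1k
y2    x21   x22   ...   x2k
...   ...   ...   ...   ...
yn    xn1   xn2   ...   xnk
The above observations can be expressed by the following model:
$$y_i = B_0 + B_1 x_{i1} + B_2 x_{i2} + \cdots + B_k x_{ik} + E_i = B_0 + \sum_{j=1}^{k} B_j x_{ij} + E_i, \quad i = 1, 2, \ldots, n$$
The formula of the least squares function is:
$$L = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} \Big( y_i - B_0 - \sum_{j=1}^{k} B_j x_{ij} \Big)^2$$
The objective is to minimize L with respect to B_0, B_1, ..., B_k. The least squares
estimates \hat{B}_0, \hat{B}_1, ..., \hat{B}_k must satisfy
$$\frac{\partial L}{\partial B_0} \Big|_{\hat{B}_0, \ldots, \hat{B}_k} = -2 \sum_{i=1}^{n} \Big( y_i - \hat{B}_0 - \sum_{l=1}^{k} \hat{B}_l x_{il} \Big) = 0$$
$$\frac{\partial L}{\partial B_j} \Big|_{\hat{B}_0, \ldots, \hat{B}_k} = -2 \sum_{i=1}^{n} \Big( y_i - \hat{B}_0 - \sum_{l=1}^{k} \hat{B}_l x_{il} \Big) x_{ij} = 0, \quad j = 1, 2, \ldots, k$$
By simplification we obtain the normal equations:
$$n \hat{B}_0 + \hat{B}_1 \sum_{i=1}^{n} x_{i1} + \cdots + \hat{B}_k \sum_{i=1}^{n} x_{ik} = \sum_{i=1}^{n} y_i$$
$$\hat{B}_0 \sum_{i=1}^{n} x_{ij} + \hat{B}_1 \sum_{i=1}^{n} x_{ij} x_{i1} + \cdots + \hat{B}_k \sum_{i=1}^{n} x_{ij} x_{ik} = \sum_{i=1}^{n} x_{ij} y_i, \quad j = 1, 2, \ldots, k$$
The results obtained by solving the above p = k + 1 equations are the least squares estimates
\hat{B}_0, \hat{B}_1, ..., \hat{B}_k of the coefficients which appear in the multiple linear
regression model (Montgomery and Runger 12.1.2).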
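The normal equations described above can be solved numerically. The following is a minimal sketch in plain Python, not the procedure Minitab uses internally: it builds the sums of products and solves the resulting linear system by Gaussian elimination. The function name `solve_normal_equations` and the small data set are hypothetical and chosen only for illustration.

```python
def solve_normal_equations(X, y):
    """Least squares estimates obtained by solving the normal equations.

    X: list of rows [x_i1, ..., x_ik] (without an intercept column).
    y: list of responses. Returns [b0, b1, ..., bk].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend 1 for the intercept
    p = len(rows[0])
    # Sums of products: A is X'X (p x p), c is X'y (length p).
    A = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    c = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    M = [A[a] + [c[a]] for a in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(M[idx][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for j in range(col, p + 1):
                M[r][j] -= factor * M[col][j]
    # Back substitution.
    b = [0.0] * p
    for a in reversed(range(p)):
        s = sum(M[a][j] * b[j] for j in range(a + 1, p))
        b[a] = (M[a][p] - s) / M[a][a]
    return b


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
X_demo = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_demo = [1 + 2 * x1 - 3 * x2 for x1, x2 in X_demo]
b_demo = solve_normal_equations(X_demo, y_demo)
print([round(v, 6) for v in b_demo])
```

Because the demonstration data contain no random error, the estimates recover the generating coefficients up to floating-point rounding.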
Estimating σ2
Just as in simple linear regression, in multiple linear regression σ2 represents the variance of the random
error. However, whereas simple linear regression has only two parameters, so that the denominator is n-2,
in multiple linear regression there are p = k+1 parameters. Accordingly, σ2 is estimated by:
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - p} = \frac{SSE}{n - p}$$
The numerator SSE is the residual sum of squares, while the denominator n-p is the residual degrees of freedom (Montgomery and Runger 12.1.3).
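As a quick check of this estimate, the sketch below recomputes it from the residual sum of squares and degrees of freedom printed in the Analysis 1 output of the Results section (SSE = 6.3280, residual DF = 398). Only the variable names are introduced here.

```python
import math

# Values read from the Analysis 1 Minitab output in the Results section.
sse = 6.3280   # residual sum of squares
n = 403        # number of observations (total DF 402 + 1)
p = 5          # parameters: intercept + 4 regressors

sigma2_hat = sse / (n - p)   # estimate of the error variance
s = math.sqrt(sigma2_hat)    # Minitab reports this square root as "S"
print(round(sigma2_hat, 4), round(s, 4))
```

The square root agrees with the value S = 0.126093 printed by Minitab, up to rounding.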
Test for Significance of Regression
The significance of regression test verifies that there exists a linear relationship between the
dependent variable y and the independent ones x1, x2, ..., xk, and it involves hypothesis testing.
The null hypothesis
$$H_0: B_1 = B_2 = \cdots = B_k = 0$$
states that there isn’t any relationship between the predicted variable y and its predictors x1, x2, ..., xk.
The alternative hypothesis
$$H_1: B_j \neq 0 \text{ for at least one } j$$
on the other hand, states that at least one of the coefficients is different from zero, so that the
corresponding variable is important to consider in our analysis. The significance of regression
test uses the following statistic:
$$F_0 = \frac{SSR/k}{SSE/(n - p)} = \frac{MSR}{MSE}$$
with SSR=SST-SSE and p = k+1,
which is F-distributed with k and n-p degrees of freedom in case H0 is valid.
The rejection region of this statistic is determined by f0 greater than or equal to F_{α,k,n-p}
(Devore 561, 562).
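To illustrate the statistic, the following sketch recomputes it from the sums of squares in the Analysis 1 analysis of variance table in the Results section. The critical value F_{α,k,n-p} is not computed here; it would be read from an F table or statistical software.

```python
# Sums of squares from the Analysis 1 ANOVA table in the Results section.
ssr = 10.8987   # regression sum of squares
sse = 6.3280    # residual (error) sum of squares
k = 4           # number of regressors
n = 403         # number of observations
p = k + 1       # number of parameters including the intercept

msr = ssr / k          # regression mean square
mse = sse / (n - p)    # residual mean square
f0 = msr / mse
print(round(f0, 2))
```

The result matches the F value of 171.37 in the Minitab output, up to rounding of the printed sums of squares.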
R2 and Adjusted R2
The coefficient R2, also called the coefficient of multiple determination, is an important measure
of model fit. R2 measures the percentage of the variability in the data explained by the
regression model, in other words how well the chosen variables predict the response. R2
can be expressed as:
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
This measure can sometimes be unsuitable because its value cannot decrease when a variable is
added, even if the variable does not predict the response. For this reason R2adj becomes a viable
alternative:
$$R^2_{adj} = 1 - \frac{SSE/(n - p)}{SST/(n - 1)}$$
In the above formula the denominator is a constant while the numerator is the residual
mean square. R2adj will only increase when a variable is added to the model if the new variable
reduces the error mean square. Keeping track of R2adj helps prevent overfitting, that is,
including non-significant independent variables (Montgomery and Runger 12.2.1).
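Both measures can be recomputed from an analysis of variance table. The sketch below applies the two formulas to the SSE, SST and sample sizes printed in the four Minitab outputs of the Results section; the helper names and the dictionary layout are illustrative choices.

```python
def r_squared(sse, sst):
    """Coefficient of multiple determination, 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R2, 1 - [SSE/(n-p)] / [SST/(n-1)]."""
    return 1 - (sse / (n - p)) / (sst / (n - 1))


# (SSE, SST, n) from the ANOVA tables in the Results section; p = 5 throughout.
anova = {
    "Analysis 1": (6.3280, 17.2267, 403),
    "Analysis 2": (4.7702, 14.7251, 386),
    "Analysis 3": (3.14249, 4.63114, 204),
    "Analysis 4": (2.86157, 4.10954, 199),
}

results = {}
for name, (sse, sst, n) in anova.items():
    r2 = 100 * r_squared(sse, sst)
    r2_adj = 100 * adjusted_r_squared(sse, sst, n, 5)
    results[name] = (r2, r2_adj)
    print(f"{name}: R-Sq = {r2:.1f}%  R-Sq(adj) = {r2_adj:.1f}%")
```

The computed percentages reproduce the R-Sq and R-Sq(adj) values reported by Minitab for each analysis.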
Residual Analysis
Residuals, defined by the formula ei = yi - ŷi, are used to test model adequacy. If we plot the
residuals against variables that are not part of the model but seem possible candidates, it is
possible to see whether adding the new variable would improve the model.
The standardized versions
$$d_i = \frac{e_i}{\sqrt{MSE}} = \frac{e_i}{\sqrt{\hat{\sigma}^2}}$$
are used to judge the magnitude of the residuals on a uniform scale. The standardized residuals
are mostly adopted because they are scaled in a way that their standard deviation is
approximately unity (Montgomery and Runger 12.5.1).
Influential Observations
Very often in a multiple linear regression analysis some observations lie far away from the
rest of the data points, and they greatly influence the assessment of values such as R2, the
regression coefficients and the magnitude of the error mean square. To determine whether these
observations are coherent with the model and the rest of the data we use the distance measure
developed by Dennis R. Cook:
$$D_i = \frac{(\hat{B}_{(i)} - \hat{B})' X'X (\hat{B}_{(i)} - \hat{B})}{p \hat{\sigma}^2}, \quad i = 1, 2, \ldots, n$$
which measures the squared distance between the least squares estimate \hat{B} of the
coefficient vector based on all n observations and the estimate \hat{B}_{(i)} obtained when the
ith point is removed; X is the n × p matrix of regressor values, including a column of ones
for the intercept. If the Di obtained is large then the point is influential: it greatly
changes the values of the regression coefficients, R2 and the error mean square, and for this
reason it is preferable to also build a model which excludes such points (Montgomery and
Runger 12.5.2).
Selection Of Variables in Model Building
Selection of variables is the first issue the analyst encounters. The aim is to include enough
variables to predict the response to a considerable degree; however, we also aim to have few
variables to keep the analysis simple. Reaching a compromise between these two goals means
finding the best subset of regressor variables. There are several ways to select variables.
One way of doing so is by comparing R2adj. By this criterion we should increase the number of
variables until the increase in R2adj becomes negligible. The model that has the greatest R2adj
can be considered the best one to select; indeed, it also minimizes the mean square error.
Another way of selecting variables is by calculating the Cp statistic, which estimates the total
standardized mean square error of the model:
$$\Gamma_p = \frac{1}{\sigma^2} \Big\{ \sum_{i=1}^{n} [E(\hat{Y}_i) - E(Y_i)]^2 + \sum_{i=1}^{n} V(\hat{Y}_i) \Big\}$$
The Cp statistic estimates Γp by
$$C_p = \frac{SSE(p)}{\hat{\sigma}^2} - n + 2p$$
where SSE(p) is the residual sum of squares of a candidate model with p parameters and
\hat{\sigma}^2 is the error variance estimated from the full model.
If there is an almost null bias in the p-term model, the following equation will be true:
$$E(C_p \mid \text{zero bias}) = p$$
If the bias in the model is almost equal to zero then Cp can be expected to be close to p. If
there is bias, then Cp is greater than p. To evaluate which is the more suitable model we should
select the model with minimum Cp or a value of Cp slightly higher than the minimum
(Montgomery and Runger 12.6.3).
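The Cp criterion can be illustrated with numbers from the Analysis 1 output in the Results section. This is a sketch under one assumption worth stating: because CO2 is the first term entered, its sequential sum of squares (9.7668) is taken as the regression sum of squares of a model containing CO2 alone, so that model's SSE is SST minus 9.7668. The function name `mallows_cp` is introduced only for this example.

```python
def mallows_cp(sse_p, sigma2_hat, n, p):
    """Cp = SSE(p) / sigma2_hat - n + 2p for a candidate model with p parameters."""
    return sse_p / sigma2_hat - n + 2 * p


# Full model of Analysis 1: intercept + 4 regressors.
n = 403
sse_full, p_full = 6.3280, 5
sigma2_hat = sse_full / (n - p_full)   # error variance from the full model

# For the full model Cp equals p by construction.
cp_full = mallows_cp(sse_full, sigma2_hat, n, p_full)

# CO2-only model (intercept + CO2): SSE = SST - Seq SS of CO2 (assumption above).
sse_co2 = 17.2267 - 9.7668
cp_co2 = mallows_cp(sse_co2, sigma2_hat, n, 2)

print(round(cp_full, 2), round(cp_co2, 2))
```

A Cp far above p for the CO2-only model would point to bias from the omitted regressors, in line with the criterion described above.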
Results and Discussion
In this section we will apply the multiple linear regression described in the Methodology section
to climate data. Below is a sample of the data used for our regression analysis. The first column
after the year contains the temperature anomalies, which are our response variable. The other
columns, following to the right, contain CO2 in ppm, the Southern Oscillation Index, the
Transmission of Direct Solar Radiation or Transparency as a fraction, and TSI in Watts/m^2, and
represent the independent variables and predictors of temperature anomalies. The data in each
column are monthly averages from January 1979 to July 2012 (403 months, matching the 402 total
degrees of freedom in the first analysis). Notice that this table shows only the initial
portion of the data, for informative purposes.
Year  TempAnom  CO2     SOI_actual  Transp  TSI
1979  0.14      336.21  -0.7        0.9327  1366.285276
      -0.09     336.65  1.6         0.9315  1366.323444
      0.19      338.13  0.2         0.9258  1366.985539
      0.13      338.94  -0.2        0.922   1366.6024
      0.06      339     0.8         0.92    1366.752968
      0.14      339.2   1.1         0.9275  1366.425696
      0.03      337.6   2.2         0.9303  1366.769865
      0.14      335.56  -0.3        0.931   1366.04647
      0.27      333.93  0.2         0.933   1366.175461
      0.25      334.12  -0.2        0.9294  1366.423778
      0.29      335.26  -0.7        0.9261  1366.461964
      0.47      336.78  -1.2        0.9323  1366.859909
1980  0.3       337.8   0.7         0.9337  1366.626291
      0.42      338.28  0.5         0.9339  1366.280895
      0.29      340.04  -0.7        0.9308  1366.962183
      0.32      340.86  -1          0.9257  1366.371066
      0.34      341.47  0           0.923   1366.425257
      0.16      341.26  0           0.9277  1366.72281
      0.28      339.34  -0.1        0.929   1366.7101
      0.24      337.45  0.6         0.9315  1366.767929
      0.21      336.1   -0.8        0.928   1366.596317
      0.2       336.05  0           0.9309  1366.597023
      0.29      337.21  -0.6        0.9309  1366.226588
      0.21      338.29  -0.1        0.9309  1366.794943
Analysis 1
The regression analysis was run on the original data from January 1979 to July 2012
without discarding the outliers. Below are the results obtained:
Regression Analysis: TempAnom versus CO2, SOI_actual, Transp, TSI

The regression equation is
TempAnom = - 107 + 0.0102 CO2 - 0.0253 SO_actual + 1.66 Transp + 0.0752 TSI

Predictor   Coef        SE Coef     T       P
Constant    -107.45     21.29       -5.05   0.000
CO2         0.0101502   0.0004296   23.62   0.000
SO_actual   -0.025326   0.004120    -6.15   0.000
Transp      1.6639      0.3027      5.50    0.000
TSI         0.07515     0.01554     4.84    0.000

S = 0.126093   R-Sq = 63.3%   R-Sq(adj) = 62.9%

Analysis of Variance
Source           DF    SS        MS       F        P
Regression       4     10.8987   2.7247   171.37   0.000
Residual Error   398   6.3280    0.0159
Total            402   17.2267

Source      DF   Seq SS
CO2         1    9.7668
SO_actual   1    0.3122
Transp      1    0.4477
TSI         1    0.3720
From this first analysis we obtained the following regression coefficients
B0=-107
B1=0.0102
B2=-0.0253
B3=1.66
B4=0.0752
These values were calculated by the software using the Least Squares Method.
B0=-107 is the temperature anomaly the model would predict if all four variables were equal to
zero. Such values lie far outside the observed data (TSI, for example, stays near 1366
Watts/m^2), so the intercept has no direct physical interpretation.
B1=0.0102 indicates an increase of 0.0102 degrees Celsius for each increase of 1 ppm of CO2.
The fact that B1 is positive is reasonable, since CO2 is a greenhouse gas and we expect
temperatures to rise as CO2 increases in the atmosphere.
B2=-0.0253 indicates a decrease of 0.0253 degrees Celsius for each increase of 1 unit of the
Southern Oscillation Index. The negative slope in this case is also consistent with expectation:
negative SOI values accompany El Nino conditions, when the warm equatorial Pacific raises
monthly average surface temperatures, while positive values accompany the cooler La Nina phase.
B3=1.66 indicates an increase of 1.66 degrees Celsius for an increase of one unit of
transparency. Since transparency is recorded as a fraction close to 0.93, a more practical
reading is an increase of about 0.0166 degrees Celsius per 0.01 increase in transparency. Also
in this case the sign of the coefficient agrees with expectation. Indeed, transparency is an
indicator of the amount of light that reaches the surface of the Earth; therefore, as
transparency increases, more light reaches the Earth, resulting in higher temperatures.
B4=0.0752 indicates an increase of 0.0752 degrees Celsius for each unit increase of total solar
irradiance. The positive slope meets the expectation that temperature anomalies increase
if there is an increase in the solar irradiance reaching the atmosphere.
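To show how these coefficients combine into a prediction, the sketch below evaluates the Analysis 1 regression equation for the first row of the data sample (January 1979). It uses the coefficients as printed in the Minitab coefficient table; since those are rounded, the fitted value is only approximate. The function name `predict_anomaly` is hypothetical.

```python
# Coefficients as printed in the Analysis 1 coefficient table (rounded by Minitab).
COEF = {
    "Constant": -107.45,
    "CO2": 0.0101502,
    "SO_actual": -0.025326,
    "Transp": 1.6639,
    "TSI": 0.07515,
}


def predict_anomaly(co2, soi, transp, tsi, coef=COEF):
    """Fitted temperature anomaly (degrees Celsius) from the regression equation."""
    return (coef["Constant"]
            + coef["CO2"] * co2
            + coef["SO_actual"] * soi
            + coef["Transp"] * transp
            + coef["TSI"] * tsi)


# First row of the data sample (January 1979); observed TempAnom is 0.14.
fitted = predict_anomaly(co2=336.21, soi=-0.7, transp=0.9327, tsi=1366.285276)
residual = 0.14 - fitted
print(round(fitted, 3), round(residual, 3))
```

With these rounded coefficients the fitted anomaly comes out near 0.21 degrees Celsius, so the residual for this month is about -0.07, well within one standard error S = 0.126 of the fit.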
The p values of each and every variable are reported as 0.000, which leads us to reject the null
hypothesis, that is, that the variables do not predict the temperature anomalies. We can say with
high confidence that all of the variables chosen significantly predict the response.
The R2 for this regression analysis was found to be 63.3 percent, which is acceptably high.
We would like to run the regression analysis again after having eliminated all of the outliers to
verify whether R2 increases. Below are the results of the model after having removed from the
dataset all of the data points that were too far from the rest of the data points. To verify
whether the points are influential or not, Minitab uses Cook’s distance, introduced in the
Influential Observations section.
Analysis 2
Regression Analysis: TempAnom versus CO2, SO_actual, Transp, TSI

The regression equation is
TempAnom = - 96.8 + 0.00985 CO2 - 0.0223 SO_actual + 1.51 Transp + 0.0675 TSI

Predictor   Coef        SE Coef     T       P
Constant    -96.80      19.46       -4.97   0.000
CO2         0.0098489   0.0003877   25.40   0.000
SO_actual   -0.022262   0.003734    -5.96   0.000
Transp      1.5126      0.2707      5.59    0.000
TSI         0.06754     0.01420     4.76    0.000

S = 0.111894   R-Sq = 67.6%   R-Sq(adj) = 67.3%

Analysis of Variance
Source           DF    SS        MS       F        P
Regression       4     9.9549    2.4887   198.78   0.000
Residual Error   381   4.7702    0.0125
Total            385   14.7251

Source      DF   Seq SS
CO2         1    9.0893
SO_actual   1    0.2176
Transp      1    0.3647
TSI         1    0.2834
In the new model R2 is equal to 67.6%, which is greater than the R2 obtained for the previous
model. Therefore, by eliminating the outliers we were able to eliminate those data points that
might have been off because of a measurement error. The regression equation has changed somewhat
but the signs have not; therefore the direction of the change in the response as a consequence
of a change of one unit in one of the variables stays the same. The new regression coefficients are:
B0=-96.8
B1=0.00985
B2=-0.0223
B3=1.51
B4=0.0675
The 95% confidence intervals for these coefficients are:
-135.40864 < B0 < -58.19136
0.00908 < B1 < 0.01062
-0.02971 < B2 < -0.01489
0.97293 < B3 < 2.04707
0.03933 < B4 < 0.09567
P-values for the regression coefficients continue to be reported as 0.000.
The change from the original regression is modest: every coefficient of the initial model
(-107, 0.0102, -0.0253, 1.66 and 0.0752) still lies within the corresponding confidence interval.
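The intervals above follow the usual form "coefficient ± t × SE Coef". The sketch below reproduces them from the rounded coefficients of the regression equation and the SE Coef column of the Analysis 2 output. The multiplier 1.984 is an assumption: it is back-calculated from the reported intervals rather than stated in the text, and the tabulated t value for 381 residual degrees of freedom is somewhat smaller (about 1.97), which would make the intervals slightly narrower.

```python
T_CRIT = 1.984  # assumption: multiplier implied by the reported intervals

# (rounded coefficient from the regression equation, SE Coef) for Analysis 2.
estimates = {
    "B0": (-96.8, 19.46),
    "B1": (0.00985, 0.0003877),
    "B2": (-0.0223, 0.003734),
    "B3": (1.51, 0.2707),
    "B4": (0.0675, 0.01420),
}


def confidence_interval(coef, se, t_crit=T_CRIT):
    """Two-sided interval coef ± t_crit * se."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


intervals = {name: confidence_interval(c, se) for name, (c, se) in estimates.items()}
for name, (lo, hi) in intervals.items():
    print(f"{lo:.5f} < {name} < {hi:.5f}")
```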
Analysis 3
We will now run the model on the first half of the previously examined time period, that is,
from January 1979 to December 1995, and observe how the model changes and whether it is
consistent with the model we just obtained from Analysis 2 without the outliers.
Regression Analysis: TempAnom versus CO2, SO_actual, Transp, TSI

The regression equation is
TempAnom = - 176 + 0.00935 CO2 - 0.0243 SO_actual + 1.53 Transp + 0.126 TSI

Predictor   Coef        SE Coef    T       P
Constant    -175.90     30.83      -5.71   0.000
CO2         0.009353    0.001211   7.72    0.000
SO_actual   -0.024252   0.006420   -3.78   0.000
Transp      1.5270      0.3300     4.63    0.000
TSI         0.12555     0.02245    5.59    0.000

S = 0.125664   R-Sq = 32.1%   R-Sq(adj) = 30.8%

Analysis of Variance
Source           DF    SS        MS        F       P
Regression       4     1.48865   0.37216   23.57   0.000
Residual Error   199   3.14249   0.01579
Total            203   4.63114

Source      DF   Seq SS
CO2         1    0.70667
SO_actual   1    0.05032
Transp      1    0.23786
TSI         1    0.49379
Comparing the model in Analysis 2 with the present one,
TempAnom = - 176 + 0.00935 CO2 - 0.0243 SO_actual + 1.53 Transp + 0.126 TSI
the constant (-176) and the TSI coefficient (0.126) are the coefficients that do not fit the
95% CI calculated for the coefficients of the model in Analysis 2.
The model obtained here doesn’t change drastically from the one in Analysis 2: all
coefficients but B0 and B4 fit in the 95% confidence intervals calculated for the regression
coefficients in the model of Analysis 2. Therefore, the model found for the first half of the
period under examination is consistent with the model in Analysis 2 for the variables CO2,
SO_actual and Transp. This means that only the coefficient for TSI isn’t very reliable,
since it neither stays constant nor varies within the confidence interval in the first half of
the original time period. This might also be an indicator that the dependence on TSI is nonlinear.
R2, however, is 32.1%, which is less than half of the one in Analysis 2. This can be explained
by the fact that there is more variability in the whole data set (SST is larger) than in half
of it. Since R2 is equal to 1-(SSE/SST), and SSE shrinks far less than SST when moving from the
whole data set to half of it (4.7702 to 3.14249, against 14.7251 to 4.63114), R2 is larger for
the whole data set than for half of it. Furthermore, the full original data set
captures well the influence of volcanoes while the half data sets do not.
In the next section we will analyze the second half of the full time period, from
January 1996 to July 2012.
Analysis 4
Regression Analysis: TempAnom versus CO2, SO_actual, Transp, TSI

The regression equation is
TempAnom = - 16.7 + 0.00832 CO2 - 0.0260 SO_actual + 5.27 Transp + 0.0068 TSI

Predictor   Coef        SE Coef    T       P
Constant    -16.68      29.77      -0.56   0.576
CO2         0.008324    0.001035   8.04    0.000
SO_actual   -0.025953   0.005260   -4.93   0.000
Transp      5.274       2.187      2.41    0.017
TSI         0.00675     0.02182    0.31    0.757

S = 0.121451   R-Sq = 30.4%   R-Sq(adj) = 28.9%

Analysis of Variance
Source           DF    SS        MS        F       P
Regression       4     1.24797   0.31199   21.15   0.000
Residual Error   194   2.86157   0.01475
Total            198   4.10954

Source      DF   Seq SS
CO2         1    0.80939
SO_actual   1    0.34688
Transp      1    0.09029
TSI         1    0.00141
Here below is the regression equation for this regression analysis:
TempAnom = - 16.7 + 0.00832 CO2 - 0.0260 SO_actual + 5.27 Transp + 0.0068 TSI
In the model obtained in this analysis, 4 of the 5 regression coefficients (all but B2) do not
fit the 95% confidence intervals of the model found for the full data set. In addition, the
constant and the TSI coefficient are no longer statistically significant (P = 0.576 and
P = 0.757). This means that the model as a whole is inconsistent with the original. This
mismatch with the model in Analysis 2 can be explained by the absence of major volcanic
activity in the second half of the time period. Because volcanoes (related to the
transparency variable) are not very active in this half (there are no major volcanic events),
the regression model is very different both from the model obtained for the full dataset and
from the one for the half containing all of the volcanic activity. This model is not very
representative of the real relationship and therefore has to be discarded.
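The consistency check used for both half-period models amounts to testing whether each coefficient falls inside the 95% confidence interval reported for Analysis 2. The sketch below applies that test to the rounded coefficients of the two regression equations; the function name `outside` is hypothetical.

```python
# 95% confidence intervals reported for the Analysis 2 coefficients.
ci_analysis2 = {
    "B0": (-135.40864, -58.19136),
    "B1": (0.00908, 0.01062),
    "B2": (-0.02971, -0.01489),
    "B3": (0.97293, 2.04707),
    "B4": (0.03933, 0.09567),
}

# Rounded coefficients from the regression equations of Analysis 3 and Analysis 4.
first_half = {"B0": -176, "B1": 0.00935, "B2": -0.0243, "B3": 1.53, "B4": 0.126}
second_half = {"B0": -16.7, "B1": 0.00832, "B2": -0.0260, "B3": 5.27, "B4": 0.0068}


def outside(coefs, intervals):
    """Names of the coefficients that fall outside their reference interval."""
    return [name for name, value in coefs.items()
            if not intervals[name][0] <= value <= intervals[name][1]]


print(outside(first_half, ci_analysis2))   # ['B0', 'B4']
print(outside(second_half, ci_analysis2))  # ['B0', 'B1', 'B3', 'B4']
```

This comparison treats the half-period estimates as exact. A stricter check would also take their own standard errors into account, which are larger than those of the full data set.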
An even lower R2 than that of the first half is indicative of the inadequacy of the model. R2
shows that only 30.4 percent of the variability in the data is captured by the model.
It is important to consider that volcanic activity not only has an immediate effect on global
temperatures: its effects can also be felt with a 2-3 year delay (“Volcanoes”). Indeed, the
particles of H2SO4 can be trapped in the stratosphere for long periods of time, and the effects
on the Transmission of Direct Solar Radiation or Transparency can manifest after the volcanic
activity is no longer occurring. A much more realistic model therefore has to include time
delays through lagged values. If we analyzed the data with a vector autoregressive model, we
would expect to obtain an R2 close to 100%. Since the multiple linear regression model does not
take time delays into account, we could only reach an R2 as high as 67.6%.
Conclusion
In this statistical analysis we have considered both naturally occurring phenomena and those
affected by human activity to predict temperature increase. Although both types of
variables strongly influence global temperature, there is clearly a strong correlation between
the CO2 emissions produced with the intensification of industrialization and global warming.
Therefore, even if only in part, humanity does have an impact on climate and consequently on
ecosystems. If Earth is naturally increasing in temperature, then it is doing so very slowly. By
increasing the emission of greenhouse gases, human action has sped up this process and the
Earth is warming at remarkable rates. Clearly, correlation between two factors is
not causation; however, the relationship between CO2 and temperature increase in this study is an
indicator that cannot be ignored: global temperatures are steadily increasing as CO2 levels
increase. Given that from 2005 to 2014 there has been an average CO2 growth rate of 2.11 ppm
per year, using the multiple linear regression model in Analysis 2 we can predict a yearly
increase of 0.02078 degrees Celsius if the CO2 growth rate stays constant. This means
that over a time interval of 100 years global temperatures will increase by 2.078 degrees Celsius,
and this is likely a conservative estimate given that the CO2 concentration growth rate is
accelerating (“CO2 Acceleration”). Even if the global warming debate is ongoing and the human
influence in this warming process is not exactly understood, we should adopt preventive
measures and implement more laws to avoid producing more greenhouse gases than the amount
the atmosphere is able to absorb and disintegrate. Given the strong likelihood that we are
causing our planet to change drastically through our exploitative approach to its resources, we
should take a step back to evaluate the consequences of our lifestyle on the rest of the
ecosystem and on our aspirations for the future of our species.
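The projection in the conclusion is a product of the Analysis 2 CO2 coefficient and the cited growth rate. The sketch below reproduces that arithmetic and, as an added illustration not made in the text, carries the 95% confidence interval of the coefficient through the same calculation. It keeps the text's assumption of a constant growth rate and of all other variables held fixed.

```python
co2_coef = 0.00985                 # degrees Celsius per ppm (Analysis 2)
co2_coef_ci = (0.00908, 0.01062)   # 95% confidence interval reported for B1
co2_growth = 2.11                  # ppm per year, 2005-2014 average cited in the text

per_year = co2_coef * co2_growth   # warming per year attributed to CO2
per_century = 100 * per_year       # same rate sustained for 100 years

# Range implied by the confidence interval of the coefficient (illustrative addition).
century_range = tuple(100 * c * co2_growth for c in co2_coef_ci)

print(round(per_year, 5), round(per_century, 3))
print(tuple(round(v, 2) for v in century_range))
```

The point estimates agree with the 0.02078 and 2.078 degrees Celsius quoted above; the interval calculation gives roughly 1.9 to 2.2 degrees Celsius per century under the same assumptions.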
Works Cited
Montgomery, Douglas C., and George C. Runger. Applied Statistics and Probability for
Engineers. 5th ed. United States: John Wiley & Sons, 2011. Print.
Devore, Jay L. Probability and Statistics for Engineering and the Sciences. 8th ed. Boston:
Brooks/Cole, Cengage Learning, 2012. Print.
Witt, Gary. "Using Data from Climate Science to Teach Introductory Statistics." Journal of
Statistics Education 21.1 (2013): 1-24. Amstat. 2013. Web. 15 Dec. 2016.
Nave, R. "Greenhouse Effect." The Greenhouse Effect. Hyperphysics, n.d. Web. 17 Dec. 2016.
"Climate Change ProCon.org." ProConorg Headlines. ProCon, 28 June 2016. Web. 17 Dec.
2016.
Zell, Holly. "Solar Irradiance." NASA. NASA, 1 Jan. 2008. Web. 17 Dec. 2016.
"Volcanoes and Global Climate Change." Exploring the Environment. N.p., n.d. Web. 16 Jan.
2016.
"Global Patterns - El Niño-Southern Oscillation (ENSO) | State Climate Office of North
Carolina." Global Patterns - El Niño-Southern Oscillation (ENSO) | State Climate Office
of North Carolina. N.p., n.d. Web. 17 Dec. 2016.
"El Niño/Southern Oscillation (ENSO) Technical Discussion." El Nino/Southern Oscillation
(ENSO) Technical Discussion | Teleconnections | National Centers for Environmental
Information (NCEI). NOAA, n.d. Web. 17 Dec. 2016.
Friedrich, Johannes, and Thomas Damassa. "The History of Carbon Dioxide Emissions." The
History of Carbon Dioxide Emissions | World Resources Institute. World Resources
Institute, 21 May 2014. Web. 20 Dec. 2016.
"Global Surface Temperature Anomalies." Global Surface Temperature Anomalies | Monitoring
References | National Centers for Environmental Information (NCEI). NOAA, n.d. Web.
24 Dec. 2016.
"CO2 Acceleration." CO2.Earth. CO2.Earth, n.d. Web. 20 Dec. 2016.