

Analysis of Oil Spill Trends in the Niger Delta, Nigeria

Nkem OMEDE and Mamoru ISHIKAWA (Graduate School of Environmental Science)

Since the discovery of crude oil deposits and the start of petroleum production in the 1950s, hundreds of oil spills have released millions of barrels of crude oil and refined oil products into the country's environment.
・Since its establishment in 2006, the National Oil Spill Detection and Response Agency (NOSDRA) has maintained an extensive database of these spills.
・This database holds vital information about each spill, such as its volume, location, cause, habitat and type of contaminant.

【Impacts of oil spills in Nigeria】
・Denudation of mangroves (UNEP Report, 2012)
・Species migration and/or replacement by invasive species resistant to the spills (UNEP Report, 2012)
・Sediments become reservoirs for continuous contamination
・Possibility of cancer and neurotoxicity (Aguilera et al., 2010)

Data Description (NOSDRA OSM)

The database contains over 15,000 rows (oil spill incidents) from 2005 to date and 41 columns.

Each row has a unique identifier (id) and records the spill information by date of occurrence.

Some of the most important columns are: id, status, company, incident date, contaminant, estimated quantity, spill area habitat, cause, latitude, longitude, states affected, formadate, type of facility.

【Niger Delta, Nigeria】
・The Niger Delta is presently facing serious oil spill problems.
・Reliable forecasts are needed to help in making adequate preparations in the event of an oil spill.
・This research aims to provide these forecasts using machine learning and deep learning.
・Extensive data transformation is also needed to make the forecast results reliable.

Discussion

The original database is panel-type data, but it can be reduced to a time series using the pandas groupby method in Python. We produced time series aggregated at daily, weekly, monthly, quarterly and yearly frequency.
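The reduction from one row per spill to one row per period can be sketched as follows. This is a minimal illustration, not the code used for the poster: the miniature table, the column names `incident_date` and `estimated_quantity`, and the output names `spillno` and `estimatedqty` are assumptions chosen to mirror the series names used in the text.

```python
import pandas as pd

# Hypothetical miniature of the spill table; the real OSM export has 41 columns
# and its exact column headers may differ from the names used here.
spills = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "incident_date": pd.to_datetime(
            ["2019-01-01", "2019-01-01", "2019-01-03", "2019-02-10", "2019-04-02"]
        ),
        "estimated_quantity": [10.0, 5.0, 2.5, 40.0, 1.0],
    }
)

# One row per calendar day: number of spills and total estimated barrels.
daily = (
    spills.groupby("incident_date")
    .agg(spillno=("id", "count"), estimatedqty=("estimated_quantity", "sum"))
    .asfreq("D", fill_value=0)  # days without a reported spill become zeros
)

# Coarser series are sums of the daily series over each period.
weekly = daily.resample("W").sum()
monthly = daily.resample("MS").sum()
quarterly = daily.resample("QS").sum()
yearly = daily.resample("YS").sum()

print(monthly)
```

Filling days without a report with zero is itself a modelling choice: it treats "no record" as "no spill", which may not hold if reporting is incomplete.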

Although the SARIMA model used for this estimation has a rather high degree of error, it suggests that oil spill occurrences might decrease in the area in the coming years. Given the high error, however, no reliable conclusions can be drawn from this analysis. Other researchers have shown that baseline models can produce forecasts on par with those of more complicated models. We believe that with proper and adequate data transformation we will be able to produce reliable forecasts from this oil spill monitor.

Future Plan

Data Transformation
i.) Box-Cox. Jenkins.
ii.) Log transformation
iii.) Modeling, fitting and forecast

Collation, interpretation and presentation of results
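As a rough illustration of why a log transformation is a candidate for the estimated-quantity series, the sketch below compresses a handful of values that span the same range as the daily statistics (from 0 up to roughly 55,000 barrels). The values are illustrative, not actual records. Because the series contains zeros, `log(1 + x)` is used here instead of a plain logarithm; that substitution is an assumption of this sketch, and a Box-Cox transformation would likewise need strictly positive inputs.

```python
import numpy as np

# Illustrative daily quantities in barrels (not real OSM records): mostly
# small values with one extreme outlier.
daily_barrels = np.array([0.0, 3.0, 30.0, 0.0, 55007.0, 12.0])

# log1p(x) = log(1 + x) is defined at zero and compresses large values.
transformed = np.log1p(daily_barrels)

# expm1 is the exact inverse, so forecasts made on the transformed scale
# can be mapped back to barrels.
restored = np.expm1(transformed)

print(transformed.round(3))
```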

Reference
・Jason Brownlee (2018): Predict the Future with MLPs, CNNs and LSTMs in Python.
・Matthew Moocarme, Mahla Abdolahnejad & Ritesh Bhagwat (2019): The Deep Learning with Keras Workshop.
・Andreas C. Müller & Sarah Guido (2016): Introduction to Machine Learning with Python: A Guide for Data Scientists.

Objectives

Investigate trends in the oil spills recorded in NOSDRA's Oil Spill Monitor (OSM) and produce a 3-period forecast using machine learning and deep learning methods.

Classes of models
・Baseline (persistence and average)
・Autoregression (SARIMA)
・Exponential smoothing (single, double and triple)
・Linear machine learning (regression)
・Nonlinear machine learning (decision trees, support vector regression)
・Ensemble machine learning (random forest, gradient boosting)
・Deep learning (CNN, LSTMs, hybrid)
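The two baseline models named under "Classes of models" (persistence and average) are simple enough to sketch directly. The function names and the example values below are hypothetical; the 3-period horizon follows the stated objective.

```python
from statistics import mean


def persistence_forecast(history, steps=3):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * steps


def average_forecast(history, steps=3, window=None):
    """Repeat the mean of the last `window` observations (all of them if None)."""
    recent = history if window is None else history[-window:]
    return [mean(recent)] * steps


# Made-up yearly spill counts, used only to show the call pattern.
yearly_spills = [600, 650, 700, 640]
print(persistence_forecast(yearly_spills))
print(average_forecast(yearly_spills, window=2))
```

Baselines like these give a reference score: a more complex model is only worth keeping if it beats them under the same evaluation.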

Score analysis for the different time series

[Figure: 3-period forecasting result for spill numbers (frequency of spills) using SARIMA]

Data collection & wrangling flowchart
・Scrape raw data. Load and organize the data into a pandas DataFrame.
・Preliminary data reconnaissance, e.g. shape, describe, count, tail
・Data relocation and fixing of typos
・Remove irrelevant columns and rows, e.g. reviewed, invalid, confirmed
・Downsampling (aggregation) of data to weeks, months, quarters, years
・Changing of data types: datetime, numeric
・Preparation of data for machine learning
・Deal with missing data
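A few of the wrangling steps listed above (type conversion, row removal, missing data) might look like this in pandas. The sample rows, the column names, and the rule of dropping rows whose status is "invalid" are assumptions made for illustration; the poster does not specify which rows or columns were actually removed.

```python
import pandas as pd

# Hypothetical raw rows as they might arrive from scraping: everything is text.
raw = pd.DataFrame(
    {
        "id": [101, 102, 103, 104],
        "status": ["confirmed", "invalid", "reviewed", "confirmed"],
        "incident_date": ["2015-03-01", "2015-03-04", "not recorded", "2015-04-11"],
        "estimated_quantity": ["12.5", "3", "", "1,200"],
    }
)

clean = raw.copy()

# Change data types; values that cannot be parsed become NaT / NaN.
clean["incident_date"] = pd.to_datetime(clean["incident_date"], errors="coerce")
clean["estimated_quantity"] = pd.to_numeric(
    clean["estimated_quantity"].str.replace(",", "", regex=False), errors="coerce"
)

# Remove rows judged irrelevant (assumed rule: status marked "invalid").
clean = clean[clean["status"] != "invalid"]

# Deal with missing data: a spill without a usable date cannot be placed
# on the time axis, so it is dropped here.
clean = clean.dropna(subset=["incident_date"])

print(clean.dtypes)
print(clean)
```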

Estimated Quantity (barrels), 2005 - 2019 (D = daily, W = weekly, M = monthly, Q = quarterly, Y = yearly)

             D       W       M       Q       Y
count    5,475     783     186      60      15
mean       128     899   3,848  11,742  46,971
std      1,148   3,021   6,343  11,040  24,239
min          0       0    0.03     103   1,843
25%          0      74   1,244   5,682  32,874
50%          3     270   2,312   9,428  40,527
75%         30     767   3,822  14,545  62,577
max     55,007  55,020  57,005  59,316  96,847

Spill number, 2005 - 2019 (D = daily, W = weekly, M = monthly, Q = quarterly, Y = yearly)

             D       W       M       Q       Y
count    5,475     783     186      60      15
mean         2      16      69     210     841
std          2       9      35      99     379
min          0       0       1      27     138
25%          1      10      46     150     632
50%          2      15      66     207     853
75%          3      21      86     255   1,003
max         39      53     186     491   1,562

Forecasting Methodology

Design Test Harness

Data Transformation

Finalize Model

Fit Model and make forecast

Test Models
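The poster does not describe its test harness. One common design for time series, assumed here, is walk-forward validation: each test point is forecast using only the observations before it, and the errors are summarised as a root mean squared error (RMSE). The function names and the series are hypothetical.

```python
from math import sqrt


def walk_forward_rmse(series, forecaster, n_test):
    """Score one-step-ahead forecasts over the last n_test observations."""
    squared_errors = []
    for i in range(len(series) - n_test, len(series)):
        history = series[:i]              # only data available before step i
        prediction = forecaster(history)  # one-step-ahead forecast
        squared_errors.append((series[i] - prediction) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))


def persistence(history):
    """Baseline forecaster: tomorrow equals today."""
    return history[-1]


# Made-up series used only to exercise the harness.
series = [10, 12, 11, 15, 14, 13]
score = walk_forward_rmse(series, persistence, n_test=3)
print(f"persistence RMSE: {score:.3f}")
```

Any candidate model can be plugged in as `forecaster`, so all model classes are compared on identical splits.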

Data Transformation (Differencing)

Stationarity was checked using the Augmented Dickey-Fuller (ADF) test.

Both series (spillno, estimatedqty) have p-values (0.23777 and 0.072006 respectively) greater than 0.05 (the significance level), so we cannot reject the null hypothesis that the series has a unit root, i.e. both are treated as non-stationary.

The series were then differenced and their autocorrelation plots examined.

One of the series (spillno) reached stationarity after first differencing. The other series (estimatedqty) did not attain stationarity even after three rounds of differencing. This may be the result of the many outliers in the original series, so a special data transformation technique will need to be applied to it.

For the spillno series, however, I will go ahead and make predictions using SARIMA (Seasonal Autoregressive Integrated Moving Average).