GDG DevFest Seoul 2017: Codelab - Time Series Analysis for Kaggle using TensorFlow


Uploaded by: taegyun-jeon

Posted on 21-Jan-2018

Category: Engineering

TRANSCRIPT

Page 1: GDG DevFest Seoul 2017: Codelab - Time Series Analysis for Kaggle using TensorFlow

전태균, 전승현

Developers at Satrec Initiative

Taegyun Jeon and Seunghyun Jeon

Time Series Analysis: Coding It in TensorFlow and Taking On Kaggle

Page 2

Contents

Time Series Analysis

Introduction to Kaggle

KaggleZeroToAll

Page 3

When you finish this codelab, you will be able to:

1. Understand time series problems!
2. Solve problems on Kaggle!
3. Upload your own model to the Kaggle Leaderboard!

Page 4

Time Series Analysis

Page 5

Time Series Analysis

● Time Series Analysis

● Models for Time Series Analysis: AR, MA, ARMA, ARIMA, RNN

● TensorFlow TimeSeries API (TFTS)


Page 7

Time Series Analysis

Page 8

Time Series Data

Page 9

Time Series Data

● Stock values

● Economic variables

● Weather

● Sensor: Internet-of-Things

● Energy demand

● Signal processing

● Sales forecasting

Page 12

Problems

● Standard Supervised Learning

○ IID assumption

○ Same distribution for training and test data

○ Distributions fixed over time (stationarity)

● Time Series

○ None of these assumptions hold!!
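Because time series break these assumptions, a random shuffle split would let a model train on days that come after the days it is tested on. A minimal sketch of a chronological hold-out split, assuming the series is a NumPy array ordered by time (`time_ordered_split` is a hypothetical helper, not part of the codelab):

```python
import numpy as np


def time_ordered_split(values: np.ndarray, test_size: int):
    """Hypothetical helper: split a time-ordered series into (train, test) without shuffling.

    Every training observation precedes every test observation, so nothing
    from the test period leaks into training.
    """
    if not 0 < test_size < len(values):
        raise ValueError("test_size must be between 1 and len(values) - 1")
    return values[:-test_size], values[-test_size:]


series = np.arange(10.0)  # toy series: 0.0, 1.0, ..., 9.0
train, test = time_ordered_split(series, test_size=3)
print(train)  # [0. 1. 2. 3. 4. 5. 6.]
print(test)   # [7. 8. 9.]
```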

Page 13

Time Series Analysis

● Time Series Analysis

● Models for Time Series Analysis: AR, MA, ARMA, ARIMA, RNN

● TensorFlow TimeSeries API (TFTS)

Page 14

Autoregressive (AR) Models

● AR(p) model: a linear generative model based on the p-th order Markov assumption:

$$Y_t = \sum_{i=1}^{p} a_i Y_{t-i} + \epsilon_t$$

○ $\epsilon_t$: zero-mean uncorrelated random variables with variance $\sigma^2$

○ $a_1, \dots, a_p$: autoregressive coefficients

○ $Y_t$: observed stochastic process
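To make the AR(p) definition concrete, the sketch below simulates $Y_t = \sum_{i=1}^{p} a_i Y_{t-i} + \epsilon_t$ with Gaussian noise (an assumption; the definition only requires zero-mean uncorrelated noise) and recovers the coefficients by ordinary least squares on lagged values. The helpers `simulate_ar` and `fit_ar` are hypothetical names used only for illustration.

```python
import numpy as np


def simulate_ar(coeffs, n, sigma=1.0, seed=0):
    """Hypothetical helper: simulate Y_t = sum_i a_i * Y_{t-i} + eps_t, starting from zeros."""
    rng = np.random.default_rng(seed)
    p = len(coeffs)
    y = np.zeros(n + p)  # the first p entries are zero start-up values
    eps = rng.normal(0.0, sigma, size=n + p)
    for t in range(p, n + p):
        # y[t - p:t][::-1] is (Y_{t-1}, ..., Y_{t-p})
        y[t] = np.dot(coeffs, y[t - p:t][::-1]) + eps[t]
    return y[p:]


def fit_ar(y, p):
    """Hypothetical helper: estimate a_1..a_p by least squares on lagged values."""
    # Column i-1 holds Y_{t-i} for every target time t = p, ..., len(y) - 1
    X = np.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])
    target = y[p:]
    coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coeffs


true_coeffs = np.array([0.6, -0.2])  # a stationary AR(2)
y = simulate_ar(true_coeffs, n=5000)
est = fit_ar(y, p=2)
print(np.round(est, 2))  # close to the true coefficients
```

With a long simulated series the least-squares estimates land close to the coefficients used to generate the data.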

Page 15

Moving Average (MA)

● MA(q) model: a linear generative model for the noise term based on the q-th order Markov assumption:

$$Y_t = \epsilon_t + \sum_{j=1}^{q} b_j \epsilon_{t-j}$$

○ $b_1, \dots, b_q$: moving average coefficients
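A distinguishing property of an MA(q) process is that its autocorrelation vanishes beyond lag q, because $Y_t$ and $Y_{t-k}$ share no noise terms once $k > q$. The sketch simulates an MA(1) with Gaussian noise (an assumption) and compares sample autocorrelations; for $b_1 = 0.6$ the theoretical lag-1 value is $b_1 / (1 + b_1^2) \approx 0.441$. The helpers `simulate_ma` and `sample_acf` are hypothetical.

```python
import numpy as np


def simulate_ma(coeffs, n, sigma=1.0, seed=0):
    """Hypothetical helper: simulate Y_t = eps_t + sum_j b_j * eps_{t-j}."""
    rng = np.random.default_rng(seed)
    q = len(coeffs)
    eps = rng.normal(0.0, sigma, size=n + q)
    # Full convolution with the kernel (1, b_1, ..., b_q) computes sum_k kernel[k] * eps[t-k];
    # slicing [q:n+q] keeps only outputs whose whole noise history exists.
    kernel = np.concatenate(([1.0], coeffs))
    return np.convolve(eps, kernel, mode="full")[q:n + q]


def sample_acf(y, lag):
    """Hypothetical helper: sample autocorrelation of y at a positive lag."""
    y = y - y.mean()
    return np.dot(y[:-lag], y[lag:]) / np.dot(y, y)


y = simulate_ma([0.6], n=20000)
acf1, acf2 = sample_acf(y, 1), sample_acf(y, 2)
print(round(acf1, 3), round(acf2, 3))  # close to 0.441 and 0.0
```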

Page 16

ARMA Model

● ARMA(p, q) model: a generative linear model that combines the AR(p) and MA(q) models:

$$Y_t = \sum_{i=1}^{p} a_i Y_{t-i} + \epsilon_t + \sum_{j=1}^{q} b_j \epsilon_{t-j}$$
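The ARMA(p, q) recursion can be run directly from a given noise sequence, which makes it easy to check by hand: with $a_1 = 0.5$, $b_1 = 0.2$ and a single unit shock, $Y_0 = 1$, $Y_1 = 0.5 \cdot 1 + 0.2 \cdot 1 = 0.7$ and $Y_2 = 0.5 \cdot 0.7 = 0.35$. A sketch with a hypothetical helper `arma_from_noise`, assuming all terms before $t = 0$ are zero:

```python
import numpy as np


def arma_from_noise(a, b, eps):
    """Hypothetical helper: Y_t = sum_i a_i Y_{t-i} + eps_t + sum_j b_j eps_{t-j}.

    Terms with negative time indices are treated as zero.
    """
    y = np.zeros(len(eps))
    for t in range(len(eps)):
        ar_part = sum(a[i - 1] * y[t - i] for i in range(1, len(a) + 1) if t - i >= 0)
        ma_part = sum(b[j - 1] * eps[t - j] for j in range(1, len(b) + 1) if t - j >= 0)
        y[t] = ar_part + eps[t] + ma_part
    return y


response = arma_from_noise(a=[0.5], b=[0.2], eps=[1.0, 0.0, 0.0])
print(response)  # approximately [1.0, 0.7, 0.35]
```

Passing `b=[]` gives a pure AR(p) model and `a=[]` a pure MA(q) model, matching how ARMA combines the two.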

Page 17

Stationarity

● Definition: a sequence of random variables is stationary if its distribution is invariant to shifting in time, i.e. $(Y_{t_1}, \dots, Y_{t_m})$ and $(Y_{t_1 + k}, \dots, Y_{t_m + k})$ have the same joint distribution for every m, every choice of times and every shift k.
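A random walk $Y_t = Y_{t-1} + \epsilon_t$ is a standard example of a non-stationary sequence: with unit-variance noise, $\mathrm{Var}(Y_t)$ grows linearly in t, so its distribution changes under a time shift, while white noise keeps the same distribution at every t. The sketch estimates both variances across many simulated paths (path count, length and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 1000
eps = rng.normal(0.0, 1.0, size=(n_paths, n_steps))

white_noise = eps                     # stationary: Var(Y_t) = 1 at every t
random_walk = np.cumsum(eps, axis=1)  # non-stationary: Var(Y_t) = t + 1 (0-indexed t)

for name, paths in [("white noise", white_noise), ("random walk", random_walk)]:
    # Variance across paths at an early and a late time index
    early, late = paths[:, 99].var(), paths[:, 999].var()
    print(f"{name}: Var at t=99 ~ {early:.2f}, Var at t=999 ~ {late:.2f}")
```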

Page 18

Lag Operator

● Definition: the lag operator $L$ is defined by $L Y_t = Y_{t-1}$ (so $L^k Y_t = Y_{t-k}$).

● ARMA model in terms of the lag operator:

$$\left(1 - \sum_{i=1}^{p} a_i L^i\right) Y_t = \left(1 + \sum_{j=1}^{q} b_j L^j\right) \epsilon_t$$

● The characteristic polynomial

$$P(z) = 1 - \sum_{i=1}^{p} a_i z^i$$

can be used to study properties of this stochastic process.
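For the AR part, stationarity can be read off the characteristic polynomial: the process is stationary when every root of $P(z) = 1 - \sum_{i=1}^{p} a_i z^i$ lies strictly outside the unit circle, and a root at $z = 1$ is a unit root. A sketch using `np.roots`; the helper names are hypothetical, and the tolerance is an arbitrary choice so that roots numerically on the unit circle count as unit roots:

```python
import numpy as np


def ar_characteristic_roots(a):
    """Hypothetical helper: roots of P(z) = 1 - a_1 z - ... - a_p z^p."""
    # np.roots expects coefficients from the highest power down to the constant term:
    # (-a_p, ..., -a_1, 1)
    coeffs = np.concatenate((-np.asarray(a, dtype=float)[::-1], [1.0]))
    return np.roots(coeffs)


def is_stationary(a, tol=1e-8):
    """Hypothetical helper: True if every root of P(z) lies strictly outside the unit circle."""
    return bool(np.all(np.abs(ar_characteristic_roots(a)) > 1.0 + tol))


print(ar_characteristic_roots([0.5]))  # single root at z = 2
print(is_stationary([0.5]))            # True
print(is_stationary([1.0]))            # False: unit root, i.e. a random walk
```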

Page 19

ARIMA Model

● Definition: non-stationary processes can be modeled using processes whose characteristic polynomial has unit roots.

● A characteristic polynomial with unit roots can be factored as $P(z) = R(z)(1 - z)^D$, where $R(z)$ has no unit roots.

● The ARIMA(p, D, q) model is an ARMA(p, q) model for $(1 - L)^D Y_t$.
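Applying $(1 - L)^D$ to a series amounts to taking D-th order differences, and forecasts made on the differenced scale are mapped back by cumulative summation. A NumPy sketch with hypothetical helper names, where a quadratic trend becomes constant after D = 2:

```python
import numpy as np


def difference(y, d=1):
    """Hypothetical helper: apply (1 - L)^d, i.e. d-th order differences."""
    return np.diff(np.asarray(y, dtype=float), n=d)


def undifference_once(diffs, first_value):
    """Hypothetical helper: invert one application of (1 - L) given the first original value."""
    return np.concatenate(([first_value], first_value + np.cumsum(diffs)))


y = np.array([1.0, 3.0, 6.0, 10.0, 15.0])  # quadratic trend
d1 = difference(y, 1)   # [2, 3, 4, 5]
d2 = difference(y, 2)   # [1, 1, 1]: constant, so D = 2 removes the trend
restored = undifference_once(d1, y[0])  # back to [1, 3, 6, 10, 15]
```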

Page 20

Other Extensions

● Further variants:

○ Models with seasonal components (SARIMA)

○ Models with side information (ARIMAX)

○ Models with long-memory (ARFIMA)

○ Multi-variate time series model (VAR)

○ Models with time-varying coefficients

○ Other non-linear models
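The seasonal component of SARIMA typically includes a seasonal difference $(1 - L^s) Y_t = Y_t - Y_{t-s}$. For daily retail sales a weekly period s = 7 is a natural assumption. The sketch shows only this differencing step, not a full SARIMA fit; the weekly pattern is invented and `seasonal_difference` is a hypothetical helper:

```python
import numpy as np


def seasonal_difference(y, period):
    """Hypothetical helper: apply (1 - L^s), giving y_t - y_{t-s}."""
    y = np.asarray(y, dtype=float)
    return y[period:] - y[:-period]


weekly = np.tile([5.0, 3.0, 3.0, 4.0, 6.0, 9.0, 8.0], 4)  # 4 weeks of a fixed weekly pattern
trend = 0.5 * np.arange(len(weekly))

pattern_only = seasonal_difference(weekly, 7)           # all zeros: the pattern is removed
with_trend = seasonal_difference(weekly + trend, 7)     # constant 3.5 = 0.5 * 7
print(pattern_only.max(), with_trend.min(), with_trend.max())
```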

Page 21

Recurrent Neural Networks
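The Recurrent Neural Networks slides consist of figures only. As a reference point, here is a minimal NumPy sketch of a vanilla (Elman) RNN forward pass, $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$. The weight names, sizes and random initialization are illustrative assumptions; the LSTM used later in this codelab adds gating on top of this basic recurrence.

```python
import numpy as np


def rnn_forward(x_seq, W_xh, W_hh, b_h, h0=None):
    """Hypothetical helper: vanilla RNN, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    hidden_size = W_hh.shape[0]
    h = np.zeros(hidden_size) if h0 is None else h0
    states = []
    for x_t in x_seq:  # iterate over time steps
        # The same weights are reused at every step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)  # shape: (time_steps, hidden_size)


rng = np.random.default_rng(0)
x_seq = rng.normal(size=(5, 1))  # 5 time steps, 1 feature (e.g. daily sales)
W_xh = rng.normal(scale=0.5, size=(3, 1))
W_hh = rng.normal(scale=0.5, size=(3, 3))
b_h = np.zeros(3)
states = rnn_forward(x_seq, W_xh, W_hh, b_h)
print(states.shape)  # (5, 3)
```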

Page 30

Time Series Analysis

● Time Series Analysis

● Models for Time Series Analysis: AR, MA, ARMA, ARIMA, RNN

● TensorFlow TimeSeries API (TFTS)

Page 31

Is there an easy way to implement this?

Page 34

TensorFlow TimeSeries

● tf.contrib.timeseries

○ Classic models (state space, autoregressive)

○ Flexible infrastructure

○ Data management

■ Chunking

■ Batching

■ Saving model

■ Truncated backpropagation
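The chunking and batching listed above appear to correspond to what `RandomWindowInputFn` does in the code later on: it cuts fixed-length windows at random positions out of one long series and stacks them into a batch. A NumPy sketch of that idea only; `random_windows` is a hypothetical helper and not how TFTS implements it:

```python
import numpy as np


def random_windows(values, batch_size, window_size, seed=None):
    """Hypothetical helper: sample a batch of contiguous windows from one long series."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    max_start = len(values) - window_size
    if max_start < 0:
        raise ValueError("series is shorter than window_size")
    # integers() excludes the upper bound, so +1 allows a window ending at the last element
    starts = rng.integers(0, max_start + 1, size=batch_size)
    return np.stack([values[s:s + window_size] for s in starts])


series = np.arange(100.0)
batch = random_windows(series, batch_size=32, window_size=40, seed=0)
print(batch.shape)  # (32, 40)
```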

Page 35

But is it really that easy??

Page 36

Let's start with an example

Page 57

Introduction to Kaggle

Page 58

https://www.kaggle.com/

Page 59

What is Kaggle?

Page 60

A data playground where you can play with data as much as you like

Page 61

How to Play on Kaggle

1. Pick a competition
2. Examine and analyze the problem and the data
3. Look at how other people approach it
4. Build your own solution

Page 64

Types of Competitions

1. Featured: companies and organizations compete with prize money at stake
2. Research: competitions for research purposes
3. Playground: practice problems
4. Getting Started: practice problems

Page 65

Some Common Competition Rules

1. The number of submissions per day is limited
2. Only a fixed fraction of the test set is reflected in the Public Score
3. Final scores are revealed when the competition ends
4. The dataset remains accessible even after the competition ends!

Page 66

How to Play on Kaggle

1. Pick a competition
2. Examine and analyze the problem and the data
3. Look at how other people approach it
4. Build your own solution

Page 68

How to Play on Kaggle

1. Pick a competition
2. Examine and analyze the problem and the data
3. Look at how other people approach it
4. Build your own solution

Page 69

https://www.kaggle.com/c/favorita-grocery-sales-forecasting

Page 70

Forecasting sales at brick-and-mortar grocery stores

Page 73

If it looks complicated…

Make use of an analysis someone else has already done well: https://www.kaggle.com/headsortails/shopping-for-insights-favorita-eda

Page 74

In most competitions, the most upvoted kernels are EDA (exploratory data analysis) kernels.
When you first enter a competition, we recommend reading the EDA first.

Page 75

How to Play on Kaggle

1. Pick a competition
2. Examine and analyze the problem and the data
3. Look at how other people approach it
4. Build your own solution

Page 76

https://www.kaggle.com/towever/devfest

Page 77

KaggleZeroToAll

Page 79

Prepare

```python
# -*- coding: utf-8 -*-
import datetime
from datetime import timedelta

import numpy as np
import pandas as pd
import tensorflow as tf  # TensorFlow 1.x: tf.contrib was removed in TensorFlow 2.0
from tensorflow.contrib.timeseries.python.timeseries import NumpyReader
from tensorflow.contrib.timeseries.python.timeseries import estimators as tfts_estimators
from tensorflow.contrib.timeseries.python.timeseries import model as tfts_model

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
# ^ Jupyter/Kaggle notebook magic; remove it when running as a plain Python script
```

Page 80

Read Dataset

```python
# 'id' is not among the columns loaded from train.csv; it is used when reading test.csv
dtypes = {'id': 'int64', 'item_nbr': 'int32', 'store_nbr': 'int8'}

# usecols=[1, 2, 3, 4] -> date, store_nbr, item_nbr, unit_sales
train = pd.read_csv('../input/train.csv', usecols=[1, 2, 3, 4], dtype=dtypes,
                    parse_dates=['date'],
                    skiprows=range(1, 101688780))  # skip the initial dates; row 0 (header) is kept

train.loc[train.unit_sales < 0, 'unit_sales'] = 0  # eliminate negatives (returns)
train['unit_sales'] = np.log1p(train['unit_sales'])  # log(1 + x) conversion
train['dow'] = train['date'].dt.dayofweek  # day of week: Monday=0 ... Sunday=6
```

Page 81

Preprocess data

```python
# Create records for all items, in all stores, on all dates
# for correct calculation of daily unit sales averages.
u_dates = train.date.unique()
u_stores = train.store_nbr.unique()
u_items = train.item_nbr.unique()

train.set_index(['date', 'store_nbr', 'item_nbr'], inplace=True)
train = train.reindex(
    pd.MultiIndex.from_product(
        (u_dates, u_stores, u_items),
        names=['date', 'store_nbr', 'item_nbr']
    )
)
```
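Why the reindex matters: train.csv has no row for an item on a day it did not sell, so averaging only the rows that exist overstates typical sales. A toy illustration of the same `set_index` / `reindex` / `fillna` steps (the values are invented):

```python
import pandas as pd

# Toy sales log: store 1 sold item 10 on both days, item 20 only on the second day.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2017-08-01", "2017-08-02", "2017-08-02"]),
    "store_nbr": [1, 1, 1],
    "item_nbr": [10, 10, 20],
    "unit_sales": [3.0, 5.0, 2.0],
})

full_index = pd.MultiIndex.from_product(
    (sales.date.unique(), sales.store_nbr.unique(), sales.item_nbr.unique()),
    names=["date", "store_nbr", "item_nbr"],
)
dense = sales.set_index(["date", "store_nbr", "item_nbr"]).reindex(full_index)
dense["unit_sales"] = dense["unit_sales"].fillna(0)  # a missing combination means no sales
dense = dense.reset_index()

means = dense.groupby("item_nbr")["unit_sales"].mean()
print(means)  # item 10: 4.0, item 20: 1.0 (it would be 2.0 without the zero-filled day)
```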

Page 82

Preprocess data

```python
train['unit_sales'] = train['unit_sales'].fillna(0)  # fill NaNs: combinations absent from train.csv had no sales
train.reset_index(inplace=True)  # reset index, restoring date, store_nbr and item_nbr as columns
train['dow'] = train['date'].dt.dayofweek  # recompute: rows added by the reindex have NaN dow
lastdate = train['date'].max()  # get the last day in the data
train.head()
```


Page 84

Preprocess data

```python
# Mean log-sales per item, store and day of week
tmp = train[['item_nbr', 'store_nbr', 'dow', 'unit_sales']]
ma_dw = tmp.groupby(['item_nbr', 'store_nbr', 'dow'])['unit_sales'].mean().to_frame('madw')
ma_dw.reset_index(inplace=True)
ma_dw.head()
```

Page 85

Preprocess data

```python
# Weekly mean: average of the day-of-week means per item and store
tmp = ma_dw[['item_nbr', 'store_nbr', 'madw']]
ma_wk = tmp.groupby(['item_nbr', 'store_nbr'])['madw'].mean().to_frame('mawk')
ma_wk.reset_index(inplace=True)
ma_wk.head()
```

Page 86

Moving Average using Pandas

```python
# Mean log-sales per item and store over the whole loaded period
tmp = train[['item_nbr', 'store_nbr', 'unit_sales']]
ma_is = tmp.groupby(['item_nbr', 'store_nbr'])['unit_sales'].mean().to_frame('mais226')
```

Page 87

Moving Average using Pandas

```python
# Add the mean over the last i days for several window lengths
for i in [112, 56, 28, 14, 7, 3, 1]:
    tmp = train[train.date > lastdate - timedelta(int(i))]
    tmpg = tmp.groupby(['item_nbr', 'store_nbr'])['unit_sales'].mean().to_frame('mais' + str(i))
    ma_is = ma_is.join(tmpg, how='left')
    del tmp, tmpg
```

Page 88

Moving Average using Pandas

```python
ma_is['mais'] = ma_is.median(axis=1)  # median of all the windowed means is the baseline
ma_is.reset_index(inplace=True)
ma_is.head()
```
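The moving-average code builds, for every (item, store) pair, the mean log-sales over the last 112, 56, 28, 14, 7, 3 and 1 days next to the full-period mean, and the row-wise median of those columns becomes the baseline `mais`. The same computation on a toy series with three windows (the data and the column name `mais_all` are invented for illustration):

```python
from datetime import timedelta

import pandas as pd

# Toy log-sales for one (item, store) pair over 8 days (invented values).
toy = pd.DataFrame({
    "date": pd.date_range("2017-08-08", periods=8, freq="D"),
    "item_nbr": 10,
    "store_nbr": 1,
    "unit_sales": [0.0, 1.0, 1.0, 2.0, 0.0, 3.0, 1.0, 2.0],
})
last = toy["date"].max()

feats = toy.groupby(["item_nbr", "store_nbr"])["unit_sales"].mean().to_frame("mais_all")
for days in [7, 3, 1]:
    recent = toy[toy.date > last - timedelta(days=days)]  # the last `days` days
    window_mean = recent.groupby(["item_nbr", "store_nbr"])["unit_sales"].mean()
    feats = feats.join(window_mean.to_frame(f"mais{days}"), how="left")

# Row-wise median of 1.25, 10/7, 2.0 and 2.0 -> (10/7 + 2.0) / 2
feats["mais"] = feats.median(axis=1)
print(feats)
```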

Page 89

Make data trainable

```python
from typing import Tuple


def data_to_npreader(store_nbr: int, item_nbr: int) -> Tuple[np.ndarray, np.ndarray, NumpyReader]:
    """Build a TFTS NumpyReader for the sales series of one (store, item) pair."""
    # date is the outer level of the reindexed product, so the rows come out in date order
    unit_sales = train[np.logical_and(train['store_nbr'] == store_nbr,
                                      train['item_nbr'] == item_nbr)].unit_sales
    x = np.arange(len(unit_sales))  # integer time steps 0, 1, 2, ...
    y = np.asarray(unit_sales)
    dataset = {
        tf.contrib.timeseries.TrainEvalFeatures.TIMES: x,
        tf.contrib.timeseries.TrainEvalFeatures.VALUES: y,
    }
    reader = NumpyReader(dataset)
    return x, y, reader
```

Page 90

TensorFlow Time Series - ARRegressor

```python
# Autoregressive model for store 1, item 105574
x, y, reader = data_to_npreader(store_nbr=1, item_nbr=105574)

# Train on random windows of 40 time steps, 32 windows per batch
train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
    reader, batch_size=32, window_size=40)

ar = tf.contrib.timeseries.ARRegressor(
    periodicities=21, input_window_size=30, output_window_size=10,  # 30 + 10 = window_size of 40
    num_features=1,
    loss=tf.contrib.timeseries.ARModel.NORMAL_LIKELIHOOD_LOSS)

ar.train(input_fn=train_input_fn, steps=16000)
```
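Here `input_window_size=30` and `output_window_size=10` add up to the `window_size=40` of `RandomWindowInputFn`: each training window is read as 30 conditioning steps followed by 10 steps to predict. A NumPy sketch of that past/future framing with a sliding window; `make_windows` is a hypothetical helper, not the TFTS internals:

```python
import numpy as np


def make_windows(series, input_size, output_size):
    """Hypothetical helper: split a series into (past, future) pairs with a sliding window."""
    series = np.asarray(series, dtype=float)
    total = input_size + output_size
    n = len(series) - total + 1  # number of full windows
    if n <= 0:
        raise ValueError("series is shorter than input_size + output_size")
    windows = np.stack([series[i:i + total] for i in range(n)])
    return windows[:, :input_size], windows[:, input_size:]


past, future = make_windows(np.arange(50.0), input_size=30, output_size=10)
print(past.shape, future.shape)  # (11, 30) (11, 10)
```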

Page 91

TensorFlow Time Series - ARRegressor

```python
evaluation_input_fn = tf.contrib.timeseries.WholeDatasetInputFn(reader)
# keys of evaluation: ['covariance', 'loss', 'mean', 'observed', 'start_tuple',
#                      'times', 'global_step']
evaluation = ar.evaluate(input_fn=evaluation_input_fn, steps=1)

# Forecast 16 steps beyond the end of the data
(ar_predictions,) = tuple(ar.predict(
    input_fn=tf.contrib.timeseries.predict_continuation_input_fn(
        evaluation, steps=16)))
```
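`predict_continuation_input_fn(evaluation, steps=16)` asks for 16 steps past the end of the data, presumably to match the 16 days of the competition's test period. One common way an AR model produces forecasts further ahead than one step is to feed each prediction back in as an input. A NumPy sketch of that iterated scheme for a plain AR(p) with known coefficients; `ar_forecast` is hypothetical and this is not a description of how `ARRegressor` works internally:

```python
import numpy as np


def ar_forecast(history, coeffs, steps):
    """Hypothetical helper: iterated AR(p) forecast; each prediction becomes the next input."""
    p = len(coeffs)
    window = list(np.asarray(history, dtype=float)[-p:])  # last p observations
    preds = []
    for _ in range(steps):
        # coeffs[0] (a_1) multiplies window[-1] (Y_{t-1}), coeffs[p-1] multiplies window[-p]
        y_next = sum(a * y for a, y in zip(coeffs, reversed(window)))
        preds.append(y_next)
        window = window[1:] + [y_next]
    return np.array(preds)


print(ar_forecast([8.0], coeffs=[0.5], steps=3))  # halves each step: 4, 2, 1
```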

Page 92

TensorFlow Time Series - ARRegressor

```python
plt.figure(figsize=(15, 5))
plt.plot(x.reshape(-1), y.reshape(-1), label='origin')
plt.plot(evaluation['times'].reshape(-1), evaluation['mean'].reshape(-1), label='evaluation')
plt.plot(ar_predictions['times'].reshape(-1), ar_predictions['mean'].reshape(-1),
         label='prediction')
plt.xlabel('time_step')
plt.ylabel('values')
plt.legend(loc=4)  # lower right
plt.show()
```

Page 93

TensorFlow Time Series - ARRegressor

Page 94

TensorFlow Time Series - LSTM

Get the _LSTMModel class from: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/timeseries/examples/lstm.py

Page 95

TensorFlow Time Series - LSTM

```python
# _LSTMModel must be defined first (copy it from the TensorFlow lstm.py example)
x, y, reader = data_to_npreader(store_nbr=2, item_nbr=105574)
train_input_fn = tf.contrib.timeseries.RandomWindowInputFn(
    reader, batch_size=16, window_size=21)

estimator = tfts_estimators.TimeSeriesRegressor(
    model=_LSTMModel(num_features=1, num_units=32),
    optimizer=tf.train.AdamOptimizer(0.001))

estimator.train(input_fn=train_input_fn, steps=16000)

evaluation_input_fn = tf.contrib.timeseries.WholeDatasetInputFn(reader)
evaluation = estimator.evaluate(input_fn=evaluation_input_fn, steps=1)
```

Page 96

TensorFlow Time Series - LSTM

```python
# Forecast 16 steps beyond the end of the data
(lstm_predictions,) = tuple(estimator.predict(
    input_fn=tf.contrib.timeseries.predict_continuation_input_fn(
        evaluation, steps=16)))
```

Page 97

TensorFlow Time Series - LSTM

```python
plt.figure(figsize=(15, 5))
plt.plot(x.reshape(-1), y.reshape(-1), label='origin')
plt.plot(evaluation['times'].reshape(-1), evaluation['mean'].reshape(-1), label='evaluation')
plt.plot(lstm_predictions['times'].reshape(-1), lstm_predictions['mean'].reshape(-1),
         label='prediction')
plt.xlabel('time_step')
plt.ylabel('values')
plt.legend(loc=4)  # lower right
plt.show()
```

Page 98

TensorFlow Time Series - LSTM

Page 99

Forecasting test data

```python
# Read test dataset
test = pd.read_csv('../input/test.csv', dtype=dtypes, parse_dates=['date'])
test['dow'] = test['date'].dt.dayofweek
```

Page 100

Forecasting test data

```python
# Moving Average: join the per-item/store features onto the test rows
test = pd.merge(test, ma_is, how='left', on=['item_nbr', 'store_nbr'])
test = pd.merge(test, ma_wk, how='left', on=['item_nbr', 'store_nbr'])
test = pd.merge(test, ma_dw, how='left', on=['item_nbr', 'store_nbr', 'dow'])
test['unit_sales'] = test.mais

# Autoregressive: overwrite the rows of store 1, item 105574 (one per test date)
ar_mean = np.maximum(ar_predictions['mean'], 0).reshape(-1)  # clip negatives, flatten to 1-D
test.loc[np.logical_and(test['store_nbr'] == 1, test['item_nbr'] == 105574),
         'unit_sales'] = ar_mean

# LSTM: overwrite the rows of store 2, item 105574 (one per test date)
lstm_mean = np.maximum(lstm_predictions['mean'], 0).reshape(-1)
test.loc[np.logical_and(test['store_nbr'] == 2, test['item_nbr'] == 105574),
         'unit_sales'] = lstm_mean
```

Page 101

Forecasting test data

```python
# Day-of-week adjustment: scale by (mean on that weekday) / (weekly mean)
pos_idx = test['mawk'] > 0
test_pos = test.loc[pos_idx]
test.loc[pos_idx, 'unit_sales'] = test_pos['unit_sales'] * test_pos['madw'] / test_pos['mawk']
test['unit_sales'] = test['unit_sales'].fillna(0)  # item/store pairs without history -> 0
test['unit_sales'] = np.expm1(test['unit_sales'])  # undo log1p: restore unit values
```
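The day-of-week step rescales the log-space baseline by `madw / mawk`, the ratio of the pair's mean on that weekday to its mean over all weekdays, and `expm1` then undoes the earlier `log1p`. A toy calculation with invented numbers (a busy Saturday and a quiet Monday):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "dow": [0, 5],             # Monday, Saturday
    "unit_sales": [1.0, 1.0],  # baseline forecast in log1p space
    "madw": [0.8, 1.6],        # mean log-sales on that weekday
    "mawk": [1.0, 1.0],        # mean of the weekday means
})
pos = toy["mawk"] > 0
toy.loc[pos, "unit_sales"] = toy.loc[pos, "unit_sales"] * toy.loc[pos, "madw"] / toy.loc[pos, "mawk"]
toy["unit_sales"] = np.expm1(toy["unit_sales"])  # back to units
print(toy["unit_sales"].round(3).tolist())  # [1.226, 3.953]
```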

Page 102

Forecasting test data

```python
holiday = pd.read_csv('../input/holidays_events.csv', parse_dates=['date'])
# Keep only holidays celebrated on their own date (not transferred to another day)
holiday = holiday.loc[holiday['transferred'] == False]

# A date can carry several events; isin() avoids duplicating test rows as a merge would
is_holiday = test['date'].isin(holiday['date'])
test.loc[is_holiday, 'unit_sales'] *= 1.2                    # holiday boost
test.loc[test['onpromotion'] == True, 'unit_sales'] *= 1.15  # promotion boost

test[['id', 'unit_sales']].to_csv('submission.csv.gz', index=False, compression='gzip')
```

Page 104

Thank You!