Trading decision trees (Elaborated by Mohamed Dhaoui)


Page 1: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Tunisia Polytechnic School

Mini-Project Data Analysis

How to Use a Decision Tree in trading

ELABORATED BY:

• MOHAMED DHAOUI

SUPERVISED BY:

• MR. WAJDI TEKAYA

Page 2: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Plan

Method used: Decision Tree

The Database construction

R code & interpretations


Page 3: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Method used: Decision Tree

A Visual Representation of Choices, Consequences, Probabilities, and Opportunities

A Way of Breaking Complicated Situations Down into Easier-to-Understand Scenarios

A simple representation for classifying examples

The goal is to create a model that predicts the value of a target variable based on several input variables


Page 4: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Example: How it works

Predict if John will play tennis

Training data: 9 Yes / 5 No

New data to classify: D15 Rain High Weak ?

Day Outlook Humidity Wind Play

D1 Sunny High Weak No

D2 Sunny High Strong No

D3 Overcast High Weak Yes

D4 Rain High Weak Yes

D5 Rain Normal Weak Yes

D6 Rain Normal Strong No

D7 Overcast Normal Strong Yes

D8 Sunny High Weak No

D9 Sunny Normal Weak Yes

D10 Rain Normal Weak Yes

D11 Sunny Normal Strong Yes

D12 Overcast High Strong Yes

D13 Overcast Normal Weak Yes

D14 Rain High Strong No


Page 5: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Example: How it works

Splitting the root node (9 Yes / 5 No) on Outlook:

• Sunny → 2 Yes / 3 No (split further)

• Overcast → 4 Yes / 0 No (pure subset)

• Rain → 3 Yes / 2 No (split further)

Sunny subset:
Day Outlook Humidity Wind
D1 Sunny High Weak
D2 Sunny High Strong
D8 Sunny High Weak
D9 Sunny Normal Weak
D11 Sunny Normal Strong

Overcast subset:
Day Outlook Humidity Wind
D3 Overcast High Weak
D7 Overcast Normal Strong
D12 Overcast High Strong
D13 Overcast Normal Weak

Rain subset:
Day Outlook Humidity Wind
D4 Rain High Weak
D5 Rain Normal Weak
D6 Rain Normal Strong
D10 Rain Normal Weak
D14 Rain High Strong


Page 6: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Example: How it works

The completed tree:

• Outlook = Sunny → split on Humidity
  High → 0 Yes / 3 No (pure subset) → NO
  Normal → 2 Yes / 0 No (pure subset) → YES

• Outlook = Overcast → YES

• Outlook = Rain → split on Wind
  Weak → 3 Yes / 0 No (pure subset) → YES
  Strong → 0 Yes / 2 No (pure subset) → NO

New data D15 (Rain, High, Weak) → YES

Leaf subsets:

Sunny, Humidity = High:
Day Humidity Wind
D1 High Weak
D2 High Strong
D8 High Weak

Sunny, Humidity = Normal:
Day Humidity Wind
D9 Normal Weak
D11 Normal Strong

Rain, Wind = Weak:
Day Humidity Wind
D4 High Weak
D5 Normal Weak
D10 Normal Weak

Rain, Wind = Strong:
Day Humidity Wind
D6 Normal Strong
D14 High Strong


Page 7: Trading decision trees ( Elaborated by Mohamed DHAOUI )

ID3 algorithm

Split(node, {examples}):

1/ A ← the best attribute for splitting the {examples}

2/ Assign A as the decision attribute for this node

3/ For each value of A, create a child node

4/ For each child node / subset:

    if the subset is pure: stop

    else: Split(child_node, {subset})


Page 8: Trading decision trees ( Elaborated by Mohamed DHAOUI )

How do we select the best attribute?

Candidate split on Outlook (9 Yes / 5 No):

• Sunny → 2 Yes / 3 No

• Overcast → 4 Yes / 0 No

• Rain → 3 Yes / 2 No

Candidate split on Wind (9 Yes / 5 No):

• Weak → 6 Yes / 2 No

• Strong → 3 Yes / 3 No

Which one is better?


Page 9: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Entropy

• S is a sample of training examples

• p+ is the proportion of positive examples in S

• p- is the proportion of negative examples in S

• Entropy measures the impurity of S

• Entropy(S) = H(S) = -p+ log2(p+) - p- log2(p-)

• H(S) = 0 if the sample is pure (all + or all -); H(S) = 1 if p+ = p- = 0.5

• Impure set (3 Yes / 3 No)

H(S) = -(3/6) * log2(3/6) - (3/6) * log2(3/6) = 1

• Pure set (4 Yes / 0 No)

H(S) = -(4/4) * log2(4/4) - (0/4) * log2(0/4) = 0 (using the convention 0 * log2(0) = 0)


Page 10: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Information gain

• Gain(S, A) = H(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) * H(Sv)

Example: split on Wind (root node: 9 Yes / 5 No)

• Weak → 6 Yes / 2 No

• Strong → 3 Yes / 3 No

H(S) = -(9/14) * log2(9/14) - (5/14) * log2(5/14) = 0.94

H(Sweak) = -(6/8) * log2(6/8) - (2/8) * log2(2/8) = 0.81

H(Sstrong) = -(3/6) * log2(3/6) - (3/6) * log2(3/6) = 1

Gain(S, Wind) = H(S) - (8/14) * H(Sweak) - (6/14) * H(Sstrong) = 0.049

Gain(S, A) for each attribute:

• Outlook: 0.25

• Humidity: 0.15

• Wind: 0.049
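To make these numbers concrete, here is a minimal R sketch (R is the language used in the last part of this deck) that recomputes the entropy and the information gains on the tennis table from the earlier slides. The data frame tennis and the helpers entropy() and info_gain() are illustrative names, not part of the original material.

# The 14-day tennis data set from the earlier slides.
tennis <- data.frame(
  Outlook  = c("Sunny","Sunny","Overcast","Rain","Rain","Rain","Overcast",
               "Sunny","Sunny","Rain","Sunny","Overcast","Overcast","Rain"),
  Humidity = c("High","High","High","High","Normal","Normal","Normal",
               "High","Normal","Normal","Normal","High","Normal","High"),
  Wind     = c("Weak","Strong","Weak","Weak","Weak","Strong","Strong",
               "Weak","Weak","Weak","Strong","Strong","Weak","Strong"),
  Play     = c("No","No","Yes","Yes","Yes","No","Yes",
               "No","Yes","Yes","Yes","Yes","Yes","No")
)

# Entropy of a vector of class labels.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                      # convention: 0 * log2(0) = 0
  -sum(p * log2(p))
}

# Information gain of splitting on one attribute.
info_gain <- function(data, attribute, target = "Play") {
  h <- entropy(data[[target]])
  weighted <- sapply(split(data[[target]], data[[attribute]]),
                     function(s) length(s) / nrow(data) * entropy(s))
  h - sum(weighted)
}

entropy(tennis$Play)                 # 0.940, as on the entropy slide
sapply(c("Outlook", "Humidity", "Wind"), info_gain, data = tennis)
# Outlook ~0.247, Humidity ~0.151, Wind ~0.048 -> Outlook is the best root split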


Page 11: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Advantages and disadvantages

• Are simple to understand and interpret

• Allow the addition of new possible scenarios

• Help determine worst, best and expected values for different scenarios

o For data including categorical variables with different numbers of levels, information gain is biased in favor of attributes with more levels.

o A greedy algorithm: it makes the locally optimal choice at each stage, which in general does not produce a globally optimal tree.

o Calculations can get very complex particularly if many values are uncertain and/or if many outcomes are linked.


Page 12: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Plan

Method used: Decision Tree

The Database construction

R code & interpretations


Page 13: Trading decision trees ( Elaborated by Mohamed DHAOUI )

The Database construction

The database used is a panel of daily OHLCV data for Bank of America's stock, retrieved from Yahoo Finance.


Page 14: Trading decision trees ( Elaborated by Mohamed DHAOUI )

The Database construction

O Opening price: the price at which a security first trades upon the opening of an exchange on a given trading day.

H Today's high: the highest price at which a stock traded during the course of the day. Today's high is typically higher than the closing or opening price.

L Today's low: the lowest price at which a stock trades over the course of a trading day. Today's low is typically lower than the opening or closing price.

C Closing price: the final price at which a security is traded on a given trading day.

V Volume: the number of shares or contracts traded in a security or an entire market on a given trading day. It is the amount of shares that change hands from sellers to buyers and serves as a measure of activity.


Page 15: Trading decision trees ( Elaborated by Mohamed DHAOUI )

The Database construction

The analysis of stock-exchange data requires the calculation of some specific technical indicators:

Relative Strength Index - RSI

Exponential Moving Average - EMA

Moving Average Convergence Divergence – MACD

Smart money index - SMI


Page 16: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Relative Strength Index - RSI

A technical momentum indicator that compares the magnitude of recent gains to recent losses in an attempt to determine overbought and oversold conditions of an asset. It is calculated using the following formula:

RSI = 100 - 100/(1 + RS*)

*Where RS = Average of x days' up closes / Average of x days' down closes. (x = 3)
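As an illustration of this formula only (the deck's own code later calls TTR::RSI, whose smoothing of the averages differs slightly), a plain-average sketch in R with x = 3 could look like the following; rsi_simple and closes are illustrative names.

# Plain-average RSI following the formula above, with a window of x daily changes.
rsi_simple <- function(closes, x = 3) {
  change <- diff(closes)
  sapply(seq_along(closes), function(i) {
    if (i <= x) return(NA)                       # not enough history yet
    window   <- change[(i - x):(i - 1)]          # last x daily changes
    avg_up   <- mean(pmax(window, 0))            # average of up closes
    avg_down <- mean(pmax(-window, 0))           # average of down closes
    rs <- avg_up / avg_down
    100 - 100 / (1 + rs)
  })
}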


Page 17: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Relative Strength Index - RSI

An asset is deemed to be overbought once the RSI approaches the 70 level, meaning that it may be getting overvalued and is a good candidate for a pullback. Likewise, if the RSI approaches 30, it is an indication that the asset may be getting oversold and therefore likely to become undervalued.


Page 18: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Exponential Moving Average - EMA

A type of moving average that is similar to a simple moving average, except that more weight is given to the latest data. The exponential moving average is also known as the "exponentially weighted moving average". The 12- and 26-day EMAs are the most popular short-term averages.
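As a sketch of what "more weight to the latest data" means, assuming the usual weighting alpha = 2 / (n + 1); the analysis later in the deck uses TTR::EMA, which also handles the initial values more carefully. ema_simple is an illustrative name.

# Each day's value mixes the newest price with yesterday's average.
ema_simple <- function(prices, n) {
  alpha <- 2 / (n + 1)
  out <- numeric(length(prices))
  out[1] <- prices[1]                        # seed with the first price
  for (t in 2:length(prices)) {
    out[t] <- alpha * prices[t] + (1 - alpha) * out[t - 1]
  }
  out
}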


Page 19: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Moving Average Convergence Divergence – MACD

A trend-following momentum indicator that shows the relationship between two moving averages of prices. The MACD is calculated by subtracting the 26-day exponential moving average (EMA) from the 12-day EMA. A nine-day EMA of the MACD, called the "signal line", is then plotted on top of the MACD, functioning as a trigger for buy and sell signals.
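Reusing the illustrative ema_simple() helper from the previous slide's sketch, the construction described here reads roughly as follows; the deck's own code later calls TTR::MACD instead.

# MACD = 12-day EMA minus 26-day EMA; the signal line is a 9-day EMA of the MACD.
macd_simple <- function(prices) {
  macd_line   <- ema_simple(prices, 12) - ema_simple(prices, 26)
  signal_line <- ema_simple(macd_line, 9)   # the "signal line"
  data.frame(macd = macd_line, signal = signal_line)
}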


Page 20: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Smart money index - SMI

Also known as the smart money flow index, the SMI is a technical analysis indicator demonstrating investors' sentiment. The indicator is based on intra-day price patterns. The main idea is that the majority of traders (emotional, news-driven) overreact at the beginning of the trading day because of the overnight news and economic data. There is also a lot of buying on market orders and short covering at the opening.

The basic formula for SMI is:

Today's SMI reading = yesterday's SMI – opening gain or loss + last hour change
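To make the recursion concrete, here is a rough R sketch of one possible reading of this formula. It assumes we also have intraday prices: the price at the end of the opening period and the price one hour before the close. All names (smi_basic, open_period_end, last_hour_start) are illustrative, not part of the original material, and this is not the indicator computed by the R code later in the deck.

# Subtract the (emotional) opening move, add back the (informed) last-hour move.
smi_basic <- function(open, open_period_end, last_hour_start, close, smi0 = 0) {
  smi <- numeric(length(close))
  smi[1] <- smi0                                        # arbitrary starting level
  for (t in 2:length(close)) {
    opening_move <- open_period_end[t] - open[t]        # opening gain or loss
    last_hour    <- close[t] - last_hour_start[t]       # last hour change
    smi[t] <- smi[t - 1] - opening_move + last_hour
  }
  smi
}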


Page 21: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Smart money index - SMI

If the SMI rises sharply when the market falls, this would mean that smart money is buying, and the market is likely to revert to an uptrend soon.

The opposite situation is also true: a rapidly falling SMI during a bullish market means that smart money is selling and that the market is likely to revert to a downtrend soon.


Page 22: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Plan

Method used: Decision Tree

The Database construction

R code & interpretations


Page 23: Trading decision trees ( Elaborated by Mohamed DHAOUI )

R code & interpretations

Libraries:

quantmod: package used to get the data from Yahoo Finance.

rpart: package containing the decision-tree algorithms.

rpart.plot: package used to visualize the decision tree.


Page 24: Trading decision trees ( Elaborated by Mohamed DHAOUI )

R code & interpretations

Getting the data:

startDate = as.Date("2012-01-01")

endDate = as.Date("2014-01-01")

getSymbols("BAC",src="yahoo",from = startDate,to=endDate)

Get the Open/High/Low/Close/Volume (OHLCV) data for BAC from startDate to endDate


Page 25: Trading decision trees ( Elaborated by Mohamed DHAOUI )

R code & interpretations

Calculating the indicators:

RSI3 <- RSI(Op(BAC), n = 3) #Relative Strength Index over 3 periods

EMA5 <- EMA(Op(BAC), n = 5) #5-period Exponential Moving Average

EMAcross <- Op(BAC) - EMA5 #Difference between the open price and the 5-period EMA

MACDsignal <- MACD(Op(BAC), fast = 12, slow = 26, signal = 9)[,2] #Signal line of the MACD (second column)

SMI <- SMI(Op(BAC), n = 13, slow = 25, fast = 2, signal = 9)[,1] #SMI from the TTR package (first column); stored as 'Stochastic' in the data set below

PriceChange <- Cl(BAC) - Op(BAC) #Daily change: close minus open


Page 26: Trading decision trees ( Elaborated by Mohamed DHAOUI )

R code & interpretations

Constructing the database:

Class <- ifelse(PriceChange > 0, "UP", "DOWN") #Create a binary class variable: did the price close up or down?

DataSet <- data.frame(RSI3, EMAcross, MACDsignal, SMI, Class) #Create our data set

colnames(DataSet) <- c("RSI3", "EMAcross", "MACDsignal", "Stochastic", "Class") #Name the columns

DataSet <- DataSet[-c(1:33),] #Drop the first 33 rows, where the indicators are still NA

TrainingSet <- DataSet[1:312,] #Use about 2/3 of the data to build the tree

TestSet <- DataSet[313:469,] #Use the remaining 1/3 as test data


Page 27: Trading decision trees ( Elaborated by Mohamed DHAOUI )

R code & interpretations

The decision tree:

DecisionTree<-rpart(Class~RSI3+EMAcross+MACDsignal+Stochastic,data=TrainingSet, cp=.001)

Predict the Class attribute

Use indicators: RSI3, EMAcross, MACDsignal, Stochastic

Specify the data used to build the tree: TrainingSet

Set cp = .001, the complexity parameter: a split must improve the overall fit by at least this factor to be attempted

prp(DecisionTree,type=2,extra=8)

Plot the decision tree


Page 28: Trading decision trees ( Elaborated by Mohamed DHAOUI )

R code & interpretations

The first decision tree

[Plot of the unpruned tree: 15 splits, using the 4 indicators]


Page 29: Trading decision trees ( Elaborated by Mohamed DHAOUI )

R code & interpretations

Pruning the tree:

printcp(DecisionTree) #shows the cp table: complexity, number of splits and cross-validated error (xerror) for trees of each size

The cp associated with the minimum xerror (cross-validated error) value is the best cp to use:

cp=0.0272109
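The same cp can also be pulled out of the fitted rpart object programmatically instead of being read off the printed table; a small optional sketch:

# Pick the cp whose cross-validated error (xerror) is smallest.
cpTable <- DecisionTree$cptable
bestCp  <- cpTable[which.min(cpTable[, "xerror"]), "CP"]
bestCp  # about 0.0272 for this tree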



Page 30: Trading decision trees ( Elaborated by Mohamed DHAOUI )

R code & interpretations

The pruned decision tree:

PrunedDecisionTree<-prune(DecisionTree,cp=0.0272109)

Set the parameter cp to the value that gives the minimum cross-validated error

prp(PrunedDecisionTree, type=2, extra=8)

Plot the decision tree


Page 31: Trading decision trees ( Elaborated by Mohamed DHAOUI )

R code & interpretations

Validating the tree:

table(predict(PrunedDecisionTree,TestSet,type="class"),TestSet[,5],dnn=list('predicted','actual'))

81 correct predictions out of 157, i.e. about 52% accuracy
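The accuracy quoted above can be recomputed directly from the same confusion matrix, for example:

# Correct predictions are on the diagonal of the confusion matrix.
confMat <- table(predict(PrunedDecisionTree, TestSet, type = "class"),
                 TestSet[, 5], dnn = list("predicted", "actual"))
sum(diag(confMat)) / sum(confMat)   # 81 / 157, about 0.52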


Page 32: Trading decision trees ( Elaborated by Mohamed DHAOUI )

Thank you for your attention
