
Fouls in Dutch soccer: A Poisson point process

Jorik Harbers S2978245

January 6, 2021

Supervisor: Prof. dr. R.H. Koning

Second assessor: Prof. dr. R.J.M. Alessie


University of Groningen, Faculty of Economics & Business


Abstract

This paper investigates the occurrence of fouls within soccer matches in the Dutch national league in the seasons 2007-2008 until 2013-2014. The occurrence of fouls is mostly absent from the present literature and could potentially be a new field of research. This paper considers four different samples: all teams, only teams playing at home, only teams playing away, and fouls within a match itself. The occurrence of fouls is investigated by means of a Poisson point process using different rates: first a constant rate, then a rate increasing over time, followed by a rate that depends on the time that has passed since the last foul, and lastly a rate taking into account all fouls previously made in the match by a sample. From the results, it can be observed that the expected number of fouls increases as the match progresses. Furthermore, the model considering the time since the last foul tells us that the expected number of fouls increases after a first foul has been made, and that the expected number of fouls in a minute is larger when a foul has not occurred for a longer time period. The last model, considering all previous fouls, informs us that the expected number of fouls per minute increases when more fouls are made within the match. These effects are present for all samples; however, for teams playing at home and teams playing away they are less significant. Lastly, all models provide evidence that there is a difference in the expected number of fouls made by a team playing at home and a team playing away.


Contents

1 Introduction

2 Literature
2.1 Unintentional foul
2.2 Professional foul
2.3 Hostile foul
2.4 Referee
2.5 Home advantage
2.6 Morals of a foul
2.7 Goal scoring
2.8 Poisson point processes

3 Methodology
3.1 Estimation process
3.2 Poisson point process
3.3 Homogeneous process
3.4 Non-homogeneous process
3.5 Time difference process
3.6 Self-exciting process
3.7 Likelihood ratio test

4 Data
4.1 Team level
4.2 Home vs away
4.3 Match level

5 Results
5.1 Homogeneous Poisson process
5.2 Inhomogeneous Poisson process
5.3 Time difference Poisson process
5.4 Self-exciting Poisson process
5.5 Fouls within a match

6 Conclusion

7 Discussion

A Derivation log-likelihood time difference process

B Derivation log-likelihood self-exciting process


1 Introduction

Soccer, one of the most popular and largest sports across Europe, is generally seen as a major entertainment industry. There are several professional competitions worldwide, with large fan bases following the league and supporting their favourite teams. These fan bases contribute to large turnovers and profits made by teams. The most common ways to make a profit are selling players, attaining a high rank in the competition and attracting sponsors. A higher rank yields more fans, more media coverage and could increase the willingness to pay by sponsors. Such a ranking is attained by gaining points, either by a draw or a win. The most important factor determining the result of a game is the number of goals scored and conceded by a team. However, there are more underlying factors for a team winning a game. This could be home advantage (Pollard & Pollard (2005)), playing tactics (Lago-Ballesteros et al. (2012)) or just plain luck (Yue et al. (2014)). However, one of the lesser-known factors determining the result of a match is the foul.

Fouls occur widely during a match and in many different ways. Examples are a deliberate handball, holding an opponent, a flying tackle and many more. All have in common that the rules set by the organiser of the competition before the start of the season are broken. Whenever a foul is committed, several punishments are possible.

The most common punishment for fouls is a free-kick. In this way, the disadvantaged team keeps possession of the ball and can continue or start a new attempt to score a goal. Besides this free-kick, the referee can choose to give a yellow or a red card. A yellow card is a severe warning: the foul was not harsh enough for a red card, but the player has to be cautious, as another harsh foul will result in a second yellow card. A player receiving two yellow cards receives a red card and is excluded from the game without being allowed to be replaced. In addition, the player is not allowed to play in the next game. Red cards can also be handed out directly, for the more extreme fouls. This has the same effect, with the player being sent off the pitch. The last punishment is the penalty kick. This is awarded to a team if an opposing player ends an attack in the box with a foul, and yields a free shot at the goal with only the goalkeeper to beat, which has a high chance of resulting in a goal.

All of these different fouls can be classified into roughly 3 categories, using a similar approach to Gumusdag et al. (2011). The first category, the unintentional foul, is a foul occurring due to circumstances on the pitch. For instance, a player losing his balance and falling into an opponent, or an accidental handball. The player has no intention of making the foul; however, the opposing team has a disadvantage and cannot continue their action.

The second category is the intentional instrumental foul. This is more commonly known as the strategic foul, an intentional foul made by a player to end a possibly dangerous counter-attack. These fouls are occurring more often across soccer according to Carmichael et al. (2000). These fouls are made intentionally but are not meant to harm or injure the opponent.

The last type of foul is the intentional hostile foul. This foul is made intentionally and aims to harm the opponent. Reasons for such fouls can be forms of payback or getting revenge after a harsh foul, or the result of irritation or disappointment when a match is not going according to plan.

A lot of different fouls in different categories can be made during a match. Finding a distinction between those fouls is hard, and is not the point of interest in this thesis. The main point lies in the timing of fouls made by a team within a match. Is there a difference in the number of fouls made within parts of a match? For instance, is there a difference in the number of fouls made in the 30th minute compared to the 75th? What happens in the time following a foul: is a foul more likely to occur shortly after another foul, or is there some time without fouls? And lastly, does the number of fouls made before a certain point in time in the game give an insight into how many fouls will follow in the remainder of the game? In addition, it might be of interest whether there are differences in these questions for teams playing at home or away. And how does this relate to the match level, the combination of both these teams? This match setting could be of interest for the attractiveness of the game, possibly increasing the number of viewers at home or in the stadium.

This will be investigated with multiple models in the setting of a Poisson point process. First, a constant rate is used to model the expected number of fouls within a match. After that, a model assuming an increasing number of fouls over time will be fit, followed by a model taking into account the time that has passed since the last foul. Finally, the difference in time between the current time and all previous fouls is investigated, to see if the expected number of fouls increases when more fouls are made, possibly giving more weight to recent fouls compared to fouls made earlier in the match.

These methods using the most recent events are quite common in modelling earthquakes and are upcoming in the field of finance. For instance, one commonly cited paper in the field of earthquakes is Ogata (1988). He investigates the timing and magnitude of earthquakes and the influence of aftershocks and seismologic activity in predicting the next major earthquake. Similarly in finance, the method has implications for estimating value-at-risk, as performed by Chaves-Demoulin et al. (2005). They model a peak-over-threshold approach for daily percentage returns of indices, estimating when a certain threshold was crossed and by what value. In particular, the main point of this process lies in tail estimation, and it emphasises the influence of recent events in comparison to more distant ones.

This paper is structured as follows. First, a brief literature review of relevant factors will be given, followed by the different methods used in this paper. Section 4 will describe the data at the different investigated levels. In section 5 the results will be discussed. The paper ends by summarising the conclusions and provides a brief discussion and possible suggestions for follow-up research.

2 Literature

Currently, there exists a gap in the literature regarding the frequency of fouls within game dynamics and possible relations between them. A lot of soccer-related research has focused on the frequency of goals being scored and the presence of home advantage during matches. Furthermore, research has been done in the medical field, concerning the presence of injuries and the influence of endurance ability within a match. Despite the lack of literature regarding the actual frequency of fouls, there are related topics that could partly explain why fouls are made, what drives them and why they occur often during a match.

As mentioned in the introduction, fouls can be classified in 3 ways. These 3 categories each have their specific reasons to occur and their own classification. Besides the different sorts of fouls, the occurrence of fouls could be influenced by the referee. The referee decides when something can be classified as a foul, and there could be factors influencing the referee. Similarly, there is home advantage present in soccer, as fans and familiarity can help a team perform differently, which could influence the number of fouls made. In the rest of this literature review, all of these factors will be discussed separately, together with some literature regarding the morals of a foul. After that, models using a similar approach regarding goal scoring will be discussed, and finally one of our specific models will be discussed briefly.

2.1 Unintentional foul

The unintentional foul is a foul made by accident or through some sort of clumsiness. There is no real intention to make a foul or harm the opponent; it just happens during the match. For instance, Rampini et al. (2011) and Carling & Dupont (2011) both mention that the sprinting speed of players decreases during the game and that players cover less distance as the game progresses. This could result in players being slightly late for an interception, missing the ball and making a foul. These fouls are unintentional; however, they occur. Other reasons for this type of foul could be players who are too eager, misjudge a pass or mistime their tackle.

2.2 Professional foul

The professional foul can be seen as a foul made on purpose, but not meant to harm the opponent or cause injuries. It is a decision made by a player to end an attack. According to Carmichael et al. (2000), the professional foul has become part of a defender's repertoire. There are benefits in fouls that seem to outweigh the punishment of a foul, making this particular foul beneficial for the team. Similarly, Gumusdag et al. (2011) mention that players seem to find a balance between the punishment and the profit of a foul, for instance to avoid a goal by a foul near the end of a game. An example of this is the red card conceded by Luis Suarez during the quarter-final of the world championship against Ghana. The player made a handball on the goal line, preventing a goal from being scored. Ghana was rewarded with a penalty kick and Suarez with a red card. The resulting penalty was missed and Suarez's team made it to the next round. The penalty and the red card were accepted by the player, in the hope of giving his team a chance to win the game and proceed. This is an extreme example of a professional foul. However, there are more examples, for instance a player not keeping up with the speed of an attacker and pulling him back or making him lose his balance.

2.3 Hostile foul

These intentional harsh fouls are not extensively discussed in the current literature. However, Gumusdag et al. (2011) mention that these are mostly emotional fouls. For instance, things not going according to plan can trigger a reaction based on disappointment or frustration, or a payback. An example of an aggressive foul is the well-known foul by Zinedine Zidane in the world championship final in 2006. He headbutted opponent Marco Materazzi and received a red card. Other more general examples of such fouls are intentional elbows, kicking out afterwards or even biting.

2.4 Referee

After mentioning the different types of fouls and possible reasons for them, they still have to be classified as fouls during the game. The referee decides whether something is a foul or not and leads the match. A referee bases his decision on what he sees and makes a judgement whether an action can be classified as a foul. As this classification is based on a judgement call, there are differences between referees. Some referees allow players more physical contact than others in the same situation. Depending on their style, the referee can have a large impact on the game.

This is confirmed by Sutter & Koche (2004), stating that referees have a very important influence on the result of a match. They conclude that a referee has a systematic bias in favour of home teams. Similarly, Nevill et al. (2002) conclude that referees were less certain about their decisions regarding home teams. This could lead to biases of the referee, by not whistling for a foul by the home team, or by being unintentionally more willing to give fouls against the away team. However, it is not investigated whether this results in an actual advantage for the teams playing at home related to the frequency of fouls.

2.5 Home advantage

Regarding home advantage, Pollard & Pollard (2005) give a review of home advantage in different competitions and studies. Their conclusion is that home advantage is present at all levels of soccer. They state that it is probably due to the home crowd in the stadium, which in turn can result in the previously mentioned referee bias, and possibly increases the home team's chances to win. Other reasons for home advantage could be familiarity with the location: knowing which part of the pitch is flat for clear passing, or making better use of the spaces on the field, as the size of fields is allowed to differ. Similarly, territoriality of the players can have an influence as well, with players being more reluctant to lose a game at home.

Furthermore, some teams have different playing tactics at home. Teams usually play more offensively at home than away, which could result in more goals being scored and could alter the result of the match. This difference in playing tactics could also be of interest to our research, as different playing styles could affect the fouls being made by a team. In conclusion, regarding home advantage in a match, Pollard & Pollard (2005) do conclude it is present based on multiple studies; however, the reasons are not necessarily clear.

2.6 Morals of a foul

Given that fouls are common practice, there are some publications that treat fouls from a moral point of view. For instance, Moore (2017) elaborates on whether it is ethically acceptable to make a professional foul, and lists reasons both for and against.

In a different vein, Traclet et al. (2011) have classified fouls and the players' reactions to these fouls as moral disengagement. They investigated what justified these fouls from a player's point of view. In some cases the referee was blamed for the foul, in others the coaches were blamed, and there were more reasons why players did not take responsibility for the foul made. Both publications show a more moral point of view on fouls being made, and give some more insight into what drives players to make a foul.

2.7 Goal scoring

There is a large literature base using a similar approach for the number of goals scored. This has an important use in, for instance, betting, in which the number of goals or the final result is of interest. For instance, Everson & Goldsmith-Pinkham (2008) use a Poisson approach to model the goals scored by teams using offensive and defensive qualities. Similarly, Dixon & Robinson (1998) predict goal scoring in a match using an exponential distribution and found that the scoring rates change during the game depending on the score, home advantage and time left to play. Both use multiple factors to estimate the number of goals and thereby the final result of a game. There are more examples of papers that have researched predicting goal scoring or detecting the important factors for scoring goals.


2.8 Poisson point processes

This paper aims to estimate the different models using a Poisson point process, with a particular interest in self-exciting processes. For our sample, these self-exciting processes turn out to be similar to a so-called Hawkes process. According to Hawkes (2018), the usage of this process was small in the last 25 years, with an exception among seismologists. However, in recent years the use in finance and social network studies has been increasing fast.

The best-known paper in the seismologic field is Ogata (1988), investigating spatial influences in an earthquake process, using a specific variant of this model for the modelling of earthquakes. If an earthquake or shock is captured early, this model could predict the location and magnitude of a possible new large shock in the upcoming time. Furthermore, Ogata performed multiple studies concerning the Hawkes process, in Ogata (1978) investigating stationarity of the model and imposing certain restrictions on the earthquake-related Poisson processes.

In finance the contribution has been growing over the years. One of the first papers using a similar approach is Chaves-Demoulin et al. (2005). Using a self-exciting process, they model the timing of exceedances, with recent events affecting the current intensity more than distant ones. They conclude that their model yields a reasonable model for the behaviour of returns. Another paper involving a self-exciting process is Bacry et al. (2015), giving an overview of Hawkes process usage in finance and of the mathematical theory. They conclude that the Hawkes process allows the relations between different events to be characterised precisely and accounts for causal relations, while all of these relations can be modelled within a rather simple framework.

Considering the fouls and the motivation behind them, there are large differences present. We are not particularly interested in why a foul is made or whether or not it is justifiable to make the foul. The 3 categories mentioned distinguish between motivations for the foul. However, the point of interest of this paper lies in the timing of fouls made and the influence of previous fouls.

3 Methodology

The aim of this research is to investigate the timing of fouls, effects over time within a game, and whether there is a relation between the expected fouls and the time that has passed since the last foul. It also investigates what happens to the fouls to come if the timing of all previously made fouls within a match is taken into consideration. Is the effect of recent fouls larger compared to more distant fouls? To model these different processes a Poisson point process will be used. This process uses a rate λ(t), which needs to be specified beforehand to estimate the different models. λ(t) can have different forms: constant, increasing over time or taking previous events into account. In this research different forms of λ(t) are used. First, the constant-parameter model, which is the most common. Next, we assume that λ(t) is increasing over time. After that, we take into account the time that has passed since the last foul. Finally, all earlier fouls are included through an intensity function, in a similar manner as in Chaves-Demoulin et al. (2005). First our estimation process will be discussed, then the Poisson point process and the different forms of λ(t), and finally a comparison method is presented.


3.1 Estimation process

All parameter estimates are based on maximum likelihood estimation. This implies that for each specification of λ(t) a log-likelihood function has to be specified. Thereafter this log-likelihood has to be maximised with respect to the parameters to find the corresponding parameter estimates. Different methods can be used to maximise the log-likelihood function. For instance, following Azzalini (1996), we can differentiate the log-likelihood function and set the partial derivatives equal to zero. Alternatively, we can check multiple parameter values and search for the parameters where the maximum value is obtained. Since a log-likelihood function can have multiple local maxima, in the latter option multiple starting values have to be used to obtain the global maximum. After comparing these function values, an overall optimal function value can be found with the corresponding parameters. These parameters equal our maximum likelihood estimates if a second-order condition is fulfilled: the hessian should be negative definite. The hessian is the matrix of second-order derivatives of the log-likelihood function and can be evaluated at points of interest.

We can use this hessian to obtain the standard errors of our parameter estimates. According to McNeil et al. (2015), one of the properties of the maximum likelihood estimator is that:

\[ \sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{d} N\!\left(0, I(\theta)^{-1}\right), \]

where I(θ) denotes the expected Fisher information matrix. With $\mathcal{L}(\theta; X)$ the log-likelihood, the expected Fisher information matrix is defined by:

\[ I(\theta) = -E\!\left( \frac{\partial^2}{\partial\theta\,\partial\theta'} \mathcal{L}(\theta; X) \right). \]

Due to the asymptotic properties, we have approximately that:

\[ \hat{\theta} \sim N\!\left(\theta, \frac{1}{n} I(\theta)^{-1}\right). \]

Instead of using the Fisher information, I(θ), we can use the hessian to approximate it when the sample size is large. Using this, for an individual parameter j of θ our standard error equals:

\[ se(\hat{\theta}_j) = \sqrt{\frac{1}{n}\left[I(\theta)^{-1}\right]_{jj}}. \]

Therefore the standard errors can be approximated from the hessian after certain transformations. This hessian needs to be evaluated at the parameters of the optimal point. The hessian can either be found analytically, by differentiating the log-likelihood function twice and filling in the estimates, or, for more complex functions, it can be approximated numerically using software. The standard errors then equal the square roots of the diagonal elements of the inverse of the negative hessian.
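As a minimal sketch of this last step, assuming a one-parameter model so that the hessian is a scalar, the standard error can be approximated with a central finite difference. The function names, the step size h and the example data (40 events in 90 minutes under a constant rate) are hypothetical and only serve as illustration. The log-likelihood here is that of the full sample, so the factor 1/n is already absorbed in the hessian.

```python
import math

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def standard_error(loglik, theta_hat, h=1e-4):
    """Standard error of a scalar parameter from the numerical hessian.

    At a maximum the hessian of the log-likelihood is negative; the
    variance is approximated by the inverse of the negative hessian.
    """
    hessian = second_derivative(loglik, theta_hat, h)
    if hessian >= 0:
        raise ValueError("hessian is not negative: not a local maximum")
    return math.sqrt(-1.0 / hessian)

# Hypothetical data: n_events events observed in `minutes` minutes.
n_events, minutes = 40, 90.0

def loglik(lam):
    """Constant-rate Poisson process log-likelihood (illustration only)."""
    return -minutes * lam + n_events * math.log(lam)

lam_hat = n_events / minutes                # analytic maximiser of loglik
se_numeric = standard_error(loglik, lam_hat)
se_exact = lam_hat / math.sqrt(n_events)    # since -loglik''(lam) = n / lam^2
```

For this simple log-likelihood the numerical and the analytic standard error should agree closely, which is a useful sanity check before applying the same routine to more complex rates.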

3.2 Poisson point process

The modelling of events is generally performed using a Poisson point process. Following the notation of Coles (2004), we can provide the basics of this process. A point process can be seen as a stochastic rule for the occurrence and position of point events. Given a period of time $\mathcal{A}$, the model describes the occurrence of events. It can model either the probability of a certain number of events or the expected waiting time until the next event.


The statistical properties of a point process can be defined by a set of non-negative integer-valued random variables N(A), for $A \subset \mathcal{A}$, where N(A) is the number of points in the set A. The process is characterised by specifying the probability distribution of each N(A) in a consistent way. One of the main features of the point process is:

\[ \Lambda(A) = E(N(A)), \]

which gives the expected number of points in any subset $A \subset \mathcal{A}$ and is the intensity measure of the process.

For a one-dimensional process the common form is a Poisson process, either homogeneous, inhomogeneous or of another form. All of these make use of an intensity rate λ(t) that has to be specified beforehand.

For a Poisson process it must hold that:

• for all $A = [t_1, t_2] \subset \mathcal{A}$, $N(A) \sim \mathrm{Poi}(\Lambda(A))$, where

\[ \Lambda(A) = \int_{t_1}^{t_2} \lambda(t)\,dt; \]

• for all non-overlapping subsets A and B of $\mathcal{A}$, N(A) and N(B) are independent random variables.

Therefore, one of the intrinsic properties of a Poisson process is that points occur independently of one another. An occurrence of an event at point x has no direct or causal influence on events happening at any other moment in time, and the process is memoryless regarding direct relations to previous events. Differences in the number of occurrences are possible due to variations in the rate λ(t) and are not due to the presence or absence of events nearby.
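To make the two defining properties concrete, the following sketch computes the intensity measure of an interval by numerical integration and uses it in the Poisson probability of observing k events. The linearly increasing rate and its coefficients are invented for illustration and are not estimates from this thesis.

```python
import math

def intensity_measure(rate, t1, t2, steps=10_000):
    """Approximate Lambda([t1, t2]) = integral of rate(t) dt (midpoint rule)."""
    width = (t2 - t1) / steps
    return sum(rate(t1 + (k + 0.5) * width) for k in range(steps)) * width

def poisson_pmf(k, mean):
    """Pr(N = k) for N ~ Poi(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def rate(t):
    """Hypothetical rate in fouls per minute, rising during the match."""
    return 0.10 + 0.001 * t

# Expected number of fouls in each half of a 90-minute match.
expected_first_half = intensity_measure(rate, 0.0, 45.0)
expected_second_half = intensity_measure(rate, 45.0, 90.0)

# Probability of no foul in the first five minutes.
prob_no_foul_first_5 = poisson_pmf(0, intensity_measure(rate, 0.0, 5.0))
```

Because the two halves do not overlap, the counts in them are independent under the model; any difference in their expected values comes only from the shape of the rate.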

Application of this model requires a set of N observed points, occurring at random times $T_1, \dots, T_N$, which are the realisations of the Poisson process on $\mathcal{A}$ with rate λ(·; θ) for some value of the parameters θ. For the likelihood we assume that events have occurred at times $T_1, \dots, T_N$ and at no other times in $\mathcal{A}$. Let $I_i = [T_i, T_i + \delta_i]$, for $i = 1, \dots, N$, be a set of small intervals based around the different observed occurrences and let $I = \mathcal{A} \setminus \bigcup_{i=1}^{N} I_i$. Then by the Poisson property it holds that:

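\[ \Pr(N(I_i) = 1) = \exp(-\Lambda(I_i;\theta))\,\Lambda(I_i;\theta), \quad \text{where} \quad \Lambda(I_i;\theta) = \int_{T_i}^{T_i+\delta_i} \lambda(u)\,du \approx \lambda(T_i)\,\delta_i. \]

Since $\exp(-\lambda(T_i)\,\delta_i) \approx 1$ for small $\delta_i$, it follows that $\Pr(N(I_i) = 1) \approx \lambda(T_i;\theta)\,\delta_i$. Furthermore,

\[ \Pr(N(I) = 0) = \exp(-\Lambda(I;\theta)) \approx \exp(-\Lambda(\mathcal{A};\theta)), \]

since the $\delta_i$ are small.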


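Using these different probabilities we can construct the likelihood function as:

\[
\begin{aligned}
L(T_1, \dots, T_N;\theta) &= \Pr(N(I) = 0, N(I_1) = 1, N(I_2) = 1, \dots, N(I_N) = 1) \\
&= \Pr(N(I) = 0) \prod_{i=1}^{N} \Pr(N(I_i) = 1) \\
&\approx \exp(-\Lambda(\mathcal{A};\theta)) \prod_{i=1}^{N} \lambda(T_i;\theta)\,\delta_i.
\end{aligned}
\]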

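This expression can be turned into a density by dividing by the $\delta_i$, which leads to:

\[ L(T_1, \dots, T_N;\theta) = \exp(-\Lambda(\mathcal{A};\theta)) \prod_{i=1}^{N} \lambda(T_i;\theta), \quad \text{where} \quad \Lambda(\mathcal{A};\theta) = \int_{\mathcal{A}} \lambda(t;\theta)\,dt. \]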

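As an alternative to the likelihood function, the log-likelihood function $\mathcal{L}$ can be constructed. This applies a monotonic transformation, the natural logarithm, to the likelihood function and is easier to use in our modelling approach.

\[
\begin{aligned}
\mathcal{L}(T_1, \dots, T_N;\theta) &= \log\big(L(T_1, \dots, T_N;\theta)\big) \\
&= \log\left[\exp(-\Lambda(\mathcal{A};\theta)) \prod_{i=1}^{N} \lambda(T_i;\theta)\right] \\
&= \log\big[\exp(-\Lambda(\mathcal{A};\theta))\big] + \log\left[\prod_{i=1}^{N} \lambda(T_i;\theta)\right] \\
&= -\Lambda(\mathcal{A};\theta) + \sum_{i=1}^{N} \log(\lambda(T_i;\theta)).
\end{aligned}
\tag{1}
\]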

This expression for the log-likelihood $\mathcal{L}(T_1, \dots, T_N;\theta)$ holds for a general Poisson point process and can be used for maximum likelihood estimation once a specific form of λ(t) is known. This form has to be filled in and can then be used to find our maximum likelihood estimates. For λ(t) different forms can be chosen, depending on the sort of process. Furthermore, when the general form of λ(t) and our estimation interval are known, we can find the actual expression for $\Lambda(\mathcal{A};\theta)$ and fill it into the expression for the log-likelihood.
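Equation (1) translates almost directly into code. The sketch below is one possible implementation, assuming the rate is passed as a Python function and the integral is approximated with the midpoint rule; the function name, the foul times and the rate value are hypothetical.

```python
import math

def log_likelihood(event_times, rate, start, end, steps=10_000):
    """Equation (1): -Lambda(A) + sum of log(rate(T_i)) over observed events.

    Lambda(A) is approximated with the midpoint rule on [start, end].
    """
    width = (end - start) / steps
    big_lambda = sum(rate(start + (k + 0.5) * width) for k in range(steps)) * width
    return -big_lambda + sum(math.log(rate(t)) for t in event_times)

# Hypothetical foul times (minutes) for one team in one 90-minute match.
foul_times = [3.0, 11.5, 27.0, 44.0, 58.5, 71.0, 88.0]

# For a constant rate the integral is simply rate times duration,
# so the result can be checked against a closed form.
lam = 0.08
value = log_likelihood(foul_times, lambda t: lam, 0.0, 90.0)
closed_form = -90.0 * lam + len(foul_times) * math.log(lam)
```

Any other rate can be plugged in without changing the function, at the cost of a numerical integral where a closed-form expression for the integral is not available.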

In the remainder of this paper, for a single observation z, $N_z$ events occur at the corresponding times $T_1, \dots, T_{N_z}$ in the M minutes of our investigated sample. We will first calculate the individual log-likelihood $\mathcal{L}_z(T_1, \dots, T_{N_z};\theta)$. For the log-likelihood of multiple observations, the times of the events are denoted by data, because the $T_i$ differ for each observation z.

3.3 Homogeneous process

The simplest version of a Poisson point process is the homogeneous Poisson process. This process assumes one constant parameter λ > 0 for λ(t) over the whole time span. The log-likelihood can be obtained by filling λ into equation (1) and calculating $\Lambda(\mathcal{A};\theta)$. Doing this yields:

\[ \Lambda(\mathcal{A};\theta) = \int_{0}^{M} \lambda\,ds = [\lambda \cdot s]_{0}^{M} = M\lambda. \]


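Now filling in this expression for $\Lambda(\mathcal{A};\theta)$ and the expression for λ(t) into our log-likelihood expression, we have that:

\[ \mathcal{L}_z(T_1, \dots, T_{N_z};\theta) = -\Lambda(\mathcal{A};\theta) + \sum_{i=1}^{N_z} \log(\lambda) = -M\lambda + \sum_{i=1}^{N_z} \log(\lambda) = -M\lambda + N_z \cdot \log(\lambda). \]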

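Now, for the full log-likelihood of a sample of Z matches, we need to add the log-likelihoods of the separate matches:

\[
\begin{aligned}
\mathcal{L}(\text{data};\theta) &= \sum_{z=1}^{Z} \mathcal{L}_z(T_1, \dots, T_{N_z};\theta) \\
&= \sum_{z=1}^{Z} \big(-M\lambda + N_z \cdot \log(\lambda)\big) = -Z \cdot M \cdot \lambda + \sum_{z=1}^{Z} N_z \log(\lambda).
\end{aligned}
\]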

To obtain our maximum likelihood estimator we need to differentiate with respect to λ and equate this to zero. Doing this we find that:

\[
\begin{aligned}
-M \cdot Z + \sum_{z=1}^{Z} \frac{N_z}{\lambda} &= 0 \\
\frac{1}{\lambda} \sum_{z=1}^{Z} N_z &= M \cdot Z \\
\hat{\lambda} &= \frac{\sum_{z=1}^{Z} N_z}{M \cdot Z}.
\end{aligned}
\]

To obtain the standard errors we take the second-order derivative, which results in:

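\[ \mathrm{Hessian}(\text{data};\theta) = \frac{\partial^2 \mathcal{L}(\theta;\text{data})}{\partial\lambda^2} = \frac{\partial\left(-M \cdot Z + \sum_{z=1}^{Z} \frac{N_z}{\lambda}\right)}{\partial\lambda} = -\sum_{z=1}^{Z} \frac{N_z}{\lambda^2}. \]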

As this is a hessian for a single parameter model, it holds that:

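\[ \mathrm{Var}(\hat{\lambda}) = \frac{1}{-\left(-\sum_{z=1}^{Z} \frac{N_z}{\lambda^2}\right)} = \frac{1}{\sum_{z=1}^{Z} \frac{N_z}{\lambda^2}} = \frac{\lambda^2}{\sum_{z=1}^{Z} N_z}. \]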

Therefore we can find our maximum likelihood estimate for λ by dividing the total number of events by the total time observed, M · Z; equivalently, it is the mean number of events per observation divided by M. Our standard error for λ equals the square root of the estimate of λ squared divided by the total number of events.

In this simple version, the maximum likelihood estimator and the standard errors can be obtained by analytical differentiation of the log-likelihood function. For more complex functions this is rather difficult, and software can be used for a numerical approximation.
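The closed-form results for the homogeneous process can be sketched in a few lines. The function name and the foul counts below are hypothetical; the formulas are the estimator and variance derived above, with the variance evaluated at the estimate.

```python
import math

def fit_constant_rate(counts, minutes):
    """Closed-form MLE and standard error for a constant rate.

    counts  -- number of events N_z in each of the Z observations
    minutes -- observation length M, assumed equal for all observations
    """
    total_events = sum(counts)
    if total_events == 0:
        raise ValueError("no events observed: the standard error is undefined")
    lam_hat = total_events / (minutes * len(counts))
    std_err = math.sqrt(lam_hat ** 2 / total_events)
    return lam_hat, std_err

# Hypothetical foul counts for one team in five 90-minute matches.
lam_hat, std_err = fit_constant_rate([14, 17, 12, 19, 15], 90.0)
```

The estimate is expressed in events per minute, so multiplying it by M gives the expected number of events per observation.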


3.4 Non-homogeneous process

The previous model assumed a constant rate for λ(t). However, this rate does not have to be constant, as there might be differences in the number of events over time. Therefore, in this case, λ(t) is allowed to change over time and another specification for λ(t) has to be decided. For this particular setting we make use of the function:

\[
\lambda(t) = \alpha + \beta \cdot t, \quad \text{where } \alpha > 0,\ \beta \geq 0.
\]

As the rate λ(t) must be greater than zero, we assume that α > 0. Furthermore, as we assume an increasing rate, it must hold that β ≥ 0; since the trend might not be increasing, β must be allowed to equal 0 as well. We can estimate these parameters by filling in equation (1) for this process and obtaining the log-likelihood. First of all, this equation needs to be simplified in order to get to a solution. Starting with Λ(A;θ):

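\[
\begin{aligned}
\Lambda(A;\theta) &= \int_0^M \lambda(s) \, ds = \int_0^M (\alpha + \beta s) \, ds = \int_0^M \alpha \, ds + \int_0^M \beta s \, ds \\
&= \left[ \alpha s \right]_0^M + \left[ \tfrac{1}{2} \beta s^2 \right]_0^M = \alpha (M - 0) + \tfrac{1}{2} \beta \cdot (M^2 - 0) \\
&= M \cdot \alpha + \tfrac{1}{2} \cdot M^2 \cdot \beta.
\end{aligned}
\]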

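Filling in both expressions for Λ(A;θ) and λ(t) into equation (1), we find that:

\[
L_z(T_1, \dots, T_{N_z};\theta) = -\Lambda(A;\theta) + \sum_{i=1}^{N_z} \log(\lambda(T_i)) = -M \cdot \alpha - \tfrac{1}{2} \cdot M^2 \cdot \beta + \sum_{i=1}^{N_z} \log(\alpha + \beta \cdot T_i).
\]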

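Now we want to find our maximum likelihood estimators for the total sample, therefore we need to add the log-likelihoods of the individual cases. Then we have:

\[
\begin{aligned}
L(\text{data};\theta) &= \sum_{z=1}^{Z} L_z(T_1, \dots, T_{N_z};\theta) = \sum_{z=1}^{Z} \left[ \left( -M \cdot \alpha - \tfrac{1}{2} \cdot M^2 \cdot \beta \right) + \sum_{i=1}^{N_z} \log(\alpha + \beta \cdot T_i) \right] \\
&= \sum_{z=1}^{Z} \left( -M \cdot \alpha - \tfrac{1}{2} \cdot M^2 \cdot \beta \right) + \sum_{z=1}^{Z} \sum_{i=1}^{N_z} \log(\alpha + \beta \cdot T_i) \\
&= Z \cdot \left( -M \cdot \alpha - \tfrac{1}{2} \cdot M^2 \cdot \beta \right) + \sum_{z=1}^{Z} \sum_{i=1}^{N_z} \log(\alpha + \beta \cdot T_i).
\end{aligned}
\]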

Now we have to take derivatives with respect to α and β to find the equations that the parameters α and β have to fulfil to be the maximum likelihood estimators for our sample.

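\[
\frac{\partial L(\text{data};\theta)}{\partial \alpha} = \frac{\partial \left[ Z \cdot \left( -M \cdot \alpha - \tfrac{1}{2} \cdot M^2 \cdot \beta \right) + \sum_{z=1}^{Z} \sum_{i=1}^{N_z} \log(\alpha + \beta \cdot T_i) \right]}{\partial \alpha} = -M \cdot Z + \sum_{z=1}^{Z} \sum_{i=1}^{N_z} \frac{1}{\alpha + \beta \cdot T_i}
\]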


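\[
\frac{\partial L(\text{data};\theta)}{\partial \beta} = \frac{\partial \left[ Z \cdot \left( -M \cdot \alpha - \tfrac{1}{2} \cdot M^2 \cdot \beta \right) + \sum_{z=1}^{Z} \sum_{i=1}^{N_z} \log(\alpha + \beta \cdot T_i) \right]}{\partial \beta} = -\tfrac{1}{2} \cdot M^2 \cdot Z + \sum_{z=1}^{Z} \sum_{i=1}^{N_z} \frac{T_i}{\alpha + \beta \cdot T_i}
\]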

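So it should hold simultaneously that:

\[
M Z = \sum_{z=1}^{Z} \sum_{i=1}^{N_z} \frac{1}{\alpha + \beta \cdot T_i}, \qquad \tfrac{1}{2} \cdot M^2 \cdot Z = \sum_{z=1}^{Z} \sum_{i=1}^{N_z} \frac{T_i}{\alpha + \beta \cdot T_i}.
\]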

These equations cannot be solved directly, and therefore the maximum likelihood estimators fulfilling these conditions should be found by numerical optimisation. Furthermore, the Hessian can be approximated numerically to obtain the standard errors of the parameters.
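To illustrate the expressions of this section, the sketch below evaluates the log-likelihood and the two first-order conditions for given values of α and β. The function names and the foul minutes are hypothetical; the optimiser itself is left out, as any general-purpose numerical routine could be applied to these two functions.

```python
import math

def inhom_loglik(alpha, beta, matches, M=90.0):
    """Log-likelihood of lambda(t) = alpha + beta * t summed over all observations.

    matches: list of lists holding the event times T_i of each observation.
    """
    Z = len(matches)
    ll = Z * (-M * alpha - 0.5 * M ** 2 * beta)
    for times in matches:
        ll += sum(math.log(alpha + beta * t) for t in times)
    return ll

def inhom_score(alpha, beta, matches, M=90.0):
    """Partial derivatives of the log-likelihood with respect to alpha and beta."""
    Z = len(matches)
    d_alpha = -M * Z
    d_beta = -0.5 * M ** 2 * Z
    for times in matches:
        for t in times:
            rate = alpha + beta * t
            d_alpha += 1.0 / rate
            d_beta += t / rate
    return d_alpha, d_beta

# Hypothetical foul minutes for two observations
matches = [[5, 17, 30, 44, 61, 75, 88], [12, 25, 47, 52, 70, 83]]
score_at_guess = inhom_score(0.05, 0.001, matches)
```

Both derivatives equal zero at the maximum likelihood estimates, so a numerical routine can either maximise `inhom_loglik` directly or search for the root of `inhom_score`, subject to α > 0 and β ≥ 0.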

3.5 Time difference process

Another possible form for λ(t) takes into account the time that has passed since the previous event. In fact, this is a relaxation of the self-exciting Poisson process that is yet to come. The difference lies in the assumptions regarding the parameter γ and the inclusion of only the last event instead of all previous events. However, this will be addressed later. For this model we assume that:

\[
\lambda(t) = \tau + \psi \cdot \mathbf{1}_{T_j < t < T_{j+1}} \cdot \exp(-\gamma \cdot (t - T_j)),
\]

where $T_j$ denotes the time of the last event that has occurred before time t and $T_{j+1}$ is the moment in time of the following event. For this particular case, τ represents the initial rate and it should hold that τ > 0. The parameter ψ ≥ 0 shows the increase in the rate once a foul has been made, while γ shows how this excitation changes over time and can be either negative or positive, that is, whether the rate λ(t) decreases or increases when more time has passed since the last foul. Now we can fill this particular form of λ(t) into equation (1), and we find that:

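\[
L_z(T_1, \dots, T_{N_z};\theta) = -M \cdot \tau + \frac{\psi}{\gamma} \cdot \sum_{j=1}^{N_z - 1} \left( \exp(-\gamma \cdot (T_{j+1} - T_j)) - 1 \right) + \frac{\psi}{\gamma} \left( \exp(-\gamma \cdot (M - T_{N_z})) - 1 \right) + \sum_{i=1}^{N_z} \log(\lambda(T_i)).
\]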

For the full derivation of this expression we refer to Appendix A. However, this expression is only for a single observation; to obtain the log-likelihood for our full sample we need to add all the Z different observations.

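\[
\begin{aligned}
L(\text{data};\theta) &= \sum_{z=1}^{Z} L_z(T_1, \dots, T_{N_z};\theta) \\
&= \sum_{z=1}^{Z} \left[ -M \cdot \tau + \frac{\psi}{\gamma} \sum_{j=1}^{N_z - 1} \left( \exp(-\gamma \cdot (T_{j+1} - T_j)) - 1 \right) + \frac{\psi}{\gamma} \left( \exp(-\gamma \cdot (M - T_{N_z})) - 1 \right) + \sum_{i=1}^{N_z} \log(\lambda(T_i)) \right] \\
&= -M \cdot Z \cdot \tau + \frac{\psi}{\gamma} \sum_{z=1}^{Z} \left[ \sum_{j=1}^{N_z - 1} \left( \exp(-\gamma \cdot (T_{j+1} - T_j)) - 1 \right) + \left( \exp(-\gamma \cdot (M - T_{N_z})) - 1 \right) \right] + \sum_{z=1}^{Z} \sum_{i=1}^{N_z} \log(\lambda(T_i)).
\end{aligned}
\]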

We can maximise this numerically using software and find the optimal values for the parameters. Furthermore, we can find a numerical approximation for the Hessian in order to find the standard errors for the parameters.
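The single-observation log-likelihood of the time difference model can be sketched in a few lines of Python. The function name is hypothetical, and two points are assumptions rather than statements from the text: the rate at the first foul is taken to be τ (no earlier foul exists), and the rate at every later foul is evaluated just before that foul, so that only the preceding foul contributes. The sketch requires γ ≠ 0.

```python
import math

def time_diff_loglik(tau, psi, gamma, times, M=90.0):
    """Log-likelihood of one observation under the time difference model.

    times: event times of one observation; gamma must be non-zero.
    """
    times = sorted(times)
    ll = -M * tau
    # Integrated excitation between consecutive events
    for prev, nxt in zip(times, times[1:]):
        ll += psi / gamma * (math.exp(-gamma * (nxt - prev)) - 1.0)
    # Integrated excitation between the last event and the end of the match
    if times:
        ll += psi / gamma * (math.exp(-gamma * (M - times[-1])) - 1.0)
    # Log of the rate just before each event (assumption: tau at the first event)
    for i, t in enumerate(times):
        if i == 0:
            rate = tau
        else:
            rate = tau + psi * math.exp(-gamma * (t - times[i - 1]))
        ll += math.log(rate)
    return ll

# Hypothetical parameter values and foul minutes
example_ll = time_diff_loglik(0.2, 0.5, 0.1, [10.0, 20.0])
```

Summing this function over all Z observations gives the full-sample log-likelihood that is handed to the numerical optimiser.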

3.6 Self-exciting process

A special class of Poisson point processes is formed by the self-exciting processes, which can be seen as an extension of the homogeneous and inhomogeneous Poisson point processes. The difference between the regular Poisson process and the self-exciting process is that the self-exciting process assumes an additional intensity function. This intensity function is based on the assumption that events, in this case fouls, are clustered. Therefore a first occurrence could start a set of events in the near future. Using McNeil et al. (2015) as a reference point, we can construct the form of λ(t) with the corresponding likelihood function and assumptions.

Given that for a match z we have $N_z$ occurring events, the general self-exciting process will be:

\[
\lambda(t) = \tau + \psi \cdot \sum_{T_j < t} h(t - T_j),
\]

where τ > 0, ψ ≥ 0 and h(·) is a positive-valued function.

In this setting only the timing of fouls is relevant. In other settings, such as earthquakes in Ogata (1988), value at risk in Chaves-Demoulin et al. (2005) or spatial sciences, the size of the occurring event is also of interest, and events will only be considered an event after a certain threshold has been exceeded for the variable of interest. For possible versions of this we refer to McNeil et al. (2015).

There are different possibilities for the intensity function. In this particular case, we have chosen an intensity function of:

\[
h(s) = \exp(-\gamma \cdot s), \quad \text{where } \gamma > 0.
\]

Using this, we find our final equation for λ(t):

\[
\lambda(t) = \tau + \psi \cdot \sum_{T_j < t} \exp(-\gamma \cdot (t - T_j)).
\]

The assumptions regarding τ and ψ are similar to those of the time difference model: τ needs to be positive and ψ is greater than or equal to zero. Regarding the estimates of γ there are differences. In the time difference model the estimates for γ are allowed to be negative, to allow for an increase over time. However, in this setting a negative estimate for γ would imply that events occurring further in the past provide more information about the expected events to occur at time t, which violates the properties of this model. Therefore this model has the extra assumption of γ ≥ 0.

Figure 1: Example simulation self-exciting process

Given the combination of these assumptions, our process above boils down to a so-called Hawkes process, as defined and focused on in Ozaki (1979). An example of a simulated self-exciting or Hawkes process can be seen in Figure 1. In this figure such a process has been simulated with τ = 0.2, ψ = 0.5 and γ = 1, while the latest event is allowed to occur at moment 90. This figure shows us that events are clustered: multiple events occur close together in time, while for other long periods no events occur. The periods between the fouls depend on the size of τ, with a large τ implying shorter periods between the occasions. The number of events nearby depends on the parameter ψ: the larger ψ, the more events occur after one event. Lastly, γ shows the behaviour of λ(t) shortly after an event. A small γ suggests that more events are likely to occur in the following period, while a large γ suggests that the raised rate of events is local and has a lower probability of influencing the rate for future events nearby.
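A simulation of this kind can be sketched as follows. The text does not state how Figure 1 was generated; the sketch uses Ogata's thinning algorithm, a common choice for Hawkes processes with an exponentially decaying kernel, and the function name and seed are hypothetical. It relies on γ ≥ 0, so that the rate can only fall between two events and the current rate is a valid upper bound for proposing the next candidate time.

```python
import math
import random

def simulate_hawkes(tau, psi, gamma, horizon, seed=None):
    """Simulate event times on [0, horizon) by thinning (assumes gamma >= 0)."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # psi * sum over past events of exp(-gamma * (t - T_j))
    while True:
        upper = tau + excitation  # the rate cannot exceed this until the next event
        t_candidate = t + rng.expovariate(upper)
        if t_candidate >= horizon:
            break
        # Let the excitation decay up to the candidate time
        excitation *= math.exp(-gamma * (t_candidate - t))
        t = t_candidate
        # Accept the candidate with probability (current rate) / (upper bound)
        if rng.random() * upper <= tau + excitation:
            events.append(t)
            excitation += psi  # every accepted event raises the rate by psi
    return events

# Parameter values of Figure 1, with a horizon of 90 minutes
events = simulate_hawkes(tau=0.2, psi=0.5, gamma=1.0, horizon=90.0, seed=1)
```

Plotting `events` on a time axis for a few different seeds should show the alternation of clusters and quiet periods described above.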

To estimate the parameters, the log-likelihood should be maximised. From equation (1), the log-likelihood for the general Poisson point process depends on the form of λ(t). We can transform this for our specific case by filling in our expression for λ(t) and finding an expression for Λ(A;θ). To maximise this function numerically it needs to be rewritten into an easier form; doing this, we find the expression below for the log-likelihood of a single observation z. For the full derivation of $L_z(T_1, \dots, T_{N_z};\theta)$ we refer to Appendix B.

\[
L_z(T_1, \dots, T_{N_z};\theta) = -M \cdot \tau + \frac{\psi}{\gamma} \cdot \sum_{i=1}^{N_z} \left( \exp(-\gamma (M - T_i)) - 1 \right) + \sum_{i=1}^{N_z} \log(\lambda(T_i)).
\]

Now that we have our log-likelihood for a single observation, we again need to add all of them to obtain the log-likelihood for the full sample.


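\[
\begin{aligned}
L(\text{data};\theta) &= \sum_{z=1}^{Z} L_z(T_1, \dots, T_{N_z};\theta) \\
&= \sum_{z=1}^{Z} \left[ -M \cdot \tau + \frac{\psi}{\gamma} \cdot \sum_{i=1}^{N_z} \left( \exp(-\gamma (M - T_i)) - 1 \right) + \sum_{i=1}^{N_z} \log(\lambda(T_i)) \right] \\
&= -M \cdot \tau \cdot Z + \frac{\psi}{\gamma} \sum_{z=1}^{Z} \sum_{i=1}^{N_z} \left( \exp(-\gamma (M - T_i)) - 1 \right) + \sum_{z=1}^{Z} \sum_{i=1}^{N_z} \log(\lambda(T_i)).
\end{aligned}
\]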

This expression can be maximised numerically using software to obtain our maximum likelihood estimates. Furthermore, the Hessian can be approximated to get the corresponding standard errors.
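The log-likelihood above can be evaluated without a double loop over all pairs of fouls, because the exponential kernel allows the sum inside λ(T_i) to be updated recursively from one foul to the next. The sketch below illustrates this under the assumptions that the event times of an observation are strictly increasing and that γ > 0; the function names and the example values are hypothetical.

```python
import math

def hawkes_loglik(tau, psi, gamma, times, M=90.0):
    """Log-likelihood of one observation for the exponential-kernel Hawkes model.

    times: strictly increasing event times (assumption); gamma must be positive.
    """
    ll = -M * tau
    decayed_sum = 0.0  # sum over earlier events of exp(-gamma * (T_i - T_j))
    previous = None
    for t in times:
        if previous is not None:
            # Recursion: shift the earlier sum to time t and add the previous event
            decayed_sum = math.exp(-gamma * (t - previous)) * (1.0 + decayed_sum)
        ll += math.log(tau + psi * decayed_sum)
        ll += psi / gamma * (math.exp(-gamma * (M - t)) - 1.0)
        previous = t
    return ll

def hawkes_loglik_sample(tau, psi, gamma, matches, M=90.0):
    """Sum of the single-observation log-likelihoods over all observations."""
    return sum(hawkes_loglik(tau, psi, gamma, times, M) for times in matches)

# Hypothetical parameter values and foul times
example_ll = hawkes_loglik(0.2, 0.5, 1.0, [3.0, 4.5, 20.0, 21.0, 60.0])
```

A numerical optimiser would maximise `hawkes_loglik_sample` over (τ, ψ, γ) subject to the parameter restrictions of the model.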

3.7 Likelihood ratio test

For the comparison of different samples a likelihood ratio test will be used. The likelihood ratio test is based on the log-likelihood, and its test statistic asymptotically follows a χ² distribution with degrees of freedom equal to the difference in the number of parameters between the models. The test has a null hypothesis of the parameters being equal, versus the alternative hypothesis of different parameters. The likelihood ratio statistic equals −2 · (LL_null − LL_alternative), and the null hypothesis will be rejected if this statistic is larger than the critical value of the corresponding χ² distribution. If it is smaller than this critical value, we fail to reject the null hypothesis. For full details regarding this test we refer to Azzalini (1996).
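As a small illustration of the test, the sketch below computes the statistic and its p-value for the special case in which the two models differ by one parameter, so that the χ² distribution has one degree of freedom. In that case the tail probability can be written with the complementary error function from the Python standard library. The function name and the log-likelihood values are hypothetical.

```python
import math

def likelihood_ratio_test(ll_null, ll_alternative):
    """LR statistic and p-value, assuming a chi-square with 1 degree of freedom."""
    statistic = -2.0 * (ll_null - ll_alternative)
    # For 1 degree of freedom: P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value

# Hypothetical log-likelihoods: one pooled model against the sum of two separate models
stat, p = likelihood_ratio_test(ll_null=-1500.0, ll_alternative=-1497.0)
```

Models that differ by more than one parameter need the tail probability of a χ² distribution with the matching degrees of freedom, which a statistics library can provide.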

4 Data

The data is provided by Infostrada and contains records of matches in the Eredivisie, the national soccer league of the Netherlands. The Eredivisie is a round-robin competition with 18 teams, in which each team plays every other team twice: once in its own stadium at home and once away in the stadium of the opponent. Due to this set-up of the league, every season contains 306 distinct matches with in total 612 different observations to investigate. The data is available from the season 2007-2008 up to and including the season 2013-2014, totalling 2142 matches.

For each match, there is data available for every distinct action regarding foul play. For every foul, yellow card or red card there is information on the minute in which it occurred, the player who made the foul, the team he is playing for, the opponent and the referee. For the yellow and red cards, there is also a general description of the foul available. As we are interested in modelling the occurrence of fouls, the main variables of interest are the minute in which a foul occurred and the identification variables for the foul regarding team and match.

Due to limitations in modelling, all fouls made after the 90th minute will not be considered. Similarly, fouls made in the extra time of the first half are not considered either. Since the extra time is variable per match, it cannot be included in the model for each specific match. Furthermore, the inclusion of fouls in the extra time of the first half would yield an issue, as these minutes coincide with the minutes of the second half. This would lead to a high count of fouls in those minutes and could yield biased results. Besides the modelling of extra time, the combination of fouls, yellow cards and red cards leads to some difficulties. All are seen as equal. However, two yellow cards will result in a red card too. And when a card is given it usually follows a foul; however, there are occasions where a yellow card is not recorded in the data as a foul. This could be the case, for instance, when advantage has been given and the referee only gave the card to the player at the end of the attack. Therefore the data on all actions is combined and afterwards filtered on whether a player has had multiple fouls in the same minute or the minute before.

Figure 2: Distribution fouls across matches

Examples of how fouls are distributed within a match are given in Figure 2. The fouls made in the match De Graafschap-Ajax on the 19th of August 2007 can be seen on the left, while on the right the fouls made in the match Excelsior-FC Groningen on the 18th of December 2010 can be seen. From this figure, it can be noticed that in general fouls do not come on their own and that clustering is present. There are occasions where multiple fouls are made in the same minute or in the minutes shortly after a foul. However, there are also periods where no fouls are made for more than 30 minutes after the previous foul. These two examples look somewhat similar to our simulated example in Figure 1. However, there is a lot of variability in the time between fouls within matches, and this will be addressed in more detail later by the average waiting times over the seasons.

Lastly, since multiple models are being estimated for all teams, a distinction between teams playing at home and teams playing away will be made, and a combined match evaluation will be investigated. Multiple samples will be used for estimating these differences. All of these samples will be described separately, starting with separate teams, followed by home and away teams and finishing with fouls at a match level.

4.1 Team level

In total there are 2142 distinct matches in our sample, which yields a total of 4284 distinct observations. However, there is one match that stands out, Ajax-NEC Nijmegen, in which Ajax has not made a single foul within the entire 90 minutes. The only foul made was in the extra time of the match and is therefore not considered in the models. This means that within our estimation sample the lowest number of fouls made by a team is zero, followed by multiple teams making only 3 fouls during the match. The maximum number of fouls made by a single team equals 32, occurring for several teams. To put this in perspective, this implies that on average such a team makes a foul in less than every 3 minutes. The average number of fouls for the general sample lies around 14.660, implying a foul roughly every 6 minutes. More summary statistics can be found in Table 1.

Table 1: Summary statistics all teams over seasons per match

Season      Season no. fouls   Mean no. fouls   Sd. no. fouls   Avg wait time   Sd. avg wait time
2007-2008   9813               16.034           4.614           5.753           1.832
2008-2009   9781               15.982           4.421           5.761           1.736
2009-2010   9308               15.209           4.379           6.062           1.891
2010-2011   9171               14.985           4.278           6.153           2.022
2011-2012   8484               13.863           4.358           6.752           2.44
2012-2013   8362               13.663           4.261           6.803           2.390
2013-2014   7871               12.861           4.032           7.214           2.634

All         62790              14.660           4.479           6.357           2.221

When inspecting Table 1, it first of all stands out that the number of fouls is decreasing over the seasons. This can be observed both from the decrease in the total number of fouls per season and from the mean number of fouls per match. The standard deviation is decreasing over the seasons as well, indicating this might be true. Performing an Anova test for an equal mean number of fouls per match over all the seasons provides the evidence: a p-value of less than 0.001 rejects the null hypothesis of equal means, and therefore we conclude that the mean differs between the seasons. Besides the average number of fouls per match, the time between fouls is of interest as well.

Figure 3: Count average waiting time in a match over seasons

The summary statistics regarding the average time between fouls of a team, referred to as the average waiting time, are also present in Table 1. There seems to be an increase in the average time between fouls over the seasons. Again this is tested and confirmed by an Anova test, significant at a 1 per cent level. In addition to the summary statistics, the distribution of the average waiting times can be seen in Figure 3. From this figure, it can be observed that for most teams the average waiting time lies below 10 minutes, with a peak around 7 minutes for all distinct seasons. Furthermore, it can be noticed that the shapes are quite similar for all seasons, all having a bell shape. This indicates that most of the observations lie around the mean. However, the right tail of the distribution is slightly longer than the left tail, partly due to the presence of matches with a small number of fouls, resulting in a large average waiting time of, for instance, more than 20 minutes.

Figure 4: Histogram number of fouls per minute across season

Besides the previously mentioned summary statistics, it is important to know at what time fouls occur. Figure 4 shows the total number of fouls per minute for each distinct season. First of all, it stands out that the maximum of the y-axis differs amongst seasons. Overall it can be concluded that there is a baseline level of fouls per minute, around which the count moves with both highs and lows depending on the minute. However, based on eye-balling the data a clear trend cannot be observed; there is, for instance, no increase in the number of fouls over time.

Because the differences between seasons are all significant at a 1 per cent level, all seasons must be evaluated separately. However, to make general conclusions, a combination of all seasons will be evaluated as well.

4.2 Home vs away

As mentioned previously in the literature, referees can make different decisions under the influence of a home crowd. Also, the different playing tactics and behaviour of players could lead to differences in the fouls made by home and away teams. This could result in differences between samples and possible estimation levels.

Summary statistics regarding the average number of fouls can be found in Table 2. This shows both the mean and standard deviation for both samples for all different seasons. Furthermore, a p-value is included for a univariate t-test of whether or not there is a difference between the number of fouls per match for home and away teams. A p-value lower than 0.05 rejects the null hypothesis of equal means. For all seasons this is the case, providing evidence that there are differences in the number of fouls for the home and away samples. Furthermore, an Anova test for equal means amongst multiple groups provides evidence as well for differences in the mean number of fouls per match amongst seasons and home and away teams at a 1 per cent significance level. However, based on the statistics in Table 2 the differences seem to become smaller over time. In the season 2007-2008, there is a difference of 1.395 fouls per match, decreasing to 0.709 for the season 2013-2014. Besides the average, differences between the samples are small. For instance, the maximum number of fouls made by home teams equals 31, while for away teams this is 32. Both samples have a minimum number of fouls equalling 3, and the frequencies of both the high and low counts of fouls seem quite similar across both samples.

Table 2: Summary statistics number of fouls home vs away teams per match

            Home                                     Away
Season      Season no. fouls   Avg no.   Std        Season no. fouls   Avg no.   Std      p-value t-test
2007-2008   4693               15.337    4.415      5120               16.732    4.710    1.714E-04
2008-2009   4703               15.369    4.347      5078               16.595    4.417    5.800E-04
2009-2010   4504               14.719    4.435      4804               15.699    4.274    5.528E-03
2010-2011   4397               14.369    4.126      4774               15.601    4.345    3.484E-04
2011-2012   4105               13.415    4.389      4379               14.310    4.288    1.093E-02
2012-2013   4053               13.289    4.220      4309               14.082    4.272    2.128E-02
2013-2014   3827               12.507    3.896      4044               13.216    4.140    2.949E-02

All         30282              14.144    4.379      32508              15.176    4.519    3.776E-14

Table 3: Summary statistics average waiting time home vs away teams per match

Season      Home    Std home   Away    Std away   p-value t-test
2007-2008   6.010   1.972      5.496   1.645      4.866E-04
2008-2009   5.966   1.723      5.556   1.726      3.382E-03
2009-2010   6.292   2.056      5.831   1.681      2.502E-03
2010-2011   6.386   2.179      5.919   1.826      4.209E-03
2011-2012   7.039   2.653      6.466   2.174      3.623E-03
2012-2013   7.024   2.601      6.583   2.139      2.249E-02
2013-2014   7.334   2.485      7.095   2.860      2.607E-01

All         6.579   2.300      6.135   2.117      5.740E-11

Summary statistics regarding the average waiting time can be found in Table 3, while the distribution of the waiting time between fouls can be found in Figure 5. From the figure, it stands out that the frequency of lower waiting times is higher for away teams than for home teams. Furthermore, both show a bell-shaped distribution; however, the peaks for away teams are higher with a shorter tail. For home teams, the peaks are lower and the tail is longer compared to the away teams. This seems to be the case for all seasons. All of this could indicate that the average waiting times are smaller for away teams than for home teams. This is confirmed by the summary statistics. For all seasons except the season 2013-2014, the difference in average waiting time between home and away teams is significant according to the univariate t-test. An Anova test comparing all seasons and samples shows there is a difference across seasons as well, significant at a 1 per cent level. Based on eye-balling the summary statistics, it can be concluded that the waiting times between fouls for home and away teams show an increasing trend over the seasons, with the exception of home teams in the season 2008-2009.


Figure 5: Average waiting time between fouls per match

When inspecting the distribution across minutes in Figure 6, it appears that for both home and away teams a sort of wave can be observed. This wave pattern seems more present for away teams than for home teams and can be observed for every distinct season. There are some differences in the height of the waves, as in the later seasons the peaks are lower compared to the earlier seasons. In relative comparison, the total number of fouls per minute by home teams is larger until the 20th minute. From that moment in time, the number of fouls of away teams seems to be larger than that of home teams.

Figure 6: Distribution of plots away vs home


4.3 Match level

In the previous paragraphs teams were analysed separately. We will now investigate the combined number of fouls by both teams at match level. The differences between both settings are small, as for instance the distribution across minutes stays the same as in Figure 4. However, differences are present in the minimum number of fouls in a game, which equals 9, and the maximum number of fouls in a game, which equals 56. This implies a larger range for the number of fouls compared to our models at team level. Summary statistics regarding the number of fouls can be found in Table 4. From these observations, it stands out that the mean number of fouls is decreasing over time. Performing an Anova test confirms the differences in mean, rejecting the null hypothesis of equal means at a 1 per cent significance level.

Table 4: Summary statistics fouls made at match level

Season      Mean no. fouls   Sd. no. fouls   Avg wait time   Sd. avg wait time
2007-2008   32.069           6.828           2.870           0.659
2008-2009   31.964           6.917           2.893           0.682
2009-2010   30.418           6.435           3.019           0.656
2010-2011   29.971           6.225           3.065           0.688
2011-2012   27.725           6.892           3.384           0.976
2012-2013   27.327           6.538           3.406           0.918
2013-2014   25.722           5.848           3.570           0.861

All         29.314           6.906           3.172           0.827

Figure 7: Average waiting time distribution at match level

In addition to a change in the average number of fouls made per match, there is a difference in the average time between fouls in comparison with the separate teams. Due to the increased number of fouls, the average waiting time has decreased. Again the bell shape is present, but the tail is much shorter and the observations are more centred around the mean. This seems to be the case for all seasons. From the summary statistics in Table 4 it can be observed that the average waiting time is increasing over the seasons, and this is confirmed by an Anova test significant at a 1 per cent level. This can also be seen in Figure 7, as the peaks for the different seasons all lie at slightly different times but do show similar patterns.

5 Results

First of all, before discussing the results, we want to stress that each foul is seen as an independent event during this process. That is, there is no direct or causal relation between two distinct fouls within a match, and all models make use of the memoryless property of the Poisson process. Furthermore, due to the low number of events, the parameters have been estimated over the whole population, including all distinct teams within a particular subset of the data. All estimates are limited to events occurring within 90 minutes, setting M to 90, and no events can occur after this. Therefore, the focus is on the dynamics within the game: how the different models relate within a game and what happens regarding fouls in those 90 minutes. That is, whether there is a difference in the number of fouls during the match, whether the expected number of fouls increases when more time has passed since the last foul, and what the possible effects are of including all previously made fouls.

5.1 Homogeneous Poisson process

First, the homogeneous Poisson point process will be discussed. This can be seen as a relatively simple model and can be used as a comparison for the other models. This model assumes a parameter that is constant over time and over fouls, while the point estimates can be obtained using the mean number of fouls made. This model gives an indication of the number of expected fouls per minute.

Table 5: Results homogeneous Poisson process

Season      Team    se(team)    Home    se(home)    Away    se(away)    Match   se(match)
2007-2008   0.178   1.796E-03   0.170   2.482E-03   0.186   2.599E-03   0.356   3.594E-03
2008-2009   0.178   1.799E-03   0.171   2.493E-03   0.184   2.582E-03   0.355   3.590E-03
2009-2010   0.169   1.752E-03   0.164   2.444E-03   0.174   2.510E-03   0.338   3.503E-03
2010-2011   0.167   1.743E-03   0.160   2.413E-03   0.173   2.504E-03   0.333   3.477E-03
2011-2012   0.154   1.671E-03   0.149   2.326E-03   0.159   2.403E-03   0.308   3.344E-03
2012-2013   0.152   1.662E-03   0.148   2.325E-03   0.156   2.376E-03   0.304   3.324E-03
2013-2014   0.143   1.612E-03   0.139   2.247E-03   0.147   2.312E-03   0.286   3.224E-03

All         0.163   6.504E-04   0.157   9.022E-04   0.169   9.373E-04   0.326   1.301E-03

Note that all parameters are significant at a 1 per cent level.

The results for all different samples can be found in Table 5, starting at a team level with the point estimates in column two and the standard errors in column three. It has to be noted that all parameters are significant at a 1 per cent level. Besides this high level of significance for all point estimates, it stands out that the estimates are becoming smaller over the seasons. This is in line with the decrease in the average number of fouls made and the increase in the average waiting time. Regarding the interpretation of the point estimates, we find that λ(t) = λ. In season 2007-2008 and season 2008-2009, according to the model, we expect 0.178 fouls to occur in a minute. For season 2009-2010, there is an expected number of 0.169 fouls per minute. For the other seasons, the parameters can be interpreted in a similar manner, where the point estimate stands for the number of expected fouls per minute.

Regarding the home and away teams, the point estimates can be found in columns 4 and 6, while the accompanying standard errors can be found in columns 5 and 7. Again all estimates are significant at a 1 per cent level, while the trend of decreasing estimates can also be seen. Furthermore, there is a large difference between the point estimates of home and away teams, while both have small standard errors. For instance, the difference in the expected number of fouls per minute in the season 2007-2008 between home and away teams is 0.016, which within a 90-minute match results in an expected 0.016 · 90 = 1.440 more fouls made by away teams compared to home teams. This difference in expected fouls within a match is decreasing over time, as in season 2013-2014 away teams are expected to make 0.008 · 90 = 0.720 more fouls compared to home teams. The differences in point estimates show that it is likely that there is a difference between home and away teams.

This is confirmed by the likelihood ratio test. This test compares the log-likelihood of the combination of home and away teams with the model without a distinction. The null hypothesis states that the initial model without the distinction fits the data better. According to the results in Table 6, we can reject this null hypothesis for all seasons. This implies that for every season the models with a distinction between teams playing at home and away fit the data better than one combined model for both. Based on this we can conclude that there is indeed a difference between teams playing at home and away regarding the expected number of fouls made.

Table 6: Result likelihood ratio test homogeneous model

Season      LLfull       LLhome       LLaway       LR       p-value
2007-2008   -2.674E+04   -1.300E+04   -1.373E+04   18.585   1.625E-05
2008-2009   -2.669E+04   -1.302E+04   -1.366E+04   14.380   1.494E-04
2009-2010   -2.586E+04   -1.266E+04   -1.319E+04   9.672    1.871E-03
2010-2011   -2.561E+04   -1.246E+04   -1.314E+04   15.471   8.378E-05
2011-2012   -2.435E+04   -1.192E+04   -1.243E+04   8.843    2.942E-03
2012-2013   -2.411E+04   -1.181E+04   -1.230E+04   7.025    8.039E-03
2013-2014   -2.318E+04   -1.138E+04   -1.180E+04   5.994    1.435E-02

All         -1.767E+05   -8.632E+04   -9.037E+04   77.653   0

The interpretation of the models is quite similar to the models at a team level. For instance, a team playing at home in the season 2007-2008 has an expected number of fouls per minute equalling 0.170, while if the same team is playing away it would have an expected 0.186 fouls per minute. For the combination of all seasons, a team playing at home is expected to make 0.157 fouls per minute in a match, while the same team playing away would be expected to make 0.169 fouls per minute. All distinct seasons can be interpreted similarly regarding the point estimates shown in Table 5.

The estimates for the combination of the 2 teams within a match are presented in the last columns of Table 5. Again all parameters are significant at a 1 per cent level and the point estimates are decreasing over the seasons. Regarding the interpretation of the estimates, it can be noticed that within a match in the season 2007-2008, 0.356 fouls per minute are expected to occur. These fouls can be made by either the home team or the away team during the match. For the season 2008-2009, 0.355 fouls are expected to be made per minute. For the other seasons, the point estimate equals the number of expected fouls per minute as well. Overall, 0.326 fouls per minute are expected to occur within a match in our data.

As mentioned before, all of these models make use of the mean number of fouls from the different samples. Since a team must either be playing at home or away, the estimated rate at a match level must be the sum of the rates of the home and away teams, except for rounding. Similarly, the arrival rate at a team level must be equal to the average of the home and away rates and to half of the arrival rate of fouls at a match level.
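These relations can be checked against the 2007-2008 estimates in Table 5. The subscripted symbols below are introduced here only for this check:

\[
\hat{\lambda}_{\text{match}} = \hat{\lambda}_{\text{home}} + \hat{\lambda}_{\text{away}} = 0.170 + 0.186 = 0.356,
\qquad
\hat{\lambda}_{\text{team}} = \frac{\hat{\lambda}_{\text{home}} + \hat{\lambda}_{\text{away}}}{2} = \frac{0.356}{2} = 0.178.
\]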


From this model we can infer that there are differences in the expected number of fouls made depending on whether a team is playing at home or away. Furthermore, it can be concluded that there are differences in the expected number of fouls made over the seasons. In the later seasons fewer fouls per minute are expected to be made in comparison to the early seasons for all distinct samples.

5.2 Inhomogeneous Poisson process

The inhomogeneous Poisson point process estimates λ(t) according to the formula λ(t) = α + β · t, and makes use of the assumption that α > 0 and β ≥ 0. Therefore it relies on the assumption that the number of fouls made increases during the match.

Team level

Recall that from Figure 2 no clear trend can be observed regarding an increasing number of fouls during the match. Also, on average a foul is made in the 46th minute, so slightly after half-time. This does not give a clear indication that fouls are made at later times during the match. However, it could still be true that the number of fouls increases during the match. The results for separate teams can be found in Table 7.

Table 7: Results inhomogeneous team-level

Season     α             se(α)      β             se(β)
2007-2008  1.711E-01***  2.764E-03  1.574E-04***  4.129E-05
2008-2009  1.752E-01***  3.572E-03  5.191E-05     6.892E-05
2009-2010  1.622E-01***  3.476E-03  1.507E-04**   6.762E-05
2010-2011  1.588E-01***  3.476E-03  1.694E-04**   6.745E-05
2011-2012  1.465E-01***  3.310E-03  1.657E-04**   6.454E-05
2012-2013  1.426E-01***  3.270E-03  2.119E-04***  6.392E-05
2013-2014  1.316E-01***  3.176E-03  2.502E-04***  6.245E-05
All        1.556E-01***  1.288E-03  1.628E-04***  2.507E-05
*** p<0.01, ** p<0.05, * p<0.1

From Table 7 it stands out that all estimates for α are significant at a 1 per cent level. In addition, all estimates for β are significant at least at a 5 per cent level, except for season 2008-2009. The significance of the parameters implies that the expected number of fouls λ(t) is increasing during the game. The estimate of α provides a base level for λ(t). Comparing these estimates over the seasons, they are decreasing in general, except for the season 2008-2009. The estimate of β provides the increase per minute in λ(t). Comparing β over the seasons there seems to be a slight increase, indicating that in the later seasons the expected number of fouls rises slightly faster during the game and the time effect is larger.

As λ(t) is no longer constant but increases over time, λ(t) differs from minute to minute. It can be calculated using the formula λ(t) = α + β · t. For the season 2007-2008 this formula equals λ(t) = 0.1711 + 0.0001574 · t, where t represents the minute in the game. For instance, in the 34th minute in this season λ(t) = 0.1711 + 34 · 0.0001574 = 0.1764, while in the 68th minute λ(t) = 0.1711 + 68 · 0.0001574 = 0.1818. Therefore the expected number of fouls per minute equals 0.1764 in the 34th minute and 0.1818 in the 68th minute. This seems a small difference in practice; however, it becomes larger, for instance, between the 1st and 90th minute, and the difference in rate leads to 1.272 more fouls expected to be made in the entire game. Similar interpretations can be made for the other seasons and the overall sample. Only the season 2008-2009 is different. Due to the insignificance of β, we cannot conclude that this estimate is different from zero, so the estimate of λ(t) equals α throughout and yields a homogeneous Poisson process. In general, we can conclude that at a team level there is an increase in the expected number of fouls per minute made within a match.
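The rate calculation for the linear model can be sketched in a few lines of Python. This is only an illustration, not the code used for estimation: the function names `rate` and `expected_fouls` are hypothetical, the inputs are the Table 7 point estimates for season 2007-2008, and taking the expected count over an interval as the integral of λ(t) is the standard Poisson process relation, assumed here rather than taken from the text.

```python
def rate(t, alpha, beta):
    """Linear intensity lambda(t) = alpha + beta * t, in fouls per minute."""
    return alpha + beta * t


def expected_fouls(start, end, alpha, beta):
    """Integral of lambda(t) over [start, end]: expected number of fouls."""
    return alpha * (end - start) + 0.5 * beta * (end ** 2 - start ** 2)


# Point estimates for season 2007-2008 (Table 7).
ALPHA, BETA = 0.1711, 0.0001574

for minute in (34, 68):
    print(minute, rate(minute, ALPHA, BETA))

# Expected fouls by one team over a 90-minute match under this sketch.
print(expected_fouls(0, 90, ALPHA, BETA))
```

Setting `beta` to zero reduces both functions to the homogeneous case, which is how a season with an insignificant β would be treated.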

Home vs away

According to the homogeneous Poisson point process, there is a significant difference between home and away teams. In Figure 6 a difference also seems to be present between home and away teams with respect to the average waiting time. Therefore both samples are investigated separately; the results for the teams playing at home can be found in Table 8.

Table 8: Results inhomogeneous home teams

Season     α             se(α)      β             se(β)
2007-2008  1.635E-01***  4.911E-03  1.531E-04     9.541E-05
2008-2009  1.708E-01***  4.962E-03  1.565E-06     9.542E-05
2009-2010  1.554E-01***  4.835E-03  1.812E-04*    9.436E-05
2010-2011  1.564E-01***  4.835E-03  7.276E-05     9.339E-05
2011-2012  1.379E-01***  4.579E-03  2.476E-04***  8.986E-05
2012-2013  1.395E-01***  4.568E-03  1.817E-04**   8.914E-05
2013-2014  1.290E-01***  4.422E-03  2.195E-04**   8.674E-05
All        1.503E-01***  1.789E-03  1.537E-04*    3.483E-05
*** p<0.01, ** p<0.05, * p<0.1

From Table 8 it can be observed that all the estimates for α are significant at a 1 per cent level, while only some of the estimates of β are significant at a one, five or ten per cent level. For the seasons 2011-2012, 2012-2013 and 2013-2014 the estimates of β are significant at least at a 5 per cent level, while for the season 2009-2010 and the overall sample the estimates are significant at a 10 per cent level. For the other seasons the estimates are not significant at any conventional level, indicating that the estimated parameters do not statistically differ from zero. Furthermore, the estimates of α seem to be decreasing over the seasons, while for β no clear increasing or decreasing trend over the seasons can be observed.

The estimates of α and β can be used to construct the expected number of fouls per minute, λ(t). For the seasons where β is not significant or significant only at a 10 per cent level, we cannot conclude that it statistically differs from zero. Therefore for those seasons, we have to conclude that no time-increasing trend in λ(t) can be observed and λ(t) equals α. For the seasons where β is significant at least at a 5 per cent level we can conclude that λ(t) = α + β · t. For instance, in the season 2013-2014 the expected number of fouls in minute t equals λ(t) = 0.1290 + 0.0002195 · t for all t smaller than or equal to 90. To compare with the home teams we need to fit the same model for away teams. The results can be found in Table 9.

Table 9: Results inhomogeneous away teams

Season     α             se(α)      β             se(β)
2007-2008  1.784E-01***  4.193E-03  1.669E-04**   6.559E-05
2008-2009  1.795E-01***  4.087E-03  1.090E-04     6.240E-05
2009-2010  1.691E-01***  4.995E-03  1.178E-04     9.688E-05
2010-2011  1.614E-01***  4.995E-03  2.636E-04***  5.752E-05
2011-2012  1.556E-01***  4.786E-03  7.728E-05     9.265E-05
2012-2013  1.456E-01***  4.674E-03  2.422E-04***  9.150E-05
2013-2014  1.339E-01***  4.557E-03  2.867E-04***  8.991E-05
All        1.605E-01***  1.851E-03  1.809E-04***  3.609E-05
*** p<0.01, ** p<0.05, * p<0.1

From Table 9, in general, similar results as for the home teams can be observed. All the estimates for α are significant at a 1 per cent level, while for β the significance changes per season. The estimates for β in seasons 2007-2008, 2010-2011, 2012-2013, 2013-2014 and the combined sample are significant at least at a 5 per cent level. However, the estimate for β is not significant in the seasons 2008-2009, 2009-2010 and 2011-2012. Due to the insignificance of β for these seasons, we cannot conclude that there is an increase over time in the expected number of fouls per minute, and the expected number of fouls per minute equals α. For the seasons where β is significant, we can conclude that λ(t) = α + β · t for t smaller than or equal to 90. For instance, for 2010-2011 the expected number of fouls in minute t equals λ(t) = 0.1614 + 0.0002636 · t. The same applies to the other seasons where β is significant.

When comparing the estimates of the models, all estimates for α are larger for teams playing away than for teams playing at home. This is similar to the homogeneous model, where the expected number of fouls for away teams was larger. Regarding the estimates of β, no direct conclusion can be drawn: for some seasons the estimate of β for home teams is larger, while for other seasons the estimate for away teams is larger. In addition to the direct comparison of the estimates, the models can also be compared using the likelihood ratio test. According to the results in Table 10, the null hypothesis can be rejected for every season. This implies that estimating home and away teams separately fits the data significantly better than one model for the combined data. Therefore, the conclusion can be drawn that the expected number of fouls per minute differs significantly depending on whether a team is playing at home or away.

Table 10: Results likelihood ratio test inhomogeneous models

Season     LL_full     LL_home     LL_away     LR      p-value
2007-2008  -2.674E+04  -1.300E+04  -1.373E+04  18.586  9.207E-05
2008-2009  -2.669E+04  -1.302E+04  -1.366E+04  15.010  5.503E-04
2009-2010  -2.585E+04  -1.266E+04  -1.319E+04   9.963  6.864E-03
2010-2011  -2.561E+04  -1.246E+04  -1.314E+04  17.198  1.843E-04
2011-2012  -2.435E+04  -1.191E+04  -1.243E+04  10.953  4.184E-03
2012-2013  -2.411E+04  -1.180E+04  -1.230E+04   7.183  2.755E-02
2013-2014  -2.318E+04  -1.138E+04  -1.180E+04   6.167  4.581E-02
All        -1.767E+05  -8.631E+04  -9.036E+04  78.007  0
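The likelihood ratio comparison can be illustrated with a short Python sketch. It rests on two assumptions that are not stated in this section: that the statistic is twice the difference between the summed home and away log-likelihoods and the combined log-likelihood, and that the p-value comes from a chi-square distribution whose degrees of freedom equal the number of parameters per model (2 for the linear model, 3 for the three-parameter models). The helper names `lr_statistic` and `chi2_sf` are hypothetical. Under these assumptions the reported p-values for the LR values in Table 10 are reproduced; because the log-likelihoods in the table are rounded, recomputing LR from them is only approximate.

```python
import math


def lr_statistic(ll_full, ll_home, ll_away):
    """Assumed form: LR = 2 * ((LL_home + LL_away) - LL_full)."""
    return 2.0 * ((ll_home + ll_away) - ll_full)


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed form for df = 2 or 3 only."""
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        tail = math.erfc(math.sqrt(x / 2.0))
        return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    raise ValueError("this sketch only implements df = 2 and df = 3")


# LR value reported for season 2007-2008 in Table 10, assumed df = 2.
print(chi2_sf(18.586, 2))

# Approximate LR from the rounded log-likelihoods of the same season.
print(lr_statistic(-2.674e4, -1.300e4, -1.373e4))
```

For other degrees of freedom a statistics library would be the more natural choice; the closed forms are used here only to keep the sketch free of dependencies.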

Match level

There can be differences between the match and team level. The higher number of fouls could, for instance, lead to a higher estimate of β, implying a stronger increase over time, particularly since the distribution of the average waiting times is more centred around the mean. The results of fitting this model can be found in Table 11.

Table 11: Results inhomogeneous process match level

Season     α             se(α)      β             se(β)
2007-2008  3.421E-01***  7.112E-03  3.170E-04**   1.382E-04
2008-2009  3.502E-01***  6.956E-03  1.087E-04     1.287E-04
2009-2010  3.246E-01***  6.954E-03  3.004E-04**   1.353E-04
2010-2011  3.178E-01***  6.954E-03  3.396E-04***  1.253E-04
2011-2012  2.934E-01***  6.395E-03  3.271E-04***  1.181E-04
2012-2013  2.848E-01***  6.301E-03  4.178E-04***  1.166E-04
2013-2014  2.630E-01***  6.099E-03  5.062E-04***  1.129E-04
All        3.110E-01***  2.497E-03  3.277E-04***  3.609E-05
*** p<0.01, ** p<0.05, * p<0.1

First of all, from Table 11 it stands out that, except for the estimates for β in the seasons 2007-2008, 2008-2009 and 2009-2010, all the estimates are significant at a 1 per cent level. The estimates for β in 2007-2008 and 2009-2010 are significant at a 5 per cent level, while the estimate in 2008-2009 is not significant. As in all previously investigated models, the estimates for α are decreasing over the seasons, while for β no clear trend in the size of the estimates is visible. As the expected number of fouls per minute equals λ(t) = α + β · t, we can use the estimates to construct this rate. For instance, for the season 2007-2008 we have that λ(t) = 0.3421 + 0.0003170 · t for t smaller than or equal to 90. This rate equals the expected number of fouls per minute at minute t. Such a formula can be constructed for all seasons and interpreted similarly, except for season 2008-2009. For this season we cannot conclude that the estimate for β is different from zero, so the expected number of fouls per minute equals 0.3502.

So overall, there seems to be evidence for an increase in the expected number of fouls made throughout the game for both teams and matches. Furthermore, there seems to be a decreasing trend in the estimates of α and an increasing trend for β in both team and match samples. For home and away teams the increase throughout the game is less present: there are certain seasons where the point estimates for β are not significant and do not show the time-increasing trend. However, based on the likelihood ratio test it can be concluded that distinguishing between home and away teams fits the data better than estimating one combined model, and that there are differences between home and away teams. This can be seen in the differences in the estimates of α.

5.3 Time difference Poisson process

Recall that for this model we assume that the rate is λ(t) = τ + ψ · 1_{T_i < t < T_{i+1}} · exp(−γ · (t − T_j)), so that it only takes into account the time that has passed since the last foul. This model is therefore dependent on the waiting time between fouls. It is estimated under the restrictions that τ > 0 and ψ ≥ 0; there are no restrictions on γ. To construct the actual rate of expected fouls we need to know when the last foul occurred.

Team level

We start by modelling this for single teams within a match. The results can be found in Table 12.

Table 12: Results time difference model

Season     τ             se(τ)      ψ             se(ψ)       γ               se(γ)
2007-2008  1.508E-01***  5.102E-03  2.198E-02***  4.533E-03   -5.067E-02***   7.153E-03
2008-2009  1.631E-01***  5.002E-03  1.171E-02***  4.177E-03   -4.676E-02***   8.744E-03
2009-2010  1.551E-01***  4.798E-03  1.099E-02***  4.025E-03   -4.866E-02***   9.778E-03
2010-2011  1.480E-01***  4.951E-03  1.514E-02***  4.372E-03   -4.403E-02***   8.366E-03
2011-2012  1.426E-01***  4.806E-03  9.885E-03**   4.254E-03   -3.485E-02***   1.036E-02
2012-2013  1.407E-01***  4.501E-03  9.288E-03**   3.823E-03   -4.154E-02***   9.676E-03
2013-2014  1.268E-01***  4.219E-03  1.310E-02***  3.620E-03   -4.111E-02***   6.921E-03
All        1.496E-01***  1.818E-03  1.135E-02***  1.5616E-03  -4.120E-02***   3.4837E-03
*** p<0.01, ** p<0.05, * p<0.1

Table 12 shows that all parameters are significant at least at a 5 per cent level. Furthermore, the estimates for each parameter are of similar size across seasons. To interpret the parameters, the formula λ(t) = τ + ψ · 1_{T_i < t < T_{i+1}} · exp(−γ · (t − T_j)) needs to be used. The estimate of τ provides a base level: this is the minimum level and the expected rate of fouls per minute until the first foul has been made. The estimate of ψ gives the increase in the rate after the first foul has been made, and this is scaled by the term exp(−γ · (t − T_j)), where T_j equals the time of occurrence of the previous foul. The negative estimate of γ for all seasons tells us that the longer the waiting time, the higher the expected number of fouls per minute. Furthermore, except for season 2007-2008 there is a decreasing trend in τ across seasons, implying that on average fewer fouls are made.

As an example of the interpretation of λ(t), we take season 2007-2008. Until the first foul has been made we expect 0.1508 fouls per minute. Right at the moment the first foul has been made we expect a minimum of 0.1508 + 0.02198 · exp(0.05067 · 0) = 0.17278 fouls per minute. One minute after a foul our model expects 0.1508 + 0.02198 · exp(0.05067 · 1) = 0.1739 fouls per minute, after two minutes 0.1508 + 0.02198 · exp(0.05067 · 2) = 0.1751, and at our average waiting time, the 7th minute, 0.1508 + 0.02198 · exp(0.05067 · 7) = 0.1821 fouls per minute. In a similar manner the expected fouls for every minute can be calculated, given that the time that has passed since the last foul is known. For other seasons the model can be interpreted similarly, and it holds that more fouls per minute are expected the more time has passed since the last foul.
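The same calculation can be written as a small Python function. This is a sketch for illustration only: the name `rate_since_last_foul` and the convention of passing `None` before the first foul are hypothetical, and the inputs are the Table 12 point estimates for season 2007-2008 (τ = 0.1508, ψ = 0.02198, γ = −0.05067).

```python
import math


def rate_since_last_foul(minutes_since_last, tau, psi, gamma):
    """Expected fouls per minute given the time since the previous foul.

    Before any foul has been made (minutes_since_last is None) the rate
    is the base level tau; afterwards psi * exp(-gamma * dt) is added.
    """
    if minutes_since_last is None:
        return tau
    return tau + psi * math.exp(-gamma * minutes_since_last)


# Point estimates for season 2007-2008 (Table 12).
TAU, PSI, GAMMA = 0.1508, 0.02198, -0.05067

print("before first foul:", rate_since_last_foul(None, TAU, PSI, GAMMA))
for minutes in (0, 1, 2, 7):
    print(minutes, rate_since_last_foul(minutes, TAU, PSI, GAMMA))
```

Because the estimate of γ is negative, the exponent is positive and the rate grows with the waiting time, which matches the interpretation in the text.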

Home vs away

Both previous models showed differences in the expected number of fouls depending on whether a team was playing at home or away. Combining this with the statistical differences between the average waiting times in our sample, we distinguish between home and away teams here as well. The results for a team playing at home can be found in Table 13.

Table 13: Results time difference model home teams

Season     τ             se(τ)      ψ             se(ψ)       γ               se(γ)
2007-2008  1.477E-01***  6.927E-03  1.817E-02***  6.061E-03   -4.899E-02***   1.064E-02
2008-2009  1.639E-01***  7.009E-03  5.708E-03     5.803E-03   -3.960E-02**    1.847E-02
2009-2010  1.486E-01***  6.725E-03  1.208E-02**   5.766E-03   -4.553E-02***   1.297E-02
2010-2011  1.444E-01***  6.614E-03  1.221E-02**   5.705E-03   -4.637E-02***   1.299E-02
2011-2012  1.335E-01***  6.995E-03  1.440E-02**   6.634E-03   -2.495E-02**    1.158E-02
2012-2013  1.382E-01***  6.767E-03  8.363E-03     6.074E-03   -2.968E-02**    1.511E-02
2013-2014  1.244E-01***  5.253E-03  1.052E-02**   4.093E-03   -5.413E-02***   1.007E-02
All        1.438E-01***  6.927E-03  1.115E-02*    6.061E-03   -3.900E-02***   1.064E-02
*** p<0.01, ** p<0.05, * p<0.1

From Table 13 it can be noticed that all estimates for τ and γ are significant at least at a 5 per cent level. For the estimates of ψ the significance depends on the season. Most seasons provide estimates significant at least at a 5 per cent level; the exceptions are seasons 2008-2009 and 2012-2013, which are not significant, and the combined sample, which is significant only at a 10 per cent level. For the seasons where ψ is significant, we can conclude that the expected number of fouls per minute increases the more time has passed since the last foul. For the seasons where ψ is not significant it cannot be concluded that there is any effect of the time that has passed since the previous foul. Although the scaling factor for this effect, γ, is significant, it cannot be said that ψ is statistically different from zero, and therefore there is no effect of the time passed since the last foul. As such, the expected number of fouls per minute stays constant over the whole match. Furthermore, in this case a decrease of τ over the seasons is less present than in other models. Continuing with teams playing away, their results can be found in Table 14.

Table 14: Results time difference model away teams

Season     τ             se(τ)      ψ             se(ψ)       γ               se(γ)
2007-2008  1.533E-01***  7.433E-03  2.571E-02***  6.662E-03   -5.575E-02***   9.949E-03
2008-2009  1.606E-01***  6.888E-03  1.778E-02***  5.692E-03   -6.199E-02***   1.064E-02
2009-2010  1.616E-01***  6.730E-03  9.947E-03*    5.493E-03   -5.248E-02***   1.471E-02
2010-2011  1.510E-01***  7.301E-03  1.864E-02***  6.551E-03   -4.307E-02***   1.073E-02
2011-2012  1.487E-01***  6.289E-03  7.930E-03     5.162E-03   -4.971E-02***   1.644E-02
2012-2013  1.422E-01***  5.915E-03  1.009E-02**   4.742E-03   -6.103E-02***   1.415E-02
2013-2014  1.266E-01***  6.583E-03  1.828E-02***  6.188E-03   -2.856E-02***   9.052E-03
All        1.507E-01     2.642E-03  1.466E-02***  2.297E-03   -4.425E-02***   4.345E-03
*** p<0.01, ** p<0.05, * p<0.1

First of all, from Table 14 it stands out that again all estimates for τ and γ are significant at a 5 per cent level, while most of the estimates of ψ are significant at this level as well. The exceptions are the estimates for season 2009-2010 and season 2011-2012, which are respectively significant at a 10 per cent level and not significant at any conventional level. As for the home teams, for most of the seasons the expected number of fouls increases when more time has passed since the previous foul. Only for seasons 2009-2010 and 2011-2012 is this not the case: the expected number of fouls per minute stays constant during the match and does not depend on the time that has passed since the last foul.

Comparing the results from both samples it can be noticed that there are seasons which do not show an interaction effect with time. Furthermore, except for season 2008-2009, the estimates for τ are larger for away teams. Regarding the estimates of ψ, there is some fluctuation between them: for some seasons the estimates for away teams are larger and vice versa. Furthermore, both samples show non-significant estimates, but in different seasons. Comparing the estimates of γ, it can be noticed that except for seasons 2010-2011 and 2013-2014 the estimates for away teams are larger in absolute value. This could imply that the effect of the time passed since a foul is more relevant for away teams. However, most of these estimates are within range of one standard error, indicating the estimates are close, while the full effect of the time increase depends on ψ as well, and those estimates are mostly also within range of one standard error.

Furthermore, from the results of the likelihood ratio test in Table 15, it can be concluded that the expected number of fouls per minute differs depending on whether a team plays at home or away. For all investigated cases this test returns a p-value smaller than 0.05, so we reject the null hypothesis that a single model without a distinction between playing at home and away fits the data as well as the separate models.

Table 15: Results likelihood ratio test time difference

Season     LL_full     LL_home     LL_away     LR      p-value
2007-2008  -2.672E+04  -1.299E+04  -1.372E+04  22.196  5.938E-05
2008-2009  -2.669E+04  -1.301E+04  -1.365E+04  36.185  6.842E-08
2009-2010  -2.585E+04  -1.266E+04  -1.319E+04  10.064  1.803E-02
2010-2011  -2.560E+04  -1.246E+04  -1.313E+04  16.715  8.087E-04
2011-2012  -2.435E+04  -1.192E+04  -1.243E+04  10.618  1.398E-02
2012-2013  -2.411E+04  -1.180E+04  -1.230E+04  12.141  6.914E-03
2013-2014  -2.317E+04  -1.137E+04  -1.180E+04   9.881  1.960E-02
All        -1.767E+05  -8.630E+04  -9.034E+04  86.301  0

Match level

As this model mostly relies on the average waiting time, the results are likely to be different from the results obtained at a team level. The results after fitting the models can be found in Table 16.

Table 16: Results time difference model match

Season     τ             se(τ)      ψ             se(ψ)       γ               se(γ)
2007-2008  2.521E-01***  1.136E-02  8.023E-02***  9.770E-03   -1.004E-01***   7.712E-03
2008-2009  2.476E-01***  1.075E-02  8.019E-02***  9.067E-03   -1.102E-01***   7.487E-03
2009-2010  2.492E-01***  1.107E-02  7.167E-02***  9.420E-03   -8.109E-02***   6.247E-03
2010-2011  2.222E-01***  1.010E-02  8.466E-02***  8.651E-03   -1.008E-01***   6.652E-03
2011-2012  2.364E-01***  1.094E-02  5.814E-02***  9.591E-03   -7.345E-02***   7.987E-03
2012-2013  2.373E-01***  1.042E-02  5.253E-02***  8.961E-03   -7.905E-02***   8.270E-03
2013-2014  2.059E-01***  9.426E-03  6.114E-02***  8.082E-03   -8.617E-02***   7.203E-03
All        2.391E-01***  4.062E-03  6.844E-02***  3.495E-03   -8.511E-02***   2.736E-03

First of all, it can be noticed from Table 16 that all estimates are significant at a 1 per cent level. This implies that in every season the expected number of fouls increases as more time has passed since the last foul within a match. Furthermore, in comparison with single teams, the estimates of all parameters are larger in absolute value. For this case, the same interpretation holds as for single teams. At the start of the match, the expected number of fouls per minute equals τ, while after a first foul is made by either of the teams it increases by ψ directly. As time passes, this added term is scaled by a factor equal to exp(−γ · (t − T_j)), where T_j denotes the time of the last foul. For instance, for season 2007-2008, the match starts with an expected number of 0.2521 fouls per minute by either of the teams. Then, after the first foul has occurred, the expected number of fouls per minute increases to 0.33233. If one minute later no foul has occurred, the expected number of fouls increases to 0.3408, after 2 minutes without a foul it increases to 0.3502, after 3 minutes to 0.3605, and after 8 minutes without a foul we expect 0.4312 fouls in the next minute. The expected number of fouls per minute thus differs with the time that has passed, and all values can be calculated using the formula λ(t) = τ + ψ · exp(−γ · (t − T_j)). This formula can be used for all seasons and yields the expected number of fouls to be made in minute t, given that T_j is known.

Overall, both for separate teams within a match and for matches themselves, it can be concluded that the expected number of fouls per minute increases as the time since the last foul becomes larger. Furthermore, the first foul in a match can be seen as some sort of starting point for more fouls to be made, both for teams and matches. After this first foul has occurred, the expected number of fouls increases by a relatively large amount, which could imply that at the start of the match players are somewhat reluctant to make the first foul.

Also, comparing teams playing at home with teams playing away, it can be concluded that there are several differences. In the models for both samples, there are some insignificant parameters for the increase over time. The estimates of ψ and γ are furthermore of similar size and lie mostly within one standard error of each other. There are greater differences for the parameter τ. By the likelihood ratio test, it can be concluded that a team playing away is expected to make more fouls than the same team playing at home. However, the effects of the time since the last foul seem similar for teams playing at home and teams playing away.

5.4 Self-exciting Poisson process

First of all, note that for the self-exciting Poisson process the rate λ(t) is given by the following formula: λ(t) = τ + ψ · Σ_{T_j < t} exp(−γ · (t − T_j)). So to obtain the value of λ(t) at a certain time t we need to know the times of the fouls previously made up to time t by that particular team. When these are known we can construct our rate λ(t) and use it as the expected number of fouls per minute. Like the time difference model, this model is influenced by the waiting time between fouls, since the most recently added term in the sum contains the waiting time since the last foul.

Team level

Starting at a team level, the arrival rate of fouls of individual teams will be investigated first. The results can be found in Table 17.

Table 17: Results self-exciting Poisson process

Season     τ             se(τ)      ψ             se(ψ)      γ          se(γ)
2007-2008  1.680E-01***  3.099E-03  1.157E-03***  4.273E-04  2.140E-05  9.043E-03
2008-2009  1.729E-01***  6.381E-03  5.900E-04*    3.334E-04  1.242E-10  4.371E-02
2009-2010  1.607E-01***  3.146E-03  1.107E-03***  3.849E-04  8.561E-06  1.021E-02
2010-2011  1.584E-01***  3.041E-03  1.104E-03***  4.106E-04  6.829E-06  1.017E-02
2011-2012  1.431E-01***  2.792E-03  1.605E-03***  4.934E-04  8.850E-05  7.604E-03
2012-2013  1.408E-01***  2.786E-03  1.686E-03***  4.884E-04  3.350E-05  8.140E-03
2013-2014  1.318E-01***  2.700E-03  1.777E-03***  4.896E-04  1.055E-05  7.871E-03
All        1.535E-01***  1.095E-03  1.572E-03***  1.742E-04  4.354E-07  3.036E-03
*** p<0.01, ** p<0.05, * p<0.1

When inspecting the point estimates in Table 17, first of all, it can be noticed that all estimates for τ are significant at a 1 per cent level. Furthermore, except for season 2008-2009, all estimates for ψ are significant at a 1 per cent level, while none of the estimates for γ is significant. Regarding the estimates themselves, a decrease over the seasons can be observed for τ, while for ψ, in general, an increase over the seasons can be observed. The estimates for γ are, due to their insignificance, not statistically different from zero. Combining this with the previously mentioned formula, the expected number of fouls at time t does not depend on the time that has passed since the previous fouls. However, since ψ is significant, λ(t) does depend on the number of previous fouls made. For example, in season 2007-2008, at the start of the match the expected number of fouls per minute equals τ = 0.168; after the first foul has occurred this increases by ψ = 0.001157, and 0.169157 fouls are expected to be made per minute. After another foul the expected number of fouls increases by 0.001157 again. So we can conclude that λ(t) = τ + ψ · Σ_{T_j < t} 1, since exp(−0 · (t − T_j)) = exp(0) = 1. This formula can be applied to all seasons except for season 2008-2009 and tells us that the expected number of fouls per minute depends on the number of fouls made until time t in the match. For season 2008-2009, we can only conclude that the expected number of fouls per minute equals τ.
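The self-exciting rate and its reduction to a simple foul count when γ = 0 can be sketched as follows. The function name `self_exciting_rate` and the foul times in `example_fouls` are hypothetical and chosen only for illustration; τ and ψ are the Table 17 point estimates for season 2007-2008, and γ is set to zero because its estimate is not significant.

```python
import math


def self_exciting_rate(t, foul_times, tau, psi, gamma):
    """lambda(t) = tau + psi * sum over fouls before t of exp(-gamma * (t - T_j))."""
    excitation = sum(
        math.exp(-gamma * (t - t_j)) for t_j in foul_times if t_j < t
    )
    return tau + psi * excitation


# Point estimates for season 2007-2008 (Table 17); gamma treated as zero.
TAU, PSI, GAMMA = 0.168, 0.001157, 0.0

# Hypothetical foul times (in minutes) for one team in one match.
example_fouls = [5.0, 12.0, 30.0]

for minute in (3, 10, 40):
    print(minute, self_exciting_rate(minute, example_fouls, TAU, PSI, GAMMA))
```

With γ = 0 every exponential term equals 1, so the rate is τ plus ψ times the number of fouls made so far; a non-zero γ would instead weight each earlier foul by the time elapsed since it occurred.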

Home vs away

As mentioned before, according to the previously used models there is a statistical difference in the mean number of fouls made between teams playing at home and away. Therefore, it is likely that there is a difference between home and away teams for this model as well, and a model will be fitted to both samples. First, the results for teams playing at home, shown in Table 18, will be interpreted.

Table 18: Results home playing team

Season     τ             se(τ)      ψ             se(ψ)      γ          se(γ)
2007-2008  1.616E-01***  4.439E-03  1.157E-03**   5.750E-04  1.955E-05  1.503E-02
2008-2009  1.676E-01***  4.320E-03  4.094E-04     4.635E-04  3.247E-15  8.544E-05
2009-2010  1.530E-01***  4.159E-03  1.465E-03**   6.052E-04  1.630E-05  1.120E-02
2010-2011  1.552E-01***  4.944E-03  7.006E-04     5.559E-04  3.649E-03  2.463E-02
2011-2012  1.344E-01***  3.844E-03  2.243E-03***  8.047E-04  4.339E-05  8.762E-03
2012-2013  1.373E-01***  3.866E-03  1.590E-03**   7.114E-04  2.532E-05  1.235E-02
2013-2014  1.297E-01***  4.206E-03  1.503E-03**   6.069E-04  5.860E-06  1.500E-02
All        1.470E-01***  1.524E-03  1.464E-03***  2.430E-04  4.601E-06  4.461E-03
*** p<0.01, ** p<0.05, * p<0.1

The results in Table 18 show that all estimates of τ are significant at a 1 per cent level. Furthermore, most of the estimates of ψ are significant at least at a 5 per cent level. However, for the seasons 2008-2009 and 2010-2011 there is no evidence that the parameters statistically differ from zero, and the same applies to all estimates of γ, as none of these is significant at any conventional level. From these significance levels, we can conclude that for the majority of the seasons the expected number of fouls per minute is determined by the number of previous fouls and there is no impact of the time that has passed since the last foul. When it comes to interpreting the parameters, we can conclude that, for instance, in season 2007-2008, 0.1616 fouls per minute are expected at the start, while after every foul that has been made this increases by 0.001157. Similar interpretations hold for the other seasons where ψ is significant. For the other seasons, according to the model, the estimate of τ is the expected number of fouls per minute. For instance, in season 2008-2009, 0.1676 fouls per minute are expected to be made. From the estimates of τ it stands out that these are not all decreasing over time, as was the case in most of the previous models. No clear trend over the seasons can be observed in the estimates of ψ either.

Table 19: Results away playing team

Season     τ             se(τ)      ψ             se(ψ)      γ          se(γ)
2007-2008  1.754E-01***  4.474E-03  1.269E-03**   6.020E-04  1.900E-05  1.255E-02
2008-2009  1.796E-01***  8.965E-03  6.982E-04     4.985E-04  2.020E-05  5.480E-02
2009-2010  1.690E-01***  5.981E-03  6.949E-04     4.641E-04  3.630E-06  2.810E-02
2010-2011  1.625E-01***  4.366E-03  1.436E-03**   6.549E-04  3.824E-04  1.336E-02
2011-2012  1.523E-01***  4.163E-03  9.475E-04*    5.692E-04  8.155E-06  1.548E-02
2012-2013  1.448E-01***  4.150E-03  1.783E-03***  6.885E-04  1.753E-03  1.211E-02
2013-2014  1.342E-01***  3.820E-03  1.971E-03***  7.525E-04  2.849E-06  9.862E-03
All        1.576E-01***  1.584E-03  1.483E-03***  2.464E-04  5.243E-06  4.412E-03
*** p<0.01, ** p<0.05, * p<0.1

The results for away teams can be found in Table 19. All estimates for τ are again significant at a 1 per cent level. Furthermore, most of the estimates for ψ are significant at least at a 5 per cent level; the exceptions are seasons 2008-2009 and 2009-2010, where the estimate is not significant, and season 2011-2012, where it is significant only at a 10 per cent level. For the estimates of γ none of the seasons has an estimate significant at any conventional level. Given the significance of the different parameters, it can be concluded that the expected number of fouls depends on the number of previous fouls made for most seasons. For these seasons the expected number of fouls per minute starts at τ and increases by ψ for every foul made. There is no evidence for a difference in the expected number of fouls per minute shortly after a foul compared to when more time has passed. For the seasons 2008-2009 and 2009-2010 this dependence is not found, and the expected number of fouls per minute is constant over time, equalling the estimate of τ. Regarding those estimates, except for season 2008-2009 the decreasing trend over the seasons can be seen again, while for ψ no clear trend is found.

Table 20: LR test self exciting model

Season     LL_full     LL_home     LL_away     LR        p-value
2007-2008  -2.673E+04  -1.299E+04  -1.373E+04    16.580  8.623E-04
2008-2009  -2.668E+04  -1.301E+04  -1.366E+04    13.728  3.300E-03
2009-2010  -2.585E+04  -1.265E+04  -1.319E+04    10.274  1.638E-02
2010-2011  -2.561E+04  -1.246E+04  -1.314E+04    15.055  1.770E-03
2011-2012  -2.434E+04  -1.191E+04  -1.243E+04    11.427  9.626E-03
2012-2013  -2.410E+04  -1.180E+04  -1.230E+04     5.588  1.335E-01
2013-2014  -2.317E+04  -1.138E+04  -1.179E+04     5.486  1.395E-01
All        -1.781E+05  -8.629E+04  -9.034E+04  3025.796  0

Comparing the models it stands out that all estimates for τ are significant and seem to differ. The differences in the estimates of ψ are small and lie within a range of 1 standard error. Therefore we cannot conclude that there is a large difference between them. Furthermore, the likelihood ratio test can be used to test whether one model for the overall sample fits as well as the 2 separate models. The results of this test can be found in Table 20. According to the p-values, we can conclude that for the seasons 2007-2008 up to and including 2011-2012 and for the combined sample the distinction between home and away teams fits the data better. For modelling, it is then better to make a distinction between teams playing at home and away. For the seasons 2012-2013 and 2013-2014, we fail to reject the null hypothesis that one combined model fits the data as well as the separate models. Based on this we cannot conclude that for these seasons there is a difference between home and away teams.

Match level

The estimated parameters at a match level can be found in Table 21. In comparison to the data at a team level, the estimate of τ is most likely larger due to the larger number of fouls. Furthermore, the estimates of ψ could be larger due to possible reactions of opponents.

Table 21: Results self-exciting process, match level

Season      τ              se(τ)       ψ              se(ψ)       γ           se(γ)
2007-2008   3.375E-01***   6.502E-03   1.187E-03**    4.687E-04   1.782E-05   1.020E-02
2008-2009   3.434E-01***   7.734E-03   7.416E-04      3.712E-04   2.223E-06   1.821E-02
2009-2010   3.224E-01***   6.720E-03   1.124E-03**    4.632E-04   2.653E-03   1.187E-02
2010-2011   3.176E-01***   6.348E-03   1.045E-03**    4.544E-04   7.002E-05   1.107E-02
2011-2012   2.835E-01***   5.915E-03   1.801E-03***   5.829E-04   3.530E-05   7.390E-03
2012-2013   2.796E-01***   5.852E-03   1.842E-03***   5.911E-04   8.204E-04   8.496E-03
2013-2014   2.640E-01***   5.678E-03   1.744E-03***   5.653E-04   7.317E-06   8.562E-03
All         3.027E-01***   1.095E-03   1.587E-03***   1.742E-04   3.489E-10   3.036E-03

*** p<0.01, ** p<0.05, * p<0.1

From Table 21 it can first be noticed that all estimates for τ are significant at a 1 per cent level. Except for season 2008-2009 all estimates for ψ are significant at least at a 5 per cent level. Furthermore, none of the estimated parameters for γ are significant at a level of interest. Therefore, similar to the analysis at a team level, we can not conclude that there is a different effect when more time has passed after the previous fouls, compared to shortly after the foul. What can be concluded based on the estimates is that the expected number of fouls per minute increases after every foul that has been made. For example, in season 2007-2008 0.3375 fouls are expected to be made per minute, and after every foul that has been made this increases by 0.001187. So after the first foul 0.338687 fouls are expected to be made per minute, while after 15 fouls 0.355305 fouls are expected to be made per minute. Regarding the other seasons, similar interpretations apply, except for season 2008-2009. In that season there is no evidence that the rate is increasing with the number of fouls, and for that season the expected number of fouls per minute equals 0.3434 for the entire match.
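The worked example above can be reproduced with a short sketch of the self-exciting rate as it is written in Appendix B, λ(t) = τ + ψ · Σ exp(−γ · (t − Ti)) over all fouls with Ti < t. The function name and the foul times below are hypothetical and only serve as an illustration; the parameter values are the 2007-2008 estimates from Table 21.

```python
import math


def self_exciting_rate(t, foul_times, tau, psi, gamma):
    """Expected fouls per minute at time t, given the fouls that occurred strictly before t."""
    return tau + psi * sum(math.exp(-gamma * (t - ti)) for ti in foul_times if ti < t)


# Estimates for season 2007-2008 at match level (Table 21)
tau, psi, gamma = 0.3375, 0.001187, 1.782e-05

# Hypothetical match: 15 fouls, one every 3 minutes (minutes 3, 6, ..., 45)
foul_times = [3.0 * k for k in range(1, 16)]

rate_start = self_exciting_rate(1.0, foul_times, tau, psi, gamma)      # before any foul
rate_after_1 = self_exciting_rate(4.0, foul_times, tau, psi, gamma)    # after the first foul
rate_after_15 = self_exciting_rate(46.0, foul_times, tau, psi, gamma)  # after 15 fouls

print(rate_start, rate_after_1, rate_after_15)
```

Since the estimate of γ is very close to zero, the contribution of each foul hardly decays within a match, so the rate after 15 fouls is practically τ + 15 · ψ, in line with the figures quoted above.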

Overall from the self-exciting model, we can conclude that there is no difference in the expected number of fouls with regard to the time difference to the previously made fouls. What can be concluded is that the expected number of fouls increases with every foul made. This holds both at a team and a match level. Furthermore, there is a significant difference in the expected number of fouls made between home and away teams for most of the seasons. However, this difference does not seem to lie in the self-exciting term, but in the baseline of the expected number of fouls, particularly as the self-exciting term is not significant for all different seasons.

5.5 Fouls within a match

Until now the interpretation mostly considered the expected number of fouls within a minute. However, a match takes 90 minutes and multiple fouls occur in it. The models rely on different assumptions regarding the rate λ(t). For instance, the homogeneous model assumes that the expected number of fouls stays the same throughout the match. So for a single match within a season, the expected number of fouls equals E(N) = λ(t) · 90. For a random team in our sample this equals 0.163 · 90 = 14.670. So for a team in general we expect 14.670 fouls to be made, not knowing whether it plays at home or away. For a team playing at home we expect 14.130 fouls in a match, while for a team playing away we expect 15.210 fouls. Combined, we expect 29.340 fouls to occur within a game.

From the inhomogeneous model, we could conclude that the expected number of fouls per minute increases over time. A general idea of the expected fouls per minute for a team can be seen in Figure 8. As this is defined at a minute level within a single match, it is harder to interpret. In this setting a match lasts 90 minutes and, based on the formula, the highest expected number of fouls occurs in the 90th minute. Overall we can expect $E(N) = 90 \cdot \alpha + \beta \cdot \sum_{i=1}^{90} i$ fouls to occur within a match, whether for teams in general, home teams, away teams or matches. For instance, for a single team we can expect 14.670 fouls to be made in a match. When we know this team is playing at home this decreases to 14.156, while if this team is playing away 15.186 fouls would be expected to be made by this team during a match. Within a single game, the model expects 29.332 fouls to be made.
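As a small worked step, and assuming the rate in minute i is linear, λ(i) = α + β · i (which is what the expression for E(N) above implies), the sum over the 90 minutes has a simple closed form:

\[
E(N) = \sum_{i=1}^{90} (\alpha + \beta \cdot i) = 90 \cdot \alpha + \beta \cdot \frac{90 \cdot 91}{2} = 90 \cdot \alpha + 4095 \cdot \beta
\]

So the baseline α enters the match total 90 times, while the time trend β enters it 4095 times, which is why even a small estimate of β has a visible effect on the expected number of fouls in a match.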

When taking into account only the last foul that has been made, it can be concluded that the expected number of fouls first increases after the occurrence of the first foul within a game. After this occurrence, the expected number of fouls stays at this base level and increases if the waiting time between fouls increases. After the occurrence of a foul the expected number drops to the level after the first foul, to increase again when more time has passed since that particular foul. For modelling this, the occurrences of fouls need to be known, and therefore a general conclusion can not be made as this varies per match. Assuming a foul occurs after our average waiting time, every 6.357 minutes, we can model this. The rate throughout the game can be seen in Figure 8. This gives a clearer view of what happens regarding the increase in the expected number of fouls after a foul. The model expects 14.564 fouls to be made within the match. If it is known that a team plays at home, due to the insignificance of ψ we can expect 12.942 fouls. For a team playing away this is 14.671, while during a match we can expect 28.279 fouls. This shows a rather large bias for a team playing at home if we compare it to the mean number of fouls.

Figure 8: Expected number of fouls over time, various models

Regarding the self-exciting model, this is somewhat different. From the interpretation, it turns out that the number of previously made fouls is important for the expected number of fouls. According to the summary statistics, on average 14.660 fouls are made by a team within a match. Assuming these are uniformly distributed across the match, a plot of the expected fouls per minute can be found in Figure 8. Under this assumption the expected number of fouls equals 14.671. For teams playing at home, this comes down to 14.173 fouls, assuming a uniform distribution of the mean number of fouls per team, while for away teams it comes down to 15.209 fouls per match. For a match in general, we expect 29.358 fouls to be made.

Note that for the last 2 models we assume that fouls occur uniformly during the match in order to make a prediction. Furthermore, we assume that the mean number of fouls occurs, which corresponds to an average match. Both assumptions are easily violated: more fouls at the beginning of the match, or more fouls in general, would increase the expected number of fouls for the self-exciting model, while a larger time between fouls would increase the expected number of fouls for the time difference model due to the higher peaks. For the self-exciting model, the expected number of fouls in a match relies heavily on the moment when a foul occurs within the match and on the total number of fouls, while for the time difference model it relies on the waiting time between fouls.
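The sensitivity of the self-exciting model to these assumptions can be illustrated with a small sketch. It uses the integrated rate derived in Appendix B with the 2007-2008 match level estimates from Table 21; the function name, the number of fouls and the foul times are hypothetical and chosen only for illustration, so the numbers it prints are not results from this paper.

```python
import math


def expected_fouls_self_exciting(foul_times, tau, psi, gamma, M=90.0):
    """Integrated self-exciting rate over [0, M]:
    M * tau - (psi / gamma) * sum(exp(-gamma * (M - T_i)) - 1)."""
    # math.expm1(x) computes exp(x) - 1 accurately, which matters because gamma is close to zero
    return M * tau - psi / gamma * sum(math.expm1(-gamma * (M - t)) for t in foul_times)


tau, psi, gamma = 0.3375, 0.001187, 1.782e-05  # Table 21, season 2007-2008

n = 29  # hypothetical number of fouls in the match
uniform = [90.0 * (k + 0.5) / n for k in range(n)]   # evenly spread over 90 minutes
early = [t / 2.0 for t in uniform]                   # same number of fouls, all in the first half
more = [90.0 * (k + 0.5) / 35 for k in range(35)]    # more fouls, evenly spread

for label, times in [("uniform", uniform), ("early", early), ("more fouls", more)]:
    print(label, round(expected_fouls_self_exciting(times, tau, psi, gamma), 3))
```

Moving the same fouls to the first half, or adding fouls, both raise the integrated rate, because every foul lifts the rate for the remainder of the match.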

Overall, all different models more or less predict the same number of fouls to occur within a match. The difference between them lies in the moment when the fouls occur. The homogeneous model assumes they are uniformly distributed throughout the game. The inhomogeneous model assumes they occur more often near the end of the game. The time difference model tells us that more fouls are expected when the waiting time since the last foul is larger, while the self-exciting model tells us that the expected number of fouls increases after a foul has been made.


6 Conclusion

From the results, it can be concluded that the number of fouls made is increasing over time, for all possible samples. This implies that the model expects more fouls to be made as the match continues. For instance, we expect more fouls to be made in the 75th minute compared to the 30th. The time difference model tells us that, only considering the last foul, the expected number of fouls increases when the waiting time since the last foul becomes larger. Furthermore, from the self-exciting model, it can be concluded that there is an increase in the expected number of fouls after a foul has occurred and that the number of previously made fouls is related to the expected number of fouls to follow. This effect seems to be lasting and increases by a certain rate for every foul. Regarding the timing between a foul and the expected number of fouls, no clear relationship can be found. There is no evidence that the expected number of fouls per minute increases shortly after a foul has been made according to this model. Therefore, it can be concluded that for every foul, more fouls are expected to be made in the remainder of the match. This effect seems present for teams in general and at match level. Furthermore, comparing the point estimates over the seasons it can be concluded that the expected number of fouls decreases over the seasons and fewer fouls are made in the later seasons.

When making a distinction between home and away teams we can conclude from the likelihood ratio test that there are differences in the expected number of fouls made when a team plays at home or away. The estimates tell us that a team playing away is expected to make more fouls than a team playing at home. This is present in the homogeneous model, the inhomogeneous model, the time difference model and the self-exciting model. This difference is mainly due to the starting rates of the model and not to the increase over time or after a foul has been made. Teams playing away have a larger starting rate compared to teams playing at home and therefore the expected number of fouls to be made is higher. The models do not yield significant estimates for all of the seasons regarding the time-related parameter, and the parameter for the number of fouls is not significant for all seasons either. Both for teams playing at home and teams playing away there are multiple seasons where the effect is less significant.

7 Discussion

This research field is relatively unexplored and therefore there are a lot of possible extensions. One of the first concerns the time-span of the data. The data set is relatively old, while the rules of soccer change slightly every season. Also, soccer keeps innovating, for instance with the introduction of the video assistant referee. This video assistant referee can intervene in the match when a referee has made a large mistake. This could be for instance missing a potential red card, or a foul made just before a goal has been scored. This changes the dynamics within a game, as fouls that would have been missed previously are now seen by the video assistant referee. It could deter players and possibly decrease the number of fouls made, in addition to the decreasing trend in fouls made by a team over the seasons, shown in both the data and the results. For current soccer the results might therefore be somewhat different.

Other possible extensions include the seasonal ranking at the moment the match occurs or the score in the match at the moment of the foul. A team struggling to score points in the season, or defending a possible draw or small lead, could be more tempted to make fouls in general or near the end of the game. Using a different specification of λ(t) this can possibly be introduced.

Furthermore, in the data description a wave pattern was noticed in Figure 4 and in Figure 6. In the inhomogeneous model a linear specification was used to capture these effects. Due to this wave pattern a non-linear specification could provide a better fit to the data compared to our linear model specification.

A further point of interest is the playing style of a team. A more defensive team could make fewer fouls because its players are closer to each other and can help their teammates, while a team playing more attacking could need the fouls to stop a counter-attack. Besides this, there could be some general differences between teams regarding the number of fouls. Making a distinction between cards and regular fouls could also change the results; for instance, a yellow card is given as a warning to a player. All of these could lead to new insights into the world of soccer and especially regarding fouls.


References

Azzalini, A. (Ed.). (1996). Statistical inference: Based on the likelihood. Chapman & Hall/CRC.

Bacry, E., Mastroma�eo, I., & Muzy, J. (2015). Hawkes processes in �nance. Market Microstructure andLiquidity, 1(1).

Carling, C., & Dupont, G. (2011). Are declines in physical performance associated with a reduction in skill-related performance during professional soccer match-play? Journal of Sports Sciences, 29(1), 63-71.

Carmichael, F., �omas, D., & Ward, R. (2000). Team performance: �e case of english premiership football.Managerial and Decision Economics, 21(1), 31-45.

Chaves-Demoulin, V., Davison, A. C., & Mcneil, A. J. (2005). Estimating value-at-risk: a point process ap-proach. �antitative Finance, 5(2), 227-234.

Coles, S. (Ed.). (2004). An introduction to statistical modeling of extreme values. Springer-Verlag London.

Dixon, M. J., & Robinson, M. E. (1998). A birth process model for association football matches. �e Statistician,47 (3), 523-538.

Everson, P., & Goldsmith-Pinkham, P. (2008). Composite poisson models for goals scoring. Journal of �an-titative Analysis in Sports, 4(2), 1-17.

Gumusdag, H., Yildiran, I., F.Yamaner, & A.Kartal. (2011). Agression and fouls in professional football. Biomed-ical Human Kinetics, 3, 67-71.

Hawkes, A. G. (2018). Hawkes processes and their applications to �nance: a review. �antitative Finance,18(2), 193-198.

Lago-Ballesteros, J., Lago-Penas, C., & Rey, E. (2012). �e e�ect of playing tactics and situational variables onachieving score-box possessions in a professional soccer team. Journal of Sports Sciences, 30(14), 1455-1461.

McNeil, A., Rudiger, F., & Embrechts, P. (Eds.). (2015). �antitative risk management: Concepts, techniques andtools. Princeton University Press.

Moore, E. (2017). Formalism and strategic fouls. Journal of the Philosophy of Sport, 44(1), 95-107.

Nevill, A., Balmer, N., & Williams, A. (2002). �e in�uence of crowd noise and experience upon refereeingdecisions in football. Psychology of Sport and Exercise, 3, 261-272.

Ogata, Y. (1978). �e asymptotic behaviour of maxium likelihood estimators for stationary point processes.Annals of the institure of statistical mathematics, 30, 243-261.

Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes.Journal of the American Statistical Association, 83(401), 9-27.

Ozaki, T. (1979). Maximum likelihood estimation of hawkes’self-exciting point processes. Annals of theinstiture of statistical mathematics, 31, 145-155.

Pollard, R., & Pollard, G. (2005). Home advantage in soccer: A review of its existence and causes. InternationalJournal of Soccer and Science Journal, 3(1), 28-38.

40

Page 42: Fouls in Dutch soccer: A Poisson point process

Rampini, E., Bosio, A., Ferraresi, I., Petruolo, A., Morelli, A., & Sassi, A. (2011). Match-related fatigue in soccerplayers. Medicine and Science in Sports and Exercise, 43(11), 2161–2170.

Su�er, M., & Koche, M. (2004). Favoritism of agents-the case of referees home bias. Journal of EconomicPsychology, 25, 461-469.

Traclet, A., Romand, P., Moret, O., & Kavussanu, M. (2011). Antisocial behavior in soccer: A qualitative studyof moral disengagement. International Journal of Sport and Exercise Psychology, 9(2), 143-155.

Yue, Z., Broich, H., & Mester, J. (2014). Statistical analysis for the soccer matches of the �rst bundesliga.International Journal of Sports Science Coaching, 9(3), 553–560.


A Derivation log-likelihood time difference process

Assume that there are $N_z$ fouls occurring at times $T_i$ for $i = 1, \dots, N_z$. Furthermore, no fouls can occur past minute $M$. We start with finding an expression for $\Lambda(A;\theta)$.

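\[
\begin{aligned}
\Lambda(A;\theta) &= \int_0^M \lambda(s)\,ds \\
&= \int_0^M \Big( \tau + \psi \cdot 1_{T_j < s < T_{j+1}} \cdot \exp(-\gamma \cdot (s - T_j)) \Big)\,ds \\
&= \int_0^M \tau\,ds + \psi \cdot \int_0^M 1_{T_j < s < T_{j+1}} \cdot \exp(-\gamma \cdot (s - T_j))\,ds \\
&= M \cdot \tau + \psi \cdot \Big[ \int_{T_1}^{T_2} \exp(-\gamma \cdot (s - T_1))\,ds + \int_{T_2}^{T_3} \exp(-\gamma \cdot (s - T_2))\,ds + \dots \\
&\qquad + \int_{T_{N_z-1}}^{T_{N_z}} \exp(-\gamma \cdot (s - T_{N_z-1}))\,ds + \int_{T_{N_z}}^{M} \exp(-\gamma \cdot (s - T_{N_z}))\,ds \Big]
\end{aligned}
\]

Now we can work out the integrals. Note that all are similar with respect to the bounds and therefore only one of them will be shown.

\[
\begin{aligned}
\int_{T_i}^{T_{i+1}} \exp(-\gamma \cdot (s - T_i))\,ds &= \exp(\gamma \cdot T_i) \cdot \int_{T_i}^{T_{i+1}} \exp(-\gamma \cdot s)\,ds \\
&= \exp(\gamma \cdot T_i) \cdot \Big[ \frac{-1}{\gamma} \exp(-\gamma \cdot s) \Big]_{T_i}^{T_{i+1}} \\
&= \frac{-1}{\gamma} \exp(\gamma \cdot T_i) \cdot \Big( \exp(-\gamma \cdot T_{i+1}) - \exp(-\gamma \cdot T_i) \Big) \\
&= \frac{-1}{\gamma} \Big( \exp(\gamma \cdot T_i) \cdot \exp(-\gamma \cdot T_{i+1}) - \exp(\gamma \cdot T_i) \cdot \exp(-\gamma \cdot T_i) \Big) \\
&= \frac{-1}{\gamma} \Big( \exp(-\gamma \cdot (T_{i+1} - T_i)) - 1 \Big)
\end{aligned}
\]

After this has been finished the integrals can be substituted and we have that: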

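\[
\begin{aligned}
\Lambda(A;\theta) &= M \cdot \tau + \psi \Big[ \frac{-1}{\gamma} \Big( \exp(-\gamma \cdot (T_2 - T_1)) - 1 \Big) + \frac{-1}{\gamma} \Big( \exp(-\gamma \cdot (T_3 - T_2)) - 1 \Big) + \dots \\
&\qquad + \frac{-1}{\gamma} \Big( \exp(-\gamma \cdot (T_{N_z} - T_{N_z-1})) - 1 \Big) + \frac{-1}{\gamma} \Big( \exp(-\gamma \cdot (M - T_{N_z})) - 1 \Big) \Big] \\
&= M \cdot \tau - \frac{\psi}{\gamma} \cdot \sum_{j=1}^{N_z-1} \Big( \exp(-\gamma \cdot (T_{j+1} - T_j)) - 1 \Big) - \frac{\psi}{\gamma} \Big( \exp(-\gamma \cdot (M - T_{N_z})) - 1 \Big)
\end{aligned}
\]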

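Now we can fill in the original equation of the log-likelihood of a Poisson process and we find that:

\[
\begin{aligned}
L_z(T_1, \dots, T_{N_z};\theta) &= -\Lambda(A;\theta) + \sum_{i=1}^{N_z} \log(\lambda(T_i)) \\
&= -M \cdot \tau + \frac{\psi}{\gamma} \cdot \sum_{j=1}^{N_z-1} \Big( \exp(-\gamma \cdot (T_{j+1} - T_j)) - 1 \Big) + \frac{\psi}{\gamma} \Big( \exp(-\gamma \cdot (M - T_{N_z})) - 1 \Big) \\
&\qquad + \sum_{i=1}^{N_z} \log(\lambda(T_i))
\end{aligned}
\]

Therefore, we have an expression for the log-likelihood that can be used to obtain our maximum likelihood estimators.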
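The derivation above translates directly into code. The following is a minimal sketch, not the implementation used for the estimates in this paper. The function names are hypothetical, the parameter values in the example are illustrative rather than estimates, and it assumes that the rate at a foul time Ti is evaluated just before that foul, so that it depends on the previous foul only (and equals τ at the first foul).

```python
import math


def time_difference_rate(s, foul_times, tau, psi, gamma):
    """Rate at time s: tau before the first foul, otherwise
    tau + psi * exp(-gamma * (s - T_j)) with T_j the last foul strictly before s."""
    earlier = [t for t in foul_times if t < s]
    if not earlier:
        return tau
    return tau + psi * math.exp(-gamma * (s - max(earlier)))


def time_difference_compensator(foul_times, tau, psi, gamma, M=90.0):
    """Closed-form integral of the rate over [0, M] (gamma must be non-zero)."""
    times = sorted(foul_times)
    if not times:
        return M * tau
    bounds = times + [M]  # the last interval runs from the last foul to minute M
    # math.expm1(x) = exp(x) - 1, numerically stable for small x
    total = sum(math.expm1(-gamma * (bounds[j + 1] - bounds[j])) for j in range(len(times)))
    return M * tau - psi / gamma * total


def time_difference_loglik(foul_times, tau, psi, gamma, M=90.0):
    """Log-likelihood: minus the integrated rate plus the sum of log rates at the foul times."""
    log_rates = sum(math.log(time_difference_rate(t, foul_times, tau, psi, gamma))
                    for t in foul_times)
    return -time_difference_compensator(foul_times, tau, psi, gamma, M) + log_rates


# Illustrative foul times (minutes) and parameter values, not taken from the paper
example_times = [10.0, 25.0, 40.0, 70.0]
print(time_difference_loglik(example_times, tau=0.15, psi=0.01, gamma=-0.02))
```

Maximum likelihood estimates could then be obtained by summing this log-likelihood over all matches and passing its negative to a numerical optimiser.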


B Derivation log-likelihood self-exciting process

Let the total number of fouls in a match equal $N_z$, and let the time of occurrence of each foul equal $T_i$ for $i = 1, \dots, N_z$. Furthermore, the assumption is used that fouls can not occur past minute $M$. We start with finding an expression for $\Lambda(A;\theta)$.

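\[
\begin{aligned}
\Lambda(A;\theta) &= \int_0^M \lambda(s)\,ds \\
&= \int_0^M \Big( \tau + \psi \cdot \sum_{T_i < s} \exp(-\gamma \cdot (s - T_i)) \Big)\,ds \\
&= \int_0^M \tau\,ds + \psi \cdot \int_0^M \sum_{T_i < s} \exp(-\gamma \cdot (s - T_i))\,ds \\
&= M \cdot \tau + \psi \cdot \int_0^M \sum_{T_i < s} \exp(-\gamma \cdot (s - T_i))\,ds \\
&= M \cdot \tau + \psi \cdot \Big[ \int_{T_1}^{T_2} \exp(-\gamma \cdot (s - T_1))\,ds + \int_{T_2}^{T_3} \Big( \exp(-\gamma \cdot (s - T_1)) + \exp(-\gamma \cdot (s - T_2)) \Big)\,ds \\
&\qquad + \int_{T_3}^{T_4} \Big( \exp(-\gamma \cdot (s - T_1)) + \exp(-\gamma \cdot (s - T_2)) + \exp(-\gamma \cdot (s - T_3)) \Big)\,ds + \dots \\
&\qquad + \int_{T_{N_z-1}}^{T_{N_z}} \Big( \exp(-\gamma \cdot (s - T_1)) + \exp(-\gamma \cdot (s - T_2)) + \dots + \exp(-\gamma \cdot (s - T_{N_z-1})) \Big)\,ds \\
&\qquad + \int_{T_{N_z}}^{M} \Big( \exp(-\gamma \cdot (s - T_1)) + \exp(-\gamma \cdot (s - T_2)) + \dots + \exp(-\gamma \cdot (s - T_{N_z-1})) + \exp(-\gamma \cdot (s - T_{N_z})) \Big)\,ds \Big] \\
&= M \cdot \tau + \psi \cdot \Big[ \Big( \int_{T_1}^{T_2} \exp(-\gamma \cdot (s - T_1))\,ds + \int_{T_2}^{T_3} \exp(-\gamma \cdot (s - T_1))\,ds + \dots + \int_{T_{N_z}}^{M} \exp(-\gamma \cdot (s - T_1))\,ds \Big) \\
&\qquad + \Big( \int_{T_2}^{T_3} \exp(-\gamma \cdot (s - T_2))\,ds + \int_{T_3}^{T_4} \exp(-\gamma \cdot (s - T_2))\,ds + \dots + \int_{T_{N_z}}^{M} \exp(-\gamma \cdot (s - T_2))\,ds \Big) + \dots \\
&\qquad + \Big( \int_{T_{N_z-1}}^{T_{N_z}} \exp(-\gamma \cdot (s - T_{N_z-1}))\,ds + \int_{T_{N_z}}^{M} \exp(-\gamma \cdot (s - T_{N_z-1}))\,ds \Big) + \int_{T_{N_z}}^{M} \exp(-\gamma \cdot (s - T_{N_z}))\,ds \Big] \\
&= M \cdot \tau + \psi \cdot \Big[ \int_{T_1}^{M} \exp(-\gamma \cdot (s - T_1))\,ds + \int_{T_2}^{M} \exp(-\gamma \cdot (s - T_2))\,ds + \dots \\
&\qquad + \int_{T_{N_z-1}}^{M} \exp(-\gamma \cdot (s - T_{N_z-1}))\,ds + \int_{T_{N_z}}^{M} \exp(-\gamma \cdot (s - T_{N_z}))\,ds \Big]
\end{aligned}
\]

Now we will work out one of the integrals using $T_i$ as an example; all the integrals can be solved in this manner.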


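\[
\begin{aligned}
\int_{T_i}^{M} \exp(-\gamma \cdot (s - T_i))\,ds &= \exp(\gamma \cdot T_i) \int_{T_i}^{M} \exp(-\gamma \cdot s)\,ds \\
&= \exp(\gamma \cdot T_i) \cdot \Big[ \frac{1}{-\gamma} \exp(-\gamma \cdot s) \Big]_{T_i}^{M} \\
&= \frac{-1}{\gamma} \cdot \exp(\gamma \cdot T_i) \cdot \Big[ \exp(-M \cdot \gamma) - \exp(-T_i \cdot \gamma) \Big] \\
&= \frac{-1}{\gamma} \Big( \exp(\gamma \cdot T_i) \cdot \exp(-\gamma \cdot M) - \exp(\gamma \cdot T_i) \cdot \exp(-\gamma \cdot T_i) \Big) \\
&= \frac{-1}{\gamma} \Big( \exp(-\gamma \cdot (M - T_i)) - \exp(-\gamma \cdot (T_i - T_i)) \Big) \\
&= \frac{-1}{\gamma} \Big( \exp(-\gamma \cdot (M - T_i)) - 1 \Big)
\end{aligned}
\]

We can fill this into our previously obtained expression for all the different integrals, and then $\Lambda(A;\theta)$ is: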

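\[
\begin{aligned}
\Lambda(A;\theta) &= M \cdot \tau + \psi \cdot \Big[ \frac{-1}{\gamma} \Big( \exp(-\gamma \cdot (M - T_1)) - 1 \Big) + \frac{-1}{\gamma} \Big( \exp(-\gamma \cdot (M - T_2)) - 1 \Big) + \dots + \frac{-1}{\gamma} \Big( \exp(-\gamma \cdot (M - T_{N_z})) - 1 \Big) \Big] \\
&= M \cdot \tau - \frac{\psi}{\gamma} \cdot \Big[ \Big( \exp(-\gamma \cdot (M - T_1)) - 1 \Big) + \Big( \exp(-\gamma \cdot (M - T_2)) - 1 \Big) + \dots + \Big( \exp(-\gamma \cdot (M - T_{N_z})) - 1 \Big) \Big] \\
&= M \cdot \tau - \frac{\psi}{\gamma} \cdot \sum_{i=1}^{N_z} \Big( \exp(-\gamma \cdot (M - T_i)) - 1 \Big)
\end{aligned}
\]

Since we now know the form of $\Lambda(A;\theta)$, we have that: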

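\[
\begin{aligned}
L_z(T_1, \dots, T_{N_z};\theta) &= -\Lambda(A;\theta) + \sum_{i=1}^{N_z} \log(\lambda(T_i)) \\
&= -M \cdot \tau + \frac{\psi}{\gamma} \cdot \sum_{i=1}^{N_z} \Big( \exp(-\gamma \cdot (M - T_i)) - 1 \Big) + \sum_{i=1}^{N_z} \log(\lambda(T_i)).
\end{aligned}
\]

We have thus obtained an expression for the log-likelihood of this process that can be maximised.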
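As an illustration, the log-likelihood above can be written as a short Python function. This is a sketch under stated assumptions and not the code behind the reported estimates. The function names and the foul times are hypothetical; the parameter values in the example are the 2007-2008 match level estimates from Table 21. The rate at a foul time is assumed to include only the fouls strictly before it.

```python
import math


def self_exciting_rate(s, foul_times, tau, psi, gamma):
    """lambda(s) = tau + psi * sum over fouls T_i < s of exp(-gamma * (s - T_i))."""
    return tau + psi * sum(math.exp(-gamma * (s - t)) for t in foul_times if t < s)


def self_exciting_compensator(foul_times, tau, psi, gamma, M=90.0):
    """Lambda = M * tau - (psi / gamma) * sum(exp(-gamma * (M - T_i)) - 1), gamma non-zero."""
    # math.expm1(x) = exp(x) - 1, which avoids cancellation when gamma is close to zero
    return M * tau - psi / gamma * sum(math.expm1(-gamma * (M - t)) for t in foul_times)


def self_exciting_loglik(foul_times, tau, psi, gamma, M=90.0):
    """Log-likelihood of one match: -Lambda + sum of log(lambda(T_i))."""
    log_rates = sum(math.log(self_exciting_rate(t, foul_times, tau, psi, gamma))
                    for t in foul_times)
    return -self_exciting_compensator(foul_times, tau, psi, gamma, M) + log_rates


# Hypothetical foul times (minutes) for a single match
example_times = [4.0, 11.0, 12.5, 30.0, 47.0, 58.0, 80.0]
tau, psi, gamma = 0.3375, 0.001187, 1.782e-05  # Table 21, season 2007-2008
print(self_exciting_loglik(example_times, tau, psi, gamma))
```

Using `math.expm1` is a design choice for numerical stability: several estimates of γ in Table 21 are very close to zero, in which case computing exp(x) − 1 directly and dividing by γ loses precision.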
