
Team #6620 of 17

A Likelihood Enhanced Centrographic Geographic

Profiling Model in Detecting Serial Criminals


Introduction

Serial crime is generally defined as crime of a serial or repetitive nature, including serial murder, serial rape and serial arson. Location is a key factor in the investigation of serial crimes, and geographic profiling has been developed to exploit it. Geographic profiling uses the locations and environment in which crimes occur to analyze crime patterns, find places where certain crimes cluster, and identify the home base of a criminal by plotting the locations of crimes committed by a serial offender and then using a model to estimate where that base lies.

Our aim is to develop a method that generates a geographical profile and predicts possible locations of the next crime based on the times and locations of past crime scenes.

As shown in Figure 1, in order to predict possible locations of the next crime, we work in two tiers: first, trace back along the blue lines to find the criminal’s home base; second, search along the grey lines to predict locations of the next crime.

Figure 1. Preliminary analysis of the prediction process. Diagram labels: Home Base; Previous Incident Locations; Predicted Possible Locations of the Next Crime. Tier 1: find the underlying home base of the criminal. Tier 2: find the possible locations of the next crime.

Problem Background

Several main geographic profiling models have already been developed from studies of human behavior patterns and psychology.

The Brantingham Model examines where crimes are most likely to happen, based on the offender’s residence, workplace and leisure activities. The model proposes the concept of the “awareness space of the offender”, which is described by the offender’s activity space: the offender’s home, workplace, social activity sites, and their connecting paths. According to Brantingham, crimes occur in those locations where suitable targets overlap with the offender’s awareness space [D. Kim Rossmo, 1993].

The problem is that the residence and the activity space of the offender remain unknown in most cases.


The Centrographic Method is a univariate measure of the central tendency of a point pattern, which minimizes the sum of the squared distances from the spatial mean to the various points [Taylor, 1977]. From the series of points, a single summary location is provided.

But it suffers from three serious methodological difficulties:

(1) the method generally provides only a single piece of information;

(2) it is susceptible to the influence of outliers;

(3) some theoretical models suggest that the locations of the confluences of the offender’s activity space and the victim backcloth may not be related to measures of central tendency [D. Kim Rossmo, 1993].

Criminal Geographic Targeting (the CGT model, or Rossmo model, 1993) reverses the logic of the Brantingham model and attempts to determine the most probable areas in which the offender might be located by analyzing the spatial information associated with a series of linked crimes. The problems of the CGT model are:

(1) The model overestimates travel distances in urban areas without a uniform grid street layout.

(2) It multiplies variables from different systems, some of which are densities, so the process might overestimate or underestimate the likelihood of some residence locations.

(3) The model never measures the attraction that subsequent crime locations would hold for an offender.

The Canter model differs from the CGT model in that it suggests a search strategy for the police rather than a particular location. Its strength is to indicate how narrow an area the police should concentrate on in order to optimize finding an offender.

Similar to the CGT model, this model never measures criminal attractions; the probable locations of subsequent crimes are only inferred from the pattern of crime incidents.

The Journey to Crime Routine (JTC) (by Ned Levine) builds on the CGT framework and extends its modeling capability. The CrimeStat JTC model provides five functions for different offender patterns, which improve the accuracy of the estimated residence of the offender.

The problem is that the CrimeStat JTC model is too complicated; in fact, a simpler model, which finds the approximate geographic center of the distribution where travel time to each of the incidents is minimal, produced as good an estimate as the more sophisticated methods [CrimeStat Version 3.2a Update Notes. Ned Levine. 2009].

After reviewing the existing models, we find that as the methods become more and more sophisticated, the accuracy of estimation stays almost the same. We therefore base our estimation on improved simple models.

Additionally, case variables such as crime time are not taken into account by these models. Crime time will be taken into consideration in our model.


Assumptions, Terms & Notations

Assumptions

The residential location of the offender doesn’t change during the serial crime investigation.

There is a small safety area (or buffer zone) of relatively little offender activity near the criminal’s home base; beyond that zone, the number of crime trips decreases according to a distance decay model.

There are specific victim types that are not equally distributed throughout space.

In the crimes, the body dump site is the same as the attack location and the victim’s last known location.

Terms

Serial criminal: A serial criminal is a person who commits three or more crimes

in series by similar patterns in a certain period of time.

Home base: the residential location of a criminal or a group of criminals in a

serial crime, which is a key factor to predict the next crime.

Notations

$P_{ij}$: the likelihood of a criminal traveling a certain distance to commit a crime.

$D_{ij}$: the distance from a location to an incident site.

α: an arbitrary constant in the Canter model.

β: the coefficient of the distance in the Canter model.

P: a normalization constant in the Canter model.

e: the base of the natural logarithm in the Canter model.

$(X_j, Y_j)$, j = 1, 2, …, k: the set of incident locations.

$(x_i, y_i)$, i = 1, 2, …, m × n: coordinates of the grids in the investigation area.

$T_j$: crime time.

$S_a$: the major axis of a Standard Deviation Ellipse.

$S_b$: the minor axis of a Standard Deviation Ellipse.

θ: the angle between the minor axis of a Standard Deviation Ellipse and a vertical line.

$D_{pj}$: the average distance from the crime locations to the home base.

$(x_p, y_p)$: location of the grid with the maximum likelihood, which is taken as the location of the criminal’s home base.

Model Derivation

We start our research by examining the accuracy of traditional centrographic methods.


Centrographic Methods

According to Paulsen’s study of 247 serial crime events that occurred in Baltimore County, Maryland between 1994 and 1997, the accuracy of certain centrographic predictions of criminals’ home bases is shown in Table 1.

Table 1.

Accuracy of certain centrographic predictions of criminals’ home base

Approach   Percentage   Avg.          Avg.          Accuracy
Name       Correct      Search Area   Search Cost   Precision
SDR        80%          151.68        170%          0.5274
SDE        73%          122.10        134%          0.5978
MCP        42%          23.21         26%           1.8095

Source. Dr. Derek J. Paulsen, 2005 UK Crime Mapping Conference

1. Percentage correct is the percent of final event location within predicted

area.

2. Search area is the average size of the predicted area.

3. Search cost is the percent of base search covered by the final predicted

area.

4. Accuracy Precision is the percent of correct forecasts divided by the

average predicted area.
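As a quick arithmetic check of note 4, the sketch below recomputes accuracy precision from the percentage correct and average search area columns of Table 1. The function name `accuracy_precision` and the tolerance are illustrative choices, not part of Paulsen’s study; the reported values appear to be truncated rather than rounded, so the comparison allows a difference of 0.0001.

```python
# Rows copied from Table 1:
# (approach, percentage correct, average search area, reported accuracy precision)
TABLE_1 = [
    ("SDR", 80.0, 151.68, 0.5274),
    ("SDE", 73.0, 122.10, 0.5978),
    ("MCP", 42.0, 23.21, 1.8095),
]


def accuracy_precision(percent_correct, avg_search_area):
    """Percentage correct divided by the average predicted area (note 4)."""
    return percent_correct / avg_search_area


for name, pct, area, reported in TABLE_1:
    computed = accuracy_precision(pct, area)
    # Illustrative tolerance: the table seems to truncate to four decimals.
    agrees = abs(computed - reported) < 1e-4
    print(name, computed, reported, agrees)
```

The three rows agree within that tolerance, which supports reading the last column as percentage correct per unit of search area.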

The data show that the Standard Deviation Rectangle (SDR) and Standard Deviation Ellipse (SDE) centrographic profiling methods score well on percentage correct but poorly on accuracy precision. The Minimum Convex-Hull Polygon (MCP) performs well on accuracy precision but poorly on percentage correct.

A balance between percentage correct and accuracy precision cannot be reached by using centrographic methods only. For this reason, we take a centrographic method with a high percentage correct and use the Canter model to compensate for its weak accuracy precision.

SDE surpasses SDR in average search area, average search cost and accuracy precision, so SDE is selected to be strengthened.

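The Standard Deviation Ellipse is a statistical ellipse that describes the dispersion of the incidents around the mean center in two dimensions (Ebdon, 1988; Cromley, 1992).

The Y-axis is rotated clockwise through an angle θ, where

$$\theta = \tan^{-1}\left(\frac{\left(\sum_{i=1}^{k}(X_i-\bar{X})^2 - \sum_{i=1}^{k}(Y_i-\bar{Y})^2\right) + \left[\left(\sum_{i=1}^{k}(X_i-\bar{X})^2 - \sum_{i=1}^{k}(Y_i-\bar{Y})^2\right)^2 + 4\left(\sum_{i=1}^{k}(X_i-\bar{X})(Y_i-\bar{Y})\right)^2\right]^{1/2}}{2\sum_{i=1}^{k}(X_i-\bar{X})(Y_i-\bar{Y})}\right)$$

The X-axis and Y-axis of the ellipse are defined by length x = $2S_a$ and length y = $2S_b$, where

$$S_a = \sqrt{\frac{2}{k-2}\sum_{i=1}^{k}\left[(X_i-\bar{X})\cos\theta - (Y_i-\bar{Y})\sin\theta\right]^2}$$

$$S_b = \sqrt{\frac{2}{k-2}\sum_{i=1}^{k}\left[(X_i-\bar{X})\sin\theta + (Y_i-\bar{Y})\cos\theta\right]^2}$$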

Given all the incident locations $\{(X_i, Y_i),\ i = 1, 2, \dots, k\}$ (shown as blue dots in Figure 2), the mean center $(\bar{X}, \bar{Y})$ and the statistics θ, $S_a$ and $S_b$ can be calculated. A Standard Deviation Ellipse can then be drawn around the incident locations, as shown in Figure 2.

Figure 2. A Deviation Ellipse
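The calculation of the SDE statistics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: both axis statistics are taken as square roots of the scaled sums of squares, the second projection is read as $(X_i-\bar{X})\sin\theta + (Y_i-\bar{Y})\cos\theta$ to match the rotation used in the ellipse equation of the paper, θ is returned in radians, and the axes are left unrotated when the cross term is zero. The function name `standard_deviation_ellipse` is hypothetical.

```python
import math


def standard_deviation_ellipse(points):
    """Return (mean_x, mean_y, theta, s_a, s_b) for a list of (X, Y) points."""
    k = len(points)
    if k < 3:
        raise ValueError("need at least 3 points because of the k - 2 divisor")
    mean_x = sum(x for x, _ in points) / k
    mean_y = sum(y for _, y in points) / k
    dx = [x - mean_x for x, _ in points]
    dy = [y - mean_y for _, y in points]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    sxy = sum(a * b for a, b in zip(dx, dy))
    if sxy == 0:
        # Assumption of this sketch: no cross term means no rotation.
        theta = 0.0
    else:
        diff = sxx - syy
        theta = math.atan((diff + math.sqrt(diff * diff + 4 * sxy * sxy)) / (2 * sxy))
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Projections of the centred points onto the two rotated axes.
    s_a = math.sqrt(2 * sum((a * cos_t - b * sin_t) ** 2 for a, b in zip(dx, dy)) / (k - 2))
    s_b = math.sqrt(2 * sum((a * sin_t + b * cos_t) ** 2 for a, b in zip(dx, dy)) / (k - 2))
    return mean_x, mean_y, theta, s_a, s_b


# Synthetic example: four points on the line y = x.
print(standard_deviation_ellipse([(0, 0), (1, 1), (2, 2), (3, 3)]))
```

For points lying on the line y = x, the sketch returns an angle of 45 degrees (π/4 radians), a zero spread across the line and all of the dispersion along it. Which of the two statistics is the longer axis depends on the data and on the rotation convention.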

The SDE method indicates a large elliptical area in which the police should search for the criminal’s home base. Its efficiency can be improved by introducing a search priority. The Canter model provides a likelihood for each location and can therefore be used to strengthen the SDE method.

Canter Model

Canter’s group modifies the distance decay function for journey to crime

trips by using a negative exponential term (Canter and Tagg, 1975; Canter and

Larkin, 1993; Canter and Snook, 1999; Canter, Coffey and Huntley, 2000).

Equation (*) gives the likelihood that any location is the home base of the criminal, based on one incident.

$$P_{ij} = \frac{\alpha e^{-\beta D_{ij}}}{P} \qquad (*)$$

where $P_{ij}$ is the likelihood of a criminal traveling a certain distance to commit a crime, $D_{ij}$ is the distance from a home base location to an incident site, α is an arbitrary constant, β is the coefficient of the distance, P is a normalization constant, and e is the base of the natural logarithm.
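A small sketch of the distance decay term may make the role of β concrete. It assumes that the normalization constant P divides the exponential term; the function name `canter_likelihood` and the parameter name `norm` (standing for P) are hypothetical.

```python
import math


def canter_likelihood(distance, alpha=1.0, beta=1.0, norm=1.0):
    """Negative-exponential decay: alpha * exp(-beta * distance) / norm."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return alpha * math.exp(-beta * distance) / norm


# A larger beta makes the likelihood fall off faster with distance.
for beta in (1.0, 5.0, 20.0):
    values = [canter_likelihood(d, beta=beta) for d in (0.0, 0.05, 0.1)]
    print(beta, values)
```

With α = 1 and P = 1 the likelihood equals 1 at distance zero and decreases monotonically, so only the relative values between locations carry information.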

Our Methods

Given these models, we develop a new method. A set of incident locations $\{(X_j, Y_j),\ j = 1, 2, \dots, k\}$ and the corresponding crime times $\{T_j,\ j = 1, 2, \dots, k\}$ are obtained from investigations of the crime scenes. A Geographic Information System can be used to draw a map containing all the incident locations. The method follows 3 steps:

First, divide the whole area into m×n grids. For any Grid i, the coordinates are $(x_i, y_i)$.

Figure 3. The whole area divided into m×n grids

Secondly, calculate the statistics $\bar{X}$, $\bar{Y}$, θ, $S_a$, $S_b$ to draw an SDE. The SDE is obtained by rotating the general ellipse

$$\frac{(x_i-\bar{X})^2}{S_a^2} + \frac{(y_i-\bar{Y})^2}{S_b^2} = 1$$

clockwise by an angle of θ. The equation of the coordinate transformation is

$$x = x'\cos\theta - y'\sin\theta$$

$$y = x'\sin\theta + y'\cos\theta$$

Hence the SDE is given by the equation

$$\frac{(x_i\cos\theta - y_i\sin\theta - \bar{X})^2}{S_a^2} + \frac{(x_i\sin\theta + y_i\cos\theta - \bar{Y})^2}{S_b^2} = 1$$

For any Grid i that lies within the SDE, $(x_i, y_i)$ should satisfy the constraint

$$\frac{(x_i\cos\theta - y_i\sin\theta - \bar{X})^2}{S_a^2} + \frac{(x_i\sin\theta + y_i\cos\theta - \bar{Y})^2}{S_b^2} \le 1 \qquad (S_a \ge S_b > 0,\ i = 1, 2, \dots, m \times n)$$

Thirdly, apply Equation (*). In this equation, let the arbitrary scalar α be 1. Probability normalization is not essential, so we set P = 1: the likelihoods do not need to sum to 1 as long as they are all calculated by the same equation.

We get the likelihood of a criminal traveling a certain distance from the ith grid to commit a crime at the jth incident location,

$$Y_{ij} = e^{-\beta D_{ij}} \qquad (i = 1, 2, \dots, m \times n;\ j = 1, 2, \dots, k)$$

where β is the coefficient that affects the decay function. It varies from case to case.

For each Grid i, the total likelihood of a criminal traveling from the ith grid to commit crimes at all incident locations is the sum of $Y_{ij}$ over j,

$$Y_i = \sum_{j=1}^{k} e^{-\beta D_{ij}} \qquad (i = 1, 2, \dots, m \times n)$$

Overall, the likelihood for each Grid i to be the home base location of the criminal is given by

$$Y_i = \sum_{j=1}^{k} e^{-\beta D_{ij}}$$

$$\left(i = 1, 2, \dots, m \times n,\ \text{and}\ \frac{(x_i\cos\theta - y_i\sin\theta - \bar{X})^2}{S_a^2} + \frac{(x_i\sin\theta + y_i\cos\theta - \bar{Y})^2}{S_b^2} \le 1\right)$$
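The three steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors’ code: each grid is represented by its cell center, distances are Euclidean, and the SDE constraint is passed in as a hypothetical predicate `inside(x, y)` so that the sketch does not commit to a particular rotation convention. All function names are hypothetical, and the incident coordinates are synthetic.

```python
import math


def grid_centers(x_min, x_max, y_min, y_max, m, n):
    """Yield the center (x, y) of every cell of an m-by-n grid over the box."""
    dx = (x_max - x_min) / m
    dy = (y_max - y_min) / n
    for a in range(m):
        for b in range(n):
            yield (x_min + (a + 0.5) * dx, y_min + (b + 0.5) * dy)


def total_likelihood(cell, incidents, beta):
    """Y_i: sum over incidents of exp(-beta * D_ij), with alpha = P = 1."""
    cx, cy = cell
    return sum(math.exp(-beta * math.hypot(x - cx, y - cy)) for x, y in incidents)


def most_likely_cell(cells, incidents, beta, inside=lambda x, y: True):
    """Return (cell, likelihood) with the largest Y_i among cells accepted by `inside`."""
    best_cell, best_value = None, -1.0
    for cell in cells:
        if not inside(*cell):
            continue
        value = total_likelihood(cell, incidents, beta)
        if value > best_value:
            best_cell, best_value = cell, value
    return best_cell, best_value


# Synthetic example: three clustered incidents and one remote incident.
incidents = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
cells = list(grid_centers(-1.0, 4.0, -1.0, 4.0, 50, 50))
print(most_likely_cell(cells, incidents, beta=1.0))
```

In the synthetic example the maximum falls in the cell next to the cluster of three incidents, and the remote incident only shifts the likelihood surface slightly. Restricting the search with `inside` plays the role of the ellipse constraint.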

Time-weighted Method

Crime time is taken into consideration in the model on the basis of criminal behavior. Serial criminals who commit most of their crimes in the daytime need to sleep at night, so they cannot travel far to commit a crime the next morning. Hence, if a crime is committed in the morning, its location should not be far from the criminal’s home base.

6 a.m. and 12 noon are selected as the time nodes. Crimes committed during this period have more reliable locations for detecting the home base. If r crimes are committed in the morning (r ≤ k), these r locations are weighted.

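The likelihood equation turns into a weighted form, where ω is the weight and j′ indexes the r morning crimes:

$$Y_i' = \sum_{j=1}^{k} e^{-\beta D_{ij}} + \omega \sum_{j'=1}^{r} e^{-\beta D_{ij'}}$$

$$\left(i = 1, 2, \dots, m \times n,\ \text{and}\ \frac{(x_i\cos\theta - y_i\sin\theta - \bar{X})^2}{S_a^2} + \frac{(x_i\sin\theta + y_i\cos\theta - \bar{Y})^2}{S_b^2} \le 1\right)$$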

In this equation, crime time enters the calculation of the likelihood of each grid. It is designed especially for detecting serial criminals who commit most of their crimes in the daytime.

Law enforcement officers could plot the likelihood values $Y_i'$ on the map to decide where within the ellipse to search for the criminal first.
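The time weighting can be sketched as below. The sketch assumes that times are given as 24-hour integers such as 830 or 1623, that the morning window includes 6:00 and excludes 12:00 (the text does not say how the boundaries are treated), and that each incident is a hypothetical `(x, y, hhmm)` tuple. The function names are illustrative.

```python
import math

MORNING_START, MORNING_END = 600, 1200  # 6:00 to 12:00 in hhmm form


def is_morning(hhmm):
    """Assumption: the window is closed at 6:00 and open at 12:00."""
    return MORNING_START <= hhmm < MORNING_END


def weighted_likelihood(cell, incidents, beta, omega):
    """Y_i': every incident contributes once; morning incidents add omega times more."""
    cx, cy = cell
    total = 0.0
    for x, y, hhmm in incidents:
        term = math.exp(-beta * math.hypot(x - cx, y - cy))
        total += term
        if is_morning(hhmm):
            total += omega * term
    return total


# TIME column for the first 13 incidents of Table 2.
TABLE_2_TIMES = [1623, 830, 2341, 2009, 2125, 1315, 1715, 1621, 1834, 1921, 1338, 954, 0]
morning_ids = [n for n, t in enumerate(TABLE_2_TIMES, start=1) if is_morning(t)]
print(morning_ids)  # [2, 12]
```

Applied to the first 13 incidents of Table 2, this window selects incidents 2 and 12, the same two morning crimes that are weighted in the verification. Setting ω = 0 recovers the unweighted likelihood.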

Prediction

In this section, the grid with the maximum likelihood of being the home base is treated as a fixed location from which to predict the next crime location.

Brantingham’s study (1981) postulated that there would be a small safety area (or buffer zone) of relatively little offender activity near the criminal’s home base; beyond that zone, however, the number of crime trips would decrease according to a distance decay model. We make the same postulation.

Since crime decays beyond the safety area, the edge of the safety area has the most crimes and the highest probability of the next crime. The circle that coincides with the edge of the safety area can therefore be predicted to have the highest probability of the next crime. If we can estimate the radius of the safety area, the most likely place of the next crime can be predicted.

The average distance between the home base and all incident locations is the average distance the criminal travels to commit a crime. We take it as an approximation of the distance at which the next crime is most probable, so it can be used as an estimator of the radius of the safety area.

Let $Y_p = Y_{max}$; then $(x_p, y_p)$ is the location with the maximum likelihood.

$$D_{pj} = \frac{1}{k}\sum_{j=1}^{k}\sqrt{(X_j - x_p)^2 + (Y_j - y_p)^2}$$

is the average distance between the home base and all incident locations. The predicted circle has the equation

$$(x_a - x_p)^2 + (y_a - y_p)^2 = D_{pj}^2$$

$$\left(\frac{(x_i\cos\theta - y_i\sin\theta - \bar{X})^2}{S_a^2} + \frac{(x_i\sin\theta + y_i\cos\theta - \bar{Y})^2}{S_b^2} \le 1\right)$$

Figure 4. Circle with the most probability. Diagram labels: the circle of most probable crime; criminal’s home; $D_{mj}$.

The ellipse constraint is also valid here, since the SDE is a statistical area that includes most incident locations.

The prediction of possible locations of the next crime is an arc (the part of the circle within the SDE, see Figure 4). As shown in Figure 5, it could be widened into a band around the arc. The probability decreases with distance from the circle of radius $D_{mj}$, as the color of the band turns lighter.

Figure 5. The prediction area
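The prediction step can be sketched as a mean travel distance plus a band test around the resulting circle. The band half-width is not specified in the text, so `half_width` is a hypothetical parameter, as are the function names; the SDE constraint is left out here and would be applied as an additional filter.

```python
import math


def mean_travel_distance(home, incidents):
    """Average Euclidean distance from the estimated home base to the incidents."""
    hx, hy = home
    return sum(math.hypot(x - hx, y - hy) for x, y in incidents) / len(incidents)


def in_prediction_band(point, home, radius, half_width):
    """True if `point` lies within `half_width` of the circle around `home`."""
    distance = math.hypot(point[0] - home[0], point[1] - home[1])
    return abs(distance - radius) <= half_width


# Synthetic example: three incidents, each 5 units from the home base.
home = (0.0, 0.0)
incidents = [(3.0, 4.0), (0.0, 5.0), (-5.0, 0.0)]
radius = mean_travel_distance(home, incidents)
print(radius)
print(in_prediction_band((5.05, 0.0), home, radius, half_width=0.1))
print(in_prediction_band((1.0, 0.0), home, radius, half_width=0.1))
```

A graded band, as in Figure 5, could be obtained by reporting `abs(distance - radius)` instead of a yes/no answer.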

Verification of the model

Data

The data set shown in Table 2 is obtained from the CrimeStat package (http://www.icpsr.umich.edu/icpsrweb/CRIMESTAT/). It was collected from a serial crime committed from 1993 to 1997 in Baltimore, Maryland, and was provided by Mr. Phil Canter of the Baltimore County Police Department.

Table 2.

The location and time information in a serial crime

Number   DATE       TIME   INCIDY(Yi)   INCIDX(Xi)
1        3/12/93    1623   -76.5754     39.3868
2        3/19/93    830    -76.5409     39.3887
3        7/17/93    2341   -76.5230     39.3611
4        8/26/93    2009   -76.5801     39.3932
5        10/29/93   2125   -76.7507     39.3115
6        6/18/95    1315   -76.6015     39.4042
7        11/24/95   1715   -76.4439     39.2940
8        9/29/96    1621   -76.5633     39.3725
9        10/4/96    1834   -76.5670     39.3723
10       10/6/96    1921   -76.5616     39.3725
11       3/13/97    1338   -76.5815     39.3929
12       5/16/97    954    -76.5857     39.3975
13       5/16/97    0      -76.5815     39.3929
14       8/12/97    930    -76.5617     39.3726

Source. CrimeStat, Ned Levine & Associates, and the National Institute of Justice, 2009

Times in the table are given in 24-hour format. The home of the criminal is at (39.3688N, 76.5666W). In a relatively small area such as this one, latitude and longitude can be used as an approximation of orthogonal coordinates to calculate distances. So here we regard latitude as $X_j$ and longitude as $Y_j$, in absolute value, and use the distance formula directly. In large areas, latitude and longitude should first be converted to planar coordinates before distances are calculated.
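The approximation described above amounts to applying the planar distance formula to degrees. The sketch below shows that calculation for the stated home location and incident 10 of Table 2. For comparison it also includes an equirectangular projection to kilometres, which is one common way to convert coordinates for larger areas. The projection, the Earth radius value and the function names are assumptions of this sketch and are not used in the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def planar_degree_distance(p, q):
    """Distance in degrees, treating (latitude, longitude) as orthogonal axes."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def project_km(lat, lon, lat0, lon0):
    """Equirectangular projection: km east and north of (lat0, lon0), signed degrees in."""
    east = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_KM
    north = math.radians(lat - lat0) * EARTH_RADIUS_KM
    return east, north


home = (39.3688, 76.5666)         # home base from the text, longitude in absolute value
incident_10 = (39.3725, 76.5616)  # incident 10 from Table 2, same convention
print(planar_degree_distance(home, incident_10))

# Same pair with signed longitudes, converted to kilometres.
print(project_km(39.3725, -76.5616, 39.3688, -76.5666))
```

Near latitude 39 degrees, one degree of longitude covers a shorter ground distance than one degree of latitude, so distances measured directly in degrees stretch the east-west direction; this matters little for ranking grids within a small study area.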

Verification

We use Google Maps (http://maps.google.com) as a substitute for a Geographical Information System. The locations are plotted in Figure 6:

Figure 6.

First, set m=200, n=100 and divide the area into m× n grids. The boundaries are shown

in Table 3.

Table 3.

The boundaries of the map

Boundaries   X-axis   Y-axis
Min          39.28    76.4
Max          39.4     76.85

Secondly, calculate the statistics, which are listed in Table 4.

Table 4.

The statistics of the function

k    $\bar{X}$   $\bar{Y}$   θ          $S_a$   $S_b$
14   39.3723     -76.5727    510.6189   0.05    0.035

Then, draw the SDE

$$\frac{(x_i\cos\theta - y_i\sin\theta - \bar{X})^2}{S_a^2} + \frac{(x_i\sin\theta + y_i\cos\theta - \bar{Y})^2}{S_b^2} = 1$$

with the statistics obtained.
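The mean center reported in Table 4 can be reproduced directly from the 14 incident locations of Table 2. The sketch below does only that check; the function name `mean_center` is illustrative, and the remaining statistics of Table 4 are not recomputed here.

```python
# (latitude, longitude) for incidents 1-14, copied from Table 2.
INCIDENTS = [
    (39.3868, -76.5754), (39.3887, -76.5409), (39.3611, -76.5230),
    (39.3932, -76.5801), (39.3115, -76.7507), (39.4042, -76.6015),
    (39.2940, -76.4439), (39.3725, -76.5633), (39.3723, -76.5670),
    (39.3725, -76.5616), (39.3929, -76.5815), (39.3975, -76.5857),
    (39.3929, -76.5815), (39.3726, -76.5617),
]


def mean_center(points):
    """Arithmetic mean of the X and Y coordinates."""
    k = len(points)
    return (sum(x for x, _ in points) / k, sum(y for _, y in points) / k)


mean_x, mean_y = mean_center(INCIDENTS)
print(round(mean_x, 4), round(mean_y, 4))  # 39.3723 -76.5727, as in Table 4
```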

Thirdly, for each Grid i, calculate $Y_i = \sum_{j=1}^{13} e^{-\beta D_{ij}}$. To make the graph legible, we set β = 20. Figure 7 shows the distribution of likelihood for every Grid i on the left and the corresponding contour view on the right. The smaller the contour circle, the higher the probability it represents.

Figure 7. The distribution of likelihood

Crimes No. 2 and No. 12 were committed in the morning. Using the weighted likelihood equation with ω = 0.2, we get the weighted likelihood shown on the right of Figure 8.


Figure 8. Original vs. weighted (on the right) likelihood

The red hollow circle is the criminal’s real residence. The dashed ellipse is centered at the mean center of all the incident points.

The green straight line in the middle of the contour is the distance between the grid where the real home base is located (shown as a small red circle) and the grid with the maximum likelihood. The grid of maximum likelihood changes location after the incident locations are weighted. With weight ω = 0.2, the expected result is obtained: the predicted home base (the grid of maximum likelihood) moves closer to the real home base, as the green line becomes shorter. That is, after taking crime time into consideration, the prediction is more precise than before in this case. In other words, this result suggests that crime time plays a considerable role in predicting the home base location.

However, one question remains: how should the weight ω be determined? Different values will lead to different results, so the outcome depends both on the choice of weight and on an analysis of the influence of crime time in the specific case. Intuitively, the latter factor is the more important on most occasions.

Prediction


Find the grid p with the maximum likelihood and draw a circle with center p and radius d. The X-axis and Y-axis are not on the same scale, so the circle appears flattened in the figure.

We predict the fourteenth incident location from the first 13 locations; the average distance d’ is 0.0445.

Figure 9. Predicted, #13(Left) and #14(Right)

The new graph helps the police to narrow the search region in the manner of Figure 5, so that much time and energy can be saved. However, it should be noted that the prediction of the possible location of the next crime is probabilistic: it is proper to say only that the criminal tends to show up in the predicted area with high probability. Despite this, the model offers an intuitive, easy and feasible approach to this difficult task.

Then, to test our model, we step back one point. We make another prediction, of the thirteenth incident location from the first 12 locations; the average distance d’ is 0.0253.


Figure 10. Predicted, #12(Left) and #13(Right)

In this case, the possible incident locations form two arcs, on two sides of the ellipse. The change does not seem very apparent. The minor axis of the ellipse decreases greatly, but the criminal’s residence is still included. Notice that the ellipse shows the dispersion of the incidents around the mean center, which should be close to the real home base. The major axis of the ellipse becomes longer, probably because the “remote” point, point 5, has a great influence on the dispersion.

Conclusions

Geographic profiling supplements what detectives find. It can provide insights and predictions, but these only assist the investigation. Our methods are not suitable for every case. In some cases they may provide useful information for solving crimes, while in others traditional detective work will make the difference. Overall, the strengths and weaknesses of our model are:

Strengths

Our model combines the process of locating the home base of the offender with the search for the next crime. Instead of using sophisticated models that consider various aspects, this model goes back to simple functions in which fewer parameters are involved.

By improving the traditional centrographic models, our model can quickly focus on a relatively smaller and more precise area. By using the travel behavior model, the area of interest is further concentrated, which helps the police to focus their resources on investigating individuals. Additionally, the time of each previous crime is considered in the model, which adds its influence by suggesting distance information.

In the crime-location-finding process, the safety radius is taken to equal the average distance between the probable home base and the previous crime locations. The suspicious area is then narrowed to a minor arc, which greatly reduces the workload of the police.

Weaknesses

Only one grid cell is chosen (the one with the maximum likelihood of being the home base) when fixing the safety radius and predicting the next crime. Other grid cells with a high likelihood of being the home base (though not as high as the maximum) are then ignored. This reduces the potential area of the crime location and decreases the accuracy of the prediction, because the actual home base cannot be identified precisely and the whole likelihood distribution should be involved in the second modeling step.

In addition, local information is not included in our model to filter information for further prediction, because of a lack of data. Local information might sometimes be a crucial factor that greatly influences the prediction and changes the whole situation.

Future Work

·To take into account the impact of other factors such as crime type, city type,

and road network.

·To accumulate more data and local information to build a database.

·To evaluate the efficiency and accuracy of a model by counting the number of correct predictions within a serial crime.

·To determine case variables that may indicate predictive success.

·To have more research on serial offender spatial and temporal behavior to

develop and analyze other new strategies.

References

Craig Bennell, Brent Snook, Paul Taylor, Shevaun Corey, and Julia Keyton. 2007. It’s No Riddle, Choose the Middle: The Effect of Number of Crimes and Topographical Detail on Police Officer Predictions of Serial Burglars’ Home Locations. Criminal Justice and Behavior, Vol. 34, No. 1, January 2007: 119-132. http://www.mun.ca/psychology/brl/publications/Bennell4.pdf

David Canter, Toby Coffey, Malcolm Huntley, and Christopher Missen. 2000. Predicting Serial Killers’ Home Base Using a Decision Support System. Journal of Quantitative Criminology, Vol. 16, No. 4, 2000: 457-478. http://www.springerlink.com/index/H3T57X8878534228.pdf

D. Kim Rossmo. 1993. A Methodological Model. American Journal Of Criminal

Justice, Vol. XVII, No. 2 1993: 1-21.

http://www.springerlink.com/index/N161K91QV4R03604.pdf

E. R. Lawrence, L. M. Glidden, and B. M. Jobe. Keeping them happy: Job satisfaction,

personality, and attitudes toward disability in predicting counselor job retention.

Education and Training in Developmental Disabilities,41(1):70–80, 2006.

J. Block. Going beyond the five factors given: Rejoinder to Costa and McCrae (1995)

and Goldberg and Saucier(1995). Psychological Bulletin, 117(2):226–229, 1995.

J. Salgado. The five-factor model of personality and job performance in the European

community. Journal of Applied Psychology, 82:30–43, 1997.

National Science Foundation. Women, minorities, and persons with disabilities in

science and engineering -2002: Employment.

http://www.nsf.gov/sbe/srs/nsf03312/c6/c6s3.htm.

Ned Levine (2009). CrimeStat: A Spatial Statistics Program for the Analysis of Crime Incident Locations (v 3.2a). Ned Levine & Associates, Houston, TX, and the National Institute of Justice, Washington, DC. October. http://www.icpsr.umich.edu/files/CRIMESTAT/files/CrimeStat3.2aupdatenotes.pdf

R. Bagby and M. Marshall. 2003. Positive impression management and its influence

on the Revised NEO Personality Inventory: a comparison of analog and

differential prevalence group designs. Psychological Assessment, 15(3):333–339,

2003.

Snook, B., Cullen, R. M., Mokros, A., & Harbort, S. (2005). Serial murderers’ spatial

decisions: Factors that influence crime

location choice. Journal of Investigative Psychology and Offender Profiling, 2,

147-164.

S. J. Greenwald, K. G. Olthoff, V. Raskin, and W. Ruch. The user non-acceptance

paradigm: Infosec’s dirty little secret. In Proceedings of the 2004 New Security

Paradigms Workshop, 35–43, Nova Scotia, Canada, 2004. 20-23 September

2004.

Tara Whalen, Carrie Gates. 2007. A Psychological Profile of Defender Personality Traits. Journal of Computers, Vol. 2, No. 2, April 2007: 83-94. http://academypublisher.com/jcp/vol02/no02/jcp02028493.pdf

U.S. Department of Justice, Federal Bureau of Investigation. August 2000.

Uniform Crime Reporting: National Incident-Based Reporting System

Volume 1: Data Collection Guidelines. Criminal Justice Information Services

Division. http://www.fbi.gov/ucr/nibrs/manuals/v1all.pdf
