Team #6620 Page 1 of 17
A Likelihood Enhanced Centrographic Geographic
Profiling Model in Detecting Serial Criminals
Introduction
Serial crime is generally defined as crime of a serial or repetitive nature,
including serial murder, serial rape and serial arson. Location is a key factor in the
investigation of serial crimes. Geographic profiling has been developed for detecting
serial crimes. Geographic profiling uses locations and environment in which crimes
occur to analyze crime patterns, find places where certain crimes cluster, and identify
the home base locations of criminals by plotting the location of crimes committed by
a serial offender and then using a model to estimate..
Our aim is to develop a method that generates a geographical profile and predicts
possible locations of the next crime based on the times and locations of past crime
scenes.
As shown in Figure 1, in order to predict possible locations of the next crime, we
proceed in two tiers: first, trace back along the blue lines to find the criminal’s home
base; second, search along the grey lines to predict locations of the next crime.
Figure 1. Preliminary analysis of the prediction process. Tier 1: from the previous
incident locations, find the underlying home base of the criminal. Tier 2: from the
home base, find the predicted possible locations of the next crime.
Problem Background
Drawing on studies of human behavior patterns and psychology, several main
geographic profiling models have already been developed.
The Brantingham Model examines where crimes are most likely to happen,
based on the offender’s residence, workplace and leisure activities. The model
proposes the concept of the offender’s “awareness space”. It is described by the
offender’s activity space, which is composed of the offender’s home, workplace,
social activity sites, and their connecting paths. According to Brantingham,
crimes occur in those locations where suitable targets overlap the
offender's awareness space [D. Kim Rossmo, 1993].
The problem is that the residence and the activity space of the offender
remain unknown in most cases.
The Centrographic Method uses the spatial mean, a univariate measure of the central
tendency of a point pattern that minimizes the sum of the squared distances from the
spatial mean to the various points [Taylor, 1977]. From the series of points, a single
summary location is provided.
However, it suffers from three serious methodological difficulties:
(1) the method generally provides only a single piece of information;
(2) it is susceptible to the influence of outliers;
(3) some theoretical models suggest that the locations of the confluences of the
offender's activity space and the victim backcloth may not be related to measures
of central tendency [D. Kim Rossmo, 1993].
Criminal Geographic Targeting (the CGT model, or Rossmo model, 1993)
reverses the logic of the Brantingham model and attempts to determine the most
probable areas in which the offender might be located by analyzing the spatial
information associated with a series of linked crimes. The problems of the CGT model
are:
(1) The model overestimates travel distances in urban areas
without a uniform grid street layout.
(2) It multiplies variables from different systems, some of which are
densities, so the process might overestimate or underestimate the
likelihood of the residence at some locations.
(3) The model never measures the attraction that subsequent crime locations
would hold for an offender.
The Canter model differs from the CGT model in that it suggests a search
strategy for the police to find a serial offender rather than a particular location. Its
strength is that it indicates how narrow an area the police should concentrate on in
order to optimize finding an offender.
Like the CGT model, this model never measures criminal attractions; the
probable locations of subsequent crimes are inferred only from the pattern of crime
incidents.
Journey to Crime Routine (JTC) (by Ned Levine) builds on the CGT
framework and extends its modeling capability. The CrimeStat JTC model
provides five functions for different offender patterns, which improve the accuracy
of the estimated residence of the offender.
The problem is that the CrimeStat JTC model is too complicated; in fact, a
simpler model, which finds the approximate geographic center of the distribution
where travel time to each of the incidents is minimal, produced as good an
estimate as the more sophisticated methods [CrimeStat Version 3.2a Update
Notes. Ned Levine. 2009].
After reviewing the existing models, we find that as the methods become more
sophisticated, the accuracy of estimation stays almost the same. We will thus
base our estimation on improved simple models.
Additionally, case variables such as crime time are not taken into account in these
models. Crime time will be taken into consideration in our model.
Assumptions, Terms & Notations
Assumptions
The residential location of the offender doesn’t change during the process of
serial-crime-investigation.
There would be a small safety area (or buffer zone) of relatively little offender
activity near to the criminal’s base location; beyond that zone, the number of
crime trips would decrease according to a distance decay model.
There are specific victim types that are not equally distributed throughout space.
In the crimes, the body dump site is the same as the attack location and the
victim's last known location.
Terms
Serial criminal: A serial criminal is a person who commits three or more crimes
in series by similar patterns in a certain period of time.
Home base: the residential location of a criminal or a group of criminals in a
serial crime, which is a key factor to predict the next crime.
Notations
Pij : the likelihood of a criminal traveling a certain distance to commit a crime.
Dij : the distance from a location to an incident site.
α : an arbitrary constant in the Canter Model.
β : the coefficient of the distance in the Canter Model.
P : a normalization constant in the Canter Model.
e : the base of the natural logarithm in the Canter Model.
(Xj , Yj), j = 1, 2, …, k : a set of incident locations.
(xi , yi), i = 1, 2, …, m × n : coordinates of the grids in the investigation area.
Tj : crime time.
Sa : the major axis of a Standard Deviation Ellipse.
Sb : the minor axis of a Standard Deviation Ellipse.
θ : the angle between the minor axis of a Standard Deviation Ellipse and a vertical line.
Dpj : the average distance of crime locations to the home base.
(xp , yp) : location of the grid with the maximum likelihood, which is also taken as the
location of the criminal’s home base.
Model Derivation
We start our research by examining the accuracy of traditional Centrographic
methods.
Centrographic Methods
According to Paulsen’s study of 247 serial crime events that
occurred in Baltimore County, Maryland between 1994 and 1997, the accuracy of
certain centrographic predictions of criminals’ home bases is shown in Table 1.
Table 1.
Accuracy of certain centrographic predictions of criminals’ home base
Approach Name   Percentage Correct   Avg. Search Area   Avg. Search Cost   Accuracy Precision
SDR             80%                  151.68             170%               0.5274
SDE             73%                  122.10             134%               0.5978
MCP             42%                  23.21              26%                1.8095
Source. Dr. Derek J. Paulsen, 2005 UK Crime Mapping Conference
1. Percentage correct is the percent of final event location within predicted
area.
2. Search area is the average size of the predicted area.
3. Search cost is the percent of base search covered by the final predicted
area.
4. Accuracy Precision is the percent of correct forecasts divided by the
average predicted area.
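The Accuracy Precision column can be re-derived from the other columns using the definition in note 4. The following is a minimal Python sketch that uses only the values printed in Table 1; the names `table1` and `accuracy_precision` are illustrative and do not come from the original study.

```python
# Values copied from Table 1: (percentage correct, average search area).
table1 = {
    "SDR": (80.0, 151.68),
    "SDE": (73.0, 122.10),
    "MCP": (42.0, 23.21),
}


def accuracy_precision(percent_correct, avg_search_area):
    """Percent of correct forecasts divided by the average predicted area."""
    return percent_correct / avg_search_area


# Recompute the last column of Table 1 for each approach.
ratios = {name: accuracy_precision(*vals) for name, vals in table1.items()}
for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```

The recomputed ratios agree with the published column to within rounding, which confirms that the measure rewards a small search area as much as a high hit rate.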
The data show that the Standard Deviation Rectangle (SDR) and Standard
Deviation Ellipse (SDE) centrographic profiling methods score well on
percentage correct but poorly on accuracy precision. The Minimum Convex-Hull
Polygon (MCP) performs well on accuracy precision but poorly on
percentage correct.
A balance between percentage correct and accuracy precision cannot be
reached by using centrographic methods alone. For this reason, we use the Canter
model to improve the accuracy precision of a centrographic
method that already has a high percentage correct.
SDE outperforms SDR in average search area, average search
cost and accuracy precision, so SDE is the method selected to be strengthened.
The Standard Deviation Ellipse is a statistical ellipse that gives the dispersion of
the incidents around the mean center in two dimensions (Ebdon, 1988; Cromley,
1992).
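The Y-axis is rotated clockwise through an angle θ, where

$$\theta = \tan^{-1}\left(\frac{\left(\sum_{i=1}^{k}(X_i-\bar{X})^2 - \sum_{i=1}^{k}(Y_i-\bar{Y})^2\right) + \left[\left(\sum_{i=1}^{k}(X_i-\bar{X})^2 - \sum_{i=1}^{k}(Y_i-\bar{Y})^2\right)^2 + 4\left(\sum_{i=1}^{k}(X_i-\bar{X})(Y_i-\bar{Y})\right)^2\right]^{1/2}}{2\sum_{i=1}^{k}(X_i-\bar{X})(Y_i-\bar{Y})}\right)$$

The X-axis and Y-axis of the ellipse are defined by
length x = 2Sa, length y = 2Sb, where

$$S_a = \sqrt{\frac{2}{k-2}\sum_{i=1}^{k}\left[(X_i-\bar{X})\cos\theta - (Y_i-\bar{Y})\sin\theta\right]^2}$$

$$S_b = \sqrt{\frac{2}{k-2}\sum_{i=1}^{k}\left[(X_i-\bar{X})\sin\theta + (Y_i-\bar{Y})\cos\theta\right]^2}$$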
Given all the incident locations {(Xi , Yi), i = 1, 2, …, k} (shown as blue dots
in Figure 2), the mean of all incident locations (X̄, Ȳ) and the other statistics
θ, Sa, Sb can be calculated. A Standard Deviation Ellipse can then be drawn
around the incident locations, as shown in Figure 2.
Figure 2. A Deviation Ellipse
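The SDE statistics can be computed directly from the incident coordinates. The sketch below is one possible Python transcription of the formulas for θ, Sa and Sb, not the authors' implementation. It assumes the usual square roots, assumes that the second axis uses the projection (Xi − X̄) sin θ + (Yi − Ȳ) cos θ (the same combination that appears in the SDE constraint later in the paper), and assumes at least three points with a non-zero cross term. The function name `sde_statistics` is hypothetical.

```python
import math


def sde_statistics(points):
    """Return (mean_x, mean_y, theta, s_a, s_b) for a list of (X, Y) points.

    theta is in radians. Requires len(points) > 2 and a non-zero
    cross term sum((X - mean_x) * (Y - mean_y)).
    """
    k = len(points)
    mean_x = sum(x for x, _ in points) / k
    mean_y = sum(y for _, y in points) / k
    dx = [x - mean_x for x, _ in points]
    dy = [y - mean_y for _, y in points]

    sxx = sum(d * d for d in dx)               # sum of squared X deviations
    syy = sum(d * d for d in dy)               # sum of squared Y deviations
    sxy = sum(a * b for a, b in zip(dx, dy))   # sum of cross products

    diff = sxx - syy
    theta = math.atan((diff + math.sqrt(diff ** 2 + 4 * sxy ** 2)) / (2 * sxy))
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Assumption: plus sign in the second projection (see the lead-in).
    s_a = math.sqrt(2 * sum((a * cos_t - b * sin_t) ** 2
                            for a, b in zip(dx, dy)) / (k - 2))
    s_b = math.sqrt(2 * sum((a * sin_t + b * cos_t) ** 2
                            for a, b in zip(dx, dy)) / (k - 2))
    return mean_x, mean_y, theta, s_a, s_b
```

Calling `sde_statistics` on the list of (latitude, longitude) pairs of a crime series would give the five numbers needed to draw the ellipse; the angle is returned in radians, so it has to be converted if degrees are wanted.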
The SDE method indicates a large elliptical area for police to search for the
criminal’s home base. Its efficiency can be improved by introducing a search
priority. The Canter model provides a likelihood and can be used to strengthen the
SDE method.
Canter Model
Canter’s group modifies the distance decay function for journey-to-crime
trips by using a negative exponential term (Canter and Tagg, 1975; Canter and
Larkin, 1993; Canter and Snook, 1999; Canter, Coffey and Huntley, 2000).
Equation (*) gives the likelihood that any location is the
home base of the criminal, based on one incident.
$$P_{ij} = \frac{\alpha e^{-\beta D_{ij}}}{P} \qquad (*)$$

where Pij is the likelihood of a criminal traveling a certain distance to commit
a crime, Dij is the distance from a home base location to an incident site, α is
an arbitrary constant, β is the coefficient of the distance, P is a normalization
constant, and e is the base of the natural logarithm.
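To make the shape of the decay concrete, Equation (*) can be written as a one-line function. This is a minimal Python sketch; the function name `canter_likelihood` and the default values α = 1 and P = 1 are assumptions chosen for illustration.

```python
import math


def canter_likelihood(distance, beta, alpha=1.0, norm=1.0):
    """Negative exponential distance decay: alpha * exp(-beta * distance) / norm."""
    return alpha * math.exp(-beta * distance) / norm


# The likelihood is largest at zero distance and falls off as distance grows.
for d in (0.0, 0.05, 0.10):
    print(d, canter_likelihood(d, beta=20.0))
```

A larger β makes the likelihood fall off faster with distance, so β controls how tightly the estimate concentrates around the incident sites.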
Our Methods
Given these models, we develop a new method. A set of incident
locations {(Xj , Yj), j = 1, 2, …, k} and the corresponding crime times {Tj, j = 1, 2, …, k}
are obtained from investigations of crime scenes. A Geographic Information
System is used to draw a geographical map containing all the incident
locations. The method follows 3 steps:
First, divide the whole area into m×n grids. For any Grid i, the coordinates
are (xi , yi).
Figure 3. The whole area divided into m×n grids
Secondly, calculate the statistics X̄, Ȳ, θ, Sa, Sb to draw an SDE. The SDE is
rotated clockwise by an angle of θ from the general ellipse

$$\frac{(x_i-\bar{X})^2}{S_a^2} + \frac{(y_i-\bar{Y})^2}{S_b^2} = 1$$

The equation of coordinate transformation is

$$x = x'\cos\theta - y'\sin\theta$$

$$y = x'\sin\theta + y'\cos\theta$$

Hence the SDE is given by

$$\frac{(x_i\cos\theta - y_i\sin\theta - \bar{X})^2}{S_a^2} + \frac{(x_i\sin\theta + y_i\cos\theta - \bar{Y})^2}{S_b^2} = 1$$

For any Grid i that lies within the SDE, (xi , yi) should satisfy the constraint

$$\frac{(x_i\cos\theta - y_i\sin\theta - \bar{X})^2}{S_a^2} + \frac{(x_i\sin\theta + y_i\cos\theta - \bar{Y})^2}{S_b^2} \le 1 \qquad (S_a \ge S_b > 0,\ i = 1, 2, \dots, m \times n)$$
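The constraint above is a point-in-ellipse test that can be applied to every grid cell. The following Python sketch transcribes it term by term; the function name `inside_sde` is hypothetical and θ is assumed to be given in radians.

```python
import math


def inside_sde(x, y, mean_x, mean_y, theta, s_a, s_b):
    """True if grid point (x, y) satisfies the SDE constraint as written above."""
    u = x * math.cos(theta) - y * math.sin(theta) - mean_x
    v = x * math.sin(theta) + y * math.cos(theta) - mean_y
    return (u / s_a) ** 2 + (v / s_b) ** 2 <= 1.0


# With theta = 0 the test reduces to an axis-aligned ellipse.
print(inside_sde(1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0))
print(inside_sde(0.0, 1.5, 0.0, 0.0, 0.0, 2.0, 1.0))
```

Filtering the grid with this test before computing likelihoods keeps the search inside the ellipse and reduces the number of cells that have to be scored.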
Thirdly, apply Equation (*). In this equation, let the arbitrary scalar α be 1.
Probability normalization is not essential, so we set P = 1: the likelihoods do not
need to sum to 1, as long as they are all calculated by the same
equation.
We get the likelihood of a criminal traveling a certain distance from the ith
grid to commit a crime at the jth incident location,

$$Y_{ij} = e^{-\beta D_{ij}} \qquad (i = 1, 2, \dots, m \times n;\ j = 1, 2, \dots, k)$$

where β is the coefficient that affects the decay function. It varies between
cases.
For each Grid i, the total likelihood of a criminal traveling from the ith
grid to commit crimes at all incident locations is the sum of Yij over j,

$$Y_i = \sum_{j=1}^{k} e^{-\beta D_{ij}} \qquad (i = 1, 2, \dots, m \times n)$$

Overall, the likelihood for each Grid i to be the home base location of the
criminal is given by

$$Y_i = \sum_{j=1}^{k} e^{-\beta D_{ij}} \qquad \left(i = 1, 2, \dots, m \times n,\ \text{and}\ \frac{(x_i\cos\theta - y_i\sin\theta - \bar{X})^2}{S_a^2} + \frac{(x_i\sin\theta + y_i\cos\theta - \bar{Y})^2}{S_b^2} \le 1\right)$$
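The summed likelihood can be evaluated for a list of grid points in a few lines. This is a minimal Python sketch that assumes planar Euclidean distances and that `grid_points` has already been restricted to the cells inside the SDE; the function names are illustrative only.

```python
import math


def grid_likelihood(grid_points, incidents, beta):
    """Return Y_i = sum over j of exp(-beta * D_ij) for every grid point."""
    return [
        sum(math.exp(-beta * math.dist(g, inc)) for inc in incidents)
        for g in grid_points
    ]


def most_likely_grid(grid_points, incidents, beta):
    """Return the grid point with the largest Y_i together with its score."""
    scores = grid_likelihood(grid_points, incidents, beta)
    best = max(range(len(scores)), key=scores.__getitem__)
    return grid_points[best], scores[best]
```

The point returned by `most_likely_grid` plays the role of (xp , yp), the estimated home base used in the prediction step.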
Time-weighted Method
Crime time is taken into consideration in the model on the basis of criminal
behavior. Serial criminals who commit most of their crimes in the daytime
need to sleep at night, so they cannot travel far to commit a
crime the next morning. Hence, if a crime is committed in the morning, the
location should not be far away from the criminal’s home base.
6 a.m. and 12 noon are selected as the time nodes. Crimes
committed during this period have locations that are more reliable for detecting
the home base. If r crimes were committed in the morning (r ≤ k), these r
locations are weighted.
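The likelihood equation turns into a weighted form, where ω is the weight and j′
indexes the r crimes committed in the morning:

$$Y_i' = \sum_{j=1}^{k} e^{-\beta D_{ij}} + \omega \sum_{j'=1}^{r} e^{-\beta D_{ij'}} \qquad \left(i = 1, 2, \dots, m \times n,\ \text{and}\ \frac{(x_i\cos\theta - y_i\sin\theta - \bar{X})^2}{S_a^2} + \frac{(x_i\sin\theta + y_i\cos\theta - \bar{Y})^2}{S_b^2} \le 1\right)$$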
In this equation, crime time enters the calculation of the likelihood of each grid.
It is designed especially for detecting serial criminals who commit most
crimes in the daytime.
Law enforcement officers could plot the likelihood values Yi′ on the map
to determine where to search for the criminal within the ellipse according to the
likelihood.
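The time-weighted likelihood adds an extra ω-scaled term for each morning crime. The Python sketch below illustrates this for a single grid point; it assumes crime times are stored as HHMM integers, that the morning window is 0600 to 1200 inclusive, and that distances are planar. The function name and parameter names are hypothetical.

```python
import math


def weighted_likelihood(grid_point, incidents, times, beta, omega,
                        start=600, end=1200):
    """Return Y_i' for one grid point.

    Every incident contributes exp(-beta * D); incidents whose time falls
    in the window [start, end] contribute an additional omega * exp(-beta * D).
    """
    total = 0.0
    for location, t in zip(incidents, times):
        term = math.exp(-beta * math.dist(grid_point, location))
        total += term
        if start <= t <= end:
            total += omega * term
    return total
```

Setting `omega=0` recovers the unweighted likelihood, which makes it easy to compare the two maps on the same grid.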
Prediction
The grid with the maximum likelihood of being the home base is treated as a fixed
location when predicting the next crime location in this section.
Brantingham’s study (1981) postulated that there would be a small safety
area (or buffer zone) of relatively little criminal activity near the criminal’s home
base location; beyond that zone, however, the number of crime trips would
decrease according to a distance decay model. We make the same postulation.
Since crime decays beyond the safety area, the edge of the safety area has
the most crimes and the highest probability of the next crime. The circle
that coincides with the edge of the safety area can therefore be predicted to have the
highest probability of the next crime. If we can estimate the radius of the safety area,
the most likely place of the next crime can be predicted.
The average distance between the home base and all incident locations is the
average distance a criminal travels to commit a crime. We take it to approximate the
distance at which the next crime is most probable, so it can be used as an
estimator of the radius of the safety area.
Let Yp = Ymax; then (xp , yp) is the location with the maximum likelihood.

$$D_{pj} = \frac{1}{k}\sum_{j=1}^{k}\sqrt{(X_j - x_p)^2 + (Y_j - y_p)^2}$$

is the average distance between the home
base and all incident locations. The predicted circle has the equation

$$(x_a - x_p)^2 + (y_a - y_p)^2 = D_{pj}^2 \qquad \left(\frac{(x_i\cos\theta - y_i\sin\theta - \bar{X})^2}{S_a^2} + \frac{(x_i\sin\theta + y_i\cos\theta - \bar{Y})^2}{S_b^2} \le 1\right)$$

Figure 4. Circle with the most probability (labels: the circle of most probable
crime, the criminal’s home, radius Dmj)
The ellipse constraint is also valid here, since the SDE is a statistical
area that includes most incident locations.
The prediction of possible locations of the next crime is an arc (the part of the
circle within the SDE, see Figure 4). As shown in Figure 5, it could be a band
around the arc. The probability decreases with distance from the circle of radius
Dmj, as the color of the band turns lighter.
Figure 5. The prediction area
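The prediction step can be sketched in two small functions: one computes the average distance from the estimated home base to the incidents, the other samples the circle of that radius and keeps the points that pass an ellipse test. This Python sketch assumes planar distances; the predicate `inside` stands in for the SDE constraint and, like the function names, is an assumption of this example.

```python
import math


def average_distance(home, incidents):
    """Mean Euclidean distance from the home base to the incident locations."""
    return sum(math.dist(home, p) for p in incidents) / len(incidents)


def prediction_arc(home, radius, inside, samples=360):
    """Sample the circle around `home` and keep the points accepted by `inside`."""
    arc = []
    for s in range(samples):
        angle = 2 * math.pi * s / samples
        point = (home[0] + radius * math.cos(angle),
                 home[1] + radius * math.sin(angle))
        if inside(point):
            arc.append(point)
    return arc
```

Passing a predicate rather than hard-coding the ellipse keeps the sketch independent of how the SDE is represented; a band around the arc could be approximated by repeating the call for slightly smaller and larger radii.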
Verification of the model
Data
The data set shown in Table 2 is obtained from the CrimeStat package
(http://www.icpsr.umich.edu/icpsrweb/CRIMESTAT/). It was collected from a
series of crimes committed from 1993 to 1997 in Baltimore, Maryland, and provided by
Mr. Phil Canter of the Baltimore County Police Department.
Table 2.
The location and time information in a serial crime
Number   DATE       TIME   INCIDY(Yi)   INCIDX(Xi)
1        3/12/93    1623   -76.5754     39.3868
2        3/19/93    830    -76.5409     39.3887
3        7/17/93    2341   -76.5230     39.3611
4        8/26/93    2009   -76.5801     39.3932
5        10/29/93   2125   -76.7507     39.3115
6        6/18/95    1315   -76.6015     39.4042
7        11/24/95   1715   -76.4439     39.2940
8        9/29/96    1621   -76.5633     39.3725
9        10/4/96    1834   -76.5670     39.3723
10       10/6/96    1921   -76.5616     39.3725
11       3/13/97    1338   -76.5815     39.3929
12       5/16/97    954    -76.5857     39.3975
13       5/16/97    0      -76.5815     39.3929
14       8/12/97    930    -76.5617     39.3726
Source. CrimeStat, Ned Levine & Associates, and the National Institute of Justice, 2009
Time in the table is given in 24-hour format. The home of the criminal is at
(39.3688N, 76.5666W). In a relatively small area, as in this case, latitude and
longitude can be used as an approximation of orthogonal coordinates to calculate
the distances. So here we take latitude as Xj and longitude as Yj, in absolute
value, and use the distance formula directly. In large areas, latitude and longitude
should be converted first before calculating the distance.
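For larger study areas, one common way to convert latitude and longitude into a distance is the haversine (great-circle) formula. The sketch below is offered only as an illustration of such a conversion and is not part of the method used in this paper; the function name and the mean Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Distance from the stated home location to incident No. 1 of Table 2.
print(haversine_km(39.3688, -76.5666, 39.3868, -76.5754))
```

Unlike the raw degree differences, this distance accounts for the fact that one degree of longitude is shorter than one degree of latitude away from the equator.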
Verification
We use Google Maps (http://maps.google.com) as a substitute for a
Geographical Information System. These locations are plotted in Figure 6:
Figure 6.
First, set m=200, n=100 and divide the area into m× n grids. The boundaries are shown
in Table 3.
Table 3.
The boundaries of the map
Boundaries   X-axis   Y-axis
Min          39.28    76.4
Max          39.4     76.85
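The grid of the first step can be generated from the boundaries in Table 3. The Python sketch below makes two assumptions that the paper does not state: each grid is represented by its cell center, and m = 200 is applied to the X-axis while n = 100 is applied to the Y-axis. The function name `make_grid` is hypothetical.

```python
def make_grid(x_min, x_max, y_min, y_max, m, n):
    """Return the centers of an m-by-n grid covering the given rectangle."""
    dx = (x_max - x_min) / m
    dy = (y_max - y_min) / n
    return [
        (x_min + (a + 0.5) * dx, y_min + (b + 0.5) * dy)
        for a in range(m)
        for b in range(n)
    ]


# Boundaries from Table 3 (absolute values of latitude and longitude).
grid = make_grid(39.28, 39.4, 76.4, 76.85, 200, 100)
print(len(grid))
```

Swapping `m` and `n` changes only the cell shape, not the covered area, so the assignment of m and n to the axes can be adjusted without touching the rest of the procedure.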
Secondly, calculating the statistics gives Table 4.
Table 4.
The statistics of the function
k    X̄         Ȳ          θ          Sa     Sb
14   39.3723   -76.5727   510.6189   0.05   0.035
Then, draw the SDE

$$\frac{(x_i\cos\theta - y_i\sin\theta - \bar{X})^2}{S_a^2} + \frac{(x_i\sin\theta + y_i\cos\theta - \bar{Y})^2}{S_b^2} = 1$$

with the statistics obtained.
Thirdly, for each Grid i, calculate

$$Y_i = \sum_{j=1}^{13} e^{-\beta D_{ij}}$$

In order to make the graph easy to read, we set β = 20. Figure 7 shows, on the left,
the distribution of likelihood for every Grid i; the corresponding contour view is
displayed on the right. The smaller the contour circle, the higher the probability it
represents.
Figure 7. The distribution of likelihood
Crimes No. 2 and No. 12 were committed in the morning. Using the weighted
likelihood equation with ω = 0.2, we get the weighted likelihood
shown in Figure 8 (on the right).
Figure 8. Original vs. weighted (on the right) likelihood
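The morning crimes that receive the extra weight can be picked out of Table 2 mechanically. The short Python sketch below uses the TIME column of the first 13 incidents and assumes the window 0600 to 1200 inclusive; the variable names are illustrative.

```python
# TIME column of Table 2 for incidents 1 to 13, as HHMM integers.
times = [1623, 830, 2341, 2009, 2125, 1315, 1715, 1621, 1834, 1921, 1338, 954, 0]

# Incident numbers (1-based) whose time falls between 6 a.m. and 12 noon.
morning = [number for number, t in enumerate(times, start=1) if 600 <= t <= 1200]
print(morning)  # [2, 12]
```

The result matches the two crimes named in the text, so r = 2 in the weighted likelihood equation for this case.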
The red hollow circle is the criminal’s real residence. The ellipse drawn with the
dashed line is centered at the mean center of all the incident points.
The green straight line in the middle of the contour is the
distance between the grid where the real home base is located (shown as a small red
circle) and the grid with the maximum likelihood. The grid of the maximum
likelihood changed location after the incident locations were weighted. With weight
ω = 0.2, the expected result is obtained: the predicted home base (the grid of
the maximum likelihood) gets closer to the real home base, as the green line
becomes shorter. That means that, after taking crime time into consideration, the
prediction is more precise than before in this case. In other words, this case
suggests that crime time plays a considerable role in predicting the home base
location.
However, one question remains: how should the weight ω be determined?
Different values will lead to different results, so the outcome depends on
the choice of weight and on the analysis of the influence of crime time in
specific cases. Intuitively, the latter factor is much more important on most
occasions.
Prediction
Find the grid p with the maximum probability and draw a circle with center
p and radius d. The X-axis and Y-axis are not on the same scale, so the circle
appears flattened in the figure.
We make a prediction of the fourteenth incident location from the first 13
locations; the average distance d’ is 0.0445.
Figure 9. Predicted, #13(Left) and #14(Right)
The new graph is more helpful for the police in narrowing the
search region, in the manner of Figure 5, and much time and energy can be
saved. However, it should be noted that the prediction of the possible
location of the next crime is probabilistic; it is therefore proper
to say only that the criminal tends to show up in the predicted area with high
probability. Despite this, the model brings clear gains in efficiency and insight.
It gives us an intuitive, easy and feasible approach to this difficult task.
Then, to test our model, we step back one point to check whether the
model still works. We make another prediction, of the thirteenth incident location
from the first 12 locations; the average distance d’ is 0.0253.
Figure 10. Predicted, #12(Left) and #13(Right)
In this case, the possible incident locations form two arcs, cut off by the
ellipse on two sides. The change is not very pronounced. The minor axis of the
ellipse decreased greatly, but the criminal’s residence is still included. Note
that the ellipse shows the dispersion of the incidents around the mean center,
which should be close to the real home base. The major axis of the ellipse
becomes longer. This is probably because the “remote” point, point 5, has a great
influence on the dispersion.
Conclusions
Geographic profiling supplements what detectives find. It can provide insights and
predictions, but these are only aids. Our methods are not suitable for every
case. In some cases they may provide useful information for solving crimes, while in
others traditional detective work will make the difference. Overall, the strengths
and weaknesses of our model are:
Strengths
Our model combines the process of locating the home base of offenders and
searching for the next crime. Instead of using sophisticated models that consider
various aspects, this model returns to simple functions in which fewer
parameters are involved.
By improving the traditional Centrographic models, our model can quickly
focus on a relatively smaller and more precise area. By using the travel
behavior model, the area of interest is further concentrated, which helps the police to
focus their resources on investigating persons. Additionally, the time of the
previous crimes is considered in the model, which adds its influence by
suggesting distance information.
In the crime-location-finding process, the safety radius is taken to equal the average
distance between the probable home base and previous crime locations.
The suspicious area is then focused on a minor arc, which
greatly reduces the workload of the police.
Weaknesses
Only one grid cell is chosen (the one with the maximum likelihood of being the home
base) when fixing the safety radius and predicting the next crime. Other grid cells
with high likelihoods of being the home base (though not as high as the maximum)
are ignored. This reduces the potential area of crime locations and
decreases the accuracy of the prediction, because the actual home base cannot be
identified precisely and the whole probability distribution should be involved in the
second modeling process.
In addition, local information is not used in our model to filter information for
further prediction because of a lack of data. Local information may sometimes
be a crucial factor that greatly influences the prediction and
changes the whole situation.
Future Work
·To take into account the impact of other factors such as crime type, city type,
and road network.
·To accumulate more data and local information to build a database.
·To evaluate the efficiency and accuracy of a model by testing the right
number of predictions involved in a serial crime.
·To determine case variables that may indicate predictive success.
·To have more research on serial offender spatial and temporal behavior to
develop and analyze other new strategies.
References
Craig Bennell, Brent Snook, Paul Taylor, Shevaun Corey, and Julia Keyton. 2007.
IT’S NO RIDDLE, CHOOSE THE MIDDLE. The Effect of Number of
Crimes and Topographical Detail on Police Officer Predictions of Serial
Burglars’ Home Locations. Criminal Justice And Behavior Vol. 34, No. 1,
January 2007:119-132
http://www.mun.ca/psychology/brl/publications/Bennell4.pdf
David Canter, Toby Coffey, Malcolm Huntley, and Christopher Missen. 2000.
Predicting Serial Killers’ Home Base Using a Decision Support System. Journal
of Quantitative Criminology, Vol. 16, No. 4, 2000: 457-478.
http://www.springerlink.com/index/H3T57X8878534228.pdf
D. Kim Rossmo. 1993. A Methodological Model. American Journal Of Criminal
Justice, Vol. XVII, No. 2 1993: 1-21.
http://www.springerlink.com/index/N161K91QV4R03604.pdf
E. R. Lawrence, L. M. Glidden, and B. M. Jobe. Keeping them happy: Job satisfaction,
personality, and attitudes toward disability in predicting counselor job retention.
Education and Training in Developmental Disabilities,41(1):70–80, 2006.
J. Block. Going beyond the five factors given: Rejoinder to Costa and McCrae (1995)
and Goldberg and Saucier(1995). Psychological Bulletin, 117(2):226–229, 1995.
J. Salgado. The five-factor model of personality and job performance in the European
community. Journal of Applied Psychology, 82:30–43, 1997.
National Science Foundation. Women, minorities, and persons with disabilities in
science and engineering -2002: Employment.
http://www.nsf.gov/sbe/srs/nsf03312/c6/c6s3.htm.
Ned Levine (2009). CrimeStat: A Spatial Statistics Program for the Analysis of
Crime Incident Locations (v 3.2a). Ned Levine & Associates, Houston, TX, and
the National Institute of Justice, Washington, DC. October
http://www.icpsr.umich.edu/files/CRIMESTAT/files/
CrimeStat3.2aupdatenotes.pdf
R. Bagby and M. Marshall. 2003. Positive impression management and its influence
on the Revised NEO Personality Inventory: a comparison of analog and
differential prevalence group designs. Psychological Assessment, 15(3):333–339,
2003.
Snook, B., Cullen, R. M., Mokros, A., & Harbort, S. (2005). Serial murderers’ spatial
decisions: Factors that influence crime location choice. Journal of Investigative
Psychology and Offender Profiling, 2, 147-164.
S. J. Greenwald, K. G. Olthoff, V. Raskin, and W. Ruch. The user non-acceptance
paradigm: Infosec’s dirty little secret. In Proceedings of the 2004 New Security
Paradigms Workshop, 35–43, Nova Scotia, Canada, 2004. 20-23 September
2004.
Tara Whalen, Carrie Gates. 2007. A Psychological Profile of Defender Personality
Traits. JOURNAL OF COMPUTERS, VOL. 2, NO. 2, APRIL 2007:
83-94. http://academypublisher.com/jcp/vol02/no02/jcp02028493.pdf
U.S. Department of Justice, Federal Bureau of Investigation. August 2000.
Uniform Crime Reporting: National Incident-Based Reporting System
Volume 1: Data Collection Guidelines. Criminal Justice Information Services
Division. http://www.fbi.gov/ucr/nibrs/manuals/v1all.pdf