Final presentation: Rodent baiting
Fall 2014
Analytics Project Presentation - Fall 2014
NYU Real Time and Big Data
Project: Rodent Baiting in NYC.
Team: Sanchit Khandelwal, Rohit Shankar, Simran Kaur.
Abstract
Analytic 1
•Find the factor that best predicts the occurrence of rodents in a particular area.
•Use garbage and water leak complaints together with rodent complaints to find whether rodent complaints increase after them.
Analytic 2
•Analyze the frequency of rodent complaints made in the city with respect to temperature ranges since 2012.
Analytic 3
•Estimate the rat population of the city. 8 million rats for 8 million New Yorkers? Debunk the myth?
Background
•NYC: infamous for its rodent problem.
•311: non-emergency helpline that provides access to different government services.
•Takes requests in the form of complaints; tracks and manages complaints.
•The 311 complaints database is updated daily and is publicly available.
•New York City Department of Health and Mental Hygiene (DHMH)
Motivation
•The aforementioned rodent problem.
•DHMH does not take well-planned preemptive actions to control the rodent population.
•Problems are solved on a first-come, first-served basis.
•No official estimate of the number of rodents.
•DHMH can use our analytics to take preemptive actions that can help reduce or control the number of rodents.
Data Sources
<311 Rodent Complaint Database>
•Contains rodent complaints with details such as complaint timestamp, zip code, and location type for 2010 to Nov 2014.
•Size: 38 MB; Format: CSV
<311 Sanitation Complaint Database>
•Contains sanitation complaints with fields similar to the rodent database for 2010 to Nov 2014.
•Size: 41 MB; Format: CSV
<311 Water Leak Database>
•Contains several kinds of water complaints, such as water leaks, standing water, and hydrant overflow, along with timestamp, zip code, etc., for 2010 to Nov 2014.
•Size: 30 MB; Format: CSV
Data Sources Contd.
<NCDC Weather Database>
•The National Climatic Data Center (NCDC) weather database for NYC contains daily fields such as maximum and minimum temperature, rainfall, and wind speed for 2012 to Nov 2014.
•Size: 1 MB; Format: CSV
Analytic 1: Sanitation, Water Factor
Design Diagram:
Figure 1: Sanitation/Water leak
‘311 Rodent complaints’ database -> Data cleanup: extract {date, zipcode} fields
‘311 Sanitation complaints’ database -> Data cleanup: extract {date, zipcode} fields
PIG: Join operation to get, for each sanitation date, all rodent dates along with zip codes (area)
MR1: For each sanitation date, get the count of rodent complaints 1 week prior (negative) and 1 week after (positive) the sanitation date, along with zip codes (area)
MR2: Get the average number of negative and positive rodent complaints for each zip code (area)
Analysis of results
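The join, MR1, and MR2 stages above can be sketched in memory as follows. This is a minimal Python illustration, not the team's Pig or MapReduce code. The function names, the sample zip code and dates, and the choice to exclude same-day rodent complaints are all assumptions. The slides do not state how the "sanitation factor" is derived from the two averages, so the sketch stops at the averages.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW = timedelta(days=7)  # "1 week" before/after, as on the slide


def before_after_counts(sanitation, rodent):
    """For each sanitation complaint, count rodent complaints in the same
    zip code in the week before (negative) and the week after (positive).

    Both inputs are lists of (date, zipcode) tuples.
    Returns a list of (zipcode, negative_count, positive_count).
    Same-day rodent complaints are excluded (an assumption).
    """
    rodent_by_zip = defaultdict(list)
    for day, zipcode in rodent:
        rodent_by_zip[zipcode].append(day)

    rows = []
    for s_day, zipcode in sanitation:
        negative = positive = 0
        for r_day in rodent_by_zip[zipcode]:
            if s_day - WINDOW <= r_day < s_day:
                negative += 1
            elif s_day < r_day <= s_day + WINDOW:
                positive += 1
        rows.append((zipcode, negative, positive))
    return rows


def average_by_zip(rows):
    """Average the negative and positive counts per zip code (the MR2 step)."""
    totals = defaultdict(lambda: [0, 0, 0])  # negative sum, positive sum, n
    for zipcode, negative, positive in rows:
        entry = totals[zipcode]
        entry[0] += negative
        entry[1] += positive
        entry[2] += 1
    return {z: (neg / n, pos / n) for z, (neg, pos, n) in totals.items()}


# Made-up sample records, for illustration only.
sanitation = [(date(2014, 6, 10), "11221"), (date(2014, 6, 20), "11221")]
rodent = [
    (date(2014, 6, 5), "11221"),
    (date(2014, 6, 12), "11221"),
    (date(2014, 6, 15), "11221"),
    (date(2014, 6, 30), "10001"),
]
print(average_by_zip(before_after_counts(sanitation, rodent)))
```

The same two functions apply unchanged to the water leak analytic by passing water leak complaints in place of sanitation complaints.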
Data Flow Diagram
Figure 2: Inputs and outputs at each stage, using the Cloudera VM on VMware
Chart: Top 10 areas with highest sanitation factor (y-axis: Sanitation Factor, 0 to 5)
Areas (in chart order): Central Brooklyn; Bushwick and Williamsburg; E. New York and New Lots; Inwood & Washington Heights; Southeast Bronx; West Central Queens; Flatbush; Central Brooklyn; High Bridge & Morrisania; Upper West Side
Result
Areas where preemptive rodent control action should be taken when a sanitation complaint is received.
Result
Areas where sanitation is not the cause of a rodent complaint.
Chart: Top 10 areas least affected by sanitation complaint (axis range -0.4 to 0)
Chart: Sanitation factors - comparison
Negative Sanitation Factor: 11.60%; Positive Sanitation Factor: 88.40%
Result
In almost all cases, the number of rodent complaints in the week after a sanitation complaint is higher than in the week before.
Chart: Top 10 areas with highest water leak factor (axis range 0 to 4.5)
Areas (in chart order): Bushwick and Williamsburg; West Queens; High Bridge and Morrisania; Flatbush; Central Bronx; Central Brooklyn; Central Harlem; East New York and New Lots; East New York and New Lots; Northwest Queens
Result
Areas where preemptive rodent control action should be taken when a water leak complaint is received.
Chart: Top 10 areas least affected by water leak complaint (axis range -1.8 to 0)
Areas (in chart order): Lower West Side; Chelsea & Clinton; Bronx Park and Fordham; Central Bronx; Upper East Side; Borough Park; Central Harlem; Upper East Side; Northwest Brooklyn; West Queens
Result
Areas where a water leak is not the prime cause of a rodent complaint; other factors are more dominant.
Chart: Water Leak factors - comparison
Negative Water Leak Factor: 28.12%; Positive Water Leak Factor: 71.88%
Result
In most cases, the number of rodent complaints in the week after a water leak complaint is higher than in the week before.
Analytic 2: Weather affecting rodent complaints
Aim: find the relation between rodent complaints and temperature.
Design Diagram:
Figure 3: Weather Analytic
NCDC Weather database for NYC, 2012-14 -> Data cleanup and date formatting
311 Rodent Complaints database -> Data cleanup and extracting 2012-14 data only
MR1: Date formatting; individual temperature values replaced by 5 °C interval ranges
PIG: Inner join to get the temperature range for each rodent complaint date
MR2: Aggregation of complaints based on temperature ranges
Analysis of results
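The binning, inner join, and aggregation stages above can be sketched as follows. This is a minimal Python illustration, not the team's MapReduce or Pig code. The function names are hypothetical. The slides do not say which daily temperature (maximum, minimum, or mean) was binned or how interval boundaries were assigned, so the sketch assumes one temperature value per day and half-open bins, where 20 °C falls in the 20 to 25 range.

```python
import math
from collections import Counter
from datetime import date

BIN_WIDTH = 5  # degrees Celsius, as on the slide


def temperature_range(temp_c, width=BIN_WIDTH):
    """Map a temperature to its interval, e.g. 17.2 -> (15, 20).

    Boundary values go to the upper bin (an assumption).
    """
    low = math.floor(temp_c / width) * width
    return (low, low + width)


def complaints_per_range(daily_temp, complaint_dates):
    """Count complaints per temperature range.

    daily_temp: dict mapping date -> temperature in Celsius.
    complaint_dates: iterable of complaint dates.
    Dates without a weather record are dropped, as in an inner join.
    """
    counts = Counter()
    for day in complaint_dates:
        if day in daily_temp:
            counts[temperature_range(daily_temp[day])] += 1
    return counts


# Made-up sample records, for illustration only.
weather = {date(2013, 5, 1): 17.2, date(2013, 1, 5): -3.0}
complaints = [date(2013, 5, 1), date(2013, 5, 1), date(2013, 1, 5), date(2013, 7, 4)]
for bin_range, n in sorted(complaints_per_range(weather, complaints).items()):
    print(bin_range, n)
```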
Chart: Number of complaints for each temperature range (in Celsius)
X-axis ranges: [-15, -10], [-10, -5], [-5, 0], [0, 5], [5, 10], [10, 15], [15, 20], [20, 25], [25, 30], [30, 35]; y-axis: Rodent Complaints (0 to 20000)
Result
1) When NYC experiences moderate temperatures (15 to 25 °C), the number of rodent complaints increases.
2) The results are consistent with scientific findings.
3) When we move from summer to winter (from the 25-30 °C range down to the 5-10 °C range), rodent complaints increase because rodents move indoors. Preemptive measures should be taken when fall ends and winter starts.
Analytic 3: Estimation of Rodent Population
Design Diagram:
311 Rodent Complaint Database for 5 years (2010-14)
PIG: Calculate rodent complaints for each zip code for each year
Calculate the average number of complaints per year = total number of complaints / 5, assuming one rat lives 1 year
Multiply the average by 50: each rat colony has around 50 rats, assuming each complaint is for a different colony
Output: Overestimate of the number of rats in NYC
Analysis of result
Chart: Top 10 areas with highest number of rodents (numbers are estimates; axis range 0 to 35000)
Areas (in chart order): Bushwick and Williamsburg; Central Brooklyn; Central Brooklyn; East New York and New Lots; Bronx Park and Fordham; Upper West Side; High Bridge and Morrisania; Bronx Park and Fordham; West Central Queens; Central Brooklyn
Chart: Top 10 areas with lowest number of rodents (numbers are estimates; axis range 0 to 180)
Areas (in chart order): World Trade Center; Rockaways; Rockaways; Upper East Side; Chelsea and Clinton; Jamaica; North End Av; New Hyde Park; West Central Queens; Canarsie and Flatlands
Chart: Top 10 areas with greatest percentage change in rodent population between 2010-2014 (axis: % change in rodent population, 0 to 7)
Areas (in chart order): Southeast Bronx; West End Ave; Northwest Queens; North End Av; Chelsea and Clinton; Gramercy Park and Murray Hill; North Queens; Inwood and Washington Heights; Lower East Side; Gramercy Park and Murray Hill
Analysis of Results for Estimation of Rodent Population:
1) Scientific studies have shown that the life expectancy of a rodent in a city is 1 year.
2) Hence we found the average number of rodent complaints for 1 year.
3) Taking a big overestimate, each rodent call represents an entire colony (on average, rodents live in colonies of 40-50).
4) We get approx. 1.2 million.
5) Sewer population (not that large) + 1.2 million = approx. 2 million. A very generous overestimate.
6) This is still less than 8 million. Urban myth debunked.
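The arithmetic behind this estimate can be sketched as below. This is a minimal Python illustration with a hypothetical function name. The 5-year window and the colony size of 50 come from the slides; the complaint total in the example is made up and is not the team's data.

```python
YEARS = 5             # 2010-2014, as on the slides
RATS_PER_COLONY = 50  # upper end of the 40-50 range on the slides


def estimate_rats(total_complaints, years=YEARS, colony_size=RATS_PER_COLONY):
    """Overestimate the rat population from a complaint total.

    Assumes a rat lives about one year, so only the yearly average counts,
    and that every complaint refers to a different, full-size colony.
    """
    average_per_year = total_complaints / years
    return average_per_year * colony_size


# Made-up complaint total for one area, for illustration only.
print(estimate_rats(600))  # 600 / 5 = 120 per year, times 50 rats
```

Under these assumptions, an estimate of 1.2 million corresponds to an average of 24,000 complaints per year (1,200,000 / 50).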
Obstacles
•Change of analytic project: no access to college data.
•NYC HPC Cluster: encountered several problems and had to start over using the Cloudera VM.
•Each database had a date format entirely different from the others (sometimes formats differed even within a database).
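One common way to handle mixed date formats is to try a list of candidate formats and normalize every value to ISO form. The sketch below is a minimal Python illustration of that approach, not the cleanup the team ran. The function name is hypothetical, and the formats listed are assumptions, since the slides do not say which formats the databases used.

```python
from datetime import datetime

# Candidate formats are assumptions; extend the tuple for other inputs.
CANDIDATE_FORMATS = (
    "%m/%d/%Y %I:%M:%S %p",  # e.g. 11/03/2014 09:15:00 AM
    "%m/%d/%Y",              # e.g. 11/03/2014
    "%Y-%m-%d",              # e.g. 2014-11-03
    "%Y%m%d",                # e.g. 20141103
)


def normalize_date(raw):
    """Return the date in YYYY-MM-DD form, trying each candidate format."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


print(normalize_date("11/03/2014 09:15:00 AM"))
print(normalize_date("20141103"))
```

Raising on an unrecognized value, rather than skipping it silently, makes it easier to spot a format that was missed during cleanup.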
Conclusion
1) Sanitation and water leakage are a cause of an increase in rodents in 85% of the NYC areas.
2) Rodents increase between 65 °F and 90 °F, which conforms to scientific findings.
3) Urban theory “8 million rats for 8 million people” debunked.
Acknowledgements
•NCDC for providing us with the weather database for NYC
•311 service of NYC for putting up their extensive databases online
•Prof. Suzanne Macintosh for her guidance and support during the course of this project
Thank you!