San Francisco 311 Data Visualizations: August 2013 Saidah Leatutufu Mary Menees Kristen Wolslegel PA 755 October 9, 2013



Introduction

San Francisco has become an international center for technological innovation

and startups such as Twitter, Dropbox, Eventbrite, and Square, among others. Karoly and Panis

(2004) predicted that technological growth would accelerate over the next 10 to 15 years, and that

time has come. Accompanying this growing tech sector is a large and widely

available body of public data. San Francisco offers an open data portal

(https://data.sfgov.org/) “to enhance open government, transparency, and accountability by

improving access to data,” with the stated goal of innovating “how residents interact with

government, resulting in social and economic benefits for the City”

(http://sf311.org/index.aspx?page=791). This portal is a major step toward e-engagement

between the City and County of San Francisco and its workforce and residents. SF311.org meets

the first and second objectives of e-engagement through information dissemination as well as

allowing constituents to manipulate data and provide feedback to the City and County (Reddick,

2012, p. 51).

Currently, San Francisco operates 311, a 24-hour customer service center for San

Francisco residents. Constituents can reach a City and County service representative through

the web, by telephone and by Twitter. 311 receives hundreds of reports each day requesting

services regarding abandoned vehicles, illegal dumping, potholes, graffiti, and more. The City then

collects the data and posts it in an accessible format on data.sfgov.org. By providing public

access to this data, the City aims to develop informed constituents, moving toward the third

objective of e-engagement: active participation in government.

What does this 311 data tell the public? The data tells a story, and analyzing it makes

room for solutions to the many problems this urban area is experiencing. For the purposes of

this paper we have focused on

one dataset, Case Data from San Francisco 311 from August 1, 2013 through August 30, 2013.

We will present three visualizations that reflect the top five requests made to 311 according to

source and location. The first visualization maps the GPS coordinates with the top five most

common requests to determine regional differences in the types of problems reported. The

second visualization displays the top five most common request types and the source that is

used to make the request. The third visualization attempts to show a digital divide in the City

and County of San Francisco. For each visualization we will first describe the

methodology used to create it; second, outline the major

points gained from it; third, discuss the implications of this data and

what it means for providing public service; and finally, explain challenges and make

recommendations for how this data can be used to improve San Francisco’s public

service.

Data Visualization #1: Location of Service Requests

The 311 dataset from August 2013 contained a very wide range of service categories.

We selected the top five most represented categories, which comprised about three-fifths of the data

from the month of August (10,135 of 16,667 total cases). These categories contained requests

about illegal dumping, abandoned vehicles, sidewalk cleaning, graffiti (public & private), and

damaged parking meters.
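
The stated share can be checked with a line of arithmetic. The sketch below uses only the two case counts quoted above; the function name `share_of_total` is a hypothetical helper introduced for illustration.

```python
def share_of_total(part, total):
    """Return part/total as a fraction between 0 and 1."""
    return part / total

top_five_cases = 10_135   # cases in the five largest categories (from the text)
total_cases = 16_667      # all cases in the August 2013 dataset (from the text)

top_five_share = share_of_total(top_five_cases, total_cases)
print(f"Top five categories: {top_five_share:.1%} of all cases")
```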


Process of Creating Visualization: Various categories for similar complaints were merged so as

to reduce the number of overall categories and to help observe general trends. For example,

abandoned car categories for 2-door and 4-door were merged into one “abandoned vehicle”

category and public and private graffiti were merged. Point coordinates were separated into

longitude and latitude and data was imported into Tableau Software, a data visualization

package that is user-friendly and accessible to the public. Position coordinates were overlaid

with the five different categories on a map of San Francisco. The number of records was also

included in the visualization, with the size of each point proportional to its

number of records.
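
The merging and counting steps described above were done by hand before loading the data into Tableau. As a rough illustration only, the same idea can be sketched in Python. The raw category labels, the coordinates, and the helper names (`MERGE_MAP`, `merge_category`, `count_by_point`) are all assumptions made for this example; the actual labels in the 311 export may differ.

```python
from collections import Counter

# Hypothetical raw labels mapped to merged categories (assumed, not from the dataset).
MERGE_MAP = {
    "Abandoned Vehicle - Car2door": "Abandoned Vehicle",
    "Abandoned Vehicle - Car4door": "Abandoned Vehicle",
    "Graffiti Public Property": "Graffiti",
    "Graffiti Private Property": "Graffiti",
}

def merge_category(raw_label):
    """Collapse similar complaint labels into one category; leave others unchanged."""
    return MERGE_MAP.get(raw_label, raw_label)

def count_by_point(records):
    """Count records per (latitude, longitude, merged category).

    The counts play the role of Tableau's Number of Records, which drives point size.
    """
    counts = Counter()
    for latitude, longitude, raw_label in records:
        counts[(latitude, longitude, merge_category(raw_label))] += 1
    return counts

# Made-up sample rows for illustration only.
sample_records = [
    (37.7793, -122.4193, "Graffiti Public Property"),
    (37.7793, -122.4193, "Graffiti Private Property"),
    (37.7599, -122.4148, "Abandoned Vehicle - Car2door"),
]
point_counts = count_by_point(sample_records)
```

A mapping table keeps the merge rules in one place, so a category can be regrouped without touching the counting logic.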

Major points gained from the data: There are a number of interesting trends that reflect

different neighborhood characteristics. 1) Damaged parking meter complaints had a higher

incidence in the Financial District and Civic Center than other city neighborhoods. These two

regions are where most commuters from out of town work and are the most densely populated

areas during business hours. It is therefore not surprising that these areas have the most

parking meter complaints because they have the most parking meters. 2) Sidewalk cleaning

complaints are most heavily concentrated in the Civic Center neighborhood as compared to all other

San Francisco neighborhoods. This is possibly due to the high homeless population density

contrasting with its broad visibility as the city’s governmental center. Although there are

homeless populations in other neighborhoods (Haight, Mission), these areas are not adjacent to

City Hall, and therefore do not seem to be getting the attention that they may deserve. 3)

Graffiti complaints are more prevalent in the Civic Center, Haight Ashbury, and Inner Mission.


This speaks to the high population density and high traffic of these areas. Peripheral, lower-density

neighborhoods are less plagued with graffiti problems. 4) Illegal dumping occurs

throughout the city, with a higher concentration in the eastern part of the city, which has

a greater population density. Trash pickup may not be frequent enough, or the high population

density may make residents more careless. 5) Abandoned vehicle complaints are more diffuse

than the other complaints, but they are concentrated in the Mission Bay and Bayshore areas, which

have a lower population density. This is possibly because these are industrial, less

developed neighborhoods where street cleaning and parking rules are less stringent.

Implications and Recommendations: The type of 311 service request appears to correlate

with the distinct neighborhood characteristics of San Francisco. The increased

incidence of parking meter complaints in civic and financial areas where commuters enter the

city to work should not be surprising. But whether the complaints are due to a greater

proportion of damaged parking meters in these areas or simply to there being more

of them cannot be resolved from this data. It is recommended that the city investigate the

nature of these complaints and work to maintain proper parking meter function in these areas.

The greater incidence of sidewalk cleaning in Civic Center tells a different story. Yes,

there is a high homeless population there, but other neighborhoods have very visible homeless

populations also. Why does Civic Center get all the attention? Probably because it is where the

Mayor’s office is located, as well as other high profile city offices. They need to present a clean

and organized face to the world. It is recommended that the city be more consistent and fair in

attending to the sidewalk cleaning of all neighborhoods, regardless of their status. In sum,


visualization #1 shows that the unique characteristics of San Francisco neighborhoods

determine the nature of 311 service calls.

Data Visualization #2: Top 5 311 Request Types by Source

Process of Creating Visualization: The second visualization reflects the top five 311 request

types by source (meaning, the method people used to contact 311). There are four methods

that can be used to make a request: voice in or call, Open311 (forum), integrated agency (an

internal office that makes the request), and online (Twitter and the web). Creating a visualization

that reflects the source used is important for determining how accessible 311 is

and whether the service is being used to its maximum potential.

In order to create the charts according to source, the top five requests were collected

from 311 case data including: graffiti, abandoned vehicles, damaged parking meters, sidewalk

cleaning, and illegal dumping. Although these categories were widely represented in the

dataset, in order to get a true understanding of how requests were being made, it was

necessary to group related categories. By using the top five requests, the visualization shows a

clear picture of how the sources are being used among each category.

In order to generate the data visualization, the 311 case data was imported into Excel

and was narrowed down to zip code, category (e.g., 311 external), type of request (e.g., graffiti),

and source (e.g., voice in). Although the zip code was not necessary for this visualization, the

numbers were needed for the visualization tool used, which is discussed below. After

extracting the data needed from Excel, it was input into the visualization tool, Many Eyes.

Many Eyes is a tool that creates a number of visualizations like bar graphs and word clouds, and


once published, this information is made available to the public. The case data was uploaded

onto Many Eyes and a visualization type was chosen. To clearly show the differences

between the types of sources used, multicolored pie charts were chosen. Using the uploaded

data, Many Eyes automatically generated the multicolor charts, the numbers of requests made

and the percentage of the different sources used in each category.
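
Many Eyes produced these figures automatically. As a cross-check, the percentage of each source within a request type can be recomputed from the counts listed under Visualization #2. The sketch below is an illustration, not the tool's own method; the function name `source_shares` is hypothetical, and recomputed values may differ slightly from the printed percentages because of rounding or transcription differences in the listing.

```python
# Request counts by source, copied from the Visualization #2 listing in this paper.
REQUEST_COUNTS = {
    "Graffiti": {"Voice in": 701, "Online": 937, "Open311": 389, "Integrated agency": 791},
    "Abandoned vehicle": {"Voice in": 825, "Online": 425, "Open311": 29},
    "Damaged meter": {"Voice in": 880, "Online": 10, "Open311": 5},
    "Sidewalk cleaning": {"Voice in": 1067, "Online": 70, "Open311": 44, "Integrated agency": 2},
    "Illegal dumping": {"Voice in": 2534, "Online": 260, "Open311": 335, "Integrated agency": 821},
}

def source_shares(counts_by_source):
    """Return each source's share of one request type, in percent."""
    total = sum(counts_by_source.values())
    return {source: 100 * n / total for source, n in counts_by_source.items()}

shares = {request: source_shares(c) for request, c in REQUEST_COUNTS.items()}
voice_in_total = sum(c["Voice in"] for c in REQUEST_COUNTS.values())

for request, by_source in shares.items():
    summary = ", ".join(f"{source} {pct:.1f}%" for source, pct in by_source.items())
    print(f"{request}: {summary}")
print(f"Total voice-in requests: {voice_in_total}")
```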

Major Points Gained from the Data: As represented in the visualization, the five pie charts

represent the top five request types; each of the four colors represents one of the

sources. At first glance the most prevalent color is red (voice in). What does this mean?

Most requests are made via telephone (6,007 across the five request types), which suggests that many

people are either not aware of or do not have access to some of the other options. The request

type with the greatest share of voice-in calls is damaged parking meters (98.2%). Parking meters present

one of the biggest issues in the city of San Francisco and there are thousands of them. As

previously mentioned, damaged parking meters are most often reported from the Financial District

and the Civic Center areas. A damaged parking meter is likely to be reported for one of two

reasons: one, a person reports the meter so they are not obligated to pay and can receive free

parking; two, a person who has received a ticket calls in to report a damaged meter and contest

the ticket. It is no surprise that people are more likely to call in a damaged parking meter,

because calling may bring a quicker response. Many people, when in a stressful situation, would rather

speak to an actual person than make a request online, which may explain why there were only

10 online requests.


Voice in is the most used source for every request type except

graffiti. Interestingly enough, the top source used to report graffiti is an online method

(33.2%). This could be for many reasons, but one likely explanation

is that people are taking pictures of the graffiti and posting them

online. 311 does not clearly define what graffiti is; it may be a very intricate mural or it could

simply be a tag. The vague definition of graffiti could mean that 311 is misinterpreting graffiti

posts.

Integrated agency is another heavily used source, ranking second for both graffiti and

illegal dumping requests. An integrated agency is an office within San Francisco’s public service

sector, such as the Department of Public Works. The Department of Public

Works is responsible for maintaining the cleanliness of the city, meaning

this department picks up any furniture left on the sidewalk, cleans graffiti off bus stops, and

collects trash that does not fit into residential trash cans. It is expected that integrated

agencies would be associated with anything that has to do with cleaning up the city because it is their

job; hence they do not appear as a source for abandoned vehicles and damaged parking meters. However,

one might wonder why only 0.16% of sidewalk cleaning requests are reported by an integrated agency.

It seems that most people voice in sidewalk cleaning requests, which may compensate for the integrated

agency requests that are not made. In summary, the request type by source visualization

reveals that most people access the 311 service via telephone; damaged parking meters

receive the highest share of voice-in requests; and cleaning the city is managed by integrated agencies.


Implications and Recommendations: It is important to consider how many people are aware of

the 311 service and how many are aware of the different sources through which requests

can be made. It is clear that most people use telephones to make requests, so what does this mean

for the city? What if some people do not have access to a phone, but want to make a request?

That is where the other sources come in. Since phones are widely used, 311 can focus less on

advertising calling in and more on people “Tweeting” in or using mobile applications. Imagine

how many more people could be reached if they could make a request from the computer at

the library or their tablet.

More advertising needs to be done for the other sources if 311 wants to expand its

clientele. For example, when someone calls 311, the representative could tell the caller that requests

can be made by tweet if they do not want to be put on hold. Expanding the source list will allow more

people to access the 311 service and in turn provide an easier customer experience.

Data Visualization #3: The Digital Divide

Process and Tools: First, the 311 data was downloaded from data.sfgov.org into Excel 2010 (for

Windows) for review and manipulation. Second, the data was further refined by breaking the

“Point” data into distinct longitude and latitude information for use in mapping the data. This

was accomplished by inserting a column next to the “Point” column. Next the “Point” column

was selected and using the Text to Columns tool under Data, separated the longitude from the

latitude information into two columns. A few extra steps were required to remove unusable

artifacts such as parentheses and commas, using the Find and Replace tool. Next, the same

steps were used to separate the address column into Street Address, City, State, and Zip Code.


Third, a filter was used for each column to allow selection of relevant data. This was

accomplished by clicking on the row containing all the column heads and then clicking on the

Sort and Filter tool. A drop down filter menu appeared at the head of each column. By clicking

on the filter for Zip Code and deselecting “null,” extraneous data was removed.
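
The Excel steps above (Text to Columns, Find and Replace, and the Zip Code filter) can also be approximated in a few lines of Python. This is a minimal sketch rather than the procedure the authors used. It assumes the “Point” value is formatted as "(latitude, longitude)" and that a "Zip Code" column already exists; both the column names and the sample rows are assumptions for illustration.

```python
import csv
import io

def split_point(point_text):
    """Split a "(latitude, longitude)" string into two floats (format is assumed)."""
    latitude_text, longitude_text = point_text.strip().strip("()").split(",")
    return float(latitude_text), float(longitude_text)

def clean_rows(rows):
    """Add Latitude/Longitude columns and drop rows whose zip code is missing or "null"."""
    cleaned = []
    for row in rows:
        zip_code = (row.get("Zip Code") or "").strip()
        if zip_code.lower() in ("", "null"):
            continue  # same effect as deselecting "null" in the Excel filter
        latitude, longitude = split_point(row["Point"])
        cleaned.append({**row, "Latitude": latitude, "Longitude": longitude})
    return cleaned

# Made-up sample data standing in for the downloaded 311 file.
sample_csv = io.StringIO(
    "Point,Zip Code,Source\n"
    '"(37.7793, -122.4193)",94102,Voice In\n'
    '"(37.7599, -122.4148)",null,Twitter\n'
)
cleaned_rows = clean_rows(csv.DictReader(sample_csv))
```

Stripping the parentheses before splitting on the comma mirrors the manual removal of artifacts described above.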

The data visualization tool used was Tableau. After creating an account online and

downloading Tableau Desktop 8.0, the Excel data was connected to a new Tableau worksheet

by opening Tableau Desktop and selecting Data, then Find Data and selecting the Excel

spreadsheet that had already been cleaned up and filtered. Once the data was connected, the

Longitude data was dragged to the Columns and the Latitude data was dragged to the rows,

then the Symbol Map was selected from the Show Me tool. To further refine the map, Zip Code

was selected and dragged into Marks, along with Source. To populate the map, the Number of

Records was dragged into Marks. To create the visualization, under Marks, Number of Records

was selected using the Angle tool. The Number of Records was then selected under Marks and

right clicked to pull up an editing menu, where Measure, Continuous, and Compute using

Source were selected. Then Quick Table Calculation and Percent of Total were selected. Next,

Number of Records was again dragged into Marks and assigned to the Size tool. Finally,

under Map, Map Options, Data Layer, House Hold Income was selected by Zip Code using

temperature colors.

As a result of this data manipulation and visualization using Excel and Tableau, a map

was generated that had multiple attributes that included: 1) a base map of San Francisco with

areas colored from ochre to green showing 2009 Household Income (ochre equals highest

income); 2) pie charts of varying sizes depending on the number of service requests per zip


code (largest pie equals most requests); 3) pie slices that represent requests by source

(Interagency, Voice In, Web Self Service, and Twitter); 4) a label on each pie chart with the

percentage of web self-service and/or Twitter requests relative

to all requests for that zip code.
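
The percentage shown on each pie chart corresponds to a Percent of Total calculation within each zip code. A rough Python equivalent is sketched below, assuming each request is reduced to a (zip code, source) pair. The function name `online_share_by_zip` and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

# Sources counted as online for the purposes of this sketch.
ONLINE_SOURCES = {"Web Self Service", "Twitter"}

def online_share_by_zip(requests):
    """Return the percent of requests per zip code that came from online sources."""
    totals = defaultdict(int)
    online = defaultdict(int)
    for zip_code, source in requests:
        totals[zip_code] += 1
        if source in ONLINE_SOURCES:
            online[zip_code] += 1
    return {z: 100 * online[z] / totals[z] for z in totals}

# Made-up sample rows for illustration only.
sample_requests = [
    ("94110", "Voice In"),
    ("94110", "Twitter"),
    ("94110", "Web Self Service"),
    ("94110", "Interagency"),
    ("94124", "Voice In"),
    ("94124", "Voice In"),
]
zip_shares = online_share_by_zip(sample_requests)
```

Computing the share within each zip code, rather than across the whole city, is what allows neighborhoods with very different request volumes to be compared.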

Major Points Gathered from data: Visualization three could be helpful to the public by

indicating areas of the digital divide within San Francisco. The map suggests a relationship

between household income and the digital divide. For example, in three zip codes, 94124,

94103, and 94102 the household income for 2009 was significantly less than the central and

western areas of the city in zip codes 94110, 94118, and 94116. What the comparison shows is

that in the lower income zip codes between 6.19% and 6.8% of all requests came by web

self-service, whereas in the higher income areas that percent was from 16.05% to 23.97%, a

significant difference. This digital divide was not demonstrated in all areas and a map that

showed more nuance in income would be very helpful. For example, in the area west of SF

State University only 2.61% came in via the website, but the sample size was one-third that

of the Bayview zip code, 94124.

Visualization three could be more helpful than a data table to people reviewing the

data, because it is easy to see at a glance the differences in income, the number of service

requests, the source of those requests, and their relative percentages. In a data table it would be

much more difficult for the public to see these underlying patterns. For example, the public

could easily see that household income is lower along the eastern edge of the City compared

to the central and western side. In addition, it is easy to see that the eastern side of the City has

many more service requests than the western side.


By enabling the public to better understand the data through this visualization, it could

help to engender advocacy for free wireless service in San Francisco and/or free text requests

for service to 311 to reduce the barrier to online access for lower income areas.

Implications and Challenges: Future data analysis could include more nuanced household

income data, as well as the median age for each zip code to refine further the digital divide into

age and/or income. In addition, it would be very interesting to see if the number of Tweets

grows. Currently the Mission zip code (94110) produces the most Tweets – will that technology

spread? Will a new communication technology enter into the scene?

A challenge for the future is to advertise and include free texting to 311 as a way to

submit requests because cell phones are nearly ubiquitous. A pilot test that tracks the data

could show interesting results. If more people have access to 311 through a variety of

communication methods, will it exceed the capacity of the City and County of San Francisco to

meet the service level demand? How will the City manage public accountability?

Conclusion

The increase in freely available public datasets, combined with a rapidly advancing

software industry developing open-access tools, has created the conditions for an explosion of

visualization options for data miners in the public sector. In this report, we examined 311

service request data and found that there is a relationship between 1) neighborhood and type

of request, 2) source and type of request, and 3) source and neighborhood. Interesting as

these findings are, they reveal how an increase of information can have the unintended


consequence of creating even more challenges than it solves. How is the city to reconcile the

inconsistency in sidewalk cleaning or the digital divide in service calls? Integrating information

into better public service is a challenge that can be mitigated through data visualization.


Visualization #1:


Visualization #2

311 Top 5 Request Type by Source

Request type         Voice in        Online        Open311       Integrated agency
Graffiti             701 (24.8%)     937 (33.2%)   389 (13.7%)   791 (28.1%)
Abandoned vehicle    825 (64.5%)     425 (33.2%)   29 (2.26%)    -
Damaged meter        880 (98.2%)     10 (1.11%)    5 (0.55%)     -
Sidewalk cleaning    1,067 (89.8%)   70 (5.89%)    44 (4.12%)    2 (0.16%)
Illegal dumping      2,534 (64.1%)   260 (6.58%)   335 (8.48%)   821 (20.7%)


Visualization #3


References

Karoly, L. and Panis, C. (2004). “The Information Age and Beyond: The Reach of Technology.”

The 21st Century at Work, pp. 70-124. www.rand.org

Reddick, C. (2012). Public Administration and Information Technology. Burlington, MA: Jones &

Bartlett Learning.