What about the whole country?
Extending the Activity-Based Person-Trip Synthesizer to all 330 million Americans

Judy Sun ’14 & Luke Cheng ’14

ORF467 F13



The Process

Generate Schools
Generate Employee Patronage File
  Assign Patronage
  Generate Patronage-Employee Ratios
  A Look at the Data
Generate Census File (with Microsoft Access)
  NN Files through 7 NJ Modules by Jake and Talal
Trip File Generator: out-of-state commuters, students, workplace assignment, 18 Tour Type (Activity Patterns) assignment, Temporal Dimension

Roadmap

Schools Data
Employee-Patronage Data
A Look at the Data
Census Data
Further Steps

Schools Data

Public Schools in the US

Quick stats on Public Schools (2011)

[Bar chart: number of schools in the US by school type (Primary, Middle, High, Other, No Answer), public vs. charter; y-axis 0 to 60,000]

School Type   # of CHARTER   # of PUBLIC     Total
Primary              2,584        51,793    54,377
Middle                 615        16,332    16,947
High                 1,316        19,762    21,078
Other                1,145         5,847     6,992
No Answer              564         3,525     4,089
Total                6,224        97,259   103,483

Public Schools: Enrollment

[Bar chart: enrollment by school type (Primary, Middle, High, Other, No Answer), public vs. charter; y-axis 0 to 30,000,000]

School Type     CHARTER       PUBLIC        Total
Primary         896,544   23,226,606   24,123,150
Middle          166,519    9,425,155    9,591,674
High            368,109   13,767,489   14,135,598
Other           626,562    1,289,050    1,915,612
No Answer       (1,128)      (7,016)      (8,144)
Total         2,056,606   47,701,284   49,757,890

Private Schools in the US

Type        Number of Schools
Primary                18,400
Secondary               2,517
Combined                7,300
Total                  28,217

[Bar chart: number of private schools in the US by type (Primary, Secondary, Combined); y-axis 0 to 20,000]

Private Schools: Enrollment

[Bar chart: number of students by private school type (Primary, Secondary, Combined); y-axis 0 to 2,500,000]

Type        # students
Primary      2,134,007
Secondary      738,600
Combined     1,431,252
Total        4,303,859

Private Schools: School Size

[Histogram: number of schools by school size (number of students); x-axis 0 to 2,000 students, y-axis 0 to 600 schools]

Post-secondary schools (2009)

Institution type             # of Students Enrolled   # of students as percent total   Number of Schools
Graduate                                        291                               0%                 350
Primarily Baccalaureate                   1,483,018                              93%               2,169
Primarily Non-Bacc                           53,903                               3%                 623
Associate's                                  49,263                               3%               1,745
Nondegree-granting postbac                       17                               0%                  14
Nondegree-granting pre-bac                   10,960                               1%               2,698
Total                                     1,597,452                             100%               7,735

[Bar chart: number of schools by institution type (Graduate, Primarily Baccalaureate, Primarily Non-Bacc, Associate's, Nondegree-granting postbac, Nondegree-granting pre-bac); axis 0 to 3,000]

Employee-Patronage Data

The Process

2012 InfoGroup US Businesses File (5.80 GB)
30 CSV files with 500,000 entries (~200 MB) – Shell Script
30 CSV files with patronage generation, data cleaning, and mapping (~115 MB) – R Script
1570 Segmented State Files (1 KB to 20 MB) – R Script
51 Merged State Files (8 MB to 390 MB) – Python Script
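The last step of this pipeline (segmented state files merged into one file per state) can be sketched as follows. This is a minimal illustration, not the script used in the project: the `<STATE>_<chunk>.csv` naming scheme, the shared header row, and the function name `merge_state_files` are all assumptions made for the example.

```python
import csv
from collections import defaultdict
from pathlib import Path


def merge_state_files(segment_dir, out_dir):
    """Concatenate per-chunk state CSVs into one CSV per state.

    Assumes (hypothetically) that segments are named '<STATE>_<chunk>.csv'
    and that every segment starts with the same header row.
    Returns a dict mapping state -> number of data rows written.
    """
    segment_dir, out_dir = Path(segment_dir), Path(out_dir)

    # Group segment files by the state prefix in their file name.
    groups = defaultdict(list)
    for path in sorted(segment_dir.glob("*_*.csv")):
        groups[path.stem.split("_")[0]].append(path)

    out_dir.mkdir(parents=True, exist_ok=True)
    row_counts = {}
    for state, paths in groups.items():
        header_written = False
        rows_written = 0
        with open(out_dir / f"{state}.csv", "w", newline="") as out_file:
            writer = csv.writer(out_file)
            for path in paths:
                with open(path, newline="") as in_file:
                    reader = csv.reader(in_file)
                    header = next(reader, None)
                    if header is None:
                        continue  # skip empty segments
                    if not header_written:
                        writer.writerow(header)  # keep the header only once
                        header_written = True
                    for row in reader:
                        writer.writerow(row)
                        rows_written += 1
        row_counts[state] = rows_written
    return row_counts
```

Calling `merge_state_files("segments", "merged")` would write one file such as `merged/NJ.csv` for each state prefix found in `segments/`.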

Patronage Generation

Previous Process – Manual Fine-Tuning
  Inconsistent: same NAICS Code, different Patronage/Employee Ratio
Current Process – Employee Size Range, Sales Volume Range
  Not perfect data
  Matching businesses (Zip, County, NAICS, Lat/Long)
  Same Employee Size Range
  Assumption: Sales Volume same across time
  Trying to acquire the 2005 data for better correlations
  Ratios from averaging previous EP file
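The current process (ratios averaged from the previous EP file, keyed by employee size range and sales volume range) might look roughly like the sketch below. The field names (`emp_range`, `sales_range`, `employees`, `patrons`), the range labels, and both function names are hypothetical; the real EP file layout is not shown in these slides.

```python
from collections import defaultdict


def average_ratios(previous_ep_rows):
    """Mean patrons-per-employee ratio for each (employee range, sales range) bucket."""
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [sum of ratios, count]
    for row in previous_ep_rows:
        if row["employees"] <= 0:
            continue  # a ratio is undefined without employees
        bucket = (row["emp_range"], row["sales_range"])
        totals[bucket][0] += row["patrons"] / row["employees"]
        totals[bucket][1] += 1
    return {bucket: total / count for bucket, (total, count) in totals.items()}


def assign_patronage(business, ratios, default_ratio=0.0):
    """Estimate patrons for a business from its bucket's average ratio."""
    bucket = (business["emp_range"], business["sales_range"])
    return business["employees"] * ratios.get(bucket, default_ratio)


# Hypothetical rows standing in for the previous EP file.
previous_rows = [
    {"emp_range": "1-4", "sales_range": "<500K", "employees": 2, "patrons": 10},
    {"emp_range": "1-4", "sales_range": "<500K", "employees": 3, "patrons": 3},
    {"emp_range": "5-9", "sales_range": "500K-1M", "employees": 7, "patrons": 70},
]
ratios = average_ratios(previous_rows)
new_business = {"emp_range": "1-4", "sales_range": "<500K", "employees": 4}
estimated_patrons = assign_patronage(new_business, ratios)
```

Because the bucket key ignores the NAICS code, two very different kinds of business with the same size and sales ranges receive the same ratio.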

Comparison: Distributions

Conclusion: NAICS Codes are needed in addition to employee size and sales volume.
  A large number of 0-1 ratio values are offset by values in the 7-20 range, so the averages come out at around 4-5.
  It is difficult to capture nuances with just employee size and sales volume.
Next Steps: Manpower is needed to assign a ratio for each combination of NAICS Code, Sales Volume, and Employee Size.
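A small numeric illustration of the averaging problem described above, using made-up ratio values (they are not taken from the EP file): when a bucket mixes many low ratios with a few high ones, the mean lands in a range that none of the businesses actually occupies.

```python
from statistics import mean

# Hypothetical patron/employee ratios inside one (employee size, sales volume) bucket.
low_ratio_businesses = [0.5] * 6    # values in the 0-1 range
high_ratio_businesses = [10.0] * 4  # values in the 7-20 range

bucket_ratios = low_ratio_businesses + high_ratio_businesses
bucket_mean = mean(bucket_ratios)

# Count how many businesses are actually within 1.0 of the bucket average.
near_mean = sum(1 for r in bucket_ratios if abs(r - bucket_mean) <= 1.0)

print(f"bucket mean = {bucket_mean:.1f}, businesses near the mean = {near_mean}")
```

Splitting the bucket further by NAICS code would separate the two groups, which is the motivation for the next steps listed above.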

A Look at the Data

NJ Counties (Change in NJ EP File)

[Chart: change by NJ county; series: Uncensored, Un-Named Removed]

NJ Wide

                 Uncensored   Un-named Removed
No. Businesses      +73,500            +39,350
Tot Emp               +4.8M              +4.8M
Emp Size              +7.85              +9.09
Tot Patrons           -4.9M              -5.3M
Avg Patrons          -17.17             -16.29

Nation-Wide

Rank  State           Sales Volume  No. Businesses  Total Employees  Avg Employee Size  Total Patrons  Average Patrons
   1  California            $1,889       1,579,342       23,518,022              14.89     36,820,129            23.31
   2  Texas                 $2,115         999,331       17,624,235              17.64     24,846,695            24.86
   3  Florida               $1,702         895,586       12,331,524              13.77     21,231,864            23.71
   4  New York              $1,822         837,773       18,327,933              21.88     19,610,813            23.41
   5  Pennsylvania          $2,134         550,678       10,498,442              19.06     13,704,903            24.89
   9  New Jersey            $1,919         428,596        8,833,890              20.61      9,986,529            23.30
  45  Washington DC         $1,317          49,488        5,702,617             115.23      1,067,938            21.58
  47  Rhode Island          $1,814          46,503        1,117,140              24.02      1,201,124            25.83
  48  North Dakota          $1,978          44,518          492,547              11.06      1,021,077            22.94
  49  Delaware              $2,108          41,296          670,622              16.24      1,011,400            24.49
  50  Vermont               $1,554          39,230          379,291               9.67        821,193            20.93
  51  Wyoming               $1,679          35,881          340,342               9.49        772,090            21.52
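The two average columns in the table above appear to be per-business averages (total employees and total patrons, each divided by the number of businesses). The sketch below checks that reading against three rows copied from the table; the function name `per_business_averages` is introduced only for this example.

```python
# (state, businesses, total employees, reported avg size, total patrons, reported avg patrons)
ROWS = [
    ("California", 1_579_342, 23_518_022, 14.89, 36_820_129, 23.31),
    ("New Jersey", 428_596, 8_833_890, 20.61, 9_986_529, 23.30),
    ("Washington DC", 49_488, 5_702_617, 115.23, 1_067_938, 21.58),
]


def per_business_averages(businesses, employees, patrons):
    """Return (average employee size, average patrons) per business."""
    return employees / businesses, patrons / businesses


for state, businesses, employees, avg_size, patrons, avg_patrons in ROWS:
    size, pat = per_business_averages(businesses, employees, patrons)
    print(f"{state}: employees/business = {size:.2f} (table: {avg_size}), "
          f"patrons/business = {pat:.2f} (table: {avg_patrons})")
```

A check of this kind is a cheap way to validate each merged state file before it feeds the trip synthesizer.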

Census Data

Inputs

2010 Census Summary File 1
  http://www2.census.gov/census_2010/04-Summary_File_1/
  Does not convert to CSV/TXT; files are made for MS Access
  Process tables (P12, P16, P29, H13, P43) with Talal’s VBA macro in MS Access (p.78)
  VBA code: whereabouts unknown, perhaps with Prof K
2012 5-Year Census American Community Survey
  http://www2.census.gov/acs2012_5yr/summaryfile/
  Income data to assign incomes to households and residents

Generation

Module 1 – Outputs a resident file for each county in the state
  Rows: individual people
  Attributes/Columns: County Number (replace with State Number_County Number for national file), Household ID, Household Type, Lat/Long, ID Number, Age, Sex, Traveler Type, Income Bracket
Module 2 – Out-of-state/region/nation nodes
For commenting on code, go to p.17-19 of http://www.princeton.edu/~alaink/Orf467F12/MuftiTripSynthesizer_v.1.pdf
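One way to represent a row of the Module 1 resident file, including the State Number_County Number key proposed for the national file, is sketched below. The field types, the coding of the categorical columns, and the zero-padded FIPS-style widths (two digits for state, three for county) are assumptions; the existing modules may encode these differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resident:
    """One row of the resident file (column types are assumed, not confirmed)."""
    state_number: int
    county_number: int
    household_id: int
    household_type: int
    lat: float
    lon: float
    id_number: int
    age: int
    sex: str
    traveler_type: int
    income_bracket: int

    @property
    def county_key(self) -> str:
        # National replacement for the NJ-only County Number column.
        # Zero-padding to FIPS-style widths is an assumption of this sketch.
        return f"{self.state_number:02d}_{self.county_number:03d}"


# Illustrative resident in Mercer County, NJ (FIPS state 34, county 021).
example = Resident(
    state_number=34, county_number=21, household_id=1, household_type=2,
    lat=40.35, lon=-74.66, id_number=1, age=20, sex="F",
    traveler_type=3, income_bracket=4,
)
print(example.county_key)
```

A composite key keeps county numbers unique across states, since county numbers restart within each state.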

Further Steps

What To Do Next?

Patronage Generation with NAICS, Sales Volume, Employee Size and Research – Low Difficulty
  I already generated a file mapping all NAICS and employment counts along with payrolls for patronage assignment using 2010 Census Data (200K entries)
Census Data Generation and Rework NN Generation Modules – High Difficulty
Optional: Data Verification for Employee-Patronage Files

Modules

Very hard-coded for NJ; not very well-commented
Initial National Implementation Ideas:
  Treat US as one entity with external nodes at airports to represent foreigners
    Problem: Computationally intensive for 330M people
    Solution: Do a semi-randomized sample
  Regionalize the US and use out-of-region external nodes
    Less labor-intensive; allows parallel processing
  Doing each state
    Problem: Hard to generalize code, out-of-state nodes
    Extremely labor-intensive
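The slides do not define "semi-randomized sample". One plausible reading is a stratified random sample, drawing the same fraction of residents from every county so that no county disappears from the sample. The sketch below assumes that interpretation; `stratified_sample` and the `county` field are hypothetical names.

```python
import random
from collections import defaultdict


def stratified_sample(residents, fraction, seed=0):
    """Draw roughly `fraction` of the residents from every county.

    Each county keeps at least one resident, so small counties
    remain represented in the sampled population.
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    by_county = defaultdict(list)
    for resident in residents:
        by_county[resident["county"]].append(resident)

    sample = []
    for county in sorted(by_county):
        group = by_county[county]
        k = min(len(group), max(1, round(len(group) * fraction)))
        sample.extend(rng.sample(group, k))  # without replacement within a county
    return sample


# Hypothetical population: 100 residents in one county, 20 in another.
population = (
    [{"id": i, "county": "34_021"} for i in range(100)]
    + [{"id": 100 + i, "county": "34_023"} for i in range(20)]
)
sampled = stratified_sample(population, fraction=0.1)
```

Each sampled resident would then stand for roughly `1 / fraction` people when trips are scaled back up to the full population.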

The Code: Thought Process

Trips generated state-by-state
  Use state-level demographic information on residents
  Ignore state-level boundaries since we have employer and attraction information for the nation
  Example: John Smith lives in NYC and works in CT.
    We will get his household from the NYC Census file and the probability distribution of workplace from the CT E-P file.
    When we map NYC trips, we will see John Smith going to CT for work.
    When we map CT trips, we will see John Smith returning from work.
Trip destinations can be approximated using destination county centroids
  Requires assigning a centroid to each county
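Assigning a centroid to each county could be done in several ways. The sketch below uses the simplest one, an unweighted mean of the latitude and longitude of points already known to lie in the county (for example household or business locations). This is an assumption for illustration; it ignores Earth curvature and would misbehave for areas that cross the 180° meridian, and published county centroids could be used instead.

```python
from collections import defaultdict


def county_centroids(points):
    """Approximate each county's centroid as the mean of its points.

    `points` is an iterable of (county_key, lat, lon) tuples.
    Returns a dict mapping county_key -> (mean lat, mean lon).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # county -> [lat sum, lon sum, count]
    for county, lat, lon in points:
        acc = sums[county]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {county: (acc[0] / acc[2], acc[1] / acc[2]) for county, acc in sums.items()}


# Hypothetical points in two counties.
centroids = county_centroids([
    ("A", 40.0, -74.0),
    ("A", 41.0, -75.0),
    ("B", 34.0, -118.0),
])
```

With centroids in hand, a trip into another state only needs the destination county key rather than an exact destination point.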

The Code: Thought Process

Workplace assignment (without replacement):
  Census maps individuals to workplace
    John Smith lives in NYC and works in CT
  Use distribution to match workplace to E-P file (keep a count of employees to match the number given)
    John Smith mapped to an employer in CT
    If more than x (e.g. 250) miles, assume arrival at airport
School assignment (without replacement):
  Use bounds and distribution to match students with schools (assume same county)
    Jane (8) is mapped to an elementary school in her county
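The workplace step above can be sketched as a draw without replacement from the pool of job slots: each employer is chosen with probability proportional to its remaining openings, and its count is decremented after every assignment. The record layout, the function names, and the use of great-circle distance for the 250-mile rule are assumptions of this sketch; school assignment could follow the same pattern with enrollment as the capacity.

```python
import math
import random

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def assign_workplaces(workers, employers, max_drive_miles=250.0, seed=0):
    """Assign each worker to an employer without exceeding employee counts."""
    rng = random.Random(seed)
    remaining = {e["id"]: e["employees"] for e in employers}  # open job slots
    by_id = {e["id"]: e for e in employers}
    assignments = {}
    for worker in workers:
        open_ids = [eid for eid, slots in remaining.items() if slots > 0]
        if not open_ids:
            break  # every job slot is filled
        weights = [remaining[eid] for eid in open_ids]
        chosen = rng.choices(open_ids, weights=weights, k=1)[0]
        remaining[chosen] -= 1  # "without replacement": the slot is used up
        employer = by_id[chosen]
        miles = haversine_miles(worker["lat"], worker["lon"], employer["lat"], employer["lon"])
        assignments[worker["id"]] = {
            "employer": chosen,
            "via_airport": miles > max_drive_miles,  # long trips assumed to arrive by air
        }
    return assignments


# Hypothetical workers living in NYC and two employers with fixed head counts.
workers = [{"id": i, "lat": 40.71, "lon": -74.01} for i in range(3)]
employers = [
    {"id": "CT-1", "lat": 41.05, "lon": -73.54, "employees": 2},
    {"id": "CA-1", "lat": 34.05, "lon": -118.24, "employees": 1},
]
assigned = assign_workplaces(workers, employers)
```

This version rescans all employers for every worker, which is fine for a sketch but would need an indexed structure at national scale.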

The Code: Thought Process

Tour Type assignment and Temporal Dimension
  Can try to repurpose Talal’s code
  Add in Time Zones in Temporal Dimension
  Can do this with replacement (patrons)
  Assumptions: Same behavior across states in terms of work time and leisure time and activity patterns
Out-of-Country Commuters / Non-Resident Workers
  International nodes for the states along the Canadian and Mexican borders
  Trip to the nearest border crossing
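Adding time zones to the temporal dimension mainly means putting departure times from different states on one common clock. The sketch below converts local departure times to UTC using fixed standard-time offsets; ignoring daylight saving time, covering only four zones, and the name `to_utc` are simplifications assumed for this example.

```python
from datetime import datetime, timedelta, timezone

# Standard-time UTC offsets in hours (daylight saving time is ignored here).
STANDARD_UTC_OFFSET_HOURS = {
    "Eastern": -5,
    "Central": -6,
    "Mountain": -7,
    "Pacific": -8,
}


def to_utc(local_departure, zone_name):
    """Convert a naive local departure time to a timezone-aware UTC time."""
    zone = timezone(timedelta(hours=STANDARD_UTC_OFFSET_HOURS[zone_name]))
    return local_departure.replace(tzinfo=zone).astimezone(timezone.utc)


# The same 8:00 local work departure in two zones (the date is arbitrary).
local_start = datetime(2013, 12, 2, 8, 0)
eastern_utc = to_utc(local_start, "Eastern")
pacific_utc = to_utc(local_start, "Pacific")
print(eastern_utc.isoformat(), pacific_utc.isoformat())
```

Under the stated assumption that behavior is the same across states, the same local activity pattern can be reused everywhere and only this conversion changes per state.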