Building and Learning Activity-Based Models
Erik Sabina
Jennifer Malm
Suzanne Childress
John Bowman
DeVon Culbertson
Erik’s recommendations (and the price you’ll have to pay)
• Get into the critical path
– Though it will put the hurt on schedule
• Software is the hardest part
– Steal someone else’s if you can
– Modelers must program WELL!
– The custom versus vendor conundrum
• Estimate a few models
– Some basic skills are indispensable
– Why let a few consultants have all the fun?
Expertise we started with
• Discrete choice model estimation
– Decent theory, limited practice
• Programming/software development
– Very strong in IT, modelers good programmers
• Math, stats, econ, etc.
– Pretty strong across the board
• Trip-based modeling
– Very strong
Levels of Knowledge
• Sensitivities
• Limitations and approximations
• Fix/update variables
• Re-calibrate
• Re-estimate some components
• Re-design the theoretical structure
• Re-design or extend software structure
Model sensitivities
What (example) – non-motorized mode choice
• Blend point-based and skim distances
• Point locations for households and jobs
• Intersection, retail, mixed use density
How – programmed all utility function variables
Why – can’t apply model without it!
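One way to picture "blend point-based and skim distances" is a weight that shifts from the point-to-point distance (good for short walk/bike trips) to the zone-to-zone skim distance (good for longer trips). The sketch below is only an illustration: the function name, the linear blend, and the 1-mile and 3-mile thresholds are assumptions, not values from the model.

```python
def blended_distance(point_dist, skim_dist, short_limit=1.0, long_limit=3.0):
    """Blend a point-to-point distance with a zone-to-zone skim distance (miles).

    short_limit and long_limit are hypothetical thresholds chosen for
    illustration only; the deck does not state how the blend is defined.
    """
    if point_dist <= short_limit:
        return point_dist          # short trips: trust the point locations
    if point_dist >= long_limit:
        return skim_dist           # long trips: trust the network skim
    # In between, slide linearly from the point distance to the skim distance.
    weight = (point_dist - short_limit) / (long_limit - short_limit)
    return (1.0 - weight) * point_dist + weight * skim_dist


short_trip = blended_distance(0.5, 0.9)
middle_trip = blended_distance(2.0, 3.0)
long_trip = blended_distance(5.0, 5.6)
```

A blend like this avoids a jump in the utility at the point where the model switches from one distance source to the other.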
Limitations and approximations
What (examples) –
• Little: simpler logsums skims
• Little: twice O-D, rather than O-D + D-O
• Big: sequential TOD and mode choice
• Big: implicit intra-household interactions
How – programmed all utility function variables
Why – know what model can/can’t do
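The "twice O-D" item can be made concrete with a tiny example. Assuming the approximation means a round-trip impedance is taken as double the outbound skim rather than the sum of the outbound and return skims, the two differ whenever the skims are asymmetric. The zone labels and travel times below are made up.

```python
def round_trip_exact(skim, origin, destination):
    """Outbound plus return impedance (O-D + D-O)."""
    return skim[(origin, destination)] + skim[(destination, origin)]


def round_trip_approx(skim, origin, destination):
    """Approximation: twice the outbound impedance (2 x O-D)."""
    return 2.0 * skim[(origin, destination)]


# Hypothetical asymmetric travel times in minutes.
skim_minutes = {("A", "B"): 12.0, ("B", "A"): 15.0}

exact = round_trip_exact(skim_minutes, "A", "B")
approx = round_trip_approx(skim_minutes, "A", "B")
difference = exact - approx
```

The approximation needs only one skim lookup per tour, which is why it counts as a "little" limitation: the error is the asymmetry between the two directions.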
Fix/update variables
What (examples) –
• Simple: changes to input data / GISDK
• Simple once you get used to it: change SQL Server table / variables in C#
• Pretty hard: understand how component works (older sibling school, logit solver)
How – wrote GISDK / C# model components
Why – changes to variables over time are very common
Variable example
// Additional variable created in code: older sibling's school zone
if (householdID != currentHouseholdID)
{
    // New household: reset the tracker
    currentHouseholdID = householdID;
    // Clear the sibling's school zone variable
    foreach (string choice in OlderSiblingSchoolZoneChoice)
    {
        this.UtilityFunctionParameters.AlternativeSpecificVariables[choice]["OldSibSchZone"].Value = 0;
    }
    OlderSiblingSchoolZoneChoice.Clear();
}
else
{
    // Take the choice and set the OldSibSchZone variable to 1 in the chosen zone.
    tempList = choosenAlternative.Split(' ');
    zoneID = int.Parse(tempList[1]);
    if (universityStudent != 1) // Doesn't matter where an older sibling university student went to school
    {
        if (zoneID != 0)
        {
            this.UtilityFunctionParameters.AlternativeSpecificVariables[choosenAlternative]["OldSibSchZone"].Value = 1;
        }
    }
}
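The same bookkeeping can be read as a small standalone routine. The Python sketch below is not the model's code: it assumes students arrive sorted by household with the oldest first, and it takes the chosen zones as given inputs, whereas in the model each choice is simulated in turn. The function and field names are hypothetical.

```python
def older_sibling_zone_flags(students):
    """Return, for each student, the zones flagged OldSibSchZone = 1.

    students: list of (household_id, is_university_student, chosen_zone),
    assumed sorted by household and, within a household, oldest first.
    """
    flags_per_student = []
    current_household = None
    flagged_zones = set()
    for household_id, is_university_student, chosen_zone in students:
        if household_id != current_household:
            # New household: clear the flags left by the previous one.
            current_household = household_id
            flagged_zones = set()
        # Flags this student sees come only from older siblings.
        flags_per_student.append(set(flagged_zones))
        # University students and zone 0 never set the flag.
        if not is_university_student and chosen_zone != 0:
            flagged_zones.add(chosen_zone)
    return flags_per_student


example = [
    (1, False, 101),  # oldest child in household 1
    (1, True, 500),   # university student: ignored for younger siblings
    (1, False, 102),
    (2, False, 103),  # first child in household 2 starts with no flags
]
flags = older_sibling_zone_flags(example)
```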
Variable changes in Database
Recalibrate
What (example) – work location choice model calibrated five years ago is not matching new ACS journey-to-work data. Model has too few people working at home.
What knobs can you turn?
Some examples:
1. Change coefficients on variables related to working at home.
2. Trace back issues to the land use or economic forecasts.
How – checked model estimation reports, programmed variables and model input code
Why – frequent “tweaks” expected!
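A common way to turn the first knob is to shift the alternative-specific constant by the log of the ratio of target share to modeled share, then re-run and repeat. The sketch below applies that rule to a toy two-alternative logit; the target share, starting constant, and the utility of the competing alternative are invented for illustration and are not the model's values.

```python
import math


def work_at_home_share(constant, other_utility=2.0):
    """Modeled work-at-home share in a toy binary logit (hypothetical utilities)."""
    exp_home = math.exp(constant)
    return exp_home / (exp_home + math.exp(other_utility))


def calibrate_constant(constant, target_share, iterations=25):
    """Repeatedly add ln(target / modeled) to the constant."""
    for _ in range(iterations):
        modeled_share = work_at_home_share(constant)
        constant += math.log(target_share / modeled_share)
    return constant


target = 0.06                       # hypothetical observed share
starting_constant = -2.0            # hypothetical estimated constant
calibrated = calibrate_constant(starting_constant, target)
share_after = work_at_home_share(calibrated)
```

In a full model the "re-run" step is a complete model application rather than a one-line formula, so in practice only a handful of iterations is affordable.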
Recalibrate (2)
Another example – observed boardings on light rail are lower than modeled boardings.
What knobs can you turn?
Possibilities:
1. Mode choice coefficients / alternative specific constants.
2. Too few university students on light rail? Trips too short, so walk/bike mode share too high. Adjust school location choice model: coefficients on the distance to school variables.
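To see what adjusting distance coefficients does, here is a sketch of a piecewise linear distance term with breakpoints at 2 and 6 miles. The additive-segment form is an assumption about how such variables are typically coded, and the default coefficients are borrowed from the high school location choice estimates in this deck purely as illustrative numbers; they are not university coefficients.

```python
def piecewise_distance_utility(distance, b_short=-0.435, b_mid=-0.702, b_long=-0.189):
    """Distance contribution to utility, with slopes that change at 2 and 6 miles.

    Assumed form: each coefficient applies only to the miles that fall
    inside its own segment (<2, 2-6, >6).
    """
    short_miles = min(distance, 2.0)
    mid_miles = min(max(distance - 2.0, 0.0), 4.0)
    long_miles = max(distance - 6.0, 0.0)
    return b_short * short_miles + b_mid * mid_miles + b_long * long_miles


base = piecewise_distance_utility(8.0)
# Hypothetical recalibration: soften the 2-6 mile slope so longer trips are penalized less.
softened = piecewise_distance_utility(8.0, b_mid=-0.5)
```

Making a segment's coefficient less negative raises the utility of every destination beyond that segment's start, which lengthens average trips.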
Re-estimation
What (example) – added variables to school location choice:
• Zone in School District
• Older Sibling’s School Zone
How – estimated several models ourselves:
• Estimation software syntax
• Various levels of theoretical knowledge
• The “expert coach” approach
Why – expecting frequent “tweaks” again!
Re-estimate: data issues
Oops! You need to re-estimate. The colors on the picture signify where education employment was geo-coded but no school enrollment was geo-coded.
Re-estimate
High School Location Choice

Non-Size Variables
Row  Parm ID  Variable                               Student Grade  Est.    Std. error  T-stat
1    3        Zone in School District                9-12            1.513  0.156         9.7
2    12       Mode Choice Logsum                     9-12            0.064  0.070         0.9
3    16       Sibling's School Zone                  9-12            3.794  0.365        10.4
4    20       Piecewise Linear Distance <2 miles     9-12           -0.435  0.131        -3.3
5    21       Piecewise Linear Distance 2 - 6 miles  9-12           -0.702  0.044       -16.0
6    23       Piecewise Linear Distance >6 miles     9-12           -0.189  0.019       -10.0

Size Variables
Row  Parm ID  Variable         Est.  Std. error  T-stat
4    0        SIZE MULTIPLIER  0.83  0.072       11.63

Row  Parm ID  Variable                          Student Type  HH Income  Est.    Std. error  T-stat
5    103      Zonal Education Employees         9-12                     -3.978  0.573       -6.94
6    112      Number of Households in the Zone  9-12                     -7.695  0.925       -8.31
7    150      Public High School Enrollment     9-12          <$75K       0.000  0.000        0.00
8    151      Public High School Enrollment     9-12          >$75K       0.000  0.000        0.00
9    152      Public High School Enrollment     9-12          Refused     0.000  0.000        0.00
10   153      Private High School Enrollment    9-12          <$75K       0.290  0.277        1.05
11   154      Private High School Enrollment    9-12          >$75K       0.546  0.354        1.54
12   155      Private High School Enrollment    9-12          Refused    -5.000  0.000        0.00
13   172      Service Employment                9-12                     -8.451  1.392       -6.07

Summary Statistics
Number of Observed Choices        603
Number of Estimated Parameters    13
Log Likelihood with Coeffs = 0    -4783
Final Log Likelihood              -1238
Rho-Squared                       0.7411
Adjusted Rho-Squared              0.7384
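The summary statistics can be cross-checked from the reported log likelihoods using the standard definitions: rho-squared is one minus the ratio of the final to the zero-coefficient log likelihood, and the adjusted version penalizes the final log likelihood by the number of estimated parameters. The helper names below are illustrative; small differences in the last digit come from the log likelihoods being rounded on the slide.

```python
def rho_squared(ll_zero, ll_final, n_params=0):
    """McFadden rho-squared; pass n_params for the adjusted version."""
    return 1.0 - (ll_final - n_params) / ll_zero


def t_stat(estimate, std_error):
    """T-statistic against a null value of zero."""
    return estimate / std_error


# Values as reported for the high school location choice model.
rho = rho_squared(-4783.0, -1238.0)
adjusted_rho = rho_squared(-4783.0, -1238.0, n_params=13)
district_t = t_stat(1.513, 0.156)   # Zone in School District
```

The same arithmetic is a quick sanity check when reading any estimation report: a t-statistic that does not match estimate divided by standard error usually signals a transcription problem.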
Redesign model
What (examples) –
• Swap out a component
• Estimate joint models
• Explicit joint tour formation
• Daily schedule interaction
• Activity generation and assignment
How – thorough study of your design (among other things!)
Why – next round of model upgrades
Redesign software
What (examples) –
• Model component design
• Upgrade key functions (e.g. MakeChoice)
• Enhance distributability (modifications to “plumbing” code)
How – designed/wrote most code
- The “coach” model
Why – open source versus vendor again!
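For readers who have not seen what a function like MakeChoice has to do, the sketch below shows the generic idea: turn utilities into multinomial logit probabilities, then pick an alternative with a random draw. It is a minimal illustration in Python, not the authors' C# implementation; the function names and the example utilities are assumptions.

```python
import math
import random


def logit_probabilities(utilities):
    """Multinomial logit probabilities from a dict of alternative -> utility."""
    max_utility = max(utilities.values())   # subtract the max for numerical stability
    exp_utilities = {alt: math.exp(u - max_utility) for alt, u in utilities.items()}
    total = sum(exp_utilities.values())
    return {alt: value / total for alt, value in exp_utilities.items()}


def make_choice(utilities, rng):
    """Monte Carlo selection of one alternative using the logit probabilities."""
    probabilities = logit_probabilities(utilities)
    draw = rng.random()
    cumulative = 0.0
    for alternative, probability in probabilities.items():
        cumulative += probability
        if draw < cumulative:
            return alternative
    return alternative  # guard against floating-point round-off on the last alternative


rng = random.Random(1)
utilities = {"walk": 1.0, "bike": 0.0}    # hypothetical utilities
choices = [make_choice(utilities, rng) for _ in range(10000)]
walk_share = choices.count("walk") / len(choices)
```

Passing the random number generator in, rather than using a global one, keeps runs reproducible and makes it easier to distribute work across machines.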
Redesign Software (example)
What we did: estimation
Component Name
Synthetic Sample Generator
Tour Primary Destination
Regular Workplace Location Choice
Regular School Location
Tour Main Mode Choice
Auto Ownership
Area Type
Tour Time of Day Choice
Parking Cost
Daily Activity Pattern
Intermediate Stop Generation
Intermediate Stop Location
Exact Number of Tours
Trip Mode Choice
Trip Departure Time
Work Tour Destination Type
What we did: software
Wrote our own (bigtime!)
• That coach model again
Had to learn a lot to do it:
• Logit math
• OO/C#/.Net programming
• SQL-Server database
Wanted to use industry-standard tools
Led by strong IT department
Future training
• Explicit intra-household interaction
– Many flavors
• Selection of choice set from a larger set
• “Doubly constrained” location choice
– Enforced quota
– Shadow price
• Ties to DTA
• Enhancements in population synthesis
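As a taste of the "shadow price" flavor of doubly constrained location choice, the sketch below uses one common formulation: each destination zone gets an additive utility term that is raised when the zone attracts too few workers and lowered when it attracts too many, until modeled demand matches the jobs available. Everything here (function name, distance coefficient, zone data) is hypothetical, and total workers are assumed to equal total jobs.

```python
import math


def shadow_price_location_choice(workers, jobs, distance, beta=-0.2, iterations=200):
    """Iterate zone shadow prices until modeled workers per zone match jobs.

    workers[i]: workers living in zone i; jobs[j]: jobs in zone j;
    distance[i][j]: miles between zones; beta: hypothetical distance coefficient.
    """
    n_zones = len(jobs)
    shadow = [0.0] * n_zones
    demand = [0.0] * n_zones
    for _ in range(iterations):
        demand = [0.0] * n_zones
        for origin, n_workers in enumerate(workers):
            # Logit weights with jobs as the size term plus the shadow price.
            weights = [
                jobs[j] * math.exp(beta * distance[origin][j] + shadow[j])
                for j in range(n_zones)
            ]
            total = sum(weights)
            for j in range(n_zones):
                demand[j] += n_workers * weights[j] / total
        # Raise the price where demand is short of jobs, lower it where it exceeds them.
        shadow = [shadow[j] + math.log(jobs[j] / demand[j]) for j in range(n_zones)]
    return shadow, demand


workers = [100.0, 200.0, 300.0]
jobs = [250.0, 150.0, 200.0]
distance = [[1.0, 5.0, 8.0], [5.0, 1.0, 4.0], [8.0, 4.0, 1.0]]
shadow, demand = shadow_price_location_choice(workers, jobs, distance)
```

The "enforced quota" alternative would instead remove a zone from the choice set once its jobs are filled, which depends on the order in which workers are simulated.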
Authors
Erik Sabina, Jennifer Malm, Suzanne Childress
- Denver Regional Council of Governments
John Bowman
- Bowman Research and Consulting
DeVon Culbertson
- DeVon Culbertson, LLC