Key Words: R, GIS, Predictive Modeling, Curb Space Management, Cluster Analysis, Random Forest, XGBoost, Pilot Design, Data-Driven City Planning, Technology Planning, Smart Cities, Smart Transit

Objectives

In 2022, the City of Philadelphia’s Office of Innovation and Technology (OIT) launched a smart loading zone pilot program. Run in collaboration with a Google company, the pilot captured a curb sign inventory and launched a small-scale paid booking platform for commercial vehicles in loading zones. This is the first exploratory study of the pilot’s data outside of OIT. Our analysis includes:

  • Cluster Analysis to uncover signage and use patterns in the study area

  • Impact of local economic context on loading zone bookings

  • Predictive modeling of loading zone bookings

Policy Context

City curbs were originally designed to separate the transportation right-of-way from the pedestrian right-of-way. This physical threshold has evolved from stone-paved street markings to wooden bollards to the finished concrete edge we recognize in most cities around the U.S. Today, the curb is more than an aesthetic barrier; it is also an ever-changing posterboard of transit uses and technologies, such as loading zones, bike share docks, food trucks, bus stops, containerized waste, parklets, and ‘streateries’. As a result, the demand for ‘share-of-street’ continues to grow while the supply remains constrained. New, high-capacity management tools are needed to balance this bullish market for curb space.

To deploy new tools at scale, cities digitize the curb through sensor or camera networks that produce ‘digital twins’. Digitizing the curb is only as effective as the analysis of the data it produces, and the analysis is only as effective as the clarity of a pilot’s goals. Interdisciplinary innovations that combine qualitative and quantitative methods to optimize curb space allocation are an emerging response to these challenges.

The federal Bipartisan Infrastructure Law (BIL) established the Strengthening Mobility and Revolutionizing Transportation [(SMART)](https://www.transportation.gov/grants/SMART) discretionary grant program, with $100 million appropriated annually, to help bridge this equity divide. In March 2023, Philadelphia received $2M in SMART funding to digitize its city streets and improve how it manages scarcity in the right-of-way. The following study provides insight into the operational utility of limited-scale, data-driven pilots that are constrained to work within the bounds of existing curb signage (which may not be optimal for the pilot).

Furthermore, while these awards are substantial, they are not sufficient to operationalize all of this digital data. Many municipalities lack the resources or expert staff to develop data-driven programs that realize the full potential of this information. This project aims to help cities operationalize data from limited-scale pilots in order to optimize public value along the curb for all their constituents.

Clean Data and Select Variables

The Smart Loading Zone (SLZ) vendor’s curb sign inventory was converted from image to text with optical character recognition (OCR), which produced inconsistent results. The resulting text was cleaned and converted to numeric variables.
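As a rough illustration of what this cleaning step can look like, the sketch below normalizes a few sign strings and derives numeric columns from them in base R. The sign strings, the column names, and the helper `extract_limit_minutes()` are all hypothetical; the study's actual cleaning rules are not documented here.

```r
# Hypothetical OCR output for four signs (invented strings, not the vendor's data).
raw_signs <- c(" 2 hr parking 8am-6pm", "1 HR PARKING 9AM-5PM ",
               "Loading Zone 30 min", "NO STOPPING ANYTIME")

# Normalize case and whitespace so one pattern matches every variant.
clean <- toupper(trimws(raw_signs))

# Hypothetical helper: convert a posted limit such as "2 HR" or "30 MIN"
# to minutes, returning NA when the sign carries no time limit.
extract_limit_minutes <- function(txt) {
  m <- regmatches(txt, regexpr("[0-9]+ ?(HR|MIN)", txt))
  if (length(m) == 0) return(NA_real_)
  value <- as.numeric(sub(" ?(HR|MIN)$", "", m))
  if (grepl("HR$", m)) value * 60 else value
}

# One row per sign, with indicator and numeric columns ready for analysis.
signs <- data.frame(
  text          = clean,
  is_parking    = as.integer(grepl("PARKING", clean)),
  is_loading    = as.integer(grepl("LOADING", clean)),
  limit_minutes = vapply(clean, extract_limit_minutes, numeric(1), USE.NAMES = FALSE)
)
print(signs)
```

Indicator columns like these can then be summed per curb to produce sign counts for each curb.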

Sample signage from within the study area.

The curb is the study’s unit of analysis and is defined as one side of the street from end to end.

Establishing a curb-level unit of analysis.

To understand activity at the curb, variables were sourced from OpenDataPhilly and tested for multicollinearity and general accuracy with cortest(). The accuracy check asks whether the associations reflect relationships that we know to be true. For example, given that curb space in commercial areas is used more frequently for other purposes (e.g., transit stops, streateries, and loading zones), it makes sense that there is a negative association between parking signs and commercial zoning.
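A minimal sketch of this kind of check, using simulated curb-level data in base R. The column names (`parking_signs`, `commercial_zoning`, `retail_jobs`), the simulated relationships, and the 0.8 cutoff are illustrative assumptions rather than the study's data or thresholds. The base R functions `cor()` and `cor.test()` are used here and may differ from the function the study called.

```r
# Simulated curb-level data: hypothetical columns, not the study's dataset.
set.seed(1234)
n <- 200
commercial_zoning <- rbinom(n, size = 1, prob = 0.4)   # 1 = commercially zoned curb
retail_jobs       <- rpois(n, lambda = 5 + 20 * commercial_zoning)
# Assumption for illustration: commercial curbs carry fewer parking signs.
parking_signs     <- rpois(n, lambda = 6 - 3 * commercial_zoning)

curbs <- data.frame(parking_signs, commercial_zoning, retail_jobs)

# Pairwise Pearson correlations across all candidate variables.
cor_matrix <- round(cor(curbs), 2)
print(cor_matrix)

# Significance test for one pair; a negative estimate matches the expectation
# that commercial curbs have fewer parking signs.
sign_test <- cor.test(curbs$parking_signs, curbs$commercial_zoning)
print(sign_test$estimate)
print(sign_test$p.value)

# Flag strongly correlated pairs (|r| > 0.8, an arbitrary illustrative cutoff)
# as candidates to drop or combine before modeling.
high_pairs <- which(abs(cor_matrix) > 0.8 & upper.tri(cor_matrix), arr.ind = TRUE)
print(high_pairs)
```

Pairs flagged in the last step are the ones that would warrant a closer look for multicollinearity before they enter a model together.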

The final correlation matrix was limited to significant variables including specific jobs, time of day, and relevant signage such as bus stops and existing loading zones.

Cluster Analysis

An unsupervised cluster analysis identifies candidate curbs that are representative of broader curb use patterns in Center City and University City. The next step of our research will involve qualitative analysis and interviews with property and business owners on these candidate curbs. This will help us identify the “ideal” curb use from the perspective of those who use the space every day.

Using the elbow method (plotting the number of clusters against the within-cluster sum of squares), we determined three clusters to be the most efficient for Philadelphia’s relatively homogeneous downtown area. The k-means analysis did not yield definitive results. Although the clusters are distinctly separated, the centroids do not overlap, there are few outliers, and the clusters are of similar size, none of the clusters correlates significantly with any of the curb characteristic variables.
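The elbow method can be sketched as follows in base R. The data here are simulated, with three loose groups built in on purpose. The object names and the range of k values tried are assumptions for illustration and do not reproduce the study's inputs.

```r
# Simulated stand-in for the curb variables: three loose groups in two dimensions.
set.seed(1234)
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.5), ncol = 2)
)
toy_scaled <- scale(toy)  # k-means is distance-based, so put variables on one scale

# Total within-cluster sum of squares (WSS) for k = 1..8.
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(toy_scaled, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is the k after which adding another cluster stops reducing WSS by much.
plot(k_values, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

Reading the elbow off the plot is a judgment call, which is one reason a chosen k can still produce clusters that do not correlate with the variables of interest.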

The k-means algorithm starts by randomly choosing k initial cluster centroids. Each data point is then assigned to the nearest centroid based on a distance metric (usually Euclidean distance), which forms the clusters. The centroids are then recalculated as the mean of the data points in each cluster. This process is repeated until the centroids no longer change significantly.
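The loop described above can be written out in a few lines of base R. This is a teaching sketch, not the study's code: `simple_kmeans()` is a hypothetical function name, the toy data are simulated, and R's built-in `kmeans()` defaults to the Hartigan-Wong variant, so it will not follow exactly the same steps.

```r
# From-scratch sketch of the assign-and-update loop (hypothetical helper).
simple_kmeans <- function(x, k, max_iter = 100, tol = 1e-8) {
  x <- as.matrix(x)
  # Step 1: pick k distinct data points at random as the starting centroids.
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  for (iter in seq_len(max_iter)) {
    # Step 2: squared Euclidean distance from every point to every centroid,
    # then assign each point to its nearest centroid.
    dists <- sapply(seq_len(k), function(j) {
      rowSums((x - matrix(centroids[j, ], nrow(x), ncol(x), byrow = TRUE))^2)
    })
    cluster <- max.col(-dists, ties.method = "first")
    # Step 3: recompute each centroid as the mean of its assigned points.
    new_centroids <- centroids
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) new_centroids[j, ] <- colMeans(members)
    }
    # Step 4: stop once the centroids have effectively stopped moving.
    shift <- max(abs(new_centroids - centroids))
    centroids <- new_centroids
    if (shift < tol) break
  }
  list(cluster = cluster, centers = centroids, iterations = iter)
}

# Usage on simulated data with two well-separated groups of 30 points each.
set.seed(1234)
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 5, sd = 0.3), ncol = 2)
)
fit <- simple_kmeans(toy, k = 2)
print(table(fit$cluster))
print(fit$centers)
```

Because the starting centroids are random, different seeds can label the clusters in a different order, so cluster numbers should be interpreted only after inspecting the centroids.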

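library(dplyr)
library(knitr)
library(kableExtra)

# mydata: numeric curb-level variables (signage, jobs, zoning) prepared earlier
datb <- mydata
k <- 3  # number of clusters chosen with the elbow method

set.seed(1234)  # k-means starts from random centroids, so fix the seed
km.out <- kmeans(datb, centers = k)

# Attach each curb's cluster label and tabulate the cluster centroids
mydata$cluster <- km.out$cluster
cluster_means <- data.frame(cluster = 1:k, km.out$centers)

cluster_means %>%
  head() %>%
  kable() %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(1:ncol(cluster_means), width = "100px") %>%
  scroll_box(width = "100%", height = "200px")

# Reuse the same fit so cluster labels agree across every table
kmeans_basic <- km.out
kmeans_basic_table <- data.frame(kmeans_basic$size, kmeans_basic$centers)
kmeans_basic_df <- data.frame(Cluster = kmeans_basic$cluster, datb)
kmeans_basic_df$number <- rownames(kmeans_basic_df)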

Spatially, there are some notable distinctions between the clusters to investigate. For example, it would be interesting to determine why there are so many signs along South Street in Cluster 2 and Cluster 3, but virtually none in Cluster 1. It could also be revealing to investigate why Cluster 1 has so many more signs in University City compared to the other clusters.

Created in GIS, each cluster is mapped to show spatial dispersion within the study area.

To generalize curbs across the study area, we ran an unsupervised k-means cluster analysis that groups curbs by signage, number of jobs, and zoning. The analysis returns a predetermined number of distinct clusters, and the goal is to reveal which variables distinguish the clusters from one another. In practice, the clusters in this study are distinguishable primarily by land use: Cluster 1 is residential, Cluster 2 is commercial and municipal, and Cluster 3 is institutional. Statistically, however, the clusters are not significant predictors of any variable in our study. The curbs within the study area are arguably too homogeneous to yield a curb usage pattern that could be confidently extrapolated across Philadelphia or used as a variable in our prediction models.

Predictive Demand Models

The objective of the predictive demand models is to forecast future demand for loading zones based on the pilot’s booking data. The models rank variables by their importance in predicting when an SLZ will be booked in the three busiest zones (19, 20, and 21).

Total number of bookings per Smart Loading Zone. Zones 19, 20, and 21 had the most bookings during the pilot.

The Random Forest machine learning model gives a root mean square error (RMSE) of 1.64 and an R² of 0.122. RMSE measures the typical magnitude of the errors between predicted and actual values, expressed in the same units as the outcome (bookings).

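library(tidymodels)

set.seed(1234)

# Hold out 25% of the data for final testing
naive_split <- initial_split(zones, prop = 3/4)

zones_training <- training(naive_split)
zones_testing <- testing(naive_split)

# 5-fold cross-validation on the training set for tuning
folds <- vfold_cv(zones_training, v = 5)

# Build out the random forest first
# CNS07, CNS16, CNS18: job counts by sector (in LODES naming: retail trade,
# health care and social assistance, accommodation and food services)
rf_recipe <- recipe(Bookings ~ SmartZoneNumber + Day + Hour + open_close + CNS07 + CNS16 + CNS18, data = zones_training)

rf_zones_mod <- rand_forest(mode = "regression", mtry = tune(), trees = tune()) %>%
  set_engine("ranger", importance = "impurity")

rf_wf1 <- workflow() %>%
  add_recipe(rf_recipe) %>%
  add_model(rf_zones_mod)

# Pull out model params explicitly for tuning; finalize() sets the data-dependent mtry range
params <- extract_parameter_set_dials(rf_zones_mod) %>%
  finalize(zones_training)

ctrl <- control_bayes(verbose = FALSE)

# Bayesian optimization: 5 initial candidates, then up to 25 search iterations
rf_bayes_tune <- rf_wf1 %>%
  tune_bayes(resamples = folds, initial = 5,
             iter = 25, control = ctrl, param_info = params)

# Pass the metric by name; recent versions of tune reject it as a positional argument
rf_wf_tuned <- finalize_workflow(rf_wf1, select_best(rf_bayes_tune, metric = "rmse"))

# Refit on the full training set and evaluate once on the held-out test set
rf_results <- last_fit(rf_wf_tuned, naive_split)

rf_zones_predictions <- rf_results %>%
  collect_predictions()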
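The two metrics quoted for the models can be computed by hand from any pair of observed and predicted vectors. The sketch below uses made-up numbers, not the pilot's bookings, and shows two common definitions of R². For a fitted tidymodels object such as `rf_results` above, `collect_metrics()` reports RMSE and the squared-correlation version of R² by default.

```r
# Toy observed and predicted booking counts (made-up numbers for illustration).
observed  <- c(0, 2, 1, 4, 3, 0, 5, 2)
predicted <- c(1, 2, 2, 3, 3, 1, 3, 2)

# RMSE: square root of the mean squared prediction error, in units of bookings.
rmse <- sqrt(mean((observed - predicted)^2))

# R-squared as the squared correlation between observed and predicted values,
# which is how yardstick::rsq() defines it.
rsq <- cor(observed, predicted)^2

# "Traditional" R-squared: 1 minus residual sum of squares over total sum of squares.
rsq_trad <- 1 - sum((observed - predicted)^2) / sum((observed - mean(observed))^2)

print(c(rmse = rmse, rsq = rsq, rsq_trad = rsq_trad))
```

The two R² definitions can disagree when predictions are biased or poorly scaled, so it helps to state which one a reported figure uses.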

The XGBoost machine learning model gives a marginal improvement, with an RMSE of 1.63 and an R² of 0.138. The lower RMSE indicates slightly less error, and the higher R² indicates a marginally better fit, although both models explain well under a fifth of the variance in bookings.

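# Assumes library(tidymodels) is loaded and that zones_training, folds, and
# naive_split exist from the random forest step

# One-hot encode the categorical predictors, since xgboost needs numeric inputs
xg_recipe <- recipe(Bookings ~ SmartZoneNumber + Day + Hour + CNS07 + CNS16 + CNS18, data = zones_training) %>%
  step_dummy(SmartZoneNumber, Day, one_hot = TRUE)

xg_mod <- boost_tree(trees = tune(), tree_depth = tune(),
                     min_n = tune(), mtry = tune(),
                     learn_rate = tune(), sample_size = tune()) %>%
  set_engine("xgboost", nthread = 6) %>%
  set_mode("regression")

# Pull out model params explicitly for tuning; finalize() sets the data-dependent mtry range
params <- extract_parameter_set_dials(xg_mod) %>%
  finalize(zones_training)

# Set new workflow object using the xgboost recipe and model
xg_wf <- workflow() %>%
  add_recipe(xg_recipe) %>%
  add_model(xg_mod)

ctrl <- control_bayes(verbose = TRUE)

# Bayesian optimization: 10 initial candidates, then up to 25 search iterations
xg_bayes_tune <- xg_wf %>%
  tune_bayes(resamples = folds, initial = 10,
             iter = 25, control = ctrl, param_info = params)

# Pass the metric by name; recent versions of tune reject it as a positional argument
xg_wf_tuned <- finalize_workflow(xg_wf, select_best(xg_bayes_tune, metric = "rmse"))

# Refit on the full training set and evaluate once on the held-out test set
xg_results <- last_fit(xg_wf_tuned, naive_split)

xg_zones_predictions <- xg_results %>%
  collect_predictions()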

Number of jobs, particularly retail jobs, was the most important variable. To proxy for surrounding land uses, we use ‘ground-floor’ job sectors, including retail, restaurants, and healthcare. Compared with other studied variables such as building density, zoning, or business permits, the number of jobs remains influential across our models. Zones 19, 20, and 21 feature institutions such as Jefferson Hospital, as well as large hospitality businesses and mixed-use developments.

From left to right, the streetscapes around Smart Loading Zones 19, 20, and 21.

In addition to local economic context, time is a significant variable in the predictive model. The period from 11 AM to 4 PM is the busiest for bookings, even though fewer than half of the zones are in operation during those hours. Considering the volume of bookings in Zones 19, 20, and 21, it is reasonable to conclude that commercial activity, such as package, materials, and food deliveries, drives bookings. The lack of booking activity on weekends further supports this conclusion.

In Zones 19, 20, and 21, the booking peak from 11 AM to 4 PM can directly inform pricing schemes, traffic management, or future curb design.
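A time-of-day pattern like this can be checked with a simple aggregation of the booking log. The sketch below uses an invented log: the timestamps, zone assignments, and column names are hypothetical and are not the pilot's records. It computes the share of bookings that fall in the 11 AM to 4 PM window and the share that fall on weekends.

```r
# Hypothetical booking log: timestamps and zones are invented for illustration.
bookings <- data.frame(
  zone  = c(19, 19, 20, 21, 21, 20, 19, 21),
  start = as.POSIXct(c(
    "2022-11-07 09:15", "2022-11-07 11:30", "2022-11-08 12:05",
    "2022-11-08 13:45", "2022-11-09 15:20", "2022-11-10 14:10",
    "2022-11-11 17:40", "2022-11-12 12:30"
  ), tz = "UTC")  # UTC keeps the example independent of local time zone data
)

# Derive hour of day, a weekend flag, and a peak-window flag from each timestamp.
lt <- as.POSIXlt(bookings$start)
bookings$hour    <- lt$hour
bookings$weekend <- lt$wday %in% c(0, 6)                       # 0 = Sunday, 6 = Saturday
bookings$peak    <- bookings$hour >= 11 & bookings$hour < 16   # 11 AM to 4 PM

# Bookings per hour of day, and the shares inside the peak window and on weekends.
by_hour       <- table(bookings$hour)
peak_share    <- mean(bookings$peak)
weekend_share <- mean(bookings$weekend)

print(by_hour)
print(c(peak_share = peak_share, weekend_share = weekend_share))
```

Dividing the hourly counts by the number of zones open in each hour would give a fairer comparison, since not all zones operate during the same hours.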

Policy Discussion

An exploratory analysis of the 2022 Smart Loading Zone pilot’s booking data shows that the leading predictors of bookings are local economic context (specifically, the number of retail and health and social service jobs) and time of day. It also shows that bookings can be digitized and that adoption takes time. While the initial objective of this study was to quantify the supply and demand of curb space in Philadelphia in order to identify ‘mismatch’, the available data better supports a discussion of booking behavior than of curb space allocation.

Distribution of bookings over the course of the pilot. Number of bookings trended upwards later into the study.

Our analysis of the Smart Loading Zone pilot suggests the following takeaways:

  1. Because the dataset was homogeneous, the cluster analysis failed to derive curb characteristic patterns that could be applied to a booking prediction model.

  2. Local economic context is a reasonable predictor of booking demand.

  3. Day of the week and time of day remain influential across our models.

  4. Bookings can be digitized and adoption takes time.

The full policy memo sent to the City of Philadelphia is available upon request.

Acknowledgments

This study was originally conducted as part of Dr. Megan Ryerson’s Planning by Numbers graduate-level class at the Weitzman School of Design at the University of Pennsylvania. Dr. Ryerson is an expert in transportation planning and methods.

Additional analysis and modeling were conducted with Dr. Jamaal Green, an expert in statistical data analysis and economic development. Thank you for your advice, guidance, and support.

Thank you to the City of Philadelphia’s Smart City office for providing the data.
