New York is a biking city. Over the past 10 years, bike ridership has grown more than fourfold, and cycling is becoming the city’s fastest-growing mode of transportation for New Yorkers looking for a safe, convenient and affordable way to travel. With its mostly flat geography and high density, New York City is an ideal place for a bike share system. Citi Bike builds on these strengths by offering an unlimited number of affordable, short bike trips. Polls released by Quinnipiac University have shown that over 70% of New Yorkers support the Citi Bike sharing system. Another reason the public has embraced the system so strongly is the exhaustive and highly participatory planning process used to site Citi Bike stations within New York City. The system targets New Yorkers looking for a time-efficient, healthy and fun way of traveling in the city. It benefits not only riders, through the exercise involved, but also the environment, since it saves gasoline, reduces carbon emissions and keeps pollutants out of the air.
Although the Citi Bike system is a win-win for all, it faces a costly maintenance issue: rebalancing bikes across stations. Because rebalancing is currently done ad hoc, existing systems frequently end up with stations that have no bikes to rent or no empty docks in which to return one. Asymmetric demand creates empty and full stations across the city. For example, docking stations in affluent Manhattan neighborhoods see far more rentals than returns. Many factors, such as time, weather, past demand and location, affect bike share demand across the city. There are also less obvious factors. New York City’s Upper West Side, for instance, is an uphill area with high observed demand; because people usually do not like to ride uphill, it sees more rentals than returns. Such hidden factors help explain the asymmetric demand and may play a large role in optimizing the re-balancing model.
In this analysis, we look into the Citi Bike sharing system and attempt to predict demand across different bike stations and at different times of the day. With the forecasted demand in hand, we then come up with a re-balancing plan to cater to it. Within the scope of this analysis, we consider three resources for re-balancing: (i) Trucks: useful for carrying a large number of bikes, but they generally require low traffic; (ii) Trikes: much smaller than trucks and able to reach a bike station in heavy traffic; and (iii) Corrals: used to artificially increase the capacity of a bike station.
We start the analysis by importing the required libraries and helper functions and defining some color palettes.
# Import necessary libraries
library(tidyverse)
library(sf)
library(lubridate)
library(tigris)
library(gganimate)
library(riem)
library(gridExtra)
library(knitr)
library(kableExtra)
library(gifski)
library(tidycensus)
library(viridis)
library(osmdata)
library(FNN)
options(tigris_class = "sf")
# Import required functions
source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")
# Initialize color palettes
palette5 <- c("#eff3ff","#bdd7e7","#6baed6","#3182bd","#08519c")
palette3 <- c("#eff3ff","#6baed6","#08519c")
palette4 <- c("#D2FBD4","#92BCAB","#527D82","#123F5A")
palette2 <- c("#6baed6","#08519c")
Next, we download census data for three of the counties spanned by the Citi Bike system: New York County, Kings County and Queens County. We do not use the census variables in our predictive models (due to collinearity with the bike station IDs), but rather use the tract boundaries for mapping purposes.
# Setting the CRS
crs <- st_crs("+proj=lcc +lat_1=40 +lat_2=41.66666666666666 +lat_0=39.33333333333334 +lon_0=-122 +x_0=2000000 +y_0=500000 +ellps=GRS80 +units=m +no_defs")
# Downloading census tracts data
nycCensus <-
get_acs(geography = "tract", variables = c("B01003_001", "B19013_001", "B02001_002",
"B08013_001", "B08012_001", "B08301_001",
"B08301_010"),
year = 2018, state = "NY", geometry = TRUE, county=c("New York County", "Kings County", "Queens County")) %>%
st_transform(crs=crs) %>%
mutate(variable =
case_when(variable == "B01003_001" ~ "Total_Population",
variable == "B19013_001" ~ "Median_Household_Income",
variable == "B02001_002" ~ "Total_White_Population",
variable == "B08013_001" ~ "Aggregate_Travel_Time_to_Work",
variable == "B08012_001" ~ "Sex_of_Workers_by_Travel_Time_to_work",
variable == "B08301_001" ~ "Means_of_Transportation_to_Work",
variable == "B08301_010" ~ "Total_Public_Trans_excl_Taxi")) %>%
select(variable, estimate, GEOID, geometry) %>%
spread(variable,estimate) %>%
mutate(Percent_White = Total_White_Population / Total_Population,
Mean_Commute_Time_for_Workers =
Aggregate_Travel_Time_to_Work / Sex_of_Workers_by_Travel_Time_to_work,
Percent_Taking_Public_Trans =
Total_Public_Trans_excl_Taxi / Means_of_Transportation_to_Work) %>%
gather(Variable,Value, -GEOID, -geometry)
# Creating the boundary
nycBoundary <- st_union(nycCensus)
To mark our area of study, we next look at how the bike stations are distributed across the three counties. As the left-hand plot in Figure 1 shows, the majority of the bike stations fall inside New York County, while the others lie near the boundaries of Kings and Queens counties.
To extract our area of study, we created a quarter-mile buffer around each bike station and selected the census tracts that intersect this buffer. The right-hand plot in Figure 1 shows the density of bike stations in the resulting study area.
# Extract the unique bike stations (assumes `ride`, the raw trip records, is already loaded)
bikeStations <- ride %>%
select(start.station.id, start.station.latitude, start.station.longitude) %>%
distinct() %>%
filter(start.station.latitude < 42) %>%
st_as_sf(coords = c("start.station.longitude", "start.station.latitude"), crs = 4326, agr = "constant") %>%
st_transform(crs=crs) %>%
mutate(start.station.id = as.character(start.station.id))
# Filtering ride data based on valid bike station ids
ride <- ride %>% filter(start.station.id %in% bikeStations$start.station.id)
# Buffer of 0.25 miles around each bike station. The CRS declares +units=m,
# so the distance is given in metres (0.25 mi is about 402 m), not feet (1320 ft)
buffer_around_bikestations <- st_union(st_buffer(bikeStations, 402))
# Plotting the data
p1 <- ggplot() +
geom_sf(data = nycCensus, color = "grey", fill = "transparent") +
geom_sf(data = buffer_around_bikestations, fill = "red", alpha = 0.3) +
geom_sf(data = bikeStations, alpha = 0.3) +
labs(title = "Study Area",
subtitle = "Red area denotes the area of study \nwhile black dots denote the bike stations") +
mapTheme()
# Getting the new boundaries
nycCensus <- nycCensus %>% .[buffer_around_bikestations, ]
nycBoundary <- st_union(nycCensus)
p2 <- ggplot() +
geom_sf(data = nycCensus, color = "grey", fill = "transparent") +
stat_density2d(data = data.frame(st_coordinates(bikeStations)),
aes(X, Y, fill = after_stat(level), alpha = after_stat(level)),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_viridis() +
scale_alpha(range = c(0.00, 0.35), guide = "none") +
labs(title = "Density of bike stations") +
mapTheme(title_size = 14) + theme(legend.position = "none")
grid.arrange(ncol = 2, p1, p2, bottom="Figure 1: Area of study: parts of New York, Kings and Queens county")
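The buffer distance used to define the study area is interpreted in the units of the layer's CRS, so it is worth making the conversion explicit. The sketch below is a minimal illustration in base R; the helper names (`miles_to_feet`, `miles_to_metres`, `buffer_distance`) are hypothetical and not part of `sf`. A quarter mile is 1,320 ft or roughly 402 m, and the projection string defined earlier declares `+units=m`.

```r
# Unit conversion helpers (1 mile = 5280 ft = 1609.344 m exactly).
miles_to_feet   <- function(miles) miles * 5280
miles_to_metres <- function(miles) miles * 1609.344

quarter_mile_ft <- miles_to_feet(0.25)
quarter_mile_m  <- miles_to_metres(0.25)

# Hypothetical helper: pick the buffer distance that matches the CRS units.
buffer_distance <- function(miles, crs_units = c("m", "ft")) {
  crs_units <- match.arg(crs_units)
  if (crs_units == "m") miles_to_metres(miles) else miles_to_feet(miles)
}

buffer_distance(0.25, "m")   # distance to use with a metre-based CRS
buffer_distance(0.25, "ft")  # distance to use with a foot-based CRS
```

Passing the foot value to a metre-based CRS would produce a buffer of roughly 0.8 miles rather than 0.25 miles, which enlarges the set of selected census tracts.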
Figure 2 below shows the weather patterns for May and early June 2018. As can be seen, May is one of New York City’s loveliest months, with temperatures ranging from 50 to 80 F: warm but not hot, and perfect for a bike ride. There were also a few spells of precipitation during this period, which might affect the demand for bike share rides.
# Getting the weather data
weather.Data <-
riem_measures(station = "NYC", date_start = "2018-05-01", date_end = "2018-06-04")
# Processing weather data to get hourly metrics
weather.Panel <-
weather.Data %>%
mutate_if(is.character, list(~replace(as.character(.), is.na(.), "0"))) %>%
replace(is.na(.), 0) %>%
mutate(interval60 = ymd_h(substr(valid, 1, 13))) %>%
mutate(week = week(interval60),
dotw = wday(interval60, label=TRUE)) %>%
group_by(interval60) %>%
summarize(Temperature = max(tmpf),
Percipitation = sum(p01i),
Wind_Speed = max(sknt)
)
# Plotting
grid.arrange(top = "Weather Data - NYC - May & June, 2018",
ggplot(weather.Panel, aes(interval60,Percipitation)) + geom_line() +
labs(title="Precipitation", x="Hour", y="Precipitation") + plotTheme(),
ggplot(weather.Panel, aes(interval60,Wind_Speed)) + geom_line() +
labs(title="Wind Speed", x="Hour", y="Wind Speed") + plotTheme(),
ggplot(weather.Panel, aes(interval60,Temperature)) + geom_line() +
labs(title="Temperature", x="Hour", y="Temperature") + plotTheme(),
bottom = "Figure 2: Weather patterns across the months")
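The hourly aggregation above can be illustrated on a few made-up observations. This is a minimal base R sketch, not the authors' pipeline: the three rows are hypothetical values, and the column names `valid`, `tmpf` and `p01i` simply mirror those used in the weather code. Each timestamp is truncated to its hour, then temperature is summarized with `max` and precipitation with `sum`, as in the panel above.

```r
# Hypothetical sub-hourly observations (two reports in hour 00, one in hour 01).
obs <- data.frame(
  valid = as.POSIXct(c("2018-05-01 00:51:00", "2018-05-01 00:58:00",
                       "2018-05-01 01:51:00"), tz = "UTC"),
  tmpf  = c(60, 62, 59),
  p01i  = c(0.00, 0.01, 0.00)
)

# Truncate each timestamp to the start of its hour (kept as a character key).
obs$interval60 <- format(obs$valid, "%Y-%m-%d %H:00", tz = "UTC")

# Summarize per hour: maximum temperature, total precipitation.
temp   <- aggregate(tmpf ~ interval60, data = obs, FUN = max)
prcp   <- aggregate(p01i ~ interval60, data = obs, FUN = sum)
hourly <- merge(temp, prcp, by = "interval60")
hourly
```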
Following our data gathering process, we next jump into exploring the characteristics of our bike share dataset.
First, we look at the distribution of trip counts across stations and hours of the day. Figure 3 below shows that this distribution is right-skewed: most station-hours have low trip counts (many of them close to 0), and only a few have high trip counts. From a re-balancing point of view, this suggests focusing our attention on certain bike stations at certain times of the day, when demand for bikes is high.
# Extract time related features from the ride data
ride2 <-
ride %>%
mutate(interval60 = floor_date(ymd_hms(starttime), unit = "hour"),
interval15 = floor_date(ymd_hms(starttime), unit = "15 mins"),
week = week(interval60),
dotw = wday(interval60, label=TRUE),
start.station.id = as.character(start.station.id),
end.station.id = as.character(end.station.id))
# Bike share trips per hr by station
ggplot(ride2 %>%
group_by(interval60, start.station.id) %>%
tally())+
geom_histogram(aes(n), binwidth = 5)+
labs(title="Bike share trips per hr by station. NYC, May-June, 2018",
x="Trip Counts",
y="Number of Station-Hours",
caption = "Figure 3")+
plotTheme()
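The shape of the distribution in Figure 3 can also be checked numerically. The sketch below computes the moment-based sample skewness in base R; the `skewness` helper and the toy `counts` vector are illustrative assumptions, not values from the Citi Bike data. A positive value indicates a long right tail, that is, many low counts and a few very high ones.

```r
# Moment-based sample skewness: third central moment over variance^(3/2).
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical hourly trip counts for a handful of station-hours.
counts <- c(1, 1, 1, 2, 2, 3, 4, 6, 15, 40)

skewness(counts) > 0   # TRUE for a right-skewed sample
```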
To dig deeper, we break Figure 3 down by time of day: morning rush (7AM-10AM), mid-day (10AM-3PM), evening rush (3PM-7PM) and overnight (7PM-7AM). Figure 4 shows that the distributions of trips during the morning and evening rush hours have heavier right tails, that is, more demand, than the mid-day and overnight periods. This makes intuitive sense: the high demand coincides with the times when people travel to and from work and would want to use bikes to avoid traffic, especially in a crowded city like New York.
From a re-balancing point of view, this plot suggests that overnight is the most favorable time to move bikes: demand is low and traffic is light, which favors trucks. We could also forecast demand for the entire next day (as we will see in Section 5.1 with one of the predictive models, built using just the 1-day lag variable) and plan to re-balance the bikes at each station by, say, the average forecasted demand.
# Mean Number of Hourly Trips Per Station
ride2 %>%
mutate(time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush"))%>%
group_by(interval60, start.station.id, time_of_day) %>%
tally()%>%
group_by(start.station.id, time_of_day)%>%
summarize(mean_trips = mean(n))%>%
ggplot()+
geom_histogram(aes(mean_trips), binwidth = 1)+
labs(title="Mean Number of Hourly Trips Per Station. NYC, May-June, 2018",
x="Number of trips",
y="Frequency",
caption = "Figure 4")+
facet_wrap(~time_of_day)+
plotTheme()
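The period boundaries are easy to get wrong at the edges, so a compact lookup can help verify them. The following base R sketch reproduces the same binning as the `case_when()` call above using `findInterval()`; the function name `time_of_day` is a hypothetical helper. Note that hour 18 (6:00PM-6:59PM) falls in "PM Rush" and "Overnight" starts at hour 19.

```r
# Map an hour of day (0-23) to a period, matching the case_when() rules:
#   hour < 7 or hour > 18 -> Overnight, 7-9 -> AM Rush,
#   10-14 -> Mid-Day, 15-18 -> PM Rush.
time_of_day <- function(hour) {
  labels <- c("Overnight", "AM Rush", "Mid-Day", "PM Rush", "Overnight")
  # findInterval() returns 0 below the first break and 4 at or above the last.
  labels[findInterval(hour, c(7, 10, 15, 19)) + 1]
}

data.frame(hour = c(0, 6, 7, 9, 10, 14, 15, 18, 19, 23),
           period = time_of_day(c(0, 6, 7, 9, 10, 14, 15, 18, 19, 23)))
```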
Next, we look at how bike share trips vary across the days of the week and the time of day. Figure 5 below shows that bike share demand is higher on weekdays than on weekends. On weekdays, there are two pronounced peaks in demand, (i) between 5AM and 10AM and (ii) between 3PM and 8PM, which roughly correspond to the morning and evening rush categories in Figure 4. Overall, this plot indicates that re-balancing efforts should focus on weekdays, especially during these two peaks.
# Bike share trips by day of the week
p3 <- ggplot(ride2 %>% mutate(hour = hour(starttime)))+
geom_freqpoly(aes(hour, color = dotw), binwidth = 1)+
labs(title="Bike share trips in NYC\nby day of the week",
subtitle = "NYC, May-June 2018",
x="Hour",
y="Trip Counts")+
plotTheme() +
theme(legend.position="bottom")
# Bike share trips - weekend vs weekday,
p4 <- ggplot(ride2 %>%
mutate(hour = hour(starttime),
weekend = ifelse(dotw %in% c("Sun", "Sat"), "Weekend", "Weekday")
)
)+
geom_freqpoly(aes(hour, color = weekend), binwidth = 1)+
labs(title="Bike share trips in NYC\nWeekend vs Weekday",
subtitle = "NYC, May-June 2018",
x="Hour",
y="Trip Counts")+
plotTheme() +
theme(legend.position="bottom")
grid.arrange(ncol=2, p3, p4, bottom="Figure 5: Bike share trips across days of week")
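The weekend flag above compares day labels against "Sun" and "Sat", which relies on English day names. As a hedged alternative, the base R sketch below uses the numeric weekday from `as.POSIXlt()` (0 for Sunday through 6 for Saturday), which does not depend on the locale. The helper name `is_weekend` and the sample dates are illustrative assumptions.

```r
# TRUE for Saturdays and Sundays; POSIXlt$wday is 0 (Sunday) to 6 (Saturday).
is_weekend <- function(x) {
  as.POSIXlt(x)$wday %in% c(0, 6)
}

# Friday to Monday in the first full week of May 2018.
dates <- as.Date(c("2018-05-04", "2018-05-05", "2018-05-06", "2018-05-07"))
ifelse(is_weekend(dates), "Weekend", "Weekday")
```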
To build on the findings from Figure 5, we next look at where demand is concentrated in space. Figure 6 below shows that on weekdays, across almost all time periods, demand for bikes is higher in Upper West Manhattan, Midtown Manhattan and Downtown Manhattan than elsewhere in the city. From a re-balancing point of view, this indicates that we need to focus on Manhattan to cater to this higher demand.
Given the re-balancing resources mentioned earlier, setting up corrals in these places could be a good way to ensure that enough bikes are available. Trucks can be used to re-balance the bike stations overnight, and trikes can then be used during the morning and evening rush hours to move bikes from areas of low demand to Manhattan.
ggplot()+
geom_sf(data = nycCensus, color = "grey", fill = "transparent")+
geom_sf(data = ride2 %>%
mutate(hour = hour(starttime),
weekend = ifelse(dotw %in% c("Sun", "Sat"), "Weekend", "Weekday"),
time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush"))%>%
group_by(start.station.id, start.station.latitude, start.station.longitude, weekend, time_of_day) %>%
tally() %>%
st_as_sf(coords = c("start.station.longitude", "start.station.latitude"), crs = 4326, agr = "constant") %>%
st_transform(crs=crs)
,
aes(color = n),
fill = "transparent", alpha = 0.6, size = 0.5)+
scale_colour_viridis(direction = -1,
discrete = FALSE, option = "D")+
facet_grid(weekend ~ time_of_day)+
labs(title="Bike share trips per hr by station",
subtitle = "NYC, May-June, 2018",
caption = "Figure 6")+
mapTheme()
Next, we explore how demand at a station depends on its demand in earlier time periods. Figure 7 shows that demand at a bike station in the previous hour is highly correlated with demand in the current hour, and that the correlation weakens with each additional hour of lag. Demand at the same time on the previous day is similarly informative. For re-balancing, this indicates that both the previous hour and the previous day are good indicators of expected current demand. Relying on the previous hour is operationally challenging, however, because the re-balancing would then have to be completed within an hour.
Note: We do not include a holiday lag variable here. Memorial Day is the only holiday in the selected time frame, so it would not be possible to check how our predictive model performs around holiday demand patterns.
study.panel <-
expand.grid(interval60 = unique(ride2$interval60),
start.station.id = unique(ride2$start.station.id)) %>%
mutate(start.station.id = as.character(start.station.id))
# Aggregating Trip by Bike Station
ride.panel <-
ride2 %>%
select(start.station.id, interval60) %>%
mutate(Trip_Counter = 1) %>%
right_join(study.panel) %>%
group_by(start.station.id, interval60) %>%
summarize(Trip_Count = sum(Trip_Counter, na.rm=T)) %>%
left_join(weather.Panel, by = "interval60") %>%
left_join(bikeStations) %>%
mutate(week = week(interval60),
dotw = wday(interval60, label = TRUE)) %>%
st_sf()
# Get lag features
ride.panel <-
ride.panel %>%
arrange(start.station.id, interval60) %>%
group_by(start.station.id) %>%
mutate(lagHour = dplyr::lag(Trip_Count,1),
lag2Hours = dplyr::lag(Trip_Count,2),
lag3Hours = dplyr::lag(Trip_Count,3),
lag4Hours = dplyr::lag(Trip_Count,4),
lag12Hours = dplyr::lag(Trip_Count,12),
lag1day = dplyr::lag(Trip_Count,24)) %>%
ungroup()
# Plotting
plotData.lag <-
filter(as.data.frame(ride.panel), week == 19) %>%
dplyr::select(starts_with("lag"), Trip_Count) %>%
gather(Variable, Value, -Trip_Count) %>%
mutate(Variable = fct_relevel(Variable, "lagHour","lag2Hours","lag3Hours",
"lag4Hours","lag12Hours","lag1day"))
correlation.lag <-
group_by(plotData.lag, Variable) %>%
summarize(correlation = round(cor(Value, Trip_Count, use = "complete.obs"), 2)) %>%
mutate(correlation = paste0("R = ", correlation))
plotData.lag %>%
ggplot(aes(x = Value, y = Trip_Count)) +
geom_point() +
geom_smooth(method = "lm") +
facet_wrap(~Variable) +
geom_text(data = correlation.lag, mapping = aes(x = -Inf, y = Inf, label = correlation), hjust = -0.1, vjust = 1.25, color="blue") +
labs(
title = "Bike share trip count as a function of time lags",
subtitle = "One week in May 2018",
x = "Lag Trip Count",
caption = "Figure 7"
) +
plotTheme()
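The two ideas behind the panel above, filling in station-hours with no rides as zeros and lagging trip counts within each station, can be sketched on a toy example. This is a minimal base R illustration with hypothetical data; `lag_by` is an assumed helper standing in for `dplyr::lag()`, and the column names are simplified.

```r
# Hypothetical toy trips: one row per ride (station id and hour index).
trips <- data.frame(station = c("A", "A", "B", "A"),
                    hour    = c(1L, 1L, 2L, 3L))

# 1. Full station x hour grid, so hours without rides are still present.
panel <- expand.grid(station = c("A", "B"), hour = 1:3,
                     stringsAsFactors = FALSE)

# 2. Count rides per station-hour, join onto the grid, fill gaps with 0.
counts <- aggregate(n ~ station + hour, data = transform(trips, n = 1), FUN = sum)
panel  <- merge(panel, counts, by = c("station", "hour"), all.x = TRUE)
panel$n[is.na(panel$n)] <- 0

# 3. Lag within each station: sort first, then shift counts down by one row.
lag_by <- function(x, n) c(rep(NA, n), x)[seq_along(x)]
panel  <- panel[order(panel$station, panel$hour), ]
panel$lag1 <- ave(panel$n, panel$station, FUN = function(x) lag_by(x, 1))
panel
```

Sorting and grouping by station before lagging matters: without it, the first hour of one station would inherit the last hour of the previous station.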
The animation in Figure 8 is a good summary of our findings so far; one can almost see the city breathing throughout the day. The two spikes in demand, during the morning and evening rush hours, are clearly visible, with relatively more demand in Manhattan.
ride.animation.data <- filter(ride.panel , week == 19 & dotw == "Mon") %>%
mutate(Trips = case_when(Trip_Count == 0 ~ "0 trips",
Trip_Count > 0 & Trip_Count <= 5 ~ "1-5 trips",
Trip_Count > 5 & Trip_Count <= 10 ~ "6-10 trips",
Trip_Count > 10 & Trip_Count <= 15 ~ "11-15 trips",
Trip_Count > 15 ~ "16+ trips")) %>%
mutate(Trips = fct_relevel(Trips, "0 trips","1-5 trips","6-10 trips",
"11-15 trips","16+ trips"))
rideshare_animation <-
ggplot() +
geom_sf(data = nycCensus, color = "grey", fill = "transparent") +
geom_sf(data = ride.animation.data, aes(color = Trips)) +
scale_color_manual(values = palette5) +
labs(title = "Bike share pickups for a Monday in May 2018",
subtitle = "60 minute intervals: {current_frame}",
caption = "Figure 8") +
transition_manual(interval60) +
mapTheme()
animate(rideshare_animation, duration=20, renderer = gifski_renderer())
Next, we look at some of the weather-related features we extracted in Section 3.3, specifically precipitation and temperature. As Figure 9 shows, the average number of trips is generally higher in hours without rain or snow, which makes intuitive sense. From a re-balancing point of view, this indicates that re-balancing efforts can be relaxed when rain or snow is likely.
# Plotting effect of precipitation of bike share rides
st_drop_geometry(ride.panel) %>%
group_by(interval60) %>%
summarize(Trip_Count = mean(Trip_Count),
Percipitation = first(Percipitation)) %>%
mutate(isPercip = ifelse(Percipitation > 0,"Rain/Snow", "None")) %>%
drop_na() %>%
group_by(isPercip) %>%
summarize(Mean_Trip_Count = mean(Trip_Count)) %>%
ggplot(aes(isPercip, Mean_Trip_Count)) + geom_bar(stat = "identity") +
labs(title="Does ridership vary with precipitation?",
x="Precipitation", y="Mean Trip Count",
caption = "Figure 9") +
plotTheme()
Figure 10 shows how the number of trips varies with temperature: trips increase as the temperature increases, across all weeks. In simple terms, people tend to use the bike share system more when the temperature is pleasant (say around 70 F) than when it is colder (say around 50 F). From a re-balancing point of view, this calls for stepping up re-balancing efforts when the day’s weather forecast is favorable for a bike ride.
# Plotting how bike share rides vary across temperature.
st_drop_geometry(ride.panel) %>%
group_by(interval60) %>%
summarize(Trip_Count = mean(Trip_Count),
Temperature = first(Temperature)) %>%
mutate(week = week(interval60)) %>%
ggplot(aes(Temperature, Trip_Count)) +
geom_point() + geom_smooth(method = "lm", se= FALSE) +
facet_wrap(~week, ncol=8) +
labs(title="Trip Count as a function of Temperature by week",
x="Temperature", y="Mean Trip Count",
caption = "Figure 10") +
plotTheme()
Next, we consider some interesting exposure variables, such as transit stations and universities, and examine how proximity to them alters demand at a bike station. Figure 11 below plots the total trips at each bike station on a typical day against the distance to the nearest such amenity. Both plots reveal a negative correlation: the farther a station is from these amenities, the lower its demand. From a re-balancing point of view, this indicates that our plan needs to focus on areas close to transit stations and universities, where demand is likely to be high.
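nn_function <- function(measureFrom,measureTo,k) {
# Coerce both coordinate sets to matrices before the nearest-neighbour search
measureFrom_Matrix <-
as.matrix(measureFrom)
measureTo_Matrix <-
as.matrix(measureTo)
# Distances from each "from" point to its k nearest "to" points
nn <-
get.knnx(measureTo_Matrix, measureFrom_Matrix, k)$nn.dist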
output <-
as.data.frame(nn) %>%
rownames_to_column(var = "thisPoint") %>%
gather(points, point_distance, V1:ncol(.)) %>%
arrange(as.numeric(thisPoint)) %>%
group_by(thisPoint) %>%
summarize(pointDistance = mean(point_distance)) %>%
arrange(as.numeric(thisPoint)) %>%
dplyr::select(-thisPoint) %>%
pull()
return(output)
}
st_c <- st_coordinates
nycBoundary.4326 <- nycBoundary %>% st_transform(crs=4326)
xmin <- st_bbox(nycBoundary.4326)[[1]]
ymin <- st_bbox(nycBoundary.4326)[[2]]
xmax <- st_bbox(nycBoundary.4326)[[3]]
ymax <- st_bbox(nycBoundary.4326)[[4]]
bbox <- c(xmin, ymin, xmax, ymax)
# Transit station
transit_station <- opq(bbox = bbox) %>%
add_osm_feature(key = 'railway', value = c("station","subway")) %>%
osmdata_sf()
transit_station <-
transit_station$osm_points %>%
.[nycBoundary.4326,] %>%
st_transform(crs=crs) %>%
mutate(Legend = "Transit_Station") %>%
dplyr::select(Legend)
# University
university <- opq(bbox = bbox) %>%
add_osm_feature(key = 'amenity', value = c("university")) %>%
osmdata_sf()
university <-
university$osm_points %>%
.[nycBoundary.4326,] %>%
st_transform(crs=crs) %>%
mutate(Legend = "University") %>%
dplyr::select(Legend)
# Nearest features
ride.panel <- ride.panel %>%
mutate(
TransitStation.nn = nn_function(st_c(ride.panel), st_c(transit_station),1),
University.nn = nn_function(st_c(ride.panel), st_c(university),1)
)
amenities <- st_drop_geometry(ride.panel) %>%
filter(week == 20, dotw == "Tue") %>%
select(start.station.id, TransitStation.nn, University.nn) %>%
distinct()
# Plotting
st_drop_geometry(ride.panel) %>%
filter(week == 20, dotw == "Tue") %>%
group_by(start.station.id) %>%
summarise(
total_trips = sum(Trip_Count)
) %>%
left_join(amenities) %>%
gather(Variable, Value, -start.station.id, -total_trips) %>%
ggplot(aes(Value, total_trips)) +
geom_point(size = .5) + geom_smooth(method = "lm", se=F, colour = "#FA7800") +
facet_wrap(~Variable, ncol = 2, scales = "free") +
labs(title = "Bike share demand as a function of distance to amenities",
subtitle = "Based on bike share demand data for Tuesday of Week 20",
x = "Distance to nearest amenity",
y = "Trip Count",
caption = "Figure 11") +
plotTheme()
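With `k = 1`, the nearest-neighbour feature used above reduces to the distance from each bike station to its single closest amenity. The brute-force base R sketch below shows that computation on hypothetical projected coordinates; `nearest_distance` is an illustrative helper, not the `FNN`-based function from the analysis, and it scales poorly compared with a k-d tree search.

```r
# Distance from each row of `from` to its closest row of `to`.
# Both arguments are two-column numeric matrices of projected x/y coordinates.
nearest_distance <- function(from, to) {
  apply(from, 1, function(p) {
    min(sqrt((to[, 1] - p[1])^2 + (to[, 2] - p[2])^2))
  })
}

# Hypothetical coordinates in CRS units.
stations <- rbind(c(0, 0), c(10, 0))
transit  <- rbind(c(3, 4), c(10, 2))

nearest_distance(stations, transit)
```

For the first station the closest transit point is (3, 4), a 3-4-5 triangle away; for the second it is (10, 2), directly above it.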
We believe that our model performs reasonably well in predicting the demand for bikes. The analysis above shows that the model mostly underpredicts demand in Manhattan, especially during the rush hours, which are respectively the peak place and time for bike demand. This should not deter us from using the model for the re-balancing plan. From the error analysis above, we know that in most of these Manhattan areas we underpredict by 2-5 trips on average during rush hours. We could therefore load a buffer of 2-5 bikes onto the trikes in addition to the predicted demand, since trikes are a more convenient and quicker way of moving bikes than trucks during peak demand and traffic. Alternatively, we could set up corrals at the bike stations in Manhattan, which allow stations to hold more bikes than usual. In that case, it would be a good idea to store an additional 10-15 bikes at these stations overnight, so that no hourly buffer needs to be transported by trikes or trucks, reducing our transportation costs. For overnight transportation we can use trucks: demand and traffic are at their lowest, and trucks carry a larger load of bikes more conveniently and quickly than trikes.
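The buffer idea can be written down as a simple staging rule: round the predicted demand up, add a fixed buffer, and cap the result at the station's capacity. The base R sketch below is only an illustration of that rule; the function name `bikes_to_stage`, the default buffer of 3 and the capacity value are assumptions, not parameters estimated in this analysis.

```r
# Bikes to have on hand at a station for the coming hour:
# predicted demand (rounded up, never negative) plus a safety buffer,
# capped at the station's capacity (docks plus any corral space).
bikes_to_stage <- function(predicted, buffer = 3, capacity = Inf) {
  pmin(ceiling(pmax(predicted, 0)) + buffer, capacity)
}

# Hypothetical predictions for three stations with 12 slots each.
bikes_to_stage(c(4.2, 11.7, 0), buffer = 3, capacity = 12)
```

A corral raises `capacity`, which is what allows the larger overnight buffer of 10-15 bikes described above to fit at a station.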
At the same time, this is only a temporary solution, and we need to capture the factors that could correct the model’s underprediction. For example, Upper Manhattan is an uphill area, and people prefer to rent bikes from these stations rather than return them there, which raises its relative demand for bikes.
Our model would also fail when something unprecedented happens. The best example is the current pandemic. What if bike ridership surges dramatically after officials urge the public to avoid public transportation, and then plunges just as dramatically when the Governor issues a stay-at-home order? Cases like these show how vulnerable our model, and hence its usefulness for the re-balancing plan, is to such shocks.