2022
Public Transportation
Challenge provided by the City of Porto and Associação Porto Digital

Predicting people's flow for public transport improvements

Dashboards and apps that could improve the quality of public transportation service, increasing the number of users and reducing the use of cars.

As cities grow, the flow of people moving to and from them increases. While many people travel by public transport, some do not because they find the service level unacceptable [1]. Driving has its own cost: according to TomTom’s Traffic Index [2], a person who drives every day during rush hours (the typical commute time) instead of non-rush hours spends an extra four whole days inside their car. By contrast, public transportation was found to produce 95% less carbon dioxide and 92% fewer volatile organic compounds than cars [3].

Therefore, by increasing public transportation usage, it is possible to decrease pollution, improve public health and increase the general well-being of the population. While many factors affect public transportation usage, the quality of coverage is crucial. With access to mobility and public transportation data, there is an opportunity to optimize the public transportation system.

Goal

The goal of the challenge was to study how the city's public infrastructure can be improved to help reduce traffic and improve the quality of life for its citizens. To this end, teams were asked to create a model that predicts the inflow and outflow of people to and from the municipality of Porto in the short term (days) and the long term (months).

United Nations SDG 

GOAL 11: Sustainable Cities and Communities

  • Target 11.2: Provide access to safe, affordable, accessible, and sustainable transport systems for all.

Datasets

  • Entry and Exit validation data from public transportation in the Metropolitan Area of Porto, provided by Associação Porto Digital.
  • Origin-Destination (OD) matrices of Movement of People from/to the Porto Metropolitan Area, provided by Associação Porto Digital.
  • GTFS from Porto’s Metro and Public Bus System, provided by Associação Porto Digital.

Data

The teams identified data quality issues and highlighted the importance of clean datasets for increasing the model's performance. The issues identified were missing values and values that were clearly out of distribution. The teams also noted that finer spatial resolution in the mobility data would have been valuable: the origin-destination matrices were available only at the municipality level, which is a vast area to optimize.
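The two kinds of quality issue mentioned above can be illustrated with a small check. This is only a sketch, not the teams' actual cleaning code: the function name, the sample counts, and the choice of a median-based (modified z-score) rule with a 3.5 cut-off are all assumptions made for illustration.

```python
from statistics import median

def flag_quality_issues(counts, threshold=3.5):
    """Return (missing, outliers): indices of missing and out-of-distribution values.

    counts: validation counts per time slot; None marks a missing value.
    threshold: cut-off on the modified z-score (3.5 is a common rule of thumb,
    assumed here, not taken from the challenge).
    """
    missing = [i for i, v in enumerate(counts) if v is None]
    present = [v for v in counts if v is not None]
    if not present:
        return missing, []
    med = median(present)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in present)
    outliers = []
    if mad > 0:
        for i, v in enumerate(counts):
            if v is not None and 0.6745 * abs(v - med) / mad > threshold:
                outliers.append(i)
    return missing, outliers

# Hypothetical hourly validation counts, not real data.
counts = [120, 131, None, 125, 118, 9000, 127, 122]
missing, outliers = flag_quality_issues(counts)
print(missing, outliers)  # [2] [5]
```

A median-based rule is used in this sketch because a single extreme value would inflate a mean and standard deviation enough to hide itself.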

Besides the provided data, the teams included data regarding big events in the city, holidays, COVID restrictions, and weather conditions. 
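As a rough illustration of how such external information can be attached to each day of the series, the sketch below builds a small feature dictionary. The feature names, the `calendar_features` helper, and the example dates are hypothetical and are not taken from the teams' work; real holiday, event, and weather data would come from official sources.

```python
from datetime import date

# Illustrative calendars only (assumed dates, not from the challenge data).
HOLIDAYS = {date(2022, 6, 10), date(2022, 6, 24)}
EVENTS = {date(2022, 6, 23)}

def calendar_features(day, rain_mm=0.0):
    """Build exogenous features for one day of the passenger-flow series."""
    return {
        "day_of_week": day.weekday(),       # Monday = 0, Sunday = 6
        "is_weekend": day.weekday() >= 5,
        "is_holiday": day in HOLIDAYS,
        "is_event": day in EVENTS,
        "rain_mm": rain_mm,                 # weather value supplied by the caller
    }

features = calendar_features(date(2022, 6, 24), rain_mm=1.2)
print(features)
```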

Methods and Techniques

The teams used different approaches to predict the flow of people to and from the city. Some focused on classical time-series approaches such as ARIMA, and one team carried out an extensive analysis of seasonality and stationarity using the Dickey-Fuller test. Other teams focused on deep learning techniques such as autoregressive Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks.
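To give a feel for the classical approach, here is a deliberately simplified stand-in for the ARIMA family, not any team's model: the series is differenced at the seasonal lag (the "integrated" idea, assuming a weekly period of 7 days) and a no-intercept AR(1) is fitted to the differences by least squares. The function name and the synthetic series are assumptions for illustration; a real analysis would use a proper statistical library and a stationarity test first.

```python
def seasonal_ar1_forecast(series, season=7):
    """One-step-ahead forecast: seasonal differencing plus a no-intercept AR(1)."""
    if len(series) < season + 2:
        raise ValueError("need at least season + 2 observations")
    # Seasonal differencing removes the repeating weekly pattern.
    diffs = [series[t] - series[t - season] for t in range(season, len(series))]
    # Least-squares estimate of phi in d_t = phi * d_(t-1) + noise.
    num = sum(a * b for a, b in zip(diffs[1:], diffs[:-1]))
    den = sum(b * b for b in diffs[:-1])
    phi = num / den if den else 0.0
    next_diff = phi * diffs[-1]
    # Undo the differencing: add the forecast difference to the value one season back.
    return series[len(series) - season] + next_diff

# Synthetic daily flow: a weekly pattern plus a steady upward trend (not real data).
weekly = [100, 110, 105, 108, 120, 60, 50]
series = [weekly[t % 7] + 2 * t for t in range(28)]
forecast = seasonal_ar1_forecast(series)
print(forecast)  # 156.0
```

On this synthetic series the seasonal differences are constant, so the sketch recovers the next value exactly; real passenger data would leave residual noise.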

Main Insights from Data

All teams found that there is seasonality in the data with well-defined peak hours for each day and that the commutes are reasonably predictable. As expected, there is a large flow during the working days and a decrease during weekends, with peaks of flow near 8 AM and 6 PM on weekdays.
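The daily profile described above can be computed by averaging validations per hour of day, separately for weekdays and weekends. The following is a minimal sketch on synthetic data; the function names and the counts are invented for illustration and do not reproduce the teams' analysis.

```python
from collections import defaultdict
from datetime import datetime

def hourly_profile(records):
    """Average validations per hour of day, split into weekday and weekend.

    records: iterable of (timestamp, count) pairs, one per hourly slot.
    """
    totals = defaultdict(float)
    slots = defaultdict(int)
    for ts, count in records:
        key = ("weekend" if ts.weekday() >= 5 else "weekday", ts.hour)
        totals[key] += count
        slots[key] += 1
    return {key: totals[key] / slots[key] for key in totals}

def peak_hours(profile, day_type, top=2):
    """Return the `top` busiest hours for the given day type, in clock order."""
    hours = [(avg, hour) for (kind, hour), avg in profile.items() if kind == day_type]
    return sorted(hour for avg, hour in sorted(hours, reverse=True)[:top])

# Synthetic week (Monday 6 to Sunday 12 June 2022) with made-up counts.
records = []
for day in range(6, 13):
    for hour in range(24):
        ts = datetime(2022, 6, day, hour)
        if ts.weekday() >= 5:
            count = 150
        elif hour == 8:
            count = 900
        elif hour == 18:
            count = 850
        else:
            count = 200
        records.append((ts, count))

profile = hourly_profile(records)
print(peak_hours(profile, "weekday"))  # [8, 18]
```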

One team noted that some of the public transportation was not optimized: while the rate of ticket validations varies, in many cases the rate of available transport does not. An example of this effect can be seen in Figure 1.

Figure 1 - An example of a misalignment between the rate of ticket validation (orange) and the rate of transportation (blue).
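One simple way to quantify the kind of misalignment shown in the figure is to divide validations by scheduled departures for each hour and flag hours where that load is well above typical. The sketch below assumes hypothetical hourly figures, hypothetical function names, and an arbitrary 1.5x threshold; it is not the team's method.

```python
from statistics import median

def load_per_departure(validations, departures):
    """Validations per scheduled departure for each hour; None when no service runs."""
    return {
        hour: (validations[hour] / departures[hour] if departures.get(hour) else None)
        for hour in validations
    }

def crowded_hours(validations, departures, factor=1.5):
    """Hours whose load exceeds `factor` times the median load (assumed threshold)."""
    loads = load_per_departure(validations, departures)
    served = [v for v in loads.values() if v is not None]
    if not served:
        return []
    typical = median(served)
    return sorted(h for h, v in loads.items() if v is not None and v > factor * typical)

# Made-up numbers: demand varies by hour while the supply stays flat.
validations = {7: 300, 8: 900, 9: 400, 12: 250, 18: 850, 21: 100}
departures = {hour: 6 for hour in validations}
print(crowded_hours(validations, departures))  # [8, 18]
```

With a flat timetable, the flagged hours simply follow the demand peaks, which is the pattern the figure illustrates.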

Product

Most of the suggested products targeted the person responsible for planning the city’s public transport system and schedules. The proposed product was a dashboard that predicts the flow of people between different stations and locations in order to optimize scheduling for the upcoming season. One approach was directed specifically at the Metro of Porto, predicting short-term ridership to inform increases or decreases in service frequency or in the number of carriages.

One team suggested an app for bus users that offers commuters an incentive to travel during off-peak periods, which the model would predict. The app could also warn users of an expected increase in usage, informing them of the peak and off-peak hours for their commute. Both suggestions aim to improve service quality and increase the number of users.
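The off-peak suggestion could, for example, be reduced to picking the least busy hour near a commuter's usual departure time. The sketch below is a hypothetical illustration of that idea: the function name, the two-hour window, and the predicted loads are assumptions, and the predictions would in practice come from the forecasting model.

```python
def suggest_departure(predicted_load, usual_hour, window=2):
    """Return the hour within +/- `window` of `usual_hour` with the lowest predicted load."""
    candidates = [h for h in predicted_load if abs(h - usual_hour) <= window]
    if not candidates:
        raise ValueError("no predictions available near the usual departure hour")
    # Ties are broken in favour of the hour closest to the usual departure.
    return min(candidates, key=lambda h: (predicted_load[h], abs(h - usual_hour)))

# Made-up predicted validations per hour for one bus line.
predicted_load = {6: 120, 7: 300, 8: 900, 9: 400, 10: 220}
print(suggest_departure(predicted_load, usual_hour=8))            # 6
print(suggest_departure(predicted_load, usual_hour=8, window=1))  # 7
```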

Social Impact

The teams predicted that implementing the above products could potentially improve the quality of public transportation service, increasing the number of users and reducing the use of cars. Several metrics were proposed to measure the impact:

  • Number of passengers using public transport
  • Number of cars circulating in the city
  • Size of the traffic jams
  • Growth of the usage of public transport
  • Level of satisfaction of the users
  • Level of air quality in the city

While many metrics were proposed, no team estimated how much these metrics would improve if the product were implemented.

References

[1] Nasrudin, Na’asah & Rostam, Katiman & Mohd Noor, Harifah. (2014). Barriers and Motivations for Sustainable Travel Behaviour: Shah Alam residents’ Perspectives. Procedia - Social and Behavioral Sciences. 153. 10.1016/j.sbspro.2014.10.084.

[2] TomTom. (2022). TomTom Traffic Index. Available at: https://www.tomtom.com/en_gb/traffic-index/

World Data League - a competition for data scientists
World Data League @Copyright 2022