Challenge provided by UNStudio

Air quality prediction in busy streets

What if city planners could use an application to view the different levels of air pollutants at any given time and predict future ones?

It is estimated that 9 out of 10 people worldwide live in places where air quality exceeds WHO guideline limits [1]. High levels of air pollution put people at risk of diseases such as respiratory infections, lung cancer, and heart disease. Among the most harmful pollutants to health are PM2.5 particles, which can penetrate deep into the lungs.

The Green Mile is a project initiated by UNStudio, Blendingbricks, Heineken, the Rijksmuseum, the Amsterdam University of Applied Sciences, and the Dutch National Bank. It aims to transform Stadhouderskade street in Amsterdam, currently the city's most polluted and busiest street, and the one with the most traffic and pedestrian accidents.

The main sources of pollution are road traffic and industry, and people report feeling the effects of the poor air quality when spending long periods on Stadhouderskade. Unsurprisingly, people are not attracted to spending more time there than strictly necessary [2]. In the Netherlands, death rates attributed to air pollution decreased by approximately 45% between 1990 and 2014 but have plateaued since 2014 [3], which renews the need to protect air quality, not only on Stadhouderskade but everywhere.


The goal of this challenge was to help the project's initiators build a case, and generate buzz, for the needed change in Stadhouderskade street, focusing specifically on the street's current impact on air pollution.

United Nations SDG 

GOAL 11: Sustainable Cities and Communities

  • Target 11.6: Reduce the environmental impacts of cities


The following datasets were provided to the participants:

  • Hourly measurements of different air pollutants at Stadhouderskade, provided by UNStudio
  • Weather data, provided by OpenWeather
  • Open city data, provided by the City of Amsterdam


No team used additional data to solve this challenge; only the provided datasets were used.

Several teams pointed out how intrinsically related air pollution and weather conditions are. Wind, for example, plays a major role in how air pollutants travel, since it transports them. For that reason, one team noted that having hourly air pollution measurements but only daily weather measurements made the data harder to analyze.
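
One way to reconcile the two resolutions is to aggregate the hourly pollutant readings to daily means before joining them with the daily weather records. The sketch below is an illustration, not any team's actual code: it assumes both tables are pandas DataFrames indexed by timestamp, and the column names (`no2`, `wind_speed`) and the helper `align_hourly_with_daily` are hypothetical.

```python
import numpy as np
import pandas as pd


def align_hourly_with_daily(pollution: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Join daily mean pollutant levels with daily weather (hypothetical helper).

    Both inputs are assumed to be indexed by timestamps (DatetimeIndex).
    """
    # Collapse hourly readings to one row per calendar day (mean of available hours)
    daily_pollution = pollution.resample("D").mean()
    # Drop any time-of-day component so weather rows share the same daily keys
    weather = weather.copy()
    weather.index = weather.index.normalize()
    # Keep only days present in both sources
    return daily_pollution.join(weather, how="inner")


# Usage with synthetic data: two days of hourly NO2 and two daily wind readings
hours = pd.date_range("2022-01-01", periods=48, freq="h")
pollution = pd.DataFrame({"no2": np.arange(48, dtype=float)}, index=hours)
weather = pd.DataFrame(
    {"wind_speed": [3.2, 5.1]},
    index=pd.date_range("2022-01-01 12:00", periods=2, freq="D"),
)
print(align_hourly_with_daily(pollution, weather))
```

Aggregating discards the within-day variation that the hourly data captures; upsampling the weather to hourly instead would keep it but would invent sub-daily weather values, so neither option fully removes the limitation the team described.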

Methods and Techniques

All teams started with exploratory data analysis (EDA), examining the descriptive statistics of each variable and their pairwise relations through scatter plots and correlation coefficients. All teams also looked at variations across different time frames and checked for missing data.
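
A minimal sketch of these first EDA steps with pandas, assuming the measurements sit in a timestamp-indexed DataFrame of numeric columns. The helper names are hypothetical; `longest_missing_run` is included because the length of the gaps, not only their number, determines whether interpolating them later is reasonable.

```python
import numpy as np
import pandas as pd


def longest_missing_run(series: pd.Series) -> int:
    """Length of the longest run of consecutive missing values."""
    is_na = series.isna()
    # A new run starts whenever the missing/present status changes
    run_id = (is_na != is_na.shift()).cumsum()
    # Summing the boolean flags gives each NaN run's length (present runs sum to 0)
    run_lengths = is_na.groupby(run_id).sum()
    return int(run_lengths.max()) if len(run_lengths) else 0


def eda_summary(df: pd.DataFrame) -> dict:
    """Descriptive statistics, correlations and a missing-data overview."""
    return {
        # count, mean, std, min, quartiles and max for every column
        "describe": df.describe(),
        # pairwise Pearson correlations (rows with NaN are excluded pair by pair)
        "correlation": df.corr(),
        # fraction of missing observations per column
        "missing_share": df.isna().mean(),
        # longest consecutive gap per column, in rows (hours for hourly data)
        "longest_gap": df.apply(longest_missing_run),
    }
```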

During data cleaning, one team set a maximum threshold for the pollutant variables after identifying unusual or extreme values in the series with a moving average plot. This team filled missing observations by linear interpolation when they were at most one day away from a known observation and discarded the remaining missing values from the analysis. Another team filled null values with a 3-point rolling mean.
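
The two gap-filling strategies could look roughly like this in pandas. This is a sketch under assumptions, not the teams' code: it assumes hourly data on a regular DatetimeIndex (so one day is 24 steps), treats the maximum threshold as a cap applied with `clip`, and reads "at most one day apart from a known observation" as "within 24 steps of a known value on either side". The source gives neither the threshold value nor whether the rolling window was centered, so `max_threshold` and `center=True` are placeholders.

```python
import numpy as np
import pandas as pd


def clean_pollutant(series: pd.Series, max_threshold: float, max_gap_hours: int = 24) -> pd.Series:
    """Cap extreme values, interpolate short gaps, and drop what remains missing.

    Assumes an hourly series on a regular DatetimeIndex, so 24 steps = one day.
    """
    # Cap readings above the (hypothetical) maximum threshold
    capped = series.clip(upper=max_threshold)
    # Linearly fill missing points lying within max_gap_hours of a known value on
    # either side; limit_area="inside" avoids extrapolating past the first/last reading
    filled = capped.interpolate(
        method="linear", limit=max_gap_hours, limit_direction="both", limit_area="inside"
    )
    # Discard observations that are still missing
    return filled.dropna()


def fill_with_rolling_mean(series: pd.Series, window: int = 3) -> pd.Series:
    """Fill NaNs with a centered rolling mean of the surrounding values."""
    # rolling().mean() skips NaNs once min_periods non-missing values are present
    rolling = series.rolling(window, center=True, min_periods=1).mean()
    return series.fillna(rolling)
```

Note that a 3-point window can only fill isolated gaps: a missing value whose two neighbors are also missing stays missing.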

Regarding feature engineering, several teams computed the Common Air Quality Index (CAQI), which provides a unified view of air quality at any given moment based on three of the measured pollutants: nitrogen dioxide (NO2), particulate matter 2.5 (PM2.5), and particulate matter 10 (PM10). One team also analyzed the features with mutual information, permutation importance, and Principal Component Analysis (PCA).
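
The CAQI maps each pollutant's concentration onto a common 0-100 scale by linear interpolation between breakpoints and then takes the highest of these sub-indices. The sketch below assumes the hourly "traffic" (roadside) breakpoint grid as recalled from the CAQI tables; those values should be checked against the official Citeair documentation, and extrapolating the index above 100 is a simplification chosen here.

```python
# Assumed hourly "traffic" (roadside) CAQI grid, concentrations in µg/m³.
# Values recalled from the CAQI tables; verify against the official source before use.
INDEX_BANDS = [(0.0, 25.0), (25.0, 50.0), (50.0, 75.0), (75.0, 100.0)]
CONCENTRATION_BANDS = {
    "no2": [(0.0, 50.0), (50.0, 100.0), (100.0, 200.0), (200.0, 400.0)],
    "pm10": [(0.0, 25.0), (25.0, 50.0), (50.0, 90.0), (90.0, 180.0)],
    "pm25": [(0.0, 15.0), (15.0, 30.0), (30.0, 55.0), (55.0, 110.0)],
}


def sub_index(pollutant: str, concentration: float) -> float:
    """Linear interpolation: I = I_lo + (C - C_lo) * (I_hi - I_lo) / (C_hi - C_lo)."""
    bands = CONCENTRATION_BANDS[pollutant]
    for (c_lo, c_hi), (i_lo, i_hi) in zip(bands, INDEX_BANDS):
        if concentration <= c_hi:
            return i_lo + (concentration - c_lo) * (i_hi - i_lo) / (c_hi - c_lo)
    # Above the last breakpoint ("very high"): extend the top band's slope (a simplification)
    (c_lo, c_hi), (i_lo, i_hi) = bands[-1], INDEX_BANDS[-1]
    return i_lo + (concentration - c_lo) * (i_hi - i_lo) / (c_hi - c_lo)


def caqi(concentrations: dict[str, float]) -> float:
    """Overall index = the highest (worst) pollutant sub-index."""
    return max(sub_index(name, value) for name, value in concentrations.items())


print(caqi({"no2": 75.0, "pm10": 25.0, "pm25": 30.0}))  # 50.0: PM2.5 drives the index
```

Taking the maximum rather than an average means a single pollutant in a poor band is enough to flag poor air quality.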

For time series modeling, several teams tested stationarity with the Dickey-Fuller test, examined autocorrelation, and fitted Autoregressive Integrated Moving Average (ARIMA) or Seasonal ARIMA (SARIMA) models. Others used XGBoost and LightGBM, but their performance was not significantly better.

Main Insights from Data

Several teams found that all air pollutants decreased from 2014 to 2022, by between 11% and 81%. Xylene (81%) and toluene (71%) decreased the most, suggesting that Stadhouderskade was already on the right track toward lower air pollution.
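
The source does not state how these percentages were computed. One simple reading, sketched below with a hypothetical helper, compares the annual mean concentration of the first and last year.

```python
import pandas as pd


def percent_decrease(series: pd.Series, start_year: int, end_year: int) -> float:
    """Relative drop (in %) of the annual mean between two years; positive = decrease."""
    # Average all observations within each calendar year
    annual_mean = series.groupby(series.index.year).mean()
    start, end = annual_mean.loc[start_year], annual_mean.loc[end_year]
    return 100.0 * (start - end) / start


# Usage with a toy series: the annual mean falls from 10 to 2, an 80% decrease
toy = pd.Series(
    [10.0, 10.0, 2.0, 2.0],
    index=pd.to_datetime(["2014-01-01", "2014-07-01", "2022-01-01", "2022-07-01"]),
)
print(percent_decrease(toy, 2014, 2022))
```

Endpoint comparisons are sensitive to unusual years; regressing the annual means on the year is a more robust way to summarize a long-term trend.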

One team used geographical data on the location of outdoor activities around Stadhouderskade to show that there were no running routes on this street and only one sports park in its vicinity. The same team also argued that the street suffers from strong traffic congestion, given the high concentrations of pollutants typically emitted by motor vehicles.

Figure 1 - Map showing Stadhouderskade street (in red) and its surrounding infrastructure (in blue, yellow, and green). There were no running routes on this street, and there was only one sports park in the vicinity of this road.

One team found that, apart from NO2, whose values were lower in the early morning than over the rest of the day, no pollutant showed a clear daily concentration pattern. There does, however, appear to be a yearly pattern: from May to August (summer), pollutant values decrease and the air quality index improves, while in December pollution levels increase considerably.
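
Daily and yearly patterns like these can be checked by averaging each pollutant per hour of day and per calendar month. A sketch with pandas, assuming an hourly Series on a DatetimeIndex; the helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def time_profiles(series: pd.Series) -> tuple[pd.Series, pd.Series]:
    """Mean level per hour of day (0-23) and per calendar month (1-12)."""
    by_hour = series.groupby(series.index.hour).mean()
    by_month = series.groupby(series.index.month).mean()
    return by_hour, by_month


def relative_profile(profile: pd.Series, series: pd.Series) -> pd.Series:
    """Ratio of each profile value to the overall mean (1.0 = average level)."""
    return profile / series.mean()


# Usage: values below 1.0 in the hourly profile flag hours with lower-than-average levels
index = pd.date_range("2021-01-01", "2021-12-31 23:00", freq="h")
toy = pd.Series(index.hour.to_numpy(dtype=float), index=index)
by_hour, by_month = time_profiles(toy)
print(relative_profile(by_hour, toy).head())
```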


To productize the developed algorithms, the vast majority of teams suggested building a dashboard or application that would let city planners view the levels of each air pollutant at any given time, along with predictions for future times.

Figure 2 - Dashboard showing the levels of each air pollutant on different days. The user can also view predictions of when peak values will be reached.

Social Impact

The main outcome of this product would be data-driven changes to city policy. One team suggested the following examples:

  • Issue recommendations to the population when air pollution levels are "high" or "very high", for example, advising people with respiratory diseases to avoid the street at certain times of day.
  • Create a "Low Emission Zone" where, at times of day when air pollution levels are "high" or "very high", only cars registered after 2011 can circulate on Stadhouderskade. Based on the example of Krakow (Poland), car traffic restrictions could lead to 80% user satisfaction with the quality of the public space [4].
  • Optimize and add public green spaces in the vicinity of the street: fewer parking lots and more green areas, gardens, or parks.
  • Build interactive outdoor banners along the street displaying the amount of air pollutants emitted over a given period, which people could filter and visualize in real time.
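
The "high" and "very high" triggers in these suggestions imply a mapping from an index value to qualitative levels. The sketch below assumes the five CAQI bands (very low 0-25, low 25-50, medium 50-75, high 75-100, very high above 100), assigns boundary values to the lower band, and encodes the Low Emission Zone rule as a hypothetical function; none of these names come from the teams' work.

```python
# CAQI-style qualitative bands: upper bound (inclusive) -> label; above 100 is "very high"
LEVEL_BANDS = [(25.0, "very low"), (50.0, "low"), (75.0, "medium"), (100.0, "high")]


def pollution_level(index_value: float) -> str:
    """Map an index value to its qualitative level (boundary values go to the lower band)."""
    for upper, label in LEVEL_BANDS:
        if index_value <= upper:
            return label
    return "very high"


def is_alert(index_value: float) -> bool:
    """True when recommendations or restrictions should apply ("high" or "very high")."""
    return pollution_level(index_value) in {"high", "very high"}


def car_may_enter(registration_year: int, index_value: float, cutoff_year: int = 2011) -> bool:
    """Hypothetical Low Emission Zone rule: during alerts, only cars registered after cutoff_year."""
    if not is_alert(index_value):
        return True
    return registration_year > cutoff_year


print(pollution_level(82.0), car_may_enter(2009, 82.0), car_may_enter(2015, 82.0))
# prints: high False True
```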

Another team proposed two metrics: the number of days with acceptable versus unacceptable pollution levels, and the percentage decrease in air pollution after deploying their product.
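
Both metrics are straightforward to compute once an hourly index series is available. In this sketch, the "acceptable" threshold of 75 (the upper edge of the CAQI "medium" band), the deployment date, and the helper name are assumptions, and each day is classified by its mean index value.

```python
import pandas as pd


def evaluation_metrics(hourly_index: pd.Series, deployment_date: str, threshold: float = 75.0) -> dict:
    """Acceptable/unacceptable day counts and % decrease in the daily mean after deployment."""
    # Classify each day by its mean index value; days with no data are dropped
    daily = hourly_index.resample("D").mean().dropna()
    deployed = pd.Timestamp(deployment_date)
    before = daily[daily.index < deployed]
    after = daily[daily.index >= deployed]
    return {
        "acceptable_days": int((daily <= threshold).sum()),
        "unacceptable_days": int((daily > threshold).sum()),
        # Positive values mean the average level fell after deployment
        "decrease_pct": 100.0 * (before.mean() - after.mean()) / before.mean(),
    }


# Usage with toy data: two days at 80, then two days at 40 after a deployment on 3 January
toy = pd.Series(
    [80.0] * 48 + [40.0] * 48,
    index=pd.date_range("2022-01-01", periods=96, freq="h"),
)
print(evaluation_metrics(toy, "2022-01-03"))
```

A plain before/after comparison does not control for weather or for the downward trend already observed since 2014, so those factors would need to be accounted for before attributing a decrease to the product.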

Another team proposed that city planners could use their analysis to create efficient traffic control policies (e.g., re-routing traffic at certain times of day) or additional anti-pollution policies, such as restricting certain fireworks on New Year's Eve to reduce pollution levels at critical moments. The analysis could also support articles or media campaigns that raise public awareness of the pollution problem.


[1] World Health Organization. “Air pollution”. Available at: https://www.who.int/health-topics/air-pollution#tab=tab_1

[2] IQAir. “Air quality in Amsterdam”. Available at: https://www.iqair.com/netherlands/north-holland/amsterdam

[3] Our World in Data. “Deaths from air pollution, 1990 to 2019”. Available at: https://ourworldindata.org/grapher/air-pollution-deaths-country?tab=chart&country=~NLD

[4] Szarata, A., Nosal, K., Duda-Wiertel, U. and Franek, L., 2017. The impact of the car restrictions implemented in the city centre on the public space quality. Transportation Research Procedia, 27, pp.752-759.

World Data League - a competition for data scientists
World Data League @Copyright 2022