Challenge provided by UrbanAI

Predict waste production for its reduction

How can data help the city of Austin achieve its zero-waste goal by 2040?

According to the World Bank [1], cities generated 2.01 billion tonnes of solid waste in 2016, corresponding to 0.74 kg per person per day. With the rapid growth of cities, this number is only expected to increase. This makes it urgent to optimize waste processing and to target public education on waste management and separation. Waste collection also contributes significantly to air pollution [2].

The City of Austin is committed to a zero waste goal to reduce the amount of trash sent to landfills by 90% by 2040 [3]. Zero waste is a philosophy that goes beyond recycling: it focuses first on reducing trash and reusing products and then recycling and composting the rest.


The goal of this challenge was to identify trends in waste production and to generate insights into how to reduce waste and optimize its collection.

United Nations SDG 

GOAL 11: Sustainable Cities and Communities

  • Target 11.6: Reduce the environmental impacts of cities


The following datasets were provided to the participants:

  • Daily waste collection data, provided by the City of Austin
  • Number of inhabitants per year, provided by the City of Austin
  • 2020 Census data, provided by the City of Austin
  • Weather data, provided by OpenWeather


Several teams turned to Austin’s open data portal for additional data that could be useful for this challenge. Examples include the waste collection routes, the recycling collection routes, socioeconomic vulnerability data, data about events and festivals in Austin, and statistics about businesses in the United States.

One team mentioned that having access to datasets with metrics about past and current public policies may have helped correlate policies to socioeconomic cluster factors and waste trends already found. Another team found that street names were extremely inconsistent across open datasets, most likely due to the renaming or merging of different roads.

Methods and Techniques

Data pre-processing mainly revolved around basic data cleaning: removing outliers and impossible values, such as negative collection weights or weights above the maximum capacity of a truck.
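The cleaning rule described above can be sketched as a simple filter. This is a minimal illustration, not any team's actual code: the record fields, the route names, and the capacity limit MAX_TRUCK_LOAD_LB are all hypothetical and would need to be replaced with values from the real dataset and fleet.

```python
from dataclasses import dataclass

# Hypothetical capacity limit; the real value depends on the truck fleet.
MAX_TRUCK_LOAD_LB = 40_000


@dataclass
class Collection:
    route: str             # hypothetical route identifier
    load_type: str         # e.g. garbage, recycling, yard trimmings
    load_weight_lb: float  # weight recorded for one collection trip


def clean_collections(records, max_load_lb=MAX_TRUCK_LOAD_LB):
    """Keep only records whose load weight is physically plausible."""
    return [r for r in records if 0 <= r.load_weight_lb <= max_load_lb]


sample = [
    Collection("R01", "GARBAGE", 12_500.0),
    Collection("R01", "GARBAGE", -300.0),       # negative weight: impossible
    Collection("R02", "RECYCLING", 250_000.0),  # above truck capacity
]
cleaned = clean_collections(sample)
```

Only the first sample record survives the filter. Outlier handling beyond these two hard limits (for example, statistical thresholds) is left out of the sketch.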

One team harmonized the unit of analysis by bringing all data to the census tract level, which required attributing the collected waste to each census tract. A census tract is a geographic region defined for the purpose of taking a census. Tract boundaries sometimes coincide with the limits of cities, towns, or other administrative areas, and a county commonly contains several tracts.

Regarding modeling, most teams used Facebook’s Prophet algorithm for time series forecasting.

One team opted to use time embedding to produce weekly lagged data and feed it to a traditional Machine Learning model, such as XGBoost. Another team used k-means to cluster census tracts by their socioeconomic attributes, generating cohorts of census tracts.
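As a rough illustration of the time embedding idea, the sketch below turns a weekly series into rows of lagged features and a target, which is the tabular shape a model such as XGBoost expects. The function name, the choice of four lags, and the sample values are assumptions made for the example; the source does not say how many lags the team used.

```python
def make_lagged_rows(weekly_totals, n_lags=4):
    """Turn a weekly series into (features, target) rows.

    Each row uses the previous `n_lags` weeks as features and the
    current week as the target.
    """
    rows = []
    for t in range(n_lags, len(weekly_totals)):
        features = weekly_totals[t - n_lags:t]
        target = weekly_totals[t]
        rows.append((features, target))
    return rows


# Made-up weekly waste totals, for illustration only.
weekly = [100.0, 110.0, 105.0, 120.0, 130.0, 125.0]
rows = make_lagged_rows(weekly, n_lags=4)
```

With six weeks of data and four lags this yields two training rows. The feature lists could then be passed to any regression model.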

Main Insights from Data

Several teams pointed out that garbage made up the largest volume of generated waste and that the overall trend was increasing. One team also found that events play a significant role in waste production: March and December are the two months with the highest waste production, coinciding with the SXSW Festival in mid-March and with Christmas and New Year in December. The same team hypothesized that the March peak could also reflect a seasonal trend in yard trimmings, since that type of waste peaks sharply in that month.

Another team pointed out that the recycling-to-garbage ratio has stagnated in recent years, which, combined with a growing population and retail consumption forecasts, could pose a challenge to Austin’s zero waste goal. This team also noticed that in 2020 there were over 5000 cases in which the same garbage route had to be revisited outside the normal service day, which could be due to suboptimal truck allocation. Their solution focused on solving this problem.

Another team performed a cluster analysis of the socioeconomic data and found that the clusters represented distinct populations and were also strongly correlated with geography. For example, one of the clusters represented poor, underdeveloped, and minority-prevalent census tracts, which also happened to be geographically close to each other. Combining this socioeconomic perspective with the waste forecast models, both at census tract granularity, the team found a possible improvement in waste recycling of 17.5 million lb per month if all census tracts reached the same recycling percentage as the best representative of their respective cluster. They also plotted a quadrant view showing, for each cluster normalized by population size, how much waste it produced and how much it recycled.
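The "best representative" estimate can be sketched as follows. This assumes the best representative of a cluster is simply the tract with the highest recycling share, which the source does not confirm; the dictionary keys and the sample tracts are hypothetical.

```python
from collections import defaultdict


def potential_extra_recycling(tracts):
    """Extra recycled weight if every tract matched its cluster's best share.

    `tracts` is a list of dicts with the hypothetical keys
    'cluster', 'total_lb' and 'recycled_lb'.
    """
    # Highest recycling share observed in each cluster.
    best_share = defaultdict(float)
    for t in tracts:
        share = t["recycled_lb"] / t["total_lb"]
        best_share[t["cluster"]] = max(best_share[t["cluster"]], share)

    # Gap between what each tract recycles and what it would recycle
    # at the best share of its own cluster.
    extra = 0.0
    for t in tracts:
        extra += best_share[t["cluster"]] * t["total_lb"] - t["recycled_lb"]
    return extra


# Made-up tracts, for illustration only.
tracts = [
    {"cluster": "A", "total_lb": 1000.0, "recycled_lb": 250.0},  # share 0.25
    {"cluster": "A", "total_lb": 2000.0, "recycled_lb": 400.0},  # share 0.20
    {"cluster": "B", "total_lb": 500.0, "recycled_lb": 100.0},   # share 0.20
]
extra_lb = potential_extra_recycling(tracts)
```

In this toy data only the second tract has room to improve, so the total is the gap between its current recycling and 25% of its total waste. Comparing tracts only within their own cluster keeps the target realistic for each socioeconomic cohort.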

Figure 1 - Four different behaviors in terms of total waste and recycling share for each normalized cluster. Plotting these four quadrants on a map helps understand which regions of Austin have a tendency towards certain types of behavior.
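One way to assign clusters to the four quadrants is to split each axis at its median. The source does not state how the quadrant boundaries were chosen, so the median cut, the function name, and the sample values below are all assumptions.

```python
from statistics import median


def quadrant_labels(clusters):
    """Label each cluster by waste level and recycling level.

    `clusters` maps a cluster name to a tuple of
    (waste per capita in lb, recycling share between 0 and 1).
    """
    waste_cut = median(w for w, _ in clusters.values())
    share_cut = median(s for _, s in clusters.values())

    labels = {}
    for name, (waste, share) in clusters.items():
        waste_level = "high waste" if waste > waste_cut else "low waste"
        recycling_level = (
            "high recycling" if share > share_cut else "low recycling"
        )
        labels[name] = f"{waste_level}, {recycling_level}"
    return labels


# Made-up normalized clusters, for illustration only.
clusters = {
    "A": (50.0, 0.35),
    "B": (80.0, 0.30),
    "C": (40.0, 0.15),
    "D": (90.0, 0.10),
}
labels = quadrant_labels(clusters)
```

Each label can then be used as a color key when plotting the census tracts of a cluster on a map.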

Figure 2 - Map of Austin showing the regions color-coded by their type of behavior regarding waste production and recycling.

Using the forecast model, the team found that, in the current situation, 2022 would have a total recycling weight projected at 7.5 million lb per month. However, if the census tracts followed the behavior of their reference census tracts, an additional monthly 9.63 million lb could be recycled, decreasing the waste sent to landfills.


One team proposed an application, aimed primarily at waste collection facilities, that assists waste collection by planning trips along different routes based on the predictions for each route. The application would suggest when to dispatch a collection truck on a specific route and for a particular load type, based on threshold values applied to the forecasting models. After each collection trip, the measured load weight could be fed back into the application to dynamically improve the schedules for the rest of the year.
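A threshold-based dispatch rule of this kind might look like the sketch below. It assumes waste accumulates on a route until a truck collects it and that a truck is sent once the expected load reaches a fixed share of truck capacity; the 80% threshold, the function names, and the sample forecasts are hypothetical and not taken from the team's application.

```python
def should_dispatch(forecast_lb, accumulated_lb, truck_capacity_lb,
                    threshold=0.8):
    """Dispatch when the expected load reaches a share of truck capacity."""
    expected = accumulated_lb + forecast_lb
    return expected >= threshold * truck_capacity_lb


def plan_route(daily_forecasts_lb, truck_capacity_lb, threshold=0.8):
    """Return the day indices on which a truck would be dispatched."""
    dispatch_days = []
    accumulated = 0.0
    for day, forecast in enumerate(daily_forecasts_lb):
        if should_dispatch(forecast, accumulated, truck_capacity_lb,
                           threshold):
            dispatch_days.append(day)
            accumulated = 0.0  # the truck empties the route
        else:
            accumulated += forecast
    return dispatch_days


# Made-up forecasts for one route and load type, for illustration only.
forecasts = [3000.0, 3000.0, 3000.0, 3000.0, 3000.0]
days = plan_route(forecasts, truck_capacity_lb=10_000.0)
```

The feedback loop described above would replace the forecast for a completed day with the measured load weight before planning the remaining days.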

Other teams suggested an Intelligent Decision Support System for waste policy decisions in Austin, aimed primarily at policymakers. This system would map waste generation in different regions and forecast waste and recycling per region over a tactical or strategic time horizon. It could also generate cohorts of regions based on socioeconomic factors and provide macro-level target metrics based on the performance of the regions. This would enable the system to identify problematic areas, where waste generation is increasing rapidly and recycling performance is low.

Figure 3 - An example of a system that monitors waste production and recycling per region, showing macro-level target metrics.

Social Impact

One outcome of the proposed products would be monitoring and allocating city resources more efficiently, such as allocating garbage collection trucks.

One team suggested several metrics to measure this outcome: number of collection trips in a year, costs saved through better planning of trips, average capacity utilization of trucks, number of trucks added to the fleet per year, and number of new waste separation and recycling facilities. Another team suggested measuring the number of extra collection days saved (i.e., trucks that no longer need to be resent because the planned capacity was insufficient) and the variation in the surplus of allocated trucks (i.e., more trucks allocated to a given area than necessary).
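Two of these metrics are straightforward to compute from trip records, as the sketch below suggests. The record layout (route, scheduled service day, actual collection day) and all sample values are hypothetical; the source does not describe how the teams computed them.

```python
def average_capacity_utilization(trip_loads_lb, truck_capacity_lb):
    """Mean share of truck capacity used per trip (0.0 if no trips)."""
    if not trip_loads_lb:
        return 0.0
    shares = [load / truck_capacity_lb for load in trip_loads_lb]
    return sum(shares) / len(shares)


def count_extra_trips(trips):
    """Count trips made outside the route's scheduled service day.

    `trips` is a list of (route, scheduled_day, actual_day) tuples.
    """
    return sum(1 for _, scheduled, actual in trips if scheduled != actual)


# Made-up trip data, for illustration only.
loads = [5000.0, 7500.0, 10_000.0]
utilization = average_capacity_utilization(loads, truck_capacity_lb=10_000.0)

trips = [
    ("R1", "Mon", "Mon"),
    ("R1", "Mon", "Tue"),  # route revisited outside its service day
    ("R2", "Wed", "Wed"),
]
extra_trips = count_extra_trips(trips)
```

Tracking both numbers before and after a scheduling change gives a simple measure of whether trucks are being allocated more efficiently.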

There is potential to save approximately 6000 trips a year across garbage and single-stream recycling collection. Similar optimization studies have shown large potential savings for civic authorities, in addition to qualitative benefits such as less traffic disruption, less driver fatigue, and less pollution.

Another outcome of these products would be better planning of city policies by making data-driven decisions and implementing educational campaigns that improve recycling efforts by the local population.

To measure this outcome, one team proposed evaluating the curbside recycling share and the curbside total waste share across census tracts, together with the percentage change in the trend of recycling share and total waste. By linking the best-performing representative region to its socioeconomic descriptors and applying similar waste strategies to regions with similar characteristics, the team estimates a monthly reduction of 9.63 million lb (about 4368 tonnes) in incinerated waste. This would also translate into a considerable reduction in air pollution in Austin: avoiding the incineration of 4368 tonnes of waste avoids between 3057 and 7426 tonnes of CO2 emissions [4].
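The arithmetic behind these figures can be checked with a short calculation. The emission factors of 0.7 and 1.7 tonnes of CO2 per tonne of waste are inferred here from the ratio of the quoted figures; they are an assumption of this sketch and should be verified against reference [4].

```python
LB_TO_KG = 0.45359237  # exact definition of the international pound


def lb_to_tonnes(weight_lb):
    """Convert pounds to metric tonnes."""
    return weight_lb * LB_TO_KG / 1000.0


# Monthly reduction quoted in the text.
reduction_lb = 9.63e6
reduction_tonnes = lb_to_tonnes(reduction_lb)

# Assumed emission factors (tonnes of CO2 per tonne of incinerated waste),
# inferred from the quoted range rather than read from the guidance note.
CO2_FACTOR_LOW = 0.7
CO2_FACTOR_HIGH = 1.7

co2_low = reduction_tonnes * CO2_FACTOR_LOW
co2_high = reduction_tonnes * CO2_FACTOR_HIGH
```

The conversion gives roughly 4368 tonnes of waste, and the two factors reproduce the quoted range of about 3057 to 7426 tonnes of CO2 up to rounding.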


[1] World Bank. “A Global Snapshot of Solid Waste Management to 2050”. Available at: https://datatopics.worldbank.org/what-a-waste/

[2] Quintili, A., Castellani, B., 2020. The Energy and Carbon Footprint of an Urban Waste Collection Fleet: A Case Study in Central Italy. MDPI. Available at: https://mdpi.com/2313-4321/5/4/25/pdf

[3] Government of Austin, Texas. “Zero Waste by 2040”. Available at: https://www.austintexas.gov/zerowaste

[4] Environment Agency of the UK Government. “Pollution inventory reporting – incineration activities guidance note”. Available at: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/923125/Pollution-inventory-reporting-incineration-activities-guidance-note.pdf

World Data League - a competition for data scientists
World Data League @Copyright 2022