A Demand Forecasting Model Based On Deep Learning: An Interview With Mariia Bulycheva From Zalando

Every day, neural networks become more prominent and integrated into various business domains. They excel at processing big data, and one of their promising applications lies in online retail. Mariia Bulycheva, an Applied Scientist at Zalando, has developed an ML-based demand forecasting model with her team. Their project resulted in a research paper published in Forecasting with Artificial Intelligence: Theory and Applications, part of the Palgrave Advances in the Economics of Innovation and Technology series.

In this interview, Mariia discussed the business challenges addressed by implementing the model, the unique aspects of e-commerce it had to accommodate, and her forecasts for how similar approaches might be adopted across the online retail sector.

Mariia, to start, could you tell us about the business challenges you addressed while developing your demand forecasting model?

The primary business objective at Zalando was to forecast consumer demand to better understand how quickly specific items would sell, enabling the company to optimize pricing strategies. Discounts played a key role here—we needed to calculate how items would sell if a 5%, 10%, or 15% discount were applied. Accurate demand forecasting allowed us to achieve specific revenue growth targets by the end of the year and manage inventory effectively.

Previously, the company used a random forest-based model for this purpose. However, calculating optimal discounts with this approach took around 6-8 hours due to the slow recalculation of the gradients. This delay meant that updated prices often appeared on the website at 4 PM instead of the desired 8 AM, negatively impacting both financial metrics and user experience.

An additional goal was inventory management: ensuring the company could anticipate stock shortages or surpluses. For example, if the model indicated high demand for certain swimsuits, the team could plan to restock them before the end of summer.

Developing the model took about four months, during which we coded the solution and compared it against other approaches. Given the model’s complexity and scale, we conducted three months of A/B testing during the Spring/Summer season. This was followed by a four-month gradual rollout to production, starting with smaller product categories while monitoring results. By the end of the year, we were using the model across all product categories, including the highly popular men’s T-shirts segment. The entire process took approximately 11 months.

What makes demand forecasting more challenging than sales forecasting, and what tools did you use to address these challenges?

Previously, Zalando only forecasted sales and passed this information to the pricing team, which manually estimated demand. These calculations were less accurate because actual sales didn’t always reflect true demand. For instance, if an item sold out during a particular week, demand that week was likely higher than the recorded sales. Switching to demand forecasting allowed the pricing team to determine discounts more precisely.

Sales forecasting is more straightforward because we have concrete data: how many units of a product were sold and at what price. For example, if 100 pairs of warm socks were sold in December and 20 in February, we could infer that demand peaks in December and discounts should follow in late January.

Demand forecasting is trickier as we lack direct data. To train the model, we applied a data augmentation process, using demand data from periods when items were in stock and extrapolating this to dates when stock was unavailable. This provided the pricing team with more accurate demand data. They could then determine discounts based on projected demand (unconstrained by stock limitations) and avoid unnecessary discounts during periods of high demand.
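The interview does not spell out how the extrapolation was done, so the following is only a minimal sketch of the general idea, under the assumption that sales on out-of-stock days are replaced with the average of the in-stock days. The function name `unconstrain_demand` and the sample numbers are hypothetical.

```python
from statistics import mean

def unconstrain_demand(sales, in_stock):
    """Estimate demand by filling out-of-stock days from in-stock days.

    sales:    daily units sold
    in_stock: booleans, True when the item was available that day
    Assumption: the mean of in-stock sales stands in for missing demand.
    """
    observed = [s for s, ok in zip(sales, in_stock) if ok]
    if not observed:
        raise ValueError("no in-stock days to extrapolate from")
    baseline = mean(observed)
    # Keep observed sales; replace stock-constrained days with the baseline.
    return [s if ok else baseline for s, ok in zip(sales, in_stock)]

# Hypothetical week: the item was sold out on days 3 and 4.
sales = [12, 10, 0, 0, 11]
in_stock = [True, True, False, False, True]
demand = unconstrain_demand(sales, in_stock)
print(demand)
```

A production approach would likely use something richer than a flat average, for example seasonality or sales of similar items, but the principle is the same: zero sales on a sold-out day are treated as missing information, not as zero demand.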

What practical results has the implementation of your model delivered?

The pricing team became, on average, 15 times faster at calculating optimal discounts and deploying them on the site. We now adjust discounts twice a week instead of once, allowing us to respond more effectively to changing demand.

Additionally, the model’s accuracy improved significantly. Offline testing showed that the root mean square error (RMSE) of forecasted demand versus actual data decreased by 20.5 percentage points compared to the previous random forest model—a substantial improvement. This enhanced accuracy translates to better financial performance, as metrics like revenue and discount costs rely heavily on demand forecast precision.
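For readers less familiar with the metric, here is a small sketch of how RMSE can be computed and compared between two forecasts. The numbers are made up for illustration, and the interview does not say how the reported figures were normalized, so the relative reduction shown here is just one possible way to express the comparison.

```python
import math

def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(forecast):
        raise ValueError("sequences must have the same length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical weekly demand and two competing forecasts.
actual = [10, 12, 8, 15]
baseline_forecast = [13, 9, 11, 12]  # every forecast is off by 3 units
new_forecast = [11, 11, 9, 14]       # every forecast is off by 1 unit

baseline_rmse = rmse(actual, baseline_forecast)
new_rmse = rmse(actual, new_forecast)
relative_reduction = 1 - new_rmse / baseline_rmse
print(baseline_rmse, new_rmse, round(relative_reduction, 3))
```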

Across all budget categories and countries, the new model consistently outperformed the old one. The RMSE decreased by 5 to 30.5 percentage points for budget categories and by 12 to 42 percentage points across countries. The model is fast and can be retrained weekly with updated data.

We also improved in-season stock management by reducing instances of popular items going out of stock due to underestimated demand. This minimized situations where customers couldn’t purchase desired items, enhancing loyalty.

The architecture of our solution enables us to deliver results equally quickly for any number of SKUs, thanks to parallelized data processing. We further accelerated network training by distributing data computations across eight GPUs. This level of parallelization wouldn’t have been possible with the random forest model due to its inherent limitations. Specifically, its architecture requires certain processes, like tree splits, to be performed recursively, one after the other, making data processing sequential rather than parallel.

Has more accurate demand forecasting improved customer experience? Could your clients notice the difference?

Yes, absolutely. As I mentioned earlier, customers no longer face as many challenges when trying to purchase popular items. Zalando offers products that are in demand year-round—such as classic Adidas sneakers—and these high-demand items used to frequently go out of stock. Naturally, this led to customer dissatisfaction; people couldn’t order what they wanted, and some turned to other platforms instead. Winning those customers back was incredibly difficult because they’d think: “Why should I waste my time on Zalando if I can’t find what I need there?”

Now, such situations are far less frequent, and more customers choose to stay with us.

Can you tell us about the model’s architecture? How does it work?

Our core task is time series forecasting. The training data comprises a sequence of historical sales and discount information, along with contextual data about the items themselves (such as budget categories and seasonal relevance). The sequence of data points—what demand looked like the day before yesterday, yesterday, and today—is critical for accurate forecasting.

For this, a transformer architecture proved to be the optimal choice. It excels in two key areas: first, its ability to process data for thousands of items in parallel, and second, the incredible power of its multi-head attention mechanism to identify relationships within sequential data of any length.
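To make the attention mechanism more concrete, here is a minimal NumPy sketch of a single self-attention head applied to a short sequence of feature vectors. It is a textbook illustration and not Zalando's implementation: the dimensions and random weights are arbitrary assumptions, and causal masking, which a forecasting decoder would normally need, is left out. A multi-head layer runs several such heads side by side and concatenates their outputs.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for one head.

    x: array of shape (seq_len, d_model), one row per time step.
    Returns the attended values and the attention weight matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Softmax over each row, shifted by the row maximum for stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 7, 4, 3  # e.g. 7 time steps, 4 features each
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Every time step attends to every other time step through a single matrix product, which is why the computation parallelizes well and is not tied to processing the sequence one step at a time.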

In our case, we use a year’s worth of sales data as input for the model, which learns to recognize these relationships. For example, it can detect patterns like increased sales leading up to Easter as people prepare for holidays and travel, or a post-Christmas dip in demand when shoppers have already bought gifts and are spending time with family.

Traditionally, transformer models were used for processing text data to solve tasks like language translation, text summarization, answering questions based on a given text, and document classification. Zalando was one of the first companies to apply a transformer to time series forecasting. We deployed the first version of our transformer model to production in December 2019, while references to using transformers for time series forecasting began appearing in academic literature only in 2020.

Why were transformers chosen for this task, and what are their advantages compared to traditional methods like LightGBM and DeepAR?

LightGBM is based on gradient-boosted decision trees, which are not ideal for our task because they don’t account for temporal patterns in the data. For example, if the model detects higher demand in spring and lower demand in winter, it tends to output an arithmetic average, ignoring seasonality.

DeepAR is specifically designed for time series forecasting, but it relies on the previous generation of recurrent neural networks (RNNs). These networks process sequences of data sequentially, step by step. In contrast, our transformer model represents the next generation of architectures, leveraging the multi-head attention mechanism. This enables transformers to process data in any order and efficiently parallelize heavy computations, including distribution across multiple GPUs.

One of the innovations of the model was incorporating the dependency of demand on discounts. How was this achieved, and what results did it deliver?

The relationship between discounts and demand is obvious to humans—higher discounts drive higher demand—but it’s not inherently clear to the model. Earlier versions of our model didn’t account for this monotonic relationship, leading to inaccuracies in specific cases. We trained the model to adjust its forecasts so that demand would increase monotonically with discounts: the higher the discount, the greater the expected demand.

The transformer architecture includes two key components: the encoder and the decoder. After the decoder, we added an extra layer to model the monotonic dependency of demand on discounts. This approach was inspired by an academic research paper that proposed such a technique, so our work was grounded in established scientific findings.
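The interview only states that an extra layer after the decoder enforces monotonicity. One common way to get that guarantee, shown here purely as an assumed illustration, is a piecewise-linear function of the discount whose segment slopes are forced to be non-negative with a softplus. The names `monotonic_demand`, `raw_slopes`, and `knots`, and all numbers, are hypothetical; in a real model the base demand and raw slopes would come from the network's output.

```python
import math

def softplus(x):
    # Numerically stable softplus; the result is never negative.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def monotonic_demand(base_demand, raw_slopes, knots, discount):
    """Piecewise-linear demand that cannot decrease as discount grows.

    base_demand: forecast demand at zero discount
    raw_slopes:  unconstrained values, one per discount segment
    knots:       segment boundaries on the discount axis, ascending
    """
    demand = base_demand
    for raw, lo, hi in zip(raw_slopes, knots[:-1], knots[1:]):
        # Portion of this segment that the discount actually covers.
        covered = min(max(discount - lo, 0.0), hi - lo)
        demand += softplus(raw) * covered
    return demand

knots = [0.0, 0.1, 0.2, 0.3, 0.5]
raw_slopes = [-1.0, 0.5, 2.0, -3.0]  # may be negative; softplus fixes the sign
base_demand = 40.0
forecasts = [monotonic_demand(base_demand, raw_slopes, knots, d)
             for d in (0.0, 0.05, 0.10, 0.15)]
print(forecasts)
```

Because every slope passes through softplus, the monotonic behavior holds by construction for any raw values, so it does not have to be learned from data.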

This monotonic modeling of demand relative to price helped align our forecasts more closely with the company’s targeted overall discount levels. Zalando operates within a predefined discount budget for any given week, but without precise forecasts, this budget was often misallocated. Our monotonic model improved budget alignment, increasing accuracy by 1.1 percentage points on average across all countries and by 2 percentage points across all budget categories compared to a similar model without monotonicity.

Given the scale of Zalando’s sales, such improvements can translate into millions of euros annually.

Can Zalando’s demand forecasting experience be useful to other large e-commerce platforms? How difficult is it to apply the findings from your research paper?

Our model isn’t particularly difficult to implement because there are pre-built libraries that provide transformer architectures. The paper thoroughly details the required data formats and how to preprocess the data correctly.

The model we developed is groundbreaking for the market, but in the paper, we’ve outlined its implementation in a way that’s accessible to any company. Essentially, the paper can serve as a step-by-step guide to improve business performance through accurate demand forecasting. I’m confident that, in the coming years, many companies will adopt and benefit from our advancements.

What areas do you see as the most promising for the further development of demand and sales forecasting technologies in e-commerce?

I believe there’s significant potential in developing probabilistic forecasts. Unlike point forecasts, probabilistic approaches provide a range of possible outcomes along with the likelihood of each. This includes scenarios for the proposed forecast as well as stress scenarios for lower demand.
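As a rough illustration of what a probabilistic forecast is scored against, the sketch below implements the quantile (pinball) loss, a standard objective for quantile forecasts. It is not taken from the paper, and the data is invented. A model trained with this loss at quantiles such as 0.1, 0.5, and 0.9 produces a range of outcomes instead of a single number.

```python
def pinball_loss(actual, predicted, quantile):
    """Average quantile (pinball) loss for a forecast of one quantile.

    Under-forecasts are weighted by `quantile`, over-forecasts by
    `1 - quantile`, so a 0.9 forecast is pushed above most outcomes.
    """
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    total = 0.0
    for a, p in zip(actual, predicted):
        diff = a - p
        total += max(quantile * diff, (quantile - 1.0) * diff)
    return total / len(actual)

# Hypothetical demand and two candidate forecasts for the 0.9 quantile.
actual = [10, 12, 8]
low_candidate = [7, 9, 6]      # sits below the actual demand
high_candidate = [13, 15, 11]  # sits above the actual demand

loss_low = pinball_loss(actual, low_candidate, 0.9)
loss_high = pinball_loss(actual, high_candidate, 0.9)
print(loss_low, loss_high)
```

At the 0.9 quantile the higher candidate receives the smaller loss, which reflects what a stress scenario needs: a forecast that demand exceeds only rarely.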

Point forecasts perform well in stable environments, but probabilistic models are better suited to account for unforeseen events, such as a pandemic. These forecasts enable businesses to prepare for multiple outcomes and adapt strategies based on external circumstances.

Such forecasts will require the integration of not only machine learning but also statistical methods and probability theory. The good news is that our model can also be adapted for probabilistic forecasting, making it a flexible foundation for future developments.
