Margherita - AI Demand Forecasting
Data-Science Consultant at Cérélia
Before joining Cérélia as a Data Science consultant, I had the following challenge:
"Prove to us the potential of AI."
So I came up with a project, crafted a proposal, and defined a strategy. I decided to focus on Demand Forecasting to benchmark my solution against the existing forecasting approach. Once the project was approved, everything began.


The forecasting tool I developed, Margherita, was custom-built to address Cérélia’s specific needs. Cérélia is a multinational food company specializing in raw dough products, with annual revenues of around €844 million in 2024 and a portfolio of more than 1,200 SKUs serving clients worldwide.
Margherita processes 815+ SKUs across 14 countries (around 97% of the Continental Europe Dough division), generating weekly forecasts for approximately 215,000 tons of products with a horizon of 72 weeks. It has already been deployed in production, and the Operations team has full control over how extensively they wish to use it.
Project Goals
Enhance long-term forecasts to reduce waste, improve service levels, and support strategic capacity planning

Increase forecast accessibility to improve communication and collaboration across departments
Deploy the solution in production to generate a tangible impact on the supply chain

Empower Cérélia with full control over their forecasting approach and methodology

Scroll to the bottom to see the outcome for each goal.
Quantitative Results
Measuring global accuracy for more than 815 weekly SKU forecasts is a challenging task. Since the goal was to focus on long-term performance, we selected Mean Percentage Error (MPE) as the main metric. Looking only at the total forecasted volume would be misleading: an overestimate on pizza dough could easily hide an underestimate on pie crusts. To avoid that, I evaluated each SKU individually. Since some products sell far more than others, I also weighted the results by sales volume, so that a high-selling product has more impact on the final score than a niche product. Over a 6-month horizon, this approach improved the overall global accuracy by 9 points, from ~73% to ~82%.
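As a sketch of this weighting logic (the SKU names and volumes below are invented for illustration; the exact accuracy formula used in the project may differ in detail):

```python
import numpy as np
import pandas as pd

# Hypothetical per-SKU actuals and forecasts over a 6-month horizon
df = pd.DataFrame({
    "sku": ["pizza_dough", "pie_crust", "croissant_dough"],
    "actual_tons": [1200.0, 300.0, 50.0],
    "forecast_tons": [1100.0, 360.0, 48.0],
})

# Per-SKU Mean Percentage Error: evaluating each SKU separately means an
# overestimate on one product cannot cancel an underestimate on another
df["mpe"] = (df["forecast_tons"] - df["actual_tons"]) / df["actual_tons"]

# Turn each SKU's error into an accuracy, then weight by sales volume
# so that high sellers dominate the global score
df["accuracy"] = 1 - df["mpe"].abs()
weighted_accuracy = np.average(df["accuracy"], weights=df["actual_tons"])
print(f"Volume-weighted accuracy: {weighted_accuracy:.1%}")
```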

Average accuracy, non-seasonal products, 6-month horizon (25 consecutive forecast start dates, from week 51 of 2024 to week 23 of 2025):
- All clients: AI model 82.2% vs SAP model 72.7%
- French clients: AI model 88.1% vs SAP model 78.9%
- Italian clients: AI model 76.2% vs SAP model 68.1%

Average accuracy, non-seasonal products, 9-month horizon (16 consecutive forecast start dates, from week 51 of 2024 to week 14 of 2025):
- All clients: AI model 82.6% vs SAP model 71.1%
- French clients: AI model 88.1% vs SAP model 78.9%
- Italian clients: AI model 75.9% vs SAP model 68.8%
The figures above show that the AI model consistently outperformed the current model over a 6-month horizon. Its variability is also significantly lower, making it a more robust option for long-term forecasts. The same holds for the longer 9-month horizon.
Qualitative Results
From a more qualitative perspective, metrics such as MPE say little about how sales are distributed across the forecasted weeks. For example, the existing SAP solution cannot handle peak periods such as Easter or Ramadan, which do not fall on the same weeks every year: it simply forecasts a peak on the same week number as the previous year. Another key issue to address was exploding and collapsing forecasts. The current model favours short-term accuracy at the expense of long-term performance. This is useful for reacting quickly to sharp changes in demand, but becomes incredibly problematic when a reliable long-run forecast is needed. Both cases can be seen below.

[Forecast comparison charts for start dates January 19, 2025 and May 4, 2025, showing Customer Demand, Margherita AI, and the SAP solution over the forecasted period, with the Easter and Ramadan periods highlighted.]
Technical Challenges
Upon arrival, no one in the company had ever used Python. Running pip install in VS Code would return "Access Denied," and the laptop I was given had as many GBs of RAM as my phone. The only way to retrieve past sales data was by manual downloads using a third-party solution in Excel. I could only download sales data two weeks at a time and up to three years in the past.
Cultural Challenges
Reinventing the forecasting process for a company that deals with perishable products and relies on manual Excel adjustments was no easy task. Introducing an AI-driven approach that cuts manual steps and uses up to 12 variables required clear communication and reassurance that the teams would still retain full control. Explaining with simple examples how the algorithm works was key.
Data Challenges
Not all the data I expected to use was actually available. For instance, knowing the type of promotion launched by each customer could have been a key variable, but that information is buried in emails, making it nearly impossible to retrieve systematically (though NLP could be a promising future solution). Data quality also proved to be a major challenge throughout the project.
Data Preparation
Past sales data is available at the warehouse level for each SKU–client combination, with daily granularity. However, this level of detail can be problematic when volumes are transferred between warehouses — one may show a sharp drop while another spikes. To address this, we aggregated data at the client level for each SKU. Using a disaggregation factor based on the past eight weeks of sales (for non-seasonal products), we can then split the total forecast back to the warehouse level.
Daily granularity also posed a challenge, as the sales series contained many zeros (clients typically order twice a week). To address this, we aggregated sales data at weekly level.
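A minimal sketch of these two preparation steps, assuming invented warehouse names and flat daily volumes (the real aggregation and disaggregation logic may differ in detail):

```python
import pandas as pd

# Hypothetical daily sales for one SKU-client pair, split across two warehouses
idx = pd.date_range("2025-01-06", periods=56, freq="D")  # 8 weeks from a Monday
daily = pd.DataFrame({"warehouse_A": 10.0, "warehouse_B": 30.0}, index=idx)

# Client-level weekly aggregation: summing over warehouses removes the noise
# created by inter-warehouse transfers, and weekly buckets remove daily zeros
weekly_client = daily.sum(axis=1).resample("W-MON", closed="left", label="left").sum()

# Disaggregation factor: each warehouse's share of the past eight weeks of sales
shares = daily.tail(56).sum() / daily.tail(56).sum().sum()

# Split a client-level weekly forecast back down to warehouse level
forecast_client = 500.0
forecast_by_warehouse = forecast_client * shares
print(forecast_by_warehouse.to_dict())
```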
Since Cérélia products are really good, what if a client suddenly decides to double its volumes? No worries: we can use the PELT algorithm to detect sudden changes and then shift the sales before the detected change point by a given quantity (I used the difference between the mean sales before and after the detected change point, as shown below).

We can see how this product experienced a decrease in sales around May 2025.

The PELT algorithm detected this change point, and the sales before it were shifted down.
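The shift step can be sketched as follows. The project used the PELT algorithm (available, for example, in the ruptures library); to keep this snippet dependency-free, a simple single-change-point detector stands in for it:

```python
import numpy as np

def detect_change_point(y, min_size=4):
    """Split index minimizing total within-segment squared error.

    A dependency-free stand-in for PELT, which also handles multiple
    change points and a complexity penalty.
    """
    best_idx, best_cost = None, np.inf
    for i in range(min_size, len(y) - min_size):
        cost = ((y[:i] - y[:i].mean()) ** 2).sum() \
             + ((y[i:] - y[i:].mean()) ** 2).sum()
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx

def shift_history(y):
    """Shift sales before the change point by the difference in segment means."""
    cp = detect_change_point(y)
    adjusted = y.astype(float).copy()
    adjusted[:cp] += y[cp:].mean() - y[:cp].mean()
    return adjusted, cp

# Hypothetical weekly sales with a level drop halfway through the history
sales = np.concatenate([np.full(20, 100.0), np.full(20, 60.0)])
adjusted, cp = shift_history(sales)
print(cp)             # the detected change point
print(adjusted[:3])   # early history pulled down to the new level
```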
Lastly, vacation days falling on a Monday can also affect demand, as some clients place larger orders the week before. By shifting a portion of those sales to the following week, we can better account for this behavior.
What about service level? Historical client demand can be heavily influenced by it. For instance, if clients don’t receive the full quantity they ordered, this may create artificial demand peaks that aren’t truly representative. By comparing demand with delivered quantities from the previous week (week n–1), I was able to adjust the demand of week n downward by a correction factor.
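A sketch of this idea on invented numbers; the exact correction factor used in the project is not specified here, so a simple subtraction of the previous week's shortfall stands in:

```python
import pandas as pd

# Hypothetical weekly series for one SKU-client: ordered vs actually delivered
df = pd.DataFrame({
    "ordered":   [100.0, 100.0, 160.0, 100.0],
    "delivered": [100.0,  60.0, 160.0, 100.0],
})

# A shortfall in week n-1 can inflate the order of week n; subtract it back
# out so the history reflects true demand rather than re-ordering behavior
shortfall_prev = (df["ordered"] - df["delivered"]).shift(1).fillna(0)
df["adjusted_demand"] = (df["ordered"] - shortfall_prev).clip(lower=0)
print(df["adjusted_demand"].tolist())  # the week-3 peak drops from 160 to 120
```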

Adjustment of historical sales to reduce artificial peaks resulting from service-level issues.

[Charts comparing Customer Demand, Deliveries, Margherita AI, and the SAP solution over the forecasted period, with the Easter and Ramadan periods highlighted.]
Feature Engineering and Feature Selection
Feature engineering played a key role in improving the AI model’s performance. Special events such as Easter and Ramadan were modeled to capture the periods before, during, and after each event using one-hot encoding. It was also important to consider the Ramadan week number.
In addition, I tested several other factors, including vacation days, the number of vacation days per week (and per weekend), national holidays, the day of the week on which vacations occurred, major sporting events, and even TV cooking shows. After experimenting with various combinations, the most influential features turned out to be Easter, Ramadan, promotions (described below), and national holidays. The remaining variables mostly led to overfitting.
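As an illustration of the event encoding, the snippet below one-hot encodes the weeks before, during, and after Easter for a short hypothetical weekly calendar (the real feature set also covers Ramadan and its week number):

```python
import pandas as pd

# Hypothetical weekly calendar around Easter 2025 (Sunday, April 20)
weeks = pd.DataFrame({"week_start": pd.date_range("2025-03-03", periods=10, freq="W-MON")})
easter_week = pd.Timestamp("2025-04-14")  # Monday of the week containing Easter

# Signed distance from the event in whole weeks, bucketed into phases
offset_weeks = (weeks["week_start"] - easter_week).dt.days // 7
weeks["easter_phase"] = pd.cut(offset_weeks, bins=[-100, -1, 0, 100],
                               labels=["before", "during", "after"])

# One-hot encode the phase so the model sees before/during/after separately
dummies = pd.get_dummies(weeks["easter_phase"], prefix="easter")
print(dummies.sum().to_dict())
```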
The Impact of Promotions
Promotions were the most challenging aspect to model. They can last anywhere from a few days to several weeks, take different forms, and result in anything from sharp demand spikes to no significant change at all. Moreover, the type of promotion was almost never recorded in the system — it existed only in email exchanges, and collecting it systematically would require a dedicated project. In addition, promotions were managed locally (i.e., at the warehouse level in terms of data aggregation). For the reasons explained above, I aggregated the data at the national client–SKU level, which introduced some inconsistencies in how promotions were represented.
To better characterize them, I created several features. First, I added a variable to record the promotion week number (e.g., promo_week_1, promo_week_2, promo_week_3 for a three-week promotion). This proved particularly useful, as clients often repeat similar purchasing behaviors across promotion weeks. However, this alone was not sufficient: a promotion week could include anywhere from one to seven active promotion days. I therefore added another feature capturing the number of promotion days per week.
Finally, since promotions launched from larger warehouses typically have a stronger impact on sales than those from smaller ones — and my data was aggregated at the national level — I multiplied the one-hot encoded promo_week_number feature by the sum of the average sales across all warehouses involved in each promotion. This adjustment helped the algorithm estimate the impact of promotions more accurately.
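A sketch of these promotion features, using invented warehouse names and volumes:

```python
import pandas as pd

# Hypothetical three-week promotion: one-hot promo_week_k indicators plus
# the number of active promotion days in each week
df = pd.DataFrame({
    "promo_week_1": [0, 1, 0, 0, 0],
    "promo_week_2": [0, 0, 1, 0, 0],
    "promo_week_3": [0, 0, 0, 1, 0],
    "promo_days":   [0, 5, 7, 3, 0],
})

# Average weekly sales of the warehouses running this promotion (invented)
participating_wh_avg_sales = {"wh_lille": 80.0, "wh_lyon": 20.0}
scale = sum(participating_wh_avg_sales.values())

# Weight the promo-week indicators by the size of the warehouses involved,
# so a promotion from a large warehouse carries more signal at national level
for col in ["promo_week_1", "promo_week_2", "promo_week_3"]:
    df[col] = df[col] * scale
print(df.loc[1, "promo_week_1"])  # 100.0
```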

Example 1: Margherita vs SAP solution. [Charts showing Customer Demand, Margherita AI, and the SAP solution, with the promo weeks, Easter, and the Ramadan period highlighted over the forecasted period.]
From the images above, we can see that Margherita accurately captures the impact of promotions for this specific SKU–client combination. In this case, the quantities are also correctly distributed across the three promotion weeks. What about the absence of a promotion that took place the previous year? It’s important for the algorithm to automatically adjust and reduce expected sales accordingly, as shown below.

Example 2: Margherita vs SAP solution. [Charts showing Customer Demand, Margherita AI, and the SAP solution, with the promo weeks, Easter, and the Ramadan period highlighted over the forecasted period.]
In the example above, we can clearly see that during weeks 20, 21, and 22 of 2024, the client launched a promotion that did not occur during the same weeks in 2025. The AI model successfully recognized the resulting decrease in demand and accurately adjusted the forecast accordingly.
The AI Algorithm
Choosing the right AI algorithm is never straightforward. The best choice depends not only on accuracy but also on the business context, the structure of the data, and various practical constraints. In Cérélia’s case, the forecasting process needs to be fast and efficient — the algorithm must run on Sundays and deliver forecasts by Monday morning for every SKU. In addition, the number of data points available per SKU is usually too limited to justify heavy models like deep learning. On the other hand, traditional statistical methods often rely on assumptions that don’t always hold in real-world data.
There are many ways to combine machine learning models, and to do so effectively we first need to understand how different algorithms actually make predictions.
In general, regression algorithms can be divided into two main types:
- Feature-transforming algorithms
- Target-transforming algorithms
Feature-transforming algorithms — like linear regression or neural networks — learn a mathematical function that takes the input features, transforms them, and produces an output that matches the target values in the training data. In other words, they try to model how the features create the target.
Target-transforming algorithms, on the other hand — like decision trees or nearest neighbors — work by using the features to group similar target values in the training set. When they make a prediction, they look at which group the new observation belongs to and average the target values from that group.
Here’s the crucial difference:
- Feature transformers can extrapolate — they can predict values beyond what they’ve seen during training if the features support it.
- Target transformers cannot extrapolate — their predictions will always stay within the range of target values found in the training data.
For example, if we include a time variable, a linear regression model can keep extending the trend line into the future. A decision tree, however, will just keep predicting the last known value forever. And since random forests and gradient-boosted trees (like XGBoost) are built from multiple decision trees, they share the same limitation — they can’t extrapolate trends beyond what they’ve already seen.
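This difference is easy to demonstrate on a toy trend:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# A simple upward trend: the target is twice the time index
t_train = np.arange(20).reshape(-1, 1)
y_train = 2.0 * t_train.ravel()          # last training value: 38.0

lr = LinearRegression().fit(t_train, y_train)
tree = DecisionTreeRegressor().fit(t_train, y_train)

t_future = np.array([[30]])
print(lr.predict(t_future))    # 60: the linear model extends the trend
print(tree.predict(t_future))  # 38: the tree is stuck at the last value seen
```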
At the same time:
- Feature transformers may not be as good at explaining variability (or must be quite "heavy" to do so — deep learning algorithms)
- Target transformers capture a lot of variance in the data while remaining light in terms of computation
To address these challenges, I adopted an ensemble approach that leverages the strengths of both methods: a statistical model (Linear Regression) and a machine learning model (XGBoost). The Linear Regression captures the overall trend of the series, and then an XGBoost Regressor is trained on the residuals from the linear model. By adding the forecasted residuals from XGBoost to the trend predicted by Linear Regression, we obtain the final forecast.
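A compact sketch of this ensemble on synthetic data. The project used an XGBoost regressor on the residuals; scikit-learn's GradientBoostingRegressor stands in here so the snippet has no extra dependency:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical weekly series: upward trend + yearly seasonality + noise
t = np.arange(156)
y = 100 + 0.5 * t + 20 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 2, 156)

X_trend = t.reshape(-1, 1)  # time index drives the trend
X_season = np.column_stack([np.sin(2 * np.pi * t / 52),
                            np.cos(2 * np.pi * t / 52)])  # seasonal features

# Step 1: linear regression captures the trend (and can extrapolate it)
lr = LinearRegression().fit(X_trend, y)
residuals = y - lr.predict(X_trend)

# Step 2: a boosted-tree model learns the residual structure from features
gbr = GradientBoostingRegressor(random_state=0).fit(X_season, residuals)

# Final forecast = extrapolated trend + predicted residuals
t_fut = np.arange(156, 160)
X_fut_trend = t_fut.reshape(-1, 1)
X_fut_season = np.column_stack([np.sin(2 * np.pi * t_fut / 52),
                                np.cos(2 * np.pi * t_fut / 52)])
forecast = lr.predict(X_fut_trend) + gbr.predict(X_fut_season)
print(forecast.round(1))
```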


[Diagram: the Linear Regression (LR) trend and its extrapolation, the LR training residuals, and the forecasted residuals.]
Finally, because I developed the entire algorithm myself, I have complete control over which models to run. This flexibility allows me to experiment with different parameters, variable sets, and configurations depending on the country, product type, or forecast horizon — greatly increasing the adaptability and potential of the solution compared to the fixed forecasting options provided by SAP.
Information at a Glance - Power BI
While developing the algorithm was both challenging and rewarding, it was equally important to make the forecast outputs easily accessible to non-technical employees. Previously, forecasts could only be obtained through SAP extractions and complex third-party Excel tools, creating friction between departments such as Sales and Operations.
Only technical staff in the Operations department knew how to extract forecast data, and their workload made it impractical to provide forecasts on demand. To address this, I created a user-friendly, multi-page dashboard using Power BI.


Thanks to the filters menu, products can be filtered by recipe, type, temperature, segment, and more. On top of this, the user has access to several other accuracy metrics for the selected horizon (Mean Absolute Percentage Error, Mean Percentage Error, Mean Absolute Error, Total Absolute Error, Total Overestimation Error, and Total Underestimation Error).
I have developed several other pages to accommodate different requests. Most importantly, the Power BI report is already online and deployed in Fabric.
Deploying in Production
After developing the AI model and the Power BI dashboards to make forecasts easily accessible, the next crucial step was integrating the solution into the supply chain process to generate real, tangible value. This required identifying the best approach to give the Operations team full control without disrupting the existing forecasting workflow. To achieve this, I designed the new Key Figures and led their implementation, coordinating efforts between up to seven internal and external stakeholders.
The result was a dedicated solution that enables the integration of external files into the SAP forecasting process. Incorporating the AI output directly within SAP was essential, as all subsequent steps in the supply chain are managed through this system. The Operations team can now select, for each SKU–client combination, the forecast horizon up to which they prefer to use either the AI-generated forecast or the standard SAP forecast. This flexibility ensures a gradual and seamless transition, minimizing any potential friction within the supply chain.
Goals Outcome and Business Implications
- The first goal was to enhance long-term forecasts to support more accurate capacity planning. Compared to the existing SAP solution, the AI algorithm proved significantly more reliable over the long run, avoiding the “collapsing” and “exploding” forecast patterns observed in the previous approach.
- The second objective was to give easy access to non-technical stakeholders, which I achieved by creating tailor-made Power BI visualisations. The dashboard is already online and accessible to all designated Cérélia employees.
- The third goal was to deploy the solution in production. I accomplished this by designing and leading the implementation of a custom SAP development request. As a result, the Operations team is now fully autonomous and can decide to what extent they wish to use the AI solution.
- The fourth goal was to make Cérélia fully independent from third-party solutions in their forecasting process. The deployment framework developed for the AI output can also be used to integrate forecasts generated by any other model. This gives Cérélia complete flexibility to experiment with different forecasting approaches, without being limited by the constraints of external software solutions.
Conclusions
Getting this project over the line was without a doubt among the most difficult challenges of my life. Constantly proving my vision and getting approval from directors with more than 20 years of experience in this industry was not easy.
Today Margherita processes 815+ SKUs across 14 countries, covering around 97% of Cérélia’s Continental Europe Dough division. It generates weekly forecasts for about 215,000 tons of products with a 72-week horizon, improving the overall global accuracy by 9 points, from ~73% to ~82%. The tool is already deployed in production, giving the Operations team full autonomy over how extensively they choose to use it.
Margherita isn't just the AI brain behind every forecast. It is also the dashboard used to analyze, derive insights, and make data-driven decisions at a glance. It is already online, connected to Fabric for a weekly refresh, and available on Power BI.
Finally, Margherita is the bridge to bring people together. It will improve communication between Sales, Controlling, and Operations, to work towards common goals such as reducing waste, increasing service level, and meeting the clients’ demand.
Upon arrival, I barely spoke a word of French, and no one had ever used Python before. Today, after countless days (and nights) of trial and error, setbacks, and lessons learned, I couldn’t be happier about where this project is headed.
Acknowledgements
While I took full ownership to move this project forward, it would not have been possible without the help and support of several people within Cérélia. I am incredibly thankful to the whole Cérélia family, who took me in as one of them. I am truly grateful to Cédric, who believed in my vision and supported me from the beginning, and to Hendrik, who is pushing me every day and teaching me so much. Many thanks to Bilal as well, who has always guided me in the right direction, and to the whole Operations team. Last but not least, big thanks to the whole Data team, who are doing an incredible job putting in place a solid foundation to take full advantage of the available data. Everyone has played a key role in helping me turn an idea into something real.


Skills acquired/improved and tools used in the project
Taking on this forecasting project allowed me to learn a great deal about time series. I was able to put into practice several skills acquired during my master’s degree and learn many others by doing. I also came to understand the challenges involved in getting a solution into production.

scikit-learn
pandas

Microsoft Power Platform

Fabric
Power BI

IBP
SharePoint