Sales Forecasting with Prophet

by Nick | Dec 4, 2023 | Machine Learning


Forecasting Sales Using Prophet

Welcome to another post about my Data Analytics journey. As you all know, businesses rely heavily on accurate forecasting to make informed decisions and plan for the future. Time series forecasting in particular reveals trends and patterns, which makes it a crucial tool across many industries.

In this blog post, I’ll explore the process of forecasting sales using a Python library called ‘Prophet’. Prophet was developed by Meta’s Core Data Science team and remains a powerful tool today. I’ll walk you through each step of the forecasting process, from data preparation to model evaluation, with practical insights and tips along the way.

Introduction: e-commerce data

Sales forecasting is an essential task for businesses across all sectors. Whether you’re a retail giant or a small-scale e-commerce store, understanding future sales trends can help:

– optimize inventory management

– plan marketing campaigns

– allocate resources effectively.

The data I’m using in this project is a sample of the sales of an e-commerce webshop. Sales were recorded over a 2-year period.

Getting Started

Understanding the dataset

The first step in understanding a dataset is exploratory analysis. We want to know how large the file is, how many columns it has, the min and max values of each datetime and numeric column, and some summary statistics.

In my case, I’ll be analyzing sales data, which typically includes information such as product names, quantities sold, and timestamps of transactions.
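A minimal pandas sketch of these first checks follows. The column names (order_date, product, quantity, price) and the four rows are hypothetical stand-ins, not the actual webshop data.

```python
import pandas as pd

# Hypothetical order lines standing in for the real sales file
sales = pd.DataFrame({
    "order_date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-05", "2021-01-07"]),
    "product": ["Go-Wo-NMDVGP", "A-1", "Go-Wo-NMDVGP", "B-2"],
    "quantity": [3, 1, 5, 2],
    "price": [19.95, 4.50, 19.95, 12.00],
})

print(sales.shape)   # (rows, columns)
print(sales.dtypes)  # data type of each column

# Min and max of every datetime and numeric column
minmax = sales.select_dtypes(include=["number", "datetime"]).agg(["min", "max"])
print(minmax)

# Count, mean, standard deviation and quartiles of the numeric columns
print(sales.describe())
```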

Check data quality

This step involves a number of actions, such as:

– checking each column for missing values

– checking the set for outliers (for example, atypical sales quantities or prices)

In most cases, some information needs to be removed or re-engineered to make the data suitable for further processing. Empty or null values can make the model’s predictions less accurate.
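Counting missing values per column takes one line in pandas. The sketch below uses hypothetical columns, and dropping incomplete rows is only one possible treatment (imputation is another); which columns are "required" is an assumption you would adapt to your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical order lines with a few gaps
sales = pd.DataFrame({
    "product": ["Go-Wo-NMDVGP", "A-1", None, "B-2"],
    "quantity": [3, np.nan, 5, 2],
    "discount": [0.0, 0.1, np.nan, np.nan],
})

# Number of missing (None/NaN) values in each column
missing = sales.isna().sum()
print(missing)

# One simple option: drop rows lacking the fields a sales forecast needs
clean = sales.dropna(subset=["product", "quantity"])
print(f"{len(clean)} of {len(sales)} rows kept")
```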

Ideally, most points fall within the blue box or within the whiskers (the two vertical black lines) of the box plot. But as you can see, quite a number of individual black data points lie beyond the whiskers and can be considered outliers. Let me explain the plot in a bit more detail:

Box: The box represents the interquartile range (IQR), which spans from the 25th percentile (Q1) to the 75th percentile (Q3) of the data distribution. The length of the box indicates the spread of the middle 50% of the data. The line inside the box represents the median (50th percentile) of the data.
Whiskers: The whiskers extend from the edges of the box to the furthest data points within 1.5 times the IQR from the quartiles. Any data points beyond the whiskers are considered outliers and are plotted individually as points.
Outliers: The points beyond the whiskers represent values that differ markedly from the rest of the data and may need further investigation.
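The whisker rule described above can also be applied numerically. This is a minimal pandas sketch using a hypothetical set of discount values rather than the real Discount column; the 1.5 multiplier is the same one the whiskers use.

```python
import pandas as pd

# Hypothetical discount fractions; 0.8 is deliberately unusual
discount = pd.Series([0.0, 0.05, 0.05, 0.1, 0.1, 0.1, 0.15, 0.2, 0.8])

q1 = discount.quantile(0.25)  # 25th percentile, left edge of the box
q3 = discount.quantile(0.75)  # 75th percentile, right edge of the box
iqr = q3 - q1                 # interquartile range, the box length

# Whiskers reach at most 1.5 * IQR beyond the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = discount[(discount < lower) | (discount > upper)]
print(outliers)
```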

For the purpose of this project, I had no need to investigate the anomalies in the Discount column; I only used it as an example of what you can find when investigating a dataset.

Data Filtering

To build a prediction model, I experimented with just one product from the entire dataset. I filtered the data and then split it to prepare it for the Prophet model.

The filtering steps I took in this e-commerce dataset were:

1. Identifying the Most Popular Product

To demonstrate the forecasting process, I began by identifying the most popular product in the sales dataset. This involves analyzing the total quantity sold of each product over the entire time period (3 years). The result was one product (code: Go-Wo-NMDVGP) that sold on 905 days.
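A minimal pandas sketch of this step: sum quantities per product, take the largest, then count the distinct days on which it sold. The rows and column names are hypothetical; only the product code Go-Wo-NMDVGP comes from this post.

```python
import pandas as pd

# Hypothetical order lines; A-1 and B-2 are made-up product codes
sales = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2021-01-04 09:10", "2021-01-04 14:30", "2021-01-05 11:00",
         "2021-01-06 08:45", "2021-01-06 16:20"]
    ),
    "product": ["Go-Wo-NMDVGP", "A-1", "Go-Wo-NMDVGP", "B-2", "Go-Wo-NMDVGP"],
    "quantity": [3, 4, 5, 2, 1],
})

# Total units sold per product over the whole period
totals = sales.groupby("product")["quantity"].sum()
top_product = totals.idxmax()

# Distinct calendar days on which the top product was sold
days_sold = sales.loc[sales["product"] == top_product, "order_date"].dt.normalize().nunique()
print(top_product, totals[top_product], days_sold)
```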

2. Check its sales over time (3-year period)

As you can see, sales seem to follow a fairly regular pattern (but is that the whole story…?)
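Before plotting or modelling, the product's order lines need to become a regular time series. Prophet expects a DataFrame with a ds (date) column and a y (value) column. The sketch below assumes that days without orders count as zero sales, which is a modelling choice rather than something stated in this post, and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical order lines for the selected product
sales = pd.DataFrame({
    "order_date": pd.to_datetime(["2021-01-04 09:12", "2021-01-04 15:40", "2021-01-06 11:05"]),
    "product": ["Go-Wo-NMDVGP"] * 3,
    "quantity": [3, 2, 4],
})

# Sum units per calendar day; days without orders become 0 (assumption)
daily = (
    sales.loc[sales["product"] == "Go-Wo-NMDVGP"]
    .set_index("order_date")["quantity"]
    .resample("D")
    .sum()
)

# Rename into the ds/y layout that Prophet expects
ts = daily.rename_axis("ds").reset_index(name="y")

# Weekly totals are handier for eyeballing a multi-year pattern
weekly = daily.resample("W").sum()
print(ts)
```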

Building the Forecasting Model

My goal is to forecast the sales we can expect in the coming weeks or year. We have the historical data and need to feed it into a Prophet model. The steps I took were:

1. Split the Data

With the target product selected, I split its sales data into training and testing sets. The training set is used to fit the Prophet model, while the testing set is used to evaluate its performance.
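For time series, the split should be chronological rather than random, so the test set only contains dates the model has never seen. This is a minimal sketch; the helper name time_split and the size of the hold-out period are hypothetical choices, not the post's actual settings.

```python
import pandas as pd

# Hypothetical daily series in Prophet's ds/y layout
ts = pd.DataFrame({"ds": pd.date_range("2021-01-01", periods=10, freq="D"), "y": range(10)})


def time_split(df: pd.DataFrame, test_size: int):
    """Hold out the last `test_size` rows; never shuffle a time series."""
    df = df.sort_values("ds")
    return df.iloc[:-test_size], df.iloc[-test_size:]


train, test = time_split(ts, test_size=3)
print(train["ds"].max(), test["ds"].min())
```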

2. Create the Prophet Model

With the data prepared, I created a Prophet model and fitted it to the training data. Prophet’s intuitive interface lets you specify various parameters, such as seasonality and holidays, to tailor the model to your dataset.

I used this piece of Python code to create the model:

from prophet import Prophet
model_Go_Wo_NMDVGP = Prophet(weekly_seasonality='auto', holidays=None)

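My intent was to include weekly seasonality and disregard any holidays. Note that weekly_seasonality='auto' (Prophet’s default) lets Prophet decide: it adds a weekly component only when the observations are spaced less than a week apart and the history spans at least two weeks. Setting weekly_seasonality=True forces it on, and holidays=None means no holiday effects are modelled.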

Generating Forecasts

With the trained Prophet model, I can now generate forecasts for future time periods.
In my example, I predicted sales for the next 52 weeks, which gives insight into long-term trends and potential fluctuations.

The plot shows all weekly data points, a (rising) trend line and a forecast line with its confidence interval.
Interestingly, some points fall outside the confidence interval, which reflects the wide spread (variance) of sales from week to week. Orders of more than 30 items are atypical, but they show up from time to time. It is worth zooming in on these occurrences.

Plotting 3 trend components

To visualize the individual components behind the trends and patterns, I created a three-panel plot. It depicts the long-term trend, the expected yearly pattern and the weekly pattern.

The plot shows 3 components:

Trend Component: It shows the overall trend in the data. It helps visualize the long-term behavior of the time series data, allowing you to identify patterns and trends.
Yearly Seasonality Component: The second plot shows the yearly seasonality of the forecast. By examining it, you can identify seasonal fluctuations and understand how they contribute to the overall pattern of the time series.
Weekly Component: The bottom plot depicts the weekly sales or ordering pattern of this product. Mondays and Saturdays don’t seem to be very popular.

Changepoints in the trend

Prophet is good at flagging clear changepoints in a trend, but in our case the trend line rises steadily and shows no major breaks.

The main trend line is shown in red.

Evaluating Performance

To assess the accuracy of the forecast, I calculated performance metrics such as Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). These metrics quantify the difference between the predicted and actual values and thus measure the model’s predictive power. In my case, demand varies widely, from 1 to sometimes over 50 units, which resulted in an overall 3-year MAE of 9.
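Both metrics are short to compute with NumPy. In practice you would compare the test set's y values with the forecast's yhat on matching dates; the four numbers below are hypothetical and include one large miss to show how RMSE reacts to it.

```python
import numpy as np


def mae(actual, predicted) -> float:
    """Mean Absolute Error: average size of the errors, in units sold."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))


def rmse(actual, predicted) -> float:
    """Root Mean Squared Error: squaring weights large misses more heavily."""
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))


# Hypothetical actuals vs. forecasts; the 40-unit spike is badly underpredicted
actual = [12, 3, 40, 7]
predicted = [10, 6, 20, 8]

print(mae(actual, predicted))   # errors 2, 3, 20, 1 -> mean 6.5
print(rmse(actual, predicted))  # sqrt((4 + 9 + 400 + 1) / 4)
```

Because of the squaring, one atypical order inflates RMSE far more than MAE, which is worth remembering when demand ranges as widely as it does here.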

Conclusion

In this blog post, I have explored the process of forecasting sales using Prophet and illustrated its capabilities with graphs showing how forecasts depict trend and seasonal fluctuation. This is information that can stay hidden beneath the surface of any business involved in sales or logistics.

By following the steps outlined above, businesses can leverage the power of time series analysis to make data-driven decisions and gain a competitive edge in today’s markets.

I believe that every data scientist or business owner should look at this as an opportunity to optimize his or her daily operations, thus driving growth and profit.

The entire code

Check out all the code for this project on GitHub: Nick Analytics – Demand Prediction with Prophet
