Modeling Short Time Series with Prior Knowledge

What ‘Including Prior Information’ really looks like.

Tim Radtke

2019-04-16

It is generally difficult to model a time series when there is insufficient data to estimate a (suspected) long seasonality. Here, we show how this difficulty can be overcome by learning the seasonality on a different, longer, related time series and transferring its posterior as a prior distribution to the model of the short time series. The result is a forecast that is believable and can be used for decisions in a business context. Compared to traditional methods that cannot incorporate the long seasonality, we observe a drastic improvement in common evaluation metrics. Default models in the forecast and prophet packages fail to produce good forecasts on this example.

Find the data and code necessary to reproduce the analysis on GitHub.

Often in forecasting, a key step is knowing when something can be forecast accurately, and when forecasts are no better than tossing a coin. Good forecasts capture the genuine patterns and relationships which exist in the historical data, but do not replicate past events that will not occur again.

Prior Information in Bayesian Models

“The statistical technique allows us to encode expert knowledge into a model by stating prior beliefs about what we think our data looks like.” - Fast Forward Labs

“It provides a natural and principled way of combining prior information with data” - SAS

When you read about the advantages of probabilistic modeling and Bayesian inference in statistics and machine learning, one argument that comes up again and again is the possibility of including expert knowledge in your model via the prior distribution. This is touted as a giant leap promising vast possibilities. On closer inspection, however, it is often left unclear how business expertise can actually be encoded into a model for your application.

Even more frustratingly, the option of consciously choosing a prior distribution may be dismissed entirely, with a default uniform prior used instead. In the following, we provide an example that both underlines the usefulness of prior information and shows explicitly how it can be included in a non-obvious situation.
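Before turning to the time series itself, a toy example may help to show what a prior does mechanically. The following is a minimal sketch in base R using the conjugate Gamma-Poisson model. The prior parameters and the three daily counts are made-up numbers for illustration only; they are not taken from the data analyzed below, and this is not the model used later in the post.

```r
# Hypothetical prior belief about the daily sales rate:
# Gamma(shape = 100, rate = 5) has mean 100 / 5 = 20 sales per day
# and carries roughly the weight of 5 days of observations.
prior_shape <- 100
prior_rate  <- 5

# Made-up daily counts (illustrative only, not the Citi Bike data)
y <- c(31, 28, 35)

# Conjugate update for a Poisson likelihood:
# the shape grows by the total count, the rate by the number of observations
post_shape <- prior_shape + sum(y)
post_rate  <- prior_rate + length(y)

prior_mean  <- prior_shape / prior_rate
sample_mean <- mean(y)
post_mean   <- post_shape / post_rate

print(c(prior = prior_mean, data = sample_mean, posterior = post_mean))
```

With only three observations, the posterior mean sits between the prior mean and the sample mean. The less data we have, the more the prior matters, which is exactly the situation of a short time series.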

Modeling a Short Time Series

Imagine the following scenario. Your company has launched a new division whose task it is to sell bikes in New York City. After observing sales for the first quarter since launch, it is time to forecast sales for the holidays and the upcoming year. The following graph shows the daily sales so far.

The data we use in the following is publicly available Citi Bike data from station 360 in New York City. You can access the raw data here.

A Naive Benchmark Model

Give someone such a time series and they tend to grab ready-made solutions such as Hyndman’s forecast package or Sean J. Taylor and Ben Letham’s prophet package (for good reason). Let’s see how far these methods get us.

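library(dplyr)  # provides %>% and pull()

# Daily ride counts as a time series with a weekly period
sales <- hciti %>%
  pull(rides) %>%
  ts(frequency = 7)

# Let auto.arima() select a model, then forecast 180 days ahead
arima_fit <- sales %>%
  forecast::auto.arima()
arima_fc <- forecast::forecast(arima_fit, h = 180)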

While it is super simple to fit an ARIMA model, it’s often not sufficient to throw the automatic procedure at the data. Here, we quickly lose the weekly seasonality of the data in our forecasts.

This first attempt can be improved with a little bit of intervention.

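# Fourier terms (K = 3 sine/cosine pairs) describe the weekly seasonality
xreg_fourier <- forecast::fourier(sales, K = 3)
xreg_fourier_future <- forecast::fourier(sales, K = 3, h = 180)

# Non-seasonal ARIMA; the seasonality enters through the regressors
arima_fit <- forecast::auto.arima(sales, seasonal = FALSE,
                                  xreg = xreg_fourier)
arima_fc <- forecast::forecast(arima_fit, h = 180,
                               xreg = xreg_fourier_future)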

While this second iteration might be a better model of the weekly seasonality in our data, it fails to incorporate most of our problem-specific information.
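For readers unfamiliar with Fourier regressors, the idea behind them can be sketched in base R: each of the K harmonics contributes one sine and one cosine column with the chosen period. The code below builds such columns by hand for two weeks of daily data. It is meant as an illustration of the concept only; the column names are our own choice, and the actual ordering and naming in the forecast package may differ.

```r
period <- 7    # weekly seasonality on daily data
K <- 3         # number of harmonics, at most floor(period / 2)
t <- 1:14      # two weeks of daily time indices

# One sine and one cosine column per harmonic k = 1, ..., K
fourier_terms <- do.call(cbind, lapply(seq_len(K), function(k) {
  cbind(sin(2 * pi * k * t / period),
        cos(2 * pi * k * t / period))
}))

# Illustrative column names: S1, C1, S2, C2, S3, C3
colnames(fourier_terms) <- paste0(rep(c("S", "C"), times = K),
                                  rep(seq_len(K), each = 2))

# The regressors repeat every 7 days, so week 2 matches week 1
repeats_weekly <- isTRUE(all.equal(fourier_terms[1:7, ],
                                   fourier_terms[8:14, ]))
print(round(fourier_terms[1:7, ], 2))
```

Because the columns repeat exactly every seven days, a regression on them can represent any fixed weekly pattern, but nothing with a longer period.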

Let’s take a step back and actually think about what we’re trying to do.

Existing Business Knowledge

First of all, we are dealing with count data: We sell 1 bike, we sell 102 bikes, but we sell neither 8.1 bikes nor -15 bikes. As we can see in the plots above, this knowledge is not well represented by the ARIMA models with their assumption of normally distributed errors. The prediction intervals quickly move into negative territory, which is not sensible and implies that we assign too little probability to the positive outcomes. One can try to account for this via a Box-Cox transformation, such as forecast::auto.arima(sales, seasonal = FALSE, xreg = xreg_fourier, lambda = 0).
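To make the point about negative prediction intervals concrete, here is a minimal sketch in base R. The mean and standard deviation are hypothetical numbers chosen for illustration, not estimates from the Citi Bike data. The negative binomial is used only because it is a count distribution that can match a given mean and variance; we do not claim it is the right model for the sales.

```r
# Hypothetical forecast for a slow day: mean 4 sales, standard deviation 3
mean_sales <- 4
sd_sales   <- 3

# 95% interval under a Normal assumption
normal_interval <- qnorm(c(0.025, 0.975), mean = mean_sales, sd = sd_sales)

# Negative binomial with the same mean and variance:
# variance = mu + mu^2 / size, solved for size
nb_size <- mean_sales^2 / (sd_sales^2 - mean_sales)
count_interval <- qnbinom(c(0.025, 0.975), size = nb_size, mu = mean_sales)

print(normal_interval)  # lower bound falls below zero
print(count_interval)   # integer bounds, never below zero
```

Both intervals describe the same mean and variance, yet only the count distribution keeps all of its probability mass on values we could actually observe.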

Second, the forecast does not incorporate a yearly seasonality. Even though we have not yet observed a full year of sales, our boss might strongly believe that bike sales follow nature’s seasons, just like the demand for ice cream: higher in summer, lower in winter. The problem with any kind of seasonality, though, is that common advice is to model it only once you have at least two full periods of data. That is less of a problem for the weekly seasonality, as we have already observed several weeks. But waiting two years before we can even try to incorporate a yearly seasonality? That’s going to leave our stakeholders unsatisfied.
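A rough illustration of why a single quarter cannot reveal a yearly pattern on its own: over roughly 90 days, a yearly sine wave covers only about a quarter of its cycle and is hard to tell apart from a plain linear trend. The sketch below uses base R only. The 90 days, the period of 365.25 days, and the phase of the sine (starting at zero) are assumptions made for illustration.

```r
n_days      <- 90       # assumed length of the observed quarter
year_length <- 365.25   # assumed yearly period in days
t <- seq_len(n_days)

# First yearly harmonic, evaluated only on the observed days
yearly_sine <- sin(2 * pi * t / year_length)

# Share of one full yearly cycle that we have actually seen
share_of_cycle <- n_days / year_length

# Correlation between the partial sine wave and a linear trend
trend_correlation <- cor(t, yearly_sine)

print(c(share = share_of_cycle, correlation = trend_correlation))
```

With the two regressors this strongly correlated, the data alone can hardly decide whether sales are trending or merely passing through one part of a seasonal cycle. This is the gap that information from outside the short series would have to fill.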