This is a walkthrough of how the package's main function, SynthCast::run_synthetic_forecast(), can be used with the furrr package. furrr combines purrr's family of mapping functions with future's parallel processing capabilities, so the objective of this article is to show how you can run multiple forecasts in parallel.

Please feel free to contribute if you think of a better way to do so! If you want to see a usage example of SynthCast::run_synthetic_forecast(), see the article How to Run a Synthetic Forecast.

Let's load the packages we will need:
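
The chunk that loads the packages is not shown in this article. Judging only from the functions called further down (cross_df(), %>%, left_join(), bind_rows(), future_pmap(), plan()), something like the following should cover them. The exact list is an assumption, not the original chunk; knitr and SynthCast are loaded in the next section.

```r
# Assumed package list, inferred from the functions called later in this article
library(furrr)  # future_pmap(); attaching furrr also attaches future
library(purrr)  # cross_df()
library(dplyr)  # %>%, left_join(), bind_rows()
```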

The Dataset

The first thing a forecast needs is data to forecast. SynthCast provides an example of what it expects a dataset to look like. The code below loads the package and the example dataset:

library(knitr)
library(SynthCast)
data('df_example')
kable(head(df_example)) 
unit time_period x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22 x23 x24 x25 x26 x27 x28
1 1 0.4279268 0.2329316 0.4531898 0.5010649 0.0140657 0.5 0.0103704 0.0126492 0.0061209 0.0016722 0.0020701 0.0229175 0.1717596 0.0028440 0.2961483 0.2777202 0.0179579 0.5 0.0186335 0.0196256 0.0140659 0.5 0.0191083 0.0193874 0.0280014 0.5 0.0062926 0.0193874
1 2 0.3923215 0.0661752 0.4300946 0.4639223 0.1523873 0.5 0.0167901 0.1340623 0.0940312 0.0016722 0.0063536 0.0896040 0.1362349 0.0028440 0.2961483 0.2352990 0.1657939 0.5 0.1428571 0.1479287 0.1589145 0.5 0.1974522 0.1750037 0.1949374 0.5 0.0181592 0.1750037
1 3 0.4420440 0.1649872 0.4336537 0.5034269 0.2919640 0.5 0.0395062 0.2602215 0.1796289 0.0016722 0.0137895 0.1695727 0.1045988 0.0028440 0.2961483 0.2088865 0.3180237 0.5 0.3167702 0.2890312 0.3442300 0.5 0.3949045 0.3201550 0.2198580 0.5 0.0167533 0.3201550
1 4 0.4545717 0.1076923 0.4433019 0.5427364 0.4315704 0.5 0.0501235 0.3791298 0.2685505 0.0016722 0.0172917 0.2420208 0.0822586 0.0028440 0.2961483 0.1556901 0.4694968 0.5 0.4223602 0.4250857 0.5346481 0.5 0.5859873 0.4600435 0.2291281 0.5 0.0072638 0.4600435
1 5 0.4223203 0.1391912 0.4767905 0.5474351 0.5673960 0.5 0.0501235 0.4999604 0.3522328 0.1638796 0.0279551 0.3139178 0.0689121 0.2787148 0.0835851 0.1119981 0.6177005 0.5 0.6149068 0.5627327 0.7247700 0.5 0.7834395 0.5979929 0.2351954 0.5 0.0072638 0.5979929
1 6 0.3827364 0.1078405 0.5021293 0.5456524 0.6992290 0.5 0.0688889 0.6161397 0.4334900 0.3311037 0.0335161 0.3829171 0.0602702 0.2787148 0.0835851 0.0985164 0.7600335 0.5 0.7826087 0.6957559 0.9102858 0.5 0.9745223 0.7413431 0.2458748 0.5 0.0072638 0.7413431

The dataset is expected to have 3 types of columns:

    1. A unit column: contains a numeric identifier of the unit. In the credit card example this could be the customer, a group of customers, etc.;
    2. A time column: contains the time as an integer. In the credit card example this would be the age in months of the respective unit (say 1 for the first month, 2 for the second month, etc.);
    3. Feature columns: numeric features, including both the series that will be forecasted and the features used to forecast them. In the credit card example these could be the profitability and transactional features.
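
To make the layout concrete, here is a minimal sketch of a dataset with the three column types. The toy_df object and its values are hypothetical and exist only for illustration. The uniqueness check is an assumption about what a panel like this needs, not a documented requirement of SynthCast.

```r
# Hypothetical toy data: 2 units observed over 3 time periods, with 2 features
toy_df <- data.frame(
  unit = rep(1:2, each = 3),          # unit column
  time_period = rep(1:3, times = 2),  # time column
  x1 = c(0.43, 0.39, 0.44, 0.51, 0.48, 0.47),  # feature column
  x2 = c(0.23, 0.07, 0.16, 0.30, 0.28, 0.25)   # feature column
)

# Every column is expected to be numeric
all_numeric <- all(vapply(toy_df, is.numeric, logical(1)))

# Assumption: each (unit, time_period) pair appears only once
no_duplicates <- !any(duplicated(toy_df[, c("unit", "time_period")]))
```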

Setting Up the Requirements

The solution I present here consists of creating another dataset in which each column is a parameter and each row is one combination of parameters.

So let's work with a practical example: suppose we want to forecast 12 time periods for unit 30 and 20 time periods for unit 40, for the columns x1 and x3.

Dataframe with Parameters

The first thing we need is a dataframe with the parameters as columns, with the exception of the dataset itself:



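## Variable parameters
units_of_interest <- c(30, 40)
series_of_interest <- c('x1', 'x3')
periods_to_forecast = c(12, 20) # one value per unit, in the same order

col_time = c('time_period')
col_unit_name = c('unit')

# Lookup table: number of periods to forecast for each unit
periods_to_forecast = tidyr::tibble(
  unit_of_interest = units_of_interest,
  periods_to_forecast = periods_to_forecast
)

# All combinations of unit and serie, plus the fixed column names.
# Note: purrr::cross_df() is deprecated since purrr 1.0.0
# in favour of tidyr::expand_grid().
crossArg <- purrr::cross_df(
  list(
    unit_of_interest = units_of_interest,
    serie_of_interest = series_of_interest,
    col_time = col_time,
    col_unit_name = col_unit_name
  )
) %>%
  # Attach the number of periods that belongs to each unit
  dplyr::left_join(periods_to_forecast, by = 'unit_of_interest')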

kable(crossArg)
unit_of_interest  serie_of_interest  col_time     col_unit_name  periods_to_forecast
30                x1                 time_period  unit           12
40                x1                 time_period  unit           20
30                x3                 time_period  unit           12
40                x3                 time_period  unit           20

Wrapper Function

The second thing we need is a function that wraps SynthCast::run_synthetic_forecast(). This is because, to map over multiple parameters with the furrr package, we need the parameters in a dataframe, and we cannot put the dataset itself in a column of that dataframe without using list-columns.

Note that the object df_example used inside f() is not an argument of f(); it is a global variable, loaded above. This is a workaround, but it gets the job done.

f = function(col_unit_name, unit_of_interest, col_time, periods_to_forecast, serie_of_interest){
  synth_forecast = SynthCast::run_synthetic_forecast(
    df=as.data.frame(df_example), # Global environment
    col_unit_name=col_unit_name,
    unit_of_interest=unit_of_interest,
    col_time=col_time,
    periods_to_forecast=periods_to_forecast,
    serie_of_interest=serie_of_interest
  )
  return(synth_forecast)
}
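
Before going parallel, it may help to see how pmap-style functions hand each row of a dataframe to a function: named columns are matched to arguments by name, which is why the column names of the parameter dataframe have to match the argument names of f(). A minimal sequential sketch with purrr::pmap() follows; toy_f() and toy_args are hypothetical stand-ins, not part of SynthCast.

```r
library(purrr)

# Hypothetical stand-in for f(); it only echoes the arguments it receives
toy_f <- function(unit_of_interest, serie_of_interest, periods_to_forecast) {
  paste0("unit ", unit_of_interest, ", serie ", serie_of_interest,
         ", periods ", periods_to_forecast)
}

# The column order differs from the argument order on purpose:
# pmap() matches named columns to arguments by name, not by position
toy_args <- data.frame(
  periods_to_forecast = c(12, 20),
  unit_of_interest = c(30, 40),
  serie_of_interest = c("x1", "x1"),
  stringsAsFactors = FALSE
)

# One call of toy_f() per row; the result is a list with one element per row
toy_out <- pmap(toy_args, toy_f)
```

A column that does not correspond to any argument of the function would cause an "unused argument" error, so the parameter dataframe should contain only columns the wrapper accepts.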

Running the Forecasts Parallelized

Set up the parallel backend, as indicated in the furrr documentation:

future::plan(future::multisession) # Run on multiple background R sessions
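
If you want more control over the backend, the sketch below sets an explicit number of workers and restores the previous plan once the work is done. The value workers = 2 is an arbitrary assumption for illustration; adjust it to your machine.

```r
library(future)

# Assumption: 2 workers is only an illustrative value
# plan() returns the previous plan, so it can be restored later
old_plan <- plan(multisession, workers = 2)

# Number of workers the current plan will use
n_workers <- nbrOfWorkers()

# ... run furrr::future_pmap() here ...

# Switch back to the previous plan when finished
plan(old_plan)
```

Switching away from multisession when you are done lets future shut down the background R sessions instead of leaving them idle.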

Now we can use furrr::future_pmap() to run the forecasts:

synthetic_forecasts <- furrr::future_pmap(
  crossArg,
  f
)
#> [1] "Forecasting Unit:  30 . Serie:  x1"
#> 
#> X1, X0, Z1, Z0 all come directly from dataprep object.
#> 
#> 
#> **************** 
#>  searching for synthetic control unit  
#>  
#> 
#> **************** 
#> **************** 
#> **************** 
#> 
#> MSPE (LOSS V): 0.005105562 
#> 
#> solution.v:
#>  0.03795838 0.02953412 0.03356642 0.01533716 0.1226315 0.1285906 0.05816525 0.02318678 0.01465216 0.01080646 0.06187415 0.0289542 0.01702719 0.08006876 0.009607601 0.01627082 0.1278952 0.02615566 0.01342692 0.04431671 0.04468165 0.01097563 0.04431671 
#> 
#> solution.w:
#>  1.1452e-05 0.0002666085 0.0001873182 0.0002686277 0.0001636778 0.0003347625 0.0004905744 0.0005939929 0.0005203545 0.5502766 4.8739e-06 0.0007708196 0.0003844661 0.001002508 0.000803214 0.000999687 0.1285913 0.3143298 
#> 
#> [1] "Forecasting Unit:  40 . Serie:  x1"
#> 
#> X1, X0, Z1, Z0 all come directly from dataprep object.
#> 
#> 
#> **************** 
#>  searching for synthetic control unit  
#>  
#> 
#> **************** 
#> **************** 
#> **************** 
#> 
#> MSPE (LOSS V): 0.01126689 
#> 
#> solution.v:
#>  0.00451887 0.1198275 0.001215545 0.01511271 0.0005242874 0.0207889 0.1270972 0.03638089 0.1136289 0.07111816 0.06962764 0.03914189 0.00266398 0.003293383 0.003680319 0.002879397 0.0001513126 0.1898451 0.003094035 0.0532468 0.06650722 0.002409148 0.0532468 
#> 
#> solution.w:
#>  2.892e-07 7.7e-09 8.85e-08 2.97e-08 2.778e-07 1.147e-07 1.083e-07 3.687e-07 1.0996e-06 6.876e-07 2.4978e-06 0.0002996523 0.00148343 0.4549331 0.0003138607 0.001220969 0.526065 0.0005761354 5.26296e-05 0.01504971 
#> 
#> [1] "Forecasting Unit:  30 . Serie:  x3"
#> 
#> X1, X0, Z1, Z0 all come directly from dataprep object.
#> 
#> 
#> **************** 
#>  searching for synthetic control unit  
#>  
#> 
#> **************** 
#> **************** 
#> **************** 
#> 
#> MSPE (LOSS V): 0.004444729 
#> 
#> solution.v:
#>  0.0008966688 0.05006106 0.04589722 0.01215437 0.1364756 0.1380885 0.07131376 0.02247537 0.06353448 0.001899661 0.05256282 0.02746854 0.02928654 0.06565559 0.001513901 0.01546533 0.1404815 0.008559098 0.0178953 0.02120212 0.04279917 0.01311125 0.02120212 
#> 
#> solution.w:
#>  1.11103e-05 0.0004018295 0.000251307 0.0004156746 0.0002930494 0.0004620298 0.0007838085 0.001376503 0.0006554179 0.5237144 0.001810757 0.001349141 0.0005160587 0.002039712 0.001832605 0.002641334 0.03367259 0.4277733 
#> 
#> [1] "Forecasting Unit:  40 . Serie:  x3"
#> 
#> X1, X0, Z1, Z0 all come directly from dataprep object.
#> 
#> 
#> **************** 
#>  searching for synthetic control unit  
#>  
#> 
#> **************** 
#> **************** 
#> **************** 
#> 
#> MSPE (LOSS V): 0.0003451711 
#> 
#> solution.v:
#>  0.0838238 0.01402602 0.02662944 0.0001756046 0.08158252 0.04514179 0.03369849 0.0432683 0.04256028 0.01812175 5.0628e-05 0.04334514 0.1161261 0.0002120456 6.248e-05 0.01880793 0.05472154 0.03005398 1.5966e-06 0.142656 0.0129223 0.04935617 0.142656 
#> 
#> solution.w:
#>  4.158e-07 1.9116e-06 4.2001e-06 3.7943e-06 0.05303326 0.02782933 0.0003817437 1.66687e-05 1.06607e-05 4.89541e-05 8.04749e-05 3.8502e-06 8.695e-07 0.6799389 2.1e-08 3.14e-06 0.001466123 0.151224 0.08586669 8.49833e-05

The synthetic_forecasts object is a list with one element per row of crossArg, each holding the output of SynthCast::run_synthetic_forecast() for that combination of parameters.

Bind Result Tables

We can now bind the tables from the different forecasts together:

mape_backtest = bind_rows(lapply(synthetic_forecasts, function(x) x$mape_backtest))

kable(mape_backtest) 
execution_date  projected_unit  projected_serie  max_time_unit_of_interest  periods_to_forecast  elegible_control_units  number_control_units  mape
2022-03-08      30              x1               21                         12                   17                      9                     13.009279
2022-03-08      40              x1               11                         20                   19                      6                     21.482920
2022-03-08      30              x3               21                         12                   17                      12                    7.898514
2022-03-08      40              x3               11                         20                   19                      6                     3.090236