How to Run Multiple Synthetic Forecasts with furrr
This is a walkthrough of how the main package function, `SynthCast::run_synthetic_forecast()`, can be used with the furrr package. The goal of furrr is to combine purrr’s family of mapping functions with future’s parallel processing capabilities, so this article shows how to run multiple forecasts in parallel.

Please feel free to contribute if you think of a better way to do this! For a usage example of `SynthCast::run_synthetic_forecast()` on its own, see the article How to Run a Synthetic Forecast.
Let's load the packages we will need:
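A minimal sketch of this step, assuming the packages below are the ones the rest of the article relies on. The comments name the functions that the later code calls from each package:

```r
library(dplyr)     # left_join(), bind_rows() and the %>% pipe
library(purrr)     # cross_df()
library(furrr)     # future_pmap()
library(knitr)     # kable()
library(SynthCast) # run_synthetic_forecast()
```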
The first thing a forecast needs is data to forecast. SynthCast provides an example dataset that shows the format it expects; loaded into the object `df_example`, its first rows look like this:
unit | time_period | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 | x16 | x17 | x18 | x19 | x20 | x21 | x22 | x23 | x24 | x25 | x26 | x27 | x28 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0.4279268 | 0.2329316 | 0.4531898 | 0.5010649 | 0.0140657 | 0.5 | 0.0103704 | 0.0126492 | 0.0061209 | 0.0016722 | 0.0020701 | 0.0229175 | 0.1717596 | 0.0028440 | 0.2961483 | 0.2777202 | 0.0179579 | 0.5 | 0.0186335 | 0.0196256 | 0.0140659 | 0.5 | 0.0191083 | 0.0193874 | 0.0280014 | 0.5 | 0.0062926 | 0.0193874 |
1 | 2 | 0.3923215 | 0.0661752 | 0.4300946 | 0.4639223 | 0.1523873 | 0.5 | 0.0167901 | 0.1340623 | 0.0940312 | 0.0016722 | 0.0063536 | 0.0896040 | 0.1362349 | 0.0028440 | 0.2961483 | 0.2352990 | 0.1657939 | 0.5 | 0.1428571 | 0.1479287 | 0.1589145 | 0.5 | 0.1974522 | 0.1750037 | 0.1949374 | 0.5 | 0.0181592 | 0.1750037 |
1 | 3 | 0.4420440 | 0.1649872 | 0.4336537 | 0.5034269 | 0.2919640 | 0.5 | 0.0395062 | 0.2602215 | 0.1796289 | 0.0016722 | 0.0137895 | 0.1695727 | 0.1045988 | 0.0028440 | 0.2961483 | 0.2088865 | 0.3180237 | 0.5 | 0.3167702 | 0.2890312 | 0.3442300 | 0.5 | 0.3949045 | 0.3201550 | 0.2198580 | 0.5 | 0.0167533 | 0.3201550 |
1 | 4 | 0.4545717 | 0.1076923 | 0.4433019 | 0.5427364 | 0.4315704 | 0.5 | 0.0501235 | 0.3791298 | 0.2685505 | 0.0016722 | 0.0172917 | 0.2420208 | 0.0822586 | 0.0028440 | 0.2961483 | 0.1556901 | 0.4694968 | 0.5 | 0.4223602 | 0.4250857 | 0.5346481 | 0.5 | 0.5859873 | 0.4600435 | 0.2291281 | 0.5 | 0.0072638 | 0.4600435 |
1 | 5 | 0.4223203 | 0.1391912 | 0.4767905 | 0.5474351 | 0.5673960 | 0.5 | 0.0501235 | 0.4999604 | 0.3522328 | 0.1638796 | 0.0279551 | 0.3139178 | 0.0689121 | 0.2787148 | 0.0835851 | 0.1119981 | 0.6177005 | 0.5 | 0.6149068 | 0.5627327 | 0.7247700 | 0.5 | 0.7834395 | 0.5979929 | 0.2351954 | 0.5 | 0.0072638 | 0.5979929 |
1 | 6 | 0.3827364 | 0.1078405 | 0.5021293 | 0.5456524 | 0.6992290 | 0.5 | 0.0688889 | 0.6161397 | 0.4334900 | 0.3311037 | 0.0335161 | 0.3829171 | 0.0602702 | 0.2787148 | 0.0835851 | 0.0985164 | 0.7600335 | 0.5 | 0.7826087 | 0.6957559 | 0.9102858 | 0.5 | 0.9745223 | 0.7413431 | 0.2458748 | 0.5 | 0.0072638 | 0.7413431 |
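The dataset is expected to have 3 types of columns:

- a unit identifier column (`unit` in the example), passed as `col_unit_name`;
- a time column (`time_period`), passed as `col_time`;
- the series columns (`x1` to `x28`), any of which can be forecast by passing its name as `serie_of_interest`.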
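A minimal sketch of checking a data frame for a unit column, a time column and numeric series columns before forecasting. `check_panel()` is a hypothetical helper, not part of SynthCast, and its one-row-per-unit-and-period check is an assumption based on the shape of the example data:

```r
# Hypothetical helper (not part of SynthCast): validates the column layout
check_panel <- function(df, col_unit_name, col_time, series) {
  # Every required column must be present
  missing_cols <- setdiff(c(col_unit_name, col_time, series), names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Series to be forecast must be numeric
  non_numeric <- series[!vapply(df[series], is.numeric, logical(1))]
  if (length(non_numeric) > 0) {
    stop("Non-numeric series: ", paste(non_numeric, collapse = ", "))
  }
  # Assumption: one row per unit and time period, as in the example data
  if (anyDuplicated(df[c(col_unit_name, col_time)]) > 0) {
    stop("Duplicated unit/time pairs")
  }
  invisible(TRUE)
}

# Toy panel with 2 units observed over 3 periods
toy <- data.frame(
  unit = rep(c(1, 2), each = 3),
  time_period = rep(1:3, times = 2),
  x1 = (1:6) / 10,
  x3 = c(0.4, 0.5, 0.45, 0.2, 0.25, 0.3)
)

check_panel(toy, "unit", "time_period", c("x1", "x3"))
```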
To run several forecasts at once, the solution I present here is to create a second data frame whose columns are the function's parameters and whose rows are the parameter combinations to run.
Let's work through a practical example. Suppose we want to forecast the series `x1` and `x3` for units 30 and 40, forecasting 12 time periods ahead for unit 30 and 20 time periods ahead for unit 40.
The first thing we need is a data frame with the parameters as columns, except for the dataset itself:

```r
## variable parameters
units_of_interest <- c(30, 40)
series_of_interest <- c("x1", "x3")
periods_to_forecast <- c(12, 20)  # one horizon per unit, in the same order
col_time <- "time_period"
col_unit_name <- "unit"

# Forecast horizon for each unit
periods_by_unit <- tidyr::tibble(
  unit_of_interest = units_of_interest,
  periods_to_forecast = periods_to_forecast
)

# One row per unit/series combination; the horizon is then attached by unit
crossArg <- purrr::cross_df(
  list(
    unit_of_interest = units_of_interest,
    serie_of_interest = series_of_interest,
    col_time = col_time,
    col_unit_name = col_unit_name
  )
) %>%
  dplyr::left_join(periods_by_unit, by = "unit_of_interest")

knitr::kable(crossArg)
```
unit_of_interest | serie_of_interest | col_time | col_unit_name | periods_to_forecast |
---|---|---|---|---|
30 | x1 | time_period | unit | 12 |
40 | x1 | time_period | unit | 20 |
30 | x3 | time_period | unit | 12 |
40 | x3 | time_period | unit | 20 |
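Since purrr 1.0.0, `cross_df()` is deprecated in favour of `tidyr::expand_grid()`. A sketch of the same parameter table built with `expand_grid()`, assuming the same units, series and horizons as above. Note that `expand_grid()` varies the first column slowest, so the rows come out in a different order than in the table above:

```r
library(dplyr)
library(tidyr)

units_of_interest <- c(30, 40)
series_of_interest <- c("x1", "x3")

# Forecast horizon for each unit
periods_by_unit <- tibble(
  unit_of_interest = units_of_interest,
  periods_to_forecast = c(12, 20)
)

# Every unit/series combination, with constant columns recycled
cross_args <- expand_grid(
  unit_of_interest = units_of_interest,
  serie_of_interest = series_of_interest,
  col_time = "time_period",
  col_unit_name = "unit"
) %>%
  left_join(periods_by_unit, by = "unit_of_interest")
```

`cross_args` can then be passed to `furrr::future_pmap()` exactly like `crossArg`, since the mapping only depends on the column names, not the row order.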
The second thing we need is a function that wraps `SynthCast::run_synthetic_forecast()`. This is needed because, to map over multiple parameters with furrr, the parameters must be in a data frame, and we cannot put the dataset itself in a column without using list-columns.

Note that the object `df_example` used inside `f()` is a global object defined above. This is a workaround, but it gets the job done.
```r
# Wrapper: every argument except the dataset comes from one row of crossArg
f <- function(col_unit_name, unit_of_interest, col_time, periods_to_forecast, serie_of_interest) {
  synth_forecast <- SynthCast::run_synthetic_forecast(
    df = as.data.frame(df_example), # taken from the global environment
    col_unit_name = col_unit_name,
    unit_of_interest = unit_of_interest,
    col_time = col_time,
    periods_to_forecast = periods_to_forecast,
    serie_of_interest = serie_of_interest
  )
  return(synth_forecast)
}
```
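The global-object workaround can likely be avoided: `purrr::pmap()` forwards any extra named arguments given in `...` to every call, and `furrr::future_pmap()` documents the same `...` argument. A sketch of the mechanism, where `fake_forecast()` and `toy_df` are hypothetical stand-ins for `SynthCast::run_synthetic_forecast()` and `df_example`, and plain `pmap()` is used so it runs without a parallel backend:

```r
library(purrr)

# Hypothetical stand-in for SynthCast::run_synthetic_forecast():
# it only reports which dataset and parameters it received
fake_forecast <- function(df, col_unit_name, unit_of_interest, col_time,
                          periods_to_forecast, serie_of_interest) {
  data.frame(
    n_rows = nrow(df),
    unit = unit_of_interest,
    serie = serie_of_interest,
    horizon = periods_to_forecast
  )
}

# Toy dataset standing in for df_example
toy_df <- data.frame(
  unit = rep(c(30, 40), each = 3),
  time_period = rep(1:3, times = 2),
  x1 = (1:6) / 10
)

params <- data.frame(
  unit_of_interest = c(30, 40),
  serie_of_interest = c("x1", "x1"),
  col_time = "time_period",
  col_unit_name = "unit",
  periods_to_forecast = c(12, 20)
)

# Row values are matched to arguments by column name;
# df is passed unchanged to every call through `...`
results <- pmap(params, fake_forecast, df = toy_df)
```

With the real package, the equivalent call would presumably be `furrr::future_pmap(crossArg, SynthCast::run_synthetic_forecast, df = as.data.frame(df_example))`. This relies on the column names of `crossArg` matching the function's argument names, the same assumption `f()` already makes, and is untested here.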
Set up the parallel backend, as indicated in the furrr documentation:

```r
future::plan(future::multisession) # run the forecasts in parallel background R sessions
```
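A sketch of controlling the number of parallel workers and restoring the previous backend afterwards. Leaving one core free is a common choice, not a requirement; `plan()` invisibly returns the plan that was active before, which makes restoring it straightforward:

```r
library(future)

# Leave one core free for the main R session (optional)
n_workers <- max(1, availableCores() - 1)

# Start background R sessions; oplan keeps the previous plan
oplan <- plan(multisession, workers = n_workers)

# ... run furrr::future_pmap() here ...

# Restore the previous plan, which also shuts down the background sessions
plan(oplan)
```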
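Now we can use `furrr::future_pmap()` to run the forecasts. Each row of `crossArg` becomes one call to `f()`, with the columns matched to the arguments by name:

```r
synthetic_forecasts <- furrr::future_pmap(
  crossArg,
  f
)
```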
```
#> [1] "Forecasting Unit: 30 . Serie: x1"
#>
#> X1, X0, Z1, Z0 all come directly from dataprep object.
#>
#>
#> ****************
#> searching for synthetic control unit
#>
#>
#> ****************
#> ****************
#> ****************
#>
#> MSPE (LOSS V): 0.005105562
#>
#> solution.v:
#> 0.03795838 0.02953412 0.03356642 0.01533716 0.1226315 0.1285906 0.05816525 0.02318678 0.01465216 0.01080646 0.06187415 0.0289542 0.01702719 0.08006876 0.009607601 0.01627082 0.1278952 0.02615566 0.01342692 0.04431671 0.04468165 0.01097563 0.04431671
#>
#> solution.w:
#> 1.1452e-05 0.0002666085 0.0001873182 0.0002686277 0.0001636778 0.0003347625 0.0004905744 0.0005939929 0.0005203545 0.5502766 4.8739e-06 0.0007708196 0.0003844661 0.001002508 0.000803214 0.000999687 0.1285913 0.3143298
#>
#> [1] "Forecasting Unit: 40 . Serie: x1"
#>
#> X1, X0, Z1, Z0 all come directly from dataprep object.
#>
#>
#> ****************
#> searching for synthetic control unit
#>
#>
#> ****************
#> ****************
#> ****************
#>
#> MSPE (LOSS V): 0.01126689
#>
#> solution.v:
#> 0.00451887 0.1198275 0.001215545 0.01511271 0.0005242874 0.0207889 0.1270972 0.03638089 0.1136289 0.07111816 0.06962764 0.03914189 0.00266398 0.003293383 0.003680319 0.002879397 0.0001513126 0.1898451 0.003094035 0.0532468 0.06650722 0.002409148 0.0532468
#>
#> solution.w:
#> 2.892e-07 7.7e-09 8.85e-08 2.97e-08 2.778e-07 1.147e-07 1.083e-07 3.687e-07 1.0996e-06 6.876e-07 2.4978e-06 0.0002996523 0.00148343 0.4549331 0.0003138607 0.001220969 0.526065 0.0005761354 5.26296e-05 0.01504971
#>
#> [1] "Forecasting Unit: 30 . Serie: x3"
#>
#> X1, X0, Z1, Z0 all come directly from dataprep object.
#>
#>
#> ****************
#> searching for synthetic control unit
#>
#>
#> ****************
#> ****************
#> ****************
#>
#> MSPE (LOSS V): 0.004444729
#>
#> solution.v:
#> 0.0008966688 0.05006106 0.04589722 0.01215437 0.1364756 0.1380885 0.07131376 0.02247537 0.06353448 0.001899661 0.05256282 0.02746854 0.02928654 0.06565559 0.001513901 0.01546533 0.1404815 0.008559098 0.0178953 0.02120212 0.04279917 0.01311125 0.02120212
#>
#> solution.w:
#> 1.11103e-05 0.0004018295 0.000251307 0.0004156746 0.0002930494 0.0004620298 0.0007838085 0.001376503 0.0006554179 0.5237144 0.001810757 0.001349141 0.0005160587 0.002039712 0.001832605 0.002641334 0.03367259 0.4277733
#>
#> [1] "Forecasting Unit: 40 . Serie: x3"
#>
#> X1, X0, Z1, Z0 all come directly from dataprep object.
#>
#>
#> ****************
#> searching for synthetic control unit
#>
#>
#> ****************
#> ****************
#> ****************
#>
#> MSPE (LOSS V): 0.0003451711
#>
#> solution.v:
#> 0.0838238 0.01402602 0.02662944 0.0001756046 0.08158252 0.04514179 0.03369849 0.0432683 0.04256028 0.01812175 5.0628e-05 0.04334514 0.1161261 0.0002120456 6.248e-05 0.01880793 0.05472154 0.03005398 1.5966e-06 0.142656 0.0129223 0.04935617 0.142656
#>
#> solution.w:
#> 4.158e-07 1.9116e-06 4.2001e-06 3.7943e-06 0.05303326 0.02782933 0.0003817437 1.66687e-05 1.06607e-05 4.89541e-05 8.04749e-05 3.8502e-06 8.695e-07 0.6799389 2.1e-08 3.14e-06 0.001466123 0.151224 0.08586669 8.49833e-05
```
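If any parameter combination throws an error (for example, a unit without enough data), `future_pmap()` stops and the results of the other combinations are lost. A sketch of one way around this, assuming it is acceptable to drop failed combinations: wrap the mapped function with `purrr::possibly()`, which returns a fallback value instead of raising the error. `flaky_forecast()` is a hypothetical stand-in that fails for unit 40, and plain `purrr::pmap()` is used so the example runs without a parallel backend:

```r
library(purrr)

# Hypothetical stand-in that errors for one unit, mimicking a failed forecast
flaky_forecast <- function(unit_of_interest, serie_of_interest) {
  if (unit_of_interest == 40) stop("not enough control units")
  data.frame(projected_unit = unit_of_interest, projected_serie = serie_of_interest)
}

params <- data.frame(
  unit_of_interest = c(30, 40, 30),
  serie_of_interest = c("x1", "x1", "x3")
)

# possibly() returns `otherwise` (here NULL) instead of throwing
safe_forecast <- possibly(flaky_forecast, otherwise = NULL)
results <- pmap(params, safe_forecast)

# Keep only the combinations that succeeded
ok <- !vapply(results, is.null, logical(1))
combined <- do.call(rbind, results[ok])
```

The same wrapper should apply to `f()`, e.g. `furrr::future_pmap(crossArg, purrr::possibly(f, otherwise = NULL))`; `which(!ok)` then points at the rows of the parameter table that failed.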
The `synthetic_forecasts` object is a list with one output of `SynthCast::run_synthetic_forecast()` per row of `crossArg`.
We can now bind the `mape_backtest` tables from the different forecasts together:

```r
mape_backtest <- dplyr::bind_rows(lapply(synthetic_forecasts, function(x) x$mape_backtest))
knitr::kable(mape_backtest)
```
execution_date | projected_unit | projected_serie | max_time_unit_of_interest | periods_to_forecast | elegible_control_units | number_control_units | mape |
---|---|---|---|---|---|---|---|
2022-03-08 | 30 | x1 | 21 | 12 | 17 | 9 | 13.009279 |
2022-03-08 | 40 | x1 | 11 | 20 | 19 | 6 | 21.482920 |
2022-03-08 | 30 | x3 | 21 | 12 | 17 | 12 | 7.898514 |
2022-03-08 | 40 | x3 | 11 | 20 | 19 | 6 | 3.090236 |
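The extraction step can also be written with purrr's shorthand: passing a string to `map()` extracts the element with that name from each list item. A sketch where `fake_results` is a hypothetical two-item list standing in for `synthetic_forecasts`, with made-up toy values, followed by sorting the combinations by backtest error:

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for synthetic_forecasts (toy values only)
fake_results <- list(
  list(mape_backtest = data.frame(projected_unit = 30, projected_serie = "x1", mape = 2)),
  list(mape_backtest = data.frame(projected_unit = 40, projected_serie = "x1", mape = 1))
)

# map(.x, "name") extracts the element called "name" from each item
mape_backtest <- bind_rows(map(fake_results, "mape_backtest"))

# Sort combinations from lowest to highest backtest MAPE
mape_backtest <- arrange(mape_backtest, mape)
```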