Parallelize 'pls' functions
Henrik Bengtsson

The futurize package allows you to easily turn
sequential code into parallel code by piping the sequential code to the
futurize() function. Easy!
Introduction
This vignette demonstrates how to use this approach to parallelize
pls functions
such as mvr(), plsr(), pcr(), and
crossval().
The pls package provides Partial Least Squares Regression (PLSR) and Principal Component Regression (PCR) methods. These methods often use cross-validation (CV) to determine the number of components to use, which can be computationally intensive and is an ideal candidate for parallelization.
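As a rough illustration of how cross-validation results feed into the choice of the number of components, the following sketch fits a model sequentially and inspects the cross-validated error. It uses only pls functions (RMSEP() and selectNcomp()); the "onesigma" heuristic is chosen here purely for illustration and is not prescribed by this vignette.

```r
library(pls)
data(yarn)

## Fit a PLS regression with cross-validation (sequential)
m <- plsr(density ~ NIR, ncomp = 10, data = yarn, validation = "CV")

## Cross-validated RMSEP for each number of components
print(RMSEP(m))

## Suggest a number of components; "onesigma" is an illustrative choice
n_opt <- selectNcomp(m, method = "onesigma", plot = FALSE)
print(n_opt)
```

Because the cross-validation segments are drawn at random, the exact numbers can differ between runs.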
Example: PLS Regression with Cross-Validation
The plsr() function is used to perform PLS regression.
When validation = "CV" is specified, it performs
cross-validation.
library(pls)
data(yarn)
## Sequential evaluation
m <- plsr(density ~ NIR, ncomp = 10, data = yarn, validation = "CV")
To make it evaluate in parallel, simply pipe the call to
futurize():
library(futurize)
library(pls)
data(yarn)
## Parallel evaluation
m <- plsr(density ~ NIR, ncomp = 10, data = yarn, validation = "CV") |> futurize()
This will automatically use the parallel backend set by
plan(), e.g.
plan(multisession)
Example: Stand-alone Cross-Validation
The crossval() function can be used to perform
cross-validation on an already fitted model. As with plsr(), the call
can be piped to futurize() to run the cross-validation in parallel.
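A minimal sketch of stand-alone cross-validation, assuming the same yarn data as in the previous example. The model is first fitted without validation, and crossval() is then piped to futurize(). The segments = 10 argument is an illustrative choice, and the sketch runs on whatever backend is currently set by plan().

```r
library(futurize)
library(pls)
data(yarn)

## Fit the model without built-in validation
m <- plsr(density ~ NIR, ncomp = 10, data = yarn)

## Cross-validate the already fitted model via futurize()
cv <- crossval(m, segments = 10) |> futurize()

## The returned model carries the cross-validation results
print(RMSEP(cv))
```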
Supported Functions
The following pls functions are supported by
futurize():
- mvr()
- plsr()
- pcr()
- cppls()
- crossval() with seed = TRUE as the default
Without futurize: Manual ‘pls.options’ setup
For comparison, here is what it takes to parallelize pls
functions using the parallel package directly, without
futurize:
library(pls)
library(parallel)
## Set up a cluster
ncpus <- 4L
cl <- makeCluster(ncpus)
## Save the current options, then configure pls to use the cluster
old_opts <- pls.options()
pls.options(parallel = cl)
## Run regression with cross-validation
data(yarn)
m <- plsr(density ~ NIR, ncomp = 10, data = yarn, validation = "CV")
## Restore original options and stop the cluster
pls.options(old_opts)
stopCluster(cl)
This requires you to manually manage the cluster lifecycle and the
global pls.options(). With futurize, the
cluster setup and option management are handled automatically and
localized to the function call.
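For a side-by-side comparison with the manual setup, here is a sketch of the same analysis written with futurize. The backend is selected with plan() from the future package; workers = 2 is an arbitrary value chosen for illustration, and switching back to a sequential plan at the end is a common habit rather than something futurize requires.

```r
library(futurize)
library(pls)
data(yarn)

## Select a parallel backend; 'workers = 2' is an illustrative choice
future::plan(future::multisession, workers = 2)

## Run regression with cross-validation in parallel
m <- plsr(density ~ NIR, ncomp = 10, data = yarn, validation = "CV") |> futurize()

## Return to sequential processing when done
future::plan(future::sequential)
```

No cluster object is created by hand and pls.options() is never called directly, so there is nothing to restore afterwards.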