Parallelize 'scuttle' functions
Henrik Bengtsson

The futurize package allows you to easily turn
sequential code into parallel code by piping the sequential code to the
futurize() function. Easy!
Introduction
This vignette demonstrates how to use this approach to parallelize functions from the scuttle package.
The scuttle Bioconductor package provides basic utility functions for single-cell RNA-seq data analysis, including quality control, normalization, and aggregation, which can be parallelized across cells or features.
Example: Computing per-feature QC metrics in parallel
The perFeatureQCMetrics() function computes quality
control metrics for each feature (gene) in a
SingleCellExperiment object:
library(scuttle)
# Simulate data
sce <- mockSCE()
qc <- perFeatureQCMetrics(sce)

Here perFeatureQCMetrics() runs sequentially, but we can
easily make it run in parallel by piping to futurize():
library(futurize)
qc <- perFeatureQCMetrics(sce) |> futurize()

This will distribute the work across the available parallel workers, given that we have set up parallel workers, e.g.
plan(multisession)

The built-in multisession backend parallelizes on your
local computer and works on all operating systems. There are other parallel
backends to choose from, including alternatives to parallelize
locally as well as distributed across remote machines, e.g.
plan(future.mirai::mirai_multisession)
and
plan(future.batchtools::batchtools_slurm)

Supported Functions
The following scuttle functions are supported by
futurize():
- calculateAverage()
- perFeatureQCMetrics()
- numDetectedAcrossFeatures()
- summarizeAssayByGroup()
- medianSizeFactors()
- computeMedianFactors()
- pooledSizeFactors()
- computePooledFactors()
- fitLinearModel()
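As a rough end-to-end sketch, the same piping pattern can be applied to another supported function, here calculateAverage(), and the parallel result can be checked against the sequential one. The choice of two workers and the variable names avg_seq and avg_par are assumptions made for illustration only; library(future) is attached explicitly so that plan() is available regardless of how futurize is loaded.

```r
library(scuttle)
library(future)    # provides plan(), multisession, and sequential
library(futurize)

# Assumption: two local workers are enough for this small simulated data set
plan(multisession, workers = 2)

# Simulated SingleCellExperiment, as in the example above
sce <- mockSCE()

# Sequential reference result
avg_seq <- calculateAverage(sce)

# Same call, parallelized by piping to futurize()
avg_par <- calculateAverage(sce) |> futurize()

# The two results should agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(avg_seq, avg_par)))

# Switch back to sequential processing, which also shuts down the workers
plan(sequential)
```

Comparing against the sequential result is a cheap sanity check when first parallelizing a pipeline. For a data set this small, the overhead of starting workers will likely outweigh any speed-up.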
The following scuttle functions are deprecated in
scuttle (>= 1.22) in favor of counterpart functions
in the Bioconductor package scrapper.
Support for futurize() of these deprecated functions
remains, but will be phased out:
- logNormCounts()
- normalizeCounts()
- perCellQCMetrics()
- addPerCellQCMetrics()
- addPerFeatureQCMetrics()
- addPerCellQC()
- addPerFeatureQC()
- numDetectedAcrossCells()
- sumCountsAcrossCells()
- sumCountsAcrossFeatures()
- aggregateAcrossCells()
- aggregateAcrossFeatures()
- librarySizeFactors()
- computeLibraryFactors()
- geometricSizeFactors()
- computeGeometricFactors()