Computational Biology Lab Guide

Parallelization

Home
Parallelization

Tutorial on Parallelization in R with mclapply

When you have a list of repetitive tasks, you may be able to speed it up by adding more computing power. If each task is completely independent of the others, then it is a prime candidate for executing those tasks in parallel, each on its own core

Step 1: Loading Necessary Libraries

The parallel library can be used to send tasks (encoded as function calls) to each of the processing cores on your machine in parallel. This is done by using the parallel::mclapply function, which is analogous to lapply, but distributes the tasks to multiple processors. mclapply gathers up the responses from each of these function calls, and returns a list of responses that is the same length as the list or vector of input data (one return per input item).
# Install and load the 'parallel' library
install.packages("parallel")
library(parallel)

Step 2: Define the Processing Function

The essence of parallelization is to execute a function simultaneously on different data (a list of data). The first step is to define the function to be executed. For example, let's create a simple function that takes a number as input and returns its square.
# Define the processing function
process_number <- function(num) { result <- num^2 return(result) }

Step 3: Create a List of Numbers

The essence of parallelization is to execute a function simultaneously on different data (a list of data). Therefore, it is crucial to create a list containing all the elements on which the function will be applied. For example, let's create a list of numbers on which we want to apply the function in parallel.
numbers_list <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

Step 4: Parallelize Processing with mclapply

To parallelize the function on each element of the list, mclapply is employed.

mclapply(X, FUN, ..., mc.cores = detectCores())

X: a vector (atomic or list) or an expressions vector. Other objects (including classed objects) will be coerced by as.list.
FUN: the function to be applied to (mclapply) each element of X.

Define the number of cores to be used for parallelization (in this example, all available cores). FUN should be written in this form:

function(y) function_name(y,arg2,arg3, ...)

y represents each element of X (can be any name).
function_name is the name of the function to apply to each element of X.

Here's an example of using mclapply:
# Parallelize processing with mclapply
processing <- mclapply( numbers_list, function(num) process_number(num), mc.cores = detectCores() )

For more information on mclapply:
https://www.rdocumentation.org/packages/parallel/versions/3.4.0/topics/mclapply
https://nceas.github.io/oss-lessons/parallel-computing-in-r/parallel-computing-in-r.html