A recent preprint by Carrier et al. describes the application of High-Performance Computing (HPC) methodologies to Next Generation Sequencing (NGS) pipelines and workflows. Specifically, the authors apply classic MPI parallelization techniques to Inchworm, the de novo assembly step in the Trinity RNA-Seq pipeline. The MPI-enabled code is called MPI-Inchworm, and its source code is available in current Trinity repositories.
Test runs on Cray XC40 systems with mouse (Mus musculus) and axolotl salamander (Ambystoma mexicanum) transcriptome datasets showed scaling in both run time and memory use. Having both scale is very important for large-scale transcriptome assembly. The authors also discuss how MPI distributed-memory parallelism enables Inchworm assembly of large eukaryotic transcriptomes.
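To give a feel for why distributed memory helps with memory scaling, here is a minimal Python sketch. It is an illustration only and not MPI-Inchworm's actual scheme: it simulates ranks inside a single process instead of calling MPI, and the hash-based `owner_rank` function and the toy reads are assumptions made up for this example. The idea is that if each rank stores only the k-mers it "owns", the largest per-rank table tends to shrink as ranks are added.

```python
from collections import Counter
import zlib


def kmers(read, k):
    """Yield every k-length substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def owner_rank(kmer, n_ranks):
    """Assign a k-mer to a rank with a deterministic hash.

    Hypothetical partitioning rule chosen for illustration only.
    """
    return zlib.crc32(kmer.encode("ascii")) % n_ranks


def distributed_kmer_counts(reads, k, n_ranks):
    """Simulate n_ranks processes, each holding only the k-mers it owns."""
    tables = [Counter() for _ in range(n_ranks)]
    for read in reads:
        for kmer in kmers(read, k):
            tables[owner_rank(kmer, n_ranks)][kmer] += 1
    return tables


if __name__ == "__main__":
    # Toy reads, not real transcriptome data.
    reads = ["ACGTACGTGA", "CGTACGTGAC", "TTGACGTACG"]
    for n_ranks in (1, 2, 4):
        tables = distributed_kmer_counts(reads, k=5, n_ranks=n_ranks)
        largest = max(len(t) for t in tables)
        print(f"{n_ranks} rank(s): largest per-rank table has {largest} distinct k-mers")
```

In a real MPI program each table would live in a separate process, possibly on a separate node, and k-mers would be exchanged between ranks through message passing rather than written into a shared Python list.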
An important contribution of the paper is how it bridges the gap between HPC software development methods and NGS workflows.