Authors: Toni Monleon-Getino, Jorge Frias-Lopez

Institutions:

  • Section of Statistics (Department of Genetics, Microbiology, and Statistics), University of Barcelona, Barcelona, Spain
  • College of Dentistry, University of Florida, Gainesville, FL, USA

Publication: Ecology and Evolution

Acknowledgements: Walter Sanseverino and Andreu Paytuvi Gallart

Date: December 2020

Full paper: https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.6941

Abstract:

Metatranscriptome analysis, or the analysis of the expression profiles of whole microbial communities, has the additional challenge of dealing with a complex system in which dozens of different organisms express genes simultaneously. An underlying issue for virtually all metatranscriptomic sequencing experiments is how to allocate the limited sequencing budget while guaranteeing that the libraries have sufficient depth to cover the breadth of expression of the community. Estimating the sequencing depth required to effectively sample the target metatranscriptome using RNA-seq is an essential first step to obtain robust results in subsequent analyses and to avoid overexpansion once the information contained in the library reaches saturation. Here, we present a method to calculate the sequencing effort using a simulated series of metatranscriptomic/metagenomic matrices. This method extrapolates a rarefaction curve with a Weibull growth model to estimate the maximum number of observed genes as a function of sequencing depth. This approach allowed us to compute the effort at different confidence intervals and to obtain an approximate a priori effort based on an initial fraction of sequences. The analytical pipeline presented here may be used for the in-depth and time-effective characterization of complex microbial communities, and it represents a useful tool for the microbiome research community.
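
The idea of extrapolating a rarefaction curve with a Weibull growth model can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the parameterization G(x) = G_max * (1 - exp(-(x / scale)^shape)), the simple grid-search fitting procedure, the function names, and the synthetic data points. The authors' own pipeline may use a different form of the model and a different fitting method.

```python
import math


def weibull_growth(depth, g_max, scale, shape):
    """Genes expected at a given depth under the assumed Weibull growth form."""
    return g_max * (1.0 - math.exp(-((depth / scale) ** shape)))


def fit_given_gmax(depths, genes, g_max):
    """For a fixed plateau g_max, fit scale and shape by linear regression.

    Uses the linearization ln(-ln(1 - G/g_max)) = shape*ln(x) - shape*ln(scale).
    Requires 0 < G < g_max and depth > 0 for every point.
    """
    xs = [math.log(d) for d in depths]
    ys = [math.log(-math.log(1.0 - g / g_max)) for g in genes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    shape = sxy / sxx
    intercept = mean_y - shape * mean_x
    scale = math.exp(-intercept / shape)
    # Score the candidate on the original (untransformed) scale.
    sse = sum(
        (weibull_growth(d, g_max, scale, shape) - g) ** 2
        for d, g in zip(depths, genes)
    )
    return scale, shape, sse


def fit_weibull_growth(depths, genes, steps=2000, max_factor=5.0):
    """Grid-search the plateau g_max just above the largest observed count."""
    top = max(genes)
    best = None
    for i in range(1, steps + 1):
        g_max = top * (1.0 + (max_factor - 1.0) * i / steps)
        scale, shape, sse = fit_given_gmax(depths, genes, g_max)
        if best is None or sse < best[3]:
            best = (g_max, scale, shape, sse)
    return best[0], best[1], best[2]


def depth_for_fraction(fraction, scale, shape):
    """Depth at which the curve reaches a given fraction of its plateau."""
    return scale * (-math.log(1.0 - fraction)) ** (1.0 / shape)


# Synthetic rarefaction points (hypothetical, generated from known parameters).
TRUE_G_MAX, TRUE_SCALE, TRUE_SHAPE = 5000.0, 2.0e6, 0.8
depths = [1e5, 2.5e5, 5e5, 1e6, 2e6, 4e6]
genes = [weibull_growth(d, TRUE_G_MAX, TRUE_SCALE, TRUE_SHAPE) for d in depths]

g_max_hat, scale_hat, shape_hat = fit_weibull_growth(depths, genes)
effort_95 = depth_for_fraction(0.95, scale_hat, shape_hat)
print(f"estimated plateau: {g_max_hat:.0f} genes")
print(f"depth for 95% of plateau: {effort_95:.3g} reads")
```

In this sketch the fitted plateau plays the role of the maximum number of observable genes, and `depth_for_fraction` inverts the curve to give the sequencing effort needed to reach a chosen share of that plateau. A real analysis would fit noisy rarefaction data and would also need uncertainty estimates, which this sketch does not attempt.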