On the optimal trimming of high-throughput mRNA sequence data.

Abstract

The widespread and rapid adoption of high-throughput sequencing technologies has afforded researchers the opportunity to gain a deep understanding of genome-level processes that underlie evolutionary change, and perhaps more importantly, the links between genotype and phenotype. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence mRNA transcriptomes of specific tissues, which in turn are often compared to other tissues, or to other individuals with different phenotypes. While these techniques are extremely powerful, careful attention to data quality is required. Because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in all data processing pipelines. While several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that gentler trimming, specifically of those nucleotides whose Phred score is <2 or <5, is optimal for most studies across a wide variety of metrics.
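To make the Phred thresholds in the abstract concrete, the following minimal Python sketch converts a Phred score to its implied base-call error probability (Q = -10 log10 P) and trims low-quality bases from the 3' end of a read. It is an illustration only, not the method or software used in the study: the function names, the Phred+33 quality encoding, the example read, and the simple "trim trailing bases below the threshold" rule are all assumptions made for this example.

```python
def phred_to_error_prob(q: int) -> float:
    """Error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 ASCII encoding."""
    return [ord(c) - 33 for c in quality]


def trim_3prime(seq: str, quality: str, threshold: int) -> tuple[str, str]:
    """Remove trailing bases whose Phred score is below `threshold`.

    Hypothetical, simplified rule: real trimming tools typically use
    sliding windows or other heuristics.
    """
    scores = decode_phred33(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], quality[:end]


# Made-up read: six Q40 bases followed by Q15, Q4, Q2 and Q1.
read_seq = "ACGTACGTAC"
read_qual = 'IIIIII0%#"'

for threshold in (2, 5, 20):
    trimmed_seq, _ = trim_3prime(read_seq, read_qual, threshold)
    print(threshold, round(phred_to_error_prob(threshold), 3), trimmed_seq)
```

On this made-up read, a threshold of 2 or 5 removes only the last one to three bases, while a threshold of 20 also discards the Q15 base. A Phred score of 5 corresponds to an error probability of roughly 0.32, and a score of 20 to 0.01, which shows how much more lenient the recommended thresholds are than an aggressive one.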

Authors

MacManes, Matthew

Publication Date

2014

Published In

Frontiers in Genetics

Keywords

RNAseq

assembly error

Illumina

quality control

quality trimming

Digital Object Identifier (doi)

https://doi.org/10.3389/fgene.2014.00013
