Description
RNA-seq reads generated by the modENCODE project for D. virilis were
mapped against the D. virilis genome using TopHat2. Unmapped reads were
collected and assembled using ABySS and CAP3. Coding regions within the
assembled contigs were identified using TransDecoder. This collection of
coding regions was then aligned against the D. mojavensis genome using tblastn
followed by Spaln2.
Methods
Unmapped RNA-seq reads were partitioned into 1 GB chunks, and each chunk was
assembled separately using ABySS. The assembled contigs
were then merged using CAP3.
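
The partitioning step can be illustrated with a minimal Python sketch. It assumes the unmapped reads are stored as standard four-line FASTQ records and that a chunk is closed as soon as adding the next record would exceed a byte limit. The function names (fastq_records, chunk_records) and the size rule are illustrative assumptions, not the scripts used to build this track.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield each four-line FASTQ record as a single string."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield "".join(record)


def chunk_records(records: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Group records into chunks of at most max_bytes each.

    Records are never split across chunks. A single record larger than
    max_bytes is placed in a chunk of its own.
    """
    chunk: List[str] = []
    size = 0
    for rec in records:
        rec_size = len(rec.encode())
        if chunk and size + rec_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(rec)
        size += rec_size
    if chunk:
        yield chunk
```

For chunks of roughly 1 GB, max_bytes would be set to 10**9 (or 2**30), and each yielded chunk would be written to its own file before being passed to the assembler.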
Candidate coding regions in the collection of assembled D. virilis
contigs were identified using TransDecoder
with the following parameters: -m 50 --search_pfam Pfam-A.hmm.
The predicted D. virilis proteins were first mapped
against the D. mojavensis genome using tblastn to identify regions of similarity.
Spaln2 was then used to re-align each protein against its corresponding region
with cross-species parameters optimized for D. melanogaster (-Tdromel -yX -yS).
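
The hand-off from tblastn to Spaln2 can be sketched as follows. The description above does not state how the "corresponding region" for each protein was chosen, so this Python sketch makes several assumptions: tblastn output is in the default 12-column tabular format (-outfmt 6), the highest bit-score hit selects the scaffold and strand, nearby hits on the same scaffold and strand are merged, and the result is padded by a fixed flank. The function name candidate_regions and the flank value are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int, str]  # (subject, start, end, strand), 1-based


def candidate_regions(rows: Iterable[str], flank: int = 10000) -> Dict[str, Region]:
    """Pick one genomic region per query protein from tblastn tabular rows.

    Assumed columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits: Dict[str, List[Tuple[str, str, int, int, float]]] = {}
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue
        f = row.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        sstart, send, bits = int(f[8]), int(f[9]), float(f[11])
        # For tblastn, sstart > send marks a hit on the minus strand.
        strand = "+" if sstart <= send else "-"
        lo, hi = min(sstart, send), max(sstart, send)
        hits.setdefault(query, []).append((subject, strand, lo, hi, bits))

    regions: Dict[str, Region] = {}
    for query, qhits in hits.items():
        top = max(qhits, key=lambda h: h[4])
        # Keep hits on the same scaffold and strand that fall within
        # `flank` bases of the top hit (an assumed clustering rule).
        near = [
            h for h in qhits
            if h[0] == top[0] and h[1] == top[1]
            and h[3] >= top[2] - flank and h[2] <= top[3] + flank
        ]
        start = max(1, min(h[2] for h in near) - flank)
        # The end is not clipped to the scaffold length in this sketch.
        end = max(h[3] for h in near) + flank
        regions[query] = (top[0], start, end, top[1])
    return regions
```

Each resulting region would then be extracted from the D. mojavensis assembly and supplied to Spaln2 together with the matching protein.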
References
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I.
ABySS: a parallel assembler for short read sequence data.
Genome Res. 2009 Jun;19(6):1117-23.
Huang X, Madan A.
CAP3: A DNA sequence assembly program.
Genome Res. 1999 Sep;9(9):868-77.
Haas B. TransDecoder (Finding Coding Regions Within Transcripts).
Iwata H, Gotoh O.
Benchmarking spliced alignment programs including Spaln2,
an extended version of Spaln that incorporates additional species-specific features.
Nucleic Acids Res. 2012 Nov;40(20):e161.
The RNA-Seq data were submitted by the modENCODE project.
The original RNA-Seq dataset can be obtained from the NCBI GEO database under the accession number
GSE28078.