Schema for D. virilis Unmapped Proteins - Assembled Unmapped Proteins from modENCODE D. virilis RNA-Seq
  Database: DmojImproved    Primary Table: dvir_unmapped_proteins    Row Count: 51,792
field        example                    SQL type              info
bin          587                        smallint(5) unsigned  range
chrom        improved_6498              varchar(255)          values
chromStart   384877                     int(10) unsigned      range
chromEnd     388933                     int(10) unsigned      range
name         Dvir_unmapped_89183_95399  varchar(255)          values
score        938                        int(10) unsigned      range
strand       -                          char(1)               values
thickStart   384877                     int(10) unsigned      range
thickEnd     388933                     int(10) unsigned      range
reserved     0                          int(10) unsigned      range
blockCount   3                          int(10) unsigned      range
blockSizes   11,280,3,                  longblob
chromStarts  0,3384,4053                longblob
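
The bin field is an index column used to speed up range queries. The sketch below assumes this table uses the standard UCSC binning scheme (128 kb smallest bins, five levels); the function name ucsc_bin is illustrative and not part of the database. Under that assumption, the example coordinates in the table reproduce the bin value shown.

```python
# Standard UCSC binning scheme (assumed to apply to this table's `bin` column).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each level up is 8 times larger


def ucsc_bin(start: int, end: int) -> int:
    """Return the smallest bin that fully contains [start, end) (0-based, half-open)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"interval {start}-{end} is outside the standard scheme (512 Mb)")


print(ucsc_bin(384877, 388933))  # 587, matching the example row
```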

Sample Rows
 
bin  chrom          chromStart  chromEnd  name                       score  strand  thickStart  thickEnd  reserved  blockCount  blockSizes     chromStarts
587  improved_6498  384877      388933    Dvir_unmapped_89183_95399  938    -       384877      388933    0         3           11,280,3,      0,3384,4053
587  improved_6498  386950      387136    Dvir_unmapped_23286_27092  924    -       386950      387136    0         1           186,           0
587  improved_6498  387121      387509    Dvir_unmapped_22446_69476  845    -       387121      387509    0         2           64,230,        0,158
587  improved_6498  387518      388086    Dvir_unmapped_3584_75022   919    -       387518      388086    0         2           326,13,        0,555
587  improved_6498  387623      387836    Dvir_unmapped_59031_45599  885    -       387623      387836    0         1           213,           0
587  improved_6498  387839      388933    Dvir_unmapped_5593_83119   896    -       387839      388933    0         2           410,7,         0,1087
587  improved_6498  387956      388211    Dvir_unmapped_18302_24464  902    -       387956      388211    0         1           255,           0
587  improved_6498  388196      388397    Dvir_unmapped_53075_42452  980    -       388196      388397    0         1           201,           0
587  improved_6498  388397      388679    Dvir_unmapped_7666_49711   984    -       388397      388679    0         1           282,           0
587  improved_6498  388550      389623    Dvir_unmapped_39284_76428  919    -       388550      389623    0         4           163,90,173,3,  0,231,380,1070

Note: all start coordinates in our database are 0-based, not 1-based.
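
As an illustration of the coordinate convention, the following sketch expands the blockSizes and chromStarts fields of a row into absolute block coordinates and converts them to 1-based positions. It assumes the usual BED convention that end coordinates are exclusive, which is consistent with the example row (its last block ends exactly at chromEnd). The function names bed_blocks and to_one_based are hypothetical helpers, not part of the database.

```python
from typing import List, Tuple


def bed_blocks(chrom_start: int, block_sizes: str, block_starts: str) -> List[Tuple[int, int]]:
    """Expand block fields into absolute (start, end) pairs, 0-based half-open."""
    # Trailing commas in the stored lists produce empty strings, which are skipped.
    sizes = [int(x) for x in block_sizes.split(",") if x]
    starts = [int(x) for x in block_starts.split(",") if x]
    if len(sizes) != len(starts):
        raise ValueError("blockSizes and chromStarts must have the same number of entries")
    return [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]


def to_one_based(blocks: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Convert 0-based half-open intervals to 1-based closed intervals."""
    return [(start + 1, end) for start, end in blocks]


blocks = bed_blocks(384877, "11,280,3,", "0,3384,4053")
print(blocks)                # [(384877, 384888), (388261, 388541), (388930, 388933)]
print(to_one_based(blocks))  # [(384878, 384888), (388262, 388541), (388931, 388933)]
```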

D. virilis Unmapped Proteins (dvir_unmapped_proteins) Track Description
 

Description

RNA-seq reads generated by the modENCODE project for D. virilis were mapped against the D. virilis genome using TopHat2. The unmapped reads were collected and assembled using ABySS and CAP3. Coding regions within the assembled contigs were identified using TransDecoder. The resulting collection of coding regions was aligned against the D. mojavensis genome using tblastn followed by Spaln2.

Methods

Unmapped RNA-seq reads were partitioned into 1 GB chunks, and each chunk was assembled separately using ABySS. The assembled contigs were then merged using CAP3.
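
The partitioning step can be sketched as follows. The track description does not state the read file format or the tool used for partitioning, so this example assumes uncompressed 4-line FASTQ records and a byte budget per chunk; chunk_fastq and max_bytes are hypothetical names.

```python
from typing import Iterable, Iterator, List


def chunk_fastq(lines: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Group 4-line FASTQ records into chunks that stay within max_bytes where possible."""
    chunk: List[str] = []
    size = 0
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) < 4:
            continue
        rec_size = sum(len(ln.encode()) for ln in record)
        # Start a new chunk when adding this record would exceed the budget.
        if chunk and size + rec_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.extend(record)
        size += rec_size
        record = []
    if record:
        raise ValueError("input ended in the middle of a FASTQ record")
    if chunk:
        yield chunk


# Three made-up 16-byte records with a 40-byte budget give chunks of 2 and 1 records.
reads = ["@r1\n", "ACGT\n", "+\n", "IIII\n"] * 3
print([len(c) // 4 for c in chunk_fastq(reads, max_bytes=40)])  # [2, 1]
```

For chunks of roughly 1 GB, max_bytes would be set to about 10**9 and each yielded chunk written to its own file before assembly.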

Candidate coding regions in the assembled D. virilis contigs were identified with TransDecoder using the following parameters: -m 50 and --search_pfam Pfam-A.hmm.

The collection of predicted D. virilis proteins was initially mapped against the D. mojavensis genome using tblastn to identify regions of similarity. Spaln2 was then used to re-align each protein against its corresponding region with cross-species parameters optimized for D. melanogaster: (-Tdromel -yX -yS).
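
One way to derive a target region for each protein from the tblastn results is sketched below. The track description does not say how regions were selected, so this is only an assumption: it reads tblastn tabular output (-outfmt 6), keeps the highest-bitscore hit per protein, and pads it by a flanking distance. The function name best_hit_regions, the flank parameter, and the sample hits are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (subject sequence, start, end, strand), 1-based inclusive as in BLAST output
Region = Tuple[str, int, int, str]


def best_hit_regions(rows: Iterable[str], flank: int = 1000) -> Dict[str, Region]:
    """Pick the highest-bitscore hit per protein and pad it by `flank` bases."""
    best: Dict[str, Tuple[float, Region]] = {}
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue
        f = row.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        sstart, send, bitscore = int(f[8]), int(f[9]), float(f[11])
        # tblastn reports sstart > send for hits on the minus strand.
        strand = "+" if sstart <= send else "-"
        lo, hi = min(sstart, send), max(sstart, send)
        region = (subject, max(1, lo - flank), hi + flank, strand)
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, region)
    return {q: region for q, (_, region) in best.items()}


# Made-up hits for a hypothetical protein "protA" on a hypothetical scaffold.
rows = [
    "protA\tscaffold_1\t80.0\t90\t18\t0\t1\t90\t5270\t5001\t1e-40\t150.0",
    "protA\tscaffold_1\t60.0\t30\t12\t0\t1\t30\t1200\t1289\t1e-5\t40.0",
]
print(best_hit_regions(rows, flank=500))  # {'protA': ('scaffold_1', 4501, 5770, '-')}
```

The padded region, rather than the whole genome, would then be passed to the spliced aligner together with the protein sequence.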

References

Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009 Jun;19(6):1117-23.

Huang X, Madan A. CAP3: A DNA sequence assembly program. Genome Res. 1999 Sep;9(9):868-77.

Haas B. TransDecoder (Finding Coding Regions Within Transcripts).

Iwata H, Gotoh O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Res. 2012;40(20):e161.

The RNA-Seq data were submitted by the modENCODE project. The original RNA-Seq dataset can be obtained from the NCBI GEO database under the accession number GSE28078.