Schema for D. virilis Assembled Proteins - Assembled Proteins from modENCODE D. virilis RNA-Seq
  Database: DmojImproved    Primary Table: dvir_asm_proteins    Row Count: 90,279
field        example                         SQL type              info
bin          585                             smallint(5) unsigned  range
chrom        improved_6498                   varchar(255)          values
chromStart   72630                           int(10) unsigned      range
chromEnd     74508                           int(10) unsigned      range
name         Dvir_cfl_males_00000434_84938   varchar(255)          values
score        428                             int(10) unsigned      range
strand       -                               char(1)               values
thickStart   72630                           int(10) unsigned      range
thickEnd     74508                           int(10) unsigned      range
reserved     0                               int(10) unsigned      range
blockCount   4                               int(10) unsigned      range
blockSizes   86,269,154,16,                  longblob
chromStarts  0,301,618,1862                  longblob
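The bin column is an index bin used to speed up range queries. The following is a minimal sketch, assuming that this table uses the standard (non-extended) UCSC binning scheme. It recomputes the bin from chromStart and chromEnd, and the values it returns for the sample rows on this page (585 and 587) match the stored bins. The function and constant names are illustrative, not taken from any particular library.

```python
# Sketch of the standard UCSC binning scheme (five levels, 128 kb smallest bin).
# Assumption: dvir_asm_proteins uses the standard, not the extended, scheme.

BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 = 131,072 bp
BIN_NEXT_SHIFT = 3    # each level up uses bins 8x larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the 0-based half-open [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # Move up one level: bins become 8x larger.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is too large for the standard bin scheme")


if __name__ == "__main__":
    print(bin_from_range(72630, 74508))    # first sample row: 585
    print(bin_from_range(378559, 379615))  # second sample row: 587
```

Because every feature falls into the smallest bin that contains it, a query for a region only needs to scan the handful of bins that overlap that region at each level.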

Sample Rows
 
bin  chrom          chromStart  chromEnd  name                                score  strand  thickStart  thickEnd  reserved  blockCount  blockSizes        chromStarts
585  improved_6498  72630       74508     Dvir_cfl_males_00000434_84938       428    -       72630       74508     0         4           86,269,154,16,    0,301,618,1862
587  improved_6498  378559      379615    Dvir_cfl_females_00021405_84507     440    +       378559      379615    0         2           15,801,           0,255
587  improved_6498  378559      379615    Dvir_cfl_males_00024546_106552      440    +       378559      379615    0         2           15,801,           0,255
587  improved_6498  386506      387698    Dvir_cem_SRR166837_45710_2_2_22251  934    -       386506      387698    0         3           2,554,419,        0,125,773
587  improved_6498  386506      389076    Dvir_cfl_males_00014122_97065       927    -       386506      389076    0         4           679,1434,90,146,  0,773,2275,2424
587  improved_6498  386506      389076    Dvir_cfl_females_00012412_74523     927    -       386506      389076    0         4           679,1434,90,146,  0,773,2275,2424
587  improved_6498  386506      388798    Dvir_cem_SRR166836_35598_0_2_5657   920    -       386506      388798    0         3           679,1434,17,      0,773,2275
587  improved_6498  386506      389076    Dvir_cem_SRR166836_35598_0_3_5659   927    -       386506      389076    0         4           679,1434,90,146,  0,773,2275,2424
587  improved_6498  386701      387698    Dvir_cem_SRR768440_43597_1_1_52382  941    -       386701      387698    0         2           484,419,          0,578
587  improved_6498  386701      387698    Dvir_cem_SRR768440_43597_1_2_52384  941    -       386701      387698    0         2           484,419,          0,578
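Each row stores its aligned blocks as comma-separated lists of sizes and offsets relative to chromStart. The following is a minimal sketch, assuming standard BED12 semantics for blockSizes and chromStarts. It expands one row into absolute genomic intervals and checks that the block count matches and that the last block ends at chromEnd. The helper names are hypothetical.

```python
# Sketch: expand one BED12-style row into per-block genomic intervals.
# Assumes block i covers [chromStart + chromStarts[i], chromStart + chromStarts[i] + blockSizes[i]).

from typing import List, Tuple


def parse_int_list(field: str) -> List[int]:
    """Parse a comma-separated list such as '86,269,154,16,' (trailing comma optional)."""
    return [int(x) for x in field.split(",") if x]


def block_intervals(chrom_start: int, chrom_end: int, block_count: int,
                    block_sizes: str, chrom_starts: str) -> List[Tuple[int, int]]:
    """Return 0-based half-open (start, end) genomic intervals, one per block."""
    sizes = parse_int_list(block_sizes)
    starts = parse_int_list(chrom_starts)
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("blockCount does not match blockSizes/chromStarts")
    intervals = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    # In BED12 the last block is expected to end exactly at chromEnd.
    if intervals[-1][1] != chrom_end:
        raise ValueError("last block does not end at chromEnd")
    return intervals


if __name__ == "__main__":
    # First sample row: improved_6498, chromStart 72630, chromEnd 74508, 4 blocks.
    for start, end in block_intervals(72630, 74508, 4, "86,269,154,16,", "0,301,618,1862"):
        print(start, end)
```

For the first sample row this prints the four blocks 72630-72716, 72931-73200, 73248-73402 and 74492-74508. The gaps between them are the intron-like regions that Spaln2 skipped in the spliced alignment.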

Note: all start coordinates in our database are 0-based, not 1-based.
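Because chromStart is 0-based and chromEnd is exclusive, only the start shifts when a row is converted to the 1-based, fully closed convention. A minimal sketch follows. The function name is illustrative.

```python
# Sketch: convert a 0-based, half-open [chromStart, chromEnd) interval
# into a 1-based, fully closed "chrom:start-end" string.

def to_one_based(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Return 'chrom:start-end' with a 1-based inclusive start and end."""
    # The 0-based exclusive end is numerically equal to the 1-based inclusive end.
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


if __name__ == "__main__":
    chrom, start, end = "improved_6498", 72630, 74508  # first sample row
    print(to_one_based(chrom, start, end))  # improved_6498:72631-74508
    print(end - start)                      # interval length in bp: 1878
```

A useful side effect of the half-open form is that the interval length is simply chromEnd - chromStart, with no off-by-one correction.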

D. virilis Assembled Proteins (dvir_asm_proteins) Track Description
 

Description

RNA-seq reads generated by the modENCODE project for D. virilis were mapped to the D. virilis genome using TopHat2, and transcripts were assembled from the mapped reads using Cufflinks and CEM. Coding regions within the assembled transcripts were identified using TransDecoder. The resulting collection of coding regions was then aligned to the D. mojavensis genome using BLASTX followed by Spaln2.

Methods

Candidate coding regions in the assembled D. virilis transcripts were identified with TransDecoder using the following parameters: -m 50 (minimum protein length of 50 amino acids) and --search_pfam Pfam-A.hmm.
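To illustrate what a minimum-length cutoff such as -m 50 does, here is a toy ORF finder. It is not TransDecoder's algorithm. It is a simplified sketch that scans only the forward strand, requires an ATG start codon, and performs no coding-likelihood scoring or Pfam-based retention. The function name and test sequence are hypothetical.

```python
# Toy illustration of a minimum-ORF-length filter in the spirit of "-m 50".
# NOT TransDecoder: forward strand only, ATG starts only, no scoring or Pfam rescue.

from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 50) -> List[Tuple[int, int]]:
    """Return 0-based half-open (start, end) spans of ATG...stop ORFs on the
    forward strand whose protein length (excluding the stop) is >= min_aa."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop: longest ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_aa:
                    orfs.append((start, i + 3))  # include the stop codon
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    # Hypothetical sequence: ATG + 59 GCT codons + TAA encodes a 60-aa ORF.
    toy = "CC" + "ATG" + "GCT" * 59 + "TAA"
    print(find_orfs(toy, min_aa=50))   # [(2, 185)]
    print(find_orfs(toy, min_aa=100))  # []
```

Lowering the cutoff from TransDecoder's default of 100 to 50 admits shorter ORFs at the cost of more spurious candidates, which is one reason a Pfam domain search is commonly paired with it.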

The predicted D. virilis proteins were first mapped to the D. mojavensis genome using BLASTX to identify regions of similarity. Spaln2 was then used to re-align each protein to its corresponding region, using cross-species parameters optimized for D. melanogaster (-Tdromel -yX -yS).
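The track description does not say how the BLASTX hits were turned into regions for Spaln2. The following is therefore only a hypothetical sketch of one common approach. For each protein, it groups hits by scaffold and strand, keeps the group with the highest total bitscore, and pads the hit span with a fixed flank. The Hit record, the 2,000 bp FLANK, and the scaffold names in the example are all assumptions. Mapping real BLASTX tabular columns onto Hit depends on which side was used as the query.

```python
# Hypothetical "identify a region, then re-align" step. Hit, candidate_regions,
# FLANK and the scaffold names are illustrative only. Genomic coordinates are
# 1-based inclusive BLAST-style, with g_start > g_end marking a minus-strand hit.

from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

FLANK = 2000  # hypothetical padding (bp) added on each side of the hit span


class Hit(NamedTuple):
    protein: str
    scaffold: str
    g_start: int
    g_end: int
    bitscore: float


def candidate_regions(hits: List[Hit]) -> Dict[str, Tuple[str, str, int, int]]:
    """For each protein, pick the scaffold/strand group with the highest total
    bitscore and return (scaffold, strand, start, end) as 0-based half-open
    coordinates spanning its hits plus FLANK (clipped at 0, not at scaffold end)."""
    groups: Dict[Tuple[str, str, str], List[Hit]] = defaultdict(list)
    for h in hits:
        strand = "+" if h.g_start <= h.g_end else "-"
        groups[(h.protein, h.scaffold, strand)].append(h)

    best: Dict[str, Tuple[float, Tuple[str, str, int, int]]] = {}
    for (protein, scaffold, strand), group in groups.items():
        total = sum(h.bitscore for h in group)
        lo = min(min(h.g_start, h.g_end) for h in group)
        hi = max(max(h.g_start, h.g_end) for h in group)
        # 1-based inclusive [lo, hi] becomes 0-based half-open [lo - 1, hi).
        region = (scaffold, strand, max(0, lo - 1 - FLANK), hi + FLANK)
        if protein not in best or total > best[protein][0]:
            best[protein] = (total, region)
    return {p: region for p, (_, region) in best.items()}


if __name__ == "__main__":
    hits = [
        Hit("p1", "scaf1", 10000, 10300, 150.0),
        Hit("p1", "scaf1", 10500, 10800, 120.0),
        Hit("p1", "scaf2", 5000, 4700, 80.0),  # weaker minus-strand hit elsewhere
    ]
    print(candidate_regions(hits))  # {'p1': ('scaf1', '+', 7999, 12800)}
```

Restricting Spaln2 to a padded region in this way keeps the expensive spliced alignment small, while the flank leaves room for terminal exons that BLASTX may have missed.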

References

Haas B. TransDecoder (Finding Coding Regions Within Transcripts).

Iwata H, Gotoh O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Research. 2012, 109

The RNA-Seq data were submitted by the modENCODE project. The original RNA-Seq dataset can be obtained from the NCBI GEO database under the accession number GSE28078.