Schema for D. mojavensis Assembled Proteins - Assembled Proteins from modENCODE D. mojavensis RNA-Seq
  Database: DmojImproved    Primary Table: asm_proteins    Row Count: 89,363
field        example                          SQL type               info
bin          585                              smallint(5) unsigned   range
chrom        improved_6498                    varchar(255)           values
chromStart   108969                           int(10) unsigned       range
chromEnd     109302                           int(10) unsigned       range
name         Dmoj_cfl_females_00000234_5...   varchar(255)           values
score        1000                             int(10) unsigned       range
strand       -                                char(1)                values
thickStart   108969                           int(10) unsigned       range
thickEnd     109302                           int(10) unsigned       range
reserved     0                                int(10) unsigned       range
blockCount   1                                int(10) unsigned       range
blockSizes   333,                             longblob
chromStarts  0                                longblob
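
The bin field is not described on this page. Assuming it follows the standard UCSC binning scheme (finest bins of 128 kb, each coarser level 8 times wider), it can be recomputed from chromStart and chromEnd. The sketch below is an illustration under that assumption, not the code used to build this table; the function name ucsc_bin is hypothetical.

```python
def ucsc_bin(start: int, end: int) -> int:
    """Return the standard UCSC bin for a 0-based, half-open interval [start, end).

    Assumption: the table uses the standard (non-extended) binning scheme.
    """
    # Offset of the first bin at each level, from finest (128 kb) to coarsest (512 Mb).
    bin_offsets = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
    first_shift = 17  # finest bins span 2**17 = 131,072 bases
    next_shift = 3    # each coarser level is 8 times wider

    start_bin = start >> first_shift
    end_bin = (end - 1) >> first_shift
    for offset in bin_offsets:
        # The smallest bin that contains the whole interval is the answer.
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= next_shift
        end_bin >>= next_shift
    raise ValueError("interval is outside the range of the standard binning scheme")


print(ucsc_bin(108969, 109302))  # 585, matching the example row in the schema
```

Under this assumption, restricting a range query to the matching bin values lets the database skip most rows before comparing chromStart and chromEnd.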

Sample Rows
 
bin  chrom          chromStart  chromEnd  name                             score  strand  thickStart  thickEnd  reserved  blockCount  blockSizes   chromStarts
585  improved_6498  108969      109302    Dmoj_cfl_females_00000234_55191  1000   -       108969      109302    0         1           333,         0
586  improved_6498  142100      142700    Dmoj_cfl_males_00000258_72262    1000   -       142100      142700    0         1           600,         0
586  improved_6498  142760      143096    Dmoj_cfl_males_00000258_72261    1000   -       142760      143096    0         1           336,         0
586  improved_6498  215866      216808    Dmoj_cfl_females_00000240_55192  1000   -       215866      216808    0         1           942,         0
586  improved_6498  217807      218170    Dmoj_cfl_females_00000241_55193  1000   -       217807      218170    0         1           363,         0
586  improved_6498  218393      218606    Dmoj_cfl_females_00000241_55194  1000   -       218393      218606    0         1           213,         0
586  improved_6498  255953      257055    Dmoj_cfl_males_00022483_94355    878    -       255953      257055    0         2           38,1064,     0,38
587  improved_6498  309820      311062    Dmoj_cfl_males_00000262_72263    992    +       309820      311062    0         1           1242,        0
587  improved_6498  316641      319733    Dmoj_cfl_males_00000003_71972    410    +       316641      319733    0         3           333,31,266,  0,851,2826
587  improved_6498  373749      374889    Dmoj_cfl_males_00000265_72265    1000   +       373749      374889    0         1           1140,        0
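
The blockSizes and chromStarts fields hold comma-separated lists, with chromStarts giving each block's offset relative to chromStart. A minimal sketch of turning them into absolute block intervals is shown here; the helper name block_intervals is hypothetical, and the values come from the three-block sample row above.

```python
from typing import List, Tuple


def block_intervals(chrom_start: int, block_sizes: str, chrom_starts: str) -> List[Tuple[int, int]]:
    """Convert BED-style block lists into absolute, 0-based half-open intervals."""
    # Both fields may carry a trailing comma, so empty items are skipped.
    sizes = [int(x) for x in block_sizes.split(",") if x]
    offsets = [int(x) for x in chrom_starts.split(",") if x]
    if len(sizes) != len(offsets):
        raise ValueError("blockSizes and chromStarts have different lengths")
    return [(chrom_start + off, chrom_start + off + size)
            for off, size in zip(offsets, sizes)]


# Sample row Dmoj_cfl_males_00000003_71972 (chromStart 316641, chromEnd 319733)
blocks = block_intervals(316641, "333,31,266,", "0,851,2826")
print(blocks)  # [(316641, 316974), (317492, 317523), (319467, 319733)]
```

The end of the last block equals chromEnd (319733), which is a quick consistency check for any row.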

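Note: all start coordinates in our database are 0-based, not 1-based. In the sample rows above, the length of a feature is therefore chromEnd - chromStart (for example, 109302 - 108969 = 333).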
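
As an illustration of the note above, the sketch below converts a row's coordinates to a 1-based, fully closed position string of the kind typically typed into a genome browser. It assumes the end coordinate is exclusive (half-open intervals), which is consistent with the block sizes in the sample rows; the function name to_one_based is hypothetical.

```python
def to_one_based(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Format a 0-based, half-open interval as a 1-based, closed position string."""
    # Only the start shifts by one; the exclusive 0-based end equals the inclusive 1-based end.
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


print(to_one_based("improved_6498", 108969, 109302))  # improved_6498:108970-109302
```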

D. mojavensis Assembled Proteins (asm_proteins) Track Description
 

Description

RNA-Seq reads generated by the modENCODE project for D. mojavensis were mapped against the D. mojavensis genome using TopHat2, and predicted transcripts were assembled using Cufflinks and CEM. Coding regions within the predicted transcripts were identified using TransDecoder. This collection of coding regions was then aligned against the D. mojavensis genome using BLASTX followed by Spaln2.

Methods

Candidate coding regions in the collection of assembled D. mojavensis transcripts were identified using TransDecoder with the following parameters: -m 50, --search_pfam Pfam-A.hmm.

The collection of predicted D. mojavensis proteins was initially mapped against the D. mojavensis genome using BLASTX to identify regions of similarity. Spaln2 was then used to re-align each protein against its corresponding region with same-species parameters optimized for D. melanogaster (-Tdromel -yS).

References

Haas B. TransDecoder (Finding Coding Regions Within Transcripts).

Iwata H. and Gotoh O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Research. 2012, 109

The RNA-Seq data were submitted by the modENCODE project. The original RNA-Seq dataset can be obtained from the NCBI GEO database under the accession number GSE28078.