Schema for N-SCAN PASA-EST - N-SCAN PASA-EST Gene Predictions
  Database: DkikGB2    Primary Table: nscan_pasa    Row Count: 16,089
field         example                     SQL type                               info
bin           585                         smallint(5) unsigned                   range
name          AFFH02000026.001.1          varchar(255)                           values
chrom         AFFH02000026                varchar(255)                           values
strand        -                           char(1)                                values
txStart       669                         int(10) unsigned                       range
txEnd         1644                        int(10) unsigned                       range
cdsStart      672                         int(10) unsigned                       range
cdsEnd        1644                        int(10) unsigned                       range
exonCount     2                           int(10) unsigned                       range
exonStarts    669,1030,                   longblob
exonEnds      971,1644,                   longblob
score         0                           int(11)                                range
name2         gene.AFFH02000026.001       varchar(255)                           values
cdsStartStat  cmpl                        enum('none', 'unk', 'incmpl', 'cmpl')  values
cdsEndStat    cmpl                        enum('none', 'unk', 'incmpl', 'cmpl')  values
exonFrames    1,2,                        longblob
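
The exonStarts, exonEnds and exonFrames columns store comma-terminated lists rather than scalar values, so they usually need to be split before use. The following is a minimal sketch of that step, assuming the longblob values have already been decoded to Python strings; the function names parse_int_list and exon_intervals are illustrative and not part of any UCSC tool.

```python
from typing import List, Tuple


def parse_int_list(field: str) -> List[int]:
    """Parse a comma-terminated list such as '669,1030,' into [669, 1030]."""
    # The trailing comma yields an empty last item, which is skipped.
    return [int(item) for item in field.split(",") if item]


def exon_intervals(exon_starts: str, exon_ends: str) -> List[Tuple[int, int]]:
    """Pair each exon start with its end, in the order stored in the table."""
    starts = parse_int_list(exon_starts)
    ends = parse_int_list(exon_ends)
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds differ in length")
    return list(zip(starts, ends))


# Values from the example column of the schema above.
exons = exon_intervals("669,1030,", "971,1644,")
print(exons)  # [(669, 971), (1030, 1644)]
```

The number of pairs returned should equal exonCount (2 for this example).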

Sample Rows
 
bin  name  chrom  strand  txStart  txEnd  cdsStart  cdsEnd  exonCount  exonStarts  exonEnds  score  name2  cdsStartStat  cdsEndStat  exonFrames
585  AFFH02000026.001.1  AFFH02000026  -  669  1644  672  1644  2  669,1030,  971,1644,  0  gene.AFFH02000026.001  cmpl  cmpl  1,2,
585  AFFH02000049.001.1  AFFH02000049  +  322  2923  2019  2712  1  322,  2923,  0  gene.AFFH02000049.001  cmpl  cmpl  0,
585  AFFH02000236.001.1  AFFH02000236  -  51  405  54  405  1  51,  405,  0  gene.AFFH02000236.001  cmpl  cmpl  0,
585  AFFH02000335.001.1  AFFH02000335  +  48277  58251  48371  58248  7  48277,48679,49060,49668,53028,56634,58115,  48386,48997,49609,49853,55630,56729,58251,  0  gene.AFFH02000335.001  cmpl  cmpl  0,0,0,0,2,0,2,
585  AFFH02000335.002.1  AFFH02000335  +  59834  62179  60056  62176  4  59834,60700,60943,61628,  60640,60855,61570,62179,  0  gene.AFFH02000335.002  cmpl  cmpl  0,2,1,1,
585  AFFH02000335.003.1  AFFH02000335  +  63167  68509  64891  67887  5  63167,64860,65637,67284,67489,  63481,65570,66144,67428,68509,  0  gene.AFFH02000335.003  cmpl  cmpl  -1,0,1,1,1,
585  AFFH02000335.003.1.1.55030110  AFFH02000335  +  64100  68509  64891  67887  5  64100,64860,65637,67284,67489,  64396,65570,66144,67428,68509,  0  gene.AFFH02000335.003  cmpl  cmpl  -1,0,1,1,1,
585  AFFH02000335.004.1  AFFH02000335  -  67553  70069  68143  70056  8  67553,68306,68573,68906,69421,69575,69703,69983,  68210,68517,68851,69361,69517,69643,69928,70069,  0  gene.AFFH02000335.004  cmpl  cmpl  2,1,2,0,0,1,1,0,
585  AFFH02000335.005.1  AFFH02000335  +  70153  72877  70931  72729  2  70153,71728,  71670,72877,  0  gene.AFFH02000335.005  cmpl  cmpl  0,1,
585  AFFH02000335.006.1  AFFH02000335  +  73747  74684  73844  74681  3  73747,74134,74354,  74070,74226,74684,  0  gene.AFFH02000335.006  cmpl  cmpl  0,1,0,
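
To relate cdsStart and cdsEnd to the exon lists, each exon can be clipped to the coding region. The sketch below does this for the AFFH02000335.003.1 sample row; coding_segments is a hypothetical helper name, and the exon coordinates are typed in by hand from that row instead of being read from the database.

```python
from typing import List, Tuple


def coding_segments(
    exons: List[Tuple[int, int]], cds_start: int, cds_end: int
) -> List[Tuple[int, int]]:
    """Return the part of each exon that lies inside [cds_start, cds_end)."""
    segments = []
    for start, end in exons:
        seg_start = max(start, cds_start)
        seg_end = min(end, cds_end)
        if seg_start < seg_end:  # skip exons with no coding bases
            segments.append((seg_start, seg_end))
    return segments


# Exons and CDS bounds of sample row AFFH02000335.003.1 (plus strand).
exons = [(63167, 63481), (64860, 65570), (65637, 66144),
         (67284, 67428), (67489, 68509)]
segments = coding_segments(exons, cds_start=64891, cds_end=67887)
cds_length = sum(end - start for start, end in segments)

print(segments)
# [(64891, 65570), (65637, 66144), (67284, 67428), (67489, 67887)]
print(cds_length)  # 1728
```

The first exon contributes no coding bases, which matches the -1 in the first position of that row's exonFrames value.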

Note: all start coordinates in our database are 0-based, not 1-based. End coordinates are exclusive, so each interval is half-open.
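
As an illustration of the coordinate convention, the following sketch converts a stored interval to the 1-based, inclusive form commonly used when displaying positions. It assumes the usual UCSC convention that end coordinates are exclusive; the function names are made up for this example.

```python
def to_one_based(chrom: str, start: int, end: int) -> str:
    """Format a 0-based, half-open interval as a 1-based, inclusive position."""
    # Only the start shifts by one; the exclusive 0-based end equals
    # the inclusive 1-based end.
    return f"{chrom}:{start + 1}-{end}"


def interval_length(start: int, end: int) -> int:
    """Length in bases of a 0-based, half-open interval."""
    return end - start


# txStart and txEnd from the first sample row.
print(to_one_based("AFFH02000026", 669, 1644))  # AFFH02000026:670-1644
print(interval_length(669, 1644))               # 975
```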

N-SCAN PASA-EST (nscan_pasa) Track Description
 

Description

This track shows gene predictions from the N-SCAN gene structure prediction program, using multiple Drosophila species as informants.

Methods

N-SCAN

N-SCAN combines biological-signal modeling in the target genome sequence along with information from a multiple-genome alignment to generate de novo gene predictions. It extends the TWINSCAN target-informant genome pair to allow for an arbitrary number of informant sequences as well as richer models of sequence evolution. N-SCAN models the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, insertions, and deletions.

N-SCAN PASA-EST

N-SCAN PASA-EST incorporates EST alignments into N-SCAN. Similar to the conservation sequence models in TWINSCAN, separate probability models are developed for EST alignments to genomic sequence in exons, introns, splice sites and UTRs, reflecting the EST alignment patterns in these regions. N-SCAN PASA-EST is more accurate than N-SCAN while retaining the ability to discover novel genes to which no ESTs align.

Here, TransDecoder gene predictions serve as the 'EST' sequences for N-SCAN PASA-EST. The resulting gene models were updated with the input PASA clusters using the assembly tool of the PASA pipeline. These updates consist of automatically generated alternative splice forms, UTR features and, in some cases, the merging of two gene models. In addition, PASA assigned open reading frames to clusters that did not overlap a gene prediction but did contain a full-length cDNA, and output them as 'novel genes'. Note that PASA does not use any cDNA annotation from the input; it assigns the ORF itself.

References