Schema for N-SCAN PASA-EST - N-SCAN PASA-EST Gene Predictions

Database: DkikGB2 Primary Table: nscan_pasa Row Count: 16,089

field	example	SQL type	info
`bin`	585	`smallint(5) unsigned`	range
`name`	AFFH02000026.001.1	`varchar(255)`	values
`chrom`	AFFH02000026	`varchar(255)`	values
`strand`	-	`char(1)`	values
`txStart`	669	`int(10) unsigned`	range
`txEnd`	1644	`int(10) unsigned`	range
`cdsStart`	672	`int(10) unsigned`	range
`cdsEnd`	1644	`int(10) unsigned`	range
`exonCount`	2	`int(10) unsigned`	range
`exonStarts`	669,1030,	`longblob`
`exonEnds`	971,1644,	`longblob`
`score`	0	`int(11)`	range
`name2`	gene.AFFH02000026.001	`varchar(255)`	values
`cdsStartStat`	cmpl	`enum('none', 'unk', 'incmpl', 'cmpl')`	values
`cdsEndStat`	cmpl	`enum('none', 'unk', 'incmpl', 'cmpl')`	values
`exonFrames`	1,2,	`longblob`
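
The `bin` field is an indexing aid derived from the feature's coordinates. The sketch below assumes this table uses the standard UCSC binning scheme (smallest bins of 128 kb, each higher level 8 times larger); the function and constant names are illustrative and are not part of the schema. Under that assumption, the example range `txStart` = 669, `txEnd` = 1644 falls in bin 585, matching the example value in the table above.

```python
# Standard UCSC binning constants (an assumption for this table).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each level up covers 8 times more sequence


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains a 0-based, half-open range."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range straddles two bins at this level, so try the next level up.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is out of bounds for standard binning")


print(bin_from_range(669, 1644))  # 585
```

A range query can then restrict itself to the bins that could overlap the region of interest instead of scanning every row.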

Sample Rows

bin	name	chrom	strand	txStart	txEnd	cdsStart	cdsEnd	exonCount	exonStarts	exonEnds	score	name2	cdsStartStat	cdsEndStat	exonFrames
585	AFFH02000026.001.1	AFFH02000026	-	669	1644	672	1644	2	669,1030,	971,1644,	0	gene.AFFH02000026.001	cmpl	cmpl	1,2,
585	AFFH02000049.001.1	AFFH02000049	+	322	2923	2019	2712	1	322,	2923,	0	gene.AFFH02000049.001	cmpl	cmpl	0,
585	AFFH02000236.001.1	AFFH02000236	-	51	405	54	405	1	51,	405,	0	gene.AFFH02000236.001	cmpl	cmpl	0,
585	AFFH02000335.001.1	AFFH02000335	+	48277	58251	48371	58248	7	48277,48679,49060,49668,53028,56634,58115,	48386,48997,49609,49853,55630,56729,58251,	0	gene.AFFH02000335.001	cmpl	cmpl	0,0,0,0,2,0,2,
585	AFFH02000335.002.1	AFFH02000335	+	59834	62179	60056	62176	4	59834,60700,60943,61628,	60640,60855,61570,62179,	0	gene.AFFH02000335.002	cmpl	cmpl	0,2,1,1,
585	AFFH02000335.003.1	AFFH02000335	+	63167	68509	64891	67887	5	63167,64860,65637,67284,67489,	63481,65570,66144,67428,68509,	0	gene.AFFH02000335.003	cmpl	cmpl	-1,0,1,1,1,
585	AFFH02000335.003.1.1.55030110	AFFH02000335	+	64100	68509	64891	67887	5	64100,64860,65637,67284,67489,	64396,65570,66144,67428,68509,	0	gene.AFFH02000335.003	cmpl	cmpl	-1,0,1,1,1,
585	AFFH02000335.004.1	AFFH02000335	-	67553	70069	68143	70056	8	67553,68306,68573,68906,69421,69575,69703,69983,	68210,68517,68851,69361,69517,69643,69928,70069,	0	gene.AFFH02000335.004	cmpl	cmpl	2,1,2,0,0,1,1,0,
585	AFFH02000335.005.1	AFFH02000335	+	70153	72877	70931	72729	2	70153,71728,	71670,72877,	0	gene.AFFH02000335.005	cmpl	cmpl	0,1,
585	AFFH02000335.006.1	AFFH02000335	+	73747	74684	73844	74681	3	73747,74134,74354,	74070,74226,74684,	0	gene.AFFH02000335.006	cmpl	cmpl	0,1,0,
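
As a minimal sketch of how a row like those above might be consumed, the following parses one tab-separated line into a typed record and derives exon lengths and the number of coding bases. The `GenePred` class and helper names are hypothetical, and the code assumes the 16-column order shown in the header row, with comma-terminated lists in `exonStarts` and `exonEnds`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenePred:
    """Subset of the nscan_pasa columns needed for length calculations."""
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]


def parse_int_list(text: str) -> List[int]:
    """Parse a comma-terminated list such as '669,1030,' into integers."""
    return [int(item) for item in text.split(",") if item]


def parse_row(line: str) -> GenePred:
    """Parse one tab-separated row in the column order of the header row."""
    f = line.rstrip("\n").split("\t")
    starts, ends = parse_int_list(f[9]), parse_int_list(f[10])
    if not (len(starts) == len(ends) == int(f[8])):
        raise ValueError(f"exonCount does not match exon lists for {f[1]}")
    return GenePred(f[1], f[2], f[3], int(f[4]), int(f[5]),
                    int(f[6]), int(f[7]), starts, ends)


def exon_lengths(g: GenePred) -> List[int]:
    """Length of each exon; ends are treated as exclusive."""
    return [end - start for start, end in zip(g.exon_starts, g.exon_ends)]


def coding_length(g: GenePred) -> int:
    """Number of exonic bases that fall inside [cds_start, cds_end)."""
    return sum(max(0, min(end, g.cds_end) - max(start, g.cds_start))
               for start, end in zip(g.exon_starts, g.exon_ends))


ROW = ("585\tAFFH02000026.001.1\tAFFH02000026\t-\t669\t1644\t672\t1644\t2\t"
       "669,1030,\t971,1644,\t0\tgene.AFFH02000026.001\tcmpl\tcmpl\t1,2,")
gene = parse_row(ROW)
print(exon_lengths(gene))   # [302, 614]
print(coding_length(gene))  # 913
```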

Note: all start coordinates in our database are 0-based, not 1-based. See explanation here.
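
To illustrate the note above, here is a small sketch that converts a stored range into the 1-based form typically used when displaying a position. It assumes that end coordinates are exclusive (the usual UCSC "0-based, half-open" convention), so only the start is shifted; the function names are illustrative.

```python
def to_position(chrom: str, start: int, end: int) -> str:
    """Format a 0-based, half-open range as a 1-based, inclusive position."""
    # Only the start shifts: an exclusive 0-based end equals the inclusive 1-based end.
    return f"{chrom}:{start + 1}-{end}"


def span_length(start: int, end: int) -> int:
    """Length of a 0-based, half-open range; no +1 correction is needed."""
    return end - start


print(to_position("AFFH02000026", 669, 1644))  # AFFH02000026:670-1644
print(span_length(669, 1644))                  # 975
```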

N-SCAN PASA-EST (nscan_pasa) Track Description


Description

This track shows gene predictions made by the N-SCAN gene structure prediction program, with multiple Drosophila species as informants.

Methods

N-SCAN

N-SCAN combines biological-signal modeling in the target genome sequence with information from a multiple-genome alignment to generate de novo gene predictions. It extends the TWINSCAN target-informant genome pair to allow for an arbitrary number of informant sequences as well as richer models of sequence evolution. N-SCAN models the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, insertions, and deletions.

N-SCAN PASA-EST

N-SCAN PASA-EST incorporates EST alignments into N-SCAN. Similar to the conservation sequence models in TWINSCAN, separate probability models are developed for EST alignments to genomic sequence in exons, introns, splice sites and UTRs, reflecting the EST alignment patterns in these regions. N-SCAN PASA-EST is more accurate than N-SCAN while retaining the ability to discover novel genes to which no ESTs align.

In N-SCAN PASA-EST, the TransDecoder gene predictions are used as the 'EST' sequences. The resulting gene models were updated with the input PASA clusters using the assembly tool of the PASA pipeline. These updates consist of automatically generated alternative splices, UTR features and, in some cases, the merging of two gene models. In addition, PASA assigned open reading frames to clusters that did not overlap a gene prediction but did contain a full-length cDNA, and output them as 'novel genes'. Note that PASA does not use any cDNA annotation from the input; it assigns the ORF itself.

References

Gross SS, Brent MR. Using multiple alignments to improve gene prediction. In Proc. 9th Int'l Conf. on Research in Computational Molecular Biology (RECOMB '05):374-388 and J Comput Biol. 2006 Mar;13(2):379-93.

Korf I, Flicek P, Duan D, Brent MR. Integrating genomic homology into gene structure prediction. Bioinformatics. 2001 Jun 1;17(90001):S140-8.

Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003 Oct 1;31(19):5654-66.