Drosophila Conservation (36 Species) Track Settings

This track shows the multiple genome alignment of 36 Drosophila species, along with measures of evolutionary conservation computed by phastCons and phyloP from the Phylogenetic Analysis with Space/Time models (PHAST) package.

Methods

Whole Genome Alignments

Genome assemblies for 35 Drosophila species were obtained from the NCBI RefSeq database, and each was aligned against the Drosophila melanogaster (dm6) assembly using LAST. The following table lists the 36 Drosophila genome assemblies (D. melanogaster plus the 35 aligned species) used to construct the ROAST Alignments track:

Species	Assembly Name	RefSeq Accession	UCSC Assembly
Drosophila melanogaster	Release 6 plus ISO1 MT	GCF_000001215.4	dm6
Drosophila mauritiana	ASM438214v1	GCF_004382145.1	DmauRefSeq1
Drosophila sechellia	ASM438219v1	GCF_004382195.1	DsecRefSeq1
Drosophila simulans	Prin_Dsim_3.1	GCF_016746395.2	DsimRefSeq3
Drosophila yakuba	Prin_Dyak_Tai18E2_2.1	GCF_016746365.2	DyakRefSeq3
Drosophila santomea	Prin_Dsan_1.1	GCF_016746245.2	DsanRefSeq2
Drosophila teissieri	Prin_Dtei_1.1	GCF_016746235.2	DteiRefSeq1
Drosophila erecta	DereRS2	GCF_003286155.1	DereRefSeq1
Drosophila ficusphila	ASM1815226v1	GCF_018152265.1	DficRefSeq2
Drosophila suzukii	LBDM_Dsuz_2.1.pri	GCF_013340165.1	DsuzRefSeq2
Drosophila subpulchrella	RU_Dsub_v1.1	GCF_014743375.2	DspuRefSeq1
Drosophila biarmipes	ASM1814893v1	GCF_018148935.1	DbiaRefSeq2
Drosophila takahashii	ASM1815269v1	GCF_018152695.1	DtakRefSeq2
Drosophila eugracilis	ASM1815383v1	GCF_018153835.1	DeugRefSeq2
Drosophila rhopaloa	ASM1815211v1	GCF_018152115.1	DrhoRefSeq2
Drosophila elegans	ASM1815250v1	GCF_018152505.1	DeleRefSeq2
Drosophila kikkawai	ASM1815253v1	GCF_018152535.1	DkikRefSeq2
Drosophila serrata	Dser1.0	GCF_002093755.1	DserRefSeq1
Drosophila bipectinata	ASM1815384v1	GCF_018153845.1	DbipRefSeq2
Drosophila ananassae	ASM1763931v2	GCF_017639315.1	DanaRefSeq2
Drosophila pseudoobscura	UCI_Dpse_MV25	GCF_009870125.1	DpseRefSeq1
Drosophila persimilis	DperRS2	GCF_003286085.1	DperRefSeq1
Drosophila miranda	D.miranda_PacBio2.1	GCF_003369915.1	DmirRefSeq1
Drosophila guanche	DGUA_6	GCF_900245975.1	DguaRefSeq1
Drosophila subobscura	UCBerk_Dsub_1.0	GCF_008121235.1	DsobRefSeq1
Drosophila obscura	ASM1815110v1	GCF_018151105.1	DobsRefSeq2
Drosophila willistoni	UCI_dwil_1.1	GCF_018902025.1	DwilRefSeq2
Drosophila arizonae	ASM165402v1	GCF_001654025.1	DariRefSeq1
Drosophila mojavensis	ASM1815372v1	GCF_018153725.1	DmojRefSeq2
Drosophila navojoa	UFRJ_Dnav_4.2	GCF_001654015.2	DnavRefSeq1
Drosophila hydei	DhydRS2	GCF_003285905.1	DhydRefSeq1
Drosophila virilis	DvirRS2	GCF_003285735.1	DvirRefSeq1
Drosophila novamexicana	DnovRS2.1	GCF_003285875.2	DnovRefSeq1
Drosophila albomicans	drosAlbom15112-1751.03v1	GCF_009650485.1	DalbRefSeq1
Drosophila grimshawi	ASM1815329v1	GCF_018153295.1	DgriRefSeq2
Drosophila busckii	ASM1175060v1	GCF_011750605.1	DbusRefSeq1

Phylogenetic tree of 36 Drosophila species

The initial set of whole genome alignments was filtered using the 2-split, post-masked strategy with last-split and last-postmask to construct one-to-one alignments between D. melanogaster and each target genome. The alignments were then processed using the utilities developed by the UCSC Genome Bioinformatics Group. These whole genome alignments were combined into a multiple sequence alignment using ROAST.

The codon translations associated with the multiple sequence alignment were based on FlyBase release 6.46 for D. melanogaster.

Phylogenetic Tree Model

The non-conserved model used by phastCons and phyloP was constructed by the phyloFit program from the PHAST package based on four-fold degenerate (4d) sites. The 4d sites were defined by the FlyBase gene annotations, and extracted from the multiple sequence alignment using msa_view. The non-conserved phylogenetic model was estimated by phyloFit using the general reversible (REV) substitution model, the EM algorithm, and medium (MED) precision.
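To make the definition of a four-fold degenerate site concrete, the following Python sketch flags the third codon positions at which any of the four bases encodes the same amino acid. It assumes the standard genetic code and an in-frame coding sequence. The function names are hypothetical, and this is not how msa_view extracts 4d sites from the alignment and gene annotations; it only illustrates the concept.

```python
# Two-base codon prefixes whose third position is four-fold degenerate
# under the standard genetic code (an assumption of this sketch).
FOURFOLD_PREFIXES = {
    "GC",  # Ala
    "CG",  # Arg
    "GG",  # Gly
    "CT",  # Leu
    "CC",  # Pro
    "TC",  # Ser
    "AC",  # Thr
    "GT",  # Val
}


def is_fourfold_degenerate(codon: str) -> bool:
    """Return True if the third base of `codon` is a 4d site."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3 or any(base not in "ACGT" for base in codon):
        return False  # incomplete or ambiguous codons are skipped
    return codon[:2] in FOURFOLD_PREFIXES


def fourfold_site_offsets(cds: str) -> list[int]:
    """Return 0-based offsets of 4d sites in an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return [
        start + 2
        for start in range(0, usable, 3)
        if is_fourfold_degenerate(cds[start:start + 3])
    ]
```

For example, in the sequence ATGGCTTAA only the codon GCT (Ala) contributes a 4d site, at offset 5.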

PhastCons Conservation

Conserved elements were identified by phastCons using a target coverage of 0.3 and an expected length of 45. The conserved model is defined as a scaled version of the non-conserved model, with a scaling factor (rho) of 0.3.
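The following Python sketch shows how these three parameters can be read. It assumes the parameterization described in the phastCons documentation, in which the expected length omega fixes the probability mu of leaving the conserved state (mu = 1/omega) and the target coverage gamma fixes the probability nu of entering it (gamma = nu / (mu + nu)). The function names and the branch lengths in the usage note are hypothetical and chosen only for illustration.

```python
def phastcons_transitions(target_coverage: float,
                          expected_length: float) -> tuple[float, float]:
    """Return (mu, nu) implied by a target coverage and expected length.

    Assumes mu = 1 / expected_length and
    target_coverage = nu / (mu + nu), as stated in the lead-in.
    """
    if not 0.0 < target_coverage < 1.0:
        raise ValueError("target_coverage must lie strictly between 0 and 1")
    if expected_length <= 0:
        raise ValueError("expected_length must be positive")
    mu = 1.0 / expected_length
    nu = target_coverage * mu / (1.0 - target_coverage)
    return mu, nu


def scale_branch_lengths(branch_lengths: dict[str, float],
                         rho: float) -> dict[str, float]:
    """Scale every branch of the non-conserved model by rho."""
    return {name: rho * length for name, length in branch_lengths.items()}


if __name__ == "__main__":
    mu, nu = phastcons_transitions(target_coverage=0.3, expected_length=45)
    print(f"mu = {mu:.5f}, nu = {nu:.5f}")
    # Hypothetical branch lengths, in substitutions per site.
    print(scale_branch_lengths({"branch_a": 0.10, "branch_b": 0.20}, rho=0.3))
```

With rho below 1, every branch in the conserved model is shorter than its counterpart in the non-conserved model, which is what lets phastCons separate slowly evolving regions from the neutral background.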

PhyloP Conservation

The conservation score for each site of the alignment was determined by phyloP using the likelihood ratio test (LRT) and the CONACC mode. Positive scores indicate conservation, while negative scores indicate acceleration.
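The sign convention can be sketched in Python as follows. The sketch assumes that each score is a signed -log10 p-value, which is how phyloP reports CONACC scores in its wiggle output. The function name and the "neutral" label for a score of exactly zero are hypothetical choices made for this example.

```python
def interpret_phylop_score(score: float) -> tuple[str, float]:
    """Return (label, p_value) for a signed phyloP CONACC score.

    Assumes the score is a -log10 p-value whose sign encodes the
    direction of the departure from neutrality.
    """
    p_value = 10.0 ** (-abs(score))
    if score > 0:
        label = "conservation"
    elif score < 0:
        label = "acceleration"
    else:
        label = "neutral"
    return label, p_value


if __name__ == "__main__":
    for score in (3.0, -1.5, 0.0):
        label, p_value = interpret_phylop_score(score)
        print(f"score {score:+.1f}: {label}, p = {p_value:.4g}")
```

Under this reading, the magnitude of a score measures the strength of evidence against neutral evolution, and the sign gives the direction.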

Display Conventions and Configuration

In full and pack display modes, conservation scores are displayed as a wiggle track (histogram) in which the height reflects the value of the score. The conservation wiggles can be configured in a variety of ways to highlight different aspects of the displayed information. (See the "Configuring graph-based tracks" page for details.)

Pairwise alignments of each species to the D. melanogaster genome are displayed below the conservation histogram as a grayscale density plot (in pack mode) or as a wiggle (in full mode) that indicates alignment quality. In dense display mode, conservation is shown in grayscale using darker values to indicate higher levels of overall conservation as scored by phastCons.

Checkboxes on the track configuration page allow selection of the species to include in the pairwise display. The "+" and "-" buttons allow you to select or unselect multiple species at once. Note that excluding species from the pairwise display does not alter the conservation score display.

To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment.

Gap Annotation

The following display conventions are used to depict the different types of gaps in the alignment:

Genomic Breaks

Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows:

Base Level

When zoomed in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the D. melanogaster sequence at those alignment positions relative to the longest non-D. melanogaster sequence. If there is sufficient space in the display, the size of the gap is shown. If the space is insufficient and the gap size is a multiple of 3, a "*" is displayed; other gap sizes are indicated by "+".
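The Gaps line rule can be summarized in a short Python sketch. It approximates "sufficient space" as the number of characters available at that position; that measure, the function name, and the parameter names are assumptions made for illustration and are not taken from the browser's source code.

```python
def gap_label(gap_size: int, available_chars: int) -> str:
    """Return the text shown on the Gaps line for one gap.

    `available_chars` is a hypothetical stand-in for the display
    space available at this alignment position.
    """
    if gap_size <= 0:
        raise ValueError("gap_size must be a positive number of bases")
    size_text = str(gap_size)
    if len(size_text) <= available_chars:
        return size_text  # enough room: show the gap size itself
    # Not enough room: "*" marks a multiple of 3, "+" any other size.
    return "*" if gap_size % 3 == 0 else "+"


if __name__ == "__main__":
    for size, room in ((7, 1), (12, 2), (12, 1), (10, 1)):
        print(size, room, gap_label(size, room))
```

Marking multiples of 3 separately is useful in coding regions, where such gaps do not shift the reading frame.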

Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. Then, select one of the following modes:

Description