bigMaf Track Format

The bigMaf format stores multiple alignments in a format compatible with MAF files, which is then compressed and indexed as a bigBed.

The bigMaf files are created using the program bedToBigBed, run with the -as option to pull in a special autoSql (.as) file that defines the fields of the bigMaf.

The bigMaf files are in an indexed binary format. The main advantage of this format is that only those portions of the file needed to display a particular region are transferred to the Genome Browser server. Because of this, bigMaf files have considerably faster display performance than regular MAF files when working with large data sets. The bigMaf file remains on your local web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for the currently displayed chromosomal position is locally cached as a "sparse file". If you do not have access to a web-accessible server and need hosting space for your bigMaf files, please see the Hosting section of the Track Hub Help documentation.

bigMaf file definition

The following autoSql definition is used to specify bigMaf multiple alignment files. This definition, contained in the file bigMaf.as, is pulled in when the bedToBigBed utility is run with the -as=bigMaf.as option.

bigMaf.as

table bedMaf
"Bed3 with MAF block"
    (
    string chrom;      "Reference sequence chromosome or scaffold"
    uint   chromStart; "Start position in chromosome"
    uint   chromEnd;   "End position in chromosome"
    lstring mafBlock;  "MAF block"
    )

An example: bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt hg38.chrom.sizes bigMaf.bb
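Before running the command above, it can be useful to confirm that the input really has the bed3+1 layout that bigMaf.as describes. The sketch below is one way to do that with awk. The function name check_bed3_plus1 is hypothetical (it is not a UCSC utility), and the rule that chromStart must not exceed chromEnd is an assumption about what a sensible row looks like rather than a documented bedToBigBed requirement.

```shell
# check_bed3_plus1 FILE
# Hypothetical helper (not a UCSC tool): reports offending line numbers and
# returns non-zero when a row does not look like the bed3+1 layout in bigMaf.as.
check_bed3_plus1() {
    awk -F'\t' '
        NF != 4 {
            printf "line %d: expected 4 tab-separated fields, found %d\n", NR, NF
            bad = 1; next
        }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: chromStart and chromEnd must be unsigned integers\n", NR
            bad = 1; next
        }
        $2 + 0 > $3 + 0 {   # assumption: start should not exceed end
            printf "line %d: chromStart is greater than chromEnd\n", NR
            bad = 1
        }
        END { exit bad }    # 0 when no problem was recorded
    ' "$1"
}
```

A typical call would be check_bed3_plus1 bigMaf.txt && echo "layout looks fine".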

Supporting `frame` and `summary` definitions

Alongside the bigMaf file, two supporting bigBed files can be created: a summary file and a frames file. The following autoSql definition is used to create the summary file, which is pointed to online with summary <url> rather than the standard bigDataUrl <url> used with the bigMaf file. This definition, contained in the file mafSummary.as, is pulled in when the bedToBigBed utility is run with the -as=mafSummary.as option.

mafSummary.as

table mafSummary
"Positions and scores for alignment blocks"
    (
    string chrom;      "Reference sequence chromosome or scaffold"
    uint   chromStart; "Start position in chromosome"
    uint   chromEnd;   "End position in chromosome"
    string src;        "Sequence name or database of alignment"
    float  score;      "Floating point score."
    char[1] leftStatus;  "Gap/break annotation for preceding block"
    char[1] rightStatus; "Gap/break annotation for following block"
    )

The input bigMafSummary.bed file is generated with a separate tool, hgLoadMafSummary. An example: bedToBigBed -type=bed3+4 -as=mafSummary.as -tab bigMafSummary.bed hg38.chrom.sizes bigMafSummary.bb

The following autoSql definition is used to create the second file, the frames file, which is pointed to online with frames <url>. This definition, contained in the file mafFrames.as, is pulled in when the bedToBigBed utility is run with the -as=mafFrames.as option.

mafFrames.as

table mafFrames
"codon frame assignment for MAF components"
    (
    string chrom;      "Reference sequence chromosome or scaffold"
    uint   chromStart; "Start range in chromosome"
    uint   chromEnd;   "End range in chromosome"
    string src;        "Name of sequence source in MAF"
    ubyte frame;       "frame (0,1,2) for first base(+) or last base(-)"
    char[1] strand;    "+ or -"
    string name;       "Name of gene used to define frame"
    int    prevFramePos;  "target position of the previous base (in transcription direction) that continues this frame, or -1 if none, or frame not contiguous"
    int    nextFramePos;  "target position of the next base (in transcription direction) that continues this frame, or -1 if none, or frame not contiguous"
    ubyte  isExonStart;  "does this start the CDS portion of an exon?"
    ubyte  isExonEnd;    "does this end the CDS portion of an exon?"
    )

The input bigMafFrames.txt file is generated with a separate tool, genePredToMafFrames. An example: bedToBigBed -type=bed3+8 -as=mafFrames.as -tab bigMafFrames.txt hg38.chrom.sizes bigMafFrames.bb

Note that the bedToBigBed utility uses a substantial amount of memory: approximately 25% more RAM than the uncompressed BED input file.
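As a rough planning aid, the 25% figure above can be turned into a number for a given input file. The sketch below simply multiplies the input size by 1.25. The function name estimate_bedtobigbed_ram is hypothetical, and the result is only the rule of thumb stated above, not a measurement of what bedToBigBed will actually use.

```shell
# estimate_bedtobigbed_ram FILE
# Hypothetical helper: prints the input size in bytes multiplied by 1.25,
# following the rule of thumb above (integer arithmetic, rounded down).
estimate_bedtobigbed_ram() {
    echo $(( $(wc -c < "$1") * 5 / 4 ))
}
```

For example, estimate_bedtobigbed_ram bigMaf.txt prints an approximate byte count to compare against the memory available on your machine.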

Creating a bigMaf track

To create a bigMaf track, follow these steps:

Step 1. If you already have a MAF file you would like to convert to a bigMaf, skip to Step 3. Otherwise, download the example MAF file, chr22_KI270731v1_random.maf, for the human GRCh38 (hg38) assembly.

Step 2. If you would like to include optional reading frame and block summary information, download the chr22_KI270731v1_random.gp genePred file.

Step 3. Download the autoSql file bigMaf.as needed by bedToBigBed. If you have opted to include the optional frame and summary information with your bigMaf file, you must also download the autoSql files mafSummary.as and mafFrames.as.

Here are wget commands to obtain the above files and the hg38.chrom.sizes file mentioned below:

wget https://genome.ucsc.edu/goldenPath/help/examples/chr22_KI270731v1_random.maf
wget https://genome.ucsc.edu/goldenPath/help/examples/chr22_KI270731v1_random.gp
wget https://genome.ucsc.edu/goldenPath/help/examples/bigMaf.as
wget https://genome.ucsc.edu/goldenPath/help/examples/mafSummary.as
wget https://genome.ucsc.edu/goldenPath/help/examples/mafFrames.as
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes

Step 4. Download the bedToBigBed and mafToBigMaf programs from the UCSC binary utilities directory. If you have opted to generate the optional frame and summary files for your multiple alignment, you must also download the hgLoadMafSummary, genePredSingleCover, and genePredToMafFrames programs from the same directory.

Step 5. Use the fetchChromSizes script from the same directory to create a chrom.sizes file for the UCSC database with which you are working (e.g., hg38). Alternatively, you can download the chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full data set" for any assembly). For example, the hg38.chrom.sizes file for the hg38 database is located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.

With the chrom.sizes file in place, use the mafToBigMaf and bedToBigBed programs to create the bigMaf file:

mafToBigMaf hg38 chr22_KI270731v1_random.maf stdout | sort -k1,1 -k2,2n > bigMaf.txt
bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt hg38.chrom.sizes bigMaf.bb

Note that the hg38 argument in the mafToBigMaf command is the referenceDb, and it matches the expected prefix of the primary species' sequence name. In the example chr22_KI270731v1_random.maf file, for instance, the sequence name hg38.chr22_KI270731v1_random has the prefix hg38.
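To see which prefix a MAF file actually uses before choosing the referenceDb argument, the sequence names can be inspected directly. The sketch below assumes the usual MAF layout, in which each alignment block starts with an "a" line and the first "s" line that follows belongs to the primary (reference) species. The function name maf_ref_prefixes is hypothetical.

```shell
# maf_ref_prefixes FILE
# Hypothetical helper: prints the distinct prefixes (text before the first ".")
# found on the first "s" line of each alignment block.
maf_ref_prefixes() {
    awk '
        $1 == "a" { first = 1; next }       # a new alignment block begins
        $1 == "s" && first {                # assumed to be the reference line
            dot = index($2, ".")
            print (dot ? substr($2, 1, dot - 1) : $2)
            first = 0
        }
    ' "$1" | sort -u
}
```

If maf_ref_prefixes chr22_KI270731v1_random.maf prints a single value, that value is the natural candidate for the referenceDb argument. More than one value suggests the blocks do not share a reference.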

Step 6. If you have opted to include the optional frame and summary information, run the following commands to create the binary indexed mafFrames and mafSummary files to accompany your bigMaf file:

genePredSingleCover chr22_KI270731v1_random.gp single.gp
genePredToMafFrames hg38 chr22_KI270731v1_random.maf bigMafFrames.txt hg38 single.gp
bedToBigBed -type=bed3+8 -as=mafFrames.as -tab bigMafFrames.txt hg38.chrom.sizes bigMafFrames.bb

hgLoadMafSummary -minSeqSize=1 -test hg38 bigMafSummary chr22_KI270731v1_random.maf
cut -f2- bigMafSummary.tab | sort -k1,1 -k2,2n > bigMafSummary.bed
bedToBigBed -type=bed3+4 -as=mafSummary.as -tab bigMafSummary.bed hg38.chrom.sizes bigMafSummary.bb
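The bigMaf.txt and bigMafSummary.bed files are passed through sort -k1,1 -k2,2n before they reach bedToBigBed. If a sort step was skipped or a file was edited afterwards, a quick order check can save a failed run. The sketch below mirrors that sort order under the C locale. The function name is_sorted_bed is hypothetical, and the check reflects the documented sort command rather than bedToBigBed's own internal test, which may be more lenient.

```shell
# is_sorted_bed FILE
# Hypothetical helper: succeeds when rows are ordered by chromosome name
# (byte order) and then by numeric start, as sort -k1,1 -k2,2n would leave them.
is_sorted_bed() {
    LC_ALL=C awk -F'\t' '
        NR > 1 {
            chrom = $1 ""                    # force a string comparison
            if (chrom < prev_chrom || (chrom == prev_chrom && $2 + 0 < prev_start)) {
                printf "line %d is out of order\n", NR
                exit 1
            }
        }
        { prev_chrom = $1 ""; prev_start = $2 + 0 }
    ' "$1"
}
```

A file sorted under a different locale may be reported as out of order by this check. Re-sorting with LC_ALL=C sort -k1,1 -k2,2n is one way to rule that out.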

Step 7. Move the newly created bigMaf file (bigMaf.bb) to a web-accessible http, https or ftp location. If you generated the bigMafSummary.bb and/or bigMafFrames.bb files, move those to a web-accessible location as well, most likely the same location as the bigMaf.bb file.

Step 8. Construct a custom track using a single track line. Any of the track attributes that apply to tracks of type bigBed can also be used. The most basic version of the track line will look something like this:

track type=bigMaf name="My Big MAF" description="A Multiple Alignment" bigDataUrl=http://myorg.edu/mylab/bigMaf.bb summary=http://myorg.edu/mylab/bigMafSummary.bb frames=http://myorg.edu/mylab/bigMafFrames.bb
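When all three files sit in the same directory, the track line can be assembled from a single base URL so the three settings stay in step. The sketch below does this with printf. The function name make_bigmaf_track_line is hypothetical, and the file names bigMaf.bb, bigMafSummary.bb and bigMafFrames.bb are the ones used in the steps above.

```shell
# make_bigmaf_track_line BASE_URL NAME DESCRIPTION
# Hypothetical helper: prints a bigMaf track line whose bigDataUrl, summary
# and frames settings all point under BASE_URL.
make_bigmaf_track_line() {
    base=${1%/}    # drop one trailing slash, if present
    printf 'track type=bigMaf name="%s" description="%s" bigDataUrl=%s/bigMaf.bb summary=%s/bigMafSummary.bb frames=%s/bigMafFrames.bb\n' \
        "$2" "$3" "$base" "$base" "$base"
}
```

Calling make_bigmaf_track_line http://myorg.edu/mylab "My Big MAF" "A Multiple Alignment" reproduces the track line shown above. If you did not generate the optional files, the summary and frames settings should be left out of the line.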

Step 9. Paste the custom track line into the text box on the custom track management page. Navigate to chr22_KI270731v1_random to see the example data for this track.

The bedToBigBed program can be run with several additional options. For a full list of the available options, type bedToBigBed (with no arguments) on the command line to display the usage message.

Examples

Example #1

In this example, you will create a bigMaf custom track using an existing bigMaf file, bigMaf.bb, located on the UCSC Genome Browser http server. This file contains data for the hg38 assembly.

To create a custom track using this bigMaf file:

Construct a track line that references the file:

track type=bigMaf name="bigMaf Example One" description="A bigMaf file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb frames=http://genome.ucsc.edu/goldenPath/help/examples/bigMafFrames.bb summary=http://genome.ucsc.edu/goldenPath/help/examples/bigMafSummary.bb

Paste the track line into the custom track management page for the human assembly hg38 (Dec. 2013).
Click the "submit" button.

Note that additional track line options exist that are specific to the MAF format. For instance, adding the parameter setting speciesOrder="panTro4 rheMac3 mm10 rn5 canFam3 monDom5" to the above example will specify the order of sequences by species.

Custom tracks can also be loaded via one URL line. This link loads the same bigMaf.bb track and sets additional display parameters in the URL:

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigMaf%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb%20visibility=pack
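The URL above is the track line with each space written as %20 and passed in the hgct_customText parameter. The sketch below builds such a URL from a database, a position and a track line. The function name track_line_to_url is hypothetical, and only spaces are encoded, which is enough for the example here. A track line containing characters such as quotes, & or # would need fuller URL encoding than this sketch provides.

```shell
# track_line_to_url DB POSITION TRACK_LINE
# Hypothetical helper: prints an hgTracks URL that loads TRACK_LINE as a
# custom track. Only spaces are encoded (as %20).
track_line_to_url() {
    encoded=$(printf '%s' "$3" | sed 's/ /%20/g')
    printf 'http://genome.ucsc.edu/cgi-bin/hgTracks?db=%s&position=%s&hgct_customText=%s\n' \
        "$1" "$2" "$encoded"
}
```

For instance, passing hg38, chr22_KI270731v1_random and the line 'track type=bigMaf name=Example bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb visibility=pack' yields the URL shown above.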

After this example bigMaf is loaded in the Genome Browser, click into an alignment on the browser's track display. Note that the details page displays information about the individual alignments, similar to that which is available for a standard MAF track.

Example #2

In this example, you will create a bigMaf file from an existing bigMaf input file, bigMaf.txt, located on the UCSC Genome Browser http server.

Save the bed3+1 example file, bigMaf.txt, to your computer (Step 5, above).
Save the autoSql file bigMaf.as to your computer (Step 3, above).
Download the bedToBigBed utility (Step 4, above).
Save the hg38.chrom.sizes text file to your computer. This file contains the chrom.sizes for the human (hg38) assembly (Step 5, above).

Run the bedToBigBed utility to create a binary indexed MAF file (Step 5, above):

bedToBigBed -type=bed3+1 -tab -as=bigMaf.as bigMaf.txt hg38.chrom.sizes bigMaf.bb

Move the newly created bigMaf file (bigMaf.bb) to a web-accessible location (Step 7, above).
Construct a track line that points to the bigMaf file (Step 8, above).
Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser (Step 9, above).

Sharing your data with others

If you would like to share your bigMaf data track with a colleague, learn how to create a URL by looking at Example 6 on this page.

Extracting data from the bigMaf format

Because bigMaf files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory.

bigBedToBed — converts a bigBed file to ASCII BED format.
bigBedSummary — extracts summary information from a bigBed file.
bigBedInfo — prints out information about a bigBed file.

As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the command line to view the usage statement.
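As an illustration of the first program in the list, the sketch below wraps bigBedToBed so that only the rows overlapping one region are written to standard output. It relies on the -chrom, -start and -end options and on stdout being accepted as the output file name. Please confirm both against the usage statement of your installed version. The function name extract_bigmaf_region is hypothetical.

```shell
# extract_bigmaf_region FILE_OR_URL CHROM START END
# Hypothetical wrapper around bigBedToBed: writes the rows that overlap the
# requested region to standard output.
extract_bigmaf_region() {
    bigBedToBed -chrom="$2" -start="$3" -end="$4" "$1" stdout
}
```

A call such as extract_bigmaf_region bigMaf.bb chr22_KI270731v1_random 0 5000 should print rows in the bed3+1 layout defined in bigMaf.as, with the MAF block in the fourth column.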

Troubleshooting

If you encounter an error when you run the bedToBigBed program, check your input file for data coordinates that extend past the end of the chromosome. If these are present, run the bedClip program (available from the UCSC binary utilities directory) to remove the problematic row(s) in your input file before running the bedToBigBed program.
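To find out which rows are at fault, the input can be compared against the chrom.sizes file directly. The sketch below lists every row whose chromEnd exceeds the recorded chromosome length, or whose chromosome name is missing from chrom.sizes. The function name rows_past_chrom_end is hypothetical, and both files are assumed to be tab-separated.

```shell
# rows_past_chrom_end CHROM_SIZES BED_FILE
# Hypothetical helper: prints "line<TAB>chrom<TAB>start<TAB>end" for each row
# that ends past its chromosome or names a chromosome absent from CHROM_SIZES.
rows_past_chrom_end() {
    awk -F'\t' '
        NR == FNR { size[$1] = $2; next }    # first file: chromosome lengths
        !($1 in size) || $3 + 0 > size[$1] + 0 {
            printf "%d\t%s\t%s\t%s\n", FNR, $1, $2, $3
        }
    ' "$1" "$2"
}
```

Running rows_past_chrom_end hg38.chrom.sizes bigMaf.txt with no output suggests that coordinates are not the cause of the error.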