Using UCSC Genome Browser Track Hubs

Table of Contents:

What Are Track Hubs?
Viewing Track Hubs in the Browser
Setting Up Your Own Track Hub
Debugging and Updating Track Hubs
Registering a Track Hub with UCSC

Questions and feedback are welcome.

What Are Track Hubs?

Track hubs are web-accessible directories of genomic data that can be viewed on the UCSC Genome Browser alongside native annotation tracks. Hubs are a useful tool for visualizing a large number of genome-wide data sets. For example, a project that has produced several wiggle plots of data can use the hub utility to organize the tracks into composite and super-tracks, making it possible to show the data for a large collection of tissues and experimental conditions in a visually elegant way, similar to how the ENCODE native data tracks are displayed in the browser.

The track hub utility allows efficient access to data sets from around the world through the familiar Genome Browser interface. Browser users can display tracks from any public track hub that has been registered with UCSC. Additionally, users can import data from unlisted hubs or can set up, display, and share their own track hubs.

The data underlying the tracks in a hub reside on the remote server of the data provider rather than at UCSC. The data are stored in compressed binary indexed files in bigBed, bigWig, BAM or VCF format that contain the data at several resolutions. When a hub track is displayed in the Genome Browser, only the relevant data needed to support the view of the current genomic region are transmitted rather than the entire file. The transmitted data are cached on the UCSC server to expedite future access. This on-demand transfer mechanism eliminates the need to transmit large data sets across the Internet, thereby minimizing upload time into the browser.

The track hub utility offers a convenient way to view and share very large sets of data. Individuals wishing to display only a few small data sets may find it easier to use the Genome Browser custom track utility. As with hub tracks, custom tracks can be uploaded to the UCSC Genome Browser and viewed alongside the native annotation tracks. Custom tracks can be constructed from a wide range of data types; hub tracks are limited to compressed binary indexed formats that can be remotely hosted. However, the custom tracks utility does not offer the data persistence and track configurability provided by the track hub mechanism: hub tracks can be grouped into composite or super-tracks and configured to display the data using a wide variety of options. In general, for users who have large data sets that would be prohibitive to upload, need to ensure the persistence of their data, or would like to take full advantage of track functionality, track hubs are a better solution. Both mechanisms give data providers the flexibility to directly add, update, and remove data from their display as needed.

Viewing Track Hubs in the Browser

Public hubs
The Genome Browser provides links to a collection of public track hubs that have been registered with UCSC. To view a list of the public track hubs available for the currently selected assembly, click the "track hubs" button on the Genome Browser gateway or annotation tracks page. The Public Hubs tab on the Track Hubs page lists the hubs that are available for display in the browser. To add a hub to your display, check the Display checkbox in front of the hub's name, then click the "Display Selected Hubs" button. Tracks from the selected hub will be displayed in a separate track group below the browser image. The tracks can be configured and manipulated in the same fashion as native browser tracks. As with any very large track in the Genome Browser, exercise caution when viewing a broad genomic region that requires the Genome Browser to display a large number of track features: the browser display may time out.

Unlisted hubs (located in the My Hubs tab)
In addition to the publicly available hubs listed on the Public Hubs tab, it is possible to load your own unlisted track hub or one created by a colleague as long as you know its URL. To add an unlisted hub, open the Track Hubs page and click the My Hubs tab. This tab lists the unlisted track hubs that you have loaded into your browser. To import a new hub, type its URL into the text box, then click the Add Hub button. If the track hub is imported successfully, it will be added to the list. It is also possible to add a track hub by directly adding the hub's URL to the browser URL. If you add hubUrl=[URL] to your hgTracks URL line, it will add the hub directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=http://vizhub.wustl.edu/VizHub/RoadmapRelease3.txt). If the browser is unable to load the track hub, it will display an error message. Some common causes for an import to fail include typos in the URL, attempting to add a track hub from a different assembly, a hub server that is offline, or errors in the track hub files. Once you have successfully loaded a hub, you can view it in the browser by clicking the "Display Selected Hubs" button. Please note that unlisted hubs are in no way secure. The URL helps to obfuscate the location of the data; it is a simple barrier to casual users.
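The hubUrl form of the browser URL described above can also be assembled programmatically. The following is a minimal sketch, not an official UCSC utility: the function name hub_link is hypothetical, and the code assumes only the db and hubUrl variables shown in the example URL.

```python
from urllib.parse import urlencode

# Base hgTracks address taken from the example in the text.
BROWSER_CGI = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def hub_link(db, hub_url, browser_cgi=BROWSER_CGI):
    """Return an hgTracks URL that loads the hub at hub_url for assembly db.

    hub_link is a hypothetical helper name used only for illustration.
    """
    # safe=":/" leaves the hub URL unescaped so the result matches the
    # readable form shown in the text.
    query = urlencode({"db": db, "hubUrl": hub_url}, safe=":/")
    return f"{browser_cgi}?{query}"


print(hub_link("hg19", "http://vizhub.wustl.edu/VizHub/RoadmapRelease3.txt"))
```

A link built this way can be sent to a colleague, who will see the same error message described above if the hub cannot be loaded.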

To remove a track hub from your Genome Browser display, uncheck its Display checkbox on the Track Hubs page. The hub will no longer be visible on the browser tracks page, but will still be available from the Track Hubs page. To completely remove an unlisted hub from your My Hubs list, click the "Disconnect" button next to the track hub you wish to discard.

Occasionally, remote track hubs may be missing, off-line, or otherwise unavailable. When a track hub is no longer accessible from the Genome Browser, it is automatically removed from the Track Hubs page. If a user is already browsing data from the remote hub when it disconnects, a yellow error message will be displayed instead of the expected data.

The track sets in hubs are genome assembly specific. The Track Hubs page lists only those hubs that contain tracks for the currently selected genome assembly. To switch to a different assembly, click the Genomes link in the top menu bar, then select the new assembly from the Gateway page.

Tracks accessed through a hub can be used in Genome Browser sessions and custom tracks in the same manner as other tracks. The data underlying hub tracks can be viewed, manipulated, and downloaded using the UCSC Table Browser.

Setting Up Your Own Track Hub

This section provides a step-by-step description of the process used to set up a track hub on your own server.

To create your own hub you will need:

one or more data sets formatted in one of the compressed binary index formats supported by the Genome Browser: bigBed, bigWig, BAM or VCF
a set of text files that specify properties for the track hub and for each of the data tracks within it
an Internet-enabled web server or ftp server

The files are placed on the server in a file hierarchy like the one shown in Example 1. Users experienced in setting up Genome Browser mirrors that contain their own data will find that setting up a track hub is similar, but is usually much easier. Depending on the number and complexity of the data sets, a track hub can typically be set up in a day or two. It is generally easiest to run the command-line data formatting programs in a Linux environment, although it is possible to manipulate smaller data sets using Mac OS X as well.

Example 1: Directory hierarchy for a hub containing DNase and RNAseq data for the hg18 and hg19 human genome assemblies. The hg18/ and hg19/ subdirectories contain the assembly-specific data files.

myHub/ - directory containing track hub files

     hub.txt -  a short description of hub properties
     genomes.txt - list of genome assemblies included in the hub data
     hg19/ - directory of data for the hg19 (GRCh37) human assembly
          trackDb.txt - display properties for tracks in this directory
          dnase.html - description text for a DNase track 
          dnaseLiver.bigWig - wiggle plot of DNase in liver
          dnaseLiver.bigBed - regions of active DNase
          dnaseLung.bigWig - wiggle plot of DNase in lung
          dnaseLung.bigBed - regions of active DNase
          ...
          rnaSeq.html - description text for an RNAseq track
          rnaSeqLiver.bigWig - wiggle plot of RNAseq data in liver
          rnaSeqLiver.bigBed - intron/exon lists for liver
          rnaSeqLung.bigWig - wiggle plot of RNAseq data in lung
          rnaSeqLung.bigBed - intron/exon lists for lung
     hg18/ - directory of data for the hg18 (Build 36) human assembly
          trackDb.txt - display properties for tracks in this directory
          dnase.html - description text for a DNase track 
          dnaseLiver.bigWig - wiggle plot of DNase data in liver
          dnaseLiver.bigBed - regions of active DNase
          dnaseLung.bigWig - wiggle plot of DNase data in lung
          dnaseLung.bigBed - regions of active DNase
          ...
          rnaSeq.html - description text for an RNAseq track
          rnaSeqLiver.bigWig - wiggle plot of RNAseq data in liver
          rnaSeqLiver.bigBed - intron/exon lists for liver
          rnaSeqLung.bigWig - wiggle plot of RNAseq data in lung
          rnaSeqLung.bigBed - intron/exon lists for lung

Step 1. Format the data
The data tracks provided by a hub must be formatted in one of the compressed binary index formats supported by the Genome Browser: bigWig, bigBed, BAM or VCF.

bigWig - The bigWig format is best for displaying continuous value plot data, such as read depths from short read sequencing projects or levels of conservation observed in a multiple-species alignment. A bigWig file contains a list of chromosome segments, each of which is associated with a floating point value. When graphed, the segments may appear as a big "wiggle". Although each bigWig file can contain only a single value for any given base, bigWig tracks are often combined into "container multiWig" or "composite on" tagged tracks. For information on creating and configuring bigWig tracks, see the bigWig Track Format help page.

bigBed - BigBed files are binary indexed versions of Browser Extensible Data (BED) files. BED format is useful for associating a name and (optionally) a color and a score with one or more related regions on the same chromosome, such as all the exons of a gene. See the bigBed Track Format help page for information on creating and configuring bigBed tracks.

BAM - BAM files contain alignments of (generally short) DNA reads to a reference sequence, usually a complete genome. BAM files are binary versions of Sequence Alignment/Map (SAM) format files. Unlike bigWig and bigBed formats, the index for a BAM file is in a separate file, which the track hub expects to be in the same directory with the same root name as the BAM file with the addition of a .bai suffix. See the BAM Track Format help page for more information.

VCF - VCF (Variant Call Format) files can contain annotations of single nucleotide variants, insertions/deletions, copy number variants, structural variants and other types of genomic variation. When a VCF file is compressed and indexed using tabix (available here), it can be used as a data track file. Unlike bigWig and bigBed formats, the tabix index is in a separate file, which the track hub expects to be in the same directory with the same root name as the VCF file with the addition of a .tbi suffix. See the VCF Track Format help page for more information.
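Because BAM and VCF tracks depend on a separate index file, it can be useful to confirm that the companion file name is what the hub expects before uploading. The sketch below derives the expected index name from the data file name. The function name is hypothetical, and the ".vcf.gz" extension for tabix-compressed VCF files is an assumption, not something the text above specifies.

```python
def expected_index_name(data_file):
    """Return the index file name expected alongside data_file, or None.

    Hypothetical helper: bigWig and bigBed files carry their own index,
    so None is returned for them and for any unrecognized extension.
    """
    lower = data_file.lower()
    if lower.endswith(".bam"):
        # BAM index: same name with a .bai suffix added.
        return data_file + ".bai"
    if lower.endswith(".vcf.gz"):
        # Assumed extension for a tabix-compressed VCF; index adds .tbi.
        return data_file + ".tbi"
    return None


for name in ("dnaseReads.bam", "variants.vcf.gz", "dnaseSignal.bigWig"):
    print(name, "->", expected_index_name(name))
```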

Step 2. Create the track hub directory
Create a track hub directory in an Internet-accessible location on your web or ftp server. This directory will contain the hub.txt and genomes.txt files that define properties of the track hub and a subdirectory for each of the genome assemblies covered by the hub track data.

Step 3. Place the track data files in an Internet-accessible location
The data files underlying a track in a hub do not have to reside in the track hub directory or even on the same server, but they must be accessible via the Internet. The track hub utility supports Internet protocols such as http://, https://, and ftp://, as well as file paths relative to the hub directory hierarchy. The location of a track file is defined by its bigDataUrl tag in the associated trackDb.txt file (Step 7).
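As an illustration of how a bigDataUrl value maps to a file location, the sketch below resolves a value against the URL of its trackDb.txt file, following the rule given in Step 7 that a value without a protocol is a path relative to trackDb.txt. It uses Python's standard urljoin, which is assumed here to approximate the browser's own resolution; the example.org addresses are hypothetical.

```python
from urllib.parse import urljoin


def resolve_big_data_url(trackdb_url, big_data_url):
    """Return the absolute location of a track data file.

    urljoin returns big_data_url unchanged when it already carries a
    protocol such as http://, https:// or ftp://; otherwise the value is
    treated as a path relative to the trackDb.txt file.
    """
    return urljoin(trackdb_url, big_data_url)


# Hypothetical hub location used only for illustration.
TRACKDB_URL = "http://example.org/myHub/hg19/trackDb.txt"

print(resolve_big_data_url(TRACKDB_URL, "dnaseLiver.bigWig"))
print(resolve_big_data_url(TRACKDB_URL, "ftp://data.example.org/tracks/dnaseLung.bigWig"))
```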

Step 4. Create the hub.txt file
Within the hub directory, create a hub.txt file containing a single stanza with five fields that define properties of the track hub:

hub hub_name
shortLabel hub_short_label
longLabel hub_long_label
genomesFile genomes_filelist
email email_address

hub - a single-word name of the directory containing the track hub files. Not displayed to hub users. This must be the first line in the hub.txt file.

shortLabel - the short name for the track hub. Suggested maximum length is 17 characters. Displayed as the hub name on the Track Hubs page and the track group name on the browser tracks page.

longLabel - a longer descriptive label for the track hub. Suggested maximum length is 80 characters. Displayed in the description field on the Track Hubs page.

genomesFile - the relative path of the genomes.txt file, which contains the list of genome assemblies covered by the track data and the names of their associated configuration files. By convention the genomes.txt file is located in the same directory as the hub.txt file.

email - the contact to whom questions regarding the track hub should be directed.

Example 2: Sample hub.txt file defining attributes for the track hub shown in Example 1.

hub UCSCHub
shortLabel UCSC Hub
longLabel UCSC Genome Informatics Hub for human DNase and RNAseq data
genomesFile genomes.txt
email genome@soe.ucsc.edu
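A hub.txt file with one tag per line can be checked against the field descriptions above before it is published. The following sketch is not a replacement for UCSC's own checking tools: the function names are hypothetical, and the parser assumes each line holds a tag followed by a space and its value.

```python
REQUIRED_HUB_TAGS = ("hub", "shortLabel", "longLabel", "genomesFile", "email")

# The Example 2 settings, written one tag per line.
SAMPLE_HUB_TXT = """\
hub UCSCHub
shortLabel UCSC Hub
longLabel UCSC Genome Informatics Hub for human DNase and RNAseq data
genomesFile genomes.txt
email genome@soe.ucsc.edu
"""


def parse_hub_txt(text):
    """Parse 'tag value' lines into a dict, keeping the file order."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, _, value = line.partition(" ")
        settings[tag] = value.strip()
    return settings


def check_hub(settings):
    """Return a list of problems based on the field descriptions in Step 4."""
    problems = [f"missing tag: {tag}" for tag in REQUIRED_HUB_TAGS if tag not in settings]
    if settings and next(iter(settings)) != "hub":
        problems.append("the hub tag should be the first line")
    if len(settings.get("hub", "").split()) > 1:
        problems.append("hub name should be a single word")
    if len(settings.get("shortLabel", "")) > 17:
        problems.append("shortLabel exceeds the suggested 17 characters")
    if len(settings.get("longLabel", "")) > 80:
        problems.append("longLabel exceeds the suggested 80 characters")
    return problems


print(check_hub(parse_hub_txt(SAMPLE_HUB_TXT)))
```

The label length limits are suggestions in the text above, so a report from this sketch is a hint rather than an error.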

Step 5. Create the genomes.txt file
Create a genomes.txt file within the track hub directory that contains a two-line stanza for each genome assembly that is supported by the hub data. Each stanza shows the location of the trackDb file that defines display properties for each track in that assembly.

genome assembly_database_1
trackDb assembly_1_path/trackDb.txt

genome assembly_database_2
trackDb assembly_2_path/trackDb.txt

genome - a valid UCSC database name. Each stanza must begin with this tag.

trackDb - the relative path of the trackDb file for the assembly designated by the genome tag. By convention, the trackDb file is located in a subdirectory of the hub directory. However, the trackDb tag may also specify a complete URL.

Example 3: Sample genomes.txt file defining attributes for the hub shown in Example 1.

genome hg18
trackDb hg18/trackDb.txt

genome hg19
trackDb hg19/trackDb.txt
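The two-line stanzas of genomes.txt map each assembly name to its trackDb file. A minimal sketch of reading that mapping follows. The function name is hypothetical, and the parser assumes one tag per line with the genome line preceding its trackDb line, as the stanza description above requires.

```python
# The Example 3 content, written as two-line stanzas.
SAMPLE_GENOMES_TXT = """\
genome hg18
trackDb hg18/trackDb.txt

genome hg19
trackDb hg19/trackDb.txt
"""


def parse_genomes_txt(text):
    """Return a dict mapping each genome name to its trackDb path."""
    mapping = {}
    current_genome = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, _, value = line.partition(" ")
        value = value.strip()
        if tag == "genome":
            current_genome = value
        elif tag == "trackDb":
            if current_genome is None:
                raise ValueError("trackDb line found before any genome line")
            mapping[current_genome] = value
    return mapping


print(parse_genomes_txt(SAMPLE_GENOMES_TXT))
```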

Step 6. Create the genome assembly subdirectories
Within the track hub directory, create a subdirectory for each of the genome assemblies that have track data in the hub. The subdirectory names must have a 1:1 correspondence with the database names defined by the genome tags in the genomes.txt file.
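The 1:1 correspondence between subdirectory names and genome tags can be checked on a local copy of the hub before it is uploaded. The sketch below reports genome names that have no matching subdirectory. The function name is hypothetical, and the demonstration builds a throwaway directory rather than touching a real hub.

```python
import tempfile
from pathlib import Path


def missing_assembly_dirs(hub_dir, genomes):
    """Return the genome names that lack a subdirectory under hub_dir."""
    hub_dir = Path(hub_dir)
    return [genome for genome in genomes if not (hub_dir / genome).is_dir()]


# Demonstration on a temporary directory: only hg19/ is created, so hg18
# is reported as missing.
with tempfile.TemporaryDirectory() as tmp:
    hub = Path(tmp) / "myHub"
    (hub / "hg19").mkdir(parents=True)
    demo_missing = missing_assembly_dirs(hub, ["hg18", "hg19"])

print(demo_missing)
```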

Step 7. Create the trackDb.txt files
The trackDb.txt file, which is based on the Genome Browser .ra format, is the most complicated of the text files in the hub directory. It contains a stanza for each of the data files for the given assembly that defines display and configuration properties for the track. If the tracks are grouped into larger entities, such as composite or super-tracks, the larger entities will have a stanza in the file as well.

Each trackDb.txt file must contain at least five tags for each data track:

track track_name
bigDataUrl track_data_URL
shortLabel short_label
longLabel long_label
type track_type

track - the symbolic name of the track. The first character must be a letter, and the remaining characters must be letters, numbers, or underscores ("_"). Each track must have a unique name. This tag pair must be the first entry in the track's stanza.

bigDataUrl - the file name, path, or Web location of the track's data file. The bigDataUrl can be a full URL. If it is not prefaced by a protocol, such as http://, https:// or ftp://, then it is considered to be a path relative to the trackDb.txt file.

shortLabel - the short name for the track displayed in the track list, in the configuration and track settings, and on the details pages. Suggested maximum length is 17 characters.

longLabel - the longer description label for the track that is displayed in the configuration and track settings, and on the details pages. Suggested maximum length is 80 characters.

type - the format of the file specified by bigDataUrl. Must be either bigWig, bigBed, bam or vcfTabix.

Example 4: Sample trackDb.txt file containing two simple tracks.

track dnaseSignal
bigDataUrl dnaseSignal.bigWig
shortLabel DNAse Signal
longLabel Depth of alignments of DNAse reads
type bigWig

track dnaseReads
bigDataUrl dnaseReads.bam
shortLabel DNAse Reads
longLabel DNAse reads mapped with MAQ
type bam
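The tag rules above lend themselves to a quick local check of simple data-track stanzas such as those in Example 4. The sketch below is illustrative only and is not the browser's own parser. It assumes stanzas are separated by blank lines, it compares only the first word of the type value on the assumption that extra parameters may follow it, and it does not handle composite or super-track parent stanzas. All function names are hypothetical.

```python
import re

REQUIRED_TRACK_TAGS = ("track", "bigDataUrl", "shortLabel", "longLabel", "type")
ALLOWED_TYPES = {"bigWig", "bigBed", "bam", "vcfTabix"}
# A letter first, then letters, numbers, or underscores.
TRACK_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

SAMPLE_TRACKDB = """\
track dnaseSignal
bigDataUrl dnaseSignal.bigWig
shortLabel DNAse Signal
longLabel Depth of alignments of DNAse reads
type bigWig

track dnaseReads
bigDataUrl dnaseReads.bam
shortLabel DNAse Reads
longLabel DNAse reads mapped with MAQ
type bam
"""


def parse_trackdb(text):
    """Split text into stanzas (one dict per track) at blank lines."""
    stanzas, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                stanzas.append(current)
                current = {}
            continue
        tag, _, value = line.partition(" ")
        current[tag] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def check_trackdb(stanzas):
    """Return a list of problems found in simple data-track stanzas."""
    problems, seen = [], set()
    for stanza in stanzas:
        name = stanza.get("track", "<unnamed>")
        for tag in REQUIRED_TRACK_TAGS:
            if tag not in stanza:
                problems.append(f"{name}: missing tag {tag}")
        if "track" in stanza:
            if not TRACK_NAME.fullmatch(name):
                problems.append(f"{name}: invalid track name")
            if name in seen:
                problems.append(f"{name}: duplicate track name")
            seen.add(name)
        type_words = stanza.get("type", "").split()
        if type_words and type_words[0] not in ALLOWED_TYPES:
            problems.append(f"{name}: unsupported type {type_words[0]}")
    return problems


print(check_trackdb(parse_trackdb(SAMPLE_TRACKDB)))
```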

Suggestions:

Default subtracks for composite

Default composites within a super-track

hgTrackUi controls

Step 8. Create track description files
Each track in the hub may have an associated description file that describes the track to viewers. The file provides detailed information about the data displayed in the track, including methods used to produce and validate the data, background information, display conventions, acknowledgments, and reference publications. The description file, which must be in HTML format, is inserted into the track configuration page that displays when the user clicks on the track's short label. It also displays on the track details page that is shown when the user clicks on a feature in the track image.

The track description file must have the same name as the symbolic name for the track (defined by the track tag in the trackDb.txt file) with a suffix of .html. For instance, a description file associated with the track named "dnaseSignal" in Example 4 would be named "dnaseSignal.html". The description file must reside in the same directory as the trackDb.txt file.

Both parent and child tracks within a super-track can have their own description files. If the description file is not present, the corresponding sections of the track settings and details pages are left blank. Only one description page can be associated with composite and multiWig tracks; the file name should correspond to the symbolic name of the top-level track in the composite.
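Since a missing description file leaves the corresponding page sections blank rather than raising an error, a quick check of file names can catch omissions. The sketch below applies the naming rule above to a list of track names and a list of file names from the trackDb.txt directory. Both function names are hypothetical.

```python
def description_file_name(track_name):
    """Return the description file name for a track: its name plus .html."""
    return f"{track_name}.html"


def missing_descriptions(track_names, files_present):
    """Return the expected description files absent from files_present."""
    present = set(files_present)
    return [
        description_file_name(name)
        for name in track_names
        if description_file_name(name) not in present
    ]


# File names as they might appear next to trackDb.txt (illustrative only).
files = ["trackDb.txt", "dnaseSignal.html", "dnaseSignal.bigWig", "dnaseReads.bam"]
print(missing_descriptions(["dnaseSignal", "dnaseReads"], files))
```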

Debugging and Updating Track Hubs

It is a good practice to run the command-line utility hubCheck on your track hub when you first bring it online and whenever you make significant changes. This utility checks that the files in the hub are correctly formatted.


hubCheck - Check a track hub for integrity.
usage:
   hubCheck http://yourHost/yourDir/hub.txt
options:
   -udcDir=/dir/to/cache - place to put cache for remote bigBeds and bigWigs

Note that you will have to use the -udcDir option if /tmp/udcCache is not writable on your machine.

The hubCheck program is available from the UCSC downloads server at http://hgdownload.cse.ucsc.edu/admin/exe/.
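For hubs that are updated regularly, the hubCheck call can be wrapped in a small script. The sketch below builds the command line from the usage text above and runs it only if a hubCheck executable is found on the PATH. The function names are hypothetical, placing the -udcDir option before the URL is an assumption about argument order, and the meaning of the exit status and output is not interpreted here.

```python
import shutil
import subprocess


def hubcheck_command(hub_url, udc_dir=None):
    """Build the hubCheck argument list for the given hub.txt URL."""
    cmd = ["hubCheck"]
    if udc_dir:
        # Option taken from the usage text; its position is an assumption.
        cmd.append(f"-udcDir={udc_dir}")
    cmd.append(hub_url)
    return cmd


def run_hubcheck(hub_url, udc_dir=None):
    """Run hubCheck if it is installed; return None when it is not found."""
    if shutil.which("hubCheck") is None:
        return None
    return subprocess.run(
        hubcheck_command(hub_url, udc_dir),
        capture_output=True,
        text=True,
    )


print(hubcheck_command("http://yourHost/yourDir/hub.txt", "/dir/to/cache"))
```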

As part of the track hub mechanism, UCSC caches data from the hub on the local server. The hub utility periodically checks the time stamps on the hub files, and downloads them again only if they have a time stamp newer than the UCSC one. For performance reasons, UCSC checks the time stamps every 300 seconds, which can result in a 5-minute delay between the time a hub file is updated and the time the change appears on the Genome Browser. Hub providers can work around this delay by inserting the CGI variable udcTimeout=1 into the Genome Browser URL, which will reduce the delay to one second. To add this variable, open the Genome Browser tracks page and zoom or scroll the image to display a full browser URL in which the CGI variables are visible. Insert the CGI variable just after the "hgTracks" portion of the URL so that it reads http://genome.ucsc.edu/cgi-bin/hgTracks?udcTimeout=1& (with the remainder of the URL following the ampersand). To restore the default timeout, repeat this process and set the udcTimeout variable to 300, i.e. udcTimeout=300.
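The URL edit described above can be scripted when a hub is being revised repeatedly. The sketch below places udcTimeout first in the query string of an hgTracks URL and replaces any existing udcTimeout value. The function name and the db and position values in the example are illustrative assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_udc_timeout(browser_url, seconds=1):
    """Return browser_url with udcTimeout set as the first CGI variable."""
    parts = urlsplit(browser_url)
    # Drop any existing udcTimeout so the variable appears only once.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "udcTimeout"
    ]
    params.insert(0, ("udcTimeout", str(seconds)))
    # safe=":/" keeps values such as chr1:1-1000 readable.
    return urlunsplit(parts._replace(query=urlencode(params, safe=":/")))


url = "http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr1:1-1000"
fast = with_udc_timeout(url)
print(fast)
print(with_udc_timeout(fast, 300))
```

Calling the function again with 300 restores the default timeout described above.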

Registering a Track Hub with UCSC

If you would like to share your track hub with other Genome Browser users, you can register your hub with UCSC by contacting the Genome Browser technical support mailing list at genome@soe.ucsc.edu. Please include the URL of your hub.txt file in the message. Once registered, your hub will appear as a link on the Public Hubs tab on the Track Hubs page.

Alternatively, you can share your track hub with selected colleagues by providing them with the URL needed to load your hub via the My Hubs tab.