Tools for Comparative Genomics


Using the wgVISTA web interface

  1. Input to the server
    1. Required Fields
    2. Optional Fields
  2. Results

  1. Input to the server

    Required Fields

    E-mail address

    We ask for your email address so that we can notify you when the results are ready.

    Sequences

    You can submit your sequences to the server in two ways:

    Upload them from your computer as a plain text file in Fasta format using the "Browse" button. The first word in each header line (a line that starts with the ">" character) will be used as the name of the corresponding sequence. The sequence can be in upper or lower case letters. If the "Soft-masked" box is checked, the lower case nucleotides will be interpreted as repeats; otherwise they will be converted to upper case.

    Sample sequence in Fasta format (you will find more details on the format at the NCBI site):

    >contig1 
    ATCACGCTCTTTGTACACTCCGCCATCTCTCTCT
    CTCTCGAGCAGATCTCTCTCGGGAATATCGACAA
    ...
    >contig2 
    ATCACGCTCTTTGTACACTCCGCCATCTCTCTCT
    CTCTCGAGCAGATCTCTCTCGGGAATATCGACAA
    ...
    >contig3 
    ATCACGCTCTTTGTACACTCCGCCATCTCTCTCT
    CTCTCGAGCAGATCTCTCTCGGGAATATCGACAA
    ...

    Note: at this time we accept only the letters A, C, G, T, N and X in your sequence. Please make sure to submit the sequence as plain text, not as a Word or HTML file.

    Alternatively, you can specify the genome's GenBank accession number (or a list of accession numbers separated by spaces or commas). The accession numbers will be used to automatically retrieve the sequence(s) from the GenBank database and process them on our server.
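As an illustration of the "separated by spaces or commas" rule, the sketch below splits a free-text accession field into individual identifiers. It is not the server's own code: the function name `split_accessions` is hypothetical, and the identifiers in the usage line are placeholders chosen only to show the splitting.

```python
import re


def split_accessions(field: str) -> list[str]:
    """Split an accession field on commas and/or whitespace.

    Hypothetical helper; it only mirrors the separator rule described
    in the text and does not check that an accession exists in GenBank.
    """
    tokens = re.split(r"[,\s]+", field.strip())
    # Drop empty strings left by leading or trailing separators.
    return [token for token in tokens if token]


# Placeholder identifiers, mixing both separators.
print(split_accessions("ACC0001, ACC0002 ACC0003"))
```

Splitting on runs of separators means that "a, b" and "a,b" and "a b" are all treated the same way, which is the most forgiving reading of the rule.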

    Note: In both cases, the size of the genome should not exceed 10 Megabases.
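Before uploading a file, it can be useful to check it locally against the rules above. The following is a minimal sketch, not the server's validation code. It assumes that "10 Megabases" means 10,000,000 bases in total and that sequence names must be unique; both are assumptions, and the names `read_fasta`, `check_submission`, `ALLOWED` and `MAX_BASES` are hypothetical.

```python
ALLOWED = set("ACGTNX")   # letters accepted by the server, compared in upper case
MAX_BASES = 10_000_000    # assumption: "10 Megabases" read as 10 million bases


def read_fasta(text: str) -> dict[str, str]:
    """Parse Fasta text into {name: sequence}.

    The name is the first word of the header line, as described above.
    """
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            if not words:
                raise ValueError("header line has no sequence name")
            name = words[0]
            if name in records:
                # Assumption: duplicate names are treated as an error here.
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first header line")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def check_submission(records: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, seq in records.items():
        bad = sorted(set(seq.upper()) - ALLOWED)
        if bad:
            problems.append(f"{name}: unexpected characters {''.join(bad)}")
    total = sum(len(seq) for seq in records.values())
    if total > MAX_BASES:
        problems.append(f"total length {total} exceeds {MAX_BASES} bases")
    return problems


sample = ">contig1 first example\nATCACG\nctctcg\n>contig2\nATCRYG\n"
print(check_submission(read_fasta(sample)))
```

Lower case letters are accepted by the check because the server allows them; only characters outside the accepted set, such as IUPAC ambiguity codes other than N, are reported.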

    Optional Fields

    These options allow you to customize your VISTA analysis. You can choose to use translated anchoring in Shuffle-LAGAN, which can improve the alignment of distant species, and specify the RankVISTA probability threshold. You can also use independently obtained gene annotations, select an appropriate repeat-masking option, and give specific names to the analyzed genomes. If you do not fill in these additional options, we will use their default values.

    For each genome you can select:

    Name

    Select names for your genomes. We suggest that you use something meaningful, such as the name of an organism, the number of your experiment, or your database identifier. When you use a GenBank identifier to input your sequence, we will use it as the name of the genome by default.

    Annotation

    If a gene annotation of the sequence is available, you can submit it in a plain text file to be displayed on the plot.

    Each line in the file should have five tab-separated columns:
    1) gene name;
    2) sequence name;
    3) strand ("+" or "-");
    4) gene start;
    5) gene end.

    For example:

    gene1	contig1	+	10	100 
    gene2	contig1	-	1000	1200 
    gene3	contig2	+	1100	2100 
    gene4	contig2	-	4100	5100 

    If you leave the "Annotation" field empty, the program will automatically retrieve gene annotations from GenBank for sequences specified by their GenBank accession numbers.
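The five-column annotation format described above can be checked before upload with a short script. This is a sketch under stated assumptions rather than the parser the server uses: the names `Gene` and `parse_annotation` are hypothetical, and the rule that the gene start must not exceed the gene end is an assumption inferred from the example lines, not something the format description states.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    sequence: str
    strand: str
    start: int
    end: int


def parse_annotation(text: str) -> list[Gene]:
    """Parse tab-separated annotation lines into Gene records."""
    genes = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # ignore blank lines
        fields = raw.strip().split("\t")
        if len(fields) != 5:
            raise ValueError(
                f"line {lineno}: expected 5 tab-separated columns, got {len(fields)}"
            )
        name, sequence, strand, start, end = (f.strip() for f in fields)
        if strand not in ("+", "-"):
            raise ValueError(f"line {lineno}: strand must be '+' or '-'")
        start_i, end_i = int(start), int(end)  # raises ValueError if not numeric
        if start_i > end_i:
            # Assumption: start <= end on both strands, as in the example.
            raise ValueError(f"line {lineno}: gene start is after gene end")
        genes.append(Gene(name, sequence, strand, start_i, end_i))
    return genes


example = "gene1\tcontig1\t+\t10\t100\ngene2\tcontig1\t-\t1000\t1200\n"
for gene in parse_annotation(example):
    print(gene)
```

A common failure this catches is a file in which the tabs were replaced by spaces by a text editor, which leaves each line with a single column.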

    Repeat-masking

    If the "Soft-masked" box is checked, the lower case nucleotides in the submitted sequence will be interpreted as repeats; otherwise they will be converted to upper case.
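The effect of the "Soft-masked" box can be illustrated with a small function that takes a sequence and the state of the box. This only sketches the rule as stated; how the server represents repeats internally is not described here, so the interval representation (0-based, half-open) and the name `interpret_case` are assumptions.

```python
def interpret_case(sequence: str, soft_masked: bool) -> tuple[str, list[tuple[int, int]]]:
    """Return (upper-case sequence, repeat intervals).

    When soft_masked is True, each run of lower case letters is reported
    as a 0-based, half-open (start, end) interval. When it is False, the
    sequence is simply converted to upper case and no repeats are reported.
    """
    repeats: list[tuple[int, int]] = []
    if soft_masked:
        start = None
        for i, base in enumerate(sequence):
            if base.islower():
                if start is None:
                    start = i  # a lower case run begins here
            elif start is not None:
                repeats.append((start, i))  # the run ended just before i
                start = None
        if start is not None:
            repeats.append((start, len(sequence)))  # run reaches the end
    return sequence.upper(), repeats


print(interpret_case("ACgtacGTnn", soft_masked=True))
print(interpret_case("ACgtacGTnn", soft_masked=False))
```

In both cases the returned sequence is the same; only the repeat information differs, which is why the box should be left unchecked for files whose letter case carries no meaning.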

  2. Results

    Several minutes after you submit your sequences, you will receive an email from vista@lbl.gov with web links to the VISTA Browser and to the location from which you can download the raw alignments. The results will be stored on the server for one month. Detailed help and instructions for the VISTA Browser are available at http://pipeline.lbl.gov/vgb2help.shtml