These instructions explain how to submit an alignment to rVISTA. If you are submitting a
pre-calculated alignment, you only need this document. If you wish to align your own sequences, read the
mVISTA instructions first.
TRANSFAC matrices
If you choose to use TRANSFAC matrices, you can select from several options to refine your search. The row of checkboxes lets you select factor groups according to the type of organism being analyzed (vertebrates, plants, nematodes, insects, fungi, or bacteria).
The following cut-off selections are available, as described by TRANSFAC:
Recommended Threshold: The recommended thresholds are calculated automatically by subtracting one standard deviation from the average score of the true binding sites for each matrix.
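The rule above can be illustrated with a minimal sketch. The function name and the input list are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator TRANSFAC uses.

```python
from statistics import mean, stdev


def recommended_threshold(true_site_scores):
    """Average score of the true binding sites minus one standard deviation.

    Assumption: sample standard deviation (statistics.stdev); the source
    does not state whether the sample or population form is used.
    """
    return mean(true_site_scores) - stdev(true_site_scores)


# Hypothetical matrix similarity scores of known binding sites for one matrix.
scores = [0.8, 0.9, 1.0]
threshold = recommended_threshold(scores)
```

Sites scoring below `threshold` would be discarded under this setting.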
Cut-off to minimize false positive matches (minFP):
In order to estimate this cut-off, which will reduce the number of random sites found by Match™, we have applied the algorithm described above to third exon sequences, because these sequences are presumed to contain no biologically relevant TF binding sites. For every matrix, the lowest cut-off for which no match is found in the set of exon sequences is considered to be the minFP cut-off.
When a minFP cut-off is applied for searching a DNA sequence, the algorithm will find a relatively low number of matches per nucleotide. In the output the user will only find putative sites with a good similarity to the weight matrix; however, some known genomic binding sites may not be recognized. This kind of cut-off is useful, for example, for finding the most promising potential binding sites in extended genomic DNA sequences.
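The minFP estimate described above can be sketched as a search over candidate cut-offs. This is only an illustration: the function name, the grid of candidate values, and the background scores are hypothetical, and a "match" is assumed to mean a score greater than or equal to the cut-off.

```python
def min_fp_cutoff(background_scores, candidates):
    """Return the lowest candidate cut-off that yields no match in the
    background (exon) scores, or None if every candidate still matches.

    Assumption: a site matches when its score >= cut-off.
    """
    for cutoff in sorted(candidates):
        if not any(score >= cutoff for score in background_scores):
            return cutoff
    return None


# Hypothetical matrix similarity scores obtained on exon sequences.
exon_scores = [0.62, 0.71, 0.84]
# Hypothetical grid of cut-offs from 0.50 to 1.00 in steps of 0.01.
grid = [i / 100 for i in range(50, 101)]
cutoff = min_fp_cutoff(exon_scores, grid)
```

With these example values the result is the first grid value above the best background score.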
Cut-off to minimize false negative matches (minFN):
We used sets of generated oligonucleotides for estimating the cut-offs to minimize the false negative rate, using actual weight matrices to calculate the probability of a nucleotide occurring at a certain position of a binding site.
For each matrix we applied the Match™ algorithm to these test sequence sets without using any matrix similarity cut-off (the core similarity was set to 0.75). We then set the cut-off to a value that recognizes at least 90% of the oligonucleotides; in other words, we tolerate an error rate of 10%. We call this set of cut-offs the minFN cut-offs.
Applying the minFN cut-offs, the user will find most genomic binding sites, but in this case a high rate of false positives should be taken into account as well. The minFN cut-offs are useful for the detailed analysis of relatively short DNA fragments.
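As a rough sketch of the minFN procedure, the cut-off can be read off the ranked scores of the generated oligonucleotides. The function name and the example scores are hypothetical, and each oligonucleotide is assumed to be represented by a single matrix similarity score.

```python
def min_fn_cutoff(oligo_scores, recognized_percent=90):
    """Highest cut-off that still recognizes at least `recognized_percent`
    percent of the oligonucleotides (a score >= cut-off counts as recognized).
    """
    if not oligo_scores:
        raise ValueError("at least one score is required")
    ranked = sorted(oligo_scores, reverse=True)
    # Ceiling division in integers: number of oligos that must be recognized.
    needed = (recognized_percent * len(ranked) + 99) // 100
    return ranked[needed - 1]


# Hypothetical scores for ten generated oligonucleotides.
scores = [0.95, 0.90, 0.88, 0.85, 0.83, 0.80, 0.78, 0.75, 0.70, 0.50]
cutoff = min_fn_cutoff(scores)
recognized = sum(score >= cutoff for score in scores)
```

Here nine of the ten example oligonucleotides score at or above the returned cut-off, which corresponds to the tolerated 10% error rate.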
Cut-offs for core and matrix similarity:
The matrix similarity is a score that describes the quality of a match between a matrix and an arbitrary part of the input sequences. Analogously, the core similarity denotes the quality of a match between the core sequence of a matrix (i.e. the five most conserved positions within a matrix) and a part of the input sequence. A match has to contain the "core sequence" of a matrix, i.e. the core sequence has to match with a score higher than or equal to the core similarity cut-off. In addition, only those matches which score higher than or equal to the matrix similarity threshold appear in the output.
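The two-stage filter described above might look like the following sketch. The `Match` record and its field names are hypothetical and are not part of rVISTA or TRANSFAC; only the comparison logic follows the text.

```python
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical record for one candidate site.
    position: int
    core_similarity: float
    matrix_similarity: float


def filter_matches(matches, core_cutoff, matrix_cutoff):
    """Keep matches whose core AND matrix similarity both reach their cut-offs."""
    return [
        m for m in matches
        if m.core_similarity >= core_cutoff
        and m.matrix_similarity >= matrix_cutoff
    ]


candidates = [
    Match(position=10, core_similarity=0.95, matrix_similarity=0.90),
    Match(position=42, core_similarity=0.70, matrix_similarity=0.92),  # core too weak
    Match(position=77, core_similarity=0.80, matrix_similarity=0.60),  # matrix too weak
]
reported = filter_matches(candidates, core_cutoff=0.75, matrix_cutoff=0.85)
```

Only the first candidate passes both tests and would appear in the output.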
After you choose a cut-off, you will be presented with a list of matrices to select from. This is the last step in the submission process.
Consensus Sequence
You can specify your own motif to search for using IUPAC one-letter codes. The IUPAC recommendations include letters to represent all possible ambiguities at a single position in the sequence except a gap.
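To show how an IUPAC consensus expands into concrete nucleotides, here is a small sketch that turns a motif into a regular expression and scans a sequence. It searches the forward strand only, the function name and example sequence are hypothetical, and it is not how rVISTA itself performs the search.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_sites(consensus, sequence):
    """Return 0-based start positions of all (possibly overlapping)
    forward-strand matches of an IUPAC consensus in a DNA sequence."""
    body = "".join("[" + IUPAC[code] + "]" for code in consensus.upper())
    # A lookahead makes each match zero-width, so overlapping sites are found.
    pattern = re.compile("(?=" + body + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# "S" stands for C or G, so both TGACTCA and TGAGTCA match.
positions = find_sites("TGASTCA", "ccTGACTCAggTGAGTCAtt")
```

A character outside the table (for example a gap symbol) raises a `KeyError`, which is consistent with the note that the codes do not cover gaps.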
To the right you can see a sample visualization options screen. It allows you to set the length of sequence that will be depicted on a single row of the VISTA graph (1), the width of the picture produced (2), clustering (3 and 4) and which binding sites will be visualized (5).
The visualization features a VISTA graph of the alignment, which identifies regions of high conservation between the two species and provides an annotation, together with tick marks for all the transcription factor binding sites. Green tick marks represent conserved binding sites, red tick marks represent aligned sites, and blue tick marks represent all sites found.