GVS Mouse: Genome Variation Server BETA
An NHLBI Program for Genomic Applications  

How to Use GVS Mouse
To use this site, your browser must have cookies and JavaScript enabled.
There are 4 steps to access GVS information:
    1. select the search type
    2. select the data source
    3. set query and analysis parameters (optional)
    4. choose the results to be displayed
1. Search Type
Choose the search type on the home page. There are 2 categories: "search database" (the most common option) and "input from file".
Within "search database", there are five different methods for querying variations:
A. chromosomal location
B. gene name (HUGO, case insensitive; synonyms are ok)
C. gene ID (from NCBI Entrez Gene)
D. dbSNP rs ID
E. browse
For options A through D, the next page presents a form for the chromosome region, gene, or rs ID. In the cases of B through D, you have the option to extend the chromosome region. In cases B and C, "upstream" is on the 5' end, and "downstream" is on the 3' end of the gene. For case D, "upstream" and "downstream" are relative to the genome assembly (mm9). In the "browse" case E, you can choose a 10-Mb section of a chromosome on the next page, optionally navigate on the resulting map to a region of interest, and select a gene.
When a search by gene name or gene ID is made, there are sometimes alternative transcripts. Preference is given to transcripts with an accession ID beginning with NM_ (NCBI RefSeq). If there is at least one such NM_ transcript, the longest NM_ transcript is chosen. Otherwise, the longest transcript of any kind is chosen. If there is a tie in the number of transcribed bases, the transcript with the largest number of coding bases is selected. The chosen transcript is displayed in the header information when the Text display option is chosen (see below). If you want more control over the genomic region, choose the chromosomal location search type.
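The transcript preference described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the code GVS runs; the Transcript record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    # Hypothetical record; field names are assumptions for illustration.
    accession: str
    transcribed_bases: int
    coding_bases: int


def choose_transcript(transcripts):
    """Pick one transcript using the preference order described in the text."""
    if not transcripts:
        raise ValueError("no transcripts to choose from")
    # Prefer RefSeq NM_ transcripts when at least one exists.
    refseq = [t for t in transcripts if t.accession.startswith("NM_")]
    candidates = refseq or list(transcripts)
    # The longest transcript wins; ties are broken by the number of coding bases.
    return max(candidates, key=lambda t: (t.transcribed_bases, t.coding_bases))
```

Comparing on the pair (transcribed bases, coding bases) expresses the tie-break directly: the second value only matters when the first is equal.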
If you select "input from file" you will be able to upload a file of genotypes for analysis.
2. Data Sources
Querying the Database
Database queries give genotype search results in a table of data sets categorized by the submitter and the population in which the variations were identified, with the populations having the most genotyped polymorphisms for your query appearing at the top of the list.
From the top table select one or more Population/Submitter data sets.

Analyzing Data from Your Own File of Genotypes
Select your genotype file ("Choose File"). The file must have one line for each genotype, each with 4 white-space-separated values:
    (a) the position (or other identifying string) of the variation
    (b) the sample ID
    (c) the first allele
    (d) the second allele
An unknown allele should be indicated as "N". Any header line must begin with "#". See the "File Input Example" page for a sample file. If your genotypes are in an Excel spreadsheet with these 4 columns, saving it as "Text (Tab delimited)" should produce a usable file.
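To check a file before uploading it, the format above can be parsed with a short Python sketch. The function name parse_genotypes is hypothetical, and skipping blank lines is an assumption that the description does not state.

```python
def parse_genotypes(lines):
    """Yield (variation_id, sample_id, allele1, allele2) tuples from input lines."""
    for line_no, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Header lines start with "#"; blank lines are skipped (an assumption).
        if not stripped or stripped.startswith("#"):
            continue
        # split() with no argument handles tabs and runs of spaces alike.
        fields = stripped.split()
        if len(fields) != 4:
            raise ValueError(
                f"line {line_no}: expected 4 values, found {len(fields)}"
            )
        yield tuple(fields)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = ["# position sample allele1 allele2", "1000\tS1\tA\tG", "1000 S2 N N"]
    for record in parse_genotypes(sample):
        print(record)
```

Because the fields are split on any white space, a tab-delimited export from a spreadsheet and a hand-written space-separated file are read the same way.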
3. Parameters for Query And Analysis
Merging Data Sets
Merge Samples and Variations:
A. common samples with combined variations: genotypes will be output for the samples common to all selected data sets and the combined variations from all selected data sets.
B. combined samples with common variations: genotypes will be output for the variations common to all selected data sets and the combined samples from all selected data sets.
C. combined samples with combined variations: genotypes will be output for the combined variations and combined samples from all selected data sets.
See the "Merging Populations" page for details.
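The three merge options amount to taking intersections ("common") or unions ("combined") of the sample and variation sets. The Python sketch below shows only which samples and variations would appear in the output. The function name and the (samples, variations) input layout are assumptions, and how GVS reconciles the genotypes themselves is not covered here.

```python
def merged_axes(data_sets, mode):
    """Return (samples, variations) to be output for merge option 'A', 'B', or 'C'.

    data_sets is a list of (samples, variations) pairs, each an iterable of IDs.
    """
    if not data_sets:
        raise ValueError("at least one data set is required")
    sample_sets = [set(samples) for samples, _ in data_sets]
    variation_sets = [set(variations) for _, variations in data_sets]

    def common(sets):
        return set.intersection(*sets)

    def combined(sets):
        return set.union(*sets)

    if mode == "A":  # common samples, combined variations
        return common(sample_sets), combined(variation_sets)
    if mode == "B":  # combined samples, common variations
        return combined(sample_sets), common(variation_sets)
    if mode == "C":  # combined samples, combined variations
        return combined(sample_sets), combined(variation_sets)
    raise ValueError(f"unknown merge mode: {mode!r}")
```

With two data sets that share only some samples, option A narrows the sample list while widening the variation list, and option B does the reverse.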
Data Output and Display
Output SNPs By: type of identifier for the variation
rs ID or Position are the choices for data from the GVS database. rs ID is the dbSNP reference ID for a SNP, based on dbSNP build 128 (October 2007); Position is the chromosome location mapped to the mouse genome reference sequence, based on NCBI build 37. Under Position there are two choices that affect only the visual genotype graph: "Position in graph" and "rs ID and Position in graph". In the latter case, both values are displayed in the graph. Whether rs ID or Position is chosen, the variations in the graphs are shown in order of chromosome position (unless clustered).
SNP ID in File is the only choice if you are loading your own genotypes from a file. The first column in your white-space-separated input file will be treated as the variation identifier (it need not be a position; any unique identifier will do).
Display SNPs By: a format for variation and genotype results
The Table/Image option prompts for a choice of table or graphical format. The table provides a number of links to other sites.

The Text option presents space-delimited results. The output can be saved to an ASCII file and is designed to be easily parsed for further computer analysis.

The Custom-Text option allows further choices of file format and annotation.
For genotype output, there is a choice of "prettybase" or Haploview formats, or download of a tarball containing both. For Haploview, two files must be generated: one for the genotypes and one for the marker information. In the marker information file, the first column is a SNP identification string and the second is the SNP position. For database searches, the identification string is the rs ID; for file input, it is set to the position. Trialleles are included in "prettybase" output but excluded from Haploview output (Haploview does not allow trialleles): for Haploview, the least frequent allele is determined, and the genotype of any individual having that allele is set to NN. In Haploview output, SNP alleles are the A, C, G, T bases, and indels are coded as 1 for deletion and 2 for insertion.
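The triallele handling for Haploview output can be illustrated with a small Python sketch. It follows the rule as described (find the least frequent allele, then set genotypes carrying it to NN) but is not the GVS implementation. The function name is hypothetical, unknown "N" alleles are left out of the counts by assumption, and ties between equally rare alleles are resolved arbitrarily here.

```python
from collections import Counter


def mask_triallelic(genotypes):
    """Return a copy of {sample: (allele1, allele2)} for one variation,
    with genotypes carrying the least frequent allele set to ("N", "N")
    when more than two distinct alleles are present."""
    counts = Counter(
        allele for pair in genotypes.values() for allele in pair if allele != "N"
    )
    if len(counts) <= 2:
        # Biallelic or monomorphic site: nothing to mask.
        return dict(genotypes)
    rarest = min(counts, key=counts.get)
    return {
        sample: (("N", "N") if rarest in pair else pair)
        for sample, pair in genotypes.items()
    }
```

After masking, a three-allele site is left with at most two observed alleles, which is the form Haploview accepts.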
For SNP summary, the output format is the same as that of the text display, but with a choice of annotation columns.
Filtering SNPs
Allele Frequency Cutoff (%): cutoff for filtering variations by minor allele frequency (in percent, range 0 through 50)
No Monomorphic Sites: if turned on, all monomorphic sites will be filtered from the output and analysis
If there are multiple population groups, the frequency and no-monomorphic filters are applied to the merged set of genotypes.
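As a rough illustration of the two filters, the Python sketch below computes a minor allele frequency from the merged genotypes of one variation and applies both settings. Several details are assumptions, since this page does not specify them: unknown "N" alleles are ignored, the second most common allele is treated as the minor allele, and a variation is kept when its frequency is at or above the cutoff. The function names are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency in percent for (allele1, allele2) pairs."""
    counts = Counter(a for pair in genotypes for a in pair if a != "N")
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        # No known alleles, or only one allele observed (monomorphic).
        return 0.0
    second_most_common = counts.most_common(2)[1][1]
    return 100.0 * second_most_common / total


def keep_variation(genotypes, cutoff_percent=0.0, no_monomorphic=False):
    """Apply the frequency cutoff and the no-monomorphic filter to one variation."""
    maf = minor_allele_frequency(genotypes)
    if no_monomorphic and maf == 0.0:
        return False
    return maf >= cutoff_percent
```

With several populations selected, the genotype list passed in would hold the merged genotypes from all of them, matching the statement that the filters apply to the merged set.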
Clustering in Graphic Display
Cluster SNPs: if turned on, variations will be clustered based on the similarities of their genotype patterns in the graphical displays
Cluster Samples: if turned on, samples will be clustered based on the similarities of their genotype patterns in the graphical displays
4. Results to be Displayed
Once the data sets are chosen and the parameters are set, you have a choice of 2 buttons to click (they can be clicked consecutively without re-starting the search).

The first is "display genotypes", which lists the genotypes for all samples and all variations in the data set. If the Table/Image display option is selected, a visual genotype graph can be chosen to show color-coded genotypes. The NCBI Build 37 (C57BL/6J) allele is shown below the genotypes.

The second button "display snp summary" presents a large number of calculated values and annotations for the variations and (for database queries) a map of the chromosome region. The GVS page "SNP Summary Columns" details the quantities displayed.

If "Text" or "Custom-Text" has been chosen, it is possible from some browsers to save the output as a text file. If your browser does not have a save-as-text option (e.g. Mac Safari), you will have to copy and paste. The fields will be space-delimited. If you import the saved file to Excel, it will be necessary to choose "Data/Get External Data/Import Text File" and select "Delimited" and "Space".
Map Information
Maps showing gene and variation locations are available at several locations on this site.
List of All Documentation Pages
About GVS Mouse

Sources of Data for GVS Mouse

OpenHelix GVS Online Tutorial

How To Use GVS Mouse (this page)

Build Notes

SNP Summary Columns

Merging Populations

Navigating the Map

File Input Example

 