To use this site, your browser must have cookies and JavaScript enabled.
There are 4 steps to access GVS information:
1. select the search type
2. select the data source
3. set query and analysis parameters (optional)
4. choose the results to be displayed
|
|
|
Choose the search type on the home page. There are 2 categories: "search database" (the most common option)
and "input from file".
|
|
Within "search database", there are five different methods for querying variations:
A. chromosomal location
B. gene name (HUGO; case-insensitive; synonyms are accepted)
C. gene ID (from NCBI Entrez Gene)
D. dbSNP rs ID
E. browse
|
For options A through D, the next page presents a form for the chromosome region, gene, or rs ID. For options B through D, you can also extend the chromosome region upstream and downstream.
For options B and C, "upstream" is on the 5' end and "downstream" is on the 3' end of the gene. For option D, "upstream" and "downstream" are relative to the genome assembly (mm9).
For the "browse" option E, you choose a 10-Mb section of a chromosome on the next page, optionally navigate on the resulting map to a region of interest, and select a gene.
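The difference between gene-relative extension (options B and C) and assembly-relative extension (option D) can be sketched as follows. This is a minimal illustration, not the GVS implementation: the function name `extend_region`, the `strand` argument, and the coordinates in the usage note are hypothetical.

```python
def extend_region(start, end, upstream, downstream, strand="+"):
    """Return (start, end) after extending a chromosome region.

    strand "+": upstream (5') lies at lower assembly coordinates.
    strand "-": upstream (5') lies at higher assembly coordinates.
    For an rs ID search, pass strand="+" so that the extension follows
    the assembly coordinates directly (an assumption of this sketch).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if strand == "+":
        new_start, new_end = start - upstream, end + downstream
    else:
        # On the minus strand the 5' end of the gene is at the higher coordinate.
        new_start, new_end = start - downstream, end + upstream
    # Chromosome positions are 1-based, so never go below 1.
    return max(1, new_start), new_end
```

For example, `extend_region(1000, 2000, 500, 100, "-")` extends a minus-strand gene by 100 bases at the low-coordinate end and 500 bases at the high-coordinate end.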
|
|
A search by gene name or gene ID sometimes matches a gene with alternative transcripts. Preference is given to transcripts with
an accession ID beginning with NM_ (NCBI RefSeq). If there is at least one NM_ transcript, the longest NM_ transcript is chosen;
otherwise the longest transcript of any kind is chosen. If there is a tie in the number of transcribed bases, the transcript with the largest number of coding bases is selected.
The chosen transcript is displayed in the header information when the Text display option is chosen (see below). If you need more control
over the genomic region, choose the chromosomal location search type.
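The transcript preference described above can be sketched in a few lines. This is only an illustration of the stated rule, not the code used by GVS; the `Transcript` class, its field names, and the accession strings used in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    accession: str          # e.g. an ID starting with "NM_" for NCBI RefSeq
    transcribed_bases: int  # transcript length
    coding_bases: int       # number of coding bases, used to break ties


def choose_transcript(transcripts):
    """Pick one transcript following the rule described in the text."""
    if not transcripts:
        raise ValueError("no transcripts to choose from")
    refseq = [t for t in transcripts if t.accession.startswith("NM_")]
    # Use the NM_ transcripts if there are any, otherwise all transcripts.
    candidates = refseq or list(transcripts)
    # Longest transcript first; ties broken by the number of coding bases.
    return max(candidates, key=lambda t: (t.transcribed_bases, t.coding_bases))
```

With the hypothetical transcripts `Transcript("XM_1", 5000, 900)` and `Transcript("NM_1", 3000, 900)`, the NM_ transcript is chosen even though it is shorter.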
|
|
If you select "input from file" you will be able to upload a file of genotypes for analysis.
|
|
|
Database queries give genotype search results
in a table of data sets categorized by the submitter and the population in which the
variations were identified, with the populations having the most genotyped polymorphisms
for your query appearing at the top of the list.
|
From the top table select one or more Population/Submitter data sets.
|
|
Select your genotype file ("Choose File"). The file must have
one line for each genotype, each with 4 white-space-separated values:
(a) the position (or other identifying string) of the variation
(b) the sample ID
(c) the first allele
(d) the second allele
An unknown allele should be indicated as "N". Any header line must begin with "#".
See the GVS page "File Input Example" for a sample file. If you
have genotypes in an Excel spreadsheet with these 4 columns and save it as "Text (Tab delimited)", the resulting file should work.
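To check a genotype file before uploading it, a small parser along the following lines may help. It is a sketch based only on the format described above; the function name `parse_genotype_lines` is hypothetical, and skipping blank lines is an assumption rather than documented GVS behavior.

```python
def parse_genotype_lines(lines):
    """Parse genotype lines into (variation, sample, allele1, allele2) tuples."""
    genotypes = []
    for number, line in enumerate(lines, start=1):
        # Header lines start with "#"; blank lines are skipped (assumption).
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) != 4:
            raise ValueError(
                f"line {number}: expected 4 values, found {len(fields)}"
            )
        genotypes.append(tuple(fields))
    return genotypes
```

Calling `parse_genotype_lines(open(path))` on a file reports the first line that does not have exactly 4 values.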
|
Merge Samples and Variations: how samples and variations are combined when more than one data set is selected.
A - common samples with combined variations: genotypes will be output for the samples common to all selected data sets and the combined variations from all selected data sets.
B - combined samples with common variations: genotypes will be output for the variations common to all selected data sets and the combined samples from all selected data sets.
C - combined samples with combined variations: genotypes will be output for the combined variations and combined samples from all selected data sets.
See the GVS page "Merging Populations" for details.
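In set terms, "common" is an intersection and "combined" is a union. The sketch below shows which samples and variations each merge option would retain; it is an illustration under that reading, not the GVS implementation, and the function name `merged_axes` and the dictionary layout are hypothetical.

```python
def merged_axes(data_sets, option):
    """Return (samples, variations) retained by merge option "A", "B" or "C".

    data_sets: list of dicts with "samples" and "variations" collections.
    """
    if not data_sets:
        raise ValueError("at least one data set is required")
    sample_sets = [set(d["samples"]) for d in data_sets]
    variation_sets = [set(d["variations"]) for d in data_sets]

    def common(sets):
        return set.intersection(*sets)

    def combined(sets):
        return set.union(*sets)

    if option == "A":   # common samples, combined variations
        return common(sample_sets), combined(variation_sets)
    if option == "B":   # combined samples, common variations
        return combined(sample_sets), common(variation_sets)
    if option == "C":   # combined samples, combined variations
        return combined(sample_sets), combined(variation_sets)
    raise ValueError("option must be 'A', 'B' or 'C'")
```

The sketch deliberately stops at the sample and variation sets, because the text does not say how a genotype reported by more than one data set is resolved.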
|
Output SNPs By: type of identifier for the variation
"rs ID" or "Position" are the choices for data from the GVS database,
where rs ID is the dbSNP reference ID for a SNP based on dbSNP build 128 (October 2007), and
Position is the chromosome location mapped to the mouse genome reference sequence based on NCBI build 37.
Under Position there are two choices that affect only the visual genotype graph: "Position in graph" and "rs ID and Position in graph".
In the latter case, both values are displayed in the graph. With either rs ID or Position, the variations
in the graphs are shown in order of chromosome position (unless clustered).
"SNP ID in File" is the only choice if you are loading your own
genotypes from a file. The first column in your white-space-separated input file will be treated as
the variation identifier (it need not be a position; any unique identifier will do).
|
Display SNPs By: a format for variation and genotype results
The Table/Image option prompts for a choice of table or graphical format.
The table provides a number of links to other sites.
The Text option presents space-delimited results.
The space-delimited output can be saved to an ASCII file and is designed to be easily parsed for further computer analysis.
The Custom-Text option allows further choices of file format and annotation.
|
|
For genotype output, there is a choice of "prettybase" or Haploview formats, or download of a tarball containing both.
For Haploview, two files are generated: one for the genotypes and one for the marker information. In the marker information
file, the first column is a SNP identification string and the second is the SNP position. For database searches, the identification
string is the rs ID. For file input, the identification string is set to the position. Trialleles are included in "prettybase" output
but are excluded in Haploview output (as Haploview does not allow trialleles). For Haploview output, the least frequent allele of a triallelic site is determined, and the
genotype of any individual having that allele is set to NN. In Haploview output, SNP alleles are the A, C, G, T bases, and indels are coded as
1 for deletion and 2 for insertion.
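The triallele handling for Haploview output can be sketched as below. This is an illustration of the rule as stated, not the GVS code; the function name `mask_triallelic` is hypothetical, and two points are assumptions: unknown alleles ("N") are not counted, and a tie for the least frequent allele is broken by the order in which alleles are first seen.

```python
from collections import Counter


def mask_triallelic(genotypes):
    """Set genotypes carrying the rarest allele of a triallelic site to NN.

    genotypes: dict mapping sample ID -> (allele1, allele2) for one site.
    Sites with at most two known alleles are returned unchanged.
    """
    counts = Counter(
        allele
        for pair in genotypes.values()
        for allele in pair
        if allele != "N"  # unknown alleles are ignored (assumption)
    )
    if len(counts) <= 2:
        return dict(genotypes)
    rarest = min(counts, key=counts.get)  # least frequent allele
    return {
        sample: ("N", "N") if rarest in pair else pair
        for sample, pair in genotypes.items()
    }
```

Only one allele is masked per site, which matches the triallelic case described in the text; sites with four alleles would need further handling.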
|
|
For SNP summary, the output format is the
same as that of the text display, but with a choice of annotation columns.
|
|
Allele Frequency Cutoff (%): cutoff for filtering variations by minor allele frequency (in percent, range 0 through 50)
No Monomorphic Sites: if turned on, all monomorphic sites will be filtered from the output and analysis
|
If there are multiple population groups, the frequency and no-monomorphic filters are applied to the merged set of genotypes.
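The two filters can be sketched as follows for one variation, using the genotypes pooled across all selected population groups. This is an illustration, not the GVS implementation. The function names are hypothetical, and several details are assumptions: unknown alleles ("N") are ignored, the minor allele is taken to be the second most common allele, and a variation is kept when its frequency is greater than or equal to the cutoff.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Minor allele frequency in percent for a list of (allele1, allele2)."""
    counts = Counter(a for pair in genotypes for a in pair if a != "N")
    if len(counts) < 2:
        return 0.0  # monomorphic site, or no known alleles
    total = sum(counts.values())
    second_most_common = counts.most_common(2)[1][1]
    return 100.0 * second_most_common / total


def keep_variation(genotypes, cutoff_percent=0.0, no_monomorphic=False):
    """Apply the frequency cutoff and the no-monomorphic filter."""
    maf = minor_allele_frequency(genotypes)
    if no_monomorphic and maf == 0.0:
        return False
    return maf >= cutoff_percent
```

Passing the merged genotypes, rather than filtering each population separately, mirrors the statement that the filters are applied to the merged set.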
|
|
Cluster SNPs: if turned on, variations will be clustered based on the similarities of their genotype patterns in the graphical displays
Cluster Samples: if turned on, samples will be clustered based on the similarities of their genotype patterns in the graphical displays
|
Once the data sets are chosen and the parameters are set, you have a choice of 2 buttons to click (they can be clicked consecutively without restarting the search).
The first, "display genotypes", lists the genotypes for all samples and all variations in the data set.
If Table/Image is selected, a visual genotype graph can be chosen to show color-coded genotypes. The NCBI Build 37 (C57BL/6J) allele is shown below the genotypes.
The second, "display snp summary", presents a large number of calculated values and annotations for the variations and (for database queries) a map of the chromosome region.
The GVS page "SNP Summary Columns" details the quantities displayed.
If "Text" or "Custom-Text" has been chosen, some browsers can save the output as a text file. If your browser does not have a save-as-text option (e.g. Mac Safari), you will
have to copy and paste. The fields will
be space-delimited. To import the saved file into Excel, choose "Data/Get External Data/Import Text File" and select "Delimited" and "Space".
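As an alternative to the Excel import dialog, saved space-delimited output can be converted to comma-separated text, which spreadsheet programs open directly. The sketch below assumes only that fields are separated by runs of white space and that no field contains embedded spaces; the function name `space_delimited_to_csv` is hypothetical.

```python
import csv
import io


def space_delimited_to_csv(text):
    """Convert space-delimited text to CSV text, dropping blank lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        fields = line.split()  # any run of spaces or tabs separates fields
        if fields:
            writer.writerow(fields)
    return out.getvalue()
```

Any header lines in the saved output are split the same way as data lines, so they may need to be checked by hand after conversion.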
|
|
Maps showing gene and variation locations are available at several locations on this site.
|
|
About GVS Mouse
Sources of Data for GVS Mouse
OpenHelix GVS Online Tutorial
How To Use GVS Mouse (this page)
Build Notes
SNP Summary Columns
Merging Populations
Navigating the Map
File Input Example
|