GVS: Genome Variation Server
An NHLBI Program for Genomic Applications  
SNP Summary Table Values

SNP base:

location on the chromosome (hg18), 1-based

SNP rs ID:

dbSNP reference SNP identifier

Alleles:

the alternative bases, in order of increasing frequency

Minor Allele:

the allele with the lowest frequency

Minor-Allele Frequency (%):

the minor-allele frequency in percent
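
The three allele columns above (Alleles, Minor Allele, Minor-Allele Frequency) can be illustrated with a minimal Python sketch. The function name `summarize_alleles`, the dictionary input, and the tie-breaking rule are assumptions made for this example; they are not taken from GVS.

```python
def summarize_alleles(allele_counts):
    """Return (alleles in order of increasing frequency, minor allele, MAF in percent).

    allele_counts is a hypothetical input such as {"A": 30, "G": 70},
    with missing data already excluded.
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    # Sort by count; ties are broken alphabetically (an assumption, not a GVS rule).
    ordered = sorted(allele_counts, key=lambda allele: (allele_counts[allele], allele))
    minor = ordered[0]
    maf_percent = 100.0 * allele_counts[minor] / total
    return ordered, minor, maf_percent
```

For example, `summarize_alleles({"A": 30, "G": 70})` lists A before G, reports A as the minor allele, and gives a minor-allele frequency of 30 percent.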

Heterozygosity:

the expected fraction of heterozygotes if the population is in Hardy-Weinberg equilibrium, calculated from the minor allele frequency q: 2q(1-q)
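
As a small worked illustration of 2q(1-q), the sketch below converts a minor-allele frequency given in percent (as in the column above) to a fraction q and evaluates the formula. The function name `expected_heterozygosity` is hypothetical.

```python
def expected_heterozygosity(maf_percent):
    """Expected heterozygote fraction under Hardy-Weinberg equilibrium: 2q(1-q)."""
    if not 0.0 <= maf_percent <= 50.0:
        raise ValueError("a minor-allele frequency must lie between 0 and 50 percent")
    q = maf_percent / 100.0  # convert percent to a fraction
    return 2.0 * q * (1.0 - q)
```

A minor-allele frequency of 50 percent gives the maximum value, 0.5; a frequency of 30 percent gives approximately 0.42.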

Hardy-Weinberg Chi-Square:

obtained by summing 3 terms (common homozygous, heterozygous, and rare homozygous), where each term is calculated from the number of individuals in one of the three classes:
(observed number - expected number)^2 / expected number
where the observed numbers are just the genotype counts, and the expected numbers are the Hardy-Weinberg values p^2 N (common homozygotes), 2pqN (heterozygotes), and q^2 N (rare homozygotes), where p is the major allele frequency, q is the minor allele frequency, and N is the number of individuals; p+q=1
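
The calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GVS implementation: the function name and argument names are hypothetical, the allele frequencies are estimated from the same genotype counts, and raising an error for a monomorphic site is a choice made for the example.

```python
def hardy_weinberg_chi_square(n_common_hom, n_het, n_rare_hom):
    """Sum (observed - expected)^2 / expected over the three genotype classes."""
    n = n_common_hom + n_het + n_rare_hom
    if n == 0:
        raise ValueError("no genotypes observed")
    # Each individual carries two alleles; q is the minor-allele frequency.
    q = (2 * n_rare_hom + n_het) / (2 * n)
    p = 1.0 - q
    if q == 0.0 or p == 0.0:
        # Expected counts of zero would divide by zero (handling is an assumption).
        raise ValueError("chi-square is undefined for a monomorphic site")
    observed = (n_common_hom, n_het, n_rare_hom)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

With genotype counts of 50, 40, and 10, q is 0.3, the expected counts are 49, 42, and 9, and the statistic is about 0.227.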

Genes:

one or more genes for which the SNP is in the transcribed region

Function:

If the SNP has been given a function by dbSNP, that classification is used and "(dbSNP)" is added to the text:
nonsense (within an exon and translated, amino acid changed to stop codon)
frameshift (within an exon and translated, insertion or deletion interrupts the reading frame)
coding-nonsynonymous (within an exon and translated, protein amino acid change, but not nonsense or frameshift; dbSNP calls this missense)
splice-5 or splice-3 (in first two bases or last two bases of an intron)
coding-synonymous (within an exon and translated, no protein amino acid change)
utr-5 or utr-3 (within an exon, but not translated)
near-gene-5 or near-gene-3 (intergenic, but within 2000 bases of a transcribed region)
intron (between exons)

If the SNP has not been given a function by dbSNP, the SNP is classified according to the location of the gene and its transcription and coding boundaries, and "(GVS)" is added to the text:
coding (within an exon and translated, protein amino acid change unknown)
splice-site (in first two bases or last two bases of an intron)
mrna-utr (within an exon, but not translated)
intron (between exons)
intergenic (between genes)

In both cases, a given SNP can have more than one function if two or more genes overlap or if there is alternative splicing. Only one function is reported: the one that ranks highest in the relevant list above. This dbSNP function is the one used to color-code the SNPs in various places (not the GVS Function described below).
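
The "highest in the list" rule can be sketched as a simple ranking lookup. The two lists below copy the order given above; treating splice-5 before splice-3 (and likewise for the other paired entries, which share a line above) is an assumption, as are the names `DBSNP_FUNCTION_ORDER`, `GVS_FALLBACK_ORDER`, and `reported_function`.

```python
# Order copied from the lists above; earlier entries rank higher.
DBSNP_FUNCTION_ORDER = [
    "nonsense", "frameshift", "coding-nonsynonymous",
    "splice-5", "splice-3", "coding-synonymous",
    "utr-5", "utr-3", "near-gene-5", "near-gene-3", "intron",
]
GVS_FALLBACK_ORDER = ["coding", "splice-site", "mrna-utr", "intron", "intergenic"]


def reported_function(candidates, order):
    """Pick the single function to report when a SNP has several candidates."""
    if not candidates:
        raise ValueError("at least one candidate function is required")
    rank = {name: index for index, name in enumerate(order)}
    unknown = [name for name in candidates if name not in rank]
    if unknown:
        raise ValueError(f"unknown function(s): {unknown}")
    # The smallest index is the highest position in the list.
    return min(candidates, key=rank.__getitem__)
```

For a SNP that is intronic in one transcript and coding-synonymous in another, `reported_function(["intron", "coding-synonymous"], DBSNP_FUNCTION_ORDER)` returns "coding-synonymous".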

Conservation Score:

a number between 0 and 1 that describes the degree of sequence conservation among 17 vertebrate species; these numbers are downloaded from the UCSC Genome site and are defined as the "posterior probability that the corresponding alignment column was generated by the conserved state of the phylo-HMM, given the model parameters and the multiple alignment" (see UCSC description).

Chimp Allele:

Chimp alleles are acquired from UCSC human/chimp alignment files. If the variation does not fall within an alignment block, or if it is an indel, the chimp allele is listed as "unknown". If the variation falls within a gap in the alignment, it is listed as "-". (Note that we do not use the chimp alleles from dbSNP, though ours are the same in most cases.)

Submitter IDs (only available if "Text" or "Custom-Text" is selected):

one or more SNP identifiers, as assigned by the submitters to dbSNP (comma separated); for now, the list includes all submissions to dbSNP, not just those of the population/submitter combination chosen in the search

Genotyping Chip Or Assay Availability (case of "Table/Image"):

In a limited number of cases, dbSNP has assigned more than one rs ID to the same SNP. If there is an "alternate id" listed for a chip in the Genotyping Chip Availability column, it is an ID representing the same SNP as the rs ID listed in the "SNP rs ID" column. This alternate ID should be used to access the chip information from the corresponding company.

Whether the variation is on one or more whole-genome genotyping chips or is available as a TaqMan assay; the chips and assays are as follows:
Affymetrix Mapping 100K Set
Affymetrix Mapping 500K Set
Affymetrix Genome-Wide Human SNP Array 6.0
Illumina Human-1 BeadChip (100K)
Illumina HumanHap300 BeadChip
Illumina HumanHap550 BeadChip
Illumina HumanHap650Y BeadChip
Illumina Human610-Quad BeadChip
Illumina Human1M BeadChip (1 million)
Applied Biosystems Validated TaqMan Assay
Applied Biosystems DME TaqMan Assay (also validated)

GenotypingChipIDs (case of "Text" or "Custom-Text"):

In a limited number of cases, dbSNP has assigned more than one rs ID to the same SNP. If there is an "alternate-id" attached to a chip in the GenotypingChipIDs column, it is an ID representing the same SNP as the rs ID listed in the "rsID" column. This alternate ID should be used to access the chip information from the corresponding company.

Identifiers of SNPs that are on one or more whole-genome genotyping chips or are available as TaqMan assays (comma separated); the chips and assays are as follows:
GVS identifier    chip
A1                Affymetrix Mapping 100K Set
A5                Affymetrix Mapping 500K Set
A9                Affymetrix Genome-Wide Human SNP Array 6.0
I1                Illumina Human-1 BeadChip (100K)
I3.2              Illumina HumanHap300 BeadChip
I5                Illumina HumanHap550 BeadChip
I6                Illumina HumanHap650Y BeadChip
I6Q               Illumina Human610-Quad BeadChip
I10               Illumina Human1M BeadChip (1 million)
ABVAL             Applied Biosystems Validated TaqMan Assay
ABDME             Applied Biosystems DME TaqMan Assay
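
When post-processing text output, the GVS identifiers above can be translated back to chip names with a lookup table. The sketch below is only an illustration: the exact layout of the GenotypingChipIDs column is not specified here, so it assumes the GVS chip identifiers have already been separated from the SNP identifiers. The names `CHIP_NAMES` and `chip_names_for` are hypothetical.

```python
# Mapping copied from the table above.
CHIP_NAMES = {
    "A1": "Affymetrix Mapping 100K Set",
    "A5": "Affymetrix Mapping 500K Set",
    "A9": "Affymetrix Genome-Wide Human SNP Array 6.0",
    "I1": "Illumina Human-1 BeadChip (100K)",
    "I3.2": "Illumina HumanHap300 BeadChip",
    "I5": "Illumina HumanHap550 BeadChip",
    "I6": "Illumina HumanHap650Y BeadChip",
    "I6Q": "Illumina Human610-Quad BeadChip",
    "I10": "Illumina Human1M BeadChip (1 million)",
    "ABVAL": "Applied Biosystems Validated TaqMan Assay",
    "ABDME": "Applied Biosystems DME TaqMan Assay",
}


def chip_names_for(gvs_identifiers):
    """Translate GVS chip identifiers (assumed already extracted) to chip names."""
    names = []
    for identifier in gvs_identifiers:
        key = identifier.strip()
        if key not in CHIP_NAMES:
            raise KeyError(f"unknown GVS chip identifier: {key!r}")
        names.append(CHIP_NAMES[key])
    return names
```

For example, `chip_names_for(["A5", "I6Q"])` returns the Affymetrix Mapping 500K Set and the Illumina Human610-Quad BeadChip.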


RepeatMasker:

whether the SNP is in a repeat region; the regions, as identified by the RepeatMasker program, were downloaded in the file chromOut.zip from the UCSC Genome site.

Tandem Repeats Finder:

whether the SNP is in a repeat region; the regions, as identified by the Tandem Repeats Finder program filtered to keep repeats with period of less than or equal to 12, were downloaded in the file chromTrf.zip from the UCSC Genome site.

Copy Number Variation:

whether the SNP is in one or more copy-number-variation regions tabulated by the Centre for Applied Genomics Database of Genomic Variants.

GVS Function (only available if "Table/Image" or "Custom-Text" is selected):

similar to Function above, but these functions are calculated locally; in general the two will agree; the GVS functions are calculated in advance and stored in the database; they are based on the alleles for all populations and individuals

the SNP is classified according to the location of the gene and its transcription and coding boundaries, and the bases in the coding region are divided into codons (if the number of coding bases is a multiple of 3):
coding-nonsynonymous (within an exon and translated, protein amino acid change)
splice-site (in first two bases or last two bases of an intron)
coding-synonymous (within an exon and translated, no protein amino acid change)
coding-indel (within an exon and translated, variation is an indel, and no attempt is made to rate as synonymous or not)
coding-notMod3 (within an exon and translated, number of coding bases is not a multiple of 3, and no attempt is made to rate as synonymous or not)
coding-monomorphic (within an exon and translated, all genotypes in the database are the same, and no attempt is made to rate as synonymous or not)
mrna-utr (within an exon, but not translated)
intron (between exons)
intergenic (between genes)

a given SNP can have more than one function if two or more genes overlap or if there is alternative splicing; only one function is reported: the one that ranks highest in the list above
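
To make the coding categories above concrete, here is a minimal Python sketch of how a SNP inside a coding region could be rated. It is not the GVS code. It assumes the standard genetic code, a coding sequence and alleles given on the coding strand using only the bases A, C, G, and T, and a 0-based offset into the coding sequence; the order in which the special cases are checked is also an assumption. The name `classify_coding_snp` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(coding_seq, offset, alleles):
    """Rate a variation at a 0-based offset within a coding sequence."""
    distinct = {allele.upper() for allele in alleles}
    # Anything other than a single base is treated as an indel.
    if any(len(allele) != 1 or allele not in BASES for allele in distinct):
        return "coding-indel"
    if len(coding_seq) % 3 != 0:
        return "coding-notMod3"
    if len(distinct) < 2:
        return "coding-monomorphic"
    start = offset - offset % 3          # first base of the affected codon
    codon = coding_seq[start:start + 3].upper()
    position = offset - start            # 0, 1, or 2 within the codon
    amino_acids = {
        CODON_TABLE[codon[:position] + allele + codon[position + 1:]]
        for allele in distinct
    }
    return "coding-synonymous" if len(amino_acids) == 1 else "coding-nonsynonymous"
```

For the coding sequence ATGGCTTAA, a T/C variation at offset 5 (third base of GCT) leaves alanine unchanged and is rated coding-synonymous, while a G/A variation at offset 3 changes alanine to threonine and is rated coding-nonsynonymous.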

Upstream Flank and Downstream Flank:

Sequence upstream and downstream of a variation (not including the variation); from the UCSC hg18 genome sequence; upstream and downstream are relative to the genome assembly, not to the strand of any gene present; no flanks are available for indels at this time (they are listed as "NA"); if "Table/Image" is selected, 25 bases on each side are listed; if "Text" is selected, 100 bases.
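
The flank definition above can be sketched as simple string slicing on the assembly sequence. The function name `flanks` and its parameters are hypothetical, and the chromosome sequence is assumed to be available as a Python string; positions are 1-based, as in the SNP base column.

```python
def flanks(chrom_seq, position, flank_len=25, is_indel=False):
    """Return (upstream, downstream) flanks around a 1-based position.

    The variation itself is excluded. Flanks follow the genome assembly
    orientation, not the strand of any gene. Indels get "NA" for both flanks.
    """
    if is_indel:
        return "NA", "NA"
    if not 1 <= position <= len(chrom_seq):
        raise ValueError("position is outside the sequence")
    index = position - 1  # convert to a 0-based string index
    # Flanks are shorter than flank_len near the ends of the sequence.
    upstream = chrom_seq[max(0, index - flank_len):index]
    downstream = chrom_seq[index + 1:index + 1 + flank_len]
    return upstream, downstream
```

With the toy sequence AACCGGTT, `flanks("AACCGGTT", 4, flank_len=2)` returns "AC" and "GG". Passing `flank_len=100` corresponds to the "Text" setting.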

NumberAlleles, NumberMajorAlleles, NumberMinorAlleles: (only available if "Custom-Text" is selected)

the number of alleles measured for the individuals queried (not counting missing data)
 