Batch Genome Variation Server 138
SNP Summary Table Values

SNP base:

location on the chromosome (hg19), 1-based

SNP rs ID:

dbSNP reference SNP identifier

Alleles:

the alternative bases, in order of increasing frequency

Minor Allele:

the allele with the lowest frequency

Minor-Allele Frequency (%):

the minor-allele frequency in percent

Heterozygosity:

the expected fraction of heterozygotes if the population is in Hardy-Weinberg equilibrium, calculated from the minor allele frequency q: 2q(1-q)
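
As an illustration, the expected heterozygosity can be computed directly from the minor-allele frequency. The Python sketch below is not part of GVS; the function name expected_heterozygosity is hypothetical, and it assumes q is given as a fraction rather than as the percent value shown in the table.

```python
def expected_heterozygosity(q):
    """Expected heterozygote fraction 2q(1-q) under Hardy-Weinberg equilibrium.

    q is the minor-allele frequency as a fraction between 0 and 0.5
    (divide the table's percent value by 100 first).
    """
    if not 0.0 <= q <= 0.5:
        raise ValueError("minor-allele frequency must be between 0 and 0.5")
    return 2.0 * q * (1.0 - q)


# A minor-allele frequency of 25% gives an expected heterozygosity of 0.375.
print(expected_heterozygosity(25 / 100))
```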

Hardy-Weinberg Chi-Square:

obtained by summing 3 terms (common homozygous, heterozygous, and rare homozygous), where each term is calculated from the number of individuals in one of the three classes:
(observed number - expected number)^2 / (expected number)
where the observed numbers are the genotype counts, and the expected numbers are the Hardy-Weinberg values p^2 N (common homozygotes), 2pqN (heterozygotes), and q^2 N (rare homozygotes), where p is the major-allele frequency, q is the minor-allele frequency, and N is the number of individuals; p+q=1
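
The three-term sum can be sketched in Python as follows. This is an illustration under stated assumptions, not the GVS implementation: the function name is hypothetical, p and q are estimated from the genotype counts themselves, and a class whose expected number is zero (a monomorphic site) is skipped to avoid dividing by zero.

```python
def hardy_weinberg_chi_square(n_common_hom, n_het, n_rare_hom):
    """Chi-square statistic comparing genotype counts to Hardy-Weinberg values."""
    n = n_common_hom + n_het + n_rare_hom
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Assumption: allele frequencies are estimated from the same genotype counts.
    p = (2 * n_common_hom + n_het) / (2 * n)  # major-allele frequency
    q = 1.0 - p                               # minor-allele frequency

    observed = (n_common_hom, n_het, n_rare_hom)
    expected = (p * p * n, 2 * p * q * n, q * q * n)

    # Sum (observed - expected)^2 / expected over the three genotype classes,
    # skipping any class with an expected number of zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# 30 common homozygotes, 40 heterozygotes, 30 rare homozygotes:
# p = q = 0.5, expected numbers are 25, 50, 25, so the sum is 1 + 2 + 1.
print(hardy_weinberg_chi_square(30, 40, 30))  # 4.0
```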

Genes:

one or more genes for which the SNP is in the transcribed region

Function:

If the SNP has been given a function by dbSNP, that classification is used and "(dbSNP)" is added to the text:
stop-gained or stop-lost (within an exon and translated, non-stop codon changed to stop codon or stop codon changed to non-stop codon)
frameshift-variant (within an exon and translated, insertion or deletion interrupts the reading frame)
cds-indel (within an exon and translated, insertion or deletion keeps the reading frame)
missense (within an exon and translated, protein amino acid change, but not nonsense or frameshift)
splice-donor-variant (two locations at the 5' end of an intron)
splice-acceptor-variant (two locations at the 3' end of an intron)
synonymous-codon (within an exon and translated, no protein amino acid change)
utr-variant-5-prime (within an exon, but not translated, 5' end of the gene)
utr-variant-3-prime (within an exon, but not translated, 3' end of the gene)
upstream-variant-2KB (upstream of the gene)
downstream-variant-500B (downstream of the gene)
intron-variant (between exons)
nc-transcript-variant (transcript variant of a non-coding RNA gene)

If the SNP has not been given a function by dbSNP, the SNP is classified according to the location of the gene and its transcription and coding boundaries, and "(GVS)" is added to the text:
stop-gained or stop-lost (within an exon and translated, codon change to or from a stop codon)
coding-indel (within an exon and translated, variation is an indel, and no attempt is made to determine whether it causes a frameshift)
missense (within an exon and translated, protein amino acid change)
splice-5 or splice-3 (in first two bases or last two bases of an intron)
coding-synonymous (within an exon and translated, no protein amino acid change)
coding-notMod3 (within an exon and translated, number of coding bases is not a multiple of 3, and no attempt is made to rate as synonymous or not)
coding-monomorphic (within an exon and translated, all genotypes in the database are the same, and no attempt is made to rate as synonymous or not)
utr-5 or utr-3 (within an exon, but not translated)
near-gene-5 or near-gene-3 (within 2000 bases of an exon, upstream or downstream of a gene)
intron (between exons)
intergenic (between genes)

In both cases, a SNP can have more than one function if two or more genes overlap or if there is alternative splicing; only one function is reported, the one that appears highest in the relevant list.
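
The "highest in the list" rule can be illustrated with a short Python sketch for the GVS classifications. It is not GVS code: the names GVS_PRIORITY, RANK, and reported_function are hypothetical, and giving the two names in a shared list entry (for example splice-5 and splice-3) the same rank is an assumption, since the list does not order them relative to each other.

```python
# GVS classifications in the order listed above. Names that share a list
# entry are grouped and given the same rank (an assumption).
GVS_PRIORITY = [
    ("stop-gained", "stop-lost"),
    ("coding-indel",),
    ("missense",),
    ("splice-5", "splice-3"),
    ("coding-synonymous",),
    ("coding-notMod3",),
    ("coding-monomorphic",),
    ("utr-5", "utr-3"),
    ("near-gene-5", "near-gene-3"),
    ("intron",),
    ("intergenic",),
]

# Map each classification name to its position in the list (0 = highest).
RANK = {name: i for i, group in enumerate(GVS_PRIORITY) for name in group}


def reported_function(candidates):
    """Return the candidate classification that ranks highest in the list."""
    return min(candidates, key=lambda name: RANK[name])


# A SNP that is intronic in one transcript and missense in another
# would be reported as missense.
print(reported_function(["intron", "missense", "utr-3"]))  # missense
```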

Conservation Score phastCons:

a number between 0 and 1 that describes the degree of sequence conservation among 46 placental mammals; these numbers are downloaded from the UCSC Genome site and are defined as the "posterior probability that the corresponding alignment column was generated by the conserved state of the phylo-HMM, given the model parameters and the multiple alignment" (see UCSC description).

Conservation Score GERP:

the rejected-substitution score from the program GERP, a number between -12.3 and 6.17 that describes the degree of sequence conservation among many mammalian species, with 6.17 being the most conserved (see this manuscript and this website)

Chimp Allele:

Chimp alleles are acquired from UCSC human/chimp alignment files. If the variation does not fall within an alignment block, or if it is an indel, the chimp allele is listed as "unknown". If the variation falls within a gap in the alignment, it is listed as "-". (Note that we do not use the chimp alleles from dbSNP, though ours are the same in most cases.)

Submitter IDs:

one or more SNP identifiers, as assigned by the submitters to dbSNP (comma separated); for now, the list includes all submissions to dbSNP, not just those of the population/submitter combination chosen in the search

GenotypingChipIDs:

In limited cases, dbSNP assigned multiple rs IDs to the same SNP. If there is an "alternate-id" attached to a chip in the GenotypingChipIDs column, it is an ID representing the same SNP as the rs ID listed in the "rsID" column. This alternate ID should be used to access the chip information from the corresponding company.

Identifiers of SNPs that are on one or more whole-genome genotyping chips (comma separated); the chips are as follows:
GVS identifier    chip
A9                Affymetrix Genome-Wide Human SNP Array 6.0
I6Q               Illumina Human610-Quad BeadChip
I10               Illumina Human1M BeadChip (1 million)
I7                Illumina OmniExpress

RepeatMasker:

whether the SNP is in a repeat region; the regions, as identified by the RepeatMasker program, were downloaded in the file chromOut.zip from the UCSC Genome site.

Tandem Repeats Finder:

whether the SNP is in a repeat region; the regions, as identified by the Tandem Repeats Finder program filtered to keep repeats with period of less than or equal to 12, were downloaded in the file chromTrf.zip from the UCSC Genome site.

Copy Number Variation:

whether the SNP is in one or more copy-number-variation regions tabulated by the Centre for Applied Genomics Database of Genomic Variants.

Upstream Flank and Downstream Flank:

Sequence upstream and downstream of a variation (100 bases, not including the variation); from the UCSC hg19 genome sequence; upstream and downstream are relative to the genome assembly, not to the strand of any gene present; no flanks are available for indels at this time (they are listed as "NA")
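
To make the coordinate convention concrete, the Python sketch below extracts flanks around a 1-based position from a sequence string. It is an illustration only: the function name flanks and its parameters are hypothetical, the sequence is assumed to be already loaded as a string in assembly orientation, and truncating a flank at the end of the sequence is an assumption rather than documented GVS behavior.

```python
def flanks(chrom_seq, pos, length=100):
    """Return (upstream, downstream) flanks around a 1-based position.

    The base at pos itself is excluded from both flanks. Flanks are
    shortened if pos lies within `length` bases of either sequence end.
    """
    if not 1 <= pos <= len(chrom_seq):
        raise ValueError("position is outside the sequence")
    i = pos - 1  # convert the 1-based position to a 0-based index
    upstream = chrom_seq[max(0, i - length):i]
    downstream = chrom_seq[i + 1:i + 1 + length]
    return upstream, downstream


# Position 5 of this toy sequence is the first G; with 3-base flanks the
# result is the three bases on each side of it.
print(flanks("AACCGGTTAC", 5, length=3))  # ('ACC', 'GTT')
```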

NumberAlleles, NumberMajorAlleles, NumberMinorAlleles:

the number of alleles measured for the individuals queried (not counting missing data)
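
For a SNP with two alleles, these counts relate to the minor-allele frequency column in a simple way. The Python sketch below is a hedged illustration, not GVS code: the function name is hypothetical, and it assumes that NumberAlleles equals NumberMajorAlleles plus NumberMinorAlleles.

```python
def minor_allele_frequency_percent(number_major_alleles, number_minor_alleles):
    """Minor-allele frequency in percent from the two allele counts.

    Assumes a SNP with two alleles, so the total number of alleles
    measured is the sum of the major and minor counts.
    """
    total = number_major_alleles + number_minor_alleles
    if total == 0:
        raise ValueError("no alleles were measured")
    return 100.0 * number_minor_alleles / total


# 100 individuals without missing data give 200 alleles; 50 copies of the
# minor allele correspond to a minor-allele frequency of 25%.
print(minor_allele_frequency_percent(150, 50))  # 25.0
```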
 