SNP base:
location on the chromosome (hg18), 1-based
SNP rs ID:
dbSNP reference SNP identifier
Alleles:
the alternative bases, in order of increasing frequency
Minor Allele:
the allele with the lowest frequency
Minor-Allele Frequency (%):
the minor-allele frequency in percent
Heterozygosity:
the expected fraction of heterozygotes if the population is in Hardy-Weinberg equilibrium, calculated from
the minor allele frequency q: 2q(1-q)
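As a small illustration of the formula above, the following sketch converts a minor-allele frequency given in percent (as in the Minor-Allele Frequency column) into the expected heterozygosity. The function name `expected_heterozygosity` is hypothetical and is not part of GVS.

```python
def expected_heterozygosity(maf_percent: float) -> float:
    """Expected heterozygote fraction 2q(1-q) under Hardy-Weinberg equilibrium.

    maf_percent is the minor-allele frequency in percent (hypothetical input
    convention chosen to match the Minor-Allele Frequency column).
    """
    q = maf_percent / 100.0  # convert percent to a fraction
    if not 0.0 <= q <= 0.5:
        raise ValueError("minor-allele frequency must lie between 0% and 50%")
    return 2.0 * q * (1.0 - q)


print(expected_heterozygosity(25.0))  # 0.375
```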
Hardy-Weinberg Chi-Square:
obtained by summing 3 terms (common homozygous, heterozygous, and rare homozygous), where each
term is calculated from the number of individuals in one of the three classes:
$$\frac{(\text{observed number} - \text{expected number})^2}{\text{expected number}}$$
where the observed numbers are just the genotype counts, and the expected numbers are the Hardy-Weinberg values
$p^2N$ (common homozygotes), $2pqN$ (heterozygotes), and $q^2N$ (rare homozygotes), where p is the
major allele frequency, q is the minor allele frequency, and N is the number of individuals; p+q=1
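The three-term sum can be sketched as follows. This is a minimal illustration, not the GVS implementation: it assumes that the allele frequencies are estimated from the same genotype counts, and it skips any class whose expected number is zero (both are assumptions, as is the function name).

```python
def hardy_weinberg_chi_square(n_common_hom: int, n_het: int, n_rare_hom: int) -> float:
    """Sum (observed - expected)^2 / expected over the three genotype classes."""
    n = n_common_hom + n_het + n_rare_hom  # N, the number of individuals
    if n == 0:
        raise ValueError("at least one genotyped individual is required")

    # Assumption: q is estimated from the genotype counts themselves.
    q = (2 * n_rare_hom + n_het) / (2 * n)  # minor allele frequency
    p = 1.0 - q                             # major allele frequency

    observed = (n_common_hom, n_het, n_rare_hom)
    expected = (p * p * n, 2 * p * q * n, q * q * n)

    # Assumption: classes with an expected number of zero contribute nothing.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# 50 common homozygotes, 40 heterozygotes, 10 rare homozygotes:
# q = 0.3, so the expected numbers are 49, 42, and 9.
print(round(hardy_weinberg_chi_square(50, 40, 10), 4))
```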
Genes:
one or more genes for which the SNP is in the transcribed region
Function:
If the SNP has been given a function by dbSNP, that classification is used and "(dbSNP)" is added to the text:
| nonsense | (within an exon and translated, amino acid changed to stop codon) |
| frameshift | (within an exon and translated, insertion or deletion interrupts the reading frame) |
| coding-nonsynonymous | (within an exon and translated, protein amino acid change, but not nonsense or frameshift; dbSNP calls this missense) |
| splice-5 or splice-3 | (in first two bases or last two bases of an intron) |
| coding-synonymous | (within an exon and translated, no protein amino acid change) |
| utr-5 or utr-3 | (within an exon, but not translated) |
| near-gene-5 or near-gene-3 | (intergenic, but within 2000 bases of a transcribed region) |
| intron | (between exons) |
If the SNP has not been given a function by dbSNP, the SNP is classified according to the location of the gene
and its transcription and coding boundaries, and "(GVS)" is added to the text:
| coding | (within an exon and translated, protein amino acid change unknown) |
| splice-site | (in first two bases or last two bases of an intron) |
| mrna-utr | (within an exon, but not translated) |
| intron | (between exons) |
| intergenic | (between genes) |
In both cases, a SNP can have more than one function if two or more genes overlap or if there is alternative splicing;
only one function is reported: the one highest in the relevant list above. This dbSNP function, not the GVS Function
described below, is the one used to color-code the SNPs in various places.
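The "highest in the relevant list" rule can be sketched as a rank lookup. The priority lists below are copied from the two tables above; giving classes that share a table row (for example splice-5 and splice-3) the same rank is an assumption, and the names `DBSNP_PRIORITY`, `GVS_PRIORITY`, and `reported_function` are hypothetical.

```python
# Priority lists copied from the tables above, highest priority first.
# Assumption: classes listed together in one table row share a rank.
DBSNP_PRIORITY = [
    ("nonsense",),
    ("frameshift",),
    ("coding-nonsynonymous",),
    ("splice-5", "splice-3"),
    ("coding-synonymous",),
    ("utr-5", "utr-3"),
    ("near-gene-5", "near-gene-3"),
    ("intron",),
]
GVS_PRIORITY = [
    ("coding",),
    ("splice-site",),
    ("mrna-utr",),
    ("intron",),
    ("intergenic",),
]


def reported_function(candidates, priority):
    """Return the candidate function that sits highest in the priority list."""
    if not candidates:
        raise ValueError("at least one candidate function is required")
    ranks = {name: rank for rank, names in enumerate(priority) for name in names}
    unknown = [c for c in candidates if c not in ranks]
    if unknown:
        raise ValueError(f"functions not in the priority list: {unknown}")
    # min() keeps the first candidate when two share a rank.
    return min(candidates, key=ranks.__getitem__)


# A SNP that is intronic in one transcript and synonymous in another:
print(reported_function(["intron", "coding-synonymous", "utr-3"], DBSNP_PRIORITY))
```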
Conservation Score:
a number between 0 and 1 that describes the degree of sequence conservation among 17 vertebrate species; these numbers are downloaded from
the UCSC Genome site
and are defined as the "posterior probability that the corresponding alignment column was generated by the conserved state
of the phylo-HMM, given the model parameters and the multiple alignment"
(see UCSC description).
Chimp Allele:
Chimp alleles are acquired from UCSC human/chimp alignment files.
If the variation does not fall within an alignment block, or if it is an indel, the chimp allele is listed as "unknown". If the variation
falls within a gap in the alignment, it is listed as "-". (Note that we
do not use the chimp alleles from dbSNP, though ours are the same in most cases.)
Submitter IDs (only available if "Text" or "Custom-Text" is selected):
one or more SNP identifiers, as assigned by the submitters to dbSNP (comma separated); for now, the list includes
all submissions to dbSNP, not just those of the population/submitter combination chosen in the search
Genotyping Chip Or Assay Availability (case of "Table/Image"):
Whether the variation is on one or more whole-genome genotyping chips or is available as a TaqMan assay.
In limited cases, dbSNP assigned multiple rs IDs to the same SNP. If there is an "alternate id" listed for a chip in the Genotyping Chip Availability column, it is an ID representing the same SNP as the rs ID listed in the "SNP rs ID" column. This alternate ID should be used to access the chip information from the corresponding company.
The chips and assays are as follows:
| Affymetrix Mapping 100K Set |
| Affymetrix Mapping 500K Set |
| Affymetrix Genome-Wide Human SNP Array 6.0 |
| Illumina Human-1 BeadChip (100K) |
| Illumina HumanHap300 BeadChip |
| Illumina HumanHap550 BeadChip |
| Illumina HumanHap650Y BeadChip |
| Illumina Human610-Quad BeadChip |
| Illumina Human1M BeadChip (1 million) |
| Applied Biosystems Validated TaqMan Assay |
| Applied Biosystems DME TaqMan Assay (also validated) |
GenotypingChipIDs (case of "Text" or "Custom-Text"):
Identifiers of SNPs that are on one or more whole-genome genotyping chips or are available as TaqMan assays (comma separated).
In limited cases, dbSNP assigned multiple rs IDs to the same SNP. If there is an "alternate-id" attached to a chip in the GenotypingChipIDs column, it is an ID representing the same SNP as the rs ID listed in the "rsID" column. This alternate ID should be used to access the chip information from the corresponding company.
The chips and assays are as follows:
| GVS identifier | chip |
| A1 | Affymetrix Mapping 100K Set |
| A5 | Affymetrix Mapping 500K Set |
| A9 | Affymetrix Genome-Wide Human SNP Array 6.0 |
| I1 | Illumina Human-1 BeadChip (100K) |
| I3.2 | Illumina HumanHap300 BeadChip |
| I5 | Illumina HumanHap550 BeadChip |
| I6 | Illumina HumanHap650Y BeadChip |
| I6Q | Illumina Human610-Quad BeadChip |
| I10 | Illumina Human1M BeadChip (1 million) |
| ABVAL | Applied Biosystems Validated TaqMan Assay |
| ABDME | Applied Biosystems DME TaqMan Assay |
RepeatMasker:
whether the SNP is in a repeat region; the regions, as identified by the RepeatMasker program, were downloaded in the file chromOut.zip from
the UCSC Genome site.
Tandem Repeats Finder:
whether the SNP is in a repeat region; the regions, as identified by the Tandem Repeats Finder program filtered to keep
repeats with period of less than or equal to 12, were downloaded in the file chromTrf.zip from
the UCSC Genome site.
Copy Number Variation:
whether the SNP is in one or more copy-number-variation regions tabulated by the
Centre for Applied Genomics Database of Genomic Variants.
GVS Function (only available if "Table/Image" or "Custom-Text" is selected):
similar to Function above, but these functions are calculated locally; in general the two will agree. The GVS functions are calculated in advance and stored in the database;
they are based on the alleles for all populations and individuals.
The SNP is classified according to the location of the gene
and its transcription and coding boundaries, and the bases in the coding region are divided into codons (if the number of coding bases is a multiple of 3):
| coding-nonsynonymous | (within an exon and translated, protein amino acid change) |
| splice-site | (in first two bases or last two bases of an intron) |
| coding-synonymous | (within an exon and translated, no protein amino acid change) |
| coding-indel | (within an exon and translated, variation is an indel, and no attempt is made to rate as synonymous or not) |
| coding-notMod3 | (within an exon and translated, number of coding bases is not a multiple of 3, and no attempt is made to rate as synonymous or not) |
| coding-monomorphic | (within an exon and translated, all genotypes in the database are the same, and no attempt is made to rate as synonymous or not) |
| mrna-utr | (within an exon, but not translated) |
| intron | (between exons) |
| intergenic | (between genes) |
There can be more than one function for a given SNP if two or more genes overlap or if there is alternative splicing;
only one function is reported: the one highest in the list above.
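The codon-based part of this classification can be illustrated with the sketch below. It is not the GVS code: it assumes the coding sequence is already given in transcript orientation, uses the standard genetic code, and applies the checks in an order chosen only for illustration. The function name, its parameters, and the use of "-" to mark an indel allele are all hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)
}


def classify_coding_snp(coding_seq: str, offset: int, alleles) -> str:
    """Classify a variation inside a coding region.

    coding_seq: coding bases in transcript orientation (assumption)
    offset:     0-based position of the variation within coding_seq
    alleles:    alleles observed across all genotypes, "-" marking an indel
    """
    if any(a == "-" or len(a) != 1 for a in alleles):
        return "coding-indel"
    if len(coding_seq) % 3 != 0:
        return "coding-notMod3"
    if len(set(alleles)) < 2:
        return "coding-monomorphic"

    start = (offset // 3) * 3        # first base of the affected codon
    pos = offset - start             # position of the SNP within that codon
    codon = coding_seq[start:start + 3].upper()

    amino_acids = {
        CODON_TABLE[codon[:pos] + allele.upper() + codon[pos + 1:]]
        for allele in alleles
    }
    return "coding-synonymous" if len(amino_acids) == 1 else "coding-nonsynonymous"


# GCT and GCC both encode alanine, so a T/C SNP at the third codon position
# is synonymous.
print(classify_coding_snp("ATGGCTTAA", 5, ["T", "C"]))
```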
Upstream Flank and Downstream Flank:
Sequence upstream and downstream of a variation (not including the variation), from the UCSC hg18 genome sequence;
upstream and downstream are relative to the genome assembly,
not to the strand of any gene present; no flanks are available for indels at this time (they are listed as "NA");
if "Table/Image" is selected, 25 bases on each side are listed; if "Text" is selected, 100 bases.
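The flank extraction can be sketched with simple string slicing around a 1-based position. The function name and its parameters are hypothetical, and the chromosome sequence is assumed to be available as a Python string in assembly orientation.

```python
def flanks(chrom_seq: str, snp_pos: int, flank_len: int = 25, is_indel: bool = False):
    """Return (upstream, downstream) flanks of a variation.

    chrom_seq: chromosome sequence in assembly orientation (assumption)
    snp_pos:   1-based location of the SNP on the chromosome
    flank_len: 25 for "Table/Image" output, 100 for "Text" output
    """
    if is_indel:
        return "NA", "NA"  # no flanks are available for indels
    if not 1 <= snp_pos <= len(chrom_seq):
        raise ValueError("position is outside the sequence")

    index = snp_pos - 1  # convert the 1-based position to a 0-based index
    upstream = chrom_seq[max(0, index - flank_len):index]
    downstream = chrom_seq[index + 1:index + 1 + flank_len]
    return upstream, downstream


# The SNP is the fifth base ("A"); three bases are taken on each side.
print(flanks("ACGTACGTAC", 5, flank_len=3))  # ('CGT', 'CGT')
```

Near the ends of a sequence the slices simply come back shorter than `flank_len`; how GVS handles that case is not stated in the text.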
NumberAlleles, NumberMajorAlleles, NumberMinorAlleles: (only available if "Custom-Text" is selected)
the number of alleles measured for the individuals queried (not counting missing data)
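One way to picture these three counts is the sketch below. The genotype representation (a pair of alleles per individual, `None` for missing data) and the function name are assumptions made for illustration; the text does not describe how GVS stores genotypes.

```python
from collections import Counter


def allele_counts(genotypes):
    """Return (NumberAlleles, NumberMajorAlleles, NumberMinorAlleles).

    genotypes: one (allele, allele) pair per individual, or None when the
    genotype is missing (hypothetical representation).
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is None:
            continue  # missing data is not counted
        counts.update(genotype)

    ordered = counts.most_common()  # most frequent allele first
    total = sum(counts.values())
    major = ordered[0][1] if ordered else 0
    minor = ordered[-1][1] if len(ordered) > 1 else 0  # 0 if monomorphic
    return total, major, minor


# Five individuals, one with missing data: A is seen 5 times, G 3 times.
print(allele_counts([("A", "A"), ("A", "G"), None, ("G", "G"), ("A", "A")]))  # (8, 5, 3)
```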