|
| The following files were used to populate the database supporting GVS. |
|
| ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/genotype (October 2007) |
|
NOTE: There are about ten times as many conflicting
genotypes in dbSNP build 128 as in build 127, although the total number of
genotypes is similar. Most of the conflicts
arise from data submitted in January 2007 under a submitter ID beginning with PGP.
The genotypes in this data set represent about 10% of the total number of genotypes,
and about 4% of these PGP genotypes conflict with other
genotypes. We have removed some of the PGP genotypes because of internal inconsistencies.
About 10,000 of the PGP SNPs have genotypes that are heterozygous for every HapMap individual.
These all-heterozygous genotypes were removed, as were
all genotypes for any SNP with more than 14 inconsistent genotypes.
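The two removal rules in the note can be sketched in Python as follows. This is only an illustration, not the code used to build the GVS database: the function name, the dictionary layout, the "A/G" genotype format, and the way conflicts are counted per SNP are all assumptions made for the example. "Heterozygous for every HapMap individual" is read here as "every HapMap individual genotyped for that SNP".

```python
MAX_INCONSISTENT = 14  # threshold stated in the note above


def is_heterozygous(genotype):
    """Return True for a two-allele call such as 'A/G' whose alleles differ.

    The 'allele/allele' string format is an assumption for this sketch.
    """
    alleles = genotype.split("/")
    return len(alleles) == 2 and alleles[0] != alleles[1]


def filter_pgp_genotypes(pgp, conflict_counts, hapmap_individuals):
    """Apply the two removal rules to PGP genotypes (hypothetical layout).

    pgp:                dict of snp_id -> {individual_id: genotype}
    conflict_counts:    dict of snp_id -> number of inconsistent genotypes
    hapmap_individuals: set of individual ids that belong to HapMap

    Returns a dict containing only the SNPs whose genotypes are kept.
    """
    kept = {}
    for snp_id, calls in pgp.items():
        # Rule 2: drop every genotype of a SNP with more than 14 conflicts.
        if conflict_counts.get(snp_id, 0) > MAX_INCONSISTENT:
            continue
        # Rule 1: drop SNPs that are heterozygous in every HapMap individual.
        hapmap_calls = [g for ind, g in calls.items() if ind in hapmap_individuals]
        if hapmap_calls and all(is_heterozygous(g) for g in hapmap_calls):
            continue
        kept[snp_id] = calls
    return kept
```

A SNP with exactly 14 inconsistent genotypes is kept under this reading, since the note says "more than 14".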
|
|
| ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML (October 2007, mapping, gene function information, population definitions, submitter information) |
|
| ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq (November 2007, genes and coding regions) |
| ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/mapview/seq_gene.md (downloaded November 2007, exons) |
| ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info (November 2007, unofficial names for the gene) |
|
| http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons17way/ (April 6, 2006) |
|
| http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/ (Feb. 3, 2006) |
| chromOut.zip: RepeatMasker |
| chromTrf.zip: Tandem Repeats Finder |
|
| files from Affymetrix Inc. and Illumina Inc. (January, 2008) and from Applied Biosystems (April 2006) |
|
| http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/ (March 2006) |
|
| http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsPanTro2/axtNet/ (July 15, 2006) |
|
| http://projects.tcag.ca/variation/tableview.asp?table=DGV_Content_Summary.txt, files variation.hg18.v3.txt and indel.hg18.v3.txt (December, 2007) |
|
| http://www.hapmap.org/downloads/phasing/2006-07_phaseII/phased/ (July 2006) |
|
The GVS phased genotypes (autosomes only) are those generated by the HapMap group, who ran the PHASE software for
HapMap release #21. This "phased" data set (rather than the "all" data set) includes only sites that segregate in
the selected population (that is, sites whose genotypes are not monomorphic across all individuals in the population). If a SNP
is not found in a particular population, it is not included in the phased genotypes for
that population. Such a SNP shows up as "NN" when multiple populations are combined and at least one of those populations
is not monomorphic at that site.
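The "NN" behaviour when populations are combined can be illustrated with a short Python sketch. It is not GVS code: the function name, the nested-dictionary layout, and the two-letter genotype strings are assumptions for the example, and it assumes that "NN" is filled in for the individuals of a population whose phased data lacks the SNP.

```python
MISSING = "NN"  # placeholder genotype named in the text above


def combine_populations(phased_by_pop):
    """Merge per-population phased genotypes into one table (hypothetical layout).

    phased_by_pop: dict of population -> {snp_id: {individual_id: genotype}}

    Returns a dict of snp_id -> {individual_id: genotype}. A SNP appears in
    the result if at least one population includes it; individuals from
    populations that lack the SNP are given "NN".
    """
    # Collect every SNP present in at least one population.
    all_snps = set()
    for snps in phased_by_pop.values():
        all_snps.update(snps)

    # Collect the individuals seen in each population.
    individuals = {
        pop: sorted({ind for calls in snps.values() for ind in calls})
        for pop, snps in phased_by_pop.items()
    }

    combined = {}
    for snp_id in sorted(all_snps):
        row = {}
        for pop, snps in phased_by_pop.items():
            calls = snps.get(snp_id, {})
            for ind in individuals[pop]:
                row[ind] = calls.get(ind, MISSING)
        combined[snp_id] = row
    return combined


# Example with made-up identifiers: rs2 is absent from the YRI phased data.
phased = {
    "CEU": {"rs1": {"c1": "AG", "c2": "AA"}, "rs2": {"c1": "CC", "c2": "CT"}},
    "YRI": {"rs1": {"y1": "GG", "y2": "AG"}},
}
combined = combine_populations(phased)
```

In this example `combined["rs2"]` holds the CEU calls for `c1` and `c2` and "NN" for the two YRI individuals.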
|
|