|
| The following files were used to populate the database supporting GVS. |
|
| ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/genotype (June 2008) |
|
NOTE: dbSNP build 128 contains about ten times as many conflicting
genotypes as build 127, although the total number of
genotypes is similar. Most of the conflicts
arise from data submitted in January 2007 under submitter IDs beginning with PGP.
The genotypes in this data set represent about 10% of the total number of genotypes.
About 4% of these PGP genotypes are in conflict with other
genotypes. We have removed some of these PGP genotypes from our 128 and 129 databases because of internal inconsistencies.
About 10,000 of the PGP SNPs have genotypes that are heterozygous for every HapMap individual.
These all-heterozygous genotypes were removed, as were
all genotypes for any SNP with more than 14 inconsistent genotypes.
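
The two removal rules in the note can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the code used to build the GVS database: the function names, the two-letter genotype strings, and the use of "NN" for a missing call are assumptions, and the per-SNP count of inconsistent genotypes is taken as an input rather than computed.

```python
from typing import Mapping

MISSING = "NN"          # assumed code for a missing genotype call
MAX_INCONSISTENT = 14   # threshold stated in the note above


def is_heterozygous(genotype: str) -> bool:
    """True for a two-allele call whose alleles differ, e.g. "AG"."""
    return len(genotype) == 2 and genotype[0] != genotype[1]


def keep_pgp_snp(genotypes: Mapping[str, str], n_inconsistent: int) -> bool:
    """Decide whether the PGP genotypes for one SNP are kept (hypothetical helper).

    genotypes      maps individual ID -> genotype call for this SNP
    n_inconsistent number of inconsistent genotypes recorded for this SNP
    """
    called = [g for g in genotypes.values() if g != MISSING]
    # Rule 1: drop SNPs that are heterozygous in every called individual.
    all_heterozygous = bool(called) and all(is_heterozygous(g) for g in called)
    # Rule 2: drop SNPs with more than 14 inconsistent genotypes.
    return not all_heterozygous and n_inconsistent <= MAX_INCONSISTENT
```

For example, a SNP called "AG" in every individual would be dropped by the first rule, regardless of its inconsistency count.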
|
|
| http://ftp.hapmap.org/genotypes/2008-07_phaseIII/hapmap_format/forward/ (draft release 1, August 2008) |
|
The HapMap3 genotypes for the CEU, CHB, JPT, and YRI populations were entered into the GVS database under their existing dbSNP population IDs. Individuals
who also appear in phases I and II kept the IDs used there. The 7 new populations were assigned the following population IDs: 1001401 for ASW, 1001402
for CHD, 1001403 for GIH, 1001404 for LWK, 1001405 for MEX, 1001406 for MKK, and 1001407 for TSI.
For each new individual, the ID was set to the numerical part of the ID in the HapMap
file + 1,000,000 (e.g. NA10837 was given an individual ID of 1010837). Once these genotypes are available from dbSNP, the population and individual IDs will be
updated.
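
The provisional ID scheme described above can be written out as a short Python sketch. The population IDs and the 1,000,000 offset come from the text; the names `NEW_POPULATION_IDS` and `provisional_individual_id`, and the assumption that a HapMap sample ID is a letter prefix followed by digits, are illustrative only.

```python
import re

# Provisional population IDs for the 7 new HapMap3 populations (from the text).
NEW_POPULATION_IDS = {
    "ASW": 1001401,
    "CHD": 1001402,
    "GIH": 1001403,
    "LWK": 1001404,
    "MEX": 1001405,
    "MKK": 1001406,
    "TSI": 1001407,
}

INDIVIDUAL_ID_OFFSET = 1_000_000


def provisional_individual_id(hapmap_id: str) -> int:
    """Numeric part of a HapMap sample ID plus 1,000,000 (hypothetical helper)."""
    # Assumes IDs look like a letter prefix followed by digits, e.g. "NA10837".
    match = re.fullmatch(r"[A-Za-z]+(\d+)", hapmap_id)
    if match is None:
        raise ValueError(f"unexpected HapMap sample ID: {hapmap_id!r}")
    return int(match.group(1)) + INDIVIDUAL_ID_OFFSET
```

With this sketch, `provisional_individual_id("NA10837")` gives 1010837, matching the example in the text.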
|
|
| ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML (June 2008, mapping, gene function information, population definitions, submitter information) |
|
| ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq (July 2008, genes and coding regions) |
| ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/mapview/seq_gene.md (downloaded May 2008, exons) |
| ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info (July 2008, unofficial names for the gene) |
|
| http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons17way/ (April 6, 2006) |
|
| http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/ (Feb. 3, 2006) |
| chromOut.zip: RepeatMasker |
| chromTrf.zip: Tandem Repeats Finder |
|
| files from Affymetrix Inc. and Illumina Inc. (July, 2008) and from Applied Biosystems (April 2006) |
|
| http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/ (March 2006) |
|
| http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsPanTro2/axtNet/ (July 15, 2006) |
|
| http://projects.tcag.ca/variation/tableview.asp?table=DGV_Content_Summary.txt, files variation.hg18.v5.txt and indel.hg18.v5.txt (July, 2008) |
|
| http://www.hapmap.org/downloads/phasing/2006-07_phaseII/phased/ (July 2006) |
|
The GVS phased genotypes (autosomes only) are those generated by the HapMap group, who ran the PHASE software for
HapMap release #21. This "phased" data set (rather than the "all" data set) only includes sites that segregate in
the population selected (that is, sites that are not monomorphic across all individuals in the population). If a SNP
isn't found in a particular population, it isn't included in the phased genotypes for
that population. Its genotypes for that population will show up as "NN" if multiple populations are combined and at least one of the populations
is not monomorphic there.
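
The "NN" behaviour when populations are combined can be illustrated with a small Python sketch. It is not the GVS implementation: the function name `combine_phased` and the nested-dictionary layout are assumptions chosen for the example.

```python
from typing import Dict, List, Mapping


def combine_phased(
    individuals: Mapping[str, List[str]],
    phased: Mapping[str, Mapping[str, Mapping[str, str]]],
) -> Dict[str, Dict[str, str]]:
    """Merge per-population phased genotypes (hypothetical helper).

    individuals maps population -> list of individual IDs
    phased      maps population -> SNP ID -> individual ID -> genotype,
                holding only the SNPs that segregate in that population
    """
    # A SNP appears in the result if it segregates in at least one population.
    snps = sorted({snp for by_snp in phased.values() for snp in by_snp})
    combined: Dict[str, Dict[str, str]] = {}
    for snp in snps:
        row: Dict[str, str] = {}
        for population, members in individuals.items():
            calls = phased.get(population, {}).get(snp, {})
            for individual in members:
                # Populations lacking this SNP contribute "NN" for every member.
                row[individual] = calls.get(individual, "NN")
        combined[snp] = row
    return combined
```

A SNP that is monomorphic in every selected population is absent from all of the per-population inputs, so it does not appear in the combined result at all.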
|
|