GVS: Genome Variation Server
An NHLBI Program for Genomic Applications  

Input Data Files
The following files were used to populate the database supporting GVS.
1. dbSNP genotypes
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/genotype (October 2007)
NOTE: There are about ten times more conflicting genotypes in dbSNP build 128 than in build 127, although the total number of genotypes is similar. Most of the conflicts arise from data submitted in January 2007 with a submitter ID beginning with PGP. The genotypes in this data set represent about 10% of the total number of genotypes, and about 4% of these PGP genotypes conflict with other genotypes. We have removed some of the PGP genotypes because they are internally inconsistent. About 10,000 of the PGP SNPs have genotypes that are heterozygous for every HapMap individual. These all-heterozygous genotypes were removed, as were all genotypes for any SNP that had more than 14 inconsistent genotypes.
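The two removal rules in the note can be illustrated with a minimal Python sketch. This is not the GVS pipeline itself. The function names, the dictionary layout, and the reading of "inconsistent" as "a PGP call that differs from another submitter's call for the same individual" are all assumptions made for illustration. Strand flips and missing calls are ignored.

```python
MAX_CONFLICTS = 14  # threshold quoted in the note above


def is_heterozygous(genotype):
    """True for a two-allele call such as 'AG'; 'AA' and 'NN' are not heterozygous."""
    return len(genotype) == 2 and genotype[0] != genotype[1] and "N" not in genotype


def filter_pgp_snps(pgp_calls, other_calls):
    """Return the PGP SNP ids that survive both filters (hypothetical helper).

    pgp_calls and other_calls map snp_id -> {individual_id: genotype}.
    """
    kept = []
    for snp_id, calls in pgp_calls.items():
        # Rule 1: drop SNPs whose genotypes are heterozygous for every individual.
        if calls and all(is_heterozygous(g) for g in calls.values()):
            continue
        # Rule 2: drop SNPs with more than MAX_CONFLICTS conflicting genotypes.
        # Assumption: a conflict is a PGP call that differs from another
        # submitter's call for the same individual, ignoring allele order.
        reference = other_calls.get(snp_id, {})
        conflicts = sum(
            1
            for individual, genotype in calls.items()
            if individual in reference
            and sorted(genotype) != sorted(reference[individual])
        )
        if conflicts > MAX_CONFLICTS:
            continue
        kept.append(snp_id)
    return kept
```

A SNP with exactly 14 conflicts is kept under this reading, because the note says "more than 14".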
2. dbSNP annotations
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML (October 2007, mapping, gene function information, population definitions, submitter information)
3. NCBI gene files and synonyms
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq (November 2007, genes and coding regions)
ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/mapview/seq_gene.md (downloaded November 2007, exons)
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info (November 2007, unofficial gene names)
4. UCSC conservation scores
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons17way/ (April 6, 2006)
5. repeats
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/ (Feb. 3, 2006)
   chromOut.zip: RepeatMasker
   chromTrf.zip: Tandem Repeats Finder
6. SNPs on chips
Files from Affymetrix Inc. and Illumina Inc. (January 2008) and from Applied Biosystems (April 2006)
7. UCSC sequence files
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/ (March 2006)
8. UCSC chimp alleles
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsPanTro2/axtNet/ (July 15, 2006)
9. copy number variation
http://projects.tcag.ca/variation/tableview.asp?table=DGV_Content_Summary.txt, files variation.hg18.v3.txt and indel.hg18.v3.txt (December 2007)
10. phased genotypes
http://www.hapmap.org/downloads/phasing/2006-07_phaseII/phased/ (July 2006)
The GVS phased genotypes (autosomes only) are those generated by the HapMap group, who ran the PHASE software for HapMap release #21. This "phased" data set (rather than the "all" data set) includes only sites that segregate in the selected population, that is, sites where the genotypes are not monomorphic across all individuals in that population. If a SNP is not found in a particular population, it is not included in the phased genotypes for that population. Such a SNP will show up as "NN" when multiple populations are combined and at least one of those populations is not monomorphic at that site.
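The way "NN" entries arise when populations are combined can be sketched as follows. This is a simplified illustration, not GVS code. The function name, the dictionary layout, and the individual and SNP identifiers are hypothetical. The sketch assumes that "NN" is assigned to the individuals of any population whose phased data lacks the SNP, and it stores genotypes as two-letter strings rather than in the HapMap phased file format.

```python
MISSING = "NN"  # placeholder genotype described in the text above


def combine_phased(members, phased):
    """Merge per-population phased genotypes into one table (hypothetical helper).

    members: population -> list of individual ids in that population.
    phased:  population -> {snp_id: {individual_id: genotype}}, listing only
             the SNPs that segregate in that population.
    Returns snp_id -> {individual_id: genotype}.
    """
    # A SNP appears in the output if at least one population reports it.
    all_snps = sorted({snp for calls in phased.values() for snp in calls})
    combined = {}
    for snp in all_snps:
        row = {}
        for population, individuals in members.items():
            calls = phased.get(population, {}).get(snp, {})
            for individual in individuals:
                # Populations that lack the SNP contribute "NN" for each member.
                row[individual] = calls.get(individual, MISSING)
        combined[snp] = row
    return combined
```

A SNP that is monomorphic in every selected population is absent from all of the inputs, so it never appears in the combined table.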
 