GVS: Genome Variation Server 144

Input Data Files
The following files were used to populate the database supporting GVS.
1. dbSNP genotypes
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b144_GRCh38p2/genotype (June 2015)
NOTE: As with previous dbSNP builds, we removed some genotypes whose submitter ID begins with PGP. For a given rs ID, some of these genotypes have a high number of conflicts (more than 14) with other genotypes, or are heterozygous for every HapMap individual.
2. dbSNP annotations
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b144_GRCh38p2/XML (May 2015; mapping, gene function information, population definitions, submitter information)
3. NCBI gene files and synonyms
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq (May 2015, genes and coding regions)
ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/ref_GRCh38.p2_top_level.gff3 (March 2015, exons)
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info (May 2015, unofficial gene names)
4. GERP conservation scores
The manuscript describing the program Genomic Evolutionary Rate Profiling (GERP) can be found at http://genome.cshlp.org/content/15/7/901.full.
The GERP website is at http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html.
The hg19 GERP rejected-substitution scores were downloaded from this site in September 2011 and lifted over to hg38.
5. Repeats
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/ (January 2014)
   hg38.fa.out: RepeatMasker
   hg38.trf.bed: Tandem Repeats Finder
6. SNPs on chips
Files from Affymetrix Inc. (July 2008) and Illumina Inc. (July 2008 and November 2009)
7. UCSC sequence files
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/ (Dec. 2013)
8. UCSC chimp alleles
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/vsPanTro4/hg38.panTro4.net.axt.gz (May 2014)
9. Phased genotypes
http://www.hapmap.org/downloads/phasing/2006-07_phaseII/phased/ (July 2006)
The GVS phased genotypes (autosomes only) are those generated by the HapMap group, which ran the PHASE software for HapMap release #21. This "phased" data set (as opposed to the "all" data set) includes only sites that segregate in the selected population, that is, sites where the genotypes are not monomorphic across all individuals in that population. If a SNP is not found in a particular population, it is not included in the phased genotypes for that population. When multiple populations are combined and at least one of them is not monomorphic at that site, the SNP appears as "NN" for the population in which it was not found.
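
The segregating-site rule and the "NN" fill described for the phased genotypes can be illustrated with a minimal Python sketch. It is not the GVS or HapMap implementation: the in-memory layout, the function names keep_segregating and combine_populations, the individual IDs (ind1 to ind4), and the rs IDs are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical layout: {snp_id: {individual_id: two-letter genotype}}
Genotypes = Dict[str, Dict[str, str]]


def keep_segregating(genotypes: Genotypes) -> Genotypes:
    """Keep only sites where more than one allele is observed in the population."""
    kept = {}
    for snp_id, calls in genotypes.items():
        alleles = {allele for call in calls.values() for allele in call}
        if len(alleles) > 1:  # site segregates (is not monomorphic)
            kept[snp_id] = calls
    return kept


def combine_populations(phased_by_pop: Dict[str, Genotypes],
                        individuals_by_pop: Dict[str, List[str]]) -> Genotypes:
    """Merge per-population phased sets; absent sites are filled with "NN"."""
    all_snps = sorted({snp for g in phased_by_pop.values() for snp in g})
    combined = {}
    for snp_id in all_snps:
        row = {}
        for pop, genotypes in phased_by_pop.items():
            calls = genotypes.get(snp_id, {})
            for ind in individuals_by_pop[pop]:
                # "NN" marks a site missing from this population's phased set
                row[ind] = calls.get(ind, "NN")
        combined[snp_id] = row
    return combined


# Made-up example: rs1 segregates only in CEU, rs2 only in YRI.
ceu = keep_segregating({"rs1": {"ind1": "AG", "ind2": "AA"},
                        "rs2": {"ind1": "CC", "ind2": "CC"}})
yri = keep_segregating({"rs1": {"ind3": "AA", "ind4": "AA"},
                        "rs2": {"ind3": "CT", "ind4": "CC"}})
combined = combine_populations(
    {"CEU": ceu, "YRI": yri},
    {"CEU": ["ind1", "ind2"], "YRI": ["ind3", "ind4"]},
)
```

In this made-up example, rs1 is kept for CEU but dropped for YRI, so the combined rs1 row shows "NN" for ind3 and ind4. A site that is monomorphic in every selected population never enters the combined set at all.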
