GVS: Genome Variation Server 147

GVS Build Notes

The current GVS version is 12.00, July 1, 2016.

The variation locations are again mapped to the human genome reference sequence of December 2013 (UCSC hg38, NCBI build 38). The dbSNP build is 147.

Build notes for 11.00, November 1, 2015.

The variation locations are again mapped to the human genome reference sequence of December 2013 (UCSC hg38, NCBI build 38). The dbSNP build is 144.

Build notes for 10.00, January 30, 2015.

The variation locations are now mapped to the human genome reference sequence of December 2013 (UCSC hg38, NCBI build 38). The dbSNP build is 141.

PhastCons conservation scores and presence in CNV regions have been discontinued.

Build notes for 9.00, December 24, 2013.

The dbSNP build is now 138. The gene model, copy number variations, and chimp alleles have been updated. The variation locations are still mapped to the human genome reference sequence of February 2009 (UCSC hg19, NCBI build 37).

Build notes for 8.00, December 13, 2012.

The dbSNP build is now 137. This build has an improved set of functions. Our GVS functions are unchanged. The gene model has been updated. The variation locations are still mapped to the human genome reference sequence of February 2009 (UCSC hg19, NCBI build 37).

Build notes for 7.00, December 12, 2011.

The dbSNP build is now 134. The gene models, GERP scores, chimp alleles, and copy number variations have been updated. The variation locations are still mapped to the human genome reference sequence of February 2009 (UCSC hg19, NCBI build 37).

The list of GVS functions has been augmented to add the string "-near-splice" if the variation is in the first two or last two positions in an exon. If you are parsing text files, and want to pick out missense SNPs, for example, it will be necessary to use "contains" instead of "equals" for the strings. The "nonsense" classification has been replaced by "stop-gained" and "stop-lost".
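
For those parsing text files, the following is a minimal Python sketch of the "contains" test described above. The helper name has_function and the sample list of function strings are hypothetical, chosen only for illustration; the actual column layout of your saved files may differ.

```python
def has_function(gvs_function: str, wanted: str) -> bool:
    """Return True if `wanted` occurs anywhere in the GVS function string.

    A substring test matches both "missense" and "missense-near-splice",
    whereas an equality test would match only the first.
    """
    return wanted in gvs_function


# Hypothetical function strings as they might appear in a text file.
functions = ["missense", "missense-near-splice", "coding-synonymous",
             "stop-gained", "intron"]

# "contains": picks up the near-splice variant as well.
missense = [f for f in functions if has_function(f, "missense")]

# "equals": silently drops the near-splice variant.
missense_exact = [f for f in functions if f == "missense"]
```

Here missense holds both missense entries, while missense_exact holds only the plain "missense" string.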

Newer genotyping chips have not yet been added, though some older ones have been deleted.

For more details, see the How-to-Use page and the links at the bottom of that page.

Build notes for 6.01 Beta, July 11, 2010.

More related individuals have been identified for populations outside the HapMap set. In the case of HapMap population 1409 (CEU), there is better identification of individuals sequenced only in HapMap3 (affecting only searches where the "No HapMap 3" box is checked). The Illumina OmniExpress chip has been added.

Build notes for 6.00 Beta, June 4, 2010.

Data for dbSNP build 131 are now served out. The variation locations are now mapped to the human genome reference sequence of February 2009 (UCSC hg19, NCBI build 37).

The default for the "No HapMap 3" box is now false. HapMap 3 data will be included unless this box is checked. To compensate for uneven coverage of SNPs between HapMap 1/2 and HapMap3, the default for "Data Coverage (%) for Tag SNPs" has been reduced to 14, and that for "Data Coverage (%) for Clustering" has been reduced to 12. If you are selecting tagSNPs, but not using a mixture of HapMap 1/2 and HapMap3, you may want to set these values to the previous 85 and 70. (These thresholds determine when a SNP with missing data will be put into a separate tag SNP bin.)

Non-synonymous SNPs are now labeled "missense" on most web pages of the site.

dbSNP build 131 has no splice-site function calls. If we detect a splice-site, we annotate it as splice-5 or splice-3, overriding the dbSNP intron call.

The UCSC phastCons conservation scores are now those for 46 placental mammals.

Build notes for 5.11, February 26, 2010.

GERP conservation scores are now available in the text displays. For the "Text" display, with fixed columns, there is now always an additional column. In both text and tables, the original conservation score is now labeled with "phastCons".

Build notes for 5.10, February 5, 2010.

Conservation scores from the program GERP are now available (though so far in the tables, and not yet in the text displays). See the data sources documentation. Internally, there has been a major upgrade to the application server software (to JBoss 5).

Build notes for 5.09, June 30, 2009.

For LD plots, there is a choice of color, grayscale, or black-and-white (found after clicking the "Show More Parameters" button).

Build notes for 5.08, April 7, 2009.

In the haploviewGenotypes file of Custom-Text output, conflicts (X/X) have been replaced by unknown (N/N), as Haploview treats X alleles as third alleles.

Build notes for 5.07, February 4, 2009.

The default setting for the "No HapMap 3" box (found after clicking the "Show More Parameters" button) is now true. To include HapMap 3 data for populations having a mix of HapMap 1/2 and HapMap 3 (CEU, YRI, HCB, JPT), it is now necessary to remove the check from this box.

Build notes for 5.06, February 2, 2009.

There were only minor changes. The documentation lists of individuals and populations were updated to include HapMap phase 3 entries.

Build notes for 5.05, November 7, 2008.

Further rare cases were identified for the situation described in the build note for 5.02 below, and the code was corrected to accommodate them. In addition, the binning process used to extract tag SNPs (ldSelect) was improved slightly to make a better choice when the search for the largest bin results in a tie. The results should now be independent of the order in which SNPs are presented to the algorithm.

For searches on one rs ID plus an extended region, and for displays by rs ID, a box will be drawn around that ID in the "display linkage disequilibrium" table and in all the graphical displays, to identify the SNP of interest.

Build notes for 5.04, October 10, 2008.

HapMap phase 3 genotypes are available. See the data sources documentation. For populations CEU, CHB, JPT, and YRI the results have been merged with earlier HapMap data. In those cases the HapMap 3 genotypes can be omitted by checking a "No HapMap 3" box (after clicking the "Show More Parameters" button).

Build notes for 5.03, September 19, 2008.

For file input, a ? is accepted as an allele, though it is changed to N in the analysis. There is better detection of, and error messages for, files that are not plain text.

The size of the map has been increased to accommodate displays with large numbers of genes or alternative transcripts.

Build notes for 5.02, September 10, 2008.

A bug that affected the linkage disequilibrium calculation of r2 was fixed. This bug affected the rare instance when the minor-allele frequencies of two SNPs were each very close to 50%, there were no individuals heterozygous for one SNP but not for the other (red-blue or red-yellow in the visual genotype graph), and no individuals homozygous-common (blue-blue) or homozygous-rare (yellow-yellow) for both SNPs. In these cases, the r2 value is now 1.0 rather than zero.

Build notes for 5.01, August 15, 2008.

For EGP candidate-gene searches, the population menu now reflects the choices for the particular gene. If the gene was sequenced on the original anonymous panel (PDR90), the only choice will now be "All".

For file input, with SNP IDs set to chromosome locations, additional annotation has been extended to the Custom-Text option.

For file input, visual-genotype graphics will now recognize floating-point SNP identifiers (e.g. 123.1), and sort them numerically, even if mixed with integral identifiers.
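
As an illustration of the numerical ordering described above, here is a short Python sketch. It assumes every identifier in the list is numeric; the sample identifiers are made up, and this is not the code used by the site.

```python
# Hypothetical SNP identifiers read from an input file as strings,
# mixing integral and floating-point forms.
snp_ids = ["123.1", "45", "123", "7.25"]

# Sorting on the numeric value rather than the text puts "45" before
# "123" and "123" before "123.1"; a plain string sort would not.
sorted_ids = sorted(snp_ids, key=float)
```

A plain sorted(snp_ids) would instead order the strings character by character, placing "123" and "123.1" ahead of "45".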

Build notes for 5.00, August 4, 2008.

The database has been upgraded to dbSNP build 129 (June 2008). The dbSNP function list is now nonsense, frameshift, missense, splice-5, splice-3, coding-synonymous, intron, utr-5, utr-3, near-gene-5, and near-gene-3. We retain the term coding-nonsynonymous for missense, though the set of coding-nonsynonymous SNPs no longer contains nonsense and frameshift SNPs.

There are modest changes in the chip set.

Build notes for 4.05, May 27, 2008.

For file input, with SNP IDs set to chromosome locations, more annotation is available in the SNP summary output.

Build notes for 4.04, May 21, 2008.

For file input, with SNP IDs set to chromosome locations, some annotation is available in the SNP summary output. The haplotype graphics are now available for Internet Explorer browsers.

Build notes for 4.03, May 12, 2008.

For file input, the SNP IDs can be set to chromosome locations (hg18), and rs IDs (if known) will be listed in the SNP summary output. A file-input bug was fixed: non-numerical SNP IDs are again allowed.

Build notes for 4.02, May 2, 2008.

HapMap phased genotypes are available for the 4 HapMap populations (autosomal chromosomes only). On the SNP summary page, our own calculations of SNP function are available as an optional column (see the column documentation). Also on the SNP summary page, the genome locations now have links to the UCSC Genome Browser. Two features have been added to the maps: alternative transcripts and SNP density plots.

Build notes for 4.01, March 18, 2008.

The Custom-Text output for PHASE and Haploview was fixed to handle indels and genotype conflicts better, and to handle file input of genotypes with numerical rather than alphabetical genotypes.

Build notes for 4.00, March 6, 2008.

A new database with dbSNP build 128 is in place.

Build notes for 3.11, January 11, 2008.

In the visual genotype graphics of database searches, the SNPs are ordered by chromosome position (if not clustered), regardless of the "Output SNPs By" parameter setting. There is also the option of displaying both the position and the rs ID in these graphs.

Three new columns are available for the SNP summary if Custom-Text is selected: NumberAlleles, NumberMajorAlleles, NumberMinorAlleles.

The copy-number-variation links have been fixed to synchronize with changes at the site of the Centre for Applied Genomics Database of Genomic Variants.

If there are spaces in the names of submitter IDs, these are now replaced by underscores (to maintain space-separation of columns in our output).

A bug was fixed that affected only the display in the maps, for the case of genes with single exons: the 3' UTR is now correctly colored.

Build notes for 3.10, November 29, 2007.

For database searches, individuals may be listed by individual submitter IDs (rather than dbSNP pop:individ) by asking for more parameters with a new button and then selecting "Submitter IDs" from the "Output Individuals By" menu.

Candidate genes (table only so far) now have the same SNP summary information columns as the database SNP summary page.

Copy-number-variation annotation has been added to custom-text displays.

For custom-text PHASE output, the SNP labels are now always position, not rs ID, regardless of the "Output SNPs By" parameter setting, as PHASE requires ordered positions.

There are differences in the header lines (mostly the number of blank lines) for the text pages.

There is a new FAQ page linked from the bottom of the how-to-use page.

If a frequency cutoff is used, it is once again noted on the plot image.

Build notes for 3.09, October 23, 2007.

There are no functional changes. Better memory management should result in fewer server-down episodes.

There were 3 genes (IL9R, SYBL1, and SPRY3) for which the accessions in the NCBI files were listed both for X and Y coordinates. Their ranges ended up in our database with large values. This has been corrected, though for now only by eliminating the Y coordinates. If, in previous annotations, the gene list for SNPs on the X chromosome contained these 3 genes, this was an error.

Build notes for 3.08, October 5, 2007.

It is now possible to display the linkage disequilibrium between a search SNP and other SNPs nearby. The steps are as follows:
(1) select "dbSNP rsID" on the home page
(2) enter the rs ID of a SNP, and extend the chromosome region upstream and/or downstream to define the region of nearby SNPs (large enough to contain a number of other SNPs)
(3) set "Display SNPs By" to Custom-Text
(4) click on the "display linkage disequilibrium" button
(5) select the radio button "SNPs paired with ..."
(6) choose an r2 threshold and select annotation
(7) click the submit button

At the bottom of the "How To Use" page there are now links to our database population and individual content.

The setting of the "No Monomorphic Sites" checkbox is now displayed in the output header lines.

Build notes for 3.07, September 27, 2007.

In rare cases, and only when there is a large amount of missing data for some SNPs, the r2 linkage disequilibrium value was undefined (a strange character would have appeared in the LD table). This has been corrected. Tag SNP selection would only have been distorted if the "Data Coverage" parameters were set very low (much lower than the default values of 85 and 70).

Build notes for 3.06, September 13, 2007.

Use of the frequency cutoff has changed slightly. Because of rounding errors, the cutoff had been applied differently depending on the type of output. There is now a consistent cutoff used site-wide. For a given SNP, the minor-allele frequency is calculated and rounded to the nearest integer. When the integer is greater than or equal to the Allele Frequency Cutoff in the parameters form, the SNP is retained. The actual frequency cutoff is thus 0.5% below the Allele Frequency Cutoff. For example, setting the cutoff to 5% results in an actual cutoff of 4.5%.
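
The rule above can be sketched in Python as follows. The function name passes_cutoff is hypothetical, and the sketch assumes that a frequency exactly halfway between two integers rounds up (the note above does not say how such ties are handled).

```python
import math


def passes_cutoff(maf_percent: float, cutoff_percent: int) -> bool:
    """Return True if a SNP is retained under the Allele Frequency Cutoff.

    The minor-allele frequency (in percent) is rounded to the nearest
    integer, and the SNP is kept when that integer is >= the cutoff.
    Assumption: halves round up, so 4.5 becomes 5.
    """
    rounded = math.floor(maf_percent + 0.5)
    return rounded >= cutoff_percent


# With the cutoff set to 5%, a SNP at 4.5% is kept and one at 4.4% is not.
kept = passes_cutoff(4.5, 5)
dropped = passes_cutoff(4.4, 5)
```

The built-in round() is avoided here because it rounds halves to the nearest even integer, which would turn 4.5 into 4.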

Build notes for 3.05, August 16, 2007.

In addition to output by Table/Image and Text, there is now a Custom-Text option with more choices of formats and annotation.

Regions of copy number variation have been added to the list of SNP annotations (not yet available for Custom-Text).

It is now possible to search for genes with lower-case letters in the name (e.g. C7orf26). The gene name entered in the form can be in either upper or lower case.

Build notes for 3.04, August 9, 2007.

For the text version of the MultiPop-TagSelect table, the missing base/rsID column for equivalent SNPs has been added.

Build notes for 3.03, July 23, 2007.

The "Output SNPs By" default has been changed to "RS_ID".

Build notes for 3.02, July 12, 2007.

A base/rsID column has been added to the text version of the MultiPop-TagSelect table.

A chimp-allele column has been added to the text version of the SNP summary table.

For tag SNPs, there is a change in the way that the frequency and no-monomorphic filters are applied when there are multiple population groups (African, European, Asian, Amerindian, Hispanic, or Unknown). Before this build, the genotypes were merged and then the filters were applied. The genotypes were then separated into population groups and tag SNPs were selected for each population group. Now the genotypes are separated into population groups, the filters are applied for each group, and the tag SNPs are selected for each group. (In both cases the tag SNPs from each population are finally submitted to the MultiPop-TagSelect algorithm.) In this new way of handling the filters, a SNP that is monomorphic in one population but not in others is not considered when binning that one population if the no-monomorphic filter is selected. Also, for nonzero frequency cutoffs, a SNP that has a high frequency in one population but not in another is no longer filtered out merely because the merged average is too low to pass the filter. This change mostly affects tag SNP selection when the frequency threshold is set to zero or a very small number and the no-monomorphic filter is selected. In the rare case that after the filtering there is only one population group with genotypes left, the calculation reverts to the single-group case, and the MultiPop-TagSelect algorithm is not invoked.
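
To illustrate the difference between filtering merged genotypes and filtering each population group separately, here is a small Python sketch. The genotype data, the group names used as dictionary keys, and the function minor_allele_freq are all hypothetical; the sketch compares raw percentages and ignores the rounding applied by the site.

```python
from collections import Counter
from typing import Sequence


def minor_allele_freq(genotypes: Sequence[str]) -> float:
    """Minor-allele frequency in percent, ignoring missing (N/N) genotypes.

    Returns 0.0 for a monomorphic SNP or when no genotypes are known.
    """
    alleles = [a for g in genotypes if g != "N/N" for a in g.split("/")]
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0
    # The second most common allele is taken as the minor allele.
    return 100.0 * counts[1][1] / len(alleles)


# One SNP, genotyped in two made-up population groups.
groups = {
    "European": ["A/A", "A/A", "A/A", "A/A"],   # monomorphic
    "African": ["A/G", "A/A", "A/G", "A/A"],    # polymorphic
}
cutoff = 20.0

# Old behavior: merge first, then filter once for all groups.
merged = [g for genos in groups.values() for g in genos]
passes_merged = minor_allele_freq(merged) >= cutoff

# New behavior: filter each group on its own frequency.
passes_by_group = {name: minor_allele_freq(genos) >= cutoff
                   for name, genos in groups.items()}
```

In this example the merged frequency is 12.5%, so the SNP would have been removed for every group under the old scheme; under the new scheme it is retained for the group where its frequency is 25% and removed only for the group where it is monomorphic.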

Build notes for 3.01, June 29, 2007.

A base/rsID column has been added to the table version of the MultiPop-TagSelect table. The final Illumina 1-million-SNP chip data is available. The Affymetrix Genome-Wide Human SNP Array 6.0 chip data has been added.

Build notes for 3.00, June 4, 2007.

A new database with dbSNP build 127 is in place. Gene models (coding regions and exons) are now taken from NCBI files rather than UCSC files. Flanking sequence lengths have been increased from 75 to 100 bases in the text output. The Illumina 650 BeadChip has been added.

Build notes for 2.06, April 24, 2007.

For tag SNPs, SNPs with coverage below the "data coverage for Tag SNPs" cutoff are displayed with brackets around the SNP ID (in order to identify those possibly placed into separate bins just because of low coverage). Because this annotation does not work well when there are monomorphic SNPs, the default monomorphic check box has been set to checked. These brackets are so far only in the table output; text output is in progress.

The chimp (ancestral) alleles have been added to the graphical displays.

In choosing a transcript, the longest NM_ transcript, if available, is chosen; otherwise the longest of any type. If there is a tie in transcript length, the one with the most coding bases is chosen. This brings our choice of transcript into better alignment with the dbSNP functions. The transcript choice is printed out in the text header lines.
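
The selection rule above can be sketched in Python as follows. The Transcript record, its field names, and the example accessions are hypothetical and stand in for whatever the database actually stores.

```python
from typing import NamedTuple, Sequence


class Transcript(NamedTuple):
    accession: str      # e.g. "NM_..." for a curated mRNA
    length: int         # transcript length in bases
    coding_bases: int   # number of coding bases


def choose_transcript(transcripts: Sequence[Transcript]) -> Transcript:
    """Pick the longest NM_ transcript if any exist, else the longest overall.

    Ties in length are broken by the larger number of coding bases.
    Raises ValueError if `transcripts` is empty.
    """
    nm = [t for t in transcripts if t.accession.startswith("NM_")]
    pool = nm if nm else list(transcripts)
    return max(pool, key=lambda t: (t.length, t.coding_bases))


# Made-up candidates: the XM_ entry is longest but an NM_ entry is preferred,
# and the two NM_ entries tie on length.
candidates = [
    Transcript("XM_000001", 5000, 1500),
    Transcript("NM_000002", 3000, 900),
    Transcript("NM_000003", 3000, 1200),
]
chosen = choose_transcript(candidates)
```

Here chosen is the NM_000003 record, since it ties with NM_000002 on length but has more coding bases.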

Applied Biosystems pre-designed SNPs are no longer in the table SNP summary output; they can be found in the text output.

The Illumina 550 BeadChip has been added.

Build notes for 2.05, March 16, 2007.

There are only performance enhancements.

Build notes for 2.04, March 6, 2007.

The major addition is the MultiPop-TagSelect algorithm. If multiple populations are selected, GVS will automatically divide the genotypes into population groups, perform the tag-SNP selection for each group, and feed those tag SNPs to the MultiPop-TagSelect algorithm. See the documentation.

Bold and plain text have been added to the graphical displays and to the tag-SNP tables to indicate whether a variation is in a unique or repeat region.

There are no longer any popup windows. Links are provided to choose tables and graphical displays.

Any table or text entry of N/A has been changed to NA (to help users of the R language).

At the end of the "How to Use GVS" page there is a list of links to all the documentation pages.

Build notes for 2.03, January 4, 2007.

Flanking sequences have been added to the SNP Summary page. For the column SubmitterIDs that appears in the text version of SNP Summary, spaces in the text have been replaced by underscores (so that spaces are used only to separate columns). There are improvements in the download speed. A training tutorial has been added: see the "Online Tutorial" link.

Build notes for 2.02, November 8, 2006.

Transcript, coding, and exon regions have been corrected for the use of base-0 coordinates for start positions in the UCSC knownGene.txt file, which was used to populate our database. This caused an occasional SNP at the edges of features to have the incorrect function (if not annotated by dbSNP).
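
For reference, UCSC table files store the start of a feature as a zero-based position and the end as a one-based position (a half-open interval). A minimal Python sketch of the correction follows; the function names are hypothetical and this is not the code used to populate our database.

```python
from typing import Tuple


def to_one_based(start0: int, end: int) -> Tuple[int, int]:
    """Convert a UCSC-style interval [start0, end) to 1-based inclusive."""
    return start0 + 1, end


def in_feature(position: int, start0: int, end: int) -> bool:
    """True if a 1-based SNP position lies inside the feature."""
    first, last = to_one_based(start0, end)
    return first <= position <= last


# A made-up exon stored as start 999, end 1200 covers bases 1000..1200.
exon = to_one_based(999, 1200)
```

Without the correction, a SNP at position 999 in this example would be assigned to the exon even though the exon begins at base 1000.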

Build notes for 2.01, October 27, 2006.

Chimp alleles have been added to the SNP Summary table.

Candidate gene searches now use NCBI 36 (hg18) locations (same as the database searches).

An infrequent bug, causing a database-connection error message when an indel with an unusual "+" genotype was encountered, was corrected.

Build notes for 2.00, October 14, 2006.

The database now contains the contents of dbSNP build 126. There is no longer data directly from the HapMap site (presumably this data is in build 126 now). For database searches, the variation locations are those of the UCSC Genome Browser for March 2006 (UCSC hg18, NCBI build 36). For candidate gene searches, the locations are still NCBI 35.1 and hg17 (but these will be upgraded in a few weeks).

For the cases of gene name and gene ID searches, the meanings of "upstream" and "downstream" have been changed so that "upstream" is always on the 5' end of the gene, and "downstream" is always on the 3' end, for either strand. Region extensions can now go up to 250 Kbp on each side.
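
The strand-aware meaning of "upstream" and "downstream" can be sketched in Python as follows. The function extended_region and its parameters are hypothetical; coordinates are assumed to be forward-strand positions with start <= end.

```python
from typing import Tuple

MAX_EXTENSION = 250_000  # limit of 250 Kbp on each side


def extended_region(start: int, end: int, strand: str,
                    upstream: int, downstream: int) -> Tuple[int, int]:
    """Extend a gene region so that "upstream" is always the 5' side.

    On the "+" strand the 5' end is at `start`; on the "-" strand it is
    at `end`, so the two extensions swap sides.
    """
    if not (0 <= upstream <= MAX_EXTENSION and 0 <= downstream <= MAX_EXTENSION):
        raise ValueError("extensions must be between 0 and 250,000 bases")
    if strand == "+":
        new_start, new_end = start - upstream, end + downstream
    elif strand == "-":
        new_start, new_end = start - downstream, end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Do not run off the beginning of the chromosome.
    return max(1, new_start), new_end


plus = extended_region(10_000, 20_000, "+", 1_000, 500)
minus = extended_region(10_000, 20_000, "-", 1_000, 500)
```

For the same gene coordinates, the 1,000-base upstream extension is added before the start on the "+" strand and after the end on the "-" strand.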

Picking an mRNA accession number for defining gene and exon regions has been changed so that preference is given to those beginning with NM_.

New features include repeat information for the SNP Summary page, multiple transcript information available from the links in the Function column of the SNP Summary page, and choice of which additional information to display on the SNP Summary page.

Build notes for 1.06, September 12, 2006.

A bug was fixed: when merging data sets for type B (combined samples with common variations), a few genotypes with missing data in one data set (N/N) were reported twice: once as N/N and once as the known genotype in another set.

Build notes for 1.05, July 3, 2006.

A bug was fixed: when merging data sets, a few genotypes were reported as conflicts (X/X), when there was only missing data (N/N) for one set. (This bug was introduced when we switched from lower-case to upper-case letters for the genotypes.)

Build notes for 1.04, June 16, 2006.

A new parameter panel, Merging Data Sets, is added for users to select the way in which multiple data sets should be merged. Please follow the link, "see the detailed description of data merging", on the query result page to get more information regarding data merging.

Build notes for 1.03, May 4, 2006.

The SNP Summary results in both Table and Text formats have been updated. In limited cases, dbSNP assigned multiple rs IDs to the same SNPs. If there is an "alternate id" listed for a chip in the Genotyping Chip Availability column, it is an ID representing the same SNP as the rs ID listed in the "SNP rs ID" column. This alternate ID should be used to access the chip information from the corresponding company.

Build notes for 1.02, April 19, 2006.

The UCSC conservation scores have been updated to those for 17 vertebrates (still hg17). Applied Biosystems TaqMan assay information has been added. There is a new page listing files that were used to populate the database supporting GVS (see link in "About GVS").

Build notes for 1.01, April 12, 2006.

For the maps, clicking on the gene names works again.

Build notes for 1.00, April 4, 2006.

Major additions since the last beta version are as follows:
(1) The algorithm for binning variations by the r2 LD value has been modified to account for missing genotypes. There are two new parameters for selecting tag SNPs: "Data Coverage (%) for Tag SNPs" and "Data Coverage (%) for Clustering". The first is the minimum genotype coverage in percent for a SNP to be a potential tag SNP, and the second is the minimum genotype coverage in percent for a SNP to be potentially clustered with other SNPs. If a SNP falls below these requirements, it will be put into a bin by itself in the tag SNP output.
(2) The "Candidate Genes" data source directly accesses all the genes sequenced in the Seattle SNPs NHLBI Programs for Genomic Applications (PGA) and the NIEHS Environmental Genome Project (EGP) SNPs Program at the University of Washington.
(3) A genotyping-chip column has been added to the SNP summary table.
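
As an illustration of the two data-coverage parameters in item (1), here is a minimal Python sketch. It assumes missing genotypes are written as N/N; the function names and the sample genotypes are hypothetical, and the thresholds of 85 and 70 are used only as example values.

```python
from typing import Sequence


def coverage_percent(genotypes: Sequence[str]) -> float:
    """Percentage of individuals with a known (non-N/N) genotype."""
    if not genotypes:
        return 0.0
    known = sum(1 for g in genotypes if g != "N/N")
    return 100.0 * known / len(genotypes)


def can_be_tag(genotypes: Sequence[str], tag_coverage: float) -> bool:
    """True if coverage meets "Data Coverage (%) for Tag SNPs"."""
    return coverage_percent(genotypes) >= tag_coverage


def can_be_clustered(genotypes: Sequence[str], cluster_coverage: float) -> bool:
    """True if coverage meets "Data Coverage (%) for Clustering"."""
    return coverage_percent(genotypes) >= cluster_coverage


# A made-up SNP with one missing genotype out of four: 75% coverage.
snp = ["A/A", "A/G", "N/N", "G/G"]
tag_ok = can_be_tag(snp, 85)
cluster_ok = can_be_clustered(snp, 70)
```

With these example thresholds the SNP has enough data to be clustered with other SNPs but not enough to serve as a tag SNP.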

If you have been saving text data to files, note the following changes:
(a) A column has been added to the middle of the SNP summary columns: that for the minor allele.
(b) The genotypes are in upper case characters, rather than lower case.
(c) The tag SNP delimiters have been changed from "|" to space.
(d) There are several changes to the header lines.
(e) The format has been changed to eliminate unnecessary line breaks, for ease in exporting to Excel.