Welcome

This is the documentation of UPDtool. UPDtool detects uniparental disomy (UPD) in SNP chip data of trios (father, mother, child).

Below you will find information about the installation, usage and algorithms of the tool.

More information can be found at the website of the Genomics working group of the Department of Medical Genetics, University Hospital Tübingen, Germany.
Please send feedback (e.g. bug reports or feature requests) to Christopher Schroeder or Marc Sturm.


Changelog

Version 0.2 (2013-07-15)
Version 0.1

Installation

The executables of UPDtool can be found in the 'bin' folder. They are portable (i.e. they need no installer), platform-independent command-line applications. You can run all executables on any platform once the required dependencies are installed (see the notes below).

Notes for Windows users:

Notes for Linux users:


Example

Example input data and scripts that analyze the data can be found in the 'examples' folder. There is an example for UPDtool and an example for UPDbatch.

On Windows, the analysis is started using the 'example.bat' file.
On Unix/Linux platforms, the 'example.sh' file should be used.

Note: To run the UPDbatch example you have to download the NetAffx annotation file for the Affymetrix Genome-Wide Human SNP Array 6.0 in CSV-format from Affymetrix (free registration required, tested with GenomeWideSNP_6.na32.annot.csv.zip). Rename the unzipped file to annotation_UPDbatch.csv and place it into the UPDbatch folder.

The output of the analysis can be found in the 'output' sub-folder.


Format documentation

UPDtool requires a list of SNP genotypes of a trio (father, mother, child) as input. The input file has to be formatted as a tab-separated value file (TSV) with the following columns:
chromosome - the chromosome, e.g. 'chr1'
position - the chromosomal position, e.g. '1156131'
father genotype - only 'AA', 'AB', 'BB' and 'NoCall' are accepted
mother genotype - see above
child genotype - see above
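
As a rough illustration of the format above, the following Python sketch reads and validates such a file. The function name read_trio_tsv is hypothetical and not part of UPDtool, and the sketch assumes that the file has no header line, which the description above does not specify.

```python
import csv
import io

# Genotype values accepted according to the column description above.
VALID_GENOTYPES = {"AA", "AB", "BB", "NoCall"}


def read_trio_tsv(handle):
    """Read trio genotypes from a tab-separated file-like object.

    Hypothetical helper, not part of UPDtool. Assumes no header line.
    Returns a list of (chromosome, position, father, mother, child) tuples.
    """
    rows = []
    for line_no, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # skip empty lines
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 columns, got {len(fields)}")
        chrom, pos, father, mother, child = fields
        for genotype in (father, mother, child):
            if genotype not in VALID_GENOTYPES:
                raise ValueError(f"line {line_no}: invalid genotype {genotype!r}")
        rows.append((chrom, int(pos), father, mother, child))
    return rows


# Two made-up SNPs, for illustration only.
example = "chr1\t1156131\tAA\tAB\tAB\nchr1\t1158631\tBB\tNoCall\tAB\n"
snps = read_trio_tsv(io.StringIO(example))
print(snps[0])
```

For a real file, the same function could be called with an opened file, e.g. open(path, newline="").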

Note: To convert Affymetrix Genome-Wide Human SNP Array 6.0 genotype data to the UPDtool input format, use UPDconvert.exe.
Note: Batch analysis of many trios can be simplified using UPDbatch.exe.

 

The output of UPDtool is a TSV file containing identified UPD stretches. It contains the following columns:
chr - the chromosome
start - the start position of the stretch
end - the end position of the stretch
size_bp - the size of the stretch in bp
size_snps - the number of SNPs contained in the stretch
type - the UPD type (maternal/paternal hetero-/iso-disomy)
frac_hom - fraction of homozygous SNPs in the stretch
frac_het - fraction of heterozygous SNPs in the stretch
frac_me - fraction of Mendelian error SNPs in the stretch
frac_ident_father - fraction of SNPs in the stretch where the genotype is identical to the father
frac_ident_mother - fraction of SNPs in the stretch where the genotype is identical to the mother


Algorithm and parameter details

The UPD detection algorithm consists of several steps. The steps and the parameters are described below:

Step 1: Determine inheritance information

First, the non-autosomal SNPs are removed from the input data and the SNPs are sorted according to chromosome/position.
Then, the inheritance information is determined (ME_paternal, ME_maternal, Possible_ME, No_ME).
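
The filtering and sorting part of this step could look roughly like the Python sketch below. It assumes human data with autosomes named 'chr1' to 'chr22' (following the 'chr1' example in the format documentation) and SNPs given as tuples starting with chromosome and position. The function name is hypothetical, and the sketch does not cover how the inheritance categories themselves are assigned.

```python
# Assumption: human autosomes, named 'chr1' .. 'chr22'.
AUTOSOME_ORDER = {f"chr{i}": i for i in range(1, 23)}


def autosomal_sorted(snps):
    """Drop non-autosomal SNPs and sort the rest by chromosome and position.

    Hypothetical helper. Each SNP is a tuple whose first two items are
    the chromosome name and the integer position.
    """
    kept = [snp for snp in snps if snp[0] in AUTOSOME_ORDER]
    # Sort numerically by chromosome (so chr2 precedes chr10), then by position.
    return sorted(kept, key=lambda snp: (AUTOSOME_ORDER[snp[0]], snp[1]))


# Made-up SNPs, for illustration only.
snps = [
    ("chr10", 500, "AA", "AB", "AB"),
    ("chrX", 100, "AA", "AB", "AA"),
    ("chr2", 900, "BB", "BB", "BB"),
    ("chr2", 300, "AB", "AA", "AA"),
]
ordered = autosomal_sorted(snps)
print([(chrom, pos) for chrom, pos, *_ in ordered])
```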

Step 2: Determine stretches

In this step, putative UPD stretches are determined. Each SNP with inheritance 'ME_paternal' or 'ME_maternal' serves as a starting point for a stretch. The stretches are extended to both sides until a SNP with 'No_ME' inheritance or of the opposite ME inheritance is encountered.
In order to compensate for genotyping errors, adjacent stretches of the same inheritance type are merged.
Finally, all stretches with fewer than a given number of Mendelian errors ('min_mes' parameter) are removed.
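
The following Python sketch shows one possible reading of this step, operating on a list of per-SNP inheritance labels that is already sorted by position. It is not the UPDtool implementation: the function name is hypothetical, and the merge rule (two stretches are "adjacent" when no stretch of the opposite type lies between them) is an assumption, since the text does not define adjacency precisely.

```python
ME_TYPES = ("ME_paternal", "ME_maternal")


def find_stretches(labels, min_mes):
    """Return (start, end, me_type) index triples for putative UPD stretches.

    Hypothetical sketch. 'labels' is a list of inheritance labels, one per SNP.
    """
    n = len(labels)
    stretches = []
    for i, label in enumerate(labels):
        if label not in ME_TYPES:
            continue
        if stretches and stretches[-1][2] == label and i <= stretches[-1][1]:
            continue  # this SNP already lies inside the previous stretch
        opposite = ME_TYPES[1 - ME_TYPES.index(label)]
        stop = ("No_ME", opposite)
        start = end = i
        # Extend to both sides until a stopping SNP is encountered.
        while start > 0 and labels[start - 1] not in stop:
            start -= 1
        while end + 1 < n and labels[end + 1] not in stop:
            end += 1
        stretches.append((start, end, label))

    # Assumption: consecutive stretches of the same type are merged,
    # which bridges isolated 'No_ME' SNPs (possible genotyping errors).
    merged = []
    for start, end, me_type in stretches:
        if merged and merged[-1][2] == me_type:
            merged[-1] = (merged[-1][0], end, me_type)
        else:
            merged.append((start, end, me_type))

    # Keep only stretches with at least 'min_mes' Mendelian errors of their type.
    return [
        (start, end, me_type)
        for start, end, me_type in merged
        if labels[start:end + 1].count(me_type) >= min_mes
    ]


# Made-up label sequence, for illustration only.
labels = [
    "No_ME", "Possible_ME", "ME_paternal", "Possible_ME", "ME_paternal",
    "No_ME", "ME_paternal", "Possible_ME", "ME_maternal", "No_ME",
]
print(find_stretches(labels, min_mes=2))
```

In this toy input, the two paternal stretches on either side of the single 'No_ME' SNP at index 5 are merged, and the maternal stretch is dropped because it contains only one Mendelian error.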

Step 3: Calculate sliding window statistics

In the third step, sliding window statistics are calculated for each stretch. A window of a given size ('window_size' parameter) is slid over the stretch and the following statistics are determined for each window:
frac_hom - fraction of homozygous SNPs in the window
frac_het - fraction of heterozygous SNPs in the window
frac_me - fraction of Mendelian error SNPs in the window
frac_ident_father - fraction of SNPs in the window where the genotype is identical to the father
frac_ident_mother - fraction of SNPs in the window where the genotype is identical to the mother
The sliding window statistics are smoothed using a moving-average filter whose width is one tenth of the window size.
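
A minimal Python sketch of this step for a single statistic (e.g. frac_me) is given below. The function names are hypothetical, and several details are assumptions that the text does not specify: the window advances one SNP at a time, only full windows are evaluated, and the moving average is centered and shrinks at the ends of the series.

```python
def sliding_fraction(flags, window_size):
    """Fraction of set flags (1/0) in each full window, advancing by one SNP."""
    n = len(flags)
    if n < window_size:
        return []
    count = sum(flags[:window_size])
    fractions = [count / window_size]
    for i in range(window_size, n):
        # Update the running count: add the entering SNP, drop the leaving one.
        count += flags[i] - flags[i - window_size]
        fractions.append(count / window_size)
    return fractions


def moving_average(values, width):
    """Centered moving average using up to width // 2 neighbors on each side."""
    half = max(1, width) // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Made-up flags (1 = Mendelian error SNP), for illustration only.
me_flags = [1, 1, 1, 1, 0, 0, 0, 0]
frac_me = sliding_fraction(me_flags, window_size=4)
print(frac_me)
print(moving_average(frac_me, width=3))
```

With a realistic 'window_size', the filter width would be derived as window_size // 10. The toy example uses a width of 3 only to keep the numbers readable.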

Step 4: Trim stretches

In this step, each stretch is trimmed from both ends until only the part with a minimum Mendelian error fraction ('min_mes_fraction' parameter) remains.
If there is no window that exceeds the minimum Mendelian error fraction, the stretch is rejected.
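
The trimming can be sketched in Python as follows, working on the per-window Mendelian error fractions of one stretch. The function name is hypothetical, the result is expressed in window indices rather than chromosomal positions, and treating a fraction equal to the threshold as sufficient is an assumption.

```python
def trim_stretch(frac_me, min_mes_fraction):
    """Return (first, last) window indices kept after trimming, or None.

    Hypothetical sketch. Windows are dropped from both ends until a window
    with frac_me >= min_mes_fraction is reached. None means the stretch
    is rejected because no window reaches the threshold.
    """
    passing = [i for i, value in enumerate(frac_me) if value >= min_mes_fraction]
    if not passing:
        return None
    # Windows between the first and last passing window are kept,
    # even if some of them fall below the threshold.
    return passing[0], passing[-1]


# Made-up window fractions, for illustration only.
print(trim_stretch([0.1, 0.2, 0.6, 0.4, 0.7, 0.3], min_mes_fraction=0.5))
print(trim_stretch([0.1, 0.2], min_mes_fraction=0.5))
```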

Step 5: Determine UPD type

Finally, for each stretch the UPD type (maternal/paternal hetero-/iso-disomy) is determined using the thresholds given by the parameters 'min_hetero', 'min_iso', 'min_mes_paternal' and 'max_mes_paternal'.
In this step, stretches that consist of different UPD types are split into two or more parts.
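
The splitting of a stretch into parts of uniform UPD type can be illustrated with the Python sketch below. It assumes that a UPD type has already been assigned to each window of the stretch. How the thresholds map to types is not covered here, and the function name and type labels are hypothetical.

```python
from itertools import groupby


def split_by_type(window_types):
    """Split a stretch into contiguous parts that share one UPD type.

    Hypothetical sketch. Returns (first, last, upd_type) triples of
    window indices, one per run of identical types.
    """
    parts = []
    position = 0
    for upd_type, run in groupby(window_types):
        length = len(list(run))
        parts.append((position, position + length - 1, upd_type))
        position += length
    return parts


# Made-up per-window types, for illustration only.
types = ["paternal_iso", "paternal_iso", "paternal_hetero", "paternal_hetero", "paternal_hetero"]
print(split_by_type(types))
```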
Note: Analysis of trios with consanguineous parents may give false positive results.
Note: UPDtool is not optimized for detection of mosaics. Mosaicism may give false negative results.

License

This software is provided with the freedom to use and distribute without any restrictions or fees.
When used as part of another software or software pipeline, the original paper has to be cited/referred to.

BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. WE PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.