One of the major aims of my research is to develop computational tools for DNA sequence analysis and to make them available to other researchers.
ancIBD: Calling long identity by descent (IBD) segments in ancient DNA
Available as the Python package ancIBD, which can be installed from the Python Package Index (PyPI) with pip.
This software identifies the signposts of genealogical relationships between pairs of individuals: so-called identity by descent (IBD) segments. The input data are imputed and phased ancient DNA data (preferably imputed with the software GLIMPSE, using the 1000 Genomes haplotypes as reference panel). The program screens the imputed data for IBD segments shared between pairs of individuals and outputs these segments. In addition to the IBD caller, the package contains visualization tools, for example for plotting IBD segments or comparing observed IBD sharing with the sharing expected between different types of relatives. Vignette notebooks walking you through running the method and the visualizations are available here.
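As a toy illustration of what an IBD segment looks like in phased data, the sketch below scans two phased haplotypes (alleles coded 0/1, with genetic map positions in centimorgans) for maximal runs of identical alleles that span at least a cutoff `min_cm`. This is not the ancIBD algorithm and not its API: the function names and the 8 cM default are assumptions for illustration, and unlike a real IBD caller, a single genotype or phasing error breaks a run here.

```python
from typing import List, Tuple


def toy_ibd_segments(hap_a: List[int], hap_b: List[int], pos_cm: List[float],
                     min_cm: float = 8.0) -> List[Tuple[float, float]]:
    """Return (start_cM, end_cM) of maximal runs where two phased haplotypes
    carry identical alleles and the run spans at least min_cm.

    Hypothetical toy heuristic, not ancIBD: any single mismatch ends a run.
    """
    if not (len(hap_a) == len(hap_b) == len(pos_cm)):
        raise ValueError("inputs must have equal length")
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first SNP in the current matching run
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if start is None:
                start = i
        elif start is not None:
            _maybe_add(segments, pos_cm, start, i - 1, min_cm)
            start = None
    if start is not None:  # close a run that reaches the last SNP
        _maybe_add(segments, pos_cm, start, len(pos_cm) - 1, min_cm)
    return segments


def _maybe_add(segments: List[Tuple[float, float]], pos_cm: List[float],
               first: int, last: int, min_cm: float) -> None:
    """Append the run [first, last] if its genetic length reaches min_cm."""
    if pos_cm[last] - pos_cm[first] >= min_cm:
        segments.append((pos_cm[first], pos_cm[last]))
```

For example, two haplotypes that match everywhere on a 20 cM stretch except at the SNP at 10 cM yield two 9 cM segments, (0.0, 9.0) and (11.0, 20.0). Lengths are measured in centimorgans rather than base pairs because recombination, not physical distance, breaks up shared segments over the generations.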
hapROH: Calling runs of homozygosity from low coverage ancient DNA
Available as the Python package hapROH, which can be installed from the Python Package Index (PyPI) with pip.
This software identifies the signposts of parental relatedness: so-called long runs of homozygosity (ROH). The program is designed for the data type most commonly used in ancient DNA, pseudo-haploid EIGENSTRAT files. The method is described, and applied to the global aDNA record, in a publication in Nature Communications. In addition to the inference machinery, the package contains several programs to visualize and analyze the inferred ROH. Vignette notebooks walking you through the method and the visualizations are linked in the official documentation.
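A typical downstream summary of ROH calls is the total ROH length per individual above several length cutoffs, since long ROH are the signature of parental relatedness. The sketch below assumes ROH calls are already available as (start_cM, end_cM) tuples; the function name and the cutoffs (4, 8, 12, 20 cM) are illustrative assumptions, not the hapROH API.

```python
from typing import Dict, Iterable, Tuple


def sum_roh_by_cutoff(roh_segments: Iterable[Tuple[float, float]],
                      cutoffs_cm: Tuple[float, ...] = (4.0, 8.0, 12.0, 20.0),
                      ) -> Dict[float, float]:
    """For each cutoff, total length (cM) of ROH segments at least that long.

    Hypothetical helper for illustration; segments are (start_cM, end_cM).
    """
    # Materialize lengths once so any iterable (e.g. a generator) works.
    lengths = [end - start for start, end in roh_segments]
    return {c: sum(x for x in lengths if x >= c) for c in cutoffs_cm}
```

For ROH of 5, 9 and 25 cM, the totals are 39 cM (at least 4 cM), 34 cM (at least 8 cM), and 25 cM for both the 12 and 20 cM cutoffs.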
hapCON: Inferring rates of DNA contamination in ancient genomes via haplotype copying
Together with Yilei Huang
Available as the Python software package hapCON, which can be installed from the Python Package Index (PyPI) with pip.
Yilei and I developed a program to estimate rates of DNA contamination in ancient DNA. Estimating contamination from modern or other ancient DNA is a key quality-control step in the aDNA field, ensuring that ancient data are authentic and not contaminated by other human DNA. Our approach uses haplotype copying to model the single X chromosome of a male as a mosaic of haplotypes from a reference panel. The method is described in detail in a preprint. Our software can analyze human aDNA BAM files, a common data type in aDNA.
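The key idea is that a male carries a single X chromosome, so in an uncontaminated sample all reads at a site should carry the same allele; reads showing a second allele point to contamination (or sequencing error). The sketch below illustrates this with a deliberately simplified model that is not hapCON's: it replaces haplotype copying from a reference panel with independent sites, assumes a known alt-allele frequency `p` per site (used for both the endogenous and the contaminant allele) and a fixed per-read error rate `err`, and finds the maximum-likelihood contamination rate `c` by grid search. The input format (per-site reference and alternative read counts) and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def site_log_lik(n_ref: int, n_alt: int, p: float, c: float, err: float) -> float:
    """Log-likelihood of the read counts at one X-chromosome site of a male.

    Toy model: the endogenous allele g (0 = ref, 1 = alt) is haploid with
    prior P(g = 1) = p, where 0 < p < 1. Each read independently comes from
    the contaminant with probability c (alt with probability p); any read's
    allele is flipped with probability err.
    """
    cont_alt = p * (1 - err) + (1 - p) * err  # P(alt) for a contaminant read
    total = 0.0
    for g, prior in ((0, 1 - p), (1, p)):
        endo_alt = 1 - err if g == 1 else err  # P(alt) for an endogenous read
        a = (1 - c) * endo_alt + c * cont_alt   # P(alt) for any read, given g
        total += prior * a ** n_alt * (1 - a) ** n_ref
    return math.log(total)


def estimate_contamination(sites: Sequence[Tuple[int, int, float]],
                           err: float = 0.01, grid_step: float = 0.005,
                           c_max: float = 0.5) -> float:
    """Grid-search maximum-likelihood contamination rate in [0, c_max].

    Each site is (n_ref, n_alt, p). Sites are treated as independent,
    i.e. there is no haplotype copying (a simplification relative to hapCON).
    """
    best_c, best_ll = 0.0, -math.inf
    for k in range(int(round(c_max / grid_step)) + 1):
        c = k * grid_step
        ll = sum(site_log_lik(n_ref, n_alt, p, c, err) for n_ref, n_alt, p in sites)
        if ll > best_ll:
            best_c, best_ll = c, ll
    return best_c


def simulate_sites(n_sites: int, c: float, err: float, depth: int,
                   seed: int = 1) -> List[Tuple[int, int, float]]:
    """Simulate read counts under the same toy model (for testing only)."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        p = rng.uniform(0.05, 0.95)
        g = 1 if rng.random() < p else 0
        n_alt = 0
        for _ in range(depth):
            if rng.random() < c:  # contaminant read
                allele = 1 if rng.random() < p else 0
            else:                 # endogenous read
                allele = g
            if rng.random() < err:  # sequencing error flips the allele
                allele = 1 - allele
            n_alt += allele
        sites.append((depth - n_alt, n_alt, p))
    return sites
```

Simulating 2,000 sites at depth 5 with `simulate_sites(2000, c=0.1, err=0.01, depth=5)` and passing them to `estimate_contamination` should return a value close to 0.1; the exact number depends on the random seed.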
Inferring Dispersal from long identity by descent tracts
During my PhD, I developed a likelihood framework to fit geographic patterns of long identity by descent tracts. It fits isolation by distance in a population occupying a 2D habitat: pairs of individuals living close to each other share more IBD than more distant pairs, as they are on average more closely related. The Python code implementing this likelihood inference scheme, as well as code to simulate IBD in 2D populations, is available via GitHub. The method is described in a publication in the journal Genetics.
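To make the isolation-by-distance idea concrete, here is a rough sketch of this kind of likelihood; it is not the code or the exact model from the Genetics paper. It assumes that lineages move as 2D Brownian motion with variance σ² per generation per axis, that two lineages at the same location coalesce at rate 1/(2D) per generation (D: population density), that a coalescence t generations ago leaves IBD blocks with density G(2t)²e^(−2tl) per unit block length l on a chromosome of length G (both in Morgans), and it ignores earlier coalescence. It integrates over t numerically and scores binned block counts with a Poisson log-likelihood; all names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ibd_density(r: float, block_len: float, G: float, D: float, sigma: float,
                n_grid: int = 2000) -> float:
    """Expected density (per Morgan of block length) of IBD blocks of length
    block_len (Morgans) shared by one pair of individuals at distance r.

    Toy model (assumptions in the text). The time integral is evaluated with
    the trapezoid rule on a log-time grid; under the same assumptions it
    equals G r^2 K_2(sqrt(2 block_len) r / sigma) / (8 pi D sigma^4 block_len).
    """
    u_min, u_max = math.log(1e-4), math.log(1e6)  # t from 1e-4 to 1e6 generations
    h = (u_max - u_min) / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        t = math.exp(u_min + i * h)
        blocks = G * (2 * t) ** 2 * math.exp(-2 * t * block_len)
        # density of the two lineages' separation being zero at time t
        meet = math.exp(-r ** 2 / (4 * sigma ** 2 * t)) / (4 * math.pi * sigma ** 2 * t)
        weight = 0.5 if i in (0, n_grid) else 1.0
        total += weight * blocks * meet / (2 * D) * t  # extra t because dt = t du
    return total * h


def poisson_log_lik(observations: Sequence[Tuple[float, float, float, float]],
                    G: float, D: float, sigma: float, bin_width: float) -> float:
    """Poisson log-likelihood of binned IBD block counts.

    Each observation is (r, block_len, n_pairs, k): k blocks in a length bin
    of width bin_width (Morgans) at block_len, pooled over n_pairs pairs at
    distance r. The bin integral is approximated by density * bin_width,
    and the expected count must be positive.
    """
    ll = 0.0
    for r, block_len, n_pairs, k in observations:
        mu = n_pairs * ibd_density(r, block_len, G, D, sigma) * bin_width
        ll += k * math.log(mu) - mu - math.lgamma(k + 1)
    return ll
```

Evaluating `poisson_log_lik` over a grid of σ values (with D fixed or scanned jointly) and taking the maximum gives a crude point estimate of the dispersal rate. Integrating on a log-time grid keeps the short and long coalescence times equally well resolved.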
GitHub: Ongoing projects
For past and ongoing projects, I release development code on my GitHub.