Don Klinkenberg, Jantien Backer, Jacco Wallinga
Whole-genome sequencing (WGS) of pathogens from patient samples is becoming increasingly routine during infectious disease outbreaks. These data provide information on possible transmission events, which can be used for further epidemiological analysis, such as identifying risk factors for infectivity and transmission. However, the relation between transmission events and WGS data is obscured by uncertainty arising from unobserved processes on three levels: transmission, within-host pathogen dynamics, and mutation. To resolve the transmission events properly and quantify the associated uncertainty, all of these unobserved processes should be taken into account. Here we present a novel method to reconstruct transmission trees from WGS data. The method is Bayesian and combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation. It is implemented through Markov chain Monte Carlo (MCMC), for which we designed novel proposal steps that efficiently traverse the posterior distribution while taking account of all unobserved processes at once. The result is a posterior distribution of transmission trees, which can be used to identify the most likely infector of each host and each host's infection time.
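To illustrate how a posterior sample of transmission trees can be turned into a most likely infector per host, the following is a minimal sketch and not the authors' implementation. It assumes each sampled tree is stored as a mapping from host to infector, with `None` marking an index case; the function name `most_likely_infectors` and the toy samples are hypothetical.

```python
from collections import Counter


def most_likely_infectors(sampled_trees):
    """Summarise posterior tree samples per host.

    sampled_trees: list of dicts mapping host -> infector
    (None for an index case). This storage format is an assumption
    made for illustration only.
    Returns a dict mapping host -> (most frequent infector, posterior support).
    """
    n_samples = len(sampled_trees)
    summary = {}
    for host in sampled_trees[0]:
        # Count how often each candidate infector appears for this host.
        counts = Counter(tree[host] for tree in sampled_trees)
        infector, count = counts.most_common(1)[0]
        summary[host] = (infector, count / n_samples)
    return summary


# Toy posterior sample of five trees over hosts A, B, and C (made-up values).
samples = [
    {"A": None, "B": "A", "C": "B"},
    {"A": None, "B": "A", "C": "A"},
    {"A": None, "B": "A", "C": "B"},
    {"A": None, "B": "C", "C": "A"},
    {"A": None, "B": "A", "C": "B"},
]

summary = most_likely_infectors(samples)
for host, (infector, support) in summary.items():
    print(host, infector, support)
```

In this toy sample, host B is assigned infector A in four of the five trees and host C is assigned infector B in three of the five. Because each host is summarised independently, the resulting set of infectors is not guaranteed to form a valid tree on its own.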