Chieh-Hsi Wu, Nicola de Maio, Daniel J Wilson
Information on the demographic history of pathogens provides insight into the epidemiological dynamics of pathogen populations. Skyline-plot methods (Pybus et al., 2000; Drummond et al., 2005; Gill et al., 2012) aim to recover temporal variation in the effective population size from molecular sequence data. However, these methods assume that all samples are collected from a single panmictic population. Many studies have shown that ignoring population structure when it is present can lead to misleading inference of population trends (Heller et al., 2013; Hall et al., 2015). To avoid such model misspecification, estimating the demographic history of structured populations requires a method that accounts for the interacting effects of the population trend and the migration process between subpopulations on the shape of the genealogy. The structured coalescent (Notohara, 1990) extends the Kingman coalescent to geographically structured populations: it models the migration process between subpopulations and incorporates its effect on the shape of the tree. Recently developed structured coalescent-based methods aim to overcome the computational burden that has hampered Bayesian inference under the structured coalescent (Vaughan et al., 2014; de Maio et al., 2015). However, these methods do not accommodate changes in the effective population size through time.
Here, we present a new method that extends a recently proposed approximation to the structured coalescent (de Maio et al., 2015) to allow efficient joint inference of the demographic history and the migration process in structured populations. Our method uses a Bayesian nonparametric smoothing approach to reconstruct the demographic history of the subpopulations. In a simulation study, we demonstrate how our method improves population trend estimates under various sampling and demographic scenarios with population structure. Furthermore, we apply our method to a number of empirical virus datasets and compare the reconstructed demographic histories with those recovered by methods that ignore population structure.