Last modified: 2014-04-11
Abstract
Pathogens sampled from different hosts feature genetic differences that can be captured with high-resolution tests for genetic variation. Genetic data can be presented in the form of phylogenetic trees, in which similar genomes are positioned close to each other. Phylogenetic trees are used in outbreak investigations as well as to explore the evolution and spread of infections. The stochastic branching processes most commonly used to link epidemiological models to phylogenetic trees are based on the principle that, at each branching event in the tree, two new instantiations of the process arise. In our setting, however, at each branching event one edge is the continuing edge and represents the infector, whereas the other is a new edge and represents the infected. The infector's time flows undisrupted, but the infected's time starts at 0. Therefore, in contrast to previous studies, we define only one new instantiation of the process at each branching event. This is the more general Crump-Mode-Jagers (CMJ) branching process, in which the only hypotheses are the infectious time distribution and the type of point process that generates new branches. Using a modification of the baseline CMJ process, we build a new process to count the numbers of small subtree structures such as cherries and pitchforks. We find asymptotic and non-asymptotic estimates of the frequencies of some such subtrees. We then compare this novel technique for describing outbreaks with data from real and simulated outbreaks and find that it describes subtree frequencies better than the branching processes previously used. We discuss the implications for the rapidly growing field of pathogen "phylodynamics": linking genomic data for pathogens to their underlying dynamics.
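The branching rule described above (one continuing edge for the infector, one new edge whose clock starts at 0 for the infected) can be illustrated with a small simulation. The following Python sketch is not the authors' implementation. It assumes, purely for illustration, an exponential infectious period, a homogeneous Poisson point process of infections, and that every case is sampled exactly once; the function names (`simulate_outbreak`, `transmission_tree`, `count_cherries`) and all parameter values are hypothetical. It counts cherries only, taking a cherry to be an internal node whose two children are both tips.

```python
import random

def simulate_outbreak(rate, mean_infectious, t_max, rng, max_cases=500):
    """Simulate who infects whom; children[i] lists the cases infected by i, in time order."""
    birth = [0.0]       # infection time of each case; case 0 is the index case
    children = [[]]
    i = 0
    while i < len(birth):
        # Infectious period: exponential here only as an illustrative assumption.
        end = birth[i] + rng.expovariate(1.0 / mean_infectious)
        t = birth[i]    # the new case's own clock starts at its infection time
        while True:
            t += rng.expovariate(rate)  # next point of a homogeneous Poisson process
            if t > min(end, t_max) or len(birth) >= max_cases:
                break
            birth.append(t)
            children.append([])
            children[i].append(len(birth) - 1)
        i += 1
    return children

def transmission_tree(children):
    """Build a binary tree: tips are case ids, internal nodes are (infected, infector) pairs."""
    tree = [None] * len(children)
    # Every case has a larger id than its infector, so reverse order builds subtrees first.
    for i in reversed(range(len(children))):
        lineage = i                       # the continuing edge ends in the case's own tip
        for c in reversed(children[i]):   # latest infection sits closest to the tip
            lineage = (tree[c], lineage)  # new edge (infected), continuing edge (infector)
        tree[i] = lineage
    return tree[0]

def count_cherries(tree):
    """Count internal nodes whose two children are both tips (iterative traversal)."""
    cherries, stack = 0, [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            left, right = node
            if not isinstance(left, tuple) and not isinstance(right, tuple):
                cherries += 1
            stack.extend(node)
    return cherries

if __name__ == "__main__":
    rng = random.Random(1)
    who_infected_whom = simulate_outbreak(rate=1.5, mean_infectious=1.0, t_max=6.0, rng=rng)
    tree = transmission_tree(who_infected_whom)
    print(len(who_infected_whom), "cases,", count_cherries(tree), "cherries")
```

Swapping the two `expovariate` draws for another infectious time distribution or another point process changes the model without touching the tree construction, which mirrors the statement that these two ingredients are the only hypotheses of the CMJ process.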