Chalmers Conferences, 9th European Conference on Mathematical and Theoretical Biology

A hierarchical Bayesian model for gene ranking in metagenomics based on differential abundance
Viktor Jonsson

Last modified: 2014-03-28


Metagenomics is a growing research field within ecology and medicine where complex mixtures of microbes are studied at the genome level. The majority of bacteria in the environment are unculturable and therefore difficult to study individually. Metagenomics does not rely on cultivation and therefore enables the analysis of bacteria in their natural communities. The aim is to gain insight into such communities by observing differences in the abundance of genes between conditions. Gene abundance is quantified by counting the number of DNA fragments for each gene in each sample. The statistical challenge lies in finding the genes that have a significantly different abundance between conditions. However, the discrete and overdispersed nature of the data makes the analysis challenging, and methods based on normality assumptions become suboptimal. In addition, the number of samples is often small while the number of genes present in a bacterial community is vast, creating the problem of finding a few truly differentially abundant genes in a sea of noise. On this poster we present a novel statistical model for inference in metagenomics. The model is based on a generalized linear model with a canonical log link, which we extend to include robust moderation of the gene-specific variance. The moderation is achieved by placing a hierarchical structure on the variance and assuming that the gene-specific variability is drawn from a global prior distribution. The model is implemented in a Bayesian framework and relies on Markov chain Monte Carlo (MCMC) sampling. The performance of the model has been evaluated on both simulated and real datasets, and our results show that it performs significantly better than standard methods, including ordinary generalized linear models and t-tests with and without variance-stabilizing transforms. We conclude that hierarchical Bayesian modeling can substantially increase the power of statistical inference in metagenomics.
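
The kind of data such a model describes can be illustrated with a small forward simulation. The sketch below is not the authors' implementation and performs no MCMC inference. It assumes a gamma prior shared by all genes for the gene-specific overdispersion and a gamma-Poisson mixture for the counts; neither choice is stated in the abstract. All function and parameter names (sample_poisson, sample_counts, simulate_gene, prior_shape, prior_rate) are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the moderate means used in this sketch; not suited to
    very large lam, where exp(-lam) underflows.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_counts(rng, mean, dispersion, n):
    """Draw n overdispersed counts with variance mean + dispersion * mean**2.

    Assumption: a gamma-Poisson mixture. Each sample gets its own rate,
    gamma distributed with shape 1/dispersion and scale mean*dispersion.
    """
    counts = []
    for _ in range(n):
        rate = rng.gammavariate(1.0 / dispersion, mean * dispersion)
        counts.append(sample_poisson(rate, rng))
    return counts


def simulate_gene(rng, n_per_group, baseline, log_fold_change,
                  prior_shape=2.0, prior_rate=4.0):
    """Simulate counts for one gene under two conditions.

    Assumption: the gene-specific dispersion is drawn from a gamma prior
    shared by all genes (the hierarchical layer). The log link means the
    condition effect is additive on the log scale of the expected count.
    """
    dispersion = rng.gammavariate(prior_shape, 1.0 / prior_rate)
    counts = []
    for group in (0, 1):
        mean = math.exp(math.log(baseline) + log_fold_change * group)
        counts.extend(sample_counts(rng, mean, dispersion, n_per_group))
    return dispersion, counts


if __name__ == "__main__":
    rng = random.Random(1)
    dispersion, counts = simulate_gene(rng, n_per_group=3, baseline=20.0,
                                       log_fold_change=1.0)
    print("dispersion:", round(dispersion, 3))
    print("condition A:", counts[:3])
    print("condition B:", counts[3:])
```

With only a few samples per condition, as in the usage example, a per-gene variance estimate would be very noisy, which is the motivation for borrowing strength across genes through a shared prior.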


metagenomics; statistics