Last modified: 2014-06-09

#### Abstract

Binary and count matrices are widely used in many fields, notably in ecology. Metagenomics, which studies microbial communities directly from environmental samples, provides abundance matrices whose rows correspond to bacteria and whose columns correspond to biological samples. One major goal is to find associations between bacterial communities and biological samples. We will propose a model and the associated inference procedures for a simultaneous clustering: one of the bacterial populations constituting the metagenome, and the other of the samples. We will use the latent block model framework introduced by Govaert and Nadif (2010). As metagenomics data are abundance matrices, a Poisson distribution seems a natural choice. Nevertheless, recent studies show that metagenomics data are sparse and overdispersed. We will thus propose a zero-inflated Poisson distribution in order to take this excess of zero counts into account. Since the latent variables (the unknown group labels of the rows and of the columns) are not independent conditionally on the observed variables, classical maximum likelihood inference is intractable. We will present an inference algorithm based on a variational approach (Wainwright and Jordan, 2008). We will apply this model to metagenomics data in order to study plant-microbial community interactions in the rhizosphere, the region of soil directly influenced by root secretions and associated soil microorganisms.
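To make the generative side of the model concrete, here is a minimal Python sketch of a zero-inflated Poisson latent block model. It is an illustration under stated assumptions, not the authors' implementation: the function names (`zip_pmf`, `sample_poisson`, `simulate_zip_lbm`) are hypothetical, and the parameterisation (one zero-inflation probability `pi0[k][l]` and one Poisson mean `lam[k][l]` per block) is an assumption, since the abstract does not say whether the zero-inflation parameter is shared across blocks. The variational inference step is not shown.

```python
import math
import random


def zip_pmf(x, pi0, lam):
    """P(X = x) under a zero-inflated Poisson: a structural zero with
    probability pi0, otherwise a Poisson(lam) count."""
    poisson = math.exp(-lam) * lam ** x / math.factorial(x)
    return pi0 * (x == 0) + (1.0 - pi0) * poisson


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count by Knuth's multiplication method
    (adequate for the small means used in this illustration)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_zip_lbm(n_rows, n_cols, row_props, col_props, pi0, lam, seed=0):
    """Simulate an abundance matrix from a ZIP latent block model.

    Assumed parameterisation (hypothetical): row labels z_i and column
    labels w_j are drawn independently from row_props and col_props;
    given z_i = k and w_j = l, the cell is ZIP(pi0[k][l], lam[k][l]).
    """
    rng = random.Random(seed)
    z = rng.choices(range(len(row_props)), weights=row_props, k=n_rows)
    w = rng.choices(range(len(col_props)), weights=col_props, k=n_cols)
    counts = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            k, l = z[i], w[j]
            if rng.random() < pi0[k][l]:
                row.append(0)  # structural (excess) zero
            else:
                row.append(sample_poisson(lam[k][l], rng))
        counts.append(row)
    return z, w, counts


# Example: 6 bacteria (rows) in 2 groups, 4 samples (columns) in 2 groups.
z, w, counts = simulate_zip_lbm(
    n_rows=6, n_cols=4,
    row_props=[0.5, 0.5], col_props=[0.5, 0.5],
    pi0=[[0.6, 0.1], [0.2, 0.7]],
    lam=[[1.0, 8.0], [5.0, 2.0]],
)
```

In this sketch each cell depends on both its row label and its column label, so the labels become coupled once the counts are observed. This is the dependence that rules out a direct EM-style computation and motivates the variational approximation mentioned above.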

References:

Govaert, G. and Nadif, M. (2010), *Latent block model for contingency table*, Communications in Statistics – Theory and Methods, 39, 3, 416-425.

Wainwright, M. J. and Jordan, M. I. (2008), *Graphical models, exponential families, and variational inference*, Foundations and Trends in Machine Learning, 1, 1-2, 1-305.