IS-AUR-Samarb.progr. Norge Frankrike

Statistical modelling in Epigenetics: a pan-species study of distributions of CpG o/e ratios and DNA methylation patterns

Awarded: NOK 60,000

The project is situated in the field of epigenetics, defined as the study of heritable changes in gene expression that occur without DNA mutation. Epimutations can contribute to rapid adaptation to changing environmental conditions (Danchin 2011). Epigenetic modifications at the level of DNA methylation are often measured by means of so-called CpG o/e (observed/expected) ratios, which, roughly speaking, are calculated from the proportion of CpG strings in a given DNA sequence. From a statistical perspective, the degree of methylation in a species has classically been captured by investigating the distribution of CpG o/e ratios across multiple DNA sequences. These distributions may take different shapes in different species. In particular, the question of uni- or multimodality is crucial, because multiple modes may indicate the presence of different sub-populations of genes within a single genome or species. The statistical approach commonly used to identify the number of modes is to fit mixtures of Gaussian distributions (Fneich et al. 2013, Park et al. 2011). An alternative approach would be to identify the location and concentration of certain CpG patterns directly in the DNA sequence, which is best described by means of hidden Markov models (HMMs). The main aim of the research project is to develop improved statistical methods for analyzing a) distributions of CpG o/e ratios and b) DNA methylation patterns, and to apply these methods to a wide variety of species. More precisely, in a first step, DNA sequences for approx. 1000 species from dbEST are prepared and cleaned, and CpG o/e ratios are computed. Secondly, the distributions of these ratios are modelled by means of mixtures of non-Gaussian distributions. Thirdly, we analyze the DNA sequences directly with a specifically structured HMM that allows the identification of different gene classes. In a last step, we apply both methods to all species.
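As an illustration of the first step, the following minimal Python sketch computes a CpG o/e ratio for a single sequence. The project description does not state the exact formula used, so the sketch assumes one common definition: the number of CpG dinucleotides times the sequence length, divided by the product of the C and G counts. The function name `cpg_oe` is hypothetical and chosen only for this example.

```python
def cpg_oe(seq: str) -> float:
    """Return the CpG observed/expected ratio of a DNA sequence.

    Assumed definition (not taken from the project description):
        CpG o/e = (#CpG * L) / (#C * #G)
    where L is the sequence length.
    """
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    # "CG" cannot overlap with itself, so str.count finds every CpG dinucleotide.
    n_cpg = s.count("CG")
    if n_c == 0 or n_g == 0:
        # The ratio is undefined when the sequence has no C or no G.
        return float("nan")
    return n_cpg * len(s) / (n_c * n_g)


if __name__ == "__main__":
    # Toy sequences, for demonstration only.
    for name, seq in [("seq1", "CCGG"), ("seq2", "ACGTACGT")]:
        print(name, cpg_oe(seq))
```

Applied to every cleaned sequence of a species, a function like this would yield the sample of ratios whose distribution is then modelled by a mixture.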

Funding scheme:

IS-AUR-Samarb.progr. Norge Frankrike