Single-cell genomics techniques make it possible to study the variability of biological samples and to ask new questions that were beyond the scope of conventional genomics. Despite considerable recent research on characterizing intercellular transcriptomic variation, the fundamental mechanisms that generate variability from identical genomes remain poorly understood. This variability concerns all the molecular characteristics of a cell, namely the genome, the transcriptome, the states and conformation of chromatin, and the proteome. For example, it has long been suspected that changes that open or close chromatin at regulatory sequences play a key role in maintaining cell identity. Thus, in conjunction with transcriptomic measurements, epigenomic measurements could reveal the roles of different regulatory mechanisms (transcription factors, cis-regulatory sequences, chromatin conformation, etc.) in modulating gene expression in a population of cells sharing the same genome.
My thesis project develops statistical learning methods to integrate transcriptomic data with epigenomic data on these regulatory mechanisms (Hi-C, ATAC-seq, …), in order to reveal the structure of the cell population. Existing integration methods account neither for the very specific distribution of single-cell genomic data nor for unpaired tables, that is, tables with no correspondence between the individuals in each. Moreover, the relationships they consider between tables of different natures are often linear and do not always capture the underlying biological relationships. The objective of the thesis is therefore to develop statistical learning methods to study the regulatory mechanisms supporting differentiation and transcriptional variability in, for example, a CD8 lymphocyte population.