
FRIMEDBIO-Fri prosj.st. med.,helse,biol

Micropeptides - searching for function in the dark matter of the genome

Alternative title: Mikrogener - søken etter funksjon i genomets mørke materie (Microgenes - the search for function in the dark matter of the genome)

Awarded: NOK 7.0 mill.

The number of human genes has not been easy to determine. Before the human genome was mapped, estimates were on the order of 100,000, but once the human DNA sequence became available this estimate was drastically reduced. Gene annotation, however, is not a straightforward task: knowing the exact DNA sequence required to encode a human being does not mean we can immediately extract all genes. Sophisticated algorithms have therefore been developed to predict and map the genes. As these algorithms increased in accuracy, the estimated number of genes steadily dropped, converging on about 20,000, where it has remained for the last few years. Surprisingly, new technological developments that enable us to map, with great precision, exactly which genes are active have revealed that there is much more activity in the genome than these 20,000 canonical genes account for. These mysterious novel sites of gene activity have eluded discovery due to their diminutive size, effectively hiding in plain sight in the vast sea of DNA. The function of most of this activity is still highly disputed, but it is likely that some of these sites encode bona fide "microgenes" and that the current number of genes is an underestimate.

This project has developed several new methods, both experimental and computational, that can be used to identify microgenes hiding in the genome. Using these methods, we have found several new candidates for novel genes, and our future work is focused on characterizing their function. We have also developed new tools for studying the regulation of translation that will be useful for the field of translation and for gene regulation studies.

Beyond the 9 published papers and the 3 forthcoming ones, the project has led to the training of top researchers in the field of protein synthesis and high-throughput sequencing-based discovery assays. Specifically, these researchers have helped push the methodological boundaries in the study of protein synthesis and have contributed to the development of several new computational methods and experimental assays. In particular, we expect the development of ribosome complex profiling (RCP-seq), an assay to probe the transcriptome-wide activity of translation initiation, to be a significant boon to the community.

Over the last decade, advances in high-throughput sequencing assays and work by us and others have identified a large number of previously unknown transcripts. While their existence is undisputed, their biological functions have been largely unclear, spurring debates about the fidelity of transcription initiation and leading to controversies surrounding "junk DNA". While most of these transcripts were presumed to be non-coding due to their lack of obvious protein-coding potential, recent evidence from high-throughput assays has suggested that a large number of them are in fact undergoing translation. These mysterious new peptides have eluded discovery due to their diminutive size, which effectively hides them from most homology searches, random mutagenesis screens and computational methods. Their biological relevance is controversial, but given that a few have already been shown to have important functions, it is likely that several more of these exist and that the number of genes in the genome is underestimated. Building on our previous experience in uncovering protein-coding potential and our proven track record in discovering novel, functional peptides, I propose to develop rigorous methods to detect these small peptides and uncover the extent of novel genes. The project will combine state-of-the-art molecular biology and bioinformatics to determine the optimal protocol for discovery based on high-throughput sequencing assays. These assays will be combined with the use of chemical compounds to perturb the dynamics of protein synthesis, taking advantage of the resulting changes to obtain further evidence of active translation. We will apply this method to discover novel genes in three separate systems: during embryogenesis, in a circadian time course in the brain, and in human cell lines. Novel peptides will be confirmed with independent lines of evidence, e.g. mass spectrometry, and the most promising candidates will be selected for functional characterization.

