Data-driven health research will provide more accurate health and disease profiles and better predictive models for disease progression and recovery. This will allow healthcare service providers to offer tailored treatments that reduce the economic and personal costs imposed by inefficient treatments. A prerequisite for developing solid mathematical models is access to high-quality digital datasets of clinical data. Our vision is to make the metabolome from clinical samples accessible for data-driven health research.
The primary goal of TROMBOLOME is to build a comprehensive digital archive of the metabolome in clinical samples. We wish to explore how as much data as possible from biological samples can be analysed, digitalized, and finally applied in data-driven health research.
A blood sample holds massive amounts of biochemical data that reflect the state of the organism, from immune system transmitters to the degradation products of our breakfast. Beyond cells, proteins, and lipids, blood also contains thousands of small organic molecules, referred to as the metabolome. It is challenging to analyse the whole metabolome because of its chemical diversity and large differences in concentration levels. In the project so far, analytical methods for comprehensive untargeted metabolome profiling of serum samples have been developed and validated. The methods cover amino acids, organic acids, carnitines, sugars, lipids, phospholipids and metabolites thereof. A compound library of around 600 known metabolites has been analyzed with the developed methods, and the resulting reference data will be used to identify these compounds in unknown samples. The analytical data from the compound library will also be used to train machine learning models that predict the behavior of unknown metabolites on our analytical system. These model predictions will allow us to tentatively identify a much larger part of the measured metabolome.
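As a rough illustration of how reference data from a compound library can be used to identify known compounds in an unknown sample, the sketch below matches a measured feature against library entries by mass-to-charge ratio (m/z) and retention time. It is a minimal sketch under stated assumptions, not the project's actual identification procedure: the names `LibraryEntry` and `match_feature`, the tolerance values, and the retention times are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    name: str
    mz: float      # m/z of the protonated ion of the reference compound
    rt_min: float  # retention time in minutes (illustrative, not measured)


def ppm_error(measured_mz: float, reference_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def match_feature(mz, rt_min, library, ppm_tol=5.0, rt_tol_min=0.2):
    """Return library entries within both tolerances, best mass match first.

    The default tolerances are assumptions; suitable values depend on the
    instrument and the chromatographic method.
    """
    hits = [
        entry for entry in library
        if abs(ppm_error(mz, entry.mz)) <= ppm_tol
        and abs(rt_min - entry.rt_min) <= rt_tol_min
    ]
    return sorted(hits, key=lambda entry: abs(ppm_error(mz, entry.mz)))


# Hypothetical three-entry library. Leucine and isoleucine are isomers with
# the same m/z, so only the retention time can tell them apart here.
LIBRARY = [
    LibraryEntry("L-carnitine [M+H]+", 162.1125, 1.10),
    LibraryEntry("L-leucine [M+H]+", 132.1019, 2.40),
    LibraryEntry("L-isoleucine [M+H]+", 132.1019, 2.15),
]

if __name__ == "__main__":
    for hit in match_feature(132.1020, 2.38, LIBRARY):
        print(hit.name)
```

The isomer pair shows why a second, orthogonal parameter such as retention time matters: mass alone cannot separate compounds with identical elemental composition.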
To build scalable workflows that compare the metabolome across thousands of samples, and to make the data more accessible, analytical raw data must be archived in a manner that makes it readable without the restrictions of vendor software. This will allow full control of results and subsequent data treatment processes. TROMBOLOME is being set up as a SQL database in the cloud, and the scripts needed to parse raw data from vendor file formats into the SQL database have been developed. The project has been evaluated by the regional ethical committee, and an application for access to biobank serum samples was subsequently sent to a population study (the Tromsø study). The application was not successful on the first attempt because the method validation data were not complete at the time; a second application will be filed together with a validation report in the next reporting period.
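To make the idea of a vendor-independent SQL archive more concrete, the sketch below stores already-parsed spectra in a small relational schema and retrieves the signal for one m/z value across samples. Everything in it is an assumption made for illustration: the table layout (`sample`, `scan`, `peak`), the function names, and the use of an in-memory SQLite database as a stand-in for a cloud SQL service. It does not read any vendor file format; it assumes the spectra have already been converted to plain Python tuples.

```python
import sqlite3

# Hypothetical schema: one row per sample, per scan, and per centroided peak.
SCHEMA = """
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    label     TEXT NOT NULL UNIQUE
);
CREATE TABLE scan (
    scan_id   INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    rt_min    REAL NOT NULL
);
CREATE TABLE peak (
    scan_id   INTEGER NOT NULL REFERENCES scan(scan_id),
    mz        REAL NOT NULL,
    intensity REAL NOT NULL
);
CREATE INDEX peak_mz ON peak(mz);
"""


def create_archive(path=":memory:"):
    """Open a database and create the illustrative schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_sample(conn, label, scans):
    """Insert one sample; scans is an iterable of (rt_min, [(mz, intensity), ...])."""
    with conn:  # commit on success, roll back on error
        sample_id = conn.execute(
            "INSERT INTO sample(label) VALUES (?)", (label,)
        ).lastrowid
        for rt_min, peaks in scans:
            scan_id = conn.execute(
                "INSERT INTO scan(sample_id, rt_min) VALUES (?, ?)",
                (sample_id, rt_min),
            ).lastrowid
            conn.executemany(
                "INSERT INTO peak(scan_id, mz, intensity) VALUES (?, ?, ?)",
                [(scan_id, mz, intensity) for mz, intensity in peaks],
            )
    return sample_id


def extract_ion_trace(conn, mz, ppm_tol=5.0):
    """Summed intensity per scan within a ppm window around mz, for all samples."""
    delta = mz * ppm_tol / 1e6
    return conn.execute(
        """
        SELECT sample.label, scan.rt_min, SUM(peak.intensity)
        FROM peak
        JOIN scan ON scan.scan_id = peak.scan_id
        JOIN sample ON sample.sample_id = scan.sample_id
        WHERE peak.mz BETWEEN ? AND ?
        GROUP BY sample.label, scan.scan_id, scan.rt_min
        ORDER BY sample.label, scan.rt_min
        """,
        (mz - delta, mz + delta),
    ).fetchall()
```

With the raw signal stored this way, a comparison across many samples becomes a single query rather than a per-file operation in vendor software. A production archive would likely need a more compact peak representation and additional metadata tables.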
The potential benefits of the project include (i) setting new standards for digitalization and application of analytical raw data, (ii) developing a method for streamlined annotation of biochemical components in the digital archive that can improve future analytical workflows, and (iii) harnessing more of the potential of Norway's largest population study by adding this unique and vast data set.
The vision is to integrate analytical data on small organic molecules (the metabolome) with biobank big data. This will be achieved by building an annotated digital archive of biological samples from the Tromsø population study. We propose a radically new approach by facilitating bottom-up metabolomics with full metabolome component annotation. The project is initiated in the context of the UN Sustainable Development Goal ‘Ensure healthy lives and promote well-being for all at all ages’. We strongly believe that coupling the metabolome to the vast digital archives of health and disease status, genomics, and additional well-curated big data sets in the Tromsø study can, through focused efforts, open up new scientific opportunities in data-driven research on diagnostic and lifestyle markers and lead to radical breakthroughs.
The main novelties in the project include significant methodological advancements for a) rational storage and organization of metabolome big data, and b) development of a complete multi-parametric virtual analytical method to perform automated large-scale metabolome annotation and define the borders of investigated chemical space. This will require basic research on statistical machine learning combined with applied deep machine learning.
The project will set new standards for the accessibility of metabolome data to stakeholders and push the frontier for metabolomics in data-driven health research. This unique, well-organized, veracious, and readily retrievable digital archive will allow more of the potential of Norway’s largest population study to be harnessed and increase its competitiveness.
A work package is dedicated to dissemination to relevant stakeholders to establish familiarity with TROMBOLOME’s merits.
The project group is international and cross-disciplinary, with experts in machine learning, cheminformatics, and metabolomics who bring the complementary skills necessary to answer the research questions and realize the vision.