This project will create intelligent computer systems for personalized and precise risk prediction and diagnosis of cardiovascular and other "lifestyle diseases" from genomic data and measurements of circulating protein concentrations in the blood.
Lifestyle diseases result from malfunctioning processes in multiple organs, which communicate by secreting proteins into the bloodstream. By intercepting these signals, we aim to detect malfunctioning processes before the disease manifests itself in clinical symptoms. Modern "omics" technologies measure the concentrations of thousands of blood proteins simultaneously, vastly expanding medical diagnostic capabilities. However, the role of most of these proteins in disease is unknown. In other words, we can intercept the communication signals between organs, but we do not know the language they are written in.
Machine learning is a branch of artificial intelligence that can decode complex signals, given sufficient training data. This project will have privileged access to a unique resource of anonymised measurements of genomic data, blood protein concentrations, and molecular activity levels in organs from more than twenty thousand Nordic individuals.
To distinguish causation from correlation in big data, the project will use a technique called "Mendelian randomization" that mimics randomised controlled trials by assigning individuals to groups based on their genetic profiles, which are allocated randomly at conception. We anticipate that statistical machine learning models based on causal relations will result in truly predictive, intelligent systems for risk prediction and diagnosis of complex lifestyle diseases.
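As a rough illustration of the idea behind Mendelian randomization, the following minimal sketch simulates a genetic variant, a blood protein level and a disease risk score that share an unobserved confounder. It then compares a naive regression with the Wald ratio estimator, one of the simplest Mendelian randomization estimators. All data, effect sizes and function names here are hypothetical and chosen for illustration only; this is not the project's method, which targets non-parametric models over tens of thousands of variables.

```python
import numpy as np


def simulate(n=50_000, true_effect=0.5, seed=0):
    """Simulate hypothetical data: genotype -> protein -> risk, plus a confounder."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)  # copies of a variant (0, 1 or 2)
    u = rng.normal(size=n)                          # unobserved confounder (e.g. lifestyle)
    x = 0.5 * g + u + rng.normal(size=n)            # protein level, affected by g and u
    y = true_effect * x + u + rng.normal(size=n)    # risk score, affected by x and u
    return g, x, y


def slope(a, b):
    """Ordinary least-squares slope of b regressed on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


def wald_ratio(g, x, y):
    """Causal effect of x on y, using the genotype g as an instrument."""
    return slope(g, y) / slope(g, x)


g, x, y = simulate()
naive = slope(x, y)       # biased upwards, because u drives both x and y
mr = wald_ratio(g, x, y)  # close to the simulated true effect of 0.5

print(f"naive regression estimate: {naive:.2f}")
print(f"Mendelian randomization estimate: {mr:.2f}")
```

Because the simulated genotype is independent of the confounder, dividing its association with the outcome by its association with the protein isolates the causal effect, whereas the naive regression mixes that effect with the confounding. The estimator relies on the standard instrumental-variable assumptions, including that the variant influences the outcome only through the protein.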
This project will create intelligent systems for personalized and precise risk prediction and diagnosis of non-communicable diseases using multi-omics data. To this end, we will develop, implement and validate novel algorithms for structure learning and inference in large-scale, multi-organ causal Bayesian gene networks, building on computational methods that we developed previously to infer, characterize and validate gene regulatory networks in complex diseases.
Risk prediction algorithms are used to identify high-risk patients for early intervention, in order to reduce premature mortality from non-communicable diseases. However, models that can integrate the tens of thousands of data points measurable by omics technologies in a single drop of blood have yet to be developed.
To distinguish causation from correlation in multi-omics data, we will develop non-parametric models for causal inference between tens of thousands of variables.
To link multi-omics causal networks from multiple organs to disease states, we will develop novel algorithms for large-scale Bayesian network structure learning.
To create intelligent systems for risk prediction and diagnosis of non-communicable diseases that use only blood-based measurements, we will develop efficient methods for inference in large-scale Bayesian networks.
To implement a proof-of-concept application in cardiovascular medicine, we will apply the newly developed methods to a unique resource of multi-omics data from more than twenty thousand Nordic individuals, to which the project will have privileged access.
This project is an international and interdisciplinary collaboration in bioinformatics, systems biology, computer engineering, machine learning and cardiovascular medicine. It will deliver a well-validated and scalable platform for creating intelligent systems for personalized and precise risk prediction and diagnosis of non-communicable diseases that identify high-risk individuals more accurately than existing methods.