IKTPLUSS-ICT and digital innovation

Function-driven Data Learning in High Dimension

Alternative title: Function-driven Data Learning in High Dimension

Awarded: NOK 7.2 million

Today, high-dimensional data play an increasingly important role in most data-driven analyses of real-world problems. This development is driven partly by technological advances, such as improved physical measurements and greater data storage capacity, and partly by complex multiphysics computer simulations. Analyses of such data can reveal complex interactions in nature, or in human-made environments such as Instagram or a Google search. At the same time, researchers struggle to extract useful information and to develop good predictive models. Data-driven modelling is a growing and challenging field of applied mathematics with enormous potential, especially when combined with other branches of science such as computer science, engineering, or biomedical computing. Motivated by the increased demand for robust predictive methods, in this project we developed advanced mathematical methods for robust automatic learning of functions and data structures in high dimension from a minimal number of observed samples. The approach we propose is to obtain lower-dimensional representations of the data and their geometry, so that the solution can be approximated with significantly reduced complexity under certain reasonable assumptions.

We designed novel and generic model-based approaches for learning functions in high dimensions from a minimal number of observed samples. We consider different classes of models, ranging from simple generalized linear models to complex neural networks. The resulting approaches are:
1. statistically efficient,
2. computationally efficient,
3. theoretically sound, and
4. numerically viable.
Our results are presented in one PhD thesis, 12 research articles, 4 refereed conference proceedings, 3 book chapters, and more than 30 scientific presentations. The FunDaHd results have the potential to impact several scientific and technological disciplines. Our results provide a solid foundation and tools for obtaining a better mathematical understanding of neural networks. Beyond fundamental mathematical investigations, our goal is to apply our estimators to various types of big data applications, such as cardiac modeling and the analysis of brain activity.

Technological advances in physical measurements, computer simulations, and storage capabilities have led to vast amounts of often highly complex data sets, and these continue to grow rapidly. Despite substantial R&D on tools for complex data analysis, in many cases our understanding of how to extract useful information and predictive models remains rather limited. Motivated by the increased demand for robust predictive methods, in this project we develop analysis techniques and numerical methods for tractable and robust automatic learning of functions and data structures in high dimension from a minimal number of observed samples. The approach we propose is to obtain lower-dimensional representations of the data and their geometry, and to perform the approximation with significantly reduced complexity, assuming that the data cluster around manifolds. Our key innovative assumption is that the underlying manifold not only has low dimension, but also has tangent spaces spanned by relatively sparse principal directions. Additionally, we let the function acting on the data guide the learning of the manifold. Hence, our approach differs from established manifold-learning methods and establishes a novel connection between manifold learning and function learning in high dimension, leading to more robust algorithms under less restrictive assumptions. Finally, as the most ambitious part of the project, we will address learning from high-dimensional multi-manifold data. To demonstrate the performance and robustness of the constructed algorithms, a variety of experiments and numerical tests will be performed throughout the project with real and synthetic data. We will also address several problems in computational biomedicine, such as cardiac modeling, and in bioinformatics, such as gene expression analysis. This project will also help strengthen Simula's profile in big data analysis.
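To make the idea of "function-guided" low-dimensional structure with sparse principal directions more concrete, the sketch below uses a standard textbook technique: the gradient outer-product (active-subspace) estimator. It is not the FunDaHd algorithm. It assumes a hypothetical function f(x) = g(Ax) on R^100 that depends only on k = 2 sparse directions (the rows of the hypothetical matrix A). Gradients of f are approximated by central finite differences at random points. The top-k eigenvectors of their averaged outer product then recover the span of those directions, and that span is supported on only a handful of coordinates. Note that it spends 2d function evaluations per point, far more than a sample-efficient method would aim for.

```python
import numpy as np


def estimate_active_subspace(f, d, k, n_points=50, h=1e-5, seed=0):
    """Estimate the k-dimensional subspace along which f varies.

    Standard gradient outer-product estimator (illustrative only):
    C = mean_i grad f(x_i) grad f(x_i)^T, with gradients from central
    finite differences at Gaussian random points x_i in R^d.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_points, d))
    C = np.zeros((d, d))
    eye = np.eye(d)
    for x in X:
        # Central finite-difference gradient: 2*d evaluations of f per point.
        g = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in eye])
        C += np.outer(g, g)
    C /= n_points
    # eigh returns eigenvalues in ascending order; keep the top-k eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(C)
    return eigvecs[:, -k:], eigvals[::-1]


# Hypothetical example: f depends on x in R^100 only through two sparse,
# orthonormal directions (the rows of A), supported on coordinates 3, 17, 42, 57.
d, k = 100, 2
A = np.zeros((k, d))
A[0, [3, 17]] = [0.6, 0.8]
A[1, [42, 57]] = [0.8, -0.6]


def f(x):
    return np.sin(A[0] @ x) + (A[1] @ x) ** 2


U, spectrum = estimate_active_subspace(f, d, k)
# U spans (numerically) the same subspace as the rows of A, and its rows
# outside the four support coordinates are (near) zero.
```

Comparing the projector U @ U.T with A.T @ A measures how well the subspace is recovered. The spectrum drops sharply after the first k eigenvalues, which is one simple way to detect the intrinsic dimension in this idealized setting.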

Budget purpose:

IKTPLUSS-ICT and digital innovation