IKTPLUSS-IKT og digital innovasjon

Next Generation Kernel-Based Machine Learning for Big Missing Data Applied to Earth Observation

Alternative title: Neste Generasjon Maskinlæring for Store Manglende Data Anvendt mot Jordobservasjon

Awarded: NOK 8.3 mill.

In today's society, data is gathered at an incredible speed, mainly because of massive sensor-based monitoring and logging of processes combined with abundant, inexpensive storage. Machine learning is the state-of-the-art scientific field for revealing patterns in big data and making data-driven decisions. It forms the backbone of technologies such as face detection in digital cameras, recommender systems (Amazon, Facebook, etc.), machine translation, and speech recognition, and it is extremely important in areas like health and medicine, neuroscience, and satellite-based monitoring. However, low-quality data in the form of incomplete recordings, known as missing data, severely limits the power of machine learning algorithms and frequently leads to inferior decision-making. This project shall develop the next generation of machine learning algorithms, with the power to handle missing data. This will advance decision-making from big data and the field of information and communication technologies.

The project has resulted in a range of publications. One to highlight is the paper "Time series cluster kernel for learning similarities between multivariate time series with missing data" by Mikalsen, Jenssen et al., published in the journal Pattern Recognition in 2018. The method is generic, built on so-called "kernel methods", applicable to all types of multivariate time series data, and handles missing data more efficiently than traditional techniques. Another is the article "Urban Land Cover Classification With Missing Data Modalities Using Deep Convolutional Neural Networks" by Kampffmeyer, Salberg, and Jenssen, published in the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing in 2018. This work develops deep learning methods for monitoring the Earth's surface using satellite images in which some of the spectral bands are completely or partially missing.

This type of research has been further extended, for instance in the paper "A Comparison of Deep Learning Architectures for Semantic Mapping of Very High Resolution Images" by Liu, Salberg, and Jenssen at the International Geoscience and Remote Sensing Symposium (IGARSS) 2019. These works build on a range of other publications within the project and have pushed the research frontier forward.

This basic research project in ICT has led to a range of publications and to the development of innovative, generic machine learning methodology within so-called kernel methods and deep neural networks for handling missing data. The impact of the results has been to move the research frontier in machine learning and to demonstrate the usefulness of such methodology in Earth observation, for example for mapping the Earth's surface in cases where data is missing. Further effects have been the establishment of new and close collaboration between the machine learning group at UiT and Norsk regnesentral (the Norwegian Computing Center), in addition to extensive international collaboration with leading research groups, for example in the USA. The project has funded several visiting researchers from Germany, Spain, England, and the USA, which has led to substantial competence building. The project has co-funded the new Northern Lights Deep Learning Workshop (nldl.org), an international workshop that brings together large parts of the Norwegian machine learning community.

Missing data is a key problem in numerous scientific fields, severely degrading the power of learning and inference algorithms and frequently causing erroneous decision-making and loss of accuracy. In the emerging era of big data, gaps in data increase exponentially and manual handling is impossible, creating a big missing data problem. As a key solution, the KERNEO project will develop the next generation of machine learning algorithms for big missing data as a game-changer in future knowledge extraction. This will be achieved by a highly novel approach in which the versatility of kernel methods is combined with the probabilistic nature of information theoretic learning for big data latent (missing) variable analysis. Earth observation (EO), a field where missing data is extremely common, for example due to clouds, and where data is big, will serve as the test bed for KERNEO, focusing in particular on tropical forest monitoring using the coming Sentinel satellites. In EO, ad hoc solutions, like simply discarding missing values, are implemented in the analysis to handle cloud-contaminated images, thereby ignoring valuable information. The KERNEO next generation missing data machine learning tools, providing superior knowledge extraction in challenging missing data scenarios, will be highly innovative in EO monitoring and will moreover translate to scientific fields far beyond EO. KERNEO is high risk because of the profound challenges and interdisciplinary nature of the endeavor, yet feasible due to the high quality of the PI and the team, the extensive mobility, and the unique network of researchers in kernel-based machine learning, statistics, computer science, and EO, creating the synergy effects needed to reach the ambitious project objectives.

Publications from Cristin

No publications found

Funding scheme:

IKTPLUSS-IKT og digital innovasjon