
IKTPLUSS-IKT og digital innovasjon

Advanced Analytics on Smart Data

Alternative title: Avansert analyse av smarte data

Awarded: NOK 15.5 mill.

In STEM (Science, Technology, Engineering and Mathematics), the availability of data and computational resources - from manual calculations to today's supercomputers - has always been a restricting factor in developing and investigating new theories. In recent decades, increases in processing power and decreasing costs of data storage have created many new opportunities; they have, for example, made the recent advances in Big Data Analytics and Artificial Intelligence (AI) possible. But they have also created new challenges, from the logistics of managing huge datasets and the classical "needle in a haystack" problem, to creating workflows and value chains for putting the data and analytics to practical use.

In ANALYST we focus on geometric data, that is, data with a spatial structure. The raw data are point clouds, either from sonar scans of the seabed in the use case with the Norwegian Hydrographic Service (NHS), or from volumetric CT and MR scans in the use case with the Intervention Centre at Oslo University Hospital (IVS). Using state-of-the-art geometric techniques such as Locally Refined (LR) B-splines, we create geometric models of the data that expose the underlying structures and information in the point clouds, whether they represent the surface of a heart or an underwater mountain range. The model is then used for further analytics, extracting relevant features of the shape.

In collaboration with the IVS, we are investigating how machine learning can be combined with geometry in image segmentation: how can we identify a specific organ or anatomical feature in the greyscale images produced by CT and MR scans? This is traditionally a labour-intensive manual task, so using machine learning to simplify the process is an active research field. What distinguishes the work done in ANALYST is that we integrate the geometric aspect of the problem throughout the machine learning architecture, not only using geometric objects or images as input and output, but adapting and optimizing the neural network itself. A central part of this work is to describe geometric objects such as boundary curves and surfaces by smooth geometry defined by a small set of coefficients, instead of by a set of discrete points. This can be both more accurate and more efficient, but it requires adaptation of existing AI frameworks, which are largely point or image based; a first sketch of this idea is given below.

One of the main responsibilities of the NHS is the production of nautical charts along the Norwegian coast. They collect huge datasets, using for example sonar towed behind boats, or airborne lidar in shallow waters. We have investigated the use of LR B-splines for these types of geospatial data for over a decade; however, it is not yet a standard surface format in geographic information systems (GIS). One of the activities in ANALYST has been to demonstrate how an LR B-spline model can easily be exported to GeoTIFF at arbitrary resolution, ensuring interoperability with current GIS workflows (see the second sketch below). As large parts of our own geometry software libraries predate the invention of LR B-splines, we have also developed a lossless conversion to traditional tensor product B-spline surfaces. This makes high-quality, well-tested algorithms such as contouring and extremal point computation available for the models; the third sketch below illustrates the principle behind the conversion.
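To illustrate how a boundary curve can be represented by spline coefficients inside a neural network, here is a minimal sketch in Python. It is not the ANALYST implementation; the network, parameter names and sizes are hypothetical. The key observation is that the map from control points to sampled curve points is a fixed linear operator (the B-spline collocation matrix), so gradients flow through it unchanged.

```python
# Minimal sketch (hypothetical, not the ANALYST code): a network head that
# predicts B-spline control points for a 2D boundary curve. The collocation
# matrix B is fixed, so sampling the curve is a differentiable linear map.
import numpy as np
import torch
from scipy.interpolate import BSpline

DEGREE, N_CTRL, N_SAMPLES = 2, 12, 200

# Clamped uniform knot vector on [0, 1].
knots = np.concatenate([np.zeros(DEGREE),
                        np.linspace(0.0, 1.0, N_CTRL - DEGREE + 1),
                        np.ones(DEGREE)])
u = np.linspace(0.0, 1.0 - 1e-9, N_SAMPLES)
# Collocation matrix: curve_points = B @ control_points.
B = torch.tensor(BSpline.design_matrix(u, knots, DEGREE).toarray(),
                 dtype=torch.float32)

class SplineHead(torch.nn.Module):
    """Maps an image feature vector to N_CTRL control points in 2D."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.fc = torch.nn.Linear(feat_dim, N_CTRL * 2)

    def forward(self, feats):
        ctrl = self.fc(feats).reshape(-1, N_CTRL, 2)  # control points
        return B @ ctrl                               # sampled curve points

head = SplineHead()
feats = torch.randn(4, 128)             # stand-in for CNN image features
target = torch.randn(4, N_SAMPLES, 2)   # stand-in for boundary samples
loss = torch.nn.functional.mse_loss(head(feats), target)
loss.backward()                          # gradients reach the control points
```

Because the curve is defined by only N_CTRL coefficient pairs, the output is smooth and compact by construction, in contrast to a per-pixel mask.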
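The GeoTIFF export amounts to sampling the continuous surface on a regular grid and writing the raster with georeferencing. A minimal sketch using the rasterio library, where `surface.evaluate` is a hypothetical stand-in for LR B-spline evaluation and the coordinate reference system is an assumed example:

```python
# Sketch of exporting a spline surface to GeoTIFF at a chosen resolution.
# `surface.evaluate` is a hypothetical stand-in for LR B-spline evaluation.
import numpy as np
import rasterio
from rasterio.transform import from_origin

def export_geotiff(surface, path, x_min, y_max, nx, ny, cell_size,
                   crs="EPSG:25833"):                 # assumed CRS (UTM 33N)
    xs = x_min + cell_size * (np.arange(nx) + 0.5)    # cell-centre coordinates
    ys = y_max - cell_size * (np.arange(ny) + 0.5)
    xx, yy = np.meshgrid(xs, ys)
    depth = surface.evaluate(xx, yy).astype(np.float32)  # sample the surface
    transform = from_origin(x_min, y_max, cell_size, cell_size)
    with rasterio.open(path, "w", driver="GTiff", height=ny, width=nx,
                       count=1, dtype="float32", crs=crs,
                       transform=transform) as dst:
        dst.write(depth, 1)

# Any resolution can be chosen at export time, e.g. a 1 m grid:
# export_geotiff(surface, "seabed_1m.tif", 0.0, 1000.0, 1000, 1000, 1.0)
```

Since the spline model is a continuous function, the raster resolution is a free choice at export time rather than a property of the stored data.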
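The lossless conversion to tensor product form rests on a classical spline property: inserting knots changes the representation but not the function. The following sketch demonstrates this principle on a univariate curve with scipy; the actual LR-to-tensor-product conversion works on the union of the local knot vectors in each parameter direction and is considerably more involved.

```python
# Knot insertion is exact: the refined spline describes the same function.
# This 1D demo illustrates the principle behind the LR -> tensor product
# conversion; the surface case inserts knots in both parameter directions.
import numpy as np
from scipy.interpolate import insert, splev

k = 2                                                # quadratic spline
t = np.array([0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1], dtype=float)
c = np.array([0.0, 1.0, 3.0, 2.0, 4.0, 1.0])         # 6 coefficients
tck = (t, c, k)

tck_fine = insert(0.4, tck)                          # insert a knot at u = 0.4
tck_fine = insert(0.6, tck_fine)                     # ... and one at u = 0.6

u = np.linspace(0, 1, 1000)
assert np.allclose(splev(u, tck), splev(u, tck_fine))  # same curve, more knots
```

For an LR B-spline surface the same argument applies direction by direction: a tensor product surface built on the union of all local knot vectors reproduces the LR surface exactly, at the price of more coefficients.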
One clear advantage of our approach - performing the analytics directly on the surface description of the terrain - is that all extracted features lie exactly on the surface. This can otherwise be a challenge when, for example, depth curves are created separately from the 3D surface they are visualised with.

In addition to these use cases, ANALYST also aims to improve both the theory behind and the practical use of LR B-splines. The main advantage of LR B-splines over tensor product structures is the ability to adapt the level of detail of the model to the variation in the underlying data: a plain can be described by a much less complex surface than a mountain range. To obtain this local refinement, we develop refinement strategies that iteratively add degrees of freedom where the data require them; the loop is sketched below. Our preliminary investigations have shown that not all such strategies are created equal: some provide a good balance between model quality and computational efficiency in some cases but not in others, while others are stable across several different datasets. The polynomial degree of the LR B-spline model is an important factor for its flexibility and smoothness. However, when the underlying data are not smooth, as for the rocky parts of the seabed, this flexibility is not needed. Our investigations in ANALYST have shown that degree 2 models are more suitable for geospatial data than either degree 1 or degree 3.
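The refinement strategies mentioned above follow a common fit-estimate-refine pattern. A minimal sketch, where `fit_least_squares`, `cell_errors` and `refine_cells` are hypothetical stand-ins for the LR B-spline machinery:

```python
# Sketch of the adaptive refinement loop: fit, locate cells with large
# residuals, add degrees of freedom there, repeat. The three helper
# functions are hypothetical stand-ins for LR B-spline operations.
def adaptive_fit(points, mesh, degree=2, tol=0.1, max_iter=10):
    for _ in range(max_iter):
        surface = fit_least_squares(points, mesh, degree)
        errors = cell_errors(surface, points)         # max residual per cell
        bad = [cell for cell, err in errors.items() if err > tol]
        if not bad:                                   # all cells within tolerance
            return surface
        mesh = refine_cells(mesh, bad)                # refine locally only
    return surface
```

Which cells to refine, and how the new knot segments are laid out, is precisely what a refinement strategy decides, and it is here that the strategies differ in quality, efficiency and stability.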

Over the last ten years, a new field of research has emerged: Big Data Analytics (BDA). Massively distributed systems offer new tools for exploring large datasets. In parallel, a steady increase in computing power and available training data has enabled the field of Artificial Intelligence (AI) to gain critical mass. The challenge of managing, investigating, and visualizing big datasets is not new in the fields of Science, Technology, Engineering and Mathematics (STEM). Recent developments in BDA offer a new set of tools for overcoming these challenges; however, significant difficulties arise from the structural differences between most STEM data and the unstructured textual data typical of classical Big Data applications. To address these challenges, we propose using Locally Refined (LR) spline data modelling to turn Big Data into Smart Data. In early implementations of LR-spline algorithms in 2D and 3D, we have seen their potential as compact interactive models for visual and quantitative analytics on big datasets, well suited for hardware-accelerated interrogation and visualization. Spatial tiling and stitching make the approach extremely versatile and parallelizable, and thus well suited for Big Data infrastructures; a sketch follows below. We can therefore include time and other relevant variables in a compact, interactive, multi-scale, higher-order, locally refined model. However, substantial theoretical developments are needed before this vision of LR-spline modelling of Big Data can be realized. ANALYST will provide the research platform to bring the theoretical foundation of LR-splines up to a level where their full potential can be explored, combining BDA and AI to provide advanced analytics on the LR-spline model. While the focus will be on data from applications in the STEM fields, the resulting algorithms have wider applicability, providing highly scalable complex modelling tools for Big Data Analytics.
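The tiling idea can be sketched with standard Python tooling. Each spatial tile of the point cloud is fitted independently in a process pool; `fit_tile` is a hypothetical stand-in for an LR-spline fit, and the stitching step (merging tile models with continuity across tile borders) is omitted here:

```python
# Sketch of tile-parallel fitting: split the point cloud into spatial tiles
# (with a small overlap so neighbouring models agree near the seams) and fit
# the tiles independently. fit_tile is a hypothetical LR-spline fit.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def split_into_tiles(points, tile_size, overlap):
    """points: (n, 3) array of x, y, z. Yields one point subset per tile."""
    x0, y0 = points[:, 0].min(), points[:, 1].min()
    nx = int(np.ceil((points[:, 0].max() - x0) / tile_size))
    ny = int(np.ceil((points[:, 1].max() - y0) / tile_size))
    for i in range(nx):
        for j in range(ny):
            lo = (x0 + i * tile_size - overlap, y0 + j * tile_size - overlap)
            hi = (lo[0] + tile_size + 2 * overlap,
                  lo[1] + tile_size + 2 * overlap)
            mask = ((points[:, 0] >= lo[0]) & (points[:, 0] < hi[0]) &
                    (points[:, 1] >= lo[1]) & (points[:, 1] < hi[1]))
            if mask.any():
                yield points[mask]

def fit_all(points, tile_size=1000.0, overlap=50.0):
    tiles = list(split_into_tiles(points, tile_size, overlap))
    with ProcessPoolExecutor() as pool:               # tiles fit in parallel
        return list(pool.map(fit_tile, tiles))        # one model per tile
```

Because the tiles are independent, the same pattern maps directly onto distributed Big Data infrastructures, with the stitching done as a separate merge pass.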

Publications from Cristin

No publications found
