
IKTPLUSS-IKT og digital innovasjon

Reduced subspace in big data treatment: A new paradigm for efficient geophysical Data Assimilation

Alternative title: REDDA: Redusert-rom behandling av store data: et nytt paradigme for effektiv geofysisk data assimilasjon

Awarded: NOK 9.3 mill.

Project Number:

250711

Application Type:

Project Period:

2016 - 2020

Location:

Partner countries:

The term Big Data refers to the analysis and synthesis of high-volume data. The ability to make predictions from huge datasets and to extract the relevant information quickly is paramount in many fields, including the environment, medicine and security. Climate science is a major Big Data problem: every day hundreds of millions of observations of the Earth are collected, either from satellites or in situ, and are then used to calibrate complex numerical models that predict the evolution of the climate system. This merging process is called data assimilation and relies heavily on supercomputers. The assimilation must be done as accurately as possible, so that climate phenomena are described and predicted realistically, but also very quickly. Unfortunately, the present trend of increasing computational power will not suffice on its own, since the complexity of the problem grows commensurately with both the data volume and the model size. Smart solutions for efficient Big Data treatment are thus required, and REDDA aims to contribute to their development. Its main idea derives from concepts in chaos theory, such as the recurrent behaviour of certain natural systems and the existence of a subset of observations carrying the largest informational content. Is it possible, despite the large size of the problem, to achieve a satisfactory description of the climate system at lower cost by monitoring and using only this reduced, optimally chosen subset of data? REDDA aims to answer this question using an interdisciplinary approach across mathematics and the geosciences. A theoretical research line devoted to the development of new methods runs in parallel with a practical application to sea ice forecasting. A third line of research investigates the synergies between data assimilation and machine learning methods.
REDDA will study how best to combine all sea ice data available from buoys and satellites with the latest generation of sea ice computer models, which describe sea ice with unprecedented realism.

REDDA has improved the understanding of ensemble-based data assimilation methods through an analysis of how dimensional reduction, numerical simulations and bias affect the uncertainty quantification of the forecast. This will allow users of data assimilation to make better choices of the ensemble size. REDDA has also introduced a new method for assimilating observations into a model that uses Lagrangian coordinates, and has implemented an Ensemble Kalman Filter in the neXt generation Sea Ice Model (neXtSIM), which can be further used both for climate reconstructions and for operational forecasts at NERSC. Finally, REDDA has introduced a synergetic method that uses both data assimilation and machine learning to reconstruct a data-driven model and perform skillful predictions with it.
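To make the ensemble-based setting concrete, the following is a minimal sketch of one analysis step of the textbook stochastic (perturbed-observation) Ensemble Kalman Filter, written with NumPy. It is not the neXtSIM implementation, which this summary does not describe. The function name `enkf_analysis`, its arguments, and the assumption of a linear observation operator are all hypothetical choices made for illustration.

```python
import numpy as np


def enkf_analysis(ensemble, obs, obs_operator, obs_cov, rng):
    """One stochastic EnKF analysis step (illustrative sketch only).

    ensemble     : (n_state, n_members) forecast ensemble
    obs          : (n_obs,) observation vector
    obs_operator : (n_obs, n_state) linear observation operator H (assumed linear)
    obs_cov      : (n_obs, n_obs) observation error covariance R
    rng          : numpy.random.Generator used to perturb the observations
    """
    n_members = ensemble.shape[1]

    # Normalised anomalies X, so the sample covariance is P = X X^T.
    mean = ensemble.mean(axis=1, keepdims=True)
    anomalies = (ensemble - mean) / np.sqrt(n_members - 1)

    # Anomalies mapped to observation space, Y = H X.
    obs_anomalies = obs_operator @ anomalies

    # Innovation covariance S = H P H^T + R.
    innov_cov = obs_anomalies @ obs_anomalies.T + obs_cov

    # Kalman gain K = X Y^T S^{-1}; S is symmetric, so solve instead of inverting.
    gain = anomalies @ np.linalg.solve(innov_cov, obs_anomalies).T

    # Each member assimilates its own noisy copy of the observations.
    noise = rng.multivariate_normal(np.zeros(obs.size), obs_cov, size=n_members).T
    perturbed_obs = obs[:, None] + noise

    return ensemble + gain @ (perturbed_obs - obs_operator @ ensemble)
```

Because the gain is built from the ensemble anomalies, every correction lies in the subspace they span, whose dimension is at most the ensemble size minus one. This is one reason the choice of ensemble size matters in practice.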

The analysis and synthesis of high-volume data is paramount in many real-world problems. The issue of estimating the state of a system from a large, but limited, set of information is ubiquitous in science, and the methods designed for this purpose in the geosciences are referred to as data assimilation (DA). Geophysical DA is an exemplar of a Big Data problem: models have O(10^9) variables and the observational datasets are as large as O(10^8). Computationally efficient state estimation and uncertainty quantification must be carried out using massive datasets and huge dynamical models. Increasing computational power alone will not suffice to solve the issue, since the problem's complexity will grow commensurately with both the data volume and the model size, making timely and continual development of advanced DA procedures necessary. REDDA will study novel Big Data methods capable of efficiently treating a huge amount of data while extracting as much information as possible. REDDA is an interdisciplinary project between geoscientists and mathematicians, with two research lines that originate in climate science but will be investigated from a mathematical perspective: (1) reduced-order, fully Bayesian DA methods for nonlinear systems, and (2) DA methods for Lagrangian sea-ice models. The fundamental driving idea is the existence of a subspace of the system dynamics, and a subset of the observations, in which most of the information content of the signal to be retrieved is embedded. Despite the intractably large size of the full problem, by monitoring and exploiting this subspace one can hope to track the unknown signal satisfactorily while reducing the computational load. In the context of sea-ice modeling, there is an urgent need to blend all available data with state-of-the-art Lagrangian models. REDDA will study novel methods of fully Lagrangian DA and apply them to the new Lagrangian sea-ice model developed at NERSC.

Publications from Cristin

No publications found


Funding scheme:

IKTPLUSS-IKT og digital innovasjon