MAELSTROM is a European project that applies machine learning (ML) to large datasets in weather forecasting and climate science. Forecasts are improved by optimising high-performance computing architecture and by developing software that can handle the large data volumes.
MAELSTROM combines expertise in ML, high-performance computing and weather forecasting from a number of institutions across Europe: the Norwegian Meteorological Institute (MET Norway), the European Centre for Medium-Range Weather Forecasts, Jülich Supercomputing Centre, 4cast GmbH & Co. KG, E4 Computer Engineering SpA, ETH Zürich and the University of Luxembourg. This interdisciplinary team represents weather forecasting centres, high-performance computing providers and ML research communities.
MAELSTROM provides a set of benchmark datasets (roughly 10 terabytes) for a range of weather and climate applications. These include the processing of observations, the assimilation of observations into weather forecasting models, the correction of weather forecasts, and other tailored forecast products. The datasets give ML research communities the opportunity to develop, test and validate new methods on relevant and realistic data. High-performance computing providers will be able to use the datasets to benchmark the performance of new hardware and of changes in system design.
Within the project, MET Norway has developed and trained a machine learning model that predicts temperature 58 hours ahead over the Nordic region. The model is based on a U-Net architecture with over 1 million parameters and was trained on 6 TB of training data. Its accuracy eventually became good enough that the method was put into use on Yr (https://www.yr.no) in February 2024, where it now serves about 6 million unique weekly users. The data are also freely available and are used by many organisations in energy, transport and agriculture.
In the MAELSTROM project, we have developed a machine learning model for accurately predicting air temperature 58 hours into the future over the Nordics. The model has been integrated into MET Norway's operational forecasting system, and its forecasts are disseminated through our weather app Yr. On a weekly basis, over 6 million Yr visitors benefit from improved weather forecasts provided by MAELSTROM. The data is also provided through our open data service, which serves a wide range of downstream user groups within energy, transportation, and agriculture. The project therefore has an immediate and widespread impact on society and the economy.
The field of weather forecasting has changed dramatically over the last two years with the introduction of data-driven weather models, which have recently become highly competitive with conventional physics-based weather models. MAELSTROM has built important competency in machine learning and high-performance computing at MET Norway, which has allowed us to take part quickly in this rapid development. MAELSTROM has therefore put us in a position where we can be effective in delivering improved forecast products to society.
To develop Europe’s computer architecture of the future, MAELSTROM will co-design bespoke compute system designs for optimal application performance and energy efficiency, a software framework to optimise usability and training efficiency for machine learning at scale, and large-scale machine learning applications for the domain of weather and climate science.
The MAELSTROM compute system designs will benchmark the applications across a range of computing systems regarding energy consumption, time-to-solution, numerical precision and solution accuracy. Customised compute systems, optimised for application needs, will be designed to strengthen Europe’s high-performance computing portfolio and to pull recent hardware developments, driven by general machine learning applications, toward the needs of weather and climate applications.
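As a rough illustration of two of the benchmark dimensions named above, time-to-solution and numerical precision, the following minimal Python sketch times a stand-in workload and compares a reduced-precision result against a 64-bit reference. Everything in it is an assumption made for illustration: the `smooth` workload, the synthetic temperature field and the helper names are hypothetical and are not part of the MAELSTROM tools. Reduced precision is emulated in software, so the timings only show how the measurement is taken and say nothing about real hardware. Energy consumption is left out because it requires hardware counters.

```python
import struct
import time


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def smooth(field, as_float32=False):
    """Hypothetical stand-in workload: 3-point moving average of a 1-D field."""
    cast = to_float32 if as_float32 else float
    values = [cast(v) for v in field]
    out = []
    for i in range(1, len(values) - 1):
        # Round every intermediate result to the chosen precision.
        out.append(cast((values[i - 1] + values[i] + values[i + 1]) / 3.0))
    return out


def benchmark(field, as_float32):
    """Return (time-to-solution in seconds, result) for a single run."""
    start = time.perf_counter()
    result = smooth(field, as_float32=as_float32)
    return time.perf_counter() - start, result


def max_abs_error(result, reference):
    """Largest absolute deviation of a result from the reference solution."""
    return max(abs(a - b) for a, b in zip(result, reference))


# Synthetic temperature-like field in kelvin (illustrative values only).
field = [273.15 + 0.01 * i for i in range(10_000)]

t64, reference = benchmark(field, as_float32=False)
t32, reduced = benchmark(field, as_float32=True)
error = max_abs_error(reduced, reference)

print(f"float64: {t64:.4f} s, emulated float32: {t32:.4f} s, "
      f"max abs error: {error:.2e} K")
```

A real benchmark would repeat each run several times, use the actual application and its own accuracy metric, and record energy use on the target system.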
The MAELSTROM software framework will enable scientists to apply and compare machine learning tools and libraries efficiently across a wide range of computer systems. A user interface will link application developers with compute system designers, and automated benchmarking and error detection of machine learning solutions will be performed during the development phase. Tools will be published as open source.
The MAELSTROM machine learning applications will cover all important components of the workflow of weather and climate predictions, including the processing of observations, the assimilation of observations to generate initial and reference conditions, model simulations, as well as post-processing of model data and the development of forecast products. For each application, benchmark datasets with up to 10 terabytes of data will be published online for training and for machine learning tool development at the scale of the fastest supercomputers in the world. MAELSTROM machine learning solutions will serve as a blueprint for a wide range of machine learning applications on supercomputers in the future.