IKTPLUSS-IKT og digital innovasjon

Safe Reinforcement Learning using Model Predictive Control

Alternative title: Sikker forsterkningslæring ved hjelp av modellprediktiv kontroll

Awarded: NOK 14.8 mill.

Reinforcement Learning (RL) is a form of Artificial Intelligence (AI) that deals with the problem of making decisions with respect to a given goal. Popular examples of RL successes are robots learning to walk or fly by themselves, or computers beating human masters at the games of chess and Go. The decisions taken by an RL-based AI are most often supported by Artificial Neural Networks (ANNs). ANNs are in silico structures that mimic the functioning of biological neural systems, including the capability to learn from experience. RL provides a wide set of methods for teaching ANNs how to make better decisions for a given goal. Recognized deficiencies of ANN-based decision-making are the lack of transparency of the decision process, the difficulty of designing an ANN suited to learning a given task, and the difficulty of certifying that the ANN will take safe decisions. For these reasons, while RL is based on very solid mathematical foundations, its implementation using ANNs is sometimes compared to alchemy, and industry is often reluctant to use it when liabilities are at stake.

The SARLEM project is developing a novel approach to implementing RL methods. In classical decision-making theory, decisions are made by formally and explicitly assessing their effects on the future. The benefit is that the decision-making is fairly transparent, easy to design, and certifiable for safety. Our research shows that classical decision-making tools can be combined with Reinforcement Learning methods, creating a new form of AI that combines the best of both worlds: the capability to learn from experience and the ability to make transparent and safe decisions. The SARLEM project is bringing this idea to maturity, tackling the main technical hurdles to its wide-scale deployment, and demonstrating its potential in practice. Examples of results obtained so far include a strong theoretical basis for the project approach, methods delivering safety certificates for RL, new concepts of stability for uncertain systems, and new methods for the optimal control of smart buildings and of autonomous robots in uncertain environments.
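To make the combination concrete, here is a minimal, self-contained Python sketch of the general idea: a parameterized MPC controller serves as the decision-making policy, and an RL-style update tunes its parameters from closed-loop performance. The toy one-dimensional system, the cost weights `theta`, and the finite-difference update (standing in for the Q-learning and policy-gradient machinery used in MPC-based RL) are all illustrative assumptions, not the project's actual code.

```python
# Sketch: an MPC controller as the RL policy, with its parameters tuned
# from closed-loop data. The toy 1-D system and all names are illustrative.
import numpy as np
from scipy.optimize import minimize

A_TRUE, B_TRUE = 1.1, 0.5    # true (unknown) plant dynamics
A_MODEL, B_MODEL = 1.0, 0.5  # imperfect model used inside the MPC
HORIZON = 10

def mpc_action(x0, theta):
    """Solve a finite-horizon optimal control problem; return the first
    input. theta parameterizes the MPC cost (state and input weights)."""
    qx, qu = np.exp(theta)   # exponentials keep the weights positive

    def cost(u_seq):
        x, c = x0, 0.0
        for u in u_seq:
            c += qx * x**2 + qu * u**2
            x = A_MODEL * x + B_MODEL * u  # model-predicted dynamics
        return c + qx * x**2               # terminal cost

    return minimize(cost, np.zeros(HORIZON), method="BFGS").x[0]

def closed_loop_cost(theta, steps=30, seed=0):
    """Roll the MPC policy out on the *true* system and measure the real
    cost. A fixed seed (common random numbers) keeps evaluations comparable."""
    rng = np.random.default_rng(seed)
    x, total = 1.0, 0.0
    for _ in range(steps):
        u = mpc_action(x, theta)
        total += x**2 + 0.1 * u**2         # true stage cost
        x = A_TRUE * x + B_TRUE * u + 0.01 * rng.standard_normal()
    return total

# RL-style tuning: finite-difference gradient descent on the closed-loop
# cost, standing in for Q-learning / policy-gradient updates.
theta = np.zeros(2)
for it in range(10):
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = 0.1
        grad[i] = (closed_loop_cost(theta + e) - closed_loop_cost(theta - e)) / 0.2
    theta -= 0.05 * grad
    print(f"iter {it}: theta={theta.round(3)}, cost={closed_loop_cost(theta):.2f}")
```

The point of the construction is that the learned object is a small set of interpretable MPC parameters rather than a black-box network, which is what makes the resulting policy amenable to formal analysis.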

AI is increasingly deployed in industry. It is currently used for applications where decisions are not safety-critical or where human operators can vet the decisions before their deployment. Industries dealing with cyber-physical systems evolving in complex environments could benefit substantially from AI tools that learn to improve the decision process using previously collected data. High-tech companies commonly use Model Predictive Control (MPC) to handle control and decision problems involving safety requirements, and have started using AI tools for cyber-physical systems at the Research and Development level. Reinforcement Learning (RL), a subfield of AI capable of learning to make optimal decisions for cyber-physical systems, is a very common choice. Unfortunately, deploying RL is problematic whenever safety requirements and liabilities are at stake. Industries want to understand, and obtain safety certificates for, the automated decisions driving their products, and this is difficult to achieve with existing RL methods. Deploying RL tools in systems with safety requirements is therefore a major difficulty. Some companies involved in Autonomous Driving use ad hoc heuristics to work around the problem, but a genuine solution is still missing.

This project will merge theoretical results from RL with advanced, formal control methods from the field of MPC to create a novel form of AI for cyber-physical systems whose decisions can be explained and certified for safety. The proposed research requires a unique combination of in-depth knowledge of both RL and MPC, which few groups possess. NTNU is currently in a strong position to carry this research forward. The project will be integrated within the AMOS center and the Open AI Lab at NTNU, which offer unique expertise in safety for autonomous systems and AI. The companies DNV GL and Kongsberg Maritime will be fully active project partners.
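The following hypothetical fragment, continuing the toy setting sketched above, illustrates why the MPC layer makes learned decisions certifiable: a hard state bound is enforced by the constrained optimizer for any cost weights the learning component proposes, so constraint satisfaction does not hinge on what was learned. The model, the bound, and the reference value are assumptions for illustration only.

```python
# Sketch: hard constraints inside the MPC hold regardless of the learned
# cost weights. Toy 1-D model; the bound and reference are illustrative.
import numpy as np
from scipy.optimize import minimize

A, B = 1.0, 0.5            # model used inside the MPC
HORIZON, X_MAX = 10, 1.5   # prediction horizon and a hard state bound

def predict(x0, u_seq):
    """Predicted state trajectory under the model."""
    xs = [x0]
    for u in u_seq:
        xs.append(A * xs[-1] + B * u)
    return np.array(xs)

def safe_mpc_action(x0, x_ref, qx, qu):
    """MPC policy: whatever weights (qx, qu) the learning layer proposes,
    every predicted state is constrained to stay below X_MAX."""
    def cost(u_seq):
        return qx * np.sum((predict(x0, u_seq) - x_ref) ** 2) \
             + qu * np.sum(u_seq ** 2)

    cons = {"type": "ineq",  # elementwise: X_MAX - x_k >= 0
            "fun": lambda u_seq: X_MAX - predict(x0, u_seq)}
    sol = minimize(cost, np.zeros(HORIZON), method="SLSQP", constraints=[cons])
    return sol.x[0], predict(x0, sol.x)

# The cost pulls the state toward x_ref = 3.0, *above* the bound, yet the
# optimizer never predicts a state beyond X_MAX = 1.5.
u0, traj = safe_mpc_action(0.5, x_ref=3.0, qx=1.0, qu=0.1)
print(f"first input: {u0:.3f}, max predicted state: {traj.max():.3f}")
```

In this reading, safety guarantees follow from the MPC formulation itself, while the RL layer only shapes performance within the constrained set.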

Funding scheme:

IKTPLUSS-IKT og digital innovasjon