IKTPLUSS-IKT og digital innovasjon

Adaptive Immunity for Software: Making Systems and Services Autonomously Self-Healing

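Alternative title (Norwegian): Adaptiv Immunitet for Programvare: Teknologi til å lage autonomt selvhelbredende systemer og tjenester (Adaptive Immunity for Software: Technology for Creating Autonomously Self-Healing Systems and Services)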

Awarded: NOK 16.8 million

Software has become a central part of nearly all economic activity, and our daily lives have become increasingly dependent on complex software-intensive systems and services. Consequently, failures in these systems can affect thousands or even millions of people and cause massive damage. Despite significant investments in software testing, much of our software is still plagued by failures. One reason is that existing software testing techniques mainly check that conditions corresponding to known or anticipated problems do not occur. However, the complexity of modern software makes it impossible to anticipate every problem that could be encountered. The main goal of the cureIT project is to significantly increase the dependability, robustness, and resilience of today's software systems by addressing the faults that remain after thorough testing. We do this by developing new methods and techniques that help software engineers create so-called self-healing software systems: systems that can autonomously detect unanticipated faults during execution, diagnose their causes, and recover from them. To achieve this goal, we build on the notion of an artificial immune system. Just as the human immune system recognizes and deals with unanticipated "foreign bodies" (infections), an artificial immune system recognizes and deals with unanticipated faults that could have negative effects. In particular, the project addresses the following challenges: (1) techniques that detect failures by learning the system's normal behavior and recognizing when the system behaves abnormally; (2) adaptive learning techniques that enable early recognition of failures similar to ones seen before; (3) cost-effective techniques to diagnose the root causes of a failure and to contain its impact, both inside and outside the system; and (4) techniques for systematically evaluating whether self-healing software functions correctly.
The results thus far include a research agenda for self-healing software systems based on artificial immune systems (AISs); a survey of the main approaches to modeling AISs, together with a prototype implementation of AIS-based anomaly detection; a self-healing smart office exemplar; and a method for systematically evaluating self-healing software systems using chaos engineering.

Software has become a central part of nearly all sectors of economic activity, and our daily lives have become increasingly dependent on complex software-intensive systems and services. Failures in these systems can affect thousands or even millions of people and cause massive damage. Despite significant investments in software verification and validation (V&V), the software industry is still plagued by failures. One reason is that conventional V&V can only target anticipated faults: it can only check that the conditions corresponding to known or expected problems do not occur. However, the complexity of modern software makes it impossible to anticipate every problem that could be encountered. The overall goal of this project is to devise novel methods and techniques for creating self-healing software-intensive systems, i.e., systems that support autonomous detection, diagnosis, and containment of unanticipated faults during execution, thereby significantly increasing their dependability, robustness, and resilience. We pursue this goal by building on the concept of an artificial immune system to achieve three scientific breakthroughs: (1) autonomic techniques that detect unanticipated faults by distinguishing between normal behavior and anomalies in runtime observational data; (2) adaptive learning techniques that make it easier to recognize faults similar to ones that have been seen before; and (3) cost-effective techniques to diagnose the root causes of a fault and to contain its impact, both inside and outside the system. Timeliness: Recent advances in machine learning, together with the PI's new results on automatically learning patterns in high-volume data and generalizing them using rule aggregation [23 in project description], make now the right time to start this research. Software failures must be addressed, and until recently the global state of the art was not mature enough to support this ambitious research undertaking.

Funding scheme:

IKTPLUSS-IKT og digital innovasjon