BIA-Brukerstyrt innovasjonsarena

PReVENT: PRediction + eVENT

Alternative title: PReVENT: PRediction + eVENT

Awarded: NOK 11.2 mill.

Project Manager:

Project Number:

321536

Project Period:

2021 - 2024

Funding received from:

Organisation:

Predicting service degradation will allow companies to prevent the disruption of their digital business. The vision of this project was to develop an automatic and generalised anomaly detection and prediction solution with root cause detection by combining analysis of numerical time series and text-based logs. A novel algorithm for this use case was developed, and while the results for anomaly prediction are still inconclusive, the algorithm exhibits interesting results for anomaly detection. A novel dashboard for displaying the algorithm's results, focused on root cause analysis of anomalies, was developed and user-tested. Based on this, the next steps for implementing the results in a production setting and new avenues of research were identified.

Achieved results:

The ultimate result of this project is a novel machine learning framework for joint anomaly detection on numeric metrics and log messages. Correlations among computational nodes are analysed, which improves the understanding of dependencies in the IT system architecture. The novel algorithm exhibited a significant improvement over AIMS’ current approach, but did not match state-of-the-art deep neural network (DNN) approaches in terms of accuracy. However, the novel algorithm compensates with increased interpretability, allowing for more detailed root cause analysis.

The Anomaly Detection Dashboard facilitates root cause analysis by offering intuitive visualisation tools, including line graphs and heatmaps. This allows the user to identify and analyse service outages efficiently by navigating through metric trends and log template occurrences, which expedites error resolution and supports system reliability.

Dissemination and Utilisation Plans:

The natural next step would be to integrate the novel algorithm with AIMS’ current system and introduce it and the new dashboard to selected customers. Based on feedback from these customers, continuous improvements would be made before a full-scale release. In parallel, three research tracks could be continued:

- Further hyperparameter tuning of the novel algorithm to maximise its performance on the data of AIMS’ customers.
- Further experimentation with the dashboard on visualising the decision-making process of the novel algorithm.
- Further experimentation on extending the novel algorithm to anomaly prediction.

The work on online changepoint detection, with its implementation in Python, is available as an open-source package, facilitating broader access and application.

Expected Future Results:

Continued tuning and benchmarking of the novel algorithm could reveal its competitive edge, especially in terms of explainability. Exploring anomaly prediction further and enhancing the dashboard’s visualisation capabilities remain pivotal future tasks, promising enhanced root cause analysis and system reliability.

The outcomes of this project have relevance in both industry and academia. A novel algorithm for anomaly detection and prediction on heterogeneous data was developed and compared to state-of-the-art DNNs. Visualisation techniques for observability and root cause analysis were developed and user-tested. The next steps for utilising the results in a production setting were identified. The project has also served as a skills development platform for the organisation, increasing competency in multivariate anomaly detection and log processing.

A few main takeaways are worth mentioning:

1. The field of anomaly detection is rapidly advancing.
2. The field of real-time anomaly detection on heterogeneous data is severely underdeveloped.
3. The current state of the art is DNN-based models, which outperform classical models but lack explainability.
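For readers unfamiliar with online changepoint detection, which is mentioned above as the subject of the project's open-source Python package, the following minimal sketch shows the general idea using a classical two-sided CUSUM detector. It is not the project's package or algorithm, whose method is not described here. The class name `CusumDetector`, the function `detect_changepoints`, and all parameter values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CusumDetector:
    """Two-sided CUSUM detector for a shift in the mean of a stream.

    All parameter values are illustrative assumptions, not project settings.
    """

    target_mean: float      # expected level of the metric in normal operation
    slack: float = 0.5      # deviations smaller than this are ignored
    threshold: float = 5.0  # alarm when a cumulative sum exceeds this value
    pos_sum: float = 0.0    # accumulated evidence for an upward shift
    neg_sum: float = 0.0    # accumulated evidence for a downward shift

    def update(self, value: float) -> bool:
        """Consume one observation; return True if a change is signalled."""
        deviation = value - self.target_mean
        self.pos_sum = max(0.0, self.pos_sum + deviation - self.slack)
        self.neg_sum = max(0.0, self.neg_sum - deviation - self.slack)
        if self.pos_sum > self.threshold or self.neg_sum > self.threshold:
            # Reset both sums so that later changes can also be detected.
            self.pos_sum = 0.0
            self.neg_sum = 0.0
            return True
        return False


def detect_changepoints(stream, detector):
    """Return the indices at which the detector raises an alarm."""
    return [i for i, value in enumerate(stream) if detector.update(value)]
```

With `target_mean=10.0` and the default slack and threshold, a stream of twenty values of 10.0 followed by ten values of 13.0 raises its first alarm at index 22, two observations after the shift begins. Because the target mean is fixed in this sketch, the detector keeps alarming while the shifted level persists; a practical detector would re-estimate the baseline after each alarm.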

Background

During the last few years, businesses across geographies and industries have faced an explosion of IT complexity driven by the implementation of new digital businesses and the drive for digitalization. Technology vendors have responded by providing public cloud platforms and introducing microservices. This has allowed businesses to implement new solutions faster, and applications are no longer monolithic applications deployed to a few physical servers. Now, applications often consist of complex combinations of custom-developed applications, packaged vendor applications, microservices and containers, with dependencies spanning industry value chains. Hence, performance degradation in even a single component (out of thousands) can have ripple effects that disrupt a full value chain and have massive financial consequences.

Project Goal

1. Find a way to combine numerical time series data and text-based log events to deliver intuitive insight into the performance of IT systems and alert about potential problems and their root causes, and
2. Do it in a completely automatic and general way such that the framework can be adapted and used in any kind of business, with any kind of time series/logs, and with no additional tuning.

Strategic Importance

PReVENT will position AIMS as a complete and generic insight and anomaly detection platform for the current addressable market (IT monitoring), but will also open the possibility to capitalize on the generic platform capabilities in adjacent, new, and emerging markets.

Today's Substitutes / Alternatives

Today's alternatives only target specific data types and technologies, offer a low degree of automation, and do not combine time series data and events for anomaly detection.

R&D Challenges

- Multivariate anomaly detection algorithms using all available time series data
- Unsupervised analysis tools for event log text
- Combining time series data and event analysis
- Developing visual dashboard tools for customers
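As a rough illustration of what combining time series data with event log analysis can look like, the sketch below reduces log messages to templates by masking variable tokens, counts template occurrences per time bucket, and scores those counts alongside numeric metrics with a robust (median-based) z-score. This is an assumption-laden toy, not the algorithm developed in PReVENT. The masking rules, the function names (`to_template`, `template_counts`, `robust_z_scores`, `flag_buckets`) and the threshold of 3.5 are all hypothetical.

```python
import re
import statistics
from collections import Counter

# Hypothetical masking rules: replace variable tokens so that messages which
# differ only in their parameters map to the same template.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(message):
    """Mask IP addresses, hex literals and integers in a log message."""
    for pattern, token in _MASKS:
        message = pattern.sub(token, message)
    return message


def template_counts(logs):
    """Count template occurrences per bucket from (bucket, message) pairs."""
    counts = {}
    for bucket, message in logs:
        counts.setdefault(bucket, Counter())[to_template(message)] += 1
    return counts


def robust_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation.

    Falls back to the mean absolute deviation when the MAD is zero, which is
    common for sparse log counts; returns zeros for a constant series.
    """
    median = statistics.median(values)
    abs_dev = [abs(v - median) for v in values]
    mad = statistics.median(abs_dev)
    if mad > 0:
        return [0.6745 * (v - median) / mad for v in values]
    mean_ad = statistics.fmean(abs_dev)
    if mean_ad > 0:
        return [(v - median) / (1.253314 * mean_ad) for v in values]
    return [0.0] * len(values)


def flag_buckets(metrics, logs, threshold=3.5):
    """Return {bucket: [names of series that look anomalous in that bucket]}.

    `metrics` maps a series name to one value per bucket (buckets 0..n-1);
    `logs` is an iterable of (bucket, message) pairs.
    """
    n = len(next(iter(metrics.values())))
    counts = template_counts(logs)
    series = dict(metrics)
    templates = {t for bucket_counts in counts.values() for t in bucket_counts}
    for template in sorted(templates):
        series["log:" + template] = [
            counts.get(b, Counter())[template] for b in range(n)
        ]
    flagged = {}
    for name, values in series.items():
        for bucket, z in enumerate(robust_z_scores(values)):
            if abs(z) > threshold:
                flagged.setdefault(bucket, []).append(name)
    return flagged
```

Scoring each series separately and reporting the names of the series that triggered keeps the output easy to trace back to a metric or log template, at the cost of ignoring interactions between series that a genuinely multivariate model could capture.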

Publications from Cristin

No publications found


Funding scheme:

BIA-Brukerstyrt innovasjonsarena