
Funding scheme: FRINATEK-Fri prosj.st. mat.,naturv.,tek

Exploiting Abstract Data-Access Patterns for Better Data Locality in Parallel Processing

Alternative title (Norwegian): Utnytte abstrakte datatilgangsmønstre til å forbedre datalokalitet i parallell prosessering

Awarded: NOK 8.0 mill.

Society increasingly relies on distributed digital services, and its software is moving towards flexible distributed and parallel processing environments. Poorly designed concurrent applications can suffer degraded performance because of delays in accessing remote services, slow communication, and contention hot spots involving many processes with different dependencies. At worst, a system can collapse when demand increases sharply, e.g., a tax system that collapses the day before returns are due, or an online shop that crashes a week before Christmas. The ADAPT project aims to help solve these problems by formalizing the concept of access patterns, which describe how processes access resources such as computation, memory, networking, and data. The project studies how to represent these patterns for a particular concurrent distributed application before execution, and how to use them later to monitor and control the application's process distribution, resource allocation, and data movement while it is running. This would allow better service management and could prevent performance degradation that may cause operational failures.

ADAPT's research agenda is based on two hypotheses. First, application-specific management can improve the performance of highly scalable applications that access large amounts of resources. Second, abstractions, formal foundations, and executable modeling together enable model-based analysis techniques that can predict runtime decisions and thereby improve the management of a particular application. Building on these hypotheses, the ADAPT project has developed a formal model that abstractly captures how workflows interact with dynamically created tasks and deployment locations in parallel and distributed environments; concretely, this is a distributed model of Kubernetes deployments.
These abstract models form the basis for combining model-based simulation with formal analysis, using access patterns, to better control the management of a specific application. The approach has been validated with a proof-of-concept methodology applied to the Online Boutique, a well-known cloud-native demo application used in industry.

Cloud computing has radically changed the way organizations operate their software by allowing them to achieve high availability of services at affordable cost. Containerized microservices are an enabling technology for this change, and advanced container orchestration platforms such as Kubernetes are used for service management. Despite the flourishing ecosystem of monitoring tools for such orchestration platforms, service management is still largely a manual effort. Modeling cloud computing systems is an essential step towards automatic management, but modeling cloud systems of such complexity remains challenging and largely unaddressed. Such modeling approaches might also be key to predicting future outcomes. This project has investigated how to derive models of cloud systems empirically. We build models of deployed services in a formal modeling language; once a model matches the real system closely enough, formal properties can be explored in the model. In the project, we used data collected empirically from small scenarios to simulate the execution of higher-intensity scenarios and predict their resource consumption. A similar approach can be used for other kinds of predictions.
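The idea of calibrating on small scenarios and predicting larger ones can be illustrated, in heavily simplified form, with the Python sketch below. It does not reproduce the project's formal executable model. As a stand-in, it assumes that each service's CPU consumption grows linearly with the request rate, estimates a per-request cost from low-load measurements, and extrapolates to a higher load. The `Measurement` record, the `calibrate` and `predict` functions, the service names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One observation from a small, low-load scenario (hypothetical format)."""
    service: str
    requests_per_s: float  # offered load during the observation
    cpu_cores: float       # average CPU cores the service consumed


def calibrate(measurements: list[Measurement]) -> dict[str, float]:
    """Estimate CPU cost per request (core-seconds) for each service.

    Assumption: consumption is proportional to load, so the cost is the
    total CPU observed divided by the total request rate observed.
    """
    totals: dict[str, tuple[float, float]] = {}
    for m in measurements:
        cpu, rate = totals.get(m.service, (0.0, 0.0))
        totals[m.service] = (cpu + m.cpu_cores, rate + m.requests_per_s)
    return {svc: cpu / rate for svc, (cpu, rate) in totals.items()}


def predict(cost: dict[str, float], requests_per_s: float,
            capacity_cores: dict[str, float]) -> dict[str, tuple[float, bool]]:
    """Predict CPU demand at a higher load and flag services that exceed capacity.

    Simplification: every service is assumed to see the full request rate.
    """
    return {
        svc: (c * requests_per_s, c * requests_per_s > capacity_cores[svc])
        for svc, c in cost.items()
    }


# Hypothetical low-load measurements for two services.
small = [
    Measurement("frontend", 10.0, 0.2),
    Measurement("frontend", 20.0, 0.4),
    Measurement("cart", 10.0, 0.05),
    Measurement("cart", 20.0, 0.1),
]
cost = calibrate(small)
forecast = predict(cost, 200.0, {"frontend": 2.0, "cart": 2.0})
for svc, (cores, saturated) in forecast.items():
    print(f"{svc}: {cores:.2f} cores, saturated={saturated}")
# frontend: 4.00 cores, saturated=True
# cart: 1.00 cores, saturated=False
```

Linear extrapolation ignores queueing, contention, and autoscaling effects, which is one reason to prefer an executable model that can represent such effects when predicting higher-intensity scenarios.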

The software that we depend on for daily life, business and administration is moving to scalable architectures, where tasks, combined in parallel workflows, process multiple data sources at the same time. However, scalability and parallelism can reduce software performance when tasks interact with data, for example when tasks access remote data or modify shared data. Such issues are challenging to find and mitigate systematically or automatically. This motivates the aim of the ADAPT project: to improve data processing by systematically extracting data-access patterns from applications and by designing data management strategies that use these patterns. The ADAPT project is formalizing the notion of data-access patterns, which describe abstractly how computation interacts with memory. ADAPT combines formal models of parallel systems with basic research in programming language theory, with the aim of developing novel techniques to improve data locality and demonstrating their applicability through an experimental proof of concept. This approach may lead to entirely new ways of managing data processing efficiently on parallel architectures, both in how data is distributed over different memory locations and in how data moves between locations during processing, thereby allowing better data management. ADAPT plans to validate its outcomes on a case study based on state-of-the-art database systems.
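To make the notions of a data-access pattern and data locality more concrete, the Python sketch below uses one simple, hypothetical representation: a pattern maps each (task, data item) pair to an access count. Given where tasks run, it measures locality as the fraction of accesses served from the task's own memory location, and applies a naive greedy strategy that places each data item at the location issuing most of its accesses. This is not ADAPT's formalization; the representation, the functions `locality` and `place_data`, and the example tasks and nodes are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical access pattern: (task, data item) -> number of accesses.
AccessPattern = dict[tuple[str, str], int]


def locality(pattern: AccessPattern, task_loc: dict[str, str],
             data_loc: dict[str, str]) -> float:
    """Fraction of accesses where the task and the data share a location."""
    total = sum(pattern.values())
    local = sum(n for (task, item), n in pattern.items()
                if task_loc[task] == data_loc[item])
    return local / total if total else 1.0


def place_data(pattern: AccessPattern, task_loc: dict[str, str]) -> dict[str, str]:
    """Greedy placement: put each item where most of its accesses originate.

    Ignores memory capacity and the cost of moving data.
    """
    demand: dict[str, Counter] = defaultdict(Counter)
    for (task, item), n in pattern.items():
        demand[item][task_loc[task]] += n
    return {item: locs.most_common(1)[0][0] for item, locs in demand.items()}


pattern = {("t1", "A"): 8, ("t2", "A"): 2, ("t2", "B"): 5}
task_loc = {"t1": "node0", "t2": "node1"}
initial = {"A": "node0", "B": "node0"}

improved = place_data(pattern, task_loc)
print(improved)                                          # {'A': 'node0', 'B': 'node1'}
print(round(locality(pattern, task_loc, initial), 3))    # 0.533
print(round(locality(pattern, task_loc, improved), 3))   # 0.867
```

A realistic strategy would also weigh memory capacity, replication, and the cost of moving data against the expected gain in locality.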
