IKTPLUSS-IKT og digital innovasjon

Efficient and Robust Architecture for the Big Data Cloud - ERAC

Awarded: NOK 12.0 mill.

Initially, the main activity of the ERAC project was centered around designing and putting into operation a private cloud based on Linux, OpenStack, KVM, OFED and InfiniBand. Together, these components constitute a flexible open source high-performance cloud environment facilitating the exploration of new ideas and the prototyping of new methods and mechanisms resulting from our research. The ERAC project comprises three main research activities.

The first activity focuses on challenges related to efficient and scalable live migration of virtual machines (VMs) in a cloud built on top of a high-performance interconnection network, like InfiniBand (IB). In such a network, a fabric manager (FM) handles administrative tasks on behalf of the network components to ensure stable and efficient network operation. In large installations, however, a central FM might become a bottleneck. This is particularly so in a dynamic cloud environment comprising thousands of VMs. If the FM fails to serve the (virtual) machines in a timely manner, the performance of the cloud drops. To ease the burden on the FM, and to make the associated cloud architecture scalable, we have designed a novel caching scheme and prototyped it in our private cloud. The caching scheme allows clients in the network to retrieve information cached locally, instead of always sending requests to the FM. Furthermore, to improve flexibility and facilitate transparent virtualization and live migration in IB-based clouds, novel vSwitch-based architectures for IB network cards have been proposed, accompanied by scalable and efficient dynamic reconfiguration methods.

The second activity focuses on optimizing the routing algorithms used by the underlying network supporting a cloud infrastructure. By improving the routing, network efficiency increases and the overall system utilization improves. In particular, widely used routing algorithms for high-performance network infrastructures, like IB, typically do not consider the actual roles of the nodes in the network, or the corresponding traffic patterns, when distributing paths onto the available links inside the fabric, nor do they support traffic isolation internally in the network fabric. In a multi-tenant cloud environment, the result can be poor load balancing and cross-tenant traffic interference, respectively. To tackle these two challenges, we have designed and prototyped two novel routing algorithms: wFatTree and pFTree. Our experiments with wFatTree show an improvement of up to 60% in total network throughput for large installations, while the pFTree algorithm significantly reduces cross-tenant traffic interference. Moreover, to improve reconfiguration efficiency in large networks when new routes must be calculated (e.g. due to component failures, additions, or removals), new routing reconfiguration schemes have been designed, realized by the novel SlimUpdate algorithm and metabase-aided re-routing.
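
To illustrate the caching idea from the first activity, the Python sketch below shows how a client-side cache might serve repeated path lookups locally and only contact the FM on a miss; the class and method names are illustrative assumptions and do not reflect the actual ERAC prototype or any OFED API.

    # Hypothetical sketch of the client-side caching idea: path queries are
    # served from a local cache when possible, and only forwarded to the
    # fabric manager (FM) on a miss. Names and structures are illustrative.
    import time


    class PathRecordCache:
        def __init__(self, fm_client, ttl_seconds=300):
            self.fm_client = fm_client   # object that can query the real FM
            self.ttl = ttl_seconds       # how long a cached entry stays valid
            self._cache = {}             # (src, dst) -> (path_record, timestamp)

        def resolve(self, src, dst):
            """Return a path record, preferring the local cache over the FM."""
            entry = self._cache.get((src, dst))
            if entry is not None:
                record, stored_at = entry
                if time.time() - stored_at < self.ttl:
                    return record        # cache hit: no load placed on the FM
            # Cache miss or stale entry: ask the FM and remember the answer.
            record = self.fm_client.query_path(src, dst)
            self._cache[(src, dst)] = (record, time.time())
            return record

        def invalidate(self):
            """Drop all entries, e.g. when the FM announces a reconfiguration."""
            self._cache.clear()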
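
The load-balancing intuition behind weight-aware routing in the second activity can likewise be illustrated with a small, self-contained sketch: each destination carries a weight reflecting its expected traffic, and destinations are assigned to the downward port with the least accumulated weight. This is a simplified stand-in for illustration only, not the actual wFatTree algorithm.

    # Simplified illustration of weight-aware path distribution in a fat-tree
    # switch: destinations carry traffic weights, and each destination is routed
    # through the downward port with the least accumulated weight so far.
    # This is a didactic stand-in, not the actual wFatTree algorithm.
    def assign_ports(destinations, num_ports):
        """destinations: list of (node_id, traffic_weight) tuples."""
        port_load = [0.0] * num_ports    # accumulated weight per downward port
        routing_table = {}               # node_id -> chosen port
        # Heaviest destinations first, so large flows are spread out early.
        for node_id, weight in sorted(destinations, key=lambda d: d[1], reverse=True):
            port = min(range(num_ports), key=lambda p: port_load[p])
            routing_table[node_id] = port
            port_load[port] += weight
        return routing_table, port_load


    if __name__ == "__main__":
        # Example: two heavy receivers and four light ones behind a 2-port switch.
        dests = [("n1", 10), ("n2", 9), ("n3", 1), ("n4", 1), ("n5", 1), ("n6", 1)]
        table, load = assign_ports(dests, num_ports=2)
        print(table)   # the heavy receivers end up on different ports
        print(load)    # accumulated weight per port stays balanced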

The third activity addresses security, privacy and trust in the context of cloud services. Accountability is a predefined process that aims to demonstrate the practices and activities concerning the services, and the implicit security responsibilities inherent to each agent and component in complex cloud ecosystems. Accountability must therefore be supported by an evidence framework: it must demonstrate data movements and storage relocations, support the enforcement of security and privacy policies, and provide protocols and tools for integrity and authenticity verification. To this end, we use data analytics approaches anchored in accountability. In ERAC, we have created a data analytics engine that analyzes time-series data stored in OpenTSDB. The tool can store and analyze logging information, e.g. NetFlow data, for different data centers. By applying data analytics to the logs at different locations, users can visualize and track the movement of their data. This type of analysis provides transparency to both the user and the cloud provider, enabling hassle-free data movement between clouds. Furthermore, we have predicted different types of failures in data centers with 91% accuracy, provided efficient data processing and storage, avoided data loss, and detected several types of abnormal behavior in the cloud infrastructure. We have also proposed a Markov random field (MRF) approach to build a classifier for machine-generated logs, which are produced in large volumes and at high speed. Categories are defined by samples (selected lines) of interest, without requiring predefined patterns or prior knowledge of the log output. The MRF method is the foundation of a model, built from the training set, that classifies similar and related lines in subsequent log analysis. The statistical model allows this classifier to be extended into an unsupervised learning approach, which can optimize the training set by removing ambiguous (low-confidence) samples and extending it with newer, high-confidence lines.
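
As an illustration of how such an analytics engine can pull data out of OpenTSDB, the sketch below queries OpenTSDB's HTTP endpoint (POST /api/query). The host, the metric name "netflow.bytes" and the "datacenter" tag are hypothetical placeholders, not the project's actual schema.

    # Minimal sketch of fetching time-series data from OpenTSDB's HTTP API.
    # The metric name and tag below are hypothetical placeholders.
    import requests

    OPENTSDB_URL = "http://opentsdb.example.org:4242/api/query"   # placeholder host


    def fetch_transfer_volume(datacenter, start="24h-ago"):
        query = {
            "start": start,
            "queries": [
                {
                    "aggregator": "sum",
                    "metric": "netflow.bytes",            # hypothetical metric
                    "tags": {"datacenter": datacenter},   # hypothetical tag
                }
            ],
        }
        response = requests.post(OPENTSDB_URL, json=query, timeout=30)
        response.raise_for_status()
        # Each returned series holds a "dps" map of {unix_timestamp: value}.
        return [series["dps"] for series in response.json()]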
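
The log-classification workflow can be illustrated with the following simplified sketch, in which a plain token-overlap (Jaccard) similarity score stands in for the MRF model: labeled sample lines define the categories, new lines are assigned to the most similar category, and only confidently classified lines are fed back into the training set while ambiguous ones are left out.

    # Simplified illustration of the log-classification workflow: a Jaccard
    # token-overlap score stands in for the project's Markov random field model.
    def tokens(line):
        return set(line.lower().split())


    def similarity(a, b):
        """Jaccard similarity between the token sets of two log lines."""
        ta, tb = tokens(a), tokens(b)
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


    def classify(line, training_set):
        """training_set: dict mapping category -> list of sample lines."""
        best_category, best_score = None, 0.0
        for category, samples in training_set.items():
            score = max(similarity(line, s) for s in samples)
            if score > best_score:
                best_category, best_score = category, score
        return best_category, best_score


    def extend_training_set(lines, training_set, accept_threshold=0.8):
        """Self-training step: add only confidently classified lines."""
        for line in lines:
            category, score = classify(line, training_set)
            if category is not None and score >= accept_threshold:
                training_set[category].append(line)
        return training_set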

ERAC leverages the documented excellence of the research group on interconnection networks and cluster computing at Simula. In close collaboration, Oracle, Lyse, Simula, the University of Oslo, and the University of Stavanger will build world-class research-based competence in core areas of cloud computing that will form the backbone for services and applications in the future Internet. The selected core areas are carefully aligned with the needs and the strategy of the contributing industrial partners and the following scientific priority areas in VERDIKT: (i) communication technology and infrastructure, (ii) security, privacy, protection and vulnerability. The consortium will focus on selected problem areas where current technology is not well suited to meet the promise of cloud computing: rapid elasticity, massive scale, resilient computing, virtualization, ease of use and energy efficiency. These are areas of vital importance for realizing the potential of cloud computing, where there are significant architectural and technological challenges that must be resolved. The project will strengthen the national knowledge base in several ways. First, it will create a community of cutting-edge expertise on cloud computing with members from both academia and industry. Second, it will develop several cloud computing experts through the education of doctoral candidates and through the expansion of knowledge for established researchers. Third, it will contribute one or more educational courses to the soon-to-be-established educational program on cloud computing at Ifi/UiO. In the long term, this promotes Norwegian leadership in the selected areas of the segment.

Publications from Cristin

No publications found

Funding scheme:

IKTPLUSS-IKT og digital innovasjon