
FRINATEK-Fri prosj.st. mat.,naturv.,tek

Logical and categorical methods in data transformation

Alternative title: Logiske og kategoriteoretiske metoder i datatransformasjon

Awarded: NOK 3.1 million

A pressing problem in business and public administration is that finding and retrieving relevant information can be difficult and time-consuming when that information is stored in separate locations, in different ways, and perhaps among very large amounts of other data. "Ontology-based data access" is a current approach to this problem: it develops efficient means of connecting multiple, separate, and perhaps very large databases to a single knowledge representation system, called an ontology. The standard theoretical representation of databases, the relational model, has certain deficiencies when faced with this challenge and with the ontology-based approach to solving it. For instance, it has problems handling the dynamic aspects that arise because the databases to be connected to the ontology need not be fixed once and for all, but may be subject to constant change, updates, and replacement. Similar challenges face the representation of information in terms of ontologies. In addition, large ontologies are notoriously hard and costly to maintain, which leads to a problem of scalability. It is therefore necessary to combine the hands-on development of the ontology-based approach with new theoretical and foundational research on how to represent data. For this reason, the project consists of two mutually supporting parts, one "theoretical" and one "applied". The applied part focuses on developing solutions for ontology-based data access. The theoretical part uses modern mathematical tools to investigate foundational questions about representing data in a dynamic setting. The theoretical part supplies the applied part with new theoretical ideas and tools, and in turn receives feedback from the applied part's hands-on development and real-life applications.
The project collaborates closely with similarly oriented projects and groups; in particular the Center for Scalable Access in the Oil and Gas Domain (Sirius) at the University of Oslo and its affiliated researchers and industrial partners. The project has published and is publishing results on: improving the current method for mapping data from databases to ontologies, in particular with regard to handling exception conditions; theoretical solutions for problems arising from missing information in the transformation of data; and formal definition and development of a new alternative representation of databases. The main focus of the project, however, has been the development, in collaboration with researchers from Sirius, of a theory of patterns and pattern management in ontologies. Being able to identify and manipulate recurring patterns greatly simplifies the construction and management of large-scale ontologies. The approach has been successfully tested on a real-world large-scale ontology in the engineering domain, and continues as an active research project in Sirius.

A steadily growing problem in business and public administration is finding and retrieving data when the data reside in different places and are stored in different ways. Ontology-based data access is a new approach to this problem, in which the main idea is to connect several different, and often large, databases to a unified knowledge representation system called an "ontology". The project has contributed to this approach with results on the challenges of handling large and dynamic volumes of data on both the database side and the ontology side, and in the connections between the two. The greatest potential impact for users in industry perhaps lies in the theory of recurring patterns in ontologies that the project has helped to develop. This theory has been met with great interest among users and is being developed further in a research project at the University of Oslo.

In this project, we aim to develop completely new and groundbreaking techniques, perspectives, and ideas for the field of database representation. These lie outside the current paradigm (the relational model) and are meant to address the deficiencies of that model, especially in the area of data transformation (in a wide sense of the term, including transforming to or from ontologies). Basic theoretical machinery and strategies are in place for this part, and some results have already been obtained, but results useful for end users (the database community) cannot be guaranteed. That, rather, is what we propose to investigate, hence the "high risk". The "high gain" would lie in a radically new perspective and the resulting completely new techniques for the manipulation of databases, especially in the area of comparing and transforming data structured under different schemas, and potentially in other areas such as the representation of partial records and missing data (nulls). The project's second part will begin at the other end, with the current research frontier in data transformation, particularly in the new field of ontology-based data access. While the first part will start by developing a new, abstract framework for representing data with the aim of benefiting current developments in the field, the second part will start with an analysis of the current challenges and methods of the field, and then develop a framework for addressing them. In a manner of speaking, the first part has techniques and ideas and wants to explore what results can be obtained from them, while the second knows what results it wants to obtain and is looking for techniques to obtain them. The two parts will then continuously interact and feed off each other.

Publications from Cristin

No publications found


Funding scheme:

FRINATEK-Fri prosj.st. mat.,naturv.,tek