

ExiBiDa: Exploring new dimensions in Big Data

Alternative title: ExiBiDa: Utforsking av nye dimensjoner i Big Data

Awarded: NOK 7.2 mill.

A large fraction of Big Data is textual and has spatial and temporal dimensions. A prime example is textual social media data, where a message can carry the location of the user at the time of writing and the timestamp of when it was posted. In ExiBiDa, we have focused on exploratory analysis of data with such spatio-temporal-textual content. The aim of the project has been to develop techniques that make this analysis possible on data of extreme scale, and to demonstrate their feasibility on real data. The research in the project has been carried out on three sub-topics:

1) Exploratory analysis to find patterns in textual data, such as interesting phrases (according to a defined metric) in the results of a web search.

2) Using techniques from statistics and information retrieval to predict missing information, or information that is not explicitly represented in the data. In our case, we have tried to predict the location and time of social media texts, so that texts lacking this information can have it added, which contributes to higher quality in the exploratory analysis.

3) Exploratory spatio-temporal-textual queries. Here, the analysis is typically expressed as a query (e.g., for a given time and location), and the goal is to find interesting information for these parameters. Among other things, we have looked at trend detection for a given time and location; one example of a result is that "Happy New Year" trends at Times Square on New Year's Eve.
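To make the idea of trend detection for a given time and location concrete, here is a minimal sketch in Python. It is not the method developed in the project: the `Post` record, the `bigrams` and `trending_phrases` helpers, the sample posts, and the scoring rule (how over-represented a two-word phrase is inside the query window compared with the whole collection) are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from fractions import Fraction


@dataclass
class Post:
    """Hypothetical record for one geo-tagged, timestamped message."""
    text: str
    lat: float
    lon: float
    time: datetime


def bigrams(text):
    """Split a message into lowercase two-word phrases."""
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


def trending_phrases(posts, box, start, end, top_k=3, min_count=2):
    """Rank phrases by how over-represented they are in the query window.

    box is (min_lat, min_lon, max_lat, max_lon); start and end bound the
    time range. The score is an illustrative assumption: the relative
    frequency inside the window divided by the relative frequency overall.
    """
    min_lat, min_lon, max_lat, max_lon = box
    inside, overall = Counter(), Counter()
    for p in posts:
        grams = bigrams(p.text)
        overall.update(grams)
        in_box = min_lat <= p.lat <= max_lat and min_lon <= p.lon <= max_lon
        if in_box and start <= p.time <= end:
            inside.update(grams)
    n_in, n_all = sum(inside.values()), sum(overall.values())
    # Exact fractions keep tied scores exactly equal, so the tie-break is stable.
    scores = {
        g: Fraction(c * n_all, n_in * overall[g])
        for g, c in inside.items()
        if c >= min_count  # ignore phrases seen only once in the window
    }
    # Highest score first; ties broken by count in the window, then by name.
    return sorted(scores, key=lambda g: (-scores[g], -inside[g], g))[:top_k]


# Made-up sample data: three posts near Times Square around midnight,
# and two posts elsewhere at other times.
posts = [
    Post("happy new year", 40.758, -73.985, datetime(2016, 12, 31, 23, 59)),
    Post("happy new year everyone", 40.758, -73.986, datetime(2017, 1, 1, 0, 1)),
    Post("traffic is bad", 40.758, -73.985, datetime(2016, 12, 31, 23, 30)),
    Post("traffic is bad today", 59.91, 10.75, datetime(2016, 12, 20, 8, 0)),
    Post("happy birthday", 59.91, 10.75, datetime(2016, 12, 21, 9, 0)),
]
times_square = (40.75, -73.99, 40.76, -73.98)
result = trending_phrases(
    posts,
    times_square,
    start=datetime(2016, 12, 31, 22, 0),
    end=datetime(2017, 1, 1, 2, 0),
)
print(result)
```

This version scans every post for each query, which is only workable for small collections. The indexing, approximation, and parallel execution challenges named in the project description are about avoiding exactly that kind of full scan at Big Data scale.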

The project has been important for expanding the group's competence in the analysis of spatio-temporal-textual data. This applies in particular to the PhD student and the postdoctoral researcher, who have had the opportunity to focus on this research topic over an extended period and to produce research on it. The project has also enabled us to prepare new project proposals of high relevance and quality, and has contributed to participation in international research networks. The relevance of the research to real problems in companies has also contributed to the funding of new PhD students on related topics, with a focus on tomorrow's challenges, which currently center largely on the analysis of data streams and data-driven machine learning. More advanced and scalable techniques for data analysis can also be useful for public institutions that want to get value out of their data.

This proposal is for a research project within the FRINATEK program of the Research Council of Norway, investigating efficient execution of exploratory spatio-temporal-textual (STT) queries on Big Data. A large fraction of Big Data is textual and has spatial and temporal dimensions, and in this project we will focus on exploratory analysis of data containing such attributes. The project will address four main challenges: 1) novel indexing and algorithms for STT queries, 2) fast, approximate, indicative answers to exploratory STT queries, 3) exploratory STT queries with budgetary constraints, and 4) methods for parallel/distributed execution of exploratory STT queries. The feasibility of the developed techniques will be demonstrated through a prototype implementation.

Funding scheme:

ISPNATTEK-ISP - naturvit. og teknologi
