IKTPLUSS-ICT and digital innovation

WeSearch: Language Technology for the Web

Awarded: NOK 10.6 million

The project sets out to enable next-generation Web services in the realm of social networking, as characterized by user-centric information sharing and on-line collaboration. A key element here is so-called user-generated content (UGC), which already accounts for a large proportion of Internet traffic. The vast majority of UGC is cast in human language. The project develops so-called semantic parsing technology, an automated process that allows IT systems to 'make sense' of human language. While semantic parsing systems exist for at least a few languages, current technology does not scale to the size of the Web, nor can it cope with the linguistic complexity and diversity of typical types of UGC. Large-scale semantic parsing technology is prohibitively expensive for a single player to build. Therefore, a long-term perspective, collaborative development, a focus on task-, genre-, and domain-adaptable approaches, and the reuse of knowledge and resources are prerequisites to broader use of parsing in next-generation ICT solutions. Parsing technology has matured to a point where its large-scale application to Web content is now within reach. However, important scientific and technological challenges need to be addressed to actually reach this goal: scalability (primarily parser efficiency), robustness (to out-of-scope or ill-formed inputs), and precision (of output representations). Finally, it is necessary to define adequate, task- and application-independent output representations for semantic parsing (abstractly, a linguistic API), and such standardization for use in applications needs to be approached in close cooperation with key international players. Project results will be showcased through a novel Web service, a search interface based on semantic relations between concepts, which is applied to a large selection of diverse Web content from the domain of information technology.

Budget purpose:

IKTPLUSS-ICT and digital innovation