IS-DAAD-Forskerutveksl. Norge-Tyskland

Digital Infrastructure for corpus work in research and education - A model approach

Awarded: NOK 70,000

The project will build an infrastructure that uses the Leipzig Corpora Collection (LCC) and the NTNU tool TypeCraft (TC) to create selected datasets for graduate studies and research. Phase one focuses on German Studies, a field whose existence is under threat even though Norway has an acknowledged need for a multilingual academic elite. In this situation, the project proposes to leverage the potential of data-oriented language studies as an integral part of academic teaching. Using the LCC and the TC infrastructure, we propose to initiate an infrastructure project that makes the combination of quantitative and qualitative work with German and Norwegian data an integral part of German graduate studies and, in its second phase, extends the approach to lesser-resourced languages.

As an immediate outcome, we expect that the active use of digital language resources will increase students' language proficiency and data-management skills, and that it will give a modern edge to the education of future teachers, translators, text analysts, and scholars. Working with large corpora such as the LCC is still a challenge for many scholars and students in the Humanities, but it will be one of the basic skills of the next generation of language experts. Finding, grouping, and ranking data in order to isolate significant patterns requires statistical measures when working with large datasets, and deliberate data-management and active annotation skills when working with smaller datasets.

To realise our second objective, we will build in phase two on the experience gained in phase one and create datasets for lesser-resourced languages. In applying the techniques we have learned for presenting data and visualising data patterns, we will also take into consideration techniques known to be suitable for lesser-resourced languages.

Funding scheme:

IS-DAAD-Forskerutveksl. Norge-Tyskland