KUNSTI-Kunnskapsutvikling for norsk språkteknologi (Knowledge Development for Norwegian Language Technology)

Acquisition of nouns for language processing

Awarded: NOK 0.15 million

This project addresses the problem of keeping lexica up to date by proposing a way to automatically assign lexical properties to words in a given corpus. The project singles out nouns and their properties for two reasons: nouns appear to be the most active word class with respect to shifting properties and innovations, and when an application involving a computational grammar and its lexicon is moved to a new domain, nouns make up the largest share of the items unknown to the lexicon. The basis for the acquisition process is the joint probability distribution of a word's properties P, its structures S, and its strings Q, written as p(P,Q,S). When a word is observed in a particular string, the probability that the word has a certain property is obtained from the joint distribution by summing over the possible structures. To achieve this, the distribution is initially specified by exploiting the lexical description in the grammar, which provides the conditional probability assessments between P and S, and then between S and Q, with information flowing from the lexicon to the strings. That is, the lexical resources are used to answer two questions: which strings will realise a given structure, and which structures will realise a word with a certain property? The answers provide the components of the joint distribution, which takes the form p(P,Q,S)=p(Q|S)p(S|P). While the lexical knowledge used to design the distribution flows from lexicon to strings, the joint distribution itself is used for the opposite flow, going from observations of words in strings to properties of those words in the lexicon. The project first develops the algorithm for Norwegian, testing four properties of nouns: gender, animacy, and status as relational and as measure nouns.
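The reversal of the flow described above can be illustrated with a small sketch. It is not the project's implementation: the function name, the structure labels, the toy strings, and all probability values are invented for illustration, and the uniform prior over properties is an added assumption that the abstract does not state. Given tables for p(S|P) and p(Q|S), the score of a property for an observed string is the sum over structures of p(Q|S)p(S|P), which is then normalised over the candidate properties.

```python
def posterior_over_properties(p_s_given_p, p_q_given_s, observed_string, prior=None):
    """Return p(P | Q = observed_string) by summing out the structures S.

    p_s_given_p: {property: {structure: p(S|P)}}
    p_q_given_s: {structure: {string: p(Q|S)}}
    prior:       {property: p(P)}; uniform if omitted (an assumption of this sketch)
    """
    properties = list(p_s_given_p)
    if prior is None:
        prior = {prop: 1.0 / len(properties) for prop in properties}

    scores = {}
    for prop in properties:
        # Marginalise over structures: sum_S p(Q|S) * p(S|P)
        total = 0.0
        for struct, p_s in p_s_given_p[prop].items():
            p_q = p_q_given_s.get(struct, {}).get(observed_string, 0.0)
            total += p_q * p_s
        scores[prop] = prior[prop] * total

    norm = sum(scores.values())
    if norm == 0.0:
        # The observed string is impossible under every property in the tables.
        return {prop: 0.0 for prop in properties}
    return {prop: score / norm for prop, score in scores.items()}


# Hypothetical toy tables for the gender property of a Norwegian noun N.
# All labels and numbers are made up for this example.
P_S_GIVEN_P = {
    "masculine": {"en_np": 0.7, "bare_np": 0.3},
    "neuter": {"et_np": 0.7, "bare_np": 0.3},
}
P_Q_GIVEN_S = {
    "en_np": {"en N": 1.0},
    "et_np": {"et N": 0.95, "en N": 0.05},
    "bare_np": {"N": 1.0},
}

if __name__ == "__main__":
    for string in ("en N", "et N", "N"):
        print(string, posterior_over_properties(P_S_GIVEN_P, P_Q_GIVEN_S, string))
```

In this toy setting, a noun seen after the article "en" comes out as most probably masculine, while a bare occurrence leaves the two genders equally likely, since the bare structure does not distinguish them.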

Budget purpose:

KUNSTI-Kunnskapsutvikling for norsk språkteknologi

Themes and topics

No themes are linked to the project