SAMRISK-2-Samfunnssikkerhet og risiko

Fakespeak - The language of fake news. Fake news detection based on linguistic cues

Alternative title: Fakespeak - språket i falske nyheter. Avsløring av falske nyheter på grunnlag av språklige kjennetegn

Awarded: NOK 12.6 mill.

The lion's share of the research on the detection of fake news (defined as "news" items whose author knows they are false and intends to deceive) is conducted by computer scientists alone. However, corpus linguists have shown that the linguistic features of a text vary according to its purpose. Thus, the language of fake news may be the key to its detection. Against this background, this 4-year linguistics-driven project brings together a core team of linguists and computer scientists based in Norway and the UK. The linguists will seek to reveal the grammatical and stylistic features of the language of fake news, referred to as Fakespeak, in Russian, Norwegian and English. To achieve this goal, they will first build corpora of fake and real news from various online media outlets in all three languages, also making use of existing corpora, and then subject the datasets to thorough linguistic analyses. They will apply methods and draw on insights from corpus linguistics, computational linguistics and applied linguistics (including forensic linguistics), as well as pragmatics and rhetoric. Taking the linguists' findings as their point of departure, along with existing fake news detection systems, the computer scientists will seek to improve these systems by automating the detection of the defining features of Fakespeak. The overall aim of the project is to enable fake news detection systems to discover and flag potentially harmful fake news items more accurately, efficiently and promptly than current state-of-the-art systems. By automating the detection of all and only the features of Fakespeak, the project team will enable the systems to detect and flag only deliberate disinformation, excluding, for example, (inadvertent) misinformation, satirical texts, parody, and texts reflecting a certain set of opinions. Thus, the project will take societal safety and security into consideration while at the same time safeguarding freedom of speech.

This 5-year project involves a core team of linguists and computer scientists based in Norway and the UK. The linguists will seek to reveal the grammatical and stylistic features of the language of fake news, referred to as Fakespeak, in Russian, Norwegian and English. To achieve this goal, they will first build large corpora of fake and real news from various online media outlets in all three languages, also making use of existing corpora, and then subject the datasets to thorough linguistic analyses. They will apply methods and draw on insights from corpus linguistics, computational linguistics and applied linguistics (including forensic linguistics), as well as pragmatics and rhetoric. Taking the linguists' findings as their point of departure, along with existing fake news detection systems such as those used by Faktisk.no, the computer scientists will seek to improve these systems by automating the detection of the defining features of Fakespeak. This will be done by applying and developing neural network models, algorithms, and knowledge graphs or sar-graphs, sometimes in combination. The overall aim of the project is to enable fact-checking services to detect and flag potentially harmful fake news items more accurately, efficiently and promptly than current state-of-the-art systems. By automating the detection of all and only the features of Fakespeak, the project team will enable the systems to detect and flag only deliberate disinformation, excluding, for example, (inadvertent) misinformation, satirical texts and texts reflecting a certain set of opinions. Thus, the project will take societal safety and security into consideration while at the same time safeguarding freedom of speech.

Publications from Cristin

No publications found

Funding scheme:

SAMRISK-2-Samfunnssikkerhet og risiko