BIA-Brukerstyrt innovasjonsarena

ANGAS: Audibility for all by NGA utilizing sensor fusion

Alternative title: ANGAS: Hørbarhet for alle med NGA og sensorfusjon

Awarded: NOK 13.4 mill.

Next Generation Audio (NGA) is a new approach to delivering audio content -- including radio and television broadcasts, podcasts, and online media -- that is more accessible, interactive, personalizable, and immersive for listeners. It is spearheaded by the European Broadcasting Union (EBU) and cooperating partners, including the BBC and IRT. The challenge is that creating NGA content currently requires advanced tools, equipment, and expertise. Through this research project, we will build a solution that greatly simplifies this process, helping to democratize content creation and ensure that new media content is accessible to people of all abilities.

The primary objective of the project is to develop an integrated hardware/software prototype that enables creation of rich, object-based 3D audio content that supports personalization and accessibility, compliant with the NGA guidelines. Nomono will use the project results in the development of a recording system capable of capturing object-based audio content and optimizing it for speech intelligibility, transcription, and immersive, 360-degree listening experiences. The project is being carried out with the help of research partners at SINTEF Digital and NRK's Department of Audio Product Development and Department for Availability and Universal Design.

Initial research findings show that the proposed methods for noise reduction and signal enhancement meet or exceed the performance of current industry-standard technologies. Additionally, current methods for automated positioning of moving audio objects could benefit from sensors beyond microphones that provide directional and positional information. Precise positioning data has immediate value for the 3D positioning of sounds within an immersive sound field, but it can also enable more discriminating noise- and crosstalk-reduction techniques through data-driven mapping of the sound sources in a given audio scene.
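As a minimal sketch of how positional data from sensor fusion could feed a 3D audio renderer, the Cartesian position of a sound source can be converted to the azimuth/elevation/distance coordinates commonly used when panning an audio object in an immersive scene. This is an illustration only, not the project's actual method; the function name and the coordinate convention (listener at the origin, x = front, y = left, z = up) are assumptions.

```python
import math

def position_to_spherical(x, y, z):
    """Convert a sound source's Cartesian position (metres, listener at
    the origin, x = front, y = left, z = up) to azimuth and elevation
    in degrees plus distance in metres -- the coordinates typically
    used to place an audio object in a 3D sound field."""
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.asin(z / distance)) if distance > 0 else 0.0
    return azimuth, elevation, distance
```

A source one metre directly in front of the listener maps to azimuth 0°, elevation 0°, distance 1 m; one metre to the left maps to azimuth 90°.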
During the first year of the project, Nomono successfully developed a speech-enhancement processing pipeline. It will become commercially available in early 2022, when we launch our cloud service; the service will be integrated with our hardware, which launches later that year. Sharing code through GitHub has become an important publication channel for the machine-learning community, and Nomono supports this by opening technical components to the open-source community. Many of our 2021 publications are on this platform, in particular https://github.com/iver56/audiomentations/, which has been starred by 784 users and is a dependency of 89 public open-source repositories.
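The pipeline itself is proprietary, but the simplest form of level-based noise reduction it builds on can be sketched as a frame-wise noise gate: frames whose energy falls below a threshold are attenuated, while speech-level frames pass through unchanged. This is a generic illustration under assumed parameter names, not Nomono's implementation.

```python
import math

def noise_gate(samples, frame_len=256, threshold_db=-40.0, attenuation=0.1):
    """Frame-wise noise gate. `samples` are floats in [-1.0, 1.0].
    Frames whose RMS level (in dBFS) falls below `threshold_db` are
    scaled down by `attenuation`; louder frames pass unchanged."""
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        gain = 1.0 if level_db >= threshold_db else attenuation
        out.extend(s * gain for s in frame)
    return out
```

Real speech enhancement operates in the time-frequency domain with learned masks rather than a single broadband gain, but the gate illustrates the basic trade-off: a threshold set too high clips quiet speech, while one set too low lets background noise through.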

The primary objective of the project is to develop an integrated hardware/software prototype that enables creation of rich, object-based 3D audio content with support for personalization and accessibility, compliant with the guidelines of Next Generation Audio (NGA). The research will focus on AI-driven enhancement of speech audio signals, supported by auditory scene analysis and exploration of sensor-fusion technologies. Validation of the project results with respect to audio quality, functionality, and accessibility for the hearing impaired will be carried out in cooperation with NRK. Nomono will use the project results in the development of a recording system capable of capturing object-based audio content and optimizing it for speech intelligibility, transcription, and immersive, 360-degree listening experiences. The ultimate goal is to democratize the creation of, and access to, immersive audio storytelling.

Funding scheme:

BIA-Brukerstyrt innovasjonsarena