ASR4Memory
Automatic Transcription of Audiovisual Research Data
The "ASR4Memory" project, funded through NFDI4Memory, has developed a service for the automatic transcription of audiovisual research data for the research community. A particular focus is on the digital humanities in historical research.
With this service, audiovisual resources from heterogeneous sources can be automatically transcribed in different languages for various research, reuse and archiving scenarios. The research data is processed in compliance with data protection regulations exclusively on locally operated infrastructures at Freie Universität Berlin.
The ASR4Memory transcription service is part of the official NFDI4Memory service portfolio.
There is great interest in the research community in technically optimizing audiovisual resources, making them available in text form according to scientific standards, making their content accessible, and enabling their reuse in new projects. This service automatically transcribes audiovisual research resources, such as interviews with contemporary witnesses, documentary films, or sound recordings, in the original language (more than 30 languages are currently supported), and thus creates an important basis for the scholarly indexing of audiovisual resources.
The service uses open-source speech recognizers for automatic speech recognition (ASR) while maintaining a critical perspective on the topic of “artificial intelligence”. The audiovisual research resources are first uploaded to secure storage, then automatically processed at the highest possible audio quality, transcribed with the best achievable word accuracy and computing performance, and finally converted into time-coded transcript and exchange formats that meet scientific standards.
Users receive the generated transcripts in various export formats for further use: TXT and ODS files for manual post-processing, CSV and JSON files for automated data processing, VTT and SRT files for subtitling AV resources, and PDF files for long-term preservation of the transcripts, some with speaker markup and word- or sentence-based timecodes. The TEI/XML exchange format will also become available in the near future. The export formats are suitable for import into the research and indexing platform “Oral-History.Digital”, for example.
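To illustrate the subtitle exports, a time-coded, speaker-annotated transcript can be rendered as WebVTT roughly as follows. This is a minimal sketch: the segment structure and function names are assumptions for illustration, not ASR4Memory's actual code or data model.

```python
# Minimal sketch: render time-coded transcript segments as WebVTT.
# The segment structure (start/end in seconds, speaker, text) is an
# assumption for illustration, not ASR4Memory's actual data model.

def format_timestamp(seconds: float) -> str:
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"

def to_vtt(segments: list[dict]) -> str:
    """Convert a list of {'start', 'end', 'speaker', 'text'} dicts to WebVTT."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}")
        lines.append(f"<v {seg['speaker']}>{seg['text']}")
        lines.append("")
    return "\n".join(lines)

segments = [
    {"start": 0.0, "end": 2.5, "speaker": "Interviewer", "text": "Wie begann Ihre Arbeit?"},
    {"start": 2.5, "end": 6.08, "speaker": "Zeitzeuge", "text": "Das war im Jahr 1962."},
]
print(to_vtt(segments))
```

The same segment data could be serialized to SRT or CSV in the same spirit; only the timestamp and cue syntax differ.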
- Usage options: the tool can be used as a web service or as open-source software (on CPU or GPU)
- Data protection: data is processed exclusively within the “Oral-History.Digital” infrastructure (servers at Freie Universität Berlin), or the software can be installed and operated on your own computer
- Performance: fast processing, high-quality transcription (low word error rate)
- Multilingualism: support for more than 30 languages, automatic language detection possible
- Diarization: sentence-based recognition and annotation of speakers
- Alignment: word-based time stamps accurate to the millisecond for synchronization of transcript and AV
- Segment length: intelligent, dynamically adjustable limitation of characters per segment
- Transcript formats: export of standardized file types such as TXT, RTF, ODT, ODS, PDF, CSV, JSON, VTT, SRT, and XML
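The segment-length limitation mentioned above can be pictured as packing word-level timestamps into segments of at most a given number of characters. The following is a sketch under assumed data structures (word tuples of start time, end time, and text), not the project's actual implementation:

```python
# Sketch: group word-level timestamps into segments with a character limit.
# The word tuples (start, end, text) are an assumed structure for illustration.

def split_segments(words: list[tuple[float, float, str]],
                   max_chars: int = 80) -> list[dict]:
    """Greedily pack words into segments of at most max_chars characters."""
    segments, current = [], []
    for word in words:
        candidate = " ".join(w[2] for w in current + [word])
        if current and len(candidate) > max_chars:
            segments.append({
                "start": current[0][0],
                "end": current[-1][1],
                "text": " ".join(w[2] for w in current),
            })
            current = []
        current.append(word)
    if current:
        segments.append({
            "start": current[0][0],
            "end": current[-1][1],
            "text": " ".join(w[2] for w in current),
        })
    return segments

words = [(0.0, 0.4, "Das"), (0.4, 0.7, "war"), (0.7, 1.0, "im"),
         (1.0, 1.4, "Jahr"), (1.4, 2.0, "1962.")]
print(split_segments(words, max_chars=12))
```

A production implementation would additionally prefer to break at sentence or clause boundaries rather than splitting purely by character count; this greedy version only illustrates the basic idea of the limit.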
We are currently training a deep-learning-based speech recognition model with expertly curated and optimally prepared training data on a high-performance computing (HPC) system. The aim of this fine-tuning is to improve transcription quality, reduce serious errors, and create domain-specific speech recognition models for individual disciplines. We are collaborating with the “Geometric Neuroevolution for Fine-tuning Automatic Speech Recognition” project from the “MATH+” Cluster of Excellence, based at the Zuse Institute Berlin.
The needs and requirements of the research community, as well as the weaknesses and potential of the transcription pipeline, are continuously gathered in exchange with users and fed into our development work.
The forthcoming integration of the open authorization standard OAuth will enable secure, low-barrier access to the ASR4Memory application for external users, without passwords having to be transmitted directly.
How to use ASR4Memory: If you would like to have your audiovisual resources transcribed automatically, please get in touch via the contact addresses below. To use ASR4Memory, the resources to be transcribed must be available in standard digital media formats and be used for research purposes, e.g. as part of the “Oral-History.Digital” platform.
Once we have clarified that your data is technically and thematically suitable for our service and that we can take on the order, you will be granted web access and can upload the audio/video resources for transcription.
Your audiovisual data will be transcribed via the “Oral-History.Digital” infrastructure operated at the FU Berlin and then made available in various transcript formats via web access.
You are welcome to refer to the use of ASR4Memory in your publication with the following text:
“The transcript(s) was (were) generated in <2025> using the transcription application ‘ASR4Memory’ (https://www.fu-berlin.de/asr4memory) provided by the University Library of Freie Universität Berlin.”
Project website: https://www.fu-berlin.de/asr4memory
Github repositories: https://github.com/asr4memory
Further links:
Article in the 4Memory Blog, 7 March 2025: "From spoken word to text: AI-assisted transcription of audiovisual research data"
4Memory Incubator Funds 2024: https://4memory.de/4memory-incubator-funds/
Digital interview collections at the University Library of Freie Universität Berlin: https://www.fu-berlin.de/sites/interviewsammlungen/index.html
Research data at Freie Universität Berlin: https://www.fu-berlin.de/sites/forschungsdatenmanagement/index.html
Team:
- Project management: Dr. Tobias Kilgus
- Project staff: Peter Kompiel, Marc Altmann, Dr. Christian Horvat