ASR4Memory
Automatic Transcription of Audiovisual Research Data
Click here for the German-language version of this website.
The "ASR4Memory" project, funded by the NFDI4Memory, has developed a service for the automatic transcription of audiovisual research data for the research community. One specialist focus is on the historical digital humanities.
With this service, audiovisual resources from heterogeneous sources can be automatically transcribed in different languages for a variety of research, reuse and archiving scenarios. In compliance with data protection regulations, the research data is processed exclusively on locally operated infrastructure at Freie Universität Berlin.
The ASR4Memory transcription service was included in the NFDI4Memory service portfolio in 2025 as a "4Memory initiative service".
The research community has a strong interest in technically optimizing audiovisual resources, making them available in text form in accordance with scientific standards, opening up their content, and enabling their reuse in new projects. This service makes it possible to automatically transcribe audiovisual research resources - e.g. interviews with contemporary witnesses, documentary films or sound recordings - in the original language (30 languages are currently supported), thus creating an important basis for the scholarly indexing of audiovisual resources.
The service uses open-source speech recognizers for automatic speech recognition (ASR) while critically examining the topic of “artificial intelligence”. Users first upload their audiovisual research resources to a secure storage area via the access-protected, browser-based Media Management Tool (MMT). The audio/video files are then automatically prepared for the best achievable audio quality, transcribed with the best possible word accuracy and computing performance, converted into time-coded transcript and exchange formats in line with scientific standards, and removed from the local transcription server after successful processing.
Users receive the various transcript formats and summaries in the standardized BagIt format for further reuse, including TXT and ODS files for manual post-processing (e.g. in MAXQDA), CSV and JSON files for automatic data processing, VTT and SRT files for subtitling AV resources, and PDF files for long-term preservation of the transcripts, some with speaker markup and word- or sentence-based timecodes. The TEI/XML exchange format will be available in the near future. The export formats are suitable for importing into the research and indexing platform “Oral-History.Digital”, for example.
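The VTT and SRT subtitle exports mentioned above carry the word- and sentence-based timecodes in the WebVTT timestamp convention. As a rough, illustrative sketch (the `start`/`end`/`text` field names are hypothetical, not the service's actual data schema), time-coded segments can be serialized to a WebVTT file like this:

```python
def to_vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = round(seconds * 1000)
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def segments_to_vtt(segments: list[dict]) -> str:
    """Serialize segments with 'start', 'end' (seconds) and 'text' keys
    into a WebVTT document: header, then one cue per segment."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{to_vtt_timestamp(seg['start'])} --> {to_vtt_timestamp(seg['end'])}")
        lines.append(seg["text"])
        lines.append("")  # blank line terminates the cue
    return "\n".join(lines)
```

The same segment data, with a slightly different timestamp format (comma instead of dot before the milliseconds), yields the SRT variant.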
The scope of services at a glance:
- Usage options: The tool can be used as a web application or as open-source software (on CPU or GPU)
- Data protection: Data processing exclusively within the "Oral-History.Digital" infrastructure (servers at Freie Universität Berlin), or installation and operation on your own computer
- Performance: Fast processing, high-quality transcription (low word error rate)
- Multilingualism: Support for more than 30 languages (list of language models), automatic language detection possible
- Diarization: Sentence-based recognition and annotation of speakers
- Alignment: Word-based time stamps accurate to the millisecond for synchronization of transcript and AV
- Segment length: Intelligent, dynamically adjustable limitation of characters per segment
- Transcript formats: Export of standardized file types such as RTF, ODT, ODS, PDF, CSV, JSON, VTT, SRT, XML and TXT, including a format suitable for MAXQDA import
- Provision of export files in BagIt format: Platform-independent, hierarchical directory structure for storing and transferring digital content
- Summary: Creation of a multilingual abstract from the transcript using a locally running large language model
- Integration of the OAuth authorization standard: Secure and low-threshold access for Oral-History.Digital and other user groups
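The BagIt packaging listed above follows the Library of Congress BagIt specification (RFC 8493): payload files live in a `data/` directory alongside a `bagit.txt` declaration and a checksum manifest. A minimal sketch of such a bag, assuming SHA-256 manifests (the function and file names here are illustrative, not the service's actual export code):

```python
import hashlib
from pathlib import Path


def make_minimal_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Create a minimal BagIt bag: a bagit.txt declaration, the payload
    files under data/, and a SHA-256 manifest over the payload."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in payload.items():
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest format: checksum, two spaces, path relative to the bag root
        manifest_lines.append(f"{digest}  data/{name}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"
    )
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(manifest_lines) + "\n")
```

Because the layout is just plain directories and text files, a bag can be verified or unpacked on any platform without special tooling.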
We are currently training a deep-learning-based speech recognition model with expertly curated and optimally prepared training data on a high-performance computing (HPC) system. The aim of this fine-tuning is to improve transcription quality, reduce serious errors and create domain-specific speech recognition models for individual disciplines. We are collaborating with the “Geometric Neuroevolution for Fine-tuning Automatic Speech Recognition” project from the “MATH+” Cluster of Excellence, based at the Zuse Institute Berlin.
The needs and requirements of the research community, as well as weaknesses and potentials of the transcription pipeline, are continuously gathered in exchange with users and fed into our development work.
If you would like to have your audiovisual resources transcribed automatically, please contact us at the addresses below. To use ASR4Memory, the resources to be transcribed must be available in standard digital media formats and must serve research purposes, e.g. as part of the “Oral-History.Digital” platform.
Once we have clarified whether your data is technically and thematically suitable for the service and whether we can take on the job, you will be granted web access and can upload your audio/video resources for transcription.
Your audiovisual data will be transcribed via the “Oral-History.Digital” infrastructure operated at the FU Berlin and then made available in various transcript formats via web access.
You are welcome to refer to the use of ASR4Memory in your publication with the following text:
“The transcript(s) was (were) generated in <2025> using the transcription application ‘ASR4Memory’ (https://www.fu-berlin.de/asr4memory) provided by the University Library of Freie Universität Berlin.”
Project website: https://www.fu-berlin.de/asr4memory
Github repositories: https://github.com/asr4memory
Further links:
Interview with the FU online magazine “Campus.Leben,” October 6, 2025: https://www.fu-berlin.de/campusleben/forschen/2025/251006-ASR4-memory/index.html
Article in the 4Memory Blog, March 7, 2025: "From spoken word to text: AI-assisted transcription of audiovisual research data"
4Memory Incubator Funds 2024: https://4memory.de/4memory-incubator-funds/
Digital interview collections at the University Library of Freie Universität Berlin: https://www.fu-berlin.de/sites/interviewsammlungen/index.html
Research data at Freie Universität Berlin: https://www.fu-berlin.de/sites/forschungsdatenmanagement/index.html
Team:
- Project management: Dr. Tobias Kilgus
- Project staff: Peter Kompiel, Marc Altmann, Dr. Christian Horvat

