ASR4Memory
Automatic Transcription of Audiovisual Research Data
Click here for the German-language version of this website.
The "ASR4Memory" project, funded by the NFDI4Memory, has developed a service for the automatic transcription of audiovisual research data for the research community. One specialist focus is on the historical digital humanities.
With this service, audiovisual resources from heterogeneous sources can be automatically transcribed in different languages for a variety of research, reuse and archiving scenarios. In compliance with data protection regulations, the research data is processed exclusively on locally operated infrastructure at Freie Universität Berlin.
The ASR4Memory transcription service was included in the NFDI4Memory service portfolio in 2025 as a "4Memory initiative service".
The research community has a strong interest in technically optimizing audiovisual resources, making them available in text form in accordance with scientific standards, opening up their content, and enabling their reuse in new projects. This service makes it possible to automatically transcribe audiovisual research resources - e.g. interviews with contemporary witnesses, documentary films or sound recordings - in the original language (30 languages are currently supported), thus creating an important basis for the scholarly indexing of audiovisual resources.
The service uses open-source speech recognizers for automatic speech recognition (ASR) while critically examining the topic of “artificial intelligence”. Users first upload their audiovisual research resources to a secure storage area via the access-protected, browser-based Media Management Tool (MMT). The audio/video files are then automatically prepared for the best achievable audio quality, transcribed with the best possible word accuracy and computing performance, converted into time-coded transcript and exchange formats in line with scientific standards, and removed from the local transcription server after successful processing.
Users receive the various transcript formats and summaries in the standardized BagIt format for further reuse, including TXT and ODS files for manual post-processing (e.g. in MAXQDA), CSV and JSON files for automatic data processing, VTT and SRT files for subtitling AV resources, and PDF files for long-term preservation of the transcripts, some with speaker markup and word- or sentence-based timecodes. The TEI/XML exchange format will be available in the near future. The export formats are suitable for importing into the research and indexing platform “Oral-History.Digital”, for example.
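The VTT and SRT subtitle exports mentioned above carry the word- and sentence-based timecodes in the WebVTT timestamp convention. As a rough, illustrative sketch (the `start`/`end`/`text` field names are hypothetical, not the service's actual data schema), time-coded segments can be serialized to a WebVTT file like this:

```python
def to_vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = round(seconds * 1000)
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def segments_to_vtt(segments: list[dict]) -> str:
    """Serialize segments with 'start', 'end' (seconds) and 'text' keys
    into a WebVTT document: header, then one cue per segment."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{to_vtt_timestamp(seg['start'])} --> {to_vtt_timestamp(seg['end'])}")
        lines.append(seg["text"])
        lines.append("")  # blank line terminates the cue
    return "\n".join(lines)
```

The same segment data, with a slightly different timestamp format (comma instead of dot before the milliseconds), yields the SRT variant.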
The scope of services at a glance:
- Usage options: The tool can be used as a web application or as open-source software (on CPU or GPU)
- Data protection: Data processing exclusively within the "Oral-History.Digital" infrastructure (servers at Freie Universität Berlin), or installation and operation on your own computer
- Performance: Fast processing, high-quality transcription (low word error rate)
- Multilingualism: Support for more than 30 languages (list of language models), automatic language detection possible
- Diarization: Sentence-based recognition and annotation of speakers
- Alignment: Word-based time stamps accurate to the millisecond for synchronization of transcript and AV
- Segment length: Intelligent, dynamically adjustable limitation of characters per segment
- Transcript formats: Export of standardized file types such as RTF, ODT, ODS, PDF, CSV, JSON, VTT, SRT, XML and TXT, including a format suitable for MAXQDA import
- Provision of export files in BagIt format: Platform-independent, hierarchical directory structure for storing and transferring digital content
- Summary: Creation of a multilingual abstract from the transcript using a locally running large language model
- Integration of the OAuth authorization standard: Secure and low-threshold access for Oral-History.Digital and other user groups
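The BagIt packaging listed above follows the Library of Congress BagIt specification (RFC 8493): payload files live in a `data/` directory alongside a `bagit.txt` declaration and a checksum manifest. A minimal sketch of such a bag, assuming SHA-256 manifests (the function and file names here are illustrative, not the service's actual export code):

```python
import hashlib
from pathlib import Path


def make_minimal_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Create a minimal BagIt bag: a bagit.txt declaration, the payload
    files under data/, and a SHA-256 manifest over the payload."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in payload.items():
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest format: checksum, two spaces, path relative to the bag root
        manifest_lines.append(f"{digest}  data/{name}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"
    )
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(manifest_lines) + "\n")
```

Because the layout is just plain directories and text files, a bag can be verified or unpacked on any platform without special tooling.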
We are currently training a deep-learning-based speech recognition model with expertly curated and optimally prepared training data on a high-performance computing (HPC) system. The aim of this fine-tuning is to improve transcription quality, reduce serious errors and create domain-specific speech recognition models for individual disciplines. We are collaborating with the “Geometric Neuroevolution for Fine-tuning Automatic Speech Recognition” project from the “MATH+” Cluster of Excellence, based at the Zuse Institute Berlin.
The needs and requirements of the research community, as well as weaknesses and potentials of the transcription pipeline, are continuously gathered in exchange with users and fed into our development work.
If you would like to have your audiovisual resources transcribed automatically, please contact us at the addresses below. To use ASR4Memory, the resources to be transcribed must be available in standard digital media formats and must serve research purposes, e.g. as part of the “Oral-History.Digital” platform.
Once we have clarified whether your data is technically and thematically suitable for the service and whether we can take on the job, you will be granted web access and can upload your audio/video resources for transcription.
Your audiovisual data will be transcribed via the “Oral-History.Digital” infrastructure operated at the FU Berlin and then made available in various transcript formats via web access.
You are welcome to refer to the use of ASR4Memory in your publication with the following text:
“The transcript(s) was (were) generated in <2025> using the transcription application ‘ASR4Memory’ (https://www.fu-berlin.de/asr4memory) provided by the University Library of Freie Universität Berlin.”
Project website: https://www.fu-berlin.de/asr4memory
Github repositories: https://github.com/asr4memory
Further links:
Interview with the FU online magazine “Campus.Leben,” October 6, 2025: https://www.fu-berlin.de/campusleben/forschen/2025/251006-ASR4-memory/index.html
Article in the 4Memory Blog, March 7, 2025: "From spoken word to text: AI-assisted transcription of audiovisual research data"
4Memory Incubator Funds 2024: https://4memory.de/4memory-incubator-funds/
Digital interview collections at the University Library of Freie Universität Berlin: https://www.fu-berlin.de/sites/interviewsammlungen/index.html
Research data at Freie Universität Berlin: https://www.fu-berlin.de/sites/forschungsdatenmanagement/index.html
Team:
- Project management: Dr. Tobias Kilgus
- Project staff: Peter Kompiel, Marc Altmann, Dr. Christian Horvat

