28834
Methods course (Methodenübung)
AI-powered content analysis: Using generative AI to measure media and communication content
Marko Bachl
Additional information / Prerequisites
Requirements:
Some prior exposure to (standardized, quantitative) content analysis will be helpful. However, qualitative methods also have their place in evaluating content analysis methods. If you have little experience with the former but can contribute the latter, make sure to team up with students whose skill sets complement yours.
Prior knowledge of R or Python, applied data analysis, and interacting with application programming interfaces (APIs) will be helpful but is not required. Again, make sure that each team has a balanced skill set overall.
You will use your own computer to conduct your evaluation study. Credit for commercial APIs (e.g., OpenAI) will be provided within sensible limits.
This is not a programming class: programming skills are neither required nor will they be taught systematically. We will learn the basics of interacting with an API using R. Code examples will be provided and discussed; a first sketch follows below.
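For illustration, here is a minimal sketch of such an API call in R. It assumes the httr2 package and an OpenAI API key stored in the OPENAI_API_KEY environment variable; the model name, prompt wording, and temperature setting are illustrative choices, not fixed course material.

library(httr2)

# Ask a chat model to classify the tone of a text with a zero-shot
# prompt via the OpenAI chat completions endpoint; returns the
# model's one-word answer.
classify_text <- function(text) {
  resp <- request("https://api.openai.com/v1/chat/completions") |>
    req_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))) |>
    req_body_json(list(
      model = "gpt-4o-mini",  # illustrative model choice
      temperature = 0,        # one of the parameters we will examine
      messages = list(
        list(role = "system",
             content = "You are a coder in a content analysis project. Classify the tone of the user's text as positive, negative, or neutral. Answer with one word."),
        list(role = "user", content = text)
      )
    )) |>
    req_perform() |>
    resp_body_json()
  resp$choices[[1]]$message$content
}

classify_text("The proposed reform is a long-overdue step in the right direction.")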
Comments
Large language models (LLMs; starting with Google’s BERT) and particularly their implementations as generative or conversational AI tools (e.g., OpenAI’s ChatGPT) are increasingly used to measure or classify media and communication content. The idea is simple yet intriguing: Instead of training and employing humans for annotation tasks, researchers describe the concept of interest to a model such as ChatGPT, present the coding unit, and ask for a classification. The first tests of the utility of ChatGPT and similar tools for content analysis were positive to enthusiastic [1, 2]. However, others pointed out the need for more thorough validation and reliability tests [3, 4]. Easy-to-use tools and user-friendly tutorials have brought these methods within reach of the average social scientist [5, 6]. Yet (closed-source, commercial) large language models are not entirely understood even by their developers, and their uncritical use has been criticized on ethical grounds [7, 8].
In this seminar, we will engage practically with this cutting-edge methodological research. We start with a quick refresher on the basics of quantitative content analysis (both human and computational), focusing on quality criteria and evaluation (validity, reliability, reproducibility, robustness, replicability). We will then attempt an overview of the rapidly developing literature on LLMs’ utility for content analysis. The central part of the seminar will be dedicated to small evaluation studies by student teams. Questions can range from understanding a tool’s parameters (e.g., What’s the effect of a model’s “temperature” on reliability and validity?) to practical optimization (e.g., Which prompts work best for a given task?) to critical questions (e.g., Does the classification show gender, racial, or other biases?).
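To make the evaluation part concrete: agreement between human coders and an LLM can be quantified with a standard reliability coefficient such as Krippendorff’s alpha. Below is a minimal sketch using the kripp.alpha() function from the irr package; the codings are invented for illustration.

library(irr)

# Invented codings of five texts by a human coder and by an LLM
human <- c("positive", "negative", "neutral", "negative", "positive")
llm   <- c("positive", "negative", "negative", "negative", "positive")

# kripp.alpha() expects a coders-by-units matrix of numeric codes
cats  <- c("negative", "neutral", "positive")
codes <- rbind(
  human = as.integer(factor(human, levels = cats)),
  llm   = as.integer(factor(llm, levels = cats))
)
kripp.alpha(codes, method = "nominal")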
Literature
References:
[1] Gilardi, F., Alizadeh, M., & Kubli, M. (2023). ChatGPT outperforms crowd workers for text-annotation tasks. Proceedings of the National Academy of Sciences, 120(30), e2305016120. https://doi.org/10.1073/pnas.2305016120
[2] Heseltine, M., & Clemm von Hohenberg, B. (2024). Large language models as a substitute for human experts in annotating political text. Research & Politics, 11(1). https://doi.org/10/gtkhqr
[3] Reiss, M. V. (2023). Testing the reliability of ChatGPT for text annotation and classification: A cautionary remark. arXiv. https://doi.org/10.48550/arXiv.2304.11085
[4] Pangakis, N., Wolken, S., & Fasching, N. (2023). Automated annotation with generative AI requires validation. arXiv. https://doi.org/10.48550/arXiv.2306.00176
[5] Kjell, O., Giorgi, S., & Schwartz, H. A. (2023). The text-package: An R-package for analyzing and visualizing human language using natural language processing and transformers. Psychological Methods, 28(6), 1478–1498. https://doi.org/10/gsmcq8
[6] Törnberg, P. (2024). Best practices for text annotation with large language models. arXiv. https://doi.org/10/gtn9qf
[7] Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? 🦜. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10/gh677h
[8] Spirling, A. (2023). Why open-source generative AI models are an ethical way forward for science. Nature, 616(7957), 413. https://doi.org/10/gsqx6v
16 sessions
Regular course sessions
Mon, 14.10.2024 14:00 - 16:00
Mon, 21.10.2024 14:00 - 16:00
Mon, 28.10.2024 14:00 - 16:00
Mon, 04.11.2024 14:00 - 16:00
Mon, 11.11.2024 14:00 - 16:00
Mon, 18.11.2024 14:00 - 16:00
Mon, 25.11.2024 14:00 - 16:00
Mon, 02.12.2024 14:00 - 16:00
Mon, 09.12.2024 14:00 - 16:00
Mon, 16.12.2024 14:00 - 16:00
Mon, 06.01.2025 14:00 - 16:00
Mon, 13.01.2025 14:00 - 16:00
Mon, 20.01.2025 14:00 - 16:00
Mon, 27.01.2025 14:00 - 16:00
Mon, 03.02.2025 14:00 - 16:00
Mon, 10.02.2025 14:00 - 16:00