SoSe 25  
Mathematics and...  
Data Science  
Course

0590a_MA120
  • Ethical Foundations of Data Science

    0590aB1.2
  • Data Base Systems for Students of Data Science

    0590aB1.20
    • 19301501 Lecture
      Database Systems (Agnès Voisard)
      Schedule: Tue 14:00-16:00, Thu 14:00-16:00, additional dates: see course details (Class starts on: 2025-04-15)
      Location: T9/Gr. Hörsaal (Takustr. 9)

      Additional information / Pre-requisites

      Requirements

      • ALP 1 - Functional Programming
      • ALP 2 - Object-oriented Programming
      • ALP 3 - Data structures and data abstractions
      • OR Informatik B

      Comments

      Content

      Database design with ERM/ERDD. Theoretical foundations of relational database systems: relational algebra, functional dependencies, normal forms. Relational database development: SQL data definitions, foreign keys and other integrity constraints, SQL as an application language: essential language elements, embedding in programming languages. Application programming; object-relational mapping. Security and protection concepts. Transaction concept, transactional guarantees, synchronization of multi-user operations, fault tolerance features. Applications and new developments: data warehousing, data mining, OLAP.

      Project: student groups explore the topics in greater depth in an implementation project.

      Suggested reading

      • Alfons Kemper, Andre Eickler: Datenbanksysteme - Eine Einführung, 5th edition, Oldenbourg 2004
      • R. Elmasri, S. Navathe: Grundlagen von Datenbanksystemen, Pearson Studium, 2005

    • 19301502 Practice seminar
      Practice seminar for Database systems (Muhammed-Ugur Karagülle)
      Schedule: Mon 12:00-14:00, Mon 14:00-16:00, Mon 16:00-18:00, Tue 08:00-10:00, Tue 10:00-12:00, Tue 12:00-14:00, Wed 10:00-12:00, Wed 12:00-14:00, Wed 14:00-16:00, Thu 08:00-10:00, Thu 10:00-12:00, Thu 12:00-14:00, Thu 16:00-18:00, Fri 10:00-12:00, Fri 14:00-16:00, Fri 16:00-18:00 (Class starts on: 2025-04-14)
      Location: T9/SR 006 Seminarraum (Takustr. 9)
  • Mobile Communications

    0590aB1.22
    • 19303901 Lecture
      Mobile Communications (Jochen Schiller)
      Schedule: Wed 10:00-12:00 (Class starts on: 2025-04-16)
      Location: T9/049 Seminarraum (Takustr. 9)

      Comments

      The Mobile Communications module presents major topics in mobile and wireless communications, the key drivers behind today's communication industry that influence everybody's daily life.

      The whole lecture focuses on a system perspective giving many pointers to real systems, standardization and current research.

      The format of the lecture is the flipped classroom, i.e., you should watch the videos of a lecture BEFORE participating in the Q&A session. We will then discuss all open issues, answer questions etc. during the Q&A session.

      Main topics of the lecture are:

      • Basics of wireless transmission: frequencies, signals, antennas, multiplexing, modulation, spread spectrum
      • Medium access: SDMA, FDMA, TDMA, CDMA
      • Wireless telecommunication systems: GSM, TETRA, IMT-2000, LTE, 5G
      • Wireless local area networks: infrastructure/ad-hoc, IEEE 802.11/15, Bluetooth, ZigBee
      • Mobile networking: Mobile IP, ad-hoc networks
      • Mobile transport layer: traditional TCP, additional mechanisms
      • Outlook: 5G to 6G, low power wireless networks

      Suggested reading

      Jochen Schiller, Mobilkommunikation, Addison-Wesley, 2nd edition, 2003

      All materials are available at http://www.mi.fu-berlin.de/inf/groups/ag-tech/teaching/resources/Mobile_Communications/course_Material/index.html

  • Machine Learning in Bioinformatics

    0590aB1.30
    • 19405701 Lecture
      Machine Learning in Bioinformatics (Philipp Florian Benner, Hugues Richard)
      Schedule: Mon 08:00-10:00 (Class starts on: 2025-04-14)
      Location: A6/SR 025/026 Seminarraum (Arnimallee 6)

      Comments

      This course introduces key machine learning concepts and is accompanied by tutorials and exercises where machine learning methods are applied to actual bioinformatics problems. After a short recap of probability theory, we introduce probabilistic methods for classification and sequence analysis (Naive Bayes, Mixture Models, Hidden Markov Models). We discuss Expectation Maximization (EM) from a probabilistic perspective and use it for sequence analysis. Linear and logistic regression serve as an entry point to more complex machine learning methods, including kernel methods and neural networks. The lecture covers multiple neural network architectures (CNNs, GNNs, Transformers) that are currently used in the bioinformatics community and other research domains. During the tutorials and as part of homework assignments, selected machine learning models are implemented in Python using scikit-learn and PyTorch. The course should enable students to understand all common machine learning techniques and devise state-of-the-art classification strategies that can then be applied to problems in bioinformatics and related fields.
      Contents:
      - Naive Bayes
      - Clustering and Mixture Models
      - Hidden Markov Models
      - Regression and Partial Least Squares
      - Kernel Methods
      - Neural Networks and Architectures
      - Regularization and Model Selection
      Requirements:
      - Linear algebra (basic vector and matrix algebra)
      - Analysis (mathematical optimization, Lagrange)
      - Programming in Python -- including object oriented programming
      - A basic understanding of, or keen interest in, molecular biology and bioinformatics applications

    • 19405702 Practice seminar
      Practice Seminar for Machine Learning in Bioinformatics (Philipp Florian Benner, Hugues Richard)
      Schedule: Wed 08:00-10:00 (Class starts on: 2025-04-16)
      Location: A7/SR 031 (Arnimallee 7)
  • Complex Systems in Bioinformatics

    0590aB1.32
    • 19405201 Lecture
      Complex Systems in Bioinformatics (Martin Vingron, Max von Kleist, Jana Wolf)
      Schedule: Tue 12:00-14:00 (Class starts on: 2025-04-15)
      Location: A3/SR 120 (Arnimallee 3-5)

      Comments

      Students have acquired a deeper understanding of fundamental mathematical and algorithmic concepts in the field of modeling, simulation and analysis of complex biological systems against the background of current research trends in systems biology and biotechnology. They are capable of analyzing a given biological or medical problem, selecting a suitable modeling approach, independently developing a solution, and assessing and communicating the results.

      Content:

      Topics from the following areas are considered in depth:

      - Network structure analysis

      - Graphical modeling

      - Modeling of biochemical networks using ordinary differential equations

      - Discrete modeling of regulatory networks

      - Constraint-based modeling

      - Stochastic and hybrid modeling

      Suggested reading

      To be announced in class.

    • 19405202 Practice seminar
      Practice seminar for Complex Systems in Bioinformatics (Martin Vingron, Max von Kleist, Jana Wolf)
      Schedule: Tue 14:00-16:00 (Class starts on: 2025-04-15)
      Location: A3/SR 120 (Arnimallee 3-5)
    • 19405211 Seminar
      Seminar for Complex Systems in Bioinformatics (Martin Vingron, Max von Kleist, Jana Wolf)
      Schedule: Thu 10:00-12:00 (Class starts on: 2025-04-17)
      Location: A3/SR 119 (Arnimallee 3-5)
  • Data Science in the Life Sciences

    0590aB2.1
    • 19405606 Seminar-style instruction
      Data Science in the Life Sciences (Katharina Jahn)
      Schedule: Mon 10:00-14:00 (Class starts on: 2025-04-14)
      Location: T9/SR 006 Seminarraum (Takustr. 9)

      Comments

      This course offers an introduction to various types of data and analysis techniques which are typically used in the life sciences (e.g. omics technologies). The goal is to get a deeper understanding of advanced concepts and data analytical methods in the area of life sciences.

      The focus will be on the following topics:

      * acquisition and pre-processing of data from the area of life sciences,
      * explorative analysis techniques,
      * concepts and tools for reproducible research,
      * theory and practice of methods and models for the analysis of data from the life sciences (statistical inference, regression models, methods of machine learning),
      * introduction to methods of big data analysis.

      After successful completion of this course, participants are able to evaluate, plan and conduct investigations in the life sciences using common methods.

       

    • 19405612 Project Seminar
      Project seminar for Data Science in the Life Sciences (Katharina Jahn)
      Schedule: Wed 10:00-14:00 (Class starts on: 2025-04-16)
      Location: A6/SR 031 Seminarraum (Arnimallee 6)
  • Special Aspects of Data Science in Life Sciences

    0590aB2.4
    • 19336901 Lecture
      Advanced Data Visualization for Artificial Intelligence (Georges Hattab)
      Schedule: Wed 10:00-12:00 (Class starts on: 2025-04-16)
      Location: A6/SR 007/008 Seminarraum (Arnimallee 6)

      Comments

      The lecture on Advanced Data Visualization for Artificial Intelligence is a comprehensive exploration of state-of-the-art techniques and tools to create and validate complex visualizations for communicating data insights and stories, with a specific focus on applications in Natural Language Processing (NLP) and Explainable AI. The lecture will introduce participants to the nested model of visualization, which encompasses four layers: characterizing the task and data, abstracting into operations and data types, designing visual encoding and interaction techniques, and creating algorithms to execute techniques efficiently. This model will serve as a framework for designing and validating data visualizations.

      Furthermore, the lecture will delve into the application of data visualization in NLP, emphasizing the visualization of word embeddings and language models to aid in the exploration of semantic relationships between words and the interpretation of language model behavior. In the context of Explainable AI, the focus will be on using visualizations to explain model predictions and feature importance, thereby enhancing the interpretability of AI models. By leveraging the nested model of visualization and focusing on NLP and Explainable AI, the lecture aims to empower participants with the essential skills to design and validate advanced data visualizations tailored to these specific applications, ultimately enabling them to effectively communicate complex data patterns and gain deeper insights from their data.

    • 60102501 Lecture
      Resampling techniques and their application (Frank Konietschke)
      Schedule: Wed 14:00-16:00 (Class starts on: 2025-04-16)
      Location: A6/SR 032 Seminarraum (Arnimallee 6)

      Comments

      In this course, we introduce resampling techniques for analyzing trials with small sample sizes. Special attention will be given to both estimation methods and inference procedures. We will answer the questions (1) "How does resampling work?" and (2) "When does resampling work?" Throughout the class we will study one-sample, two-sample and even factorial designs with independent and dependent observations. All algorithms will be presented and illustrated using the R statistical software. Knowledge of the fundamentals of statistical testing and basic skills in R are recommended prerequisites.

    • 60102701 Lecture
      Complex Data Analysis in Physiology (Dorothee Günzel)
      Schedule: Mon 14:30-18:30 (Class starts on: 2025-04-14)
      Location: not specified

      Comments

      Joint class taught by the Institute of Clinical Physiology and the Institute of Physiology at the Charité.

      Theoretical and practical aspects of data acquisition, real-time data processing and automated pattern recognition in biomedicine. Topics from the following areas are covered in depth:

      • Data acquisition and processing of image files in research and clinical settings (e.g. live cell imaging, super-resolution microscopy, medical imaging techniques).
      • Electrophysiological methods (e.g. impedance spectroscopy, microarrays, EEG, ECG)
      • Methods and application of automated pattern recognition (e.g. automated tumour detection, real-time analysis of biological signals in the brain-computer interface or in retina implants, prediction of individual arrhythmia risks)

      The course will be split into two segments: the first seven sessions of the semester will take place at the Institute of Physiology, while the following seven sessions will take place at the Institute of Clinical Physiology.

      For further information, see http://klinphys.charite.de/bioinfo/ or e-mail Dorothee Günzel.

    • 19336902 Practice seminar
      Practice seminar for Advanced Data Visualization for Artificial Intelligence (Georges Hattab)
      Schedule: Wed 14:00-16:00 (Class starts on: 2025-04-16)
      Location: A6/SR 007/008 Seminarraum (Arnimallee 6)
    • 60102502 Practice seminar
      Practice Seminar for Resampling techniques and their application (Frank Konietschke)
      Schedule: Wed 16:00-18:00 (Class starts on: 2025-04-16)
      Location: A6/SR 032 Seminarraum (Arnimallee 6)
    • 60102702 Practice seminar
      Practice seminar for Complex Data Analysis in Physiology (Dorothee Günzel)
      Schedule: see lecture
      Location: not specified
  • Special Aspects of Data Science Technologies

    0590aB3.3
    • 19327401 Lecture
      Image and video coding (Heiko Schwarz)
      Schedule: Mon 14:00-16:00 (Class starts on: 2025-04-14)
      Location: T9/053 Seminarraum (Takustr. 9)

      Comments

      This course introduces the most important concepts and algorithms that are used in modern image and video coding approaches. We will particularly focus on techniques that are found in current international video coding standards.

      In a short first part, we introduce the so-called raw data formats, which are used as input and output formats of image and video codecs. This part covers the following topics:

      • Colour spaces and their relation to human visual perception
      • Transfer functions (gamma encoding)
      • Why do we use the YCbCr format?

      The second part of the course deals with still image coding and includes the following topics:

      • The start: How does JPEG work?
      • Why do we use the Discrete Cosine Transform?
      • Efficient coding of transform coefficients
      • Prediction of image blocks
      • Adaptive block partitioning
      • How do we make decisions in an encoder?
      • Optimized quantization

      In the third part, we discuss approaches that make video coding much more efficient than coding all pictures using still image coding techniques:

      • Motion-compensated prediction
      • Coding of motion vectors
      • Algorithms for motion estimation
      • Sub-sample accurate motion vectors and interpolation filters
      • Usage of multiple reference pictures
      • What are B pictures and why do we use them?
      • Deblocking and deringing filters
      • Efficient temporal coding structures

      In the exercises, we will gradually implement our own image codec. We may extend it to a simple video codec.

      Suggested reading

      • Bull, D. R., “Communicating Pictures: A Course in Image and Video Coding,” Elsevier, 2014.
      • Ohm, J.-R., “Multimedia Signal Coding and Transmission,” Springer, 2015.
      • Wien, M., “High Efficiency Video Coding — Coding Tools and Specifications,” Springer, 2014.
      • Sze, V., Budagavi, M., and Sullivan, G. J. (eds.), “High Efficiency Video Coding (HEVC): Algorithm and Architectures,” Springer, 2014.
      • Wiegand, T. and Schwarz, H., “Source Coding: Part I of Fundamentals of Source and Video Coding,” Foundations and Trends in Signal Processing, Now Publishers, vol. 4, no. 1–2, 2011.
      • Schwarz, H. and Wiegand, T., “Video Coding: Part II of Fundamentals of Source and Video Coding,” Foundations and Trends in Signal Processing, Now Publishers, vol. 10, no. 1–3, 2016.

    • 19336901 Lecture
      Advanced Data Visualization for Artificial Intelligence (Georges Hattab)
      Schedule: Wed 10:00-12:00 (Class starts on: 2025-04-16)
      Location: A6/SR 007/008 Seminarraum (Arnimallee 6)

    • 19327402 Practice seminar
      Practice seminar for image and video coding (Heiko Schwarz)
      Schedule: Mon 12:00-14:00 (Class starts on: 2025-04-14)
      Location: T9/053 Seminarraum (Takustr. 9)
    • 19336902 Practice seminar
      Practice seminar for Advanced Data Visualization for Artificial Intelligence (Georges Hattab)
      Schedule: Wed 14:00-16:00 (Class starts on: 2025-04-16)
      Location: A6/SR 007/008 Seminarraum (Arnimallee 6)
  • Current Research Topics in Data Science Technologies

    0590aB3.4
    • 19327401 Lecture
      Image and video coding (Heiko Schwarz)
      Schedule: Mon 14:00-16:00 (Class starts on: 2025-04-14)
      Location: T9/053 Seminarraum (Takustr. 9)

    • 19327402 Practice seminar
      Practice seminar for image and video coding (Heiko Schwarz)
      Schedule: Mon 12:00-14:00 (Class starts on: 2025-04-14)
      Location: T9/053 Seminarraum (Takustr. 9)
  • Selected Topics in Data Science Technologies

    0590aB3.5
    • 19326601 Lecture
      Markov Chains (Katinka Wolter)
      Schedule: Tue 12:00-14:00, Thu 10:00-12:00 (Class starts on: 2025-04-15)
      Location: T9/Gr. Hörsaal (Takustr. 9)

      Comments

      In this course we will study stochastic models commonly used to analyse the performance of dynamic systems. Markov models and queues are used to study the behaviour over time of a wide range of systems, from computer hardware, communication systems, biological systems, epidemics, traffic networks to crypto-currencies. We will take a tour of the basics of Markov modelling, starting from birth-death processes, the Poisson process to general Markov and semi-Markov processes and solution methods for those processes. Then we will look at queueing models and queueing networks with exact and approximate solution algorithms. If time allows we will finally study some of the foundations of discrete event simulation.

      Suggested reading

      William Stewart. Probability, Markov Chains, Queues and Simulation. Princeton University Press 2009.

    • 19326602 Practice seminar
      Practice seminar for Markov Chains (Justus Purat)
      Schedule: Tue 14:00-16:00 (Class starts on: 2025-04-15)
      Location: A6/SR 007/008 Seminarraum (Arnimallee 6)
  • Software Project Data Science

    0590aB3.1
    • 19308312 Project Seminar
      Implementation Project: Applications of Algorithms (Mahmoud Elashmawi)
      Schedule: Thu 08:30-10:00 (Class starts on: 2025-04-10)
      Location: T9/053 Seminarraum (Takustr. 9)

      Comments

      Contents

      We choose a typical application area of algorithms, usually for geometric problems, and develop software solutions for it, e.g., computer graphics (representation of objects in a computer, projections, hidden edge and surface removal, lighting, raytracing), computer vision (image processing, filtering, projections, camera calibration, stereo-vision) or pattern recognition (classification, searching).

      Prerequisites

      Basic knowledge of the design and analysis of algorithms.

      Suggested reading

      Depends on the application area.

    • 19314012 Project Seminar
      Software Project: Semantic Technologies (Adrian Paschke)
      Schedule: Wed 14:00-16:00 (Class starts on: 2025-04-16)
      Location: A7/SR 031 (Arnimallee 7)

      Additional information / Pre-requisites

      Corporate Semantic Web

      Further information can be found on the course website

      Comments

      Mixed groups of master's and bachelor's students will either implement an independent project or take part in a larger project in the area of semantic technologies. They will gain in-depth programming knowledge about applications of semantic technologies and artificial intelligence techniques in the Corporate Semantic Web. They will practice teamwork and best practices in software development of large distributed systems and Semantic Web applications. The software project can be done in collaboration with an external partner from industry or standardization. It is possible to continue the project as a bachelor's or master's thesis.

      Suggested reading

      Corporate Semantic Web

    • 19334212 Project Seminar
      Software Project: Machine Learning and Explainability for Improved (Cancer) Treatment (Pauline Hiort)
      Schedule: Tue 15:00-17:00, additional dates: see course details (Class starts on: 2025-02-26)
      Location: T9/K40 Multimediaraum (Takustr. 9)

      Comments

      In the software project, we will implement, train, and evaluate various machine learning (ML) methods. The focus of the project is on neural networks (NN) and their explainability. We will compare the methods with different baseline models, such as regression models. The various ML methods will be applied to a specific dataset, e.g., for predicting drug combinations for cancer treatment, and evaluated accordingly. The dataset will be prepared by us and analyzed using the implemented methods. Additionally, we will focus on explainability to ensure that the predictions of the ML models are understandable and interpretable. For this purpose, we will integrate appropriate explainability techniques to better understand and visualize the decision-making processes of the models.

      The programming language is Python, and we plan to use modern Python modules for ML such as scikit-learn and PyTorch. Good Python skills are required. The goal is to create a Python package that provides reusable code for preprocessing, training ML models, and evaluating results, with documentation (e.g., using Sphinx) for the specific use case. The software project takes place throughout the semester and can also be conducted in English.

    • Introduction to Profile Areas 0590aA1.1
    • Statistics for Students of Data Science 0590aA1.2
    • Machine Learning for Data Science 0590aA1.3
    • Programming for Data Science 0590aA1.4
    • Data Science in the Social Sciences 0590aB1.1
    • Mobile Mental Health 0590aB1.10
    • Developing Psychological Online Interventions 0590aB1.11
    • Selected Topics in Data Science in the Social Sciences 0590aB1.12
    • Special Aspects of Data Science in the Social Sciences 0590aB1.13
    • Distributed Systems 0590aB1.21
    • Telematics 0590aB1.23
    • Advanced Analysis 0590aB1.24
    • Computer Security 0590aB1.25
    • Pattern Recognition 0590aB1.26
    • Network-Based Information Systems 0590aB1.27
    • Artificial Intelligence 0590aB1.28
    • Special Aspects of Data Administration 0590aB1.29
    • Research Practice 0590aB1.3
    • Big Data Analysis in Bioinformatics 0590aB1.31
    • Neurocognitive Methods and Programming for Data Science 0590aB1.4
    • Cognitive Neuroscience for Data Science A 0590aB1.5
    • Cognitive Neuroscience for Data Science B 0590aB1.6
    • Differential Psychological Approaches in Data Sciences 0590aB1.7
    • Natural Language Processing 0590aB1.8
    • Introduction to Psychoinformatics 0590aB1.9
    • Selected Topics in Data Science in Life Sciences 0590aB2.5