Data Confidentiality and Statistical Disclosure Control


Data Confidentiality and Statistical Disclosure Control - Course Details

Delve into the course contents and find out about the faculty members.

Single Course Price:

800.00 EUR (tax exempt)



Prof. Jörg Drechsler (Institute for Employment Research)

Video lecture:

Prof. Jörg Drechsler (Institute for Employment Research)



In order to book the course with alumni conditions, please get in touch with Manon Pfeifer directly.

Course Description

Short Course Description
This course provides a gentle introduction to statistical disclosure control, with a focus on generating synthetic data to protect the confidentiality of survey respondents. The first part of the course introduces several traditional approaches to data protection that are widely used at statistical agencies, and discusses some of their limitations. The second part introduces synthetic data as a possible alternative and discusses different approaches to generating synthetic datasets in detail. Possible modeling strategies and ways of evaluating analytical validity will be covered, and potential measures for quantifying the remaining risk of disclosure will be presented. To give participants hands-on experience, all steps will be illustrated using simulated and real data examples in R.

The statistical software R will be used for illustrations and for (some of) the homework assignments, so knowledge of R is required to complete the assignments. Some background in general linear modeling is expected. Familiarity with the concepts of Bayesian statistics is helpful but not required.
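To give a rough impression of the synthetic data idea mentioned above, the following is a minimal sketch and not course material. The course itself uses R; Python is used here purely for illustration. The variables `age` and `income`, the simulated data, and the helper functions are all hypothetical. The sketch replaces a sensitive variable with draws from a linear regression model fitted to the original data and repeats this several times to obtain multiple synthetic datasets. As a simplification, it plugs in the estimated model parameters rather than drawing them from a posterior distribution, which a proper synthesis procedure would do.

```python
import random
import statistics


def fit_simple_regression(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # n - 2 degrees of freedom because two coefficients were estimated.
    residual_sd = (sum(r * r for r in residuals) / (len(x) - 2)) ** 0.5
    return intercept, slope, residual_sd


def synthesize(x, y, m=5, seed=0):
    """Return m synthetic copies of y, drawn from the fitted model given x.

    Simplification (assumption): parameters are fixed at their estimates
    instead of being drawn from their posterior distribution.
    """
    rng = random.Random(seed)
    a, b, sd = fit_simple_regression(x, y)
    return [[rng.gauss(a + b * xi, sd) for xi in x] for _ in range(m)]


# Simulated "confidential" data (hypothetical): age and income.
data_rng = random.Random(42)
age = [data_rng.uniform(20, 65) for _ in range(500)]
income = [1000 + 50 * a + data_rng.gauss(0, 300) for a in age]

# Age is kept as observed; only income is replaced by synthetic values.
synthetic_incomes = synthesize(age, income, m=5, seed=1)

# Crude analytical-validity check: compare the regression slope estimated
# on the original data with the average slope across synthetic datasets.
orig_slope = fit_simple_regression(age, income)[1]
synth_slopes = [fit_simple_regression(age, s)[1] for s in synthetic_incomes]
print(round(orig_slope, 1), round(statistics.fmean(synth_slopes), 1))
```

If the two printed slopes are close, an analysis run on the synthetic data would lead to roughly the same conclusion as one run on the original data, which is the intuition behind analytical validity. Disclosure risk would need to be assessed separately.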

Course Objectives
By the end of the course, students will…

  • know which measures are typically taken by statistical agencies to guarantee confidentiality for the survey respondents if data are disseminated to the public.
  • be aware of potential limitations of these measures.
  • have a practical understanding of the concept of synthetic data.
  • be able to judge in which situations the approach could be useful.
  • know how to generate synthetic data from their own data.
  • have a number of tools available to evaluate the analytical validity of the synthetic datasets.
  • know how to assess the disclosure risk of the generated data.

Course Composition
This is a 4 ECTS course, which runs for 8 weeks. The content of the course is broken down into 8 units:

  1. A Brief History of Data Confidentiality & Traditional Approaches for Data Protection
  2. The Computer Science Perspective on Data Privacy & Introduction to Multiply Imputed Synthetic Datasets
  3. Analyzing Synthetic Datasets & Relationship to Multiple Imputation for Nonresponse
  4. Synthesis Models Part I (Univariate and Linear Regression Models)
  5. Synthesis Models Part II (Models for Categorical Variables and Nonparametric Models) & Modeling Strategies
  6. Analytical Validity & Disclosure Risk Part I (Theory)
  7. Disclosure Risk Part II (Examples in R) & Discussion of the Chances and Obstacles of the Synthetic Data Approach
  8. Discussion of the Third Homework Assignment

Learning and Teaching Methods
In this course, you are responsible for watching the video-recorded lectures and reading the required literature for each unit. You then “attend” mandatory weekly one-hour online meetings, where students have the chance to discuss the materials from the unit with the instructor. In addition, students are encouraged to post questions about the week's videos and readings in the forum before the meetings. Just like in an on-site course, homework will be assigned and graded, and there will be a final exam at the end of the course.

Grading will be based on:

  • Two quizzes (worth 15% total)
  • Participation in the weekly online meetings, engagement in discussions during the meetings and/or submission of questions via e-mail (worth 10%)
  • Three homework assignments (worth 45% total)
  • A final online exam (worth 30%)


ZFU Certification and Online Dispute Resolution

ZFU Certification

The Mannheim Master of Applied Data Science & Measurement program is certified according to the regulations of the ZFU (Staatliche Zentralstelle für Fernunterricht).


Online Dispute Resolution

Online dispute resolution according to Art. 14 Sect. 1 ODR-VO: The European Commission provides a platform for online dispute resolution (ODR). You can find more information under

