Mannheim Master of Applied Data Science & Measurement – The Curriculum
The three study tracks allow you to tailor the curriculum to your personal and professional needs. The wide range of courses in the five skill areas – Research Design, Data Generating Process, Data Curation Storage, Data Analysis, and Data Output/Access – guarantees that you gain a thorough understanding of the entire subject area throughout the program. On average, each study period (spring, summer, and fall) includes two to three courses that vary between four and twelve weeks in duration. ECTS credits are awarded accordingly.
The course introduces the student to a set of principles of survey and data science that are the basis of standard practices in these fields. The course exposes the student to key terminology and concepts of collecting and analyzing data from surveys and other data sources to gain insights and to test hypotheses about the nature of human and social behavior and interaction. It will also present a framework that will allow the student to evaluate the influence of different error sources on the quality of data.
Sampling is an applied statistics methods course, but it differs from most statistics courses because it is concerned almost exclusively with the design of data collection; little of the analysis of collected data will be discussed. The course concentrates on problems of applying sampling methods to human populations, since sampling human populations poses a number of particular problems not found in sampling other types of units. The principles of sample selection, though, can be applied to many other types of populations.
This course examines how to embed experiments in surveys. It covers both the design of survey experiments and the analysis of the results.
This course introduces students to the stages of questionnaire development. The course reviews the scientific literature on questionnaire construction, the experimental literature on question effects, and the psychological literature on information processing. It will also discuss the diverse challenges posed by self versus proxy reporting, and special attention is paid to the relationship between mode of administration and questionnaire design. Students will also get hands-on experience in developing their own questionnaire.
The course will address methods to combine data on given entities (people, households, firms etc.) that are stored in different data sources. By showing the strengths of these methods and by providing numerous practical examples the course will demonstrate the various benefits of record linkage. The participants will also learn about potential pitfalls record linkage projects may face.
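As a minimal illustration of the record-linkage idea described above, the sketch below joins two made-up data sources on a normalized name and birth-year key. All records are hypothetical, and real linkage projects add probabilistic matching, blocking, and clerical review on top of this.

```python
# Deterministic record linkage sketch: join two hypothetical sources
# (a survey and an administrative register) on a normalized key.

def link_key(record):
    """Build a normalized linkage key from name and birth year."""
    return (record["name"].strip().lower(), record["birth_year"])

survey = [
    {"name": "Ada Lovelace ", "birth_year": 1815, "income": 52000},
    {"name": "Alan Turing", "birth_year": 1912, "income": 48000},
]
register = [
    {"name": "ada lovelace", "birth_year": 1815, "region": "South"},
    {"name": "Grace Hopper", "birth_year": 1906, "region": "East"},
]

index = {link_key(r): r for r in register}
linked = []
for rec in survey:
    match = index.get(link_key(rec))
    if match is not None:
        linked.append({**rec, **match})  # merge fields from both sources

print(len(linked))  # only one survey record finds a register match
```

Normalizing the key (trimming whitespace, lowercasing) is what lets "Ada Lovelace " match "ada lovelace"; without it, the join would silently fail, which is one of the pitfalls such projects face.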
The course introduces students to the fundamental concepts of web surveys and online panels. It is organized into three main sections that follow the stages of a well-run web survey: pre-fielding, fielding, and post-fielding.
Data are omnipresent in the contemporary world, coming in different shapes and sizes: from survey data to found data. To make use of such data in analysis, they must first be imported and cleaned; this is often one of the most time-consuming and difficult parts of data analysis. In this course you will learn both the conceptual steps needed to prepare data for analysis and the practical skills to do so. The course covers all the essential skills needed to prepare data, be it survey data, administrative data, or found data.
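The import-and-clean workflow can be sketched in a few lines of Python using only the standard library. The dataset and the missing-value codes below are made up for illustration.

```python
import csv
import io

# Import a small (fabricated) CSV dataset, normalize missing-value
# codes, and cast columns to proper types.

raw = """respondent_id,age,income
1,34,52000
2,NA,48000
3,29,
"""

MISSING = {"", "NA", "-99"}  # illustrative missing-value codes

def clean_cell(value):
    """Return an int, or None if the cell is a missing-value code."""
    value = value.strip()
    return None if value in MISSING else int(value)

rows = []
for row in csv.DictReader(io.StringIO(raw)):
    rows.append({
        "respondent_id": int(row["respondent_id"]),
        "age": clean_cell(row["age"]),
        "income": clean_cell(row["income"]),
    })

complete = [r for r in rows if r["age"] is not None and r["income"] is not None]
print(len(rows), len(complete))  # 3 records parsed, 1 fully complete
```

Even this toy example shows why cleaning dominates analysis time: missing-value conventions, type casting, and completeness checks all have to be made explicit before any modeling can start.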
This course introduces students to the basics of Python and SQL for data analysis. Students will explore real, publicly available datasets, using the data analysis tools in Python to create summaries and generate visualizations. Students will learn the basics of database management and organization, as well as how to code in SQL and work with PostgreSQL databases. By the end of the class, students should understand how to read in data from CSV files or from the internet and be comfortable using either SQL or Python to aggregate, summarize, describe, and visualize these datasets.
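A minimal sketch of the Python-plus-SQL workflow: the course works with PostgreSQL, but SQLite (in Python's standard library) keeps the example self-contained. The toy table and values are invented.

```python
import sqlite3

# Load a toy dataset into an in-memory SQLite database, aggregate it
# with SQL, and post-process the result in Python.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (region TEXT, income REAL)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [("North", 52000), ("North", 48000), ("South", 61000)],
)

# GROUP BY does the heavy lifting on the database side.
rows = conn.execute(
    "SELECT region, COUNT(*), AVG(income) "
    "FROM responses GROUP BY region ORDER BY region"
).fetchall()

for region, n, mean_income in rows:
    print(f"{region}: n={n}, mean income={mean_income:.0f}")
```

The same `SELECT ... GROUP BY` statement runs unchanged against a PostgreSQL connection, which is the point of learning the two tools together.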
This course will provide a detailed introduction to multiple imputation, a convenient strategy for dealing with (item) nonresponse in surveys. We will motivate the concept and illustrate why multiple imputation should generally be preferred over single imputation methods. The main focus of the course will be on strategies to generate (multiple) imputations and how to deal with common problems when applying the methods for large scale surveys. We will also discuss various options for assessing the quality of the imputations. All concepts will be demonstrated using software illustrations in R.
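The course demonstrates multiple imputation in R; as a language-neutral sketch of the final pooling step, the snippet below applies Rubin's combining rules to a set of made-up completed-data estimates and within-imputation variances.

```python
import statistics

# Rubin's rules for pooling m = 5 completed-data analyses.
# The estimates and within-imputation variances are fabricated.

estimates = [10.2, 9.8, 10.5, 10.0, 9.9]   # point estimate per imputed dataset
within = [0.40, 0.38, 0.42, 0.39, 0.41]    # estimate's variance per dataset

m = len(estimates)
q_bar = statistics.mean(estimates)         # pooled point estimate
w_bar = statistics.mean(within)            # average within-imputation variance
b = statistics.variance(estimates)         # between-imputation variance
total_var = w_bar + (1 + 1 / m) * b        # Rubin's total variance

print(round(q_bar, 3), round(total_var, 4))
```

The between-imputation component `b` is what single imputation throws away: it is exactly the extra uncertainty due to not knowing the missing values, and ignoring it is why single-imputation standard errors are too small.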
This course focuses on introducing statistical models and estimators beyond linear regression that are useful to social and economic scientists. It provides an overview of generalized linear models (GLMs), which accommodate non-normal response distributions by modeling functions of the mean. GLMs relate the expected mean E(Y) of the dependent variable to the predictor variables via a specific link function, which permits the expected mean to be non-linearly related to the predictors. Examples of GLMs are logistic regression, regressions for ordinal data, and regression models for count data. GLMs are generally estimated by maximum likelihood, so the course not only introduces GLMs but starts with an introduction to the principle of maximum likelihood estimation. A good understanding of the classical linear regression model is a prerequisite for the course.
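The link-function idea can be made concrete with logistic regression: the linear predictor lives on the log-odds scale, and the inverse link maps it back to a probability. The coefficients below are hypothetical, not estimated from any data.

```python
import math

# GLM sketch: for logistic regression, the logit link connects the
# linear predictor eta = b0 + b1*x to the mean E(Y) = P(Y = 1).

beta0, beta1 = -1.5, 0.8            # hypothetical intercept and slope

def inverse_logit(eta):
    """Map the linear predictor back to a probability (the mean)."""
    return 1.0 / (1.0 + math.exp(-eta))

for x in (0.0, 1.0, 2.0):
    eta = beta0 + beta1 * x          # linear on the log-odds scale
    p = inverse_logit(eta)           # E(Y | x), always inside (0, 1)
    print(f"x={x}: log-odds={eta:+.2f}, P(Y=1)={p:.3f}")
```

Note that equal steps in x move the log-odds by a constant 0.8, while the probability moves non-linearly; that is precisely what the link function buys over a linear model for the mean.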
This course provides a brief overview of the basics of probability and statistics. Students will review basic probability concepts and probability distributions, the Central Limit Theorem and hypothesis testing, and linear and logistic regression. Throughout this course, students should develop and reinforce proper statistical intuition. This includes knowing how to identify a sample and a population and applying appropriate statistical methods such as hypothesis testing, as well as being able to identify different types of data and use the proper methods for each type. By the end of the course, students should have a strong foundation in statistics with which they can start their graduate coursework.
This course is a statistical methods class combining hands-on applications and general review of the theory for survey weighting.
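The core mechanic of survey weighting can be sketched in a few lines: each respondent's base weight is the inverse of their selection probability, and estimates are computed as weighted sums. The responses and selection probabilities below are invented for illustration.

```python
# Design-weighted estimation sketch with fabricated data: four
# respondents sampled with unequal selection probabilities.

responses = [52000, 48000, 61000, 39000]
selection_prob = [0.10, 0.10, 0.05, 0.20]   # unequal chances of selection

weights = [1.0 / p for p in selection_prob]  # base (design) weights
weighted_mean = sum(w * y for w, y in zip(weights, responses)) / sum(weights)
unweighted_mean = sum(responses) / len(responses)

print(round(unweighted_mean), round(weighted_mean))
```

The two means differ because the third respondent, hardest to select (probability 0.05), stands in for the most people; ignoring the design would understate that group's contribution.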
Analysis of Complex Sample Data covers the following topics: the development and handling of selection and other compensatory weights for survey data analysis; the effects of stratification and clustering on survey estimation and inference; alternative variance estimation procedures for estimated survey statistics; methods and computer software that take into account the effects of complex sample designs on survey estimation and inference; and methods for handling missing data, including weighting adjustment.
This course investigates the foundations of Natural Language Processing (NLP) as a tool for analyzing natural language texts in the social sciences, thus providing an alternative to traditional ways of data generation through surveys. The course introduces general use cases for NLP, provides a guide to standard operations on text as well as their implementation in the Python-based Natural Language Toolkit (NLTK), and introduces the text mining functionalities of the WEKA Machine Learning workbench.
The theory part of the course, worth one credit point, can be supplemented by an optional project part worth another credit point.
This course investigates the practical application of Natural Language Processing (NLP) for analyzing textual data with the goal to answer questions of the social sciences. More specifically, participants of the previous theoretical class will now have the opportunity to implement their own practical research project. Under the guidance of the instructor, the participants will define their own research question; develop a suitable methodology to address this question; conduct and discuss experiments based on the selected methods; and synthesize the results to answer their own-defined research question.
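Standard operations on text of the kind these courses cover (tokenization, normalization, term counting) can be sketched in plain Python; the courses themselves use NLTK and WEKA, and the sentence and stopword list below are made up.

```python
import re
from collections import Counter

# Minimal text pipeline: lowercase, tokenize, remove stopwords, count.

text = "Surveys measure opinions. Text data can complement surveys."

tokens = re.findall(r"[a-z]+", text.lower())   # crude word tokenization
stopwords = {"can", "the", "a", "of"}           # tiny illustrative list
terms = [t for t in tokens if t not in stopwords]
freq = Counter(terms)

print(freq.most_common(1))  # → [('surveys', 2)]
```

Term frequencies like these are the raw material for the bag-of-words representations that text-mining workbenches such as WEKA build their classifiers on.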
Missing data are a common problem that can lead to biased results if the missingness is not taken into account at the analysis stage. Imputation is often suggested as a strategy for dealing with item nonresponse, allowing the analyst to use standard complete-data methods after imputation. However, several misconceptions about the aims and goals of imputation (isn't imputation just making up data?) make some users skeptical of the approach. In this course we will illustrate why thinking about the missing data is important and clarify which goals a useful imputation method should try to achieve (and which it should not).
Surveys capture the opinions or facts researchers are after only in part – the other part is measurement error, which can seriously bias analyses of interest. To remove such biases it is essential to estimate the extent of measurement error in survey variables, which is precisely the goal of statistical measurement error modeling. In this course, we will discuss how measurement error can be defined, how its presence can be detected using specialized data collection designs and models, and how to perform error-corrected statistical analyses of substantive interest.
The amount of data generated as a by-product in society is growing fast, including data from satellites, sensors, transactions, social media, and smartphones, to name a few sources. Such data are often referred to as "big data" and can be used to create value in different areas such as health and crime prevention, commerce, and fraud detection. Big data are often used for prediction and classification tasks, both of which can be tackled with machine learning techniques. In this course we explore how big data concepts, processes, and methods can be used within the context of survey research. Throughout the course we will illustrate key concepts using specific survey research examples, including tailored survey designs and nonresponse adjustments and evaluation.
Social scientists and survey researchers are confronted with an increasing number of new data sources such as apps and sensors that often result in (para)data structures that are difficult to handle with traditional modeling methods. At the same time, advances in the field of machine learning (ML) have created an array of flexible methods and tools that can be used to tackle a variety of modeling problems. Against this background, this course discusses advanced ML concepts such as cross validation, class imbalance, Boosting and Stacking as well as key approaches for facilitating model tuning and performing feature selection. In this course we also introduce additional machine learning methods including Support Vector Machines, Extra-Trees and LASSO among others. The course aims to illustrate these concepts, methods and approaches from a social science perspective. Furthermore, the course covers techniques for extracting patterns from unstructured data as well as interpreting and presenting results from machine learning algorithms. Code examples will be provided using the statistical programming language R.
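Cross-validation, one of the concepts named above, can be sketched without any ML library: split the data into k folds, hold each fold out in turn, and average the held-out error. The data are invented and the "model" is deliberately trivial (predict the training-fold mean); the course itself works in R.

```python
# k-fold cross-validation sketch on toy data with a mean-predictor model.

data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
k = 3

folds = [data[i::k] for i in range(k)]      # simple interleaved folds

errors = []
for i in range(k):
    test = folds[i]                          # held-out fold
    train = [y for j, f in enumerate(folds) if j != i for y in f]
    prediction = sum(train) / len(train)     # "fit": the training mean
    mse = sum((y - prediction) ** 2 for y in test) / len(test)
    errors.append(mse)

cv_error = sum(errors) / k                   # average held-out error
print(cv_error)
```

The key property is that every error is computed on data the model never saw during fitting, which is what makes cross-validated error an honest basis for model tuning and feature selection.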
We will discuss the theoretical and empirical properties of the two basic variance estimation strategies, namely Taylor series (linear) approximation and replication methods (including BRR, jackknife, and bootstrap) as they apply to several types of complex sample designs. We will study both descriptive statistics, such as means, and analytic statistics, such as linear and logistic regression. We will contrast model-based and design-based inference, the latter used as the standard in this course. Students will learn to use at least one survey software package with real survey data.
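The delete-one jackknife, one of the replication methods named above, can be sketched on toy data for the simplest possible statistic, the sample mean.

```python
# Delete-one jackknife variance estimate for a sample mean (toy data).

data = [3.0, 5.0, 7.0, 9.0]
n = len(data)

replicates = [
    (sum(data) - data[i]) / (n - 1)          # mean with observation i deleted
    for i in range(n)
]
rep_mean = sum(replicates) / n
jk_var = (n - 1) / n * sum((r - rep_mean) ** 2 for r in replicates)

# For the sample mean, the jackknife reproduces the textbook s^2 / n.
print(jk_var)
```

For complex designs the replicates are formed by dropping whole primary sampling units and re-weighting rather than single observations, but the recipe (recompute the statistic on each replicate, measure the spread) is the same.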
The course will acquaint the students with the origins and basic principles of privacy law mainly in Europe. Furthermore, it will contrast the European privacy foundations with the U.S. approach. At the core of this course stands the new European General Data Protection Regulation (GDPR) and its applicability to specific cases and basic principles. Moreover, the course will cover current challenges to the existing privacy paradigms by big data and big data analytics.
This course will provide a gentle introduction to statistical disclosure control with a focus on generating synthetic data for maintaining the confidentiality of the survey respondents. The first part of the course will introduce several traditional approaches for data protection that are widely used at statistical agencies. Some limitations of these approaches will also be discussed. The second part of the course will introduce synthetic data as a possible alternative. This part of the course will discuss different approaches to generating synthetic datasets in detail. Possible modeling strategies and analytical validity evaluations will be assessed, and potential measures to quantify the remaining risk of disclosure will be presented. To provide the participants with hands-on experience, all steps will be illustrated using simulated and real data examples in R.
Data visualization is one of the most powerful tools to explore, understand and communicate patterns in quantitative information. At the same time, good data visualization is a surprisingly difficult task and demands three quite different skills: substantive knowledge, statistical skill, and artistic sense. The course is intended to introduce participants to key principles of analytic design and useful visualization techniques for the exploration and presentation of univariate and multivariate data. This course is highly applied in nature and emphasizes the practical aspects of data visualization in the social sciences. Students will learn how to evaluate data visualizations based on principles of analytic design, how to construct compelling visualizations using the free statistics software R, and how to explore and present their data with visual methods.
In addition to the courses in the tracks, you can choose from more courses as electives according to your interests and available time, for example:
The course will review a variety of modes and methods of data collection used in surveys. It focuses on the impact modes of data collection have on the quality of survey data, including measurement error properties, levels of nonresponse, and coverage error. Methods of data collection will focus on advances in computer assisted methodology and comparisons among various methods (e.g. telephone versus face to face, paper versus web versus computer-assisted interviews, interviewer administered versus self-administered). The statistical and social science literature on interviewer effects will also be examined, including literature related to the training and evaluation of interviewers. With respect to nonresponse, we will review the literature on the reduction of nonresponse and the impact of nonresponse on estimation.
This course introduces the concepts of usability and usability testing and why they are needed for survey research. The course provides a theoretical model for understanding the respondent-survey interaction and then provides practical methods for incorporating iterative user-centered design and testing into the survey development process. The course provides techniques and examples for designing, planning, conducting and analyzing usability studies on web or mobile surveys.
The short course provides a condensed overview of web technologies and techniques to collect data from the web in an automated way. To this end, students will use the statistical software R. The course introduces fundamental parts of web architecture and data transmission on the web. Furthermore, students will learn how to scrape content from static and dynamic web pages and connect to APIs from popular web services. Finally, practical and ethical issues of web data collection are discussed.
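Scraping a static page boils down to parsing HTML and extracting the elements of interest. The course works in R; the sketch below uses only Python's standard library, and the HTML string stands in for a page that would normally be fetched over HTTP.

```python
from html.parser import HTMLParser

# Extract the text of every <li class="hit"> from a static page.

page = """
<html><body>
  <h2>Results</h2>
  <ul>
    <li class="hit">Survey A</li>
    <li class="hit">Survey B</li>
  </ul>
</body></html>
"""

class HitExtractor(HTMLParser):
    """Collect the text of matching list items."""
    def __init__(self):
        super().__init__()
        self.in_hit = False
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and dict(attrs).get("class") == "hit":
            self.in_hit = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_hit = False

    def handle_data(self, data):
        if self.in_hit and data.strip():
            self.hits.append(data.strip())

parser = HitExtractor()
parser.feed(page)
print(parser.hits)  # → ['Survey A', 'Survey B']
```

Dynamic pages and APIs need different tooling (a browser driver, authenticated HTTP requests), but the final step is always the same: turning markup or JSON into tidy, analyzable records.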
This course focuses on design and implementation considerations for different phases of the survey lifecycle when conducting surveys internationally or outside of one’s home country. Overview and considerations related to ten topics are discussed: Total Survey Error framework, project stakeholders, triple constraints, bids and contracts, sampling and sample management, questionnaire and instrument design, translation and adaptation, pretesting and cognitive interviews, interviewers and data collection, and interviewer monitoring.
There is a growing demand to produce reliable estimates of various socio-economic and health characteristics at both national and sub-national levels. However, data availability at the sub-national (small area) level from a survey is often limited by cost, and thus analysts must make the best possible use of all available information. The course will begin with a history of small area estimation and different uses of small area statistics in both the public and private sectors. It will provide an introduction to the main concepts and issues in small area estimation and describe various approaches for estimating different small area parameters. Topics include standard design-based methods, various traditional indirect methods, and state-of-the-art small area estimation methods that use both Bayesian and empirical best prediction approaches.
This course is a statistical methods class combining hands-on applications and general review of the theory behind different approaches to survey sampling.