Total survey error = representation errors + measurement errors.
There are many kinds of errors that can creep into estimates from surveys, and since the 1940s researchers have worked to systematically organize, understand, and reduce these errors. An important result of all that effort is the total survey error framework (Groves et al. 2009; Weisberg 2005). The main insight of this framework is that problems can be grouped into two main buckets: problems related to who you talk to (representation) and problems related to what you learn from those conversations (measurement).

For example, you might be interested in estimating attitudes about online privacy among adults living in France. Making these estimates requires two quite different types of inference. First, from the answers that respondents give, you have to infer their attitudes about online privacy. Second, from the inferred attitudes among respondents, you must infer the attitudes in the population as a whole. The first type of inference is the domain of psychology and cognitive science; the second is the domain of statistics. A perfect sampling scheme with bad survey questions will produce bad estimates, and a bad sampling scheme with perfect survey questions will also produce bad estimates. Good estimates require sound approaches to both measurement and representation.

Given that background, I'll next review how survey researchers have thought about representation and measurement in the past. I expect that much of this material will be review for social scientists, but it may be new to some data scientists. Then, I'll show you how those ideas guide digital-age survey research.
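To make the decomposition concrete, here is a minimal simulation sketch of the two error sources. Everything in it is a made-up illustration (the population, the bias parameters, and the `survey` function are all hypothetical, not from any real study): a representation error is modeled by over-sampling people with high attitude scores, and a measurement error is modeled by a constant shift in every recorded answer. Either one alone is enough to pull the estimate away from the true population mean.

```python
import random

random.seed(0)

# Hypothetical population of 100,000 adults, each with a "true"
# attitude score between 0 and 1. This is invented data for illustration.
population = [random.random() for _ in range(100_000)]
true_mean = sum(population) / len(population)

def survey(pop, sampling_bias=0.0, measurement_bias=0.0, n=1_000):
    """Draw n respondents and record their (possibly distorted) answers.

    sampling_bias    -- how strongly high-attitude people are over-sampled
                        (a representation error)
    measurement_bias -- a constant shift added to every answer, standing in
                        for a badly worded question (a measurement error)
    """
    weights = [1 + sampling_bias * x for x in pop]
    sample = random.choices(pop, weights=weights, k=n)
    answers = [x + measurement_bias for x in sample]
    return sum(answers) / n

good = survey(population)                                  # sound design
bad_sampling = survey(population, sampling_bias=5.0)       # bad representation
bad_questions = survey(population, measurement_bias=0.2)   # bad measurement

print(f"true mean:      {true_mean:.3f}")
print(f"good design:    {good:.3f}")
print(f"bad sampling:   {bad_sampling:.3f}")
print(f"bad questions:  {bad_questions:.3f}")
```

Only the estimate from the sound design lands near the true mean; fixing one bucket of errors while ignoring the other still produces a bad estimate, which is the point of the framework.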