2.4.1 Counting things

Simple counting can be interesting if you combine a good question with good data.

Although it is couched in sophisticated-sounding language, lots of social research is really just counting things. In the age of big data, researchers can count more than ever, but that does not automatically mean that research should be focused on counting more and more stuff. Instead, if we are going to do good research with big data, we need to ask: what things are worth counting? This may seem like an entirely subjective matter, but there are some general patterns.

Often students motivate their counting research by saying: I’m going to count something that no one has ever counted before. For example, a student might say that many people have studied migrants and many people have studied twins, but nobody has studied migrant twins. Motivation by absence does not usually lead to good research. Of course, there might be good reasons to study migrant twins, but the fact that they have not been studied before does not mean that they should be studied now. No one has ever counted the number of threads in the carpet in my office, but that does not automatically imply that this would be a good research project. Motivation by absence is kind of like saying: look, there’s a hole over there, and I’m going to work very hard to fill it up. But not every hole needs to be filled.

Instead of motivating by absence, I think that counting leads to good research in two situations: when the research is interesting or when it is important (or ideally both). For example, measuring the rate of unemployment is important because it is an indicator of the economy that drives policy decisions. Generally, people have a pretty good sense of what is important. So, in the rest of this section, I’m going to provide three examples where counting is interesting. In each case, the researchers were not counting haphazardly; rather, they were counting in very particular settings that revealed important insights into more general ideas about how social systems work. In other words, a lot of what makes these particular counting exercises interesting is not in the data itself; it comes from these more general ideas.

Below I’ll present three examples: 1) the working behavior of taxi drivers in New York (Section 2.4.1.1), 2) friendship formation by students (Section 2.4.1.2), and 3) social media censorship behavior of the Chinese government (Section 2.4.1.3). What these examples share is that they all show that counting with big data can be used to test theoretical predictions. In some cases, big data sources enable you to do this counting relatively directly (as in the case of New York taxis). In other cases, researchers will need to deal with incompleteness by merging data together and operationalizing theoretical constructs (as in the case of friendship formation), and in some cases researchers will need to collect their own observational data (as in the case of social media censorship). As I hope these examples show, for researchers who are able to ask interesting questions, big data holds great promise.