Research ethics has traditionally also included topics such as scientific fraud and the allocation of credit. These are discussed in greater detail in On Being a Scientist by the Institute of Medicine, National Academy of Sciences, and National Academy of Engineering (2009).
This chapter is heavily influenced by the situation in the United States. For more on the ethical review procedures in other countries, see chapters 6-9 of Desposato (2016b). For an argument that the biomedical ethical principles that have influenced this chapter are excessively American, see Holm (1995). For a further historical review of Institutional Review Boards in the United States, see Stark (2012). The journal PS: Political Science and Politics held a professional symposium on the relationship between political scientists and IRBs; see Martinez-Ebers (2016) for a summary.
The Belmont Report and subsequent regulations in the United States tend to make a distinction between research and practice. I have not made such a distinction in this chapter because I think the ethical principles and frameworks apply to both settings. For more on this distinction and the problems it introduces, see Beauchamp and Saghai (2012), M. N. Meyer (2015), boyd (2016), and Metcalf and Crawford (2016).
For more on research oversight at Facebook, see Jackman and Kanerva (2016). For ideas about research oversight at companies and NGOs, see Calo (2013), Polonetsky, Tene, and Jerome (2015), and Tene and Polonetsky (2016).
In relation to the use of mobile phone data to help address the 2014 Ebola outbreak in West Africa (Wesolowski et al. 2014; McDonald 2016), see Mayer, Mutchler, and Mitchell (2016) for more about the privacy risks of mobile phone data. For examples of earlier crisis-related research using mobile phone data, see Bengtsson et al. (2011) and Lu, Bengtsson, and Holme (2012), and for more on the ethics of crisis-related research, see (???).
Many people have written about Emotional Contagion. The journal Research Ethics devoted its January 2016 issue to discussing the experiment; see Hunter and Evans (2016) for an overview. The Proceedings of the National Academy of Sciences published two pieces about the experiment: Kahn, Vayena, and Mastroianni (2014) and Fiske and Hauser (2014). Other pieces about the experiment include: Puschmann and Bozdag (2014), Meyer (2014), Grimmelmann (2015), M. N. Meyer (2015), (???), Kleinsman and Buckley (2015), Shaw (2015), and (???).
In terms of mass surveillance, broad overviews are provided in Mayer-Schönberger (2009) and Marx (2016). For a concrete example of the changing costs of surveillance, Bankston and Soltani (2013) estimate that tracking a criminal suspect using mobile phones is about 50 times cheaper than using physical surveillance. See also Ajunwa, Crawford, and Schultz (2016) for a discussion of surveillance at work. Bell and Gemmell (2009) provide a more optimistic perspective on self-surveillance.
In addition to being able to track observable behavior that is public or partially public (e.g., Tastes, Ties, and Time), researchers can increasingly infer things that many participants consider to be private. For example, Michal Kosinski and colleagues (2013) showed that they could infer sensitive information about people, such as sexual orientation and use of addictive substances, from seemingly ordinary digital trace data (Facebook Likes). This might sound magical, but the approach Kosinski and colleagues used—which combined digital traces, surveys, and supervised learning—is actually something that I’ve already told you about. Recall that in chapter 3 (Asking questions), I told you how Joshua Blumenstock and colleagues (2015) combined survey data with mobile phone data to estimate poverty in Rwanda. This exact same approach, which can be used to efficiently measure poverty in a developing country, can also be used for potentially privacy-violating inferences.
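To make the general shape of this approach concrete, here is a minimal sketch using entirely synthetic data; the feature counts, sample sizes, and model choice are illustrative assumptions of mine, not the procedures used by Kosinski and colleagues (2013) or Blumenstock and colleagues (2015). A small "survey" subsample with a known sensitive attribute is used to train a supervised model, which is then applied to everyone else's digital traces.

```python
# A minimal sketch of the traces + survey + supervised-learning approach.
# All data are synthetic; the feature counts, sample sizes, and model choice
# are illustrative assumptions, not the procedures used in the studies cited above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_people, n_likes = 5000, 200                    # people and binary "Like"-style features
X = rng.binomial(1, 0.05, size=(n_people, n_likes))

# Pretend the sensitive attribute depends on a handful of the Likes.
true_weights = np.zeros(n_likes)
true_weights[:10] = 2.0
p = 1 / (1 + np.exp(-(X @ true_weights - 1)))
y = rng.binomial(1, p)                           # the sensitive attribute (synthetic)

# Step 1: a small "survey" subsample where the sensitive attribute is observed.
X_survey, X_rest, y_survey, y_rest = train_test_split(
    X, y, train_size=500, random_state=0)

# Step 2: train a supervised model on the surveyed people.
model = LogisticRegression(max_iter=1000).fit(X_survey, y_survey)

# Step 3: predict the attribute for everyone else from their traces alone.
predicted = model.predict_proba(X_rest)[:, 1]
print("AUC on people who were never surveyed:", round(roc_auc_score(y_rest, predicted), 2))
```

The same three steps (digital traces, a surveyed subsample, and a supervised model) support both the benign application of estimating poverty and the privacy-violating application of inferring sensitive attributes.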
For more on the possible unintended secondary uses of health data, see O’Doherty et al. (2016). In addition to the potential for unintended secondary uses, the creation of even an incomplete master database could have a chilling effect on social and political life if people became unwilling to read certain materials or discuss certain topics; see Schauer (1978) and Penney (2016).
In situations with overlapping rules, researchers sometimes engage in “regulatory shopping” (Grimmelmann 2015; Nickerson and Hyde 2016). In particular, some researchers who wish to avoid IRB oversight can form partnerships with researchers who are not covered by IRBs (e.g., people at companies or NGOs), and have those colleagues collect and de-identify data. Then, the IRB-covered researcher can analyze this de-identified data without IRB oversight because the research is no longer considered “human subjects research,” at least according to some interpretations of current rules. This kind of IRB evasion is probably not consistent with a principles-based approach to research ethics.
In 2011, an effort began to update the Common Rule, and this process was finally completed in 2017 (???). For more on these efforts to update the Common Rule, see Evans (2013), National Research Council (2014), Hudson and Collins (2015), and Metcalf (2016).
The classic principles-based approach to biomedical ethics is that of Beauchamp and Childress (2012). They propose that four main principles should guide biomedical ethics: Respect for Autonomy, Nonmaleficence, Beneficence, and Justice. The principle of nonmaleficence urges one to abstain from causing harm to other people. This concept is deeply connected to the Hippocratic idea of “Do no harm.” In research ethics, this principle is often combined with the principle of Beneficence, but see chapter 5 of Beauchamp and Childress (2012) for more on the distinction between the two. For a criticism that these principles are overly American, see Holm (1995). For more on balancing when the principles conflict, see Gillon (2015).
The four principles in this chapter have also been proposed to guide ethical oversight for research being done at companies and NGOs (Polonetsky, Tene, and Jerome 2015) through bodies called “Consumer Subject Review Boards” (CSRBs) (Calo 2013).
In addition to respecting autonomy, the Belmont Report also acknowledges that not every human is capable of true self-determination. For example, children, people suffering from illness, or people living in situations of severely restricted liberty may not be able to act as fully autonomous individuals, and these people are therefore subject to extra protection.
Applying the principle of Respect for Persons in the digital age can be challenging. For example, in digital-age research, it can be difficult to provide extra protections for people with diminished capability of self-determination because researchers often know very little about their participants. Further, informed consent in digital-age social research is a huge challenge. In some cases, truly informed consent can suffer from the transparency paradox (Nissenbaum 2011), where information and comprehension are in conflict. Roughly, if researchers provide full information about the nature of the data collection, data analysis, and data security practices, it will be difficult for many participants to comprehend. But if researchers provide comprehensible information, it may lack important technical details. In medical research in the analog age—the dominant setting considered by the Belmont Report—one could imagine a doctor talking individually with each participant to help resolve the transparency paradox. In online studies involving thousands or millions of people, such a face-to-face approach is impossible. A second problem with consent in the digital age is that in some studies, such as analyses of massive data repositories, it would be impractical to obtain informed consent from all participants. I discuss these and other questions about informed consent in more detail in section 6.6.1. Despite these difficulties, however, we should remember that informed consent is neither necessary nor sufficient for Respect for Persons.
For more on medical research before informed consent, see Miller (2014). For a book-length treatment of informed consent, see Manson and O’Neill (2007). See also the suggested readings about informed consent below.
Harms to context are the harms that research can cause not to specific people but to social settings. This concept is a bit abstract, but I’ll illustrate with a classic example: the Wichita Jury Study (Vaughan 1967; Katz, Capron, and Glass 1972, chap. 2)—also sometimes called the Chicago Jury Project (Cornwell 2010). In this study, researchers from the University of Chicago, as part of a larger study of social aspects of the legal system, secretly recorded six jury deliberations in Wichita, Kansas. The judges and lawyers in the cases had approved the recordings, and there was strict oversight of the process. However, the jurors were unaware that recordings were occurring. Once the study was discovered, there was public outrage. The Justice Department began an investigation of the study, and the researchers were called to testify in front of Congress. Ultimately, Congress passed a new law that makes it illegal to secretly record jury deliberation.
The concern of critics of the Wichita Jury Study was not the risk of harm to the participants; rather, it was the risk of harms to the context of jury deliberation. That is, people thought that if jury members did not believe that they were having discussions in a safe and protected space, it would be harder for jury deliberations to proceed in the future. In addition to jury deliberation, there are other specific social contexts that society provides with extra protection, such as attorney-client relationships and psychological care (MacCarthy 2015).
The risk of harms to context and the disruption of social systems also arise in some field experiments in political science (Desposato 2016b). For an example of a more context-sensitive cost-benefit calculation for a field experiment in political science, see Zimmerman (2016).
Compensation for participants has been discussed in a number of settings related to digital-age research. Lanier (2014) proposes paying participants for digital traces that they generate. Bederson and Quinn (2011) discuss payments in online labor markets. Finally, Desposato (2016a) proposes paying participants in field experiments. He points out that even if participants cannot be paid directly, a donation could be made to a group working on their behalf. For example, in Encore, the researchers could have made a donation to a group working to support access to the Internet.
Terms-of-service agreements should be given less weight than contracts negotiated between equal parties or laws created by legitimate governments. Situations where researchers have violated terms-of-service agreements in the past have generally involved using automated queries to audit the behavior of companies (much like field experiments to measure discrimination). For additional discussions, see Vaccaro et al. (2015), Bruckman (2016a), and Bruckman (2016b). For an example of empirical research that discusses terms of service, see Soeller et al. (2016). For more on the possible legal problems researchers face if they violate terms of service, see Sandvig and Karahalios (2016).
Obviously, an enormous amount has been written about consequentialism and deontology. For an example of how these ethical frameworks, and others, can be used to reason about digital-age research, see Zevenbergen et al. (2015). For an example of how they can be applied to field experiments in development economics, see Baele (2013).
For more on audit studies of discrimination, see Pager (2007) and Riach and Rich (2004). Not only do these studies not have informed consent, they also involve deception without debriefing.
Both Desposato (2016a) and Humphreys (2015) offer advice about field experiments without consent.
Sommers and Miller (2013) review many arguments in favor of not debriefing participants after deception, and argue that researchers should forgo debriefing
“under a very narrow set of circumstances, namely, in field research in which debriefing poses considerable practical barriers but researchers would have no qualms about debriefing if they could. Researchers should not be permitted to forgo debriefing in order to preserve a naive participant pool, shield themselves from participant anger, or protect participants from harm.”
Others argue that debriefing should be avoided in situations where it causes more harm than good (Finn and Jakobsson 2007). Debriefing is a case where some researchers prioritize Respect for Persons over Beneficence, whereas others do the opposite. One possible solution would be to find ways to make debriefing a learning experience for the participants. That is, rather than thinking of debriefing as something that can cause harm, perhaps debriefing can also be something that benefits participants. For an example of this kind of educational debriefing, see Jagatic et al. (2007). Psychologists have developed techniques for debriefing (D. S. Holmes 1976a, 1976b; Mills 1976; Baumrind 1985; Oczak and Niedźwieńska 2007), and some of these may be usefully applied to digital-age research. Humphreys (2015) offers interesting thoughts about deferred consent, which is closely related to the debriefing strategy that I described.
The idea of asking a sample of participants for their consent is related to what Humphreys (2015) calls inferred consent.
A further idea related to informed consent that has been proposed is to build a panel of people who agree to be in online experiments (Crawford 2014). Some have argued that this panel would be a nonrandom sample of people. But chapter 3 (Asking questions) shows that these problems are potentially addressable using post-stratification. Also, consent to be on the panel could cover a variety of experiments. In other words, participants might not need to consent to each experiment individually, a concept called broad consent (Sheehan 2011). For more on the differences between one-time consent and consent for each study, as well as a possible hybrid, see Hutton and Henderson (2015).
Far from unique, the Netflix Prize illustrates an important technical property of datasets that contain detailed information about people, and thus offers important lessons about the possibility of “anonymization” of modern social datasets. Files with many pieces of information about each person are likely to be sparse, in the sense defined formally in Narayanan and Shmatikov (2008). That is, for each record, there are no records that are the same, and in fact there are no records that are very similar: each person is far away from their nearest neighbor in the dataset. One can imagine that the Netflix data might be sparse because with about 20,000 movies on a five-star scale, there are about \(6^{20,000}\) possible values that each person could have (6 because, in addition to 1 to 5 stars, someone might have not rated the movie at all). This number is so large, it is hard to even comprehend.
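To get a sense of how large that number is, here is a quick back-of-the-envelope computation; the 20,000-movie count is the approximate figure given above, and the rest is exact integer arithmetic.

```python
# How many possible rating profiles are there with ~20,000 movies and
# 6 possible values per movie (1-5 stars, or not rated)?
n_movies = 20_000
n_values_per_movie = 6

n_possible_profiles = n_values_per_movie ** n_movies   # Python handles big integers exactly
print(len(str(n_possible_profiles)))                    # about 15,564 digits
```

The result has roughly 15,564 digits; for comparison, the estimated number of atoms in the observable universe has only about 80 digits.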
Sparsity has two main implications. First, it means that attempting to “anonymize” the dataset based on random perturbation will likely fail. That is, even if Netflix were to randomly adjust some of the ratings (which they did), this would not be sufficient because the perturbed record is still the closest possible record to the information that the attacker has. Second, the sparsity means that re-identification is possible even if the attacker has imperfect or partial knowledge. For example, in the Netflix data, let’s imagine the attacker knows your ratings for two movies and the dates you made those ratings \(\pm\) 3 days; just that information alone is sufficient to uniquely identify 68% of people in the Netflix data. If the attacker knows eight movies that you have rated \(\pm\) 14 days, then even if two of these known ratings are completely wrong, 99% of records can be uniquely identified in the dataset. In other words, sparsity is a fundamental problem for efforts to “anonymize” data, which is unfortunate because most modern social datasets are sparse. For more on “anonymization” of sparse data, see Narayanan and Shmatikov (2008).
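The logic of this kind of attack can be illustrated with a toy simulation. The dataset, the attacker’s background knowledge, and the matching tolerance below are all synthetic assumptions and are far simpler than the actual Netflix data and the analysis in Narayanan and Shmatikov (2008); the point is only that a few approximately known (rating, date) pairs can single out one record when records are far apart from one another.

```python
# A toy illustration of re-identification in a sparse ratings dataset.
# All data and parameters are synthetic assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n_people, n_movies = 10_000, 500

# Each person rates only a small random subset of movies (sparse data).
ratings = np.zeros((n_people, n_movies), dtype=int)   # 0 = not rated, 1-5 = stars
dates = np.zeros((n_people, n_movies), dtype=int)     # day of year the rating was made
for i in range(n_people):
    seen = rng.choice(n_movies, size=30, replace=False)
    ratings[i, seen] = rng.integers(1, 6, size=30)
    dates[i, seen] = rng.integers(0, 365, size=30)

# The attacker knows the target's ratings and approximate dates for a few movies.
target = 1234
known_movies = np.flatnonzero(ratings[target])[:4]
known_ratings = ratings[target, known_movies]
known_dates = dates[target, known_movies]

# Find every record consistent with that auxiliary information (dates +/- 3 days).
consistent = np.ones(n_people, dtype=bool)
for m, r, d in zip(known_movies, known_ratings, known_dates):
    consistent &= (ratings[:, m] == r) & (np.abs(dates[:, m] - d) <= 3)

print("records matching the auxiliary information:", np.flatnonzero(consistent))
```

In runs like this, the auxiliary information typically matches exactly one record: the target’s.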
Telephone meta-data also might appear to be “anonymous” and not sensitive, but that is not the case. Telephone meta-data are identifiable and sensitive (Mayer, Mutchler, and Mitchell 2016; Landau 2016).
In figure 6.6, I sketched out a trade-off between risk to participants and benefits to society from data release. For a comparison between restricted access approaches (e.g., a walled garden) and restricted data approaches (e.g., some form of “anonymization”), see Reiter and Kinney (2011). For a proposed categorization system of risk levels of data, see Sweeney, Crosas, and Bar-Sinai (2015). For a more general discussion of data sharing, see Yakowitz (2011).
For more detailed analysis of this trade-off between the risk and utility of data, see Brickell and Shmatikov (2008), Ohm (2010), Reiter (2012), Wu (2013), and Goroff (2015). To see this trade-off applied to real data from massively open online courses (MOOCs), see Daries et al. (2014) and Angiuli, Blitzstein, and Waldo (2015).
Differential privacy also offers an alternative approach that can combine both low risk to participants and high benefit to society; see Dwork and Roth (2014) and Narayanan, Huey, and Felten (2016).
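As a rough intuition for how differential privacy can provide both low risk and high benefit, here is a minimal sketch of the Laplace mechanism applied to a counting query; the dataset, the query, and the epsilon values are illustrative assumptions, and Dwork and Roth (2014) give the formal treatment.

```python
# A minimal sketch of the Laplace mechanism for a counting query.
# The data, the query, and the epsilon values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
has_condition = rng.binomial(1, 0.1, size=1000)   # synthetic sensitive attribute

def dp_count(values, epsilon):
    """Release a noisy count satisfying epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1
    (sensitivity = 1), so Laplace noise with scale 1/epsilon suffices.
    """
    return int(values.sum()) + rng.laplace(loc=0.0, scale=1.0 / epsilon)

print("true count:", int(has_condition.sum()))
print("released, epsilon = 1.0:", round(dp_count(has_condition, 1.0), 1))
print("released, epsilon = 0.1:", round(dp_count(has_condition, 0.1), 1))
```

Smaller values of epsilon add more noise, giving stronger privacy protection but less accurate released statistics.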
For more on the concept of personally identifying information (PII), which is central to many of the rules about research ethics, see Narayanan and Shmatikov (2010) and Schwartz and Solove (2011). For more on all data being potentially sensitive, see Ohm (2015).
In this section, I’ve portrayed the linkage of different datasets as something that can lead to informational risk. However, it can also create new opportunities for research, as argued in Currie (2013).
For more on the five safes, see Desai, Ritchie, and Welpton (2016). For an example of how outputs can be identifying, see Brownstein, Cassa, and Mandl (2006), which shows how maps of disease prevalence can be identifying. Dwork et al. (2017) also consider attacks against aggregate data, such as statistics about how many individuals have a certain disease.
Questions about data use and data release also raise questions about data ownership. For more on data ownership, see Evans (2011) and Pentland (2012).
Warren and Brandeis (1890) is a landmark legal article about privacy and is most associated with the idea that privacy is a right to be left alone. Book-length treatments of privacy that I would recommend include Solove (2010) and Nissenbaum (2010).
For a review of empirical research on how people think about privacy, see Acquisti, Brandimarte, and Loewenstein (2015). Phelan, Lampe, and Resnick (2016) propose a dual-system theory—that people sometimes focus on intuitive concerns and sometimes focus on considered concerns—to explain how people can make apparently contradictory statements about privacy. For more on the idea of privacy in online settings such as Twitter, see Neuhaus and Webmoor (2012).
The journal Science published a special section titled “The End of Privacy,” which addresses the issues of privacy and informational risk from a variety of different perspectives; for a summary, see Enserink and Chin (2015). Calo (2011) offers a framework for thinking about the harms that come from privacy violations. An early example of concerns about privacy at the very beginnings of the digital age is Packard (1964).
One challenge when trying to apply the minimal risk standard is that it is not clear whose daily life is to be used for benchmarking (National Research Council 2014). For example, homeless people have higher levels of discomfort in their daily lives. But that does not imply that it is ethically permissible to expose homeless people to higher-risk research. For this reason, there seems to be a growing consensus that minimal risk should be benchmarked against a general-population standard, not a specific-population standard. While I generally agree with the idea of a general-population standard, I think that for large online platforms such as Facebook, a specific-population standard is reasonable. Thus, when considering Emotional Contagion, I think that it is reasonable to benchmark against everyday risk on Facebook. A specific-population standard in this case is much easier to evaluate and is unlikely to conflict with the principle of Justice, which seeks to prevent the burdens of research from falling unfairly on disadvantaged groups (e.g., prisoners and orphans).
Other scholars have also called for more papers to include ethical appendices (Schultze and Mason 2012; Kosinski et al. 2015; Partridge and Allman 2016). King and Sands (2015) also offer practical tips. Zook and colleagues (2017) offer “ten simple rules for responsible big data research.”