This section is designed to be used as a reference, rather than to be read as a narrative.
Research ethics has traditionally also included topics such as scientific fraud and allocation of credit. These topics are discussed in greater detail in Engineering (2009).
This chapter is strongly shaped by the situation in the United States. For more on the ethical review procedures in other countries, see Chapters 6, 7, 8, and 9 of Desposato (2016b). For an argument that the biomedical ethical principles that have influenced this chapter are excessively American, see Holm (1995). For a historical review of Institutional Review Boards in the US, see Stark (2012).
The Belmont Report and subsequent regulations in the US have made a distinction between research and practice. This distinction has been criticized subsequently (Beauchamp and Saghai 2012; boyd 2016; Metcalf and Crawford 2016; Meyer 2015). I do not make this distinction in this chapter because I think the ethical principles and frameworks apply to both settings. For more on research oversight at Facebook, see Jackman and Kanerva (2016). For a proposal for research oversight at companies and NGOs, see Polonetsky, Tene, and Jerome (2015) and Tene and Polonetsky (2016).
For more on the case of the Ebola outbreak in 2014, see McDonald (2016), and for more about the privacy risks of mobile phone data, see Mayer, Mutchler, and Mitchell (2016). For an example of crisis-related research using mobile phone data, see Bengtsson et al. (2011) and Lu, Bengtsson, and Holme (2012).
Many people have written about Emotional Contagion. The journal Research Ethics devoted its entire January 2016 issue to discussing the experiment; see Hunter and Evans (2016) for an overview. The Proceedings of the National Academy of Sciences published two pieces about the experiment: Kahn, Vayena, and Mastroianni (2014) and Fiske and Hauser (2014). Other pieces about the experiment include Puschmann and Bozdag (2014); Meyer (2014); Grimmelmann (2015); Meyer (2015); Selinger and Hartzog (2015); Kleinsman and Buckley (2015); Shaw (2015); and Flick (2015).
For more on Encore, see Jones and Feamster (2015).
In terms of mass surveillance, broad overviews are provided in Mayer-Schönberger (2009) and Marx (2016). For a concrete example of the changing costs of surveillance, Bankston and Soltani (2013) estimates that tracking a criminal suspect using cell phones is about 50 times cheaper than using physical surveillance. Bell and Gemmell (2009) provides a more optimistic perspective on self-surveillance. In addition to being able to track observable behavior that is public or partially public (e.g., Taste, Ties, and Time), researchers can increasingly infer things that many participants consider to be private. For example, Michal Kosinski and colleagues showed that they could infer sensitive information about people, such as sexual orientation and use of addictive substances, from seemingly ordinary digital trace data (Facebook Likes) (Kosinski, Stillwell, and Graepel 2013). This might sound magical, but the approach Kosinski and colleagues used—which combines digital traces, surveys, and supervised learning—is actually something that I’ve already told you about. Recall that in Chapter 3 (Asking questions) I told you how Josh Blumenstock and colleagues (2015) combined survey data with mobile phone data to estimate poverty in Rwanda. This exact same approach, which can be used to efficiently measure poverty in a developing country, can also be used for potentially privacy-violating inferences.
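To make the mechanics of this kind of inference concrete, here is a minimal sketch of combining digital-trace features with survey labels via supervised learning. The data, feature sizes, and model choice are all hypothetical; this is an illustration of the general approach, not a reproduction of the pipeline used by Kosinski and colleagues or by Blumenstock and colleagues.

```python
# Minimal, hypothetical sketch: predict a survey-reported attribute from
# binary digital-trace features (e.g., Likes) with supervised learning.
# All data here are synthetic; nothing is drawn from the studies cited above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_people, n_features = 5_000, 200                    # hypothetical sizes
X = rng.integers(0, 2, size=(n_people, n_features))  # 0/1 trace features
weights = rng.normal(size=n_features) * (rng.random(n_features) < 0.1)
y = (X @ weights + rng.normal(size=n_people) > 0).astype(int)  # "survey" label

# Fit on the small labeled subsample, then predict for everyone else.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.8, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC on unlabeled people:",
      roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```

The key design point is that a relatively small labeled subsample is enough to fit a model that can then be applied to everyone who leaves digital traces, which is exactly what makes the approach both useful and potentially privacy-violating.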
Inconsistent laws and norms can lead to research that does not respect the wishes of participants, and it can lead to “regulatory shopping” by researchers (Grimmelmann 2015; Nickerson and Hyde 2016). In particular, some researchers who wish to avoid IRB oversight have partners who are not covered by IRBs (e.g., people at companies or NGOs) collect and de-identify data. Then, the researchers can analyze this de-identified data without IRB oversight, at least according to some interpretations of current rules. This kind of IRB evasion appears to be inconsistent with a principles-based approach.
For more on the inconsistent and heterogeneous ideas that people have about health data, see Fiore-Gartland and Neff (2015). For more on the problem that heterogeneity creates for research ethics decisions, see Meyer (2013).
One difference between analog age and digital age research is that in digital age research interaction with participants is more distant. These interactions often occur through an intermediary such as a company, and there is typically a large physical—and social—distance between researchers and participants. This distant interaction makes some things that are easy in analog age research difficult in digital age research, such as screening out participants who require extra protection, detecting adverse events, and remediating harm if it occurs. For example, let’s contrast Emotional Contagion with a hypothetical lab experiment on the same topic. In the lab experiment, researchers could screen out anyone who arrives at the lab showing obvious signs of emotional distress. Further, if the lab experiment created an adverse event, researchers would see it, provide services to remediate the harm, and then make adjustments to the experimental protocol to prevent future harms. The distant nature of interaction in the actual Emotional Contagion experiment makes each of these simple and sensible steps extremely difficult. Also, I suspect that the distance between researchers and participants makes researchers less sensitive to the concerns of their participants.
Inconsistent norms and laws have other sources as well. Some of this inconsistency comes from the fact that this research is happening all over the world. For example, Encore involved people from all over the world, and therefore it might be subject to the data protection and privacy laws of many different countries. What if the norms governing third-party web requests (what Encore was doing) are different in Germany, the United States, Kenya, and China? What if the norms are not even consistent within a single country? A second source of inconsistency comes from collaborations between researchers at universities and companies; for example, Emotional Contagion was a collaboration between a data scientist at Facebook and a professor and graduate student at Cornell. At Facebook, running large experiments is routine and, at that time, did not require any third-party ethical review. At Cornell, the norms and rules are quite different; virtually all experiments must be reviewed by the Cornell IRB. So, which set of rules should govern Emotional Contagion—Facebook’s or Cornell’s?
For more on efforts to revise the Common Rule, see Evans (2013), Council (2014), Metcalf (2016), and Hudson and Collins (2015).
The classic principles-based approach to biomedical ethics is Beauchamp and Childress (2012). They propose that four main principles should guide biomedical ethics: Respect for Autonomy, Nonmaleficence, Beneficence, and Justice. The principle of nonmaleficence urges one to abstain from causing harm to other people. This concept is deeply connected to the Hippocratic idea of “Do no harm.” In research ethics, this principle is often combined with the principle of Beneficence, but see Beauchamp and Childress (2012, chap. 5) for more on the distinction between the two. For a criticism that these principles are overly American, see Holm (1995). For more on balancing when the principles conflict, see Gillon (2015).
The four principles in this chapter have also been proposed to guide ethical oversight for research happening at companies and NGOs (Polonetsky, Tene, and Jerome 2015) through bodies called “Consumer Subject Review Boards” (CSRBs) (Calo 2013).
In addition to respecting autonomy, the Belmont Report also acknowledges that not every human is capable of true self-determination. For example, children, people suffering from illness, or people living in situations of severely restricted liberty may not be able to act as fully autonomous individuals, and these people are, therefore, subject to extra protection.
Applying the principle of Respect for Persons in the digital age can be challenging. For example, in digital age research, it can be difficult to provide extra protections for people with diminished capacity for self-determination because researchers often know very little about their participants. Further, informed consent in digital age social research is a huge challenge. In some cases, truly informed consent can suffer from the transparency paradox (Nissenbaum 2011), where information and comprehension are in conflict. Roughly, if researchers provide full information about the nature of the data collection, data analysis, and data security practices, it will be difficult for many participants to comprehend. But, if researchers provide comprehensible information, it may lack important technical information. In medical research in the analog age—the dominant setting considered by the Belmont Report—one could imagine a doctor talking individually with each participant to help resolve the transparency paradox. In online studies involving thousands or millions of people, such a face-to-face approach is impossible. A second problem with consent in the digital age is that in some studies, such as analysis of massive data repositories, it would be impractical to obtain informed consent from all participants. I discuss these and other questions about informed consent in more detail in Section 6.6.1. Despite these difficulties, however, we should remember that informed consent is neither necessary nor sufficient for Respect for Persons.
For more on medical research before informed consent, see Miller (2014). For a book-length treatment of informed consent, see Manson and O’Neill (2007). See also the suggested readings about informed consent below.
Harms to context are the harms that research can cause not to specific people but to social settings. This concept is a bit abstract, but I’ll illustrate it with two examples: one analog and one digital.
A classic example of harms to context comes from the Wichita Jury Study (Vaughan 1967; Katz, Capron, and Glass 1972, chap. 2)—also sometimes called the Chicago Jury Project (Cornwell 2010). In this study, researchers from the University of Chicago, as part of a larger study about social aspects of the legal system, secretly recorded six jury deliberations in Wichita, Kansas. The judges and lawyers in the cases had approved the recordings, and there was strict oversight of the process. However, the jurors were unaware that recordings were occurring. Once the study was discovered, there was public outrage. The Justice Department began an investigation of the study, and the researchers were called to testify in front of Congress. Ultimately, Congress passed a new law making it illegal to secretly record jury deliberations.
The concern of critics of the Wichita Jury Study was not harm to participants; rather, it was harm to the context of jury deliberation. That is, people believed that if jury members did not believe that they were having discussions in a safe and protected space, it would be harder for jury deliberations to proceed in the future. In addition to jury deliberation, there are other specific social contexts that society provides with extra protection, such as attorney-client relationships and psychological care (MacCarthy 2015).
The risk of harms to context and the disruption of social systems also comes up in some field experiments in Political Science (Desposato 2016b). For an example of a more context-sensitive cost-benefit calculation for a field experiment in Political Science, see Zimmerman (2016).
Compensation for participants has been discussed in a number of settings related to digital age research. Lanier (2014) proposed paying participants for digital traces they generate. Bederson and Quinn (2011) discusses payments in online labor markets. Finally, Desposato (2016a) proposes paying participants in field experiments. He points out that even if participants cannot be paid directly, a donation could be made to a group working on their behalf. For example, in Encore the researchers could have made a donation to a group working to support access to the Internet.
Terms-of-service agreements should have less weight than contracts negotiated between equal parties and laws created by legitimate governments. Situations where researchers have violated terms-of-service agreements in the past generally involve using automated queries to audit the behavior of companies (much like field experiments to measure discrimination). For additional discussion, see Vaccaro et al. (2015), Bruckman (2016a), and Bruckman (2016b). For an example of empirical research that discusses terms of service, see Soeller et al. (2016). For more on the possible legal problems researchers face if they violate terms of service, see Sandvig and Karahalios (2016).
Obviously, enormous amounts have been written about consequentialism and deontology. For an example of how these ethical frameworks, and others, can be used to reason about digital age research, see Zevenbergen et al. (2015). For an example of how these ethical frameworks can be applied to field experiments in development economics, see Baele (2013).
For more on audit studies of discrimination, see Pager (2007) and Riach and Rich (2004). Not only do these studies not have informed consent, they also involve deception without debriefing.
Both Desposato (2016a) and Humphreys (2015) offer advice about field experiments without consent.
Sommers and Miller (2013) reviews many arguments in favor of not debriefing participants after deception, and argues that researchers should forgo “debriefing under a very narrow set of circumstances, namely, in field research in which debriefing poses considerable practical barriers but researchers would have no qualms about debriefing if they could. Researchers should not be permitted to forgo debriefing in order to preserve a naive participant pool, shield themselves from participant anger, or protect participants from harm.” Others argue that if debriefing causes more harm than good, it should be avoided. Debriefing is a case where some researchers prioritize Respect for Persons over Beneficence, and some researchers do the opposite. One possible solution would be to find ways to make debriefing a learning experience for the participants. That is, rather than thinking of debriefing as something that can cause harm, perhaps debriefing can also be something that benefits participants. For an example of this kind of educational debriefing, see Jagatic et al. (2007) on debriefing students after a social phishing experiment. Psychologists have developed techniques for debriefing (D. S. Holmes 1976a; D. S. Holmes 1976b; Mills 1976; Baumrind 1985; Oczak and Niedźwieńska 2007), and some of these may be usefully applied to digital age research. Humphreys (2015) offers interesting thoughts about deferred consent, which is closely related to the debriefing strategy that I described.
The idea of asking a sample of participants for their consent is related to what Humphreys (2015) calls inferred consent.
A further idea that has been proposed related to informed consent is to build a panel of people who agree to be in online experiments (Crawford 2014). Some have argued that this panel would be a non-random sample of people. But, Chapter 3 (Asking questions) shows that these problems are potentially addressable using post-stratification and sample matching. Also, consent to be on the panel could cover a variety of experiments. In other words, participants might not need to consent to each experiment individually, a concept called broad consent (Sheehan 2011).
Far from being unique, the Netflix Prize illustrates an important technical property of datasets that contain detailed information about people, and thus offers important lessons about the possibility of “anonymization” of modern social datasets. Files with many pieces of information about each person are likely to be sparse, in the sense defined formally in Narayanan and Shmatikov (2008). That is, for each record, there are no records that are the same, and in fact there are no records that are very similar: each person is far away from their nearest neighbor in the dataset. One can imagine that the Netflix data might be sparse because, with about 20,000 movies rated on a 5-star scale, there are about \(6^{20,000}\) possible values that each person could have (6 because, in addition to 1 to 5 stars, someone might not have rated the movie at all). This number is so large that it is hard to even comprehend.
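To give a rough sense of just how large that number is, here is a quick back-of-the-envelope calculation (the 20,000 movies and the 6 possible values per movie are taken from the paragraph above; the comparison figure is only approximate).

```python
import math

n_movies = 20_000
values_per_movie = 6  # 1 to 5 stars, plus "not rated"

# Number of decimal digits in 6^20,000
digits = math.floor(n_movies * math.log10(values_per_movie)) + 1
print(digits)  # about 15,564 digits; for comparison, the number of atoms in
               # the observable universe is usually written with roughly 80 digits
```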
Sparsity has two main implications. First, it means that attempting to “anonymize” the dataset based on random perturbation will likely fail. That is, even if Netflix were to randomly adjust some of the ratings (which they did), this would not be sufficient because the perturbed record is still the closest possible record to the information that the attacker has. Second, sparsity means that de-anonymization is possible even if the attacker has imperfect or partial knowledge. For example, in the Netflix data, let’s imagine the attacker knows your ratings for two movies and the dates you made those ratings +/- 3 days; that information alone is sufficient to uniquely identify 68% of people in the Netflix data. If the attacker knows 8 movies that you have rated +/- 14 days, then, even if two of these known ratings are completely wrong, 99% of records can be uniquely identified in the dataset. In other words, sparsity is a fundamental problem for efforts to “anonymize” data, which is unfortunate because most modern social datasets are sparse.
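To make the sparsity argument more concrete, here is a highly simplified sketch of the kind of linkage attack analyzed by Narayanan and Shmatikov (2008): score every record by how well it matches the attacker’s noisy, partial auxiliary information, and declare a match only when the best-scoring record clearly stands out from the rest. The data structures, tolerances, and threshold below are hypothetical, and the real algorithm is considerably more sophisticated.

```python
# Simplified, hypothetical sketch of a similarity-based linkage attack on
# sparse data, loosely following the logic in Narayanan and Shmatikov (2008).
# Records and auxiliary information are dicts of {movie_id: (rating, day)}.

def similarity(record, aux, day_tolerance=14):
    """Count how many of the attacker's (rating, date) pairs match a record."""
    score = 0
    for movie, (aux_rating, aux_day) in aux.items():
        if movie in record:
            rating, day = record[movie]
            if rating == aux_rating and abs(day - aux_day) <= day_tolerance:
                score += 1
    return score

def deanonymize(dataset, aux, eccentricity=1):
    """Return the record id whose score clearly exceeds all others, else None."""
    scores = {rid: similarity(rec, aux) for rid, rec in dataset.items()}
    best, second = sorted(scores.values(), reverse=True)[:2]
    if best - second >= eccentricity:
        return max(scores, key=scores.get)
    return None

# Tiny synthetic example: two "anonymized" records and an attacker who knows
# a few of one person's approximate ratings and dates from another source.
dataset = {
    "record_1": {101: (5, 10), 205: (3, 42), 309: (4, 77)},
    "record_2": {101: (2, 11), 118: (5, 30), 512: (1, 90)},
}
auxiliary_info = {101: (5, 12), 309: (4, 70)}
print(deanonymize(dataset, auxiliary_info))  # record_1
```

The reason this works on sparse data is exactly the point made above: because each record is far from its nearest neighbor, even a handful of approximate attribute values is usually enough to make one record stand out.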
Telephone metadata also might appear to be “anonymous” and not sensitive, but that is not the case. Telephone metadata is identifiable and sensitive (Mayer, Mutchler, and Mitchell 2016; Landau 2016).
In Figure 6.6, I sketched out a trade-off between risk to participants and benefits to research from data release. For a comparison between restricted access approaches (e.g., a walled garden) and restricted data approaches (e.g., some form of anonymization), see Reiter and Kinney (2011). For a proposed categorization system of risk levels of data, see Sweeney, Crosas, and Bar-Sinai (2015). Finally, for a more general discussion of data sharing, see Yakowitz (2011).
For more detailed analysis of this trade-off between the risk and utility of data, see Brickell and Shmatikov (2008), Ohm (2010), Wu (2013), Reiter (2012), and Goroff (2015). To see this trade-off applied to real data from massively open online courses (MOOCs), see Daries et al. (2014) and Angiuli, Blitzstein, and Waldo (2015).
Differential privacy also offers an alternative approach that can combine both high benefit to society and low risk to participants; see Dwork and Roth (2014) and Narayanan, Huey, and Felten (2016).
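As a small illustration of the idea behind differential privacy, here is a minimal sketch of the Laplace mechanism for a single count query, one of the basic building blocks described in Dwork and Roth (2014). The epsilon value and the data are hypothetical, and a real deployment would need to track the privacy budget across all released queries.

```python
# Minimal sketch of the Laplace mechanism for a count query. Adding Laplace
# noise with scale sensitivity / epsilon makes the released count
# epsilon-differentially private (Dwork and Roth 2014).
import numpy as np

def private_count(values, predicate, epsilon=0.5, rng=None):
    """Return a noisy count of how many values satisfy the predicate."""
    rng = rng or np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1  # adding or removing one person changes a count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical example: how many participants reported a sensitive attribute?
responses = [True, False, True, True, False, False, True]
print(private_count(responses, lambda v: v, epsilon=0.5))
```

The design choice mirrors the trade-off discussed in this chapter: a smaller epsilon adds more noise (lower risk to participants, less precise results), while a larger epsilon does the opposite.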
For more on the concept of personally identifying information (PII), which is central to many of the rules about research ethics, see Narayanan and Shmatikov (2010) and Schwartz and Solove (2011). For more on all data being potentially sensitive, see Ohm (2015).
In this section, I’ve portrayed the linkage of different datasets as something that can lead to informational risk. However, it can also create new opportunities for research, as argued in Currie (2013).
For more on the five safes, see Desai, Ritchie, and Welpton (2016). For an example of how outputs can be identifying, see Brownstein, Cassa, and Mandl (2006), which shows how maps of disease prevalence can be identifying. Dwork et al. (2017) also considers attacks against aggregate data, such as statistics about how many individuals have a certain disease.
Warren and Brandeis (1890) is a landmark legal article about privacy, and the article is most associated with the idea that privacy is a right to be left alone. More recent book-length treatments of privacy that I would recommend include Solove (2010) and Nissenbaum (2010).
For a review of empirical research on how people think about privacy, see Acquisti, Brandimarte, and Loewenstein (2015). The journal Science published a special issue titled “The End of Privacy”, which addresses the issues of privacy and information risk from a variety of different perspectives; for a summary, see Enserink and Chin (2015). Calo (2011) offers a framework for thinking about the harms that come from privacy violations. An early example of concerns about privacy at the very beginning of the digital age is Packard (1964).
One challenge when trying to apply the minimal risk standard is that it is not clear whose daily life is to be used for benchmarking (Council 2014). For example, homeless people have higher levels of discomfort in their daily lives. But, that does not imply that it is ethically permissible to expose homeless people to higher-risk research. For this reason, there seems to be a growing consensus that minimal risk should be benchmarked against a general population standard, not a specific population standard. While I generally agree with the idea of a general population standard, I think that for large online platforms such as Facebook, a specific population standard is reasonable. That is, when considering Emotional Contagion, I think that it is reasonable to benchmark against everyday risk on Facebook. A specific population standard in this case is much easier to evaluate and is unlikely to conflict with the principle of Justice, which seeks to prevent the burdens of research falling unfairly on disadvantaged groups (e.g., prisoners and orphans).
Other scholars have also called for more papers to include ethical appendices (Schultze and Mason 2012; Kosinski et al. 2015). King and Sands (2015) also offers practical tips.