Mass collaboration blends ideas from citizen science, crowdsourcing, and collective intelligence. Citizen science usually means involving “citizens” (i.e., nonscientists) in the scientific process; for more, see Crain, Cooper, and Dickinson (2014) and Bonney et al. (2014). Crowdsourcing usually means taking a problem ordinarily solved within an organization and instead outsourcing it to a crowd; for more, see Howe (2009). Collective intelligence usually means groups of individuals acting collectively in ways that seem intelligent; for more, see Malone and Bernstein (2015). Nielsen (2012) is a book-length introduction to the power of mass collaboration for scientific research.
There are many types of mass collaboration that don’t fit neatly into the three categories that I have proposed, and I think three of these deserve special attention because they might be useful in social research. One example is prediction markets, where participants buy and trade contracts that are redeemable based on outcomes that occur in the world. Prediction markets are often used by firms and governments for forecasting, and they have also been used by social researchers to predict the replicability of published studies in psychology (Dreber et al. 2015). For an overview of prediction markets, see Wolfers and Zitzewitz (2004) and Arrow et al. (2008).
A second example that does not fit well into my categorization scheme is the PolyMath project, where researchers collaborated using blogs and wikis to prove new math theorems. The PolyMath project is in some ways similar to the Netflix Prize, but in this project participants more actively built on the partial solutions of others. For more on the PolyMath project, see Gowers and Nielsen (2009), Cranshaw and Kittur (2011), Nielsen (2012), and Kloumann et al. (2016).
A third example that does not fit well into my categorization scheme is that of time-dependent mobilizations such as the Defense Advanced Research Projects Agency (DARPA) Network Challenge (i.e., the Red Balloon Challenge). For more on these time-sensitive mobilizations, see Pickard et al. (2011), Tang et al. (2011), and Rutherford et al. (2013).
The term “human computation” comes out of work done by computer scientists, and understanding the context behind this research will improve your ability to pick out problems that might be suitable for it. For certain tasks, computers are incredibly powerful, with capabilities far exceeding those of even expert humans. For example, in chess, computers can beat even the best grandmasters. But—and this is less well appreciated by social scientists—for other tasks, computers are actually much worse than people. In other words, right now you are better than even the most sophisticated computer at certain tasks involving the processing of images, video, audio, and text. Computer scientists working on these hard-for-computers-easy-for-humans tasks therefore realized that they could include humans in their computational process. Here’s how Luis von Ahn (2005) described human computation when he first coined the term in his dissertation: “a paradigm for utilizing human processing power to solve problems that computers cannot yet solve.” For a book-length treatment of human computation, in the most general sense of the term, see Law and Ahn (2011).
According to the definition proposed in Ahn (2005), Foldit—which I described in the section on open calls—could be considered a human computation project. However, I choose to categorize Foldit as an open call because it requires specialized skills (although not necessarily formal training) and it takes the best solution contributed, rather than using a split-apply-combine strategy.
The term “split-apply-combine” was used by Wickham (2011) to describe a strategy for statistical computing, but it perfectly captures the process of many human computation projects. The split-apply-combine strategy is similar to the MapReduce framework developed at Google; for more on MapReduce, see Dean and Ghemawat (2004) and Dean and Ghemawat (2008). For more on other distributed computing architectures, see Vo and Silvia (2016). Chapter 3 of Law and Ahn (2011) has a discussion of projects with more complex combine steps than those in this chapter.
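To make the split-apply-combine pattern concrete, here is a minimal sketch in Python with hypothetical data (the image IDs, volunteer IDs, labels, and the majority-vote rule are all invented for illustration): the raw classifications are split by image, a consensus rule is applied to each group, and the per-image results are combined into a single dataset, which is roughly the shape of a Galaxy Zoo-style pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical raw classifications, as they might come back from a
# Galaxy Zoo-style project: (image_id, volunteer_id, label) triples.
raw_labels = [
    ("img_01", "vol_a", "spiral"),
    ("img_01", "vol_b", "spiral"),
    ("img_01", "vol_c", "elliptical"),
    ("img_02", "vol_a", "elliptical"),
    ("img_02", "vol_c", "elliptical"),
]

# Split: group the redundant classifications by image.
labels_by_image = defaultdict(list)
for image_id, volunteer_id, label in raw_labels:
    labels_by_image[image_id].append(label)

# Apply: reduce each image's labels to a consensus label plus the
# level of agreement (majority vote is just one possible rule).
def consensus(labels):
    top_label, count = Counter(labels).most_common(1)[0]
    return top_label, count / len(labels)

# Combine: assemble the per-image results into a single dataset.
consensus_by_image = {
    image_id: consensus(labels) for image_id, labels in labels_by_image.items()
}
print(consensus_by_image)
# {'img_01': ('spiral', 0.67), 'img_02': ('elliptical', 1.0)} (approximately)
```

Majority vote is only one possible combine step; as chapter 3 of Law and Ahn (2011) discusses, real projects often use more elaborate rules.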
In the human computation projects that I have discussed in the chapter, participants were aware of what was happening. Some other projects, however, seek to capture “work” that is already happening (similar to eBird) but without participants’ awareness. See, for example, the ESP Game (Ahn and Dabbish 2004) and reCAPTCHA (Ahn et al. 2008). Both of these projects also raise ethical questions because participants did not know how their data were being used (Zittrain 2008; Lung 2012).
Inspired by the ESP Game, many researchers have attempted to develop other “games with a purpose” (Ahn and Dabbish 2008), also called “human-based computation games” (Pe-Than, Goh, and Lee 2015), that can be used to solve a variety of other problems. What these “games with a purpose” have in common is that they try to make the tasks involved in human computation enjoyable. Thus, while the ESP Game shares the same split-apply-combine structure with Galaxy Zoo, it differs in how participants are motivated—fun versus a desire to help science. For more on games with a purpose, see Ahn and Dabbish (2008).
My description of Galaxy Zoo draws on Nielsen (2012), Adams (2012), Clery (2011), and Hand (2010), and my presentation of the research goals of Galaxy Zoo was simplified. For more on the history of galaxy classification in astronomy and how Galaxy Zoo continues this tradition, see Masters (2012) and Marshall, Lintott, and Fletcher (2015). Building on Galaxy Zoo, the researchers completed Galaxy Zoo 2, which collected more than 60 million more-complex morphological classifications from volunteers (Masters et al. 2011). Further, they branched out into problems outside of galaxy morphology, including exploring the surface of the Moon, searching for planets, and transcribing old documents. Currently, all their projects are collected at the Zooniverse website (Cox et al. 2015). One of the projects—Snapshot Serengeti—provides evidence that Galaxy Zoo-type image classification projects can also be done for environmental research (Swanson et al. 2016).
For researchers planning to use a microtask labor market (e.g., Amazon Mechanical Turk) for a human computation project, Chandler, Paolacci, and Mueller (2013) and J. Wang, Ipeirotis, and Provost (2015) offer good advice on task design and other related issues. Porter, Verdery, and Gaddis (2016) offer examples and advice focused specifically on uses of microtask labor markets for what they call “data augmentation.” The line between data augmentation and data collection is somewhat blurry. For more on collecting and using labels for supervised learning for text, see Grimmer and Stewart (2013).
Researchers interested in creating what I’ve called computer-assisted human computation systems (e.g., systems that use human labels to train a machine learning model) might be interested in Shamir et al. (2014) (for an example using audio) and Cheng and Bernstein (2015). Also, the machine learning models in these projects can themselves be solicited with open calls, in which researchers compete to create the model with the greatest predictive performance. For example, the Galaxy Zoo team ran an open call and found a new approach that outperformed the one developed in Banerji et al. (2010); see Dieleman, Willett, and Dambre (2015) for details.
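As a rough illustration of that pattern (not the actual pipeline used by any of these projects), here is a sketch in Python using scikit-learn with entirely hypothetical data: a small set of human-labeled examples trains a classifier, the classifier labels the remaining objects, and low-confidence predictions are flagged for human review.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical setup: 5,000 objects described by precomputed features,
# of which volunteers have labeled only the first 500.
n_total, n_labeled, n_features = 5000, 500, 20
features = rng.normal(size=(n_total, n_features))
human_labels = rng.integers(0, 2, size=n_labeled)  # stand-in for volunteer labels

# Train a model on the human-labeled subset ...
model = LogisticRegression(max_iter=1000)
model.fit(features[:n_labeled], human_labels)

# ... and let the machine classify everything the volunteers did not label.
machine_labels = model.predict(features[n_labeled:])
machine_confidence = model.predict_proba(features[n_labeled:]).max(axis=1)

# Low-confidence predictions can be routed back to human volunteers,
# which is the feedback loop that makes these systems computer-assisted.
needs_human_review = machine_confidence < 0.6
print(needs_human_review.sum(), "objects flagged for human review")
```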
Open calls are not new. In fact, one of the most well-known open calls dates back to 1714, when Britain’s Parliament created the Longitude Prize for anyone who could develop a way to determine the longitude of a ship at sea. The problem stumped many of the greatest scientists of the day, including Isaac Newton, and the winning solution was eventually submitted by John Harrison, a clockmaker from the countryside who approached the problem differently from scientists who were focused on a solution that would somehow involve astronomy; for more information, see Sobel (1996). As this example illustrates, one reason that open calls are thought to work so well is that they provide access to people with different perspectives and skills (Boudreau and Lakhani 2013). See Hong and Page (2004) and Page (2008) for more on the value of diversity in problem solving.
Each of the open call cases in the chapter requires a bit of further explanation for why it belongs in this category. First, one way that I distinguish between human computation and open call projects is whether the output is an average of all the solutions (human computation) or the best solution (open call). The Netflix Prize is somewhat tricky in this regard because the best solution turned out to be a sophisticated average of individual solutions, an approach called an ensemble solution (Bell, Koren, and Volinsky 2010; Feuerverger, He, and Khatri 2012). From the perspective of Netflix, however, all they had to do was pick the best solution. For more on the Netflix Prize, see Bennett and Lanning (2007), Thompson (2008), Bell, Koren, and Volinsky (2010), and Feuerverger, He, and Khatri (2012).
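To give a sense of what an ensemble solution means in this setting, the following sketch (with invented ratings, not Netflix data or the winning team’s method) blends the predictions of three hypothetical models; because their errors partially cancel, the simple average outperforms each individual model on this toy held-out set.

```python
import numpy as np

# Hypothetical held-out ratings and the predictions of three
# independently developed models for the same (user, movie) pairs.
true_ratings = np.array([4.0, 2.0, 5.0, 3.0])
predictions = {
    "model A": np.array([4.2, 1.7, 5.3, 2.8]),
    "model B": np.array([3.7, 2.3, 4.8, 3.3]),
    "model C": np.array([4.3, 2.2, 4.7, 2.7]),
}

def rmse(pred):
    return np.sqrt(np.mean((pred - true_ratings) ** 2))

for name, pred in predictions.items():
    print(f"{name}: RMSE = {rmse(pred):.3f}")

# An equal-weight blend; the winning Netflix Prize teams tuned the weights
# much more carefully, but the logic is the same: the models' errors
# partially cancel, so the blend beats each individual model here.
blend = np.mean(list(predictions.values()), axis=0)
print(f"blend  : RMSE = {rmse(blend):.3f}")
```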
Second, by some definitions of human computation (e.g., Ahn (2005)), Foldit should be considered a human computation project. However, I choose to categorize it as an open call because it requires specialized skills (although not necessarily specialized training) and it takes the best solution, rather than using a split-apply-combine strategy. For more on Foldit, see Cooper et al. (2010), Khatib et al. (2011), and Andersen et al. (2012); my description of Foldit draws on descriptions in Bohannon (2009), Hand (2010), and Nielsen (2012).
Finally, one could argue that Peer-to-Patent is an example of distributed data collection. I choose to include it as an open call because it has a contest-like structure and only the best contributions are used, whereas with distributed data collection, the idea of good and bad contributions is less clear. For more on Peer-to-Patent, see Noveck (2006), Ledford (2007), Noveck (2009), and Bestor and Hamp (2010).
In terms of using open calls in social research, results similar to those of Glaeser et al. (2016) are reported in chapter 10 of Mayer-Schönberger and Cukier (2013), in which New York City used predictive modeling to produce large gains in the productivity of housing inspectors. In New York City, these predictive models were built by city employees, but in other cases, one could imagine that they could be created or improved with open calls (e.g., Glaeser et al. (2016)). However, one major concern with predictive models being used to allocate resources is that they have the potential to reinforce existing biases. Many researchers already know “garbage in, garbage out,” and with predictive models it can be “bias in, bias out.” See Barocas and Selbst (2016) and O’Neil (2016) for more on the dangers of predictive models built with biased training data.
One problem that might prevent governments from using open calls is that they require data release, which could lead to privacy violations. For more about privacy and data release in open calls, see Narayanan, Huey, and Felten (2016) and the discussion in chapter 6.
For more on the differences and similarities between prediction and explanation, see Breiman (2001), Shmueli (2010), Watts (2014), and Kleinberg et al. (2015). For more on the role of prediction in social research, see Athey (2017), Cederman and Weidmann (2017), Hofman, Sharma, and Watts (2017), and Yarkoni and Westfall (2017).
For a review of open call projects in biology, including design advice, see Saez-Rodriguez et al. (2016).
My description of eBird draws on descriptions in Bhattacharjee (2005), Robbins (2013), and Sullivan et al. (2014). For more on how researchers use statistical models to analyze eBird data, see Fink et al. (2010) and Hurlbert and Liang (2012). For more on estimating the skill of eBird participants, see Kelling, Johnston, et al. (2015). For more on the history of citizen science in ornithology, see Greenwood (2007).
For more on the Malawi Journals Project, see Watkins and Swidler (2009) and Kaler, Watkins, and Angotti (2015). For more on a related project in South Africa, see Angotti and Sennott (2015). For more examples of research using data from the Malawi Journals Project, see Kaler (2004) and Angotti et al. (2014).
My approach to offering design advice was inductive, based on the examples of successful and failed mass collaboration projects that I’ve heard about. There is also a stream of research that attempts to apply more general social-psychological theories to the design of online communities, which is relevant to the design of mass collaboration projects; see, for example, Kraut et al. (2012).
Regarding motivating participants, it is actually quite tricky to figure out exactly why people participate in mass collaboration projects (Cooper et al. 2010; Nov, Arazy, and Anderson 2011; Tuite et al. 2011; Raddick et al. 2013; Preist, Massung, and Coyle 2014). If you plan to motivate participants with payment on a microtask labor market (e.g., Amazon Mechanical Turk), Kittur et al. (2013) offers some advice.
Regarding enabling surprise, for more examples of unexpected discoveries coming out of Zooniverse projects, see Marshall, Lintott, and Fletcher (2015).
Regarding being ethical, some good general introductions to the issues involved are Gilbert (2015), Salehi et al. (2015), Schmidt (2013), Williamson (2016), Resnik, Elliott, and Miller (2015), and Zittrain (2008). For legal issues related to crowd employees, see Felstiner (2011). O’Connor (2013) addresses questions about ethical oversight of research when the roles of researchers and participants blur. For issues related to sharing data while protecting participants in citizen science projects, see Bowser et al. (2014). Both Purdam (2014) and Windt and Humphreys (2016) have some discussion of the ethical issues in distributed data collection. Finally, most projects acknowledge contributions but do not give authorship credit to participants. In Foldit, however, the players are often listed as authors (Cooper et al. 2010; Khatib et al. 2011). In other open call projects, the winning contributor can often write a paper describing their solution (e.g., Bell, Koren, and Volinsky (2010) and Dieleman, Willett, and Dambre (2015)).