This section is designed to be used as a reference, rather than to be read as a narrative.
Mass collaboration blends ideas from citizen science, crowdsourcing, and collective intelligence. Citizen science usually means involving “citizens” (i.e., non-scientists) in the scientific process (Crain, Cooper, and Dickinson 2014). Crowdsourcing usually means taking a problem ordinarily solved within an organization and instead outsourcing it to a crowd (Howe 2009). Collective intelligence usually means groups of individuals acting collectively in ways that seem intelligent (Malone and Bernstein 2015). Nielsen (2012) is a wonderful book-length introduction to the power of mass collaboration for scientific research.
There are many types of mass collaboration that don’t fit neatly into the three categories that I proposed, and I think three deserve special attention because they might be useful in social research at some point. One example is prediction markets, where participants buy and trade contracts that are redeemable based on outcomes that occur in the world (Wolfers and Zitzewitz 2004; Arrow et al. 2008). Prediction markets are often used by firms and governments for forecasting, and they have also been used by social researchers to predict the replicability of published studies in psychology (Dreber et al. 2015).
A second example that does not fit well into my categorization scheme is the PolyMath project, where researchers collaborated using blogs and wikis to prove new math theorems (Gowers and Nielsen 2009; Cranshaw and Kittur 2011; Nielsen 2012; Kloumann et al. 2016). The PolyMath project is in some ways similar to the Netflix Prize, but in the PolyMath project participants more actively built on the partial solutions of others.
A third example that does not fit well into my categorization scheme is time-dependent mobilizations such as the Defense Advanced Research Projects Agency (DARPA) Network Challenge (i.e., the Red Balloon Challenge). For more on these time-sensitive mobilizations, see Pickard et al. (2011), Tang et al. (2011), and Rutherford et al. (2013).
The term “human computation” comes out of work done by computer scientists, and understanding the context behind this research will improve your ability to pick out problems that might be amenable to it. For certain tasks, computers are incredibly powerful, with capabilities far exceeding those of even expert humans. For example, in chess, computers can beat even the best grandmasters. But—and this is less well appreciated by social scientists—for other tasks, computers are actually much worse than people. In other words, right now you are better than even the most sophisticated computer at certain tasks involving the processing of images, video, audio, and text. Thus—as was illustrated by a wonderful XKCD cartoon—there are tasks that are easy for computers and hard for people, but there are also tasks that are hard for computers and easy for people (Figure 5.13). Computer scientists working on these hard-for-computers-easy-for-humans tasks therefore realized that they could include humans in their computational process. Here’s how Luis von Ahn (2005) described human computation when he first coined the term in his dissertation: “a paradigm for utilizing human processing power to solve problems that computers cannot yet solve.”
For an excellent book-length treatment of human computation, in the most general sense of the term, see Law and Ahn (2011). Chapter 3 of Law and Ahn (2011) has an interesting discussion of combine steps that are more complex than the ones described in this chapter.
The term “split-apply-combine” was used by Wickham (2011) to describe a strategy for statistical computing, but it perfectly captures the process of many human computation projects. The split-apply-combine strategy is similar to the MapReduce framework developed at Google (Dean and Ghemawat 2004; Dean and Ghemawat 2008).
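To make this structure concrete, the following is a minimal sketch, not drawn from any particular project, of how a split-apply-combine workflow might be organized for an image-classification task. The `classify` function is a hypothetical stand-in for a human volunteer’s judgment, and the combine step is a simple majority vote.

```python
from collections import Counter

def split(images, volunteers):
    """Split: assign every image to several volunteers so each label is collected redundantly."""
    return [(image, volunteer) for image in images for volunteer in volunteers]

def apply_step(tasks, classify):
    """Apply: each volunteer classifies each image assigned to them."""
    return [(image, classify(volunteer, image)) for image, volunteer in tasks]

def combine(labeled):
    """Combine: take a simple majority vote over the labels collected for each image."""
    votes = {}
    for image, label in labeled:
        votes.setdefault(image, []).append(label)
    return {image: Counter(labels).most_common(1)[0][0]
            for image, labels in votes.items()}

# Hypothetical stand-in for a human judgment.
def classify(volunteer, image):
    return "spiral" if hash((volunteer, image)) % 2 else "elliptical"

consensus = combine(apply_step(split(["img1", "img2"], ["ann", "bob", "cat"]), classify))
print(consensus)  # e.g., {'img1': 'spiral', 'img2': 'elliptical'}
```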
Two clever human computation projects that I did not have space to discuss are the ESP Game (Ahn and Dabbish 2004) and reCAPTCHA (Ahn et al. 2008). Both of these projects found creative ways to motivate participants to provide labels on images. However, both of these projects also raised ethical questions because, unlike Galaxy Zoo, participants in the ESP Game and reCAPTCHA did not know how their data was being used (Lung 2012; Zittrain 2008).
Inspired by the ESP Game, many researchers have attempted to develop other “games with a purpose” (Ahn and Dabbish 2008), also called “human-based computation games” (Pe-Than, Goh, and Lee 2015), that can be used to solve a variety of other problems. What these “games with a purpose” have in common is that they try to make the tasks involved in human computation enjoyable. Thus, while the ESP Game shares the same split-apply-combine structure with Galaxy Zoo, it differs in how participants are motivated: fun versus the desire to help science.
My description of Galaxy Zoo draws on Nielsen (2012), Adams (2012), Clery (2011), and Hand (2010), and my presentation of the research goals of Galaxy Zoo was simplified. For more on the history of galaxy classification in astronomy and how Galaxy Zoo continues this tradition, see Masters (2012) and Marshall, Lintott, and Fletcher (2015). Building on Galaxy Zoo, the researchers completed Galaxy Zoo 2, in which volunteers contributed more than 60 million more detailed morphological classifications (Masters et al. 2011). Further, they branched out into problems outside of galaxy morphology, including exploring the surface of the moon, searching for planets, and transcribing old documents. Currently, all their projects are collected at www.zooniverse.org (Cox et al. 2015). One of these projects—Snapshot Serengeti—provides evidence that Galaxy Zoo-type image classification projects can also be done for environmental research (Swanson et al. 2016).
For researchers planning to use a micro-task labor market (e.g., Amazon Mechanical Turk) for a human computation project, Chandler, Paolacci, and Mueller (2013) and Wang, Ipeirotis, and Provost (2015) offer good advice on task design and other related issues.
Researchers interested in creating what I’ve called second generation human computation systems (e.g., systems that use human labels to train a machine learning model) might be interested in Shamir et al. (2014) (for an example using audio) and Cheng and Bernstein (2015). Also, these projects can be done with open calls, whereby researchers compete to create machine learning models with the greatest predictive performance. For example, the Galaxy Zoo team ran an open call and found a new approach that outperformed the one developed in Banerji et al. (2010); see Dieleman, Willett, and Dambre (2015) for details.
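For readers who want a concrete picture of this second-generation pattern, the following is a minimal sketch in which human labels are used to train a model that then classifies a much larger set of unlabeled items. The features, the choice of a random forest, and all of the variable names are illustrative assumptions; this is not the approach used in Banerji et al. (2010) or Dieleman, Willett, and Dambre (2015).

```python
# Sketch of a second-generation human computation workflow: human-provided
# labels train a model that then classifies the much larger unlabeled set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical data: each image is summarized by a few numeric features
# (e.g., color, concentration), and volunteers have labeled a small subset.
features_labeled = rng.normal(size=(1000, 5))      # images volunteers classified
human_labels = rng.integers(0, 2, size=1000)       # 0 = elliptical, 1 = spiral
features_unlabeled = rng.normal(size=(100000, 5))  # the rest of the catalog

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(features_labeled, human_labels)           # learn from human judgments
machine_labels = model.predict(features_unlabeled)  # scale up to all images
```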
Open calls are not new. In fact, one of the most well-known open calls dates back to 1714, when Britain’s Parliament created the Longitude Prize for anyone who could develop a way to determine the longitude of a ship at sea. The problem stumped many of the greatest scientists of the day, including Isaac Newton, and the winning solution was eventually submitted by a clockmaker from the countryside who approached the problem differently from the scientists who were focused on a solution that would somehow involve astronomy (Sobel 1996). As this example illustrates, one reason that open calls are thought to work so well is that they provide access to people with different perspectives and skills (Boudreau and Lakhani 2013). See Hong and Page (2004) and Page (2008) for more on the value of diversity in problem solving.
Each of the open call cases in the chapter requires a bit of further explanation for why it belongs in this category. First, one way that I distinguish between human computation and open call projects is whether the output is an average of all the solutions (human computation) or the best solution (open call). The Netflix Prize is somewhat tricky in this regard because the best solution turned out to be a sophisticated average of individual solutions, an approach called an ensemble solution (Bell, Koren, and Volinsky 2010; Feuerverger, He, and Khatri 2012). From the perspective of Netflix, however, all they had to do was pick the best solution; a small sketch of this kind of ensemble averaging appears after these three cases.
Second, by some definitions of human computation (e.g., von Ahn 2005), FoldIt should be considered a human computation project. However, I choose to categorize FoldIt as an open call because it requires specialized skills and it takes the best solution contributed, rather than using a split-apply-combine strategy.
Finally, one could argue that Peer-to-Patent is an example of distributed data collection. I choose to include it as an open call because it has a contest-like structure and only the best contributions are used (whereas with distributed data collection, the idea of good and bad contributions is less clear).
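The following is a minimal sketch, in Python, of one simple form of ensembling: a weighted average of predictions from several hypothetical models. The numbers and weights are purely illustrative, and the actual Netflix Prize blends were far more sophisticated (Bell, Koren, and Volinsky 2010; Feuerverger, He, and Khatri 2012).

```python
import numpy as np

# Hypothetical predicted ratings for the same three user-movie pairs from
# three separate models (e.g., different collaborative filtering approaches).
predictions = np.array([
    [3.8, 2.1, 4.6],   # model A
    [4.0, 2.4, 4.2],   # model B
    [3.5, 2.0, 4.9],   # model C
])

# Weights could be chosen by regressing held-out ratings on the models'
# predictions; here they are just illustrative numbers that sum to one.
weights = np.array([0.5, 0.3, 0.2])

ensemble_prediction = weights @ predictions  # weighted average per user-movie pair
print(ensemble_prediction)
```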
For more on the Netflix Prize, see Bennett and Lanning (2007), Thompson (2008), Bell, Koren, and Volinsky (2010), and Feuerverger, He, and Khatri (2012). For more on FoldIt, see Cooper et al. (2010), Andersen et al. (2012), and Khatib et al. (2011); my description of FoldIt draws on descriptions in Nielsen (2012), Bohannon (2009), and Hand (2010). For more on Peer-to-Patent, see Noveck (2006), Bestor and Hamp (2010), Ledford (2007), and Noveck (2009).
Similar to the results of Glaeser et al. (2016), Chapter 10 of Mayer-Schönberger and Cukier (2013) reports large gains in the productivity of housing inspectors in New York City when inspections are guided by predictive models. In New York City, these predictive models were built by city employees, but in other cases one could imagine that they could be created or improved with open calls (e.g., Glaeser et al. (2016)). However, one major concern with predictive models being used to allocate resources is that the models have the potential to reinforce existing biases. Many researchers already know “garbage in, garbage out,” and with predictive models it can be “bias in, bias out.” See Barocas and Selbst (2016) and O’Neil (2016) for more on the dangers of predictive models built with biased training data.
One problem that might prevent governments from using open calls is that doing so requires data release, which could lead to privacy violations. For more about privacy and data release in open calls, see Narayanan, Huey, and Felten (2016) and the discussion in Chapter 6.
My description of eBird draws on descriptions in Bhattacharjee (2005) and Robbins (2013). For more on how researchers use statistical models to analyze eBird data, see Hurlbert and Liang (2012) and Fink et al. (2010). For more on the history of citizen science in ornithology, see Greenwood (2007).
For more on the Malawi Journals Project, see Watkins and Swidler (2009) and Kaler, Watkins, and Angotti (2015). And for more on a related project in South Africa, see Angotti and Sennott (2015). For more examples of research using data from the Malawi Journals Project see Kaler (2004) and Angotti et al. (2014).
My approach to offering design advice was inductive, based on the examples of successful and failed mass collaboration projects that I’ve heard about. There is also a stream of research that attempts to apply more general social psychological theories to designing online communities, which is relevant to the design of mass collaboration projects; see, for example, Kraut et al. (2012).
Regarding motivating participants, it is actually quite tricky to figure out exactly why people participate in mass collaboration projects (Nov, Arazy, and Anderson 2011; Cooper et al. 2010; Raddick et al. 2013; Tuite et al. 2011; Preist, Massung, and Coyle 2014). If you plan to motivate participants with payment on a micro-task labor market (e.g., Amazon Mechanical Turk), Kittur et al. (2013) offers some advice.
Regarding enabling surprise, for more examples of unexpected discoveries coming out of Zooniverse projects, see Marshall, Lintott, and Fletcher (2015).
Regarding being ethical, some good general introductions to the issues involved are Gilbert (2015), Salehi et al. (2015), Schmidt (2013), Williamson (2016), Resnik, Elliott, and Miller (2015), and Zittrain (2008). For issues specifically related to the legal status of crowd employees, see Felstiner (2011). O’Connor (2013) addresses questions about ethical oversight of research when the roles of researchers and participants blur. For issues related to sharing data while protecting participants in citizen science projects, see Bowser et al. (2014). Both Purdam (2014) and Windt and Humphreys (2016) have some discussion about the ethical issues in distributed data collection. Finally, most projects acknowledge the contributions of participants but do not give them authorship credit. In FoldIt, the players are often listed collectively as an author (Cooper et al. 2010; Khatib et al. 2011). In other open call projects, the winning contributor can often write a paper describing their solution (e.g., Bell, Koren, and Volinsky 2010; Dieleman, Willett, and Dambre 2015). In the Galaxy Zoo family of projects, extremely active and important contributors are sometimes invited to be co-authors on papers. For example, Ivan Terentev and Tim Matorny, two Radio Galaxy Zoo participants from Russia, were co-authors on one of the papers that arose from that project (Banfield et al. 2016; Galaxy Zoo 2016).