4.5.1 Use existing environments

You can run experiments inside existing environments, often without any coding or partnership.

Logistically, the easiest way to do a digital experiment is to overlay your experiment on top of an existing environment. Such experiments can be run at a reasonably large scale and don’t require partnership with a company or extensive software development.

For example, Jennifer Doleac and Luke Stein (2013) took advantage of an online marketplace similar to Craigslist in order to run an experiment that measured racial discrimination. They advertised thousands of iPods, and by systematically varying the characteristics of the seller, they were able to study the effect of race on economic transactions. Further, they used the scale of their experiment to estimate when the effect was bigger (heterogeneity of treatment effects) and to offer some ideas about why the effect might occur (mechanisms).

Doleac and Stein’s iPod advertisements varied along three main dimensions. First, the researchers varied the characteristics of the seller, which were signaled by the hand photographed holding the iPod [white, black, white with tattoo] (figure 4.13). Second, they varied the asking price [$90, $110, $130]. Third, they varied the quality of the ad text [high-quality and low-quality (e.g., cApitalization errors and spelin errors)]. Thus, the authors had a 3 × 3 × 2 design, which was deployed across more than 300 local markets, ranging from towns (e.g., Kokomo, Indiana and North Platte, Nebraska) to mega-cities (e.g., New York and Los Angeles).
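To make the 3 × 3 × 2 design concrete, the sketch below enumerates its 18 cells from the factor levels listed above. The assignment step is only an illustration: the text does not describe how ads were allocated to markets, so the function `assign_conditions`, its seed, and the one-ad-per-market scheme are assumptions, not the authors' procedure.

```python
import itertools
import random

# Factor levels as described in the text.
SELLER_HANDS = ["white", "black", "white with tattoo"]
ASKING_PRICES = [90, 110, 130]  # dollars
AD_QUALITIES = ["high", "low"]


def all_conditions():
    """Return every cell of the 3 x 3 x 2 factorial design as a list of tuples."""
    return list(itertools.product(SELLER_HANDS, ASKING_PRICES, AD_QUALITIES))


def assign_conditions(markets, seed=0):
    """Draw one condition uniformly at random for each market.

    Hypothetical scheme for illustration only; the real study's
    allocation of ads to markets is not specified in the text.
    """
    rng = random.Random(seed)
    conditions = all_conditions()
    return {market: rng.choice(conditions) for market in markets}


conditions = all_conditions()
print(len(conditions))  # 18
```

Calling `assign_conditions(["Kokomo, Indiana", "New York"])` returns a dictionary that maps each market name to one (hand, price, ad quality) tuple. Fixing the seed keeps the assignment reproducible.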

Figure 4.13: Hands used in the experiment of Doleac and Stein (2013). iPods were sold by sellers with different characteristics to measure discrimination in an online marketplace. Reproduced by permission from Doleac and Stein (2013), figure 1.

Averaged across all conditions, the outcomes were better for the white sellers than the black sellers, with the tattooed sellers having intermediate results. For example, the white sellers received more offers and had higher final sale prices. Beyond these average effects, Doleac and Stein estimated the heterogeneity of effects. For example, one prediction from earlier theory is that discrimination would be less in markets where there is more competition between buyers. Using the number of offers in that market as a measure of the amount of buyer competition, the researchers found that black sellers did indeed receive worse offers in markets with a low degree of competition. Further, by comparing outcomes for the ads with high-quality and low-quality text, Doleac and Stein found that ad quality did not impact the disadvantage faced by black and tattooed sellers. Finally, taking advantage of the fact that advertisements were placed in more than 300 markets, the authors found that black sellers were more disadvantaged in cities with high crime rates and high residential segregation. None of these results give us a precise understanding of exactly why black sellers had worse outcomes, but, when combined with the results of other studies, they can begin to inform theories about the causes of racial discrimination in different types of economic transactions.

Another example that shows how researchers can conduct digital field experiments in existing systems is the research by Arnout van de Rijt and colleagues (2014) on the keys to success. In many aspects of life, seemingly similar people end up with very different outcomes. One possible explanation for this pattern is that small, essentially random advantages can lock in and grow over time, a process that researchers call cumulative advantage. In order to determine whether small initial successes lock in or fade away, van de Rijt and colleagues (2014) intervened in four different systems, bestowing success on randomly selected participants, and then measured the subsequent impacts of this arbitrary success.

More specifically, van de Rijt and colleagues (1) pledged money to randomly selected projects on Kickstarter, a crowdfunding website; (2) positively rated randomly selected reviews on Epinions, a product review website; (3) gave awards to randomly chosen contributors to Wikipedia; and (4) signed randomly selected petitions on change.org. They found very similar results across all four systems: in each case, participants who were randomly given some early success went on to have more subsequent success than their otherwise completely indistinguishable peers (figure 4.14). The fact that the same pattern appeared in many systems increases the external validity of these results because it reduces the chance that this pattern is an artifact of any particular system.
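The logic of this kind of experiment (pick units at random, bestow a small success, compare later outcomes) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis used in the study: the function names, the seed, and the simple difference-in-means estimator are all hypothetical choices, and the toy outcome values in the usage note are placeholders rather than real data.

```python
import random
from statistics import mean


def randomize(unit_ids, n_treated, seed=0):
    """Split units into a randomly chosen treated set and the remaining control set."""
    units = list(unit_ids)
    rng = random.Random(seed)
    treated = set(rng.sample(units, n_treated))
    control = set(units) - treated
    return treated, control


def difference_in_means(outcomes, treated, control):
    """Estimate the average effect as mean(treated outcomes) - mean(control outcomes).

    `outcomes` maps each unit id to its later measured success,
    e.g. subsequent donations or signatures (hypothetical).
    """
    treated_mean = mean(outcomes[u] for u in treated)
    control_mean = mean(outcomes[u] for u in control)
    return treated_mean - control_mean
```

For example, with placeholder outcomes `{"p1": 3, "p2": 1}`, a treated set `{"p1"}`, and a control set `{"p2"}`, `difference_in_means` returns 2. Because treatment is assigned at random, a positive difference across many units would be evidence that the early success itself caused the later success.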

Figure 4.14: Long-term effects of randomly bestowed success in four different social systems. Arnout van de Rijt and colleagues (2014) (1) pledged money to randomly selected projects on Kickstarter, a crowdfunding website; (2) positively rated randomly selected reviews on Epinions, a product review website; (3) gave awards to randomly chosen contributors to Wikipedia; and (4) signed randomly selected petitions on change.org. Adapted from Rijt et al. (2014), figure 2.

Together, these two examples show that researchers can conduct digital field experiments without the need to partner with companies or build complex digital systems. Further, table 4.2 provides even more examples that show the range of what is possible when researchers use the infrastructure of existing systems to deliver treatment and/or measure outcomes. These experiments are relatively cheap for researchers and they offer a high degree of realism. But they offer researchers limited control over the participants, treatments, and outcomes to be measured. Further, for experiments taking place in only one system, researchers need to be concerned that the effects could be driven by system-specific dynamics (e.g., the way that Kickstarter ranks projects or the way that change.org ranks petitions; for more information, see the discussion about algorithmic confounding in chapter 2). Finally, when researchers intervene in working systems, tricky ethical questions emerge about possible harm to participants, non-participants, and systems. We will consider these ethical questions in more detail in chapter 6, and there is an excellent discussion of them in the appendix of van de Rijt et al. (2014). The trade-offs that come with working in an existing system are not ideal for every project, and for that reason some researchers build their own experimental system, as I’ll illustrate next.

Table 4.2: Examples of Experiments in Existing Systems
| Topic | References |
|---|---|
| Effect of barnstars on contributions to Wikipedia | Restivo and Rijt (2012); Restivo and Rijt (2014); Rijt et al. (2014) |
| Effect of anti-harassment message on racist tweets | Munger (2016) |
| Effect of auction method on sale price | Lucking-Reiley (1999) |
| Effect of reputation on price in online auctions | Resnick et al. (2006) |
| Effect of race of seller on sale of baseball cards on eBay | Ayres, Banaji, and Jolls (2015) |
| Effect of race of seller on sale of iPods | Doleac and Stein (2013) |
| Effect of race of guest on Airbnb rentals | Edelman, Luca, and Svirsky (2016) |
| Effect of donations on the success of projects on Kickstarter | Rijt et al. (2014) |
| Effect of race and ethnicity on housing rentals | Hogan and Berry (2011) |
| Effect of positive rating on future ratings on Epinions | Rijt et al. (2014) |
| Effect of signatures on the success of petitions | Vaillant et al. (2015); Rijt et al. (2014); Rijt et al. (2016) |