Let’s move beyond simple experiments. Three concepts are useful for rich experiments: validity, heterogeneity of treatment effects, and mechanisms.
Researchers who are new to experiments often focus on a very specific, narrow question: Does this treatment “work”? For example, does a phone call from a volunteer encourage someone to vote? Does changing a website button from blue to green increase the click-through rate? Unfortunately, loose phrasing about what “works” obscures the fact that narrowly focused experiments don’t really tell you whether a treatment “works” in a general sense. Rather, narrowly focused experiments answer a much more specific question: What is the average effect of this specific treatment with this specific implementation for this population of participants at this time? I’ll call experiments that focus on this narrow question simple experiments.
Simple experiments can provide valuable information, but they fail to answer many questions that are both important and interesting, such as whether there are some people for whom the treatment had a larger or smaller effect; whether there is another treatment that would be more effective; and whether this experiment relates to broader social theories.
In order to show the value of moving beyond simple experiments, let’s consider an analog field experiment by P. Wesley Schultz and colleagues on the relationship between social norms and energy consumption (Schultz et al. 2007). Schultz and colleagues hung doorhangers on 300 households in San Marcos, California; these doorhangers delivered different messages designed to encourage energy conservation. The researchers then measured the effect of these messages on electricity consumption, both after one week and after three weeks; see figure 4.3 for a more detailed description of the experimental design.
The experiment had two conditions. In the first, households received general energy-saving tips (e.g., use fans instead of air conditioners) and information about their energy usage compared with the average energy usage in their neighborhood. Schultz and colleagues called this the descriptive normative condition because the information about the energy use in the neighborhood provided information about typical behavior (i.e., a descriptive norm). When Schultz and colleagues looked at the resulting energy usage in this group, the treatment appeared to have no effect, in either the short or long term; in other words, the treatment didn’t seem to “work” (figure 4.4).
Fortunately, Schultz and colleagues did not settle for this simplistic analysis. Before the experiment began, they reasoned that heavy users of electricity—people above the mean—might reduce their consumption, and that light users of electricity—people below the mean—might actually increase their consumption. When they looked at the data, that’s exactly what they found (figure 4.4). Thus, what looked like a treatment that was having no effect was actually a treatment that had two offsetting effects. This counterproductive increase among the light users is an example of a boomerang effect, where a treatment can have the opposite effect from what was intended.
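To see how this kind of subgroup analysis works in practice, here is a minimal sketch in Python using simulated data; the sample size, usage levels, and effect sizes are hypothetical assumptions chosen for illustration, not numbers from Schultz and colleagues. The sketch shows how an overall average treatment effect near zero can conceal offsetting effects among heavy and light users.

```python
# Minimal sketch (simulated data, hypothetical parameters): an overall
# average treatment effect near zero can hide offsetting effects in
# subgroups defined by baseline electricity use.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical baseline electricity use (kWh/day) and random assignment
baseline = rng.normal(30, 8, n)
treated = rng.integers(0, 2, n)

# Assumed subgroup effects: heavy users reduce use, light users increase it
heavy = baseline > np.median(baseline)
effect = np.where(heavy, -2.0, 2.0)
usage = baseline + treated * effect + rng.normal(0, 3, n)
change = usage - baseline

def diff_in_means(y, d):
    """Difference-in-means estimate of the average treatment effect."""
    return y[d == 1].mean() - y[d == 0].mean()

print(f"Overall ATE:         {diff_in_means(change, treated):+.2f}")
print(f"ATE for heavy users: {diff_in_means(change[heavy], treated[heavy]):+.2f}")
print(f"ATE for light users: {diff_in_means(change[~heavy], treated[~heavy]):+.2f}")
```

Running this sketch, the overall estimate hovers near zero while the two subgroup estimates are roughly equal and opposite; that is exactly the pattern that a pooled analysis, on its own, would have missed.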
In parallel with the first condition, Schultz and colleagues also ran a second condition. The households in the second condition received the exact same treatment—general energy-saving tips and information about their household’s energy usage compared with the average for their neighborhood—with one tiny addition: for people with below-average consumption, the researchers added a :), and for people with above-average consumption, they added a :(. These emoticons were designed to trigger what the researchers called injunctive norms. Injunctive norms refer to perceptions of what is commonly approved (and disapproved), whereas descriptive norms refer to perceptions of what is commonly done (Reno, Cialdini, and Kallgren 1993).
By adding this one tiny emoticon, the researchers dramatically reduced the boomerang effect (figure 4.4). Thus, by making this one simple change—a change that was motivated by an abstract social psychological theory (Cialdini, Kallgren, and Reno 1991)—the researchers were able to turn a program that didn’t seem to work into one that worked, and, simultaneously, they were able to contribute to the general understanding of how social norms affect human behavior.
At this point, however, you might notice that something is a bit different about this experiment. In particular, the experiment of Schultz and colleagues doesn’t really have a control group in the same way that randomized controlled experiments do. A comparison between this design and that of Restivo and van de Rijt illustrates the differences between two major experimental designs. In between-subjects designs, such as that of Restivo and van de Rijt, there is a treatment group and a control group. In within-subjects designs, on the other hand, the behavior of participants is compared before and after the treatment (Greenwald 1976; Charness, Gneezy, and Kuhn 2012). In a within-subjects experiment, it is as if each participant acts as her own control group. The strength of between-subjects designs is that they provide protection against confounders (as I described earlier), while the strength of within-subjects designs is increased precision of estimates. Finally, to foreshadow an idea that will come later when I offer advice about designing digital experiments, a mixed design combines the improved precision of within-subjects designs and the protection against confounding of between-subjects designs (figure 4.5).
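To make the trade-off between these designs concrete, here is a minimal simulation sketch in Python; all parameter values are assumptions chosen for illustration. It compares three estimators of the same hypothetical treatment effect: a between-subjects difference in means, a within-subjects before/after comparison among the treated, and a mixed (difference-in-differences) estimate.

```python
# Minimal sketch (simulated data, hypothetical parameters): with stable
# individual differences, within-subjects and mixed estimates of the same
# treatment effect are more precise than a between-subjects estimate.
import numpy as np

rng = np.random.default_rng(1)
n, true_effect, n_sims = 200, 1.0, 2_000

between, within, mixed = [], [], []
for _ in range(n_sims):
    person = rng.normal(0, 5, n)            # stable individual differences
    d = rng.integers(0, 2, n)               # random assignment
    pre = person + rng.normal(0, 1, n)      # outcome before treatment
    post = person + true_effect * d + rng.normal(0, 1, n)  # outcome after

    # Between-subjects: treatment group vs control group, after treatment
    between.append(post[d == 1].mean() - post[d == 0].mean())
    # Within-subjects: each treated participant compared with herself
    within.append((post - pre)[d == 1].mean())
    # Mixed: before/after change in treatment group vs change in control group
    mixed.append((post - pre)[d == 1].mean() - (post - pre)[d == 0].mean())

for name, est in [("between", between), ("within", within), ("mixed", mixed)]:
    print(f"{name:>8}: mean = {np.mean(est):+.2f}, sd = {np.std(est):.2f}")
```

In this simulation nothing besides the treatment changes between the two measurements, so all three estimates are centered on the true effect, and the within-subjects and mixed estimates are much less variable because the stable individual differences cancel out. In a real study, anything else that changed over time would bias the pure within-subjects comparison; guarding against that is exactly what the control group in the between-subjects and mixed designs provides.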
Overall, the design and results of the study by Schultz and colleagues (2007) show the value of moving beyond simple experiments. Fortunately, you don’t need to be a creative genius to design experiments like this. Social scientists have developed three concepts that will guide you toward richer experiments: (1) validity, (2) heterogeneity of treatment effects, and (3) mechanisms. That is, if you keep these three ideas in mind while you are designing your experiment, you will naturally create a more interesting and useful experiment. In order to illustrate these three concepts in action, I’ll describe a number of follow-up partially digital field experiments that built on the elegant design and exciting results of Schultz and colleagues (2007). As you will see, through more careful design, implementation, analysis, and interpretation, you too can move beyond simple experiments.