Once you have motivated a lot of people to work on a real scientific problem, you will discover that your participants are heterogeneous in two main ways: they vary in their skill and they vary in their level of effort. The first reaction of many social researchers is to fight this heterogeneity by excluding low-quality participants and then attempting to collect a fixed amount of information from everyone who is left. This is the wrong way to design a mass collaboration project.
First, there is no reason to exclude low-skilled participants. In open calls, low-skilled participants cause no problems: their contributions don’t hurt anyone, and they don’t require any time to evaluate. In human computation and distributed data collection projects, on the other hand, the best form of quality control comes through redundancy, that is, collecting the same judgment from several participants and combining the results, not through a high bar for participation. In fact, rather than excluding low-skilled participants, a better approach is to help them make better contributions, much as the researchers at eBird have done.
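To make the idea of quality control through redundancy concrete, here is a minimal sketch of one common way to combine redundant judgments: a simple majority vote. The function name `aggregate_by_majority` and the example labels are hypothetical and chosen only for illustration; real projects often use more elaborate aggregation schemes.

```python
from collections import Counter


def aggregate_by_majority(labels):
    """Combine redundant labels for one item by majority vote.

    Returns the most common label and the share of participants who chose it.
    This is an illustrative sketch, not the method of any particular project.
    """
    if not labels:
        raise ValueError("need at least one label to aggregate")
    counts = Counter(labels)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(labels)


# Hypothetical data: five participants judge the same item, one disagrees.
labels = ["yes", "yes", "no", "yes", "yes"]
winner, agreement = aggregate_by_majority(labels)
print(winner, agreement)
```

In this sketch, one mistaken judgment does not change the result, which is why redundancy can substitute for screening participants in advance. The agreement share can also flag items that need more judgments.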
Second, there is no reason to collect a fixed amount of information from each participant. Participation in many mass collaboration projects is incredibly unequal (Sauermann and Franzoni 2015): a small number of people contribute a lot (sometimes called the fat head), and a lot of people contribute a little (sometimes called the long tail). If you don’t collect information from both the fat head and the long tail, you are leaving a large amount of information uncollected. For example, if Wikipedia accepted 10 and only 10 edits per editor, it would lose about 95% of edits (Salganik and Levy 2015). Thus, with mass collaboration projects, it is best to leverage heterogeneity rather than try to eliminate it.
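The arithmetic behind this kind of loss can be sketched in a few lines. The contribution counts below are made up to mimic a fat head and a long tail; they are not Wikipedia data, and the function name `share_lost_under_cap` is hypothetical. For simplicity, the sketch models only an upper limit on contributions per participant.

```python
def share_lost_under_cap(contributions, cap):
    """Share of all contributions discarded if each participant is capped.

    `contributions` holds one count per participant; anything above `cap`
    for a given participant is treated as lost.
    """
    total = sum(contributions)
    if total == 0:
        raise ValueError("no contributions to analyze")
    kept = sum(min(count, cap) for count in contributions)
    return 1 - kept / total


# Hypothetical, highly unequal participation: a few heavy contributors
# (the fat head) and many light ones (the long tail).
contributions = [1000, 500, 200, 50, 10, 5, 2, 1, 1, 1]
lost = share_lost_under_cap(contributions, cap=10)
print(f"{lost:.1%} of contributions lost")
```

With these illustrative numbers, the cap keeps only 60 of 1,770 contributions, so nearly all of the information supplied by the fat head is thrown away.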