eBird collects data on birds from birders; volunteers can provide a geographic scale that no research team can match.
Birds are everywhere, and ornithologists would like to know where every bird is at every moment. Given such a perfect dataset, ornithologists could address many fundamental questions in their field. Of course, collecting this data is beyond the capabilities of any individual researcher. At the same time that ornithologists desire richer and more complete data, “birders”—people who go bird watching for fun—are constantly observing birds and documenting what they see. These two communities have a long history of collaborating, but now these collaborations have been transformed by the digital age. eBird is a distributed data collection project that solicits information from birders around the world, and it has already received over 260 million bird sightings from 250,000 participants (Kelling et al. 2015).
Prior to the launch of eBird, much of the data created by birders was unavailable to researchers:
“In thousands of closets around the world today lie countless notebooks, index cards, annotated checklists, and diaries. Those of us involved with birding institutions know well the frustration of hearing over and over again about ‘my late uncle’s bird records.’ We know how valuable they could be. Sadly, we also know we can’t use them.” (Fitzpatrick et al. 2002)
Rather than having this valuable data sit unused, eBird enables birders to upload it to a centralized, digital database. Data uploaded to eBird contains six key fields: who, where, when, what species, how many, and effort. For non-birding readers, “effort” refers to the methods used while making observations. Data quality checks begin even before the data is uploaded. When birders try to submit unusual reports—such as reports of very rare species, very high counts, or out-of-season sightings—the reports are flagged, and the website automatically requests additional information, such as photographs. After collecting this additional information, the flagged reports are sent to one of hundreds of volunteer regional experts for further review. After investigation by the regional expert—including possible additional correspondence with the birder—the flagged reports are either discarded as unreliable or entered into the eBird database (Kelling et al. 2012). This database of screened observations is then made available to anyone in the world with an Internet connection, and so far, almost 100 peer-reviewed publications have used it (Bonney et al. 2014). eBird clearly shows that volunteer birders are able to collect data that is useful for real ornithology research.
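To make this screening process concrete, here is a minimal sketch in Python of the kind of automated check described above. It is not eBird’s actual code: the record mirrors the six key fields, but the names (Sighting, regional_filter, needs_expert_review) and the filter rules are hypothetical.

```python
# A minimal sketch (not eBird's actual code) of the kind of screening described
# above: each submission carries the six key fields, and unusual reports are
# flagged for review by a volunteer regional expert.
# All names and filter rules here are hypothetical.

from dataclasses import dataclass
from datetime import date

@dataclass
class Sighting:
    observer_id: str      # who
    latitude: float       # where
    longitude: float      # where
    observed_on: date     # when
    species: str          # what species
    count: int            # how many
    effort_minutes: int   # effort: how the observation was made

# Hypothetical regional filter: species expected in a region, with a plausible
# maximum count and the months in which the species normally occurs.
regional_filter = {
    "Snowy Owl": {"max_count": 2, "months": {11, 12, 1, 2, 3}},
    "American Robin": {"max_count": 500, "months": set(range(1, 13))},
}

def needs_expert_review(s: Sighting) -> bool:
    """Flag unexpected species, very high counts, or out-of-season sightings."""
    rule = regional_filter.get(s.species)
    if rule is None:                                    # species not expected here
        return True
    if s.count > rule["max_count"]:                     # unusually high count
        return True
    if s.observed_on.month not in rule["months"]:       # out of season
        return True
    return False

report = Sighting("obs-123", 42.44, -76.50, date(2015, 7, 4), "Snowy Owl", 1, 60)
print(needs_expert_review(report))  # True: a Snowy Owl in July would be flagged
```

In the real system, a flagged report is not discarded automatically; as described above, it goes to a regional expert who may correspond with the birder before deciding whether it enters the database.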
One of the beauties of eBird is that it captures “work” that is already happening—in this case, birding. This feature enables the project to achieve tremendous scale. However, the “work” done by birders does not exactly match the data needed by ornithologists. For example, in eBird, data collection is determined by the location of birders, not the location of birds, which means that most observations tend to occur close to roads (Kelling et al. 2012; Kelling et al. 2015). In addition to this unequal distribution of effort over space, the actual observations made by birders are not always ideal. For example, some birders upload information only about species that they consider interesting, rather than about all the species that they observed.
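As a toy illustration of how these two biases might be summarized, consider the following sketch. The numbers and field names (distance_to_road_km, reported_all_species) are invented for illustration; they are not real eBird data.

```python
# A toy illustration (with made-up numbers, not real eBird data) of how one
# might quantify the two issues described above: observation effort
# concentrated near roads, and checklists that omit some observed species.

# Each hypothetical checklist records its distance to the nearest road (km)
# and whether the birder reported every species they detected.
checklists = [
    {"distance_to_road_km": 0.1, "reported_all_species": True},
    {"distance_to_road_km": 0.3, "reported_all_species": False},
    {"distance_to_road_km": 0.2, "reported_all_species": True},
    {"distance_to_road_km": 5.0, "reported_all_species": False},
]

near_road = sum(c["distance_to_road_km"] < 1.0 for c in checklists) / len(checklists)
complete = sum(c["reported_all_species"] for c in checklists) / len(checklists)

print(f"Share of checklists within 1 km of a road: {near_road:.0%}")  # 75%
print(f"Share of complete checklists: {complete:.0%}")                # 50%
```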
eBird researchers have two main solutions to these data quality issues, issues that arise in many other distributed data collection projects as well. First, eBird researchers are constantly trying to upgrade the quality of the data submitted by birders. For example, eBird offers education to participants, and it has created visualizations of each participant’s data that, by their design, encourage birders to upload information about all the species that they observed, not just a subset (Wood et al. 2011; Wiggins 2011). Second, eBird researchers use statistical models that attempt to correct for the noisy and heterogeneous nature of the raw data. It is not yet clear whether these statistical models fully remove biases from the data, but ornithologists are confident enough in the quality of adjusted eBird data that, as was mentioned earlier, it has been used in almost 100 peer-reviewed scientific publications.
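To illustrate the general logic of such a statistical adjustment, here is a sketch that models whether a species was detected as a function of a habitat covariate and an effort covariate, and then predicts detection at a standardized level of effort. The data are simulated and the model is deliberately simple; it is meant to convey the idea of adjusting for heterogeneous effort, not the far more sophisticated species distribution models used in actual eBird research.

```python
# A deliberately simple sketch of effort adjustment on simulated data.
# This is NOT the modeling approach used by the eBird team; it only
# illustrates the idea of including effort as a covariate and then
# standardizing predictions to a common level of effort.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

forest_cover = rng.uniform(0, 1, n)      # habitat covariate (property of the site)
effort_hours = rng.uniform(0.25, 4, n)   # effort covariate (behavior of the observer)

# Simulate detections: the species prefers forest, and longer searches detect
# it more often even when true occurrence is the same.
logit = -2.0 + 2.5 * forest_cover + 0.8 * effort_hours
detected = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([forest_cover, effort_hours])
model = LogisticRegression().fit(X, detected)

# Predict detection probability at a standardized effort (1 hour), so that
# differences across sites reflect habitat rather than observer effort.
sites = np.column_stack([np.array([0.1, 0.9]), np.full(2, 1.0)])
print(model.predict_proba(sites)[:, 1])  # low vs. high forest cover, equal effort
```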
Many non-ornithologists are extremely skeptical when they first hear about eBird. In my opinion, part of this skepticism comes from thinking about eBird in the wrong way. Many people first ask, “Is the eBird data perfect?”, and the answer is absolutely not. However, that’s not the right question. The right question is, “For certain research questions, is the eBird data better than existing ornithology data?” For that question the answer is definitely yes, in part because for many questions of interest there is no realistic alternative to distributed data collection.
The eBird project demonstrates that it is possible to involve volunteers in the collection of important scientific data. However, eBird and related projects also indicate that sampling and data quality are real concerns for distributed data collection projects. As we will see in the next section, with clever design and technology these concerns can be minimized in some settings.