PhotoCity solves the data quality and sampling problems in distributed data collection.
Websites such as Flickr and Facebook enable people to share pictures with their friends and family, but they also create huge repositories of photos that can be used for other purposes. For example, Agarwal et al. (2011) attempted to use these photos to “Build Rome in a Day” by using 150,000 pictures of Rome to create a 3D reconstruction of the city. For tourist sites like the Coliseum there were enough pictures online to produce 3D models (Figure 5.10), but the quality of these reconstructions was limited by the fact that most photos were taken from the same iconic perspectives, leaving portions of the buildings unphotographed. Further, for most parts of the city, not enough photos were available. Thus, using only the found data from photo repositories, it was not possible to recreate all of Rome. But what if volunteers could be enlisted to collect the necessary photos to truly “Build Rome in a Day”?
In order to enable the targeted collection of large numbers of photos, Kathleen Tuite and colleagues developed PhotoCity, a photo-uploading game. One beautiful aspect of PhotoCity is that it turned the potentially laborious task of data collection—uploading photos—into a game-like activity involving teams, castles, and flags (Figure 5.11). The design of PhotoCity also elegantly solves the sampling and data quality challenges of eBird and other distributed data collection projects.
PhotoCity was first deployed to enable a 3D reconstruction of two universities: Cornell University and the University of Washington. Players at each campus could inspect the current state of the reconstruction model of their campus. Then, they could earn points by uploading images that expanded the current model. For example, if the current model of Uris Library (at Cornell) was very patchy, a player could earn points by uploading new pictures of it. Critically, uploaded photos had to overlap with existing photos so that they could be validated, and the number of points a player received was based on how much the photo added to the current model. In the end, the researchers were able to use these uploaded photos to create high-resolution 3D models of buildings on both campuses (Figure 5.12).
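The two rules described above (a photo must overlap with what has already been validated, and points reflect how much the photo adds) can be illustrated with a toy sketch. This is not PhotoCity's actual algorithm: the `ToyModel` class, the representation of a photo as a set of point IDs, and the `min_overlap` threshold are all hypothetical simplifications chosen only to show the logic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Toy stand-in for a 3D reconstruction: a set of known point IDs.

    Hypothetical simplification; a real system matches image features.
    """
    points: set = field(default_factory=set)
    min_overlap: int = 2  # assumed threshold, not taken from PhotoCity

    def submit(self, photo_points):
        """Return points earned, or None if the photo cannot be validated."""
        photo_points = set(photo_points)
        overlap = photo_points & self.points
        if len(overlap) < self.min_overlap:
            # Rejected: too little overlap with already validated photos.
            return None
        new_points = photo_points - self.points
        self.points |= new_points  # accepted photos extend the model
        return len(new_points)     # score = amount added to the model


# Seed photos uploaded by the researchers define the starting model.
model = ToyModel(points={1, 2, 3, 4})

iconic = model.submit({1, 2, 3, 4})        # same view as the seeds
extending = model.submit({3, 4, 5, 6, 7})  # overlaps, and adds new area
unrelated = model.submit({100, 101, 102})  # no overlap with the model
chained = model.submit({6, 7, 8})          # validated via an earlier upload

print(iconic, extending, unrelated, chained)
```

In this sketch, a photo that repeats an existing view is accepted but earns nothing, a photo that extends the model earns points, and a photo with no connection to the model is rejected. The last upload is validated only because an earlier player photo had already been accepted, which mirrors the chain of matches leading back to the seed photos.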
The design of PhotoCity elegantly solves two problems: data validation and sampling. First, photos were validated by matching them against previous photos, which were in turn matched to earlier photos, all the way back to the seed photos that were uploaded by researchers. In other words, because of this built-in redundancy, it is very difficult for the system to accept bad data. Second, the scoring system naturally trains participants to collect the most valuable data, not the most convenient. In fact, here are some of the strategies that players described using in order to earn more points, which is equivalent to collecting more valuable data (Tuite et al. 2011):
- “[I tried to] approximate the time of day and the lighting that some pictures were taken; this would help prevent rejection by the game. With that said, cloudy days were the best by far when dealing with corners because less contrast helped the game figure out the geometry from my pictures.”
- “When it was sunny, I utilized my camera’s anti-shake features to allow myself to take photos while walking around a particular zone. This allowed me to take crisp photos while not having to stop my stride. Also bonus: less people stared at me!”
- “Taking many pictures of one building with 5 megapixel camera, then coming home to submit, sometimes up to 5 gigs on a weekend shoot, was primary photo capture strategy. Organizing photos on external hard drive folders by campus region, building, then face of building provided good hierarchy to structure uploads.”
These statements show that when participants are provided with appropriate feedback, they can become quite expert at collecting data of interest to researchers.
Overall, the PhotoCity project shows that sampling and data quality are not insurmountable problems in distributed data collection. Further, it shows that distributed data collection projects are not limited to tasks that people are already doing anyway, such as watching birds. With the right design, volunteers can be encouraged to do other things too.