PhotoCity solves the data quality and sampling problems in distributed data collection.
Websites such as Flickr and Facebook enable people to share pictures with their friends and family, and they also create huge repositories of photos that can be used for other purposes. For example, Sameer Agarwal and colleagues (2011) attempted to use these photos to “Build Rome in a Day” by repurposing 150,000 pictures of Rome to create a 3D reconstruction of the city. For some heavily photographed buildings—such as the Coliseum (figure 5.10)—the researchers were partially successful, but the reconstructions suffered because most photos were taken from the same iconic perspectives, leaving portions of the buildings unphotographed. Thus, the images from photo repositories were not enough. But what if volunteers could be enlisted to collect the necessary photos to enrich those already available? Thinking back to the art analogy in chapter 1, what if the readymade images could be enriched by custommade images?
To enable the targeted collection of large numbers of photos, Kathleen Tuite and colleagues developed PhotoCity, a photo-uploading game. PhotoCity turned the potentially laborious task of data collection (uploading photos) into a game-like activity involving teams, castles, and flags (figure 5.11), and it was first deployed to create 3D reconstructions of two universities: Cornell University and the University of Washington. Researchers started the process by uploading seed photos of some buildings. Then, players on each campus inspected the current state of the reconstruction and earned points by uploading images that improved it. For example, if the current reconstruction of Uris Library (at Cornell) was very patchy, a player could earn points by uploading new pictures of it. Two features of this uploading process are very important. First, the number of points a player received was based on the amount that their photo added to the reconstruction. Second, the photos that were uploaded had to overlap with the existing reconstruction so that they could be validated. In the end, the researchers were able to create high-resolution 3D models of buildings on both campuses (figure 5.12).
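To make these two features concrete, here is a minimal sketch in Python. It is not PhotoCity's actual implementation, which relied on 3D reconstruction from photographs. Instead, it assumes a toy model in which the reconstruction is simply a set of point identifiers and each photo is the set of points it shows. The names `Reconstruction`, `submit`, and `min_overlap` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Reconstruction:
    """Toy stand-in for a 3D model: a set of point identifiers (an assumption)."""
    points: set = field(default_factory=set)

    def submit(self, photo_points, min_overlap=3):
        """Validate a photo and return the points awarded (0 if rejected).

        min_overlap is a hypothetical threshold, not a value from the project.
        """
        photo_points = set(photo_points)
        # Validation: the photo must share enough points with what is
        # already in the model, so it can be tied back to the seed photos.
        overlap = photo_points & self.points
        if len(overlap) < min_overlap:
            return 0
        # Scoring: reward only what the photo adds to the reconstruction.
        new_points = photo_points - self.points
        self.points |= new_points
        return len(new_points)


# Researchers seed the model; players then upload photos.
model = Reconstruction(points=set(range(10)))
print(model.submit(range(5, 15)))     # overlaps the seed and adds 5 new points
print(model.submit(range(100, 110)))  # no overlap, so rejected with 0 points
print(model.submit(range(5, 15)))     # valid but adds nothing new, so 0 points
```

In this sketch, a photo of an unrelated building earns nothing because it cannot be matched to the model, and a photo from an already well-covered angle earns nothing because it adds no new points. Only photos that extend the model from its current edges are rewarded.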
The design of PhotoCity solved two problems that often arise in distributed data collection: data validation and sampling. First, photos were validated by comparing them against previous photos, which were in turn compared with previous photos all the way back to the seed photos that were uploaded by researchers. In other words, because of this built-in redundancy, it was very difficult for someone to upload a photo of the wrong building, either accidentally or intentionally. This design feature meant that the system protected itself against bad data. Second, the scoring system naturally trained participants to collect the most valuable—not the most convenient—data. In fact, here are some of the strategies that players described using in order to earn more points, which is equivalent to collecting more valuable data (Tuite et al. 2011):
- “[I tried to] approximate the time of day and the lighting that some pictures were taken; this would help prevent rejection by the game. With that said, cloudy days were the best by far when dealing with corners because less contrast helped the game figure out the geometry from my pictures.”
- “When it was sunny, I utilized my camera’s anti-shake features to allow myself to take photos while walking around a particular zone. This allowed me to take crisp photos while not having to stop my stride. Also bonus: less people stared at me!”
- “Taking many pictures of one building with 5 megapixel camera, then coming home to submit, sometimes up to 5 gigs on a weekend shoot, was primary photo capture strategy. Organizing photos on external hard drive folders by campus region, building, then face of building provided good hierarchy to structure uploads.”
These statements show that when participants are provided with appropriate feedback, they can become quite expert at collecting data of interest to researchers.
Overall, the PhotoCity project shows that sampling and data quality are not insurmountable problems in distributed data collection. Further, it shows that distributed data collection projects are not limited to tasks that people are already doing, such as watching birds. With the right design, volunteers can be encouraged to do other things too.