Back in July, I noted on this blog how the Guardian newspaper was using crowdsourcing to analyze the mountain of documents released in relation to the UK House of Commons MPs’ expenses scandal. Since the documents were generally image scans of expense claim forms, with handwritten data and receipts, machine interpretation was impossible. If needles were going to be found in this haystack, someone would have to look at each page. Rather than have Guardian employees do this, the crowdsourcing experiment asked Guardian readers to do the scanning. Now that the exercise has run for a while, it’s good to check back in on how it’s working and what lessons there are for similar exercises in future.
(Whether this experiment is the unqualified success some observers make it out to be is an open question. It seems to me that the process has stalled halfway through and has resulted in not much more than the Guardian having to publish an apology for printing an erroneous observation from the crowd. I’ll return to the question of the results of the exercise; for now, here’s the link to the output data and my quick analysis of the user data.)
Nieman Lab recently wrote about four crowdsourcing lessons learned from the experiment, drawn from a conversation with the developer, Simon Willison. Here are the four big lessons:
- Your workers are unpaid, so make it fun. The labor would come from Guardian readers, so the developers made it simple and made it feel like a game. The four-panel interface (“interesting,” “not interesting,” “interesting but known,” and “investigate this!”) made categorization easy. Adding mugshots of each MP to their pages in the database helped make it personal for the contributors. Also, posting lists of the top-performing volunteers (see quick analysis of the user data, and note the long tails) recognized high-producing contributors and introduced some competition between them.
- Public attention is fickle, so launch immediately. For the same reason, don’t expect to ride the wave for long (n.b.: the exercise has bogged down in recent months).
- Speed is mandatory, so use a framework. The project was built on Django, one of the frameworks we’re investigating for the Digital Fishers project.
- Participation will come in one big burst, so have servers ready. The project rented server space on EC2, Amazon’s contract-hosting service.
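To make the first lesson a little more concrete, here is a minimal sketch of how four-way page classifications and a volunteer leaderboard could be tallied. This is not the Guardian's code: only the four labels come from the interface described above, while the `(reviewer, page_id, label)` record shape and the `tally` and `leaderboard` functions are assumptions made up for illustration.

```python
from collections import Counter

# The four labels named in the post; everything else here is hypothetical.
LABELS = (
    "interesting",
    "not interesting",
    "interesting but known",
    "investigate this!",
)


def tally(reviews):
    """Count labels per page and reviews per volunteer.

    `reviews` is an iterable of (reviewer, page_id, label) tuples,
    an assumed record shape, not the project's real schema.
    """
    per_page = {}
    per_reviewer = Counter()
    for reviewer, page_id, label in reviews:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        per_page.setdefault(page_id, Counter())[label] += 1
        per_reviewer[reviewer] += 1
    return per_page, per_reviewer


def leaderboard(per_reviewer, top=10):
    """Return the `top` most active volunteers as (name, count) pairs."""
    return per_reviewer.most_common(top)


if __name__ == "__main__":
    sample = [
        ("ann", 1, "interesting"),
        ("bob", 1, "investigate this!"),
        ("ann", 2, "not interesting"),
    ]
    pages, volunteers = tally(sample)
    print(leaderboard(volunteers, top=1))
```

Keeping per-page counts rather than a single verdict means that pages several readers flagged with “investigate this!” can be surfaced first, and the per-volunteer counter is all a public leaderboard needs.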
Here’s Simon Willison, the project lead, talking at a News Innovation Unconference about how he and the rest of the team got the crowdsourcing off the ground in a week. Posted by Julie Starr on October 27, 2009