Data from the Deep, Judgment from the Crowds

The Neptune Canada project at the University of Victoria recently received a $1M funding award from CANARIE Inc., Canada’s Advanced Research and Innovation Network, in response to its “Data from the Deep, Judgment from the Crowd” proposal. is a partner in this project, leading the “Digital Fishers” crowdsourcing component under the direction of UVic’s Centre for Global Studies Senior Associate Dr. Rod Dobell. (a division of Whitehall Policy Inc.) brings leading-edge depth of experience in deploying Web2.0 principles and technologies in corporate, academic, civil society and civic society environments to facilitate social networking, data capture and collaborative knowledge creation.

The Neptune Canada project centres on the construction of the world’s largest cabled seafloor observatory off the west coast of Vancouver Island, British Columbia. The network, which extends across the Juan de Fuca plate, will gather live data from a rich constellation of instruments deployed in a broad spectrum of undersea environments. Data will be transmitted via high-speed fibre optic communications from the seafloor to an innovative data archival system at the University of Victoria. This system will provide free Internet access to an immense wealth of data, both live and archived, throughout the life of this planned 25-year project.

The Digital Fishers component of the “Data from the Deep, Judgment from the Crowd” proposal applies crowdsourcing / Web2.0 citizen-science techniques (more on this distinction to follow) to a specific problem: how to assess the large volume of visual and audio data that will stream in from the seafloor observatory. These data are still largely undecipherable by current machine computation methods, yet asking the highly skilled members of the Venus / Neptune science cadre to review them would waste their time on repetitive tasks that require very little training. We take our inspiration from previous crowdsourcing / citizen science exercises such as NASA Clickworkers.

NASA Clickworkers (the original server is now offline, but an archived copy is available courtesy of the Wayback Machine) was an experiment to see if public volunteers acting as citizen scientists, each working for a few minutes here and there, could do routine science analysis that would normally be done by a fully-trained scientist or graduate student. Users were asked to mark craters on maps of Mars, classify craters that had already been marked, or search the Mars landscape for “honeycomb” terrain. In its first six months of operation, more than 85,000 users visited the site with many contributing to the effort, making more than 1.9 million entries. An analysis of the quality of markings showed “that the automatically-computed consensus of a large number of clickworkers is virtually indistinguishable from the inputs of a geologist with years of experience in identifying Mars craters.”

The Clickworkers project was a particularly clear example of how a complex professional task that requires a number of highly trained individuals on full-time salaries can be reorganized so as to be performed by tens of thousands of volunteers, in increments so small and simple that the work can be done on a much lower budget. That budget would be devoted to coordinating the volunteer effort, while the raw human capital needed would be contributed for free. (ref: Kanefsky, B., N. G. Barlow and V. C. Gulick. 2001. “Can distributed volunteers accomplish massive data analysis tasks?” Lunar and Planetary Science XXXII.)
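
To make the idea of an “automatically-computed consensus” concrete, here is a minimal sketch in Python of one way many small volunteer entries could be combined into a single answer per item. This is not the method used by Clickworkers or planned for Digital Fishers; the function names, the clip identifiers, the labels and the thresholds (`min_votes`, `min_agreement`) are all assumptions made for illustration, and simple majority voting stands in for whatever consensus rule a real project would adopt.

```python
from collections import Counter


def consensus_label(votes, min_votes=3, min_agreement=0.6):
    """Return the majority label for one item, or None if there are too few
    votes or the majority is not strong enough (thresholds are illustrative)."""
    if len(votes) < min_votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None


def consensus_by_item(entries, **thresholds):
    """Group (item_id, label) volunteer entries by item and compute a
    consensus label for each item."""
    grouped = {}
    for item_id, label in entries:
        grouped.setdefault(item_id, []).append(label)
    return {
        item_id: consensus_label(votes, **thresholds)
        for item_id, votes in grouped.items()
    }


# Hypothetical volunteer entries: each one is a few seconds of work.
entries = [
    ("clip-001", "fish"), ("clip-001", "fish"), ("clip-001", "crab"),
    ("clip-002", "fish"), ("clip-002", "nothing"),
    ("clip-003", "crab"), ("clip-003", "fish"),
    ("clip-003", "nothing"), ("clip-003", "crab"),
]

result = consensus_by_item(entries)
for item_id, label in result.items():
    print(item_id, "->", label if label is not None else "no consensus yet")
```

Items that come back without a consensus would simply be shown to more volunteers, or passed to a trained scientist, which is where the scarce expert time would be best spent.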

Much of the data collected through the NEPTUNE seafloor array will be reported as numerical observations that are best analyzed through machine computational methods (e.g., conductivity, temperature, depth, current meters, bottom pressure sensors, chemical and gas sensors for measuring carbon dioxide, oxygen, methane, nitrates, etc. – the list is extensive). However, where inputs are not-so-easily decipherable by traditional computer analytical methods (e.g., camera imagery, full-motion video and audio signals), two alternative approaches are distinguished:

  • Development of software agents that can learn to interpret these data, and
  • Applying human intelligence and reasoning directly through human-based computation, analysis and observation.

In both these instances, the common approach to analyzing these data has been to assign trained personnel to accomplish these tasks – whether to assess the data or problem directly, or – where feasible – to provide rules and vocabulary through iterative interpretations in order to increase the accuracy of software agents. This approach – employing highly-skilled personnel to undertake routine data analysis and software training tasks – can represent an inefficient use of scarce and valuable resources if the tasks are particularly simple and numerous. Our approach is to use crowdsourcing – also, in this context, referred to as Web2.0 citizen science – in order to apply volunteer labour as a first pass effort. A related outcome is extended public engagement in ocean and marine sciences and the work of the Neptune project.

Crowdsourcing is a term coined in 2006 to describe the process of taking a task traditionally performed by an employee and allocating it to a large and dispersed set of volunteers, using the Internet as the medium for communicating the request for volunteers, allocating the task, and collecting the results. Citizen science describes scientific projects or programs in which volunteers with little or no training perform tasks such as observation, measurement or computation. Volunteer crowdsourcing examples usually include tasks that:

  • consist of a large number of discrete, simple human-based computations.
  • require very little time on the part of the volunteer to learn how to complete the task, and actually complete one instance of the task.
  • give the volunteer a sense of accomplishment and of having contributed to a large complex project through a very simple, short interaction.

This component envisions the use of crowdsourcing to engage large numbers of volunteers (referred to as “digital fishers” in this project) in a feedback loop that runs from the database, through the collective mind of the crowd, and back to the database, adding content and value in support of the scientific community. As applied to large data-sets, crowdsourcing rests on the presumption that computers should do what computers are good at (e.g., storing, indexing, comparing) and people should do what people are good at (e.g., querying, deciding, thinking). Can this approach be effectively applied to those elements in a high-volume data stream that are not optimally amenable to machine computation, or that require an iterative process of training autonomous software agents, but are of such scale that it is resource-prohibitive to employ enough qualified people to do so?

This question is aimed at assessing the feasibility and value of crowdsourcing in the interpretation of a high-volume data streaming exercise like NEPTUNE. The goal is to improve the scientific content drawn from the data and increase the value of the database by engaging many volunteers to scan the raw data flow, interpret and tag segments, and validate and help refine the annotations of software agents designed to accomplish this task autonomously. This approach would seek to tap the cognitive surplus of large numbers of dispersed volunteers to improve the value of the data to the scientific community and, subsequently, the quality of the evidence provided by the scientific community as a basis for public deliberation. A parallel objective is to use the social networking activities that are central to a successful crowdsourcing strategy to build interest in and awareness of the VENUS / NEPTUNE Canada cabled sea-floor observatories.

In order to investigate and implement the application of crowdsourcing in this project, will provide leadership on:

  • Social networking strategies and software development in support of software interfaces and middleware that apply crowdsourcing techniques to the VENUS / NEPTUNE Canada datastream, in order to collate metadata consisting of interpretations and tags.
  • Social networking strategies and software specifications for applying crowdsourcing to verify autonomous software annotation of data, and to iteratively train software to improve future accuracy.
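
As a rough illustration of the second item, the sketch below (in Python) shows one way crowd judgments could be used to check a software agent's annotations and to collect corrections for retraining it. It is only a sketch under stated assumptions: the function `review_agent_annotations`, the clip identifiers and the labels are hypothetical, and the crowd's consensus for each item is assumed to have been computed already (with `None` meaning the crowd has not yet agreed).

```python
def review_agent_annotations(agent_labels, crowd_consensus):
    """Compare a software agent's label for each item with the crowd's
    consensus label and sort the items into three groups."""
    confirmed, corrections, unresolved = [], [], []
    for item_id, agent_label in agent_labels.items():
        crowd_label = crowd_consensus.get(item_id)
        if crowd_label is None:
            # No crowd verdict yet: neither confirms nor corrects the agent.
            unresolved.append(item_id)
        elif crowd_label == agent_label:
            confirmed.append(item_id)
        else:
            # Disagreement: keep both labels so the pair can be used
            # as a corrected example when retraining the agent.
            corrections.append((item_id, agent_label, crowd_label))
    checked = len(confirmed) + len(corrections)
    agreement = len(confirmed) / checked if checked else None
    return {
        "confirmed": confirmed,
        "corrections": corrections,
        "unresolved": unresolved,
        "agreement": agreement,
    }


# Hypothetical inputs: labels proposed by the agent, and crowd consensus.
agent_labels = {
    "clip-001": "fish",
    "clip-002": "crab",
    "clip-003": "fish",
    "clip-004": "nothing",
}
crowd_consensus = {"clip-001": "fish", "clip-002": "fish", "clip-003": None}

report = review_agent_annotations(agent_labels, crowd_consensus)
print("agreement rate:", report["agreement"])
print("corrections for retraining:", report["corrections"])
print("still needs volunteers:", report["unresolved"])
```

Tracking the agreement rate over successive rounds would give a simple indication of whether the iterative training is improving the software's accuracy.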
