Rescuing data from the dark

Geoscientists collected data for hundreds of years before the rise of the Internet. A wealth of this so-called “dark data” remains to be digitized and made accessible online.

Credit: courtesy of USGS

Along with the proliferation of techniques and technologies to deal with Big Data — the large volumes of data coming in from global sensors and satellites that can require supercomputers to crunch — geoscientists are also addressing the collection and integration of what could be termed small (or mainstream) data.

“Part of the challenge is also in extracting maximum value from the so-called ‘long tail’ of scientific data, or data that is produced by individual researchers and/or small groups,” stated Chaitanya Baru, director of the advanced cyberinfrastructure development group at the San Diego Supercomputer Center, who co-edited one of the first textbooks in the field, “Geoinformatics: Cyberinfrastructure for the Solid Earth Sciences,” published in June 2011.

However, many of these data, especially the older ones, are stored on obsolete media and are not accessible online. Recovering these so-called “dark” data and digitizing them (for example, by digging through filing cabinets and old boxes of paper reports and keying in data) can be time-consuming and laborious, and funding is needed to do the work.

To help researchers overcome those hurdles, Integrated Earth Data Applications, a community-based data facility at Columbia University’s Lamont-Doherty Earth Observatory that operates EarthChem and the Marine Geoscience Data System, joined with scientific publisher Elsevier to launch two awards in 2013: a set of four variable awards of up to $7,000 each and the International Data Rescue Award in the Geosciences, a $5,000 prize.

“We’re hoping this will encourage efforts to save data and show its potential for future research,” says Kerstin Lehnert, director of Integrated Earth Data Applications.

The committee is seeking researchers or groups who have digitized content that has “formerly been available in only analog or obsolete electronic formats … or developed data standards, tools and processes that facilitate [uploading] data into sustained, openly accessible community databases.”

Entries are being accepted through Oct. 13 and the prize will be awarded at the annual meeting of the American Geophysical Union in San Francisco in December.

Sara E. Pratt
Wednesday, August 14, 2013 - 11:30