Friday, March 06, 2015  

Brown Dog - Bringing Long-Tail Data Into the Light

Brown Dog seeks to develop a service that will make past and present un-curated data accessible and useful to scientists and social scientists, while also demonstrating the novel science and scholarship that can be conducted from such data. Brown Dog will not attempt to construct a single piece of software that magically understands all data. Instead, it will use every possible source of automatable help already in existence, in a robust and provenance-preserving manner, to create a service that can deal with as much of this data as possible. Brown Dog is the proverbial “super mutt” of software, serving as a low-level data infrastructure to interface with digital data content across the web and enabling a new era of science and applications at large. The broader impact of this work is in its potential to serve not just the scientific community but the general public as a “DNS for data”, moving civilization towards an era where a user’s access to data is not limited by a file’s format or by un-curated collections.

Brown Dog is part of the DataNet Partners program funded by NSF beginning in 2008. DataNet was conceived to address the increasingly digital and data-intensive nature of science and engineering research and education. Digital data are not only the output of research but also provide input to new hypotheses, enabling new scientific insights and driving innovation. Therein lies one of the major challenges of this scientific generation: how to develop the new methods, management structures, and technologies to manage the diversity, size, and complexity of current and future data sets and data streams. DataNet addresses that challenge by creating a set of exemplar national and global data research infrastructure organizations (dubbed DataNet Partners) that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning.

Brown Dog is, more specifically, part of a follow-on effort called DIBBs (Data Infrastructure Building Blocks), focused on building software to support the DataNet efforts. DIBBs projects target software cyberinfrastructure, that is, stuff lots of people can use. All of the DIBBs projects are meant to provide complementary services, each building on the others' capabilities.

This will be added to the tools section of the Research Resources Subject Tracer™ Information Blog.

