Yesterday, the open access publisher Public Library of Science announced a change to its data sharing requirements. Previously, anyone publishing in one of its journals (including PLoS One, the largest scientific journal around) implicitly agreed to make the data that they used in the paper available to other researchers, which typically meant that the other researchers had to make a formal request for it. From now on, however, the PLoS journals will require authors to sign a data availability statement that guarantees that all the data used in a paper is publicly accessible to anyone at the moment the paper goes live.
That includes things like images, DNA sequence reads, raw cell counts, and so forth. The publisher suggests three ways that researchers can meet the requirements. If the underlying data (like cell counts) is numerical, it can simply be published in a table in the paper itself. If it's a bit larger, researchers can compress it and make the archive a supplement to the paper, which PLoS will host on its servers. If it's larger still, researchers should look to a third-party service; hosting it on an institutional server would also be an option.
PLoS accepts that this won't work in some cases, as confidentiality is required for patient data, and some researchers rely on third parties for data. These exceptions, however, should be just that: exceptional. The vast majority of data should be subject to the new rules.
And the new rules are significant. Formal requests for data can sometimes get lost in spam filters or be put aside for weeks, even if the person who has the data is happy to share it (which is not always the case). By shifting the default state to one where anyone with a Web browser can grab whatever data they'd like, a lot of the friction that slows down the spread of scientific data should be eliminated.