Show me the data!

This post was originally published on the LSE Impact blog, where the Managing Editor kindly invited me to write on this theme. It’s a widely read platform and I hope it inspires some academics to upload more of their work for everyone to read and use.

Recently I tried to explain on Twitter, in a few tweets, how everyone can take easy steps towards open scholarship with their own work. It’s really not that hard and potentially very beneficial for your own career progress – open practices enable people to read & re-use your work, rather than letting it gather dust unread and undiscovered in a limited access venue, as is traditional. For clarity I’ve rewritten the ethos of those tweets below:

Step 1: before submitting to a journal or peer-review service upload your manuscript to a public preprint server

Step 2: after your research is accepted for publication, deposit all the outputs – full-text, data & code in subject or institutional repositories

The above is the concise form of it, but as with everything in life the devil is in the detail and there is much to explain, so I will elaborate upon these steps in this post.

Step 1: Preprints

Uploading a preprint before submission is technically very easy to do – it takes just a few clicks – but the barrier that prevents many from doing this in practice is cultural and psychological. In disciplines like physics it’s completely normal to upload preprints to arXiv, and their submission to a journal in some cases has more to do with satisfying the requirements of the Research Excellence Framework exercise than any real desire to see the work in a journal. Many preprints on arXiv get cited and are valued scientific contributions, even without ever being published in a journal. That said, even within this community author perceptions differ as to exactly when in the publication cycle to upload a preprint.

Within biology it’s relatively unheard of to upload a preprint before submission, but that’s likely to change this year because of an excellent, well-put article advocating their use in biology and the many different outlets now available for them. My own experience of this has been illuminating – I recently co-authored a paper openly on GitHub and the preprint was made available with a citable DOI via figshare. We’ve received a nice comment, more than 250 views and a citation from another preprint, all before our paper has been ‘published’ in the traditional sense. I hope this illustrates well how open practices really do accelerate progress.

This is not a one-off occurrence either. As with open access papers, freely accessible preprints have a clear citation advantage over traditional subscription access papers.


Outside of the natural sciences the situation is similar: Martin Fenner notes that in the social sciences (SSRN) and economics (RePEc) preprints are also common, either in this guise or as ‘working papers’ – the name may be different but the pre-submission accessibility is the same. Yet I suspect that, as in biology, this practice isn’t yet mainstream in the Arts & Humanities – perhaps it’s just a matter of time before this cultural shift occurs (more on this later on in the post…)?

There is one important caveat to mention with respect to posting preprints – a small minority of conservative, traditional journals will not accept articles that have been posted online prior to submission. You might well want to check SHERPA/RoMEO before you upload your preprint to ensure that your preferred destination journal accepts preprint submissions. There is a growing grass-roots effort to convince these journals that preprint submissions should be allowed, and some of these campaigns have already succeeded.

If even much-loathed publishers like Elsevier allow preprints unconditionally, I think it goes to show how uncontroversial preprints are. Prior to submission it’s your work and you can put it anywhere you wish.


Step 2: Postprints


Unlike with preprints, the postprint situation is a little trickier. Publishers like to think that they have the exclusive right to publish your peer-reviewed work. The exact terms will vary from journal to journal, depending on the copyright or licensing agreement you might have signed. Some publishers try to enforce ‘embargoes’ upon postprints, to maintain the artificial scarcity of your work and their monopoly of control over access to it. But rest assured, at some point, often just 12 months after publication, you’ll be ‘allowed’ to upload copies of your work to the public internet (again SHERPA/RoMEO gives excellent information with respect to this).

So, assuming you already have some form of research output(s) to show for your work, you’ll want these to be discoverable, readable and re-usable by others – after all, what’s the point of doing research if no-one knows about it? If you’ve invested a significant amount of time writing a publication, gathering data, or developing software, you want people to be able to read and use this output. All outputs are important, not just publications. If you’ve published a paper in a traditional subscription access journal, then most of the world can’t read it. But you can make a postprint of that work available, subject to the legal nonsense referred to above.

If it’s allowed, why don’t more people do it?

As with the cultural issues discussed for preprints, researchers on the whole don’t tend to use institutional repositories (IRs) to make their work more widely available. My IR at the University of Bath lists metadata for over 3300 published papers, yet for various reasons relatively few of those metadata records have a fulltext copy of the item deposited with them: just ~6.9% of records have fulltext deposits, according to figures published back in June 2011.

I think it’s because institutional repositories have an image problem: some are functional but extremely drab. I also hear of researchers full of disdain who say of their IRs (I paraphrase):

“Oh, that thing? Isn’t that just for theses & dissertations – you wouldn’t put proper research there”

All this is set to change though, as researchers are increasingly being mandated to deposit their fulltext outputs in IRs. One particularly noteworthy driver of change in this realm could be the newly-launched Zenodo service. Unlike for-profit operations such as ResearchGate, which are really just websites in many respects, Zenodo is a proper repository: it supports harvesting of content via the OAI-PMH protocol, all metadata about the content is CC0, and it’s a not-for-profit operation. Crucially, it provides a repository for academics less well-served by the existing repository systems – not all research institutions have a repository, and independent or retired scholars also need a discoverable place to put their postprints. I think the attractive, modern look and the altmetrics to demonstrate impact will also add that missing ‘sex appeal’ to provide the extra incentive to upload.
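For readers curious what ‘harvesting via OAI-PMH’ looks like in practice, here is a minimal sketch in Python. The protocol’s `ListRecords` verb and `oai_dc` metadata prefix are standard, but the endpoint URL, the function names and the sample response below are all made up for illustration; substitute the base URL that your own repository documents.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://repository.example.org/oai"

# XML namespaces used by OAI-PMH responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return the Dublin Core titles found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//oai:record//dc:title", NS)]


# A made-up, heavily trimmed response, just to show the shape of the XML.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example postprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(extract_titles(SAMPLE))
```

Because every compliant repository answers the same kind of request in the same XML format, a search engine can collect records from thousands of repositories with one piece of code, which is exactly what a plain website doesn’t offer.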


Providing Access to Your Published Research Data Benefits You

A new preprint on PeerJ shows that papers with associated open research data have a citation advantage. Furthermore, other research has shown that willingness to share research data is related to the strength of the evidence and the quality of the results. Traditional repository software was designed around handling metadata records and publications, and it doesn’t tend to be great at storing or visualizing research data. But a new development in this arena is the use of CKAN software for research data management. CKAN was originally developed by the Open Knowledge Foundation to help make open government data more discoverable and usable; the UK, the US, and other governments around the world now use this technology to make data available. Research institutions like the University of Lincoln are now using it for research data management too, and like Zenodo the interface is clean and modern and provides excellent discoverability.


Repositories are superior for enabling discovery of your work

Even though I use ResearchGate and similar sites myself, they’re not perfect solutions. If someone is looking for your papers, or a particular paper that you wrote, these websites do well at making your output discoverable from a simple Google search. But interestingly, for more complex queries, these simple websites don’t provide good discoverability.

An example: I have a fulltext copy of my Nature letter on one such site, and it can’t be found from Google Scholar – but the copy in my institutional repository at Bath can. This is the immense value of interoperable and open metadata. Academics would do well to think closely about how this affects the discoverability of their work online.

The technology for searching across repositories for freely accessible postprints isn’t as good as I’d want it to be. But repository search engines like BASE, CORE and Repository Search are improving day by day. Hopefully, one day we’ll have a working system where you can paste in a DOI and it’ll take you to a freely available postprint copy of the work; Jez Cope has an excellent demo of this.
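To make the ‘paste in a DOI’ idea concrete, here is a rough sketch of the lookup step in Python. It is not how any of the services named above work; the index, the DOI and the repository URL are invented for illustration, and a real system would build its index from harvested repository metadata.

```python
# Prefixes people commonly paste along with a DOI.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw):
    """Reduce a pasted DOI or DOI link to a bare, lower-case DOI."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    # DOIs are case-insensitive, so lower-casing is safe for matching.
    return doi.strip().lower()


def find_postprint(raw_doi, index):
    """Look up a freely available copy in an index keyed by normalised DOI.

    Returns None when no open copy is known.
    """
    return index.get(normalise_doi(raw_doi))


# Hypothetical index: keys must already be normalised DOIs.
INDEX = {
    "10.1234/example.5678": "https://repository.example.org/id/eprint/42",
}

if __name__ == "__main__":
    print(find_postprint("doi:10.1234/EXAMPLE.5678", INDEX))
    print(find_postprint("10.9999/missing", INDEX))
```

The hard part is not this lookup but filling the index, which depends on repositories exposing the DOI of the published version in their open metadata.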

Open scholarship is now open to all

So, if there aren’t any suitable fee-free journals in your subject area (1), you find you don’t have funds to publish a gold open access article (2), and you aren’t eligible for an OA fee waiver (3), fear not. With a combination of preprint & postprint postings, you too can make your research freely available online, even if it has the misfortune to be published in a traditional subscription access journal. Upload your work today!

This is a re-post of an article I originally wrote for the main Open Knowledge Foundation blog.

Recently Science Europe published a clear and concise position statement titled:
Principles on the Transition to Open Access to Research Publications

This is an extremely timely & important document that clarifies what governments and research funders should expect during the transition to open access. Unlike the recent US OSTP public access policy, which allows publishers to apply an access embargo of up to 12 months on publicly-funded research (to the disgust of some scientists like Michael Eisen), this new Science Europe statement makes clear that an embargo of at most 6 months should be accepted for publicly funded STEM research. The recent RCUK (UK research councils) open access policy also allows an embargo of at most 6 months, with some caveats.

But among the many excellent principles is a particularly bold and welcome proclamation:

the hybrid model, as currently defined and implemented by publishers, is not a working and viable pathway to Open Access. Any model for transition to Open Access supported by Science Europe Member Organisations must prevent ‘double dipping’ and increase cost transparency

Hybrid options are typically far more expensive than ‘pure’ open access journal costs, and they don’t typically aid transparency or the wider transition to open access.

The Open Knowledge Foundation heartily endorses these principles because, together with the above, they respect and reinforce the need for free access AND full re-use rights to scientific research.

About Science Europe:

Science Europe is an association of European Research Funding Organisations and Research Performing Organisations, based in Brussels. At present Science Europe comprises 51 Research Funding and Research Performing Organisations from 26 countries, representing around €30 billion per annum.

This article is cross-posted from the main Open Knowledge Foundation blog, where I occasionally post. I realise I haven’t had time to post here on my own blog for over a month now(!), so I may well copy across a few more posts I’ve written for OKF.

Here at the Open Knowledge Foundation, we know Open Science is tough, but ultimately rewarding. It requires courage & leadership to take the open path in science.

Nearly a week ago on the open-science mailing list we started putting together a list of established scientists who have in some way or another made significant contributions to open science or lent their esteemed reputation to calls for increased openness in science. Our open list now has over 130 notable scientists, among whom 88 are Nobel prize winners.

In an interesting parallel development, the White House has just put out a call to help identify “Open Science” Champions of Change — outstanding individuals, organizations, or research projects promoting and using open scientific data for the benefit of society.


Anyone can nominate an Open Science candidate for consideration by May 14, 2013.

What more proof do we need that open science is both good and valued in society? This marks a tremendous validation of the open science movement. The US government is not seeking to reward just any scientist; only open scientists actively working to change the world for the better will win this recognition.

We’re still a long way from Open Science being the norm in science. But perhaps now we’re a crucial step closer to widespread recognition that Open Science is good, and could be the norm in the future. We eagerly await the unveiling of the winning Open Science champions at the White House on 20th June later this year.