Show me the data!

Libre redistribution – a key facet of Open Access

May 28th, 2012 | Posted by rmounce in Content Mining | Open Access

I have previously commented on other blogs that, uniquely, BOAI-compliant Open Access literature can be redistributed however one wishes (provided proper attribution is given). I believe this to be hugely beneficial, and perhaps a rather under-appreciated facet of the many benefits offered by Open Access publishing.

Below is an expanded version of the comment I made on Cameron Neylon’s excellent blog Science in the Open on this very theme (and please do read Cameron’s post too for greater context):

Decentralized journal/article distribution is already happening.

I have 20,000+ PLoS articles on my computer right now. You can get them too – via BioTorrents. When compressed (as initially provided there) it’s less than 16 GB of files – a trivial amount for anyone with a broadband connection. I can now (and do!) take PLoS on a USB stick with me wherever I go, allowing me to do research on trains, on planes and in remote locations, completely hassle-free and without even an internet connection. It was easy to download too (pretty much one click) via my high-speed institutional connection, and it didn’t overload PLoS’s servers because I didn’t *get* the articles from their servers. With peer-to-peer file sharing the load is balanced between seeders (and in turn, I’m now seeding this torrent too, to help share the load). If all institutions and libraries agreed to help seed the world’s research literature, free of copyright restrictions on electronic redistribution, doing literature research would be pretty much frictionless! We could do this tomorrow if it weren’t for the legal copyright barriers imposed by most traditional subscription-access publishers. We could even get papers and data on campus much more quickly over the campus LAN rather than the internet.

Institutions already agree to help distribute code, e.g. R and its multitude of packages. This is hugely beneficial and helps share the costs associated with bandwidth, so why not do the same for research publications? The PLoS corpus is a great way to try out content mining ideas – it shows you how easy academic life *could* be if everything was Open Access. I’ve run some simple scripts on it myself. I’m not sure the simple things I did, such as string matching, could be classified as ‘text mining’, but one thing I do know is that it was 100,000 times easier and quicker doing this locally, machine-reading files, than doing it paper by paper, negotiating paywalls (where do I click, how many hoops do I have to jump through before I’m let in, what information are the ‘helpful’ tracking cookies keeping about me…) and getting cut off by publishers. It’s worth pointing out as well that once you have all the literature you need on your computer, you don’t even need the internet to do your research! For researchers in less economically developed countries with weaker telecoms infrastructure, I’d imagine this would be a real boon.
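
As an illustration of the kind of simple string matching described above, here is a minimal Python sketch. It assumes the corpus has been unpacked into a local directory of .xml files; the function name count_matches, the file pattern and any directory or search term passed to it are hypothetical, and this is not the script actually used for this post.

```python
from pathlib import Path


def count_matches(corpus_dir, term, pattern="*.xml"):
    """Return {file path: count} for files containing `term` (case-insensitive).

    corpus_dir and pattern are assumptions about how the corpus is stored
    locally; adjust them to match the files you actually have.
    """
    needle = term.lower()
    hits = {}
    # rglob walks the directory tree recursively, entirely offline
    for path in sorted(Path(corpus_dir).rglob(pattern)):
        # errors="ignore" drops any bytes that are not valid UTF-8
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        n = text.count(needle)
        if n:
            hits[str(path)] = n
    return hits
```

Calling count_matches on a local copy of the corpus with a term of interest returns only the files that mention it, with a count for each. Nothing in it touches the network, which is the whole point: the search runs over files already on disk.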

It’s a window on the world that *could* be possible if we just changed our attitude to copyright and research publishing. It is the use of the Creative Commons Attribution Licence by PLoS, BMC and other Open Access publishers that makes this all possible.

I predict that the rights to electronically redistribute and machine-read research will be vital for 21st-century research, yet currently we academics often, wittingly or otherwise, relinquish these rights to publishers. This has got to stop. The world is networked, so scholarly literature should move with the times and be openly networked too.

In short, I think research would be a whole lot easier to do, and ultimately (all things considered) more cost-effective, if all future publicly-funded research could be made BOAI-compliant Open Access. This is just my opinion, and you are welcome to disagree in the comments section below. I sincerely hope I don’t sound like an Open Access ‘zealot’, for this is certainly not my intention.