Show me the data!

Libre redistribution – a key facet of Open Access

May 28th, 2012 | Posted by rmounce in Content Mining | Open Access

I have previously commented on other blogs that, uniquely, BOAI-compliant Open Access literature can be redistributed however one wishes (provided proper attribution is given). I believe this to be hugely beneficial, and a rather under-appreciated facet of the many benefits offered by Open Access publishing.

Below is an expanded version of the comment I made on Cameron Neylon’s excellent blog Science in the Open on this very theme (and please do read Cameron’s post too for greater context):

Decentralized journal/article distribution is already happening.

I have 20,000+ PLoS articles on my computer right now. You can get them too – via BioTorrents. When compressed (as initially provided there) it’s less than 16 GB of files – a trivial amount for anyone with a broadband connection. I can now (and do!) take PLoS on a USB stick with me wherever I go, allowing me to do research on trains, planes, and in remote locations completely hassle-free, without even an internet connection. It was easy to download too (pretty much 1-click) via my high-speed institutional connection – and it didn’t overload PLoS’s servers because I didn’t *get* the articles from their servers. With peer-to-peer file sharing the load is balanced between seeders (and in turn, I’m now seeding this torrent too, to help share the load). If all institutions/libraries agreed to help seed the world’s research literature, without copyright restriction on electronic redistribution, doing literature research would be pretty much frictionless! We could do this tomorrow if it weren’t for the legal copyright barriers imposed by most traditional subscription-access publishers. We could even get papers & data on campus much quicker over the campus LAN rather than the internet.

Institutions already agree to help distribute code, e.g. R and its multitude of packages – this is hugely beneficial, and helps share the costs associated with bandwidth – so why not for research publications? The PLoS corpus is a great way to try out content mining ideas – it shows you how easy academic life *could* be if everything was Open Access. I’ve run some simple scripts on it myself. I’m not sure the simple things I did, such as string matching, could be classified as ‘text mining’ – but one thing I do know is that it was 100,000 times easier/quicker doing this locally, machine-reading files, rather than doing it paper by paper, negotiating paywalls (where do I click, how many hoops do I have to jump through before I’m let in, what information are the ‘helpful’ tracking cookies keeping about me…) and getting cut off by publishers. It’s worth pointing out as well that once you have all the literature you need on your computer, you don’t even need the internet to do your research! For researchers in less economically developed countries, with weaker telecoms infrastructure, I’d imagine this would be a real boon.
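To give a flavour of the kind of simple string matching I mean, here is a minimal sketch in Python. It is not the script I actually ran: the function name, the `plos_corpus` folder and the assumption that the articles sit on disk as `.xml` files are all hypothetical, chosen purely for illustration.

```python
from pathlib import Path


def find_articles_mentioning(corpus_dir, phrase, pattern="*.xml"):
    """Return sorted paths of files under corpus_dir whose text contains phrase.

    Matching is a plain case-insensitive substring test, nothing smarter.
    """
    needle = phrase.lower()
    hits = []
    for path in Path(corpus_dir).rglob(pattern):
        if not path.is_file():
            continue
        # errors="ignore" skips undecodable bytes rather than aborting the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical location of the unpacked corpus; adjust to wherever yours lives.
    for hit in find_articles_mentioning("plos_corpus", "phylogeny"):
        print(hit)
```

Everything happens on the local disk, so there are no paywalls, cookies or rate limits to negotiate, and it works just as well on a train with no connection.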

It’s a window on the world that *could* be possible if we just changed our attitude with respect to copyright and research publishing. That PLoS, BMC and other Open Access publishers use the Creative Commons Attribution Licence makes this all possible.

I predict that the rights to electronically redistribute and machine-read research will be vital for 21st century research – yet currently we academics often, wittingly or otherwise, relinquish these rights to publishers. This has got to stop. The world is networked, so scholarly literature should move with the times and be openly networked too.

In short, I think research would be a whole lot easier to do, and ultimately (all things considered) more cost-effective, if all future publicly-funded research could be made BOAI-compliant Open Access. This is just my opinion – you are welcome to disagree in the comments section below. I sincerely hope I don’t sound like an Open Access ‘zealot’, for this is certainly not my intention.

The Panton Fellowship – April (month 1)

May 4th, 2012 | Posted by rmounce in Panton Fellowship updates

If you haven’t heard yet – I was successful in my Panton Fellowship proposal

I wasn’t the only successful applicant either – huge congratulations to Sophie Kershaw on her excellent proposal to train doctoral students in how to do Open Science at Oxford University. We’ll be working together on shared goals throughout the year, I suspect.

As part of the Fellowship process I’ll be making monthly short reports on progress and more lengthy quarterly reports.

So without further ado, here’s what I’ve been getting up to in April:

  • For the main component of my proposal – extracting phylogenetic data from PDFs – I’ve spent the month getting up to speed, with the expert guidance of PMR. I even spent a whole day (16th Apr) in Cambridge with PMR working on this. Things are coming on in leaps and bounds.
  • Visited Digital Science HQ in King’s Cross to have a chat with them about all the exciting web technology they’re working on.
  • Successfully arranged for the Open Knowledge Foundation to have a stall, and possibly a talk, at the upcoming Progressive Palaeontology academic conference in Cambridge later this month.
  • Raised transparency and Open Data issues at the Systematics Association council meeting. As a result of this, we will soon upload our official constitution to our website to make it crystal clear what our guiding principles are. Additionally, council members unanimously agreed in principle that we should try to make the data underlying our future Systematics Association special volume publications Open Data online somewhere, somehow – but we need to get feedback and agreement from our publisher, Cambridge University Press, before we proceed further with this.
  • Sophie Kershaw and I agreed a strategy for our OKFest plans and, with the excellent help of Laura Newman, submitted a talk session proposal for the OKfestival, Helsinki later this year.
  • Attended the OKFN London Open Science hackday; further details on that are in my previous blog post.


And of course this is all concurrent with my ‘regular’ PhD work, which included two manuscripts currently being prepared, 3 conference abstract submissions (and associated work to actually have something to write about!), undergrad demonstrating work and all the other day-to-day stuff.

I even had time for a small holiday over the long Bank Holiday weekend, to St Austell to see The Lost Gardens of Heligan & The Eden Project amongst other things.

It’s been a busy month!


PS I’ve been enjoying the new HTML classes on Codecademy. Below I’m going to see if some of these new HTML tricks work in WordPress:

This box should have rounded corners
This box should have a black shadow

I can guess the number you are thinking of

Follow the rules and then hover over the card below

  1. Think of a number below 10
  2. Double the number you have
  3. Add 6
  4. Divide it by 2
  5. Subtract the original number from your answer
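For the curious, the arithmetic behind the five steps can be checked with a short Python sketch. The function name `number_trick` is my own invention for illustration; it simply applies steps 2 to 5 to whatever number you start with.

```python
def number_trick(n):
    """Apply steps 2 to 5 of the trick to n and return the final answer."""
    doubled = n * 2           # step 2: double the number
    plus_six = doubled + 6    # step 3: add 6
    halved = plus_six // 2    # step 4: divide by 2 (2n + 6 is always even, so this is exact)
    return halved - n         # step 5: subtract the original number


# Algebraically, (2n + 6) / 2 - n = n + 3 - n = 3, whatever n is.
results = {n: number_trick(n) for n in range(10)}
print(results)
```

Every starting number ends up at 3, which is why the answer can be "guessed" without knowing the number you thought of.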

Reflections on the OKFN Open Science hackday

April 1st, 2012 | Posted by rmounce in Open Data

Yesterday, I dragged myself out of bed (it was a Saturday!) to go to my first ever ‘hackathon’. Thankfully it was a lot less geeky than it sounds – just a cosy little get-together of people interested in Open Science, to work on things in a shared public space.

Nick Stenning, Stefan Wehrmeyer, Jenny Molloy, Caspar Addyman and I were all beavering away on our laptops at the Barbican Centre, later joined by surprise guest Todd Vision (Dryad & UNC) in the afternoon. We also had online participation from afar via Etherpad & IRC, including Rufus Pollock giving me a few pointers on PDF image extraction tools and James Casbon working on notebook.js.

You can see a record of all the things we worked on here on the official Etherpad for the event.

I have to say, I didn’t make all that much progress on my tasks for the day, owing to a variety of n00by errors. The tools I wanted to use were rather large to download, particularly the Eclipse IDE, which took a fair while to get over the public WiFi we were using. I was also using a small netbook. This is handy for my regular train journeys between Bath & London but not so useful when you need several windows open simultaneously, e.g. IRC + PDF manual + terminal + browser. The 24″ desktop screens I usually work on have probably led me astray into such inefficient multi-window habits! By using a translucent dropdown terminal (Tilda) I saved on some window switching, but not enough to make things easy…

So for next time I’ve learnt:

1.) Bring a comfortably sized laptop. Unless you really know what you’re doing on the command-line, you’re gonna need screen real estate

2.) Download all the large files you’ll need before you go

3.) Consider bringing your own food, drink & snacks! I think I must have spent over £10 just on lunch there, and the canteen only had over-priced tuna sandwiches :/

All in all though, the session was great. There’s no substitute for meeting people IRL. There was time for excellent therapeutic #PhDchat with Jenny, tactical discussions with Todd on how to encourage more palaeontologists to publicly archive the data behind their research publications, and meeting other people in the Open Science community I’d never met before. As we discussed at the hackday – it’s not something we would do every weekend, but as a special event every now and again – it’s well worth going to!

Perhaps I might see YOU at the next one? All are welcome


Copyright in an Open Access World – a parody

March 27th, 2012 | Posted by rmounce in Open Access

This is a parody of a recent blog post over in Elsevier-land by David Tempest. If you haven’t read it yet, you really should – it’s an interesting insight into the mind of the DEPUTY DIRECTOR OF UNIVERSAL ACCESS (their caps-usage, not mine) at Elsevier.

Here’s my remix tribute post. Words in blue are my insertions, and strikethrough words are words I’ve chosen to delete because they don’t represent my opinion.

Copyright in an Open Access World

Copyright plays a significant vital role in the current world of publishing scientific, medical and technical content. It provides commercial publishers authors with a set of rights to enable them to utilize these their works to generate subscription access profits and to be recognized as the copyright holder creator of the work. Commercial publishers are empowered to act on behalf of their shareholders the author to use copyright transfer or exclusive license to copy, publish, and adapt works, whilst protecting their profit margins integrity. In this way, publishers are empowered to do various things on behalf of the author, for example to ensure that the article is paywalled widely disseminated, that all requests for the rights to re-use content are denied and provision of permissions are answered efficiently, and to ensure that the original is correctly attributed. Each month, Elsevier receives more than 10,000 rights and permissions requests for content – both books and journals – and we have developed sophisticated systems to deny facilitate these requests and make the process as awkward, daunting and untimely simple and timely as possible. We take this role very seriously.

The importance of protecting profit generating content

But what about copyright in an open access world? Does it make a difference that articles are being made available to all and should we be concerned? The answer is…well, yes and no.

To all intents and purposes, the fact that journal articles are being made available to all through open access, is a big threat to our current business model or to subscribers under the subscription model, should not really affect things. Issues can arise, however, as there is a common misperception [citation needed] that open access means anyone can do anything with an article – in fact, the rights in the content must still be understood and upheld.

In addition, from an editorial perspective, copyright does not prevent elements such as plagiarism, multiple submission and fraud in journal articles. and whilst is It does not actually help detect these elements, so it cannot acts as a protective measure to uphold the quality of journals.

Within open access publishing there seems to be no a dilemma over copyright: author’s should definitely and the three choices facing an author: retain copyright share it or transfer it. Elsevier believes that it remains a fundamental role of a commercial publisher to pretend to act on the author’s behalf, and by continuing to transfer copyright, we can ensnare ensure and uphold the copyrights of the authors and handle all subsequent toll access profits generated permission requests. If copyright is retained by the publisher, then this process remains with the publisher and, if it is shared, there is a greater risk that profit loss fraudulent use may occur, which is why we continue to advocate the transfer of copyright for our journals.

Clearing up the dangerous ‘confusion that threatens our excessive profit margins

Some believe that in an open access world these factors become blurred and journal articles are easier to copy and incorporate into other works – because it’s true! This is a good thing. Science is based on building on, reusing and openly criticising the published body of scientific knowledge – we need to be able to do this as frictionlessly as possible. For example, open access journals offer additional usage rights which help enable re-use may introduce some confusion in relation to copyright. These open access ‘factors may help the speed and progress of science threaten the rights of the author and make it difficult for publishers to make excessive profits from academic works enforce copyright policy. However, if it is clear where copyright lies through consistent application, the usage rights of the article in question become independent of the publishing model and work for both subscription and open access content.

Of course, one of the main issues with copyright in general is that it is often widely misunderstood and interpreted in a different way by each individual. A study published by JISC in 2005 investigated the level of understanding of researchers towards copyright. It found that from a pool of 355 respondents, 30% of researchers did not know who initially owned the copyright of their own research articles and a further 26% of the respondents indicated that they had a low interest in the copyright issues of their own research articles! Clearly, this continues to be one of the important roles a commercial publisher must embrace: ensuring that it is clear and easy to understand what cannot be done with toll access content.




But seriously. I hope this goes to show that it’s very easy to write and publish a very one-sided opinion and present it as authoritative on a website. I dread to think that anyone reads those Elsevier editorials uncritically.