Show me the data!

I have previously commented on other blogs that, uniquely, with BOAI-compliant Open Access literature one is able to redistribute research however one wishes (provided proper attribution is given). I believe this to be hugely beneficial, and perhaps a rather under-appreciated facet of the many benefits offered by Open Access publishing.

Below is an expanded version of the comment I made on Cameron Neylon’s excellent blog Science in the Open on this very theme (and please do read Cameron’s post too for greater context):

Decentralized journal/article distribution is already happening.

I have 20,000+ PLoS articles on my computer right now. You can get them too – via BioTorrents. When compressed (as initially provided there) it’s less than 16GB of files – a trivial amount for anyone with a broadband connection. I can now (and do!) take PLoS on a USB stick with me wherever I go, allowing me to do research on trains, planes, and in remote locations completely hassle-free, without even an internet connection. It was easy to download too (pretty much 1-click) via my high-speed institutional connection – and didn’t overload PLoS’s servers because I didn’t *get* the articles from their servers. With peer-to-peer file sharing the load is balanced between seeders (and in turn, I’m now seeding this torrent too, to help share the load). If all institutions/libraries agreed to help seed the world’s research literature, without copyright restriction on electronic redistribution (which we could do tomorrow if it weren’t for the legal copyright barriers imposed by most traditional subscription-access publishers), doing literature research would be pretty much frictionless! We could even get papers & data on campus much quicker over the campus LAN rather than the internet.

Institutions already agree to help distribute code, e.g. R and its multitude of packages – this is hugely beneficial, and helps share the costs associated with bandwidth – so why not for research publications? The PLoS corpus is a great way to try out content mining ideas – it shows you how easy academic life *could* be if everything was Open Access. I’ve run some simple scripts on it myself. I’m not sure the simple things I did, such as string matching, could be classified as ‘text mining’ – but one thing I do know is that it was 100,000 times easier/quicker doing this locally, machine-reading files, rather than doing it paper by paper, negotiating paywalls (where do I click, how many hoops do I have to jump through before I’m let in, what information are the ‘helpful’ tracking cookies keeping about me…) and getting cut off by publishers. It’s worth pointing out as well that once you have all the literature you need on your computer, you don’t even need the internet to do your research! For research in less economically developed countries with weaker telecoms infrastructure, I’d imagine this would be a real boon.
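To give a flavour of how little code simple string matching over a local corpus needs, here is a minimal Python sketch. It is not one of the scripts mentioned above: the function name, the `*.xml` file pattern and the two-file stand-in corpus are all assumptions made purely for illustration, and a real copy of the corpus may be laid out differently.

```python
import tempfile
from pathlib import Path


def count_matching_files(corpus_dir, keyword, pattern="*.xml"):
    """Return (matches, total) for files under corpus_dir containing keyword.

    The match is a plain case-insensitive substring test, nothing cleverer.
    """
    keyword = keyword.lower()
    matches = 0
    total = 0
    for path in Path(corpus_dir).rglob(pattern):
        total += 1
        # errors="ignore" so one oddly encoded file does not stop the run
        text = path.read_text(encoding="utf-8", errors="ignore")
        if keyword in text.lower():
            matches += 1
    return matches, total


# Tiny stand-in corpus (hypothetical content) so the sketch runs anywhere;
# point corpus_dir at a real local copy of the articles instead.
with tempfile.TemporaryDirectory() as corpus_dir:
    Path(corpus_dir, "a.xml").write_text("<p>A phylogenetic analysis</p>", encoding="utf-8")
    Path(corpus_dir, "b.xml").write_text("<p>Protein folding</p>", encoding="utf-8")
    demo_result = count_matching_files(corpus_dir, "phylogen")

print(demo_result)  # (number of matching files, number of files scanned)
```

Because everything is read from local disk, there are no paywalls, cookies or rate limits anywhere in the loop.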

It’s a window on the world that *could* be possible if we just changed our attitude with respect to copyright and research publishing. That PLoS, BMC and other Open Access publishers use the Creative Commons Attribution Licence makes this all possible.

I predict that the rights to electronically redistribute, and machine-read research will be vital for 21st century research – yet currently we academics often wittingly or otherwise relinquish these rights to publishers. This has got to stop. The world is networked, thus scholarly literature should move with the times and be openly networked too.

In short, I think research would be a whole lot easier to do, and ultimately (all things considered) be more cost-effective, if all future publicly-funded research could be made BOAI-compliant Open Access. This is just my opinion – you are welcome to disagree in the comments section below. I sincerely hope I don’t sound like an Open Access ‘zealot’, for this is certainly not my intention.

If you haven’t heard yet – I was successful in my Panton Fellowship proposal.

I wasn’t the only successful applicant either – huge congratulations to Sophie Kershaw and her excellent proposal to train doctoral students how to do Open Science at Oxford University. We’ll be working together on shared goals throughout the year, I suspect.

As part of the Fellowship process I’ll be making monthly short reports on progress and more lengthy quarterly reports.

So without further ado, here’s what I’ve been getting up to in April:

  • For the main component of my proposal – extracting phylogenetic data from PDFs – I’ve spent the month getting up to speed with the expert guidance of PMR. I even spent a whole day (16th Apr) in Cambridge with PMR working on this. Things are coming on in leaps and bounds.
  • Visited Digital Science HQ in King’s Cross to have a chat with them about all the exciting web technology they’re working on.
  • Successfully arranged for the Open Knowledge Foundation to have a stall, and possibly a talk, at the upcoming Progressive Palaeontology academic conference in Cambridge later this month.
  • Raised transparency and Open Data issues at the Systematics Association council meeting. As a result of this, we will soon upload our official constitution to our website to make it crystal clear what our guiding principles are. Additionally, all council members unanimously agreed in principle that we should try and make the data underlying our future Systematics Association special volume publications Open Data online somewhere, somehow – but we need to get feedback and agreement from our publisher, Cambridge University Press before we proceed further with this.
  • Together with Sophie Kershaw, agreed a strategy for our OKFestival plans and, with the excellent help of Laura Newman, submitted a talk session proposal for the OKFestival in Helsinki later this year.
  • Attended the OKFN London Open Science hackday; further details on that are in my previous blog post.

 

And of course this is all concurrent with my ‘regular’ PhD work, which included two manuscripts currently being prepared, three conference abstract submissions (and associated work to actually have something to write about!), undergrad demonstrating work and all the other day-to-day stuff.

I even had time for a small holiday over the long Bank Holiday weekend, to St Austell to see The Lost Gardens of Heligan & The Eden Project amongst other things.

It’s been a busy month!

 

PS I’ve been enjoying the new HTML classes on Codecademy. Below I’m going to see if some of these new HTML tricks work in WordPress:

This box should have rounded corners
This box should have a black shadow

I can guess the number you are thinking of

Follow the Rules and then hover the card below

  1. Think of a number below 10
  2. Double the number you have
  3. Add 6
  4. Divide it by 2
  5. Subtract the original number from your answer
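The hover card has not survived in this copy of the post, but the arithmetic behind the trick is easy to check: (2n + 6) / 2 − n = n + 3 − n = 3, whatever n you started with. A small Python sketch of the five steps follows (the function name is made up for illustration, and it is an assumption that the card simply revealed this constant):

```python
def follow_the_rules(n):
    """Apply steps 2 to 5 of the trick to a starting number n."""
    doubled = n * 2
    plus_six = doubled + 6
    halved = plus_six // 2  # 2n + 6 is always even, so this division is exact
    return halved - n


# Step 1 says "a number below 10", so try every starting number from 0 to 9.
results = {follow_the_rules(n) for n in range(10)}
print(results)  # {3}
```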

Yesterday, I dragged myself out of bed (it was a Saturday!) to go to my first ever ‘hackathon‘. Thankfully it was a lot less geeky than it sounds – just a cosy little get together of people interested in Open Science, to work on things in a shared public space.

Nick Stenning, Stefan Wehrmeyer, Jenny Molloy, Caspar Addyman and I were all beavering away on our laptops at the Barbican Centre, joined in the afternoon by surprise guest Todd Vision (Dryad & UNC). We also had online participation from afar via Etherpad & IRC, including Rufus Pollock giving me a few pointers on PDF image extraction tools and James Casbon working on notebook.js.

You can see a record of all the things we worked on here on the official Etherpad for the event.

I have to say, I didn’t make all that much progress on my tasks for the day, owing to a variety of n00by errors. The tools I wanted to use were rather large to download, particularly the Eclipse IDE, which took a fair while to get over the public WiFi we were using. I was also using a small netbook. This is handy for my regular train journeys between Bath & London but not so useful when you need several windows open simultaneously, e.g. IRC + PDF manual + terminal + browser. The 24″ desktop screens I usually work on have probably led me astray into such inefficient multi-window habits! By using a translucent dropdown terminal (Tilda) I saved on some window switching, but not enough to make things easy…

So for next time I’ve learnt:

1.) Bring a comfortably sized laptop. Unless you really know what you’re doing on the command-line, you’re gonna need screen real estate

2.) Download all the large files you’ll need before you go

3.) Consider bringing your own food, drink & snacks! I think I must have spent over £10 just on lunch there, and the canteen only had over-priced tuna sandwiches :/

All in all though, the session was great. There’s no substitute for meeting people IRL. There was time for excellent therapeutic #PhDchat with Jenny, tactical discussions with Todd on how to encourage more palaeontologists to publicly archive research publication data, and meeting other people in the Open Science community I’d never met before. As we discussed at the hackday – it’s not something we would do every weekend, but as a special event every now and again – it’s well worth going to!

Perhaps I might see YOU at the next one? All are welcome

 

This is a parody of a recent blog post over in Elsevier-land by David Tempest. If you haven’t read it yet, you really should – it’s an interesting insight into the mind of the DEPUTY DIRECTOR OF UNIVERSAL ACCESS (their caps-usage, not mine) at Elsevier.

Here’s my remix tribute post: words in blue are my insertions, and strikethrough words are ones I’ve chosen to delete because they don’t represent my opinion.

Copyright in an Open Access World

Copyright plays a significant vital role in the current world of publishing scientific, medical and technical content. It provides commercial publishers authors with a set of rights to enable them to utilize these their works to generate subscription access profits and to be recognized as the copyright holder creator of the work. Commercial publishers are empowered to act on behalf of their shareholders the author to use copyright transfer or exclusive license to copy, publish, and adapt works, whilst protecting their profit margins integrity. In this way, publishers are empowered to do various things on behalf of the author, for example to ensure that the article is paywalled widely disseminated, that all requests for the rights to re-use content are denied and provision of permissions are answered efficiently, and to ensure that the original is correctly attributed. Each month, Elsevier receives more than 10,000 rights and permissions requests for content – both books and journals – and we have developed sophisticated systems to deny facilitate these requests and make the process as awkward, daunting and untimely simple and timely as possible. We take this role very seriously.

The importance of protecting profit generating content

But what about copyright in an open access world? Does it make a difference that articles are being made available to all and should we be concerned? The answer is…well, yes and no.

To all intents and purposes, the fact that journal articles are being made available to all through open access, is a big threat to our current business model or to subscribers under the subscription model, should not really affect things. Issues can arise, however, as there is a common misperception [citation needed] that open access means anyone can do anything with an article – in fact, the rights in the content must still be understood and upheld.

In addition, from an editorial perspective, copyright does not prevent elements such as plagiarism, multiple submission and fraud in journal articles. and whilst is It does not actually help detect these elements, so it cannot acts as a protective measure to uphold the quality of journals.

Within open access publishing there seems to be no a dilemma over copyright: author’s should definitely and the three choices facing an author: retain copyright share it or transfer it. Elsevier believes that it remains a fundamental role of a commercial publisher to pretend to act on the author’s behalf, and by continuing to transfer copyright, we can ensnare ensure and uphold the copyrights of the authors and handle all subsequent toll access profits generated permission requests. If copyright is retained by the publisher, then this process remains with the publisher and, if it is shared, there is a greater risk that profit loss fraudulent use may occur, which is why we continue to advocate the transfer of copyright for our journals.

Clearing up the dangerous ‘confusion that threatens our excessive profit margins

Some believe that in an open access world these factors become blurred and journal articles are easier to copy and incorporate into other works – because it’s true! This is a good thing. Science is based on building on, reusing and openly criticising the published body of scientific knowledge – we need to be able to do this as frictionlessly as possible. For example, open access journals offer additional usage rights which help enable re-use may introduce some confusion in relation to copyright. These open access ‘factors may help the speed and progress of science threaten the rights of the author and make it difficult for publishers to make excessive profits from academic works enforce copyright policy. However, if it is clear where copyright lies through consistent application, the usage rights of the article in question become independent of the publishing model and work for both subscription and open access content.

Of course, one of the main issues with copyright in general is that it is often widely misunderstood and interpreted in a different way by each individual. A study published by JISC in 2005 investigated the level of understanding of researchers towards copyright. It found that from a pool of 355 respondents, 30% of researchers did not know who initially owned the copyright of their own research articles and a further 26% of the respondents indicated that they had a low interest in the copyright issues of their own research articles! Clearly, this continues to be one of the important roles a commercial publisher must embrace: ensuring that it is clear and easy to understand what cannot be done with toll access content.

 

———————————————

end.

But seriously. I hope this goes to show it’s very easy to write and publish a very one-sided opinion and present this opinion as authority on a website. I dread to think anyone reads those Elsevier editorials uncritically.

Research Councils UK (RCUK), a partnership of seven core UK research funding bodies (AHRC, BBSRC, EPSRC, ESRC, MRC, NERC, and STFC), has recently released a very welcome draft policy document detailing their proposed Open Access mandate for all research which they help fund.

The new proposed policies include (quoting from the draft):

  • Peer reviewed research papers which result from research that is wholly or partially funded by the Research Councils must be published in journals which are compliant with Research Council policy on Open Access.
  • Research papers which result from research that is wholly or partially funded by the Research Councils should ideally be made Open Access on publication, and must be made Open Access after no longer than the Research Councils’ maximum acceptable embargo period. [6 months for all except AHRC & ESRC for which 12 months is the maximum delay permitted].
  • researchers are strongly encouraged to publish their work in compliance with the policy as soon as possible. [added emphasis, mine]

As a researcher funded by BBSRC myself – I’m thrilled to read this document.

It shows a clear understanding of the issues, including explicit statements on the need of different types of access – both manual AND automated:

The existing policy will be clarified by specifically stating that Open Access includes unrestricted use of manual and automated text and data mining tools. Also, that it allows unrestricted re-use of content with proper attribution – as defined by the Creative Commons CC-BY licence

 

But as a strong supporter of the Panton Principles for Open Data in Science, and the Science Code Manifesto, I’m a little disappointed that the policy improvements with respect to data and code access are comparatively minor. Such underlying research materials need only be ‘accessible’, with few further stipulations as to how. AFAIK this allows researchers to make their data available via pigeon-transport (only) on Betamax tapes, 10 years after the data was generated, *if* there is no ‘best practice’ standard in one’s field.

The BBSRC’s data sharing policy, for example, seems to favour cost-effectiveness over transparency: “It should also be cost effective and the data shared should be of the highest quality.” Maddeningly, it also seems to give researchers ownership over data, even though the data was obtained using BBSRC funding: “Ownership of the data generated from the research that BBSRC funds resides with the investigators and their institutions.” This seems rather devoid of logic to me – if taxpayers paid for this data to be created, surely they should have some ownership of it? Finally: “Where best practice does not exist, release of data within three years of its generation is suggested.” 3 years huh? And that’s only a suggestion! Does anyone actually check that data is made available after those 3 years? I suspect not.

Admittedly, it would be hard to create a good one-size-fits-all policy, and policing it would cost more money, but I do feel that data & code sharing policies could be tightened up in places, to enable more frictionless sharing, re-using and building on previous research outputs.

So all in all this is a great step in the right direction towards Open Scholarship, particularly for BBB-compliant Open Access.

Related reactions and comments which are highly worth reading include posts by Casey Bergman, Peter Suber, and Richard Van Noorden.

This blog post is licensed under a Creative Commons Attribution 3.0 Unported License, so feel free to redistribute, remix and re-use! All that I ask for is attribution :)

[Rather than summarise what's already been said about Elsevier and their for-excessive-profit practices in recent weeks, I'll just lazily assume you've read it all... right then. Here's what I have to add.]

This post is a real-world anecdote of the problems that Elsevier’s journal bundling & excessive profiteering*** cause. It’s just one of many reasons which persuaded me to sign my name, along with 5,000+ other academics, over at The Cost of Knowledge, to register my disapproval of what Elsevier (and other publishers) are doing with scholarly works.

Recently, I discovered to my dismay that my institutional library (University of Bath) had cut its subscription (and therefore easy access) to an important journal in my field. Literally, one week I had free access to the journal content, and then the next week I found I didn’t!

The journal is Biology Letters, a general biology journal by Royal Society Publishing [RSP from now on]. **

 

Intellectual property owned by Royal Society Publishing (taken from Wikipedia)

Interestingly RSP take a relatively enlightened stance on Open Access, and have made some interesting statements in the past, such as this gem [from a statement published way back in 2008]:

“…some companies do appear to be making excessive profits from the publication of researchers’ papers”

I think RSP is a non-profit organisation (source), and hence it doesn’t surprise me that they have such prescient criticism of Elsevier & co to offer. They aren’t in the business of excessive profiteering like some.

So… RSP’s Biology Letters has been cut from our subscriptions budget. Why? – was the very first question I emailed the subject librarian at my institution. To their credit, I got some wonderfully informative replies from our librarian staff – I have no doubt they’ve done their best, given the limited powers they have. Like all institutions, we don’t have an unlimited budget. Something had to be cut, and unfortunately it was our subscription to Biology Letters. Which by the way, would only have cost us £852 for an institutional online-only subscription.

Why was this journal, from which I read/used at least 15 separate articles in 2011 alone, cut from our subscriptions instead of a journal like… Elsevier’s ‘International Journal of Coal Geology’?*

I think this is a fair question to ask. Biology Letters has a higher Impact Factor (not that the journal Impact Factor is a particularly brilliant metric of quality) and would cost a lot less (£1107 [Biol. Lett. print version + online] vs 2540 Euros, the current institutional subscription price for the print version of the ‘International Journal of Coal Geology’). Most damningly of all, I suspect no-one at my institution ever reads this Elsevier journal. Feel free to correct me on this – I’m sure I could find plenty of other Elsevier journals that satisfy this last property.

But the answer to this question is of course not related to any of the three rational points above (unfortunately) – Biology Letters can be cut because it’s vulnerable, as it’s not part of a MegaBundle sold by a large for-profit publisher. The International Journal of Coal Geology cannot be cut because access to it comes as part of a ‘Big Deal’ bundle, in which there are some *vital* journals to which we *must* have access (and the corporation selling access knows and exploits this). So despite the fact that no one needs it here, that it’s ~2x more expensive, and that it has a lower Impact Factor – I have access to this and countless other bundled journals I DON’T need, and I DON’T have access to vital articles from another journal I *do* need for my research.

Welcome to the crazy world of academic publishing! Much of it simply doesn’t make sense in the Digital Age. Of current explanations, I’d say Mike Taylor’s parable explains this most clearly.

I can’t claim to have explained all of the problems and intricacies here – but rest assured it clearly doesn’t make sense to me. Journal mega-bundling is plainly inefficient, and we can’t let this practice continue.

Stop feeding the beast! The Cost of Knowledge

Footnotes:
* Throughout this post I use the example of the International Journal of Coal Geology, not out of disrespect for the editorial board or the scholarly quality of the work presented therein – I’m sure it’s great if you’re into Coal Geology. I only use it because a) it’s an Elsevier journal to which Elsevier very arguably adds very little value, and b) I sincerely believe virtually no researchers at my institution make use of this journal.

** Just for the record, I don’t blame RSP or my librarians for this subscription cut happening. It’s out of their control. RSP do a great job IMO, as do my librarians.

*** I just read that one UK institution pays over £1,000,000 (yes, more than a million) every year for Elsevier’s ‘Big Deal’ bundle (source). I think this is a disgraceful ransom.

I’m really pleased this new Open Access paper has just been published.


Hagedorn, G. et al. Creative Commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys 150, 127-149 (2011).

Some background…

After parading my Open Data t-shirt (pictured below) around the Society of Vertebrate Paleontology meeting this month, I was invited to give an impromptu pitch in front of the great and good of the Mammal AToL project & MorphoBank people. Having pointed out to MorphoBank a while ago that they should really make explicit the terms and conditions [license] under which they make their (?) data available, I naturally advocated CC-BY 3.0 and CC0 licences. I talked about this very subject and pleaded with them NOT to use the NC clause, referring to Rod Page’s & Peter Murray-Rust’s [1,2] thoughts on the matter.

Data providers vs Data re-users – need they really be in opposition?

The trouble is, a lot of (data providing) institutions seem hell-bent on ‘protecting commercial interests’ at the expense of research opportunities. So as I understand it, at the moment databases such as these face an awkward choice between satisfying the restriction requests of data providers and satisfying the need for permissive re-use of data re-users [such as myself!], and the needs of both camps are seldom entirely met.

Conclusion

I see this paper as an important step in persuading such restriction-minded institutions of the absolute importance of #OpenData / #PantonPrinciples and how NC clauses can genuinely obstruct and impair real academic research.
I just hope people read it and take note!

[Most of this is just a re-post of my spur of the moment G+ post Research Blogging to give this paper the publicity it deserves. Much of the content is widely applicable IMO to most of scholarly communications, not just biodiversity informatics, and indeed the whole ZooKeys special issue (Open Access) is well worth a browse.]

References

[1] http://iphylo.blogspot.com/2010/12/plant-list-nice-data-shame-it-not-open.html
[2] http://blogs.ch.cam.ac.uk/pmr/2010/12/17/why-i-and-you-should-avoid-nc-licences/
[3] Hagedorn, G., Mietchen, D., Morris, R., Agosti, D., Penev, L., Berendsohn, W., & Hobern, D. (2011). Creative Commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys, 150, 127-149. DOI: 10.3897/zookeys.150.2189

Yesterday’s post about haywire RSS feeds reminded me that I should perhaps share a trick or two I know about RSS feeds.

This post assumes you know what an RSS feed is, and why they’re awesome. I still encounter researchers every day who have no idea what an RSS feed is. I have no idea how they cope with the sheer volume of literature being produced these days without RSS feeds!

1.) RSS feed filtering

why?
Some journals, e.g. PLoS ONE, put out a *lot* of new research articles each and every week. So much so that it’s tiresome and time-consuming to even read just the titles, let alone the abstracts, of each and every new article published in this journal.

This should not be taken as a criticism of such high-volume journals. I’m very supportive of Open Access publishing, and the higher the volume of articles in Open Access rather than closed access, the better (for science) as far as I’m concerned. All one needs to do is apply some conservative filtering criteria to such feeds so that one receives only items of interest.

how?
My interest is in phylogenetics. Therefore I filter the PLoS ONE new article (all subjects) alert feed by subject specific keywords using Yahoo Pipes (see below).

If the wildcard filters for ‘phylo*’ and ‘clad*’ work, then the other filters are probably redundant, but just in case y’know.
The resultant output of this feed (here) significantly tames the PLoS ONE deluge to a relevant and manageable trickle.
There are many other ways of filtering RSS feeds, but the graphical nature of Yahoo Pipes IMO makes it very recommendable.
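For anyone who would rather script it, the same kind of keyword filter can be sketched in a few lines of Python using only the standard library. This is not how Yahoo Pipes works internally: the sample feed, the example.org links and the function name are hypothetical, a plain RSS 2.0 layout (`item` elements with `title` and `description`) is assumed, and the wildcards ‘phylo*’ and ‘clad*’ are approximated with a regular expression.

```python
import re
import xml.etree.ElementTree as ET

# Rough regex stand-in for the wildcard filters 'phylo*' and 'clad*'.
KEYWORD_PATTERN = re.compile(r"\b(phylo|clad)\w*", re.IGNORECASE)

# Hypothetical feed content, just so the sketch is runnable offline.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example journal feed</title>
<item><title>A phylogenetic analysis of early birds</title><link>http://example.org/1</link></item>
<item><title>Protein folding in yeast</title><link>http://example.org/2</link></item>
<item><title>Cladistic methods revisited</title><link>http://example.org/3</link></item>
</channel></rss>"""


def filter_items(feed_xml, pattern=KEYWORD_PATTERN):
    """Return (title, link) pairs for items whose title or description matches."""
    root = ET.fromstring(feed_xml)
    kept = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        if pattern.search(title) or pattern.search(description):
            kept.append((title, item.findtext("link", default="")))
    return kept


for title, link in filter_items(SAMPLE_FEED):
    print(title, link)
```

In practice the feed text would be downloaded first and the kept items written back out as a new feed; both steps are left out here to keep the sketch short.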

It’s worth noting as well that PLoS provide their own filtered feeds here, broken down by subject, but this isn’t helpful for me, as my research interests often pop up in many different subject classifications.

2.) RSS feed creation

why?
Perhaps a journal / database / website of interest to you doesn’t provide an RSS feed, so you can’t easily track updates to it. With research, I think it’s very important to keep up to date with the latest developments. Journals, databases and websites *should* of course always provide RSS feeds for you, but a minority in my experience don’t.

The solution for these cases is: DIY!

how?
Again there is a huge swathe of options to help you ‘roll your own’ RSS feed. Some are reasonably complex and highly configurable, e.g. http://feed43.com/, whilst others are really simple but not so adaptable, e.g. http://page2rss.com/

The latter, simple option works very well for me, so I can keep up to date with latest additions to the MorphoBank database.
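If none of the hosted tools fit, the last step of ‘rolling your own’ (writing the feed itself) is small enough to sketch in Python. To be clear, this says nothing about how Feed43 or page2rss work: the function name and the example entries are hypothetical, and the harder part, detecting what has changed on the page, is deliberately left out.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, entries):
    """Build a minimal RSS 2.0 document from (entry_title, entry_link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry_title, entry_link in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry_title
        ET.SubElement(item, "link").text = entry_link
        # Re-using the link as the guid lets feed readers spot repeat items.
        ET.SubElement(item, "guid").text = entry_link
    return ET.tostring(rss, encoding="unicode")


# Hypothetical entries standing in for updates scraped from a feed-less page.
feed_xml = build_feed(
    title="Updates to a page with no feed (hypothetical)",
    link="http://example.org/updates",
    description="Hand-rolled feed",
    entries=[
        ("New matrix added", "http://example.org/updates#1"),
        ("Project updated", "http://example.org/updates#2"),
    ],
)
print(feed_xml)
```

Saving that string to a file on any web server gives a feed reader something to subscribe to.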

I’d be interested to know if anyone had any further recommendations for RSS feed creation tools, other RSS-related tips & tricks and/or interesting research related use-cases.

PS Should anyone wish to subscribe to my output, the RSS feed for this blog is here (and in the top right-hand corner; I should probably make it a bit more obvious though!)

Further Reading:

http://iphylo.blogspot.com/2009/07/how-to-publish-journal-rss-feed.html