Show me the data!

I just submitted some comments to SPARC / PLOS / OASPA’s request for public comment on their new HowOpenIsIt? material here. If you haven’t done so yourself, the deadline is TODAY 5pm (EST).

Below are the comments I submitted. They are a mixture of praise for remembering to include machine-readability, concern over some possible interpretations, and practical points on providing hyperlinks or URLs for all the CC licenses mentioned:


* I heartily support and commend the fact that machine readability takes pride of place within this guide to Open Access. This freedom was there from the start in the Budapest declaration: “…crawl them for indexing, pass them as data to software, or use them for any other lawful purpose…” but in recent years it has often been neglected by some and, worse, actively restricted by some subscription-based publishers in their contractual agreements. Yet it is one of the most important freedoms that Open Access needs to enable. It has been estimated that over 50 million academic articles have been published, and the volume of publications is increasing rapidly year on year. The only rational way we’ll be able to make full use of all this research, both NOW and in the future, is if we are allowed to use machines to help us make sense of this vast and growing literature.

* I am slightly worried that the statement on machine readability for Open Access could still be used by publishers as a barrier to protect their content from mining: “…through a community standard API or protocol” perhaps leaves too much to interpretation. The API provided could be a poor one, inflexible and not sufficiently cutting-edge for the research required. I think there is no need for a clause on how machines might be allowed access to Open Access research if it is published under CC-BY as mentioned under Reuse Rights; it need only state that the medium in which the work is published (PDF, HTML, XML or other) is sufficiently machine-interpretable and not DRM-protected.

* I support the guide itself being licensed under CC-BY-NC-ND to prevent derivative or modified works, and so avoid interoperability problems. This is in line with both W3C and IETF practices.

* May I suggest that the paper version of this guide (if there is to be one) be printed with full URLs to the CC-BY-NC-ND, CC-BY and CC-BY-NC licenses mentioned in the guide. Likewise, the electronic/digital version should have clickable hyperlinks to further explain these abbreviations.

* I think the guide should make it clearer that the label ‘Open Access’ should only be applied to content that has the full top-line suite of rights. Anything less than this in any of the categories is nearly but not quite Open Access. There are other terms available for such less Open content, like ‘free access’, ‘public access’ and ‘less-restricted access’, that can be applied in some form or combination to the set of rights in between ‘Open Access’ and ‘Closed Access’. This guide should reaffirm that only the full suite of Open rights makes a work Open Access.

* However, I do wonder if the question of who holds copyright (author or publisher) is somewhat irrelevant to Open Access? I certainly support that authors retain copyright to their own content, but in instances where the publisher has taken the copyright and the work is in all other respects fulfilling the other qualities of Open Access – is this not Open Access? Surely then the Copyright column is just a special case subset of the Reuse Rights column? The issue of who holds copyright is something important but separate to Open Access in my opinion.

* Ditto for ‘Author Posting’: this duplicates what is given in the Reuse Rights column, just as a special case for the author. This section is usefully distinct in grey, not-quite-Open Access cases, but for Open Access it merely restates that *anyone* has the right to reuse/repost.

At some point I also intend to comment on BMC’s Open Data & Open Bibliography RFP, but the deadline for that is much later and I have LOTS of work to do in the meantime, so that’ll have to wait for a bit…

Wow! Where to begin… In this post I shall attempt to summarise some of OKFestival 2012.

Some Background:

I had been to the Open Knowledge Conference last year (in Berlin), where I gave an invited talk on Open Palaeontology and met lots of brilliant people in the Open Science community like Bjoern Brembs, Cameron Neylon & Peter Murray-Rust. But this year the event was even bigger, and even better – teaming up with the annual Open Government Data Camp for a mega-event.

The Event Itself:

It was a little awkward that it was held so far away from most of the conference accommodation – everyone had a 20-30 minute commute before getting to the venue, and some of the talk rooms were fairly far apart. But once the conference goers got used to that it was plain sailing from there, and the Aalto University buildings themselves were wonderfully modern and well equipped for it (inc. great WiFi). I got to Helsinki on the Tuesday, and caught the tail end of the Data Journalism session that day including an excellent, inspirational talk on amongst other things. It detailed the amazing knowledge and insight gained from tracking the movement of ships with open data. I couldn’t help thinking that academics could learn a lot from these open data visualization experts (myself included!).

An interesting example of Shippr data – ships turn off their beacons once they pass the point for fear of pirates…

Wednesday – my chance to make a difference

I really liked the way that the conference had an introductory session to the day’s parallel events each morning from 10am – 11am. If one was unsure of which stream to go to, these Morning Plenaries gave each topic stream a chance to pitch their events in a short slot to the awaiting audience. I thought this was very helpful given there were 13 separate topic streams at the conference!

I was involved in two sessions this day. Firstly the Open Access discussion panel, the video for which is here with Tim Hubbard (Sanger Institute), Carlos Russel (World Bank), Peter Murray-Rust (University of Cambridge / Open Knowledge Foundation) and Tom Olijhoek & Mark MacGillivray (Open Access Index):

It’s a long video in which we covered many topics, with excellent contributions from the audience including Puneet Kishor from Creative Commons and Matt Todd from the Open Source Drug Discovery team amongst others.

Then after this there was the research data session with contributions from Mark Wainwright on CKAN, Mark Hahnel on Figshare and Joss Winn of the Orbital project.

Finally we finished with the Panton Fellowships Session with talks from myself and Sophie Kershaw on what we’d been doing in our fellowship work:

The day was rounded off with a hugely inspirational talk from Matt Todd summarising his Open Source Drug Discovery work in the main lecture theatre, with a lovely if expensive meal afterwards in Lasipalatsi Ravintola.


I spent some quality time with Peter working on a BBSRC grant proposal.
I also thoroughly enjoyed Hans Rosling’s fantastic keynote presentation, which I urge you all to watch – it was brilliant, and thrilling to be there live in the audience for.


If there’s one thing that impresses me most of all about OKFestival, it’s this: it’s not just about talking – they do things here too. Lots of ‘hacking’ sessions on Friday to create new tools and collate awesome new data. Most conferences are extremely boring in that it’s just talk after talk after talk. Things get done here, new collaborations are started, fresh links across disciplinary boundaries are made connecting journalism with academia, economic development with open architectural design, and other incredible trans-disciplinary mashups. It’s a joy to behold.

I’m really glad I came to OKFestival; as ever, I got a lot out of it.

Next year it’ll be in Switzerland (?), I hope I didn’t just make that up… I seem to remember that it was announced to be there but I couldn’t find any confirmation from Google. Rest assured I’ll try and be there though!

I said I would make an update on Tuesday (today), so if I get this posted before midnight I will (just) have met that goal…

In this (minor) update I have:

added: Ubiquity Press (great low cost option!), SPIE (scored for 1-column per page), SAGE Open, Frontiers, WileyOpenAccess, OxfordOpen (OUP hybrid option), GigaScience, Open Biology (Royal Society)

added the label for: Pensoft (sincerest apologies, it is tied with Copernicus and was on the 0.1 plot, just unlabelled!)

changed the categorization of: Scientific Reports (NPG) [I have put it in a no-man’s-land between CC BY and CC BY NC since they give authors a choice of licenses. I think this is a bad idea as it allows authors to make the mistake of choosing a less open licence (are there really any common circumstances in which they might want a less open, free to read licence?)]


As noted elsewhere there are actually a lot of completely fee-free Gold Open Access journals out there (I shall try and make a listing of them in a future post), they’re just not perhaps all that well-known. GigaScience and Open Biology (Royal Society) are temporarily completely fee-free options that certainly look like good recommendations!


I shall endeavour to add more of the variously priced BMC journals in the next update of the plot. Basically, I believe most of them lie in the range between BMC Research Notes and BMC Biology.

My site stats show that in just a few days v0.1 of the plot had nearly 1000 pageviews, which is HUGE for my otherwise low-key blog!

And it has had real impact already. Thanks to Mike Taylor, Acta Pal. Polonica is thinking of adopting the CC BY licence. Brilliant news! It is fee-free but not explicitly licensed to allow re-use at the moment. Hopefully this will change soon.


Anyway, I have to get off the train now, so that’ll be the end of this post.




Since Sunday afternoon I’ve been at an International Council for Science (ICSU) / Royal Society invited workshop on ‘Revaluing Science in the Digital Age’.

We’ve had a fascinating set of talks from academics, publishers (PLoS, Nature, BMC), librarians, policymakers, data managers, scientific societies…

Attendees included:
Jose Cotta (European Commission)
Mark Thorley (RCUK)
Chris Banks (University Librarian and Director, Aberdeen)
Mark Hahnel (Figshare)
Max Wilkinson (UCL, Head of Research Data Service)
Dave Roberts (ViBRANT)
Rob Frost (GSK)
Catriona MacCallum (PLoS)
Mark Forster (Syngenta)
Iain Hrynaszkiewicz (BMC)
Ruth Wilson (Nature Publishing Group)
Kaitlin Thaney (Digital Science)
Stuart Taylor (Royal Society)
Robert Simpson (Zooniverse)
Paul Groth (OpenPHACTS)
and more…


I gave a talk on content mining and the importance of full BOAI-compliant Open Access with respect to this, on behalf of the Open Knowledge Foundation:

There was lots of discussion on reproducibility, provenance of data, peer review, incentives, research misconduct and ethics.

I’ve met many new people and have learnt many new things. For example, on the subject of reproducibility I talked about Roger Peng and the journal Biostatistics in discussion, and then was soon informed that there was an analogous journal in Chemistry called Organic Syntheses whereby:

In order for a procedure to be accepted for publication, each reaction must be successfully repeated in the laboratory of a member of the Editorial Board at least twice, with similar yields (generally ±5%) and selectivity similar to that reported by the submitters.

Fantastic! We were also informed that this rigorous protocol ensures that research published in this journal is very highly regarded. I’ve suggested similar such reproducibility checks for phylogenetics research before (at the Systematics Association Biennial meeting Belfast, 2011) but this was viewed as too futuristic / infeasible…

Right now we’re working on a draft statement of outcome from this workshop that ICSU can pass to its members to possibly officially agree to endorse.

So I’d better finish here, and get back to the discussion.
I’m rather hoping they will endorse the Panton Principles rather than reinvent the wheel (policy-wise).

Exciting times!


PS I have made a Storify of the tweets from the workshop here.