Show me the data!

Recently I had the opportunity to collaborate on an extremely timely paper on data sharing and data re-use in phylogenetics, as part of the continuing MIAPA (Minimal Information for a Phylogenetic Analysis) working group project:

 

Additionally, to practice what we preach about data archiving, we opted (it wasn’t mandated by the journal or editors) to put the underlying data for this publication in Dryad, so it was immediately and freely available for re-use/scrutiny/whatever upon publication of the paper, under a CC0 waiver.

Dryad (and similar free services like FigShare, MorphoBank & LabArchives) allow research data to be made available either pre-publication, on publication, or even post-publication with optional embargoes (access denied for up to 1 year after the paper is published). I’m strongly against the use of data embargoes, but Dryad allows them because embargoed data is better than no data at all! I’ve seen some recent papers that have made use of this option, and apparently the journals, editors & reviewers are ‘fine’ with this practice of proactively denying access to data. I guess it’s a generational thing? That sort of practice was understandably okay pre-Internet, when digital data was costly to distribute. But now that we can freely & easily distribute supporting data, there are a multitude of reasons why we really should, unless there are justifiable reasons not to, e.g. privacy with sensitive medical/patient data.

 

I haven’t had all that much experience of the publication process so far – I’m amazed how kludgy it can be at times – far from smooth or efficient IMO. I was in charge of the Dryad data deposition for this paper, among other things, and because the journal isn’t integrated with Dryad’s deposition process it took me quite a few emails to work out what to do and when. It wasn’t a major difficulty though – the benefits of doing this will almost certainly outweigh the small effort involved. Those journals with a Dryad-integrated workflow will no doubt have a smoother process.

Another thing I learnt from this manuscript was that publishers commonly outsource their typesetting to developing countries (for the cheaper labour available there). So in this instance BMC sent our MS to the Philippines to be re-typeset for publication, and when the proofs came back we encountered some really comical errors, e.g. Phylomatic had been re-typeset as ‘phlegmatic’. This sparked a very serendipitous conversation on Twitter, which eventually led to Bryan Vickery (Chief Operating Officer at BMC) inviting me to visit the London office of BMC to have a chat about ‘all-things-publishing’ (and btw, serious *props* to PLOS and BMC for having such nice, helpful tweeps on Twitter):

http://storify.com/rmounce/bmc

Bryan and I arranged a time and a date (after SVP) and so I ended up visiting BMC for more than 2 hours on Wednesday 24th October. I got to meet not only Bryan but also Deborah Kahn, Shane Canning and others, including some of the editors for BMC Research Notes (thanks again for helping publish our paper!) & BMC Evolutionary Biology. Iain Hrynaszkiewicz was there too (Hi Iain!). Given our shared enthusiasm for Open Data (do read his *excellent* paper ‘Open By Default’ in the same article collection as ours), I’m sure we’ll meet again at more workshops and events in future.

I couldn’t possibly go through everything that was explained to me there but it certainly was illuminating. I suspect many junior academics like myself have little or no clue at all as to the behind-the-scenes processes that go on with manuscripts to get them into a state ready for publication. Perhaps a publisher visit (or even short placement?) scheme like this should be run as part of a postgraduate skills training session? Moreover, perhaps it could help alleviate the ‘too many PhDs, too few academic jobs’ problem by highlighting skilled sciencey jobs like STM publishing as viable and noble alternatives to the extremely overpopulated rat-race for tenure-track academic jobs. STM publishing isn’t even an ‘exit’ from academia. People like Jenny Rohn (chair of Science is Vital) have demonstrated that one can go into STM publishing and still go back into academia afterwards.

The cost of peer-review & publishing

 

This part of the post has sat on the back burner for a long time because it’s a complex one.

From what I was told (and I could well believe), organizing peer-review can be an immensely variable process. Sometimes it can be very simple. Automated processes such as peer2ref can be used to select appropriate reviewers for a manuscript; if these reviewers accept and get on with it nicely and in a timely fashion, the process can be of very little administrative burden. However, there are also times when maybe 10 or 12 reviewers need to be contacted before 2 agree, and then there can be complications after this, leading to a very time-consuming, costly and burdensome process. So organizing peer-review costs money, but it’s difficult, or perhaps commercially sensitive (?), to put an average price on that process, so I’m still in the dark on how much it should cost. If anyone knows of a reputable source for data on this please do let me know.

 

What of DOIs? Why do some high-volume journals like Zootaxa & Taxon operate without DOIs? Is there really much money to be saved by dispensing with them? Well, Bryan kindly pointed me to this link here for all the salient info.

It’s just $1 per DOI. That’s nothing tbh. What’s more, it’s even cheaper to retrospectively add DOIs to older, already published content: ‘backfile’ DOIs are just $0.15 each. That means Zootaxa could retrospectively add DOIs to all ~5866 of their backfile articles (2004-2009) for just $880! There are plenty of other things that would need fixing before that happened though: Zootaxa doesn’t even have proper article landing pages, as was pointed out to me by Rod Page. No doubt there would also be some labour cost associated with getting someone to add DOIs to all those thousands of articles. Still, it looks cheap to me. I still feel justified in the annoyed rant I sent to TAXACOM a while ago about this pressing issue with respect to DOIs and the responsibility of publishers.
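For anyone who wants to check my sums, here is a minimal sketch of the fee arithmetic in Python. The two per-DOI fees and the article count are the figures quoted above; the function and variable names are just my own illustrative choices, and the result covers registration fees only, not the labour cost.

```python
# Per-DOI registration fees in USD, as quoted in this post.
FEE_CURRENT = 1.00   # newly published content
FEE_BACKFILE = 0.15  # older, already published ('backfile') content


def doi_cost(n_items: int, fee_per_doi: float) -> float:
    """Total registration fee for n_items DOIs, rounded to the nearest cent."""
    return round(n_items * fee_per_doi, 2)


# Approximate number of Zootaxa backfile articles (2004-2009) from the post.
zootaxa_backfile = doi_cost(5866, FEE_BACKFILE)
print(f"Zootaxa backfile DOIs: ${zootaxa_backfile:,.2f}")
```

Swapping in a different article count or fee gives a quick estimate for any other journal or book series.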

This also has ramifications for some of the changes I’ve been pushing for now I’m on the Systematics Association council. Our main publication is a book, and each of the chapters *could* have a DOI issued for it but currently doesn’t. I suggested we issue DOIs at the last council meeting, but alas it’s not up to me: we need co-operation from our publisher to make this happen (Hi, Cambridge University Press!). Book chapter DOIs cost just $0.25 each, so I think this small cost would certainly be worth it if it raises the discoverability and citeability of our publications.

Article submission

A final point of interest from my BMC visit: Bryan told me that BMC used to offer a means by which authors could submit their works directly via an XML authoring tool. It wasn’t popular, but I wonder whether this was perhaps because it was a little before its time? The whole process of biologists submitting Word files, then having figures and text inadvertently mangled and wrongly re-typeset at the publisher, seems extremely inefficient to me. Physicists & computational scientists seem to get along fine with LaTeX submission processes, which alleviate some but not all of the typesetting shenanigans. Perhaps it is the authors, and the authoring tools, that need to change to enable more re-usable research in the future and to realise the full potential of the semantic Web. It looks like Pensoft might be heading in this direction again with its Pensoft Writing Tool.

image by Gregor Hagedorn. CC BY-SA

 
On that note, it might be good to end with a small advert for the Pro-iBiosphere biodiversity informatics & taxonomy workshop in Leiden (NL) in February 2013.

I very much look forward to meeting taxonomists IRL!

 

 

 

 

My submitted HowOpenIsIt? comments

October 8th, 2012 | Posted by rmounce in Panton Fellowship updates

I just submitted some comments to SPARC / PLOS / OASPA’s request for public comment on their new HowOpenIsIt? material here. If you haven’t done so yourself, the deadline is TODAY 5pm (EST).

Below are the comments I submitted. They are a mixture of praise for remembering to include machine-readability, concern over some possible interpretations, and practical points on providing hyperlinks or URLs for all the CC licenses mentioned:

Comments:

* I heartily support & commend that Machine Readability takes pride of place within this guide to Open Access. This freedom was there from the start in the Budapest declaration: “…crawl them for indexing, pass them as data to software, or use them for any other lawful purpose…” but in recent years this freedom has been often neglected by some, and worse actively-restricted by some subscription-based publishers in their contractual agreements. Yet it represents one of the most important freedoms that needs to be enabled by Open Access. It has been estimated that over 50 million academic articles have been published and the volume of publications is increasing rapidly year on year. The only rational way we’ll be able to make full use of all this research both NOW and in the future, is if we are allowed to use machines to help us make sense of this vast and growing literature.

* I am slightly worried that the statement on machine readability for Open Access could still provide a barrier that publishers might use to protect their content from mining: “…through a community standard API or protocol” perhaps leaves too much to interpretation. The API provided could be a poor one, inflexible and not sufficiently cutting-edge for the research required. I think there is no need for a clause on how machines might be allowed access to Open Access research if it is published CC-BY as mentioned under Reuse Rights; only that the medium in which the work is published (PDF, HTML, XML or other) is sufficiently machine-interpretable and not DRM-protected.

* I support that the guide itself is licensed under CC-BY-NC-ND to prevent derivative or modified works, to prevent interoperability problems. This is in line with both W3C (http://www.w3.org/Consortium/Legal/IPR-FAQ-20000620) and IETF practices.

* May I suggest the paper version of this guide (if there is to be one) be printed with full URLs to the CC-BY-NC-ND, CC-BY, & CC BY-NC licenses mentioned in the guide. Likewise the electronic/digital version should have clickable hyperlinks to further explain these contractions.

* I think the guide should make it clearer that the label ‘Open Access’ should only be applied to content that has all of the full top-line suite of rights. Anything less than this in any of the categories is nearly but not quite Open Access. There are other terms available for such less Open content, like ‘free access’, ‘public access’, ‘less-restricted access’ that can all be applied in some form or combination to apply to the set of rights in between ‘Open Access’ and ‘Closed Access’. This guide should reaffirm that only the full suite of Open rights makes a work Open Access.

* However, I do wonder if the question of who holds copyright (author or publisher) is somewhat irrelevant to Open Access? I certainly support that authors retain copyright to their own content, but in instances where the publisher has taken the copyright and the work is in all other respects fulfilling the other qualities of Open Access – is this not Open Access? Surely then the Copyright column is just a special case subset of the Reuse Rights column? The issue of who holds copyright is something important but separate to Open Access in my opinion.

* Ditto for ‘Author Posting’: this duplicates what is given in the Reuse Rights column, just as a special case for the author. This section is usefully distinct in grey not-quite-Open Access cases, but for Open Access it is just a rewritten duplication of the fact that *anyone* has the right to reuse/repost.

At some point I also intend to comment on BMC’s Open Data & Open Bibliography RFP, but the deadline for that is much later and I have LOTS of work to do in the meantime, so that’ll have to wait for a bit…

Opportunity Knocks

October 3rd, 2012 | Posted by rmounce in Open Access | Palaeontology

A few months ago I gave a short talk about the Open Knowledge Foundation and its activities as relevant to academics at a small (but good!) palaeontology conference in Cambridge (which I blogged about previously).

I didn’t need to give this talk. Neither the OKF nor my academic progression required me to give this talk. I just felt it might be helpful to let my friends and peers know who the OKF are, what they’re trying to achieve, and what my Panton Fellowship is about.

That optional talk has now paid HUGE dividends: enabling me to talk live on BBC Radio 3 last night about Open Access, and the beneficial impact this will have on research, with our Minister for Science & Universities, David Willetts MP, & Dame Janet Finch (writer of ‘the Finch report’). I got some good time after the show to speak with David about encouraging efficiently run, ultra low-cost journals like the Journal of Machine Learning Research. I hope this will have had some influence; if not, I certainly tried!

So how did this come about?

Nick Crumpton, PhD student at the University of Cambridge and one of the student organisers for Progressive Palaeontology 2012 (ProgPal), is also a BBC Online British Science Association Media Fellow and thus has good contacts at the BBC. They were apparently looking for a young scientist to come on the show and give an informed opinion from ‘the coalface’ of research, so Nick kindly remembered my impassioned talk from ProgPal on OKF & openness in academia and recommended me.

I got in touch with the programme producer, and was invited to join the live radio debate later that night.

Image © British Broadcasting Corporation. Click through to listen to the radio programme. The Open Access discussion segment occurs from about 6min40s in.

…and that’s how it happened.

With Open Access Week coming up very soon, 22-28 October, I guess the point of this post is:

No matter how small your contribution towards the advocacy of Open Access might seem; every little helps. Keep at it. Keep speaking out about OA until all publicly funded research everywhere (glares at the US) is Open Access.

Postscript: That same day Sir Mark Walport was also interviewed on BBC Radio, partly about Open Access – I highly recommend & agree with his opinions; the link is here. Listen from 11.38 to 15.10 for the OA bits h/t Steve Hitchcock @stevehit

The Open Knowledge Festival 2012, Helsinki: some notes

September 26th, 2012 | Posted by rmounce in Conferences | Panton Fellowship updates

Wow! Where to begin… In this post I shall attempt to summarise some of OKFestival 2012.

Some Background:

I had been to the Open Knowledge Conference last year (in Berlin), where I gave an invited talk on Open Palaeontology and met lots of brilliant people in the Open Science community like Bjoern Brembs, Cameron Neylon & Peter Murray-Rust. But this year the event was even bigger, and even better – teaming up with the annual Open Government Data Camp for a mega-event.

The Event Itself:

It was a little awkward that it was held so far away from most of the conference accommodation – everyone had a 20-30 minute commute before getting to the venue, and some of the talk rooms were fairly far apart. But once the conference goers got used to that it was plain sailing from there, and the Aalto University buildings themselves were wonderfully modern and well equipped for it (inc. great WiFi). I got to Helsinki on the Tuesday, and caught the tail end of the Data Journalism session that day including an excellent, inspirational talk on shippr.org amongst other things. It detailed the amazing knowledge and insight gained from tracking the movement of ships with open data. I couldn’t help thinking that academics could learn a lot from these open data visualization experts (myself included!).

An interesting example of Shippr data – ships turn off their beacons once they pass the point for fear of pirates…

Wednesday – my chance to make a difference

I really liked the way that the conference had an introductory session to the day’s parallel events each morning from 10am – 11am. If one was unsure of which stream to go to, these Morning Plenaries gave each topic stream a chance to pitch their events in a short slot to the awaiting audience. I thought this was very helpful given there were 13 separate topic streams at the conference!

I was involved in two sessions this day. Firstly the Open Access discussion panel (the video for which is here), with Tim Hubbard (Sanger Institute), Carlos Russel (World Bank), Peter Murray-Rust (University of Cambridge / Open Knowledge Foundation) and Tom Olijhoek & Mark MacGillivray (Open Access Index):

It’s a long video; we covered many topics, with excellent contributions from the audience including Puneet Kishor from Creative Commons and Matt Todd from the Open Source Drug Discovery team, amongst others.

Then after this there was the research data session with contributions from Mark Wainwright on CKAN, Mark Hahnel on Figshare and Joss Winn of the Orbital project.

Finally we finished with the Panton Fellowships Session with talks from myself and Sophie Kershaw on what we’d been doing in our fellowship work:

The day was rounded off with a hugely inspirational talk from Matt Todd summarising his Open Source Drug Discovery work in the main lecture theatre, with a lovely if expensive meal afterwards in Lasipalatsi Ravintola.

Thursday

I spent some quality time with Peter working on a BBSRC grant proposal.
I also thoroughly enjoyed Hans Rosling’s fantastic keynote presentation, which I urge you all to watch – it was brilliant, and it was thrilling to be there live in the audience.

Friday

If there’s one thing that impresses me most of all about OKFestival, it’s this: it’s not just about talking – they do things here too. Lots of ‘hacking’ sessions on Friday to create new tools and collate awesome new data. Most conferences are extremely boring in that it’s just talk after talk after talk. Things get done here, new collaborations are started, fresh links across disciplinary boundaries are made connecting journalism with academia, economic development with open architectural design, and other incredible trans-disciplinary mashups. It’s a joy to behold.

I’m really glad I came to OKFestival; as ever, I got a lot out of it.

Next year it’ll be in Switzerland (?), I hope I didn’t just make that up… I seem to remember that it was announced to be there but I couldn’t find any confirmation from Google. Rest assured I’ll try and be there though!