Show me the data!

yet another #solo12 recap (part2)

November 19th, 2012 | Posted by rmounce in Conferences | Open Data

A couple of days ago I posted specifically about the data re-use session.
I’m going to use this post to muse about the conference more generally.

About SpotOn London 2012

It used to be called Science Online London – an informative, sensible and appropriate name. This year I hear (rumours) that it had to change name to SpotOn because Science AAAS or some other litigious entity was claiming brand identity infringement. I have sympathies with the organisers for this enforced change but ‘SpotOn London’ which apparently stands for “Science Policy, Outreach and Tools Online” (you wouldn’t know unless told!) is not my cup of tea, tbh. I suggest next year we continue to use the #solo hashtag and continue to call it Science Online London (informally), even if for legal reasons this can’t be the official conference title.


As the focus of the conference was tripartite: “Policy“, “Outreach“, and “Tools“; the mix of speakers, panellists & attendees was refreshingly diverse – unlike most academic conferences I go to. High-level & high-profile academics like Prof. Stephen Curry (Imperial College), Prof. Athene Donald (U.of Cambridge),Dr Ethan Perlstein and Dr Jenny Rohn (Science is Vital) mixed freely with PhD students like myself, Jon Tennant, Jojo Scoble, Nick Crumpton, Tom Phillips and others. There were policy people like Mark Henderson and Nic Bilham and even politicians themselves: we should all be grateful for Julian Huppert MP for Cambridge, one of unfortunately few UK politicians to take a genuine interest in science. There were publishers reps including Matt Hodgkinson & Martin Fenner (PLOS), Brian Hole (Ubiquity Press), Graeme Moffat & Kamila Markram (FrontiersIn), Ian Mulvany (eLife), Michael Habib (Elsevier), and ‘independents’ like Anna Sharman (Anna Sharman Editorial Services) & Kaveh Bazargan (of River Valley). Librarians Peter Morgan (U.of Cambridge) & Frank Norman, research funders Geraldine Clement-Stoneham (MRC), journalists Ed Yong and really interesting people who defy easy classification(!) like Brian Kelly (UKOLN), Tony Hirst, and the Digital Science team (some of): Euan Adie, Mark Hahnel and Kaitlin Thaney.

Now apologies to those I didn’t name check in the above list – there were many other brilliant and interesting people there (Ed Baker, Vince Smith, David Shotton, Josh Greenberg, I could go on… There’s a fuller list of attendees by Twitter handle here). I merely selected a few from broad categories to show the impressive diversity of representation there. This is one of the very best things about the conference – it attracts virtually all of the stakeholders of science. It’s not just about researchers, publishers, research funders and librarians – it rightly recognises that science isn’t only for ivory tower academics; it’s for everyone.

[Incidentally, for those interested I’d say gender diversity was quite balanced. Alas racial diversity was rather too imbalanced – perhaps sadly reflective of academia as whole?]

As befits a conference formerly known as ‘Science Online’ *all* of the talks were recorded & tweeted, so there are videos on youtube of every single one, and Storifys (of the tweets) available to view.

Selected Highlights (aside from the #solo12reuse session):

The journal is dead, long live the journal

In the early stages of this session, I was worried it wasn’t really going anywhere interesting with the discussion…

and then Dr Kaveh Bazargan took the microphone at about ~28:22 (skip to that section, it’s brilliant)

on the publishing process, author manuscript submissions & typesetting:

It’s madness really. I’m here to say I shouldn’t be in business.

as any manuscript-submitting biologist knows… publishers ask for all sorts of ridiculously pedantic formatting from us, particularly for reference lists. As Kaveh reminds us this is all pointless and stupid because when the publishers get this, they send it off to typesetters to be typeset anyhow – this process is hugely inefficient: “madness”. Not only this, but if a submitted manuscript gets rejected from one journal, the poor authors have to waste often significant amounts of time and energy to re-format their manuscript to suit the stylistic vagaries of another journal. Microsoft Word is not a good authoring tool – it’s largely unstructured. The publishing process requires a high-degree of technical structure, usually provided by XML or TeX.

If you dig into the issue a little bit. You’ll see that programs like Mendeley (and any other reference manager I would think) are fully capable of providing reference lists as structured XML. And yet journal policies enforce that we submit plain text (in say a Word doc), only for the typesetters to get paid by the publishing companies to then re-implement those plain text references back into fully-structured XML. Madness! 

Typesetters are mostly located in areas where the labour is cheap. India, Phillipines etc… it’s an intensely manual process and perhaps in future may be less of a necessity(?). Furthermore, as I discovered with a recent BMC manuscript I was an author on, typesetters can sometime introduce new errors into the publication process which slow down the process even further! I commend the brutal honesty of Dr Bazargan in bravely speaking-out about this, this is an issue completely separate to OA/TA journals; both can be guilty of this madness.

It also affects re-use potential as he also remarked in the #solo12reuse session. Not every publisher publicly exposes the XML version of the papers they publish, these are of extreme importance to re-use potential (e.g. mining) – the Geological Society of London and their publications are one of these which is a great shame. I asked Neal Marriott of GSL about this back in June via email and he replied: “We do not currently have a feature to allow download of the NLM XML source.”   I also tried to take this up politely with Nic Bilham at the bar after the first night of the conference, but for someone with “external relations” in his job title I found him rather frosty towards me. Happy to say I had no such problems with Grace Baynes (NPG) it was charming to meet her in person after our exchanges regarding NPG’s new OA pricing strategy.

So how do we get GSL and other publishers to expose their XML which they surely have? They already provide HTML & PDF versions, what’s difficult about exposing the underlying XML version too?



I was at another session at the time this was going on but I think this is an important session I should highlight. Broadening the assessment & evaluation of research beyond incredibly narrow metrics (like the journal Impact Factor; die die!) is something that’s clearly very important. Everyone agrees it’s “early days” and that not all impact is measurable, but that shouldn’t dissuade us from actively researching and cautiously embracing this new (positive) trend.

Fraud in Science

Virginia Barbour, Ed Yong and others were on the panel for this one and again, whilst I wasn’t in the room at the time for this session – looking back at the video, I rather wish I was at the session – it was really interesting:

  • Virginia Barbour 18:45 “I think there’s much more evidence of sloppiness than outright fraud… at some PLOS journals we ask authors for the original figures to check for figure manipulation before acceptance… when we ask authors to supply these a large number of authors can’t do it” (is it really that they can’t find the files, or just that they don’t *want* to supply the originals?) “It is completely unacceptable, but not uncommon” Amen Virginia – I agree very much!
  • Virginia Barbour 21:36 “…the larger issues that plague science; sloppiness, unwillingness to share data, conflict of interest, and publication bias. There are *solutions* to these and the great thing is that the internet makes it much easier to spot and actually makes it easier to address than previously…” +1
  • much of the later talk about clinical trials refers to work done in this paper: Prayle, A. P., Hurley, M. N., and Smyth, A. R. 2012. Compliance with mandatory reporting of clinical trial results on cross sectional study. BMJ 344. in which the authors report a rather disappointing 22% compliance rate with US Food and Drug Administration Amendments Act legislation that requires the results of all clinical trials to be reported within reasonable time.
  • Finally, I was seriously impressed and pleased that Ed Yong extolled the virtues of Open Data from 45:07 to enable greater transparency and lower the barriers to critical re-analyses. This is something I most definitely would have raised had I been there, alas I think I was at the “publishing research data: what’s in it for me?” session

So yeah, the conference was great. Not all the sessions were brilliant. The ‘big data’ session was a little disappointing (no offence to any of the panel, just small attendence, little engagement) – perhaps because the topic is already well-covered for conferences with alternative events like the recent O’Reilly Strata Big Data meetup dominating?

I’ll be there at next year’s Science Online London event for sure – whatever it’s called!

11 Responses