Show me the data!

yet another #solo12 recap (part 2)

November 19th, 2012 | Posted by rmounce in Conferences | Open Data

A couple of days ago I posted specifically about the data re-use session.
I’m going to use this post to muse about the conference more generally.

About SpotOn London 2012

It used to be called Science Online London – an informative, sensible and appropriate name. This year I hear (rumours) that it had to change its name to SpotOn because Science (AAAS) or some other litigious entity was claiming brand-identity infringement. I have sympathies with the organisers for this enforced change, but ‘SpotOn London’ – which apparently stands for “Science Policy, Outreach and Tools Online” (you wouldn’t know unless told!) – is not my cup of tea, tbh. I suggest next year we continue to use the #solo hashtag and continue to call it Science Online London (informally), even if for legal reasons this can’t be the official conference title.


As the focus of the conference was tripartite (“Policy”, “Outreach” and “Tools”), the mix of speakers, panellists & attendees was refreshingly diverse – unlike most academic conferences I go to. High-level & high-profile academics like Prof. Stephen Curry (Imperial College), Prof. Athene Donald (U. of Cambridge), Dr Ethan Perlstein and Dr Jenny Rohn (Science is Vital) mixed freely with PhD students like myself, Jon Tennant, Jojo Scoble, Nick Crumpton, Tom Phillips and others. There were policy people like Mark Henderson and Nic Bilham, and even politicians themselves: we should all be grateful to Julian Huppert, MP for Cambridge, one of the unfortunately few UK politicians to take a genuine interest in science. There were publishers’ reps including Matt Hodgkinson & Martin Fenner (PLOS), Brian Hole (Ubiquity Press), Graeme Moffat & Kamila Markram (FrontiersIn), Ian Mulvany (eLife), Michael Habib (Elsevier), and ‘independents’ like Anna Sharman (Anna Sharman Editorial Services) & Kaveh Bazargan (of River Valley); librarians Peter Morgan (U. of Cambridge) & Frank Norman; research funder Geraldine Clement-Stoneham (MRC); journalist Ed Yong; and really interesting people who defy easy classification(!) like Brian Kelly (UKOLN), Tony Hirst, and (some of) the Digital Science team: Euan Adie, Mark Hahnel and Kaitlin Thaney.

Now apologies to those I didn’t name-check in the above list – there were many other brilliant and interesting people there (Ed Baker, Vince Smith, David Shotton, Josh Greenberg… I could go on; there’s a fuller list of attendees by Twitter handle here). I merely selected a few from broad categories to show the impressive diversity of representation. This is one of the very best things about the conference – it attracts virtually all of the stakeholders of science. It’s not just about researchers, publishers, research funders and librarians – it rightly recognises that science isn’t only for ivory-tower academics; it’s for everyone.

[Incidentally, for those interested, I’d say gender diversity was quite balanced. Alas, racial diversity was rather too imbalanced – perhaps sadly reflective of academia as a whole?]

As befits a conference formerly known as ‘Science Online’ *all* of the talks were recorded & tweeted, so there are videos on YouTube of every single one, and Storify archives of the tweets available to view.

Selected Highlights (aside from the #solo12reuse session):

The journal is dead, long live the journal

In the early stages of this session, I was worried it wasn’t really going anywhere interesting with the discussion…

and then Dr Kaveh Bazargan took the microphone at ~28:22 (skip to that section, it’s brilliant)

on the publishing process, author manuscript submissions & typesetting:

It’s madness really. I’m here to say I shouldn’t be in business.

as any manuscript-submitting biologist knows… publishers ask for all sorts of ridiculously pedantic formatting from us, particularly for reference lists. As Kaveh reminds us, this is all pointless and stupid, because when the publishers get this they send it off to typesetters to be typeset anyhow – the process is hugely inefficient: “madness”. Not only this, but if a submitted manuscript gets rejected from one journal, the poor authors often have to waste significant amounts of time and energy re-formatting their manuscript to suit the stylistic vagaries of another journal. Microsoft Word is not a good authoring tool – it’s largely unstructured. The publishing process requires a high degree of technical structure, usually provided by XML or TeX.

If you dig into the issue a little bit, you’ll see that programs like Mendeley (and any other reference manager, I would think) are fully capable of providing reference lists as structured XML. And yet journal policies enforce that we submit plain text (in, say, a Word doc), only for the typesetters to get paid by the publishing companies to re-implement those plain-text references back into fully-structured XML. Madness!
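To make the point concrete, here’s a minimal sketch (illustrative only – not Mendeley’s or any publisher’s actual pipeline, and the field names are my assumptions) of how a reference that is already structured data becomes JATS-style XML in a few lines of code, versus someone re-keying it by hand:

```python
# Turn a structured reference (as a reference manager already stores it)
# into a JATS-style <element-citation>. Field names are illustrative.
import xml.etree.ElementTree as ET

def reference_to_jats(ref):
    """Convert a structured reference (dict) to a JATS-like citation element."""
    elem = ET.Element("element-citation", {"publication-type": "journal"})
    for surname, given in ref["authors"]:
        name = ET.SubElement(elem, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given
    ET.SubElement(elem, "article-title").text = ref["title"]
    ET.SubElement(elem, "source").text = ref["journal"]
    ET.SubElement(elem, "year").text = str(ref["year"])
    return ET.tostring(elem, encoding="unicode")

example = {
    "authors": [("Prayle", "A. P."), ("Hurley", "M. N."), ("Smyth", "A. R.")],
    "title": "Compliance with mandatory reporting of clinical trial results",
    "journal": "BMJ",
    "year": 2012,
}
print(reference_to_jats(example))
```

If the structured data is there from the start, round-tripping it through plain text only to rebuild the structure downstream is pure waste.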

Typesetters are mostly located in areas where labour is cheap – India, the Philippines, etc. It’s an intensely manual process, and perhaps in future may be less of a necessity(?). Furthermore, as I discovered with a recent BMC manuscript I was an author on, typesetters can sometimes introduce new errors into the publication process, which slows things down even further! I commend the brutal honesty of Dr Bazargan in bravely speaking out about this. This issue is completely separate from the OA/TA divide; journals of both kinds can be guilty of this madness.

It also affects re-use potential, as he remarked in the #solo12reuse session. Not every publisher publicly exposes the XML version of the papers they publish, and these versions are extremely important for re-use (e.g. mining). The Geological Society of London is one such publisher, which is a great shame. I asked Neal Marriott of GSL about this back in June via email and he replied: “We do not currently have a feature to allow download of the NLM XML source.” I also tried to take this up politely with Nic Bilham at the bar after the first night of the conference, but for someone with “external relations” in his job title I found him rather frosty towards me. Happy to say I had no such problems with Grace Baynes (NPG); it was charming to meet her in person after our exchanges regarding NPG’s new OA pricing strategy.

So how do we get GSL and other publishers to expose their XML which they surely have? They already provide HTML & PDF versions, what’s difficult about exposing the underlying XML version too?
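For comparison, here’s a small illustration of why the exposed XML matters so much for mining: given a JATS-style file (the snippet below is hypothetical, not GSL’s actual markup), structured fields can be pulled out directly, with no PDF scraping or layout guesswork:

```python
# Extracting structured fields from a JATS-style article file.
# The snippet is a made-up, simplified example for illustration.
import xml.etree.ElementTree as ET

jats = """
<article>
  <front>
    <article-meta>
      <title-group><article-title>An example paper</article-title></title-group>
      <abstract><p>Some machine-readable abstract text.</p></abstract>
    </article-meta>
  </front>
</article>
"""

root = ET.fromstring(jats)
title = root.findtext(".//article-title")
abstract = root.findtext(".//abstract/p")
print(title)     # → An example paper
print(abstract)  # → Some machine-readable abstract text.
```

A miner working from the HTML or PDF instead has to reverse-engineer this structure from presentation markup, which is exactly the wasted effort exposing the XML would avoid.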



I was at another session while this one was going on, but I think it’s an important session I should highlight. Broadening the assessment & evaluation of research beyond incredibly narrow metrics (like the journal Impact Factor; die, die!) is clearly very important. Everyone agrees it’s “early days” and that not all impact is measurable, but that shouldn’t dissuade us from actively researching and cautiously embracing this new (positive) trend.

Fraud in Science

Virginia Barbour, Ed Yong and others were on the panel for this one, and again, whilst I wasn’t in the room at the time – looking back at the video, I rather wish I had been – it was really interesting:

  • Virginia Barbour 18:45 “I think there’s much more evidence of sloppiness than outright fraud… at some PLOS journals we ask authors for the original figures to check for figure manipulation before acceptance… when we ask authors to supply these a large number of authors can’t do it” (is it really that they can’t find the files, or just that they don’t *want* to supply the originals?) “It is completely unacceptable, but not uncommon” Amen Virginia – I agree very much!
  • Virginia Barbour 21:36 “…the larger issues that plague science; sloppiness, unwillingness to share data, conflict of interest, and publication bias. There are *solutions* to these and the great thing is that the internet makes it much easier to spot and actually makes it easier to address than previously…” +1
  • much of the later talk about clinical trials refers to work done in this paper: Prayle, A. P., Hurley, M. N., and Smyth, A. R. 2012. Compliance with mandatory reporting of clinical trial results on ClinicalTrials.gov: cross sectional study. BMJ 344. In it the authors report a rather disappointing 22% compliance rate with US Food and Drug Administration Amendments Act legislation that requires the results of all clinical trials to be reported within a reasonable time.
  • Finally, I was seriously impressed and pleased that Ed Yong extolled the virtues of Open Data from 45:07, to enable greater transparency and lower the barriers to critical re-analyses. This is something I most definitely would have raised had I been there; alas, I think I was at the “publishing research data: what’s in it for me?” session.

So yeah, the conference was great. Not all the sessions were brilliant – the ‘big data’ session was a little disappointing (no offence to any of the panel; just small attendance and little engagement) – perhaps because the topic is already well covered elsewhere, with alternative events like the recent O’Reilly Strata big data meetup dominating?

I’ll be there at next year’s Science Online London event for sure – whatever it’s called!

  • Ross, thanks for the write-up. In retrospect I think it would have been great to have a session on authoring tools. While I agree that the current process is madness, I would also not like it if authors were required to submit structured manuscripts in well-formed XML. Authors should focus on the science, and not also become experts in typesetting, graphic design, etc. The solution is probably somewhere in the middle, where authors submit manuscripts with some structure, e.g. vector graphics (preferably SVG) instead of bitmap images, and reference lists in BibTeX or RIS instead of free text following one of 1000+ citation styles.

    • Thanks for the comment! Perhaps next year there will be an authoring tools session then…?

      I see where you’re coming from in that it’s important to get good science published regardless of whether the scientist knows how to operate techy software tools. No-one wants this to be a barrier.

      Yet, I wonder if perhaps this couldn’t be a ‘call to arms’ to clever software people – those at Digital Science and such – to create new, easy-to-use tools to help scientists write more structured manuscripts from the get-go. This could surely save tens of millions of pounds per year? Strong incentive for someone to do it, I would think!

      I don’t care what tool I use to author, be it GitHub, Etherpad, Google Docs, Word, TeX… none of these is especially hard to use, IMO. We have the technology to do this; it’s about making it easier, plus education and encouragement.

      On a related note, on my hunt for Gold APCs from different publishers, I discovered that some clever OA publishers charge smaller APCs if the manuscript is provided by the submitting authors in a more structured form, e.g. TeX. Copernicus Publications are a good example of this.

      Perhaps this kind of scheme could be rolled out across more of the good OA publishers, to decrease the APCs that some critics think are a barrier to the full adoption of Open Access.

      Regardless of the OA argument, there are also the arguments of speeding up the publication process and introducing fewer errors, so I’d like to see more TA AND OA publishers at least allowing TeX (and other alternative-format) submissions, if not also encouraging them!

      +1 for reference lists as BibTeX / RIS or DOIs

      submitting free text is plain silly!

      • Ross, yes a ‘call to arms’ would be good. I would like to combine this with another idea, which is that we need more open source software if we want to do proper open science. Which is a blog post I hope to write in the next two weeks.

        Reducing APCs for properly formatted manuscripts is a good idea, and I have also seen this for authors who submit their references in an Endnote file. Copernicus is probably not typical, because their whole workflow is based on TeX – which is probably unusual for a publisher.

        • Thanks for mentioning me Ross. Just hope my clients are not reading this post too closely. ;-)

          Having made a name for myself at SOLO by way of shouting my mouth off, I would be happy to partake in a session on authoring tools next year.

          I do agree with Martin that a solution will only work if it is more attractive to authors than what they are already used to, and in general that is going to be MS Word.

          We are one of the few using primarily TeX (like Copernicus), but we are in the tiny minority. Most typesetters will convert TeX files into Word, then go from there. Yes, amazing, but true…

          • Great write-up, Ross, especially on the authoring tools/typesetting issue.

            Like Martin, I think it is unrealistic to ask authors to submit manuscripts that are well formatted using any particular tool. When you request this, some authors do it wonderfully, some adequately and some amazingly badly. As a journal copyeditor I have seen some horrendous misuses of Word (some examples are in one of my blog posts); things like using spaces to format tables or adding a hard return at the end of every line are not that uncommon. If you can think of a bizarre kind of formatting, someone has almost certainly submitted a paper using it.

            What we need is tools that allow authors to format properly (in XML, say) if they have the skills and are willing to take the time (or pay someone who has), but that also allow others (copyeditors and typesetters) to easily convert bad formatting into good. The reason why copyeditors almost universally use Word is because of its powerful macros that can be used by non-programmers, and I have yet to find another piece of software that can do scripted searches and replacements as well as it does without the user having to learn a programming language. I would be very happy to collaborate with a programmer, as I have some basic programming knowledge and know what the problem is, to develop an open-source alternative for copyeditors to use for manuscript formatting. This is probably possible in LaTeX but I have not ventured far into that yet (and my fellow copyeditors mostly find it completely baffling). Adding a user interface to LaTeX might be enough. Or making the macros in LibreOffice easier to understand and documenting them better. Or adding scripting to Google Docs documents (it is currently only available for spreadsheets).
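The kind of scripted clean-up Anna describes doesn’t strictly need Word macros – a rough sketch in Python (with hypothetical clean-up rules, not any real copyediting tool’s behaviour) of converting bad mechanical formatting into good might look like:

```python
# A sketch of scripted copyediting clean-up: joining hard-wrapped lines
# and collapsing runs of spaces used to fake layout. The rules here are
# illustrative examples, not a complete or real tool.
import re

def clean_manuscript(text):
    # Join hard-wrapped lines: a lone newline (not a paragraph break)
    # is treated as a soft break within a paragraph.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces (e.g. spaces used to fake table columns).
    text = re.sub(r" {2,}", " ", text)
    # Remove stray space before punctuation.
    text = re.sub(r" +([,.;:?!])", r"\1", text)
    return text.strip()

raw = "This sentence was\nhard-wrapped by the author ,  with    stray spaces."
print(clean_manuscript(raw))
# → This sentence was hard-wrapped by the author, with stray spaces.
```

The point is less the specific rules than that they are scripted and shareable, which is what an open-source alternative to Word macros would give copyeditors.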

            Reducing APCs for authors who submit well formatted manuscripts would make sense, because the publisher would save the cost of paying copyeditors and typesetters to put bad formatting right. But it can be difficult to tell what’s good or bad until after it has been checked carefully, so I’d be wary of giving a discount until after the copyediting has been done.

            I look forward to more discussion on these issues. I would also be interested in being involved in a discussion on authoring tools.

          • Thanks for your comment.

            I’m really surprised that most copyeditors use Word. As you say, with a little knowledge of programming one could do some wondrous things… perhaps copyeditors might benefit from learning emacs or vi?

            I was recently made aware of this call for ‘peer-editing’ (Johnston, 2009) with relevance to the current discussion, perhaps academics need to push for more control over the copyediting / editorial roles at journals?

            I think the needs of science have outgrown Word tbh, yet in many quarters this hasn’t been recognised and worse still people are (as per usual) actively resistant to change. Will we still be using MS Word in 2020 for science? I hope not…!

  • I was working as a copyeditor for Springer way back in the day, and was one of the people who worked on drafting the Springer style guide, including the references section.

    I was involved in getting the Mendeley CSL editor started, and I’ve seen how typesetters ignore submitted formats, and how academics slave over them (some institutions mark down essays for incorrect formatting).

    Such loss, such waste, the hours, the hours. How much potential thought, gone.

    eLife accepts any reasonably formatted reference list. Typesetters use a lot of auto processing, the marginal cost of accepting any format is minimal.

    We also publish our article XML – I mean, why wouldn’t you?

    • I also meant to mention that I would be well up for helping with a session on authoring tools.

    • Great to hear that about eLife. I hope they’ve made that clear in the ‘author requirements’ section.

      I know some supervisors make their students spend hours and hours combing through reference lists making them *as perfect as possible* before submission – this wasted effort needs to be eliminated ASAP!

  • Sweet of you to refer to me as a “High-level & high-profile academic” – but has that assertion been peer reviewed? ;-)

    Sorry not to have met you in person at the conference.

    • No problem. I’ve introduced myself before at the “Open access: is it open season on traditional scientific publishing?” debate at Imperial.

      You may well also see me hanging around Imperial at times while I’m writing-up my PhD thesis. Dr Steve Cook’s invited me to give a talk to the Life Sci undergrads at some point too on blogging & twitter (I think). We’ll no doubt meet again :)