Show me the data!

My full comments on the PLOS ONE manuscript submission modelling paper:


On 27 January 2015 at 23:05, Chris Woolston <REDACTED> wrote:

Dr. Mounce,

Hello again. I contacted you awhile ago for my Nature column on the intersection of science and social media.

Yep I remember.

I’m wondering if I could once more ask for your help. (This is what you get for being a prolific and articulate tweeter.)

Sure why not? Thanks for the compliment :)

The next edition will look at the PLoS report on the optimum strategy for submitting papers.

Salinas S, Munch SB (2015) Where Should I Send It? Optimizing the Submission Decision Process. PLoS ONE 10(1): e0115451. doi: 10.1371/journal.pone.0115451
A worthy choice. It relates to my most recent research too… I have a preprint in which I comprehensively demonstrate that information published in PLOS ONE is substantially more discoverable than information published in paywalled journals – if other researchers can’t discover your paper when searching for relevant terms, they probably won’t cite it…
Mounce R. (2015) Dark Research: information content in many modern research papers is not easily discoverable online. PeerJ PrePrints 3:e773v1
This may help to explain a (but not the only) causative mechanism behind the frequently observed open access citation advantage.
PLOS ONE is both an open access journal AND a technically excellent content platform, thus it is near-perfectly full-text indexed in Google Scholar. Other journals operating a paywall, or with a more simplistic content platform & content provision (e.g. PDF only), are not well indexed in Google Scholar & thus may suffer in terms of citations.

I saw your tweet regarding “scoops.” If you have a moment, I would appreciate a brief elaboration. Isn’t there some extra value in a scoop?

“we assume that a publication that has been scooped has negligible value”… Replications are good. Not worthless!
Some academics have an odd psychological complex around this thing called ‘scooping’. The authors of this paper are clearly strong believers in scooping. I don’t believe in scooping myself – it’s a perverse misunderstanding of good scientific practice. I believe what happens is this: someone publishes something interesting – useful data testing a novel hypothesis – then somewhere else another academic goes “oh no, I’ve been scooped!” without realising that even if they’re testing exactly the same hypothesis, their data & methods are probably different in some or many respects – independently generated and thus extremely useful to science as a replication, even if the conclusions from the data are essentially the same.
Many papers are often published, deliberately, testing the same hypothesis on different species, across species, in different countries or habitats, under different conditions – these are not generally labelled ‘already scooped papers’ although under this scheme of thought, perhaps they should be? Particularly in lab or field ecology I find it extremely unlikely that two independent groups could possibly go out and collect data on *exactly* the same hypothesis, species, population, area… They’d bump into one another, surely?
It’s only really with entirely computational theoretical ecology that it might be possible for two independent groups to be working on exactly the same hypothesis, with roughly the same method at the same time. But even here, subtle differences in parameter choice will produce two different experiments & different, independent implementations are useful to validate each other. In short, scooping is a figment of the imagination in my opinion. There should be no shame in being ‘second’ to replicate or experimentally test a hypothesis. All interesting hypotheses should be tested multiple times by independent labs, so REPLICATION IS A GOOD THING.
I suggest the negative psychology around ‘scooping’ in academia has probably arisen in part from the perverse & destructive academic culture of chasing publication in high impact factor journals. Such journals typically will only accept a paper if it is the first to test a particular hypothesis, regardless of the robustness of approach used – hence the nickname ‘glamour publications’ / glam pubs. Worrying about getting scooped is not healthy for science. We should embrace, publish, and value independent replications.
With relevance to the PLOS ONE paper – it’s a fatal flaw in their model that they assumed ‘scooped’ (replication) papers have negligible value. This is a false assumption. I would like to see the calculations re-run with ‘scooped’ (replication) papers given various parameterizations between 10% & 80% of the value of a completely novel ‘not-scooped’ paper. In such a model I’d expect submitting to journals with efficient, quick submission-to-publication times to be optimal – journals such as PeerJ, F1000Research & PLOS ONE would probably come out on top. Many academics who initially think they’ve been mildly or partially scooped rework their paper, perhaps do an additional experiment, and then still proceed to publish it. This reality is not reflected in the assumption of “negligible value”.
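To make the point concrete, here’s a toy expected-value calculation along the lines I’m suggesting. All the numbers are invented for illustration (they are not from the Salinas & Munch model): a ‘scooped’ paper keeps a fraction f of a novel paper’s value, and the risk of being scooped grows with the journal’s time-to-publication T.

```shell
# Toy expected-value sketch (illustrative numbers only, not from the paper):
# assume a 5% monthly hazard of being 'scooped' while a paper sits in review,
# and that a scooped (replication) paper retains fraction f of full value.
for f in 0.1 0.5 0.8; do
  for T in 2 12; do   # e.g. a fast megajournal vs a slow glamour journal
    awk -v f="$f" -v T="$T" 'BEGIN {
      p  = 1 - exp(-0.05 * T)   # probability of being scooped before publication
      ev = (1 - p) + p * f      # value of an unscooped paper normalised to 1
      printf "f=%.1f  T=%2d months  E[value]=%.3f\n", f, T, ev
    }'
  done
done
```

Even with f as low as 0.1, the fast journal’s expected value stays above 0.9 while the slow journal’s drops toward 0.6 – which is exactly the intuition behind expecting the quick-turnaround journals to come out on top.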

And don’t scientists generally look for an outlet that will publish their work sooner than later?

Some do. I do. But others chase high impact factor publication & glamour publication – this is silly, and in many cases results in a high-risk suboptimal strategy. I know people who essentially had to quit academia because they chose this high-risk approach (and failed / didn’t get lucky) rather than just publishing their work in decent outlets that do the job appropriately.

I suppose that’s a big part of the decision process: Impact vs. expediency. Did any of the other points in the paper strike your attention?

It’s great news for PLOS ONE. Many ecologists have a strange & irrational distaste for PLOS ONE, particularly in the UK – often it’s partly a reticence around open access, but many also seem to wilfully misunderstand PLOS ONE’s review process: reviewing for scientific soundness, not perceived potential ‘impact’. This paper provides solid evidence that if you want your work to be cited, PLOS ONE is a great place to send it.
Citations aren’t the be-all & end-all though. It’s dangerous to encourage publication strategies based purely on maximising the number of citations. Such thinking encourages sensationalism & ‘link-bait’ article titles, at a cost to robust science. To be highly-cited is NOT the purpose of publishing research. Brilliant research that saves lives, reduces global warming, or has some other concrete real-world impact can have a fairly low absolute number of citations. Likewise, research in a popular field or topic can be highly-cited simply because many people are also publishing in that area. Citations don’t necessarily equate to good scholarship or ‘worthiness’.

I would welcome a brief response over email, or perhaps we could schedule a chat on the phone tomorrow. I’m in the US, and I’m generally not available before 3 p.m. your time. Thank you.

I’ll skip the phone chat if that’s okay. I’ve failed to be brief but I’ve bolded bits I think are key.
All the best,


Last week, on Monday 19th January, I co-organised the first ever Open Research London event at Imperial College London, with the help of local organisers Jon Tennant & Torsten Reimer.


We invited two speakers for our first meeting: Chris Banks and Joe McArthur.

They both gave excellent talks, which were recorded on Imperial’s ‘Panopto’ recording system, and the recordings are now publicly available: Chris Banks’ talk is available to stream here & download here; Joe McArthur’s talk is available to stream here & download here.


We had lots of free swag to give away to attendees, including PLOS t-shirts, notebooks, USB sticks and ‘How Open Is It?’ guides, as well as SPARC and OA Button stickers & badges – they seemed to go down well. I kept some swag back for the next event too, so if you didn’t get what you wanted this time, there will be more next time!

The speakers were kind enough to publicly post their slide-decks before their talks so you can alternatively catch-up with their content on Slideshare.

Chris Banks’ slides are embedded below:

Joe McArthur’s slides are below here:

I’ll refrain from naming names for the sake of privacy but what I most enjoyed about the event was the diversity of attendees. We had people who were ‘curious’ about Open Access and wanted to know more. We had a new PhD student, we had midway PhD students, librarians, open access publishers, and more… I believe one attendee might even have travelled back to Brighton after the event! In terms of affiliations, we had attendees from Jisc, The Natural History Museum London, Imperial College (two different campuses represented!), UCL, The National Institute for Medical Research (MRC), and AllTrials.

I was also mightily impressed that nearly all the attendees, including both speakers happily joined us in the student union (Eastside) afterwards for discussions & networking over drinks – a real sense of community here I think.

Can we do better next time? Sure we can, we must! Attendance was lower than I had hoped for but several people kindly messaged me afterwards to let me know they wanted to be there but couldn’t. I’ve no doubt that with warmer weather we’ll be able to double our attendance.


The next ORL meetup will be in mid or late March at UCL, further details TBC. 

Keep up-to-date with ORL via Twitter @OpenResLDN or our OKFN community group page:


I’m actively in the process of trying to grow the organising/steering committee for ORL. At the moment it’s just myself, Liz I-S and Jon Tennant. If you’re passionate about open research, open access, open data, reproducible research, citizen science, diversity in research, open peer-review etc… then get in contact with me:

I would love to have an OC that more broadly represents the variety of the open research community in London :)


Until next time…



[Update: I’ve submitted this idea as a FORCE11 £1K Challenge research proposal 2015-01-13. I may be unemployed from April 2015 onwards (unsolicited job offers welcome!), so I certainly might find myself with plenty of time on my hands to properly get this done…!]

Inspired by something I heard Stephen Curry say recently, and with a little bit of help from Jo McIntyre I’ve started a project to compare EuropePMC author manuscripts with their publisher-made (mangled?) ‘version of record’ twins.

How different are author manuscripts from the publisher version of record? Or to put it another way, what value do publishers add to each manuscript? With the aggregation & linkage provided by EuropePMC – an excellent service – we can rigorously test this.


In this blog post I’ll go through one paper I chose at random from EuropePMC:

Sinha, N., Manohar, S., and Husain, M. 2013. Impulsivity and apathy in Parkinson’s disease. J Neuropsychol 7:255-283. doi: 10.1111/jnp.12013 (publisher version) PMCID: PMC3836240 (EuropePMC version)


A quick & dirty analysis with a simple tool that’s easy to use & available to everyone:

pdftotext -layout (you’re welcome to suggest a better method by the way – I like hacking PDFs)

(P) = Publisher-version , (A) = Author-version

Manual post-processing – remove the header and footer crud from each, e.g. “262 Nihal Sinha et al.” (P) and “J Neuropsychol. Author manuscript; available in PMC 2013 November 21.” (A)

Automatic post-processing – I’m not interested in numbers, punctuation, or words of 3 letters or fewer, so I applied this bash one-liner:

strings "$inputfile" | tr '[A-Z]' '[a-z]' | sed 's/[[:punct:]]/ /g' | sed 's/[[:digit:]]/ /g' | sed 's/ /\n/g' | awk 'length > 3' | sort | uniq -c | sort -nr > "$outputfile"

Then I just manually diff’d the resulting word lists – there’s so little difference it’s easy for this particular pair.
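For anyone wanting to reproduce the recipe end-to-end, here’s a self-contained sketch: the one-liner wrapped in a function, run on two stand-in text files, then diffed. The filenames and sample text are invented for the demo – the real inputs would come from `pdftotext -layout paper.pdf paper.txt`.

```shell
# The word-frequency pipeline, wrapped as a reusable function.
wordlist () {
  strings "$1" | tr '[A-Z]' '[a-z]' | sed 's/[[:punct:]]/ /g' \
    | sed 's/[[:digit:]]/ /g' | sed 's/ /\n/g' \
    | awk 'length > 3' | sort | uniq -c | sort -nr
}

# Stand-in sample text (real input would be pdftotext output).
printf 'Impulsivity and apathy in Parkinson disease\n'        > author.txt
printf 'Impulsivity and apathy in Parkinson disease (2013)\n' > publisher.txt

wordlist author.txt    > author.words
wordlist publisher.txt > publisher.words
# The trailing "(2013)" is stripped by the punctuation/digit filters,
# so the two word lists come out identical and diff prints nothing.
diff author.words publisher.words && echo 'no difference in word lists'
```

Note the pipeline deliberately throws away short words, numbers and punctuation, so it measures differences in substantive vocabulary rather than formatting noise – which is exactly what we want for the author-vs-publisher comparison.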



The correspondence line changed slightly from this in the author version:

Correspondence should be addressed to Nuffield Department of Clinical Neurosciences and Department Experimental Psychology, Oxford University, Oxford OX3 9DU, UK ( . (A)

To this in the publisher version (I’ve added bold-face to highlight the changes):

Correspondence should be addressed to Masud Husain, Nuffield Department of Clinical Neurosciences and Department Experimental Psychology, Oxford University, Oxford OX3 9DU, UK (e-mail: (P)


Reference styling has been changed. Why, I don’t know – it seems a completely pointless change. Either style seems perfectly functional to me tbh. It changed from this in the author version:

Drijgers RL, Dujardin K, Reijnders JSAM, Defebvre L, Leentjens AFG. Validation of diagnostic criteria for apathy in Parkinson’s disease. Parkinsonism & Related Disorders. 2010; 16:656–660. doi:10.1016/j.parkreldis.2010.08.015. [PubMed: 20864380] (A)

to this in the publisher version:

Drijgers, R. L., Dujardin, K., Reijnders, J. S. A. M., Defebvre, L., & Leentjens, A. F. G. (2010). Validation of diagnostic criteria for apathy in Parkinson’s disease. Parkinsonism & Related Disorders, 16, 656–660. doi:10.1016/j.parkreldis.2010.08.015 (P)

In the publisher version only (P), “Continued” has been added below some tables to acknowledge that they overflow onto the next page. Arguably the publisher has made the tables worse: they’ve turned them sideways (landscape) so they now overflow onto other pages, whereas in the author version (A) they are portrait-orientated and hence each fits entirely on one page.


Finally, and most intriguingly, some of the figure text comes out only in the publisher version (P). In the author version (A) the figure text is entirely image pixels, not copyable text. Yet the publisher version has introduced some clearly imperfect figure text: look closely and you’ll see in some places, e.g. “Dyskinetic state” in figure 2c of (P), that the ‘ti’ has been ligatured and copies out as a theta-like symbol:

DyskineƟc state
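A quick way to hunt for this kind of ligature artifact at scale is to scan the extracted text for non-ASCII bytes. A minimal sketch, assuming GNU grep (the filename and sample line are invented for the demo):

```shell
# Stand-in for real pdftotext output; \306\237 is the UTF-8 encoding of 'Ɵ'.
printf 'Dyskine\306\237c state\nnormal line\n' > publisher.txt

# List every line containing a non-ASCII byte – ligature leftovers like 'Ɵ'
# show up immediately (GNU grep's -P Perl-regex flag is assumed).
grep -nP '[^\x00-\x7F]' publisher.txt
```

This won’t catch every extraction error, of course – only the ones that leave a non-ASCII character behind – but it’s a cheap first-pass screen before any manual inspection.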




I don’t know about you, but for this particular article, it doesn’t seem like the publisher has really done all that much aside from add their own header & footer material, some copyright stamps & their journal logo – oh, and ‘organizing peer-review’. How much do we pay academic publishers for these services? Billions? Is it worth it?

I plan to sample at least 100 ‘twinned’ manuscript-copies and see what the average difference is between author-manuscripts and publisher-versions. If the above is typical of most then this will be really bad news for the legacy academic journal publishers… Watch this space!


Thoughts or comments as to how to improve the method, or relevant papers to read on this subject are welcome. Collaboration welcome too – this is an activity that scales well between collaborators.

So, apparently Elsevier are launching a new open access mega-journal some time this year, jumping on the bandwagon of similar efforts from almost every other major publisher. A lovely acknowledgement of the roaring success of PLOS ONE, which did it first a long time ago.

They’re only ~8 years behind, but they’re learning. I for one am pleased they are asking the research community what they want from this new journal. One of their “key points” in the press release is: “the journal will be developed in close collaboration with the research community and will evolve in response to feedback”

Well, I’m a member of the research community. I’m a BBSRC-funded postdoc at the University of Bath. I publish research myself AND I re-use published research, so I have a dual perspective that Elsevier should find useful. Here’s my feedback on their new open access journal proposal:


  • Does the research community really need or want a new journal?

We have at least 27,000 other peer-reviewed journals (source: Ulrich’s). I can’t see anything in Elsevier’s proposal that’s really new, or better than anything that already exists – you’ll be hard-pressed to beat PeerJ. More journals add to the fragmentation of the research literature – it’s already hard to search across all these journals effectively. Why not just accept more volume in existing journals? It’d be great if you flipped The Lancet, Cell, and Trends in Ecology and Evolution to full (100%) open access journals, and rejected fewer submitted papers that present sound science. I genuinely do not know of any researcher who asked specifically for an additional new Elsevier journal.


  • Don’t call it ‘open access’ unless it actually is

The definition of open access always has been, and always will be this:

By “open access” to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited. (BOAI)

If you’re going to allow the CC-BY-NC-ND licence, then by definition you can’t call it an open access journal. Either disallow that restrictive non-open licence, or call this new journal a ‘free-to-read’ journal or a ‘public access’ journal – these are the established terms the research community uses for cost-free but not open journal content. Speak our language for a change, instead of deliberately opaque legalese.


  • Take feedback on the design of your new journal from the WORLD not just the research community

Approximately 80% of the world’s academic research is taxpayer or charitably funded. The world is therefore your customer, not just researchers. Ask the world what they want from your new journal.


  • Require data sharing alongside publication

Take inspiration from the Panton Principles: “Science is based on building on, reusing and openly criticising the published body of scientific knowledge” – help researchers do the best science possible by not allowing them any excuses not to share non-sensitive data with their colleagues. The ’email the author’ system has been widely shown not to work – in my own experience too.


  • Make peer reviews open for all to see, post-publication alongside the paper

At the time of review, you can do single or double blind, but after the manuscript is accepted and published, please publish the reviews alongside the accepted paper. The research community can then see for themselves how good peer review is at your new journal. Allow people to sign their reviews if they wish to (and personally I think this is best in most circumstances).


  • Encourage data citation

Do I really need to explain this one? Old school academic editors have apparently been striking these out at some journals. Please make all editors aware that this is both a good thing and is encouraged.


  • Encourage authors to provide their ORCIDs upon submission, (and ORCIDs for reviewers and editors too please)

This will help disambiguate who’s who, which is important when there are at least 7 million active researchers.


  • Charge a reasonable APC ($1350 or less), and be generous with fee waivers and discounts for those that cannot afford them

Anything more than $1350 per article for a new journal in 2015 is daylight robbery. For the first year of publication you should waive charges for everyone, as everyone else does.


  • Provide open, full text XML

Great for text-mining. We don’t need your API. Just give us the content.
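Open full-text XML isn’t a hypothetical ask – EuropePMC already serves article XML at a predictable per-PMCID URL, as I understand their endpoint pattern (shown here with the PMCID from the manuscript-comparison post above; the actual fetch needs a network connection, so the curl line is commented out):

```shell
# EuropePMC serves full-text XML at a predictable URL per PMCID – no API key,
# no registration. This is what 'just give us the content' looks like.
pmcid='PMC3836240'
url="https://www.ebi.ac.uk/europepmc/webservices/rest/${pmcid}/fullTextXML"
echo "$url"
# curl -s "$url" > "${pmcid}.xml"   # network step, commented out here
```

If Elsevier’s new journal exposed its articles this simply, text-miners could get on with their work with no special tooling at all.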


There you go Elsevier – that’s my feedback. If you can do ALL of the above or better, I might even publish with you myself. I have stated what I think you should do; it’s up to you now to implement it. I anticipate the launch of your glorious new journal. When your new journal comes out I shall revisit this post & score your new journal against it.


I encourage all other researchers & the scholarly poor who feel similarly to make their feelings known to Elsevier too, and to add points I have perhaps overlooked. I’d say good luck, Elsevier, but you don’t need luck with your fat profit margins – it’s simple to openly publish a good peer-reviewed research journal, so just get on and do it already.




Ross Mounce, PhD