Show me the data!

[Update 2015-03-13: I have blogged further about this here and provided a recap here. This post has been viewed over 10,000 times. Clearly some people want to sweep this under the carpet and pretend this is just ‘a storm in a teacup’ but it did happen and people do care about this. Thanks to everyone who spread the word.]

Today, Elsevier (RELX Group) illegally sold me a Creative Commons Attribution-NonCommercial-NoDerivatives licensed article:

Colson, P. et al. HIV infection en route to endogenization: two cases. Clin Microbiol Infect 20, 1280-1288 (2014).

I’m really not happy about it. I don’t think the research funders will be happy about it either. Especially not the authors (who are the copyright holders here).

Below is a screenshot of how the content was illegally on offer for sale, for $31.50 + tax.

[Screenshot: the article on offer for sale via ScienceDirect for $31.50 + tax]

To investigate whether it really was on sale, I decided to make a test purchase. Just to be absolutely sure. Why not? The abstract looked interesting, and the abstract was all I was allowed to read. I wanted to know more.

Below is the email receipt I received confirming my purchase of the content. I have crudely redacted my postal address but it’s otherwise unaltered:

[Image: email receipt confirming the purchase]

So what’s the problem here?

The article was originally published online by Wiley. As clearly indicated in the document, the copyright holders are the authors. The work was licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).

The terms of this widely used license clearly state: “You may not use the material for commercial purposes.”

Wiley respect this license. They make this content freely available on their website here. The authors, or their research funder or institution, probably paid Wiley money to make sure that the article could be made freely available to the world.

But tonight, Elsevier were selling it to me and all the world via their ScienceDirect platform.
This is clearly copyright infringement.

I have tweeted Elsevier employees @wisealic & @TomReller to see how I can get a refund for my purchase at the very least. This article should never have been on sale.

I have also contacted the corresponding author (Didier) to see what his thoughts are.
I do hope the authors will take legal action against Elsevier for their criminal misdeeds here.

My full comments on the PLOS ONE manuscript submission modelling paper:


On 27 January 2015 at 23:05, Chris Woolston <REDACTED> wrote:

Dr. Mounce,

Hello again. I contacted you a while ago for my Nature column on the intersection of science and social media.

Yep I remember.

I’m wondering if I could once more ask for your help. (This is what you get for being a prolific and articulate tweeter.)

Sure why not? Thanks for the compliment :)

The next edition will look at the PLoS report on the optimum strategy for submitting papers.

Salinas S, Munch SB (2015) Where Should I Send It? Optimizing the Submission Decision Process. PLoS ONE 10(1): e0115451. doi: 10.1371/journal.pone.0115451

A worthy choice. It relates to my most recent research too… I have a preprint in which I comprehensively demonstrate that information published in PLOS ONE is substantially more discoverable than information published in paywalled journals – if other researchers can’t discover your paper when searching for relevant terms, they probably won’t cite it…

Mounce R. (2015) Dark Research: information content in many modern research papers is not easily discoverable online. PeerJ PrePrints 3:e773v1 http://dx.doi.org/10.7287/peerj.preprints.773v1

This may help to explain one (but not the only) causative mechanism behind the frequently observed open access citation advantage.

PLOS ONE is both an open access journal AND a technically excellent content platform, and thus it is near-perfectly full-text indexed in Google Scholar. Other journals operating a paywall, or with more simplistic content provision (e.g. PDF only), are not well indexed in Google Scholar and thus may suffer in terms of citations.

I saw your tweet regarding “scoops.” If you have a moment, I would appreciate a brief elaboration. Isn’t there some extra value in a scoop?

“we assume that a publication that has been scooped has negligible value” journals.plos.org/plosone/articl… Replications are good. Not worthless! — https://twitter.com/rmounce/status/559744669740171264
Some academics have an odd psychological complex around this thing called ‘scooping’. The authors of this paper are clearly strong believers in scooping. I don’t believe in scooping myself – it’s a perverse misunderstanding of good scientific practice. I believe what happens is that someone publishes something interesting: useful data testing a novel hypothesis. Then somewhere else another academic goes “oh no, I’ve been scooped!” without realising that even if they’re testing exactly the same hypothesis, their data & methods are probably different in some or many respects – independently generated and thus extremely useful to science as a replication, even if the conclusions from the data are essentially the same.
Many papers are published, deliberately, testing the same hypothesis on different species, across species, in different countries or habitats, under different conditions – these are not generally labelled ‘already scooped papers’, although under this scheme of thought perhaps they should be? Particularly in lab or field ecology, I find it extremely unlikely that two independent groups could possibly go out and collect data on *exactly* the same hypothesis, species, population, area… They’d bump into one another, surely?
It’s only really with entirely computational, theoretical ecology that it might be possible for two independent groups to be working on exactly the same hypothesis, with roughly the same method, at the same time. But even here, subtle differences in parameter choice will produce two different experiments, and different, independent implementations are useful for validating each other. In short, scooping is a figment of the imagination, in my opinion. There should be no shame in being ‘second’ to replicate or experimentally test a hypothesis. All interesting hypotheses should be tested multiple times by independent labs, so REPLICATION IS A GOOD THING.
I suggest the negative psychology around ‘scooping’ in academia has probably arisen, in part, from the perverse & destructive academic culture of chasing publication in high impact factor journals. Such journals will typically only accept a paper if it is the first to test a particular hypothesis, regardless of the robustness of the approach used – hence the nickname ‘glamour publications’ / glam pubs. Worrying about getting scooped is not healthy for science. We should embrace, publish, and value independent replications.
With relevance to the PLOS ONE paper – it’s a fatal flaw in their model that they assumed ‘scooped’ (replication) papers have negligible value. This is a false assumption. I would like to see an updated set of calculations where ‘scooped’ (replication) papers are given various parameterizations between 10% & 80% of the value of a completely novel ‘not-scooped’ paper. In such a model I’d expect submitting to journals with efficient, quick submission-to-publication times to be optimal – journals such as PeerJ, F1000Research & PLOS ONE would probably come out on top. Many academics who initially think they’ve been mildly or partially scooped rework their paper, perhaps do an additional experiment, and then still proceed to publish it. This reality is not reflected in the assumption of “negligible value”.
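To make that concrete, here’s a toy back-of-envelope calculation – emphatically NOT the Salinas & Munch model; every number in it is an illustrative assumption of mine – showing how the fast-journal vs. glam-journal comparison shifts as the value assigned to a ‘scooped’ paper rises:

# Toy sketch only: a fast, soundness-only journal (90% acceptance, 3 months
# to publication, payoff 1) vs. a glam journal (10% acceptance, 12 months,
# payoff 10), with an assumed 2% monthly risk of being scooped.
for s in 0.1 0.2 0.4 0.6 0.8; do    # s = value of a scooped paper (fraction of full value)
  awk -v s="$s" 'BEGIN {
    p = 0.02                        # assumed monthly scoop risk
    fast = 0.9 * ((1-p)^3  * 1  + (1-(1-p)^3)  * s * 1)
    glam = 0.1 * ((1-p)^12 * 10 + (1-(1-p)^12) * s * 10)
    printf "s=%.1f  fast=%.3f  glam=%.3f\n", s, fast, glam
  }'
done

Even this crude sketch shows the ranking of venues can flip depending on the value granted to scooped papers – which is exactly why hard-coding that value to (near) zero matters so much.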

And don’t scientists generally look for an outlet that will publish their work sooner than later?

Some do. I do. But others chase high impact factor, glamour publication – this is silly, and in many cases results in a high-risk, suboptimal strategy. I know people who essentially had to quit academia because they chose this high-risk approach (and failed / didn’t get lucky) rather than just publishing their work in decent outlets that do the job appropriately.

I suppose that’s a big part of the decision process: Impact vs. expediency. Did any of the other points in the paper strike your attention?

It’s great news for PLOS ONE. Many ecologists have a strange & irrational distaste for PLOS ONE, particularly in the UK – often it’s partly a reticence around open access, but many also seem to wilfully misunderstand PLOS ONE’s review process: reviewing for scientific soundness, not for perceived potential ‘impact’. This paper provides solid evidence that if you want your work to be cited, PLOS ONE is a great place to send your work.
Citations aren’t the be-all & end-all though. It’s dangerous to encourage publication strategies based purely on maximising the number of citations. Such thinking encourages sensationalism & ‘link-bait’ article titles, at a cost to robust science. To be highly cited is NOT the purpose of publishing research. Brilliant research that saves lives, reduces global warming, or has some other concrete real-world impact can have a fairly low absolute number of citations. Likewise, research in a popular field or topic can be highly cited simply because many people are also publishing in that area. Citations don’t necessarily equate to good scholarship or ‘worthiness’.

I would welcome a brief response over email, or perhaps we could schedule a chat on the phone tomorrow. I’m in the US, and I’m generally not available before 3 p.m. your time. Thank you.

I’ll skip the phone chat if that’s okay. I’ve failed to be brief but I’ve bolded bits I think are key.
All the best,
Ross


Open Research London launch event

January 27th, 2015 | Posted by rmounce in Open Access | Open Science

Last week, on Monday 19th January, I co-organised the first ever Open Research London event at Imperial College London, with the help of local organisers Jon Tennant & Torsten Reimer.


We invited two speakers for our first meeting: Chris Banks and Joe McArthur.

They both gave excellent talks, which were recorded on Imperial’s ‘Panopto’ recording system. The recordings are now publicly available: CB’s talk is available to stream here & download here, and JMcA’s talk is available to stream here & download here.

We had lots of free swag to give away to attendees, including PLOS t-shirts, notebooks, USB sticks and ‘How Open Is It?‘ guides, as well as SPARC and OA Button stickers & badges – they seemed to go down well. I kept some swag back for the next event too, so if you didn’t get what you wanted this time, there will be more next time!

The speakers were kind enough to publicly post their slide-decks before their talks, so you can alternatively catch up with their content on Slideshare.

Chris Banks’ slides are embedded below:

Joe McArthur’s slides are below here:

I’ll refrain from naming names for the sake of privacy but what I most enjoyed about the event was the diversity of attendees. We had people who were ‘curious’ about Open Access and wanted to know more. We had a new PhD student, we had midway PhD students, librarians, open access publishers, and more… I believe one attendee might even have travelled back to Brighton after the event! In terms of affiliations, we had attendees from Jisc, The Natural History Museum London, Imperial College (two different campuses represented!), UCL, The National Institute for Medical Research (MRC), and AllTrials.

I was also mightily impressed that nearly all the attendees, including both speakers, happily joined us in the student union (Eastside) afterwards for discussions & networking over drinks – a real sense of community here, I think.

Can we do better next time? Sure we can, we must! Attendance was lower than I had hoped for but several people kindly messaged me afterwards to let me know they wanted to be there but couldn’t. I’ve no doubt that with warmer weather we’ll be able to double our attendance.


The next ORL meetup will be in mid or late March at UCL, further details TBC. 

Keep up-to-date with ORL via Twitter @OpenResLDN or our OKFN community group page: http://science.okfn.org/london-open-research/


I’m actively in the process of trying to grow the organising/steering committee for ORL. At the moment it’s just myself, Liz I-S and Jon Tennant. If you’re passionate about open research, open access, open data, reproducible research, citizen science, diversity in research, open peer-review etc… then get in contact with me: ross.mounce@gmail.com

I would love to have an OC that more broadly represents the variety of the open research community in London :)


Until next time…


Ross

[Update: I’ve submitted this idea as a FORCE11 £1K Challenge research proposal 2015-01-13. I may be unemployed from April 2015 onwards (unsolicited job offers welcome!), so I certainly might find myself with plenty of time on my hands to properly get this done…!]

Inspired by something I heard Stephen Curry say recently, and with a little bit of help from Jo McIntyre, I’ve started a project to compare EuropePMC author manuscripts with their publisher-made (mangled?) ‘version of record’ twins.

How different are author manuscripts from the publisher version of record? Or, to put it another way, what value do publishers add to each manuscript? With the aggregation & linkage provided by EuropePMC – an excellent service – we can rigorously test this.


In this blog post I’ll go through one paper I chose at random from EuropePMC:

Sinha, N., Manohar, S., and Husain, M. 2013. Impulsivity and apathy in Parkinson’s disease. J Neuropsychol 7:255-283. doi: 10.1111/jnp.12013 (publisher version) PMCID: PMC3836240 (EuropePMC version)

Method

A quick & dirty analysis with a simple tool that’s easy to use & available to everyone:

pdftotext -layout     (you’re welcome to suggest a better method by the way, I like hacking PDFs)

(P) = Publisher-version, (A) = Author-version
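Concretely, the conversion step looks something like this (the file names here are mine, purely for illustration):

pdftotext -layout wiley-jnp12013.pdf P.txt    # (P) the publisher PDF
pdftotext -layout PMC3836240.pdf A.txt        # (A) the EuropePMC author manuscript PDF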

Manual Post-processing – remove the header and footer crud from each, e.g. “262 Nihal Sinha et al.” (P) and “J Neuropsychol. Author manuscript; available in PMC 2013 November 21.” (A)

Automatic Post-processing – I’m not interested in numbers, punctuation, or words of 3 letters or fewer, so I applied this bash one-liner:

strings $inputfile | tr '[A-Z]' '[a-z]' | sed 's/[[:punct:]]/ /g' | sed 's/[[:digit:]]/ /g' | sed 's/ /\n/g' | awk 'length > 3' | sort | uniq -c | sort -nr > $outputfile
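For readability, here’s the exact same pipeline spread over several lines, with a comment per stage:

strings "$inputfile" |         # pull printable text strings from the file
  tr '[A-Z]' '[a-z]' |         # lower-case everything
  sed 's/[[:punct:]]/ /g' |    # replace punctuation with spaces
  sed 's/[[:digit:]]/ /g' |    # replace digits with spaces
  sed 's/ /\n/g' |             # split into one token per line
  awk 'length > 3' |           # keep only words longer than 3 letters
  sort | uniq -c | sort -nr > "$outputfile"    # ranked word-frequency list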

Then I just manually diff’d the resulting word lists – there’s so little difference it’s easy for this particular pair.
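For this pair that was simply (again, illustrative file names):

diff P-words.txt A-words.txt    # the ranked word lists produced above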


Results

The correspondence line changed slightly from this in the author version:

Correspondence should be addressed to Nuffield Department of Clinical Neurosciences and Department Experimental Psychology, Oxford University, Oxford OX3 9DU, UK (masud.husain@ndcn.ox.ac.uk). (A)

To this in the publisher version (I’ve added bold-face to highlight the changes):

Correspondence should be addressed to Masud Husain, Nuffield Department of Clinical Neurosciences and Department Experimental Psychology, Oxford University, Oxford OX3 9DU, UK (e-mail: masud.husain@ndcn.ox.ac.uk). (P)


Reference styling has been changed. Why, I don’t know – it seems a completely pointless change. Either style seems perfectly functional to me tbh:

Drijgers RL, Dujardin K, Reijnders JSAM, Defebvre L, Leentjens AFG. Validation of diagnostic criteria for apathy in Parkinson’s disease. Parkinsonism & Related Disorders. 2010; 16:656–660. doi:10.1016/j.parkreldis.2010.08.015. [PubMed: 20864380] (A)

to this in the publisher version:

Drijgers, R. L., Dujardin, K., Reijnders, J. S. A. M., Defebvre, L., & Leentjens, A. F. G. (2010). Validation of diagnostic criteria for apathy in Parkinson’s disease. Parkinsonism & Related Disorders, 16, 656–660. doi:10.1016/j.parkreldis.2010.08.015 (P)

In the publisher-version (P) only, “Continued” has been added below some tables to acknowledge that they overflow onto the next page. Arguably the publisher has made the tables worse: they’ve rotated them sideways (landscape), so they now overflow onto other pages, whereas in the author-version (A) they are portrait-oriented and hence each fits entirely on one page.

Finally, and most intriguingly, some of the figure text comes out only in the publisher-version (P). In the author-version (A) the figure text is entirely image pixels, not copyable text. Yet the publisher version has introduced some clearly imperfect figure text. Look closely and you’ll see that in some places, e.g. “Dyskinetic state” in figure 2c) of (P), the ‘ti’ has been ligatured and copies out as a theta-like symbol:

DyskineƟc state


Discussion


I don’t know about you, but for this particular article it doesn’t seem like the publisher has really done all that much, aside from adding their own header & footer material, some copyright stamps & their journal logo – oh, and ‘organising peer-review’. How much do we pay academic publishers for these services? Billions? Is it worth it?

I plan to sample at least 100 ‘twinned’ manuscript-copies and see what the average difference is between author-manuscripts and publisher-versions. If the above is typical of most then this will be really bad news for the legacy academic journal publishers… Watch this space!
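If you want to play along, here’s roughly how one might pull a candidate sample of PMCIDs from the EuropePMC REST API. Caveat: the AUTH_MAN:Y filter for author manuscripts, and the presence of a pmcid field in the JSON results, are my reading of their search syntax – check the EuropePMC documentation before relying on either:

curl -s 'https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=AUTH_MAN:Y&format=json&pageSize=100' |
  grep -o '"pmcid":"PMC[0-9]*"' |    # crude scrape of PMC IDs; a proper JSON parser would be more robust
  sort -u | shuf | head -n 100       # de-duplicate and take a random sample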


Thoughts or comments on how to improve the method, or pointers to relevant papers on this subject, are welcome. Collaboration is welcome too – this is an activity that scales well across collaborators.