Show me the data!

TL;DR summary: ESA data papers should be free to read but Wiley (ESA’s new publishing ‘partner’) just charged me $45.60 yesterday to access one of them. They have done this kind of ‘accidental’ profit-generation before, as have other big publishers.

John Wiley & Sons (whom I will refer to as ‘Wiley’ from now on) is not a very competent company when it comes to providing free or open access to research. Don’t take my word for that. Ask the Wellcome Trust: over 50% of articles that they had paid to be open access with Wiley were not compliant with their open access policy. I have had my own problems with Wiley too: this time last year I caught them selling access to thousands of articles that should have been free to access. They also paywalled an article I wrote which should have been free to access.

Despite all this, and the detailed letter I sent to the Ecological Society of America (ESA) back in 2013 during their open access consultation process, the ESA decided to switch to publishing with Wiley: a profit-driven company whose goals conflict with the goals of the society. I was very disappointed with this decision.


Now that the switch is complete, some problems are readily apparent. Wiley are charging $45.60 (inc. tax) a time for some bits of ESA journal content that ESA did not previously charge readers to access. I discovered this yesterday on Twitter thanks to Jaime Ashander & Stephanie Peacock. So I made a test purchase to see if Wiley really were charging for access to this free content (they were!). Below are tweets documenting this:


Amusingly, the first time I tried to buy access to the article, my bank blocked the transaction thinking it was a suspicious payment to a scammy company! Only after I confirmed with my bank was I actually allowed to purchase access to the data paper – it really IS hard to access research that is paywalled, even when you have the money to pay for it!

ESA have acknowledged the problem on Twitter and will see if I can get a refund on Monday:


There is more than meets the eye to this case.

Data papers are still a fairly new concept to most. Thus I honestly didn’t know what I’d be getting from behind the paywall when I paid for access – I did expect more than just the abstract. It would not surprise me if others could also make this mistaken assumption (we are wearily used to abstracts hiding much longer papers behind paywalls).

Charging the authors of ESA data papers $250 with the excuse that this is for “long-term hosting and maintenance” is absurd and unjustifiable. At the very most it should be $120, which is what Dryad charges, with a reminder that Figshare and Zenodo continue to sustainably archive data for free. Charging each and every reader outside the paywall a further $45 on top of this, just to read the abstract of an ESA data paper in PDF format, is ridiculous.

The cost of single-article purchases has now more than DOUBLED since ESA moved to Wiley. Below is a screencap I took from the old ESA publishing platform. ESA articles were paywalled for just $20 and that allowed 30-day access. Now with Wiley, the exact same content is available to me for $45.60 (inc. UK tax) and I only have a 24-hour permitted-access period. This price-hike and narrow access window are utterly absurd and unjustified. Is it any wonder everyone uses SciHub these days?

Does this help raise the awareness of ecological science?

The old paywall was half the price and gave 30-days access, not just 24 hours!
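To put numbers on that comparison, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above; the variable and function names are mine, and it takes the two prices at face value (I don't know how tax was handled in the old $20 price).

```python
# Figures as quoted in this post. The tax treatment of the old $20
# price is not known, so this is a rough comparison only.
OLD_PRICE_USD = 20.00    # old ESA platform, per article
OLD_ACCESS_DAYS = 30     # old access window
NEW_PRICE_USD = 45.60    # Wiley, per article (inc. UK tax)
NEW_ACCESS_DAYS = 1      # 24-hour access window


def price_ratio(old_price: float, new_price: float) -> float:
    """Return how many times more expensive the new price is."""
    return new_price / old_price


def cost_per_day(price: float, days: float) -> float:
    """Return the price divided by the days of access it buys."""
    return price / days


ratio = price_ratio(OLD_PRICE_USD, NEW_PRICE_USD)
old_daily = cost_per_day(OLD_PRICE_USD, OLD_ACCESS_DAYS)
new_daily = cost_per_day(NEW_PRICE_USD, NEW_ACCESS_DAYS)
daily_ratio = new_daily / old_daily

print(f"Sticker price ratio: {ratio:.2f}x")
print(f"Old cost per day of access: ${old_daily:.2f}")
print(f"New cost per day of access: ${new_daily:.2f}")
print(f"Per-day cost ratio: {daily_ratio:.1f}x")
```

On the sticker price alone that is about 2.28 times the old price. Per day of permitted access it works out at roughly 68 times more.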






I’m also frightened that ESA had no idea this was going on. This is exactly what happens when you give all your content to an unscrupulous oligopoly publisher like Wiley to sell on your behalf. It seems to me that many academic societies are receiving big fat cheques every year from their commercial publishing ‘partners’ and are completely ignoring where this money comes from and how it was generated. It’s well known that the academic publishing oligopoly is siphoning huge margins of money away from research. Why are academic societies so willingly complicit in this racket? It seems to me that turning a blind eye is a sadly common way of dealing with this impropriety: “Take the money, don’t ask questions!” As long as society members benefit (at the expense of the rest of the world), anything goes.

Some final questions…

  1. Does ESA know how much Wiley is charging libraries around the world for subscriptions to ESA’s journals?
  2. Does ESA actually know anything of the real cost of production and publishing services that Wiley provides – not the price Wiley says it costs (inc. unhealthy profit margin) but the actual cost?
  3. How many readers like me (‘the scholarly poor’) outside the paywall has Wiley charged for access to ESA data papers that should have been free to access?
  4. Given Wiley’s lack of transparency, can we trust them when they report back how many others have also bought access to these ESA data papers that should have been free?

Update 2016/04/09: Thankfully, I did eventually get a refund for this article purchase on 2016/04/08, although I still appear to have lost out due to currency conversion issues with my bank:




This has done the rounds on Twitter a lot recently, and justifiably-so but just in case you haven’t seen it yet…
I thought I’d quickly blog about this excellent graph published on a FrontiersIn blog late last year.
Source, Credit, Kudos, and Copyright: Pascal Rocha da Silva, originally posted here.


With data from 570 different journals, it appears to demonstrate that rejection rate (the percentage of papers submitted, but NOT accepted for publication at a journal) has no apparent correlation with journal impact factor.
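If you want to check a claim like this against your own list of journals, the usual first step is a correlation coefficient. Below is a minimal sketch of Pearson’s r in plain Python. The function name is mine, and the numbers in the usage example are made-up placeholder values, NOT read off the graph; I don’t know what statistic (if any) was computed for the original figure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values for illustration only (not real journals).
rejection_rates = [0.2, 0.4, 0.6, 0.8]   # fraction of submissions rejected
impact_factors = [3.0, 1.0, 1.0, 3.0]    # journal impact factor

r = pearson_r(rejection_rates, impact_factors)
print(f"Pearson r = {r:.3f}")
```

A value of r near zero, as in this toy example, means there is no linear relationship between the two variables. Real rejection-rate data would of course need to be sourced from the journals themselves.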


Why is this significant?


Well, a lot of people seem to think that ‘selectivity’ is good for research: that rejecting lots of perfectly valid papers submitted to a journal somehow ensures increased ‘quality’ (citations?) of the papers that are eventually accepted for publication. The fact is, high rejection rates in practice indicate that a lot of good research papers are being rejected just to satisfy an unjustified fetish for arbitrary and crude pre-publication filtering. This is important evidence for advocates of the ‘publish first, filter post-publication’ philosophy, as put into practice by journals such as F1000Research and Research Ideas and Outcomes.


Release early, release often?


Rejecting perfectly good/sound research causes delays in the dissemination of knowledge – rejected manuscripts have to be reformatted, resubmitted and re-reviewed elsewhere at great cost. The overwhelming majority of initially rejected manuscripts get published somewhere else, eventually. So why bother rejecting them in the first place, if all it does is waste time and effort?

Please show your friends the graph if they haven’t already seen it. I think data like this could change a lot of people’s minds…

Further Reading:

Similar findings have been reported before with smaller samples:
Schultz, D. M. 2010. Rejection rates for journals publishing in the atmospheric sciences. Bull. Amer. Meteor. Soc. 91:231-243 DOI: 10.1175/2009bams2908.1

I’ve written 29 blog posts this year! Still time for one more…

This work relates to my new postdoc at the University of Cambridge in Sam Brockington’s group.

I’ve been closely examining IUCN RedList data for plant taxa and found some rather odd things.

Out of the 100 or so plant species that the IUCN RedList asserts as ‘extinct’, at least 16 of them are growing alive and well somewhere in the world at the moment.

For some species even Wikipedia notes the conflict between reality and the ‘official’ IUCN assessment e.g. for Rauvolfia nukuhivensis.

Here are the 16 plant species that I think are incorrectly assessed as ‘extinct’ right now by the IUCN RedList:

Astragalus nitidiflorus, Cnidoscolus fragrans, Cynometra beddomei, Dipterocarpus cinereus, Dracaena umbraculifera, Madhuca insignis, Melicope cruciata, Ochrosia brownii, Ochrosia fatuhivensis, Ochrosia tahitensis, Pausinystalia brachythyrsum, Pouteria stenophylla, Rauvolfia nukuhivensis, Wendlandia angustifolia, Wikstroemia skottsbergiana, Wikstroemia villosa

In addition to the 16 above, and with less certainty, I also think the Hawaiian taxa Delissea kauaiensis and Delissea niihauensis might have some individuals still alive, according to this Department of Land and Natural Resources ‘Fact Sheet’ from 2013.


Why not harness the wisdom of the crowds and/or semi-automated text mining?


It’s remarkable that the IUCN RedList still lists some of these as ‘extinct’ when there are easily findable peer-reviewed articles reporting the rediscovery and hence extant status of these taxa. To their credit, many are listed as “needs updating” but still, if there are important updates to statuses why not just go in and make the change(s) to correct the record? The IUCN RedList page listing Wendlandia angustifolia as ‘extinct’ is possibly the worst example – it was reported as rediscovered back in the year 2000, more than a decade ago! The IUCN has had 15 years to update their incorrect assertion of ‘extinct’ for this taxon!

I can’t possibly go through the literature and check all other IUCN-listed plant taxa myself but this does seem like a great opportunity for ContentMine tools to help the IUCN RedList stay on top of the latest updates about IUCN RedListed taxa. See ‘Daily updates on IUCN Red List species’ for more on that idea.


Below I list sources of information relating to the 16 species that I think are definitely NOT extinct, despite being listed as such on the IUCN RedList.

Wahyu, Y., Wihermanto, N., Risna, R. A., and Ashton, P. S. 2013. Rediscovery of the supposedly extinct Dipterocarpus cinereus. Oryx 47:324.

Martínez-Sánchez, J. J., Segura, F., Aguado, M., Franco, J. A., and Vicente, M. J. 2011. Life history and demographic features of Astragalus nitidiflorus, a critically endangered species. Flora – Morphology, Distribution, Functional Ecology of Plants 206:423-432.

Lorence, D. and Butaud, J.-F. 2011. A reassessment of Marquesan Ochrosia and Rauvolfia (Apocynaceae) with two new combinations. PhytoKeys 4:95+

Viswanathan, M. B., Harrison Premkumar, E., and Ramesh, N. 2000. Rediscovery of Wendlandia angustifolia Wight ex Hook.f. (Rubiaceae), from Tamil Nadu, a species presumed extinct. J. Bombay Nat. Hist. Soc. 97(2):311-313.

Oppenheimer, H. 2011. New Hawaiian plant records for 2009. Records of the Hawaii Biological Survey for 2009–2010. Bishop Museum Occasional Papers 110:5–10 [notes the rediscovery of Wikstroemia villosa]

Shenoy et al. 2014. Extended distribution of Madhuca insignis (Radlk.) H. J. Lam. (Sapotaceae) – A Critically Endangered species in Shimoga District of Karnataka. ZOO’s PRINT  Volume XXIX, Number 6

Sudhi, K. S. 2012. Rediscovered tree still ‘extinct’ on IUCN Red List. The Hindu. [Cynometra beddomei]

Missouri Botanical Garden 2012. Umbrella Draceana. [Dracaena umbraculifera might be extinct in the wild, but it is still successfully grown in many botanical gardens!]






OpenCon 2015 Brussels was an amazing event. I’ll save a summary of it for the weekend but in the meantime, I urgently need to discuss something that came up at the conference.

At OpenCon, it emerged that Elsevier have apparently been blocking Chris Hartgerink’s attempts to access relevant psychological research papers for content mining.

No one can doubt that Chris’s research intent is legitimate – he’s not fooling around here. He’s a smart guy statistically, programmatically and scientifically, and without doubt he has the technical skills to execute his proposed research. Only recently he was an author on an excellent paper highlighted in Nature News: ‘Smart software spots statistical errors in psychology papers’.

Why then are Elsevier interfering with his research?

I know nothing more about his case than what is in his blog posts. However, I have also had publishers block my own attempts to do content mining this year, so I think this is the right time for me to go public about this, in support of Chris.

My own use of content mining

I am trying to map where in the giant morass of research literature Natural History Museum (London) specimens are mentioned. No-one has an accurate index of this information. With the use of simple regular expressions it’s easy to filter hundreds of thousands of full text articles to find, classify and look up potential mentions of specimens.
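To give a flavour of what I mean, here is a minimal sketch in Python. It is not my actual pipeline: the pattern, the function name and the example identifiers are all made up for illustration, and I’m assuming specimen mentions start with an institution code such as ‘NHMUK’ or ‘BMNH’ followed by a catalogue number. Real specimen citations are far messier than this.

```python
import re

# Hypothetical pattern: an institution code, then optional short
# collection prefixes (e.g. "PV R"), then a number that may contain
# dots or slashes (e.g. "1984.12.3"). An assumption, not a standard.
SPECIMEN_PATTERN = re.compile(
    r"\b(?P<code>NHMUK|BMNH)\s+"
    r"(?P<number>(?:[A-Z]{1,3}\s+)*[A-Z]?\.?\s?\d+(?:[./]\d+)*)"
)


def find_specimen_mentions(text):
    """Return a list of (institution code, catalogue number) tuples found in text."""
    return [
        (match.group("code"), match.group("number"))
        for match in SPECIMEN_PATTERN.finditer(text)
    ]


# Made-up example sentence; the identifiers are illustrative only.
sample_text = (
    "The holotype NHMUK PV R 5764 was compared with BMNH 1984.12.3 "
    "and with previously figured material."
)

mentions = find_specimen_mentions(sample_text)
for code, number in mentions:
    print(code, number)
```

Run over a folder of full text articles, a function like this gives you a first-pass list of candidate mentions per paper, which can then be classified and looked up against the museum’s own catalogue.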

In the course of this work, I was frequently obstructed by BioOne. My IP address kept getting blocked, stopping me from downloading any further papers from this publisher. I should note here that my institution (NHMUK) pays BioOne to provide access to all their papers – my access is both legitimate and paid-for.

Strong claims require strong evidence. Thankfully I was doing my work with the full support and knowledge of the NHM Library & Archives team, so they forwarded one or two of the threatening messages they were getting from the publishers I was mining. I have no idea how many messages were sent in total. Here’s one such message from BioOne:

Blocked by BioOne


According to BioOne, as I swiftly found out, downloading more than 100 full text articles in a single session is automatically deemed “excessive” and “a violation of permissible activity”.

Isn’t that absolutely crazy? In the age of ‘big data’, where anyone can download over a million full text articles from the PubMed Central OA subset in a few clicks, an artificially imposed restriction of just 100 is simply mad and is anti-science. As a member of a subscription-paying institution, surely I have a paid right to access and analyze this content? We are paying for access but not actually getting full access.

If I tell other journals like eLife, PLOS ONE, or PeerJ that I have downloaded every single one of their articles for analysis, I get a high-five: these journals understand the importance of analysis-at-scale. Furthermore, the subscription access business model needn’t be a barrier: the Royal Society journals are very friendly with content mining – I have never had a problem downloading entire decades’ worth of journal content from the Royal Society journals.

I have two objectives for this blog post.

1.) A plea to traditional publishers: PLEASE STOP BLOCKING LEGITIMATE RESEARCH

Please get out of the way and let us do our research. If our institutions have paid for access, you should provide it to us. You are clearly impeding the progress of science. Far more content mining research has been done on open access content and there’s a reason for that – it’s a heck of a lot less hassle and (legal) danger. These artificial obstructions on access to research are absurd and unhelpful.

2.) A plea to researchers and librarians: SHARE YOUR STORIES

I’m absolutely sure it’s not just Chris & me that have experienced problems with traditional publishers artificially obstructing our research. Heather Piwowar is one great example I know. She bravely, extensively and publicly documented her torturous experiences of negotiating access to, and text mining of, Elsevier-controlled content. But we need more people to speak up. I fear that librarians in particular may be inadvertently sweeping these issues under the carpet – they are most likely to get the most interesting emails from publishers with respect to these matters.

This is a serious matter. Given the experience of Aaron Swartz, who was faced with up to 50 years of imprisonment for downloading ‘too many’ JSTOR papers, it would not surprise me if few researchers come forward publicly.