Show me the data!

Open in order to unleash the power of text mining

October 23rd, 2017 | Posted by rmounce in Generation Open | Open Access

In 2017, we have a vast toolbox of informative methods to help us analyse large volumes of text. Sentiment analysis, topic modelling, and named entity recognition are but a few of these exciting approaches. Computational power and storage capacity are not the limiting factors on what we could do with the 100 million or so journal articles that make up the ever-growing research literature. But the continued observance of 17th century limitations on how we can use research is simply jarring. Thanks to computers and the internet, we have the ability to do wonderful things, but the licensing and access restrictions placed on most of the research literature explicitly and artificially prevent most of us from trying. As a result, few researchers bother thinking about using text mining techniques – it is often simpler and easier to farm out repetitive, large-scale literature analysis tasks to an array of student minions and volunteers to do by hand – even though computers could, and perhaps should, be doing these analyses for us.

Inadequate computational access to research has already caused us great harm. Just ask the Ministry of Health in Liberia: they were not pleased to discover, after a lethal Ebola virus outbreak, that vital knowledge locked away in “forgotten papers” published in the 1980s clearly warned that the Ebola virus might be present in Liberia. This information wasn’t in the title, keywords, metadata, or abstract; it was completely hidden behind a paywall. Full text mining approaches would have easily found this buried knowledge and would have provided vital early warning that Ebola could come to Liberia, which might have prevented some deaths during the West African Ebola virus epidemic (2013–2016).
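To make the difference between metadata search and full-text search concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the record is an invented placeholder (it is not the text of the real 1980s papers), and the field names and the helper `mentions_all` are hypothetical.

```python
import re

def mentions_all(text, terms):
    """Return True if every term appears in text as a whole word (case-insensitive)."""
    return all(
        re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        for term in terms
    )

# Invented placeholder record: the key terms appear only in the full text,
# not in the title or abstract.
record = {
    "title": "Placeholder title of a serological survey",
    "abstract": "Placeholder abstract that names no specific virus or country.",
    "full_text": "Placeholder body text mentioning Ebola virus antibodies in samples from Liberia.",
}

terms = ["Ebola", "Liberia"]

# A search limited to title and abstract misses the record...
found_in_metadata = mentions_all(record["title"] + " " + record["abstract"], terms)

# ...whereas a search over the full text finds it.
found_in_full_text = mentions_all(record["full_text"], terms)

print(found_in_metadata, found_in_full_text)
```

A real pipeline would of course need lawful bulk access to the full texts in the first place, which is exactly what the paywall prevents.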

Some subscription-based publishers have been known to use ‘defence’ mechanisms such as ‘trap URLs’ that hinder text miners – making it even harder to do basic research. Other subscription publishers, like Royal Society Publishing, are helpfully supportive of text miners, as are open access publishers. Hindawi, for instance, allows anyone to download every single article they’ve ever published with a single mouse-click. Thanks to open licensing, aggregators like Europe PubMed Central can bring together the outputs of many different OA publishers, making millions of articles available with a minimum of fuss. It is “no bullshit” access. You want it? You can have it all. No need to beg permission, to spend months negotiating and signing additional contracts, or to use complicated publisher-controlled access APIs and their associated restrictions. Furthermore, OA publishers typically provide highly structured full-text XML files, which make things even easier for text miners. But only a small fraction of the research literature is openly licensed open access. It’s for these reasons and more that many of the best text-mining researchers work on, and enrich our understanding of, open access papers only, e.g. Florez-Vargas et al. 2016.
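As a rough illustration of why structured full-text XML helps, here is a minimal Python sketch that pulls the body paragraphs out of an article. The sample document is an invented placeholder using JATS-style element names (`front`, `body`, `sec`, `p`); real publisher files vary and may need namespace handling. The function `body_paragraphs` is a hypothetical helper, not part of any publisher's tooling.

```python
import xml.etree.ElementTree as ET

# Invented placeholder article with JATS-style tags.
sample_xml = """<article>
  <front>
    <article-meta>
      <title-group><article-title>Placeholder title</article-title></title-group>
      <abstract><p>Placeholder abstract.</p></abstract>
    </article-meta>
  </front>
  <body>
    <sec><title>Methods</title><p>First <italic>placeholder</italic> paragraph.</p></sec>
    <sec><title>Results</title><p>Second placeholder paragraph.</p></sec>
  </body>
</article>"""

def body_paragraphs(xml_string):
    """Return the plain text of every <p> inside <body>, skipping front matter."""
    root = ET.fromstring(xml_string)
    body = root.find("body")
    if body is None:
        return []
    # itertext() flattens inline markup such as <italic> into plain text.
    return ["".join(p.itertext()).strip() for p in body.iter("p")]

paragraphs = body_paragraphs(sample_xml)
for paragraph in paragraphs:
    print(paragraph)
```

Because the body is explicitly separated from the front matter, a text miner can target exactly the part of the article that abstract-only indexing never sees.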

So if I had but one wish this Christmas, it would be for the artificial, legally-imposed restrictions on the bulk download and analysis of research texts to be unambiguously removed for everyone, worldwide – so that no researcher need fear imprisonment or other punitive action simply for doing justified and ethical academic research. Unchain the literature, and we might be able to properly unleash and apply the collected knowledge of humanity.


This is my short contribution for Open Access Week 2017, and the #OpenInOrderTo website created by SPARC, to move beyond talking about openness in itself and focus on what openness enables.


New Career, Same Me

April 17th, 2017 | Posted by rmounce in Open Access

This is a quick post to announce what I’ll be doing next after my postdoc at the Department of Plant Sciences, University of Cambridge. From June 2017 onwards, I’m delighted to say I’ll be the new Open Access Grants Manager for Arcadia Fund.

About Arcadia Fund

If you haven’t heard of it before, here’s what you need to know: Arcadia is a charitable fund, set up by Peter Baldwin and Lisbet Rausing in 2002. So far, it has awarded more than $440 million to cultural, environmental and open access projects. Within the open access funding programme, Arcadia has awarded grants to organisations including Creative Commons, Wikimedia Foundation, Authors Alliance, Public.Resource.Org, Internet Archive, Digital Public Library of America and more…

New Career, Same Me

When the job ad came up I could scarcely believe how good the organisational fit was for me: Arcadia funds brilliant projects in this space. I am genuinely looking forward to developing and advising on Arcadia’s open access policy, to continuing to engage with the wider open access community, to managing Arcadia’s existing grants portfolio, and to identifying new opportunities for high impact initiatives where funding from Arcadia will make a difference.

I feel extremely grateful to have been chosen for this position against many other talented and experienced applicants (and friends!). Although it’ll take me many months to ‘learn the ropes’, I see this as my new career now: no going back. I’m now part of the 88% majority of UK postdocs who never secure a tenured position in academia, but don’t feel sorry for me – I’m delighted with this new direction. New career, same me.

A lot of passionate, intelligent young people with an academic background have jobs where they can really make a difference (i.e. not in academia). In this regard, I’m inspired by the likes of TJ Bliss at Hewlett Foundation, Carly Strasser at Moore Foundation, Nick Shockey at SPARC, Heather Piwowar and Jason Priem at Impactstory, Joe McArthur at The Right to Research Coalition, and Jonathan Gray at Open Knowledge. Now I’ve turned 30, I’m married, and I have a beautiful baby daughter. Some things have changed, but my passion for open knowledge hasn’t. Doing ‘open’ on the side of research wasn’t enough. Soon it’ll be my full time endeavour!

An Open Letter to Oxford University Press on Publishing

April 2nd, 2017 | Posted by rmounce in Publishing


From: Ross Mounce

Subject: An Open Letter to Oxford University Press on Publishing


Date: Sun, 2 Apr 2017 16:55:50 +0100

Dear Oxford University Press,

The problems at Oxford University Press (OUP) journals have been
going on for far too long.

They are affecting our ability to do research.

OUP have known about these problems since mid-January yet so many
critical problems still remain unresolved. Where are all the
supplementary materials for MBE and GBE? Where are the “missing” 16
years’ worth of full text MBE articles?

Providing access to research is the most basic job of a publisher,
yet OUP are failing to do even this simple task at the moment.

The situation is simply outrageous.

On behalf of over 50 signatories representing postdocs, principal
investigators, professors, and others from institutions around the
world, I attach a formal open letter of complaint about the abysmal
service being provided by OUP, in the hope that OUP might finally
provide adequate responses to the various problems detailed therein.


Dr Ross Mounce

Postdoctoral Research Associate & Software Sustainability Fellow
Department of Plant Sciences
University of Cambridge

Seeking Justice for Readers

February 27th, 2017 | Posted by rmounce in Paywall Watch | Wrongly selling OA articles

I am highly curious as to why Elsevier do not seem to be responding to emails at the moment:

Four days ago, continuing an existing thread on the public GOAL mailing list, I wrote to Dr Alicia Wise (Director of Access and Policy at Elsevier) about how Elsevier’s paywall systems are defrauding readers across the world by charging them to access content that has been paid for to be open access.

Below is the full content of my message to the list, nicely formatted for clarity. I re-publish it here because I am still seeking answers to my questions, and I still want justice for the many unknown readers who may have been defrauded by paying to access content that should instead have been “open access” to all.

I remind readers that the scale of this is not trivial at all. Elsevier themselves admit in 2014 they attempted to refund or credit “about $70,000” to readers who had been wrongly charged for open access content.

Dear Alicia,
Approximately five days ago you wrote on behalf of Elsevier to this mailing list in response to my finding of a single paid-for hybrid “open access” article paywalled and actively being sold at the Elsevier journal Mitochondrion.

In the response Elsevier sought to re-assure the world (open access is for the benefit of everyone) that:

“We’ve gone through the system, this is the only article affected.”

I then found another paid-for hybrid “open access” article paywalled and actively being sold at the Elsevier journal The Lancet.

We appear to have had no official response from Elsevier since.

Today, an independent analysis by Christoph Broschinski of more paid-for hybridOA articles at Elsevier journals may have found up to five additional paywalled, for-sale articles. The Cambridge one is a mistake: the payment was for page charges & colour figures, not open access. I have done due diligence on this and checked myself, since I am at Cambridge.

I’m struggling to reconcile what I have found and what Christoph might have found with Elsevier’s statement:

“We’ve gone through the system, this is the only article affected.”

The most likely conclusion I can draw from Elsevier’s statement, and from these reports that appear to conflict with it, is that Elsevier’s system is not adequately tracking paid-for hybridOA articles.

Assuming this to be true:

1.) Will Elsevier openly publish on a single web page, on a continuous, ongoing basis, the exact DOIs of all articles that Elsevier has been paid to make “hybridOA”, including the DOIs of articles that Elsevier was paid to make open access but that now reside at journals published by other publishers (if the journal was subsequently transferred to another publisher)?

This will enable any interested party to:

a) Check that each and every one is actually freely accessible from the publisher site landing page

b) Cross-check this ‘master list’ of Elsevier hybridOA against institutionally-held lists of paid invoices. Any articles listed by an institution as paid-for OA, but not on Elsevier’s hybridOA ‘master list’, can be further investigated, perhaps revealing more articles that should be “open access” but that Elsevier’s faulty “system” has overlooked.

2.) Will Elsevier refund 100% of the paid APC to each institution, funder, or individual whose paid-for “open access” article has been wrongly paywalled?

3.) Will Elsevier hire and fully pay for an independent 3rd party forensic accounting firm to go through their pay-per-view and re-use licensing data/systems and records, including the period from January 1st 2005 until today (23rd February 2017), to produce a thorough openly available report on the extent of PPV payments AND re-use licensing payments for articles that should not have been sold to access, or to re-use?

I hope Elsevier will do this to ensure that every individual who has paid to access or re-use ‘mistakenly’ paywalled “open access” material is refunded in full, with interest, including the local taxes applied and not just the article fee, and not with “credit” that can only be used to purchase other Elsevier goods or services.

4.) What meaningful assurances can Elsevier give that it will not make these mistakes again, given that it appears to be making these mistakes over and over again?

Fully open access publishers, e.g. PLOS, PeerJ and eLife, have an error rate of precisely 0 out of more than 500,000 articles published so far.

Errors such as these are simply intolerable and have the potential to cause great harm.
For instance, imagine if an article providing the first report of Ebola in Peru was paid-for to be hybridOA but was instead mistakenly paywalled…
Since Peru is now “too rich” [1] to qualify for HINARI, but still too poor to pay to subscribe to most subscription journals, many Peruvians would not have access to this vital information unless it was open access.

cf. the first report of Ebola in Liberia, which was also infamously paywalled at an Elsevier journal [2].