Show me the data!

Yesterday I published a blog post calling for ongoing monitoring of ‘hybrid’ open access articles and academic publisher services in general.

Today I want to share with you some highlights from my brief checks on two years’ worth of Wellcome Trust ‘open access’ article processing charge (APC) supported published research outputs.

Source data

Robert Kiley of the Wellcome Trust has made public official data on the APC spend on ‘open access’ articles paid for by the Wellcome Trust over at his figshare profile. This was a brilliant thing to do. Many people have produced thought-provoking and brilliant analyses of this data, and the data has been copied many times; I now see it in many different GitHub repositories.

Yesterday, just to test the idea, with no real knowledge that I’d actually find anything of interest, I decided to check the DOIs of two years’ worth (2012 to 2014) of Wellcome Trust funded ‘open access’ articles. Here are the three major things that I have discovered from this mini exercise so far:

 

1.) Paywalled articles that should be open access: Wellcome Trust funded articles that are not openly accessible (updated for accuracy 2017-02-27)

According to Robert Kiley’s figshare data for 2013-2014, the Wellcome Trust paid £1,194 to Emerald to make an article entitled Running a hospital patient safety campaign: a qualitative study open access at the publisher website. I followed the DOI link given and found that, today, this article is paywalled and is being advertised for sale by Emerald Group Publishing at £20 for 30 days of access (screenshot below):

 

Sadly, I am no stranger to this kind of event. I have personally seen Elsevier, Wiley, Springer and Oxford University Press sell articles that funders had specifically paid for to be open access to everyone in the world, not to be sold at the point of access. It now seems inevitable that hybrid open access would lead to this: paywall publishers simply can’t keep the paywalls off, even when they are paid to do so.

UPDATE 2017-02-27: Three weeks after publication of this blog post, Wellcome Trust and Emerald kindly confirmed to me that no APC was actually paid for the above article (contrary to what was mistakenly stated in the figshare data): the article’s authors backed out of choosing gold open access. Unfortunately, the authors did not self-archive a freely available version of this paper either, so it remains inaccessible to those outside paywalls and thus was definitely NOT published in a manner compliant with the Wellcome Trust rules and regulations in place at the time.

 

2.) Misuse of funds set aside to cover Open Access charges

I thought this was another simple case of hybrid open access being ‘mistakenly’ paywalled by the publisher but the truth is even stranger.

I found an article entitled Mechanisms underlying cortical activity during value-guided choice at the journal Nature Neuroscience for which, according to Robert Kiley’s data, the sum of £1,272.86 had been paid by Wellcome Trust to make it open access. The plot thickens, however. Nature Neuroscience doesn’t really do hybrid open access, and if it did, it would charge a lot more than that. What I have discovered here is an instance where the authors (or their institution), mistakenly or fraudulently depending on how forgivingly you view it, used the Wellcome Trust open access fund to pay Nature Neuroscience £1,272.86 for colour figures. The article is NOT open access at the publisher website. The 2017 Wellcome Trust guidelines are absolutely clear that you cannot use Wellcome Trust open access money for “page charges” or “colour figure” charges. I do not know whether Wellcome’s rules were as clear back in 2012 when the payment was made, but this is extremely disappointing to observe. Charity funding should be better spent than on spurious publisher-invented ransoms like “colour figure charges”.

3.) Elsevier ‘open access’ articles are not accessible to automated methods

To help check article DOIs in a simple and automated manner, I used the R package httr. The code I used is available as a GitHub gist. It works well for articles hosted at every publisher except one: Elsevier. Any attempt to follow DOI links with R::httr just hangs, and I have to use a timeout to ensure that my script skips over such problems and proceeds to the next article.
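(My actual checking code is the R httr gist linked above. Purely as an illustration of the timeout-and-skip logic, here is a minimal equivalent sketch in Python using only the standard library; the function names and the 10-second default are illustrative choices, not taken from the gist.)

```python
import socket
import urllib.request
from urllib.error import HTTPError, URLError

def check_doi(doi, timeout=10):
    """Follow a DOI link and return the final HTTP status code.

    A publisher that hangs on automated requests trips the timeout,
    so one bad host cannot stall the whole batch of checks.
    """
    url = "https://doi.org/" + doi
    req = urllib.request.Request(url, headers={"User-Agent": "doi-checker"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status          # e.g. 200 after following redirects
    except HTTPError as e:
        return e.code                   # e.g. 401 (paywall) or 404 (broken)
    except (URLError, socket.timeout):
        return None                     # hang or connection failure: skip it

def is_broken(status):
    """Only a 404 counts as a broken DOI; a 401/403 paywall is bad
    news, but the link itself still resolves to something."""
    return status == 404
```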

Can Elsevier really call what they are offering ‘open access’ if it is not openly accessible by automated methods such as R::httr scripts? I don’t have time to expound upon this at length here, but I will certainly return to this particular point at a later date.

Conclusions

So there you have it. Super-simple automated checks on just a few thousand Wellcome Trust funded ‘open access’ articles, via their DOIs, have revealed three rather interesting things, and they support my overall thesis that we need to monitor academic publishers continuously: not just with “one-time” compliance checks.

I really do think this is the start of something very interesting. I have plans. WATCH THIS SPACE!

In a recent series of posts I’ve become fascinated with how unnecessarily fragile the scholarly communications system seems to be in 2017:

Oxford University Press have failed to preserve access to the scholarly record (23-01-2017)

Documenting the many failures across OUP journals (24-01-2017)

Comparing OUP to other publishers (25-01-2017)

As a reminder: academics literally invented the internet. I think we can, and should, be doing better.

We have the technology and resources available to make a robust and efficient scholarly communications system, yet for the more than $10 billion we spend on it every year we appear to be getting incredibly poor service from our various providers. Take Digital Object Identifiers (DOIs) as an example: they are great in theory. If I send a colleague a DOI such as “10.1093/sysbio/syw105”, then in theory, in five years’ time, no matter who owns the journal or what platform technology they decide to use, my colleague should be able to follow this DOI-based link to an article landing page: http://doi.org/10.1093/sysbio/syw105 (as of 06-02-2017, this one happens to be broken!).

The DOI registration agency (CrossRef) that most journals use is highly competent. I have no doubts about their technical abilities, or the support and documentation they provide to publishers. Most modern, born-digital publishers create DOIs for their journal articles with accurate metadata, and little difficulty or breakage. Yet when it comes to some publishers like Oxford University Press, I am amazed to find that more than 2.3% of their DOIs result in a 404 “Not Found” error. Collectively, research institutions, libraries, and personal subscribers across the world pay publishers like OUP a huge amount every year to provide publishing services. Why can’t they actually do the job we pay them to do?

Personally, I have now lost all faith in the ability of many legacy academic publishers to publish content on the internet in a robust manner. For example, when we pay them more than $3000 to make an article hybrid open access, what happens? They end up putting it behind a paywall and selling it to readers, despite the hybrid open access payment. Wiley, Elsevier, Springer and now OUP have all been caught doing this to some extent.

Nor are the failures confined to subscription-access journals. Remember when all of Scientific Reports and other NPG journals were down for a few days, with absolutely no prior warning (mid June 2016)? I totally understand the need for publishers to upgrade and maintain their platforms, but the apparent lack of testing or forethought when they decide to fiddle with those platforms is at times professionally incompetent, and this recent problem at OUP definitely falls into that category.

When GitLab (not an academic publishing services company, unfortunately) made a serious screw-up recently that caused a major outage to their services, they livestreamed their attempts to fix the problem on YouTube and had it fixed in about 12 hours. They also published full, transparent, extensive and frank reports on what happened, why it happened, and how and when they fixed it. In stark contrast, OUP have given their customers opaque reassurances in an official statement AND still haven’t fixed many of the problems, even weeks later! Legacy academic publishers and modern internet services are sadly miles apart in the levels of service and transparency they provide.

So what can we do to remedy this abominable situation?

I propose that funders, institutions, and authors need to start doing more than just “one-time” compliance checks on the way that research outputs are published. We need continuous checks on research outputs, whether daily, weekly, monthly or quarterly, just to make sure they are still actually there and that vital links to them, like DOIs, still work! Additionally, those who pay for these publisher services need to start checking that publishers are actually providing good service, and to withhold money from, or bring consequences to, those publishers who provide poor service.

To this end I have created toy code for use in R to help empower authors to check that the DOIs of their own authored research outputs actually work. I have my script set up as a cron job scheduled to check my DOIs every day. Today’s fully automated report is below. The 401 error tells me that my letter is, unfortunately, behind a paywall at Nature (true), but it’s not a 404, so it’s otherwise okay:


"HTTP.Status","DOI"
"Success: (200) OK","http://doi.org/10.1111/evo.12884"
"Success: (200) OK","http://doi.org/10.7287/peerj.preprints.773v1"
"Success: (200) OK","http://doi.org/10.3897/rio.1.e7547"
"Success: (200) OK","http://doi.org/10.5334/ban"
"Success: (200) OK","http://doi.org/10.1045/november14-murray-rust"
"Success: (200) OK","http://doi.org/10.3897/bdj.2.e1125"
"Success: (200) OK","http://doi.org/10.4033/iee.2013.6b.14.f"
"Success: (200) OK","http://doi.org/10.1002/bult.2013.1720390406"
"Success: (200) OK","http://doi.org/10.5061/dryad.h6pf365t"
"Success: (200) OK","http://doi.org/10.1186/1756-0500-5-574"
"Client error: (401) Unauthorized","http://doi.org/10.1038/nature10266"
"Success: (200) OK","http://doi.org/10.1038/npre.2011.6048"

This idea gets more interesting, however, if scaled up to the institutional or funder level. What would happen if Cambridge University checked the DOIs of all “their” co-authored research outputs every week? What would happen if the Wellcome Trust checked the DOIs of all their funded research outputs every month? At this scale it could be done both time- and cost-effectively, and it is much more likely to uncover the abundant problems that lie quietly unobserved and under-reported.

Tomorrow I will blog about what I discovered when I checked the DOIs from just two years’ worth (2012 to 2014) of Wellcome Trust funded research. I promise it’ll be interesting, and it’ll further demonstrate the utility of this exercise…

Comparing OUP to other publishers

January 25th, 2017 | Posted by rmounce in Paywall Watch

Any good scientist knows that one must have an adequate experimental control when trying to determine the significance of effects.

Therefore, in order to test the significance of the 106 broken DOIs I reported at OUP yesterday, I created a comparable stratified ‘control’ sample of 21 journals NOT published by OUP that are indexed in PubMed. These 21 are published by a variety of different publishers, including PLOS, eLife, NPG, Taylor and Francis, Springer, and PeerJ.

I used the exact same method (screen-scraping PubMed to get the 100 most recent items published in each journal) and checked all 1605 of the resulting DOI URLs I obtained from these 21 journals (not every item listed in PubMed has a DOI). For the control group of non-OUP journals, I found just 7 broken DOIs: just 0.4% (to 1 d.p.) of recently minted DOIs at other journals are broken. This is in stark contrast to the >6% failure rate at OUP. I think it’s fair to say OUP has a significant problem!
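(The contrast is stark enough to eyeball, but it can also be quantified. As a quick back-of-the-envelope sanity check, not part of the original analysis, a pooled two-proportion z-test on 106/1735 versus 7/1605 needs nothing beyond a standard library:)

```python
from math import erfc, sqrt

def two_prop_z(k1, n1, k2, n2):
    """Two-sided two-proportion z-test using the pooled normal
    approximation (fine here: all expected counts are large)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, erfc(abs(z) / sqrt(2))  # z statistic, two-sided p-value

# OUP: 106 broken DOIs out of 1735; control journals: 7 out of 1605
z, p = two_prop_z(106, 1735, 7, 1605)
print(round(106 / 1735 * 100, 1))  # 6.1 (% broken at OUP)
print(round(7 / 1605 * 100, 1))    # 0.4 (% broken elsewhere)
print(z, p)                        # z is about 9; p is vanishingly small
```

A z statistic of around 9 means the difference between the two failure rates is overwhelmingly unlikely to be chance.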

 

The 21 journals included in the OUP set are: 

J Anal Toxicol; FEMS Microbiology Ecology; Journal of Heredity; Medical Mycology; Bioinformatics; FEMS Microbiology Letters; Journal of Medical Entomology; Mutagenesis; Brain; FEMS Yeast Research; Journal of Pediatric Psychology; Briefings in Bioinformatics; Briefings in Functional Genomics; Journal of the Pediatric Infectious Diseases Society (JPIDS); Pathogens and Disease; Glycobiology; Clinical Infectious Diseases; Systematic Biology; Evolution Medicine & Public Health; Journal of Biochemistry; Molecular Biology and Evolution

The 21 journals in the non-OUP set are:
Academic Radiology; Appetite; Neurological Research; Acta Neuropathologica; Autoimmunity; Nutrition and Cancer; Addictive Behaviours; British Journal of Nutrition; PeerJ; AIDS Care; Diabetologia; PLOS ONE; Alcohol; eLife; PLOS Pathogens; Annals of Anatomy; Heliyon; Psychological Medicine; Antiviral Research; Journal of Medical Systems; Scientific Reports

 

Full logs evidencing the data used in these analyses and the DOIs of each and every article checked are available on github: https://github.com/rossmounce/Checking-OUP-DOIs

Continuous Monitoring

This analysis was hastily done. By using the PubMed or Europe PMC API I could script up weekly monitoring of ALL journals indexed in PubMed, producing weekly reports like this one and ranking each and every publisher in terms of DOI performance. I could do this. But I’m hoping Crossref will publish these simple statistics instead. The scholarly community needs to know this kind of information, and I’m hoping it will shame some publishers into improving their practices!
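(As a sketch of the first step, Europe PMC’s REST search interface makes the querying part straightforward. The helper below only builds the query URL; the helper name and page-size choice are illustrative, and fetching the JSON and extracting each record’s DOI for checking is left out.)

```python
from urllib.parse import urlencode

EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def journal_query_url(journal, page_size=100):
    """Build a Europe PMC REST search URL for one journal's records."""
    params = {
        "query": 'JOURNAL:"%s"' % journal,  # Europe PMC fielded search
        "format": "json",
        "pageSize": page_size,
    }
    return EPMC_SEARCH + "?" + urlencode(params)

print(journal_query_url("Systematic Biology"))
```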

Updated 2017-02-01: Mathematical equation rendering failures spotted at the journal ‘Molecular Biology & Evolution’ (MBE). Added to the lengthy list.

In this post I shall try and summarise the different types of error that are occurring across Oxford University Press (OUP) journals at the moment.

It appears OUP have changed their underlying platform software this year, and that they haven’t done enough testing before putting it into production. The variety of different errors encountered is truly astonishing.

1.) Missing Articles

As documented yesterday with an example, OUP have failed to do the most basic task of a publisher: preserve access to paid-for subscription content. 24 hours after I reported it missing, the Bayes Factor article is now available, but the DOI URL (http://dx.doi.org/10.1093/sysbio/syw101) still doesn’t resolve to it. Speaking of which…

2.) Paywalling Open Access Articles (update 1) 

Oddly, OUP have managed to paywall an article at the normally fully open access journal ‘Nucleic Acids Research’: the article ‘A novel method for crosstalk analysis of biological networks: improving accuracy of pathway annotation’ appears to be inaccessible at OUP’s site. Additionally, through Rightslink, they are selling the re-use rights to this article. To determine whether this was real, I made a test purchase, specifying that I wanted to re-use this article in a non-commercial setting, in a presentation. I was charged, and paid, £42.14 for the right to re-use 1 page of this article in an educational, non-commercial presentation. You can see a screenshot of my receipt for this rights purchase here.

3.) Broken DOIs that don’t resolve to article landing pages

DOIs are an integral part of modern 21st-century publishing infrastructure. They are supposed to be reliable, persistent links to content. I tested 1735 recently minted DOIs across 21 different journals published by OUP that are indexed in PubMed. The log files providing full evidence of my testing are available on GitHub. When a DOI fails to resolve to an article landing page, it gives a 404 error. I found that 106 (over 6%) of the recently minted DOIs I examined gave 404 errors. Remarkably, 82 of these failures come from article DOIs at one journal: the Journal of Medical Entomology.

4.) Editors appearing (erroneously) listed as additional authors of papers

I haven’t observed this myself, but apparently keen eyes at Systematic Biology have spotted this occurring to some article pages.

5.) Journal Articles Appearing as Published by a Totally Different Journal

Yesterday I found that 15 Systematic Biology articles appear to be published in the “Logic Journal of the IGPL”. As of today, I think some have been fixed, and inevitably they will all get fixed eventually, so I have a screenshot below to prove it.

 

6.) Unexpected lack of indexing in PubMed

I happen to really like the journal Gigascience. They unfortunately decided to move from BioMed Central to publishing with OUP starting this year, and they seem to have been hardest hit by the problems at OUP. For unknown reasons, PubMed hasn’t indexed any Gigascience articles since November 2016! See for yourself: https://www.ncbi.nlm.nih.gov/pubmed/?term=%22Gigascience%22%5Bjournal%5D

This is a really serious problem. If I were an author of a recent Gigascience article I would be furious about this. Recent articles there are completely invisible to literature searches performed at PubMed. This has affected 49 articles in the December issue (Vol 5, Issue 1), as well as 15 advance access articles that haven’t yet been assigned an issue. Hundreds, perhaps thousands, of authors are affected. If I were OUP I would make this bug the highest priority to fix.

7.) Mathematical equations failing to render in the HTML (on any browser) [update 2]

As spotted by Brian O’Meara and independently confirmed by Joseph Brown. See below for an example:

 

8.) Article landing pages with no article title or authorship details visible

This bug is affecting articles at Evolution, Medicine & Public Health, Gigascience, Nucleic Acids Research and probably more. I’m certain it is not a ‘deliberate’ style choice.

9.) Some DOIs redirecting to placeholder PDFs (instead of actual content)

10.) Article Views data appears to have been reset to zero

 

I note that OUP have put out a statement to “apologize sincerely” for these issues. But I am not convinced a mere apology is adequate compensation when many of the errors remain unfixed.

I call upon libraries, authors of recent articles in OUP journals, and academic societies that publish with OUP to seriously consider taking further action about this matter. Many of these problems have been present at OUP journals since at least January 13th 2017. OUP have been incredibly slow to identify and fix these problems and many of them should not have been problems in the first place – completely avoidable with adequate testing.

Tomorrow I will assess the situation again and update with any new reports of errors or action taken.