Show me the data!

In a recent series of posts I’ve become fascinated with how unnecessarily fragile the scholarly communications system seems to be in 2017:

Oxford University Press have failed to preserve access to the scholarly record (23-01-2017)

Documenting the many failures across OUP journals (24-01-2017)

Comparing OUP to other publishers (25-01-2017)

As a reminder, academics literally invented the internet. I think we can and should be doing better.

We have the technology and resources available to make a robust and efficient scholarly communications system, yet for the more than $10 billion per year we spend on it we appear to be getting incredibly poor service from our various providers. Take Digital Object Identifiers (DOIs) as an example: they are great in theory. If I send a colleague a DOI such as “10.1093/sysbio/syw105” then in five years’ time, no matter who owns this journal or what platform technology they decide to use, my colleague should be able to follow this DOI-based link to an article landing page: http://doi.org/10.1093/sysbio/syw105 (as of 06-02-2017 this one happens to be broken though!).

The DOI registration agency (CrossRef) that most journals use is highly competent. I have no doubts about their technical abilities, or the support and documentation they provide to publishers. Most modern, born-digital publishers create DOIs for their journal articles with accurate metadata, and little difficulty or breakage. Yet when it comes to some publishers like Oxford University Press, I am amazed to find that more than 2.3% of their DOIs result in a 404 “Not Found” error. Collectively, research institutions, libraries, and personal subscribers across the world pay publishers like OUP a huge amount every year to provide publishing services. Why can’t they actually do the job we pay them to do?

Personally, I have now lost all faith in the ability of many legacy academic publishers to actually publish content on the internet in a robust manner. For example, when we pay them >$3000 to make an article hybrid open access, what happens? They end up putting it behind a paywall and selling it to readers, despite the hybrid open access payment. Wiley, Elsevier, Springer and now OUP have all been caught doing this to some extent.

Nor are the failures confined to subscription access journals. Remember when all of Scientific Reports and other NPG journals were down for a few days with absolutely no prior warning (mid-June 2016)? I totally understand the need for publishers to upgrade and/or maintain their platforms, but the apparent lack of testing or forethought when they decide to fiddle with their platforms is professionally incompetent at times, and this recent problem with OUP definitely falls into this category.

When GitLab (not an academic publishing services company, unfortunately) made a serious screw-up recently that caused a major outage to their services, they livestreamed their attempts to fix the problem on YouTube and had it fixed in about 12 hours. They also published full, transparent, extensive and frank reports on what happened, why it happened, and how and when they fixed it. In stark contrast, OUP have given their customers opaque reassurances in an official statement AND still haven’t fixed many of the problems even weeks later! Legacy academic publishers and modern internet services are sadly miles apart in the levels of service and transparency they provide.

So what can we do to remedy this abominable situation?

I propose that funders, institutions, and authors need to start doing more than just “one-time” compliance checks on the way that research outputs are published. We need ongoing checks (daily, weekly, monthly or quarterly) on research outputs just to make sure they are still there and that vital links to them, like DOIs, actually work! Additionally, those who pay for these publisher services need to start checking that publishers are providing good service. Withhold money from, or bring consequences to, those publishers who provide poor service.

To this end I have created toy code for use in R to help empower authors to check that the DOIs of their own authored research outputs actually work. I have my script set up as a cron job scheduled to check my DOIs every day. Today’s fully automated report is below. The 401 error tells me that my letter is, unfortunately, behind a paywall at Nature (true), but it’s not a 404 so it’s otherwise okay:


"HTTP.Status","DOI"
"Success: (200) OK","http://doi.org/10.1111/evo.12884"
"Success: (200) OK","http://doi.org/10.7287/peerj.preprints.773v1"
"Success: (200) OK","http://doi.org/10.3897/rio.1.e7547"
"Success: (200) OK","http://doi.org/10.5334/ban"
"Success: (200) OK","http://doi.org/10.1045/november14-murray-rust"
"Success: (200) OK","http://doi.org/10.3897/bdj.2.e1125"
"Success: (200) OK","http://doi.org/10.4033/iee.2013.6b.14.f"
"Success: (200) OK","http://doi.org/10.1002/bult.2013.1720390406"
"Success: (200) OK","http://doi.org/10.5061/dryad.h6pf365t"
"Success: (200) OK","http://doi.org/10.1186/1756-0500-5-574"
"Client error: (401) Unauthorized","http://doi.org/10.1038/nature10266"
"Success: (200) OK","http://doi.org/10.1038/npre.2011.6048"

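For readers who would like to try something similar without R, here is a minimal sketch in Python of how a report like the one above could be produced. It is not my R script: the function names (`fetch_status`, `describe`, `check_dois`, `to_csv`) are made up for illustration, and it assumes a plain GET request via the standard library is good enough. Some publisher platforms may answer scripts differently from browsers, so treat odd results with caution.

```python
import csv
import io
import urllib.error
import urllib.request

RESOLVER = "http://doi.org/"  # resolver prefix used in the report above


def fetch_status(url, timeout=30):
    """Follow the resolver's redirects and return (status code, reason)."""
    request = urllib.request.Request(url, headers={"User-Agent": "doi-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.reason
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions but still carry a status
        return err.code, err.reason
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, timeout, refused connection: no HTTP status at all
        return None, str(err)


def describe(code, reason):
    """Build a status label in the same style as the report above."""
    if code is None:
        return f"No response: {reason}"
    if 200 <= code < 300:
        kind = "Success"
    elif 400 <= code < 500:
        kind = "Client error"
    elif 500 <= code < 600:
        kind = "Server error"
    else:
        kind = "Other"
    return f"{kind}: ({code}) {reason}"


def check_dois(dois, fetch=fetch_status):
    """Resolve each DOI and return (status label, URL) rows."""
    rows = []
    for doi in dois:
        url = RESOLVER + doi
        code, reason = fetch(url)
        rows.append((describe(code, reason), url))
    return rows


def to_csv(rows):
    """Render the rows as a fully quoted CSV with the report's two columns."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["HTTP.Status", "DOI"])
    writer.writerows(rows)
    return out.getvalue()
```

Calling `print(to_csv(check_dois(["10.1111/evo.12884", "10.1038/nature10266"])))` would hit the live resolver. The `fetch` parameter is there so that the network call can be swapped for a stub when testing, and a daily cron job could simply run a small script that makes this call and emails or saves the output.
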
This idea gets more interesting, however, if scaled up to the institutional or funder level. What would happen if Cambridge University checked the DOIs of all “their” co-authored research outputs every week? What would happen if the Wellcome Trust checked the DOIs of all their funded research outputs every month? At this scale it could be done both time- and cost-effectively, and it is much more likely to uncover the abundant problems that lie quietly unobserved and under-reported.
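
At that scale a flat list of statuses stops being readable, so one option is to roll the report up by DOI prefix (the part before the first slash, which identifies the registrant) and look at the share of 404s for each. The sketch below, again in Python and with hypothetical function names (`doi_prefix`, `summarise`), assumes a report in the same two-column CSV layout as mine. A prefix is only a rough proxy for a publisher: some publishers hold several prefixes, and journals change hands.

```python
import csv
import io
import re
from collections import Counter

STATUS_CODE = re.compile(r"\((\d{3})\)")  # matches the "(404)" part of a label


def doi_prefix(url):
    """Return the registrant prefix, e.g. '10.1093', from a doi.org URL."""
    doi = url.split("doi.org/", 1)[-1]
    return doi.split("/", 1)[0]


def summarise(report_csv):
    """Map each prefix to (DOIs checked, DOIs returning 404, percent 404)."""
    checked, not_found = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(report_csv)):
        prefix = doi_prefix(row["DOI"])
        checked[prefix] += 1
        match = STATUS_CODE.search(row["HTTP.Status"])
        if match and match.group(1) == "404":
            not_found[prefix] += 1
    return {
        prefix: (total, not_found[prefix], 100.0 * not_found[prefix] / total)
        for prefix, total in checked.items()
    }
```

Only 404s are counted as broken here, in line with the reasoning above that a 401 means paywalled rather than missing. Whether other statuses should count as failures is a judgment call for whoever runs the check.
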

Tomorrow I will blog about what I discovered when I checked the DOIs from just 2 years’ worth (2012 to 2014) of Wellcome Trust funded research. I promise it’ll be interesting and it’ll further demonstrate the utility of this exercise…