Show me the data!

Comparing OUP to other publishers

January 25th, 2017 | Posted by rmounce in Paywall Watch

Any good scientist knows that one must have an adequate experimental control when trying to determine the significance of effects.

Therefore, in order to test the significance of the 106 broken DOIs I reported at OUP yesterday, I created a comparable stratified ‘control’ sample of 21 journals NOT published by OUP that are indexed in pubmed. These 21 are published by a variety of different publishers including PLOS, eLife, NPG, Taylor and Francis, Springer, and PeerJ.

I used the exact same method (screen-scraping pubmed to get the 100 most recent items published at each journal) and checked all of the resulting 1605 DOI URLs that I obtained from these 21 journals (not every item listed in pubmed has a DOI). For the control group of non-OUP journals, I found just 7 broken DOIs. So just 0.4% (to 1 d.p.) of recently minted DOIs at other journals are broken. This is in stark contrast to the >6% failure rate at OUP.  I think it’s fair to say OUP has a significant problem!
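As a sanity check on the percentages quoted above, here is a minimal sketch of the rate calculation in shell, with awk doing the division (the broken-link counts themselves come from checking each DOI URL and grepping the logs for 404s):

```shell
# Minimal sketch of the failure-rate arithmetic reported above.
broken_rate() {
  # usage: broken_rate BROKEN TOTAL -> percentage, to 1 decimal place
  awk -v b="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * b / t }'
}

broken_rate 7 1605   # control set: 7 broken DOIs out of 1605 checked
```

Swap in the broken/total counts for any other journal set to compare failure rates on the same footing.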


The 21 journals included in the OUP set are: 

J Anal Toxicol; FEMS Microbiology Ecology; Journal of Heredity; Medical Mycology; Bioinformatics; FEMS Microbiology Letters; Journal of Medical Entomology; Mutagenesis; Brain; FEMS Yeast Research; Journal of Pediatric Psychology; Briefings in Bioinformatics; Briefings in Functional Genomics; Journal of the Pediatric Infectious Diseases Society (JPIDS); Pathogens and Disease; Glycobiology; Clinical Infectious Diseases; Systematic Biology; Evolution, Medicine, and Public Health; Journal of Biochemistry; Molecular Biology and Evolution

The 21 journals in the non-OUP set are:
Academic Radiology; Appetite; Neurological Research; Acta Neuropathologica; Autoimmunity; Nutrition and Cancer; Addictive Behaviours; British Journal of Nutrition; PeerJ; AIDS Care; Diabetologia; PLOS ONE; Alcohol; eLife; PLOS Pathogens; Annals of Anatomy; Heliyon; Psychological Medicine; Antiviral Research; Journal of Medical Systems; Scientific Reports


Full logs evidencing the data used in these analyses, including the DOI of each and every article checked, are available on GitHub:

Continuous Monitoring

This analysis was hastily done. By using the pubmed or EuropePMC API I could script up weekly monitoring of ALL journals indexed in pubmed to produce weekly reports like this, ranking each and every publisher in terms of DOI performance. I could do this. But I’m hoping Crossref will publish these simple statistics instead. The scholarly community needs to know this kind of information. I’m hoping it will shame some publishers into improving their practices!
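A sketch of what that automation could look like, using the Europe PMC REST API rather than screen-scraping. The endpoint and its JOURNAL: query field are real; the particular query and the embedded sample response below are illustrative only, not live data:

```shell
# Hypothetical sketch: harvest recent DOIs from the Europe PMC REST API.
# A live call would look something like:
#   curl -s 'https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=JOURNAL:%22Bioinformatics%22&format=json&pageSize=100'
#
# Extracting the DOIs from such a JSON response (demonstrated on an embedded sample):
response='{"resultList":{"result":[{"doi":"10.1093/bioinformatics/btw000"}]}}'
echo "$response" | grep -o '"doi":"[^"]*"' | cut -d'"' -f4
```

The extracted DOI list is exactly what the wget-based 404 check needs as input, so the whole pipeline could run unattended on a weekly cron job.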

  • Geoffrey Bilder

    Hi Ross. Nice work. Our support team will contact OUP about this, but I note that OUP has been in the midst of changing platform vendors and it seems that, in the process, they have been discovering a number of DOIs that they’d thought had been registered on the old platform but that had never actually been deposited with Crossref. It might be that this will only really be addressed as their migration is completed. This may take some time.

    I looked for the code that you used to generate your data, but couldn’t find it. So perhaps you already know this but:

    1) You needn’t screen-scrape pubmed to get the latest DOIs. You can use the Crossref REST API instead. You’ll get better coverage as well, since you will get more than just the biosciences (particularly important in this case).
    2) The REST API also supports a “sample” filter which lets you easily select a random sample.
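    A hedged sketch of point 2 in shell. The /works route, the member: filter, and the sample parameter are real features of the Crossref REST API; the member ID below is a placeholder that you would first look up via the /members route:

```shell
# Hypothetical sketch: ask the Crossref REST API for a random sample of a
# member's works. Find the member ID first, e.g.:
#   curl -s 'https://api.crossref.org/members?query=oxford+university+press'
member_id=286   # placeholder ID; substitute the one returned by the lookup above
url="https://api.crossref.org/works?filter=member:${member_id}&sample=100"
echo "$url"     # pipe this through curl -s to fetch 100 randomly sampled works
```

    Sampling randomly rather than taking the 100 most recent items would also remove any recency bias from the comparison.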

    You mention that it would be good if Crossref were to generate such reports for our members. Well, we do. But they were designed many years ago and are *cough* difficult to interpret. See, for example, our “conflict reports” which list DOIs that have ambiguous and/or “conflicting” metadata. Note that you will need to use a “recent” browser to see these reports. I do hope you have something more recent than Internet Explorer 5+ or Netscape Navigator 7.0 ;-)

    The following is the conflict report for Pensoft, who publish a journal you have had some involvement with, RIO:

    And this is how to interpret it:

    So we have to take some responsibility here as we don’t really provide our members with an easy-to-understand way of monitoring their own performance and comparing it to that of other Crossref members. I think you’ve heard me mention that we are in the early stages of revamping our reporting. This project is just kicking off, though you can already see some experimental data that the R&D group has been exposing in our API for years. Again, using Pensoft as an example:

    The above data focuses on measuring the metadata coverage of our members (broken down by ‘current’ and ‘backfile’ content). We will also be including data on broken links, adherence to Crossref best practice guidelines, and adherence to general web best practices (e.g. HTTPS support, use of meta elements in landing pages, correct use of cookies, etc.). Again, the data that is here now is experimental and should be treated with great skepticism. We still need to develop a more robust statistical methodology for calculating these numbers. We hope to have the revamped system available later this year.
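    A rough sketch of how one might pull that experimental coverage data today. The /members routes are real; the field name and the embedded sample response below are illustrative, not live output:

```shell
# Hypothetical sketch: fetch a member's coverage data from the Crossref REST API.
#   curl -s 'https://api.crossref.org/members?query=pensoft'   # find the member ID
#   curl -s 'https://api.crossref.org/members/<ID>'            # fetch the member record
#
# A member record carries a "coverage" object of 0-1 fractions; pulling one field out:
sample='{"message":{"primary-name":"Pensoft Publishers","coverage":{"orcids-current":0.42}}}'
echo "$sample" | grep -o '"orcids-current":[0-9.]*' | cut -d: -f2
```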

    Again, nice work. And you might want to get a DOI for the data you used for your analysis. And for your code. :-)


    • Thanks for the brilliant help Geoffrey. Pensoft are an excellent example to choose because they have so few conflict reports. Just 4 across ALL their journals (with none at RIO)!

      You’re right, it would be an improvement upon my method to use the Crossref API. Screen-scraping by hand is tiresome.

      My DOI checking code was so simple I didn’t think it was worth sharing but here it is anyway:

      ls > journal-folder-names.txt
      while read line ; do cd "$line" ; wget --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" -w1 -i links.txt -o log.log ; cd /home/ross/Checking-OUP-DOIs ; done < journal-folder-names.txt
      grep -r "ERROR 404: Not Found" *

      I will do some more analyses of these conflict reports once I understand them. Unfortunately, upon first glance, as you warn, they don't seem readily understandable, but I will have a go…

      • Hmmm… Attempting to insert a code snippet into the comment box was a bad idea. It appears to have garbled my comment. I shall put the code snippet on github instead & get a DOI for it all via Zenodo.

  • Geoffrey Bilder

    Thought you might be interested in the following OUP announcement related to broken DOIs:

  • Pingback: Monitoring publishing services - Ross Mounce