Show me the data!

Author Archives: rmounce

So, apparently Elsevier are launching a new open access mega-journal some time this year, joining the bandwagon of similar efforts from almost every other major publisher. A lovely acknowledgement of the roaring success of PLOS ONE, who did it first a long time ago.

They’re only ~8 years behind, but they’re learning. I for one am pleased they are asking the research community what they want from this new journal. One of their “key points” in the press release is: “the journal will be developed in close collaboration with the research community and will evolve in response to feedback”

Well, I’m a member of the research community. I’m a BBSRC-funded postdoc at the University of Bath. I publish research myself AND I re-use published research, so I have a dual perspective that Elsevier should find useful. Here’s my feedback on their new open access journal proposal:


  • Does the research community really need or want a new journal?

We have at least 27,000 other peer-reviewed journals (source: Ulrich’s). I can’t see anything in Elsevier’s proposal that’s really new, or better than anything that already exists – you’ll be hard pressed to beat PeerJ. More journals add to the fragmentation of the research literature – it’s already hard to search across all these journals effectively. Why not just accept more volume in existing journals? It’d be great if you flipped The Lancet, Cell, and Trends in Ecology and Evolution to full (100%) open access journals, and rejected fewer submitted papers that present sound science. I genuinely do not know of any researcher that asked specifically for an additional new Elsevier journal.


The definition of open access always has been, and always will be this:

By “open access” to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited. (BOAI)

If you’re going to allow the CC-BY-NC-ND licence then by definition you can’t call it an open access journal. Either don’t allow that restrictive non-open licence, or call this new journal a ‘free-to-read’ journal or a ‘public access’ journal. These are the established terms for cost-free but not open journal content that the research community uses. Speak our language for a change instead of deliberately opaque legalese.


  • Take feedback on the design of your new journal from the WORLD not just the research community

Approximately 80% of the world’s academic research is taxpayer or charitably funded. The world is therefore your customer, not just researchers. Ask the world what they want from your new journal.


Take inspiration from the Panton Principles: “Science is based on building on, reusing and openly criticising the published body of scientific knowledge” – help researchers do the best science possible by not allowing them any excuses to not share non-sensitive data with their colleagues. The ‘email the author’ system has been widely proven not to work, in my own experience too.


  • Make peer reviews open for all to see, post-publication alongside the paper

At the time of review, you can do single or double blind, but after the manuscript is accepted and published, please publish the reviews alongside the accepted paper. The research community can then see for themselves how good peer review is at your new journal. Allow people to sign their reviews if they wish to (and personally I think this is best in most circumstances).


  • Encourage data citation

Do I really need to explain this one? Old school academic editors have apparently been striking these out at some journals. Please make all editors aware that this is both a good thing and is encouraged.


  • Encourage authors to provide their ORCIDs upon submission (and ORCIDs for reviewers and editors too, please)

This will help people disambiguate who’s who, which is important when there are at least 7 million active researchers.


  • Charge a reasonable APC ($1350 or less), and be generous with fee waivers and discounts for those that cannot afford it

Anything more than $1350 per article for a new journal in 2015 is daylight robbery. For the first year of publication you should waive charges for everyone, as everyone else does.


  • Provide open, full text XML

Great for text-mining. We don’t need your API. Just give us the content.


There you go Elsevier – that’s my feedback. If you can do ALL of the above or better, I might even publish with you myself. I have stated what I think you should do; it’s up to you now to implement it. I anticipate the launch of your glorious new journal. When your new journal comes out I shall revisit this post & score your new journal against it.


I encourage all other researchers & the scholarly poor who feel similarly, to also make their feelings known to Elsevier, and to add points I have perhaps overlooked. I’d say good luck Elsevier, but you don’t need luck with your fat profit margins – it’s simple to openly publish a good peer-reviewed research journal – just get on and do it already.




Ross Mounce, PhD





Dark Research: behind the preprint

January 5th, 2015 | Posted by rmounce in Phylogenetics | PLoS | Publications - (0 Comments)

This post is about my new preprint I’ve uploaded to PeerJ PrePrints:

Mounce, R. (2015) Dark Research: information content in some paywalled research papers is not easily discoverable online. PeerJ PrePrints

Needless to say, it’s not peer-reviewed yet but you can change that by commenting on it at the excellent PeerJ PrePrints website. All feedback is welcome.

The central hypothesis of this work is that content in some academic journals is less discoverable than in other journals. What do I mean by discoverable? It’s simple really. Imagine a paper is a bag-of-words:


If academic search providers like Google Scholar, Web of Knowledge, and Scopus can correctly tell me that this paper contains the word ‘rat’ then this is good and what science needs. If they can’t find it, it’s bad for the funders, authors and potential readers of that paper – the rat research remains hidden as ‘dark research’, published but not easily found. More formally, in terms of information retrieval you can measure search performance across many documents by assessing recall.

Recall is defined as:

the fraction of the documents that are relevant to the query that are successfully retrieved

As a toy example: if there are 100 papers containing the word ‘rat’ in Zootaxa, and Google Scholar returns 50 search results containing the word ‘rat’ in Zootaxa, then we ascertain that Google Scholar has 50% recall for the word ‘rat’ in Zootaxa.
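The toy example above can be sketched in a few lines of Python. This is only an illustration of the recall definition, not the code used for the preprint; the `recall` function and the `zootaxa-N` paper identifiers are made up for the example.

```python
def recall(relevant_ids, retrieved_ids):
    """Fraction of the relevant documents that were successfully retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(relevant & set(retrieved_ids)) / len(relevant)


# Hypothetical toy corpus: 100 Zootaxa papers known to contain the word 'rat'
papers_with_rat = {f"zootaxa-{i}" for i in range(100)}

# Hypothetical search results: the search engine returns only 50 of those papers
search_hits = {f"zootaxa-{i}" for i in range(50)}

toy_recall = recall(papers_with_rat, search_hits)
print(f"recall = {toy_recall:.0%}")  # prints "recall = 50%"
```

Note that search results which don't actually contain the word are ignored here: recall only asks how many of the truly relevant papers were found, not how clean the result list is.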

In my preprint I test recall for terms related to my subject-area against >20,000 full text papers from PLOS ONE & Zootaxa. The results are really intriguing:

  • Web of Knowledge is shown to be consistently poor at recall across both journals (not surprising, it only indexes titles, abstracts and keywords – woeful at detecting words that are typically present only in the methods section).
  • Google Scholar appears to have near-perfect recall of PLOS ONE content (open access), but less than 50% recall on average of Zootaxa content.
  • Scopus shows an inverse trend: reasonably consistent and good recall of Zootaxa content, averaging ~70% recall for all tests but poorer at recalling PLOS ONE content (45% recall on average).


Why is Google Scholar so poor at recalling terms used in Zootaxa papers? Is it because Zootaxa is predominantly a subscription access journal?


Why is Scopus so poor at recalling terms used in PLOS ONE papers? PLOS ONE is an open access journal, published predominantly under CC-BY – there should be no difficulty indexing it. Google Scholar demonstrates that it can be done well.


Why is Scopus so much better than Google Scholar at indexing terms in Zootaxa? What does Scopus do, or have that Google Scholar doesn’t?


I don’t pretend to have answers to these questions – academic search providers tend to be incredibly opaque about how they operate. But it’s fascinating, and slightly worrying for our ability to do research effectively if we can’t know where knowledge has been published.

More general thoughts

Why is academic content search so bad in 2015? It’s really not that hard for born-digital papers! Is this another sign that academic publishing is broken? Discoverability is broken & inconsistent. Access is broken & inconsistent. Peer review is broken & inconsistent. Hiring & tenure is broken & inconsistent…


The good news is that there’s a clear model for success here if we can identify its exact determinants: PLOS ONE & Google Scholar provide excellent discoverability (>95%). Whatever they’re doing, I suggest publishers copy it to ensure high discoverability of research.



My thoughts on Generation Open

December 9th, 2014 | Posted by rmounce in Generation Open | Open Access - (0 Comments)

I’ve just given an email interview for Abby Clobridge, for a forthcoming short column in Online Searcher.

I give many of these interviews and often very little material from them gets used, so I asked Abby if it was okay if I reposted what I wrote. Her response: “go for it” – thanks Abby! So here are my thoughts on Generation Open, for a readership of librarians and information professionals:

1) Why are Open issues particularly important for early career researchers? 

Science is digital and online. Virtually no-one hand-writes a manuscript with pen & paper. Our digital research objects (e.g. papers, data, software), if open, can be freely copied and shared with all, for the benefit of everyone. Yet legacy business models from the past are putting awkward constraints, restrictions and obstructions on the publishing and re-use of our research objects. This is deeply wrong. For reasons of efficiency, economic benefit & morality our research should be open, particularly if it’s publicly or charitably funded. Non-open research creates horrid inefficiencies and inequalities that affect us all. Early career researchers are the future of research; we are the ones who can put things right and do research as it should be done – maximising the utility of the internet for low-cost, open dissemination, evaluation and discussion of research. If the early career community don’t act now to help change things, change simply won’t happen.

2) What kind of changes would you like to see within universities/colleges in regard to Open Access, Open Education, or Open Data? 

All lecture material should be openly-licensed and available online. It’s mad to think that lecturers all over the world are creating new slides every year with essentially the same content. Deeply inefficient. Share teaching materials. Re-use & adapt good content you find. Save time & enrich the quality of your teaching.
Teaching in many ways stems from research. There would be a lot more open content available for worry-free re-use & adaption if research papers, particularly research figures, were openly-available. I honestly don’t think research academics are all that aware of the licencing costs involved in re-using non-open research for which a traditional publisher has taken the copyright. Peter Murray-Rust has a great example of a Nature paper: if you want to print 10 copies of it for teaching purposes, it costs $1610 USD, not including the paper & ink, just for the licence to reproduce!


It’s ridiculously obstructive and a waste of good research. No one will use that paper for teaching because of the prohibitive licencing costs. By contrast, open access papers published under the Creative Commons Attribution Licence (CC BY) can be emailed, put on Moodle, printed for no additional cost, nor does one need to ask permission before re-use. Open removes barriers and makes life easier for everyone.


With respect to data & software, institutions need to train-up their staff & students more in terms of research data management, reproducible research, git & version control. It’s mildly embarrassing that external (but brilliant) organisations like Software Carpentry & Data Carpentry are taking up the slack and giving everyone the training that they need. All Software Carpentry sessions in the UK have been packed as far as I know, because that kind & quality of training simply isn’t being adequately provided at many institutions.


3) What can librarians do to support ECRs in regards to being open? 


Go out into departments and speak to people. Give energetic presentations in collaboration with an enthusiastic researcher in that department (sometimes a librarian alone just won’t get listened to). Academics sorely need to know:
  • the cost of academic journal subscriptions
  • that using journal impact factors to assess an individual’s research is statistically illiterate practice
  • the cost of re-using non-open research papers for teaching purposes (licencing)
  • what Creative Commons licences are, and why CC BY or CC0 are best for open access
  • new research tools that support open research: Zenodo, Dryad, Github, Sparrho, WriteLatex etc…


4) What action(s) have you personally taken to support or promote openness?


How long a list do you want?


5) Anything else I’m not asking that you think is important… 


What do I think of NPG’s recent #SciShare announcement? Will it help people gain access to research?


No. I think it’s just another form of #BeggarAccess. The actual terms & conditions of the scheme are extremely limiting and do not resemble the initial hype around the scheme when it was first announced. The Open Access Button and #icanhazpdf remain the best solutions for access to proper copies of NPG articles.


What do I think of the attitude and prevalence of academic copyright infringement amongst early career researchers?


Everyone is knowingly or unknowingly committing copyright infringement at the moment. If we didn’t, research would be incredibly, painfully slow and inefficient. Ignoring silly laws is what my generation do. For context: the Napster generation was 1999-2001 – that was a long, long time ago. We know how to share files online. We know how to use torrents. I really don’t know why libraries don’t cut more subscription journals – the academic community is very good at routing around damage caused by paywalls. Have faith in our ability to find access, even if the institutional library can’t provide it. Cut subscriptions, let them go, we don’t need or want the restrictions they offer.

Nature’s Beggar Access

December 2nd, 2014 | Posted by rmounce in Open Access - (8 Comments)

Nature has issued a press release announcing a new scheme they’ve come up with to legalise begging to view research.

Pic lovemeow All Rights Reserved, copyright not mine.


The situation before this scheme was that the scholarly poor would beg for access via private social media (email) and public social media (e.g. twitter #icanhazpdf). Kind, privileged subscribers with access to Nature magazine would then privately pass along a printable PDF copy via untrackable/untraceable ‘dark social’ means.


After this announcement, the situation won’t change much. The printable-PDF that most people use and want is still under a 6 month embargo. It can’t be posted to an institutional repository.

The scholarly poor, without a Nature subscription, will still need to beg subscribers for access to specific articles they want. Only now this begging is more clearly legalised. Nature will graciously, formally allow privileged subscribers to share an extremely rights-restricted locked-down view of Nature articles with their scholarly poor friends. These view-only articles CANNOT be printed, presumably because that would enable untrackable ‘offline’ sharing of research.

Which makes me think: what are the real reasons behind this new policy?

Macmillan Publishers Ltd who publish Nature, also run Digital Science who are an investor of AND an investor in ReadCube.

It’s clear that this new policy is major PR for ReadCube – the links will presumably direct to Nature articles view-only within ReadCube. The more subtle boost is also for & the altmetrics of all shared Nature articles.

If this PR stunt converts some dark social sharing of PDFs into public, trackable, traceable sharing of research via non-dark social means (e.g. Twitter, Facebook, Google+ …) this will increase the altmetrics of Nature relative to other journals and that may in-turn be something that benefits


I’m sorry to be so cynical about this PR stunt, but it really doesn’t appear to change much. It will convert a small amount of semi-legal ‘dark social’ sharing, into formally legal public social sharing of research.

It has legalised begging.

It also panders to those that think true open access publishing is “a solution for a problem that does not exist”. A shrewd measure retarding the progress towards the inevitability of open access.


Congratulations Nature, hmmm…?


For less cynical posts see Wilbanks’ review, ‘Nature’s Shareware Moment’, and Michael Eisen’s ‘Is Nature’s “free to view” a magnanimous gesture or a cynical ploy?’.


Update: It’s come to my attention that the ‘annotation’ function that the press release mentions is also likely to be a ReadCube-only feature. This is classic lock-in strategy. Please DO NOT annotate any Nature papers you read using Beggar Access. Macmillan / Digital Science / ReadCube are clearly looking to monetize annotations on their proprietary platform.

Also, it looks like blind & visually-impaired people don’t benefit from this. I don’t think standard screen-reading software works with ReadCube. Thanks to a suggestion from @derivadow I tried the ChromeVox screen-reader plugin and that seemed to work, it could read-out all the words. I do not know if it works with popular screen-reader software like Orca or JAWS.