Show me the data!
This has done the rounds on Twitter a lot recently, and justifiably so, but just in case you haven’t seen it yet…
I thought I’d quickly blog about this excellent graph published on a FrontiersIn blog late last year (source/credit: http://blog.frontiersin.org/2015/12/21/4782/ )
Source, Credit, Kudos, and Copyright: Pascal Rocha da Silva, originally posted here.

With data from 570 different journals, it appears to demonstrate that rejection rate (the percentage of papers submitted, but NOT accepted for publication at a journal) has no apparent correlation with journal impact factor.

 

Why is this significant?

 

Well, a lot of people seem to think that ‘selectivity’ is good for research: that rejecting lots of perfectly valid papers submitted to a journal somehow ensures increased ‘quality’ (citations?) of the papers that are eventually accepted for publication there. The fact is, high rejection rates in practice indicate that a lot of good research papers are being rejected just to satisfy an unjustified fetish for arbitrary and crude pre-publication filtering. This is important evidence for advocates of the ‘publish first, filter post-publication’ philosophy, as put into practice by journals such as F1000Research and Research Ideas and Outcomes.

 

Release early, release often?

 

Rejecting perfectly good/sound research causes delays in the dissemination of knowledge – rejected manuscripts have to be reformatted, resubmitted and re-reviewed elsewhere at great cost. The overwhelming majority of initially rejected manuscripts get published somewhere else, eventually. So why bother rejecting them in the first place, if all it does is waste time and effort?

Please show your friends the graph if they haven’t already seen it. I think data like this could change a lot of people’s minds…

Further Reading:

Similar findings have been reported before with smaller samples:
Schultz, D. M. 2010. Rejection rates for journals publishing in the atmospheric sciences. Bull. Amer. Meteor. Soc. 91:231-243 DOI: 10.1175/2009bams2908.1

I’ve written 29 blog posts this year! Still time for one more…

This work relates to my new postdoc at the University of Cambridge in Sam Brockington’s group.

I’ve been closely examining IUCN RedList data for plant taxa and found some rather odd things.

Out of the 100 or so plant species that the IUCN RedList asserts as ‘extinct’, at least 16 of them are growing alive and well somewhere in the world at the moment.

For some species even Wikipedia notes the conflict between reality and the ‘official’ IUCN assessment e.g. for Rauvolfia nukuhivensis.

Here are the 16 plant species that I think are incorrectly assessed as ‘extinct’ right now by the IUCN RedList:

Astragalus nitidiflorus, Cnidoscolus fragrans, Cynometra beddomei, Dipterocarpus cinereus, Dracaena umbraculifera, Madhuca insignis, Melicope cruciata, Ochrosia brownii, Ochrosia fatuhivensis, Ochrosia tahitensis, Pausinystalia brachythyrsum, Pouteria stenophylla, Rauvolfia nukuhivensis, Wendlandia angustifolia, Wikstroemia skottsbergiana, Wikstroemia villosa

In addition to the 16 above, and with less certainty, I also think the Hawaiian taxa Delissea kauaiensis and Delissea niihauensis might have some individuals still alive, according to this Department of Land and Natural Resources ‘Fact Sheet’ from 2013.

 

Why not harness the wisdom of the crowds and/or semi-automated text mining?

 

It’s remarkable that the IUCN RedList still lists some of these as ‘extinct’ when there are easily findable peer-reviewed articles reporting the rediscovery and hence extant status of these taxa. To their credit, many are listed as “needs updating” but still, if there are important updates to statuses why not just go in and make the change(s) to correct the record?   The IUCN RedList page listing Wendlandia angustifolia as ‘extinct’ is possibly the worst example – it was reported as rediscovered back in the year 2000, more than a decade ago! The IUCN has had 15 years to update their incorrect assertion of ‘extinct’ for this taxon!

I can’t possibly go through the literature and check all other IUCN-listed plant taxa myself, but this does seem like a great opportunity for ContentMine tools to help the IUCN RedList stay on top of the latest updates about IUCN RedListed taxa. See ‘Daily updates on IUCN Red List species’ for more on that idea.

 

Below I list sources of information relating to the 16 species that I think are definitely NOT extinct, despite being listed as such on the IUCN RedList.

Wahyu, Y., Wihermanto, N., Risna, R. A., and Ashton, P. S. 2013. Rediscovery of the supposedly extinct Dipterocarpus cinereus. Oryx 47:324.

Martínez-Sánchez, J. J., Segura, F., Aguado, M., Franco, J. A., and Vicente, M. J. 2011. Life history and demographic features of Astragalus nitidiflorus, a critically endangered species. Flora – Morphology, Distribution, Functional Ecology of Plants 206:423-432.

Lorence, D. and Butaud, J.-F. 2011. A reassessment of Marquesan Ochrosia and Rauvolfia (Apocynaceae) with two new combinations. PhytoKeys 4:95+

Viswanathan MB, Harrison Premkumar E, Ramesh N. 2000. Rediscovery of Wendlandia angustifolia Wight ex Hook.f. (Rubiaceae), from Tamil Nadu, a species presumed extinct. J. Bombay Nat. Hist. Soc. 97. (2): 311-313

Oppenheimer, H. 2011. New Hawaiian plant records for 2009 Records of the Hawaii Biological Survey for 2009–2010. Bishop Museum Occasional Papers 110: 5–10 [notes the rediscovery of Wikstroemia villosa]

Shenoy et al. 2014. Extended distribution of Madhuca insignis (Radlk.) H. J. Lam. (Sapotaceae) – A Critically Endangered species in Shimoga District of Karnataka. ZOO’s PRINT  Volume XXIX, Number 6

Sudhi, K. S. 2012. Rediscovered tree still ‘extinct’ on IUCN Red List. The Hindu. [Cynometra beddomei]

Missouri Botanical Garden 2012. Umbrella Draceana. [Dracaena umbraculifera might be extinct in the wild, but it is still successfully grown in many botanical gardens!]

 

 

 

 

 

OpenCon 2015 Brussels was an amazing event. I’ll save a summary of it for the weekend but in the meantime, I urgently need to discuss something that came up at the conference.

At OpenCon, it emerged that Elsevier have apparently been blocking Chris Hartgerink’s attempts to access relevant psychological research papers for content mining.

No one can doubt that Chris’s research intent is legitimate – he’s not fooling around here. He’s a smart guy, statistically, programmatically and scientifically, and without doubt he has the technical skills to execute his proposed research. Only recently he was an author on an excellent paper highlighted in Nature News: ‘Smart software spots statistical errors in psychology papers’.

Why then are Elsevier interfering with his research?

I know nothing more about his case than what is in his blog posts. However, I have also had publishers block my own attempts to do content mining this year, so I think this is the right time for me to go public about this, in support of Chris.

My own use of content mining

I am trying to map where in the giant morass of research literature Natural History Museum (London) specimens are mentioned. No-one has an accurate index of this information. With the use of simple regular expressions it’s easy to filter hundreds of thousands of full text articles to find, classify and look up potential mentions of specimens.
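As a rough illustration of that kind of filtering, here is a minimal sketch that pulls candidate catalogue numbers out of a block of text with one regular expression. The pattern is an assumption, modelled only on two catalogue-number shapes quoted on this blog (“BMNH(E)1239048” and “PV P 51007”); it is not the expression actually used in the project, and real mentions in the literature are far more varied. The function name and sample sentence are made up for the example.

```python
import re

# Hypothetical pattern: "BMNH", optionally followed by a bracketed
# collection code such as "(E)", then a run of digits; or the prefix
# "PV P" followed by a run of digits.
SPECIMEN_PATTERN = re.compile(
    r"\bBMNH\s?(?:\([A-Z]\))?\s?\d{3,}\b"   # e.g. BMNH(E)1239048
    r"|\bPV\s?P\s?\d{3,}\b"                 # e.g. PV P 51007
)


def find_specimen_mentions(text):
    """Return every substring of `text` that looks like a catalogue number."""
    return SPECIMEN_PATTERN.findall(text)


# Invented sentence, purely to exercise the pattern.
sample = (
    "The holotype BMNH(E)1239048 was compared with PV P 51007; "
    "no other material was examined."
)
print(find_specimen_mentions(sample))
```

Anything the pattern finds is only a putative mention: it still has to be checked against the museum’s own specimen records.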

In the course of this work, I was frequently obstructed by BioOne. My IP address kept getting blocked, stopping me from downloading any further papers from this publisher. I should note here that my institution (NHMUK) pays BioOne to provide access to all their papers – my access is both legitimate and paid-for.

Strong claims require strong evidence. Thankfully I was doing my work with the full support and knowledge of the NHM Library & Archives team, so they forwarded one or two of the threatening messages they were getting from the publishers I was mining. I have no idea how many messages were sent in total. Here’s one such message from BioOne (below):

Blocked by BioOne

So, as I swiftly found out, according to BioOne downloading more than 100 full text articles in a single session is automatically deemed “excessive” and “a violation of permissible activity”.

Isn’t that absolutely crazy? In the age of ‘big data’, where anyone can download over a million full text articles from the PubMed Central OA subset in a few clicks, an artificially imposed restriction of just 100 is simply mad and is anti-science. As a member of a subscription-paying institution, surely I have a paid right to access and analyze this content? We are paying for access but not actually getting full access.

If I tell other journals like eLife, PLOS ONE, or PeerJ that I have downloaded every single one of their articles for analysis, I get a high-five: these journals understand the importance of analysis-at-scale. Furthermore, the subscription access business model needn’t be a barrier: the Royal Society journals are very friendly towards content mining – I have never had a problem downloading entire decades’ worth of journal content from the Royal Society journals.

I have two objectives for this blog post.

1.) A plea to traditional publishers: PLEASE STOP BLOCKING LEGITIMATE RESEARCH

Please get out of the way and let us do our research. If our institutions have paid for access, you should provide it to us. You are clearly impeding the progress of science. Far more content mining research has been done on open access content and there’s a reason for that – it’s a heck of a lot less hassle and (legal) danger. These artificial obstructions on access to research are absurd and unhelpful.

2.) A plea to researchers and librarians: SHARE YOUR STORIES

I’m absolutely sure it’s not just Chris and me who have experienced problems with traditional publishers artificially obstructing our research. Heather Piwowar is one great example I know. She bravely, extensively and publicly documented her torturous experiences of negotiating access to, and text mining of, Elsevier-controlled content. But we need more people to speak up. I fear that librarians in particular may be inadvertently sweeping these issues under the carpet – they are the most likely to get the most interesting emails from publishers with respect to these matters.

This is a serious matter. Given the experience of Aaron Swartz, who faced up to 50 years of imprisonment for downloading ‘too many’ JSTOR papers, it would not surprise me if few researchers come forward publicly.

Anecdata On Sharing Science

October 1st, 2015 | Posted by rmounce in ARCS2015 - (0 Comments)

[This is my competition entry for the ARCS2015 essay competition hosted at The Winnower. I’m using their excellent WordPress plugin to automagically transfer this post from my blog to their site at the click of a button.]

There’s a 1,000-word limit for this competition, so forgive my brevity. I could easily write ten thousand! These are merely a couple of vignettes.

To really understand why open is better, you should try traditional science. Otherwise, you won’t see all the most awful practices as these are usually hidden from view.

My first peer-reviewed paper was published in a popular glamour magazine called Nature. Most academics read it for the News and Jobs sections, but it also publishes some research articles. Editorially, it selects research articles for publication on the basis of their news-worthiness, which has unfortunate side-effects: significantly more of these stories eventually get retracted or corrected, relative to other journals which focus more on the correctness of the science.

My one-page, one-figure article simply pointed out that an article the magazine had previously published on its front cover was wrong. I wasn’t the only one to notice this either. Amazingly, it took the journal 160 days from submission to publication to publish my small contribution. This was my first author-experience of the vast inefficiency, bureaucracy, and secrecy practised by traditional ‘closed’ science journals.

It was thus made obvious to me from a very early stage of my PhD that there had to be other better, faster, cheaper, more-enriched ways of communicating science available. Nowadays I wouldn’t recommend that anyone use the traditional (read: slow, obstructive, secretive) means of post-publication commentary. If you want to communicate what is poor about a paper published at a traditional journal, writing to the journal is the least effective means of doing so. Use PubPeer, PubMed Commons, blogs, Twitter, or even The Winnower for post-publication peer-review. Making incisive, well-communicated points about research you have read, and sharing these thoughts openly for others to read and comment on, is a valuable skill. Although hard to evidence, I believe I have gained respect and wider exposure for doing this myself, as have others, e.g. Rosie Redfield, whom I would not have heard of were it not for her excellent critique of the #arseniclife paper. (For those who don’t know about it: the original paper was published in another glamour mag, and was subsequently formally rebutted with neutrally-titled ‘Technical Comments’ 177 days after online publication, despite Rosie’s much more timely blogged rebuttal, which went online 2 days after the initial publication.) I’m not alone in thinking these are glaring examples of how traditional science communication is broken.

Even simply sharing your research talk slides online can be hugely beneficial for your career

Another thing I learned by experience early on in my PhD was that there’s a problematic absence of data supporting many research articles. To put it more bluntly: most articles have pretty figures and lovely prose but many simply don’t make the underlying data available. I discussed this at length, with evidence, in a conference presentation at the Young Systematists Forum, 2010. I proactively put my talk slides online to share my ideas on this with the world and, with the help of Twitter, this one small act of sharing a conference presentation led directly to a multiplicity of benefits:

  • I was invited on to the council of the Systematics Association, so I can try to influence the future direction of the society towards data sharing, open access, and better publishing (it’s work in progress, large committees have a tendency to change slooooooowly)
  • I was invited to join an international collaboration to document the lack of data archiving for phylogenetic studies, which was published in 2012, in an open access journal, and has been cited 18 times so far
  • It also led to my first invited speaking slot at the Open Knowledge Foundation conference in 2011 (OKCon), which in turn helped me become aware of and successfully apply for one of the first Panton Fellowships for Open Data in Science (£8,000)

So one small act of sharing led directly to Fellowship money, many speaking invites, additional publications/collaborations I wouldn’t have otherwise been involved with, and genuine influence within an academic society. Sharing my presentations, my ideas, my data, and of course my publications has clearly benefited my career, and if anything I’m only likely to go more open with my research in future, rather than less!

As my title alludes to, I’m well aware my stories are just anecdata. This isn’t an objective assessment of the benefits of open science, but the logical basis of the benefits is clear nonetheless: if you don’t share your work, fewer people will know of it. Share freely and openly and you may find yourself with many more beneficial opportunities as a result. Go forth and upload your work today!

Using the NHM Data Portal API

September 30th, 2015 | Posted by rmounce in Content Mining | NHM - (0 Comments)

Anyone care to remember how awful and unusable the web interface for accessing the NHM’s specimen records used to be? Behold the horror below as it was in 2013, or visit the Web Archive to see just how bad it was. It’s not even the ‘look’ of it that was the major problem – it was more that it simply wouldn’t return results for many searches. No one I know actually used that web interface because of these issues. And obviously there was no API.

2013. It was worse than it looks.

The internal database that the NHM uses is based upon KE Emu and everyone who’s had the misfortune of having to use it knows that it’s literally dinosaur software – it wouldn’t look out of place in the year 1999 and again, the actual (poor) performance of it is the far bigger problem. I guess by 2025 the museum might replace it, if there’s sufficient funding and the political issues keeping it in place are successfully navigated. To hear just how much everyone at the museum knows what I’m talking about; listen to the knowing laughter in the audience when I describe the NHM’s KE Emu database as “terrible” in my SciFri talk video below (from about 3.49 onwards):

Given the above, perhaps now you can better understand my astonishment and sincere praise I feel is due for the team behind the still relatively new online NHM Data Portal at: http://data.nhm.ac.uk/

The new Data Portal is flipping brilliant. Ben Scott is the genius behind it all – the lead architect of the project. Give that man a pay raise, ASAP!

He’s successfully implemented the open source CKAN software, which itself incidentally is maintained by the Open Knowledge Foundation (now known simply as Open Knowledge). This is the same software solution that both the US and UK governments use to publish their open government data. It’s a good, proven, popular design choice, it scales, and I’m pleased to say it works really well for both casual users and more advanced users. This is where the title of this post comes in…

The NHM Specimen Records now have an API and this is bloody brilliant

In my text mining project to find NHM specimens in the literature, and link them up to the NHM’s official specimen records, it’s vitally important to have a reliable, programmatic web service against which I can look up tens of thousands of catalogue numbers. If I had to copy and paste in each one, e.g. “BMNH(E)1239048”, manually using a GUI web browser, my work simply wouldn’t be possible. I wouldn’t have even started my project.

Put simply, the new Data Portal is a massive enabler for academic research.

To give something back for all the usage tips that Ben has been kindly giving me (thanks!), I thought I’d use this post to describe how I’ve been using the NHM Data Portal API to do my research:

At first, I was simply querying the database from a local dump. One of the many great features of the new Specimen Records database at the Data Portal is that the portal enables you to download the entire database as a single plain text table: over 3 GB in size. Just click the “Download” button, you can’t miss it! But after a while, I realised this approach was impractical – after just a few weeks my local copy was significantly out of date. New specimen records are made public on the Data Portal every week, I think!

So, I had to bite the bullet and learn how to use the web API. Yes: it’s a museum with an API! How cool is that? There really aren’t many of those around at the moment. This is cutting-edge technology for museums. The Berkeley Ecoinformatics Engine is one other I know of. Among other things it allows API access to geolocated specimen records from the Berkeley Natural History Museums. Let me know in the comments if you know of more.

The basic API query for the NHM Data Portal Specimen Records database is this:

That doesn’t look pretty, so let me break it down into meaningful chunks.

The first part of the URL is the base URL and is the typical CKAN DataStore Data API endpoint for data search. The second part specifies which exact database on the Data Portal you’d like to search. Each database has its own 32-digit GUID to uniquely identify it. There are currently 25 different databases/datasets available at the NHM Data Portal including data from the PREDICTS project, assessing ecological diversity in changing terrestrial systems. The third and final part is the specific query you want to run against the specified database, in this case: “Archaeopteryx”. This is a simple search that queries across all fields of the database, which may be too generic for many purposes.
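To make the three parts concrete, here is a minimal sketch that assembles such a query URL in Python. Two things in it are assumptions rather than facts from this post: the endpoint path (the conventional CKAN action-API location for `datastore_search`, placed under the portal address) and the all-zero `RESOURCE_ID`, which is a placeholder to be swapped for the GUID of the dataset you actually want to search. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Part 1 (assumed): conventional CKAN DataStore search endpoint.
BASE_URL = "http://data.nhm.ac.uk/api/action/datastore_search"

# Part 2 (placeholder only): the GUID identifying one dataset.
RESOURCE_ID = "00000000-0000-0000-0000-000000000000"


def build_search_url(resource_id, query):
    """Join endpoint, dataset identifier and free-text query into one URL."""
    params = {"resource_id": resource_id, "q": query}
    return BASE_URL + "?" + urlencode(params)


# Part 3: the search term, matched across all fields.
print(build_search_url(RESOURCE_ID, "Archaeopteryx"))
```

Building the query string with `urlencode` rather than by hand means search terms containing spaces or punctuation are escaped correctly.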

This query will return 2 specimen records in JSON format. The output doesn’t look pretty to human eyes, but to a computer this is cleanly-structured data and it can easily be further analysed, manipulated or converted.
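As a sketch of what “easily further analysed” can mean in practice, the snippet below parses a response body and walks over its records. It assumes the usual CKAN DataStore response envelope (a top-level `success` flag and a `result` object holding `records` and `total`). The `stub_body` string is a hand-written stand-in, not real output from the portal, and its catalogue numbers are invented.

```python
import json

# Hand-written stand-in for a response body; NOT real portal output.
stub_body = """
{
  "success": true,
  "result": {
    "total": 2,
    "records": [
      {"_id": 1, "catalogNumber": "EXAMPLE-1"},
      {"_id": 2, "catalogNumber": "EXAMPLE-2"}
    ]
  }
}
"""


def extract_records(body):
    """Parse a datastore_search style response and return its record list."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("the API reported a failed request")
    return payload["result"]["records"]


for record in extract_records(stub_body):
    print(record["catalogNumber"])
```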

More complex / realistic search queries using the API

The simple search queries across all fields. A more targeted query on a particular field of the database is sometimes more desirable. You can do this with the API too:

In the above example I have filtered my API query to search the “catalogNumber” field of the database for the exact string “PV P 51007”.
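A field-specific query of this kind can be sketched as follows, assuming the portal exposes CKAN’s standard `filters` parameter (a JSON object mapping field names to required values). As before, the endpoint path is assumed, the resource ID is a placeholder, and the function name is hypothetical.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://data.nhm.ac.uk/api/action/datastore_search"  # assumed path
RESOURCE_ID = "00000000-0000-0000-0000-000000000000"            # placeholder


def build_filter_url(resource_id, field, value):
    """Build an exact-match query on a single field via `filters`."""
    params = {
        "resource_id": resource_id,
        # `filters` is sent as a JSON object of field -> value.
        "filters": json.dumps({field: value}),
    }
    return BASE_URL + "?" + urlencode(params)


print(build_filter_url(RESOURCE_ID, "catalogNumber", "PV P 51007"))
```

Because the match is exact, the value has to be the complete catalogue number, spaces included.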

This isn’t very forgiving though. If you search for just “51007” with this type of filter you get 0 records returned:

So, the kind of search I’m actually going to use to look up my putative catalogue numbers (as found in the published literature) via the API will have to make use of the more complex SQL-query style:

This query returns 19 records that contain, at least partially, the string ‘51007’ in the catalogNumber field. Incidentally, you’ll see if you run this search that 3 completely different entomological specimen records share the exact same catalogue number, “BMNH(E)251007”:
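One plausible shape for such a partial-match query is sketched below. It assumes the portal also exposes CKAN’s `datastore_search_sql` action, which takes a single `sql` parameter; the endpoint path, the placeholder resource ID and both function names are assumptions, and the SQL shown is an illustration rather than the exact query behind the 19 records.

```python
from urllib.parse import urlencode

SQL_URL = "http://data.nhm.ac.uk/api/action/datastore_search_sql"  # assumed path
RESOURCE_ID = "00000000-0000-0000-0000-000000000000"                # placeholder


def build_like_sql(resource_id, field, fragment):
    """Return a SELECT matching `fragment` anywhere inside `field`."""
    # Crude guard for this sketch only: refuse anything that could
    # break out of the quoted string. Not a real sanitiser.
    if not fragment.isalnum():
        raise ValueError("fragment must be alphanumeric")
    pattern = "'%" + fragment + "%'"
    return f'SELECT * FROM "{resource_id}" WHERE "{field}" LIKE {pattern}'


def build_sql_url(resource_id, field, fragment):
    """Wrap the SQL statement in a datastore_search_sql request URL."""
    sql = build_like_sql(resource_id, field, fragment)
    return SQL_URL + "?" + urlencode({"sql": sql})


print(build_like_sql(RESOURCE_ID, "catalogNumber", "51007"))
print(build_sql_url(RESOURCE_ID, "catalogNumber", "51007"))
```

The `%` wildcards on either side of the fragment are what make the match partial, so “51007” can match longer strings such as “BMNH(E)251007”.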

Thamastes dipterus Hagen, 1858 (Trichoptera, Limnephilidae)

Contarinia kanervoi Barnes, 1958 (Diptera, Cecidomyiidae)

Sympycnus peniculitarsus Hollis, D., 1964 (Diptera, Dolichopodidae)

NHM Catalogue numbers are unfortunately far from uniquely identifying but that’s something I’ll leave for the next post in this series!
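A quick way to spot this kind of collision in a batch of returned records is to group them by catalogue number, as in the sketch below. The three records are the ones named above, cut down to two fields; `scientificName` is an assumed field name (borrowed from Darwin Core conventions), and the helper functions are hypothetical.

```python
from collections import defaultdict

# The three records named above, reduced to two fields for illustration.
records = [
    {"catalogNumber": "BMNH(E)251007", "scientificName": "Thamastes dipterus"},
    {"catalogNumber": "BMNH(E)251007", "scientificName": "Contarinia kanervoi"},
    {"catalogNumber": "BMNH(E)251007", "scientificName": "Sympycnus peniculitarsus"},
]


def names_by_catalogue_number(rows):
    """Map each catalogue number to the set of names recorded under it."""
    names = defaultdict(set)
    for row in rows:
        names[row["catalogNumber"]].add(row["scientificName"])
    return names


def ambiguous_numbers(rows):
    """Catalogue numbers that point at more than one scientific name."""
    grouped = names_by_catalogue_number(rows)
    return {num: sorted(found) for num, found in grouped.items() if len(found) > 1}


for number, found in ambiguous_numbers(records).items():
    print(number, "->", ", ".join(found))
```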

Isn’t the NHM Data Portal amazing? I certainly think it is. Especially given what it was like before!

Yesterday, I tried to read a piece of research, relevant to my interests that was published in 1949. Sadly as is usual, I hit a paywall asking me for £30 + tax to read it (I didn’t pay).

Needless to say, the author is almost certainly deceased so I can’t simply email him for a copy.

The paper copy is useless to me, even though my institution probably has one somewhere. I need electronic access. It would probably take me an hour to walk to the library, do the required catalogue searches, find the shelf, find the issue, find the page, re-type the paragraphs I need back into a computer, walk back to my desk etc… That whole paper-based workflow is a non-starter.

I noted the article is available electronically online to some lucky, privileged subscribers – but who? Why is the list of institutions that are privileged enough to have access to paywalled articles not public information? It would be extremely helpful to know what institutions have access to which journals & which journal year ranges.

So I thought I’d do an informal poll of people on Twitter about this issue:

I received an overwhelming number of responses. Probably over a hundred in total. Huge thanks to all those who took part.

Given such a brilliant community response it would be remiss of me not to share what I’ve learnt with everyone, not just those who helped contribute each little piece of information. So 24 hours later, here’s what I now know about who can access this 1949 paper (data supporting these statements is permanently archived at Zenodo):

Mounce, Ross. (2015). Data on which institutions have access to a 1949 paper, paywalled at Taylor & Francis. Zenodo.

I’m not pretending the following analysis of the data is rigorous science. It’s not. It’s anecdata about access to a single paper at a single journal (a classic n=1 experiment). Of course it also relies on each contributor correctly reporting the truth, and some potential respondents may have self-censored. The sampling is highly non-random and reflects my social sphere of influence on Twitter: predominantly US and UK-centric, although I do have single data points from Brazil & Australia (thanks Gabi & Cameron!). Nevertheless, despite all these provisos it’s highly interesting anecdata:

The United Kingdom of Great Britain and Northern Ireland:
Of responses representing 41 different UK institutions including my own, only 3 have access to this paper, namely: University of Cambridge, University of Oxford, and University of Glasgow.
Had I got more responses from a wider variety of UK HEIs like the University of Lincoln and University of Worcester, where I also have friends, I suspect the overall percentage of UK institutions that have access would be even smaller! I’m particularly amused that it appears that no London-based institution has electronic access to this paper!

North America:
Of responses representing 29 different institutions in Canada and the United States, only 7 have access to the paper, namely: Virginia Tech, University of Illinois, University of Florida, North Carolina State University, Case Western Reserve University, Arizona State University, and McGill University. It’s intriguing that North American institutions appear to have slightly better access to this journal as originally the journal was published in London, England!

The ‘rest of the world’ (not meant in a patronising way):
Of responses representing 23 different institutions not based in the UK, Canada, or the United States, only 2 definitely have access to this paper: Wageningen University and Stockholm University. I note that the person who contributed data on Stockholm University access does not have an officially recognised affiliation with Stockholm University and that they used alternative methods *cough* to discover this (just for clarity and to further demonstrate the sampling issues at play here!).

Despite asking far and wide, I only found 11 different institutions that actually have electronic access to this paper, and none from London where the paper was actually published.
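For anyone who wants the regional figures as percentages, the arithmetic is sketched below using only the counts reported above. Taken at face value the three regional counts add up to 12 rather than the 11 in the summary, possibly because the Stockholm data point did not come via an official affiliation; the sketch does not try to resolve that and simply uses the per-region numbers as stated.

```python
# (institutions with access, institutions sampled), as reported above.
responses = {
    "UK": (3, 41),
    "North America": (7, 29),
    "Rest of the world": (2, 23),
}


def access_rate(with_access, sampled):
    """Percentage of sampled institutions that reported access."""
    return 100 * with_access / sampled


for region, (with_access, sampled) in responses.items():
    rate = access_rate(with_access, sampled)
    print(f"{region}: {with_access}/{sampled} = {rate:.1f}%")
```

That works out at roughly 7.3% for the UK, 24.1% for North America and 8.7% for the rest of the world, with all the sampling caveats already given.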

I’m fascinated by this data, despite its limitations. I’d like to collect more and collect it more efficiently. Perhaps the librarian community could help by publishing exactly what each institution has access to? Although one conversation thread seemed to indicate that libraries may not even know exactly what they have subscribed to at any one point in time (Seriously? WTF!).

Why is this stuff important to know?

I often hear an old canard from certain people that we don’t need open access because “most researchers have access to all the journals and articles they need”. Sometimes some crap, misleading survey data is trotted out to support this opinion. Actual data on which actual institutions have actual access to subscription-only research is pivotal to countering this canard. For example, it is extremely useful to point out that institutions like Brock University and University of Montreal do NOT have access to the bundle of Wiley journals, particularly at a time when, maddeningly, many societies have decided to start publishing with Wiley, e.g. the Ecological Society of America! It’s not very joined-up thinking and it’s going to create a lot of pain for a lot of people. Montreal, Brock and many other institutions with ecologists do not have access to the big Wiley bundle of journals. I’m sure there are useful examples in other subject areas too of mismatch between subscriptions held & needed access. The solution to this of course is NOT to re-subscribe, but to fix the problem at its source: to fully recognise that access is a global issue, that many people need access to a very wide variety of different journals, and that a proper transition to an open access availability model is needed.

If I wait 26 years, it will be available for free in the Biodiversity Heritage Library. I hope I live that long!

What to do next?

If your institution isn’t listed in my dataset so far, please do still try and access this article and let me know if you can or cannot instantly access it via your institutional affiliations from Taylor & Francis.

Given we have researchers coming from all corners of the globe for OpenCon later this year, I will soon explore whether together, as the OpenCon community, we can do something like this on a grander scale to more rigorously document the patchy nature of subscription-provided access.

The final word

I’ll leave the final word to the obvious ‘elephant in the room’ that I haven’t discussed much so far: the 99.99%, relative to us privileged, institutionally-affiliated lucky ones. I am very much aware of, and do care about, independent researchers and readers among the ‘general public’, neither of whom can afford subscription access to most paywalled journals:

Today (2015-09-01) marks the public announcement of Research Ideas & Outcomes (RIO for short), a new open access journal for all disciplines that seeks to open up the entire research cycle with some truly novel features.

I know what you might be thinking: Another open access journal? Really? 

Neither I nor Daniel Mietchen would be involved with this project if it were just another boring open access journal. This journal packs a mighty combination of novel features into one platform:

  • 1.) RIO will publish research proposals, as well as regular research outputs such as articles, data papers and software – this has never been done by a journal before to my knowledge
  • 2.) RIO will label research outputs with ‘Impact Categories’ based upon UN Millennium Development Goals (MDGs) and EU Societal Challenges, to highlight the real-world relevance of research and to better link-up research across disciplines (see below for some example MDGs).

Millennium Development Goals

  • 3.) RIO supports a variety of different types of peer-review, including ‘pre-submission, author-facilitated, external peer-review’ (new), as well as post-publication journal-organized open peer-review (similar to that pioneered by F1000Research), and ‘spontaneous’ (not journal-organized) post-publication open peer-review which is actively encouraged. All peer-review will be open/public, in keeping with the overall guiding philosophy of the journal to increase transparency and reduce waste in the research cycle. Reviewer comments are highly valuable; it is a waste not to make them public. When supplied, all reviewer comments will be made openly available.
  • 4.) RIO offers flexibility in publishing services and pricing in a bold attempt to ‘decouple’ the traditional scholarly journal into its component services. Authors & funders thus may choose to pay for the publishing services they actually want, not an inflexible bundle of different services, as is offered at most journals.
Source: Priem, J. and Hemminger, B. M. 2012. Decoupling the scholarly journal. Frontiers in Computational Neuroscience. Licensed under CC BY-NC

 

  • 5.) On the technical side of things, RIO uses an integrated end-to-end XML-backed publication system for Authoring, Reviewing, Publishing, Hosting, and Archiving called ARPHA. As a publishing geek this excites me greatly as it eliminates the need for typesetting, ensuring a smooth and low-cost publishing process. Reviewers can make comments inline or more generally over the entire manuscript, on the very same document and platform that the authors wrote in, much like Google Docs. This has been successfully tried and tested for years at the Biodiversity Data Journal and is a system now ready for wider-use.

 

For the above reasons and more, I’m hugely excited about this journal and am delighted to be one of their founding editors alongside Dr Daniel Mietchen. See our growing list of Advisory and Editorial Board members for insight into who else is backing this new journal – we’ve got some great people on board already! If you’re interested in supporting this initiative please do enquire about volunteering as an editor for the journal; we need more editors to support the broad scale and ambition of the journal. You can apply via the main website here.

What is Journal Visibility?

August 28th, 2015 | Posted by rmounce in Publications

I’ve just read a paper published in Systematic Biology called ‘A Falsification of the Citation Impediment in the Taxonomic Literature’.

Having read the full paper many times, including the 64-page PDF supplementary materials file, I’m amazed the paper was published in its current form.

Early on, in the abstract no less, the authors introduce a parameter called ‘journal visibility’. Apparently they ‘correct’ the number of citations for it.

We compared the citation numbers of 306 taxonomic and 2291 non-taxonomic research articles (2009–2012) on mosses, orchids, ciliates, ants, and snakes, using Web of Science (WoS) and correcting for journal visibility. For three of the five taxa, significant differences were absent in citation numbers between taxonomic and non-taxonomic papers.

I count over twenty further instances of the term ‘visibility’ or ‘visible’ in this paper. It is clearly an important part of the work and calculations. But what is it, and how did they correct for it? All parameters in reputable scientific papers should be clearly defined, as should any numerical ‘correction’ operations performed. Yet in this paper I honestly can’t find any explicit definition of ‘journal visibility’. As Brian O’Meara points out, they define highly visible journals as “those included in WoS and with a good standing”. Good standing is not further defined or scored. No definition is given for what a lowly visible or middlingly visible journal is. All journals indexed in Web of Science are assigned an Impact Factor. Thus ‘included in WoS’ and ‘has Impact Factor’ are two ways of saying the same thing.

For the sake of clarity I will now quote and number all other passages in the paper, aside from the abstract, that mention ‘visibility’ or ‘visible’ (I have highlighted each instance in red):

1 & 2 & 3

In more detail, we address five questions: Does publishing taxonomy harm a journal’s citation performance? Is it within the possibilities of journal editors to influence taxonomy’s visibility? If more high-visibility journals opened their doors to taxonomic publications, would taxonomy’s productivity be sufficient for an increase in the number of taxonomic papers in these journals? Can taxonomy be published by taxonomists only or by a larger community? And finally, would the community use the chance to publish more taxonomic papers in highly visible journals?

4

Just 14 of the 47 journals published both taxonomic and non-taxonomic papers on the focal taxa on a yearly basis in the years 2009–2012 (Table 1). The analyzed taxonomic publications in these 14 journals might have experienced lower visibility than publications in the other 33 journals. This is due to the fact that the average IF 2012 of the 14 journals with both taxonomic and non-taxonomic publications was significantly lower ( 1.16±0.51 standard deviation [SD]) than the average IF of the other 33 journals ( 2.66±1.60 ; Student’s t -test, P<0.001 ).

5

Because of the correction for journal visibility, we consider the results for the 14 journals to be more representative of the citation performance of taxonomic versus non-taxonomic per se than the results for all journals.

6 & 7 & 8

[Section Heading] EDITORS CAN INCREASE THE VISIBILITY OF TAXONOMIC PUBLICATIONS

For strengthening the impact and prospects of taxonomy, equal opportunity is needed for taxonomists and non-taxonomists. In practice, this means that taxonomists should be able to publish in highly visible journals (those included in WoS and with a good standing). Editors of highly visible periodicals that include taxonomy will contribute actively to reducing the taxonomic impediment and, considering our analyses, might on top of this do the best for their journals.

9 & 10

The IF 2012 of these 19 journals that (in principle) publish taxonomy ( 2.61±1.64 ) does, on average, not differ significantly from that of the 14 journals that do not publish taxonomy at all ( 2.73±1.61 ; Student’s t -test, P=0.84 ) meaning that equal visibility for taxonomists and non-taxonomists might, in fact, not be out of reach. In essence, for many editors of highly visible periodicals, it might not so much be a question of changing the scope of their journals but of increasing the frequency of taxonomic publications and thus simply of communicating the willingness to publish taxonomy to the community

11 & 12 & 13

[Section Heading] TAXONOMY’S PRODUCTIVITY WOULD BE SUFFICIENT TO INCREASE THE NUMBER OF PAPERS IN HIGHLY VISIBLE JOURNALS

It is not enough, however, for editors of highly visible journals to actively invite taxonomic contributions. A crucial question about whether increasing taxonomy’s visibility will work is the capacity of taxonomy to follow the invitation. One way to approach this issue is looking at the growth rate of taxonomy

14

To our knowledge, a comprehensive taxonomic literature database is available just for animals, Zoological Record (ZR). For 2012, the latest year considered here, ZR lists 2.1 times more publications on animal taxonomy than WoS (Fig. 2b, c). This indicates that already in the short term, there is sufficient taxonomic publication output for editors of highly visible journals to indeed increase their share in taxonomy.

15

On the whole, the capacity for increased publication of taxonomy in highly visible journals seems to be there. Accepting that the potential exists, there is still a question of whether taxonomy’s flexibility will be sufficient for a change in publication culture to be realized.

16 & 17 & 18 & 19

[Section Heading] THE COMMUNITY WOULD LIKELY USE THE CHANCE TO INCREASE TAXONOMY’S VISIBILITY

… This suggests that taxonomists indeed would use also other chances of publishing in highly visible journals, should the opportunity arise. The resulting shift from aiming at low visibility to targeting highly visible journals will be very important for taxonomists in working toward both an improved image (Carbayo and Marques 2011) and an improved measure of their scientific impact (Agnarsson and Kuntner 2007).

20 & 21 & 22

Editors of highly visible journals in biology could help (i) increase the visibility of taxonomic publications by encouraging taxonomists to publish in their journals (thereby generally not harming but possibly boosting their journals) and (ii) increase total taxonomic output by making it attractive for scientists working in species delimitation (with their primary focus different from taxonomy) to publish the taxonomic consequences of their research.

The task of taxonomic authors, in turn, will be to follow the invitation and to submit indeed their best papers to the best-visible journals available for submission—just as authors of non-taxonomic papers do.

My inferences on visibility

For independent, unbiased confirmation, I looked up the definition of ‘visibility’ online and found:

Noun

visibility (countable and uncountable, plural visibilities)

  1. (uncountable) The condition of being visible.
  2. (countable) The degree to which things may be seen.

source: https://en.wiktionary.org/wiki/visibility

By the above definition, which is not unreasonable, I would have thought that open access journals would have the highest ‘journal visibility’ as everyone with an internet connection is able to see articles in them without having to login or pay money to view.

Popular subscription access journals like Nature arguably have middling visibility, as many scientists have access to them (although not that many actually read all the articles in them; I certainly don’t). Finally, many subscription access journals, e.g. Zootaxa, are known to be less widely subscribed to by both individuals and institutions (I would love to have data to demonstrate this more objectively; it is certainly true for UK Higher Education Institutions that significantly more subscribe to Nature than to Zootaxa).

I get the feeling that the authors of this paper did not score ‘visibility’ in this manner.

Many of the mentions of ‘visibility’ appear near discussion of Impact Factor (IF). Perhaps the authors mean to suggest that visibility and Impact Factor are one and the same thing, or are highly correlated? No evidence or citation is given to support this idea. I find this conflation of ‘visibility’ and Impact Factor to be simply wrong and dangerously misleading. Why?

Take the visibility of Elsevier journals for instance. They range in Impact Factor from 0 (many journals e.g. Arab Journal of Gastroenterology), to 2 (e.g. Academic Pediatrics), up to 45 (The Lancet). Yet I’d argue the visibility of most Elsevier subscription journals is the same because institutional libraries tend to (be practically forced to) buy Elsevier journals as a bundle – the euphemistically-titled ‘Freedom Collection’. With the privilege of an institutional affiliation you typically either have access to all the Elsevier journals, including the cruddy ones, or you have access to none of them (in one ARL survey from 2012, 92% of surveyed libraries subscribed to the Elsevier bundle). Unfortunately very few academic libraries opt to subscribe to just a few select Elsevier subscription-only journals rather than the bundle; MIT is one of the rare exceptions. Thus whether an individual subscription access Elsevier journal has an Impact Factor of 0, 2, 5, or 10, the global visibility of its articles is much the same as that of articles in any other Elsevier journal. The only exceptions are the very most popular journals like The Lancet, which might have an appreciable number of individual subscribers and of institutions that subscribe to the journal without subscribing to the rest of Elsevier’s bundle.

Journals aren’t a good unit of measure anyway – citations, views, downloads and ‘quality’ (broadly-defined) can vary greatly even within the same journal. Articles are a more appropriate unit of measure and we have abundant article-level metrics (ALMs) these days. Let’s not lose sight of that fact.

Surely this article needs correction at the very least? This is more than just a minor linguistic quibble. If the authors mean to say Impact Factor every time they say ‘visible’ or ‘visibility’, why don’t they just say so? Perhaps it is because Impact Factor is so widely and rightly derided, not to mention statistically illiterate: the distribution of citations to journal articles is well known to be skewed, so the median, not the mean, is the appropriate measure of central tendency, yet the Impact Factor uses the mean in its calculation (oops!). Perhaps they knew that it wouldn’t be meaningful and so masked it by using ‘visibility’, a weasel word, instead?
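
To make the mean-versus-median point concrete, here is a minimal Python sketch. The citation counts are entirely made up for illustration (they are not data from any real journal, and the function name is my own); the only assumption is the familiar pattern of many rarely cited articles plus one runaway hit.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one imaginary journal.
# These numbers are invented purely to illustrate a skewed distribution.
citations = [0, 0, 1, 1, 2, 2, 3, 4, 5, 120]


def summarise(counts):
    """Return (mean, median) for a list of citation counts."""
    return mean(counts), median(counts)


avg, mid = summarise(citations)
print(f"mean = {avg}, median = {mid}")
```

With these invented numbers the mean is 13.8 while the median is 2: a single highly cited article drags the mean far above what a typical article in the journal receives, which is exactly why a mean-based figure says so little about any individual paper.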

This article seems to be asking: Is it within the possibilities of journal editors to influence taxonomy’s ‘visibility’ (read: Impact Factor)?