Show me the data!

My final repost today (edited) from the Open Knowledge Foundation blog. It’s a little old, originally posted on the 16th of April, 2013 but I think it definitely deserves to be here on my blog as a record of my activities…

So… it’s over.

For the past twelve months I was immensely proud to be one of the first Open Knowledge Foundation Panton Fellows, but that has now come to an end (naturally). In this post I will try and recap my activities and achievements during the fellowship.

[Image: okfhelsinki]

The broad goals of the fellowship were to:

  • Promote the concept of open data in all areas of science
  • Explore practical solutions for making data open
  • Facilitate discussions surrounding the role and value of openness
  • Catalyse the open community, and reach out beyond its traditional core

and I’m pleased to say that I think I achieved all four of these goals with varying levels of success.

 

Achievements:

Outreach & Promotion – I went to a lot of conferences, workshops and meetings during my time as a Panton Fellow to help get the message out there. These included:

Conferences

At all of these I made clear my views on open data and open access, and ways in which we could improve scientific communication using these guiding principles. Indeed I was more than just a participant at all of these conferences – I was on stage at some point for all, whether it was arguing for richer PDF metadata, discussing data re-use on a panel or discussing AMI2 and how to liberate open phylogenetic data from PDFs.

One thing I’ve learnt during my fellowship is that academic-to-academic communication alone isn’t enough. In order to change the system effectively, we’ve got to convince other stakeholders too, such as librarians, research funders and policy makers. Hence I’ve been very busy lately attending broader policy-centred events like the Westminster Higher Education Forum on Open Access, the Open Access Royal Society workshop and the Institute of Historical Research Open Access colloquium.

Again, here in the policy space my influence has been international, not just domestic. For example, my trips to Brussels, both for the Narratives as a Communication Tool for Scientists workshop (which may help shape the direction of future FP8 funding) and for the ongoing Licences For Europe: Text and Data Mining stakeholder dialogue, have had real impact. My presentation about content mining for the latter has garnered nearly 1000 views on SlideShare and the debate as a whole has been featured in widely-read news outlets such as Nature News. Indeed I’ve seemingly become a spokesperson for certain issues in open science now. Just this year alone I’ve been asked for comments on ‘open’ matters in three different Nature features: on licensing, text mining, and open access from an early career researcher point-of-view – I don’t see many other UK PhD students being so widely quoted!

Another notable event I was particularly proud of speaking at and contributing to was the Revaluing Science in the Digital Age invite-only workshop organised jointly by the International Council for Science & Royal Society at Chicheley Hall, September 2012. The splendour was not just in the location but in the attendees too – an exciting, influential bunch of people who can actually make things happen. The only downside of such high-level international policy is the glacial pace of action – I’m told that, arising from this meeting and subsequent contributions, a final policy paper for approval by the General Assembly of ICSU will likely only be circulated in 2014 at the earliest!

 

[Image: speaking at the Open Knowledge Festival in Helsinki, September 2012]

The most exciting outreach I did for the fellowship was seizing the ‘general public’ opportunities to get the message out to people beyond the ‘ivory towers’ of academia. One such event was the Open Knowledge Festival in Helsinki, September 2012 (pictured above). Another was my participation in a radio show broadcast on Voice of Russia UK radio with Timothy Gowers, Björn Brembs, and Rita Gardner, explaining the benefits and motivation behind the recent policy shift to open access in the UK. This radio show gave me the confidence & experience I needed for the even bigger opportunity that was to come next – at very short notice I was invited to speak on a live radio debate show on open access for BBC Radio 3 with other panellists including Dame Janet Finch & David Willetts MP! An interesting sidenote is that this opportunity may not have arisen if I hadn’t given my talk about the Open Knowledge Foundation at a relatively small conference, Progressive Palaeontology in Cambridge, earlier that year – it pays to network when given the opportunity!

 

Outputs

The fellowship may be over, but the work has only just begun!

I have gained significant momentum and contacts in many areas thanks to this Panton Fellowship. Workshop and speaking invites continue to roll in, e.g. next week I shall be in Berlin at the Making Data Count workshop, then later on in the month I’ll be speaking at the London Information & Knowledge Exchange monthly meet and the ‘Open Data – Better Society’ meeting (Edinburgh).

Even completely independent of my activism, the new generation of researchers in my field are discovering for themselves the need for Open Data in science. The seeds for change have definitely been sown. Attitudes, policies, positions and ‘defaults’ in academia are changing. For my part I will continue to try and do my bit to help steer this in the right direction: towards intelligent openness in all its forms.

What Next?

I’m going to continue working closely with the Open Knowledge Foundation as and when I can. Indeed for 6 months starting this January I agreed to be the OKF Community Coordinator, Open Science, before my postdoc starts. Then when I’ve submitted my thesis (hopefully that’ll go okay), I’ll continue on in full-time academic research with funding from a BBSRC grant I co-wrote partially out in Helsinki(!) at the Open Knowledge Festival with Peter Murray-Rust & Matthew Wills, which has subsequently been approved for funding. This grant proposal, which I’ll blog further about at a later date, comes as a very direct result of the content mining work I’ve been doing with Peter Murray-Rust for this fellowship using AMI2 tools to liberate open data. Needless to say I’m very excited about this future work… but first things first: I must complete and submit my doctoral thesis!

I’ve been quoted in a Nature News story about Open Access journal licensing.

I’m a staunch defender of the use of the Creative Commons Attribution licence, as it’s a good licence for academic research.

Here’s just some of what I sent Richard Van Noorden (Nature News) by email. I don’t blame him for only using select quotes. But I do feel much of this provides additional useful context, so I am republishing it here for everyone to read:
—————————————

I believe RCUK want their research publications to be made available under the CC BY licence because it allows *anyone* to re-use them. That specifically includes commercial organisations. This is a good thing. Academic researchers aren’t good at commercializing their research. I for one would be delighted if someone could make money out of my research publications. I already get paid by RCUK to do research. I don’t need more money from licensing royalties on something I could have written 50 years ago (remember copyright law in many jurisdictions has extended protection to the life of the author plus 70 years!). I do research to find new knowledge and help the scientific community and society as a whole. I know many other researchers also have this philosophy about their work. It is a privilege to be given public funds with which to perform exciting research. Furthermore as RCUK fully fund my research, why should *I* have control over access to the outputs of that research? As far as I’m concerned if they funded the work, they have the right to dictate how it is published to ensure maximum benefit as they see fit. Researchers who carry out RCUK funded research have the right to be formally acknowledged as the people who made these discoveries, and this is ensured and protected by the BY module. By mandating the CC BY licence for gold OA articles, RCUK are ensuring maximum benefit from the money they may pay for their publication (but note that not all gold OA journals charge an APC; there are many excellent high-quality fee-free gold OA journals and I would encourage authors to publish in these good outlets).

Obviously, please link to my chart if you wish, the newest version is here:

http://rossmounce.co.uk/2012/09/04/the-gold-oa-plot-v0-2/

You can even republish it if you wish, without even asking my permission. All content on my blog unless otherwise indicated is made available for re-use under the Creative Commons Attribution Licence :)

(I cannot for obvious reasons guarantee that this plot is still correct. Prices change all the time. I have data to show that across 97 BMC journals the mean increase in APCs from 2012 to 2013 prices was just over 5% http://dx.doi.org/10.6084/m9.figshare.105920 )

Furthermore, journals can change the licence under which they publish. I alerted Mike Taylor that Acta Palaeontologica Polonica was not using a Creative Commons licence to publish. He in turn contacted an editor about this, and now the journal happily publishes all new articles under CC BY. Simple as that. Changing licences is a simple process that costs journals nothing – it is easy to do.

I suspect many free access journals and authors who publish in them would see no problem in granting full open access with CC BY. I suspect they don’t currently do this only because they are not aware of the problems this causes to those that wish to re-use content. Copyright law in many countries and jurisdictions unfortunately requires permission to be sought to re-use works (e.g. textmining, format shifting, printing-off copies for educational use in the classroom) even if they are freely (gratis) available to read on the internet.

This ‘free’ only provides ocular access as Jan Velterop terms it. Open Access as defined by the original Budapest Open Access Initiative statement http://www.opensocietyfoundations.org/openaccess/read

permits any users to “…read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself.”

This statement was recently reaffirmed http://www.opensocietyfoundations.org/openaccess/boai-10-recommendations …with pretty much exactly the same definition as originally given.

Thus only articles made available under licences that are compliant with that definition are truly Open Access. One such licence that is compliant with the definition of Open Access is the Creative Commons Attribution Licence (CC BY) but it may not be the only compliant licence.

The Creative Commons Attribution Non-commercial licence (CC BY-NC) is not compliant with the definition of Open Access because it prevents commercial uses of such licensed material – BOAI clearly states *any* users. Note that even non-profit companies and charities can be prevented from re-using content by this – if there is commerce involved (e.g. donations, advertising) re-use is blocked in this setting. Many people who use this licence think they are just blocking use by for-profit companies but it is much wider than this.

I have a project running at the moment to get the licensing details for the 985 journals featured in Jevin West et al’s recent cost-effectiveness of open access plot http://www.eigenfactor.org/openaccess/index.php These are a selection of just those high-quality (Thomson Reuters JCR ranked) free access journals. The vast majority of these use CC BY. Remember that whilst DOAJ lists 8000+ journals there is little quality control, it is acknowledged that there are some predatory OA journals listed there, and that I certainly don’t have time to investigate 8000+ journals! Thus this set of quality-assured JCR-ranked journals seemed like a fair sample to me.

Of those 985 free journals, the ones scored so far (over 500 of them) mostly use CC BY (the survey is not completed yet; work is still in progress). Examples:

American Physical Society journals

AOSIS OpenJournals

BioMed Central journals

European Geosciences Union journals

Frontiers journals

Genetics Society of America

Hindawi journals

MDPI journals

Pensoft journals

PLOS journals

SAGE journals

Springer journals

Versita journals

Wiley journals

+ many society & very small publisher journals

Thus whether by number of journal titles or by article volume, CC BY is the most used licence. (Given that the combined article volume of BMC + PLOS + Hindawi + MDPI + Frontiers + Pensoft is significant, it is also likely to dwarf the number of articles put out by the non CC BY journals. That’s a safe estimation.)

Across these journals the use of CC BY-NC exclusively is rare. Only 19 journals (not including the optional ones where it is offered as a choice of licence) amongst the 620 scored so far. These 19 are mostly Brazilian, which is notable and odd (even though I’ve been doing this alphabetically it’s still significant):

COLLEGE & RESEARCH LIBRARIES (ALA)

BRAZILIAN JOURNAL OF MEDICAL AND BIOLOGICAL RESEARCH

Revista Brasileira de Psiquiatria

Sao Paulo Medical Journal

South African Journal of Surgery

Brazilian Journal of Biology

BMJ Open

Acta Botanica Brasilica

Horticultura Brasileira

BRAZILIAN JOURNAL OF CHEMICAL ENGINEERING

Revista Brasileira de Ciência do Solo

Revista Brasileira de Fruticultura

Brazilian Journal of Infectious Diseases

Journal of the Brazilian Society of Mechanical Sciences and Engineering

International Brazilian Journal of Urology

BRAZILIAN JOURNAL OF MICROBIOLOGY

Jornal de Pediatria

Revista Brasileira de Política Internacional

Revista Brasileira de Farmacognosia/ Brazilian Journal of Pharmacognosy

CC BY-NC-ND users:

DRUGS IN R&D

Journal of Toxicologic Pathology

NATL INST SCIENCE COMMUNICATION journals (Indian, 10 of them)

CC BY-NC-SA users:

CBE Life Sciences Education

Journal of Engineering Technology

Journal of Microbiology & Biology Education

mBio

Medknow journals (14 journals)

Sadly, there are also a significant number of journals that do not indicate any kind of Creative Commons licence. One alarming example is the CDC journal ‘Emerging Infectious Diseases’. It is lamentable that content in this important free-to-read medical research journal requires permission to be sought to re-use and/or textmine. In these ambiguous re-use cases one must assume the default state of “All Rights Reserved”: even though the PDF is free (gratis) to view, permission must be sought for anything else.

source data: https://docs.google.com/spreadsheet/ccc?key=0AtbO6mZEvieCdExBQm9UclBSaWxlMWVNelVDMHFnSkE#gid=0
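For anyone who wants to repeat this kind of count, here is a minimal sketch of how a licence tally could be computed. It is not the script behind the spreadsheet above: the `SAMPLE` rows are just a handful of publishers and journals named in this post, the function names are made up for illustration, and the `BOAI_COMPLIANT` set assumes CC BY is the only compliant licence of interest (others may qualify).

```python
from collections import Counter

# Assumption: only CC BY is listed, because it is the licence this post
# names as compliant with the BOAI definition. Other licences may qualify.
BOAI_COMPLIANT = {"CC BY"}

# Illustrative rows only: a few publishers and journals named in the post,
# paired with the licence reported for them there.
SAMPLE = [
    ("PLOS journals", "CC BY"),
    ("BioMed Central journals", "CC BY"),
    ("BMJ Open", "CC BY-NC"),
    ("Jornal de Pediatria", "CC BY-NC"),
    ("Drugs in R&D", "CC BY-NC-ND"),
    ("mBio", "CC BY-NC-SA"),
    ("Emerging Infectious Diseases", None),  # no CC licence indicated
]


def tally_licences(rows):
    """Count rows per licence; a missing licence is treated as All Rights Reserved."""
    return Counter(licence or "All Rights Reserved" for _, licence in rows)


def is_boai_compliant(licence):
    """True if the licence is in the (assumed) set of BOAI-compliant licences."""
    return licence in BOAI_COMPLIANT


def share(count, total):
    """Percentage of `total` represented by `count`, rounded to one decimal place."""
    return round(100 * count / total, 1)


counts = tally_licences(SAMPLE)
# 19 exclusively CC BY-NC journals out of 620 scored so far (figures from the post).
nc_only_share = share(19, 620)
print(counts.most_common())
print(f"CC BY-NC only: {nc_only_share}% of journals scored so far")
```

Treating a missing licence as "All Rights Reserved" mirrors the default described above: free to read is not the same as free to re-use.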

There are many examples of such unintended problems caused by the NC license module detailed in this excellent publication recently translated from the original German by members of the Open Knowledge Foundation:

Consequences, Risks, and side-effects of the license module Non-Commercial – NC

Such NC content cannot be used in Wikipedia or newspapers

Educators that charge their pupils fees cannot use NC content without permission

CC BY-NC is incompatible with CC BY-SA content. No mashups, remixes, or combinations of these (and btw Wikipedia publishes its content under CC BY-SA so incompatibility is a BIG PROBLEM). CC BY content is compatible with CC BY-SA.

Many blogs are ad-supported. These generate income, however little, and are thus classed as commerce, so NC content cannot be reused there without permission either.

“It is also commercial use if an image is printed in a book that is published by a publishing house, entirely independent of whether the author receives a remuneration or possibly even has to pay a printing fee to make the publication possible. The publishing house acts with a commercial interest in either case.”

“…NC restrictions are most minutely heeded where their consequences are least intended.”

“Am I ready to act against the commercial use of my content? If not, you should consider not to use the NC module in the first place”

See also this Zookeys paper for problems with NC: http://www.pensoft.net/journals/zookeys/article/2189/abstract

 

The Ecological Society of America (ESA) would like your input on how to expand access to their publications and what they should do if *gasp* the USA also mandates some form of public or open access …like the rest of the world seems to be doing at the moment.

The official call is here in this new free to access ESA publication (at the end):

Collins, S., Goldberg, D., Schimel, J., and McCarter, K. 2013. ESA and scientific Publishing—Past, present, and pathways to the future. Bulletin of the Ecological Society of America 94:4-11.

You should probably read it all, so you understand their position and their misgivings before you email them with your ideas at: pubsfeedback@esa.org

 

Well done ESA. It’s nice to know they are aware of the inevitable changes that are happening in the world of academic publishing. They haven’t exactly been too receptive to the idea of Open Access before but now they seem to be acknowledging that it might be thrust upon them whether they like it or not and so need to prepare for it. I only wish all learned societies were doing this (I know we at the Systematics Association have plans, and that the Geological Society of London have a working group on this).

 

Here’s the email that I sent to them on Wednesday 23rd January (UK time). Proof, just in case they pretend they didn’t receive it *wink*

(N.B. I’ve recycled much of this from my House of Lords inquiry submission. Why not? Takes a lot of effort to write a detailed letter of support for Open Access! I’ll be damned sure to get some usage & re-usage out of it!)

 

 

Dear Ecological Society of America,

I read your special report ESA and Scientific Publishing—Past, Present, and Pathways to the Future with great interest. I wholeheartedly agree that the “world of scientific publishing is undergoing dramatic changes” at the moment – the internet clearly allows for extremely low-cost, efficient and open dissemination of research.

Currently there are huge inequalities in access to scholarly outputs (not just papers, but data & software too). My research library at the University of Bath can only afford to subscribe to so many subscription access journals – very far from all of them. But for my colleagues and me to do high-quality, high-impact, definitive research we frequently need access to materials to which we have neither free/Open Access nor quick paid-subscription access. In these cases my colleagues and I often spend hugely wasteful lengths of time trying to get copies of these must-read materials that are buried behind paywalls we can’t unlock.

The alternative options for access to paywall-restricted papers are poor and inefficient; inter-library loans can take days or weeks. Relatively few researchers currently post full-text self-archived copies of their own work in ‘green’ online repositories (although perhaps more might do so in the future). Electronic inter-library loans from the British Library can only be printed-once – if an error occurs during printing – tough luck, you’ll only ever have half a print version.

Sympathetic colleagues at different institutions with different journal access rights pass each other PDFs all the time – technically this is copyright infringement. Yet these small acts of academic copyright infringement are rampant online if you know where to look and are often the only way to sensibly and efficiently get research done. Buying additional legal access is simply not affordable nor desirable at the outrageous prices often offered – and sometimes only upon inspection of the fulltext does one find that the paper isn’t actually of use and can be discarded.

Many different peer-reviewed papers have shown that Open Access research has a higher citation rate than its paywall-protected ‘Closed Access’ counter-parts [e.g. 1-8]. Making ESA published research 100% Open Access would reasonably therefore confer some of this effect and increase the already impressive global impact of this research.

As you know the UK is far from alone in strongly pursuing Open Access means of research dissemination. The NIH Public Access mandate requires that all NIH-funded research publications are accessible to the public (world-wide) via the PubMed Central repository no later than 12 months after publication. In Australia, both NHMRC & ARC have Open Access policies in place. In fact if one looks closely enough one will see a litany of national research funders that already have open access mandates in place (Argentina, Denmark, Austria, Belgium), as well as innumerable policies at the university/institution level e.g. the Howard Hughes Medical Institute, Wellcome Trust, and even my own institution – the University of Bath (important to mention, because not all UK university research is funded by RCUK).

In particular I think we should note the way in which the SciELO Network has provided sustainable free access to over a thousand South American, Latin American, and (more recently) African research journals via the internet. It is ethically awkward that ‘they’ provide access to so much of ‘their’ research to us for free whilst we often charge them for access to ‘our’ research (many institutions do NOT receive charitably given access via HINARI). This is an asymmetrical access imbalance that sorely needs to be corrected.

 

Learned Societies and Open Access

Learned societies heavily-reliant on subscription journal income and concerned with how public/open access policies may affect them should closely examine the workings of other societies that have successfully operated open access journals for many years. West and colleagues [9] provide robust data showing hundreds of society-operated gold Open Access journals with good citation impact at either no-cost to authors, or for a usually reasonable APC (article processing charge).

Good examples include the Journal of Economic Perspectives (of the American Economic Association) – not only do they charge nothing to authors (APC=$0) and provide free access to readers, but also Thomson Reuters Journal Citation Reports (JCR) ranks this as the 5th best journal in Economics out of 321 listed. It is influential and extremely well cited.

The journal Acta Veterinaria Scandinavica is a remarkable success story of society journals (it’s the official journal of the Veterinary Associations of the Nordic Countries). From 2000 to 2005 it was subscription-access only and was dwindling in impact and citations. In 2006 they changed to Open Access publishing with BioMed Central and now enjoy significantly increased impact and citations for the research published there.

[Figure: A plot of the Impact Factor of the journal Acta Veterinaria Scandinavica over time, showing a marked increase after switching to Open Access publishing. Source. Author: BioMed Central. Image licensed under the Creative Commons Attribution 3.0 Unported license]

The European Geosciences Union (EGU) publishes 14 different gold Open Access journals with the help of Copernicus Publications. One of these in particular – Atmospheric Chemistry and Physics – has been hugely successful and through its high citation rate is now ranked the 2nd best journal of 71 in the category “Meteorology & Atmospheric Sciences” in Thomson Reuters JCR. It happily publishes articles using the Creative Commons Attribution Licence (CC BY) and charges a fair, variable APC that is cheaper for those who submit manuscripts in LaTeX form – reflecting the ease with which such manuscripts can be converted into publishable forms. Microsoft Word submissions require more processing and thus they charge more (reflecting real cost). It is commendable that they expose, and make avoidable, some of the labor costs of typesetting this way.

Furthermore, I’d bet there are many different societies operating subscription access journals that already allow self-archiving of published works so that they’d be compliant with the ‘Green’ OA route which the RCUK policy also allows. This would seem to me to be a fairly pain-free way of complying with the policy should ESA wish to do so via this route.

Overall, I think it would be fairer for all societies to publish associated journals in an Open Access manner – whilst clearly delivering on their core mission(s) of educating the world (not just a few subscribers) about their subject. Relying on denying access to research via paywalls to provide surplus income to spend on outreach and other activities that further the society mission seems to me like a very convoluted justification and an inefficient way of achieving outreach goals. Put simply, Open Access very clearly fulfils many of the core purposes of learned societies and provides an open platform around which to build outreach.

 

Finally, I would like to respond to some specific points that you mentioned in ESA and Scientific Publishing—Past, Present, and Pathways to the Future. 

  • “Will publishers need to invest heavily in their online platforms to meet gold requirements?”

Categorically, no. The current system of maintaining a sophisticated paywall, with login access only for paid-subscribers, must be far more expensive to maintain and police than a simple, un-paywalled system whereby anyone can download articles. You already publish Ecosphere in a free to access manner, which clearly shows you have the technology already in place to do this, so why suggest it would take heavy investment? Furthermore for societies that lack established open access publishing systems there are plenty of cost-free (software-wise) robust solutions like Open Journal Systems, which is already used successfully by over 11,000 journals (both Open Access & subscription journals!).

  • “Most publishers, including ESA, currently operate under a Creative Commons CC-BY-NC license for open-access publications…”

This is simply not true. Relatively few publishers and journals use this licence e.g. Jornal de Pediatria (a Brazilian journal). In fact, the majority of Open Access journals listed in Thomson Reuters JCR use the Creative Commons Attribution Licence (CC BY). Don’t believe me? Look at the data yourselves here. It’s the licence that BMC, Springer Open, PLoS, Hindawi, MDPI, Versita, Frontiers, Copernicus, Ubiquity Press, Pensoft, American Physical Society, and some Nature Publishing Group, Wiley & Sage open access journals use. So by counting publisher, journal or article-volume it’s definitely the most common Creative Commons licence used to publish scientific research. It’s common for very good reasons, not least that the non-commercial NC-clause can obstruct textmining analyses, and prevents the content from being re-used in Wikipedia.

  • You use an argument that ‘the “shelf life” of ecology research tends to be much longer than for medically oriented sciences’

Whilst I don’t wish to disagree with you on this, I think you need not compare yourself to such a niche area of STM publishing. Take for example Palaeontology. I collected data recently to show that the mean age of a cited paper in a typical palaeontology article was >18 years! Yet in palaeontology there are plenty of successful high-impact open access journals and many which allow the green route of open access after a relatively short embargo period. If short (6 or 12 month) embargo periods don’t affect the income of subscription access palaeontology journals, why would it cause harm to ESA journals to allow this? I feel you fear something that won’t actually happen.

  • I strongly doubt that if you allowed a ‘green’ friendly route to Open Access, with a 6 month embargo as allowed by the RCUK policy, that you’d lose much subscription revenue.

Statistics from the SHERPA/RoMEO database, which tracks green OA policies, show that 60% of journals allow immediate self-archiving of the full-text of research papers, with a further 27% permitting the submitted version (pre-print) to be archived immediately. Only 13% of journals do not allow immediate archiving. There remains little convincing evidence that short embargo periods seriously harm library subscription revenue.

So if I were ESA, I’d probably look into the green OA route as a relatively pain-free / hassle-free way of expanding public access to research.

 

Regards,

 

Ross Mounce,

PhD Student at the University of Bath  & Open Knowledge Foundation Panton Fellow

http://about.me/rossmounce

 

References

 

1. Lawrence, S. 2001. Free online availability substantially increases a paper’s impact. Nature 411:521 http://dx.doi.org/10.1038/35079151

2. Xia, J. and Nakanishi, K. 2012. Self-selection and the citation advantage of open access articles. Online Information Review 36:40-51. http://www.emeraldinsight.com/journals.htm?articleid=17004555&show=html [the OA citation advantage is more pronounced for 'smaller' journals]

3. Xia, J., Myers, R. L., and Wilhoite, S. K. 2011. Multiple open access availability and citation impact. Journal of Information Science 37:19-28. http://dx.doi.org/10.1177/0165551510389358 [More copies available in different places, more citations...]

4. Riera, M. and Aibar, E. 2012. Does open access publishing increase the impact of scientific articles? an empirical study in the field of intensive care medicine. Medicina intensiva / Sociedad Espanola de Medicina Intensiva y Unidades Coronarias. http://dx.doi.org/10.1016/j.medin.2012.04.002

5. Norris, M., Oppenheim, C., and Rowland, F. 2008. The citation advantage of open-access articles. J. Am. Soc. Inf. Sci. 59:1963-1972. http://dx.doi.org/10.1002/asi.20898

6. Eysenbach, G. 2006. Citation advantage of open access articles. PLoS Biol 4:e157+. http://dx.doi.org/10.1371/journal.pbio.0040157

7. Hajjem, C., Harnad, S., and Gingras, Y. 2006. Ten-Year Cross-Disciplinary comparison of the growth of open access and how it increases research citation impact. http://arxiv.org/abs/cs.DL/0606079

8. Gargouri, Y., Hajjem, C., Larivière, V., Gingras, Y., Carr, L., Brody, T., and Harnad, S. 2010. Self-Selected or mandated, open access increases citation impact for higher quality research. PLoS ONE 5:e13636+. http://dx.doi.org/10.1371/journal.pone.0013636

9. West, J., Bergstrom, T. and Bergstrom, C. T. 2013. Cost-effectiveness of open access publications

 

Here’s my submission for the House of Lords inquiry. I rather ran out of steam writing it so you’ll see it tails off towards the end. There’s probably loads of things I should mention too. But alas, I have lots of other work to be getting on with right now. Ironically, I highlight the excellent journal Impact Factors of some OA journals. Please forgive me for those sins! So here it is:

 

17/01/2012 Author: Ross Mounce, final year PhD Student at University of Bath & Open Knowledge Foundation Panton Fellow email: rcpm20@bath.ac.uk

 

This submission is an individual contribution but I think it may be indicative of the opinion of many in the scientific research community. Of particular relevance to this inquiry I should state my research funding is from BBSRC, I am engaged in content mining research (which is commonly hampered by copyright/legal issues with respect to non-Open Access research), and I am a council member of The Systematics Association (a UK-based learned society that publishes academic works with CUP).

 

Background

 

  1. On the whole I was extremely pleased when the Finch Report came out and even more so when RCUK announced it was going to implement most if not all of the recommendations. I, and most of my colleagues, strongly believe that taxpayer-funded research such as that funded by RCUK should be made openly available to everyone in the world to read and to use for whatever purpose (Open Access).
  2. Currently there are huge inequalities in access to scholarly outputs (not just papers, but data & software too). My research library at the University of Bath can only afford to subscribe to so many subscription access journals – very far from all of them. But for myself and my colleagues to do high-quality, high-impact, definitive research we frequently need access to materials we don’t have either free/Open Access, or quick paid-subscription access to. In these cases myself and colleagues often spend hugely-wasteful lengths of time trying to get copies of these must read materials that are buried behind paywalls we can’t unlock.
  3. The alternative options for access to paywall-restricted papers are poor and inefficient; inter-library loans can take days or weeks. Relatively few researchers currently post full-text self-archived copies of their own work in ‘green’ online repositories (although perhaps more might do so in the future). Electronic inter-library loans from the British Library can only be printed-once – if an error occurs during printing – tough luck, you’ll only ever have half a print version.
  4. Sympathetic colleagues at different institutions with different journal access rights pass each other PDFs all the time – technically this is copyright infringement – we have a system that appears to criminalise attempts to do comprehensive and diligent research. Yet these small acts of academic copyright infringement are rampant online if you know where to look and are often the only way to sensibly and efficiently get research done. Buying additional legal access is simply not affordable nor desirable at the outrageous prices often offered – and sometimes only upon inspection of the fulltext does one find that the paper isn’t actually of use and can be discarded.
  5. Many different peer-reviewed papers have shown that Open Access research has a higher citation rate than its paywall-protected ‘Closed Access’ counter-parts [e.g. 1-8]. Making RCUK research 100% Open Access should reasonably therefore confer some of this effect on our research and increase our already impressive global impact, particularly if we are one of the first big research nations to embrace this, rather than the last.
  6. But the UK is far from alone in strongly pursuing Open Access means of research dissemination. The NIH Public Access mandate requires that all NIH-funded research publications are accessible to the public (world-wide) via the PubMed Central repository no later than 12 months after publication. In Australia, both NHMRC & ARC have Open Access policies in place. In fact if one looks closely enough one will see a litany of national research funders that already have open access mandates in place: Argentina, Denmark, Austria, Belgium, as well as innumerable policies at the university/institution level e.g. the Howard Hughes Medical Institute, Wellcome Trust, and even my own institution – the University of Bath (important to mention, because not all UK university research is funded by RCUK).
  7. In particular I think we should note the way in which the SciELO Network has provided sustainable free access to over a thousand South American, Latin American, and (more recently) African research journals via the internet. It is ethically awkward that ‘they’ provide access to so much of ‘their’ research to us for free whilst we often charge them for access to ‘our’ research (many institutions do NOT receive charitably given access via HINARI). This is an asymmetrical access imbalance that sorely needs to be corrected.

 

On Learned Societies

 

  1. Learned societies heavily-reliant on subscription journal income and concerned with how the RCUK policy may affect them should closely examine the workings of other societies that have successfully operated open access journals for many years. West and colleagues [9] provide robust data showing hundreds of society-operated gold Open Access journals with good citation impact at either no-cost to authors, or for a usually reasonable APC.
  2. Good examples include the Journal of Economic Perspectives (of the American Economic Association) – not only do they charge nothing to authors (APC=0) and provide free access to readers, but also Thomson Reuters Journal Citation Reports (JCR) ranks this as the 5th best journal in Economics out of 321 listed. It is influential and extremely well cited.
  3. The journal Acta Veterinaria Scandinavica is a remarkable success story of society journals (it’s the official journal of the Veterinary Associations of the Nordic Countries). From 2000 to 2005 it was subscription-access only and was dwindling in impact and citations. In 2006 they changed to Open Access publishing with BioMed Central and now enjoy significantly increased impact and citations for the research published there.
     [Figure: A plot of the Impact Factor of the journal Acta Veterinaria Scandinavica over time, showing a marked increase after switching to Open Access publishing. Source. Author: BioMed Central. Image licensed under the Creative Commons Attribution 3.0 Unported license]
  4. The European Geosciences Union (EGU) publishes 14 different gold Open Access journals with the help of Copernicus Publications. One of these in particular – Atmospheric Chemistry and Physics – has been hugely successful and through its high citation rate is now ranked the 2nd best journal of 71 in the category “Meteorology & Atmospheric Sciences” in Thomson Reuters JCR. It happily publishes articles using the Creative Commons Attribution Licence (CC BY) and charges a fair, variable APC that is cheaper for those who submit manuscripts in LaTeX form – reflecting the ease with which such manuscripts can be converted into publishable forms. Microsoft Word submissions require more processing and thus they charge more. It is commendable that they expose, and make avoidable, some of the effort costs of typesetting this way.
  5. Furthermore, I’d bet there are many different societies operating subscription access journals that already allow self-archiving of published works so that they’d be compliant with the Green OA route which the RCUK policy also allows (with additional leniency on the humanities, allowing a 12 month embargo). This would seem to me to be a fairly pain-free way of complying with the policy should they wish to (N.B. Learned societies are not obligated to comply with this policy, although you would think if it was a British society it might be in their best interests. It is the researchers that must comply).
  6. I am concerned for some UK learned societies whose annual financial reports seem to indicate they are rather reliant on subscription-journal income to support themselves financially. I am not privy to the exact details of whether society subscription-journal income is ‘ringfenced’ away from supporting the other activities & perks of a society’s membership. I hope it is. Otherwise I worry that perhaps some learned societies may be using the surplus from the subscription-access journal income (paid for by libraries/institutions/universities world-wide) and spending this surplus on personal society member-only perks e.g. a free hardcopy paper newsletter only delivered to personal members. I have examined the annual report accounts of some learned societies myself and find where the money/surplus goes to be rather opaque in some cases.
  7. It appears that many societies have been operating a consistent and healthy surplus from their subscription-access journals and using this surplus to expand their outreach activities and member perks – free pens, paper, mugs, USB sticks and heavily discounted student memberships. I myself have greedily taken many of these membership benefits, and know that I have received goods and services that far exceed the cost of the small, hugely subsidized membership fee I paid. All this would be okay if it was only members paying for other (younger) members – self-sustainability. But I am increasingly concerned about the asymmetry of fees and benefits provided by some learned societies. Surely a significant portion of journal subscription income is from institutional subscriber agreements? Institutions are very rarely members of learned societies, and institutionally the only benefit they get from these fees paid is institutional access to subscription-only society journals. Yet the surplus from subscription income at societies doesn’t seem to be given back except to members through perks and the organisation of outreach events and such.
  8. Therefore I think it would be fairer for a society to publish any associated journals in an Open Access manner and concentrate on being financially self-sustaining – whilst clearly delivering on their core mission(s) of educating the world about their subject. Relying on denying access to research via paywalls to provide surplus income with which to spend on outreach to further their mission, seems like a very convoluted argument and an inefficient way of achieving their aims. Put simply, Open Access very clearly fulfils many of the core purposes of learned societies and provides an open platform with which to build outreach around.

 

Arrangements for APC funds

 

  1. As I’m sure many will cite, most gold open access journals listed in the Directory of Open Access Journals (DOAJ) are fee free. They do not charge an APC. Of those that do, the average APC is just $906 (Solomon & Bjork, 2012). There is no strong relationship between the APC cost of gold open access journals and their article level impact [9]. Intuitively this makes sense – if I submitted my work to Nature, or I submitted my work to the Panamanian Journal of Ichthyology (a fictional journal) the work, if published, would essentially be the same – journal ‘brand’ is just a label, it doesn’t change anything – especially not the quality of peer review. In terms of citations, solid evidence supports this intuition – since 1990 the relationship between Impact Factor (citations to a journal) and article-level citations has significantly weakened [10]. To put it another way – good research gets read and cited no matter where it’s published.
  2. I’m aware there are concerns in the Humanities and Social Sciences about Open Access and APCs. I don’t know why there aren’t more Open Access journals in these disciplines. There’s nothing technologically preventing a surfeit of new Open Access journals from forming. Good, well tested solutions like Open Journal Systems are free to implement (no software cost) and are used by over 11,000 journals world-wide. The implementation only needs bandwidth-cost support and the same human time/effort required to run a subscription access journal, which I’m sure institutions should be made willing to help with. Stuart Shieber gives an excellent description of how costs are managed at the Journal of Machine Learning Research. Here academics volunteer time, with the help of a little institutional support to produce a high-quality, high-impact peer-reviewed research journal that costs just $6.50 per paper to run.
  3. I would urge the House of Lords to look into how universities and libraries could be encouraged to help British academics create new, efficient, low-cost, peer-reviewed research journals. Martin Eve for one appears to have no trouble doing this. It need not even necessarily require additional cash-injection, just IT-support and the use of institutional bandwidth & servers to host Open Access journals. Willingness to try, rather than just moan about change is also required.
  4. Above all, academics in all areas need to consider and be made aware of the huge variety of open access publishing options available to them. The big commercial publisher brands may be the most well-known in some areas, and they spend significant marketing budgets on ensuring this. Unfortunately these commercial publishers also offer some of the most eye-wateringly expensive gold Open Access options. We need to incentivize and ensure a ‘value-for-money publishing’ mentality, and to steer academics away from these expensive ‘hybrid’ OA options. It would be good to set a hard limit on the amount of cash that RCUK would be willing to pay for an APC for any one publication. Otherwise it might encourage some publishers to further indulge in price-gouging.
  5. I am glad that RCUK is supporting gold open access and green open access routes. I fail to see how green alone would work out in the end – it does not provide peer review. ‘Overlay’ peer-review services external to journal publishers operating on pre-print servers are a nice idea, but I’m not sure this model of publishing will gain traction or acceptance in academia, not for a while at least. Therefore to continue to build-on and support low-cost journals I think it is good that RCUK is encouraging the gold open access route.

 

Embargo periods

 

  1. I don’t have much to say about embargo periods. Only that I’ve seen some interesting arguments used against short embargo periods in the humanities e.g. history. One such argument used was that the ‘citation half-life’ was very long in History and therefore a short embargo period would harm this discipline more than in the sciences. Yet I know that in Palaeontology, the citation half-life of papers is, as you might imagine, also very long – yet there are few such concerns about embargo periods or the effect of Open Access in this discipline. I recently gathered data and found that the mean age of cited papers in palaeontology is roughly >18 years. Therefore I don’t ‘buy’ this long-tail usage argument as it equally applies in other disciplines that appear to have no problem with open access, green or gold.

References

 

1. Lawrence, S. 2001. Free online availability substantially increases a paper’s impact. Nature 411:521 http://dx.doi.org/10.1038/35079151

2. Xia, J. and Nakanishi, K. 2012. Self-selection and the citation advantage of open access articles. Online Information Review 36:40-51. http://www.emeraldinsight.com/journals.htm?articleid=17004555&show=html [the OA citation advantage is more pronounced for 'smaller' journals]

3. Xia, J., Myers, R. L., and Wilhoite, S. K. 2011. Multiple open access availability and citation impact. Journal of Information Science 37:19-28. http://dx.doi.org/10.1177/0165551510389358 [More copies available in different places, more citations...]

4. Riera, M. and Aibar, E. 2012. Does open access publishing increase the impact of scientific articles? an empirical study in the field of intensive care medicine. Medicina intensiva / Sociedad Espanola de Medicina Intensiva y Unidades Coronarias. http://dx.doi.org/10.1016/j.medin.2012.04.002

5. Norris, M., Oppenheim, C., and Rowland, F. 2008. The citation advantage of open-access articles. J. Am. Soc. Inf. Sci. 59:1963-1972. http://dx.doi.org/10.1002/asi.20898

6. Eysenbach, G. 2006. Citation advantage of open access articles. PLoS Biol 4:e157+. http://dx.doi.org/10.1371/journal.pbio.0040157

7. Hajjem, C., Harnad, S., and Gingras, Y. 2006. Ten-Year Cross-Disciplinary comparison of the growth of open access and how it increases research citation impact. http://arxiv.org/abs/cs.DL/0606079

8. Gargouri, Y., Hajjem, C., Larivière, V., Gingras, Y., Carr, L., Brody, T., and Harnad, S. 2010. Self-Selected or mandated, open access increases citation impact for higher quality research. PLoS ONE 5:e13636+. http://dx.doi.org/10.1371/journal.pone.0013636

9. West, J., Bergstrom, T. and Bergstrom, C. T. 2013. Cost-effectiveness of open access publications

10. Lozano, G. A., Larivière, V. and Gingras, Y. 2012. The weakening relationship between the Impact Factor and papers’ citations in the digital age. http://arxiv.org/abs/1205.4328v1

 

So a week ago, I investigated publisher-produced Version of Record PDFs with pdfinfo and the results were very disappointing. Lots of missing metadata was found and one could not reliably identify most of these PDFs from metadata alone, let alone extract particular fields of interest.

But Rod Page kindly alerted me to the fact that I might be using the wrong tool for this investigation. So at his suggestion I’ve tried again to extract metadata from the exact same set of PDFs as last time…

Only this time I’ll be using exiftool version 9.10.

This time I’ve put the full raw metadata output from exiftool on figshare for each and every PDF file, just to really prove the point, reproducible research and all. I’d love to post the corresponding PDFs too but sadly many of them are not Open Access and this thus prevents me from uploading them to a public space.   **Insert timely comment here about how closed access publications stifle effective research practices…**

Exiftool is really simple to use. You just need to type:
exiftool NameOfPDF.pdf
to get a human-readable exhaustive output of all possible metadata.

and
exiftool -b -XMP NameOfPDF.pdf
to get XML-structured metadata. I could only extract this from 56 of the 69 PDF files. The data output from this for those 56 PDFs is available as a separate fileset on figshare here.

Finally, if you want to test a whole bunch of PDF files in your working directory I’ve made a simple shell script that loops through all PDFs in your working directory, available here (oops, it’s not data, perhaps I should have put that on github instead?). [I'm sure many readers will be able to create a simple bash loop themselves but just for those that don't...]
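For anyone who would rather not download anything, a loop along these lines should do roughly the same job. To be clear, this is a minimal sketch and not the script linked above: the function name, the EXIFTOOL override variable and the output filenames (NAME.meta.txt and NAME.xmp) are all my own assumptions, chosen purely for illustration.

```shell
#!/bin/sh
# Sketch only: run the two exiftool commands shown above on every PDF
# in the working directory. Names below are illustrative assumptions.

# Hypothetical override so a different exiftool binary can be swapped in.
EXIFTOOL="${EXIFTOOL:-exiftool}"

dump_pdf_metadata() {
  for pdf in *.pdf; do
    # If no PDFs match, the glob stays literal; skip it.
    [ -e "$pdf" ] || continue
    base="${pdf%.pdf}"
    # Human-readable listing of all metadata exiftool can find.
    "$EXIFTOOL" "$pdf" > "${base}.meta.txt"
    # Raw XMP packet; the redirect creates this file even if nothing is written.
    "$EXIFTOOL" -b -XMP "$pdf" > "${base}.xmp"
  done
}

dump_pdf_metadata
```

Run it from the directory holding the PDFs. In a directory with no PDFs the loop simply does nothing.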

 

I’m assuming that the reason exiftool -b -XMP failed on 13 of those PDFs is because they have no embedded XMP metadata – an empty (zero-byte sized) file is created for these. This is an assumption though… I notice that those 13 exactly correspond with all the 13 that were produced with iText. I checked the website and I’m pretty sure iText 2.x and up can embed XMP metadata, it’s just whether the publishers have bothered to use & apply this functionality.
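Spotting the zero-byte files doesn’t need to be done by eye. Here is a rough sketch of how one might list them, assuming each NAME.pdf has a matching NAME.xmp file written by exiftool -b -XMP (that naming scheme and the function name are my assumptions, not necessarily how the figshare files are named):

```shell
#!/bin/sh
# Sketch only: report PDFs whose XMP dump came out empty.
# Assumes NAME.xmp sits next to NAME.pdf (an illustrative naming choice).

list_pdfs_without_xmp() {
  for xmp in *.xmp; do
    # If no .xmp files match, the glob stays literal; skip it.
    [ -e "$xmp" ] || continue
    # -s is true only when the file exists and is larger than zero bytes.
    [ -s "$xmp" ] || printf '%s\n' "${xmp%.xmp}.pdf"
  done
}

list_pdfs_without_xmp
```

An empty output file only shows that exiftool wrote nothing. It can’t by itself distinguish between ‘no XMP embedded’ and ‘XMP present but unreadable’, so the caveat above still applies.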

So if I’m right, neither Taylor & Francis, BRILL, nor Acta Palaeontologica Polonica embed XMP metadata (at all!) in their PDFs. The alternative explanation is that the XMP metadata is in there but exiftool for whatever reason can’t read/parse it from iText-produced PDFs. I find this an unlikely alternative explanation though tbh.

Elsevier have superior XMP metadata to everyone else by the looks of it, but Elsevier aside the metadata is still very poor, so my conclusions from last week’s post still stand I think.

Most of the others do contain metadata (of some sort) but by and large it’s rather poor. I need to get some other work done on Monday so I’m afraid this is where I’m going to leave this for now. But I hope I’ve made the point.

Further angles to explore

Interestingly, Brian Kelly has taken this in a slightly different direction and looked at the metadata of PDFs in institutional repositories. I hadn’t realised this but apparently some institutional repositories (IRs) universally add cover pages to most deposits. If this is done without care for the embedded metadata, the original metadata can be wiped and/or replaced with newer (less informative) metadata. Not to mention that cover pages are completely unnecessary: all the information on a cover page is exactly the kind of stuff that should be put in embedded metadata! No need to waste time and space by putting that info as the first page. JSTOR does this too (cover pages) and it annoys the hell out of me.

After some excellent chat on Twitter about this IR angle I’ve discovered that UKOLN, based here on campus at Bath, have also done some interesting research in this area, in particular the FixRep project which is described in more detail here. CrossRef Labs’ pdfmark tool also looks like something of interest for fixing PDFs with poor quality metadata. I’ve got this installed/compiled from the source on github but haven’t tried it out yet. It would be interesting to see the difference it makes – a before and after comparison of metadata to see what we’re missing… But why should we fix a problem that shouldn’t exist in the first place? Publishers are the point of origin for this. It’s their job to be the first to publish the Version of Record. They should provide the highest level of metadata possible IMO.

 

Why would publishers add metadata?

Because their customers – libraries, governments, research funders (in the case of Open Access PDFs) – should demand it. A pipe dream perhaps but that’s my $.02. I would ask for a refund if I downloaded MP3s from iTunes/Amazon MP3 with insufficient embedded metadata. Why not the same principle for electronically published PDFs?

 

PS Apologies for some of the very cryptic filenames in the metadata uploads on figshare. You’ll have to cross-match with this list here or the spreadsheet I uploaded last week to work out which metadata file corresponds to which PDF/Bibliographic Data record/Publisher.

Publisher | Identifier | Journal | Contains embedded XMP metadata? | Filename
American Association for the Advancement of Science | Ezard2011 | Science | yes? | ezard_11_interplay_759293.pdf
American Association for the Advancement of Science | Nagalingum2011 | Science | yes? | nagalingum_11_recent_719133.pdf
American Association for the Advancement of Science | Rowe2011 | Science | yes? | Science-2011-Rowe-955-7.pdf
Blackwell Publishing Ltd | Burks2011 | Cladistics | yes? | burks_11_combined_694888.pdf
Blackwell Publishing Ltd | Janies2011 | Cladistics | yes? | janies_11_supramap_779773.pdf
Blackwell Publishing Ltd | Simmons2011 | Cladistics | yes? | simmons_11_deterministic_779537.pdf
BRILL | Barbosa2011 | Insect Systematics & Evolution | no | barbosa_11_phylogeny_779910.pdf
BRILL | Dellape2011 | Insect Systematics & Evolution | no | dellape_11_phylogenetic_779909.pdf
Cambridge Journals Online | Knoll2010 | Geological Magazine | yes? | knoll_10_primitive_475553.pdf
Cambridge Journals Online | Saucede2007 | Geological Magazine | yes? | thomas_saucegraved_07_phylogeny_506869.pdf
CSIRO | Chamorro2011 | Invertebrate Systematics | yes? | chamorro_11_phylogeny_780467.pdf
CSIRO | Daugeron2011 | Invertebrate Systematics | yes? | daugeron_11_phylogenetic_780466.pdf
CSIRO | Johnson2011 | Invertebrate Systematics | yes? | johnson_11_collaborative_750540.pdf
Elsevier | Lane2011 | Molecular Phylogenetics and Evolution | yes | E3-1-s2.0-S1055790311001448-main.pdf
Elsevier | Cunha2011 | Molecular Phylogenetics and Evolution | yes | E2-1-s2.0-S1055790311001680-main.pdf
Elsevier | Spribille2011 | Molecular Phylogenetics and Evolution | yes | E1-1-s2.0-S1055790311001606-main.pdf
Frontiers In | Horn2011 | Frontiers in Neuroscience | yes? | fnins-05-00088.pdf
Frontiers In | Ogura2011 | Frontiers in Neuroscience | yes? | fnins-05-00091.pdf
Frontiers In | Tsagareli2011 | Frontiers in Neuroscience | yes? | fnins-05-00092.pdf
Hindawi | Diniz2012 | Psyche: A Journal of Entomology | yes? | 79139500.pdf
Hindawi | Restrepo2012 | Psyche: A Journal of Entomology | yes? | 516419.pdf
Hindawi | Savopoulou2012 | Psyche: A Journal of Entomology | yes? | 167420.pdf
Institute of Paleobiology, Polish Academy of Sciences | Amson2011 | Acta Palaeontologica Polonica | no | amson_11_affinities_666987.pdf
Institute of Paleobiology, Polish Academy of Sciences | Edgecombe2011 | Acta Palaeontologica Polonica | no | edgecombe_11_new_666988.pdf
Institute of Paleobiology, Polish Academy of Sciences | Williamson2011 | Acta Palaeontologica Polonica | no | app2E20092E0147.pdf
Magnolia Press | Agiuar2011 | Zootaxa | yes? | zt02846p098.pdf
Magnolia Press | Ebach2011 | Zootaxa | yes? | ebach_11_taxonomy_599972.pdf
Magnolia Press | Nelson2011 | Zootaxa | yes? | nelson_11_resemblance_688762.pdf
National Academy of Sciences | Casanovas2011 | Proceedings of the National Academy of Sciences | yes? | casanovas-vilar_11_updated_644658.pdf
National Academy of Sciences | Goswami2011 | Proceedings of the National Academy of Sciences | yes? | goswami_11_radiation_814757.pdf
National Academy of Sciences | Thorne2011 | Proceedings of the National Academy of Sciences | yes? | thorne_11_resetting_654055.pdf
Nature Publishing Group | Meng2011 | Nature | yes? | meng_11_transitional_644647.pdf
Nature Publishing Group | Rougier2011 | Nature | yes? | rougier_11_highly_720202.pdf
Nature Publishing Group | Venditti2011 | Nature | yes? | venditti_11_multiple_779840.pdf
NRC Research Press | CruzadoCaballero2010 | Canadian Journal of Earth Sciences | yes? | 650000.pdf
NRC Research Press | Druckenmiller2010 | Canadian Journal of Earth Sciences | yes? | 80000000c5.pdf
NRC Research Press | Mazierski2010 | Canadian Journal of Earth Sciences | yes? | mazierski_10_description_577223.pdf
NRC Research Press | Modesto2009 | Canadian Journal of Earth Sciences | yes? | modesto_09_new_577201.pdf
NRC Research Press | Parsons2009 | Canadian Journal of Earth Sciences | yes? | parsons_09_new_575744.pdf
NRC Research Press | Wu2007 | Canadian Journal of Earth Sciences | yes? | wu_07_new_622125.pdf
Pensoft Publishers | Hagedorn2011 | ZooKeys | yes? | hagedorn_11_creative_779747.pdf
Pensoft Publishers | Penev2011 | ZooKeys | yes? | penev_11_interlinking_694886.pdf
Pensoft Publishers | Thessen2011 | ZooKeys | yes? | thessen_11_data_779746.pdf
Public Library of Science | Hess2011 | PLoS ONE | yes? | hess_11_addressing_694222.pdf
Public Library of Science | McDonald2011 | PLoS ONE | yes? | mcdonald_11_subadult_694229.pdf
Public Library of Science | Wicherts2011 | PLoS ONE | yes? | wicherts_11_willingness_779788.pdf
SAGE Publications | deKloet2011 | Journal of Veterinary Diagnostic Investigation | yes? | Invest-2011-deKloet-421-9.pdf
SAGE Publications | Richter2011 | Journal of Veterinary Diagnostic Investigation | yes? | Invest-2011-Richter-430-5.pdf
SAGE Publications | Wassmuth2011 | Journal of Veterinary Diagnostic Investigation | yes? | Invest-2011-Wassmuth-436-53.pdf
Senckenberg Natural History Collections Dresden | Fresneda2011 | Arthropod Systematics & Phylogeny | yes? | fresneda_11_phylogenetic_785869.pdf
Senckenberg Natural History Collections Dresden | Mally2011 | Arthropod Systematics & Phylogeny | yes? | ASP_69_1_Mally_55-71.pdf
Senckenberg Natural History Collections Dresden | Shimizu2011 | Arthropod Systematics & Phylogeny | yes? | ASP_69_2_Shimizu_75-81.pdf
Springer-Verlag | Beermann2011 | Zoomorphology | yes? | 10.1007_s00435-011-0129-9.pdf
Springer-Verlag | Cuezzo2011 | Zoomorphology | yes? | cuezzo_11_ultrastructure_694669.pdf
Springer-Verlag | Vinn2011 | Zoomorphology | yes? | 10.1007_s00435-011-0133-0.pdf
Taylor & Francis | Bianucci2011 | Journal of Vertebrate Paleontology | no | bianucci_11_aegyptocetus_778747.pdf
Taylor & Francis | Makovicky2011 | Journal of Vertebrate Paleontology | no | makovicky_11_new_694826.pdf
Taylor & Francis | Pietri2011 | Journal of Vertebrate Paleontology | no | pietri_11_revision_689491.pdf
Taylor & Francis | Rook2011 | Journal of Vertebrate Paleontology | no | rook_11_phylogeny_694916.pdf
Taylor & Francis | Tsuihiji2011 | Journal of Vertebrate Paleontology | no | tsuihiji_11_cranial_660620.pdf
Taylor & Francis | Yates2011 | Journal of Vertebrate Paleontology | no | yates_11_new_694821.pdf
Taylor & Francis | Gerth2011 | Systematics and Biodiversity | no | gerth_11_wolbachia_779749.pdf
Taylor & Francis | Krebes2011 | Systematics and Biodiversity | no | krebes_11_phylogeography_779700.pdf
Sociedade Brasileira de Ictiologia | Britski2011 | Neotropical Ichthyology | yes? | a02v9n2.pdf
Sociedade Brasileira de Ictiologia | Sarmento2011 | Neotropical Ichthyology | yes? | a03v9n2.pdf
Sociedade Brasileira de Ictiologia | Calegari2011 | Neotropical Ichthyology | yes? | a04v9n2.pdf
Royal Society | Billet2011 | Proceedings of the Royal Society B: Biological Sciences | yes? | billet_11_oldest_687630.pdf
Royal Society | Polly2011 | Proceedings of the Royal Society B: Biological Sciences | yes? | polly_11_history_625430.pdf
Royal Society | Sansom2011 | Proceedings of the Royal Society B: Biological Sciences | yes? | sansom_11_decay_625429.pdf

I’m proud to announce I have a new article over at Palaeontology [Online]

The Palaeontology [Online] logo – by the P [O] team, licensed under a Creative Commons Attribution License

Posts at ‘P [O]‘ are primarily aimed at public-engagement and since the site was launched back in July 2011, with sponsorship and support from the Palaeontological Association, one post per month has been featured on the site. This month [December], I’ve written a rather different type of post for them. Not so much about fossils, creatures, classification and rocks – but instead on how palaeontology and science-as-a-whole is made available with respect to Open Access, Open Data, Open Source (code), and Open Educational Resources (OERs). Incidentally, I think it’s also the first P [O] post with embedded video content too – really making use of the digital medium!

I’ve tied these strands together with an explicit acknowledgement that Creative Commons has legally enabled all this Open content and that it’s a fantastic achievement. Consider it my early birthday present to celebrate that it’s now been nearly 10 years since Creative Commons first launched (#cc10 on Twitter btw for related news & events).

I’m hoping it will raise awareness that citizens & scientists alike can directly read the primary scientific literature themselves (via Open Access journals and articles) and they should be encouraged to – given that as taxpayers they’ve paid for most of it to be created! Also more than just mere engagement, I’ve highlighted that uniquely with an Open philosophy there’s nothing stopping ‘amateur’ or citizen science contributions in palaeontology – it’s sad that more of the literature, data, code and educational resources in this area aren’t openly available for re-use – arguably the world would know a lot more about palaeontology if they were.

With specific reference to http://opendefinition.org/ I try and make it clear what open actually means in this context. There’s been a lot of openwashing this year. Open is clearly a desirable state, and a label which will help sell and ‘add value’ to products, therefore both innocent and malicious temptation abounds to mistakenly label or brand things as ‘open’ when they are de facto not open. Education and awareness-raising clearly have a significant role to play here in preventing this problem.

 

During the production of the article some interesting points were raised, which in the end didn’t make it to the ‘final’ version of the post, so I’ll blog them here instead.

 

On Open Access:

For the sake of simplicity I neglected to point out that in actuality the definition of OA is slightly narrower and more specific than just open as per http://opendefinition.org/ . OA is defined by the BOAI-definition which does not require nor allow(?) the ShareAlike (SA) clause. It does however require the Attribution clause (BY):

the relevant excerpt…

… The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited. …

see Mike Taylor’s excellent posts over at SVPOW for more.

 

On Open Data:

I wonder if perhaps there is still a perception out there that there are still technical barriers to sharing data openly?

Particularly with regard to very very large datasets & data files. I decided this was too niche a point for inclusion in the main post but in case anyone’s wondering – you can easily share *any* filesize these days.

Journals like GigaScience specialise in publishing ‘big data’ studies and already make available petabytes (=1 million gigabytes) worth of data. Data archives like figshare allow unlimited filesize uploads (only limited to 1GB if you keep it private), I’m sure Dryad would also be willing to archive large files. Want proof? Look no further than this 21GB database of microbial data that’s been downloaded at least 10 times as made available via BioTorrents - I couldn’t find anyone seeding it just now, but if there was greater institutional support for p2p data sharing I’m sure this would take off.

 

On Open Source (code):

There’s an excellent editorial in PLoS Computational Biology that regrettably I only became aware of too late to include. It’s by Andreas Prlić & Hilmar Lapp, the latter of whom I had the pleasure of meeting recently at NESCent in Durham, North Carolina. It’s a short paper and Open Access so I recommend you all read at least the Why Do We Support Open-Source Scientific Software? section – it’s an excellent, clear and concise summary of the greater value of open in this area.

 

On MOOCs: 

The American Museum of Natural History (AMNH) have some online courses available here, and whilst they’re probably of the very highest quality, they are neither MOOCs nor OERs because they’re neither open nor free. Each course costs $495 plus a $25 one-time registration fee. Grad credit is also available at an additional cost.

Perhaps one day the AMNH might be persuaded to run one of these courses as a MOOC? Not only to help advertise and drive interest in the other courses, but also to demonstrate their quality. If MIT can do it…

I should also refer interested readers to the excellent set of seminars on phylogenetics over at phyloseminar. I’ve virtually-attended (live, over-the-internet but not there in person) a few of these and have enjoyed recordings of others. It’s not a complete course (so not a MOOC) but depending upon exact licensing details could perhaps be classed as OER-like material. The next one will be soon: 5pm (UK time) on Wednesday 5th December, ‘Understanding biodiversity patterns using the Tree of Life’, given by Hélène Morlon.

 

 

All the posts over at P [O] are of very high quality and are worthy academic contributions. As such I’m going to list my post there on my CV as soon as I update it. It’ll sit nicely in my publications list alongside articles in BMC Research Notes, Nature, and The Systematist. I deliberately intermix peer-reviewed publications and non-peer-reviewed publications to make people reconsider and examine the relative merits of each, rather than just counting volume or (worse) the journal Impact Factor, which is of course irrelevant.

I encourage everyone else who’s published an article at P [O] to also proudly display it on their CV.

 

 

So there have been a few already:

SpotOn day 1 & SpotOn day 2 being the best I’ve read, from Ian Mulvany

see also: SpotOn London – a global conference by Jon Tennant | #solo12 reflection by A J Cann | SpotOn London 2012 in brief by Charis Cook | Altmetric @ SpotOn London 2012 by Jean Liu, and more

but since SpotOn London 2012 was such an interesting event, and there were so many parallel sessions, I thought it would be good to add some more to the post-conference discussion.

Data is the new black

photo by Bastian Greshake (@gedankenstuecke), Copyright not mine.

It wasn’t just a popular badge at the conference… data is a hot topic in science right now (as it should be!). Data is an undervalued but absolutely vital output of research. Research funding agencies appear to have over-incentivized the production of research publications (many of which are mere executive summaries of the years of research effort they represent) to the exclusion of almost everything else.

Science isn’t just about the production of papers; data and code are extremely important research outputs too (I’m not going to mention patents – they’re a sticky issue best dealt with in another post). The good news is that funding bodies now seem to have realised that they’re seriously missing out on RoI by focusing solely on papers. Just recently the NSF Grant Proposal Guidelines changed their terminology from the narrow ‘Publications’ to the broader term ‘Products’, which explicitly recognises non-publication outputs as creditworthy, first-class research objects. (Incidentally, this was one of the many excellent suggestions made in the Force11 manifesto for ‘Improving Future Research Communication and e-Scholarship’ – read it if you haven’t already.)

The immense value to be gained, time to be saved, and innovative research enabled by making data available for re-use was up for discussion at the #solo12reuse session. Mark Hahnel (@figshare) was organiser/chair, and Sarah Callaghan (@sorcha_ni) of the British Atmospheric Data Centre and I were the invited panelists for a ~1hr slot. As the conference was extremely well-organized, *all* sessions were live-streamed via Google Hangouts & made publicly available via YouTube afterwards. I’ve embedded the stream of the #solo12reuse session below:

A transcript of some of what was discussed:

Intros from ~02:00 … then straight into discussion from ~09:00 onwards: Josh Greenberg (@epistemographer) contends that data sharing in chemistry perhaps ‘doesn’t make as much sense’ – I have a feeling PMR & many others would disagree with this!

At 13:20 Sarah Callaghan: NERC sets its data embargo policy so that data can be withheld for a maximum of 2 years after it was collected, after which it must be made publicly available, somewhere, somehow – the ambiguity of which IMO needs to be worked on…

At 14:25 discussion of ‘levels of re-usability’ and definition. Access control as a means of encouraging data sharing (?)

17:30 Sarah Callaghan: “It’s important to have ‘first dibs’ on your own data” but not beyond this without peer-vetted justification/scrutiny IMO

18:30 David Shotton (@dshotton): noted that one shouldn’t expect absolutely every data point/item to be shared – not all data is useful/valuable. It’s about retaining & making available bits that might be of re-use value.

At 20:24 I start to introduce AMI2 & the OpenContentMining project.

37:50 I bring the Panton Principles on screen, I also had the OKFN Science Working Group page displayed (although not discussed) for a good ~10 minutes. Note to self: hijack the display computer at panel sessions more often…

from 40:17 onwards… Mark Hahnel: “In terms of re-use and getting people incentivized, are Data Papers the future?” Sarah Callaghan: “NO. Until research achievement is predicated on something other than publishing in ‘high-impact’ journals then we’re stuffed: we’ve got to shoehorn data & code in order for them to ‘count’ [lamentably].” So for now we need data papers, but perhaps in the future we won’t need to constrain these outputs to a ‘paper’ style format.

from 43:00 Martin Fenner (@mfenner) plays Devil’s Advocate and suggests that data citation may not work and that perhaps #altmetrics might be better indicators of usage. Much debate ensues…

from 45:50 I give a plug to Iain H’s paper, ‘Open By Default’: Hrynaszkiewicz, I. and Cockerill, M. 2012. Open by default: a proposed copyright license and waiver agreement for open access research and data in peer-reviewed journals. BMC Research Notes 5:494+. Then we discuss legal barriers to re-using data.

This post has taken a while to write and is fairly long now, so I’m going to split my recap of #solo12 into two or more parts. In part 2 I’ll attempt to discuss some of the other *excellent* sessions I saw, in particular the brilliant, well-received outburst on the absurd inefficiency of the publication process by professional typesetter Dr Kaveh Bazargan during the #solo12journals session. I’m surprised someone hasn’t done a whole blogpost about this already – it was my highlight of the conference tbh!
I’ll be posting part two on Monday 19th November (weekends are slow for blogs… I want people to read this!)

Until then…