Show me the data!

Here’s my submission for the House of Lords inquiry. I rather ran out of steam writing it so you’ll see it tails off towards the end. There are probably loads of things I should have mentioned too. But alas, I have lots of other work to be getting on with right now. Ironically, I highlight the excellent journal Impact Factors of some OA journals. Please forgive me for those sins! So here it is:

 

17/01/2012 Author: Ross Mounce, final year PhD Student at University of Bath & Open Knowledge Foundation Panton Fellow email: rcpm20@bath.ac.uk

 

This submission is an individual contribution but I think it may be indicative of the opinion of many in the scientific research community. Of particular relevance to this inquiry I should state my research funding is from BBSRC, I am engaged in content mining research (which is commonly hampered by copyright/legal issues with respect to non-Open Access research), and I am a council member of The Systematics Association (a UK-based learned society that publishes academic works with CUP).

 

Background

 

  1. On the whole I was extremely pleased when the Finch Report came out and even more so when RCUK announced it was going to implement most if not all of the recommendations. I, and most of my colleagues, strongly believe that taxpayer-funded research such as that funded by RCUK should be made openly available to everyone in the world to read and to use for whatever purpose (Open Access).
  2. Currently there are huge inequalities in access to scholarly outputs (not just papers, but data & software too). My research library at the University of Bath can only afford to subscribe to so many subscription access journals – very far from all of them. But for my colleagues and me to do high-quality, high-impact, definitive research we frequently need access to materials for which we have neither free/Open Access nor quick paid-subscription access. In these cases my colleagues and I often spend hugely wasteful lengths of time trying to get copies of these must-read materials that are buried behind paywalls we can’t unlock.
  3. The alternative options for access to paywall-restricted papers are poor and inefficient; inter-library loans can take days or weeks. Relatively few researchers currently post full-text self-archived copies of their own work in ‘green’ online repositories (although perhaps more might do so in the future). Electronic inter-library loans from the British Library can only be printed once – if an error occurs during printing – tough luck, you’ll only ever have half a print version.
  4. Sympathetic colleagues at different institutions with different journal access rights pass each other PDFs all the time – technically this is copyright infringement – we have a system that appears to criminalise attempts to do comprehensive and diligent research. Yet these small acts of academic copyright infringement are rampant online if you know where to look and are often the only way to sensibly and efficiently get research done. Buying additional legal access is simply neither affordable nor desirable at the outrageous prices often offered – and sometimes only upon inspection of the full text does one find that the paper isn’t actually of use and can be discarded.
  5. Many different peer-reviewed papers have shown that Open Access research has a higher citation rate than its paywall-protected ‘Closed Access’ counterparts [e.g. 1-8]. Making RCUK research 100% Open Access should reasonably therefore confer some of this effect on our research and increase our already impressive global impact, particularly if we are one of the first big research nations to embrace this, rather than the last.
  6. But the UK is far from alone in strongly pursuing Open Access means of research dissemination. The NIH Public Access mandate requires that all NIH-funded research publications are accessible to the public (world-wide) via the PubMed Central repository no later than 12 months after publication. In Australia, both NHMRC & ARC have Open Access policies in place. In fact if one looks closely enough one will see a litany of national research funders that already have open access mandates in place (Argentina, Denmark, Austria, Belgium), as well as innumerable policies at the university/institution level, e.g. the Howard Hughes Medical Institute, Wellcome Trust, and even my own institution – the University of Bath (important to mention, because not all UK university research is funded by RCUK).
  7. In particular I think we should note the way in which the SciELO Network has provided sustainable free access to over a thousand South American, Latin American, and (more recently) African research journals via the internet. It is ethically awkward that ‘they’ provide access to so much of ‘their’ research to us for free whilst we often charge them for access to ‘our’ research (many institutions do NOT receive charitably given access via HINARI). This is an asymmetrical access imbalance that sorely needs to be corrected.

 

On Learned Societies

 

  1. Learned societies heavily reliant on subscription journal income and concerned with how the RCUK policy may affect them should closely examine the workings of other societies that have successfully operated open access journals for many years. West and colleagues [9] provide robust data showing hundreds of society-operated gold Open Access journals with good citation impact at either no cost to authors, or for a usually reasonable APC.
  2. Good examples include the Journal of Economic Perspectives (of the American Economic Association) – not only do they charge nothing to authors (APC=0) and provide free access to readers, but also Thomson Reuters Journal Citation Reports (JCR) ranks this as the 5th best journal in Economics out of 321 listed. It is influential and extremely well cited.
  3. The journal Acta Veterinaria Scandinavica is a remarkable success story of society journals (it’s the official journal of the Veterinary Associations of the Nordic Countries). From 2000 to 2005 it was subscription-access only and was dwindling in impact and citations. In 2006 they changed to Open Access publishing with BioMed Central and now enjoy significantly increased impact and citations for the research published there.
     [Figure: A plot of the Impact Factor of the journal Acta Veterinaria Scandinavica over time, showing a marked increase after switching to Open Access publishing. Source/author: BioMed Central. Image licensed under the Creative Commons Attribution 3.0 Unported license.]
  4. The European Geosciences Union (EGU) publishes 14 different gold Open Access journals with the help of Copernicus Publishing. One of these in particular – Atmospheric Chemistry and Physics – has been hugely successful and, through its high citation rate, is now ranked the 2nd best journal of 71 in the category “Meteorology & Atmospheric Sciences” in Thomson Reuters JCR. It happily publishes articles using the Creative Commons Attribution Licence (CC BY) and charges a fair, variable APC that is cheaper for those who submit manuscripts in LaTeX form – reflecting the ease with which such manuscripts can be converted into publishable forms. Microsoft Word submissions require more processing and thus they charge more. It is commendable that they expose, and make avoidable, some of the effort costs of typesetting this way.
  5. Furthermore, I’d bet there are many different societies operating subscription access journals that already allow self-archiving of published works so that they’d be compliant with the Green OA route which the RCUK policy also allows (with additional leniency on the humanities, allowing a 12 month embargo). This would seem to me to be a fairly pain-free way of complying with the policy should they wish to (N.B. Learned societies are not obligated to comply with this policy, although you would think if it was a British society it might be in their best interests. It is the researchers that must comply).
  6. I am concerned for some UK learned societies that, judging from their annual financial reports, seem to be rather reliant on subscription-journal income to support themselves financially. I am not privy to the exact details of whether society subscription-journal income is ‘ringfenced’ away from supporting the other activities & perks of a society’s membership. I hope it is. Otherwise I worry that perhaps some learned societies may be using the surplus from the subscription-access journal income (paid for by libraries/institutions/universities world-wide) and spending this surplus on personal society member-only perks e.g. a free hardcopy paper newsletter only delivered to personal members. I have examined the annual report accounts of some learned societies myself and find where the money/surplus goes to be rather opaque in some cases.
  7. It appears that many societies have been operating a consistent and healthy surplus from their subscription-access journals and using this surplus to expand their outreach activities and member perks – free pens, paper, mugs, USB sticks and heavily discounted student memberships. I myself have greedily taken many of these membership benefits, and know that I have received goods and services that far exceed the cost of the small, hugely subsidized membership fee I paid. All this would be okay if it was only members paying for other (younger) members – self-sustainability. But I am increasingly concerned about the asymmetry of fees and benefits provided by some learned societies. Surely a significant portion of journal subscription income is from institutional subscriber agreements? Institutions are very rarely members of learned societies, and institutionally the only benefit they get from these fees paid is institutional access to subscription-only society journals. Yet the surplus from subscription income at societies doesn’t seem to be given back except to members through perks and the organisation of outreach events and such.
  8. Therefore I think it would be fairer for a society to publish any associated journals in an Open Access manner and concentrate on being financially self-sustaining – whilst clearly delivering on their core mission(s) of educating the world about their subject. Relying on denying access to research via paywalls to provide surplus income to spend on outreach to further their mission seems like a very convoluted argument and an inefficient way of achieving their aims. Put simply, Open Access very clearly fulfils many of the core purposes of learned societies and provides an open platform with which to build outreach around.

 

Arrangements for APC funds

 

  1. As I’m sure many will cite, most gold open access journals listed in the Directory of Open Access Journals (DOAJ) are fee free. They do not charge an APC. Of those that do, the average APC is just $906 (Solomon & Bjork, 2012). There is no strong relationship between the APC cost of gold open access journals and their article level impact [9]. Intuitively this makes sense – if I submitted my work to Nature, or I submitted my work to the Panamanian Journal of Ichthyology (a fictional journal) the work, if published, would essentially be the same – journal ‘brand’ is just a label, it doesn’t change anything – especially not the quality of peer review. In terms of citations, solid evidence supports this intuition – since 1990 the relationship between Impact Factor (citations to a journal) and article-level citations has significantly weakened [10]. To put it another way – good research gets read and cited no matter where it’s published.
  2. I’m aware there are concerns in the Humanities and Social Sciences about Open Access and APCs. I don’t know why there aren’t more Open Access journals in these disciplines. There’s nothing technologically preventing a surfeit of new Open Access journals from forming. Good, well tested solutions like Open Journal Systems are free to implement (no software cost) and are used by over 11,000 journals world-wide. The implementation only needs bandwidth-cost support and the same human time/effort required to run a subscription access journal, which I’m sure institutions should be made willing to help with. Stuart Shieber gives an excellent description of how costs are managed at the Journal of Machine Learning Research. Here academics volunteer time, with the help of a little institutional support to produce a high-quality, high-impact peer-reviewed research journal that costs just $6.50 per paper to run.
  3. I would urge the House of Lords to look into how universities and libraries could be encouraged to help British academics create new, efficient, low-cost, peer-reviewed research journals. Martin Eve for one appears to have no trouble doing this. It need not even necessarily require additional cash-injection, just IT-support and the use of institutional bandwidth & servers to host Open Access journals. Willingness to try, rather than just moan about change is also required.
  4. Above all, academics in all areas need to consider and be made aware of the huge variety of open access publishing options available to them. The big commercial publisher brands may be the most well-known in some areas, and they spend significant marketing budgets on ensuring this. Unfortunately these commercial publishers also offer some of the most eye-wateringly expensive gold Open Access options. We need to incentivize and ensure a ‘value-for-money publishing’ mentality, and to steer academics away from these expensive ‘hybrid’ OA options. It would be good to set a hard limit on the amount of cash that RCUK would be willing to pay for an APC for any one publication. Otherwise it might encourage some publishers to further indulge in price-gouging.
  5. I am glad that RCUK is supporting gold open access and green open access routes. I fail to see how green alone would work out in the end – it does not provide peer review. ‘Overlay’ peer-review services external to journal publishers operating on pre-print servers are a nice idea, but I’m not sure this model of publishing will gain traction or acceptance in academia, not for a while at least. Therefore to continue to build on and support low-cost journals I think it is good that RCUK is encouraging the gold open access route.

 

Embargo periods

 

  1. I don’t have much to say about embargo periods. Only that I’ve seen some interesting arguments used against short embargo periods in the humanities e.g. history. One such argument used was that the ‘citation half-life’ was very long in History and therefore a short embargo period would harm this discipline more than in the sciences. Yet I know that in Palaeontology, the citation half-life of papers, as you might imagine, is also very long – yet there are few such concerns about embargo periods or the effect of Open Access in this discipline. I recently gathered data and found that the mean age of cited papers in palaeontology is roughly >18 years. Therefore I don’t ‘buy’ this long-tail usage argument as it equally applies in other disciplines that appear to have no problem with open access, green or gold.

References

 

1. Lawrence, S. 2001. Free online availability substantially increases a paper’s impact. Nature 411:521 http://dx.doi.org/10.1038/35079151

2. Xia, J. and Nakanishi, K. 2012. Self-selection and the citation advantage of open access articles. Online Information Review 36:40-51. http://www.emeraldinsight.com/journals.htm?articleid=17004555&show=html [the OA citation advantage is more pronounced for ‘smaller’ journals]

3. Xia, J., Myers, R. L., and Wilhoite, S. K. 2011. Multiple open access availability and citation impact. Journal of Information Science 37:19-28. http://dx.doi.org/10.1177/0165551510389358 [More copies available in different places, more citations…]

4. Riera, M. and Aibar, E. 2012. Does open access publishing increase the impact of scientific articles? An empirical study in the field of intensive care medicine. Medicina intensiva / Sociedad Espanola de Medicina Intensiva y Unidades Coronarias. http://dx.doi.org/10.1016/j.medin.2012.04.002

5. Norris, M., Oppenheim, C., and Rowland, F. 2008. The citation advantage of open-access articles. J. Am. Soc. Inf. Sci. 59:1963-1972. http://dx.doi.org/10.1002/asi.20898

6. Eysenbach, G. 2006. Citation advantage of open access articles. PLoS Biol 4:e157+. http://dx.doi.org/10.1371/journal.pbio.0040157

7. Hajjem, C., Harnad, S., and Gingras, Y. 2006. Ten-Year Cross-Disciplinary comparison of the growth of open access and how it increases research citation impact. http://arxiv.org/abs/cs.DL/0606079

8. Gargouri, Y., Hajjem, C., Larivière, V., Gingras, Y., Carr, L., Brody, T., and Harnad, S. 2010. Self-Selected or mandated, open access increases citation impact for higher quality research. PLoS ONE 5:e13636+. http://dx.doi.org/10.1371/journal.pone.0013636

9. West, J., Bergstrom, T. and Bergstrom, C. T. 2013. Cost-effectiveness of open access publications

10. Lozano, G. A., Lariviere, V. and Gingras, Y. 2012. The weakening relationship between the Impact Factor and papers’ citations in the digital age. http://arxiv.org/abs/1205.4328v1

 

Anyone who knows me, knows I’m very passionate on the subject of data sharing in science, and after all the relevant conferences I’ve been to and research I’ve done – I don’t mind saying I’m fairly knowledgeable on the subject too.

It’s part of the reason I got this Panton Fellowship that has helped me develop my work and do what I want to do in pursuit of Open Data goals.

So when I saw this article come up on my RSS feeds – I thought great! It’s finally happening. The vertebrate palaeontology community is finally seeing the light – the absolute need to share research data associated with published papers (we’ll tackle pre-publication data sharing later, first things first…)!

Uhen, M. D., Barnosky, A. D., Bills, B., Blois, J., Carrano, M. T., Carrasco, M. A., Erickson, G. M., Eronen, J. T., Fortelius, M., Graham, R. W., Grimm, E. C., O'Leary, M. A., Mast, A., Piel, W. H., Polly, P. D., and Säilä, L. K. 2013.
From card catalogs to computers: databases in vertebrate paleontology. Journal of Vertebrate Paleontology 33:13-28.


…and yet when I read the paper – it sorely disappointed me for a variety of reasons.

Choosing examples: bad choices & odd absences

Despite clear criteria given, I found the choice of databases reviewed to be an odd selection – for example they choose to include AHOB (Ancient Human Occupation of Britain) and write about it that:

“Access is restricted to project members during the life of the project, after which access will be publicly granted.”

This probably explains why, when I go to the database website, I can’t seem to get access to any of the data purported to be there!

Screenshot of the login screen for AHOB. Try it yourself.

Yet apparently: “More than 250 publications have results from the AHOB project, all of which are recorded in the database.”

How many more publications will come out of this cosy little database before access will be publicly granted I wonder? I don’t think this is a good example of a research database as it doesn’t seem to publicly share any data.

Where’s Dryad?

Furthermore there are some really big, obvious, relevant databases it neglects to review, in particular Dryad – the only mention of which is that TreeBASE received “some support from Dryad” – with absolutely no mention anywhere that Dryad itself is a database with lots of vertebrate palaeontological data in it, and likely to be an important, long-lasting database in this area for the foreseeable future IMO! Even some data associated with an article in JVP itself is in Dryad! Although less prominently paleo-related, figshare (with no fewer than 26 paleontology-related datasets there at the moment; TreeBASE has approximately as many!) might have been worth mentioning too.

Dryad has a partnership with The Paleontological Society and many evolutionary biology journals. Dryad even bought a promotional stand at last year’s Society of Vertebrate Paleontology annual meeting (the society that publishes the Journal of Vertebrate Paleontology) but as Richard Butler has pointed out to me on Twitter this article was submitted before that meeting. Still, it’s simply impossible that none of the 16 authors listed knows about Dryad. I find the non-inclusion of Dryad deeply suspicious and possibly political given it could ‘compete’ to store much of the data that some of the other reviewed databases do (it’s a broad generalist in the types of data it accepts).

Isn’t there a conflict of interest issue given that most of the authors of this paper are involved with at least one of the ‘reviewed’ (=advertised) databases in the paper? I see no mention of this conflict of interest anywhere in the paper. I dearly hope this paper was peer-reviewed – that it is an ‘invited article’ makes me wonder a bit about that…

The inclusion of Polyglot Paleontologist too, in the reviewed databases does also rather stretch the meaning of ‘data’ in the word database. Are translations of 434 different papers ‘data’? In the same way that TreeBASE or PaleoDB contain data? It’s a fantastic freely provided resource, no doubt – I mean no criticism of it – but is it data? I think not tbh.

Strong contenders for things that could/should have been cited but weren’t

WRT Data Portals: rOpenSci provide great R interfaces for a wide variety of databases, including TreeBASE which was one of the ‘reviewed’ databases.

WRT the History of databases section: I find it odd that they didn’t think to mention my own widely publicised and well-supported call for data archiving in palaeontology back in 2011. Nearly 200 palaeontologists signed in support of our ideas with some memorable quotes of support e.g. Brian Huber “This is the way of the future”, P J Wagner “I’ve been trying to get the Paleo Society to sign on with Dryad, but it’s been like slamming my head on jello…”

They could have explained why freely accessible databases/archives are so important a bit better in my opinion:
that ‘Data archiving is a good investment’ (Piwowar et al, 2012),
that only 4% of phylogenetic data is currently archived and that it’s really useful data (Stoltzfus et al, 2012),
that Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results (Wicherts et al, 2011),
that the “data available upon request” system really doesn’t work (Wicherts et al, 2006),
and that non-commercial clauses applied to biodiversity data have undesirable consequences (Hagedorn et al, 2011).

Odd wording

“…community approach, facilitated by the open access of the WWW and…”

sounds like something my dad would say about the interweb

“The CCL 3.0 license allows…”

a classic mistake – which CCL license?
In this case they mean the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license, or CC BY-NC-SA for short. Calling it “Creative Commons License 3.0 (BY-NC-SA)” makes me wonder how familiar they are with licensing. Perhaps a sub-editor did this. And why they link specifically to the US version rather than the international unported license I do not know.

Data Citation: the Elephant in the Room?

Attribution is mentioned many times, and is vitally important to motivate people to share data. Yet the concept of citing data in countable ways or Data Citation isn’t explicitly mentioned once. Nor altmetrics for that matter.

This would have been an excellent opportunity, at the start of a new year, to encourage authors to actually cite data that they re-use from someone else so that those citations can be easily counted and contribute towards research evaluations, but alas no.

So what now?

So I like some of the message of this paper. But I don’t think it goes far enough, nor does it do a good job of it. Call me egotistical but I think I could do better and expand upon what I’ve written above.

If any journal editor happens to read this, and would like to commission an ‘invited article’, comment, or proper independent critical review of databases in vertebrate palaeontology / evolutionary biology please contact me. I think I could offer an interesting perspective.

PS I’m not going to write to the journal. I tried that with Nature and it took 6 months from submission for my comment to get published! It’s 2013 – if I’m going to do post-publication peer review – I’ll definitely be blogging it from now on, Rosie Redfield style!

I’m proud to announce I have a new article over at Palaeontology [Online]

The Palaeontology [Online] logo – by the P [O] team, licensed under a Creative Commons Attribution License

Posts at ‘P [O]’ are primarily aimed at public-engagement and since the site was launched back in July 2011, with sponsorship and support from the Palaeontological Association, one post per month has been featured on site. This month [December], I’ve written a rather different type of post for them. Not so much about fossils, creatures, classification and rocks – but instead on how palaeontology and science-as-a-whole is made available with respect to Open Access, Open Data, Open Source (code), and Open Educational Resources (OERs). Incidentally, I think it’s also the first P [O] post with embedded video content too – really making use of the digital medium!

I’ve tied these strands together with an explicit acknowledgement that Creative Commons has legally enabled all this Open content and that it’s a fantastic achievement. Consider it my early birthday present to celebrate that it’s now been nearly 10 years since Creative Commons first launched (#cc10 on Twitter btw for related news & events).

I’m hoping it will raise awareness that citizens & scientists alike can directly read the primary scientific literature themselves (via Open Access journals and articles) and they should be encouraged to – given as taxpayers they’ve paid for most of it to be created! Also more than just mere engagement, I’ve highlighted that uniquely with an Open philosophy there’s nothing stopping ‘amateur’ or citizen science contributions in palaeontology – it’s sad that more of the literature, data, code and educational resources in this area aren’t openly available for re-use – arguably the world would know a lot more about palaeontology if they were.

With specific reference to http://opendefinition.org/ I try and make it clear what open actually means in this context. There’s been a lot of openwashing this year. Open is clearly a desirable state, and a label which will help sell and ‘add value’ to products, therefore both innocent and malicious temptation abounds to mistakenly label or brand things as ‘open’ when they are de facto not open. Education and awareness-raising clearly have a significant role to play here in preventing this problem.

 

During the production of the article some interesting points were raised, which in the end didn’t make it to the ‘final’ version of the post, so I’ll blog them here instead.

 

On Open Access:

For the sake of simplicity I neglected to point out that in actuality the definition of OA is slightly narrower and more specific than just open as per http://opendefinition.org/ . OA is defined by the BOAI-definition which does not require nor allow(?) the ShareAlike (SA) clause. It does however require the Attribution clause (BY):

the relevant excerpt…

… The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited. …

see Mike Taylor’s excellent posts over at SVPOW for more.

 

On Open Data:

I wonder if perhaps there is a perception out there that there are still technical barriers to sharing data openly?

Particularly with regard to very very large datasets & data files. I decided this was too niche a point for inclusion in the main post but in case anyone’s wondering – you can easily share *any* filesize these days.

Journals like GigaScience specialise in publishing ‘big data’ studies and already make available petabytes (=1 million gigabytes) worth of data. Data archives like figshare allow unlimited filesize uploads (only limited to 1GB if you keep it private); I’m sure Dryad would also be willing to archive large files. Want proof? Look no further than this 21GB database of microbial data, made available via BioTorrents, that’s been downloaded at least 10 times – I couldn’t find anyone seeding it just now, but if there was greater institutional support for p2p data sharing I’m sure this would take off.

 

On Open Source (code):

There’s an excellent editorial in PLoS Computational Biology that regrettably I only became aware of too late to include. It’s by Andreas Prlić & Hilmar Lapp, the latter of whom I had the pleasure of meeting recently at NESCent in Durham, North Carolina. It’s a short paper and Open Access so I recommend you all read at least the Why Do We Support Open-Source Scientific Software? section – it’s an excellent, clear and concise summary of the greater value of open in this area.

 

On MOOCs: 

The American Museum of Natural History (AMNH) have some online courses available here, and whilst they’re probably of the very highest quality, they are neither MOOCs nor OERs because they’re not open, nor free. Each course costs $495 plus a $25 one-time registration fee. Grad credit is also available at an additional cost.

Perhaps one day the AMNH might be persuaded to run one of these courses as a MOOC? Not only to help advertise and drive interest in the other courses, but also to demonstrate their quality. If MIT can do it…

I should also refer interested readers to the excellent set of seminars on phylogenetics over at phyloseminar. I’ve virtually-attended (live, over-the-internet but not there in person) a few of these and have enjoyed recordings of others. It’s not a complete course (so not a MOOC) but depending upon exact licensing details could perhaps be classed as OER-like material. The next one will be soon: 5pm (UK time) Wednesday 5th December, Understanding biodiversity patterns using the Tree of Life, given by Hélène Morlon.

 

 

All the posts over at P [O] are of very high-quality and are worthy academic contributions. As such I’m going to list my post there on my CV as soon as I update it. It’ll sit nicely in my publications list alongside articles in BMC Research Notes, Nature, and The Systematist. I deliberately intermix peer-reviewed publications and non-peer-reviewed publications to make people reconsider and examine the relative merits of each, rather than just counting volume or (worse) the journal Impact Factor which is of course irrelevant.

I encourage everyone else who’s published an article at P [O] to also proudly display it on their CV.

 

 

Opportunity Knocks

October 3rd, 2012 | Posted by rmounce in Open Access | Palaeontology

A few months ago I gave a short talk about the Open Knowledge Foundation and its activities as relevant to academics at a small (but good!) palaeontology conference in Cambridge (which I blogged about previously).

I didn’t need to give this talk. Neither the OKF nor my academic progression required me to give this talk. I just felt it might be helpful to let my friends and peers know who the OKF are, what they’re trying to achieve, and what my Panton Fellowship is about.

That optional talk has now paid HUGE dividends: enabling me to talk live on BBC Radio 3 last night about Open Access and the beneficial impact this will have on research with our Minister for Science & Universities, David Willetts MP & Dame Janet Finch (writer of ‘the Finch report’). I got some good time at the end after the show to speak with David about encouraging efficiently run ultra low-cost journals like the Journal of Machine Learning Research. I hope this will have had some influence, if not, I certainly tried!

So how did this come about?

Nick Crumpton, PhD student at the University of Cambridge, and one of the student organisers for Progressive Palaeontology 2012 (ProgPal) is also a BBC Online British Science Association Media Fellow and thus has good contacts at the BBC. They were apparently looking for a young scientist to come on the show and give an informed opinion from ‘the coalface’ of research so Nick kindly remembered my impassioned talk from ProgPal on OKF & openness in academia and recommended me.

I got in touch with the programme producer, and was invited to join the live radio debate later that night.

Image © British Broadcasting Corporation. Click through to listen to the radio programme. The Open Access discussion segment occurs from about 6min40s in.

…and that’s how it happened.

With Open Access Week coming up very soon, 22-28 October, I guess the point of this post is:

No matter how small your contribution towards the advocacy of Open Access might seem, every little helps. Keep at it. Keep speaking out about OA until all publicly funded research everywhere (glares at the US) is Open Access.

Postscript: That same day Sir Mark Walport was also interviewed on BBC Radio, partly about Open Access – I highly recommend & agree with his opinions; the link is here. Listen from 11.38 to 15.10 for the OA bits h/t Steve Hitchcock @stevehit

I just sent this email to Darin Croft (of SVP). I chose to contact him because he recently answered questions about the embargo for EmbargoWatch and it was rather unclear who else I should approach. I did not want to blanket email the whole council.

This is the (entire) email I sent him, from my gmail account:
(I will post his reply as and when I receive it)

Dear Darin,

It’s been noted many times before, by many different researchers – but the SVP meeting abstract embargo just doesn’t make sense to me. I know of no other conference that operates like this, and indeed for most other conferences the abstract booklet (and its open, free availability online) is a big promotional aid in getting people interested in the event in the lead-up to it.

I saw you answered some questions on EmbargoWatch recently, so I thought you might be the correct person to contact for my queries on the same subject:

I have blogged my own displeasure with the embargo policy here:
http://rossmounce.co.uk/2012/08/23/the-ridiculous-svp-embargo-is-back-again/

I would like to ask:

1.) What would happen if a researcher (and SVP member) deliberately broke the embargo and blogged/tweeted/published research that was the basis of their own submitted talk abstract (I’m surprised this hasn’t happened already tbh, given how early the abstract deadline is – some e-journals have very quick turnaround times…)

2.) What would happen if a researcher (and SVP member) broke the embargo and blogged or tweeted some or all of the content of another researcher’s talk abstract?

3.) If a blogger or journalist *did* write an article or two on the basis of the meeting abstract booklet – do you seriously think that could harm the chances of VP’ers getting published in one of the glamour mags?

I look forward to hearing from you, and will publish your response in full context with this email on my blog

Best,

Ross



-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
Ross Mounce
PhD Student & Panton Fellow
Fossils, Phylogeny and Macroevolution Research Group
University of Bath, 4 South Building, Lab 1.07

http://about.me/rossmounce

-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-

Sometimes you just have to laugh…

The year is 2012, we have the internet, we have blogs, and a huge variety of other tools to enable free, efficient and rapid communication of information and yet the Society of Vertebrate Paleontology annual meeting rules still insist that all information within this year’s abstract booklet remain a big secret until the day of the event.

Many others have justly written to complain about this before.

Here’s the 2012 version I just received in my inbox today:

SVP Embargo Policy Regarding Content in the Program and Abstract Book

Unless specified otherwise, coverage of abstracts presented orally at the Annual Meeting is strictly prohibited until the start time of the presentation, and coverage of poster presentations is prohibited until the relevant poster session opens for viewing. As defined here, “coverage” includes all types of electronic and print media; this includes blogging, tweeting and other intent to communicate or disseminate results or discussion presented at the SVP Annual Meeting. Content that may be pre-published online in advance of print publication is also subject to the SVP embargo policy.

So I think I can tell you I’m giving a talk there in the ‘Phylogenetic and Comparative Paleobiology — New Approaches to the Study of Vertebrate Macroevolution’ symposium.

But can I tell you what the title of my talk is, or the abstract I submitted (a rather long time ago, which is another bugbear I have with this particular conference)? Well, given the quote above, probably not!

And therein is part of the ridiculousness of the embargo. By submitting a (subsequently accepted) talk & abstract to this conference – I’m banned from communicating about my own research on that subject until I give the talk. Not even a tweet about it.

It also seems to me that they’re preventing their own members from effectively promoting the event with this policy. Wouldn’t it be great if all speakers could blog and tweet: “Hey, I’m giving a talk on new dinosaur XXXX and its unusual anatomy (further details of which are in my abstract here) at a meeting in Raleigh, NC. Come along, tickets still available here.” Isn’t that 100 times better than “Hey, I’m giving a talk at this conference – I can’t tell you what the title is or the subject, sorry”?

This policy strikes me as a massive and unjustified own goal. I appreciate some of the science glamour mags don’t take kindly to press reportage of science before it is published in their glossy pages BUT I think we’ve got to remember that science talks & posters are NOT papers, and they should not and are not treated as such. The abstracts for SVP are only minimally peer-reviewed before acceptance and the talk content itself is completely unreviewed. Therefore if a journalist/blogger/tweeter did report on the abstract booklet (and btw, it would take tremendous journalistic spin to make good, interesting copy from most talk abstracts I’ve ever seen – they’re rather short!) they’d be reporting non-peer reviewed discussion, that may or may not be related to unspecified future peer-reviewed publications. So I don’t buy [what I presume is the justification for all this?] the argument that reportage of talk abstracts jeopardises the publication of peer-reviewed papers. The two may be related, but are also very distinct from each other.

I think it’s only a matter of time until this policy changes. SVP have been doing reasonably well with respect to openness recently. They’ve reduced their hybrid Open Access fees, and instituted new editorial policy encouraging data archiving so that data published in their journal is more transparent & re-usable (=better science). But it seems there are still improvements to be made. Will there be an abstract embargo in 2013, I wonder? I for one hope not.

I’m really pleased this new Open Access paper has just been published.

CC BY 3.0 ZooKeys Special Issue 150

Hagedorn, G. et al. Creative Commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys 150, 127-149 (2011).

Some background…

After parading my Open Data t-shirt (pictured below) around the Society of Vertebrate Paleontology meeting this month, I was invited to give an impromptu pitch in front of the great and good of the Mammal AToL project & MorphoBank people. Having pointed out to MorphoBank a while ago that they should really make explicit the terms and conditions [license] under which they make their (?) data available, I naturally advocated CC-BY 3.0 and CC0 licences. I talked about this very subject and pleaded with them NOT to use the NC clause, referring to Rod Page & Peter Murray-Rust’s [1,2] thoughts on the matter.

Data providers vs Data re-users – need they really be in opposition?

The trouble is, a lot of (data providing) institutions seem hell-bent on ‘protecting commercial interests’, at the expense of research opportunities. So as I understand it, at the moment databases such as these face an awkward problem of either satisfying the restriction requests of data providers OR satisfying permissiveness of re-use by data re-users [such as myself!], and the needs of both camps are seldom entirely met.

Conclusion

I see this paper as an important step in persuading such restriction-minded institutions of the absolute importance of #OpenData / #PantonPrinciples and how NC clauses can genuinely obstruct and impair real academic research.
I just hope people read it and take note!

[Most of this is just a re-post of my spur of the moment G+ post here.
I’m reposting here so that this might hopefully get picked up by Research Blogging to give this paper the publicity it deserves. Much of the content is widely applicable IMO to most of scholarly communications, not just biodiversity informatics, and indeed the whole ZooKeys special issue (Open Access) is well worth a browse.]

References

[1] http://iphylo.blogspot.com/2010/12/plant-list-nice-data-shame-it-not-open.html
[2] http://blogs.ch.cam.ac.uk/pmr/2010/12/17/why-i-and-you-should-avoid-nc-licences/
[3] Hagedorn, G., Mietchen, D., Morris, R., Agosti, D., Penev, L., Berendsohn, W., & Hobern, D. (2011). Creative Commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys, 150. DOI: 10.3897/zookeys.150.2189

This is a re-post of something I was invited to write to sum up my experiences at OKCon 2011. The original post can be viewed here on the official OKFN Open Science blog. For some reason the Prezi embed code at the bottom didn’t work there, but it does here on my blog.

Many thanks to Jenny Molloy for inviting me to write the post, and Maria Neicu for editing it.

A couple of months ago, I gave a talk at the Open Knowledge Conference 2011, on ‘Open Palaeontology’ – based upon 18 months’ experience as a lowly PhD student trying, and mostly failing, to get usable digital data from palaeontological research papers. As you might well have inferred already from that last sentence, it’s been an interesting ride.

The main point of my talk was the sheer stupidity/naivety of the way in which data is supplied (or in some cases, not supplied at all!) with or within research papers. Effective science operates through the accumulation of knowledge and data; all advances are incremental and build upon the work of others – the Panton Principles probably sum it up far better than I could. Any barriers to the accumulation of knowledge/data therefore impede the progress of science.

Whilst there are numerous barriers to academic research (access to research papers being perhaps the most well-known and well-publicised), the issue that most aggravates me is not access to these papers but the actual papers themselves. In the context of the 21st century (I’m thinking the Internet Age here…), they are only barely adequate (at best) for communicating research data, and this is a major problem for the future legacy of our published work… and my research project.

My PhD thesis title is quite broad: ‘The Importance of Fossils in Phylogeny’. Given this title and (wide) scope, I need to look at a lot of papers, in a lot of different journals, and extract data from these articles to re-analyse, to assess the importance of fossils in phylogeny on a meta-scale. There are long established data formats for the particular type of data I wish to extract. So well established and easy to understand that there’s even a Wikipedia page here describing the most commonly used data format (NEXUS). There exist multiple databases set aside specifically to host this type of data e.g. TreeBASE and MorphoBank. Yet despite all this standardisation and provisioning for paleomorphological phylogenetic data – far less than 1% of all published data is actually readily available in a standardised, digital, usable format.

In most cases the data is there; you just have to dig very very hard to release it from the pdf file it’s usually buried in (and then spend unnecessary and copious amounts of time, manually reformatting and validating it). See the picture below for a typical example (and yes, it is sadly printed sideways, this is a common and silly practice that publishers use to inappropriately squeeze data matrices into papers):

I hope you’ll agree with me that this is clearly absurd and hugely inefficient. As I explain in my presentation (slides at the bottom of this post) the data, as originally analysed/used, comes in a much richer, more usable, digital, standardised format. Yet when published it gets stripped of all useful metadata and converted into a flat, inextricable and significantly obfuscated table. Why? It’s my belief this practice is a lazy, unwanted vestigial hangover from the days of paper-based (only) publishing, in which this might have been the only way in which to convey the data with the paper. But in 2011, I can confidently say that the vast majority of researchers read and use the digital versions of research papers – so why not make full and proper use of the digital format to aid scientific communication? I argue not to axe paper copies, but to make sure that digital versions are more than just plain pdf versions of the paper copy, as they can and should be IMO.
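To make the contrast concrete, here is a minimal sketch of what a NEXUS-style character matrix looks like and how little code it takes to read one when it is supplied as plain text. The taxon names and character scores are invented purely for illustration, and the reader below only copes with this simplified layout (no comments, no polymorphic scores, no interleaved blocks), so treat it as a toy and not a full NEXUS parser.

```python
# A tiny, made-up NEXUS data block (taxa and scores are hypothetical).
NEXUS_TEXT = """#NEXUS
BEGIN DATA;
  DIMENSIONS NTAX=3 NCHAR=5;
  FORMAT DATATYPE=STANDARD MISSING=? GAP=-;
  MATRIX
    Taxon_A  01011
    Taxon_B  0?110
    Taxon_C  1101-
  ;
END;
"""


def read_simple_matrix(text):
    """Return {taxon: character string} from a simplified NEXUS DATA block."""
    matrix = {}
    in_matrix = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.upper() == "MATRIX":
            in_matrix = True  # rows of taxon + scores follow
            continue
        if in_matrix:
            if line.startswith(";"):
                break  # a semicolon closes the MATRIX command
            taxon, scores = line.split(None, 1)
            matrix[taxon] = scores.replace(" ", "")
    return matrix


matrix = read_simple_matrix(NEXUS_TEXT)
for taxon, scores in matrix.items():
    print(taxon, len(scores), scores)
```

The same matrix printed sideways in a pdf table would need manual retyping and checking before it could be used at all, which is exactly the wasted effort described above.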

With this goal in mind, I set about writing an Open Letter to the rest of my research community to explain why we need to richly-digitise our published research data ASAP. Naturally, I wouldn’t get very far just by myself, so I enlisted the support of a variety of academic friends via Facebook, and (inspired by OKFN pads I’d seen) we concocted a draft letter together using an Etherpad. The result of this was a fairly basic Drupal-based website that we launched http://supportpalaeodataarchiving.co.uk/ and disseminated via mailing lists, Twitter, Academia.edu as far and wide as we possibly could, *hoping* just hoping, that our fellow academics would read, take note and support our cause.

Surprisingly, it worked to an extent and a lot of big names in Palaeontology signed our Open Letter in support of our cause; then things got even better when a Nature journalist (Ewen Callaway) got interested in our campaign and wrote an article for Nature News about it, which can be found here. A huge thanks must go to everyone who helped out with the campaign; it’s generated truly international support, as can be seen on the map below:
(you might have to zoom out a bit. For some reason it zooms into Africa by default.)


View Open Letter Signatures in a larger map

It’s far too soon to know the true impact of the campaign. Journal editorial boards can be very slow to change their editorial policies, especially if it requires a modicum of extra effort on the part of the publisher. Additionally, once editorial policy does change at a journal, it can only apply to articles submitted from then on, and thus articles already in the submission pipeline aren’t affected by any new guidelines. It’s not uncommon for delays of a year between submission and publication in palaeontology, so for this and other reasons, I’m not expecting to see visible change until 2012, but I think we might have helped get the ball rolling, if nothing else…
The Paleontological Society journals (Paleobiology and Journal of Paleontology) have recently adopted mandatory data submission to the Dryad repository, and the Journal of Vertebrate Paleontology has also improved its editorial policy with respect to certain types of data, but these are just a few of the many, many journals that publish palaeontological articles. I’m very much hoping that other journals will follow suit in the next few months and years by taking steps to improve the way in which research data is communicated, for the good of everyone: authors, publishers, funders and readers.

Anyway, here’s the Prezi I used to convey some of that (and more) at OKCon 2011. Huge thanks to the conference organisers for inviting me to give this talk. It was the most professionally run conference I’ve ever been to, by far. Great food, excellent WiFi provisioning, good comms, superb accommodation… I could go on. If the conference is on next year – I’ll be there for sure!