Show me the data!

I just sent this email to Darin Croft (of SVP). I chose to contact him because he recently answered questions about the embargo for EmbargoWatch and it was rather unclear who else I should approach. I did not want to blanket email the whole council.

This is the (entire) email I sent him, from my gmail account:
(I will post his reply as and when I receive it)

Dear Darin,

It’s been noted many times before, by many different researchers – but the SVP meeting abstract embargo just doesn’t make sense to me. I know of no other conference that operates like this, and indeed for most other conferences the abstract booklet (and its open, free availability online) is a big promotional aid in getting people interested in the event in the lead-up to it.

I saw you answered some questions on EmbargoWatch recently, so I thought you might be the correct person to contact for my queries on the same subject:

I have blogged my own displeasure with the embargo policy here:
http://rossmounce.co.uk/2012/08/23/the-ridiculous-svp-embargo-is-back-again/

I would like to ask:

1.) What would happen if a researcher (and SVP member) deliberately broke the embargo and blogged/tweeted/published research that was the basis of their own submitted talk abstract? (I’m surprised this hasn’t happened already tbh, given how early the abstract deadline is – some e-journals have very quick turnaround times…)

2.) What would happen if a researcher (and SVP member) broke the embargo and blogged or tweeted some or all of the content of another researcher’s talk abstract?

3.) If a blogger or journalist *did* write an article or two on the basis of the meeting abstract booklet – do you seriously think that could harm the chances of VP’ers getting published in one of the glamour mags?

I look forward to hearing from you, and will publish your response in full context with this email on my blog

Best,

Ross



-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-
Ross Mounce
PhD Student & Panton Fellow
Fossils, Phylogeny and Macroevolution Research Group
University of Bath, 4 South Building, Lab 1.07
http://about.me/rossmounce
-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-

Sometimes you just have to laugh…

The year is 2012, we have the internet, we have blogs, and a huge variety of other tools to enable free, efficient and rapid communication of information and yet the Society of Vertebrate Paleontology annual meeting rules still insist that all information within this year’s abstract booklet remain a big secret until the day of the event.

Many others have justly written to complain about this before.

Here’s the 2012 version I just received in my inbox today:

SVP Embargo Policy Regarding Content in the Program and Abstract Book

Unless specified otherwise, coverage of abstracts presented orally at the Annual Meeting is strictly prohibited until the start time of the presentation, and coverage of poster presentations is prohibited until the relevant poster session opens for viewing. As defined here, “coverage” includes all types of electronic and print media; this includes blogging, tweeting and other intent to communicate or disseminate results or discussion presented at the SVP Annual Meeting. Content that may be pre-published online in advance of print publication is also subject to the SVP embargo policy.

So I think I can tell you I’m giving a talk there in the ‘Phylogenetic and Comparative Paleobiology — New Approaches to the Study of Vertebrate Macroevolution’ symposium.

But can I tell you what the title of my talk is, or the abstract I submitted (a rather long time ago, which is another bugbear I have with this particular conference)? Well, given the quote above, probably not!

And therein is part of the ridiculousness of the embargo. By submitting a (subsequently accepted) talk & abstract to this conference – I’m banned from communicating about my own research on that subject until I give the talk. Not even a tweet about it.

It also seems to me that they’re preventing their own members from effectively promoting the event with this policy. Wouldn’t it be great if all speakers could blog and tweet: “Hey, I’m giving a talk on new dinosaur XXXX and its unusual anatomy (further details of which are in my abstract here) at a meeting in Raleigh, NC. Come along, tickets still available here”? Isn’t that 100 times better than “Hey, I’m giving a talk at this conference – I can’t tell you what the title is or the subject, sorry”?

This policy strikes me as a massive and unjustified own goal. I appreciate some of the science glamour mags don’t take kindly to press reportage of science before it is published in their glossy pages BUT I think we’ve got to remember that science talks & posters are NOT papers; they are not, and should not be, treated as such. The abstracts for SVP are only minimally peer-reviewed before acceptance and the talk content itself is completely unreviewed. Therefore if a journalist/blogger/tweeter did report on the abstract booklet (and btw, it would take tremendous journalistic spin to make good, interesting copy from most talk abstracts I’ve ever seen – they’re rather short!) they’d be reporting non-peer-reviewed discussion that may or may not be related to unspecified future peer-reviewed publications. So I don’t buy [what I presume is the justification for all this?] the argument that reportage of talk abstracts jeopardises the publication of peer-reviewed papers. The two may be related, but are also very distinct from each other.

I think it’s only a matter of time until this policy changes. SVP have been doing reasonably well with respect to openness recently. They’ve reduced their hybrid Open Access fees, and instituted a new editorial policy encouraging data archiving so that data published in their journal is more transparent & re-usable (=better science). But it seems there are still improvements to be made. Will there be an abstract embargo in 2013, I wonder? I for one hope not.

I’m really pleased this new Open Access paper has just been published.

CC BY 3.0 ZooKeys Special Issue 150

Hagedorn, G. et al. Creative Commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys 150, 127-149 (2011).

Some background…

After parading my Open Data t-shirt (pictured below) around the Society of Vertebrate Paleontology meeting this month, I was invited to give an impromptu pitch in front of the great and good of the Mammal AToL project & MorphoBank people. Having pointed out to MorphoBank a while ago that they should really make explicit the terms and conditions [license] under which they make their (?) data available, I naturally advocated CC-BY 3.0 and CC0 licences. I talked about this very subject and pleaded with them NOT to use the NC clause, referring to Rod Page’s & Peter Murray-Rust’s [1,2] thoughts on the matter.

Data providers vs Data re-users – need they really be in opposition?

The trouble is, a lot of (data providing) institutions seem hell-bent on ‘protecting commercial interests’, at the expense of research opportunities. So as I understand it, at the moment databases such as these face an awkward problem of either satisfying the restriction requests of data providers OR satisfying permissiveness of re-use by data re-users [such as myself!], and the needs of both camps are seldom entirely met.

Conclusion

I see this paper as an important step in persuading such restriction-minded institutions of the absolute importance of #OpenData / #PantonPrinciples and how NC clauses can genuinely obstruct and impair real academic research.
I just hope people read it and take note!

[Most of this is just a re-post of my spur of the moment G+ post here.
I’m reposting here so that this might hopefully get picked up by Research Blogging to give this paper the publicity it deserves. Much of the content is widely applicable IMO to most of scholarly communications, not just biodiversity informatics, and indeed the whole ZooKeys special issue (Open Access) is well worth a browse.]

References

[1] http://iphylo.blogspot.com/2010/12/plant-list-nice-data-shame-it-not-open.html
[2] http://blogs.ch.cam.ac.uk/pmr/2010/12/17/why-i-and-you-should-avoid-nc-licences/
[3] Hagedorn, G., Mietchen, D., Morris, R., Agosti, D., Penev, L., Berendsohn, W., & Hobern, D. (2011). Creative Commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys, 150, 127-149. DOI: 10.3897/zookeys.150.2189

This is a re-post of something I was invited to write to sum up my experiences at OKCon 2011. The original post can be viewed here on the official OKFN Open Science blog. For some reason the Prezi embed code at the bottom didn’t work there, but it does here on my blog.

Many thanks to Jenny Molloy for inviting me to write the post, and Maria Neicu for editing it.

A couple of months ago, I gave a talk at the Open Knowledge Conference 2011, on ‘Open Palaeontology’ – based upon 18 months’ experience as a lowly PhD student trying, and mostly failing, to get usable digital data from palaeontological research papers. As you might well have inferred already from that last sentence, it’s been an interesting ride.

The main point of my talk was the sheer stupidity/naivety of the way in which data is supplied (or in some cases, not supplied at all!) with or within research papers. Effective science operates through the accumulation of knowledge and data; all advances are incremental and build upon the work of others – the Panton Principles probably sum it up far better than I could. Any barriers to the accumulation of knowledge/data therefore impede the progress of science.

Whilst there are numerous barriers to academic research – access to research papers being perhaps the most well-known and well-publicised – the issue that most aggravates me is not access to these papers, but the actual papers themselves. In the context of the 21st century (I’m thinking the Internet Age here…), they are only barely adequate (at best) for communicating research data, and this is a major problem for the future legacy of our published work… and my research project.

My PhD thesis title is quite broad: ‘The Importance of Fossils in Phylogeny’. Given this title and (wide) scope, I need to look at a lot of papers, in a lot of different journals, and extract data from these articles to re-analyse, so as to assess the importance of fossils in phylogeny on a meta-scale. There are long-established data formats for the particular type of data I wish to extract; so well established and easy to understand that there’s even a Wikipedia page here describing the most commonly used data format (nexus). There exist multiple databases set aside specifically to host this type of data, e.g. TreeBASE and MorphoBank. Yet despite all this standardisation and provisioning for paleomorphological phylogenetic data – far less than 1% of all published data is actually readily available in a standardised, digital, usable format.

In most cases the data is there; you just have to dig very very hard to release it from the pdf file it’s usually buried in (and then spend unnecessary and copious amounts of time, manually reformatting and validating it). See the picture below for a typical example (and yes, it is sadly printed sideways, this is a common and silly practice that publishers use to inappropriately squeeze data matrices into papers):

I hope you’ll agree with me that this is clearly absurd and hugely inefficient. As I explain in my presentation (slides at the bottom of this post) the data, as originally analysed/used, comes in a much richer, more usable, digital, standardised format. Yet when published it gets stripped of all useful metadata and converted into a flat, inextricable and significantly obfuscated table. Why? It’s my belief this practice is a lazy, unwanted, vestigial hangover from the days of paper-based (only) publishing, in which this might have been the only way in which to convey the data with the paper. But in 2011, I can confidently say that the vast majority of researchers read and use the digital versions of research papers – so why not make full and proper use of the digital format to aid scientific communication? I argue not to axe paper copies, but to make sure that digital versions are more than just plain pdf versions of the paper copy, as they can and should be IMO.
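To make the contrast a little more concrete, here is a minimal sketch (in Python) of how small a job it is to write a character matrix out as a nexus DATA block rather than a flat table. The function name `to_nexus`, the taxon names and the character states are all made up for illustration, and the block only covers the bare essentials of the format; real files used with TreeBASE or MorphoBank typically carry much more (character labels, state names, trees).

```python
def to_nexus(matrix):
    """Render {taxon name: character-state string} as a minimal NEXUS DATA block.

    Hypothetical helper for illustration only; '?' marks missing data
    and '-' marks gaps, as declared in the FORMAT line below.
    """
    if not matrix:
        raise ValueError("matrix is empty")
    lengths = {len(states) for states in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all taxa must have the same number of characters")
    nchar = lengths.pop()

    # Collect the state symbols actually used, excluding missing/gap markers.
    used = {s for states in matrix.values() for s in states} - {"?", "-"}
    symbols = "".join(sorted(used))

    width = max(len(name) for name in matrix)  # pad names so columns line up
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
        f'  FORMAT DATATYPE=STANDARD MISSING=? GAP=- SYMBOLS="{symbols}";',
        "  MATRIX",
    ]
    for name, states in matrix.items():
        lines.append(f"    {name.ljust(width)}  {states}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


# Made-up example matrix: three taxa scored for five characters.
example = {"Taxon_A": "01?10", "Taxon_B": "11010", "Taxon_C": "0021-"}
nexus_text = to_nexus(example)
print(nexus_text)
```

The point of the sketch is only that the machine-readable form costs the author almost nothing to produce, whereas recovering it from a sideways-printed pdf table costs every re-user hours.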

With this goal in mind, I set about writing an Open Letter to the rest of my research community to explain why we need to richly-digitise our published research data ASAP. Naturally, I wouldn’t get very far just by myself, so I enlisted the support of a variety of academic friends via Facebook, and (inspired by OKFN pads I’d seen) we concocted a draft letter together using an Etherpad. The result of this was a fairly basic Drupal-based website that we launched http://supportpalaeodataarchiving.co.uk/ and disseminated via mailing lists, Twitter, Academia.edu as far and wide as we possibly could, *hoping* just hoping, that our fellow academics would read, take note and support our cause.

Surprisingly, it worked to an extent and a lot of big names in Palaeontology signed our Open Letter in support of our cause; then things got even better when a Nature journalist (Ewen Callaway) got interested in our campaign and wrote an article for Nature News about it, which can be found here. A huge thanks must go to everyone who helped out with the campaign. It has generated truly international support, as the map below demonstrates:
(You might have to zoom out a bit; for some reason it zooms into Africa by default.)



It’s far too soon to know the true impact of the campaign. Journal editorial boards can be very slow to change their editorial policies, especially if it requires a modicum of extra effort on the part of the publisher. Additionally, once editorial policy does change at a journal, it can only apply to articles submitted from then on, so articles already in the submission pipeline aren’t affected by any new guidelines. Delays of a year between submission and publication are not uncommon in palaeontology, so for this and other reasons, I’m not expecting to see visible change until 2012, but I think we might have helped get the ball rolling, if nothing else…
The Paleontological Society journals (Paleobiology and Journal of Paleontology) have recently adopted mandatory data submission to the Dryad repository, and the Journal of Vertebrate Paleontology has also improved its editorial policy with respect to certain types of data, but these are just a few of the many, many journals that publish palaeontological articles. I’m very much hoping that other journals will follow suit in the next few months and years by taking steps to improve the way in which research data is communicated, for the good of everyone: authors, publishers, funders and readers.

Anyway, here’s the Prezi I used to convey some of that (and more) at OKCon 2011. Huge thanks to the conference organisers for inviting me to give this talk. It was the most professionally run conference I’ve ever been to, by far. Great food, excellent WiFi provisioning, good comms, superb accommodation… I could go on. If the conference is on next year – I’ll be there for sure!