Show me the data!
Header

yet another #solo12 recap (part 1)

November 17th, 2012 | Posted by rmounce in Conferences | Open Data | Panton Fellowship updates

So there’s been a few already:

SpotOn day 1 & SpotOn day 2 being the best I’ve read, from Ian Mulvany

see also: SpotOn London – a global conference by Jon Tennant | #solo12 reflection by A J Cann | SpotOn London 2012 in brief by Charis Cook | Altmetric @ SpotOn London 2012 by Jean Liu, and more

but since SpotOn London 2012 was such an interesting event, and there were so many parallel sessions I thought it would be good to add some more to the post-conference discussion

Data is the new black

photo by Bastian Greshake (@gedankenstuecke), Copyright not mine.

It wasn’t just a popular badge at the conference… data is a hot topic in science right now (as it should be!). Data is an undervalued but absolutely vital output of research. Research funding agencies appear to have over-incentivized the production of research publications (many of which are mere executive summaries of the years of research effort they represent) to the exclusion of almost everything else.

Science isn’t just about the production of papers; data and code are extremely important research outputs too (I’m not going to mention patents – they’re a sticky issue best dealt with in another post). The good news is that funding bodies now seem to have realised that they’re seriously missing out on RoI by focusing solely on papers; just recently NSF Grant Proposal Guidelines changed with amended terminology away from narrow-measurement ‘Publications’ to the newer broader term ‘Products’ that explicitly recognises non-publication outputs as creditworthy first class research objects (incidentally, this was one of the many excellent suggestions made in the Force11 manifesto for ‘Improving Future Research Communication and e-Scholarship’ read it if you haven’t already).

The immense value to be gained, time to be saved, and innovative research enabled by making data available for re-use was up for discussion at the #solo12reuse session. Mark Hahnel (@figshare) was organiser/chair, and Sarah Callaghan (@sorcha_ni) of the British Atmospheric Data Centre and I were the invited panelists for a ~1hr slot. As the conference was extremely well-organized *all* sessions were live-streamed via Google Hangouts & made publicly available via YouTube afterwards. I’ve embedded the stream of the #solo12reuse session below:

A transcript of some of what was discussed:

Intro’s from ~02:00 … then straight into discussion from ~09:00 onwards: Josh Greenberg (@epistemographer) contends that data sharing in chemistry perhaps ‘doesn’t make as much sense’ – I have a feeling PMR & many others would disagree with this!

At 13:20 Sarah Callaghan: NERC sets its data embargo policy so that data can only be withheld for a maximum of 2 years after it was collected after which it must be made publicly available, somewhere, somehow – the ambiguity of which IMO needs to be worked on…

At 14:25 discussion of ‘levels of re-usability’ and definition. Access control as a means of encouraging data sharing (?)

17:30 Sarah Callaghan: “It’s important to have ‘first dibs’ on your own data” but not beyond this without peer-vetted justification/scrutiny IMO

18:30 David Shotton (@dshotton): noted that one shouldn’t expect absolutely every data point/item to be shared – not all data is useful/valuable. It’s about retaining & making available bits that might be of re-use value.

At 20:24 I start to introduce AMI2 & the OpenContentMining project.

37:50 I bring the Panton Principles on screen, I also had the OKFN Science Working Group page displayed (although not discussed) for a good ~10 minutes. Note to self: hijack the display computer at panel sessions more often…

from 40:17 onwards… Mark Hahnel: “In terms of re-use and getting people incentivized, are Data Papers the future?” Sarah Callaghan “NO. Until research achievement is predicated on something other than publishing in ‘high-impact’ journals then we’re stuffed: we’ve got to shoehorn data & code in order for them to ‘count’ [lamentably]” So for now we need data papers, but perhaps in the future we won’t need to constrain these outputs to a ‘paper’ style format.

from 43:00 Martin Fenner (@mfenner) plays Devil’s Advocate and suggests that data citation may not work and that perhaps #altmetrics might be better indicators of usage. Much debate ensues…

from 45:50 I give a plug to Iain H’s paper: ‘Open By Default’ Hrynaszkiewicz, I. and Cockerill, M. 2012. Open by default: a proposed copyright license and waiver agreement for open access research and data in peer-reviewed journals. BMC Research Notes 5:494+ then we discuss legal barriers to re-using data.

This post has taken a while to write and is fairly long now, so I’m going to split my recap of #solo12 into two or more parts now. In part 2 I’ll attempt to discuss some of other *excellent* sessions I saw, in particular the brilliant, well-received outburst on the absurd inefficiency of the publication process by professional typesetter Dr Kaveh Bazargan during the #solo12journals session. I’m surprised someone hasn’t done a whole blogpost about this already – it was my highlight of the conference tbh!
I’ll be posting part two on Monday 19th November (weekends are slow for blogs… I want people to read this!)

Until then…