Show me the data!

Last week, on Monday 19th January, I co-organised the first-ever Open Research London event at Imperial College London, with the help of local organisers Jon Tennant & Torsten Reimer.


We invited two speakers for our first meeting: Chris Banks and Joe McArthur.

They both gave excellent talks, which were recorded on Imperial’s ‘Panopto’ recording system. The recordings are now publicly available! CB’s talk is available to stream here & download here; JMcA’s talk is available to stream here & download here.

 

We had lots of free swag to give away to attendees, including PLOS t-shirts, notebooks, USB sticks and ‘How Open Is It?’ guides, as well as SPARC and OA Button stickers & badges – they seemed to go down well. I kept some swag back for the next event too, so if you didn’t get what you wanted this time, there will be more next time!

The speakers were kind enough to publicly post their slide decks before their talks, so you can alternatively catch up with their content on Slideshare.

Chris Banks’ slides are embedded below:

Joe McArthur’s slides are below here:

I’ll refrain from naming names for the sake of privacy but what I most enjoyed about the event was the diversity of attendees. We had people who were ‘curious’ about Open Access and wanted to know more. We had a new PhD student, we had midway PhD students, librarians, open access publishers, and more… I believe one attendee might even have travelled back to Brighton after the event! In terms of affiliations, we had attendees from Jisc, The Natural History Museum London, Imperial College (two different campuses represented!), UCL, The National Institute for Medical Research (MRC), and AllTrials.

I was also mightily impressed that nearly all the attendees, including both speakers, happily joined us in the student union (Eastside) afterwards for discussions & networking over drinks – a real sense of community here, I think.

Can we do better next time? Sure we can, we must! Attendance was lower than I had hoped for but several people kindly messaged me afterwards to let me know they wanted to be there but couldn’t. I’ve no doubt that with warmer weather we’ll be able to double our attendance.

 

The next ORL meetup will be in mid or late March at UCL; further details TBC.

Keep up-to-date with ORL via Twitter @OpenResLDN or our OKFN community group page: http://science.okfn.org/london-open-research/

 

I’m actively in the process of trying to grow the organising/steering committee for ORL. At the moment it’s just myself, Liz I-S and Jon Tennant. If you’re passionate about open research, open access, open data, reproducible research, citizen science, diversity in research, open peer-review etc… then get in contact with me: ross.mounce@gmail.com

I would love to have an OC that more broadly represents the variety of the open research community in London :)

 

Until next time…

 

Ross

[Update: I’ve submitted this idea as a FORCE11 £1K Challenge research proposal 2015-01-13. I may be unemployed from April 2015 onwards (unsolicited job offers welcome!), so I certainly might find myself with plenty of time on my hands to properly get this done…!]

Inspired by something I heard Stephen Curry say recently, and with a little bit of help from Jo McIntyre, I’ve started a project to compare EuropePMC author manuscripts with their publisher-made (mangled?) ‘version of record’ twins.

How different are author manuscripts from the publisher version of record? Or, to put it another way, what value do publishers add to each manuscript? With the aggregation & linkage provided by EuropePMC – an excellent service – we can rigorously test this.

 

In this blog post I’ll go through one paper I chose at random from EuropePMC:

Sinha, N., Manohar, S., and Husain, M. 2013. Impulsivity and apathy in Parkinson’s disease. J Neuropsychol 7:255-283. doi: 10.1111/jnp.12013 (publisher version) PMCID: PMC3836240 (EuropePMC version)

Method

A quick & dirty analysis with a simple tool that’s easy to use & available to everyone:

pdftotext -layout     (you’re welcome to suggest a better method by the way, I like hacking PDFs)

(P) = Publisher-version, (A) = Author-version

Manual Post-processing – remove the header and footer crud from each e.g. “262
Nihal Sinha et al.” (P) and “J Neuropsychol. Author manuscript; available in PMC 2013 November 21.” (A)
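
The header and footer removal above was done by hand. If it had to be repeated across many pairs, a sketch along the following lines might do. It assumes the running heads survive pdftotext as the literal strings quoted above, and strip_running_heads is a hypothetical helper name, not part of any tool:

```shell
# Sketch only: drop lines containing the running heads quoted above.
# Reads text on stdin, writes the filtered text to stdout.
# The two fixed strings are specific to this one paper (an assumption).
strip_running_heads() {
    grep -v -F \
        -e 'Nihal Sinha et al.' \
        -e 'J Neuropsychol. Author manuscript; available in PMC'
}
```

It could be used as `strip_running_heads < publisher.txt > publisher_clean.txt` (file names are hypothetical). Bare page numbers are left in place because the later step discards all digits anyway.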

Automatic Post-processing – I’m not interested in numbers, punctuation or words of three letters or fewer, so I applied this bash one-liner:

strings "$inputfile" | tr 'A-Z' 'a-z' | sed 's/[[:punct:]]/ /g' | sed 's/[[:digit:]]/ /g' | sed 's/ /\n/g' | awk 'length > 3' | sort | uniq -c | sort -nr > "$outputfile"

Then I just manually diff’d the resulting word lists – there’s so little difference it’s easy for this particular pair.
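
The comparison above was done by hand. For a larger sample, a minimal sketch of how it could be automated follows. It assumes each word list is in the "count word" format that uniq -c produces; diff_wordlists and the file names are hypothetical:

```shell
# Sketch only: report words whose counts differ between two word lists.
# $1 = author-version list, $2 = publisher-version list, each with lines
# of the form "<count> <word>" as written by `uniq -c`.
diff_wordlists() {
    a_sorted=$(mktemp)
    p_sorted=$(mktemp)
    # Flip each line to "<word> <count>" and sort by word so join can pair them.
    awk '{print $2, $1}' "$1" | LC_ALL=C sort -k1,1 > "$a_sorted"
    awk '{print $2, $1}' "$2" | LC_ALL=C sort -k1,1 > "$p_sorted"
    # Keep unpaired words from both sides (-a 1 -a 2), filling a missing count with 0.
    LC_ALL=C join -a 1 -a 2 -e 0 -o 0,1.2,2.2 "$a_sorted" "$p_sorted" |
        awk '$2 != $3 {print $1, "A=" $2, "P=" $3}'
    rm -f "$a_sorted" "$p_sorted"
}
```

Calling `diff_wordlists author_words.txt publisher_words.txt` prints one line per word whose count differs, with the author-version count after A= and the publisher-version count after P=. Words with identical counts are not shown.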

 

Results

The correspondence line changed slightly from this in the author version:

Correspondence should be addressed to Nuffield Department of Clinical Neurosciences and Department Experimental Psychology, Oxford University, Oxford OX3 9DU, UK (masud.husain@ndcn.ox.ac.uk). . (A)

To this in the publisher version (I’ve added bold-face to highlight the changes):

Correspondence should be addressed to Masud Husain, Nuffield Department of Clinical Neurosciences and Department Experimental Psychology, Oxford University, Oxford OX3 9DU, UK (e-mail: masud.husain@ndcn.ox.ac.uk). (P)

 

Reference styling has been changed. Why, I don’t know; it seems a completely pointless change. Either style seems perfectly functional to me, tbh:

Drijgers RL, Dujardin K, Reijnders JSAM, Defebvre L, Leentjens AFG. Validation of diagnostic criteria for apathy in Parkinson’s disease. Parkinsonism & Related Disorders. 2010; 16:656–660. doi:10.1016/j.parkreldis.2010.08.015. [PubMed: 20864380] (A)

to this in the publisher version:

Drijgers, R. L., Dujardin, K., Reijnders, J. S. A. M., Defebvre, L., & Leentjens, A. F. G. (2010). Validation of diagnostic criteria for apathy in Parkinson’s disease. Parkinsonism & Related Disorders, 16, 656–660. doi:10.1016/j.parkreldis.2010.08.015 (P)

In the publisher-version only (P), “Continued” has been added below some tables to acknowledge that they overflow onto the next page. Arguably the publisher has made the tables worse, as they’ve put them sideways (landscape) so they now overflow onto other pages. In the author-version (A) they are portrait-oriented, so each fits entirely on one page.

 

Finally, and most intriguingly, some of the figure-text comes out only in the publisher-version (P). In the author-version (A) the figure text is entirely image pixels, not copyable text. Yet the publisher version has introduced some clearly imperfect figure text. Look closely and you’ll see in some places, e.g. “Dyskinetic state” of figure 2 c) in (P), that the ‘ti’ has been ligatured and is copied out as a theta-like symbol:

DyskineƟc state

 

Discussion

 

I don’t know about you, but for this particular article, it doesn’t seem like the publisher has really done all that much aside from add their own header & footer material, some copyright stamps & their journal logo – oh, and ‘organizing peer-review’. How much do we pay academic publishers for these services? Billions? Is it worth it?

I plan to sample at least 100 ‘twinned’ manuscript-copies and see what the average difference is between author-manuscripts and publisher-versions. If the above is typical of most then this will be really bad news for the legacy academic journal publishers… Watch this space!

 

Thoughts or comments on how to improve the method, or relevant papers to read on this subject, are welcome. Collaboration welcome too – this is an activity that scales well between collaborators.

So, apparently Elsevier are launching a new open access mega-journal some time this year, joining the bandwagon of similar efforts from almost every other major publisher. A lovely acknowledgement of the roaring success of PLOS ONE, who did it first a long time ago.

They’re only ~8 years behind, but they’re learning. I for one am pleased they are asking the research community what they want from this new journal. One of their “key points” in the press release is: “the journal will be developed in close collaboration with the research community and will evolve in response to feedback”

Well, I’m a member of the research community. I’m a BBSRC-funded postdoc at the University of Bath. I publish research myself AND I re-use published research, so I have a dual perspective that Elsevier should find useful. Here’s my feedback on their new open access journal proposal:

 

  • Does the research community really need or want a new journal?

We have at least 27,000 other peer-reviewed journals (source: Ulrich’s). I can’t see anything in Elsevier’s proposal that’s really new, or better than anything that already exists – you’ll be hard pressed to beat PeerJ. More journals add to the fragmentation of the research literature – it’s already hard to search across all these journals effectively. Why not just accept more volume in existing journals? It’d be great if you flipped The Lancet, Cell, and Trends in Ecology and Evolution to full (100%) open access journals, and rejected fewer submitted papers that present sound science. I genuinely do not know of any researcher who has asked specifically for an additional new Elsevier journal.

 

The definition of open access always has been, and always will be this:

By “open access” to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited. (BOAI)

If you’re going to allow the CC-BY-NC-ND licence then by definition you can’t call it an open access journal. Either don’t allow that restrictive non-open licence, or call this new journal a ‘free-to-read’ journal or a ‘public access’ journal. These are the established terms for cost-free but not open journal content that the research community uses. Speak our language for a change instead of deliberately opaque legalese.

 

  • Take feedback on the design of your new journal from the WORLD not just the research community

Approximately 80% of the world’s academic research is taxpayer or charitably funded. The world is therefore your customer, not just researchers. Ask the world what they want from your new journal.

 

Take inspiration from the Panton Principles: “Science is based on building on, reusing and openly criticising the published body of scientific knowledge” – help researchers do the best science possible by not allowing them any excuses to not share non-sensitive data with their colleagues. The ‘email the author’ system has been widely shown not to work, and that matches my own experience too.

 

  • Make peer reviews open for all to see, post-publication alongside the paper

At the time of review, you can do single or double blind, but after the manuscript is accepted and published, please publish the reviews alongside the accepted paper. The research community can then see for themselves how good peer review is at your new journal. Allow people to sign their reviews if they wish to (and personally I think this is best in most circumstances).

 

  • Encourage data citation

Do I really need to explain this one? Old school academic editors have apparently been striking these out at some journals. Please make all editors aware that this is both a good thing and is encouraged.

 

  • Encourage authors to provide their ORCIDs upon submission (and ORCIDs for reviewers and editors too please)

This will help people disambiguate who’s who, which is important when there are at least 7 million active researchers.

 

  • Charge a reasonable APC ($1350 or less), and be generous with fee waivers and discounts for those that cannot afford them

Anything more than $1350 per article for a new journal in 2015 is daylight robbery. For the first year of publication you should waive charges for everyone, as everyone else does.

 

  • Provide open, full text XML

Great for text-mining. We don’t need your API. Just give us the content.

 

There you go Elsevier – that’s my feedback. If you can do ALL of the above or better, I might even publish with you myself. I have stated what I think you should do; it’s up to you now to implement it. I anticipate the launch of your glorious new journal. When your new journal comes out I shall revisit this post & score your new journal against it.

 

I encourage all other researchers & the scholarly poor who feel similarly, to also make their feelings known to Elsevier, and to add points I have perhaps overlooked. I’d say good luck Elsevier, but you don’t need luck with your fat profit margins – it’s simple to openly publish a good peer-reviewed research journal – just get on and do it already.

 

Sincerely,

 

Ross Mounce, PhD

 

 

 

 

I’ve just given an email interview to Abby Clobridge, for a forthcoming short column in Online Searcher.

I give many of these interviews and often very little material from them gets used, so I asked Abby if it was okay if I reposted what I wrote. Her response: “go for it” – thanks Abby! So here are my thoughts on Generation Open, for a readership of librarians and information professionals:

 
1) Why are Open issues particularly important for early career researchers? 

Science is digital and online. Virtually no-one hand-writes a manuscript with pen & paper. Our digital research objects, e.g. papers, data, software, if open as per opendefinition.org, can be freely copied and shared with all, for the benefit of everyone. Yet legacy business models from the past are putting awkward constraints, restrictions and obstructions on the publishing and re-use of our research objects. This is deeply wrong. For reasons of efficiency, economic benefit & morality our research should be open, particularly if it’s publicly or charitably funded. Non-open research creates horrid inefficiencies and inequalities that affect us all. Early career researchers are the future of research; we are the ones who can put things right and do research as it should be done – maximising the utility of the internet for low-cost, open dissemination, evaluation and discussion of research. If the early career community don’t act now to help change things, change simply won’t happen.

2) What kind of changes would you like to see within universities/colleges in regard to Open Access, Open Education, or Open Data? 

All lecture material should be openly-licensed and available online. It’s mad to think that lecturers all over the world are creating new slides every year with essentially the same content. Deeply inefficient. Share teaching materials. Re-use & adapt good content you find. Save time & enrich the quality of your teaching.
Teaching in many ways stems from research. There would be a lot more open content available for worry-free re-use & adaptation if research papers, particularly research figures, were openly available. I honestly don’t think research academics are all that aware of the licensing costs involved in re-using non-open research of which a traditional publisher has taken the copyright. Peter Murray-Rust has a great example: a Nature paper for which, if you want to print 10 copies for teaching purposes, it costs $1610 USD, not including the paper & ink, just the licence to reproduce!

 

It’s ridiculously obstructive and a waste of good research. No one will use that paper for teaching because of the prohibitive licensing costs. By contrast, open access papers published under the Creative Commons Attribution Licence (CC BY) can be emailed, put on Moodle, or printed at no additional cost, and no one needs to ask permission before re-use. Open removes barriers and makes life easier for everyone.

 

With respect to data & software, institutions need to train up their staff & students more in terms of research data management, reproducible research, git & version control. It’s mildly embarrassing that external (but brilliant) organisations like Software Carpentry & Data Carpentry are taking up the slack and giving everyone the training that they need. All Software Carpentry sessions in the UK have been packed, as far as I know, because that kind & quality of training simply isn’t being adequately provided at many institutions.

 

3) What can librarians do to support ECRs in regards to being open? 

 

Go out into departments and speak to people. Give energetic presentations in collaboration with an enthusiastic researcher in that department (sometimes a librarian alone just won’t get listened to). Academics sorely need to know:
  • the cost of academic journal subscriptions
  • that using journal impact factors to assess an individual’s research is statistically illiterate practice
  • the cost of re-using non-open research papers for teaching purposes (licensing)
  • what Creative Commons licences are, and why CC BY or CC0 are best for open access
  • new research tools that support open research: Zenodo, Dryad, GitHub, Sparrho, writeLaTeX etc…

 

4) What action(s) have you personally taken to support or promote openness?

 

How long a list do you want?

 

5) Anything else I’m not asking that you think is important… 

 

What do I think of NPG’s recent #SciShare announcement? Will it help people gain access to research?

 

No. I think it’s just another form of #BeggarAccess. The actual terms & conditions of the scheme are extremely limiting and do not resemble the initial hype around the scheme when it was first announced. The Open Access Button and #icanhazpdf remain the best solutions for access to proper copies of NPG articles.

 

What do I think of the attitude and prevalence of academic copyright infringement amongst early career researchers?

 

Everyone is knowingly or unknowingly committing copyright infringement at the moment. If we didn’t, research would be incredibly, painfully slow and inefficient. Ignoring silly laws is what my generation do. For context: the Napster generation was 1999-2001 – that was a long, long time ago. We know how to share files online. We know how to use torrents. I really don’t know why libraries don’t cut more subscription journals – the academic community is very good at routing around damage caused by paywalls. Have faith in our ability to find access, even if the institutional library can’t provide it. Cut subscriptions, let them go; we don’t need or want the restrictions they offer.