Show me the data!

I’ve been invited to come in and have an informal chat about open access with the Linnean Society on March 24th this month, particularly with regard to what is and what is not ‘open access’ in terms of Creative Commons licences. I write this blog post to spur on other advocates to try to encourage their society journals to use proper, open access compliant article licensing that facilitates rather than prevents text & data mining.

I have Tom Simpson at LinnSoc to thank for reaching out to make this happen. Thanks Tom!

It started from some tweets I sent a few days ago about an interesting new Zoological Journal of the Linnean Society paper by Martin Brazeau & Matt Friedman. I’d include a pretty figure from this paper if I were allowed to, but unfortunately, because it’s licensed under the Creative Commons Attribution-NonCommercial-NoDerivs licence (CC BY-NC-ND), I can’t. To repost just a figure from the paper would be to create a smaller derivative work, which the licence does not allow – I am only allowed to repost the *whole* article with absolutely no changes, which is rather impractical for a 43-page article! Wiley in particular have a history of threatening scientist bloggers for reproducing a single figure from an article (read the Shelley Batts story here).

[Image: restricted access]

It’s not just bloggers and the outreach possibilities for the paper that are harmed by the use of such restrictive licences – it also causes problems for RCUK-funded researchers. Matt Friedman is based at Oxford at the moment – if the funding for this work came from any of the UK research councils, then the choice of the CC BY-NC-ND licence could cause him problems: it is NOT compliant with RCUK’s policy on open access. Wiley should know better than to offer this licence to UK-based authors, but they have a significant conflict of interest in steering researchers towards more restrictive licensing options so that they can remain the sole proprietor of glossy reprint copies (ensured by the -NC clause). Both the -NC and -ND clauses incidentally prevent the figures from being re-used on Wikipedia, another sad restriction for the authors, who must have put a lot of effort into them.

In the realm of academic science, the application of that particular licence to the paper-as-a-whole-work just doesn’t make sense. Many digital research projects need to be able to excerpt, transform and translate research outputs such as academic papers, and in some cases create commercial value from this. My current BBSRC-funded research project ‘PLUTo: Phyloinformatic Literature Unlocking Tools. Software for making published phyloinformatic data discoverable, open, and reusable’ relies on being allowed to transform, excerpt and republish extracted content from scientific papers. With Peter Murray-Rust, we’re using text & image mining tools to generate open, re-usable phylogenetic data directly from the published literature, often directly from PDFs. The Linnean Society have several good-quality, well-respected journals which publish phylogenetic content, so they’re very much in the scope of our PLUTo work.

But clauses such as -ND stop us from using this material. It’s clear in the licence terms and conditions – we are not allowed to make any derivative works from the original. So we will have to avoid any papers published under CC BY-NC-ND. We cannot use them, and therefore they will not be cited by our project, which is rather a shame for their authors.

Above all, the CC BY-NC-ND licence simply isn’t compliant with the very definition of open access as laid down over a decade ago at the Berlin, Budapest and Bethesda meetings. Wiley are knowingly mis-labelling articles that use non-compliant licences as ‘open access’ even though they are by definition NOT open access. I hope the Linnean Society can spur Wiley to do something about this, as it is not good for the journal or its authors. Other journals using non-compliant licensing use terms like ‘public access’, ‘free access’ or ‘sponsored access’. Why can’t Wiley follow this lead? Open access is more than just free access – it enables re-use, which is critical for research projects like mine. Please stop the ‘openwashing’.

 

Further Reading:

Hagedorn, G., Mietchen, D., Morris, R., Agosti, D., Penev, L., Berendsohn, W., and Hobern, D. 2011. Creative commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys 150:127-149.

Mounce, R. 2012. Life as a palaeontologist: Academia, the internet and creative commons. Palaeontology Online 2:1-10.

Klimpel, P. Consequences, Risks, and side-effects of the license module Non-Commercial – NC [PDF] 1-22.

 


Today I received proof that Elsevier are also sending takedown notices to UK universities – asking them to take down copies of their staff’s academic research papers hosted on university webpages. The full text is further down this post (in red). It is not just Academia.edu; it is not just the University of Calgary, the University of California-Irvine, or Harvard University. Elsevier very probably are sending takedown notices to institutions and websites across the globe.
No-one is safe from these legal threats.

Not only that, but they seem to be encouraging universities to be pro-active and take down more than just the specific articles identified in the DMCA notice they send! They are encouraging universities to limit access to their own research works. This is simply disgraceful (even though I acknowledge they are technically, legally within their rights to do this because of the way in which their copyright transfer agreements are written – agreements which, incidentally, many academics are effectively forced to sign in order to get published and make progress in their careers).

For background information read:

How one publisher is stopping academics from sharing their research. The Washington Post 19/12/2013

Elsevier steps up its War On Access SVPOW 17/12/2013

[Image: Elsevier poster]

Librarians and university web admins: please publicly come out with more examples like this. Researchers, readers and taxpayers desperately need to know about this. Silence and subterfuge benefit no-one; these chilling effects must be publicly revealed.

This is the email I received with certain parts redacted:

*** Sent via Email – Inappropriate postings of Elsevier’s journal articles / DMCA Notice of Copyright Infringement ***

Dear Sir/Madam,

I write on behalf of Elsevier to bring to your attention the inappropriate posting of final published journal articles to your institutional website. I am President at Attributor (A Digimarc Company), which assists some of the world’s most prominent publishers, including Elsevier, with digital content protection (www.digimarc.com/guardian). Following the discussion below, a formal DMCA takedown request is included as Appendix A.

As you probably know, Elsevier journal article authors retain or are permitted a wide scope of scholarly use and posting on their own sites and for use within their own institutions. Those rights are more expansive when it comes to author preprints or accepted manuscripts than with respect to the final versions of published journal articles. Elsevier recognizes that in some cases authors or their institutions may not be fully aware of these rights and can by mistake post the final version of their articles to institutional websites or repositories. Unfortunately, it has come to our attention that copies of final published journal articles have, perhaps inadvertently, been posted for public access to one of your institutional websites.

I therefore request your cooperation to remove or disable access to these articles on your site, including but not limited to the articles identified in Appendix A. We have identified merely a sample in Appendix A, and as a publisher of close to 2,000 journals this might mean that more articles published by Elsevier could be found on your site. Please may I therefore draw your attention to Elsevier’s posting policy and ask for your attention to ensuring that your posting practices comply with this?
http://www.elsevier.com/about/open-access/open-access-policies/article-posting-policy#published-journal-article

In particular I note that Elsevier currently doesn’t permit posting of the final published journal article, and if there is a mandate or systematic posting mechanisms in place then Elsevier asks for a cost-free agreement with the institution before accepted author manuscripts are posted.
I would also recommend considering the use of DOI links as a way to access to the version of records of a published article. This would allow authors to list their work and to provide easy access to peers.

Finally, should you need any help in properly identifying a final published article to prevent any future improper posting, please do get in touch via the email address below.

I appreciate your anticipated cooperation and if you have any questions or feedback, or if you believe you have received this message in error (as you have received permission to post this article from Elsevier), please contact: UniversalAccess@Elsevier.com
Thank you.

Sincerely,
Eraj Siddiqui
Attributor (A Digimarc Company)

Appendix A

Copyright Infringement Notice

This notice is sent pursuant to the Digital Millennium Copyright Act (DMCA), the European Union’s Directive on the Harmonisation of Certain Aspects of Copyright and Related Rights in the Information Society (2001/29/EC), and/or other laws and regulations relevant in European Union member states or other jurisdictions.

Please remove or disable access to the infringing pages or materials identified below, as they infringe the copyright works identified below.

I certify under penalty of perjury, that I am an agent authorized to act on behalf of the owner of the intellectual property rights and that the information contained in this notice is accurate.

I have a good faith belief that use of the material listed below in the manner complained of is not authorized by the copyright owner, its agent, or the law.

My contact information is as follows:

Organization name: Attributor Corporation as agent for [Publisher Company]
Email: counter-notice@attributor.com
Phone: 650-340-9601
Mailing address:
400 South El Camino Real
Suite 650,
San Mateo, CA 94402

My electronic signature follows:
Sincerely,
/E Siddiqui/
E. Siddiqui
Attributor, Inc.

***List of Works and Location of Infringing Page or Material ***


*** INFRINGING PAGE OR MATERIAL ***

Infringing page/material that I demand be disabled or removed in consideration of the above:

Rights Holder: Reed Elsevier

Original Work: [redacted]
Infringing URL: [redacted]

UPDATE:

Dutch universities too are receiving DMCA notices from Elsevier:

[Screenshot]

@Wowter via Twitter

I’d just like to point out to anyone who asks, particularly CRC Press (part of the Taylor & Francis Group, who are in turn part of Informa PLC), that by posting the full text of my book chapter to Academia.edu I am *not* breaching the copyright transfer agreement I signed.

Upon receiving a copyright transfer agreement as a PDF from them via email, I edited the PDF to reword the agreement to terms that were more agreeable to me (e.g. I did NOT want to transfer the copyright of my work to them).

The bit of wording I changed is as follows:

As such, copyrights in the Work will not inure to the benefit of the Publisher, the Publisher will not own the publication, its title and component parts, and all publication rights. This does not permit the Publisher, in its name, to copyright in the Contribution, make applications to register its copyright claim, and to renew its copyright certificate.

I signed this reworded form as a PDF (displayed below, signature removed) and returned it to them. I have now kindly received a free ‘author copy’ of the printed book and my chapter has clearly been included, so it’s too late for CRC Press to exclude my chapter. I can only assume they agreed to the reworded terms of the contract I signed and sent them.

I doubt CRC Press would even be bothered by my actions, to be honest. They are allowing another of their books to be posted online in full, for free, so in comparison to that, my action here is puny – but it certainly emboldens me for the next time I may have to sign a CTA form…

CRC Press are welcome to non-exclusively publish my book chapter. Thank you CRC Press for agreeing to my terms and conditions.

[Image: the reworded contract (signature removed)]

Lessons one might learn from this exercise:

DO NOT GIVE AWAY THE COPYRIGHT TO YOUR WORK!
PUBLISHERS DO NOT ‘NEED’ ALL YOUR COPYRIGHT TRANSFERRED TO THEM TO PUBLISH.
ALL THAT IS NEEDED IS FOR YOU TO GRANT THEM A NON-EXCLUSIVE LICENSE TO PUBLISH.

A word of warning though… I wouldn’t recommend relying on this method of editing CTAs to get what you want. I was just lucky this time. Choosing an open access publication venue from the start is always the best option (if possible).

See also:

Mike Taylor 2010. Who Owns My Sauropod History Paper?
http://svpow.com/2010/10/13/who-owns-my-sauropod-history-paper/

Acknowledgements

October 21st, 2013 | Posted by rmounce in Open Access - (3 Comments)

I handed in my thesis not long ago, on Thursday 3rd October 2013. No idea when my viva is yet. I can’t blog many of the chapters because I haven’t convinced my manuscript co-authors of the value of preprints, yet. I’m also a bit unsure as to how some of the other chapters will be received and thus I’ll wait until after the viva before I decide what to do next with it.


Given that it’s Open Access Week this week, there is one bit of my thesis I should definitely share: the acknowledgements!

I can’t possibly thank everyone enough for the help I’ve received over the past 4 years – my knowledge, skills, and connections have been vastly extended. Note in particular the bit I’ve highlighted in bold just for this blog post – I want everyone to know how absolutely reliant I’ve been on ‘alternate’ forms of literature access during my research – this is the new ‘normal’ for many early career researchers, I fear; until open access is more prevalent we’ll have to continue to hunt, scavenge, beg, steal, and borrow for every PDF. My generation of researchers grew up using Napster, Isohunt, Library.nu. Copyright infringement is an everyday activity for many of us – WE DON’T CARE. Have you been to a conference? How many of the pictures on the speakers’ slides weren’t technically infringing someone else’s copyright? WE DON’T CARE. One can shut down or block specific portals, but doing so doesn’t really solve the basic problem: from what I’ve seen, time and time again, copyright’s only role in science is to obstruct it. My biggest hope for Open Access Week 2013 is that someone will torrent Elsevier’s back catalogue – journal/publisher torrents have been done before and will be done again! It probably won’t happen, but I can dream…

Acknowledgements

I would like to thank my supervisor, Matthew Wills, for putting up with me all this time. I have been lucky to have such accommodating and understanding support. I also must thank my lab mates Martin Hughes, Anne O’Connor, Sylvain Gerber, Katie Davis, Rob Sansom and everyone else in the Biodiversity Lab at the University of Bath – we had some great times and some brilliant times together. Sincere thanks also to the University of Oslo Bioportal computing cluster for providing me with free cloud computation for my work.

Many people have helped spur my imagination along the way with ideas for different chapters of this thesis. For this I would like to thank Ward Wheeler, Pablo Goloboff, Mark Siddall, Dan Janies, Steve Farris and the generous financial support of the Willi Hennig Society. I want to thank all those in the palaeontology community who have shared their published data with me, particularly Graeme Lloyd for his sterling work in making dinosaurian data available – I hope I have done something interesting with the data I have used and opened eyes to new possibilities. I also want to thank all those in the open science community – Peter Murray-Rust, Todd Vision, Heather Piwowar, Mark Hahnel, Martin Fenner, Geoffrey Boulton, Jenny Molloy and so many more I’ve had the pleasure of meeting in person. The energy and enthusiasm I drew from countless online discussions on Facebook, Google+ and Twitter was truly inspirational.

For facilitating greater access to scientific literature I must heartily thank the Natural History Museum, London library and archives, the #icanhazpdf community on Twitter, Wikipaleo on Facebook, References Wanted on FriendFeed, Library.nu, and SciHub. Without these additional literature access facilitators I would not have been able to read half the sources I cite in this thesis.

I must thank my wife Avril for her patience with me, especially during the write-up phase, for allowing me to go away to all these amazing conferences abroad, and for tolerating all those long nights into mid-morning when I was tapping away on my noisy keyboard.

Finally, I thank my family: Richard, Rosemary & Tara, for repeatedly encouraging me to finish my thesis – I got there in the end!

Setting-up AMI2 on Windows

October 6th, 2013 | Posted by rmounce in Content Mining - (1 Comments)

I’ve been rather preoccupied in the last few months, hence the lack of blog posts. (Apologies!)

Here’s a quick recap of some things I’ve done since July:

  • Got married in China (in September)
  • Successfully proposed that the Systematics Association (of which I’m a council member) should sign DORA
  • Gave an invited talk on open science at an INNGE workshop at INTECOL 2013
  • Completed and handed in my PhD thesis last Thursday!

So yeah, I really didn’t have time to blog until now.

But now that my PhD thesis is handed in, I can concentrate on the next step… Matthew Wills, Peter Murray-Rust and I have an approved BBSRC grant to work on further developing AMI2 to extract phylogenetic trees from the literature (born-digital PDFs).

At the moment it is at an alpha stage, so it doesn’t extract trees perfectly – it needs work. But in case you might want to try it out, I thought I’d use this post to explain how to get a test development version of it running on Windows (I don’t usually use Windows myself; I much prefer Linux). These notes are thus as much an open notebook science ‘aide memoire’ for myself as they are instructions for others!

Dependencies and IDE:

1.) You’ll need the Java JDK, Eclipse, Mercurial and Maven for starters.

If you haven’t got these set up already, you may need to set your environment variables, e.g. JAVA_HOME.
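If you’re not sure whether JAVA_HOME and the JDK are being picked up, a tiny throwaway class like the sketch below (my own quick check, not part of AMI2) will tell you – compile and run it with the same JDK you intend Eclipse and Maven to use:

public class CheckSetup {
    public static void main(String[] args) {
        // prints the version of the JVM actually running this class
        System.out.println("java.version = " + System.getProperty("java.version"));
        // prints the JAVA_HOME environment variable, or null if it isn't set
        System.out.println("JAVA_HOME    = " + System.getenv("JAVA_HOME"));
    }
}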

2.) Within Eclipse you need to install the m2e (maven integration) plugin

(from within the Eclipse GUI) click ‘Help’ -> Install New Software -> All available sites (from the dropdown) -> select m2e

 

3.) Using mercurial, clone the AMI2 suite to a clean workspace folder. The suite includes:

 

[euclid-dev itself has many dependencies, which are indicated in its POM file; you shouldn’t need to worry about them – they should be pulled in automatically. These include: commons-io, log4j, xom, joda and junit.]

4.) From within the Eclipse GUI import your workspace of AMI2 tools:

click ‘File’ -> Import -> Maven -> select ‘Existing Maven Projects’ -> Next -> select your workspace

 

5.) Test if it works. In the package explorer side-pane window you should now see folders corresponding to the six AMI2 tools listed above.

Right-click on svg2xml-dev -> select ‘Run-as’ -> JUnit Test

and sit back and watch the test run in the console at the bottom of the Eclipse GUI.

(The tests are a little slow; have patience, it may take a few minutes – it took me 175 seconds.)

To view the results, in the package explorer pane, navigate inside the svg2xml-dev document tree into /target/output/multiple-1471-2148-11-312 and click on TEXT.0 to see what the text extraction looks like. You should see something like this below (note it successfully gets italics, bold, and superscripts):

—————————————————————————————————-

Gene conversion and purifying selection shape nucleotide variation in gibbon L/M opsin genes

Tomohide Hiwatashi 1 , Akichika Mikami 2,8 , Takafumi Katsumura 1 , Bambang Suryobroto 3 , Dyah Perwitasari-Farajallah 3,4 , Suchinda Malaivijitnond 5 , Boripat Siriaroonrat 6 , Hiroki Oota 1,9 , Shunji Goto 7,10 and Shoji Kawamura 1*

 

Abstract Background: Routine trichromatic color vision is a characteristic feature of catarrhines (humans, apes and Old World monkeys). This is enabled by L and M opsin genes arrayed on the X chromosome and an autosomal S opsin gene. In non-human catarrhines, genetic variation affecting the color vision phenotype is reported to be absent or rare in both L and M opsin genes, despite the suggestion that gene conversion has homogenized the two genes. However, nucleotide variation of both introns and exons among catarrhines has only been examined in detail for the L opsin gene of humans and chimpanzees. In the present study, we examined the nucleotide variation of gibbon (Catarrhini, Hylobatidae) L and M opsin genes. Specifically, we focused on the 3.6~3.9-kb region that encompasses the centrally located exon 3 through exon 5, which encode the amino acid sites functional for the spectral tuning of the genes.

Results: Among 152 individuals representing three genera ( Hylobates ,  Nomascus and  Symphalangus ), all had both L and M opsin genes and no L/M hybrid genes. Among 94 individuals subjected to the detailed DNA sequencing, the nucleotide divergence between L and M opsin genes in the exons was significantly higher than the divergence in introns in each species. The ratio of the inter-LM divergence to the intra-L/M polymorphism was significantly lower in the introns than that in synonymous sites. When we reconstructed the phylogenetic tree using the exon sequences, the L/M gene duplication was placed in the common ancestor of catarrhines, whereas when intron sequences were used, the gene duplications appeared multiple times in different species. Using the GENECONV program, we also detected that tracts of gene conversions between L and M opsin genes occurred mostly within the intron regions.

Conclusions: These results indicate the historical accumulation of gene conversions between L and M opsin genes in the introns in gibbons. Our study provides further support for the homogenizing role of gene conversion between the L and M opsin genes and for the purifying selection against such homogenization in the central exons to maintain the spectral difference between L and M opsins in non-human catarrhines.

 

Background In catarrhine primates (humans, apes and Old World monkeys) the L and M opsin genes are closely juxta-posed on the X chromosome and, in combination with the autosomal S opsin gene, enable routinely trichro-matic color vision [1,2]. The L and M opsin genes have a close evolutionary relationship and are highly similar in nucleotide sequence (~96% identity). Among 15

* Correspondence: kawamura@k.u-tokyo.ac.jp 1 Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8562, Japan Full list of author information is available at the end of the article

amino acid differences between the human L and M opsin genes, three account for the main shifts in spectral sensitivities and tuning [3-9]. The organization of the L and M opsin genes among humans is known to be variable and includes the absence of an L or M opsin gene or the presence of L/M hybrid genes with an intermediate spectral sensitivity. A high incidence (approximately 3-8%) of color vision   deficien-cies in males results as a consequence [10].
Hiwatashi et al . BMC Evolutionary Biology 2011, 11 :312 http://www.biomedcentral.com/1471-2148/11/312
—————————————————————————————————-

If you’d like to try your own PDFs with it you’ll need to do two things:

A.) place the PDF(s) to be tested within the folder:    svg2xml-dev/src/test/resources/pdfs

B.) edit the file:    svg2xml-dev/src/test/java/org/xmlcml/svg2xml/pdf/PDFAnalyzerTest.java

so that

new PDFAnalyzer().analyzePDFFile(new File(” …

points at your file(s).
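For concreteness, here is a minimal sketch of an equivalent standalone test you could drop alongside PDFAnalyzerTest.java – the class name and the file name myPaper.pdf are my own placeholders, not things that ship with svg2xml-dev:

package org.xmlcml.svg2xml.pdf;

import java.io.File;

import org.junit.Test;

public class MyOwnPDFTest {

    // Hypothetical sketch: run the analyzer over one of your own PDFs.
    // "myPaper.pdf" is a placeholder name, standing in for whatever you
    // copied into svg2xml-dev/src/test/resources/pdfs in step A above.
    @Test
    public void analyzeMyOwnPDF() {
        new PDFAnalyzer().analyzePDFFile(new File("src/test/resources/pdfs/myPaper.pdf"));
    }
}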

 

You can then right-click ‘multipletest’ from within PDFAnalyzerTest.java and select Run As -> JUnit Test

 

We’re working with BMC journal content for the moment, and when we perfect it on this, we will expand our scope to include subscription access content too.

 

 

Hack4ac recap

July 9th, 2013 | Posted by rmounce in BMC | eLife | Hack days | Open Access | Open Data | Open Science | PeerJ | PLoS - (4 Comments)

Last Saturday I went to Hack4Ac – a hackday in London bringing together many sections of the academic community in pursuit of two goals:

  • To demonstrate the value of the CC-BY licence within academia. We are interested in supporting innovations around and on top of the literature.
  • To reach out to academics who are keen to learn or improve their programming skills to better their research. We’re especially interested in academics who have never coded before


The list of attendees was stellar, cross-disciplinary (inc. Humanities) and international. The venue (Skills Matter) & organisation were also suitably first-class – lots of power leads, spare computer mice, projectors, whiteboards, good wi-fi, separate workspaces for the different self-assembled hack teams, tea, coffee & snacks all throughout the day to keep us going, prizes & promo swag for all participants…

The principal organizers, Jason Hoyt (PeerJ, formerly at Mendeley) & Ian Mulvany (Head of Tech at eLife), thus deserve a BIG thank you for making all this happen. I hear this may also be turned into a fairly regular set of meetups, which will be great for keeping up the momentum of innovation going on right now in academic publishing.

The hack projects themselves…

The overall winner of the day was ScienceGist as voted for by the attendees. All the projects were great in their own way considering we only had from ~10am to 5pm to get them in a presentable state.

ScienceGist

 

This project was initiated by Jure Triglav, building upon his previous experience with Tiris. It aims to provide an open platform for post-publication summaries (‘gists’) of research papers: shorter, more easily understandable accounts of the content of each paper.

I also led a project under the catchy title of Figures → Data, whereby we tried to provide added value by taking CC-BY bar charts and histograms from the literature and attempting to re-extract the numerical data from those plots with automated efforts using computer vision techniques. On my team for the day I had Peter Murray-Rust, Vincent Adam (of HackYourPhD) and Thomas Branch (Imperial College). This was handy because I know next to nothing about computer vision – I’m Your Typical Biologist ™ in that I know how to script in R, perl, bash and various other things, just enough to get by but not nearly enough to attempt something ambitious like this on my own!

Forgive me the self-indulgence if I talk about this Figures → Data project more than I do the others, but I thought it would be illuminating to discuss the whole process in detail…

In order to share links between our computers in real-time, and to share initial ideas and approaches, Vincent set up an etherpad here to record our notes. You can see the development of our collaborative note-taking using the timeslider function below (I did a screen recording of it for posterity using recordmydesktop):

In this etherpad we document that there are a variety of ways in which to discover bar charts & histograms:

  • figuresearch is one such web-app that searches the PMC OA subset for figure captions & figure images. With this you can find over 7,000 figure captions containing the word ‘histogram’ (you would assume that the corresponding figure would contain at least one histogram for 99% of those figures, although there are exceptions).
  • figshare has nearly 10,000 hits for histogram figures, whilst BMC & PLOS can also be commended for providing the ability to search their literature stack by just figure captions, making the task of figure discovery far more efficient and targeted.

Jason Hoyt was in the room with us for quite a bit of the hack and clearly noted the search features we were looking for – just yesterday he tweeted: “PeerJ now supports figure search & all images free to use CC-BY (inspired by @rmounce at #hack4ac)” [link]. I’m really glad to see our hack goals helped Jason improve content search for PeerJ to better serve the needs (albeit somewhat niche in this case) of real researchers. It’s this kind of unique confluence of typesetters, publishers, researchers, policymakers and hackers at doing-events like this that can generate real change in academic publishing.

The downside of our project was that we discovered someone’s done much of this before. ReVision: Automated Classification, Analysis and Redesign of Chart Images [PDF] was an award-winning paper at an ACM conference in 2011. Much of this project would have helped our idea, particularly the figure-classification tech. Yet sadly, as with so much of ‘closed’ science, we couldn’t find any open source code associated with this project. There were comments that this type of non-code-sharing behaviour, blocking re-use and progress, is fairly typical in computer science & ACM conferences (I wouldn’t know, but it was muttered…). If anyone does know of the existence of related open source code available for this project do let me know!

So… we had to start from a fairly low level ourselves: Vincent & Thomas tried MATLAB & C based approaches with OpenCV and their code is all up on our project github. Peter tried using the AMI2 toolset, particularly the Canny algorithm, whilst I built up an annotated corpus of 40 CC-BY bar charts & histograms for testing purposes. Results of all three approaches can be seen below in their attempts to simplify this hilarious figure about dolphin cognition from a PLOS paper:

The plastic fish just wasn't as captivating...

“Figure 5. Total time spent looking at different targets.” from Siniscalchi M, Dimatteo S, Pepe AM, Sasso R, Quaranta A (2012) Visual Lateralization in Wild Striped Dolphins (Stenella coeruleoalba) in Response to Stimuli with Different Degrees of Familiarity. PLoS ONE 7(1): e30001. doi:10.1371/journal.pone.0030001 CC-BY

Peter’s results (using AMI2):

 

Thomas’s results (OpenCV & C):

 

Vincent’s results (OpenCV & MATLAB & bilateral filtering)

We might not have won 1st prize but I think our efforts are pretty cool, and we got some laughs from the slides presenting our day’s work at the end (e.g. see below). Importantly, *everything* we did that day is openly available on github to re-use, re-work and improve upon (I’ll ping Thomas & Vincent soon to make sure their code contributions are openly licensed). Proper full-stack open science, basically!

some figures are just awful
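If you fancy having a go at this sort of thing yourself, the sketch below shows roughly the kind of edge-detection step Peter and Thomas were experimenting with. It is written against OpenCV’s Java bindings (3.x-style) purely for illustration – our actual day-of code, in MATLAB and C, is on the project github – and the input file name is made up:

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class ChartEdges {

    static {
        // load the native OpenCV library (assumes the Java bindings are installed)
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
    }

    public static void main(String[] args) {
        // "barchart.png" is a hypothetical input – any CC-BY bar chart image will do
        Mat grey = Imgcodecs.imread("barchart.png", Imgcodecs.IMREAD_GRAYSCALE);
        Mat edges = new Mat();
        // light blur to suppress speckle, then Canny edge detection
        Imgproc.GaussianBlur(grey, grey, new Size(3, 3), 0);
        Imgproc.Canny(grey, edges, 50, 150);
        // writes out the detected outlines of the bars, ready for measuring heights
        Imgcodecs.imwrite("barchart-edges.png", edges);
    }
}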

 

Other hack projects…

As I thought would happen, I’ve waffled on about our project. If you’d like to know more about the other projects, hopefully someone else will blog about them at greater length (sorry!) – I’ve got my thesis to write, y’know! ;)

You can find out more about them all either on the Twitter channel #hack4ac or on the hack4ac github page. I’ll write a little bit more below, but it’ll be concise, I warn you!

  • Textmining PLOS Author Contributions

This project has a lovely website for itself: http://hack4ac.com/plos-author-contributions/ and so needs no more explanation.

  • Getting more openly-licensed content on wikipedia

This group had problems with the YouTube API I think. Ask @EvoMRI (Daniel Mietchen) if you’re interested…

  • articlEnhancer

Not content with helping out the PLOS author contribs project Walther Georg also unveiled his own article enhancer project which has a nice webpage about it here: http://waltherg.github.io/articlEnhancer/

  • Qual or Quant methods?

Dan Stowell & co used NLP techniques on full-text accessible CC-BY research papers to classify them in an automated way, determining whether they were qualitative or quantitative papers (or a mixture of the two). The last tweeted report of it sounded rather promising: “Upstairs at #hack4ac we’re hacking a system to classify research papers as qual or quant. first results: 96 right of 97. #woo #NLPwhysure” More generally, I believe their idea was to enable a “search by methods” capability, which I think would be highly sought-after if they could do it. Best of luck!
Apologies if I missed any projects. Feel free to post epic-long comments about them below ;)

 

 

 

This post was originally posted over at the LSE Impact blog, where I was kindly invited to write on this theme by the Managing Editor. It’s a widely read platform and I hope it inspires some academics to upload more of their work for everyone to read and use.

Recently I tried to explain on Twitter, in a few tweets, how everyone can take easy steps towards open scholarship with their own work. It’s really not that hard and potentially very beneficial for your own career progress – open practices enable people to read & re-use your work, rather than letting it gather dust, unread and undiscovered, in a limited access venue as is traditional. For clarity I’ve rewritten the ethos of those tweets below:

Step 1: before submitting to a journal or peer-review service, upload your manuscript to a public preprint server.

Step 2: after your research is accepted for publication, deposit all the outputs – full-text, data & code – in subject or institutional repositories.

The above is the concise form of it, but as with everything in life the devil is in the detail, and there is much to explain, so I will elaborate upon these steps in this post.

Step 1: Preprints

Uploading a preprint before submission is technically very easy to do – it takes just a few clicks – but the barrier that prevents many from doing this in practice is cultural and psychological. In disciplines like physics it’s completely normal to upload preprints to arXiv.org, and their submission to a journal in some cases has more to do with satisfying the requirements of the Research Excellence Framework exercise than any real desire to see them in a journal. Many preprints on arXiv get cited and are valued scientific contributions, even without ever being published in a journal. That said, even within this community author perceptions differ as to exactly when in the publication cycle to upload a preprint.

Within biology it’s relatively unheard of to upload a preprint before submission, but that’s likely to change this year because of an excellent, well-argued article advocating their use in biology and the very many different outlets available for them. My own experience of this has been illuminating – I recently co-authored a paper openly on github and the preprint was made available with a citable DOI via figshare. We’ve received a nice comment, more than 250 views and a citation from another preprint – all before our paper has been ‘published’ in the traditional sense. I hope this illustrates well how open practices really do accelerate progress.

This is not a one-off occurrence either. As with open access papers, freely accessible preprints have a clear citation advantage over traditional subscription access papers:

[Graph: citation advantage of freely accessible preprints]

Outside of the natural sciences the situation is similar; Martin Fenner notes that in the social sciences (SSRN) and economics (RePEc) preprints are also common, either in this guise or as ‘working papers’ – the name may be different but the pre-submission accessibility is the same. Yet I suspect that, as in biology, this practice isn’t yet mainstream in the Arts & Humanities – perhaps it’s just a matter of time before this cultural shift occurs (more on this later in the post…).

There is one important caveat to mention with respect to posting preprints – a small minority of conservative, traditional journals will not accept articles that have been posted online prior to submission. You might well want to check SHERPA/RoMEO before you upload your preprint, to ensure that your preferred destination journal accepts preprint submissions. There is an increasingly apparent, grass-roots-led push to convince these journals that preprint submissions should be allowed, and some of these efforts have already succeeded.

If even much-loathed publishers like Elsevier allow preprints, unconditionally, I think it goes to show how rather uncontroversial preprints are. Prior to submission it’s your work and you can put it anywhere you wish.

 

Step 2: Postprints

 

Unlike preprints, postprints are a little trickier. Publishers like to think that they have the exclusive right to publish your peer-reviewed work. The exact terms will vary from journal to journal, depending on the copyright or licensing agreement you might have signed. Some publishers try to enforce ‘embargoes’ upon postprints, to maintain the artificial scarcity of your work and their monopoly of control over access to it. But rest assured, at some point – often just 12 months after publication – you’ll be ‘allowed’ to upload copies of your work to the public internet (again, SHERPA/RoMEO gives excellent information with respect to this).

So, assuming you already have some form of research output(s) to show for your work, you’ll want these to be discoverable, readable and re-usable by others – after all, what’s the point of doing research if no-one knows about it! If you’ve invested a significant amount of time writing a publication, gathering data, or developing software – you want people to be able to read and use this output. All outputs are important, not just publications. If you’ve published a paper in a traditional subscription access journal, then most of the world can’t read it. But, you can make a postprint of that work available, subject to the legal nonsense referred to above.

If it’s allowed, why don’t more people do it?

Similar to the cultural issues discussed with preprints, for some reason researchers on the whole don’t tend to use institutional repositories (IRs) to make their work more widely available. My IR at the University of Bath lists metadata for over 3,300 published papers, yet relatively few of those metadata records have a fulltext copy of the item deposited with them, for various reasons. Just ~6.9% of records have fulltext deposits, as published back in June 2011.

I think it’s because institutional repositories have an image problem: some are functional but extremely drab. I also hear researchers, full of disdain, saying of their IRs (I paraphrase):

“Oh, that thing? Isn’t that just for theses & dissertations – you wouldn’t put proper research there”

All this is set to change though, as researchers are increasingly being mandated to deposit their fulltext outputs in IRs. One particularly noteworthy driver of change in this realm could be the newly-launched Zenodo service. Unlike Academia.edu or ResearchGate, which are for-profit operations and really just websites in many respects, Zenodo is a proper repository: it supports harvesting of content via the OAI-PMH protocol, all metadata about the content is CC0, and it’s a not-for-profit operation. Crucially, it provides a repository for academics less well-served by existing repository systems – not all research institutions have a repository, and independent or retired scholars also need a discoverable place to put their postprints. I think the attractive, modern look, and altmetrics to demonstrate impact, will also add that missing ‘sex appeal’ to provide the extra incentive to upload.
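To make the ‘proper repository’ point concrete: OAI-PMH is just HTTP returning XML, so anyone can harvest repository metadata with a few lines of code. Here’s a rough sketch – note that the endpoint URL is my assumption rather than something I’ve verified, so check Zenodo’s own documentation:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class OaiPmhPeek {
    public static void main(String[] args) throws Exception {
        // verb=Identify describes the repository;
        // verb=ListRecords&metadataPrefix=oai_dc would stream Dublin Core records.
        // The endpoint below is an assumption -- consult Zenodo's docs for the real one.
        URL endpoint = new URL("https://zenodo.org/oai2d?verb=Identify");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(endpoint.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // print the raw XML response
            }
        }
    }
}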


Providing Access to Your Published Research Data Benefits You

A new preprint on PeerJ shows that papers with associated open research data have a citation advantage. Furthermore, other research has shown that willingness to share research data is related to the strength of the evidence and the quality of the results. Traditional repository software was designed around handling metadata records and publications; it doesn’t tend to be great at storing or visualizing research data. But a new development in this arena is the use of CKAN software for research data management. Originally CKAN was developed by the Open Knowledge Foundation to help make open government data more discoverable and usable; the UK, the US, and other governments around the world now use this technology to make data available. Now research institutions like the University of Lincoln are also using it for research data management, and like Zenodo the interface is clean, modern and provides excellent discoverability.

[Screenshot: CKAN]

Repositories are superior for enabling discovery of your work

Even though I use Academia.edu & ResearchGate myself, they’re not perfect solutions. If someone is looking for your papers, or a particular paper that you wrote, these websites do well in making your output discoverable via a simple Google search. But interestingly, for more complex queries, these simple websites don’t provide good discoverability.

An example: I have a fulltext copy of my Nature letter on Academia.edu, but it can’t be found via Google Scholar – whereas the copy in my institutional repository at Bath can. This is the immense value of interoperable and open metadata. Academics would do well to think closely about how this affects the discoverability of their work online.

The technology for searching across repositories for freely accessible postprints isn’t as good as I’d want it to be, but repository search engines like BASE, CORE and Repository Search are improving day by day. Hopefully, one day we’ll have a working system where you can paste in a DOI and it’ll take you to a freely available postprint copy of the work; Jez Cope has an excellent demo of this here.

Open scholarship is now open to all

So, if there aren’t any suitable fee-free journals in your subject area (1), you find you don’t have funds to publish a gold open access article (2), and you aren’t eligible for an OA fee waiver (3), fear not. With a combination of preprint & postprint postings, you too can make your research freely available online, even if it has the misfortune to be published in a traditional subscription access journal. Upload your work today!

This is a re-post, originally blogged by me on the Open Knowledge Foundation main blog here.

Recently Science Europe published a clear and concise position statement titled:
Principles on the Transition to Open Access to Research Publications

This is an extremely timely and important document that clarifies what governments and research funders should expect during the transition to open access. Unlike the recent US OSTP public access policy, which allows publishers to apply up to a 12-month access embargo (to the disgust of some scientists like Michael Eisen) on publicly-funded research, this new Science Europe statement makes clear that an embargo of at most 6 months should be accepted for publicly funded STEM research. The recent RCUK (UK research councils) open access policy also requires a 6-month embargo at most, with some caveats.

But among the many excellent principles is a particularly bold and welcome proclamation:

the hybrid model, as currently defined and implemented by publishers, is not a working and viable pathway to Open Access. Any model for transition to Open Access supported by Science Europe Member Organisations must prevent ‘double dipping’ and increase cost transparency

Hybrid options are typically far more expensive than ‘pure’ open access journal costs, and they generally do not aid transparency or the wider transition to open access.

The Open Knowledge Foundation heartily endorses these principles, as together with the above they respect and reinforce the need for free access AND full re-use rights to scientific research.


About Science Europe:

Science Europe is an association of European Research Funding Organisations and Research Performing Organisations, based in Brussels. At present Science Europe comprises 51 Research Funding and Research Performing Organisations from 26 countries, representing around €30 billion per annum.