Show me the data!

I note with interest that article publication charge data from the University of Edinburgh has been released on Figshare today.

There are some fascinating numbers in there and I applaud the transparency.

One particular article that caught my eye is this one:
Paradoxical effects of heme arginate on survival of myocutaneous flaps

Page charges amounting to £1330.45 were paid for this article, and that’s for page charges alone – the journal did not make the article open access, nor was it asked to.

I also noted the research was paid for by the MRC – a top-class UK government-funded agency. As I am a full UK taxpayer, I feel especially entitled to read this research!

The MRC has a very clear policy on open access – the article must either:

1.) be made immediately open access by the publisher upon publication; ‘journal-mediated OA’ (sometimes called ‘gold’)

OR

2.) be made publicly accessible via the ‘repository-mediated access’ route: some kind of copy of the work must be publicly accessible no more than 6 months after publication (sometimes called ‘green’)

Since the article clearly wasn’t open access at the publisher, I assume the authors have elected to choose the repository-access method. The article was formally published on 1st January 2014, so between then and now, clearly at least 6 months have elapsed. 7 months and 20 days to be precise. So where is the full text of this article?

It’s not in PubMed (abstract-only)
Nor Europe PubMed Central (abstract-only)
Nor the University of Edinburgh institutional repository (abstract-only)

So it would appear to me that the rules of the funding body (MRC) may have been broken here (sincere apologies if I am wrong about this), something all too easy to do if the repository route is chosen.
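The embargo arithmetic above can be checked with a few lines of Python. This is only a sketch: `add_months` is a hypothetical helper written for this example, and the final date is simply what "7 months and 20 days" after the stated publication date works out to.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after `d`.

    Hypothetical helper: clamps the day to the end of the target month
    (e.g. 31 Jan + 1 month -> 28/29 Feb).
    """
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


published = date(2014, 1, 1)              # formal publication date given in the post
embargo_ends = add_months(published, 6)   # end of the 6-month 'green' window
# The date implied by "7 months and 20 days" after publication.
elapsed_to = add_months(published, 7) + timedelta(days=20)

if __name__ == "__main__":
    print("Embargo ended:", embargo_ends.isoformat())
    print("7 months and 20 days after publication:", elapsed_to.isoformat())
    print("Past the embargo:", elapsed_to > embargo_ends)
```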

Wouldn’t it have been better to spend those page charges on making the paper immediately open access?

In the meantime, I have sent the University of Edinburgh open access team (openaccess@ed.ac.uk) an email to ask where the full text for this paper is, and I await their reply.

How best to link the figure, to the paper & the underlying data?

Whilst visiting EBI, Hinxton yesterday, Robert Hanson (computational chemist) reminded me of an interesting hack you can do to embed data in images.

Back in 2010 it was widely reported that people were using Flickr to transmit data (secretly) in images.

This general technique is called Steganography.

Turns out I can use this hack in my project too…

As a proof of concept, I’ve uploaded one recent PLOS ONE phylogeny figure to my ‘plosone-phylo’ flickr account:

https://www.flickr.com/photos/123621741@N08/14385231987/in/photostream/

In this special file, I’ve embedded the nexus, NeXML & Bibtex file from TreeBASE that correspond to the image. This website has cross-platform instructions – it’s remarkably simple.

So now if you download that special file I put on Flickr (this hack only works if you download the original, not resized versions) and ‘unzip’ the image file you’ll reveal the hidden data embedded in the image:  nexus, NeXML & Bibtex.
Try it!
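For anyone who would rather script this than follow the website's instructions, here is a minimal Python sketch. It assumes the hack is the common "append a ZIP archive to the end of an image file" trick, which is what being able to 'unzip' the image suggests; the function names and file names are made up for illustration, and the "image" in the demo is just placeholder bytes.

```python
import io
import zipfile
from typing import Dict


def embed_files(image_bytes: bytes, files: Dict[str, bytes]) -> bytes:
    """Return the image bytes with a ZIP archive of `files` appended.

    Image viewers generally read from the start of the file and ignore the
    trailing bytes, while ZIP readers locate the archive from the end.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return image_bytes + buffer.getvalue()


def extract_files(combined: bytes) -> Dict[str, bytes]:
    """Read back the files hidden after the image data."""
    with zipfile.ZipFile(io.BytesIO(combined)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


if __name__ == "__main__":
    # Placeholder bytes standing in for a real figure image.
    fake_image = b"\xff\xd8\xff\xe0" + b"not-really-pixels" + b"\xff\xd9"
    hidden = {"tree.nex": b"#NEXUS\n", "refs.bib": b"@article{example}\n"}
    combined = embed_files(fake_image, hidden)
    print(sorted(extract_files(combined)))
```

With a real figure you would read the image with `open(path, "rb")` and write the combined bytes back out under an image file name. As noted above, it only survives if the hosting site serves the original file untouched.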

Screenshot showing what to click to download the original image


This certainly isn’t the ‘optimal’ way of doing things. But it is a nice way of keeping everything together in ONE file. Maybe you might have a use for this hack too?

A quick blog from Meise, Belgium at the Pro-iBiosphere wrap-up event.

Yesterday I gave a talk about my progress liberating, and making searchable, OA figures from academic literature:

 

I’ve had a lot of great feedback and interest in what I’m doing with this.

Cyndy Parr has pointed out that EOL are on Flickr too, and have been marking-up photographs of taxa with ‘machine tags’.

I will now start to experiment with how I can incorporate taxonomic & geographical machine tags into my workflow when uploading images to Flickr. As an example I have added binomial tags to two figures from an OA Zootaxa paper on ‘Urothrips’: https://www.flickr.com/photos/121174006@N06/13379028204/in/set-72157642842813323

 

see bottom right hand corner for the added 'machine tags'
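A machine tag is just a specially formatted tag of the form `namespace:predicate=value`. Below is a small Python sketch for building them consistently before upload. The `machine_tag` helper is hypothetical, the naming rule it enforces (lowercase letter first, then letters, digits or underscores) is my understanding of Flickr's convention rather than something checked against their current documentation, and the species name and coordinates are purely illustrative.

```python
import re

# Assumed naming rule for namespaces and predicates (see lead-in).
_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def machine_tag(namespace: str, predicate: str, value: str) -> str:
    """Build a machine tag string such as taxonomy:binomial="Homo sapiens"."""
    for part in (namespace, predicate):
        if not _NAME.match(part):
            raise ValueError(f"invalid machine tag component: {part!r}")
    value = value.strip()
    if not value:
        raise ValueError("machine tag value must not be empty")
    # Values containing spaces are wrapped in double quotes so they stay one tag.
    if " " in value:
        value = f'"{value}"'
    return f"{namespace}:{predicate}={value}"


if __name__ == "__main__":
    print(machine_tag("taxonomy", "genus", "Urothrips"))
    print(machine_tag("taxonomy", "binomial", "Homo sapiens"))  # illustrative name
    print(machine_tag("geo", "lat", "50.93"))                   # illustrative value
```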

Jeremy Miller from Naturalis is also very interested in OA Zootaxa content from the point of view of spiders. He gave a talk on Data Visualization on behalf of his team from the Leiden hackday. Luckily, with no prior ‘special’ mark-up, by searching ‘Araneae’ I could show Jeremy the promise of what I’m doing on Flickr. Many phylogenies containing spider taxa came up in the search, many of which he immediately recognized as from his own open access publications! With a little bit of work to further mark-up the attributes he’s interested in, I might be able to provide something of real use – the ability to search figure images/captions across hundreds of open access journals, from many different publishers with just ONE search!

The Bouchout Declaration will be launched today at this meeting. I’m happy to say I facilitated the signing of this declaration by Open Knowledge. Many other organisations have signed this declaration and I hope it makes a splash – we need science to be open to do good science!

Finally, I’ve also potentially got a new research collaboration going (more of which later!).
It’s been well worth the trip!

[Update: the conference itself will be in November, 2014 - this is just the first announcement!]

I’m super excited to announce I’m part of the international organizing committee for OpenCon 2014:

You can read the official first press release about this event here:

http://www.righttoresearch.org/act/opencon/announcement

 

Here’s an excerpt from it:

“From Nigeria to Norway, the next generation is beginning to take ownership of the system of scholarly communication which they will inherit,” said Nick Shockey, founding Director of the Right to Research Coalition. “OpenCon 2014 will support and accelerate this rapidly growing movement of students and early career researchers advocating for openness in research literature, education, and data.

The first event of its kind, OpenCon 2014 builds on the success of the Berlin 11 Satellite Conference for Students and Early Stage Researchers, which brought together more than 70 participants from 35 countries to engage on Open Access to scientific and scholarly research. The interest, energy, and passion from the student and researcher participants and the Open Access movement leaders who attended made a clear case for expanding the event in size and duration, and to broaden the scope to related areas of the Openness movement.”

 

Last year, I was also part of the organizing committee for the event that this has grown from – the Berlin 11 Satellite conference:

The Berlin 11 Satellite Conference was really exciting but only a 1-day event before the ‘main’ Berlin 11 event – an assemblage of students and ECRs from literally all over the world (attending with generous full funding support), including representatives from (in no particular order) China, India, Saudi Arabia, Georgia, Tanzania, Tasmania(!), Kenya, Nigeria, Ghana, Uganda, Colombia, FYR Macedonia, Mexico, Brazil, Sweden, Holland, Denmark, Poland, Portugal, Canada, the US, the UK… So don’t worry about where you are in the world – as long as you’re a student or ECR you’ll be eligible to apply for OpenCon 2014 (places are limited though!).

As a reminder, at the event last year we had Jack Andraka and Mike Taylor amongst the guest speakers. It was such a comprehensive success that it’s been expanded into a full 3-day event this year, with a broader scope too, to include Open Data and OER, not just OA (they’re all obviously inter-related problems; better to tackle the integrated set of problems rather than aspects in isolation!).

Applications for OpenCon 2014 will open in August. For more information about the conference and to sign up for updates, visit www.opencon.net

I promise you this – it’s going to be BIG and I’m stoked to be part of an international organizing committee helping to make this happen.

OpenCon 2014 is also looking for additional sponsorship, particularly for Travel Scholarships to ensure global representation at this meeting, so if you have a marketing budget to spend, or are feeling generous please do have a look at the sponsorship opportunities.

I’m proud to announce an interesting public output from my BBSRC-funded postdoc project:
PLUTo: Phyloinformatic Literature Unlocking Tools. Software for making published phyloinformatic data discoverable, open, and reusable

MOAR PHYLOGENY!

Screenshot of some of the PLOS ONE phylogeny figure collection on Flickr

I’ve made openly available my first-pass filter of PLOS ONE phylogeny figures (I’m not in any way claiming this is *all* of them).

This curated & tagged image collection is on Flickr for easy browsing: http://bit.ly/PLOStrees

As well as on Github for version control, open archiving, and collaboration (I have remote collaborators):

https://github.com/rossmounce/P1-phylo-part1

https://github.com/rossmounce/P1-phylo-part2

https://github.com/rossmounce/P1-phylo-part3

https://github.com/rossmounce/P1-phylo-part4

(Github doesn’t like repositories over 1GB so I’ve had to split-up the content between 4 separate repositories)
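If you face the same size limit, one way to plan such a split is a simple first-fit-decreasing packing of files into size-capped groups. The sketch below is not how the four repositories above were actually divided; the function name, file names and sizes are all hypothetical.

```python
from typing import Dict, List


def split_into_repos(sizes: Dict[str, int], limit: int) -> List[List[str]]:
    """Group files so that each group's total size stays within `limit`.

    First-fit decreasing: take files largest-first and put each one in the
    first group that still has room, opening a new group when none does.
    """
    groups: List[List] = []  # each entry is [remaining_capacity, [file names]]
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        if size > limit:
            raise ValueError(f"{name} alone exceeds the limit")
        for group in groups:
            if group[0] >= size:
                group[0] -= size
                group[1].append(name)
                break
        else:
            groups.append([limit - size, [name]])
    return [names for _, names in groups]


if __name__ == "__main__":
    # Hypothetical sizes in megabytes, with a 1000 MB cap per repository.
    example = {"batch_a": 600, "batch_b": 500, "batch_c": 400, "batch_d": 300}
    for number, names in enumerate(split_into_repos(example, 1000), start=1):
        print(f"part{number}: {names}")
```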

 

Why?

The aim of the PLUTo project is to re-extract & liberate phylogenetic data & associated metadata from the research literature. Sadly, only ~4% of modern published phylogenetic analysis studies make their underlying data available. Another study finds that if you ask the authors for this data, only 16% will be kind enough to reply with the requested data!

This particular data type is a cornerstone of modern evolutionary biology. You’ll find phylogenetic analyses across a whole host of journal subjects – medical, ecological, natural history, palaeontology… There are also many different ways in which this data can be re-used e.g. supertrees  & comparative cladistics. Not to mention, simple validation studies &/or analyses which extend-upon or map new data on to a phylogeny. It’s really useful data and we should be archiving it for future re-use and re-analysis. To my great delight, this is what I’m being paid to attempt to do for my first postdoc; on a grant I co-wrote – finding & liberating phylogenetic data for everyone!

 

Why PLOS ONE?

 

  •  It’s a BOAI-compliant open access journal that publishes most articles under CC BY, with a few under CC0.
    • This means I can openly re-publish figures online (provided sufficient attribution is given) — no need to worry about DMCA takedown notices or ‘getting sued’! This makes the process of research much easier. Private, non-public, access-restricted repositories for collaboration are a hassle I’d rather do without.
  • It’s a high-volume ‘megajournal’ publishing ~200 articles per day, many of which include phylogenetic analyses.
    • Thus it’s worthwhile establishing a regular daily or weekly method for parsing-out phylogenetic tree figures from this journal
  • Killer feature: as far as I know, PLOS are the only publisher to embed rich metadata inside their figure image files.
    • This makes satisfying the CC BY licence trivially easy — sufficient attribution metadata is already embedded in the file. Just ensure that wherever you’re uploading the file to doesn’t wipe this embedded data, hence why I chose Flickr as my initial upload platform.
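To check whether an upload platform has wiped the embedded metadata, you can compare the original file with the copy you download back. The sketch below makes a big assumption: that the metadata is stored as an XMP packet (an XML block delimited by `<x:xmpmeta ...>` and `</x:xmpmeta>`). I have not confirmed which container PLOS actually uses, so treat the function names and the format as illustrative only.

```python
from typing import Optional

_XMP_START = b"<x:xmpmeta"
_XMP_END = b"</x:xmpmeta>"


def find_xmp_packet(data: bytes) -> Optional[str]:
    """Return the first XMP packet found in the raw file bytes, or None."""
    start = data.find(_XMP_START)
    if start == -1:
        return None
    end = data.find(_XMP_END, start)
    if end == -1:
        return None
    return data[start:end + len(_XMP_END)].decode("utf-8", errors="replace")


def metadata_survived(original: bytes, downloaded: bytes) -> bool:
    """True if the original had an XMP packet and the downloaded copy kept it intact."""
    packet = find_xmp_packet(original)
    return packet is not None and packet == find_xmp_packet(downloaded)


if __name__ == "__main__":
    # Placeholder bytes standing in for a real figure file.
    packet = b'<x:xmpmeta xmlns:x="adobe:ns:meta/">attribution here</x:xmpmeta>'
    original = b"IMAGE-HEADER" + packet + b"IMAGE-DATA"
    stripped = b"IMAGE-HEADER" + b"IMAGE-DATA"
    print(metadata_survived(original, original))  # copy kept the packet
    print(metadata_survived(original, stripped))  # copy lost the packet
```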

 

What does this enable or make easier?

 

On its own, this collection doesn’t do much; this is still an early stage. But it gives us an important insight into the prevalence of certain types of visual display-style that researchers are using:

‘radial’ phylogenies

https://www.flickr.com/search?user_id=123621741%40N08&sort=relevance&text=radial

Source: Zerillo et al 2013 PLOS ONE. Carbohydrate-Active Enzymes in Pythium and Their Role in Plant Cell Wall and Storage Polysaccharide Degradation

‘geophylogeny’ (phylogeny displayed relative to a map of some sort, 2D or 3D)

https://www.flickr.com/search?user_id=123621741%40N08&sort=relevance&text=geophylogeny

Source: Guo et al 2012 PLOS ONE. Evolution and Biogeography of the Slipper Orchids: Eocene Vicariance of the Conduplicate Genera in the Old and New World Tropics

‘timescaled’ (phylogenies where the branch lengths are proportional to units of time or geological periods)
https://www.flickr.com/search?user_id=123621741%40N08&sort=relevance&text=timescaled

Source: Pol et al 2014 PLOS ONE. A New Notosuchian from the Late Cretaceous of Brazil and the Phylogeny of Advanced Notosuchians

‘splitstrees’

https://www.flickr.com/search?user_id=123621741%40N08&sort=relevance&text=splitstree

Source: McDowell et al 2013 PLOS ONE. The Opportunistic Pathogen Propionibacterium acnes: Insights into Typing, Human Disease, Clonal Diversification and CAMP Factor Evolution

Arguably it also facilitates complex searches for specific types of phylogeny

e.g. analyses using cytochrome b
https://www.flickr.com/search/?w=123621741@N08&q=%22cyt%20b%22%20OR%20%22cytochrome%20b%22
(You could use PLOS’s API to do this, particularly its figure/table caption search field, but you’d get a lot of false positives; this is an expert-curated collection that has filtered out non-phylo figures.)
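Search links like the cytochrome b one above are easy to generate for any set of synonyms. This Python sketch reproduces the same URL pattern; the function name is made up for this example, and the URL format is copied from the link above, so it may not match whatever Flickr's search pages look like by the time you read this.

```python
from typing import Iterable
from urllib.parse import quote


def flickr_search_url(user_id: str, terms: Iterable[str]) -> str:
    """Build a search URL matching any of the exact phrases in `terms`.

    Each term is wrapped in double quotes, the terms are joined with OR,
    and the whole query is percent-encoded.
    """
    query = " OR ".join(f'"{term}"' for term in terms)
    return f"https://www.flickr.com/search/?w={user_id}&q={quote(query, safe='')}"


if __name__ == "__main__":
    print(flickr_search_url("123621741@N08", ["cyt b", "cytochrome b"]))
```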

In my initial roadmap, the plan is to do PLOS ONE, the other PLOS journals, then BMC journals, then possibly Zootaxa & Phytotaxa (Magnolia Press). There will be a Github-based website for the project soon, lots still to do…!

 

Want to know more / collaborate / critique ?

Conferences:

I’ve got an accepted lightning talk at iEvoBio in Raleigh, NC later this year about the PLUTo project.

As well as an accepted lightning talk at the Bioinformatics Open Source Conference (BOSC) in Boston, MA.

Otherwise, contact me via twitter @rmounce, the comment section on this blog post, or email ross dot mounce <at> gmail dot com

I’ve been invited to come in and have an informal chat about open access with the Linnean Society on March 24th, particularly with regard to what is and what is not ‘open access’ in terms of Creative Commons licences. I write this blog post to spur on other advocates to try and encourage their society journals to use proper, open access compliant article licensing that facilitates rather than prevents text & data mining.

I have Tom Simpson at LinnSoc to thank for reaching out to make this happen. Thanks Tom!

It started from some tweets I sent a few days ago about an interesting new Zoo J Linn paper by Martin Brazeau & Matt Friedman. I’d include a pretty figure from this paper if I was allowed to, but unfortunately because it’s licensed with the Creative Commons Attribution-NonCommercial-NoDerivs License (CC BY-NC-ND) I can’t. To repost just a figure from the paper would be to create a smaller derivative work which the licence does not allow – I am only allowed to repost the *whole* article with absolutely no changes which is rather impractical for a 43 page article! Wiley in particular have a history of threatening scientist bloggers for reproducing a single figure from an article (read the Shelley Batts story here).

It’s not just bloggers, and the outreach possibilities for the paper, that are harmed by the use of such restrictive licenses – it also causes problems for RCUK-funded researchers. Matt Friedman is based at Oxford at the moment – if the funding for this work came from any of the UK research councils, then the choice of the CC BY-NC-ND license could cause him problems – it is NOT compliant with the RCUK’s policy on open access. Wiley should know better than to offer this license to UK-based authors, but they have a significant conflict of interest in ensuring researchers choose more restrictive licensing options so that they can continue to be the sole proprietor of glossy reprint copies (ensured by the -NC clause). Both the -NC & the -ND clauses incidentally prevent the figures from being re-used on Wikipedia, another sad restriction for the authors who must have put a lot of effort into them.

In the realm of academic science, the application of that particular license to the paper-as-a-whole-work just doesn’t make sense. Many digital research projects need to be able to excerpt, transform and translate research outputs such as academic papers, and in some cases create commercial value from this. My current BBSRC-funded research project ‘PLUTo: Phyloinformatic Literature Unlocking Tools. Software for making published phyloinformatic data discoverable, open, and reusable’ relies on being allowed to transform, excerpt and republish extracted content from scientific papers. With Peter Murray-Rust we’re using text & image mining tools to generate open, re-usable phylogenetic data directly from the published literature, often directly from PDFs. The Linnean Society have several good quality, well-respected journals which publish phylogenetic content, so they’re very much in the scope of our PLUTo work.

But clauses such as -ND stop us from using this material. It’s clear in the license terms and conditions – we are not allowed to make any derivative works from the original. So any papers using CC BY-NC-ND we will have to avoid. We cannot use them, and therefore they will not be cited by our project which is rather a shame for their authors.
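In practice this turns into a simple filter at the start of any mining pipeline. Here is a hedged Python sketch: the licence table encodes only what the -NC and -ND clauses mean, the article records and DOIs are hypothetical, and treating unknown licences as unusable is a cautious choice made for this example, not a description of how PLUTo itself is implemented.

```python
from typing import Dict, List

# What each licence permits: commercial use and derivative works (e.g. excerpts).
LICENCE_TERMS: Dict[str, Dict[str, bool]] = {
    "CC0": {"commercial": True, "derivatives": True},
    "CC BY": {"commercial": True, "derivatives": True},
    "CC BY-NC": {"commercial": False, "derivatives": True},
    "CC BY-ND": {"commercial": True, "derivatives": False},
    "CC BY-NC-ND": {"commercial": False, "derivatives": False},
}


def allows_excerpts(licence: str) -> bool:
    """True if the licence permits derivative works such as a single reposted figure."""
    terms = LICENCE_TERMS.get(licence)
    # Unknown licences are treated as unusable, to be on the safe side.
    return bool(terms and terms["derivatives"])


def minable(articles: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the articles whose licence allows excerpting and transformation."""
    return [article for article in articles if allows_excerpts(article["licence"])]


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    corpus = [
        {"doi": "10.0000/example.1", "licence": "CC BY"},
        {"doi": "10.0000/example.2", "licence": "CC BY-NC-ND"},
        {"doi": "10.0000/example.3", "licence": "All rights reserved"},
    ]
    print([article["doi"] for article in minable(corpus)])
```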

Above all, the CC BY-NC-ND license simply isn’t compliant with the very definition of open access as laid down over a decade ago at the Berlin, Budapest and Bethesda meetings. Wiley are knowingly mis-labelling articles using non-compliant licences as ‘open access’ even though they are by definition NOT open access. I hope the Linnean Society can spur Wiley to do something about this as it is not good for the journal, or its authors. Other journals using non-compliant licensing use terms like ‘public access’ or ‘free access’ or ‘sponsored access’. Why can’t Wiley follow this lead? Open access is more than just free access – it enables re-use which is critical for research projects like mine. Please stop the ‘openwashing’.

 

Further Reading:

Hagedorn, G., Mietchen, D., Morris, R., Agosti, D., Penev, L., Berendsohn, W., and Hobern, D. 2011. Creative commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys 150:127-149.

Mounce, R. 2012. Life as a palaeontologist: Academia, the internet and creative commons. Palaeontology Online 2:1-10.

Klimpel, P. Consequences, Risks, and side-effects of the license module Non-Commercial – NC [PDF] 1-22.

 


Today I received proof that Elsevier are also sending takedown notices to UK universities – asking them to take down copies of their staff’s academic research papers, hosted on university webpages. The full text is further down this post (in red). It is not just Academia.edu, it is not just the University of Calgary, University of California-Irvine, or Harvard University. Elsevier very probably are sending takedown notices to institutions and websites across the globe.
No-one is safe from these legal threats.

Not only that, but they seem to be encouraging universities to be pro-active and take down more than just the specific articles identified in the DMCA notice they send! They are encouraging universities to limit access to their research works. This is simply disgraceful (even though I acknowledge they are technically, legally within their rights to do this because of the way in which their copyright transfer agreements are written, which incidentally many academics are effectively forced to sign in order to get published and make progress in their careers).

For background information read:

How one publisher is stopping academics from sharing their research. The Washington Post 19/12/2013

Elsevier steps up its War On Access SVPOW 17/12/2013

Librarians and university web admins: please publicly come out with more examples like this. Researchers, readers and taxpayers desperately need to know about this. Silence and subterfuge benefit no-one; these chilling effects must be publicly revealed.

This is the email I received with certain parts redacted:

*** Sent via Email – Inappropriate postings of Elsevier’s journal articles / DMCA Notice of Copyright Infringement ***

Dear Sir/Madam,

I write on behalf of Elsevier to bring to your attention the inappropriate posting of final published journal articles to your institutional website. I am President at Attributor (A Digimarc Company), which assists some of the world’s most prominent publishers, including Elsevier, with digital content protection (www.digimarc.com/guardian). Following the discussion below, a formal DMCA takedown request is included as Appendix A.

As you probably know, Elsevier journal article authors retain or are permitted a wide scope of scholarly use and posting on their own sites and for use within their own institutions. Those rights are more expansive when it comes to author preprints or accepted manuscripts than with respect to the final versions of published journal articles. Elsevier recognizes that in some cases authors or their institutions may not be fully aware of these rights and can by mistake post the final version of their articles to institutional websites or repositories. Unfortunately, it has come to our attention that copies of final published journal articles have, perhaps inadvertently, been posted for public access to one of your institutional websites.

I therefore request your cooperation to remove or disable access to these articles on your site, including but not limited to the articles identified in Appendix A. We have identified merely a sample in Appendix A, and as a publisher of close to 2,000 journals this might mean that more articles published by Elsevier could be found on your site. Please may I therefore draw your attention to Elsevier’s posting policy and ask for your attention to ensuring that your posting practices comply with this?
http://www.elsevier.com/about/open-access/open-access-policies/article-posting-policy#published-journal-article

In particular I note that Elsevier currently doesn’t permit posting of the final published journal article, and if there is a mandate or systematic posting mechanisms in place then Elsevier asks for a cost-free agreement with the institution before accepted author manuscripts are posted.
I would also recommend considering the use of DOI links as a way to access to the version of records of a published article. This would allow authors to list their work and to provide easy access to peers.

Finally, should you need any help in properly identifying a final published article to prevent any future improper posting, please do get in touch via the email address below.

I appreciate your anticipated cooperation and if you have any questions or feedback, or if you believe you have received this message in error (as you have received permission to post this article from Elsevier), please contact: UniversalAccess@Elsevier.com
Thank you.

Sincerely,
Eraj Siddiqui
Attributor (A Digimarc Company)

Appendix A

Copyright Infringement Notice

This notice is sent pursuant to the Digital Millennium Copyright Act (DMCA), the European Union’s Directive on the Harmonisation of Certain Aspects of Copyright and Related Rights in the Information Society (2001/29/EC), and/or other laws and regulations relevant in European Union member states or other jurisdictions.

Please remove or disable access to the infringing pages or materials identified below, as they infringe the copyright works identified below.

I certify under penalty of perjury, that I am an agent authorized to act on behalf of the owner of the intellectual property rights and that the information contained in this notice is accurate.

I have a good faith belief that use of the material listed below in the manner complained of is not authorized by the copyright owner, its agent, or the law.

My contact information is as follows:

Organization name: Attributor Corporation as agent for [Publisher Company]
Email: counter-notice@attributor.com
Phone: 650-340-9601
Mailing address:
400 South El Camino Real
Suite 650,
San Mateo, CA 94402

My electronic signature follows:
Sincerely,
/E Siddiqui/
E. Siddiqui
Attributor, Inc.

***List of Works and Location of Infringing Page or Material ***

Infringing page/material that I demand be disabled or removed in consideration of the above:

*** INFRINGING PAGE OR MATERIAL ***

Infringing page/material that I demand be disabled or removed in consideration of the above:

Rights Holder: Reed Elsevier

Original Work: [redacted]
Infringing URL: [redacted]

UPDATE:

Dutch universities too are receiving DMCA notices from Elsevier:

@Wowter via Twitter

I’d just like to point out to anyone who asks, particularly CRC Press (part of Taylor&Francis Group, who are in turn part of Informa PLC) that by posting the full text of my book chapter to Academia.edu I am *not* breaching the copyright transfer agreement I signed.

Upon receiving a copyright transfer agreement as a PDF from them via email – I edited the PDF to reword the agreement to terms that were more agreeable to me (e.g. I did NOT want to transfer my copyright to them for my work).

The bit of wording I changed is as follows:

As such, copyrights in the Work will not inure to the benefit of the Publisher, the Publisher will not own the publication, its title and component parts, and all publication rights. This does not permit the Publisher, in its name, to copyright in the Contribution, make applications to register its copyright claim, and to renew its copyright certificate.

I signed this reworded form as a PDF (displayed below, signature removed) and returned it to them. I have now kindly received a free ‘author copy’ of the printed book and my chapter has clearly been included so it’s too late for CRC Press to exclude my chapter. I can only assume they agreed to the reworded terms of the contract I signed and sent them.

I doubt CRC Press would even be bothered by my actions to be honest. They are allowing another of their books to be completely posted online for free, so in comparison to that, my action here is puny – but it certainly emboldens me for the next time I may have to sign a CTA form…

CRC Press are welcome to non-exclusively publish my book chapter. Thank you CRC Press for agreeing to my terms and conditions.

Contract

Lessons one might learn from this exercise:

DO NOT GIVE AWAY THE COPYRIGHT TO YOUR WORK!
PUBLISHERS DO NOT ‘NEED’ ALL YOUR COPYRIGHT TRANSFERRED TO THEM TO PUBLISH.
ALL THAT IS NEEDED IS FOR YOU TO GRANT THEM A NON-EXCLUSIVE LICENSE TO PUBLISH.

A word of warning though… I wouldn’t recommend relying on this method of editing CTAs to get what you want. I was just lucky this time. Choosing an open access publication venue from the start is always the best option (if possible).

See also:

Mike Taylor 2010. Who Owns My Sauropod History Paper?
http://svpow.com/2010/10/13/who-owns-my-sauropod-history-paper/