Show me the data!

Author Archives: rmounce

Progress Update from Meise, Belgium

June 12th, 2014 | Posted by rmounce in Open Data | Open Science - (0 Comments)

A quick blog post from Meise, Belgium, at the Pro-iBiosphere wrap-up event.

Yesterday I gave a talk about my progress in liberating open access (OA) figures from the academic literature and making them searchable.

I’ve had a lot of great feedback and interest in what I’m doing with this.

Cyndy Parr pointed out that EOL (the Encyclopedia of Life) is on Flickr too, and has been marking up photographs of taxa with ‘machine tags’.

I will now experiment with incorporating taxonomic & geographical machine tags into my workflow when uploading images to Flickr. As an example, I have added binomial tags to two figures from an OA Zootaxa paper on ‘Urothrips’: https://www.flickr.com/photos/121174006@N06/13379028204/in/set-72157642842813323

See the bottom right-hand corner for the added ‘machine tags’.
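
Flickr machine tags follow a `namespace:predicate=value` pattern. The minimal Python sketch below shows one way such tags could be generated for each figure before upload. It assumes EOL's `taxonomy:binomial=Genus species` convention for taxonomic tags, assumes `geo:lat`/`geo:lon` for geographical tags, and assumes Flickr's convention that tags are space-separated with multi-word tags wrapped in double quotes. The helper names `machine_tag` and `figure_tags` are hypothetical.

```python
import re

# Namespace and predicate are restricted to simple identifiers in this sketch.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def machine_tag(namespace: str, predicate: str, value: str) -> str:
    """Build a Flickr-style machine tag: namespace:predicate=value.

    Tags are space-separated when sent to Flickr (an assumption stated above),
    so a tag whose value contains spaces is wrapped in double quotes.
    """
    for part in (namespace, predicate):
        if not _IDENT.fullmatch(part):
            raise ValueError(f"invalid machine-tag component: {part!r}")
    tag = f"{namespace}:{predicate}={value}"
    return f'"{tag}"' if " " in tag else tag


def figure_tags(binomials, lat=None, lon=None) -> str:
    """Hypothetical helper: taxonomic (and optional geographic) tags for one figure."""
    tags = [machine_tag("taxonomy", "binomial", name) for name in binomials]
    if lat is not None and lon is not None:
        # Coordinates formatted to six decimal places (an arbitrary choice).
        tags.append(machine_tag("geo", "lat", f"{lat:.6f}"))
        tags.append(machine_tag("geo", "lon", f"{lon:.6f}"))
    return " ".join(tags)
```

For example, `figure_tags(["Urothrips sp"])` returns the single quoted tag `"taxonomy:binomial=Urothrips sp"`, ready to paste into Flickr's tag field.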


Jeremy Miller from Naturalis is also very interested in OA Zootaxa content, from the point of view of spider research. He gave a talk on data visualization on behalf of his team from the Leiden hackday. Luckily, without any prior ‘special’ mark-up, I could show Jeremy the promise of what I’m doing on Flickr simply by searching for ‘Araneae’. Many phylogenies containing spider taxa came up in the search, and he immediately recognized many of them as coming from his own open access publications! With a little more work to mark up the attributes he’s interested in, I might be able to provide something of real use: the ability to search figure images and captions across hundreds of open access journals, from many different publishers, with just ONE search!
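
The ‘ONE search’ idea depends on indexing caption text from many publishers in a single place. As a hedged illustration (not a description of how Flickr's own search works internally), here is a minimal inverted index in Python. Each caption term maps to the set of figure IDs containing it, and a query returns the figures whose captions contain every query term. The figure IDs and captions in the demo are made up.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lower-case word tokens; hyphenated terms such as "cytochrome-b" stay whole.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def build_index(captions: dict[str, str]) -> dict[str, set[str]]:
    """Map each caption term to the set of figure IDs whose caption contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for fig_id, caption in captions.items():
        for term in tokenize(caption):
            index[term].add(fig_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return figure IDs whose captions contain every query term (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    # Hypothetical captions, for illustration only.
    captions = {
        "pub1-fig2": "Bayesian phylogeny of Araneae (spiders)",
        "pub2-fig1": "Sampling localities in Brazil",
        "pub3-fig4": "Maximum likelihood tree of Araneae and Opiliones",
    }
    index = build_index(captions)
    print(sorted(search(index, "Araneae")))       # ['pub1-fig2', 'pub3-fig4']
    print(sorted(search(index, "Araneae tree")))  # ['pub3-fig4']
```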

The Bouchout Declaration will be launched today at this meeting. I’m happy to say I facilitated the signing of this declaration by Open Knowledge. Many other organisations have signed this declaration and I hope it makes a splash – we need science to be open to do good science!

Finally, I’ve also potentially got a new research collaboration going (more of which later!).
It’s been well worth the trip!

[Update: the conference itself will be in November 2014 – this is just the first announcement!]

I’m super excited to announce I’m part of the international organizing committee for OpenCon 2014:

OpenCon 2014

You can read the first official press release about this event here:

http://www.righttoresearch.org/act/opencon/announcement

Here’s an excerpt from it:

“From Nigeria to Norway, the next generation is beginning to take ownership of the system of scholarly communication which they will inherit,” said Nick Shockey, founding Director of the Right to Research Coalition. “OpenCon 2014 will support and accelerate this rapidly growing movement of students and early career researchers advocating for openness in research literature, education, and data.”

The first event of its kind, OpenCon 2014 builds on the success of the Berlin 11 Satellite Conference for Students and Early Stage Researchers, which brought together more than 70 participants from 35 countries to engage on Open Access to scientific and scholarly research. The interest, energy, and passion from the student and researcher participants and the Open Access movement leaders who attended made a clear case for expanding the event in size and duration, and to broaden the scope to related areas of the Openness movement.

Last year, I was also part of the organizing committee for the event that this has grown from – the Berlin 11 Satellite conference:

berlin11

The Berlin 11 Satellite Conference was really exciting, but it was only a 1-day event before the ‘main’ Berlin 11 event. It brought together students and ECRs (early career researchers) from literally all over the world (attending with generous full funding support), including representatives from (in no particular order) China, India, Saudi Arabia, Georgia, Tanzania, Tasmania(!), Kenya, Nigeria, Ghana, Uganda, Colombia, FYR Macedonia, Mexico, Brazil, Sweden, Holland, Denmark, Poland, Portugal, Canada, the US, the UK… So don’t worry about where you are in the world: as long as you’re a student or ECR, you’ll be eligible to apply for OpenCon 2014 (places are limited though!).

As a reminder, last year’s guest speakers included Jack Andraka and Mike Taylor. The event was such a comprehensive success that it’s been expanded into a full 3-day event this year, with a broader scope that includes Open Data and OER (Open Educational Resources), not just OA (they’re all obviously inter-related problems; better to tackle the integrated set of problems rather than aspects in isolation!).

Applications for OpenCon 2014 will open in August. For more information about the conference and to sign up for updates, visit www.opencon.net

I promise you this – it’s going to be BIG and I’m stoked to be part of an international organizing committee helping to make this happen.

OpenCon 2014 is also looking for additional sponsorship, particularly for Travel Scholarships to ensure global representation at this meeting, so if you have a marketing budget to spend, or are feeling generous, please do have a look at the sponsorship opportunities.

PLOS ONE PHYLOGENY

May 7th, 2014 | Posted by rmounce in Content Mining | Open Data | Open Science | PLoS | PLUTo - (13 Comments)

I’m proud to announce an interesting public output from my BBSRC-funded postdoc project:
PLUTo: Phyloinformatic Literature Unlocking Tools. Software for making published phyloinformatic data discoverable, open, and reusable

MOAR PHYLOGENY!

Screenshot of some of the PLOS ONE phylogeny figure collection on Flickr

I’ve openly released my first-pass selection of phylogeny figures from PLOS ONE (I’m not in any way claiming this is *all* of them).

This curated & tagged image collection is on Flickr for easy browsing: http://bit.ly/PLOStrees

It’s also on Github for version control, open archiving, and collaboration (I have remote collaborators):

https://github.com/rossmounce/P1-phylo-part1

https://github.com/rossmounce/P1-phylo-part2

https://github.com/rossmounce/P1-phylo-part3

https://github.com/rossmounce/P1-phylo-part4

(Github discourages repositories over 1 GB, so I’ve had to split the content across 4 separate repositories)
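
One way to decide which files go into which repository is greedy first-fit-decreasing bin packing. The Python sketch below is a hypothetical helper, not necessarily how the four repositories were actually split. It assumes a budget of 1 GB (10^9 bytes) per repository and looks only at file sizes on disk, ignoring Git's own storage overhead.

```python
import os


def sizes_under(root: str) -> dict[str, int]:
    """Map each file's path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def split_into_repos(file_sizes: dict[str, int],
                     limit_bytes: int = 1_000_000_000) -> list[list[str]]:
    """First-fit decreasing: place each file, largest first, into the first
    group that still has room; open a new group when none does."""
    groups: list[list[str]] = []
    totals: list[int] = []
    for path, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > limit_bytes:
            raise ValueError(f"{path} alone exceeds the limit")
        for i, total in enumerate(totals):
            if total + size <= limit_bytes:
                groups[i].append(path)
                totals[i] += size
                break
        else:
            # No existing group had room: start a new repository.
            groups.append([path])
            totals.append(size)
    return groups
```

Calling `split_into_repos(sizes_under("figures"))` on a hypothetical `figures/` directory would return one list of paths per repository. First-fit decreasing does not always find the smallest possible number of groups, but it is simple and usually comes close.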


Why?

The aim of the PLUTo project is to re-extract & liberate phylogenetic data & associated metadata from the research literature. Sadly, only ~4% of modern published phylogenetic analysis studies make their underlying data available. Another study finds that if you ask the authors for this data, only 16% will be kind enough to reply with the requested data!

This particular data type is a cornerstone of modern evolutionary biology. You’ll find phylogenetic analyses across a whole host of journal subjects: medical, ecological, natural history, palaeontology… There are also many different ways in which this data can be re-used, e.g. supertrees & comparative cladistics, not to mention simple validation studies and/or analyses that extend upon or map new data onto a phylogeny. It’s really useful data and we should be archiving it for future re-use and re-analysis. To my great delight, this is what I’m being paid to attempt for my first postdoc, on a grant I co-wrote: finding & liberating phylogenetic data for everyone!

Why PLOS ONE?


  • It’s a BOAI-compliant open access journal that publishes most articles under CC BY, with a few under CC0.
    • This means I can openly re-publish figures online (provided sufficient attribution is given), with no need to worry about DMCA takedown notices or ‘getting sued’! This makes the research process much easier. Private, non-public, access-restricted repositories for collaboration are a hassle I’d rather do without.
  • It’s a high-volume ‘megajournal’ publishing ~200 articles per day, many of which include phylogenetic analyses.
    • Thus it’s worthwhile establishing a regular daily or weekly method for parsing out phylogenetic tree figures from this journal.
  • Killer feature: as far as I know, PLOS are the only publisher to embed rich metadata inside their figure image files.
    • This makes satisfying the CC BY licence trivially easy, because sufficient attribution metadata is already embedded in the file. Just ensure that wherever you upload the file to doesn’t wipe this embedded data, which is why I chose Flickr as my initial upload platform.


What does this enable or make easier?

On its own, this collection doesn’t do much yet; it’s still at an early stage. But it gives us an important insight into the prevalence of certain visual display styles that researchers are using:

‘radial’ phylogenies

https://www.flickr.com/search?user_id=123621741%40N08&sort=relevance&text=radial

Source: Zerillo et al 2013 PLOS ONE. Carbohydrate-Active Enzymes in Pythium and Their Role in Plant Cell Wall and Storage Polysaccharide Degradation


‘geophylogeny’ (phylogeny displayed relative to a map of some sort, 2D or 3D)

https://www.flickr.com/search?user_id=123621741%40N08&sort=relevance&text=geophylogeny

Source: Guo et al 2012 PLOS ONE. Evolution and Biogeography of the Slipper Orchids: Eocene Vicariance of the Conduplicate Genera in the Old and New World Tropics


‘timescaled’ (phylogenies where the branch lengths are proportional to units of time or geological periods)
https://www.flickr.com/search?user_id=123621741%40N08&sort=relevance&text=timescaled

Source: Pol et al 2014 PLOS ONE. A New Notosuchian from the Late Cretaceous of Brazil and the Phylogeny of Advanced Notosuchians


‘splitstrees’

https://www.flickr.com/search?user_id=123621741%40N08&sort=relevance&text=splitstree

Source: McDowell et al 2013 PLOS ONE. The Opportunistic Pathogen Propionibacterium acnes: Insights into Typing, Human Disease, Clonal Diversification and CAMP Factor Evolution
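
Once figures are tagged with a display style, measuring prevalence is just a tally. In the hedged Python sketch below, each figure is represented by its list of tags. These could, for example, be fetched through Flickr's API with the `tags` extra, an assumption this post does not cover. `style_prevalence` is a hypothetical helper that counts how many figures carry each of the four style tags used above and what fraction of the collection that represents.

```python
from collections import Counter

# The four display-style search terms used above.
STYLE_TAGS = ("radial", "geophylogeny", "timescaled", "splitstree")


def style_prevalence(photo_tags) -> dict[str, tuple[int, float]]:
    """Count figures carrying each style tag.

    photo_tags: iterable of tag lists, one list per figure.
    Returns {style: (count, fraction of all figures)}.
    """
    # Use sets so a tag repeated on one figure is only counted once.
    per_figure = [set(tags) for tags in photo_tags]
    counts = Counter(tag for tags in per_figure for tag in tags if tag in STYLE_TAGS)
    n = len(per_figure)
    return {style: (counts[style], counts[style] / n if n else 0.0)
            for style in STYLE_TAGS}
```

Because a figure can carry more than one style tag (a radial tree drawn on a map, say), the fractions need not sum to 1.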


Arguably, it also facilitates complex searches for specific types of phylogeny,

e.g. analyses using cytochrome b:
https://www.flickr.com/search/?w=123621741@N08&q=%22cyt%20b%22%20OR%20%22cytochrome%20b%22
(You could use PLOS’s API to do this, particularly their figure/table caption search field, but you’d get a lot of false positives. This is an expert-curated collection that has filtered out non-phylogeny figures.)
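
The search URL above is just a URL-encoded boolean query. As a small sketch with hypothetical helper names, Python's standard `urllib.parse` can build such URLs for any set of phrases. Passing `quote_via=quote` encodes spaces as `%20` and quotation marks as `%22`, as in the link above. One difference is that the `@` in the user ID becomes `%40`, the same encoding the earlier `user_id=123621741%40N08` links use.

```python
from urllib.parse import quote, urlencode

FLICKR_SEARCH = "https://www.flickr.com/search/"


def or_query(*phrases: str) -> str:
    # Quote multi-word phrases so each is matched as a unit, then join with OR.
    return " OR ".join(f'"{p}"' if " " in p else p for p in phrases)


def flickr_search_url(user_id: str, *phrases: str) -> str:
    # 'w' appears to restrict results to one account; 'q' carries the query text.
    params = {"w": user_id, "q": or_query(*phrases)}
    return FLICKR_SEARCH + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(flickr_search_url("123621741@N08", "cyt b", "cytochrome b"))
    # https://www.flickr.com/search/?w=123621741%40N08&q=%22cyt%20b%22%20OR%20%22cytochrome%20b%22
```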

In my initial roadmap, the plan is to do PLOS ONE, the other PLOS journals, then BMC journals, then possibly Zootaxa & Phytotaxa (Magnolia Press). There will be a Github-based website for the project soon; lots still to do…!

Want to know more / collaborate / critique?

Conferences:

I’ve got an accepted lightning talk about the PLUTo project at iEvoBio in Raleigh, NC later this year.

I also have an accepted lightning talk at the Bioinformatics Open Source Conference (BOSC) in Boston, MA.

Otherwise, contact me via Twitter (@rmounce), the comment section on this blog post, or email: ross dot mounce <at> gmail dot com

Discussing Open Access with the Linnean Society

March 13th, 2014 | Posted by rmounce in Open Access - (12 Comments)

I’ve been invited to come in and have an informal chat about open access with the Linnean Society on March 24th, particularly about what is and what is not ‘open access’ in terms of Creative Commons licences. I’m writing this blog post to spur other advocates on to encourage their society journals to use proper, open-access-compliant article licencing that facilitates, rather than prevents, text & data mining.

I have Tom Simpson at LinnSoc to thank for reaching out to make this happen. Thanks Tom!

It started with some tweets I sent a few days ago about an interesting new Zoo J Linn paper by Martin Brazeau & Matt Friedman. I’d include a pretty figure from this paper if I were allowed to, but unfortunately, because it’s licensed under the Creative Commons Attribution-NonCommercial-NoDerivs License (CC BY-NC-ND), I can’t. To repost just a figure from the paper would be to create a smaller derivative work, which the licence does not allow. I am only allowed to repost the *whole* article with absolutely no changes, which is rather impractical for a 43-page article! Wiley in particular have a history of threatening scientist bloggers for reproducing a single figure from an article (read the Shelley Batts story here).

restricted access

It’s not just bloggers, and the outreach possibilities for the paper, that are harmed by such restrictive licenses; they also cause problems for RCUK-funded researchers. Matt Friedman is based at Oxford at the moment. If the funding for this work came from any of the UK research councils, then the choice of the CC BY-NC-ND license could cause him problems: it is NOT compliant with the RCUK’s policy on open access. Wiley should know better than to offer this license to UK-based authors, but they have a significant conflict of interest in ensuring researchers choose more restrictive licencing options, so that they can continue to be the sole proprietor of glossy reprint copies (ensured by the -NC clause). Both the -NC & the -ND clauses incidentally prevent the figures from being re-used on Wikipedia, another sad restriction for the authors, who must have put a lot of effort into them.

In the realm of academic science, applying that particular license to the paper as a whole work just doesn’t make sense. Many digital research projects need to be able to excerpt, transform and translate research outputs such as academic papers, and in some cases create commercial value from them. My current BBSRC-funded research project ‘PLUTo: Phyloinformatic Literature Unlocking Tools. Software for making published phyloinformatic data discoverable, open, and reusable’ relies on being allowed to transform, excerpt and republish extracted content from scientific papers. With Peter Murray-Rust, we’re using text & image mining tools to generate open, re-usable phylogenetic data directly from the published literature, often from PDFs. The Linnean Society have several good-quality, well-respected journals which publish phylogenetic content, so they’re very much in the scope of our PLUTo work.

But clauses such as -ND stop us from using this material. The license terms and conditions are clear: we are not allowed to make any derivative works from the original. So we will have to avoid any papers published under CC BY-NC-ND. We cannot use them, and therefore they will not be cited by our project, which is rather a shame for their authors.
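
The licence reasoning above boils down to two flags. The Python sketch below encodes it under this post's reading that republishing a single figure makes a derivative (an excerpt) of the article. The helpers `parse_cc` and `may_republish_figure` are hypothetical. The sketch ignores the attribution requirement shared by all CC BY variants and treats -SA (share-alike) as permitting derivatives, provided they carry the same licence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCLicence:
    code: str
    non_commercial: bool  # -NC: no commercial re-use
    no_derivatives: bool  # -ND: no adapted or excerpted versions
    share_alike: bool     # -SA: derivatives must carry the same licence


KNOWN_CLAUSES = {"NC", "ND", "SA"}


def parse_cc(code: str) -> CCLicence:
    """Parse licence codes such as 'CC BY', 'CC BY-NC-ND' or 'CC0'."""
    norm = code.strip().upper().replace(" ", "-")
    if norm == "CC0":
        # Public-domain dedication: no conditions at all.
        return CCLicence(code, False, False, False)
    parts = norm.split("-")
    if parts[:2] != ["CC", "BY"] or not set(parts[2:]) <= KNOWN_CLAUSES:
        raise ValueError(f"unrecognised licence code: {code!r}")
    clauses = set(parts[2:])
    return CCLicence(code, "NC" in clauses, "ND" in clauses, "SA" in clauses)


def may_republish_figure(lic: CCLicence, commercial: bool = False) -> bool:
    """Treat a single extracted figure as a derivative of the whole article."""
    if lic.no_derivatives:
        return False  # excerpting is not allowed at all
    if commercial and lic.non_commercial:
        return False  # allowed only for non-commercial re-use
    return True


if __name__ == "__main__":
    for code in ("CC0", "CC BY", "CC BY-NC", "CC BY-NC-ND"):
        lic = parse_cc(code)
        print(code, may_republish_figure(lic), may_republish_figure(lic, commercial=True))
```

Run as a script, this prints `True True` for CC0 and CC BY, `True False` for CC BY-NC, and `False False` for CC BY-NC-ND. In other words, the -ND clause alone is enough to rule a paper out for a figure-extraction project like PLUTo.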

Above all, the CC BY-NC-ND license simply isn’t compliant with the very definition of open access as laid down over a decade ago at the Berlin, Budapest and Bethesda meetings. Wiley are knowingly mis-labelling articles using non-compliant licences as ‘open access’ even though they are by definition NOT open access. I hope the Linnean Society can spur Wiley to do something about this, as it is not good for the journal or its authors. Other journals using non-compliant licencing use terms like ‘public access’ or ‘free access’ or ‘sponsored access’. Why can’t Wiley follow this lead? Open access is more than just free access – it enables re-use, which is critical for research projects like mine. Please stop the ‘openwashing’.

Further Reading:

Hagedorn, G., Mietchen, D., Morris, R., Agosti, D., Penev, L., Berendsohn, W., and Hobern, D. 2011. Creative commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information. ZooKeys 150:127-149.

Mounce, R. 2012. Life as a palaeontologist: Academia, the internet and creative commons. Palaeontology Online 2:1-10.

Klimpel, P. Consequences, Risks, and side-effects of the license module Non-Commercial – NC [PDF] 1-22.