I’m proud to announce an interesting public output from my BBSRC-funded postdoc project:
PLUTo: Phyloinformatic Literature Unlocking Tools. Software for making published phyloinformatic data discoverable, open, and reusable
Screenshot of some of the PLOS ONE phylogeny figure collection on Flickr
I’ve openly released my first-pass selection of PLOS ONE phylogeny figures (I’m not in any way claiming this is *all* of them).
This curated & tagged image collection is on Flickr for easy browsing: http://bit.ly/PLOStrees
It’s also on GitHub for version control, open archiving, and collaboration (I have remote collaborators):
(GitHub doesn’t like repositories over 1 GB, so I’ve had to split the content across four separate repositories.)
The aim of the PLUTo project is to re-extract & liberate phylogenetic data & associated metadata from the research literature. Sadly, only ~4% of modern published phylogenetic studies make their underlying data available. Another study found that if you ask the authors for this data, only 16% will be kind enough to reply with it!
This particular data type is a cornerstone of modern evolutionary biology. You’ll find phylogenetic analyses across a whole host of journal subjects: medical, ecological, natural history, palaeontology… There are also many ways in which this data can be re-used, e.g. supertrees & comparative cladistics, not to mention simple validation studies &/or analyses that extend or map new data onto a phylogeny. It’s really useful data, and we should be archiving it for future re-use and re-analysis. To my great delight, this is what I’m being paid to attempt for my first postdoc, on a grant I co-wrote: finding & liberating phylogenetic data for everyone!
Why PLOS ONE?
- It’s a BOAI-compliant open access journal that publishes most articles under CC BY, with a few under CC0.
- This means I can openly re-publish figures online (provided sufficient attribution is given), with no need to worry about DMCA takedown notices or ‘getting sued’! This makes the research process much easier. Private, access-restricted repositories for collaboration are a hassle I’d rather do without.
- It’s a high-volume ‘megajournal’ publishing ~200 articles per day, many of which include phylogenetic analyses.
- Thus it’s worthwhile establishing a regular daily or weekly routine for extracting phylogenetic tree figures from this journal.
- Killer feature: as far as I know, PLOS are the only publisher to embed rich metadata inside their figure image files.
- This makes satisfying the CC BY licence trivially easy: sufficient attribution metadata is already embedded in each file. Just make sure that wherever you upload the file doesn’t wipe this embedded data, which is why I chose Flickr as my initial upload platform.
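To check that an upload platform hasn’t wiped a figure’s embedded attribution, you can compare the file’s text metadata before and after upload. Below is a minimal Python sketch using only the standard library. It assumes the figure is a PNG whose metadata sits in the standard tEXt, zTXt or iTXt chunks (XMP packets in PNGs are stored in an iTXt chunk keyed `XML:com.adobe.xmp`). I haven’t confirmed which format or fields PLOS actually uses; TIFF originals would need a different reader, the function names are hypothetical, and CRC checks are skipped for brevity.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data: bytes) -> dict[str, str]:
    """Return keyword -> text for every tEXt, zTXt and iTXt chunk in a PNG.

    Sketch only: chunk CRCs are not verified.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    text = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + body + 4-byte CRC
        if ctype == b"tEXt":
            # keyword \0 Latin-1 text
            key, _, value = body.partition(b"\x00")
            text[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"zTXt":
            # keyword \0 compression-method byte, then zlib-compressed Latin-1 text
            key, _, rest = body.partition(b"\x00")
            text[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"iTXt":
            # keyword \0 compression flag, compression method,
            # language tag \0 translated keyword \0 UTF-8 text
            key, _, rest = body.partition(b"\x00")
            compressed = rest[0] == 1
            _lang, _, rest = rest[2:].partition(b"\x00")
            _translated, _, value = rest.partition(b"\x00")
            if compressed:
                value = zlib.decompress(value)
            text[key.decode("latin-1")] = value.decode("utf-8")
        elif ctype == b"IEND":
            break
    return text


def metadata_survived(original: bytes, uploaded: bytes) -> bool:
    """True if every text field of the original is still present, unchanged, after upload."""
    before = read_png_text(original)
    after = read_png_text(uploaded)
    return all(after.get(key) == value for key, value in before.items())
```

For example, you would read the downloaded copy of a figure with `open(path, "rb").read()` (where `path` is a placeholder) and pass it, together with the original file’s bytes, to `metadata_survived`.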
What does this enable or make easier?
On its own, this collection doesn’t do much yet; it’s still at an early stage. But it already gives us an important insight into the prevalence of the different visual display styles that researchers are using:
Source: Zerillo et al 2013 PLOS ONE. Carbohydrate-Active Enzymes in Pythium and Their Role in Plant Cell Wall and Storage Polysaccharide Degradation
‘geophylogeny’ (phylogeny displayed relative to a map of some sort, 2D or 3D)
Source: Guo et al 2012 PLOS ONE. Evolution and Biogeography of the Slipper Orchids: Eocene Vicariance of the Conduplicate Genera in the Old and New World Tropics
‘timescaled’ (phylogenies where the branch lengths are proportional to units of time or geological periods)
Source: Pol et al 2014 PLOS ONE. A New Notosuchian from the Late Cretaceous of Brazil and the Phylogeny of Advanced Notosuchians
Source: McDowell et al 2013 PLOS ONE. The Opportunistic Pathogen Propionibacterium acnes: Insights into Typing, Human Disease, Clonal Diversification and CAMP Factor Evolution
Arguably it also facilitates complex searches for specific types of phylogeny
e.g. analyses using cytochrome b
(You could use PLOS’s API to do this, particularly its figure/table caption search field, but you’d get a lot of false positives. This is an expert-curated collection that has filtered out non-phylogeny figures.)
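A caption search like this could also seed the regular weekly routine mentioned earlier, with the results then filtered by hand. The sketch below only builds a query URL for the past week; it doesn’t fetch anything. It assumes the Solr-style endpoint `https://api.plos.org/search`, the field names `figure_table_caption`, `journal` and `publication_date`, and the journal value `PLoS ONE`; please check all of these against PLOS’s current API documentation. The helper names are hypothetical.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumed endpoint and field names; verify against PLOS's API docs.
PLOS_SEARCH = "https://api.plos.org/search"


def weekly_window(today: date) -> tuple[str, str]:
    """Return (start, end) ISO dates covering the seven days before `today`."""
    start = today - timedelta(days=7)
    end = today - timedelta(days=1)
    return start.isoformat(), end.isoformat()


def caption_query_url(term: str, start: str, end: str,
                      journal: str = "PLoS ONE", rows: int = 100) -> str:
    """Build a search URL for articles whose figure/table captions mention `term`."""
    params = {
        "q": f'figure_table_caption:"{term}"',
        # Filter queries: journal name (assumed value) and publication-date range.
        "fq": [
            f'journal:"{journal}"',
            f"publication_date:[{start}T00:00:00Z TO {end}T23:59:59Z]",
        ],
        "fl": "id,title,publication_date",
        "rows": rows,
        "wt": "json",
    }
    # doseq=True turns the "fq" list into two separate fq= parameters.
    return PLOS_SEARCH + "?" + urlencode(params, doseq=True)
```

For example, `caption_query_url("cytochrome b", *weekly_window(date.today()))` would give a URL for last week’s articles whose captions mention cytochrome b. Every hit still needs checking, since caption matches alone produce the false positives described above.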
My initial roadmap is to cover PLOS ONE first, then the other PLOS journals, then BMC journals, and then possibly Zootaxa & Phytotaxa (Magnolia Press). There will be a GitHub-based website for the project soon; lots still to do…!
Want to know more / collaborate / critique?
I’ve got accepted lightning talks about the PLUTo project at iEvoBio in Raleigh, NC, later this year, and at the Bioinformatics Open Source Conference (BOSC) in Boston, MA.
Otherwise, contact me via Twitter (@rmounce), the comments section on this blog post, or email: ross dot mounce <at> gmail dot com