PLOS ONE PHYLOGENY
May 7th, 2014 | Posted in Content Mining | Open Data | PLoS | Open Science | PLUTo
I’m proud to announce an interesting public output from my BBSRC-funded postdoc project:
PLUTo: Phyloinformatic Literature Unlocking Tools. Software for making published phyloinformatic data discoverable, open, and reusable
I’ve made openly available my first-pass filter of PLOS ONE phylogeny figures (I’m not in any way claiming this is *all* of them).
This curated & tagged image collection is on Flickr for easy browsing: http://bit.ly/PLOStrees
It is also on GitHub for version control, open archiving, and collaboration (I have remote collaborators). GitHub doesn’t like repositories over 1GB, so I’ve had to split the content across 4 separate repositories.
The aim of the PLUTo project is to re-extract & liberate phylogenetic data & associated metadata from the research literature. Sadly, only ~4% of modern published phylogenetic analysis studies make their underlying data available. Another study finds that if you ask the authors for this data, only 16% will be kind enough to reply with the requested data!
This particular data type is a cornerstone of modern evolutionary biology. You’ll find phylogenetic analyses across a whole host of journal subjects – medical, ecological, natural history, palaeontology… There are also many different ways in which this data can be re-used, e.g. supertrees & comparative cladistics, not to mention simple validation studies and/or analyses which extend upon or map new data on to a phylogeny. It’s really useful data and we should be archiving it for future re-use and re-analysis. To my great delight, this is what I’m being paid to attempt to do for my first postdoc, on a grant I co-wrote: finding & liberating phylogenetic data for everyone!
Why PLOS ONE?
- It’s a BOAI-compliant open access journal that publishes most articles under CC BY, with a few under CC0.
- This means I can openly re-publish figures online (provided sufficient attribution is given) — no need to worry about DMCA takedown notices or ‘getting sued’! This makes the process of research much easier. Private, non-public, access-restricted repositories for collaboration are a hassle I’d rather do without.
- It’s a high-volume ‘megajournal’ publishing ~100 articles per day, many of which include phylogenetic analyses.
- Thus it’s worthwhile establishing a regular daily or weekly method for parsing out phylogenetic tree figures from this journal.
- Killer feature: as far as I know, PLOS are the only publisher to embed rich metadata inside their figure image files.
- This makes satisfying the CC BY licence trivially easy — sufficient attribution metadata is already embedded in the file. Just ensure that wherever you’re uploading the file to doesn’t wipe this embedded data, hence why I chose Flickr as my initial upload platform.
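One rough way to check whether an upload platform has wiped the embedded data is to look for a metadata packet in the file before and after uploading. The sketch below assumes the metadata is stored as an XMP packet (plain XML between `<x:xmpmeta` and `</x:xmpmeta>`). That is an assumption for illustration, not a confirmed detail of how PLOS embeds its metadata. `extract_xmp` and `has_embedded_metadata` are hypothetical helper names, not part of any PLOS or Flickr tooling.

```python
import re
from typing import Optional

# ASSUMPTION: the embedded metadata is an XMP packet, i.e. plain XML
# delimited by <x:xmpmeta ...> and </x:xmpmeta> somewhere in the file.
XMP_PATTERN = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def extract_xmp(image_bytes: bytes) -> Optional[str]:
    """Return the first XMP packet found in the raw bytes, or None."""
    match = XMP_PATTERN.search(image_bytes)
    if match is None:
        return None
    return match.group(0).decode("utf-8", errors="replace")


def has_embedded_metadata(path: str) -> bool:
    """True if the file at `path` still contains an XMP packet."""
    with open(path, "rb") as handle:
        return extract_xmp(handle.read()) is not None
```

Running `has_embedded_metadata` on the original figure and on a copy downloaded back from the upload platform would show whether the packet survived. It says nothing about metadata stored in other formats.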
What does this enable or make easier?
On its own, this collection doesn’t do much, as this is still an early stage, but it gives us an important insight into the prevalence of certain types of visual display-style that researchers are using:
- ‘geophylogeny’ (phylogeny displayed relative to a map of some sort, 2D or 3D)
- ‘timescaled’ (phylogenies where the branch lengths are proportional to units of time or geological periods)
Arguably it also facilitates complex searches for specific types of phylogeny
e.g. analyses using cytochrome b
(you could use PLOS’s API to do this, particularly their figure/table caption search field — but you’d get a lot of false positives — this is an expert-curated collection that has filtered-out non-phylo figures)
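For comparison, a caption search against the PLOS Search API might be built along the following lines. This is only a sketch. It assumes the Solr-style endpoint at `http://api.plos.org/search`, a caption field named `figure_table_caption`, and an optional `api_key` parameter, all of which should be checked against the current PLOS API documentation. `build_caption_query` is a hypothetical helper, and the code only builds the URL rather than sending the request.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint and field names, to be verified against the PLOS API docs.
PLOS_SEARCH_URL = "http://api.plos.org/search"
CAPTION_FIELD = "figure_table_caption"


def build_caption_query(term: str, rows: int = 50, api_key: str = "") -> str:
    """Build a search URL matching `term` as a phrase in figure/table captions.

    Terms that themselves contain double quotes are not handled here.
    """
    params = {
        "q": f'{CAPTION_FIELD}:"{term}"',     # phrase search in the caption field
        "fl": "id,title,publication_date",    # fields to return for each hit
        "wt": "json",                         # response format
        "rows": rows,                         # maximum number of hits
    }
    if api_key:
        params["api_key"] = api_key           # ASSUMPTION: key passed as a query parameter
    return PLOS_SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_caption_query("cytochrome b", rows=10))
```

Every hit from a query like this would still need checking by eye, since a caption that mentions cytochrome b is not necessarily attached to a phylogeny figure.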
In my initial roadmap, the plan is to do PLOS ONE, the other PLOS journals, then BMC journals, then possibly Zootaxa & Phytotaxa (Magnolia Press). There will be a GitHub-based website for the project soon, lots still to do…!
Want to know more / collaborate / critique?
I’ve got an accepted lightning talk about the PLUTo project at iEvoBio in Raleigh, NC later this year, as well as an accepted lightning talk at the Bioinformatics Open Source Conference (BOSC) in Boston, MA.
Otherwise, contact me via twitter @rmounce, the comment section on this blog post, or email ross dot mounce <at> gmail dot com