Category: Content Mining

  • Traditional Publishers: please stop blocking research

    OpenCon 2015 Brussels was an amazing event. I’ll save a summary of it for the weekend but in the meantime, I urgently need to discuss something that came up at the conference. At OpenCon, it emerged that Elsevier have apparently been blocking Chris Hartgerink’s attempts to access relevant psychological research papers for content mining.…

  • Using the NHM Data Portal API

    Anyone care to remember how awful and unusable the web interface for accessing the NHM’s specimen records used to be? Behold the horror below as it was in 2013, or visit the Web Archive to see just how bad it was. It’s not even the ‘look’ of it that was the major problem – it was…

  • Command-line access to research: getpapers

    With a first commit to GitHub not so long ago (2015-04-13), getpapers is one of the newest tools in the ContentMine toolchain. It’s also the most readily accessible and perhaps most immediately exciting – it does exactly what it says on the tin: it gets papers for you en masse without having to click around…

  • Deep indexing supplementary data files

    To prove my point about the way that supplementary data files bury useful data, making it utterly undiscoverable to most, I decided to do a little experiment (in relation to text mining for museum specimen identifiers, but also perhaps with some relevance to the NHM Conservation Hackathon): I collected the links for all Biology Letters…

  • Progress on specimen mining

    I’ve been on holiday to Japan recently, so work came to a halt on this for a while, but I think I’ve largely ‘done’ PLOS ONE full text now (excluding supplementary materials). My results are on GitHub: https://github.com/rossmounce/NHM-specimens/tree/master/results – one prettier file without the exact provenance or in-sentence context of each putative specimen entity, and one more…

  • BMNH specimens used in PLOS ONE

    In this post I’ll go through an illustrated example of what I plan to do with my text mining project: linking up biological specimens from the Natural History Museum, London (sometimes known as BMNH or NHMUK) to the published research literature with persistent identifiers. I’ve run some simple grep searches of the PMC open access subset…