Show me the data!

Command-line access to research: getpapers

July 2nd, 2015 | Posted by rmounce in Content Mining | Open Science

With a first commit to github not so long ago (2015-04-13), getpapers is one of the newest tools in the ContentMine toolchain.

It’s also the most readily accessible and perhaps most immediately exciting – it does exactly what it says on the tin: it gets papers for you en masse without having to click around all those different publisher websites. A superb time-saver.

It kinda reminds me of mps-youtube: a handy CLI application for watching/listening to youtube.

Installation is super simple and usage is well documented at the source code repository on github, and of course it’s available under an OSI-approved open source MIT license.

An example usage querying Europe PubMedCentral

Currently you can search 3 different aggregators of academic papers: Europe PubMedCentral, arXiv, and IEEE. Copyright restrictions unfortunately mean that full text article download with getpapers is restricted to only freely accessible or open access papers. The development team plans to add more sources that provide API access in future, although it should be noted that many research aggregators simply don’t appear to have an API at the moment e.g. bioRxiv.

The speed of the overall process is very impressive. I ran the below search & download command and it executed it all in 32 seconds, including the download of 50 full text PDFs of the search-relevant articles!

You can choose to download different file formats of the search results: PDF, XML or even the supplementary data. Furthermore, getpapers integrates extremely well with the rest of the ContentMine toolchain, so it’s an ideal starting point for content mining.

getpapers is one of many tools in the ContentMine toolchain that I’ll be demonstrating to early career biologists at a FREE registration, one-day workshop at the University of Bath, Tuesday 28th July. If you’re interested in learning more about fully utilizing the research literature in scalable, reproducible ways, come along! We still have some places left. See the flyer below for more details or follow this link to the official workshop registration page:


One Response

Leave a Reply

Your email address will not be published. Required fields are marked *