Command-line access to research: getpapers
July 2nd, 2015 | Posted by in Content Mining | Open Science
It’s also the most readily accessible and perhaps the most immediately exciting: it does exactly what it says on the tin, getting papers for you en masse without your having to click around all those different publisher websites. A superb time-saver.
It kinda reminds me of mps-youtube: a handy CLI application for watching/listening to YouTube.
Currently you can search three different aggregators of academic papers: Europe PubMed Central, arXiv, and IEEE. Unfortunately, copyright restrictions mean that getpapers can only download the full text of freely accessible or open access papers. The development team plans to add more sources that provide API access in future, although many research aggregators, e.g. bioRxiv, simply don’t appear to have an API at the moment.
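To compare the same query across those sources, getpapers takes the source name through its --api option: eupmc (Europe PubMed Central, the default), arxiv or ieee. The script below is a minimal sketch resting on a few assumptions. The flag names come from getpapers' own --help output and may differ in the version you have installed. count_hits, DRY_RUN and the test_<api> output folders are hypothetical names of my own and are not part of getpapers. The -n (--noexecute) flag asks getpapers to report how many papers match without downloading anything. By default the script only prints the commands, so it is safe to run even without getpapers installed.

```shell
#!/bin/sh
# Hypothetical helper, not part of getpapers: run (or, in dry-run mode,
# just print) a hit-count query against one source API.
# -n / --noexecute asks getpapers to report how many papers match
# without downloading anything.
count_hits() {
  api="$1"
  query="$2"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    # Dry run (the default here): show the command instead of running it,
    # so the sketch works even where getpapers is not installed.
    printf "getpapers --api %s --query '%s' --outdir test_%s -n\n" \
      "$api" "$query" "$api"
  else
    getpapers --api "$api" --query "$query" --outdir "test_$api" -n
  fi
}

# The three sources supported at the time of writing.
for api in eupmc arxiv ieee; do
  count_hits "$api" 'flaveria c4'
done
```

If you save this as count_hits.sh, running `DRY_RUN=0 sh count_hits.sh` performs the real queries. Keeping a dry-run default makes it easy to check the exact command lines before touching the network.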
The speed of the overall process is very impressive. I ran the search-and-download command below and it finished in 32 seconds, including downloading the full-text PDFs of 50 relevant articles!
getpapers --query 'flaveria c4' -p --outdir test
You can choose which file formats to download for the search results: PDF, XML or even the supplementary data. Furthermore, getpapers integrates extremely well with the rest of the ContentMine toolchain, so it’s an ideal starting point for content mining.
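Each of those formats corresponds to its own getpapers flag: -x (--xml) for full-text XML, -p (--pdf) for full-text PDFs and -s (--supp) for supplementary files. These flag names are taken from getpapers' --help output, so check them against your installed version. The helper below is a hypothetical sketch that is not part of getpapers. It spells out that mapping and prints the resulting command for the query used in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of getpapers): turn format names into the
# getpapers download flags. Flag meanings as listed by `getpapers --help`:
#   xml  -> -x (--xml)   full-text XML, where available
#   pdf  -> -p (--pdf)   full-text PDF, where available
#   supp -> -s (--supp)  supplementary files, where available
format_flags() {
  flags=""
  for fmt in "$@"; do
    case "$fmt" in
      xml)  flags="$flags -x" ;;
      pdf)  flags="$flags -p" ;;
      supp) flags="$flags -s" ;;
      *)    echo "unknown format: $fmt" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "${flags# }"   # drop the leading space
}

# Print the command that would fetch XML, PDFs and supplementary data.
# $flags is left unquoted on purpose: when you run the command for real,
# each flag must become its own argument.
flags=$(format_flags xml pdf supp)
echo getpapers --query "'flaveria c4'" $flags --outdir test
```

Removing the `echo` runs the download for real. The flags can be combined freely in a single call, so one run can collect every format you need.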
getpapers is one of many tools in the ContentMine toolchain that I’ll be demonstrating to early-career biologists at a one-day workshop (registration is FREE) at the University of Bath on Tuesday 28th July. If you’re interested in learning how to make full use of the research literature in scalable, reproducible ways, come along! We still have some places left. For more details, follow this link to the official workshop registration page: bit.ly/MiningWrkshp