Show me the data!

[Update 2015-09-19: since writing this, I notice my open access article has now been unpaywalled at Wiley’s site. No-one from Wiley has reached out to me to explain how, why, or when this happened. No compensation has been offered, nor any apology. I note that all the other articles in the special section, which should also be open access (CC BY) are still on sale, behind a paywall. Selling access to articles that should be open access is very scammy publishing. Shame on Wiley.]

I was recently invited to review a manuscript for Methods in Ecology and Evolution (MEE), a British Ecological Society journal that is published with Wiley.

I rejected the request and will from now on decline to review for all Wiley journals. In this post I duplicate my email to the Assistant Editor (Chris Greaves) explaining why. FWIW Chris has handled my letter extremely well and will forward it on for me to where it needs to be seen/read within the British Ecological Society.

Below is the email I sent earlier today in full:

from: Ross Mounce
date: 18 August 2015 at 11:57
subject: Re: Follow-up: Invitation to Review for Methods in Ecology and Evolution

Dear Chris,

Thank you (and Rich FitzJohn) for inviting me to review this manuscript.

It looks interesting from the abstract and in other circumstances I would certainly agree to review it.

However, I refused to review this manuscript and will refuse to review any subsequent manuscript for this publisher (Wiley) because I believe they are actively impeding progress in science by choosing to operate a predominately subscription-based business model – artificially restricting access to knowledge that taxpayers (through government funding) and charities predominantly fund. Furthermore they do an extremely poor job of it.

  • They produce but actively withhold full text XML (even from subscribers). Reputable open access publishers have no qualms in making their full text XML available to all. This is deeply frustrating for those interested in synthesis, reproducibility and getting the most from published science in a time-efficient manner. As the manuscript I was just asked to review was principally about ‘automated content analysis’ I find this particularly galling and I am wondering why the authors thought it was appropriate to submit this to such a journal.
  • They use an outdated back-end system: ‘ManuscriptCentral’ which is by all accounts an extremely poor system. Wiley have made huge profits each and every year in the past decade and yet seem completely unwilling to re-invest that in improving their systems. There wasn’t even a free text box to explain my reasons for declining to review this manuscript. Utterly poor, neglected design. Try PeerJ or Pensoft’s submission system. They have clearly worked hard and invested time and effort into making publishing research better for everyone, not just their own profit-margin.
  • Wiley’s hybrid open access charge ($3000) is outrageously expensive and bears no resemblance or link to the actual cost of production or services provided. I am aware of the ‘discount’ levied for British Ecological Society members (down to $2,250). The ‘discount’ is only gained if one of the authors pays ~ $80 to join BES (full, ordinary member rate). That is still far too high. For context, some other open access fees: PLOS ONE charges $1350, PeerJ just $99 per author (the manuscript I was just asked to review has only 4 authors), Ubiquity Press journals $500, and Biodiversity Data Journal is still FREE ($0) whilst in launch phase. This to me is strong evidence of either deep inefficiency or profit-gouging or a mixture of both on Wiley’s part, none of which are excusable. I am certainly not alone in thinking this. See recent tweets from Rob Lanfear (an excellent scientist):
  • Wiley are a significant player in the modern oligopoly of academic publisher knowledge racketeering. Data from FOI requests in the UK show that in the last five years (2010-2014), 125 UK Higher Education Institutes have collectively spent nearly £77,000,000 renting access to knowledge that Wiley has captured. That’s just the UK. Wiley doesn’t pay authors for their content, nor do they pay reviewers. I don’t know why the British Ecological Society (BES) partners with these racketeers – I find this arrangement severely detrimental to the goals of BES and academic research.
  • Like the other big knowledge racketeers Wiley operate a ‘big bundle’ subscription system. By adding BES journals to this big bundle of subscriber-only knowledge, it makes it harder for libraries around the world to cancel their subscriptions to this big bundle. Wiley know this and hence are actively trying to acquire as many good journals as possible (e.g. ESA journals) to make themselves ‘too big to cancel’.
  • On a personal note, I am particularly aggrieved with Wiley because they are currently, without my consent, charging $45.60 including tax, to ‘non-subscribers’ for access to one of my open access articles that they have copied over from where it is freely available at the original publisher. Charging $45.60 to access something that is freely available at the original publisher is simply astonishing and is just another facet to the lunacy of the many and multiple ways in which Wiley and companies like it seek to profiteer from and restrict access to research.

For all these reasons and many more I simply cannot agree to review manuscripts for any Wiley journal. I am already boycotting Elsevier, and am considering applying the same to subscription-access Nature Springer and Taylor & Francis journals for similar reasons.

I urge the British Ecological Society to reconsider their ‘partnership’ with this profiteering entity and to pursue publishing with organisations that are actually competent at modern 21st century academic publishing, particularly those that support and actively facilitate content mining e.g. Pensoft, PLOS, PeerJ, eLife, Ubiquity Press, MDPI and F1000Research, to name but a few.


Ross Mounce


I feel relieved to have done this. Having reviewed for Wiley only last month it didn’t feel right. Why would I help them whilst boycotting Elsevier? They are essentially as bad as each other. My position is more logically consistent now.

Many thanks to others who have also publicly written about refusing to review for legacy publishers. These posts certainly helped me in my decision-making:

Mike Taylor: Researchers! Stop doing free work for non-open journals!

Heather Piwowar: Sending A Message

Ethan White: Why I will no longer review for your journal

Casey Bergman: Just Say No – The Roberts/Ashburner Response

PS Having read Tom Pollard’s post on this matter, I might also write to one of the authors to explain why I declined to review their article. I wish them well and I look forward to reading their article when it comes out.

I read some sad news on Twitter recently. The Ecological Society of America has decided to publish its journals with Wiley:

Whilst I think the decision to move away from their old, unloved publishing platform is a good one, the move to publish their journals with Wiley is a strategically poor one. In this post I shall explain my reasoning and some of the widespread dissatisfaction with the direction of this change.

Society journals should not be a profit-driven business

The stated goals of The Ecological Society of America (ESA) are noble and I reproduce them here below to help you understand what the society in theory aims to do:

  • promote ecological science by improving communication among ecologists;
  • raise the public’s level of awareness of the importance of ecological science;
  • increase the resources available for the conduct of ecological science;
  • ensure the appropriate use of ecological science in environmental decision making by enhancing communication between the ecological community and policy-makers


Reading those four bullet points, it strikes me that a society with this stated mission should be a vanguard of the open access movement. An efficient, well-implemented open access publishing system, supported (and thus empowered) by the ESA would positively address all four of those goals.

Do I need to explain how open access would improve communication among ecologists? It should be obvious to most. Some facts:

Universities around the world do not have access to all subscription journals, not even Harvard. Wiley’s big journal bundle of subscriptions is no exception to this rule. Brock University in Canada is one such notable example. ‘Ecology and Evolution’ is one of two “main themes” of Brock’s Biology Department yet it does not have access to the Wiley bundle of subscription journals.

Furthermore, as the above tweet demonstrates, many ecologists are not based at universities. Nor are all readers of ecology journals ecologists, so it’s absolutely not sufficient to provide access to ecologists alone. It’s vital that policymakers and the public have access to the latest research, with no embargoes. Want evidence that policymakers lack access to research? Look no further than this blog post from a recent intern at the UK Parliamentary Office for Science & Technology (POST):

The level of access to journals was far lower than I had expected (it was actually shocking) – I ended up using my academic access throughout my placement.

Source: (2015-01-12)

If the ESA seriously wants to “ensure the appropriate use of ecological science in environmental decision making by enhancing communication between the ecological community and policy-makers” then making it easier for policymakers like those at POST to access research published in ESA journals would surely be a great way of doing that. How does the ESA expect to “raise the public’s level of awareness of the importance of ecological science” if most of the science that they themselves publish in their own journals is behind an expensive paywall? $20 for 30 day access to one article? Admittedly that’s cheaper than many but it’s simply not supportive of ESA’s mission.

Does this unnecessary paywall help raise the awareness of ecological science?


Lastly, with respect to increasing “resources available for the conduct of ecological science” the ESA urgently needs to consider the big picture here. Wiley, Springer Nature, Elsevier and other legacy publishers are a major drain on the financial resources available for research. With their big bundle deals they ransom/rent access to libraries for sums that can be up to many millions of dollars, every year, per institution. Money should instead be diverted into efficient, high-quality publishing systems like JMLR, Open Library of Humanities, PeerJ, Pensoft and Ubiquity Press to name but a few. All of these not only provide open access, but also high-quality publishing services at a significantly lower cost. Many provide added extras such as semantically-enhanced full-text XML which would make synthesis of ecological science easier. Wiley does not provide direct access to per article full-text XML even to its paying subscribers! They do half the job for thrice the price. Why would ESA want to help to sustain and enhance Wiley’s famous 42% profit margin? These legacy publishers are strategically merging, and acquiring journals in order to make it harder for libraries to cancel their dross-laden ‘big bundle’ subscription packages. It doesn’t seem like a logical decision to me or others.

Comparing this to other recent journal publishing changes

To put into context the ESA move to Wiley, let’s look at three other recent examples of academic societies changing publisher:

1.) Museum für Naturkunde Berlin journals (flipping to open access)

In 2014 all of their journals moved away from being published with Wiley. Their two zoological journals which have been around since before the ESA was even formed(!) transferred to open access publishing with Pensoft. Their Earth Science journal Fossil Record also moved away from Wiley, to open access publishing with Copernicus Publications. Guess what? The sky didn’t fall. I predict the articles in these journals will start being read, downloaded and cited more now that they are open access to everyone.

2.) Paleontological Society journals (switching to arguably a more benign, less profit-driven legacy publisher)

In 2015 PalSoc journals switched to being published with Cambridge University Press (CUP). I’m not super enthusiastic about CUP, but if a society really wants to do legacy publishing without worsening the stranglehold of the big publishing companies over libraries, then CUP or other university presses (Oxford, Johns Hopkins, Chicago) seem like safer custodians of academic intellectual property to me.

3.) American Society of Limnology and Oceanography (moving to Wiley)

To provide a fair comparison it’s important to look at what happens when a society journal joins Wiley. I know of one such case recently: ASLO journals. The transfer to Wiley was far from smooth or professional. In the few months that Wiley had the ASLO journals, they managed to ‘accidentally’ paywall thousands of articles that should have been available for free (as per ASLO’s wishes) and charged actual readers for reading these older should-be-free articles.  I paid $45.60 for access to one such ASLO article at Wiley – it should have been made available to all for free. Both Springer and Elsevier have also been caught doing this. The ESA currently makes some articles in its subscription journals ‘free to read’ to all, so I shall be closely monitoring the new Wiley-ESA journal websites when they launch, to see if they make the same conveniently profit-generating ‘mistakes’ again.

How did this happen? Who was consulted? Why was this choice made?

I for one was completely unaware that ESA were looking for a new publisher. I would have tried to help if I had known. I have many unanswered questions over the consultation process. For example, the ESA has an Open Science section and mailing list whose members are extremely knowledgeable about the academic publishing landscape and publishing technology.

Was the ESA membership in its entirety specifically and clearly asked which publisher they would like the ESA to publish with? Did they ask their membership what features they wanted from their new publishing platform? I would have requested a platform that provides access to semantically-enriched full text XML – Wiley does not provide this. Given a choice, and the vital context and information given above, I think few ESA members, policymakers, or members of the public would choose Wiley as ESA’s new publisher.

I gather from Twitter that “any and all” were invited to submit a proposal to publish ESA journals and that Elsevier submitted a proposal. But having a lazy tendering process only biases decisions towards major conglomerates who have the time, energy and resources to make slick proposals – I wonder if smaller but high-quality publishing companies were pro-actively approached by ESA to submit a proposal? In the public interest, I think the ESA should publish the names of all organisations who submitted proposals to publish ESA journals – I think just that data alone might potentially reveal flaws in the tendering process. I’m finding it really hard to reconcile the goals of ESA and shareholder-profits motivation of Wiley. I genuinely think the leadership of ESA is out of touch with its membership and that they may not have been properly consulted about this major change to the society.

This is a long post, and I’ve said enough, so I’ll leave it to a professional scholarly communications expert (Kevin Smith, Duke University), to have the last word about Wiley, and the recent trend towards cancelling Wiley subscriptions:

I don’t know if Wiley is the worst offender amongst the large commercial publishers, or whether there is a real trend toward cancelling Wiley packages.  But I know the future of scholarship lies elsewhere than with these large legacy corporations.



But perhaps we can turn this negative into positive by creating resources and impartial educational guides for academic societies on how to negotiate better publishing deals, and how to start a tendering process with an eye towards the inevitable future of open access? If SPARC or SPARC Europe already provides these resources please do point me at them!

With a first commit to github not so long ago (2015-04-13), getpapers is one of the newest tools in the ContentMine toolchain.

It’s also the most readily accessible and perhaps most immediately exciting – it does exactly what it says on the tin: it gets papers for you en masse without having to click around all those different publisher websites. A superb time-saver.

It kinda reminds me of mps-youtube: a handy CLI application for watching/listening to youtube.

Installation is super simple and usage is well documented at the source code repository on github, and of course it’s available under an OSI-approved open source MIT license.

An example usage querying Europe PubMedCentral

Currently you can search 3 different aggregators of academic papers: Europe PubMedCentral, arXiv, and IEEE. Copyright restrictions unfortunately mean that full text article download with getpapers is restricted to only freely accessible or open access papers. The development team plans to add more sources that provide API access in future, although it should be noted that many research aggregators simply don’t appear to have an API at the moment e.g. bioRxiv.

The speed of the overall process is very impressive. I ran the search & download command below and it completed in 32 seconds, including the download of 50 full text PDFs of the search-relevant articles!

getpapers --query 'flaveria c4' -p --outdir test

You can choose to download different file formats of the search results: PDF, XML or even the supplementary data. Furthermore, getpapers integrates extremely well with the rest of the ContentMine toolchain, so it’s an ideal starting point for content mining.
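As a rough sketch of switching formats, the same query can ask for XML and supplementary files instead of PDFs. The `-x` and `-s` flags are taken from the getpapers README as I remember it, so treat them as an assumption and check `getpapers --help` on your own install; the `query`, `outdir` and `fmt_flags` variable names are just for illustration.

```shell
# Sketch only: request full-text XML (-x) and supplementary files (-s)
# instead of PDFs (-p). Flags are assumed from the getpapers README.
query='flaveria c4'
outdir='test'
fmt_flags='-x -s'

if command -v getpapers >/dev/null 2>&1; then
  # $fmt_flags is deliberately unquoted so it splits into separate flags.
  getpapers --query "$query" $fmt_flags --outdir "$outdir" \
    || echo "getpapers exited with an error" >&2
else
  echo "getpapers not found on PATH" >&2
fi
```

Swapping `fmt_flags` back to `-p` should reproduce the PDF download above.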

getpapers is one of many tools in the ContentMine toolchain that I’ll be demonstrating to early career biologists at a FREE registration, one-day workshop at the University of Bath, Tuesday 28th July. If you’re interested in learning more about fully utilizing the research literature in scalable, reproducible ways, come along! We still have some places left. See the flyer below for more details or follow this link to the official workshop registration page:


Deep indexing supplementary data files

June 20th, 2015 | Posted by rmounce in Conservation Hackathon | Content Mining | Hack days

To prove my point about the way that supplementary data files bury useful data, making it utterly undiscoverable to most, I decided to do a little experiment (in relation to text mining for museum specimen identifiers, but also perhaps with some relevance to the NHM Conservation Hackathon):

I collected the links for all Biology Letters supplementary data files. I then filtered out the non-textual media such as audio, video and image files, then downloaded the remaining content.
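The filtering step could look something like the sketch below. It is not the exact command I ran: `links.txt` and `textual-links.txt` are hypothetical file names (one URL per line), and the list of media extensions is an assumption you would want to adjust to whatever actually turns up in your own link list.

```shell
# Sketch: drop links to audio, video and image files, keep everything else.
filter_textual() {
  # Reads URLs on stdin; -i ignores case, -v inverts the match.
  grep -Eiv '\.(mp3|wav|avi|mov|mp4|mpg|mpeg|wmv|jpg|jpeg|png|gif|tif|tiff|bmp)$'
}

# Only attempt the download if the (hypothetical) link list exists.
if [ -f links.txt ]; then
  filter_textual < links.txt > textual-links.txt
  wget --input-file=textual-links.txt --directory-prefix=downloads
fi
```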

A breakdown of file extensions encountered in this downloaded subset:

763 .doc files
543 .pdf files
109 .docx files
75 .xls files
53 .xlsx files
25 .csv files
19 .txt files
14 .zip files
2 .rtf files
2 .nex files
1 .xml file
1 “.xltx” file
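A tally like the one above can be produced with standard tools. This is a minimal sketch rather than the command I originally used; `count_extensions` is just an illustrative name, and it lower-cases extensions so that `.DOC` and `.doc` are counted together.

```shell
# Sketch: count files per extension under a directory, most common first.
count_extensions() {
  # $1: directory to scan (defaults to the current directory)
  find "${1:-.}" -type f -name '*.*' \
    | sed 's/.*\.//' \
    | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c | sort -rn
}

# Usage: count_extensions /path/to/downloaded/files
```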

I then converted some of these unfriendly formats into simpler, more easily searchable plain text formats:

out=/home/ross/work/royal-soc-si/biol-letters-supp-info/transformed
# Quote "$i" so that filenames containing spaces survive word splitting.
for i in *.zip ; do unzip "$i" -d "$out/unzipped_$i" ; done
for i in *.docx ; do docx2txt "$i" "$out/$i.txt" ; done
for i in *.doc ; do catdoc -a "$i" > "$out/$i.txt" ; done
# pdftotext writes to a file, not stdout, so pass the output path as its second argument.
for i in *.pdf ; do pdftotext "$i" "$out/${i%.pdf}.txt" ; done
for i in *.rtf ; do unrtf --text "$i" > "$out/$i.txt" ; done
# in2csv (from csvkit) reads both .xls and .xlsx.
for i in *.xls *.xlsx ; do in2csv "$i" > "$out/$i.csv" ; done


Now everything is properly searchable and indexable!

In a matter of seconds I can find NHM specimen identifiers that might not otherwise be mentioned in the full text of the paper, without actually wasting any time manually reading any papers. Note, not all the ‘hits’ are true positives but most are, and those that aren’t e.g. “NHMQEVLEGYKKKYE” are easy to distinguish as NOT valid NHM specimen identifiers:

$ grep -ior 'nhm............'
20120949_ESM_1.txt:NHMUK R6792), N
20120949_ESM_1.txt:NHMUK R8646) in
20120949_ESM_1.txt:NHMUK R36615, ‘
20120949_ESM_1.txt:NHMUK R36620), 
20120949_ESM_1.txt:NHMUK R16586). 
20120949_ESM_1.txt:NHMUK R36620) a
20120949_ESM_1.txt:NHMUK R16586) a
20120949_ESM_1.txt:NHMUK R6856 was
20120949_ESM_1.txt:NHMUK Charig ar
20120949_ESM_1.txt:NHMUK R6856 and
20120949_ESM_1.txt:NHMUK R6856 wer
20120949_ESM_1.txt:NHMUK R6856 wer
20120949_ESM_1.txt:NHMUK R6856 and
20120949_ESM_1.txt:NHM R6856 just 
20120949_ESM_1.txt:NHM R6856 (figu
20120949_ESM_1.txt:NHMUK R6856 had
20120949_ESM_1.txt:NHMUK R3592) an
20120949_ESM_1.txt:NHMUK R6856. Th
20120949_ESM_1.txt:NHMUK R6856). M
20120949_ESM_1.txt:NHMUK with the 
20120949_ESM_1.txt:NHMUK R6856 is 
20120949_ESM_1.txt:NHMUK R6856 sug
20120949_ESM_1.txt:NHMUK R6856. Th
20120949_ESM_1.txt:NHMUK R6856 sug
20120949_ESM_1.txt:NHMUK R6856, bu
20120949_ESM_1.txt:NHMUK R6586 is 
20120949_ESM_1.txt:NHMUK R6586 als
20120949_ESM_1.txt:NHMUK R6586, we
20120949_ESM_1.txt:NHMUK R6586 can
20120949_ESM_1.txt:NHMUK R6586 was
20120949_ESM_1.txt:NHMUK R6586 may
20120949_ESM_1.txt:NHMUK R6856 are
20120949_ESM_1.txt:NHMUK R6856) av
20120949_ESM_1.txt:NHMUK R6795) in
20120949_ESM_1.txt:NHMUK R6795 is 
20120949_ESM_1.txt:NHMUK R6856 and
20120949_ESM_1.txt:NHMUK R6856 and
20120949_ESM_1.txt:NHMUK R6856 was
20120949_ESM_1.txt:NHMUK R6856 fal
20120949_ESM_1.txt:NHMUK R6856 is 
20120949_ESM_1.txt:NHMUK R6856 + S
20120949_ESM_1.txt:NHMUK R6856 whe
20120949_ESM_1.txt:NHMUK R6856 + S
20120949_ESM_1.txt:NHMUK 1, Tanzan
20120949_ESM_1.txt:NHMUK R6856 and
20120949_ESM_1.txt:NHMUK Charig ar
20120949_ESM_1.txt:NHMUK R6856 to 
20120949_ESM_1.txt:NHMUK) for perm
20120949_ESM_1.txt:NHMUK) for acce
20120949_ESM_1.txt:NHMUK Image Res
20120949_ESM_1.txt:NHMUK, The Natu
rsbl20060505supp.txt:NHM uncataloged
rsbl20060505supp.txt:NHM uncataloged
rsbl20070502supp01.doc.txt:NHM) provided v
rsbl20090302supp3.doc.txt:NHM = The Natur
rsbl20090302supp3.doc.txt:NHMW = Natural 
rsbl20090302supp3.doc.txt:NHM E32070	Plan
rsbl20090302supp3.doc.txt:NHM EE5034	Plan
rsbl20090302supp3.doc.txt:NHM E4381	Plank
rsbl20090302supp3.doc.txt:NHM E10384	Plan
rsbl20090302supp3.doc.txt:NHM EE4825	Plan
rsbl20090302supp3.doc.txt:NHM E8389	Plank
rsbl20090302supp3.doc.txt:NHM EE8132	Plan
rsbl20090302supp3.doc.txt:NHM EE5585	Non-
rsbl20090302supp3.doc.txt:NHM EE ?	Non-pl
rsbl20090302supp3.doc.txt:NHM EE1961	?	?	
rsbl20090302supp3.doc.txt:NHM E35551	Plan
rsbl20090302supp3.doc.txt:NHM E76539	?	Up
rsbl20090302supp3.doc.txt:NHM EE4055	Plan
rsbl20090302supp3.doc.txt:NHM E81494	Plan
rsbl20090302supp3.doc.txt:NHM EE4631	?	Ap
rsbl20090302supp3.doc.txt:NHM EE4632	?	Ap
rsbl20090302supp3.doc.txt:NHM EE4641	Plan
rsbl20090302supp3.doc.txt:NHM E20098	Plan
rsbl20090302supp3.doc.txt:NHM EE4404	Plan
rsbl20090302supp3.doc.txt:NHM EE8397	Plan
rsbl20090302supp3.doc.txt:NHM EE2372	?	Ma
rsbl20090302supp3.doc.txt:NHM E79718	Plan
rsbl20090302supp3.doc.txt:NHM E40574	Plan
rsbl20090302supp3.doc.txt:NHM EE4524	Plan
rsbl20090302supp3.doc.txt:NHM E79415	Non-
rsbl20090302supp3.doc.txt:NHM E45372	?	Tu
rsbl20090302supp3.doc.txt:NHM EE2321	Plan
rsbl20090302supp3.doc.txt:NHM EE2262	Plan
rsbl20090302supp3.doc.txt:NHM EE4610	Plan
rsbl20090302supp3.doc.txt:NHM E4052	Non-p
rsbl20090302supp3.doc.txt:NHM EE191	Plank
rsbl20090302supp3.doc.txt:NHM EE2353	Plan
rsbl20090302supp3.doc.txt:NHM E4034	Plank
rsbl20090302supp3.doc.txt:NHM EE2432	Plan
rsbl20090302supp3.doc.txt:NHM E4176	Plank
rsbl20090302supp3.doc.txt:NHM EE4048	?	Ma
rsbl20090302supp3.doc.txt:NHM E9892	Plank
rsbl20090302supp3.doc.txt:NHM E4979	?	Tur
rsbl20090302supp3.doc.txt:NHM E75821	Plan
rsbl20090302supp3.doc.txt:NHM E40974	?	Se
rsbl20090302supp3.doc.txt:NHM E79094	Plan
rsbl20090302supp3.doc.txt:NHM E582	Plankt
rsbl20090302supp3.doc.txt:NHMW 2005z0083/
rsbl20090302supp3.doc.txt:NHM E82582	?	U.
rsbl20090302supp3.doc.txt:NHM EE7698	Plan
rsbl20090302supp3.doc.txt:NHM E9392	Plank
rsbl20090302supp3.doc.txt:NHM E73207	?	Al
rsbl20090302supp3.doc.txt:NHM E43810	Plan
rsbl20090302supp3.doc.txt:NHM 56422	?	Apt
rsbl20090302supp3.doc.txt:NHM E83246	Plan
20120949_ESM_5.txt:NHMUK R6856) am
rsbl2011364supp1.doc.txt:NHM-72.666; MCZ
20120949_ESM_3.txt:NHMUK R6856). P
20120949_ESM_3.txt:NHMUK R6856) in
rsbl20090778supp1.doc.txt:NHM as a contro
rsbl20090139supp1.txt:NHM, The Natura
rsbl20090139supp1.txt:NHM R1034). As
20120949_ESM_2.txt:NHMUK R6856) in
20120949_ESM_2.txt:NHMUK R6856) in
rsbl20080409supp01.doc.txt:NHMW, Naturhist
rsbl20130021supp1.doc.txt:NHM, Staatliche
rsbl20130021supp1.doc.txt:NHMUK PV R498 a
rsbl20130021supp1.doc.txt:NHMUK PV OR3612
rsbl20130021supp1.doc.txt:NHMUK PV R3938 
rsbl20130021supp1.doc.txt:NHMUK PV R5465)
rsbl20130021supp1.doc.txt:NHMUK PV OR2003
rsbl20130021supp1.doc.txt:NHMUK PV R1158)
rsbl20130021supp1.doc.txt:NHMUK PV R5595)
rsbl20130021supp1.doc.txt:NHMUK PV R4086)
rsbl20130021supp1.doc.txt:NHMUK and GLAHM
rsbl20130021supp1.doc.txt:NHM); Sveltonec
rsbl20130021supp1.doc.txt:NHMUK PV R11185
rsbl20130021supp1.doc.txt:NHM1284-R); Mal
rsbl20130021supp1.doc.txt:NHMUK PV R6682)
rsbl20130021supp1.doc.txt:NHMUK PV R6682)
rsbl20130021supp1.doc.txt:NHMUK in 1959, 
rsbl20130021supp1.doc.txt:NHMUK. While th
rsbl20130021supp1.doc.txt:NHMUK PV R6682 
rsbl20130021supp1.doc.txt:NHMUK PV R6682)
rsbl20130021supp1.doc.txt:NHMUK PV R6682,
rsbl20130021supp1.doc.txt:NHMUK PV R6682,
rsbl20130021supp1.doc.txt:NHMUK PV R6682 
rsbl20130021supp1.doc.txt:NHMUK) for the 
20120949_ESM_4.txt:NHMUK R6856) wh
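The fixed-width wildcard above is deliberately crude. A slightly tighter pattern, sketched below, cuts out junk such as protein sequences and NHMW (Vienna) numbers. The regular expression is my own guess based only on the identifier shapes visible in the hits above (e.g. "NHMUK R6856", "NHMUK PV OR3612", "NHM EE5034"), not an official NHM format, so expect it to miss some real identifiers and to let a few false positives such as "NHMUK 1" through. `extract_nhm_ids` is an illustrative name.

```shell
# Sketch: pull out candidate NHM specimen identifiers, de-duplicated.
# Assumed shape: NHM or NHMUK, optional " PV", a space, up to two
# capital letters, an optional space, then digits.
extract_nhm_ids() {
  # Reads text on stdin; prints one candidate identifier per line.
  LC_ALL=C grep -Eo 'NHM(UK)?( PV)? [A-Z]{0,2} ?[0-9]+' | LC_ALL=C sort -u
}

# Usage: cat *.txt | extract_nhm_ids
```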


Perhaps this approach might be useful to the PREDICTS / LPI teams, looking for species occurrence data sets?

I don’t know why figshare doesn’t do deep indexing by default – it’d be really useful to search the morass of published supplementary data that’s out there!