Show me the data!

Open in order to unleash the power of text mining

October 23rd, 2017 | Posted by rmounce in Generation Open | Open Access

In 2017, we have a vast toolbox of informative methods to help us analyse large volumes of text: sentiment analysis, topic modelling, and named entity recognition, to name but a few of these exciting approaches. Computational power and storage capacity no longer limit what we could do with the 100 million or so journal articles that make up the ever-growing research literature. What is jarring is our continued observance of 17th-century limitations on how we can use research. Thanks to computers and the internet, we have the ability to do wonderful things, but the licensing and access restrictions placed on most of the research literature explicitly and artificially prevent most of us from trying. As a result, few researchers even consider text mining techniques – it is often simpler and easier to farm out repetitive, large-scale literature analysis tasks to an array of student minions and volunteers to do by hand – even though computers could, and perhaps should, be doing these analyses for us.

Inadequate computational access to research has already caused us great harm. Just ask the Ministry of Health in Liberia: they were not pleased to discover, after a lethal Ebola virus outbreak, that vital knowledge locked away in “forgotten papers” published in the 1980s clearly warned that the Ebola virus might be present in Liberia. This information wasn’t in the title, keywords, metadata, or abstract; it was hidden in the full text, behind a paywall. Full-text mining approaches could easily have found this buried knowledge and provided a vital early warning that Ebola could come to Liberia, which might have prevented some deaths during the West African Ebola virus epidemic (2013–2016).

Some subscription-based publishers have been known to use ‘defence’ mechanisms such as ‘trap URLs’ that hinder text miners, making it even harder to do basic research. Other subscription publishers, such as Royal Society Publishing, are helpfully supportive of text miners, as are open access publishers. Hindawi, for instance, allows anyone to download every single article they’ve ever published with a single mouse-click. Thanks to open licensing, aggregators like Europe PubMed Central can bring together the outputs of many different OA publishers, making millions of articles available with a minimum of fuss. It is “no bullshit” access. You want it? You can have it all. No need to beg permission, to spend months negotiating and signing additional contracts, or to use complicated publisher-controlled access APIs and their associated restrictions. Furthermore, OA publishers typically provide highly structured full-text XML files, which make things even easier for text miners. But only a small fraction of the research literature is openly licensed open access. It’s for these reasons and more that many of the best text-mining researchers work on, and enrich our understanding of, open access papers only, e.g. Florez-Vargas et al. 2016.
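Structured full-text XML is what makes the “buried knowledge” problem tractable. Here is a minimal sketch in Python's standard-library `xml.etree.ElementTree`. It assumes an article delivered as JATS-style XML. The element names follow common JATS usage. The sample article, its wording and the helper names `section_text` and `find_term` are hypothetical, made up purely for illustration. The sketch reports which parts of an article mention a term, showing a hit that only a body-text search would catch.

```python
import xml.etree.ElementTree as ET

# Hypothetical, made-up JATS-style article used only for illustration.
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <title-group><article-title>Serological survey of regional populations</article-title></title-group>
      <abstract><p>Antibody prevalence was measured in several populations.</p></abstract>
    </article-meta>
  </front>
  <body>
    <sec><title>Results</title>
      <p>Samples from Region A reacted with Pathogen B antigen.</p>
    </sec>
  </body>
</article>"""


def section_text(root, path):
    """Join all text under the first element matching `path`; '' if it is absent."""
    node = root.find(path)
    return " ".join(node.itertext()) if node is not None else ""


def find_term(xml_string, term):
    """Return, for title, abstract and body, whether `term` appears (case-insensitive)."""
    root = ET.fromstring(xml_string)
    parts = {
        "title": section_text(root, "./front/article-meta/title-group/article-title"),
        "abstract": section_text(root, "./front/article-meta/abstract"),
        "body": section_text(root, "./body"),
    }
    needle = term.lower()
    return {name: needle in text.lower() for name, text in parts.items()}


if __name__ == "__main__":
    # Only the body mentions the term, so a title/abstract-only search would miss it.
    print(find_term(SAMPLE_JATS, "pathogen b"))
```

Real publisher XML carries much more structure, such as DOCTYPE declarations, nested sections and MathML. A serious pipeline would need more careful handling. The point of the sketch is only that openly licensed, structured full text can be searched directly, with no scraping or permission-seeking.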

So if I had but one wish this Christmas, it would be for the artificial, legally imposed restrictions on the bulk download and analysis of research texts to be unambiguously removed for everyone, worldwide, so that no researcher need fear imprisonment or other punitive action simply for doing justified and ethical academic research. Unchain the literature, and we might be able to properly unleash and apply the collected knowledge of humanity.

 

This is my short contribution to Open Access Week 2017 and to the #OpenInOrderTo website created by SPARC, which aims to move beyond talking about openness in itself and to focus on what openness enables.

 

New Career, Same Me

April 17th, 2017 | Posted by rmounce in Open Access

This is a quick post to announce what I’ll be doing next after my postdoc at the Department of Plant Sciences, University of Cambridge. From June 2017 onwards, I’m delighted to say I’ll be the new Open Access Grants Manager for Arcadia Fund.

About Arcadia Fund

If you haven’t heard of it before, here’s what you need to know: Arcadia is a charitable fund set up by Peter Baldwin and Lisbet Rausing in 2002. So far, it has awarded more than $440 million to cultural, environmental and open access projects. Within its open access funding programme, Arcadia has awarded grants to organisations including Creative Commons, Wikimedia Foundation, Authors Alliance, Public.Resource.Org, Internet Archive, Digital Public Library of America and more…

New Career, Same Me

When the job ad came up, I could scarcely believe how good the organisational fit was for me: Arcadia funds brilliant projects in this space. I am genuinely looking forward to developing and advising on Arcadia’s open access policy, continuing to engage with the wider open access community, managing Arcadia’s existing grants portfolio, and identifying new opportunities for high-impact initiatives where funding from Arcadia will make a difference.

I feel extremely grateful to have been chosen for this position over many other talented and experienced applicants (and friends!), and although it’ll take me many months to ‘learn the ropes’, I see this as my new career now, with no going back. I’m now part of the 88% majority of UK postdocs who never secure a tenured position in academia; but don’t feel sorry for me – I’m delighted with this new direction. New career, same me.

A lot of passionate, intelligent young people with an academic background have jobs where they can really make a difference (i.e. not in academia). In this regard, I’m inspired by the likes of TJ Bliss at Hewlett Foundation, Carly Strasser at Moore Foundation, Nick Shockey at SPARC, Heather Piwowar and Jason Priem at Impactstory, Joe McArthur at the Right to Research Coalition, and Jonathan Gray at Open Knowledge. Now I’ve turned 30, I’m married, and I have a beautiful baby daughter. Some things have changed, but my passion for open knowledge hasn’t. Doing ‘open’ on the side of research wasn’t enough. Soon it’ll be my full-time endeavour!

There is a lot of really interesting work being published over at Research Ideas and Outcomes (RIO). If you aren’t already following the updates, you can do so via RSS, Twitter, or email (scroll to the bottom for sign-up).

In this post I’m going to discuss why Chad Hammond’s contribution is so remarkable and why it could represent an exciting model for a more transparent and more immediate future of scholarly communications.

[Image: Version 1 of the proposal]

So, what’s special?

Well, to state the obvious first: it’s a grant proposal, not a research article. RIO Journal has published quite a lot of research proposals now; they are becoming a real strength of the journal. But that’s not the really interesting thing about it. The really cool thing is that Chad published this grant proposal with RIO before submitting it to the funder (Canadian Institutes of Health Research) for evaluation.

You’ll see the publication date of Version 1 of the work is 24th March 2016. Pleasingly, after publication in RIO, Chad’s proposal was evaluated by CIHR and awarded research funding. Chad received news of this in late April.

…and the story gets even better from here because thanks to RIO’s unique technology called ARPHA, Chad was able to re-import his published article back into editing mode, to update the proposal to acknowledge that it had been funded:

This proposal was submitted to and received funding from the annual Canadian Institutes of Health Research (CIHR) competition for postdoctoral fellowships.

The updated proposal was then checked by the editorial team and republished as an updated version of the original proposal, Version 2. This makes use of CrossMark technology to formally link the two versions and to ensure readers are always made aware if a newer version of the work exists. Chad’s updated proposal now has a little ‘Funded’ button appended to it (see below), to indicate that this proposal has been successfully funded. We hope to see many more such successfully funded proposals published at RIO.

[Image: title and metadata of Version 2]

With permission given, Chad was also able to supply some of the reviewer comments passed to him from CIHR reviewers as supplementary data to the updated Version 2 proposal. These will undoubtedly provide invaluable insight into reviewing processes for many.

Finally, for funders and publishing-tech geeks: you should really take note of the lovely machine-readable XML version of Chad’s proposal. Pensoft provides machine-readable XML output as standard, not just PDF and HTML. Funding agencies around the world would do well to think closely about the value of receiving grant proposals as machine-readable XML. There’s serious value in this, and I think it’s something we’ll see more of in the future. Pensoft is actively looking to work with funders to further develop these ideas and approaches for genuinely adding value to scholarly communications.
RIO is truly an innovative journal, don’t you think?
:)

References

version 1:
Hammond C (2016) Widening the circle of care: An arts-based, participatory dialogue with stakeholders on cancer care for First Nations, Inuit, and Métis peoples in Ontario, Canada. Research Ideas and Outcomes 2: e8615. doi: 10.3897/rio.2.e8615

version 2:
Hammond C (2016) Widening the circle of care: An arts-based, participatory dialogue with stakeholders on cancer care for First Nations, Inuit, and Métis peoples in Ontario, Canada. Research Ideas and Outcomes 2: e9115. doi: 10.3897/rio.2.e9115

Just a quick update to let you know how the new Research Ideas and Outcomes (RIO) journal is going. You may remember I wrote a blog post here explaining my enthusiasm for this new journal. I’m delighted to say it is exceeding my expectations.

Following a launch covered by Science (AAAS) News, Nature News, and Times Higher Education, amongst others, RIO has now published many interesting and highly novel outputs.

My choice of the word ‘outputs’ rather than ‘articles’ is very deliberate. RIO is a sophisticated platform that publishes more than just articles. Central to the ethos of the journal is that academia should publish entire research cycles, not just traditional research articles. So in our first 24 published outputs there is impressive diversity on show. Below is a breakdown of these published outputs by type:

One Editorial

  • Mietchen D, Mounce R, Penev L (2015) Publishing the research process. Research Ideas and Outcomes 1: e7547. doi: 10.3897/rio.1.e7547

Ten Grant Proposals

  • Martone M, Murray-Rust P, Molloy J, Arrow T, MacGillivray M, Kittel C, Kasberger S, Steel G, Oppenheim C, Ranganathan A, Tennant J, Udell J (2016) ContentMine/Hypothes.is Proposal. Research Ideas and Outcomes 2: e8424. doi: 10.3897/rio.2.e8424
  • Susi T (2015) Heteroatom quantum corrals and nanoplasmonics in graphene (HeQuCoG). Research Ideas and Outcomes 1: e7479. doi: 10.3897/rio.1.e7479
  • Simms S, Jones S, Ashley K, Ribeiro M, Chodacki J, Abrams S, Strong M (2016) Roadmap: A Research Data Management Advisory Platform. Research Ideas and Outcomes 2: e8649. doi: 10.3897/rio.2.e8649
  • Mietchen D, Hagedorn G, Willighagen E, Rico M, Gómez-Pérez A, Aibar E, Rafes K, Germain C, Dunning A, Pintscher L, Kinzler D (2015) Enabling Open Science: Wikidata for Research (Wiki4R). Research Ideas and Outcomes 1: e7573. doi: 10.3897/rio.1.e7573
  • Wagner S (2015) Continuous and Focused Developer Feedback on Software Quality (CoFoDeF). Research Ideas and Outcomes 1: e7576. doi: 10.3897/rio.1.e7576
  • Hartgerink C, George S (2015) Problematic trial detection in ClinicalTrials.gov. Research Ideas and Outcomes 1: e7462. doi: 10.3897/rio.1.e7462
  • Hammond C (2016) Widening the circle of care: An arts-based, participatory dialogue with stakeholders on cancer care for First Nations, Inuit, and Métis peoples in Ontario, Canada. Research Ideas and Outcomes 2: e8615. doi: 10.3897/rio.2.e8615
  • Tóth J (2016) Tools of Persuasion in Visual Advertisements at Maltese Sites of Cultural Tourism: A Social Science Analysis. Research Ideas and Outcomes 2: e8726. doi: 10.3897/rio.2.e8726
  • Wojnarski M, Hanken Kurtz D (2016) Paperity Central: An Open Catalog of All Scholarly Literature. Research Ideas and Outcomes 2: e8462. doi: 10.3897/rio.2.e8462
  • Koureas D, Hardisty A, Vos R, Agosti D, Arvanitidis C, Bogatencov P, Buttigieg P, de Jong Y, Horvath F, Gkoutos G, Groom Q, Kliment T, Kõljalg U, Manakos I, Marcer A, Marhold K, Morse D, Mergen P, Penev L, Pettersson L, Svenning J, van de Putte A, Smith V (2016) Unifying European Biodiversity Informatics (BioUnify). Research Ideas and Outcomes 2: e7787. doi: 10.3897/rio.2.e7787

One PhD Project Plan

  • Senderov V, Penev L (2016) The Open Biodiversity Knowledge Management System in Scholarly Publishing. Research Ideas and Outcomes 2: e7757. doi: 10.3897/rio.2.e7757

Two Data Management Plans

  • Fisher J, Nading A (2016) A Political Ecology of Value: A Cohort-Based Ethnography of the Environmental Turn in Nicaraguan Urban Social Policy. Research Ideas and Outcomes 2: e8720. doi: 10.3897/rio.2.e8720
  • Pannell J (2016) Data Management Plan for PhD Thesis “Climatic Limitation of Alien Weeds in New Zealand: Enhancing Species Distribution Models with Field Data”. Research Ideas and Outcomes 2: e8664. doi: 10.3897/rio.2.e8664

Four Research Ideas

  • Gordon R (2016) Partial synchronization of the colonial diatom Bacillaria “paradoxa”. Research Ideas and Outcomes 2: e7869. doi: 10.3897/rio.2.e7869
  • Vyshedskiy A, Dunn R (2015) Mental synthesis involves the synchronization of independent neuronal ensembles. Research Ideas and Outcomes 1: e7642. doi: 10.3897/rio.1.e7642
  • Zou Y (2015) Determining the direction of a gamma-ray burst’s jet in its host galaxy. Research Ideas and Outcomes 1: e7506. doi: 10.3897/rio.1.e7506
  • Page R (2016) Towards a biodiversity knowledge graph. Research Ideas and Outcomes 2: e8767. doi: 10.3897/rio.2.e8767

One Methods Article

  • Abdullah N (2016) Vertical-Horizontal Regulated Soilless Farming via Advanced Hydroponics for Domestic Food Production in Doha, Qatar. Research Ideas and Outcomes 2: e8134. doi: 10.3897/rio.2.e8134

One Research Article

  • Chen R, Shen T, Tsai K, Hu C (2016) Pericardial window operation for malignant pericardial effusion may have worse outcomes for lung cancer than the other cancers. Research Ideas and Outcomes 2: e8758. doi: 10.3897/rio.2.e8758

Three Workshop Reports

  • Wetzel F, Hoffmann A, Häuser C, Vohland K (2016) 1st EU BON Stakeholder Roundtable (Brussels, Belgium): Biodiversity and Requirements for Policy. Research Ideas and Outcomes 2: e8600. doi: 10.3897/rio.2.e8600
  • Vohland K, Häuser C, Regan E, Hoffmann A, Wetzel F (2016) 2nd EU BON Stakeholder Roundtable (Berlin, Germany): How can a European biodiversity network support citizen science? Research Ideas and Outcomes 2: e8616. doi: 10.3897/rio.2.e8616
  • Vohland K, Hoffmann A, Underwood E, Weatherdon L, Bonet F, Häuser C, Wetzel F (2016) 3rd EU BON Stakeholder Roundtable (Granada, Spain): Biodiversity data workflow from data mobilization to practice. Research Ideas and Outcomes 2: e8622. doi: 10.3897/rio.2.e8622

One Project Report

  • Egloff W, Agosti D, Patterson D, Hoffmann A, Mietchen D, Kishor P, Penev L (2016) Data Policy Recommendations for Biodiversity Data. EU BON Project Report. Research Ideas and Outcomes 2: e8458. doi: 10.3897/rio.2.e8458

Sustainable Development Goals (SDGs)

Another feature of RIO is that all articles are labelled with their relevant Sustainable Development Goals. Interestingly, RIO has attracted 13 outputs that relate to SDG number 9: ‘Industry, Innovation and Infrastructure’. I take this as a great compliment to the journal: it suggests that authors interested in true innovation and scholarly infrastructure are attracted to RIO.

Openly Published Data Management Plans (DMPs)

I pushed hard to make sure Data Management Plans were included as their own distinct output type in RIO, so I’m really glad to see two exemplar DMPs published, as well as the Roadmap research proposal, which also relates to DMPs. A lot of US and UK researchers see funder-imposed DMPs as a bureaucratic checkbox exercise of little value to them. I hope that being able to publish a DMP will help researchers see the point of it a little more: the documents suddenly have value and meaning beyond the grant proposal process, because other people can and will read them.

We have more DMPs in the pipeline too, so keep watching!

If you want to keep up to date with everything that gets published at RIO, follow the RSS feed, the journal’s Twitter feed, or the Facebook group. You can also read more blog posts about RIO on the official RIO Journal blog.