Elsevier's data and code for the bioCADDIE 2016 Dataset Retrieval Challenge

  • Peter Cotroneo (Creator)

Dataset

Description

The Elsevier DataSearch (https://datasearch.elsevier.com) team participated in the bioCADDIE 2016 Dataset Retrieval Challenge. The results of the Challenge, along with the example and test queries, can be found here: https://biocaddie.org/biocaddie-2016-dataset-retrieval-challenge

We have submitted a paper to DATABASE: The Journal of Biological Databases and Curation that details our work in the Challenge (to be published in the latter half of 2017). The attached file, elsevier-submission.zip, contains elsevier[1-5].txt, which correspond to the five runs we submitted, as described in the paper.

The following describes the code that we developed for the Challenge:

Aspire Content Processing by Search Technologies (https://www.searchtechnologies.com/en-gb/aspire):

Dictionary.xml - Loads dictionaries (MeSH, Genes, Solr fields) into Aspire so that they can be used to identify concepts in text (document or query).

QueryAnalyzer.xml - Receives a query, identifies concepts using the dictionaries and returns a response containing information about the concepts in the query.

ProcessJSON.xml - Processes the JSON documents: flattens the metadata, identifies MeSH and Gene concepts and embeds them in the text, and prepares the document to be indexed by Solr.

ProcessJSONSimple.xml - Enables JSON documents that have previously been created by ProcessJSON.xml to be sent to Solr without any further processing, which is much quicker than running ProcessJSON.xml again. Prepares the document to be indexed by Solr.
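The dictionary-based concept identification described for Dictionary.xml and QueryAnalyzer.xml can be illustrated with a minimal Python sketch. This is not the Aspire configuration itself: the `DICTIONARY` entries, the placeholder concept IDs, the `identify_concepts` function and the longest-match-first strategy are all assumptions made for illustration.

```python
import re

# Hypothetical toy dictionary: lowercase surface form -> (dictionary name, concept id).
# The IDs are placeholders, not real MeSH or gene identifiers.
DICTIONARY = {
    "breast cancer": ("MeSH", "mesh-0001"),
    "brca1": ("Gene", "gene-0001"),
    "lung": ("MeSH", "mesh-0002"),
}


def identify_concepts(text, dictionary=DICTIONARY, max_len=3):
    """Return dictionary concepts found in text, preferring the longest phrase."""
    tokens = re.findall(r"\w+", text.lower())
    concepts = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                source, concept_id = dictionary[phrase]
                concepts.append(
                    {"text": phrase, "dictionary": source, "id": concept_id}
                )
                matched = n
                break
        # Skip past the matched phrase, or advance one token if nothing matched.
        i += matched if matched else 1
    return concepts
```

Calling `identify_concepts("BRCA1 mutations in breast cancer")` would report one Gene concept and one MeSH concept, which is the kind of information a query analysis response could carry.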

All other aspects of Aspire (Aspire framework, content source to process a folder of JSON files, submission to Solr) are standard Aspire features with no customisation.
Date made available: Jun 5 2017
Publisher: Mendeley Data
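The metadata flattening step attributed to ProcessJSON.xml above can be sketched in Python as follows. The `flatten` function, the dotted field naming and the choice to collect every value into a list are assumptions for illustration; the actual field layout produced by the Aspire pipeline is not described here.

```python
def flatten(obj, prefix=""):
    """Flatten nested JSON into {dotted.field.name: [values]}.

    Every value is stored in a list so that repeated fields
    (for example, several creators) end up under one key.
    """
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            for k, v in flatten(value, child_prefix).items():
                flat.setdefault(k, []).extend(v)
    elif isinstance(obj, list):
        # List items share the parent's prefix; their values are merged.
        for item in obj:
            for k, v in flatten(item, prefix).items():
                flat.setdefault(k, []).extend(v)
    else:
        flat[prefix] = [obj]
    return flat
```

For a hypothetical record such as `{"dataset": {"creators": [{"name": "X"}, {"name": "Y"}]}}`, this yields a single `dataset.creators.name` field holding both names.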
