Description
The Elsevier DataSearch (https://datasearch.elsevier.com) team participated in the bioCADDIE 2016 Dataset Retrieval Challenge. The results of the Challenge, along with the example and test queries, can be found here: https://biocaddie.org/biocaddie-2016-dataset-retrieval-challenge
We have submitted a paper to DATABASE: The Journal of Biological Databases and Curation that details our work in the Challenge (to be published in the latter half of 2017). The attached file, elsevier-submission.zip, contains the files elsevier[1-5].txt, which correspond to the five submitted runs described in the paper.
The following describes the code that we developed for the Challenge:
Aspire Content Processing by Search Technologies (https://www.searchtechnologies.com/en-gb/aspire):
Dictionary.xml - Loads dictionaries (MeSH, Genes, Solr fields) into Aspire so that they can be used to identify concepts in text (document or query).
QueryAnalyzer.xml - Receives a query, identifies concepts using the dictionaries and returns a response containing information about the concepts in the query.
ProcessJSON.xml - Processes the JSON documents: flattens the metadata, identifies MeSH and Gene concepts and embeds them in the text, and prepares the document to be indexed by Solr.
ProcessJSONSimple.xml - Sends JSON documents that were previously created by ProcessJSON.xml to Solr without any further processing beyond preparing them to be indexed. This is much quicker than running ProcessJSON.xml again.
All other aspects of Aspire (the Aspire framework, the content source that processes a folder of JSON files, and submission to Solr) are standard Aspire features with no customisation.
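To illustrate the two ideas behind ProcessJSON.xml and QueryAnalyzer.xml (flattening nested metadata and dictionary-based concept identification), here is a minimal Python sketch. It is not the Aspire configuration itself: the function names `flatten` and `tag_concepts`, the dotted-key naming scheme, the sample document, and the placeholder concept identifiers are all assumptions made for illustration.

```python
import re


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys (assumed scheme)."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot that was added to the prefix.
        flat[prefix[:-1]] = obj
    return flat


def tag_concepts(text, dictionary):
    """Return (term, concept_id) pairs for dictionary terms that occur in the text.

    Matching is case-insensitive and restricted to whole words.
    """
    found = []
    for term, concept_id in dictionary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append((term, concept_id))
    return found


# Hypothetical sample document and dictionary; the identifiers are placeholders.
doc = {"dataset": {"title": "BRCA1 expression in breast neoplasms",
                   "keywords": ["RNA-seq"]}}
dictionary = {"breast neoplasms": "MESH-PLACEHOLDER",
              "BRCA1": "GENE-PLACEHOLDER"}

flat = flatten(doc)
concepts = tag_concepts(flat["dataset.title"], dictionary)
```

The same `tag_concepts` helper could be applied to a query string as well as to document text, which mirrors how the dictionaries are described as serving both uses.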
Date made available | Jun 5 2017 |
---|---|
Publisher | Mendeley Data |