CORD-19 SciSpaCy Entity Dataset

Research output: Other contribution › peer-review


Dataset of biomedical entities extracted from two releases of the CORD-19 dataset (2020-08-28 and 2020-09-28) using models from the SciSpaCy project: NER models trained on CRAFT, JNLPBA, BC5CDR, and BioNLP, and NERL models that link to UMLS, MeSH, GO, HPO, and RxNorm. The entities are provided as structured Parquet files. The dataset may be useful for downstream tasks such as entity linking and relationship extraction. The work was carried out using Dask on the Saturn Cloud platform and was a joint effort between Elsevier Labs and Saturn Cloud.
Original language: American English
State: Published - Oct 24 2020

