BASIL DB: bioactive semantic integration and linking database

David Jackson, Paul Groth, Hazar Harmouch

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Bioactive compounds found in foods and plants can provide health benefits, including antioxidant and anti-inflammatory effects. Research into their role in disease prevention and personalized nutrition is expanding, but data complexity, inconsistent methods, and the rapid growth of the scientific literature can hinder progress. To address these issues, we developed BASIL DB (BioActive Semantic Integration and Linking Database), a knowledge graph (KG) database that uses natural language processing (NLP) techniques to streamline data organization and analysis. This automated approach offers greater scalability and comprehensiveness than traditional methods such as manual data curation and entry.

Construction and content: BASIL DB is constructed in four steps: data collection, data preprocessing, data extraction, and data integration. Data on bioactives and foods are sourced from structured databases, and the relevant randomized controlled trials (RCTs) are retrieved from PubMed. The data are then prepared by cleaning inconsistencies and structuring them for analysis. In the data extraction phase, NLP tools, including a large language model (LLM), are used to analyze clinical trials and extract data on bioactive compounds and their health impacts. The integration phase compiles these data into a knowledge graph whose nodes are the entities Foods, Bioactives, and Health Conditions and whose edges are the interactions between them. To quantify these relationships, we generate a weight for each edge on the basis of empirical evidence and methodological rigor.

Utility and discussion: BASIL DB incorporates 433 compounds, 40,296 research papers, 7,256 health effects, and 4,197 food items. The database features query and visualization capabilities, including interactive graphs and custom filtering options, that showcase different aspects of the data. Users can explore the relationships between bioactives and health effects, enhancing both research efficiency and insight discovery.

Conclusion: BASIL DB is a knowledge graph database of bioactive compounds. This study provides a structured resource for exploring the relationships among bioactives, foods, and health outcomes, representing a step toward a more systematic and data-driven approach to understanding the health effects of bioactive compounds. Future work will focus on expanding the database and refining the methods used. Extending BASIL DB will help bridge the gap between traditional and conventional approaches to nutrition, guiding future research in bioactive compound discovery and health optimization.

Availability: Users can access and explore the data via https://basil-db.github.io/info.html or fork and run the respective script via https://github.com/basil-db/script.
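The graph structure described in the abstract (typed nodes for Foods, Bioactives, and Health Conditions, connected by weighted edges) can be illustrated with a minimal Python sketch. This is not the BASIL DB implementation: the class and function names are hypothetical, the node names are placeholders rather than database entries, and the weighting rule (the product of an evidence score and a rigor score, each in [0, 1]) is an assumption, since the abstract does not state the formula.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str
    weight: float


def edge_weight(evidence: float, rigor: float) -> float:
    """Hypothetical weighting rule: product of two scores in [0, 1].

    The abstract only says weights reflect empirical evidence and
    methodological rigor; this formula is an assumption for illustration.
    """
    if not (0.0 <= evidence <= 1.0 and 0.0 <= rigor <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    return evidence * rigor


class KnowledgeGraph:
    """Tiny in-memory graph with typed nodes and weighted, labeled edges."""

    def __init__(self) -> None:
        self.node_types: dict[str, str] = {}
        self.out_edges: dict[str, list[Edge]] = defaultdict(list)

    def add_node(self, name: str, node_type: str) -> None:
        self.node_types[name] = node_type

    def add_edge(self, source: str, target: str, relation: str, weight: float) -> None:
        # Both endpoints must already be registered as nodes.
        for name in (source, target):
            if name not in self.node_types:
                raise KeyError(f"unknown node: {name}")
        self.out_edges[source].append(Edge(source, target, relation, weight))

    def neighbors(self, source: str, min_weight: float = 0.0) -> list[Edge]:
        """Outgoing edges at or above min_weight, strongest first."""
        edges = [e for e in self.out_edges.get(source, []) if e.weight >= min_weight]
        return sorted(edges, key=lambda e: e.weight, reverse=True)


# Placeholder entities, not actual BASIL DB records.
kg = KnowledgeGraph()
kg.add_node("food_A", "Food")
kg.add_node("bioactive_X", "Bioactive")
kg.add_node("condition_Y", "HealthCondition")
kg.add_node("condition_Z", "HealthCondition")

kg.add_edge("food_A", "bioactive_X", "contains", 1.0)
kg.add_edge("bioactive_X", "condition_Y", "affects", edge_weight(1.0, 0.5))
kg.add_edge("bioactive_X", "condition_Z", "affects", edge_weight(0.5, 0.5))

# Filter to the better-supported effects of one bioactive.
strong = kg.neighbors("bioactive_X", min_weight=0.4)
```

Filtering by `min_weight` mirrors the kind of custom filtering the abstract mentions: a user can restrict a bioactive's health effects to those with stronger supporting evidence.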

Original language: English
Article number: 14
Journal: Journal of Biomedical Semantics
Volume: 16
Issue number: 1
State: Published - Dec 2025

Keywords

  • Bioactive compounds
  • Clinical trials
  • Data integration
  • Evidence-based health
  • Knowledge graph
  • Natural language processing (NLP)
  • PubMed
