ChemFOnt (the Chemical Functional Ontology) is a hierarchical, OWL-compatible ontology describing the functions and actions of more than 319,000 biologically important chemicals. It is intended to bring the same rigor, standardization and formal structure to the terminology used in biochemistry, food chemistry and environmental chemistry as the Gene Ontology (GO) has brought to molecular biology.
ChemFOnt is available as both a freely accessible, web-enabled database and a downloadable OWL file. Users may download and deploy ChemFOnt within their own chemical databases or integrate ChemFOnt into their own analytical software to generate machine-readable relationships that can be used to make new inferences, enrich their metabolomic set data (metabolite set enrichment) or make new, non-obvious connections.
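As a rough illustration of the metabolite set enrichment use case mentioned above, the following sketch runs a simple over-representation test (a one-sided hypergeometric test) on a toy data set. The compound IDs, term labels and the `enrich` helper are all hypothetical; real annotations would come from the ChemFOnt download, and a production analysis would normally also correct for multiple testing.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n compounds from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich(query, background, annotations):
    """Return {term: (overlap, p_value)} for every term in `annotations`.

    `annotations` maps a term label to the set of compounds annotated with it.
    """
    background = set(background)
    query = set(query) & background
    results = {}
    for term, members in annotations.items():
        members = set(members) & background
        overlap = len(query & members)
        p_value = hypergeom_sf(overlap, len(background), len(members), len(query))
        results[term] = (overlap, p_value)
    return results


# Hypothetical toy data (not taken from ChemFOnt).
background = {f"C{i}" for i in range(1, 21)}  # 20 measured compounds
annotations = {
    "Sweet (organoleptic effect)": {"C1", "C2", "C3", "C4"},
    "Solvent (industrial application)": {"C5", "C6", "C7", "C8", "C9", "C10"},
}
query = {"C1", "C2", "C3", "C11"}  # compounds of interest

results = enrich(query, background, annotations)
for term, (overlap, p_value) in sorted(results.items(), key=lambda kv: kv[1][1]):
    print(f"{term}: overlap={overlap}, p={p_value:.4f}")
```

A small p-value suggests that the term is over-represented in the query set relative to the background.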
In general terms, an ontology comprises a series of named entities, their descriptions or definitions, their relationships to each other and a series of hierarchical categories that group those entities according to various criteria.
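The general structure described above can be sketched as a small data model: terms with definitions, a parent link for the hierarchy, and a list of relationships. This is only an illustrative sketch, not ChemFOnt's actual schema; the class names, identifier format, subcategory names and the `has_effect` relation are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    term_id: str                      # hypothetical identifier format
    name: str
    definition: str
    parent_id: Optional[str] = None   # hierarchical link; None for a top-level category


@dataclass
class Ontology:
    terms: dict = field(default_factory=dict)
    # (subject, relation, object) triples, e.g. a chemical linked to a term
    relationships: list = field(default_factory=list)

    def add(self, term):
        self.terms[term.term_id] = term

    def lineage(self, term_id):
        """Names from the given term up to its top-level category."""
        names = []
        while term_id is not None:
            term = self.terms[term_id]
            names.append(term.name)
            term_id = term.parent_id
        return names


# Hypothetical example entries.
onto = Ontology()
onto.add(Term("T0000001", "Organoleptic effects", "Effects perceived by the senses."))
onto.add(Term("T0000002", "Taste", "Effects perceived by taste.", parent_id="T0000001"))
onto.add(Term("T0000003", "Sweet", "Perceived as sweet.", parent_id="T0000002"))
onto.relationships.append(("sucrose", "has_effect", "T0000003"))

print(" > ".join(reversed(onto.lineage("T0000003"))))
```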
The top layer of the ChemFOnt hierarchy consists of 12 major categories:
- 1) Environmental processes
- 2) Biological processes
- 3) Industrial processes
- 4) Adverse biological roles
- 5) Normal biological roles
- 6) Environmental roles
- 7) Industrial applications
- 8) Health effects
- 9) Organoleptic effects
- 10) Sources
- 11) Biological locations
- 12) Routes of exposure
These major categories are subdivided into another 384 subcategories, which are divided into thousands of other branches or leaf nodes, for a maximum depth of up to 7 layers. The entire ontological hierarchy for ChemFOnt currently comprises a total of 394,000 fully defined terms and over 11 million chemical/functional relationships. All terminal leaf nodes in the ChemFOnt hierarchy contain a fact supported by a citeable reference.
ChemFOnt was constructed from many pre-existing ontologies and definitions (GO, FOBI, FoodOnt, ChemOnt, Disease Ontology), handwritten definitions (where no prior ontology or definitions existed) and structured facts/references acquired from hand-curated databases maintained in the Wishart lab (FooDB, HMDB, MiMeDB, DrugBank, MarkerDB, PathBank).
To ensure uniformity, consistency and compliance, additions, corrections and improvements to ChemFOnt are made through a moderated process and strict standard operating protocols (SOPs) maintained by designated ChemFOnt editors. Requests to join the ChemFOnt editorial team and suggestions from external users can be emailed to the ChemFOnt editors and will be handled on a first-come, first-served basis. This is the first release of ChemFOnt and it is expected that annual or bi-annual updates will continue over many years.
The use of text mining tools such as PolySearch2 is expected to facilitate the continued expansion and updating of ChemFOnt’s contents. The long-term goal is to fully annotate the nearly 1 million detectable compounds known to exist in the human and natural environment using the ChemFOnt structure.
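For users who want to explore the hierarchy from the downloadable OWL file, the sketch below reads class/subclass links and measures how deep each term sits. It assumes the file is serialized as RDF/XML with `owl:Class` elements and simple `rdfs:subClassOf` links; the actual ChemFOnt file may be organized differently. The embedded sample, its `example.org` IRIs and its labels are placeholders, not real ChemFOnt content.

```python
import xml.etree.ElementTree as ET

NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Placeholder OWL fragment (RDF/XML); IRIs and labels are made up.
SAMPLE_OWL = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/term/0000001">
    <rdfs:label>Health effects</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/term/0000002">
    <rdfs:label>Example subcategory</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://example.org/term/0000001"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/term/0000003">
    <rdfs:label>Example leaf term</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://example.org/term/0000002"/>
  </owl:Class>
</rdf:RDF>
"""


def parse_parents(owl_text):
    """Map each named class IRI to its parent IRI (None for top-level classes)."""
    root = ET.fromstring(owl_text)
    parents = {}
    for cls in root.findall("owl:Class", NS):
        iri = cls.get(RDF_ABOUT)
        if iri is None:
            continue  # skip anonymous classes
        sup = cls.find("rdfs:subClassOf", NS)
        parents[iri] = sup.get(RDF_RESOURCE) if sup is not None else None
    return parents


def depth(iri, parents):
    """Number of layers from a top-level class (depth 1) down to `iri`."""
    seen = {iri}
    layers = 1
    while parents.get(iri) is not None:
        iri = parents[iri]
        if iri in seen:
            raise ValueError("cycle detected in subclass links")
        seen.add(iri)
        layers += 1
    return layers


parents = parse_parents(SAMPLE_OWL)
print("classes:", len(parents))
print("maximum depth:", max(depth(iri, parents) for iri in parents))
```

For the real file, `ET.parse(path).getroot()` could replace `ET.fromstring`, although a dedicated OWL library may be a better fit for an ontology of this size.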
ChemFOnt is FAIR. Specifically, it complies with the following principles:
- F1. (meta)data are assigned a globally unique and eternally persistent identifier.
All data and metadata in the ChemFOnt are assigned a 7-digit globally unique identifier. This identifier is searchable within the database and associated with all data in the online database. Furthermore, the identifier is associated with all data downloadable from the database.
- F2. data are described with rich metadata.
All data in the ChemFOnt are described with rich metadata. Every compound is described in detail, including structures, names, synonyms and physicochemical properties. Additionally, scientific references are provided for each entry.
- F3. (meta)data are registered or indexed in a searchable resource.
All the data and metadata in the ChemFOnt are indexed, viewable and registered through the ChemFOnt database at www.chemfont.ca.
- A1 (meta)data are retrievable by their identifier using a standardized communications protocol.
All data and metadata in the ChemFOnt are retrievable by their unique identifier through the website. Similarly, all data in the ChemFOnt may be downloaded in SQL or OWL format via ChemFOnt’s download section through a standard internet communications protocol.
- A1.1 the protocol is open, free, and universally implementable.
The ChemFOnt website is open and free and its data download operation is compatible with all modern web browsers. The downloadable data are provided in two formats (SQL and OWL) that are universally readable and accepted for ontologies.
- A1.2 the protocol allows for an authentication and authorization procedure, where necessary.
No authentication or authorization is required to access or download ChemFOnt’s data.
- A2 metadata are accessible, even when the data are no longer available.
All of the ChemFOnt’s metadata are linked or linkable to more permanent data sources (PubMed, HMDB, FooDB, Gene Ontology, FOBI, etc.). The availability of freely downloadable data (and metadata) for the ChemFOnt ensures that its metadata will exist and be accessible beyond the lifetime of the project.
- I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
All textual data and metadata in the ChemFOnt are written in English. All data are available in two formats (SQL and OWL), all images are stored in PNG format, and all nomenclature for compounds and spectral data follows standard ontologies or vocabularies used to describe these entities.
- I2. (meta)data use vocabularies that follow FAIR principles.
All data and metadata in the ChemFOnt use descriptions and vocabularies that are open, findable, accessible, interoperable and re-usable.
- I3. (meta)data include qualified references to other (meta)data.
All data and metadata in the ChemFOnt are fully referenced with detailed descriptions of their provenance and sources.
- R1. (meta)data have a plurality of accurate and relevant attributes.
All data and metadata in the ChemFOnt deposited by its curation team have been carefully curated and vetted by multiple skilled curators. All data deposited into the ChemFOnt have been automatically checked for accuracy and consistency using comprehensive data checking software and all user depositions have received depositor assurances that they are correct and accurate. All data in the ChemFOnt have attributes that are relevant, up-to-date and accurate to the best of the curation team’s knowledge.
- R1.1 (meta)data are released with a clear and accessible data usage license.
All data and metadata in the ChemFOnt are released under the Creative Commons (CC) 4.0 License Suite according to the Attribution (BY) and Non-commercial (NC) licensing conditions.
- R1.2 (meta)data are associated with their provenance.
All data in the ChemFOnt have detailed descriptions of their provenance.
- R1.3 (meta)data meet domain-relevant community standards.
The data and metadata in the ChemFOnt have undergone extensive peer review by members of the NMR and natural products community. The data in the ChemFOnt have met the standards for publication in peer-reviewed scientific journals and international scientific conferences.