Your guide to what's next.
Aug 5, 2020

The RDF model of the Gene Ontology, demystified

An outline of the structure of the Gene Ontology RDF graph and ways to query it.

The Gene Ontology (GO) is a controlled vocabulary of terms describing genes and gene products. It facilitates the integration of biological and biomedical data and is widely used in bioinformatics.

The ontology defines terms (GO terms) and relationships between those terms. The terms can be thought of as tags that are assigned to gene products, and a single gene can be annotated with multiple GO terms.

Gene Ontology in RDF

The GO terms and relationships between them form a graph which can be described in RDF.

In the RDF representation of the Gene Ontology, each term is identified by an IRI in the http://purl.obolibrary.org/obo/ namespace, abbreviated here with the obo: prefix. For example, the term "mitochondrion inheritance" is represented by the node <http://purl.obolibrary.org/obo/GO_0000001> (obo:GO_0000001). Here is an excerpt of the ontology showing information about obo:GO_0000001:

@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

obo:GO_0000001 a owl:Class ;
                 rdfs:subClassOf obo:GO_0048308 ,
                                 obo:GO_0048311 ;
                 obo:IAO_0000115 "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." ;
                 rdfs:label "mitochondrion inheritance" .
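
To make the triple structure of the excerpt concrete, here is a minimal Python sketch that stores some of its statements as (subject, predicate, object) tuples and expands prefixed names into full IRIs. It uses only the standard library rather than an RDF toolkit. The helper names (expand, go_id_to_curie, objects) are hypothetical, and the sketch assumes the usual GO accession spelling "GO:0000001" and the standard OBO, OWL, RDF and RDFS namespace IRIs.

```python
# Namespace IRIs assumed by this sketch: the OBO Library PURL base
# and the W3C namespaces for OWL, RDF and RDFS.
PREFIXES = {
    "obo": "http://purl.obolibrary.org/obo/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie):
    """Expand a prefixed name such as 'obo:GO_0000001' into a full IRI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


def go_id_to_curie(go_id):
    """Turn a GO accession such as 'GO:0000001' into 'obo:GO_0000001'."""
    return "obo:" + go_id.replace(":", "_", 1)


# Statements from the excerpt as triples (the definition is left out for brevity).
# Turtle's 'a' keyword is shorthand for rdf:type.
TRIPLES = [
    ("obo:GO_0000001", "rdf:type", "owl:Class"),
    ("obo:GO_0000001", "rdfs:subClassOf", "obo:GO_0048308"),
    ("obo:GO_0000001", "rdfs:subClassOf", "obo:GO_0048311"),
    ("obo:GO_0000001", "rdfs:label", "mitochondrion inheritance"),
]


def objects(subject, predicate):
    """Return the objects of every triple matching subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


# Direct superclasses asserted for the term in the excerpt.
direct_parents = objects(go_id_to_curie("GO:0000001"), "rdfs:subClassOf")
print([expand(curie) for curie in direct_parents])
```

The lookup only sees the two rdfs:subClassOf statements asserted in the excerpt; anything further up the hierarchy would need the triples for the parent terms as well.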

Running SPARQL queries against the Gene Ontology data

A SPARQL query interface provided by the Gene Ontology project and powered by Blazegraph makes it possible to issue queries against the GO RDF dataset directly in the browser. For example, this query returns all superclasses of mitochondrion inheritance:

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?classLabel
WHERE {
    obo:GO_0000001 rdfs:subClassOf ?class .
    ?class rdfs:label ?classLabel .
}

Query result:

?class  ?classLabel
<>  "mitochondrion inheritance"
<>  "mitochondrion organization"
<>  "cellular process"
<>  "cellular component organization"
<>  "organelle inheritance"
<>  "mitochondrion distribution"
<>  "organelle organization"
<>  "organelle localization"
<>  "cellular localization"
<>  "mitochondrion localization"
<>  "cellular component organization or biogenesis"
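
The same query can also be sent from a script instead of the browser. The following is a rough sketch using only the Python standard library and the SPARQL 1.1 Protocol (an HTTP GET with a query parameter, asking for JSON results). The endpoint address in ENDPOINT is an assumption and should be checked against the Gene Ontology documentation; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint address; verify it before relying on it.
ENDPOINT = "http://rdf.geneontology.org/blazegraph/sparql"

QUERY = """
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?classLabel
WHERE {
    obo:GO_0000001 rdfs:subClassOf ?class .
    ?class rdfs:label ?classLabel .
}
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def parse_bindings(payload):
    """Extract (IRI, label) pairs from a SPARQL JSON results document."""
    data = json.loads(payload)
    return [
        (row["class"]["value"], row["classLabel"]["value"])
        for row in data["results"]["bindings"]
    ]


def run_query(endpoint=ENDPOINT, query=QUERY):
    """Send the query over HTTP and return the parsed result rows."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling run_query() performs the network request and returns a list of (IRI, label) tuples, one per result row. Keeping build_request and parse_bindings separate means the request construction and the result parsing can be checked without network access.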

What does it mean?

An instance of a Linked Data set, the Gene Ontology RDF graph is a step towards the more efficient sharing and reuse of biological information. While the GO terms provide the common language for annotating genes, RDF enables the interoperability and integration of those annotations with other datasets and ontologies.


Copyright © 2021 Eckher. Various trademarks held by their respective owners.