Your guide to what's next.
Jul 21, 2020

Why federation is a game-changing feature of SPARQL

SPARQL federation is an incredibly useful feature for querying distributed RDF graphs.

SPARQL has been widely adopted since it was first proposed as the query language for the Semantic Web. Many SPARQL endpoints are available today, both public and private, exposing interlinked data sources that are all part of the global RDF data cloud.

SPARQL federation offers a mechanism for integrating RDF data distributed across multiple sources. It lets data consumers retrieve and join data from those sources in a single query, which effectively exposes the data as a single integrated RDF graph.

What is SPARQL federation?

SPARQL federation is a dedicated SPARQL language construct defined in SPARQL 1.1 Federated Query, a W3C Recommendation that introduces the SERVICE keyword. In a SERVICE clause, you specify the URL of the SPARQL endpoint to retrieve the data from, together with the graph pattern to evaluate there, as the example in the next section demonstrates.
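As a rough illustration of the clause's shape, the following Python sketch assembles a federated query string from an endpoint URL and two graph patterns. The helper name `build_federated_query`, the endpoint `http://example.org/sparql`, and the FOAF predicates are assumptions chosen for illustration; they are not taken from this article.

```python
def build_federated_query(variables, local_pattern, endpoint_url, remote_pattern, silent=False):
    """Assemble a SELECT query whose remote part is wrapped in a SERVICE clause.

    Hypothetical helper: it only formats text and does no validation or escaping.
    """
    service = "SERVICE SILENT" if silent else "SERVICE"
    projection = " ".join(f"?{v}" for v in variables)
    return (
        f"SELECT {projection} WHERE {{\n"
        f"    {local_pattern}\n"
        f"    {service} <{endpoint_url}> {{\n"
        f"        {remote_pattern}\n"
        f"    }}\n"
        f"}}"
    )


# All IRIs below are placeholders, not real data sources.
query = build_federated_query(
    variables=["name"],
    local_pattern="<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> ?person .",
    endpoint_url="http://example.org/sparql",
    remote_pattern="?person <http://xmlns.com/foaf/0.1/name> ?name .",
)
print(query)
```

The part outside the SERVICE block is evaluated by the local processor, while the part inside it is delegated to the named endpoint.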

Example of a federated query

The following example shows how to query a local RDF graph combined with the data from a remote SPARQL endpoint.

Suppose that the local RDF graph contains only one triple:

<> <> <> .

while the remote RDF dataset available at the endpoint contains the following data:

<> <> "Alice" .
<> <> "Bob" .

Locally, we know that <> knows <> but in order to get the name of <>, we need to query the remote endpoint. To retrieve the names of all people that <> knows, a single federated query can be used:

SELECT ?name WHERE {
    <> <> ?person .
    SERVICE <> {
        ?person <> ?name .
    }
}

This query joins the local data with the response from the remote SPARQL endpoint and returns the names of the people matched by the local pattern.
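The join can be mimicked in a few lines of Python. This is only a sketch of the semantics, not how a real query processor is implemented. The `ex:` identifiers, and the assumption that `ex:alice` is the one who knows `ex:bob`, are hypothetical, since the original IRIs are not shown here.

```python
# Hypothetical data standing in for the local graph and the remote endpoint.
KNOWS = "ex:knows"
NAME = "ex:name"

local_graph = [("ex:alice", KNOWS, "ex:bob")]
remote_graph = [("ex:alice", NAME, "Alice"), ("ex:bob", NAME, "Bob")]


def match(graph, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def names_of_known_people(who):
    """Evaluate the local pattern, then join each ?person with the remote pattern."""
    rows = []
    for person in match(local_graph, who, KNOWS):       # local: <who> knows ?person
        for name in match(remote_graph, person, NAME):  # remote: ?person name ?name
            rows.append({"person": person, "name": name})
    return rows


result = names_of_known_people("ex:alice")
print([row["name"] for row in result])  # ['Bob']
```

The shared variable `?person` is what ties the two sides together: the local graph supplies its value and the remote data supplies the matching name.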


Which SPARQL processors support federated queries?

Under the hood, the SERVICE keyword makes the query processor send part of the query to another SPARQL endpoint during execution. The databases and services that support SPARQL 1.1 Federated Query and the SERVICE keyword include, but are not limited to, the following:

Considerations with federated queries

As part of executing the federated query, the query processor calls the external SPARQL endpoint. This comes with a number of potential issues that need to be addressed.


Performance

The SERVICE keyword makes the federated query processor run a portion of the SPARQL query against a remote SPARQL endpoint over HTTP. The overhead of HTTP communication makes these remote calls slower than local evaluation, which adds to the execution time of the whole federated query.
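To make the HTTP round trip concrete, here is a minimal sketch of how a processor might package a sub-query as a SPARQL Protocol GET request and decode a JSON results document. The endpoint URL, the sub-query, and the sample response are all made up for illustration. Real processors may use POST, batch bindings, or stream results instead. The request is only built, never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and sub-query, used only for illustration.
ENDPOINT = "http://example.org/sparql"
SUBQUERY = "SELECT ?person ?name WHERE { ?person <http://xmlns.com/foaf/0.1/name> ?name . }"


def build_service_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request for a sub-query."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload):
    """Turn a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(payload)
    return [
        {var: term["value"] for var, term in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


request = build_service_request(ENDPOINT, SUBQUERY)

# A hand-written response of the shape a remote endpoint might return.
sample_response = json.dumps({
    "head": {"vars": ["person", "name"]},
    "results": {"bindings": [{
        "person": {"type": "uri", "value": "http://example.org/bob"},
        "name": {"type": "literal", "value": "Bob"},
    }]},
})
rows = parse_bindings(sample_response)
```

Every SERVICE clause implies at least one exchange of this kind, so network latency and the size of the transferred result set both count towards the total execution time.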

Failed remote executions

If the remote SPARQL service is unavailable, returns an error, or cannot be accessed for other reasons, the federated query execution will fail as a whole.

It may be desirable to ignore the remote service errors, in which case the query does not fail as a whole but the SERVICE pattern is ignored. This can be achieved by using the SERVICE clause with the SILENT keyword, as in the following query:

SELECT ?name WHERE {
    <> <> ?person .
    SERVICE SILENT <> {
        ?person <> ?name .
    }
}

This query will ignore all errors encountered while accessing the remote SPARQL endpoint.
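The effect of SILENT on the result can be sketched as follows. In SPARQL 1.1 Federated Query, a failed SERVICE SILENT pattern is treated as a single solution with no bindings, so the rest of the query still produces rows. The functions `evaluate_service`, `join`, and `unavailable_endpoint`, as well as the `ex:bob` identifier, are hypothetical names used only in this sketch.

```python
def evaluate_service(fetch, silent=False):
    """Run the remote part of a query.

    `fetch` is a hypothetical callable that returns a list of solution
    dicts or raises an exception on failure.
    """
    try:
        return fetch()
    except Exception:
        if silent:
            # SILENT: the failed SERVICE pattern contributes one empty solution.
            return [{}]
        raise  # without SILENT the whole query fails


def join(left, right):
    """Join two solution lists on their shared variables."""
    joined = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


def unavailable_endpoint():
    raise ConnectionError("remote endpoint is down")


local_solutions = [{"person": "ex:bob"}]
rows = join(local_solutions, evaluate_service(unavailable_endpoint, silent=True))
print(rows)  # [{'person': 'ex:bob'}]  (?name stays unbound)
```

With SILENT, a failure therefore shows up as missing bindings rather than as an error, so a consumer cannot tell an unavailable endpoint apart from one that simply had no matching data.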


Security

When query processors execute federated queries, the external endpoint URIs are dereferenced and the SERVICE queries and their parameters are passed to those external SPARQL query processors.

The SPARQL 1.1 Federated Query specification does not define an authentication mechanism for SERVICE calls. When your use case involves federated queries distributed over multiple private SPARQL endpoints, it is your responsibility to secure the network and make sure that your remote SPARQL services are only accessible from within that network.

The external SPARQL endpoints, together with the data received and incorporated into the query output, all need to be verified. Therefore, you have to make sure that they satisfy your data processing and licensing requirements.
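One possible safeguard is to check the endpoints named in a query against an allow-list before executing it. The sketch below is a naive textual scan under stated assumptions: the host names are invented, endpoints given as variables are not handled, and a production check would need a real SPARQL parser, because a regular expression can also match text inside string literals or comments.

```python
import re
from urllib.parse import urlparse

# Hypothetical allow-list of hosts that federated queries may contact.
ALLOWED_HOSTS = {"sparql.internal.example.org"}

# Matches `SERVICE <iri>` and `SERVICE SILENT <iri>`.
SERVICE_IRI = re.compile(r"\bSERVICE\s+(?:SILENT\s+)?<([^>]*)>", re.IGNORECASE)


def disallowed_endpoints(query, allowed_hosts=ALLOWED_HOSTS):
    """Return the SERVICE endpoint IRIs whose host is not on the allow-list."""
    endpoints = SERVICE_IRI.findall(query)
    return [iri for iri in endpoints if urlparse(iri).hostname not in allowed_hosts]


query = """
SELECT ?name WHERE {
    ?person <http://xmlns.com/foaf/0.1/knows> ?friend .
    SERVICE <https://sparql.internal.example.org/query> { ?friend <http://xmlns.com/foaf/0.1/name> ?name . }
    SERVICE SILENT <http://untrusted.example.com/sparql> { ?friend <http://xmlns.com/foaf/0.1/mbox> ?mbox . }
}
"""
print(disallowed_endpoints(query))  # ['http://untrusted.example.com/sparql']
```

A query that yields a non-empty list could be rejected before any data leaves the local network.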

What does it mean?

SPARQL 1.1 Federated Query allows you to distribute your RDF data across multiple databases and use a single query to access it across all those database instances. The data does not need to be colocated or made publicly accessible.

SPARQL federation effectively enriches your working datasets with external public or private data. It is a mechanism for querying and retrieving the data from the joined global RDF graph.


Copyright © 2021 Eckher. Various trademarks held by their respective owners.