Oct 2, 2020

Understanding what graph embeddings are and why they are important for graph analytics.

Graph data underpins a broad array of applications in industries ranging from transportation and telecom to banking and healthcare. As graphs become more pervasive, many organisations seek to use graph analytics and machine learning to derive insights from their graph data.

Instead of working with the graph data directly, many graph analytics implementations use graph embeddings, which are compressed representations of the graphs. Such representations enable a range of graph machine learning applications, including link prediction, similarity search, node classification, clustering, community detection, and anomaly detection.

Embedding is a common machine learning technique for representing complex discrete items, such as English words or the nodes of a graph, as vectors that encode the information contained in the data while greatly reducing its dimensionality.

More specifically, graph embedding is the task of creating a vector representation for each node in a graph so that distances between these vectors predict the occurrence of edges in the graph. Intuitively, the generated graph embeddings act as "compressed" representations of the nodes, i.e. feature vectors that downstream machine learning tasks can consume.
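To make the idea of distances predicting edges concrete, here is a minimal sketch in Python. The node names, the hand-picked 2-D vectors, and the distance threshold are all assumptions chosen for illustration. Real embeddings are learned from the graph and usually have many more dimensions.

```python
import math

# Hypothetical 2-D embeddings for a toy graph with four nodes.
# These vectors are hand-picked for illustration, not learned.
embeddings = {
    "a": (0.0, 0.0),
    "b": (0.1, 0.0),
    "c": (1.0, 1.0),
    "d": (1.0, 0.9),
}

def distance(u, v):
    """Euclidean distance between the embedding vectors of nodes u and v."""
    return math.dist(embeddings[u], embeddings[v])

def predict_edge(u, v, threshold=0.5):
    """Predict an edge when two nodes lie closer than an assumed threshold."""
    return distance(u, v) < threshold

# Rank every other node by its distance to "a" (a simple similarity search).
nearest_to_a = sorted((n for n in embeddings if n != "a"),
                      key=lambda n: distance("a", n))
```

In this toy setup, `predict_edge("a", "b")` is true because the two vectors are close, while `predict_edge("a", "c")` is false. The same distance function also drives the similarity search in `nearest_to_a`.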

There are multiple graph embedding implementations that rely on different embedding algorithms. The most popular ones include node2vec, GraphSAGE, and PyTorch-BigGraph.

The goal of each of these algorithms is to "learn" a feature representation for each node in a given graph. The choice of algorithm commonly depends on the structure and size of the input graph. PyTorch-BigGraph, for example, can handle multi-entity/multi-relation graphs with billions of nodes and trillions of edges.
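As a rough illustration of how such an algorithm starts, node2vec samples random walks over the graph and then trains a skip-gram model on them, treating each walk like a sentence of node "words". The sketch below covers only the walk-sampling step and uses uniform transitions, which corresponds to node2vec with its return and in-out parameters both set to 1. The toy graph, the function names, and the default parameters are assumptions for illustration, not part of any library's API.

```python
import random

# Hypothetical toy graph as an undirected adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

def random_walk(graph, start, walk_length, rng):
    """Sample one uniform random walk of up to walk_length nodes from start."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = graph[walk[-1]]
        if not neighbours:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbours))
    return walk

def sample_walks(graph, walks_per_node=2, walk_length=5, seed=0):
    """Sample walks_per_node walks starting from every node in the graph."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [
        random_walk(graph, node, walk_length, rng)
        for _ in range(walks_per_node)
        for node in graph
    ]

walks = sample_walks(graph)
```

Nodes that co-occur often in these walks end up with nearby vectors once the walks are fed to a skip-gram model. That training step is omitted here.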

Graph embeddings are used for building graph machine learning models which power a growing number of graph analytics and intelligence applications. This highlights the importance of graph embeddings and the algorithms used to generate them for graphs of different types and varying complexity.
