# Foundation Models for Graph and Geometric Deep Learning

Foundation models (FMs) in language, vision, and audio have dominated machine learning research in 2024, while FMs for graph-structured data have been slower to develop. This article posits that the era of Graph FMs is upon us and offers examples of their current applications.

*Written and edited by Michael Galkin and Michael Bronstein, with significant contributions from Jianan Zhao, Haitao Mao, and Zhaocheng Zhu.*

## Table of Contents

- What are Graph Foundation Models and how to build them?
- Node Classification: GraphAny
- Link Prediction: Not yet
- Knowledge Graph Reasoning: ULTRA and UltraQuery
- Algorithmic Reasoning: Generalist Algorithmic Learner
- Geometric and AI4Science Foundation Models
  - ML Potentials: JMP-1, DPA-2 for molecules, MACE-MP-0 and MatterSim for inorganic crystals
  - Protein LMs: ESM-2
  - 2D Molecules: MiniMol and MolGPS
- Expressivity & Scaling Laws: Do Graph FMs scale?
- The Data Question: What should be scaled? Is there enough graph data to train Graph FMs?
- Key Takeaways

## What are Graph Foundation Models and how to build them?

To clarify what constitutes a "foundational" model, we define it as follows:

“A Graph Foundation Model is a single (neural) model that learns transferable graph representations that can generalize to any new, previously unseen graph.”

Graphs vary widely in form, connectivity, and features, so a standard Graph Neural Network (GNN) does not qualify as a foundation model: at best, it works only on graphs whose features have the same type and dimension as those it was trained on. Heuristics like Label Propagation can run on any graph, but they involve no learning and therefore do not meet the definition either. It also remains unclear whether serializing graphs into sequences for Large Language Models (LLMs) retains graph symmetries.

A critical aspect of designing Graph FMs is achieving transferable graph representations. As highlighted in a recent ICML 2024 position paper, LLMs can map text in any language to tokens from a fixed-size vocabulary. Graphs have no such universal featurization, because they come in very different forms, such as:

- A single large graph with specific node features and labels (common in node classification)
- A large graph lacking node features and classes but containing meaningful edge types (typical for link prediction and KG reasoning)
- Many smaller graphs with or without features and graph-level labels (common in graph classification and regression)

Research questions remain for the graph learning community regarding the design of Graph FMs:

- How can we generalize across graphs with diverse features?
- How can we generalize across different prediction tasks?
- What should the expressivity of foundational models be?

The following sections will demonstrate that Graph FMs are already in use for specific tasks and domains, highlighting their design choices regarding transferable features and the practical advantages they provide for inductive inference on new, unseen graphs.

## Node Classification: GraphAny

Historically, GNN-based node classifiers have been limited to a specific graph dataset. For example, a GNN trained on the Cora graph (2.7K nodes, 1433-dimensional features) cannot easily adapt to another graph like Citeseer, which has 3703-dimensional features and a different number of classes.

**GraphAny** is the first Graph FM in which a single pre-trained model performs node classification on any graph, regardless of feature dimension or number of classes. A GraphAny model pre-trained on the Wisconsin dataset generalizes to over 30 other graphs of varying sizes and features and, on average, outperforms GCN and GAT architectures trained from scratch on each of those graphs.

**Setup:** Semi-supervised node classification. Given a graph (G), node features (X), and a few labeled nodes from (C) classes, predict the labels of the target nodes. The feature dimension and the number of classes are not fixed and vary from graph to graph.

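**Transferability:** Instead of modeling a universal latent space for all graphs, GraphAny focuses on the interactions among the predictions of spectral filters. It applies a set of filters to all nodes, fits the weights of each resulting predictor on the nodes with known labels, applies those weights to the unlabeled nodes to obtain tentative predictions, and computes pairwise distances between these predictions. An attention module then combines the predictions into the final output. The attention parameterization is the only learnable component, and it does not depend on the number of unique classes.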
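The "filter, then fit weights from known labels" step can be sketched in a few lines. This is a minimal illustration, not GraphAny's implementation: it uses a single low-pass filter (neighbourhood averaging with self-loops), solves a ridge-regularized least-squares problem through the normal equations, and leaves out the attention module entirely. The function names, the toy graph, and the small `ridge` term are assumptions made for this example.

```python
def propagate(adj, feats):
    """Low-pass filter: average each node's features with its neighbours' (self-loop included)."""
    dim = len(feats[0])
    out = []
    for node, nbrs in enumerate(adj):
        group = [node] + list(nbrs)
        out.append([sum(feats[j][d] for j in group) / len(group) for d in range(dim)])
    return out


def solve(a, b):
    """Solve a @ x = b for a square matrix a (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_linear_predictor(filtered, labels, num_classes, ridge=1e-6):
    """Closed-form least squares on labeled nodes: (F^T F + ridge * I) W = F^T Y."""
    dim = len(filtered[0])
    gram = [[sum(filtered[i][a] * filtered[i][b] for i in labels)
             + (ridge if a == b else 0.0)  # assumed ridge term keeps the system solvable
             for b in range(dim)] for a in range(dim)]
    rhs = [[sum(filtered[i][a] for i, y in labels.items() if y == c)
            for c in range(num_classes)] for a in range(dim)]
    return solve(gram, rhs)


def predict(filtered, weights):
    """Apply the fitted weights to every node and pick the highest-scoring class."""
    num_classes = len(weights[0])
    preds = []
    for f in filtered:
        logits = [sum(x * w[c] for x, w in zip(f, weights)) for c in range(num_classes)]
        preds.append(max(range(num_classes), key=logits.__getitem__))
    return preds


# Toy graph: two triangles joined by the edge 2-3; only nodes 0 and 5 are labeled.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
feats = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
labels = {0: 0, 5: 1}

filtered = propagate(adj, feats)
weights = fit_linear_predictor(filtered, labels, num_classes=2)
predictions = predict(filtered, weights)
print(predictions)
```

Nothing in this predictor is trained by gradient descent, and its weight matrix is re-solved for every new graph, so its shape can follow whatever feature dimension and class count that graph has.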

## Link Prediction: Not yet

**Setup:** Given a graph (G), the goal is to predict whether a link exists between two nodes.

Currently, no single transferable model exists for link prediction in graphs with node features. However, for non-featurized graphs, GNNs utilizing labeling tricks can potentially transfer to new graphs due to their uniform node featurization strategy.

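Automorphic nodes show why labeling is needed: a standard GNN encoder assigns them identical representations, so it cannot tell apart candidate links that involve them. Labeling strategies such as Double Radius Node Labeling break these symmetries by tagging every node relative to the target link.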
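As a rough sketch of how such a labeling works, the snippet below computes Double Radius Node Labeling (DRNL) labels on a toy graph. It follows the commonly cited formula, label = 1 + min(dx, dy) + (d // 2) * (d // 2 + d % 2 - 1) with d = dx + dy, where dx and dy are shortest-path distances to the two target nodes, each measured with the other target node masked out. The helper names and the example graph are assumptions for illustration.

```python
from collections import deque


def bfs_distances(adj, source, blocked=None):
    """Shortest-path hop counts from source, never passing through the blocked node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr != blocked and nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def drnl_labels(adj, x, y):
    """Label every node by its pair of distances to the target link (x, y)."""
    dist_x = bfs_distances(adj, x, blocked=y)
    dist_y = bfs_distances(adj, y, blocked=x)
    labels = {}
    for node in adj:
        if node in (x, y):
            labels[node] = 1  # the two target nodes always get label 1
        elif node not in dist_x or node not in dist_y:
            labels[node] = 0  # unreachable from one of the targets
        else:
            dx, dy = dist_x[node], dist_y[node]
            d = dx + dy
            labels[node] = 1 + min(dx, dy) + (d // 2) * (d // 2 + d % 2 - 1)
    return labels


# Toy graph: a 4-cycle 0-1-2-3 with a pendant node 4 attached to node 1.
adj = {0: [1, 3], 1: [0, 2, 4], 2: [1, 3], 3: [0, 2], 4: [1]}
labels = drnl_labels(adj, x=0, y=2)
print(labels)
```

The labels depend only on graph structure around the target link, not on node identities or input features, which is why the same featurization can be reused on a new graph.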

The UniLP framework, which employs a contrastive learning approach, has been evaluated for link prediction on unseen graphs. It utilizes a shared subgraph GNN encoder and an attention mechanism to score links based on their similarity to in-context links.

**Transferability:** The structural patterns learned by labeling trick GNNs can be applied to new graphs, although further support for heterogeneous node features is needed.

## Knowledge Graph Reasoning: ULTRA and UltraQuery

Each knowledge graph has its own set of entities and relations, so traditional reasoning models, which learn representations tied to those sets, do not carry over to new, unseen graphs. ULTRA is a pioneering foundation model for knowledge graph reasoning, capable of transferring to any multi-relational graph without prior training on its specific entities or relations.

**Setup:** Given a multi-relational graph (G) with |E| nodes and |R| edge types, answer a query, for example (head, relation, ?), by returning a probability distribution over all nodes in the graph.

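**Transferability:** Rather than learning embeddings for specific entities or relations, ULTRA models how relations interact with one another, for example whether the tail of one relation can serve as the head of another. Such interaction patterns exist in any multi-relational graph, which lets a single pre-trained model generalize to graphs with entirely different vocabularies.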
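One way to picture these relational interactions is a "graph of relations", where each relation type becomes a node and edges record whether two relations share head or tail entities. The sketch below builds such a graph with four interaction types (head-to-head, head-to-tail, tail-to-head, tail-to-tail). It is a simplified illustration under stated assumptions: the function name and triples are made up, and inverse relations and self-interactions are left out for brevity.

```python
from collections import defaultdict
from itertools import permutations


def build_relation_graph(triples):
    """Connect relation types that share entities in head or tail position."""
    heads = defaultdict(set)  # relation -> entities seen as its head
    tails = defaultdict(set)  # relation -> entities seen as its tail
    for head, relation, tail in triples:
        heads[relation].add(head)
        tails[relation].add(tail)

    relations = sorted(set(heads) | set(tails))
    edges = set()
    for r1, r2 in permutations(relations, 2):
        if heads[r1] & heads[r2]:
            edges.add((r1, "h2h", r2))
        if heads[r1] & tails[r2]:
            edges.add((r1, "h2t", r2))
        if tails[r1] & heads[r2]:
            edges.add((r1, "t2h", r2))
        if tails[r1] & tails[r2]:
            edges.add((r1, "t2t", r2))
    return edges


# Hypothetical toy knowledge graph.
triples = [
    ("alice", "authored", "paper1"),
    ("paper1", "cites", "paper2"),
    ("bob", "authored", "paper2"),
]
relation_graph = build_relation_graph(triples)
for edge in sorted(relation_graph):
    print(edge)
```

The four interaction types form a fixed vocabulary regardless of how many entities or relations a graph has, so a model that reasons over this structure does not need graph-specific embeddings.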

## Algorithmic Reasoning: Generalist Algorithmic Learner

The Generalist Algorithmic Learner is a single GNN capable of executing multiple algorithmic tasks within a shared latent space. It shows that related algorithms can share one homogeneous feature space instead of requiring a separate model per task.
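To see why a GNN is a natural fit for executing algorithms, it helps to write a classical algorithm in message-passing form. The sketch below expresses Bellman-Ford shortest paths as repeated rounds of "send a message along each edge, aggregate with min". It illustrates the general correspondence only and is not the Generalist Algorithmic Learner's architecture; the function name and the example graph are assumptions.

```python
import math


def bellman_ford_message_passing(num_nodes, edges, source):
    """Shortest distances from source via rounds of min-aggregation over incoming edges."""
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    for _ in range(num_nodes - 1):
        # Message step: each directed edge (u, v, w) proposes dist[u] + w to node v.
        messages = [[] for _ in range(num_nodes)]
        for u, v, w in edges:
            messages[v].append(dist[u] + w)
        # Update step: keep the smallest of the current value and all incoming proposals.
        dist = [min([dist[v]] + messages[v]) for v in range(num_nodes)]
    return dist


# Hypothetical directed, weighted graph given as (source, target, weight) triples.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0)]
distances = bellman_ford_message_passing(4, edges, source=0)
print(distances)
```

A learned processor replaces the hand-written message and update rules with neural ones, which is what allows one network to cover many algorithms.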

## Geometric and AI4Science Foundation Models

In the realm of Geometric Deep Learning, foundation models are emerging as key tools for predicting molecular properties and protein sequences. The complexity of real-world physical structures necessitates models capable of understanding and processing these intricacies.

### ML Potentials: JMP-1, DPA-2 for molecules, MACE-MP-0 and MatterSim for inorganic crystals

**Setup:** Given a 3D structure, the aim is to predict energy and per-atom forces.
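Energy and forces are tied together physically: the force on each atom is the negative gradient of the total energy with respect to that atom's position. The sketch below makes this concrete with a Lennard-Jones pair potential standing in for a learned model. The potential, its parameters, and the function names are assumptions chosen for illustration and have nothing to do with the internals of the models named above.

```python
import math


def lj_energy(positions, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy: sum over pairs of 4*eps*((s/r)^12 - (s/r)^6)."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


def lj_forces(positions, epsilon=1.0, sigma=1.0):
    """Per-atom forces, i.e. the negative gradient of lj_energy w.r.t. each position."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            delta = [a - b for a, b in zip(positions[i], positions[j])]
            r = math.sqrt(sum(c * c for c in delta))
            sr6 = (sigma / r) ** 6
            # Derivative of the pair energy with respect to the distance r.
            de_dr = 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            for k in range(3):
                pair_force = -de_dr * delta[k] / r
                forces[i][k] += pair_force   # force on atom i
                forces[j][k] -= pair_force   # equal and opposite force on atom j
    return forces


# Three atoms in arbitrary (hypothetical) positions.
positions = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.3, 0.2]]
print(lj_energy(positions))
print(lj_forces(positions))
```

A quick sanity check for any energy and force predictor is that the forces match a finite-difference estimate of the energy gradient and that they sum to zero over all atoms.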

**Transferability:** These models generalize across a wide range of atomistic structures and remain stable when used to drive molecular dynamics simulations.

### Protein LMs: ESM-2

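**Setup:** Given a protein sequence with some tokens masked, predict the masked amino acids, in the manner of masked language modeling. The learned representations then carry over to amino-acid combinations not seen during training.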
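The training input for this kind of objective is easy to sketch: hide a fraction of the residues and keep the originals as prediction targets. The snippet below is a minimal illustration using only the standard library. The 15% mask rate, the `<mask>` token string, the function name, and the example sequence are assumptions for this sketch, and real pipelines typically add details such as special tokens and batching.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one letter each
MASK = "<mask>"


def mask_sequence(sequence, mask_rate=0.15, seed=0):
    """Replace a random subset of residues with MASK; return tokens and the hidden targets."""
    rng = random.Random(seed)
    tokens = list(sequence)
    num_masked = max(1, round(mask_rate * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), num_masked))
    targets = {pos: tokens[pos] for pos in positions}  # what the model must recover
    for pos in positions:
        tokens[pos] = MASK
    return tokens, targets


sequence = "MKTAYIAKQRQISFVKSHFS"  # hypothetical 20-residue fragment
assert set(sequence) <= set(AMINO_ACIDS)

tokens, targets = mask_sequence(sequence)
print(tokens)
print(targets)
```

Because every sequence draws from the same small alphabet, the same tokenization applies unchanged to any new protein.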

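**Transferability:** All proteins are written in the same small alphabet of amino acids, so one vocabulary covers any input sequence. Together with its extensive training data, this makes ESM-2 a versatile tool across a variety of applications.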

### 2D Molecules: MiniMol and MolGPS

**Setup:** Given a 2D molecular structure, the task is to predict properties based on atom and bond types.

**Transferability:** These models utilize a fixed vocabulary of atom and bond types, facilitating their application across different tasks.
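A fixed vocabulary means every molecule maps into the same index space, whatever dataset it comes from. The sketch below encodes two small molecules with a shared atom and bond vocabulary. The vocabularies, function name, and molecule encodings are hypothetical and far smaller than what a production featurizer would use; hydrogens are left implicit.

```python
# Hypothetical fixed vocabularies shared by every molecule and every task.
ATOM_VOCAB = ["H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"]
BOND_VOCAB = ["single", "double", "triple", "aromatic"]

ATOM_INDEX = {symbol: i for i, symbol in enumerate(ATOM_VOCAB)}
BOND_INDEX = {kind: i for i, kind in enumerate(BOND_VOCAB)}


def encode_molecule(atoms, bonds):
    """Map atom symbols and bond types to integer ids from the fixed vocabularies."""
    atom_ids = [ATOM_INDEX[symbol] for symbol in atoms]
    bond_ids = [(i, j, BOND_INDEX[kind]) for i, j, kind in bonds]
    return atom_ids, bond_ids


# Heavy-atom graphs of ethanol (C-C-O) and acetonitrile (C-C#N).
ethanol = encode_molecule(["C", "C", "O"], [(0, 1, "single"), (1, 2, "single")])
acetonitrile = encode_molecule(["C", "C", "N"], [(0, 1, "single"), (1, 2, "triple")])
print(ethanol)
print(acetonitrile)
```

Both molecules end up in the same id space, so one embedding table can serve any molecule built from these atom and bond types.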

## Expressivity & Scaling Laws: Do Graph FMs scale?

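Understanding how transformers and GNNs scale is critical. Transformers excel on sequential data, but full self-attention compares every pair of nodes, so its cost grows quadratically with graph size. Message-passing GNNs are promising for graph data because their cost grows linearly with the number of edges.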
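The gap between the two regimes is easy to quantify with a back-of-the-envelope count of pairwise interactions per layer. The numbers below are hypothetical and ignore constant factors, hidden dimensions, and sparse-attention variants; the function names are assumptions for this sketch.

```python
def message_passing_interactions(num_edges):
    """One message per directed edge per layer."""
    return num_edges


def full_attention_interactions(num_nodes):
    """Every node attends to every node (itself included) in each layer."""
    return num_nodes * num_nodes


# Hypothetical sparse graph: 10,000 nodes with 10 outgoing edges each.
num_nodes = 10_000
num_edges = num_nodes * 10

sparse_cost = message_passing_interactions(num_edges)
dense_cost = full_attention_interactions(num_nodes)
print(sparse_cost, dense_cost, dense_cost // sparse_cost)
```

On sparse graphs the ratio grows with the number of nodes, which is why dense attention becomes the bottleneck first as graphs get larger.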

## The Data Question: What should be scaled? Is there enough graph data to train Graph FMs?

Scaling efforts should focus on increasing the diversity of patterns in graph data rather than merely its quantity. It remains an open question whether enough such data exists to train effective Graph FMs.

## Key Takeaways

- Generalization across heterogeneous graphs remains challenging.
- No universal model currently exists for performing multiple prediction tasks in a zero-shot manner.
- Model expressivity needs to balance performance with computational efficiency.
- The data landscape for graph models is limited, necessitating advancements in sample-efficient architectures.

## References

- Mao, Chen, et al. Graph Foundation Models Are Already Here. ICML 2024.
- Morris et al. Future Directions in Foundations of Graph Machine Learning. ICML 2024.
- Zhao et al. GraphAny: A Foundation Model for Node Classification on Any Graph. arXiv 2024.
- Dong et al. Universal Link Predictor By In-Context Learning on Graphs. arXiv 2024.
- Zhang et al. Labeling Trick: A Theory of Using Graph Neural Networks for Multi-Node Representation Learning. NeurIPS 2021.
- Chamberlain, Shirobokov, et al. Graph Neural Networks for Link Prediction with Subgraph Sketching. ICLR 2023.
- Zhu et al. Neural Bellman-Ford Networks: A General Graph Neural Network Framework for Link Prediction. NeurIPS 2021.
- Galkin et al. Towards Foundation Models for Knowledge Graph Reasoning. ICLR 2024.
- Galkin et al. Zero-shot Logical Query Reasoning on any Knowledge Graph. arXiv 2024.
- Ibarz et al. A Generalist Neural Algorithmic Learner. LoG 2022.
- Markeeva, McLeish, Ibarz, et al. The CLRS-Text Algorithmic Reasoning Language Benchmark. arXiv 2024.
- Shoghi et al. From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction. ICLR 2024.
- Zhang, Liu, et al. DPA-2: Towards a universal large atomic model for molecular and material simulation. arXiv 2023.
- Batatia et al. A foundation model for atomistic materials chemistry. arXiv 2024.
- Yang et al. MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures. arXiv 2024.
- Rives et al. Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences. PNAS 2021.
- Lin, Akin, Rao, Hie, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. Science 2023.
- Morgan. The generation of a unique machine description for chemical structures - a technique developed at Chemical Abstracts Service. J Chem Doc 5:107–113, 1965.
- Kläser, Banaszewski, et al. MiniMol: A Parameter Efficient Foundation Model for Molecular Learning. arXiv 2024.
- Sypetkowski, Wenkel, et al. On the Scalability of GNNs for Molecular Graphs. arXiv 2024.
- Liu et al. Neural Scaling Laws on Graphs. arXiv 2024.
- Frey et al. Neural scaling of deep chemical models. Nature Machine Intelligence 2023.