Scaling up Graph Neural Networks

Authors
Ningyi Liao
Publication
PhD Thesis
Type
Thesis

TL;DR

Qualifying Examinations report.

Abstract

Recent years have witnessed the burgeoning of services based on data represented by graphs, leading to a rapid increase in the amount and complexity of such graph data. Graph Neural Networks (GNNs) are specialized neural models designed to represent and process graph data, and have gained remarkable attention for their impressive performance on a wide range of graph learning tasks. Most of these models, however, are known to be difficult to apply to large-scale data: as the graph size grows, regular GNNs become impractical due to prohibitive computational overhead and unsatisfactory performance. Hence, understanding and enhancing the scalability of GNN models is a compelling and prominent task for their broader application to real-world graph data.

In this thesis, we explore the scalability issue across different graph variants and GNN designs, uncover its theoretical groundings from the perspective of graph processing, and propose methods that scale GNNs up to billion-scale graphs with top-tier performance.

In the first part, we propose novel GNN model designs by enhancing graph data management techniques. Our examination reveals that the scalability bottleneck of GNN designs typically lies in repetitive graph-related computation, which leads to an overhead coupled with data scale. To apply GNNs to large graphs, it is crucial to specifically manage the graph computation. Our first model, SCARA, is developed for efficiently computing graph embeddings along the node feature dimension. The other model, LD2, is specialized for heterophilous graphs exhibiting distinct data distributions. Theoretical analyses are offered for both models to assess their approximation precision, graph expressiveness, and computational complexity. Experimental results demonstrate that both designs achieve state-of-the-art scalability in terms of computation speed and memory footprint without compromising accuracy.
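To make the decoupling idea concrete, below is a minimal sketch of the generic "propagate-then-train" pattern that such scalable designs build on: graph propagation is precomputed once, so subsequent training touches only dense feature matrices. The function names, the PPR-style hop weighting, and the dense-matrix arithmetic are illustrative assumptions for a toy graph, not the actual SCARA or LD2 algorithms, which operate on sparse graphs with approximation guarantees.

```python
import numpy as np

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetrically normalized adjacency with self-loops:
    D^{-1/2} (A + I) D^{-1/2}."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def precompute_embedding(adj, feats, hops=2, alpha=0.1):
    """Decoupled propagation: a PPR-style geometrically weighted sum of
    K-hop features. Runs once before training; no gradients flow through
    the graph structure."""
    a_hat = normalized_adjacency(adj)
    h = feats.copy()
    out = alpha * h
    for k in range(1, hops + 1):
        h = a_hat @ h                       # one hop of propagation
        out += alpha * (1 - alpha) ** k * h  # geometric hop weighting
    return out

# Toy graph: 4 nodes in a path, 3-dimensional random features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.random.default_rng(0).normal(size=(4, 3))
emb = precompute_embedding(adj, feats)
print(emb.shape)  # (4, 3): same shape, graph structure folded into features
```

Any feature-only model (e.g. an MLP) can then be trained on `emb` in mini-batches, which is what allows this family of designs to sidestep per-epoch graph traversal.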

In the second part, we extend the idea of boosting graph computations to broader graph learning approaches. In HubGT, we examine the promising Transformer-based GNN model, analyze its scalability issues, and propose dedicated graph processing algorithms for fast and effective data retrieval. In GENTI, we introduce the strategy of decoupling and accelerating graph element computation for the subgraph learning problem, which is particularly useful for handling dynamic graph links. These applications highlight the utility of our central intuition, scaling up graph learning through advanced graph management techniques: decoupling at the computational-device layer, graph-oriented storage at the data-structure layer, and approximation at the algorithmic layer. Both works showcase practical efficiency, processing larger graphs at faster speeds and significantly advancing the capabilities of these diverse graph learning approaches.

In the third part, we dive deep into understanding the GNN scalability issue from the standpoint of graph data management. From the theoretical side, our Unifews framework integrates the graph signal processing framework with the graph sparsification mechanism to provide an interpretation of approximate graph propagation schemes, which incorporate and generalize the scalable computations in SCARA and LD2. Our derivation establishes a new framework characterizing graph sparsification as an approximation of the GNN learning process with a bounded error, offering solid theoretical support for a range of scalable GNN designs. From the empirical side, we conduct a benchmark study focusing on spectral GNN models, which are unique for their scalable learning schemes. Based on our characterization and taxonomy, we formulate and analyze over 30 GNNs as spectral filters, including the SCARA and LD2 paradigms, with a unified and scalable implementation applicable to million-scale graphs. Within this framework, extensive evaluations are conducted with comprehensive metrics on time efficiency, memory footprint, and effectiveness, yielding novel observations and practical guidelines made possible by the benchmark's scale and coverage. Challenging prevailing beliefs, our benchmark reveals an intricate landscape regarding the effectiveness and efficiency of spectral graph filters, demonstrating the potential to achieve desirable performance through tailored spectral manipulation of graph data.
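As background for the spectral view, a polynomial graph filter applies a weighted sum of propagation powers, sum_k c_k * A_hat^k, to the node features using only repeated matrix products, which is the standard formulation under which many GNNs can be cast as spectral filters. The sketch below is a generic illustration of that formulation on a dense toy matrix; it is an assumption for exposition, not the benchmark's actual implementation.

```python
import numpy as np

def poly_spectral_filter(adj, feats, coeffs):
    """Apply the polynomial filter sum_k coeffs[k] * A_hat^k to node
    features, where A_hat is the symmetrically normalized adjacency with
    self-loops. Each term costs one propagation (Horner-style accumulation
    is also possible; this form keeps the per-hop terms explicit)."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_hat = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = np.zeros_like(feats)
    h = feats.copy()
    for c in coeffs:
        out += c * h    # accumulate c_k * A_hat^k X
        h = a_hat @ h   # advance one hop
    return out

# Toy 2-node graph: coeffs [1.0] is the identity filter (k = 0 only).
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
identity_out = poly_spectral_filter(adj, feats, [1.0])
```

Different coefficient sequences `coeffs` realize different filters (e.g. low-pass smoothing versus high-pass sharpening), which is the sense in which many GNN paradigms reduce to a choice of spectral polynomial.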

In conclusion, this thesis aims to advance the design and implementation of scalable GNNs, especially by identifying key performance bottlenecks and underpinning graph management theories. Our contributions not only equip various graph learning systems with the capability to operate on real-world, billion-scale graphs, but also provide pioneering theoretical foundations and practical guidelines that inspire future research regarding GNN performance at large scale.


Citation
Ningyi Liao. "Scaling up Graph Neural Networks." PhD Thesis. 2025.