The Libraries will be performing system maintenance to UWSpace on Thursday, March 13th from 12:30 to 5:30 pm (EDT). UWSpace will be unavailable during this time.
 

Perspectives of Graph Diffusion: Computation, Local Partitioning, Statistical Recovery, and Applications

Loading...
Thumbnail Image

Date

2025-03-06

Advisor

Fountoulakis, Kimon

Journal Title

Journal ISSN

Volume Title

Publisher

University of Waterloo

Abstract

Diffusion describes the process of mass moving from one region to another. In the con- text of graph, the diffusing mass spreads from nodes to nodes along the edges of the graph. Broadly speaking, this includes a number of stochastic and deterministic processes such as random walk, heat diffusion, network flow and electrical flow on graphs. Graph diffusion is a highly important primitive, and has been shown to have a variety of surprising properties both theoretically and practically. In this thesis, we present several new perspectives of graph diffusion, with an emphasis on how diffusion algorithms uncover the local clustering structure of the input data without necessarily exploring the entire graph. In the first two parts of the thesis, we introduce a new class of graph diffusion methods that are provably better at extracting the local clustering structure of a graph or a hy- pergraph. Here, diffusion is formulated as a pair of primal and dual convex optimization problems, based on the idea of spreading mass in the graph while minimizing a p-norm net- work flow cost. The primal solution of the diffusion problem provides an intuitive physical interpretation where paint (i.e. mass) spills from the source nodes, spreads over the graph, and there is a sink at each node where up to a certain amount of paint can settle. The dual solution embeds the nodes on the non-negative real line and is considered as the output of diffusion. We will show that the dual variables nicely encode the local clustering structure around a given set of seed nodes. In particular, assume the existence of a cluster C of low conductance Φ(C), the sweep cut procedure on the dual variables returns a cluster whose conductance is not too much larger than Φ(C). In the next two parts of the thesis, we introduce a weighted diffusion mechanism which allows any existing diffusion method to take into account additional node information such as node attributes and labels. The method weighs the edges of the graph based on the attributes or the labels of each node. Depending on the nature and availability of additional node information, two simple yet effective edge-weighting schemes are introduced and analyzed. Over contextual random graphs generated by a local variant of the stochastic block model with noisy node information, we will show that, if the additional information contains enough signal about the ground-truth cluster, then employing existing diffusion algorithms in the weighted graph can more accurately recover the ground-truth cluster than employing diffusion in the original graph without edge weights. In particular, statistical recovery guarantees in terms of precision and F1 score will be derived and compared. All of the results are supplemented with extensive experiments on both synthetic and real-world data to illustrate the technical results and the effectiveness of the new methods in practice. The code is open-source on GitHub.

Description

Keywords

graph algorithms, graph diffusion methods, local graph clustering, hypergraphs, semi-supervised learning, community detection

LC Subject Headings

Citation