Motivation
Existing GNNs' expressive power is limited to the 1-WL test, so they cannot count some substructures such as triangle, tailed-triangle, and 4-cycle graphlets.
Existing GNNs that go beyond the 1-WL test are computationally heavy ($O(n^3)$ time) and memory hungry ($O(n^2)$).
Some methods inject handcrafted features that a GNN cannot extract on its own into the node attributes to increase expressive power: distance encoding, Identity-aware GNN, and other works from Jure Leskovec's group.
Existing GNNs focus on the low-pass frequency components of the graph signal.
Contribution
Explains what makes a GNN have a certain level of expressive power.
Proposes a graph convolution method (GNNML) that is more powerful than the 1-WL test and reaches the power of the 3-WL test.
GNNML has linear memory and computational complexity.
It retains sufficient spectral expressive ability while remaining a local (message-passing) update process.
Road Map
GNN → WL test → Matrix Language → GNNML
Weisfeiler-Lehman Test (WL-test)
1-WL test
Iteratively refines a color on each single vertex, using the multiset of neighboring colors.
k-WL test
Iteratively refines a color on each k-vertex tuple.
Examples of non-isomorphic graphs that cannot be distinguished by 1-WL but can be distinguished by 3-WL, thanks to its capability of counting triangles; a concrete check is sketched below.
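A minimal numpy check (my own illustration, not from the paper): two disjoint triangles and a 6-cycle are both 2-regular on six nodes, so 1-WL assigns every node the same color, yet a triangle count tells them apart.

```python
import numpy as np

def cycle_adj(n):
    """Adjacency matrix of an n-cycle."""
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

# Two disjoint triangles vs. one hexagon: both 2-regular on 6 nodes,
# hence 1-WL-equivalent, but their triangle counts differ.
A_two_triangles = np.kron(np.eye(2, dtype=int), cycle_adj(3))
A_hexagon = cycle_adj(6)

# tr(A^3)/6 counts triangles (each triangle yields 6 closed 3-walks).
for A in (A_two_triangles, A_hexagon):
    print(np.trace(np.linalg.matrix_power(A, 3)) // 6)  # prints 2, then 0
```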
Matrix Language and WL test
Matrix Language (MATLANG) defines a set of operations on matrices and makes explicit connections between specific dictionaries of operations and the 1-WL and 3-WL tests.
Definition 1: $ML(\mathcal{L})$ is a matrix language with operation set $\mathcal{L}=\{op_1,\dots,op_n\}$, $op_i\in\{\cdot,+,{}^\top,\mathrm{diag},\mathrm{tr},\mathbf{1},\odot,\times,f\}$.
Definition 2: $e(X)\in\mathbb{R}$ is a sentence in $ML(\mathcal{L})$ if it consists of any possible consecutive operations in $\mathcal{L}$, operating on a given matrix $X$ and resulting in a scalar value.
Example: $e(X)=\mathbf{1}^\top X^2\mathbf{1}$ is a sentence of $ML(\mathcal{L})$ with $\mathcal{L}=\{\cdot,{}^\top,\mathbf{1}\}$.
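A quick numerical check of this sentence (my own example): on a triangle graph, $e(A)=\mathbf{1}^\top A^2\mathbf{1}$ sums all entries of $A^2$, i.e., counts the walks of length 2.

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])            # triangle graph
one = np.ones((3, 1))
e = (one.T @ (A @ A) @ one).item()   # the sentence 1^T A^2 1
print(e)  # 12.0: each of the 3 nodes starts 2*2 = 4 two-step walks
```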
Matrix operations can extract different features from a graph, and these features can be used to distinguish non-isomorphic graphs.
Remark 1: Two adjacency matrices are indistinguishable by the 1-WL test if and only if $e(A_G)=e(A_H)$ for all sentences $e\in ML(L_1)$ with $L_1=\{\cdot,{}^\top,\mathbf{1},\mathrm{diag}\}$.
Remark 2: $ML(L_2)$ with $L_2=\{\cdot,{}^\top,\mathbf{1},\mathrm{diag},\mathrm{tr}\}$ is more powerful than the 1-WL test but less powerful than the 3-WL test.
Remark 3: $ML(L_3)$ with $L_3=\{\cdot,{}^\top,\mathbf{1},\mathrm{diag},\mathrm{tr},\odot\}$ is as powerful as the 3-WL test.
Remark 4: Enriching the operation set to $L^+=L\cup\{+,\times,f\}$ for $L\in\{L_1,L_2,L_3\}$ does not improve the expressive power.
How powerful are GNNs?
Theorem 1: Spatial GNNs such as GCN and GAT cannot go beyond the 1-WL test (they stay within $ML(L_1^+)$).
Theorem 2: ChebNet (a spectral GNN) is more powerful than the 1-WL test if the maximum Laplacian eigenvalues of the non-regular graphs being compared are not the same.
Theorem 3: 3-star graphlets can be counted by sentences in $ML(L_1^+)$.
Theorem 4: Triangle and 4-cycle graphlets can be counted by sentences in $ML(L_2^+)$.
Theorem 5: Tailed-triangle graphlets can be counted by sentences in $ML(L_3^+)$; a numerical illustration of all four counts follows.
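To ground Theorems 3-5, here is a numpy sketch based on standard closed-walk identities (the concrete formulas are textbook identities of my choosing, not necessarily the exact sentences used in the paper): 3-star counts need only degrees, triangle and 4-cycle counts need the trace, and the tailed-triangle count combines a trace-derived quantity with an element-wise product.

```python
import numpy as np

def graphlet_counts(A):
    """Count graphlets from powers of A via standard closed-walk identities."""
    d = A.sum(axis=1).astype(float)
    A2, A3 = A @ A, A @ A @ A
    A4 = A2 @ A2
    tri_per_node = np.diag(A3) / 2                 # triangles touching each node
    stars3 = (d * (d - 1) * (d - 2)).sum() / 6     # 3-stars: sum_i C(d_i, 3)
    triangles = np.trace(A3) / 6                   # needs tr  (L2+)
    cycles4 = (np.trace(A4) + np.trace(A2) - 2 * (d ** 2).sum()) / 8
    tailed = ((d - 2) * tri_per_node).sum()        # element-wise product (L3+)
    return stars3, triangles, cycles4, tailed

# Tailed triangle: triangle {0, 1, 2} plus the pendant edge {2, 3}.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
print(graphlet_counts(A))  # (1.0, 1.0, 0.0, 1.0)
```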
Generalization of Spectral and Spatial GNN
Spectral GNN
Spatial GNN
Generalized GNN: $H^{(l+1)}=\sigma\left(\sum_{s} C^{(s)} H^{(l)} W^{(l,s)}\right)$,
where $C^{(s)}\in\mathbb{R}^{n\times n}$ is the $s$-th convolution support, which defines how node features are propagated to neighboring nodes (i.e., the aggregation method).
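A minimal PyTorch sketch of this generalized layer (class and variable names are my own; an illustration, not the authors' code):

```python
import torch
import torch.nn as nn

class MultiSupportConv(nn.Module):
    """Generalized GNN layer: H' = sigma( sum_s C^(s) H W^(l,s) )."""
    def __init__(self, in_dim, out_dim, num_supports):
        super().__init__()
        self.lins = nn.ModuleList(
            nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_supports)
        )

    def forward(self, supports, H):
        # supports: list of (n, n) tensors; H: (n, in_dim) node features
        return torch.relu(sum(C @ lin(H) for C, lin in zip(supports, self.lins)))

# Example: a GCN-like layer uses the single support C = D^-1/2 (A + I) D^-1/2.
A_hat = torch.tensor([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]]) + torch.eye(3)
d = A_hat.sum(1)
C = A_hat / torch.sqrt(d[:, None] * d[None, :])
layer = MultiSupportConv(in_dim=4, out_dim=8, num_supports=1)
print(layer([C], torch.randn(3, 4)).shape)  # torch.Size([3, 8])
```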
GNN beyond 1-WL
As shown above, the k-WL test can be represented by certain matrix operations; thus, a GNN that contains those operations can also reach the expressive power of the k-WL test.
The paper proposes two methods. The first, called GNNML1, is shown to be as powerful as the 1-WL test. The second, called GNNML3, exploits the theoretical results above to break the limits of 1-WL and reach 3-WL equivalence experimentally.
GNNML1
Proof sketch: GNNML1 can produce all possible vectors generated by sentences in $ML(L_1)$ with $L_1=\{\cdot,{}^\top,\mathbf{1},\mathrm{diag}\}$; a layer sketch follows.
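A sketch of a layer in this spirit: message passing with $A$, plain linear maps, and an element-wise product of transformed features, which, acting entry-wise on node vectors, stays within the power of $ML(L_1^+)$. This is my reconstruction; the paper's exact parameterization may differ.

```python
import torch
import torch.nn as nn

class GNNML1Layer(nn.Module):
    """Sketch of a 1-WL-power layer in the spirit of GNNML1 (my reconstruction)."""
    def __init__(self, dim):
        super().__init__()
        self.w1 = nn.Linear(dim, dim)                # identity / self term
        self.w2 = nn.Linear(dim, dim, bias=False)    # neighborhood term (A @ .)
        self.w3 = nn.Linear(dim, dim, bias=False)    # factors of the entry-wise
        self.w4 = nn.Linear(dim, dim, bias=False)    #   (f-like) feature product

    def forward(self, A, H):
        return torch.relu(self.w1(H) + A @ self.w2(H) + self.w3(H) * self.w4(H))
```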
GNNML3
To reach models more powerful than 1-WL, the graph convolution must support the trace ($\mathrm{tr}$) and element-wise (Hadamard) product ($\odot$) operations in $L_3$.
The trace and Hadamard operations typically act on higher powers of the adjacency matrix.
However, we cannot know in advance which power of the adjacency matrix is needed for a given problem.
Thus, the paper designs spectral convolution supports that can be expressed as linear combinations of all powers of the adjacency matrix.
Given access to arbitrary powers of the adjacency matrix, the trace and Hadamard operations can be approximated by an MLP (a trick: an MLP can approximate any continuous function).
Each support has the form $C^{(s)}=M\odot\left(U\,\mathrm{diag}(\Phi_s(\lambda))\,U^\top\right)$, where $U$ and $\lambda$ come from the eigendecomposition of the graph Laplacian, $\Phi_s$ is a band-pass filter, and $M$ can be seen as a mask (attention matrix) that selects certain powers of the adjacency matrix; a construction sketch follows.
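A numpy sketch of how such masked band-pass supports could be built (the Gaussian filter shape and the constants are my assumptions; the mask keeps each support as sparse as the graph, preserving linear complexity):

```python
import numpy as np

def spectral_supports(A, num_supports=5, bandwidth=5.0):
    """Build masked band-pass convolution supports from the Laplacian spectrum.

    Assumes no isolated nodes (degrees > 0).
    """
    n = A.shape[0]
    d = A.sum(1)
    L = np.eye(n) - A / np.sqrt(np.outer(d, d))      # normalized Laplacian
    lam, U = np.linalg.eigh(L)
    mask = (A + np.eye(n)) > 0                       # receptive-field mask M
    supports = []
    for f_s in np.linspace(0.0, lam.max(), num_supports):
        phi = np.exp(-bandwidth * (lam - f_s) ** 2)  # Gaussian band-pass filter
        C = (U * phi) @ U.T                          # U diag(phi(lam)) U^T
        supports.append(np.where(mask, C, 0.0))
    return supports
```

Because $U\,\mathrm{diag}(\Phi_s(\lambda))\,U^\top$ is a function of the Laplacian, it is implicitly a linear combination of all of its powers, which is exactly what the previous point asks for.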
The forward calculation of GNNML3 plugs these designed supports into the generalized propagation rule above, augmented with element-wise products of transformed node features.
Here the set of supports $\{C^{(s)}\}$ can be seen as multi-head attention over different graph-structure components, providing the result vectors of the different operations needed to approximate the 3-WL test.
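Putting the pieces together, a simplified PyTorch sketch of a GNNML3-style layer (my reading of the design, not the authors' code): each support contributes one "head", and the element-wise product term supplies the ingredient the MLP needs to approximate $\mathrm{tr}$ and $\odot$.

```python
import torch
import torch.nn as nn

class GNNML3Layer(nn.Module):
    """Sketch of a GNNML3-style layer (simplified reading of the paper)."""
    def __init__(self, dim, num_supports):
        super().__init__()
        self.sup_lins = nn.ModuleList(
            nn.Linear(dim, dim, bias=False) for _ in range(num_supports)
        )
        self.wa = nn.Linear(dim, dim, bias=False)   # factors of the
        self.wb = nn.Linear(dim, dim, bias=False)   #   Hadamard term
        self.out = nn.Linear((num_supports + 1) * dim, dim)

    def forward(self, supports, H):
        heads = [C @ lin(H) for C, lin in zip(supports, self.sup_lins)]
        heads.append(self.wa(H) * self.wb(H))       # element-wise product term
        return torch.relu(self.out(torch.cat(heads, dim=-1)))
```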
Experiments
Q1: How many pairs of non-isomorphic simple graphs that are either 1-WL or 3-WL equivalent are not distinguished by the models?
Q2: Can the models generalize the counting of some substructures in a given graph?
Q3: Can the models learn low-pass, high-pass, and band-pass filtering effects?
Q4: Can the models generalize downstream graph classification and regression tasks?
Conclusion
Theoretically explains which operations a GNN must support to reach a given level of expressive power (the k-WL test); this will help successors find ways to further improve the expressive power of GNNs.
The multiple convolution supports used in GNNML3 act as multi-head attention over different graph-structure components, providing the result vectors of the different operations used to approximate the 3-WL test.
The paper cannot theoretically prove that GNNML3 is equivalent to the 3-WL test, because it uses an MLP to approximate the trace and Hadamard-product operations.