Since transformers and MPNNs are not the only ML approaches for the structural analysis of graphs, we also compared the analytical capabilities of a variety of other GNN- and transformer-based architectures. For GNNs, we compared both transformers and MPNNs to models like graph convolutional networks (GCNs) and graph isomorphism networks (GINs).
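To make these GNN baselines concrete, here is a minimal NumPy sketch of the two layer updates; the function names, shapes, and the ReLU nonlinearity are our own illustrative choices, not the exact configurations used in the comparison.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One GCN layer: symmetrically normalized neighborhood averaging.
    H: (n, d) node features; A: (n, n) adjacency; W: (d, d_out) weights."""
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    d_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)   # D^{-1/2}
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)  # ReLU

def gin_layer(H, A, mlp, eps=0.0):
    """One GIN layer: sum aggregation over neighbors followed by an MLP,
    which is what gives GINs their Weisfeiler-Leman-level expressiveness."""
    return mlp((1.0 + eps) * H + A @ H)
```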
In addition, we compared our transformers with much larger language models. Language models are transformers as well, but with many orders of magnitude more parameters. We compared transformers to the language modeling approach described in Talk Like a Graph, which encodes the graph as text, using natural language to describe relationships instead of treating an input graph as a set of abstract tokens.
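As an illustration of that encoding step, a small graph might be verbalized as follows; the wording is our own sketch, since Talk Like a Graph evaluates several different text encodings.

```python
def graph_to_text(nodes, edges):
    """Encode an undirected graph as natural-language text
    (illustrative phrasing, in the spirit of Talk Like a Graph)."""
    lines = [f"G describes a graph among nodes {', '.join(map(str, nodes))}."]
    lines += [f"Node {u} is connected to node {v}." for u, v in edges]
    return "\n".join(lines)

print(graph_to_text([0, 1, 2], [(0, 1), (1, 2)]))
# G describes a graph among nodes 0, 1, 2.
# Node 0 is connected to node 1.
# Node 1 is connected to node 2.
```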
We asked a trained language model to solve various retrieval tasks with a variety of prompting approaches (a sketch of these prompt templates follows the list):
- Zero-shot, which provides only a single prompt and asks for the solution without further hints.
- Few-shot, which provides several examples of solved prompt–response pairs before asking the model to solve a task.
- Chain-of-thought (CoT), which provides a set of examples (similar to few-shot), each of which includes a prompt, a response, and an explanation, before asking the model to solve a task.
- Zero-shot CoT, which asks the model to show its work, without including additional worked-out examples as context.
- CoT-bag, which asks the LLM to construct a graph before being provided with relevant information.
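Here is a minimal sketch of how four of these prompting styles could be assembled for an edge-existence query; the templates and wording are hypothetical, chosen only to show how the styles differ.

```python
graph_text = "Node 0 is connected to node 1. Node 1 is connected to node 2."
question = "Is there an edge between node 0 and node 2? Answer yes or no."

# Zero-shot: the task alone, with no examples or hints.
zero_shot = f"{graph_text}\n{question}"

# Few-shot: solved prompt-response pairs precede the task.
solved_pair = ("Node 5 is connected to node 6.\n"
               "Is there an edge between node 5 and node 6? Answer yes or no.\n"
               "Answer: yes\n\n")
few_shot = solved_pair + zero_shot

# Chain-of-thought: each example also explains how its answer was derived.
explained_pair = ("Node 5 is connected to node 6.\n"
                  "Is there an edge between node 5 and node 6?\n"
                  "The edge (5, 6) appears in the description, so: yes.\n\n")
chain_of_thought = explained_pair + zero_shot

# Zero-shot CoT: no worked examples, but the model must show its work.
zero_shot_cot = zero_shot + "\nLet's think step by step."
```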
For the theoretical part of the experiment, we created a task difficulty hierarchy to assess which tasks transformers can solve with small models.
We only considered graph reasoning tasks that apply to undirected and unweighted graphs of bounded size: node count, edge count, edge existence, node degree, connectivity, node connectivity (for undirected graphs), cycle check, and shortest path.
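Each of these tasks has a simple classical algorithm, which defines the ground truth the models are scored against. A sketch on a toy graph, using networkx:

```python
import networkx as nx

# A toy undirected, unweighted graph: a triangle plus a separate edge.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4)])

print(G.number_of_nodes())               # node count: 5
print(G.number_of_edges())               # edge count: 4
print(G.has_edge(0, 2))                  # edge existence: True
print(G.degree[1])                       # node degree: 2
print(nx.has_path(G, 0, 3))              # connectivity: False
print(len(nx.cycle_basis(G)) > 0)        # cycle check: True
print(nx.shortest_path_length(G, 0, 2))  # shortest path: 1
```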
In this hierarchy, we categorized graph task difficulty based on depth (the number of self-attention layers in the transformer, computed sequentially), width (the dimension of the vectors used for each graph token), the number of blank tokens, and three distinct task types (see the sketch after this list):
- Retrieval tasks: easy, local aggregation tasks.
- Parallelizable tasks: tasks that benefit greatly from parallel operations.
- Search tasks: tasks with limited benefits from parallel operations.
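As a toy illustration of the resulting categorization, the mapping below records which tasks might fall into which type; the assignment shown is our reading of the hierarchy and is meant only to make the three classes concrete.

```python
# Illustrative only: our reading of the task-type assignment.
TASK_TYPES = {
    "retrieval": ["node count", "edge count", "edge existence", "node degree"],
    "parallelizable": ["connectivity", "cycle check"],
    "search": ["shortest path"],
}

def task_type(task):
    """Return the difficulty class a task falls into (illustrative)."""
    for kind, tasks in TASK_TYPES.items():
        if task in tasks:
            return kind
    raise KeyError(f"unknown task: {task}")

print(task_type("connectivity"))  # parallelizable
```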