Quickstart
This section walks through the API for a common temporal graph learning workflow. As an example, we train TGAT on the Wikipedia dataset.
Basic settings
TGLite uses PyTorch as its backend for tensor operations. Helper functions, such as dataset handling, are wrapped in support.py.
[1]:
import torch
import tglite as tg
import support
Next we set the runtime parameters, including hyper-parameters for TGAT training and system-level optimization configurations. TGLite provides several semantic-preserving system optimization options for CTDG-based models like TGAT, including deduplication, memoization, and time-precomputation. Here we enable all of them by setting OPT_DEDUP, OPT_CACHE, and OPT_TIME to True, and set the related cache size (CACHE_LIMIT) and time window (TIME_WINDOW). Setting MOVE = True makes all feature data reside in GPU device memory to reduce data movement.
[2]:
DATA: str = 'wiki' # 'wiki', 'reddit', 'mooc', 'mag', 'lastfm', 'gdelt', 'wiki-talk'
DATA_PATH: str = '/shared'
EPOCHS: int = 3
BATCH_SIZE: int = 200
LEARN_RATE: float = 0.0001
DROPOUT: float = 0.1
N_LAYERS: int = 2
N_HEADS: int = 2
N_NBRS: int = 20
DIM_TIME: int = 100
DIM_EMBED: int = 100
N_THREADS: int = 32
SAMPLING: str = 'recent' # 'recent' or 'uniform'
OPT_DEDUP = True
OPT_CACHE = True
OPT_TIME = True
OPT_ALL = True
OPT_DEDUP: bool = OPT_DEDUP or OPT_ALL
OPT_CACHE: bool = OPT_CACHE or OPT_ALL
OPT_TIME: bool = OPT_TIME or OPT_ALL
CACHE_LIMIT: int = int(2e6)
TIME_WINDOW: int = int(1e4)
MOVE = True
GPU = 0
SEED = 1
PREFIX = ''
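To make the deduplication option more concrete, here is a minimal pure-Python sketch of the general idea: compute once per unique (node, time) pair, then scatter the results back to the original positions. The helper dedup_pairs is hypothetical and written only for illustration; it is not TGLite's implementation, which operates on tensors inside blocks.

```python
from typing import List, Tuple

def dedup_pairs(nodes: List[int], times: List[float]
                ) -> Tuple[List[Tuple[int, float]], List[int]]:
    """Return the unique (node, time) pairs and, for every input position,
    the index of its pair in the unique list (hypothetical helper)."""
    index_of = {}
    unique = []
    inverse = []
    for pair in zip(nodes, times):
        if pair not in index_of:
            index_of[pair] = len(unique)
            unique.append(pair)
        inverse.append(index_of[pair])
    return unique, inverse

nodes = [3, 7, 3, 5, 7]
times = [10.0, 12.0, 10.0, 12.0, 12.0]
unique, inverse = dedup_pairs(nodes, times)

# Stand-in for an expensive per-pair computation, done once per unique pair.
computed = [node * 100 + t for node, t in unique]
# Scatter the results back so the output lines up with the original inputs.
restored = [computed[i] for i in inverse]
```

Because every duplicate receives exactly the value computed for its unique pair, the final result is unchanged, which is what makes this kind of optimization semantic-preserving.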
Then, specify the training device, the path used for saving the model, and the random seed.
[3]:
device = support.make_device(GPU)
model_path = support.make_model_path('tgat', PREFIX, DATA)
if SEED >= 0:
    support.set_seed(SEED)
Loading temporal graph data
A TGraph object serves as the container for node and edge tensor data. We first load the graph data to create a TGraph object g, and then load the features. TGraph also provides functions for managing graph data. Here, we set the computation device to GPU 0 using g.set_compute(device). If MOVE is True, g.move_data(device) moves the graph features to GPU 0 as well.
[4]:
import os
g = support.load_graph(os.path.join(DATA_PATH, f'data/{DATA}/edges.csv'))
support.load_feats(g, DATA, DATA_PATH)
dim_efeat = 0 if g.efeat is None else g.efeat.shape[1]
dim_nfeat = g.nfeat.shape[1]
g.set_compute(device)
if MOVE:
    g.move_data(device)
num edges: 157474
num nodes: 9228
edge feat: torch.Size([157474, 172])
node feat: torch.Size([9228, 172])
Runtime setup
TGLite uses a TContext object to hold runtime settings and scratch space. Here, a TContext ctx is initialized with the TGraph object g. Calling ctx.need_sampling(True) then creates a TCSR structure inside g for more efficient sampling. Next, we call several methods of ctx to configure the optimizations.
[5]:
ctx = tg.TContext(g)
ctx.need_sampling(True)
ctx.enable_embed_caching(OPT_CACHE, DIM_EMBED)
ctx.enable_time_precompute(OPT_TIME)
ctx.set_cache_limit(CACHE_LIMIT)
ctx.set_time_window(TIME_WINDOW)
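The idea behind time-precomputation can be sketched roughly as follows, assuming a TGAT-style time encoding of the form cos(delta * w + p). The TimeTable class, its lookup method, and the frequency and phase values are all hypothetical and chosen for illustration; TGLite's internal mechanism may differ.

```python
import math
from typing import List

def time_encode(delta: float, freqs: List[float], phases: List[float]) -> List[float]:
    """TGAT-style encoding of a time delta: one cosine per frequency."""
    return [math.cos(delta * w + p) for w, p in zip(freqs, phases)]

class TimeTable:
    """Hypothetical lookup table of encodings for integer deltas in [0, window)."""

    def __init__(self, window: int, freqs: List[float], phases: List[float]):
        self.window = window
        self.freqs = freqs
        self.phases = phases
        # Encode every delta inside the window once, ahead of time.
        self.table = [time_encode(float(d), freqs, phases) for d in range(window)]

    def lookup(self, delta: int) -> List[float]:
        # Deltas inside the window are a table read; others fall back to computing.
        if 0 <= delta < self.window:
            return self.table[delta]
        return time_encode(float(delta), self.freqs, self.phases)

freqs = [1.0 / 10 ** i for i in range(4)]  # illustrative values only
phases = [0.0] * 4
table = TimeTable(window=100, freqs=freqs, phases=phases)
inside = table.lookup(5)     # served from the precomputed table
outside = table.lookup(250)  # outside the window, computed on demand
```

A table like this is only valid for the frequencies and phases it was built from, so it would need to be rebuilt whenever those values change.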
Creating temporal sampler
TGLite provides a TSampler module that exposes 1-hop temporal sampling. Setting num_threads controls how many threads are used for parallel sampling. The sampler evenly distributes the target nodes in the mini-batch across the threads.
[6]:
sampler = tg.TSampler(N_NBRS, strategy=SAMPLING, num_threads=N_THREADS)
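The difference between the 'recent' and 'uniform' strategies can be illustrated with a small single-threaded sketch. It assumes a time-sorted adjacency list per node and treats edges as undirected; the helpers build_adjacency and sample_neighbors are hypothetical and are not how TSampler or TCSR are implemented.

```python
import bisect
import random
from typing import Dict, List, Optional, Tuple

Adjacency = Dict[int, Tuple[List[float], List[int]]]

def build_adjacency(edges: List[Tuple[int, int, float]]) -> Adjacency:
    """Map each node to (timestamps, neighbors), both ordered by time."""
    adj: Adjacency = {}
    for src, dst, t in sorted(edges, key=lambda e: e[2]):
        # Assumption: each interaction is visible from both endpoints.
        for node, nbr in ((src, dst), (dst, src)):
            times, nbrs = adj.setdefault(node, ([], []))
            times.append(t)
            nbrs.append(nbr)
    return adj

def sample_neighbors(adj: Adjacency, node: int, t: float, k: int,
                     strategy: str = "recent",
                     rng: Optional[random.Random] = None) -> List[Tuple[int, float]]:
    """Pick up to k (neighbor, time) pairs that occurred strictly before time t."""
    times, nbrs = adj.get(node, ([], []))
    end = bisect.bisect_left(times, t)  # index of the first interaction at or after t
    candidates = list(zip(nbrs[:end], times[:end]))
    if strategy == "recent":
        return candidates[max(0, end - k):]  # the k latest interactions
    if strategy == "uniform":
        rng = rng or random.Random()
        return rng.sample(candidates, min(k, len(candidates)))
    raise ValueError(f"unknown strategy: {strategy!r}")

edges = [(0, 1, 1.0), (0, 2, 2.0), (0, 3, 3.0), (1, 2, 4.0)]
adj = build_adjacency(edges)
recent = sample_neighbors(adj, node=0, t=3.5, k=2)
uniform = sample_neighbors(adj, node=0, t=3.5, k=2,
                           strategy="uniform", rng=random.Random(1))
```

Restricting candidates to interactions before the query time is what keeps the sampling temporal: a node never sees neighbors from its own future.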
Creating models
A TBatch object represents a batch of temporal edges to process and is passed to TGAT.forward() as the input. From a batch, a head TBlock is created. TBlock is the centerpiece of TGLite. A block essentially captures the 1-hop message-flow dependencies between target node-time pairs (i.e. destination nodes) and their temporally sampled neighbors (i.e. source nodes), along with their respective edges. In addition, TGLite uses a doubly-linked list structure for the blocks, each representing one GNN layer. Here, we iteratively perform sampling and generate TBlocks.
TGLite also allows users to apply optimizations to a TBlock before sampling its neighbors, which minimizes the size of the subsequent subgraphs and thus the potential computation. Inside the loop, we invoke dedup() and cache() from the tglite.op module to perform such optimizations, and then sample with the given TSampler.
Once the full linked list of TBlocks is created, we can load features and perform aggregation to compute node embeddings with functions provided by tglite.op. Here we directly use tglite.nn.TemporalAttnLayer to construct the TGAT model.
[7]:
from torch import nn, Tensor
from tglite.nn import TemporalAttnLayer
class TGAT(nn.Module):
    def __init__(self, ctx: tg.TContext,
                 dim_node: int, dim_edge: int, dim_time: int, dim_embed: int,
                 sampler: tg.TSampler, num_layers=2, num_heads=2, dropout=0.1,
                 dedup: bool = True):
        super().__init__()
        self.ctx = ctx
        self.num_layers = num_layers
        self.attn = nn.ModuleList([
            TemporalAttnLayer(ctx,
                              num_heads=num_heads,
                              dim_node=dim_node if i == 0 else dim_embed,
                              dim_edge=dim_edge,
                              dim_time=dim_time,
                              dim_out=dim_embed,
                              dropout=dropout)
            for i in range(num_layers)])
        self.sampler = sampler
        self.edge_predictor = support.EdgePredictor(dim=dim_embed)
        self.dedup = dedup

    def forward(self, batch: tg.TBatch) -> Tensor:
        head = batch.block(self.ctx)
        for i in range(self.num_layers):
            tail = head if i == 0 \
                else tail.next_block(include_dst=True)
            tail = tg.op.dedup(tail) if self.dedup else tail
            tail = tg.op.cache(self.ctx, tail.layer, tail)
            tail = self.sampler.sample(tail)

        tg.op.preload(head, use_pin=True)
        if tail.num_dst() > 0:
            tail.dstdata['h'] = tail.dstfeat()
            tail.srcdata['h'] = tail.srcfeat()
        embeds = tg.op.aggregate(head, list(reversed(self.attn)), key='h')
        del head
        del tail

        src, dst, neg = batch.split_data(embeds)
        scores = self.edge_predictor(src, dst)
        if batch.neg_nodes is not None:
            scores = (scores, self.edge_predictor(src, neg))

        return scores
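The chain of blocks built in forward() can be pictured with a small pure-Python sketch. The Block class below is a hypothetical stand-in and not TGLite's TBlock: it only records destination and source node ids plus prev/next links, and fake_sample replaces real temporal sampling. Passing each block's destinations on to the next block is meant to loosely mirror include_dst=True, which is an assumption about that flag's effect.

```python
from typing import Callable, List, Optional, Tuple

class Block:
    """Hypothetical stand-in for one layer's 1-hop block."""

    def __init__(self, layer: int, dst: List[int]):
        self.layer = layer
        self.dst = dst
        self.src: List[int] = []
        self.prev: Optional["Block"] = None
        self.next: Optional["Block"] = None

    def next_block(self) -> "Block":
        # This block's destinations and sampled sources become
        # the destinations of the next (deeper) block.
        nxt = Block(self.layer + 1, self.dst + self.src)
        nxt.prev = self
        self.next = nxt
        return nxt

def build_chain(dst: List[int], num_layers: int,
                sample: Callable[[List[int]], List[int]]) -> Tuple[Block, Block]:
    """Create one block per layer, sampling sources for each in turn."""
    head = Block(0, list(dst))
    tail = head
    for i in range(num_layers):
        if i > 0:
            tail = tail.next_block()
        tail.src = sample(tail.dst)
    return head, tail

def walk_back(tail: Block) -> List[int]:
    """Visit layers from the deepest block back to the head."""
    order = []
    blk: Optional[Block] = tail
    while blk is not None:
        order.append(blk.layer)
        blk = blk.prev
    return order

def fake_sample(dst: List[int]) -> List[int]:
    # Illustrative only: pretend each destination has one neighbor, id + 10.
    return [d + 10 for d in dst]

head, tail = build_chain([1, 2], num_layers=2, sample=fake_sample)
order = walk_back(tail)
```

Sampling proceeds from head to tail, while embeddings are computed in the opposite direction, starting from the deepest block and finishing at the head. The backward links are what make that second pass straightforward.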
Now that we’ve defined the TGAT model, we can proceed to instantiate a new TGAT model with pre-set parameters and transfer it to GPU 0.
[8]:
model = TGAT(ctx,
             dim_node=dim_nfeat,
             dim_edge=dim_efeat,
             dim_time=DIM_TIME,
             dim_embed=DIM_EMBED,
             sampler=sampler,
             num_layers=N_LAYERS,
             num_heads=N_HEADS,
             dropout=DROPOUT,
             dedup=OPT_DEDUP)
model = model.to(device)
Training models
Here we use BCEWithLogitsLoss as the loss function and Adam as the optimizer.
[9]:
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARN_RATE)
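For intuition about what BCEWithLogitsLoss computes, the following pure-Python sketch evaluates the same quantity (sigmoid followed by binary cross-entropy, averaged over the inputs) in a numerically stable form. Labeling observed edges with 1 and negative samples with 0 is an assumption about how the trainer uses the scores, and the score values are made up.

```python
import math
from typing import List

def bce_with_logits(logits: List[float], targets: List[float]) -> float:
    """Mean binary cross-entropy on raw scores (logits).

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1-y)*log(1 - sigmoid(x))] without overflow.
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

pos_scores = [2.0, 0.5]   # scores for observed edges (illustrative values)
neg_scores = [-1.0, 0.3]  # scores for negative samples (illustrative values)
loss = bce_with_logits(pos_scores + neg_scores, [1.0, 1.0, 0.0, 0.0])
```

Working on logits rather than probabilities is why the edge predictor can return unnormalized scores: the sigmoid is folded into the loss.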
The data is split into a training set (70%), a validation set (15%), and a test set (15%). neg_sampler randomly picks target nodes as negative samples. Then, we launch a support.LinkPredTrainer to train the model and evaluate it on the test set.
[10]:
import numpy as np
train_end, val_end = support.data_split(g.num_edges(), 0.7, 0.15)
neg_sampler = lambda size: np.random.randint(0, g.num_nodes(), size)
trainer = support.LinkPredTrainer(
    ctx, model, criterion, optimizer, neg_sampler,
    EPOCHS, BATCH_SIZE, train_end, val_end,
    model_path, None)
trainer.train()
trainer.test()
epoch 0:
loss:293.4295 val ap:0.9739 val auc:0.9782
epoch | total:13.45s loop:11.98s eval:1.47s
loop | forward:6.84s backward:5.08s sample:0.68s prep_batch:0.06s prep_input:0.46s post_update:0.00s
comp | mem_update:0.00s time_zero:0.99s time_nbrs:0.65s self_attn:3.34s
epoch 1:
loss:170.0556 val ap:0.9819 val auc:0.9843
epoch | total:17.91s loop:16.31s eval:1.58s
loop | forward:7.48s backward:8.73s sample:0.86s prep_batch:0.09s prep_input:0.55s post_update:0.00s
comp | mem_update:0.00s time_zero:0.36s time_nbrs:1.19s self_attn:3.63s
epoch 2:
loss:142.4712 val ap:0.9833 val auc:0.9861
epoch | total:19.56s loop:17.82s eval:1.72s
loop | forward:8.03s backward:9.68s sample:0.88s prep_batch:0.09s prep_input:0.58s post_update:0.00s
comp | mem_update:0.00s time_zero:0.46s time_nbrs:1.43s self_attn:3.76s
best model at epoch 2
loading saved checkpoint and testing model...
test time:1.64s AP:0.9798 AUC:0.9827
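The split and negative sampling used for this run can be sketched in plain Python. The helpers chrono_split and sample_negatives are hypothetical and do not reproduce support.data_split or the NumPy-based neg_sampler exactly; the sketch also assumes that edges are stored in timestamp order, so that index ranges correspond to time ranges.

```python
import random
from typing import List, Tuple

def chrono_split(num_edges: int, train_frac: float, val_frac: float) -> Tuple[int, int]:
    """Return the end indices of the training and validation ranges."""
    train_end = int(num_edges * train_frac)
    val_end = int(num_edges * (train_frac + val_frac))
    return train_end, val_end

def sample_negatives(num_nodes: int, size: int, rng: random.Random) -> List[int]:
    """Draw random node ids to pair with source nodes as negative edges."""
    return [rng.randrange(num_nodes) for _ in range(size)]

num_edges = 1000
train_end, val_end = chrono_split(num_edges, 0.7, 0.15)
train_ids = range(0, train_end)        # earliest edges
val_ids = range(train_end, val_end)    # next slice in time
test_ids = range(val_end, num_edges)   # latest edges

negs = sample_negatives(num_nodes=50, size=8, rng=random.Random(1))
```

Splitting by position rather than shuffling means the model is always evaluated on interactions that occur after the ones it was trained on.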
To run more TGNN models with TGLite, see Running Examples.