4. LightGCN

Graphs are versatile data structures that can model complex elements and relationships. In this chapter I implement a Light Graph Convolution Network (LightGCN) to make recommendations. This work utilises a recommender library developed by Microsoft; instructions on installation can be found here. The library provides utilities to aid common recommendation-building tasks such as data cleaning, train/test splitting and the implementation of algorithms.

4.1. Outline

  1. Overview of LightGCN

  2. Prepare data and hyper-parameters

  3. Create and train model

  4. Recommendations and evaluation

4.2. LightGCN Overview & Architecture

Graph Convolution Network (GCN) approaches involve semi-supervised learning on graph-structured data. Many real-world datasets come in the form of property graphs, yet until recently little effort had been devoted to the generalization of neural network models to graph-structured datasets. GCNs are based on an efficient variant of convolutional neural networks; the convolutional architecture allows them to scale linearly in the number of graph edges and to learn hidden-layer representations that encode both local graph structure and node features.

LightGCN is a simplified design of GCN that is more concise and appropriate for recommenders. The model architecture is described below.

In Light Graph Convolution, only the normalized sum of neighbor embeddings is propagated to the next layer; other operations such as self-connections, feature transformation, and nonlinear activation are all removed, which largely simplifies GCNs. In Layer Combination, the embeddings obtained at each layer are combined with a weighted sum to form the final representations.

4.2.1. Light Graph Convolution (LGC)

In LightGCN, a simple weighted sum aggregator is utilised. The graph convolution operation in LightGCN is defined as:

\[\begin{split} \begin{array}{l} \mathbf{e}_{u}^{(k+1)}=\sum_{i \in \mathcal{N}_{u}} \frac{1}{\sqrt{\left|\mathcal{N}_{u}\right|} \sqrt{\left|\mathcal{N}_{i}\right|}} \mathbf{e}_{i}^{(k)} \\ \mathbf{e}_{i}^{(k+1)}=\sum_{u \in \mathcal{N}_{i}} \frac{1}{\sqrt{\left|\mathcal{N}_{i}\right|} \sqrt{\left|\mathcal{N}_{u}\right|}} \mathbf{e}_{u}^{(k)} \end{array} \end{split}\]

The symmetric normalization term \(\frac{1}{\sqrt{\left|\mathcal{N}_{u}\right|} \sqrt{\left|\mathcal{N}_{i}\right|}}\) follows the design of standard GCN, which can avoid the scale of embeddings increasing with graph convolution operations.
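
To make the LGC step concrete, here is a minimal NumPy sketch on a made-up toy graph (two users, three items); it illustrates the formula above and is not the library's implementation.

import numpy as np

# toy embeddings at layer k: 2 users, 3 items, embedding size 4
rng = np.random.default_rng(0)
user_emb = rng.normal(size=(2, 4))   # e_u^(k)
item_emb = rng.normal(size=(3, 4))   # e_i^(k)

# neighbour sets from the interaction graph
user_neighbors = {0: [0, 1], 1: [1, 2]}        # items each user interacted with
item_neighbors = {0: [0], 1: [0, 1], 2: [1]}   # users that interacted with each item

def lgc_user_update(u):
    # e_u^(k+1): symmetrically normalized sum of neighbouring item embeddings
    return sum(
        item_emb[i] / np.sqrt(len(user_neighbors[u]) * len(item_neighbors[i]))
        for i in user_neighbors[u]
    )

new_user_emb = np.stack([lgc_user_update(u) for u in user_neighbors])
print(new_user_emb.shape)   # (2, 4)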

4.2.2. Layer Combination and Model Prediction

The embeddings at the 0-th layer are the only trainable parameters, i.e., \(\mathbf{e}_{u}^{(0)}\) for all users and \(\mathbf{e}_{i}^{(0)}\) for all items. After \(K\) layers of propagation, the embeddings from each layer are combined to arrive at the final representation of a user (an item):

\[ \mathbf{e}_{u}=\sum_{k=0}^{K} \alpha_{k} \mathbf{e}_{u}^{(k)} ; \quad \mathbf{e}_{i}=\sum_{k=0}^{K} \alpha_{k} \mathbf{e}_{i}^{(k)} \]

where \(\alpha_{k} \geq 0\) denotes the importance of the \(k\)-th layer embedding in constituting the final embedding. In our experiments, we set \(\alpha_{k}\) uniformly as \(1 / (K+1)\).

The model prediction is defined as the inner product of user and item final representations:

\[ \hat{y}_{u i}=\mathbf{e}_{u}^{T} \mathbf{e}_{i} \]

which is used as the ranking score for recommendation generation.
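
Continuing the toy NumPy sketch with made-up per-layer embeddings, layer combination and the ranking score can be computed as follows (illustration only):

import numpy as np

K = 3
alpha = np.full(K + 1, 1.0 / (K + 1))   # uniform layer weights 1/(K+1)
rng = np.random.default_rng(1)
user_layers = [rng.normal(size=4) for _ in range(K + 1)]   # e_u^(0), ..., e_u^(K)
item_layers = [rng.normal(size=4) for _ in range(K + 1)]   # e_i^(0), ..., e_i^(K)

e_u = sum(a * e for a, e in zip(alpha, user_layers))   # layer combination
e_i = sum(a * e for a, e in zip(alpha, item_layers))
y_ui = e_u @ e_i                                       # inner-product ranking score
print(y_ui)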

4.2.3. Matrix Form

Let the user-item interaction matrix be \(\mathbf{R} \in \mathbb{R}^{M \times N}\) where \(M\) and \(N\) denote the number of users and items, respectively, and each entry \(R_{ui}\) is 1 if \(u\) has interacted with item \(i\) otherwise 0. The adjacency matrix of the user-item graph is

\[\begin{split} \mathbf{A}=\left(\begin{array}{cc} \mathbf{0} & \mathbf{R} \\ \mathbf{R}^{T} & \mathbf{0} \end{array}\right) \end{split}\]

Let the 0-th layer embedding matrix be \(\mathbf{E}^{(0)} \in \mathbb{R}^{(M+N) \times T}\), where \(T\) is the embedding size. Then we can obtain the matrix equivalent form of LGC as:

\[ \mathbf{E}^{(k+1)}=\left(\mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}}\right) \mathbf{E}^{(k)} \]

where \(\mathbf{D}\) is an \((M+N) \times(M+N)\) diagonal matrix, in which each entry \(D_{ii}\) denotes the number of nonzero entries in the \(i\)-th row of the adjacency matrix \(\mathbf{A}\) (\(\mathbf{D}\) is also known as the degree matrix). Lastly, we obtain the final embedding matrix used for model prediction:

\[\begin{split} \begin{aligned} \mathbf{E} &=\alpha_{0} \mathbf{E}^{(0)}+\alpha_{1} \mathbf{E}^{(1)}+\alpha_{2} \mathbf{E}^{(2)}+\ldots+\alpha_{K} \mathbf{E}^{(K)} \\ &=\alpha_{0} \mathbf{E}^{(0)}+\alpha_{1} \tilde{\mathbf{A}} \mathbf{E}^{(0)}+\alpha_{2} \tilde{\mathbf{A}}^{2} \mathbf{E}^{(0)}+\ldots+\alpha_{K} \tilde{\mathbf{A}}^{K} \mathbf{E}^{(0)} \end{aligned} \end{split}\]

where \(\tilde{\mathbf{A}}=\mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}}\) is the symmetrically normalized matrix.
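
The matrix form can be sketched directly with SciPy sparse matrices. The toy interaction matrix R below is made up for illustration; the recommenders library constructs and normalizes the adjacency matrix internally.

import numpy as np
import scipy.sparse as sp

M, N, T = 2, 3, 4   # users, items, embedding size
R = sp.csr_matrix(np.array([[1, 1, 0],
                            [0, 1, 1]], dtype=np.float32))

# adjacency matrix A = [[0, R], [R^T, 0]]
A = sp.bmat([[None, R], [R.T, None]], format="csr")

# symmetric normalization: A_tilde = D^{-1/2} A D^{-1/2}
deg = np.asarray(A.sum(axis=1)).flatten()
d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))   # every node has degree > 0 in this toy graph
A_tilde = d_inv_sqrt @ A @ d_inv_sqrt

# one propagation step: E^(k+1) = A_tilde E^(k)
E0 = np.random.default_rng(0).normal(size=(M + N, T))
E1 = A_tilde @ E0
print(E1.shape)   # (5, 4)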

4.2.4. Model Training

Bayesian Personalized Ranking (BPR) loss is used. BPR is a pairwise loss that encourages the prediction of an observed entry to be higher than that of its unobserved counterparts:

\[ L_{B P R}=-\sum_{u=1}^{M} \sum_{i \in \mathcal{N}_{u}} \sum_{j \notin \mathcal{N}_{u}} \ln \sigma\left(\hat{y}_{u i}-\hat{y}_{u j}\right)+\lambda\left\|\mathbf{E}^{(0)}\right\|^{2} \]

where \(\lambda\) controls the \(L_2\) regularization strength.
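
As a small illustration of the BPR term for a single (u, i, j) triple, with made-up scores and embeddings (the library minimises this loss with TensorFlow inside model.fit):

import numpy as np

def bpr_term(y_pos, y_neg, e0, lam=1e-6):
    # -ln sigmoid(y_ui - y_uj) + lambda * ||E^(0)||^2 for one (u, i, j) triple
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return -np.log(sigmoid(y_pos - y_neg)) + lam * np.sum(e0 ** 2)

e0 = np.random.default_rng(0).normal(size=(5, 4))   # toy 0-th layer embeddings
print(bpr_term(y_pos=2.3, y_neg=0.7, e0=e0))   # small loss: positive item already ranked higher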

4.3. Import required packages

import sys
import os
import papermill as pm
import scrapbook as sb
import pandas as pd
import numpy as np
import tensorflow as tf
tf.get_logger().setLevel('ERROR') # only show error messages

from recommenders.utils.timer import Timer
from recommenders.models.deeprec.models.graphrec.lightgcn import LightGCN
from recommenders.models.deeprec.DataModel.ImplicitCF import ImplicitCF
from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_stratified_split
from recommenders.evaluation.python_evaluation import map_at_k, ndcg_at_k, precision_at_k, recall_at_k
from recommenders.utils.constants import SEED as DEFAULT_SEED
from recommenders.models.deeprec.deeprec_utils import prepare_hparams

4.4. Read in Data & Set Parameters

listens = pd.read_csv('.\\data\\processed\\listens.csv',index_col=0)
artists = pd.read_csv('.\\data\\processed\\artists.csv',index_col=0)
artist_dict = pd.Series(artists.name,index=artists.id).to_dict()
listens.head(3)
userID artistID listenCount
0 0 45 3.047442
1 0 46 3.047442
2 0 47 3.047442
# top k items to recommend
TOP_K = 10

LISTENS_DATA_SIZE = '100k'

# Model parameters
EPOCHS = 50
BATCH_SIZE = 1024

SEED = DEFAULT_SEED  # Set None for non-deterministic results

yaml_file = "./lightgcn.yaml"

4.5. LightGCN Implementation

4.5.1. Split Data

We split the full dataset into a train and a test dataset to evaluate the performance of the algorithm against a held-out set not seen during training. Because LightGCN generates recommendations from users' historic interactions, all users that are in the test set must also exist in the training set. We can use the provided python_stratified_split function, which holds out a percentage of items from each user while ensuring all users are present in both the train and test datasets. We will use a 75/25 train/test split. I considered reusing the split from the matrix factorization and softmax models for consistency; however, this method relies heavily on users' historic listening records and splits the data in a different manner, so I decided against it.

df = listens
df = df.rename(columns={'listenCount': 'rating', 'artistID':'itemID'})
# listens['timestamp'] = np.nan

df.head()
userID itemID rating
0 0 45 3.047442
1 0 46 3.047442
2 0 47 3.047442
3 0 48 3.047442
4 0 49 3.047442
train, test = python_stratified_split(df, ratio=0.75)

4.5.2. Process data

ImplicitCF is a class that initializes and loads data for the training process. During the initialization of this class, user IDs and item IDs are reindexed, ratings greater than zero are converted into implicit positive interactions, and an adjacency matrix of the user-item graph is created.

data = ImplicitCF(train=train, test=test, seed=SEED)

4.5.3. Prepare hyper-parameters

Parameters can be set for the LightGCN model. To save time on tuning, we will use the prepared parameters found in yaml_file. prepare_hparams reads in the yaml file and prepares a full set of parameters for the model.

hparams = prepare_hparams(yaml_file,
                          n_layers=3,
                          batch_size=BATCH_SIZE,
                          epochs=EPOCHS,
                          learning_rate=0.005,
                          eval_epoch=5,
                          top_k=TOP_K,
                         )

4.5.4. Create and train model

With data and parameters prepared, we can create and train the LightGCN model.

model = LightGCN(hparams, data, seed=SEED)
Already create adjacency matrix.
Already normalize adjacency matrix.
Using xavier initialization.
with Timer() as train_time:
    model.fit()

print("Took {} seconds for training.".format(train_time.interval))
Epoch 1 (train)9.2s: train loss = 0.42353 = (mf)0.42337 + (embed)0.00016
Epoch 2 (train)9.1s: train loss = 0.19883 = (mf)0.19836 + (embed)0.00047
Epoch 3 (train)8.4s: train loss = 0.15302 = (mf)0.15242 + (embed)0.00059
Epoch 4 (train)9.4s: train loss = 0.12323 = (mf)0.12253 + (embed)0.00070
Epoch 5 (train)8.7s + (eval)1.1s: train loss = 0.10586 = (mf)0.10505 + (embed)0.00080, recall = 0.09133, ndcg = 0.11955, precision = 0.10797, map = 0.04692
Epoch 6 (train)9.0s: train loss = 0.09377 = (mf)0.09288 + (embed)0.00089
Epoch 7 (train)9.1s: train loss = 0.08220 = (mf)0.08122 + (embed)0.00099
Epoch 8 (train)8.5s: train loss = 0.07451 = (mf)0.07344 + (embed)0.00107
Epoch 9 (train)8.6s: train loss = 0.06745 = (mf)0.06629 + (embed)0.00116
Epoch 10 (train)8.7s + (eval)0.9s: train loss = 0.05959 = (mf)0.05835 + (embed)0.00124, recall = 0.11003, ndcg = 0.14639, precision = 0.13027, map = 0.05780
Epoch 11 (train)9.1s: train loss = 0.05491 = (mf)0.05359 + (embed)0.00132
Epoch 12 (train)8.5s: train loss = 0.04991 = (mf)0.04851 + (embed)0.00139
Epoch 13 (train)8.7s: train loss = 0.04857 = (mf)0.04710 + (embed)0.00147
Epoch 14 (train)8.7s: train loss = 0.04412 = (mf)0.04257 + (embed)0.00154
Epoch 15 (train)8.7s + (eval)0.9s: train loss = 0.04175 = (mf)0.04014 + (embed)0.00161, recall = 0.12334, ndcg = 0.16331, precision = 0.14599, map = 0.06553
Epoch 16 (train)8.9s: train loss = 0.03916 = (mf)0.03748 + (embed)0.00168
Epoch 17 (train)9.0s: train loss = 0.03575 = (mf)0.03400 + (embed)0.00175
Epoch 18 (train)9.0s: train loss = 0.03453 = (mf)0.03272 + (embed)0.00182
Epoch 19 (train)8.6s: train loss = 0.03400 = (mf)0.03212 + (embed)0.00188
Epoch 20 (train)9.2s + (eval)1.0s: train loss = 0.03236 = (mf)0.03042 + (embed)0.00194, recall = 0.13128, ndcg = 0.17543, precision = 0.15550, map = 0.07098
Epoch 21 (train)9.1s: train loss = 0.03115 = (mf)0.02916 + (embed)0.00199
Epoch 22 (train)9.4s: train loss = 0.02957 = (mf)0.02751 + (embed)0.00205
Epoch 23 (train)9.0s: train loss = 0.02818 = (mf)0.02608 + (embed)0.00211
Epoch 24 (train)8.7s: train loss = 0.02638 = (mf)0.02422 + (embed)0.00216
Epoch 25 (train)8.9s + (eval)0.9s: train loss = 0.02666 = (mf)0.02445 + (embed)0.00221, recall = 0.13709, ndcg = 0.18341, precision = 0.16240, map = 0.07455
Epoch 26 (train)9.2s: train loss = 0.02509 = (mf)0.02282 + (embed)0.00227
Epoch 27 (train)8.9s: train loss = 0.02325 = (mf)0.02093 + (embed)0.00232
Epoch 28 (train)8.5s: train loss = 0.02202 = (mf)0.01965 + (embed)0.00237
Epoch 29 (train)9.0s: train loss = 0.02138 = (mf)0.01896 + (embed)0.00243
Epoch 30 (train)9.0s + (eval)0.9s: train loss = 0.02225 = (mf)0.01978 + (embed)0.00248, recall = 0.14030, ndcg = 0.18926, precision = 0.16612, map = 0.07737
Epoch 31 (train)8.8s: train loss = 0.02231 = (mf)0.01980 + (embed)0.00251
Epoch 32 (train)8.7s: train loss = 0.02038 = (mf)0.01781 + (embed)0.00256
Epoch 33 (train)8.7s: train loss = 0.02028 = (mf)0.01766 + (embed)0.00261
Epoch 34 (train)8.7s: train loss = 0.01879 = (mf)0.01614 + (embed)0.00266
Epoch 35 (train)8.5s + (eval)0.9s: train loss = 0.01845 = (mf)0.01575 + (embed)0.00271, recall = 0.14525, ndcg = 0.19626, precision = 0.17180, map = 0.08061
Epoch 36 (train)9.0s: train loss = 0.01845 = (mf)0.01569 + (embed)0.00275
Epoch 37 (train)8.8s: train loss = 0.01814 = (mf)0.01535 + (embed)0.00279
Epoch 38 (train)8.7s: train loss = 0.01757 = (mf)0.01474 + (embed)0.00283
Epoch 39 (train)8.8s: train loss = 0.01699 = (mf)0.01412 + (embed)0.00287
Epoch 40 (train)8.4s + (eval)0.9s: train loss = 0.01673 = (mf)0.01382 + (embed)0.00291, recall = 0.14834, ndcg = 0.20013, precision = 0.17515, map = 0.08211
Epoch 41 (train)9.8s: train loss = 0.01574 = (mf)0.01279 + (embed)0.00295
Epoch 42 (train)8.5s: train loss = 0.01637 = (mf)0.01339 + (embed)0.00298
Epoch 43 (train)8.8s: train loss = 0.01714 = (mf)0.01412 + (embed)0.00302
Epoch 44 (train)8.9s: train loss = 0.01568 = (mf)0.01262 + (embed)0.00305
Epoch 45 (train)8.9s + (eval)1.1s: train loss = 0.01618 = (mf)0.01309 + (embed)0.00309, recall = 0.15104, ndcg = 0.20393, precision = 0.17870, map = 0.08413
Epoch 46 (train)9.4s: train loss = 0.01409 = (mf)0.01097 + (embed)0.00313
Epoch 47 (train)8.8s: train loss = 0.01500 = (mf)0.01183 + (embed)0.00316
Epoch 48 (train)9.5s: train loss = 0.01427 = (mf)0.01107 + (embed)0.00319
Epoch 49 (train)8.3s: train loss = 0.01427 = (mf)0.01104 + (embed)0.00323
Epoch 50 (train)10.5s + (eval)1.2s: train loss = 0.01370 = (mf)0.01044 + (embed)0.00326, recall = 0.15244, ndcg = 0.20657, precision = 0.18019, map = 0.08504
Took 455.2403181999998 seconds for training.

4.5.5. Recommendations

recommend_k_items produces k artist recommendations for each user passed to the function. remove_seen=True removes the artists already listened to by the user. We will produce recommendations using the trained model on instances from the test set as input.

topk_scores = model.recommend_k_items(test, top_k=TOP_K, remove_seen=True)
topk_scores['name'] = topk_scores.itemID.map(artist_dict)
topk_scores.head()
userID itemID prediction name
0 0 992 12.571548 Pet Shop Boys
1 0 151 11.216814 Michael Jackson
2 0 181 11.112077 a-ha
3 0 593 10.635868 David Bowie
4 0 1005 10.495359 Erasure
def user_recommendations(user):
    # artists the user listened to most in the training data
    listened_to = train[train.userID == user].sort_values('rating', ascending=False)
    listened_to['name'] = listened_to.itemID.map(artist_dict)
    listened_to = listened_to.head(10).name
    print('User ' + str(user) + ' most listened to artists...')
    print('\n'.join(listened_to) + '\n')

    # top-k artists recommended to the user by the trained model
    topk_scores_recs = topk_scores[topk_scores.userID == user].sort_values('prediction', ascending=False).name
    print('User ' + str(user) + ' recommendations...')
    print('\n'.join(topk_scores_recs.tolist()))
    return
user_recommendations(user=500)
User 500 most listened to artists...
Christina Aguilera
John Mayer
Chico Buarque
Sarah Brightman
Oasis
The Beatles
Lady Gaga
Adele
Justin Timberlake
Paul McCartney

User 500 recommendations...
Britney Spears
Beyoncé
Kylie Minogue
P!nk
Coldplay
Amy Winehouse
Black Eyed Peas
Mariah Carey
Ke$ha
Kelly Clarkson
user_recommendations(user=300)
User 300 most listened to artists...
Van Halen
KISS
Iron Maiden
Black Sabbath
Leaves' Eyes
Epica
The Agonist
Five Finger Death Punch
AC/DC
Deadstar Assembly

User 300 recommendations...
System of a Down
Metallica
Korn
In Flames
Megadeth
Rammstein
Bullet for My Valentine
HIM
Judas Priest
Pantera

At a glance, the recommendation system appears to work extremely well. User 500 has pretty broad and generic music tastes, yet each recommended artist makes sense. User 300 appears to have more specific music interests: most of user 300's most-listened-to artists are rock/heavy metal bands from the 70s/80s, and the recommendations are also mainly rock/heavy metal bands from the same period. Across both users, all recommendations appear relevant and potentially useful.

4.5.6. Evaluation

With topk_scores (k=10) predicted by the model, we can evaluate how LightGCN performs on the test set. We will use four evaluation metrics; a short worked example of the two @k metrics follows the list:

  1. Mean Average Precision (MAP)

  2. Normalized Discounted Cumulative Gain (NDCG)

  3. Precision at 10

  4. Recall at 10
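
As a quick hand-worked refresher on the two @k metrics (a made-up single-user example; the library functions below compute the averages over all users):

# toy example for one user: 10 recommendations, 8 relevant items in the test set,
# 4 of the recommended items are relevant
k = 10
hits = 4
n_relevant = 8

precision_10 = hits / k            # 0.4
recall_10 = hits / n_relevant      # 0.5
print(precision_10, recall_10)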

eval_map = map_at_k(test, topk_scores, k=TOP_K)
eval_ndcg = ndcg_at_k(test, topk_scores, k=TOP_K)
eval_precision = precision_at_k(test, topk_scores, k=TOP_K)
eval_recall = recall_at_k(test, topk_scores, k=TOP_K)

print("MAP:\t%f" % eval_map,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall, sep='\n')
MAP:	0.023575
NDCG:	0.082716
Precision@K:	0.087785
Recall@K:	0.074625

These results are promising and back up the impression from the two users' recommendations that the model works well. Although the test split was different from the splits used to evaluate the matrix factorization and softmax models, this model's precision is still almost 10 times higher. It appears that this is the superior recommendation system and that we have managed to beat the standard set by the initial matrix factorization model.

4.6. Conclusion

LightGCN is a lightweight and efficient form of GCN that can be quickly built, trained, and evaluated on this dataset without the need for a GPU. Even without tuning the hyperparameters, the results and recommendations produced by this model are impressive. Here, we have produced a relevant and potentially useful artist recommendation system. The recommender library was also extremely useful and appropriate for our objective of building an artist recommender system using our Last.fm dataset.