Explainable Reasoning over Knowledge Graphs for Recommendation

By: Dingxian Wang, Canran Xu, Hua Yang and Xiaoyuan Wu

Incorporating knowledge graphs into recommender systems has attracted increasing attention in recent years. By exploring the interlinks within a knowledge graph, the connectivity between users and items can be discovered as paths, which provide rich and complementary information to user-item interactions. Such connectivity not only reveals the semantics of entities and relations, but also helps to comprehend a user’s interest. However, existing efforts have not fully explored this connectivity to infer user preferences, especially in terms of modeling the sequential dependencies within and holistic semantics of a path. We have developed a new model named Knowledge-aware Path Recurrent Network (KPRN) to exploit knowledge graphs for recommendation.

Our new model, Knowledge-aware Path Recurrent Network (KPRN), can generate path representations by composing the semantics of both entities and relations. By leveraging the sequential dependencies within a path, we allow effective reasoning on paths to infer the underlying rationale of a user-item interaction. Furthermore, we design a new weighted pooling operation to discriminate the strengths of different paths in connecting a user with an item, endowing our model with a certain level of explainability. We conduct extensive experiments on two datasets about movie and music, demonstrating significant improvements over state-of-the-art solutions, Collaborative Knowledge Base Embedding and Neural Factorization Machine.

Incorporating knowledge graphs (KGs) into recommender systems has attracted increasing attention in recent years. By exploring the interlinks within a knowledge graph, the connectivity between users and items can be discovered as paths, which provide rich and complementary information to user-item interactions. Such connectivity not only reveals the semantics of entities and relations, but also helps to comprehend a user's interest. However, existing efforts have not fully explored this connectivity to infer user preferences, especially in terms of modeling the sequential dependencies within and holistic semantics of a path.

In this article, we summarize our paper “Explainable Reasoning over Knowledge Graphs for Recommendation,” accepted by the AAAI 2019 conference. We contribute KPRN to exploit knowledge graphs for recommendation. KPRN can generate path representations by composing the semantics of both entities and relations. By leveraging the sequential dependencies within a path, we allow effective reasoning on paths to infer the underlying rationale of a user-item interaction. Furthermore, we design a new weighted pooling operation to discriminate the strengths of different paths in connecting a user with an item, endowing our model with a certain level of explainability. We conduct extensive experiments on two datasets about movies and music, demonstrating significant improvements over state-of-the-art solutions, Collaborative Knowledge Base Embedding and Neural Factorization Machine.

Introduction

Figure 1. Illustration of a KG-aware recommendation in the music domain.

Extra user-item connectivity information derived from a Knowledge Graph endows recommender systems the abilities to reason and explain. Taking music recommendations as an example (Figure.1), a user is connected to "I See Fire," since she likes the song "Shape of You" sung by the same singer Ed Sheeran. Such connectivity helps to reason about unseen user-item interactions (i.e. a potential recommendation) by synthesizing information from paths.

Running Example: {Alice, Interact, Shape of You}^{Shape of You, SungBy, Ed Sheeran}^{Ed Sheeran, IsSingerOf, I See Fire}=>{Alice, Interact, I See Fire}.

Clearly, the reasoning unveils the possible user intents behind an interaction, offering explanations behind a recommendation. How to model such connectivity in KGs, hence, is of critical importance to inject knowledge into a recommender systems.

Our new solution, KPRN, not only generates representations for paths by accounting for both entities and relations, but also performs reasoning based on paths to infer user preference. Specifically, we first extract qualified paths between a user-item pair from the KG, each of which consists of the related entities and relations. We then adopt a Long Short-Term Memory (LSTM) network to model the sequential dependencies of entities and relations. Thereafter, a pooling operation is performed to aggregate the representations of paths to obtain a prediction signal for the user-item pair. More importantly, the pooling operation is capable of discriminating the contributions of different paths for a prediction, which functions as the attention mechanism (Chen et al. 2017; Neelakantan, Roth, and McCallum 2015). Owing to such attentive effect, our model can offer path-wise explanations, such as why "Castle on the Hill" is recommended since you have listened to "Shape of You," which is sung and written by Ed Sheeran. We conducted extensive experiments on two datasets to verify our method.

Knowledge-aware Path Recurrent Network

In this section, we elaborate on our proposed method, as illustrated in Figure 2. Before introducing our proposed method, we first formally define the knowledge graph and the user-item data, and describe how to combine them in an enriched knowledge graph as the inputs of our model.

Figure 2. Schematic overview of our model architecture.

Background

A Knowledge Graph (KG) is a directed graph whose nodes are entities ${E}$, and edges ${R}$ denote their relations. Formally, we define KG as ${KG}=\{(h,r,t)| h,r\in {E}, r\in {R}\}$, where each triplet $(h,r,t)$ indicates a fact that there is a relationship $r$ from the head entity $h$ to the tail entity $t$.

The user-item interaction data is usually presented as a bipartite graph. In particular, we use ${U}=\{u_{t}\}_{t=1}^{M}$ and ${I}=\{i_{t}\}_{t=1}^{N}$ to separately denote the user set and the item set, where $M$ and $N$ are the number of users and items, respectively. Following that (Chaudhari, Azaria, and Mitchell 2016), we represent the interaction between a user and an item with a triplet $\tau=$($u$, interact, $i$), if there is an observed interaction (e.g., rate, click, and view feedbacks), where the interaction is a pre-defined relation.

We merge the item set and the entity set through string matching: ${I}\subseteq{E}$, ${G}=\{(h,r,t)|h,r\in {E}',r\in {R}'\}$, where ${E}'={E}\cup{U}$ and ${R}'={R}\cup\{\text{interact}\}$. For consistency, the KG in the rest of paper denotes the combined graph ${G}$, including both the original KG and user-item data, unless otherwise noted.

Preference Inference via Paths

The triplets in the KG clearly describe direct or indirect (i.e. multiple-step) relational properties of items, which shall constitute one or several paths between the given user and item pair. We explore these paths in order to achieve comprehensive reasoning and understanding for a recommendation.

Within ${G}$, we formally define the path from the user $u$ to the item $i$ as a sequence of entities and relations: $p=[e_{1}\xrightarrow{r_{1}}e_{2}\xrightarrow{r_{2}}\cdots\xrightarrow{r_{L-1}}e_{L}]$, where $e_{1}=u$, $e_{L}=i$; $(e_{l},r_{l},e_{l+1})$ is the $l$-th triplet in $p$, and $L$ denotes the number of triplets in the path. We elaborate on the construction of paths in the Path Extraction section of the paper.

Next, we will use a realistic example to show the sophisticated relationships (i.e. paths) between a user and an item behind their possible interactions, which inspires us to model the high-level semantics of a path compositionally by considering both entities and (multiple-step) relationships.

Examples: Consider the music recommendation shown in Figure 1, where the “listen to 'Castle on the Hill'” behavior of user Alice can be referred by the following paths:

$p_{1}=\text{[Alice}\xrightarrow{\text{Interact}}\text{Shape of You}\xrightarrow{\text{IsSongOf}}\div\xrightarrow{\text{ContainSong}}\text{Castle on the Hill]}$;
$p_{2}=~\text{[Alice}\xrightarrow{\text{Interact}}\text{Shape of You}\xrightarrow{\text{SungBy}}\text{Ed Sheeran}\xrightarrow{\text{IsSingerOf}}\text{Castle on the Hill]}$.
$p_{3}=~\text{[Alice}\xrightarrow{\text{Interact}}\text{Shape of You}\xrightarrow{\text{InteractedBy}}\text{Tony}\xrightarrow{\text{Interact}}\text{Castle on the Hill]}$;

These paths from the same user Alice to the same item, "Castle on the Hill," obviously express their different multiple-step relations, and implies various compositional semantics and possible explanations of the listen behavior. In particular, $p_{1}$ and $p_{2} $ infer that Alice may prefer songs that belonging to the album $\div$ and the songs sung by Ed Sheeran, while $p_{3}$ reflects the collaborative filtering (CF) effect: similar users tend to have similar preferences.

Therefore, from the view of reasoning, we consume the connectivity along all paths to learn compositional relation representations, and weighted pool them together for predicting the interact relationship between the user and the target item.

Task Definition: Our task can be formulated as follows: given a user $u$, a target item $i$, and a set of paths ${P}(u,i)=\{p_{1},p_{2},\cdots,p_{K}\}$ connecting $u$ and $i$, the holistic goal is to estimate the interaction by:

$
\hat{y}_{ui}=f_{\Theta}(u,i|{P}(u,i)),
% \hat{y}_{ui}=\Space{P}(\tau|\Set{P}),
% \hat{y}_{ui}=\Space{P}((u,\text{interact},i)|\Set{P}),
$
where $f$ denotes the underlying model with parameters $\Theta$, and $\hat{y}_{ui}$ presents the predicted score for the user-item interaction. Distinct from embedding-based methods, we can explain $\hat{y}_{ui}$ as the plausibility score of the triplet $\tau=(u,{interact},i)$ inferred by the connectivity ${P}(u,i)$.

Modeling: KPRN takes a set of paths of each user-item pair as input and outputs a score indicating how possible the user will interact the target item. As illustrated in Figure 2, there are three key components: (1) the embedding layer to project three types of ID information: the entity, entity type, and the relation pointing to the next node into a latent space, (2) the LSTM layer that encodes the elements sequentially with the goal of capturing the compositional semantics of entities conditioned on relations, and (3) the pooling layer to combine multiple paths and output the final score of the given user interacting the target item.

Embedding layer

Given a path $p_{k}$, we project the type (\eg person or movie) and specific value (\eg Peter Jackson or The Hobbit II) of each entity into two separate embedding vectors, ${e}_{l}$$\in$${R}^{d}$ and ${e}'_{l}\in{R}^{d}$, where $d$ is the embedding size.

In real-world scenarios, it is common that the same entity-entity pairs may have different semantics due to different relations connecting them.
Such differences may reveal the diverse intents about why a user selected the item. As an example, let ({Ed Sheeran, IsSingerOf, Shape of You}) and ({Ed Sheeran, IsSongwriterOf, Shape of You}) be the triplets in two paths referring a user's preferences.

Without specifying the relations, these paths will be represented as the same embeddings, regardless of the possibility that the user only prefers songs sung by Ed Sheeran, rather than that written by Ed Sheeran. We hence believe that it is important to explicitly incorporate the semantics of the relations into the path representation learning.

Towards this end, each relation $r_{l}$ in $p_{k}$ is represented as an embedding vector ${r}_{l}\in{R}^{d}$. As a result, we obtain a set of embeddings for path $p_{k}$, $[{e}_{1},{r}_{1},{e}_{2},\cdots,{r}_{L-1},{e}_{L}]$, where each element denotes an entity or a relation.

LSTM layer

With the embedding sequence to describe a path, we employ LSTM models to explore the sequential information and generate a single representation for encoding its holistic semantics. Such a long-term sequential pattern is crucial to reason on paths connecting a user and item entities to estimate the confidence of the ''interact'' relation.

At the path-step $l-1$, the LSTM layer outputs a hidden state vector ${h}_{l-1}$, consuming the subsequence $[e_{1},r_{1},\cdots,e_{l-1},r_{1-1}]$. Simultaneously, we concatenate the embedding of current entity $e_{l-1}$ and relation $r_{l-1}$ as the input vector:

${x}_{l-1}={e}_{l-1}\oplus{e}'_{l-1}\oplus{r}_{l-1}$

where $\oplus$ is the concatenation operation.

Note that for the last entity $e_{L}$, a null relation $r_{L}$ is padded to the end of path. As such, the input vector contains not only the sequential information, but also the semantic information of the entity and its relation to the next entity. Consequently, ${h}_{l-1}$ and ${x}_{l-1}$ are used to learn the hidden state of the next path-step $l$, which could be found through LSTM.

Having established the representation of path ${p}_{k}$, we aim to predict the plausibility of $\tau=(u,\text{interact},i)$. Towards this end, two fully-connected layers are adopted to project the final state into the predictive score for output, given by:

\begin{align}\label{equ:path-pred}
s(\tau|{p}_{k})={{W}}_{2}\text{ReLU}({{W}}_{1}{p}_{k}),
\end{align}

where ${W}_{1}$ and ${W}_{2}$ are the coefficient weights of the first and second layers respectively. Bias vectors are omitted form simplicity, and the rectifier is adopted as the activation function.

Weighted pooling layer

A user-item entity pair usually has a set of paths connecting them in a KG. Let ${S}=\{s_{1},s_{2},\cdots,s_{K}\}$ be the predictive scores for $K$ paths, ${P}(u,i)=\{p_{1},p_{2},\cdots,p_{K}\}$, connecting a user-item pair $(u,i)$. The final prediction could be the average of the scores of all paths, which is formulated as,

\begin{align}\label{equ:mean-pooling}
\hat{y}_{ui}=\sigma(\frac{1}{K}\sum_{k=1}^{K}s_{k}).
\end{align}

Nevertheless, prior studies suggest that different paths have varying contributions to model user preferences. Inspired by previous work~\cite{reasonchain,ACF}, we design a weighted pooling operation to aggregate scores of all paths. Here the pooling function is defined as follows,

\begin{align}\label{equ:per-path-score}
g(s_{1},s_{2},\cdots,s_{K})=\log\left[\sum_{k=1}^{K}\exp\left(\frac{s_{k}}{\gamma}\right)\right],
\end{align}
and the final prediction score is given by,
\begin{align}
\hat{y}_{ui}=\sigma(g(s_{1},s_{2},\cdots,s_{K})),
\end{align}

where $\gamma$ is the hyper-parameter to control each exponential weight.

Such pooling is capable of distinguishing the path importance, which is attributed by the gradient:

\begin{align}
\frac{\partial g}{\partial s_{k}}=\frac{\exp(s_{k}/\gamma)}{\gamma \sum_{k'}\exp(s_{k'}/\gamma)},
\end{align}

which is proportional to the score of each path during the back-propagation step.

Moreover, the pooling function endows the final prediction more flexibility. In particular, when setting $\gamma\rightarrow 0$, the pooling function can degenerate to max-pooling; whereas, it can degrade to mean-pooling by setting $\gamma\rightarrow\infty$. We conduct a case study on exploring the utility of the weighted pooling operation in the Case Studies section of the paper.

Learning: we treat the recommender learning task as a binary classification problem, where an observed user-item interaction is assigned a target value $1$, otherwise $0$. We use the pointwise learning methods to learn the parameters of our model.

In particular, the negative log-likelihood is adopted as the objective function, which is defined as follows,

\begin{align}
l=-\sum_{(u,i)\in{O}^{+}}\log\hat{y}_{ui}+\sum_{(u,j)\in{O}^{-}}\log(1-\hat{y}_{uj}),
\end{align}

where ${O}^{+}=\{(u,i)|y_{ui}=1\}$ and ${O}^{-}=\{(u,j)|y_{uj}=0\}$ are the positive and negative user-item interaction pairs, respectively.
We conduct $L_{2}$ regularization on the trainable parameters $\Theta$, which is omitted here for simplicity, to avoid overfitting.

Experiments

We performed experiments on two real-world datasets (movie item recommendation: MovieLens-1M and IMDb datasets, named MI, and music recommendation: KKBox) to evaluate our proposed method.

We aimed to answer the following research questions:

RQ1: Compared with the state-of-the-art KG-enhanced methods, how does our method perform?
RQ2: How does the multi-step path modeling (\eg the incorporation of both entity and relation types) affect KPRN?
RQ3: Can our proposed method reason on paths to infer user preferences towards items?

We process the datasets as: if a user rates a movie or has an interaction record with a song, we set the user-movie or user-song pair as the observed positive feedback with the target value of $1$, and $0$ otherwise.

For each dataset, we hold out the $80\%$ and $20\%$ interaction history of each user randomly to construct the training and test sets.

For each positive user-item interaction pair in the training set, we conducted the negative sampling strategy to pair it with four negative items that the user has not interacted with.

During the test stage, the ratio between positive and negative interactions is set as $1:100$, namely, $100$ negative items are randomly sampled and pair with one positive item in the testing set.

Path Extraction: In practice, it is labor intensive and infeasible to fully exploring all connected paths over the KG. Especially, the number of paths grows exponentially the length of path, where millions of interlinks will be generated. As suggested in prior efforts (Sun et al. 2011; Shu et al. 2018), truncating all paths at a certain length and disregarding remote connections are sufficient to model the connectivity between a user-item pair. Moreover, as pointed out by (Sun et al. 2011), paths with length greater than six will introduce noisy entities. Therefore, we extract all qualified paths, each with length up to six, that connect all user-item pairs.

RQ1: Our method KPRN substantially outperforms MF (Rendle et al. 2009), NFM (He and Chua 2017), CKE (Zhang et al. 2016) and FMG (Zhao et al. 2017) hit@$K$ and ndcg@$K$, achieving the best performance.

RQ2: We set up another method KPRN-r without considering relations in paths. The performance of KPRN-r decreases on both datasets. This justifies our intuition that specifying different relations is of importance to capture the path semantics, especially when the same entities are involved.

RQ3: The desirable property of KPRN is to reason on paths to infer the user preferences towards target items and generate reasonable explanations. This is because our model captures the higher-level semantics from these key factors: entity, entity type, and relation. To demonstrate this, we show an example drawn from KPRN on a movie recommendation task. We randomly select a user, whose ID is u4825 in MovieLens-1M, and select the movie "Shakespeare in Love" from her interaction record. We then extract all the qualified paths connecting the user-item pair and present the subgraph in Figure 3. We have several observations.

Collaborative filtering effect plays a pivotal rule to recommend the movie "Shakespeare in Love" to the user, since the interaction behaviors from other users (e.g., u940 and u5448) are involved in two paths. In particular, the path containing u5448 offers the high contribution score of 0.356 to infer the user's interest.

The target item is connected to what u4825 has watched before {Rush Hour, Titanic, and Fantasia} by the shared knowledge entities, such as actor {Tom Wilkinson} and director{James Algar}. This shows that KPRN is capable of extending user interests along KG paths.

Analyzing these three paths jointly, we find that different paths describe the user-item connectivity from dissimilar angles, which can be treated as the evidence why the item is suitable for the user. Specially, we can offer path-wise explanations such as {Shakespeare in Love is recommended since you have watched Rush Hour acted by the same actor Tom Wilkinson} or {since it is similar to Titanic that you watched before}. This case demonstrates KPRN's capacity of providing informative explanations.

Figure 3. Visualization of three paths with prediction scores for the user of u4825 in the MI dataset. The prediction scores are normalized for illustration.

The details are discussed in our paper “Explainable Reasoning over Knowledge Graphs for Recommendation.”

Conclusion

In this work, we exploit knowledge graphs to construct paths as extra user-item connectivity, which is complementary to user-item interactions. We propose a knowledge-aware path recurrent network to generate representation for each path by composing semantics of entities and relations. By adopting LSTM on paths, we can capture the sequential dependencies of elements and reason on paths to infer user preference. Hopefully, this article gave you some insights into why these tips are important. Please read “Explainable Reasoning over Knowledge Graphs for Recommendation,” AAAI 2019 for more details.

Acknowledgements

This work is supported by eBay, Search Science Shanghai Director Hua Yang, Manager Wu Xiaoyuan, and Intern Zhang Mohan.

Reference

Chaudhari, S.; Azaria, A.; and Mitchell, T. M. 2016. An entity graph based recommender system. In RecSys.

Chen, J.; Zhang, H.; He, X.; Nie, L.; Liu, W.; and Chua, T.-S. 2017. Attentive collaborative filtering: Multimedia recommendation with item- and component-level attention. In SIGIR, 335–344.

He, X., and Chua, T. 2017. Neural factorization machines for sparse predictive analytics. In SIGIR, 355–364.

McCallum, A.; Neelakantan, A.; Das, R.; and Belanger, D. 2017. Chains of reasoning over entities, relations, and text using recurrent neural networks. In EACL, 132–141.

Neelakantan, A.; Roth, B.; and McCallum, A. 2015. Compositional vector space models for knowledge base completion. In ACL, 156–166.

Rendle, S.; Freudenthaler, C.; Gantner, Z.; and SchmidtThieme, L. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI, 452–461.

Shu, Z.; Yang, J.; Zhang, J.; Bozzon, A.; Huang, L.-K.; and Xu, C. 2018. Recurrent knowledge graph embedding for effective recommendation. In RecSys.

Sun, Y., and Han, J. 2012. Mining heterogeneous information networks: a structural analysis approach. SIGKDD 14(2):20–28.

Sun, Y.; Han, J.; Yan, X.; Yu, P. S.; and Wu, T. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. PVLDB 4(11):992– 1003.

Yu, X.; Ren, X.; Sun, Y.; Gu, Q.; Sturt, B.; Khandelwal, U.; Norick, B.; and Han, J. 2014. Personalized entity recommendation: a heterogeneous information network approach. In WSDM, 283–292.

Zhang, F.; Yuan, N. J.; Lian, D.; Xie, X.; and Ma, W. 2016. Collaborative knowledge base embedding for recommender systems. In SIGKDD, 353–362.

Zhao, H.; Yao, Q.; Li, J.; Song, Y.; and Lee, D. L. 2017. Meta-graph based recommendation fusion over heterogeneous information networks. In SIGKDD, 635–644.

Tags: Deep Learning, Knowledge Graphs, Machine Learning