Leveraging Demonstrations for Reinforcement Recommendation Reasoning over Knowledge Graphs
Knowledge graphs, which provide structured auxiliary facts about items in heterogeneous graphs, have been widely adopted to improve recommendation accuracy. The multi-hop user-item connections on knowledge graphs also make it possible to reason about why an item is recommended. However, reasoning on paths is a complex combinatorial optimization problem, in which feasible solutions consist of multi-hop paths that connect users with items. Traditional recommendation methods usually rely on brute-force search to find feasible paths, which leads to problems with convergence and explainability. In this paper, we address these problems by better supervising the path-finding process. The key idea is to extract imperfect path demonstrations with minimal labeling effort and to leverage these demonstrations effectively to guide path finding. In particular, we design a demonstration-based knowledge graph reasoning framework for explainable recommendation. We also propose an ADversarial Actor-Critic (ADAC) model for demonstration-guided path finding. This model provides a unified solution for optimizing the path-finding policy (actor) by jointly modeling both demonstrations (via adversarial imitation) and reward signals derived from historical user preferences (accurately estimated by the critic). Extensive experiments on three real-world benchmarks show that our method converges much faster than the state-of-the-art baseline and achieves better recommendation accuracy and explainability.
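The sketch below shows the general ingredients the abstract names. It is a generic, GAIL-style illustration and not ADAC's actual formulation. It covers brute-force enumeration of multi-hop user-to-item paths, an adversarial-imitation reward computed from a discriminator probability, a terminal reward from held-out user interactions, and one-step TD advantages from critic values. The toy graph, relation names, the weight `lam`, and the discriminator and critic numbers are all hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy knowledge graph: node -> list of (relation, neighbor).
KG: Dict[str, List[Tuple[str, str]]] = {
    "user1": [("purchase", "itemA")],
    "itemA": [("produced_by", "brandX"), ("described_by", "wordW")],
    "brandX": [("produced_by_inv", "itemB")],
    "wordW": [("described_by_inv", "itemC")],
    "itemB": [],
    "itemC": [],
}


def enumerate_paths(start: str, max_hops: int) -> List[List[str]]:
    """Brute-force search: every path of exactly max_hops hops from start.
    Paths that hit a dead end early are dropped."""
    paths = [[start]]
    for _ in range(max_hops):
        paths = [p + [nbr] for p in paths for _, nbr in KG.get(p[-1], [])]
    return paths


def imitation_reward(d_prob: float) -> float:
    """GAIL-style reward -log(1 - D(s, a)). It grows as the (hypothetical)
    discriminator judges a step to look more like a demonstration."""
    return -math.log(1.0 - d_prob)


def step_rewards(path: List[str], d_probs: List[float],
                 held_out: Set[str], lam: float = 0.5) -> List[float]:
    """Per-hop reward: lam * imitation reward, plus a terminal reward of 1.0
    if the path ends at a held-out item the user actually interacted with."""
    assert len(d_probs) == len(path) - 1, "one discriminator score per hop"
    rewards = [lam * imitation_reward(p) for p in d_probs]
    if path[-1] in held_out:
        rewards[-1] += 1.0
    return rewards


def td_advantages(rewards: List[float], values: List[float],
                  gamma: float = 0.99) -> List[float]:
    """One-step TD advantages A_t = r_t + gamma * V(s_{t+1}) - V(s_t),
    where values[t] is the critic's estimate and V = 0 after the last step."""
    adv = []
    for t, r in enumerate(rewards):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        adv.append(r + gamma * next_v - values[t])
    return adv


if __name__ == "__main__":
    for path in enumerate_paths("user1", max_hops=3):
        r = step_rewards(path, d_probs=[0.5, 0.5, 0.5], held_out={"itemB"})
        print(path, td_advantages(r, values=[0.5, 0.5, 0.5]))
```

In a full actor-critic learner, these advantages would weight the policy-gradient update of the path-finding actor. Here only the reward and advantage bookkeeping is shown. The discriminator and critic are replaced by fixed numbers.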