Abstract

Recent online services rely heavily on automatic personalization to recommend relevant content to a large number of users. This requires systems to scale promptly to accommodate the stream of new users visiting the online services for the first time. In this work, we propose a content-based recommendation system to address both the recommendation quality and the system scalability. We propose to use a rich feature set to represent users, according to their web browsing history and search queries. We use a Deep Learning approach to map users and items to a latent space where the similarity between users and their preferred items is maximized. We extend the model to jointly learn from features of items from different domains and user features by introducing a multi-view Deep Learning model. We show how to make this rich-feature based user representation scalable by reducing the dimension of the inputs and the amount of training data. The rich user feature representation allows the model to learn relevant user behavior patterns and give useful recommendations for users who do not have any interaction with the service, given that they have adequate search and browsing history. The combination of different domains into a single model for learning helps improve the recommendation quality across all the domains, as well as having a more compact and a semantically richer user latent feature vector. We experiment with our approach on three real-world recommendation systems acquired from different sources of Microsoft products: Windows Apps recommendation, News recommendation, and Movie/TV recommendation. Results indicate that our approach is significantly better than the state-of-the-art algorithms (up to 49% enhancement on existing users and 115% enhancement on new users). Scalability analysis show that our multi-view DNN model can easily scale/expand to encompass millions of users and items with billions of entries. Experimental results also confirm that combining features from all domains produces much better performance than building separate models for each domain.