{"id":296795,"date":"2016-09-23T02:09:19","date_gmt":"2016-09-23T09:09:19","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=296795"},"modified":"2018-10-16T21:45:56","modified_gmt":"2018-10-17T04:45:56","slug":"learning-distributed-representations-data-community-question-answering-question-retrieval","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/learning-distributed-representations-data-community-question-answering-question-retrieval\/","title":{"rendered":"Learning Distributed Representations of Data in Community Question Answering for Question Retrieval"},"content":{"rendered":"<p>We study the problem of question retrieval in community question answering (CQA). The biggest challenge within this task is lexical gaps between questions since similar questions are usually expressed with different but semantically related words. To bridge the gaps, state-of-the-art methods incorporate extra information such as word-to-word translation and categories of questions into the traditional language models. We find that the existing language model based methods can be interpreted using a new framework, that is they represent words and question categories in a vector space and calculate question-question similarities with a linear combination of dot products of the vectors. The problem is that these methods are either heuristic on data representation or difficult to scale up. We propose a principled and efficient approach to learning representations of data in CQA. In our method, we simultaneously learn vectors of words and vectors of question categories by optimizing an objective function naturally derived from the framework. In question retrieval, we incorporate learnt representations into traditional language models in an effective and efficient way. We conduct experiments on large scale data from Yahoo! Answers and Baidu Knows, and compared our method with state-of-the-art methods on two public data sets. Experimental results show that our method can significantly improve on baseline methods for retrieval relevance. On 1 million training data, our method takes less than 50 minutes to learn a model on a single multicore machine, while the translation based language model needs more than 2 days to learn a translation table on the same machine.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We study the problem of question retrieval in community question answering (CQA). The biggest challenge within this task is lexical gaps between questions since similar questions are usually expressed with different but semantically related words. To bridge the gaps, state-of-the-art methods incorporate extra information such as word-to-word translation and categories of questions into the traditional [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"ACM","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM'16)","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"533-542","msr_page_range_start":"533","msr_page_range_end":"542","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM'16)","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2016-02-08","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"http:\/\/dl.acm.org\/citation.cfm?id=2835786","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13545,13555],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-296795","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-human-language-technologies","msr-research-area-search-information-retrieval","msr-locale-en_us"],"msr_publishername":"ACM","msr_edition":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM'16)","msr_affiliation":"","msr_published_date":"2016-02-08","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"533-542","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"http:\/\/dl.acm.org\/citation.cfm?id=2835786","msr_doi":"","msr_publication_uploader":[{"type":"url","title":"http:\/\/dl.acm.org\/citation.cfm?id=2835786","viewUrl":false,"id":false,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":0,"url":"http:\/\/dl.acm.org\/citation.cfm?id=2835786"}],"msr-author-ordering":[{"type":"text","value":"Kai Zhang","user_id":0,"rest_url":false},{"type":"text","value":"Wei Wu","user_id":0,"rest_url":false},{"type":"text","value":"Fang Wang","user_id":0,"rest_url":false},{"type":"text","value":"Ming Zhou","user_id":0,"rest_url":false},{"type":"text","value":"Zhoujun Li","user_id":0,"rest_url":false},{"type":"user_nicename","value":"wuwei","user_id":34855,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=wuwei"},{"type":"user_nicename","value":"mingzhou","user_id":32942,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=mingzhou"}],"msr_impact_theme":[],"msr_research_lab":[199560],"msr_event":[],"msr_group":[144735],"msr_project":[295931],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":295931,"post_title":"Chatbots and\u00a0Conversation As A Platform (CAAP)","post_name":"chatbots-conversation-platform-caap","post_type":"msr-project","post_date":"2016-09-21 23:16:41","post_modified":"2017-06-05 12:48:54","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/chatbots-conversation-platform-caap\/","post_excerpt":"At\u00a0Microsoft Build 2016 event, Microsoft CEO Satya Nadella said\u00a0that chatbots, as next big thing, will have\u00a0\u201cas profound an impact as previous shifts we\u2019ve had.\u201d\u00a0The past paradigm shifts include graphical user interface, the web browser and the touchscreen. Conversations As\u00a0A platform(CAAP) has\u00a0the promise of making booking a flight or buying a new shirt as easy as sending a text message,\u00a0with the potential to make computing more\u00a0accessible to users\u00a0on mobile devices. This group has been worked on&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/295931"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/296795","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/296795\/revisions"}],"predecessor-version":[{"id":538857,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/296795\/revisions\/538857"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=296795"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=296795"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=296795"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=296795"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=296795"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=296795"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=296795"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=296795"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=296795"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=296795"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=296795"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=296795"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=296795"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}