{"id":4812,"date":"2015-07-08T09:00:34","date_gmt":"2015-07-08T16:00:34","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/msr_er\/?p=4812"},"modified":"2016-07-20T07:29:07","modified_gmt":"2016-07-20T14:29:07","slug":"recent-progress-on-language-and-vision-observations-from-naacl-and-cvpr-2015","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/recent-progress-on-language-and-vision-observations-from-naacl-and-cvpr-2015\/","title":{"rendered":"Recent progress on language and vision: Observations from NAACL and CVPR 2015"},"content":{"rendered":"<p>I recently had the opportunity to attend two interesting conferences, NAACL (North America Association of Computational Linguistics) and CVPR (Computer Vision and Pattern Recognition). They are top conferences in the fields of natural language processing (NLP) and computer vision, respectively, and traditionally, their audiences are quite different. Along with all the exciting advances, I noticed an interesting coincidence: This year both conferences included a session on language and vision. However, perhaps this is <em>not<\/em> a coincidence. The merging of language and vision is one of the most active areas in NLP and computer vision research, and the community has made significant progress in this area over the past year, largely driven by recent breakthroughs in deep learning technologies and their applications in vision and language.<\/p>\n<p>When I was at CVPR, I heard one question repeated several times: Why should vision researchers care about language processing? In computer vision, researchers recently built <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.technologyreview.com\/view\/530561\/the-revolutionary-technique-that-quietly-changed-machine-vision-forever\/\" target=\"_blank\">very deep convolutional neural networks<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (CNN), achieved an impressively low error rate in <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/image-net.org\/challenges\/LSVRC\/2014\/results\" target=\"_blank\">large-scale image classification tasks<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, and even reached <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/blogs.technet.com\/b\/inside_microsoft_research\/archive\/2015\/02\/10\/microsoft-researchers-algorithm-sets-imagenet-challenge-milestone.aspx\" target=\"_blank\">human-level image classification accuracy<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> early this year. These accomplishments were achieved while requiring little understanding about language processing.<\/p>\n<p>In order to teach a computer to predict the category of a given image, one way is for researchers to first annotate each image in a training set with a category label (called \u201csupervision\u201d) from a predefined list of 1000 categories. Through trial-and-error training, the computer learns how to classify an image.<\/p>\n<p>However, it\u2019s a different situation when we want computers to <em>understand<\/em> complex scenes. In such cases, it is usually not possible to define simply by category all fine-grained, subtle differences among these scenes. 
However, it's a different situation when we want computers to *understand* complex scenes. In such cases, it is usually not possible to capture all of the fine-grained, subtle differences among scenes with simple category labels. Instead, the best supervision is a full description in natural language.

![An example from the MS COCO dataset](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frisbeephoto-1024x377.png)

*An example from the MS COCO dataset. (Microsoft Research)*

Although the photos can be quite diverse, each description, written by a human annotator, focuses on the salient information in the image, tells a coherent story with clear and consistent semantic meaning, and reflects certain common knowledge. These descriptions provide very rich information about what the picture means from a human point of view, and they can serve as the supervision for training a computer to understand the image the way a human does.

Another problem is how to test whether the computer understands the image. One way is to give the computer an image and ask it to generate a descriptive caption. We can then judge how well the computer understands the image by putting the generated caption through a Turing Test, in which we weigh the quality of the computer's output against captions written by humans for the same photograph.

For both training and evaluation, the computer needs to understand language. Language processing is therefore of particular importance to vision researchers working toward strong artificial intelligence in vision.

To facilitate research on image understanding, Microsoft sponsored the creation of the [COCO (Common Objects in Context) dataset](http://mscoco.org/), the largest image-captioning dataset available to the public. The availability of the data was part of what motivated groups from academic and industrial institutions to work on [automatic image captioning](http://blogs.technet.com/b/machinelearning/archive/2014/11/18/rapid-progress-in-automatic-image-captioning.aspx) over the past year.
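For readers who want to see what this caption supervision looks like in practice, the sketch below loads the human-written captions for one COCO image using the `pycocotools` package. The annotation-file path is a placeholder for a local copy of the dataset; each COCO image comes with several independently written captions.

```python
# Sketch of how COCO pairs each image with several human-written captions that
# can serve as supervision. Assumes pycocotools is installed and the caption
# annotation file has been downloaded; the path below is a placeholder.
from pycocotools.coco import COCO

coco = COCO("annotations/captions_train2014.json")  # placeholder path

img_id = coco.getImgIds()[0]                 # pick one image
ann_ids = coco.getAnnIds(imgIds=img_id)      # its caption annotations
for ann in coco.loadAnns(ann_ids):
    print(ann["caption"])                    # one human-written description per annotation
```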
To enable a rigorous comparison of the different approaches and to speed up research across the community, the [MS COCO committee](http://mscoco.org/people/) organized an [image captioning challenge](http://mscoco.org/dataset/#cap2015) at CVPR 2015. There were fifteen entries in total, and the top teams were invited to present their systems at the [LSUN workshop](http://lsun.cs.princeton.edu/) at CVPR.

[Microsoft tied with Google](http://blogs.technet.com/b/inside_microsoft_research/archive/2015/06/11/microsoft-researchers-tie-for-best-image-captioning-technology.aspx) for first prize in the competition. The MSR entry had slightly more outputs passing the Turing Test than the Google entry, though it generated slightly fewer captions judged better than or equal to those written by human annotators.

![Google's system](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/google-system.png)

*Google's system uses a two-stage approach for direct caption generation. (Google)*

Despite the tie, Microsoft and Google built quite different systems. Google uses a CNN to generate a whole-image feature vector and then feeds it into a recurrent neural network (RNN) language model that generates the caption directly.
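A minimal sketch of that kind of encoder-decoder captioner follows: a whole-image feature vector conditions an RNN language model that predicts the caption one word at a time. All names, sizes, and stand-in inputs here are illustrative assumptions; this is not Google's actual system.

```python
# Sketch of the CNN -> RNN captioning approach: a whole-image feature vector
# initializes a recurrent language model that emits the caption word by word.
# Illustrative only; the dimensions and data are placeholders.
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, FEAT_DIM = 10000, 256, 512, 2048

class CaptionDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Linear(FEAT_DIM, HIDDEN_DIM)  # image feature -> initial RNN state
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)     # next-word scores

    def forward(self, img_feat, tokens):
        h0 = torch.tanh(self.img_proj(img_feat)).unsqueeze(0)  # (1, batch, hidden)
        hidden, _ = self.rnn(self.embed(tokens), h0)
        return self.out(hidden)                                # (batch, time, vocab)

decoder = CaptionDecoder()
img_feat = torch.randn(1, FEAT_DIM)        # stand-in for a CNN image embedding
tokens = torch.tensor([[1, 5, 9, 23]])     # stand-in ids for a partial caption
next_word_scores = decoder(img_feat, tokens)
print(next_word_scores.shape)              # torch.Size([1, 4, 10000])
```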
In contrast, Microsoft's system takes a three-stage approach. First, we tuned a CNN to detect a set of words describing the input image: not only nouns, but also verbs and adjectives (in this example, *woman, crowd, cat, camera, holding, purple*; note that this stage can introduce detection noise, such as *cat*).

![Microsoft's system](https://www.microsoft.com/en-us/research/wp-content/uploads/2015/07/MS-system.png)

*Microsoft's system takes a three-stage approach. (Microsoft Research)*

Second, we proposed an image-conditioned maximum entropy language model (LM) to generate 500 caption candidates covering the detected words (*a purple camera with a woman*, *a woman holding a camera in a crowd*, *a woman holding a cat*). The LM injects function words such as *a* and *with* so that the captions read fluently.

Third, we developed a deep multimodal similarity model (DMSM) that re-ranks the candidates and selects the caption that best captures the overall semantic content of the image. (Note how the noise from the word-detection and LM stages is handled: the DMSM picks the semantically correct caption *a woman holding a camera in a crowd* and drops candidates that don't match the global semantics of the image, such as *a purple camera with a woman* and *a woman holding a cat*.)
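The sketch below illustrates the idea behind the third stage in simplified form: embed the image and each candidate caption into a shared semantic space and keep the candidate with the highest cosine similarity. The encoders, vocabulary, and token ids are placeholders, not the actual DMSM architecture or training procedure.

```python
# Simplified stand-in for DMSM-style re-ranking: score each candidate caption
# by the cosine similarity between an image embedding and a caption embedding
# in a shared space, then keep the best match. The encoders below are random
# placeholders for trained networks, so the selection here is arbitrary;
# with trained encoders, the semantically correct caption would score highest.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 128
image_encoder = nn.Linear(2048, EMBED_DIM)        # placeholder for a trained image model
text_encoder = nn.EmbeddingBag(10000, EMBED_DIM)  # placeholder for a trained text model

def semantic_similarity(img_feat, caption_token_ids):
    img_vec = F.normalize(image_encoder(img_feat), dim=-1)
    txt_vec = F.normalize(text_encoder(caption_token_ids), dim=-1)
    return float((img_vec * txt_vec).sum())        # cosine similarity

img_feat = torch.randn(1, 2048)                    # stand-in CNN feature for the photo
candidates = {                                     # stand-in token ids for LM candidates
    "a purple camera with a woman": torch.tensor([[7, 42, 13, 99, 3]]),
    "a woman holding a camera in a crowd": torch.tensor([[7, 3, 55, 7, 13, 21, 7, 88]]),
    "a woman holding a cat": torch.tensor([[7, 3, 55, 7, 61]]),
}
best = max(candidates, key=lambda c: semantic_similarity(img_feat, candidates[c]))
print("selected caption:", best)
```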
Compared to the CNN-to-RNN approach that Google and others adopted, our system can trace each key word in the generated caption back to the region of the image that triggered it, thanks to the explicit word-detection stage. It can therefore provide grounded evidence for the generated caption and adds interpretability to the outcome.

The results indicate that the current state-of-the-art system is about halfway to passing the whole Turing Test: the MSR system scored 32.2%, while humans scored 67.5%. (Note that even humans cannot score 100%, since people disagree on the best way to describe an image; this also reflects the difficulty and ambiguity of the task of understanding an image.)

Continuing this line of investigation, researchers are already exploring algorithms that [translate videos into natural language](http://www.cs.utexas.edu/~ai-lab/pub-view.php?PubID=127495) and [answer questions about the content of an image](http://www.bloomberg.com/news/articles/2015-05-22/what-s-in-this-picture-ai-becomes-as-smart-as-a-toddler). Looking further ahead, by connecting speech, language, vision, and knowledge bases, we hope to build a universal intelligent system that blurs the boundary between machine and human, with ubiquitous and invisible computational intelligence.

![Xiaodong He](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/xiaodong.jpg)

**Xiaodong He** is a researcher in the [Deep Learning Technology Center](http://research.microsoft.com/dltc) of [Microsoft Research](http://research.microsoft.com/en-us/) in Redmond, Washington. He is also an [Affiliate Professor](http://www.ee.washington.edu/people/faculty/affiliate.html) in the Department of Electrical Engineering at the [University of Washington](http://www.washington.edu/) (Seattle), where he [serves on the PhD reading committee](http://faculty.washington.edu/xiaohe/).
His research interests include deep learning, speech, natural language, vision, information retrieval, and knowledge management. He and his colleagues developed the [MSR-NRC-SRI entry](http://research.microsoft.com/pubs/155473/NIST_MT08_sys_desc_MSR-NRC-SRI_Chinese.pdf) and the [MSR entry](http://research.microsoft.com/pubs/163079/IWSLT2011_MSR_v04.pdf), which [ranked first](http://www.nist.gov/speech/tests/mt/2008/doc/mt08_official_results_v0.html) in the Chinese-English tracks of the [2008 NIST MT Evaluation](http://www.nist.gov/speech/tests/mt/2008/) and the [2011 IWSLT Evaluation](http://www.mt-archive.info/IWSLT-2011-Federico.pdf), respectively. He and colleagues also developed the [MSR image captioning system](http://blogs.microsoft.com/next/2015/05/28/picture-this-microsoft-research-project-can-interpret-caption-photos/) that won [first prize](http://mscoco.org/dataset/#leaderboard-cap) in the [MS COCO Captioning Challenge 2015](http://mscoco.org/dataset/#cap2015). He has published in Proc. IEEE, IEEE TASLP, IEEE SPM, ICASSP, ACL, EMNLP, NAACL, CVPR, SIGIR, WWW, CIKM, and ICLR.

His current research focuses on [deep learning for semantics, with applications to text, vision, information retrieval, and knowledge graphs](http://research.microsoft.com/en-us/people/xiaohe/#deep_learning_work). Relevant studies are summarized in a recent [tutorial at CIKM 2014](http://research.microsoft.com/pubs/232372/CIKM14_tutorial_HeGaoDeng.pdf), and more details can be found at the [DSSM site](http://research.microsoft.com/en-us/projects/dssm/).
He has held editorial positions on several IEEE journals and has served on the organizing and program committees of major speech and language processing conferences. He is a senior member of the IEEE and a member of the ACL.

He received a BS degree from [Tsinghua University](http://www.tsinghua.edu.cn/) (Beijing) in 1996, an MS degree from the [Chinese Academy of Sciences](http://english.cas.cn/) (Beijing) in 1999, and a PhD degree from the [University of Missouri, Columbia](http://www.missouri.edu/) in 2003.

For more computer science research news, visit [ResearchNews.com](http://www.researchnews.com/).