{"id":615306,"date":"2019-10-22T08:00:49","date_gmt":"2019-10-22T15:00:49","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=615306"},"modified":"2019-10-23T16:28:21","modified_gmt":"2019-10-23T23:28:21","slug":"getting-a-better-visual-reppoints-detect-objects-with-greater-accuracy-through-flexible-and-adaptive-object-modeling","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/getting-a-better-visual-reppoints-detect-objects-with-greater-accuracy-through-flexible-and-adaptive-object-modeling\/","title":{"rendered":"Getting a better visual: RepPoints detect objects with greater accuracy through flexible and adaptive object modeling"},"content":{"rendered":"<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-615645 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788.png\" alt=\"Illustration depicting RepPoints detecting objects with greater accuracy through flexible and adaptive object modeling\" width=\"1400\" height=\"788\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-1024x576.png 1024w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-1280x720.png 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/a><\/p>\n<p>Visual understanding tasks are typically centered on objects, such as human pose tracking in Microsoft Kinect and obstacle avoidance in autonomous driving. In the deep learning era, these tasks follow a paradigm where bounding boxes are localized in an image, features are extracted within the bounding boxes, and object recognition and reasoning are performed based on these features.<\/p>\n<p>The use of bounding boxes as the intermediate object representation has\u00a0long\u00a0been\u00a0the convention\u00a0because of\u00a0their\u00a0practical advantages.\u00a0One\u00a0advantage\u00a0is that they are\u00a0easy\u00a0for users to\u00a0annotate\u00a0with little ambiguity. Another is that\u00a0their\u00a0structure\u00a0is convenient for\u00a0feature extraction via grid sampling.<\/p>\n<p>However, bounding boxes come with disadvantages as well.\u00a0As illustrated below, the geometric information revealed by a bounding box is coarse. 
It cannot describe more fine-grained information such as\u00a0different human poses. In addition, feature extraction by grid sampling is inaccurate, as it may not conform to semantically meaningful image areas. As seen in the figure, many features are extracted on the background rather than on the foreground object.<\/p>\n<div id=\"attachment_615312\" style=\"width: 869px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/tennis-player.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-615312\" class=\"wp-image-615312 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/tennis-player.jpg\" alt=\"Bounding boxes of a tennis player. The boxes provide only a coarse geometric description, and feature extraction locations (denoted by yellow dots) may not lie on the foreground object.\" width=\"859\" height=\"172\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/tennis-player.jpg 859w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/tennis-player-300x60.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/tennis-player-768x154.jpg 768w\" sizes=\"auto, (max-width: 859px) 100vw, 859px\" \/><\/a><p id=\"caption-attachment-615312\" class=\"wp-caption-text\">Bounding boxes of a tennis player. 
The boxes provide only a coarse geometric description, and feature extraction locations (denoted by yellow dots) may not lie on the foreground object.<\/p><\/div>\n<p>In\u00a0our paper\u00a0that\u00a0will be presented\u00a0at\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/iccv2019.thecvf.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">ICCV 2019<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,\u00a0\u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/reppoints-point-set-representation-for-object-detection\/\">RepPoints: Point Set Representation for Object Detection<\/a>,\u201d our team of researchers at Microsoft Research Asia\u00a0introduce an alternative to bounding boxes in the form of a set of points. This point set representation, which we call\u00a0<i>RepPoints<\/i>,\u00a0can conform to object pose or shape, as shown in the figure below.\u00a0RepPoints\u00a0learn to adaptively position themselves over an object in a manner that circumscribes the object\u2019s spatial extent and\u00a0indicates\u00a0semantically significant\u00a0local\u00a0regions.\u00a0In this way, they provide a more detailed geometric description of an object, while pointing to\u00a0areas from which useful features\u00a0for recognition\u00a0may be extracted.<\/p>\n<div id=\"attachment_615318\" style=\"width: 804px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/reppoints-tennis-player.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-615318\" class=\"wp-image-615318 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/reppoints-tennis-player.jpg\" alt=\"RepPoints on the same tennis player. 
In comparison to bounding boxes, RepPoints reveal greater geometric detail of an object and identify better locations for feature extraction.\" width=\"794\" height=\"164\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/reppoints-tennis-player.jpg 794w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/reppoints-tennis-player-300x62.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/reppoints-tennis-player-768x159.jpg 768w\" sizes=\"auto, (max-width: 794px) 100vw, 794px\" \/><\/a><p id=\"caption-attachment-615318\" class=\"wp-caption-text\">RepPoints on the same tennis player. In comparison to bounding boxes, RepPoints reveal greater geometric detail of an object and identify better locations for feature extraction.<\/p><\/div>\n<h3>RepPoints\u00a0identify key points of interest without explicit supervision<\/h3>\n<p>The way it works\u00a0is rather simple.\u00a0Given a source point near an object center (marked\u00a0in the figure\u00a0below in red), the network applies 3&#215;3 convolutions on the point\u2019s feature to regress 2D offsets from the source point to multiple target points\u00a0in the image\u00a0(shown in green\u00a0in the figure below),\u00a0which together comprise the\u00a0RepPoints\u00a0representation.\u00a0As can be seen in the figure below, this allows for more accurate\u00a0key\u00a0point detection when compared with bounding boxes.\u00a0The source points are uniformly sampled across\u00a0the image, without the need to additionally\u00a0hypothesize over\u00a0multiple anchors as\u00a0is\u00a0done in bounding box-based techniques.<\/p>\n<div id=\"attachment_615324\" style=\"width: 954px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/3x3-conv.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-615324\" class=\"wp-image-615324 size-full\" 
src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/3x3-conv.jpg\" alt=\"RepPoints (in green) are regressed over an object from a central source point (in red) via 3x3 convolutions.\" width=\"944\" height=\"286\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/3x3-conv.jpg 944w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/3x3-conv-300x91.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/3x3-conv-768x233.jpg 768w\" sizes=\"auto, (max-width: 944px) 100vw, 944px\" \/><\/a><p id=\"caption-attachment-615324\" class=\"wp-caption-text\">RepPoints (in green) are regressed over an object from a central source point (in red) via 3&#215;3 convolutions.<\/p><\/div>\n<p>RepPoints\u00a0are learned through two forms of supervision: localization and recognition.\u00a0To drive the learning of\u00a0RepPoints, our method uses\u00a0bounding box\u00a0information to constrain the point locations. This localization supervision is illustrated by the\u00a0upper\u00a0branch of the figure below, where a pseudo box formed from\u00a0RepPoints\u00a0needs to closely match the ground-truth bounding box.
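To make the localization supervision concrete, here is a minimal NumPy sketch of the min-max conversion the paper describes for forming a pseudo box from a point set. The function name is ours, not from the released code; in the full network the offsets are produced by 3×3 convolutions on the source point's features, and the resulting pseudo box is compared against the ground-truth box by a localization loss.

```python
import numpy as np

def reppoints_to_pseudo_box(center, offsets):
    """Convert predicted 2D offsets into a pseudo bounding box.

    center: (2,) array, the source point (x, y)
    offsets: (n, 2) array of learned offsets, one per RepPoint
    Returns [x_min, y_min, x_max, y_max] via the min-max function,
    one of the point-to-box conversions discussed in the paper.
    """
    points = center + offsets          # absolute RepPoint locations
    x_min, y_min = points.min(axis=0)  # tightest box enclosing all points
    x_max, y_max = points.max(axis=0)
    return np.array([x_min, y_min, x_max, y_max])
```

During training, the gradient of the box-matching loss flows through this conversion back into the offsets, which is what drives the points to spread out and circumscribe the object's spatial extent.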
In addition,\u00a0the learning is guided by\u00a0recognition supervision\u00a0in the lower branch, which\u00a0favors point locations where the features aid in object recognition.<\/p>\n<div id=\"attachment_615645\" style=\"width: 1410px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-615645\" class=\"wp-image-615645 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788.png\" alt=\"Illustration depicting RepPoints detecting objects with greater accuracy through flexible and adaptive object modeling\" width=\"1400\" height=\"788\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-343x193.png 343w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-1280x720.png 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/a><p id=\"caption-attachment-615645\" class=\"wp-caption-text\">RepPoints are learned through two forms of supervision: localization from ground-truth bounding boxes and recognition from object class labels.<\/p><\/div>\n<p>A visualization of learned\u00a0RepPoints\u00a0and the corresponding detection results are shown below for various kinds of objects.\u00a0It can be seen that\u00a0RepPoints\u00a0tend to be located at extreme points or key semantic points of objects. These point distributions are automatically learned without explicit supervision.<\/p>\n<div id=\"attachment_615330\" style=\"width: 839px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/reppoints-object-detection.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-615330\" class=\"wp-image-615330 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/reppoints-object-detection.jpg\" alt=\"RepPoints distributions and object detection results. Without explicit supervision, RepPoints identify and learn both extreme and key semantic points of objects accurately when objects are out of focus in an image (upper left), objects overlap (upper middle), objects vary in size (upper right), and object boundaries are, for various reasons, ambiguous (bottom row). 
\" width=\"829\" height=\"400\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/reppoints-object-detection.jpg 829w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/reppoints-object-detection-300x145.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/reppoints-object-detection-768x371.jpg 768w\" sizes=\"auto, (max-width: 829px) 100vw, 829px\" \/><\/a><p id=\"caption-attachment-615330\" class=\"wp-caption-text\">RepPoints distributions and object detection results. Without explicit supervision, RepPoints identify and learn both extreme and key semantic points of objects accurately when objects are out of focus in an image (upper left), objects overlap (upper middle), objects vary in size (upper right), and object boundaries are, for various reasons, ambiguous (bottom row).<\/p><\/div>\n<h3>Performance on\u00a0COCO\u00a0benchmark and comparison to other object detectors<\/h3>\n<p>Our experiments on the COCO object detection benchmark show appreciable performance gains from changing the object representation from bounding boxes to\u00a0RepPoints. 
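For readers less familiar with the metric used throughout this section: mAP averages, over object classes (and over IoU thresholds in the COCO protocol), the area under each class's precision-recall curve. A rough sketch of the per-class quantity, using the simple all-point interpolation rather than COCO's 101-point variant (the function name is ours):

```python
import numpy as np

def average_precision(recall, precision):
    """Area under a precision-recall curve (all-point interpolation).

    recall: 1D array, sorted ascending; precision: matching 1D array.
    """
    # pad so the curve starts at recall 0 and ends at recall 1
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # make precision monotonically non-increasing from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # sum rectangle areas where recall increases
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

Mean AP (mAP) is then the mean of this value across all classes, so a +2.1 gain means the detector's precision-recall trade-off improved by about two points when averaged over the whole benchmark.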
With ResNet-50 or ResNet-101 as the network backbone,\u00a0RepPoints\u00a0obtain improvements of +2.1\u00a0mAP\u00a0(mean Average Precision)\u00a0or +2.0\u00a0mAP, respectively, as seen in the table\u00a0below\u00a0on the left.\u00a0As reported in the\u00a0second\u00a0table\u00a0below,\u00a0a\u00a0RepPoints-based object detector, denoted\u00a0RPDet,\u00a0compares favorably to existing leading object detectors.\u00a0RPDet\u00a0is the most accurate anchor-free detector to date (anchor-free methods are generally preferred for their simplicity of use).<\/p>\n\t\t\t<div class=\"ms-grid \">\n\t\t\t<div class=\"ms-row\">\n\t\t\t\t\t<div  class=\"m-col-12-24\" >\n\t\t<div id=\"attachment_615861\" style=\"width: 1396px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/table_left.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-615861\" class=\"wp-image-615861 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/table_left.png\" alt=\"Bounding box vs. RepPoints\" width=\"1386\" height=\"397\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/table_left.png 1386w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/table_left-300x86.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/table_left-768x220.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/table_left-1024x293.png 1024w\" sizes=\"auto, (max-width: 1386px) 100vw, 1386px\" \/><\/a><p id=\"caption-attachment-615861\" class=\"wp-caption-text\">Bounding box vs.
RepPoints<\/p><\/div><p>\t<\/div>\n\t \t<div  class=\"m-col-12-24\" >\n\t\t<\/p><div id=\"attachment_615858\" style=\"width: 1451px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/table_right.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-615858\" class=\"wp-image-615858 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/table_right.png\" alt=\"Our object detector (RPDet) is the most accurate anchor-free method to date when compared with other methods\u2019 accuracy.\" width=\"1441\" height=\"401\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/table_right.png 1441w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/table_right-300x83.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/table_right-768x214.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/table_right-1024x285.png 1024w\" sizes=\"auto, (max-width: 1441px) 100vw, 1441px\" \/><\/a><p id=\"caption-attachment-615858\" class=\"wp-caption-text\">Our object detector (RPDet) is the most accurate anchor-free method to date when compared with other methods\u2019 accuracy.<\/p><\/div><p>\t<\/div>\n\t<\/p>\t\t\t<\/div>\n\t\t<\/div>\n\t\t\n<p>Learning richer and more natural object representations like\u00a0RepPoints\u00a0is a direction that holds much promise for object detection\u00a0in general. 
The\u00a0descriptiveness of\u00a0RepPoints\u00a0may make it useful for other visual understanding tasks,\u00a0such as object segmentation,\u00a0as well.\u00a0If you are curious about\u00a0exploring\u00a0RepPoints\u00a0in more depth,\u00a0we encourage you to check out our source code, which is available on GitHub\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/RepPoints\" target=\"_blank\" rel=\"noopener noreferrer\">here<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Visual understanding tasks are typically centered on objects, such as human pose tracking in Microsoft Kinect and obstacle avoidance in autonomous driving. In the deep learning era, these tasks follow a paradigm where bounding boxes are localized in an image, features are extracted within the bounding boxes, and object recognition and reasoning are performed based [&hellip;]<\/p>\n","protected":false},"author":39507,"featured_media":615645,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Han Hu","user_id":"36771"},{"type":"user_nicename","value":"Steve 
Lin","user_id":"33735"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[194471],"tags":[243777,186897,243774,201905,243765,195953,193504,243780,243771,243768],"research-area":[13562],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-615306","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-vision","tag-asia-lab","tag-computer-vision","tag-detection","tag-iccv","tag-iccv2019","tag-international-conference-on-computer-vision","tag-microsoft-research","tag-msr-blog","tag-point-set-representation","tag-reppoints","msr-research-area-computer-vision","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199560],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[610425],"related-researchers":[{"type":"user_nicename","value":"Steve Lin","user_id":33735,"display_name":"Steve Lin","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/stevelin\/\" aria-label=\"Visit the profile page for Steve Lin\">Steve Lin<\/a>","is_active":false,"last_first":"Lin, Steve","people_section":0,"alias":"stevelin"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-960x540.png\" class=\"img-object-cover\" alt=\"Illustration depicting RepPoints detecting objects with greater accuracy through flexible and adaptive object modeling\" decoding=\"async\" loading=\"lazy\" 
srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/10\/MSFTResearch_20191009_RepPoints_RepPoints_Site_07_2019_1400x788.png 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"Han Hu and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/stevelin\/\" title=\"Go to researcher profile for Steve Lin\" aria-label=\"Go to researcher profile for Steve Lin\" data-bi-type=\"byline author\" data-bi-cN=\"Steve Lin\">Steve Lin<\/a>","formattedDate":"October 22, 2019","formattedExcerpt":"Visual 
understanding tasks are typically centered on objects, such as human pose tracking in Microsoft Kinect and obstacle avoidance in autonomous driving. In the deep learning era, these tasks follow a paradigm where bounding boxes are localized in an image, features are extracted within the&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/615306","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/39507"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=615306"}],"version-history":[{"count":17,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/615306\/revisions"}],"predecessor-version":[{"id":617193,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/615306\/revisions\/617193"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/615645"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=615306"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=615306"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=615306"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=615306"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-
region?post=615306"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=615306"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=615306"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=615306"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=615306"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=615306"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=615306"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}