{"id":490736,"date":"2018-06-14T08:57:32","date_gmt":"2018-06-14T15:57:32","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=490736"},"modified":"2018-06-14T09:38:11","modified_gmt":"2018-06-14T16:38:11","slug":"believing-is-seeing-insightful-research-illuminates-the-newly-possible-in-the-realm-of-natural-and-synthetic-images","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/believing-is-seeing-insightful-research-illuminates-the-newly-possible-in-the-realm-of-natural-and-synthetic-images\/","title":{"rendered":"Believing is seeing: Insightful research illuminates the newly possible in the realm of natural and synthetic images"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-490757\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/06\/GangIdentity_CV_BHeader_06_2018_1000x400.jpg\" alt=\"\" width=\"1000\" height=\"400\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/06\/GangIdentity_CV_BHeader_06_2018_1000x400.jpg 1000w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/06\/GangIdentity_CV_BHeader_06_2018_1000x400-300x120.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/06\/GangIdentity_CV_BHeader_06_2018_1000x400-768x307.jpg 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/p>\n<p>A pair of groundbreaking papers in computer vision open new vistas on possibilities in the realms of creating very real-looking natural images and synthesizing realistic, identity-preserving facial images. 
In <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/openaccess.thecvf.com\/content_ICCV_2017\/papers\/Bao_CVAE-GAN_Fine-Grained_Image_ICCV_2017_paper.pdf\">CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, presented this past October at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/iccv2017.thecvf.com\/\">ICCV 2017<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> in Venice, the team of researchers from Microsoft and the University of Science and Technology of China came up with a model for image generation based on a variational autoencoder generative adversarial network capable of synthesizing natural images in what are known as fine-grained categories. Fine-grained categories would include faces of specific individuals, say of celebrities, or real-world objects such as specific types of flowers or birds.<\/p>\n<p>In looking at how to build more effective generative models of natural images, the researchers \u2013 Dong Chen, Fang Wen and Gang Hua of Microsoft, Jianmin Bao, an intern at Microsoft Research, and Houqiang Li of China\u2019s University of Science and Technology \u2013 were grappling with a key problem in computer vision: how to generate diverse yet realistic images by varying a finite number of latent parameters, in keeping with the natural distribution of images in the world. The challenge lay in coming up with a generative model to capture that distribution. They opted for a learning framework that combines generative adversarial networks with a variational autoencoder. The approach models any image as a composition of label and latent attributes in a probabilistic model. 
By varying the fine-grained category label (say, \u201coriole\u201d or \u201cstarling\u201d for specific bird types, or the names of specific celebrities) fed into the generative model, the team was able to synthesize images in specific categories using randomly drawn values for the latent attributes. It\u2019s only recently that this kind of deep learning has made it possible to model the distribution of images of specific objects in the world, allowing us to draw from that model to synthesize new images, explained Gang Hua, principal researcher at Microsoft Research in Redmond, Washington.<\/p>\n<p>\u201cOur approach has two novel aspects,\u201d said Hua. \u201cFirst, we adopted a cross-entropy loss for the discriminative and classifier networks but opted for a mean discrepancy objective for the generative network.\u201d The resulting asymmetric loss function and its effect on training were encouraging. 
\u201cAsymmetric loss actually makes the training of the GANs more stable,\u201d said Hua. \u201cWe designed the asymmetric loss to address the instability of training vanilla GANs, specifically the numerical difficulties that arise when matching two non-overlapping distributions.\u201d<\/p>\n<p>The other innovation was adopting an encoder network that learns the relationship between the latent space and the real image space, using pairwise feature matching to retain the structure of the synthesized images.<\/p>\n<p>Experimenting with natural images \u2013 genuine photographs of real things such as faces, flowers and birds \u2013 the researchers were able to show that their machine learning models could synthesize recognizable images with impressive variety within very specific categories. The potential applications range from image inpainting to data augmentation and better facial recognition models.<\/p>\n<p>\u201cOur technology addressed a fundamental challenge in image generation, that of the controllability of identity factors. This allows us to generate images as we want them to look,\u201d said Hua.<\/p>\n<h3>Synthesizing faces<\/h3>\n<p>How do you take the power to synthesize realistic images of flowers or birds a step further? You look at human faces. Human faces, when taken in the context of identity, are among the most sophisticated images that can be captured in nature. 
In <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/pdf\/1803.11182.pdf\">Toward Open-Set Identity Preserving Face Synthesis<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, presented this month at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/cvpr2018.thecvf.com\/\">CVPR 2018<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> in Salt Lake City, the researchers developed a GAN-based framework that can disentangle the identity and attributes of faces, with attributes including such intrinsic properties as the shapes of noses and mouths or even age, as well as environmental factors, such as lighting or whether makeup was applied to the face. While previous identity-preserving face synthesis methods were largely confined to synthesizing faces with known identities already contained in the training dataset, the researchers found a way to achieve identity-preserving face synthesis in open domains \u2013 that is, for a face that falls outside any training dataset. To do this, they use one input image of a subject to produce an identity vector and any other input face image (not of the same person) to extract an attribute vector capturing properties such as pose, emotion or lighting. The identity vector and the attribute vector are then recombined to synthesize a new face for the subject featuring the extracted attribute. Notably, the framework requires no annotation or categorization of the attributes of any of the faces. It is trained with an asymmetric loss function to better preserve identity and stabilize training. 
Impressively, it can also leverage massive amounts of unlabeled face images (think random facial images) to further enhance the fidelity of the synthesized faces.<\/p>\n<blockquote><p><strong>\u201cWhat we have here is technology that can synthesize faces while preserving the identity of the faces we generate, in a controllable fashion.\u201d \u2013 Gang Hua<\/strong><\/p><\/blockquote>\n<h3>Say cheese!<\/h3>\n<p>One obvious consumer application is the photographer\u2019s classic challenge: taking a group photo that includes dozens of subjects, in pursuit of the elusive ideal shot in which every subject is captured with eyes open and even smiling. \u201cWith our technology, the great thing is that I could literally render a smiling face for each of the participants in the shot!\u201d said Hua. What makes this utterly different from mere image editing, says Hua, is that the actual identity of the face is preserved. In other words, although the image of a smiling participant is synthesized \u2013 a \u201cmoment\u201d that did not in fact occur in reality \u2013 the face is unmistakably that of the individual; his or her identity has been preserved in the process of altering the image.<\/p>\n<p>Hua sees many useful applications that will benefit society, along with constant improvements in image recognition, video understanding and even the arts.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A pair of groundbreaking papers in computer vision open new vistas on possibilities in the realms of creating very real-looking natural images and synthesizing realistic, identity-preserving facial images. 
In CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training, presented this past October at ICCV 2017 in Venice, the team of researchers from Microsoft and the University of [&hellip;]<\/p>\n","protected":false},"author":37074,"featured_media":490760,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[194471],"tags":[],"research-area":[13562],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-490736","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-vision","msr-research-area-computer-vision","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[247949],"related-projects":[],"related-events":[488849],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"480\" height=\"280\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/06\/GangIdentity_CV_Carosel_06_2018_480x280.jpg\" class=\"img-object-cover\" alt=\"\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/06\/GangIdentity_CV_Carosel_06_2018_480x280.jpg 480w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/06\/GangIdentity_CV_Carosel_06_2018_480x280-300x175.jpg 300w\" sizes=\"auto, (max-width: 480px) 100vw, 480px\" \/>","byline":"","formattedDate":"June 14, 2018","formattedExcerpt":"A pair of 
groundbreaking papers in computer vision open new vistas on possibilities in the realms of creating very real-looking natural images and synthesizing realistic, identity-preserving facial images. In CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training, presented this past October at ICCV 2017 in Venice,&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/490736","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/37074"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=490736"}],"version-history":[{"count":8,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/490736\/revisions"}],"predecessor-version":[{"id":491180,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/490736\/revisions\/491180"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/490760"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=490736"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=490736"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=490736"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=490736"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\
/research\/wp-json\/wp\/v2\/msr-region?post=490736"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=490736"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=490736"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=490736"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=490736"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=490736"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=490736"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}