{"id":979587,"date":"2023-10-25T21:34:34","date_gmt":"2023-10-26T04:34:34","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=979587"},"modified":"2024-02-01T10:08:49","modified_gmt":"2024-02-01T18:08:49","slug":"kahani","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/kahani\/","title":{"rendered":"Kahani"},"content":{"rendered":"<section class=\"mb-3 moray-highlight\">\n\t<div class=\"card-img-overlay mx-lg-0\">\n\t\t<div class=\"card-background  has-background- card-background--full-bleed\">\n\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"720\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/Kahani_header_1920x720.jpg\" class=\"attachment-full size-full\" alt=\"Project Kahani header image - young boy looking mesmerized by the camera\" style=\"object-position: 64% 36%\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/Kahani_header_1920x720.jpg 1920w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/Kahani_header_1920x720-300x113.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/Kahani_header_1920x720-1024x384.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/Kahani_header_1920x720-768x288.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/Kahani_header_1920x720-1536x576.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/Kahani_header_1920x720-1600x600.jpg 1600w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/02\/Kahani_header_1920x720-240x90.jpg 240w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/>\t\t<\/div>\n\t\t<!-- Foreground -->\n\t\t<div class=\"card-foreground d-flex mt-md-n5 my-lg-5 px-g px-lg-0\">\n\t\t\t<!-- Container -->\n\t\t\t<div class=\"container 
d-flex mt-md-n5 my-lg-5 \">\n\t\t\t\t<!-- Card wrapper -->\n\t\t\t\t<div class=\"w-100 w-lg-col-5\">\n\t\t\t\t\t<!-- Card -->\n\t\t\t\t\t<div class=\"card material-md-card py-5 px-md-5\">\n\t\t\t\t\t\t<div class=\"card-body \">\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n<h1 class=\"wp-block-heading is-style-default\" id=\"kahani-visual-storytelling\">Kahani: Visual Storytelling<\/h1>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n<p>Image generation models like MidJourney, DALL-E 3, and SDXL have recently made remarkable progress in producing visually stunning images from natural language descriptions. However, most of these models are limited by their lack of cultural awareness and diversity, and they often fail to capture the subtle nuances and variations that exist across cultures and languages.<\/p>\n\n\n\n<p>To generate an image that matches the user\u2019s expectations and preferences, one has to either write extensive prompts and edit the output with sophisticated tools like Adobe Photoshop, or fine-tune a model, which requires large amounts of data and specialized skill. Both approaches are time-consuming, costly, and inaccessible to most people.<\/p>\n\n\n\n<p><strong>Kahani: Visual Storytelling<\/strong> is a research prototype that allows the user to create visually striking and culturally nuanced images just by describing them in their local languages. 
Kahani leverages state-of-the-art techniques like inpainting, and models like Segment Anything and GPT-4 vision, to generate feedback for the candidate images.<\/p>\n\n\n\n<div class=\"wp-block-media-text has-video  has-vertical-padding-none  is-stacked-on-mobile is-style-border is-style-offset-media--top is-style-offset-media--offset-\"><figure class=\"wp-block-media-text__media video-wrapper\"><div class=\"yt-consent-placeholder\" role=\"region\" aria-label=\"Video playback requires cookie consent\" data-video-id=\"uXyt_E2_myA\" data-poster=\"https:\/\/img.youtube.com\/vi\/uXyt_E2_myA\/maxresdefault.jpg\"><iframe class=\"media-text__video\" data-src=\"https:\/\/www.youtube-nocookie.com\/embed\/uXyt_E2_myA?enablejsapi=1&rel=0\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen aria-hidden=\"true\" tabindex=\"-1\"><\/iframe><div class=\"yt-consent-placeholder__overlay\"><button class=\"yt-consent-placeholder__play\"><svg width=\"42\" height=\"42\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" aria-hidden=\"true\" focusable=\"false\"><g fill=\"none\" fill-rule=\"evenodd\"><circle fill=\"#000\" opacity=\".556\" cx=\"21\" cy=\"21\" r=\"21\"\/><path stroke=\"#FFF\" d=\"M27.5 22l-12 8.5v-17z\"\/><\/g><\/svg><span class=\"yt-consent-placeholder__label\">Video playback requires cookie consent<\/span><\/button><\/div><\/div><\/figure><div class=\"wp-block-media-text__content\">\n<h3 class=\"wp-block-heading\" id=\"our-goal\">Our goal?<\/h3>\n\n\n\n<p>To democratize image generation and empower users to express their creativity and identity through visual storytelling.<\/p>\n<\/div><\/div>\n\n\n","protected":false},"excerpt":{"rendered":"<p>Kahani: Visual Storytelling is a research prototype that allows the user to create visually striking and culturally nuanced images just by describing them in their local 
languages.<\/p>\n","protected":false},"featured_media":1003971,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13554,13559,13568],"msr-locale":[268875],"msr-impact-theme":[261667],"msr-pillar":[],"class_list":["post-979587","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-human-computer-interaction","msr-research-area-social-sciences","msr-research-area-technology-for-emerging-markets","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[1167746],"related-downloads":[],"related-videos":[1003953],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[1023300],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Kalika Bali","user_id":32477,"people_section":"Related people","alias":"kalikab"},{"type":"user_nicename","display_name":"Sameer Segal","user_id":40861,"people_section":"Related 
people","alias":"sameersegal"}],"msr_research_lab":[199562],"msr_impact_theme":["Empowerment"],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/979587","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":14,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/979587\/revisions"}],"predecessor-version":[{"id":1004055,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/979587\/revisions\/1004055"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1003971"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=979587"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=979587"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=979587"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=979587"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=979587"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}