Embedded page: "A Dynamic Benchmark for Image Understanding" (Microsoft Research, Neel Joshi)
https://www.microsoft.com/en-us/research/project/image-understanding-benchmark/

We have created a procedurally generatable, synthetic dataset for testing spatial reasoning, visual prompting, object recognition and detection. A key question for understanding multimodal model performance is how well it can understand images, in particular basic vs. detailed spatial understanding of images. These capabilities are needed for models to be used in real-world tasks, such […]
