Dataset Source Code
VISOR
Benchmarking Spatial Relationships in Text-to-Image Generation—Spatial understanding is a fundamental aspect of computer vision and integral for human-level reasoning about images, making it an important component for grounded language understanding. While recent large-scale text-to-image synthesis…
Microsoft Research Blog
Frontiers of multimodal learning: A responsible AI approach
New evaluation methods and a commitment to continual improvement are musts if we’re to build multimodal AI systems that advance human goals. Learn about cutting-edge research into the responsible development and use of multimodal AI…
Publication