Microsoft Research Blog

  1. Provably Efficient Interactive-Grounded Learning with Personalized Reward 

    May 30, 2024 | Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, and Paul Mineiro

    Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims to maximize unobservable rewards by interacting with an environment and observing reward-dependent feedback on the actions taken. To deal with personalized rewards that are ubiquitous in applications such as…

  2. Research Focus: Week of May 27, 2024

    May 29, 2024

    How can generative AI tools represent less common identities and narratives; Can LLMs help players participate in game narratives; Using LLMs to improve geospatial demographic data; A Graph RAG Approach to Query-Focused Summarization; and more.

  3. Instruction-Guided Visual Masking 

    May 29, 2024

    Instruction following is crucial in contemporary LLMs. However, when extended to a multimodal setting, it often suffers from misalignment between a specific textual instruction and the targeted local region of an image. To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a…

  4. Rich-Observation Reinforcement Learning with Continuous Latent Dynamics 

    May 29, 2024 | Yuda Song, Lili Wu, Dylan J. Foster, and Akshay Krishnamurthy

    Sample-efficiency and reliability remain major bottlenecks toward wide adoption of reinforcement learning algorithms in continuous settings with high-dimensional perceptual inputs. Toward addressing these challenges, we introduce a new theoretical framework, RichCLD (Rich-Observation RL with Continuous Latent Dynamics), in which the agent performs control based on…

  5. Parrot: Efficient Serving of LLM-based Applications with Semantic Variable 

    May 29, 2024

    The rise of large language models (LLMs) has enabled LLM-based applications (a.k.a. AI agents or co-pilots), a new software paradigm that combines the strengths of LLMs and conventional software. Diverse LLM applications from different tenants could design complex workflows using multiple LLM requests to accomplish…

  6. 3D Neural Edge Reconstruction

    May 29, 2024

    Real-world objects and environments are predominantly composed of edge features, including straight lines and curves. Such edges are crucial elements for various applications, such as CAD modeling, surface meshing, and lane mapping. However, existing traditional methods prioritize lines over curves for simplicity in geometric…

  7. Autodroid: LLM-Powered Task Automation in Android 

    May 29, 2024

    Mobile task automation is an attractive technique that aims to enable voice-based, hands-free user interaction with smartphones. However, existing approaches suffer from poor scalability due to their limited language understanding ability and the non-trivial manual effort required from developers or end users. The recent advance of…

  8. Participation in the age of foundation models 

    May 28, 2024

    Growing interest and investment in the capabilities of foundation models have positioned such systems to impact a wide array of public services. Alongside these opportunities is the risk that these systems reify existing power imbalances and cause disproportionate harm to marginalized communities. Participatory approaches hold…

  9. Think Before You Act: Decision Transformers with Internal Working Memory 

    May 28, 2024

    Large language model (LLM)-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and compute. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training.…