Microsoft Research Blog

Collaborative Quest Completion with LLM-driven Non-Player Characters in Minecraft

August 1, 2024

The use of generative AI in video game development is on the rise, and as the conversational and other capabilities of large language models continue to improve, we expect LLM-driven non-player characters (NPCs) to become widely deployed. In this paper, we seek to understand how…
m3: Accurate Flow-Level Performance Estimation using Machine Learning

August 1, 2024

Data center network operators often need accurate estimates of aggregate network performance, such as the frequency of poor tail latency events, to guide network configuration -- when and where to add capacity as a function of increased load, which network congestion control algorithm to use…
ECBD: Evidence-Centered Benchmark Design for NLP

August 1, 2024

Benchmarking is seen as critical to assessing progress in NLP. However, creating a benchmark involves many design decisions (e.g., which datasets to include, which metrics to use) that often rely on tacit, untested assumptions about what the benchmark is intended to measure or is actually…
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

August 1, 2024 | Emre Kiciman, Robert Osazuwa Ness, Amit Sharma, and Chenhao Tan

The causal capabilities of large language models (LLMs) are a matter of significant debate, with critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. We conduct a "behavioral" study of LLMs to benchmark their capability in…
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation

August 1, 2024

Recent advancements in Large Language Models (LLMs) have revolutionized decision-making by breaking down complex problems into more manageable language sequences referred to as “thoughts”. An effective thought design should consider three key perspectives: performance, efficiency, and flexibility. However, existing thought designs can at most exhibit two…
A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia

August 1, 2024

Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context. Yet the mechanisms underlying this contextual grounding remain unknown, especially in situations where contextual information contradicts factual knowledge stored in the parameters, which LLMs also excel at recalling.…
Tabularis Revilio: Converting Text to Tables

August 1, 2024 | Mukul Singh, Sumit Gulwani, Vu Le, and Gust Verbruggen

Copying tables from documents and applications without proper tabular support, like PDF documents, web pages or images, surprisingly remains a challenge. In this paper, we present Revilio, a novel neurosymbolic system for reconstructing tables when their column boundaries have been lost. Revilio addresses this task…
Let’s Fix this Together: Conversational Debugging with GitHub Copilot

August 1, 2024

Despite advancements in IDE tooling, code understanding, generation, and automated repair, debugging continues to present significant challenges. Existing debugging strategies available to developers in the literature are often too mechanical and rigid for day-to-day issues. Recent advances in Large Language Models (LLMs) promise practical solutions that…
LordNet: An efficient neural network for learning to solve parametric partial differential equations without simulated data

August 1, 2024

Neural operators, as a powerful approximation to the non-linear operators between infinite-dimensional function spaces, have proved to be promising in accelerating the solution of partial differential equations (PDEs). However, they require a large amount of simulated data, which can be costly to collect. This can…
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

August 1, 2024

We present AutoGen, an open-source framework that allows developers to build LLM applications by composing multiple agents to converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools.…
Commentary: Productivity implications for generative AI role-based prompts as a networked hermeneutic

August 1, 2024 | Sean Rintel

Commentary on "Membership categorisation, sociological description and role prompt engineering with ChatGPT" (William Housley, Patrik Dahl, 2024). As Housley and Dahl (2024) demonstrate, role-based prompts for Generative AI (GenAI) systems are based on vernacular resources of membership categorization and action description, representing a networked…
Efficient Policy-Rich Rate Enforcement with Phantom Queues

August 1, 2024

Rate enforcement is routinely employed in modern networks (e.g., ISPs rate-limiting users' traffic to the subscribed rates). In addition to correctly enforcing the desired rates, rate-limiting mechanisms must be able to support rich rate-sharing policies within each traffic aggregate (e.g., per-flow fairness, weighted fairness, and…
