A brain-inspired agentic architecture to improve planning with LLMs

  • Karen Easterbrook, Microsoft; Ida Momennejad, Microsoft

Inspired by human collective cognition and neuroscience, we conducted two studies showing that a) multi-LLM architectures with mixed communication connectivity lead to better collaborative innovation (Artificial Life 2024), and b) brain-inspired multi-LLM architectures improve multi-step reasoning and planning and substantially reduce hallucination (Nature Communications 2025).

Transcript

KAREN EASTERBROOK: Joining us from our New York City lab, Ida Momennejad is exploring how neuroscience can inform artificial intelligence. Her research on brain-inspired agentic architectures shows how multiple LLMs can collaborate like neurons in the brain, improving reasoning, planning, and even reducing hallucinations.

This is the kind of curiosity-driven science that lays the foundation for future breakthroughs. Over to you, Ida.

[MUSIC]   

[MUSIC FADES INTO SWEEPING SOUND]

IDA MOMENNEJAD: Hello, I’m Ida Momennejad, principal researcher at Microsoft Research New York City. Today, I will talk to you a little bit about how brain-inspired agentic architectures built with LLMs can improve multi-step reasoning.

Large language models demonstrate impressive performance on a variety of tasks like writing emails or answering questions. But LLMs and agentic AI systems often struggle with tasks that require multi-step reasoning or goal-directed planning, which are necessary skills in many real-world applications.

Think about planning a trip, coordinating a project, or enforcing safety rules in a sequence of steps. Those are problems that require multi-step reasoning and planning. You need to keep track of where you are, what’s allowed, and what you’re trying to achieve, taking actions while tracking your goals. These skills are crucial for users, organizations, and Microsoft customers who wish to deploy generative AI in their work.

In our earlier work, we rigorously showed failure modes of large language models in solving simple multi-step reasoning tasks, like navigating a building described in text or passing a message through a network of colleagues. While the models often sound confident, we found that they commonly propose illegal moves and hallucinate paths that get stuck in loops or wander off on detours.

For users and Microsoft customers, improving multi-step reasoning and planning is not just academic; it’s a necessity. This is why we propose a Modular Agentic Planner, or MAP, inspired by the brain to solve these problems.

To address these challenges, we took inspiration from the human brain, in which planning is accomplished via component processes that are predominantly associated with specific brain regions. These processes include task decomposition, task coordination, conflict monitoring, state prediction, and evaluation.

A key insight from our research was that when we tested LLMs on individual functions or processes, they were often capable of carrying out these functions in isolation. However, they struggle to autonomously coordinate them in the service of a goal. Inspired by these findings, we designed an architecture with specific brain-inspired modules.

Here is how MAP, or our Modular Agentic Planner, works.

It is designed with functional roles, communication protocols, and iterative algorithmic steps for collaborative problem-solving inspired by the brain. The Actor proposes actions in response to a prompt to solve a task. The Monitor checks whether those actions are valid or hallucinated and gates them. The Predictor and Evaluator then take the actions that passed that first stage and perform a tree search, predicting future states and evaluating whether they lead to the right outcome. Finally, the Orchestrator determines when the goals are achieved. This is implemented with specialized prompting of LLM instances, where the roles and interaction protocols are all inspired by the brain.
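To make the flow of roles concrete, here is a minimal sketch of MAP’s control loop. This is not the published implementation: each role (Actor, Monitor, Predictor, Evaluator, Orchestrator) would really be a specially prompted LLM instance, but here they are passed in as plain callables so the loop itself is visible, and the Predictor/Evaluator tree search is simplified to a one-step lookahead. All function names and the toy graph task are illustrative assumptions.

```python
import math

def map_step(state, goal, propose, is_valid, predict, evaluate, goal_reached,
             max_iters=20):
    """One MAP-style planning episode: propose -> gate -> look ahead -> act.

    In the real system each role is a specially prompted LLM instance; here
    they are plain callables so the control flow is easy to follow.
    """
    plan = []
    for _ in range(max_iters):
        if goal_reached(state, goal):          # Orchestrator: stop when done
            return plan
        candidates = propose(state, goal)      # Actor: propose actions
        valid = [a for a in candidates
                 if is_valid(state, a)]        # Monitor: gate illegal/hallucinated moves
        if not valid:
            break
        # Predictor + Evaluator: simulate each surviving action and score
        # its predicted outcome (one-step lookahead in this toy version).
        best = max(valid, key=lambda a: evaluate(predict(state, a), goal))
        state = predict(state, best)
        plan.append(best)
    return plan

# Toy graph-traversal task, echoing the navigation benchmarks in the talk.
graph = {0: [1, 2], 1: [3], 2: [], 3: []}
dist_to_goal = {0: 2, 1: 1, 2: math.inf, 3: 0}  # precomputed for this toy graph

plan = map_step(
    state=0, goal=3,
    propose=lambda s, g: [s + 1, *graph[s]],    # Actor may propose illegal moves
    is_valid=lambda s, a: a in graph[s],        # Monitor filters them out
    predict=lambda s, a: a,                     # deterministic transitions here
    evaluate=lambda s2, g: -dist_to_goal[s2],   # closer to the goal scores higher
    goal_reached=lambda s, g: s == g,
)
print(plan)  # → [1, 3]
```

Note how the Monitor’s gate is what guarantees zero invalid moves: an illegal proposal (here, `s + 1` when it is not a neighbor) never reaches the Predictor or the environment.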

What we found is that MAP yields significant improvement over both standard LLMs, with zero-shot and multi-shot prompting, and competitive agentic baselines. For instance, on Tower of Hanoi, MAP improves performance from 11% for zero-shot GPT-4 to 74% for MAP built entirely with GPT-4 agents. Graph traversal was improved from 50% for our best baseline to 95% on a four-step path. In terms of errors and hallucinations, MAP had 0% invalid moves even on out-of-distribution tasks, whereas the other methods had up to 31% hallucination.

MAP also shows superior transfer learning between tasks and out of distribution. A notable outcome was that when we built MAP with a smaller and more cost-efficient LLM, meaning Llama 70B as opposed to GPT-4, we saw superior performance and transfer across tasks.

So, in summary, MAP can improve accuracy, reliability, and potentially safety in AI systems. In parallel work published in Artificial Life, we explored topologies of multi-agent architectures for collective innovation.

Together, these research findings offer a blueprint for future end-to-end architectures beyond transformers that readily incorporate MAP’s multi-level and multi-role computations to improve performance, reliability, and safety. This has implications for customers’ real-world needs and their trust, well beyond our toy research tasks.

Together, we can use human-centered and brain-inspired methods in the service of mitigating inaccuracies and risks and improving performance of our AI systems with rigorous methods. Thank you.