AutoGen: Enabling next-generation large language model applications

Published

“Capabilities like AutoGen are poised to fundamentally transform and extend what large language models are capable of. This is one of the most exciting developments I have seen in AI recently.”

Doug Burger, Technical Fellow, Microsoft
Figure 1 shows three shaded boxes, each containing symbols that represent AutoGen agents and the large language models, tools, and humans that comprise them, and illustrates how AutoGen agents can converse to solve tasks.
Figure 1. AutoGen enables complex LLM-based workflows using multi-agent conversations. (Left) AutoGen agents are customizable and can be based on LLMs, tools, humans, and even a combination of them. (Top-right) Agents can converse to solve tasks. (Bottom-right) The framework supports many additional complex conversation patterns.

It requires a lot of effort and expertise to design, implement, and optimize a workflow that can leverage the full potential of large language models (LLMs). Automating these workflows has tremendous value. As developers begin to create increasingly complex LLM-based applications, workflows will inevitably grow more intricate. The potential design space for such workflows could be vast and complex, thereby heightening the challenge of orchestrating an optimal workflow with robust performance.

AutoGen is a framework for simplifying the orchestration, optimization, and automation of LLM workflows. It offers customizable and conversable agents that leverage the strongest capabilities of the most advanced LLMs, like GPT-4, while addressing their limitations by integrating with humans and tools and having conversations between multiple agents via automated chat.

Ideas: Exploring AI frontiers with Rafah Hosn

Energized by disruption, partner group product manager Rafah Hosn is helping to drive scientific advancement in AI for Microsoft. She talks about the mindset needed to work at the frontiers of AI and how the research-to-product pipeline is changing in the GenAI era.

With AutoGen, building a complex multi-agent conversation system boils down to:

  • Defining a set of agents with specialized capabilities and roles.
  • Defining the interaction behavior between agents, i.e., what to reply when an agent receives messages from another agent.

Both steps are intuitive and modular, making these agents reusable and composable. For example, to build a system for code-based question answering, one can design the agents and their interactions as in Figure 2. Such a system is shown to reduce the number of manual interactions needed from 3x to 10x in applications like supply-chain optimization (opens in new tab). Using AutoGen leads to more than a 4x reduction in coding effort.

Figure 2 illustrates an example workflow with dotted-line relationships between three AutoGen agents—Commander, Writer, and Safeguard—and how the agents work together to answer code-based questions from users. (opens in new tab)
Figure 2. An example workflow to address code-based question answering in supply-chain optimization (opens in new tab). The Commander receives user questions and coordinates with the Writer and Safeguard. The Writer crafts the code and interpretation, the Safeguard ensures safety, and the Commander executes the code. If issues arise, the process can repeat until resolved. Shaded circles represent steps that may be repeated multiple times.

Capable, conversable, and customizable agents – integrating LLMs, humans, and tools

AutoGen agents have capabilities enabled by LLMs, humans, tools, or a mix of those elements. For example:

One straightforward way of using built-in agents from AutoGen is to invoke automated chat between an assistant agent and a user proxy agent. As an example (Figure 3), one can easily build an enhanced version of ChatGPT + Code Interpreter + plugins, with a customizable degree of automation, usable in a custom environment and embeddable in a bigger system. It is also easy to extend their behavior to support diverse application scenarios, such as adding personalization and adaptability based on past interactions (e.g., automated continual learning (opens in new tab), teach agents new skills (opens in new tab)).

Figure 3 shows the details of a chat between an assistant agent and a user proxy agent to illustrate how AutoGen automates such chats, while seamlessly engaging humans or using tools as needed to complete complex tasks.
Figure 3. A user proxy agent and assistant agent from AutoGen can be used to build an enhanced version of ChatGPT + Code Interpreter + plugins. The assistant agent plays the role of an AI assistant like Bing Chat. The user proxy agent plays the role of a user and simulates users’ behavior such as code execution. AutoGen automates the chat between the two agents, while allowing human feedback or intervention. The user proxy seamlessly engages humans and uses tools when appropriate.

The agent conversation-centric design has numerous benefits, including that it:

  • Naturally handles ambiguity, feedback, progress, and collaboration.
  • Enables effective coding-related tasks, like tool use with back-and-forth troubleshooting.
  • Allows users to seamlessly opt in or opt out via an agent in the chat.
  • Achieves a collective goal with the cooperation of multiple specialists.

AutoGen supports automated chat and diverse communication patterns, making it easy to orchestrate a complex, dynamic workflow and experiment with versatility. Figure 4 illustrates a new game, conversational chess (opens in new tab), enabled by AutoGen. Figure 5 illustrates how AutoGen supports group chats (opens in new tab) between multiple agents using another special agent called the “GroupChatManager”.

Figure 4 displays two small chessboards side-by-side, with black and white chess pieces in various positions on each board showing a game in progress, plus a chat between two users, to illustrate how AI, human, or hybrid users can play conversational chess. (opens in new tab)
Figure 4. An example of a new application enabled by AutoGen: conversational chess (opens in new tab). It can support various scenarios, as each player can be an LLM-empowered AI, a human, or a hybrid of the two. It allows players to express their moves creatively, such as using jokes, meme references, and character-playing, making chess games more entertaining to players as well as observers.
Figure 5 shows three shaded boxes, each containing symbols that represent various agents, to illustrate how AutoGen enables dynamic group chats. Each box represents a different step in the three-step process. (opens in new tab)
Figure 5. Overview of how AutoGen enables dynamic group chats (opens in new tab) to solve tasks: We use a special agent called the Manager that repeats the following three steps—select a single speaker (in this case Bob), ask the speaker to respond, and broadcast the selected speaker’s message to all the other agents.

Getting started

AutoGen (opens in new tab) (in preview) is freely available as a Python package. To install it, run

pip install pyautogen

You can quickly enable a powerful experience with just a few lines of code:

import autogen
assistant = autogen.AssistantAgent("assistant")
user_proxy = autogen.UserProxyAgent("user_proxy")
user_proxy.initiate_chat(assistant, message="Show me the YTD gain of 10 largest technology companies as of today.")
# This triggers automated chat to solve the task

Check examples for a wide variety of tasks: https://microsoft.github.io/autogen/docs/Examples/AutoGen-AgentChat (opens in new tab).

Next steps:

AutoGen is an open-source, community-driven project under active development (as a spinoff from FLAML (opens in new tab), a fast library for automated machine learning and tuning), which encourages contributions from individuals of all backgrounds. Many Microsoft Research collaborators have made great contributions to this project, including academic contributors like Pennsylvania State University and the University of Washington, and product teams like Microsoft Fabric and ML.NET. AutoGen aims to provide an effective and easy-to-use framework for developers to build next-generation applications, and already demonstrates promising opportunities to build creative applications and provide a large space for innovation.

Names of Microsoft contributors: 

Chi Wang, Gagan Bansal, Eric Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, Ahmed Awadallah, Ryen White, Doug Burger, Robin Moeur, Victor Dibia, Adam Fourney, Piali Choudhury, Saleema Amershi, Ricky Loynd, Hamed Khanpour, and Ece Kamar

Related publications

Continue reading

See all blog posts