AutoGen: Paving the Way for LLM Agents with Multi-Agent Chats

RUiNtheExtinct
Nov 27, 2023


AutoGen is a framework for building LLM applications out of multiple agents that converse with one another to solve tasks. These agents are customizable, conversable, and seamlessly allow human participation, and they can operate in various modes that combine LLMs, human input, and tools. In real-world scenarios, the framework can automate complex tasks by orchestrating cooperative interactions among multiple AI agents and humans. In a software development setting, for instance, AutoGen could coordinate the efforts of several AI agents alongside human developers, enhancing productivity and accelerating the resolution of development challenges.

AutoGen as a travel agent

How AutoGen works

Main Features

  • AutoGen enables building next-gen LLM applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation, and optimization of complex LLM workflows, maximizing the performance of LLMs and overcoming their weaknesses.
  • AutoGen provides a drop-in replacement for openai.Completion and openai.ChatCompletion as an enhanced inference API. It allows easy performance tuning and utilities such as API unification and caching, plus advanced usage patterns such as error handling, multi-config inference, and context programming (see the sketch after this list).
  • AutoGen is powered by collaborative research studies from Microsoft, Penn State University, and the University of Washington.
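
To make the inference bullet concrete, here is a minimal sketch of the enhanced inference API as it looked in the pyautogen 0.1.x era (later releases moved this functionality to autogen.OpenAIWrapper); the API key placeholders are mine:

```python
import autogen

config_list = [
    {"model": "gpt-4", "api_key": "<openai-api-key>"},
    {"model": "gpt-3.5-turbo-16k", "api_key": "<openai-api-key>"},
]

# Responses are cached on disk, keyed by this seed, so reruns are free.
autogen.ChatCompletion.set_cache(seed=42)

# Multi-config inference: the configs are tried in order, so a failing or
# rate-limited model falls back to the next one in the list.
response = autogen.ChatCompletion.create(
    config_list=config_list,
    messages=[{"role": "user", "content": "Summarize AutoGen in one sentence."}],
)
print(autogen.ChatCompletion.extract_text(response)[0])
```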

Architecture

As noted above, AutoGen models a complex LLM workflow as a multi-agent conversation, which is what makes orchestration, automation, and optimization straightforward.

AutoGen abstracts and implements conversable agents designed to solve tasks through inter-agent conversations. Specifically, the agents in AutoGen have the following notable features:

  • Conversable: Agents in AutoGen can send messages to and receive messages from other agents to initiate or continue a conversation.
  • Customizable: Agents in AutoGen can be customized to integrate LLMs, humans, tools, or a combination of them.

The figure below shows the built-in agents in AutoGen.

AutoGen architecture
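
As a minimal, concrete example of the two built-in agent types, here is the canonical assistant-plus-user-proxy loop (the model name and task are placeholders of mine):

```python
import autogen

config_list = [{"model": "gpt-3.5-turbo-16k", "api_key": "<openai-api-key>"}]

# The assistant drafts code and reasoning; the user proxy executes the code
# it receives and feeds the results back, standing in for a human.
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # set to "ALWAYS" to keep a human in the loop
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

user_proxy.initiate_chat(assistant, message="Plot NVDA's stock price YTD.")
```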

Various Findings and Use Cases

Various use cases that can be implemented with AutoGen can be found here.

But AutoGen can be pushed much further. I have explored some ways to use external tools and non-LLM models with AutoGen, and my findings are below.

Multi-Agent Collaboration (>3 Agents)

AutoGen can run any number of agents in tandem to greatly improve the quality of your results. Here I used AutoGen to analyse and then critique its own analysis of financial data for the Fortune 500 companies: two agents in a GroupChat, one acting as an analyst and the other as a critic, provide the information I need. A sketch of this setup follows.
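
This is a minimal sketch of that setup, with the system messages paraphrased rather than copied from my actual run:

```python
import autogen

config_list = [{"model": "gpt-3.5-turbo-16k", "api_key": "<openai-api-key>"}]
llm_config = {"config_list": config_list}

analyst = autogen.AssistantAgent(
    name="analyst",
    system_message="You analyse financial data and report key findings.",
    llm_config=llm_config,
)
critic = autogen.AssistantAgent(
    name="critic",
    system_message="You critique the analyst's findings and point out gaps.",
    llm_config=llm_config,
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "analysis", "use_docker": False},
)

# The GroupChatManager decides which agent speaks each round.
groupchat = autogen.GroupChat(agents=[user_proxy, analyst, critic],
                              messages=[], max_round=12)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(
    manager, message="Analyse the Fortune 500 financial data in data.csv.")
```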

Performing tasks that require external actions with AutoGen

AutoGen is quite constrained to the environment it runs in, but with a little tweaking you can make it interact with the outside world. In the following demo, I used AutoGen to generate an email on a given topic and then automatically send it by logging into my Gmail account.
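
One way to wire this up (not necessarily how the demo itself was built) is to expose an email-sending function as a tool the agents can call; the sketch below sends through Gmail's SMTP server with an app password, and the addresses and credentials are placeholders:

```python
import smtplib
from email.mime.text import MIMEText

import autogen

def send_email(to: str, subject: str, body: str) -> str:
    """Send an email via Gmail SMTP (assumes an app password is set up)."""
    msg = MIMEText(body)
    msg["Subject"], msg["From"], msg["To"] = subject, "me@gmail.com", to
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login("me@gmail.com", "<gmail-app-password>")
        server.send_message(msg)
    return f"Email sent to {to}."

config_list = [{"model": "gpt-3.5-turbo-16k", "api_key": "<openai-api-key>"}]
llm_config = {
    "config_list": config_list,
    # Advertise the tool to the LLM via an OpenAI function schema.
    "functions": [{
        "name": "send_email",
        "description": "Send an email with the given subject and body.",
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    }],
}

assistant = autogen.AssistantAgent(name="assistant", llm_config=llm_config)
user_proxy = autogen.UserProxyAgent(name="user_proxy",
                                    human_input_mode="NEVER",
                                    code_execution_config=False)
# The user proxy executes the function calls the assistant emits.
user_proxy.register_function(function_map={"send_email": send_email})

user_proxy.initiate_chat(
    assistant, message="Draft and send an email about AutoGen to a@b.com.")
```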

Now you can extend this interaction further by using LLM agents that can use external APIs. I will elaborate upon that in the following use case.

Using non-OpenAI models or locally hosted LLMs to run inference with AutoGen

AutoGen uses OpenAI by default for generating its results, but that is neither necessary nor always preferable. Below is a small demo where I run inference with the lmsys/vicuna-7b-v1.5-16k and THUDM/chatglm2-6b models. I used these two models to build a simple Flask REST endpoint: the models sit in a GroupChat where the former develops the REST API and the latter tests it by sending POST requests to it. A sketch of the configuration follows.
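
Assuming the two models are served behind an OpenAI-compatible endpoint (for example via FastChat's OpenAI API server), the configuration looks roughly like this; hosts and ports are placeholders, and older pyautogen versions use the key api_base instead of base_url:

```python
import autogen

# Each locally served model gets its own endpoint; the api_key field is
# unused by a local server but must still be present.
vicuna_config = [{
    "model": "vicuna-7b-v1.5-16k",
    "base_url": "http://localhost:8000/v1",
    "api_key": "NULL",
}]
chatglm_config = [{
    "model": "chatglm2-6b",
    "base_url": "http://localhost:8001/v1",
    "api_key": "NULL",
}]

developer = autogen.AssistantAgent(
    name="developer",
    system_message="You write the Flask REST API.",
    llm_config={"config_list": vicuna_config},
)
tester = autogen.AssistantAgent(
    name="tester",
    system_message="You test the API by sending POST requests to it.",
    llm_config={"config_list": chatglm_config},
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "api", "use_docker": False},
)

groupchat = autogen.GroupChat(agents=[user_proxy, developer, tester],
                              messages=[], max_round=10)
manager = autogen.GroupChatManager(groupchat=groupchat,
                                   llm_config={"config_list": vicuna_config})
user_proxy.initiate_chat(
    manager, message="Build and test a simple Flask REST endpoint.")
```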

To know more about using non-OpenAI models you can read this.

Now, using the above method, you can give AutoGen access to external APIs by creating agents that run on internet-capable LLMs such as Gorilla LLM.

Direct tool usage without running a new LLM locally

Spinning up a new LLM every time you need to access an external API is simply too computationally expensive. AutoGen instead lets you attach tools to your agents. So here is a small example where I call an external API without creating an entirely new LLM. In the following demo, I called two APIs: one that generates an image from a prompt and another that describes an image given a URL.

Using those two APIs, I tasked AutoGen with optimizing both the original prompt I provided and the image generated from that prompt. A few agents in a GroupChat then use the two APIs to complete the task.

You can similarly provide any tool to AutoGen for completing your particular task.
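
Here is a sketch of that tool registration, with generate_image and describe_image as hypothetical stubs standing in for whichever image APIs you use:

```python
import autogen

def generate_image(prompt: str) -> str:
    """Hypothetical wrapper: call an image-generation API, return the URL."""
    ...

def describe_image(url: str) -> str:
    """Hypothetical wrapper: call an image-captioning API, return text."""
    ...

config_list = [{"model": "gpt-3.5-turbo-16k", "api_key": "<openai-api-key>"}]
llm_config = {
    "config_list": config_list,
    "functions": [
        {"name": "generate_image",
         "description": "Generate an image from a text prompt; returns its URL.",
         "parameters": {"type": "object",
                        "properties": {"prompt": {"type": "string"}},
                        "required": ["prompt"]}},
        {"name": "describe_image",
         "description": "Describe the image at the given URL.",
         "parameters": {"type": "object",
                        "properties": {"url": {"type": "string"}},
                        "required": ["url"]}},
    ],
}

optimizer = autogen.AssistantAgent(name="prompt_optimizer", llm_config=llm_config)
user_proxy = autogen.UserProxyAgent(name="user_proxy",
                                    human_input_mode="NEVER",
                                    code_execution_config=False)
# The user proxy executes the function calls the optimizer emits.
user_proxy.register_function(function_map={
    "generate_image": generate_image,
    "describe_image": describe_image,
})

user_proxy.initiate_chat(
    optimizer,
    message="Iteratively refine this prompt and its image: 'a red fox'.")
```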

Achieving multimodality with AutoGen

Combining all the functionality shown above, we can give AutoGen a semblance of multimodality that takes text, image, audio, and video as inputs and, likewise, returns them all as outputs. The following diagram is a rough sketch of how one might go about this.

AutoGen Multimodality
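
In code terms, the diagram amounts to treating each modality conversion as one tool, registering them all on a user proxy as in the previous sketch, and letting a single text-only assistant route between them. The function names below are hypothetical stubs, not real APIs:

```python
# Hypothetical modality tools; each converts between text and another medium.
def transcribe_audio(path: str) -> str: ...    # audio -> text
def synthesize_speech(text: str) -> str: ...   # text  -> audio file path
def generate_image(prompt: str) -> str: ...    # text  -> image URL
def describe_image(url: str) -> str: ...       # image -> text
def summarize_video(path: str) -> str: ...     # video -> text

# Register these in a function_map (with matching function schemas in the
# assistant's llm_config, as in the previous sketch). A request then flows
#   audio in -> transcribe_audio -> LLM reasoning -> generate_image -> image out
# with text as the common intermediate representation throughout.
```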

Code Execution by the Assistant Agent

AutoGen claims that only the UserProxyAgent can execute code, but I have observed instances where that is not the case. The demo below shows this.

Now, while the Assistant agent may have the capability to execute code, I believe it is most prudent to keep a separation of capabilities between the Assistant and UserProxy agents, granting code execution only where absolutely necessary. A minimal sketch of that separation follows.
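
Here is one way to enforce that separation, keeping execution rights on the user proxy only (model name is a placeholder):

```python
import autogen

config_list = [{"model": "gpt-3.5-turbo-16k", "api_key": "<openai-api-key>"}]

# The assistant only drafts code; execution is explicitly disabled.
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
    code_execution_config=False,
)
# The user proxy is the single agent allowed to run code, inside Docker
# so that generated code stays sandboxed.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "sandbox", "use_docker": True},
)
```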

Weaknesses

Most math-related tasks I challenged AutoGen with were solved quite easily through the WolframAlpha integration it provides, so I will set math aside here.

Now onto the coding side of things.

  • Simple coding tasks in Python were done quite easily, to no one's surprise, as the inference results were generated by gpt-3.5-turbo-16k.
  • But it failed to execute the code it generated in most other languages, even when that code was correct, because of the limited execution environment the UserProxyAgent provides. You will have to write code to give it the tools necessary to run any language other than Python and Bash.
  • As for tools: unlike AutoGPT, in AutoGen you explicitly code every single tool needed to fulfill your requirements.
  • In a majority of cases, as shown in the demos above, even when specifically directed to save generated code or output to a file, it repeatedly failed to do so, saving its results only a handful of times. For example, in the REST API demo I tasked it with saving the result of each query to a text file, but it only saved the Python code.
  • Put simply, while the inference, flow of logic, and interaction among AutoGen's agents are quite good, its native interaction with its environment is very limited.
  • As for tasks that require web access, you need to give very specific instructions, including the URL of the site you want to interact with. Even then the results are subpar at best, with few exceptions. Once again, you will need to write a tool for it to do this.
  • For prompts that require the generated code to call other models, such as text, image, or video generation, the results were more often than not subpar, riddled with errors, and, most importantly, full of hallucinations. Once again, AutoGen itself cannot be fully blamed for this, as the inference results were generated by gpt-3.5-turbo-16k. However, the limited web and external API access AutoGen provides and its constrained environmental interaction do not make the task any easier.
  • To mitigate failures on these kinds of prompts we can do the following (see the sketch after this list):
  1. Run multi-LLM inference, which AutoGen supports natively.
  2. Assign one or more of the agents to models that allow external API access, such as Gorilla LLM.
  3. In my trials, these methods produced much better results than simply using OpenAI for similar inference tasks.
  • We can perform various tasks that are not generally possible with AutoGen directly by using appropriate combinations of models, keeping in mind that every model needs to be an LLM of some kind that can interact via text with the UserProxyAgent.
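
A sketch of that mitigation, assuming the API-capable model (here a Gorilla-style checkpoint) is served behind an OpenAI-compatible endpoint; the model name and URL are placeholders:

```python
import autogen

gpt4_config = [{"model": "gpt-4", "api_key": "<openai-api-key>"}]

# The planner runs on OpenAI; the API caller runs on a tool-capable model
# served locally behind an OpenAI-compatible endpoint.
planner = autogen.AssistantAgent(
    name="planner",
    system_message="Break the task into steps and delegate API calls.",
    llm_config={"config_list": gpt4_config},
)
api_caller = autogen.AssistantAgent(
    name="api_caller",
    system_message="Produce the external API calls the plan requires.",
    llm_config={"config_list": [{
        "model": "gorilla-7b-hf-v1",
        "base_url": "http://localhost:8002/v1",
        "api_key": "NULL",
    }]},
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "runs", "use_docker": False},
)

groupchat = autogen.GroupChat(agents=[user_proxy, planner, api_caller],
                              messages=[], max_round=10)
manager = autogen.GroupChatManager(groupchat=groupchat,
                                   llm_config={"config_list": gpt4_config})
user_proxy.initiate_chat(
    manager,
    message="Find a public weather API and fetch today's forecast for Seattle.")
```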

Some Closing Notes

You could also choose to optimize the model further for enhanced inference by following the directions given here.

To fine-tune the models further, follow the instructions given here.

AutoGen, by default, uses OpenAI as the base LLM for inference. For more information on using it with non-OpenAI models or on running inference with local LLMs, follow the instructions provided here.

Conclusion

AutoGen has been designed to enable LLM applications driven by multi-agent conversations. It allows the orchestration and automation of complex LLM workflows, with agents that are both customizable and conversable and that seamlessly integrate human participation. AutoGen's potential shines in real-world applications where it can streamline complex tasks by orchestrating interactions between multiple AI agents and humans; for example, it could revolutionize the software development process by autonomously coordinating various AI agents and human developers. However, despite its strengths, the framework shows limitations in executing code in languages other than Python, in interacting with its environment, and in tasks requiring web access. While the underlying LLM, such as gpt-3.5-turbo-16k or gpt-4, may be at the root of some of these issues, incorporating multiple LLMs with varying outputs and modalities has been shown to enhance performance.
