An objective comparison of LLM Agents

RUiNtheExtinct
10 min read · Nov 5, 2023

There are quite a few LLM agents available today. Some of the most prominent are AutoGPT, AutoGen, BabyAGI, and OpenAgents. This article compares these agents side by side and discusses which use cases each one is, and is not, suited for.

AutoGen

AutoGen is a framework for building LLM applications from multiple agents that converse with each other to solve tasks. AutoGen agents are customizable and conversable, and they can operate in various modes that combine LLMs, human inputs, and tools. Because a human can step into these multi-agent dialogues as just another participant, human involvement fits in seamlessly.

AutoGen workflow

In real-world scenarios, this framework can be leveraged to automate complex tasks by orchestrating cooperative interactions among multiple AI agents and humans. For instance, in a software development setting, AutoGen could facilitate collaborative problem-solving by autonomously coordinating the efforts of various AI agents alongside human developers, thereby enhancing productivity and accelerating the resolution of development challenges.

AutoGen GroupChat workflow

Finally, AutoGen's capabilities can be greatly extended by integrating external tools, which are implemented in much the same way as OpenAI functions.
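To make the "tools as functions" idea concrete, here is a minimal sketch in plain Python. It does not use AutoGen's own API: the `TOOLS` registry, the `get_weather` tool, and the `dispatch` helper are hypothetical names chosen for illustration. The schema follows the general shape of an OpenAI function description (a name, a description, and JSON-schema parameters), and the model's reply is hard-coded so the example runs without an API key.

```python
import json

# Hypothetical tool; a real one would call an external API.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Registry pairing each callable with a description in the general
# shape of an OpenAI function schema (illustrative, not AutoGen's API).
TOOLS = {
    "get_weather": {
        "callable": get_weather,
        "schema": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
}

def dispatch(function_call: dict) -> str:
    """Run the tool named in a model-produced function call."""
    tool = TOOLS[function_call["name"]]
    # In function-call style replies the arguments arrive as a JSON string.
    kwargs = json.loads(function_call["arguments"])
    return tool["callable"](**kwargs)

# Stand-in for what an LLM might return when it decides to use the tool.
fake_model_reply = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
result = dispatch(fake_model_reply)
```

The point of the sketch is that the agent only ever sees the schema, while the framework maps the model's chosen function name back to real code.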

AutoGen as a Travel Assistant

AutoGPT

AutoGPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. Driven by GPT-4, the program chains together LLM “thoughts” to autonomously achieve whatever goal you set. As one of the first examples of GPT-4 running fully autonomously, AutoGPT pushes the boundaries of what is possible with AI.

When given a task to perform, AutoGPT sets goals and constraints for itself. It queries an LLM (OpenAI by default) to achieve its goals. For knowledge that is not available to the LLM, it can browse the internet to gather the data it needs to complete its goals.
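The loop described above can be sketched in a few lines of Python. This is a toy illustration, not AutoGPT's actual code: `fake_llm` and `fake_browse` are hypothetical stand-ins for the model call and the web browsing command, and the decision logic is hard-coded so the example runs offline.

```python
def fake_llm(goal, memory):
    """Stand-in for an LLM call: pick the next command from what is known."""
    if not memory:
        return {"thought": "I lack data, so I should search.",
                "command": "browse", "arg": goal}
    return {"thought": "I have enough to answer.",
            "command": "finish", "arg": memory[-1]}

def fake_browse(query):
    """Stand-in for browsing the internet."""
    return f"search results for '{query}'"

# Commands the agent may choose from (hypothetical names).
COMMANDS = {"browse": fake_browse}

def run_agent(goal, max_steps=5):
    memory = []  # observations gathered so far
    for _ in range(max_steps):
        step = fake_llm(goal, memory)
        if step["command"] == "finish":
            return step["arg"]
        # Execute the chosen command and remember what it returned.
        memory.append(COMMANDS[step["command"]](step["arg"]))
    return "stopped: step budget exhausted"

answer = run_agent("latest LLM agent frameworks")
```

The `max_steps` budget is an assumption added here so the toy loop always terminates.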

AutoGPT workflow

Finally, AutoGPT has support for plugins that allow it to perform many complex tasks such as sending emails, posting on Twitter or Instagram, etc.

AutoGPT as a personal email response service

BabyAGI

BabyAGI is an example of an AI-powered task management system. The system uses OpenAI and vector databases such as Chroma or Weaviate to create, prioritize, and execute tasks. The main idea behind this system is that it creates tasks based on the result of previous tasks and a predefined objective. The script then uses OpenAI’s natural language processing (NLP) capabilities to create new tasks based on the objective, and Chroma/Weaviate to store and retrieve task results for context.
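As a rough illustration of this create, prioritize, and execute cycle, here is a self-contained Python sketch. It is not BabyAGI's code: `execute`, `create_tasks`, and `prioritize` are hypothetical stubs standing in for the OpenAI calls, and a plain dictionary stands in for the Chroma/Weaviate store.

```python
from collections import deque

def execute(objective, task):
    """Stub for the LLM call that carries out a task."""
    return f"result of '{task}'"

def create_tasks(objective, last_task, result):
    """Stub for the LLM call that derives new tasks from the last result."""
    if last_task == "Draft outline":
        return ["Write section 1", "Write section 2"]
    return []

def prioritize(objective, tasks):
    """Stub for the LLM call that reorders tasks; here, alphabetical order."""
    return deque(sorted(tasks))

def run(objective, first_task, max_iterations=10):
    tasks = deque([first_task])
    results = {}  # stand-in for the vector database of task results
    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.popleft()
        result = execute(objective, task)
        results[task] = result
        # New tasks depend on the previous result and the fixed objective.
        tasks.extend(create_tasks(objective, task, result))
        tasks = prioritize(objective, tasks)
    return results

history = run("Write a blog post", "Draft outline")
```

The `max_iterations` cap is an assumption of this sketch to keep the example bounded.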

Credits: https://github.com/yoheinakajima/babyagi

Fundamentally, BabyAGI is less of an LLM agent and more of a smart to-do list. Although it has “Skills” that can be implemented to give it agent capabilities, these are far less intuitive to use than AutoGen functions. Recently it was integrated with LangChain Agents, which gives it the ability to use LangChain tools.

OpenAgents

Most language agent frameworks aim to facilitate the construction of proof-of-concept language agents, while neglecting non-expert users' access to agents and paying little attention to application-level design. OpenAgents was built as an open platform for using and hosting language agents in the wild of everyday life.

OpenAgents lets general users interact with agent functionality through a web UI optimized for swift responses and for handling common failures. For developers and researchers, it offers a seamless deployment experience on local setups, providing a foundation for crafting new language agents and for real-world evaluations. Its authors also describe both the challenges and the promising opportunities they see, aspiring to set a foundation for future research and development of real-world language agents.

OpenAgents comprises three separate agents:

  • Data Agent for data analysis with Python/SQL and data tools.
  • Plugins Agent with 200+ daily tools.
  • Web Agent for autonomous web browsing.

Credits: https://github.com/xlang-ai/OpenAgents

OpenAgents also lets us add further agents as needed by following these instructions.

Installation

The first point on which to compare the LLM agents is their installation process.

For AutoGen, the process is far less convoluted than for AutoGPT, especially if you need AutoGPT's plugin support, which has not yet been integrated into the master branch and is available only in releases higher than 0.4.1.

So a full installation of AutoGPT along with AutoGPT-Plugins is quite an involved process, unlike AutoGen, which only requires installing a pip package.

That is because even following all the instructions provided in the documentation does not guarantee that every dependency needed by the AutoGPT plugins gets installed.

On the other hand, BabyAGI is quite easy to install because, at its root, it is just a single Python script that uses OpenAI to create and prioritize tasks. Out of the box, it only has web scraping and to-do list creation as added Skills.

OpenAgents is currently supported only on Linux and Mac, but with some dependency trouble it can be installed on Windows as well. It has excellent installation documentation and, unlike with AutoGPT, following those instructions lets us run OpenAgents without issues.

OpenAI Usage

While both AutoGen and AutoGPT use OpenAI keys for inference by default, AutoGen also provides out-of-the-box support for non-OpenAI or locally hosted LLMs. To learn how to do that you can go here.

AutoGPT offers no such option. In AutoGen, we can configure which API is called for inference by changing the OAI_CONFIG_LIST file, whereas AutoGPT has no equivalent setting and calls the OpenAI API internally.
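As a hedged illustration of what such a configuration might contain, the sketch below builds an OAI_CONFIG_LIST-style JSON document in Python. The field names (`model`, `api_key`, `base_url`) are assumptions based on recent AutoGen releases (older versions reportedly used `api_base` instead), and the model names and local URL are placeholders, so check the documentation for the version you have installed.

```python
import json

# Assumed entry format: one dict per model endpoint.
config_list = [
    # Hosted OpenAI model (placeholder key).
    {"model": "gpt-4", "api_key": "<your OpenAI key>"},
    # Hypothetical locally hosted, OpenAI-compatible server.
    {
        "model": "local-model",
        "api_key": "not-needed",
        "base_url": "http://localhost:8000/v1",
    },
]

# Text that would be saved as the OAI_CONFIG_LIST file.
oai_config_text = json.dumps(config_list, indent=2)

# Selecting only the local endpoints, e.g. for completely free inference.
local_only = [entry for entry in config_list if "base_url" in entry]
```

Switching an agent between OpenAI and a local model then becomes a matter of choosing which entries from this list it receives, without touching the agent code.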

For completely free usage of AutoGPT via locally hosted LLMs, one has two options:

  • You can directly make changes in your local copy of the AutoGPT codebase. An example of how that was done can be seen here where AutoGPT was used alongside keldenl/gpt-llama.cpp. It boils down to adding base_url to openai_base_url, and adjusting the dimensions of the vector to match that of the local LLM. Many such forks of AutoGPT can be found that run on non-OpenAI models, each with its challenges.
  • You can run AutoGPT within a Docker container and proxy the OpenAI API calls from within the container to the API of your locally hosted LLM.

BabyAGI can use either OpenAI or any LLaMA-based model with only slight modifications to the code and environment variables, which makes it highly customizable.

For using OpenAgents with a different LLM, you can follow the instructions provided here.

External Tool/API Usage

AutoGPT can browse the internet directly, which is its first major advantage over AutoGen.

Next, AutoGPT has built-in plugin support, while AutoGen does not. You will need to explicitly program every tool that you want AutoGen to use.

Even with those two points in mind, there are still circumstances where AutoGen is preferable to AutoGPT, and others where it is not.

In the case of BabyAGI, we have two options:

  • Skills: BabyAGI natively supports skills, which we can implement to extend its functionality. The current iteration of BabyAGI ships with a limited number of skills, including a web scraping skill, a file and directory creation skill, a file reading skill, a DALL-E based image generation skill, and a music player skill. New skills can be added, but the process is quite unintuitive, and at that point we are better off just using AutoGen functions.
  • LangChain Agents: BabyAGI has been integrated with LangChain, which allows us to create and use LangChain tools along with it.

Finally, OpenAgents also has plugin support, but unlike AutoGPT, it has a much wider range (>200) of plugins to select from. Its web integration plugin is much more powerful than the one provided by AutoGPT. While AutoGPT can just browse the web, OpenAgents can, using the XLang Web Agent plugin, interact with your browser directly and perform tasks for you.

So it boils down to this: almost anything AutoGPT can do, OpenAgents can do better. The only place where AutoGPT is ahead of OpenAgents is in tasks requiring interaction with the user's local machine, and even that can be remedied by creating a new OpenAgents plugin for it.

Use Cases and Applications

Given the differences we have seen between the agents, we can now decide which niche each one fills.

Places where AutoGPT can be used:

  • For use cases like question answering, browsing the web to collect and summarize information, or larger-scale code generation such as creating an entire website, AutoGPT is the better choice because of its greater access to information.
  • It is preferable to use AutoGPT over AutoGen when we need to interact with the user’s local system, as AutoGPT has much greater freedom to interact with the environment it is running in and can execute commands such as creating new project directories or creating web apps from the CLI, like next-app.
  • AutoGen does not have that same level of freedom. This allows it to run in a much safer manner but also constrains it a lot.

Places where AutoGen can be used:

  • For tasks that require precision and consistency over multiple executions, AutoGen is preferable. As mentioned earlier, we have to explicitly program the tools that we want AutoGen to use, but that leads to greater consistency and control over the results of the execution.
  • In AutoGPT, tool usage and input generation are handled entirely by the agent itself, so we have no control over the process. AutoGen is preferable in those scenarios.
  • Also, AutoGPT has no flags that guarantee a call to the particular plugin that runs the tool you want to use. Plugins are called based on the context of the prompt provided, so there is no guarantee that a given plugin will be called.
  • For any use case that involves making API calls to our custom domains, we have no control over the input format that AutoGPT generates from our context, even if we explicitly provide the format in our prompt, so the safer choice is to use AutoGen in such scenarios.
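The control that explicitly programmed tools give us can be illustrated with a small sketch. This is plain Python, not AutoGen's API: the `CreateTicket` payload, the `create_ticket_tool` function, and the `/tickets` endpoint are hypothetical. The idea is that the tool itself enforces the input format, so malformed arguments produced by the model are rejected before any API call is made.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CreateTicket:
    """Hypothetical payload format for a custom API."""
    title: str
    priority: int

    def __post_init__(self):
        # Reject values the API would not accept.
        if not self.title.strip():
            raise ValueError("title must not be empty")
        if self.priority not in (1, 2, 3):
            raise ValueError("priority must be 1, 2, or 3")

def create_ticket_tool(raw_args: dict) -> dict:
    """Tool the agent may call; returns the request it would send."""
    # Raises TypeError on missing or unknown fields.
    args = CreateTicket(**raw_args)
    return {"endpoint": "/tickets",
            "json": {"title": args.title, "priority": args.priority}}

ok = create_ticket_tool({"title": "Login fails", "priority": 2})

# A model-generated call in the wrong format is caught, not sent.
try:
    create_ticket_tool({"title": "Login fails", "priority": "urgent"})
    rejected = False
except ValueError:
    rejected = True
```

In a real setup the error message could be fed back to the agent so it can retry with corrected arguments.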

One more thing that is, as of yet, unique to AutoGen is its support for multi-agent chat and collaboration. We can create different agents, each with its own role and set of tools, and we can even make the different agents call different LLMs.

  • With this feature, tasks that require a team with a separation of responsibilities, such as an app development team with dedicated developers, testers, QAs, etc., can be simulated.
  • Finally, given that each agent can be connected to a different model, multimodality can be achieved using AutoGen.

I think that BabyAGI is best used as a task orchestration tool rather than as an LLM agent. What I mean by this is that BabyAGI is quite good at creating subtasks for any given task and prioritizing them. It can also run indefinitely for as long as tasks remain incomplete. Using those two facts, we can configure BabyAGI to create the subtasks for our use case and leave the actual execution to other agents like AutoGen, OpenAgents, etc.

Finally, for OpenAgents: any use case that AutoGPT can handle, OpenAgents can probably handle better. OpenAgents also has three separate agents, each of which can perform unique actions. Although we have the option to use any LLM, I got the best results when I used GPT-4 with all three agents.

Data Agent is a comprehensive toolkit designed for efficient data operations. It provides capabilities to search, handle, manipulate, and visualize data. With its proficiency in writing and executing code, Data Agent simplifies a wide range of data-centric tasks. In my experience, however, the data-searching capabilities of this agent seemed quite lacking. Here are a few use cases for it:

  • Search and combine multiple datasets to train your custom model.
  • SQL and Python code generation and execution.
  • Data visualization.
  • Image processing.

Plugins Agent seamlessly integrates with over 200 third-party plugins, each handpicked to enrich various facets of your daily life. With these plugins at its disposal, the agent helps you tackle a wide range of tasks and activities more efficiently. Multiple plugins can be used at once, and new plugins can be added as needed for your application. Most of the available plugins are for obscure shopping websites, with a few exceptions such as WolframAlpha and Zapier.

Web Agent harnesses the power of a Chrome extension to navigate and explore websites automatically. This agent streamlines the web browsing experience, making it easier to find relevant information, access desired resources, and so on. It can be used to automate web tasks. Here are a few use cases for it:

  • Travel assistant.
  • Sentiment analysis on comments sections of any service.
  • Automated content generation and posting on social media.

A side-by-side comparison

Conclusion

Each of the current LLM agents showcases distinct strengths tailored to specific use cases. AutoGPT excels in broad tasks and interaction with local environments, while AutoGen offers precise execution, multi-agent collaboration, and customizable tool integration. BabyAGI, while primarily a task orchestration tool, effectively creates and prioritizes subtasks, making it well suited to broader project management. Meanwhile, OpenAgents stands out for its extensive plugin support, which enables a more versatile approach to tasks, and for its browser interaction, which can be used for automation. Together these give it an edge over the other agents for diverse applications. As technology progresses, the choice of agent will hinge on the specificity of the task at hand and the desired level of autonomy and customization.
