What are AI Agents exactly?
When ChatGPT first made its debut, the field of artificial intelligence was abuzz with talk about the new generation of AI assistants. However, over the past year, attention has begun to shift towards a new goal: AI Agents.
At its annual I/O conference in May this year, Google highlighted its newly launched AI agent, "Astra," which users can interact with via audio and video. Shortly before that, OpenAI launched its GPT-4o model, which has also been described as an AI agent.
This is not just hype (although there is certainly some hype involved): tech companies are investing huge sums in building AI agents, and their research may finally deliver the kind of genuinely useful artificial intelligence we have been hoping for over the past few decades.
Many industry insiders, including Sam Altman, have said that AI agents will be the next industry focus. But what exactly are AI agents, and how should we use them?
How should we define "AI agent"?
In fact, industry research on AI agents is still in its infancy, and there is not yet a settled definition. "Simply put, they are essentially artificial intelligence models and algorithms that can make decisions autonomously in a dynamic world," said Jim Fan, a senior research scientist at Nvidia and the head of its AI agents initiative.
The grand vision for AI agents is a system that can perform a multitude of tasks, much like a human assistant. In the future, it could help you plan your vacation: remembering that you prefer luxury hotels, it would book one rated four stars or higher; it would then suggest the flights that best fit your schedule and plan your itinerary according to your preferences; it could compile a packing list based on your travel plans and the weather forecast; it might even send the itinerary to your friends and invite them to come along. In the workplace, it could analyze your to-do list and execute tasks such as sending meeting invitations, memos, and emails.
"Multimodality" is one of the key visions for AI agents: the ability to handle language, audio, video, and more. In Google's demonstration, for example, users can point their smartphone camera at various objects and ask Astra questions, and the agent can respond to text, audio, and video inputs.
"These agents can also make the processes of businesses and public organizations run more smoothly," said David Barber, director of the UCL Centre for Artificial Intelligence.
For instance, AI agents might act as more sophisticated customer service bots. Current language-model-based assistants can only predict the next likely word to form sentences, while AI agents will be able to autonomously process natural-language commands and handle customer service tasks without supervision. An agent could analyze a customer's complaint email, look up the customer's order number, query systems such as the customer relationship management and delivery databases to verify whether the complaint is legitimate, and then resolve it according to company policy.

Broadly speaking, there are two different types of AI agents, said Jim Fan: software agents and embodied agents. "Software agents run on computers or mobile phones and use applications, which is very useful for office work, sending emails, or completing a series of such activities."
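The complaint-handling workflow described above can be sketched as a minimal tool-using agent loop. Everything below is a hypothetical stand-in for real systems (the mock order database, the refund policy table, and the function names are all invented for illustration, not any company's actual API):

```python
# Minimal sketch of a tool-using customer-service agent.
# The agent gathers facts with "tools" (here, a mock order lookup),
# then applies a policy; anything it can't verify goes to a human.

ORDERS = {"A1001": {"customer": "dana", "status": "lost_in_transit", "total": 42.0}}
POLICY = {"lost_in_transit": "refund", "delivered": "reject"}

def check_order(order_id):
    """Tool: look up an order in the (mock) order database."""
    return ORDERS.get(order_id)

def handle_complaint(order_id):
    """Agent step: verify the complaint, then act per company policy."""
    order = check_order(order_id)
    if order is None:
        return "escalate_to_human"  # unknown order: don't act autonomously
    return POLICY.get(order["status"], "escalate_to_human")

print(handle_complaint("A1001"))  # → refund
print(handle_complaint("B9999"))  # → escalate_to_human
```

A real agent would replace the hand-written `handle_complaint` rules with a language model deciding which tool to call next, but the overall loop (read complaint, query systems, apply policy, escalate when unsure) is the same.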
Embodied agents, by contrast, live in a 3D world (such as a computer game) or in robots. Embodied agents let people interact with non-player characters controlled by artificial intelligence, making video games more engaging. The same kind of agent could also help build more useful robots that assist people with everyday household tasks, such as folding laundry and cooking.
Jim Fan's team has built an AI agent, MineDojo, in the computer game Minecraft. The agent can learn new skills and tasks using vast amounts of data collected from the internet, explore a virtual 3D world freely, and complete complex tasks such as fencing in camels or scooping lava into a bucket. Computer games, after all, simulate the real world, so the agent must grasp physics, reasoning, and common sense.
Researchers at Princeton University, in a new paper that has not yet been peer-reviewed, state that AI agents tend to have three distinguishing characteristics: a system is considered an agent if it can pursue difficult goals in complex environments without guidance; if it can act autonomously under natural-language instruction without supervision; or if it can use tools such as web search or programming, and is capable of planning.

AI agents are not a new phenomenon
Chirag Shah, a professor of computer science at the University of Washington, noted that the term "AI agent" has actually existed for many years, but it has meant different things at different times.
"There have been two waves of AI agents, and the current wave owes mainly to the boom in language models and the rise of ChatGPT," Jim Fan pointed out. "The previous wave came in 2016, when Google DeepMind launched AlphaGo, a powerful Go-playing AI system capable of making decisions and formulating strategies. AlphaGo relied mainly on reinforcement learning, a technique that rewards an AI algorithm for taking desirable actions."
Oriol Vinyals, Vice President of Research at Google DeepMind, said, "But those agents were too 'single-minded.' In other words, each was created to complete only one specific task: AlphaGo, for example, only knows how to play Go. In contrast, the new generation of agents based on foundation models is more general, because they can learn from the world of human interaction."
"You would feel that this model is interacting with the world and then giving you better answers or better assistance," said Oriol Vinyals.

What are the current limitations?
However, many unresolved questions remain at this stage. Kanjun Qiu, CEO and founder of the AI startup Imbue, is working on agents capable of reasoning and coding. She compares the current state of AI agents to self-driving cars more than a decade ago: in her view, today's agents can accomplish certain tasks, but they are unreliable and still lack true autonomy.
"For example, a coding agent can generate code, but sometimes it makes mistakes, and it doesn't know how to test the code it is writing. So humans still need to stay in the loop," Qiu said. "AI systems still cannot truly reason, and reasoning is a crucial step in operating in the complex, ambiguous human world."
"We are still far from an agent that can automate all these household chores for us," Jim Fan added. "Current systems can hallucinate, and they do not always follow instructions strictly, which is obviously troublesome."

Beyond that, there is another limitation: AI agents may completely forget what they were working on after a while. AI systems are constrained by their context window, meaning the amount of data they can consider at any one time is limited.
"ChatGPT can write code, but it does not handle longer content well. A human developer, however, needs to refer to an entire GitHub repository containing dozens or even hundreds of lines of code, and that is no trouble at all for humans," said Jim Fan.
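The context-window problem can be illustrated with a toy sketch: a model that only "sees" its most recent N tokens silently loses facts stated earlier in the conversation. The window size and the word-level "tokens" below are simplifications invented for illustration (real tokenizers and window sizes differ):

```python
# Toy illustration of a fixed context window: only the last `window`
# tokens are visible to the model; anything earlier is simply dropped.

def visible_context(tokens, window=8):
    """Return the slice of the conversation a window-limited model can see."""
    return tokens[-window:]

history = ("remember the order id is A1001 then we talked "
           "about weather travel flights hotels meetings emails memos").split()

seen = visible_context(history, window=8)
print("A1001" in seen)  # → False: the key fact has fallen out of the window
```

Long-context models attack exactly this failure mode by growing `window`; memory systems instead store the dropped facts externally and re-insert the relevant ones on demand.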
To address this limitation, Google has improved its models' ability to process data, allowing users to interact with them for longer and have more of their past interactions remembered. Google says it is working to make the context window effectively unlimited in the future.
Embodied agents such as robots face even more limitations. Researchers do not yet have enough data to train them and have only just begun to harness the power of robotics foundation models.
So amid all the hype and excitement, it is worth remembering that research on AI agents is still in its early stages, and it may take several years before we fully experience their potential.

Can you experience AI agents now?
In fact, to some extent you may already have experienced their early prototypes, such as OpenAI's ChatGPT and GPT-4. "If you are interacting with software that feels intelligent, it is in effect an agent," said Qiu.
"For now, the best agents we have are specialized, purpose-built systems, such as coding assistants, customer service bots, or workflow automation software like Zapier. These are far from general-purpose AI agents capable of performing complex tasks," she added.
She said, "Today we have these computers, which are really powerful, but we have to 'micro-manage' them."
"For example, OpenAI's ChatGPT plugins let people create AI assistants for web browsers. That is an attempt at an agent, but these systems are still clumsy, unreliable, and unable to reason," said Qiu.

Even so, Qiu believes such systems will one day change the way humans interact with technology, a trend people should pay attention to.
"It's not that we suddenly have a general artificial intelligence; rather, my computer can do more things than it could five years ago," she said.