How to Make Your AI Agents Remember and Forget
Posted on: 2026-02-25
Like a human, an agent interacting with a large language model (LLM) has memory constraints, and just as with people, some models can retain more information, and act upon it, than others.
The LLM receives a limited number of tokens describing the memory it can use to reason and generate accurate responses. This capacity is called the context window. As of early 2026, it typically ranges from roughly 200,000 to 1 million tokens. What you provide within it is critical to the precision and relevance of the agent's output.
The context can be broken down as follows:
- System Prompt
- Tools available
- Task state
- Recent conversation history (the dialog between user and agent)
- Summary (compaction) of long discussions
- Other documents and information
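The pieces above can be sketched as a single context-assembly step. The function below is illustrative only; its name, fields, and section headers are my own, not any particular framework's API:

```python
# Sketch: concatenate an agent's context components into one prompt.
# All names and section headers here are illustrative, not a real API.

def build_context(system_prompt: str,
                  tools: list[str],
                  task_state: str,
                  recent_messages: list[str],
                  summary: str,
                  documents: list[str]) -> str:
    """Assemble the context window from its parts."""
    sections = [
        "## System\n" + system_prompt,
        "## Tools\n" + "\n".join(tools),
        "## Task\n" + task_state,
        "## Summary of earlier discussion\n" + summary,
        "## Recent messages\n" + "\n".join(recent_messages),
        "## Reference documents\n" + "\n".join(documents),
    ]
    return "\n\n".join(sections)

context = build_context(
    system_prompt="You are a frontend engineering agent.",
    tools=["read_file: read a file from disk"],
    task_state="Goal: add a dark-mode toggle.",
    recent_messages=["User: please start on the toggle."],
    summary="The user approved the design last week.",
    documents=["Style guide excerpt: use CSS variables."],
)
```

In a real agent each component would be built and budgeted separately, as the sections below describe.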

System Prompt
The system prompt varies depending on the information the agent needs to operate effectively. It is essentially the agent’s autobiography: its purpose, rules, beliefs, and identity. It defines its ethos and the core values it brings to the system. This is why I strongly advocate breaking agents into specialized entities, much like human experts: software engineers, designers, managers, painters, electricians, and so on. Specialization allows the agent to operate with clarity and depth in its domain.
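A specialized system prompt might look like the following. This is a made-up example for a hypothetical SQL-review agent, showing the purpose / rules / identity structure described above:

```python
# Hypothetical system prompt for a specialized agent: its purpose,
# rules, and identity, deliberately scoped to a single domain.
SQL_REVIEWER_PROMPT = """\
You are a database engineer specialized in reviewing SQL.
Purpose: review queries for correctness and performance.
Rules:
- Only discuss SQL and schema design; defer other topics.
- Never suggest executing destructive statements without a backup plan.
Identity: cautious, precise, cites the schema when unsure.
"""
```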
Tools Available
The potential tools for an agent can be vast. Some are generic, some highly specialized. Dividing agents into specialized roles helps constrain the tools each carries to only those relevant to the task. This is analogous to a construction worker bringing a drill to a job but leaving the jackhammer behind. Each tool should include a name, description, and guidance on when and how to use it. Limiting the toolset is critical when hundreds or thousands of options exist; otherwise, the LLM can become confused. For instance, knowledge of SQL tools is unnecessary when writing React code.
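One way to keep the toolset constrained is to tag tools by domain and hand each agent only the tools matching its specialty. The registry below is a hypothetical sketch; the tool names and tag scheme are invented for illustration:

```python
# Hypothetical tool registry: each entry carries a name, a description,
# and guidance on when to use it. Tags let us give a specialized agent
# only the tools relevant to its domain, not the whole catalog.
TOOLS = [
    {"name": "run_sql", "tags": {"database"},
     "description": "Execute a read-only SQL query.",
     "usage": "Use when the task needs data from the warehouse."},
    {"name": "render_component", "tags": {"frontend"},
     "description": "Render a React component to HTML.",
     "usage": "Use to preview UI changes."},
]

def tools_for(specialty: str) -> list[dict]:
    """Return only the tools tagged for the agent's specialty."""
    return [t for t in TOOLS if specialty in t["tags"]]

frontend_tools = tools_for("frontend")
```

A frontend agent built this way never sees `run_sql`, which is exactly the SQL-vs-React separation described above.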
Task State
Users communicate with a fleet of agents because they have one or more tasks to accomplish. The initial conversation defines the primary goal. Ideally, this task state is rich in information; if not, the agent can fall back to historical context.
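A task state can be as simple as a small structured record of the goal and its constraints. The fields below are hypothetical, just one plausible shape:

```python
from dataclasses import dataclass, field

# Hypothetical task-state record: the primary goal plus whatever
# details the initial conversation provided. Sparse fields signal
# that the agent should fall back to the conversation history.
@dataclass
class TaskState:
    goal: str
    constraints: list[str] = field(default_factory=list)
    status: str = "in_progress"

task = TaskState(
    goal="Migrate the billing service to Postgres",
    constraints=["no downtime", "keep the public API stable"],
)
```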
History of Discussions
The size of retained history depends on the LLM’s capacity, but you cannot carry hundreds of messages. The most recent messages are the most critical, as they capture the freshest details, similar to human memory.
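Keeping only the freshest messages can be sketched as a sliding window over a token budget. The four-characters-per-token estimate below is a common rough heuristic, not an exact tokenizer:

```python
# Keep only the most recent messages that fit a rough token budget.
# The len(msg) // 4 cost is a crude chars-per-token heuristic,
# not a real tokenizer.
def recent_window(messages: list[str], max_tokens: int = 2000) -> list[str]:
    kept, used = [], 0
    for msg in reversed(messages):       # walk newest-first
        cost = max(1, len(msg) // 4)     # rough token estimate
        if used + cost > max_tokens:
            break                        # older messages are dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order
```

Because the walk starts from the newest message, it is always the oldest messages that fall off when the budget runs out.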
Summary (Compaction) of Discussions
As dozens of messages accumulate, the agent must synthesize their meaning. Like a human taking notes in a meeting and producing a summary, the agent creates a compact representation that preserves the essential information while discarding minor details. This allows the LLM to maintain task context without exceeding its capacity.
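Compaction can be sketched as folding the older portion of the history into a summary while keeping a recent tail intact. Here `summarize` is a placeholder for an LLM summarization call; the threshold of five is arbitrary:

```python
# Compaction sketch: when history grows past a threshold, fold the
# older messages into a running summary and keep only the recent tail.
def summarize(messages: list[str]) -> str:
    # Placeholder: a real agent would ask the model to summarize.
    return f"[summary of {len(messages)} earlier messages]"

def compact(summary: str, messages: list[str], keep_recent: int = 5):
    """Return (new_summary, recent_messages)."""
    if len(messages) <= keep_recent:
        return summary, messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Fold the previous summary in so nothing already compacted is lost.
    source = ([summary] + older) if summary else older
    return summarize(source), recent

summary, history = compact("", [f"msg {i}" for i in range(20)])
```

Each compaction pass re-summarizes the previous summary together with the newly aged-out messages, so essential information survives repeated rounds.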
Other Documents and Information
This is the agent’s long-term memory: documentation and other information relevant over extended periods. Humans retain information because it is crucial to a project’s arc or because they reference notes and resources. Similarly, an agent draws on this memory to inform ongoing tasks. For example, when developing a stock market system, this section might contain stock data, historical ranges, or API schemas. Retrieving relevant information efficiently often requires reasoning and experimentation. Large code bases, RAG pipelines, and intelligent search mechanisms can enhance an agent’s ability to access and use this long-term context.
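A minimal retrieval step over long-term documents can be sketched with keyword overlap. Production systems use embeddings and RAG pipelines, but the shape is the same: score documents against the query and take the top few. The documents below are invented examples:

```python
# Minimal keyword retrieval over long-term documents. Real systems
# use embeddings/RAG, but the shape is the same: score, then top-k.
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(terms & set(d.lower().split())),
        reverse=True,                    # most term overlap first
    )
    return scored[:k]

docs = [
    "historical price ranges for AAPL",
    "API schema for the orders endpoint",
    "meeting notes about lunch options",
]
hits = retrieve("price history for AAPL", docs)
```

Swapping the overlap score for cosine similarity over embeddings turns this sketch into the core of a basic RAG retriever.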
Remember and Forget
As the title suggests, agents must remember and forget. In practice, agents have zero persistent memory; they are bootstrapped each time they interact with the LLM. "Remembering" means ensuring that all relevant information, whether recently received or from long ago, is included in the context for the current task. "Forgetting" means filtering out irrelevant information, guided by both time and task-specific relevance.
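Since the agent is bootstrapped on every turn, remember-and-forget reduces to a selection pass over stored items. The recency threshold and relevance test below are illustrative stand-ins for whatever policy the deployment actually uses:

```python
import time

# "Remember and forget" sketch: on each turn, bootstrap the context by
# keeping items that are either recent or relevant to the current task,
# and forgetting the rest. The 1-hour threshold is illustrative.
def select_memories(items: list[tuple[str, float]],
                    task_terms: set[str],
                    now: float,
                    max_age_s: float = 3600) -> list[str]:
    kept = []
    for text, created_at in items:
        recent = (now - created_at) < max_age_s
        relevant = bool(task_terms & set(text.lower().split()))
        if recent or relevant:
            kept.append(text)            # remember
        # else: forget (simply not carried into the context)
    return kept

now = time.time()
items = [
    ("old note about billing migration", now - 7200),   # old but relevant
    ("fresh note about lunch", now - 60),                # recent
    ("stale note about lunch", now - 7200),              # old and irrelevant
]
kept = select_memories(items, {"billing"}, now)
```

The old billing note survives on relevance, the fresh note on recency, and the stale irrelevant note is forgotten, which is the time-plus-relevance filter described above.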
What I observe in successful agent deployments mirrors human behavior: break the task into manageable chunks, focus on each, complete it, and move to the next. Large, unwieldy tasks overwhelm both humans and agents, causing details to be lost or conflated. Agents must divide and conquer, orchestrate subtasks, merge results, and surface insights in a structured manner.
