Deep Agents, Part 1: What They Are and How They Work

Learn what Deep Agents from LangChain are and how they work.
By Boris Delovski • Updated on Mar 23, 2026

Most AI agents work the same way. A model gets a message, decides if it needs to use a tool, uses it, looks at the result, and repeats until it has an answer. For simple tasks, this works fine. For anything complicated, it breaks down fast.

The problem is that this basic loop has no real structure. Give it a long research task or a messy codebase problem and it will lose track of what it was doing, run out of context window, or just start making things up. There's nothing telling it to slow down, make a plan, or save its work as it goes.

Deep Agents, introduced by LangChain as part of the deepagents library, are built to solve exactly this. They add structure on top of the standard loop with a planning tool, a virtual filesystem for offloading large outputs, and the ability to spin up isolated subagents for specific pieces of work. In this article, we are going to explain what they are and how they work. In a follow-up article, we will cover how to use the library to create a Deep Agent.

What Is a Deep Agent

A Deep Agent is what you get when you move beyond the basic AI agent pattern of an LLM calling tools in a loop. That basic setup works fine for simple tasks but falls apart on anything that requires planning, multiple steps, or managing a lot of information over time. Deep Agents handle this by combining four built-in capabilities:

  • planning tool
  • virtual filesystem
  • subagent spawning
  • automatic context compression

Before doing any work, the agent will use the planning tool to write out a task list, break the goal into smaller steps, and track what has been done and what still needs to happen. This keeps the agent oriented on long tasks and stops it from losing track of where it is.

The virtual filesystem enables the agent to save outputs to storage as it works, rather than trying to hold everything in memory at once. When a tool returns a large result, the agent writes it to a file and reads back only what it needs, which keeps the context window from filling up too quickly.

Using a built-in tool, the main agent can also hand off a specific piece of work to a fresh subagent instance. That subagent runs on its own, completes the task, and returns a summary. The main agent only sees the final result, not all the steps that produced it, which keeps its own context clean.

Finally, when the context window gets close to its limit, the agent automatically summarises older parts of the conversation and saves the full history to the filesystem. Essentially, it performs automatic context compression. This enables the session to keep going without hitting token limits while still giving the agent the ability to look up earlier details if it needs them.


The Planning Tool

Before doing any work, the agent will use the planning tool to write out a task list, break the goal into smaller steps, and track what has been done and what still needs to happen. This ensures the agent doesn't get lost when working on complex tasks.

The planning tool is called write_todos. It takes a list of steps as input and writes them to a structured task list that the agent can read and update throughout its run. A call to it looks something like this:

write_todos(todos=[
    "Load the dataset and inspect its structure",
    "Check for missing values across all columns",
    "Compute descriptive statistics for numeric columns",
    "Identify outliers using the IQR method",
    "Compute the correlation matrix",
    "Summarise findings and write the report"
])

Every time the agent completes a step, it marks it done. Every time it starts a new one, it knows exactly where it left off. This matters because the standard agent loop has no concept of progress. It only sees the conversation so far. On a short task, that is enough. On a longer one, the model has to infer where it is from the history of messages, which becomes unreliable as the context grows. The planning tool replaces that inference with an explicit record. That way the agent always knows its current position in the plan.
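To make the idea of an explicit progress record concrete, here is a minimal, self-contained sketch of a task list with per-step completion tracking. The class and method names (`TodoList`, `mark_done`, `current_step`) are illustrative stand-ins, not part of the deepagents API:

```python
# Minimal illustration of an explicit task list with progress tracking.
# This is a standalone sketch, not the deepagents implementation.

class TodoList:
    def __init__(self, steps):
        # Every step starts out pending.
        self.steps = [{"task": s, "done": False} for s in steps]

    def mark_done(self, index):
        self.steps[index]["done"] = True

    def current_step(self):
        # The first step that is not yet done is where the agent should resume.
        for i, step in enumerate(self.steps):
            if not step["done"]:
                return i, step["task"]
        return None  # everything is finished

todos = TodoList([
    "Load the dataset and inspect its structure",
    "Check for missing values across all columns",
    "Compute descriptive statistics for numeric columns",
])

todos.mark_done(0)
print(todos.current_step())  # → (1, 'Check for missing values across all columns')
```

The point of the sketch is that resuming work is a lookup, not an inference: the agent never has to reread the whole conversation to figure out where it is.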

The Virtual Filesystem

The virtual filesystem enables the agent to save outputs to storage as it works, rather than trying to hold everything in memory at once. A standard ReAct agent keeps every tool result in the conversation history. On a short task with small outputs, this is manageable. On a longer task the tool results accumulate quickly. After several steps the context is dense with intermediate output, leaving less and less room for the model to reason clearly about what to do next.

The virtual filesystem breaks this pattern. Instead of the conversation holding every result, the agent offloads large outputs to files as it goes. The context stays lean. What remains in the conversation is the agent's reasoning, its current step, and whatever small pieces of output it actually needs to act on. Everything else sits on disk until it is needed.

Every Deep Agent gets a set of filesystem tools built in:

  • write_file
  • read_file
  • edit_file
  • ls
  • grep
  • glob

The tools are available the moment the agent is created. Each tool handles a specific part of the read-write workflow. Together they give the agent a working directory where it can write intermediate results, read them back selectively, search through content without loading all of it, and update files as findings change. None of this requires any configuration. Let's explain what each tool does.

write_file takes a path and content and saves the result to the agent's working directory. The agent uses this whenever a tool returns output that is large enough to fill a significant portion of the context window. Conversely, read_file retrieves the contents of a previously saved file. The agent calls this when it needs a result it wrote earlier, rather than recomputing it.

ls lists the files and directories at a given path. The agent uses this to orient itself in the working directory, particularly after context compression when it needs to recall what it has already produced.

grep searches a file for a pattern and returns matching lines. This lets the agent extract a specific value from a large file without loading the whole thing into context. It is often used together with glob, which finds all files matching a pattern. That combination is useful when the agent needs to locate files it created earlier without knowing their exact names.

Finally, edit_file performs a targeted replacement inside an existing file. The agent uses this to update a result without rewriting the entire file from scratch.
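To make the read-write workflow concrete, here is a toy in-memory version of the same six tools. It mimics their behaviour in plain Python and is not the deepagents filesystem itself:

```python
import fnmatch
import re

# A toy in-memory filesystem mimicking the Deep Agent tool set.
# This is an illustrative sketch, not the deepagents implementation.
files = {}

def write_file(path, content):
    files[path] = content

def read_file(path):
    return files[path]

def edit_file(path, old, new):
    # Targeted replacement instead of rewriting the whole file.
    files[path] = files[path].replace(old, new)

def ls():
    return sorted(files)

def grep(path, pattern):
    # Return only matching lines, so a large file never enters context whole.
    return [line for line in files[path].splitlines() if re.search(pattern, line)]

def glob(pattern):
    return [p for p in sorted(files) if fnmatch.fnmatch(p, pattern)]

# Offload a large tool result, then read back only what is needed.
write_file("structure.txt", "col_a: int64\ncol_b: float64\ncol_c: object")
print(grep("structure.txt", "float"))  # → ['col_b: float64']
print(glob("*.txt"))                   # → ['structure.txt']
```

The last two lines show the pattern that keeps context lean: the full file never re-enters the conversation, only the one line the agent actually asked for.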

Subagent Spawning

Using a built-in tool, the main agent can hand off a specific piece of work to a fresh subagent instance. On a multi-part task this makes the main agent a pure coordinator. It is responsible for understanding the goal, delegating the pieces, and assembling the results. The detailed work belongs to the subagents.

The tool that does this is called task. When the main agent calls it, a new agent instance spins up with its own context window, its own tool access, and a single focused objective. It runs its full loop, including planning and tool calls, then returns a concise summary. Everything that happened inside the subagent stays there. The main agent receives one result, not a transcript of everything that produced it.

Calling task looks similar to calling any other tool. The main agent passes a description of the work it wants done, and optionally a system prompt to give the subagent additional context about how to approach it. Here is an example of what a task call looks like:

task(
    description=(
        "Load the file at sales_data.csv. "
        "Compute descriptive statistics for all numeric columns. "
        "Detect outliers using the IQR method. "
        "Return a plain-text summary of the findings."
    ),
    system_prompt=(
        "You are a data analyst. "
        "Be concise. Return findings only."
    )
)

In the code above, description is the subagent's objective. It should be specific enough that the subagent can complete the task without needing to ask clarifying questions. Vague descriptions produce vague summaries, which the main agent has no way to look deeper into.

The system prompt is optional but useful. It tells the subagent how to behave, what format to return results in, and what to prioritise. Without it, the subagent starts with no context about the main agent's broader goal and will apply its own defaults.

One thing worth noting is that the subagent does not share state with the main agent. It starts with a clean context, so file paths and anything else it needs to know must be included in the description explicitly. The description and system_prompt are the only levers for controlling what the subagent does and what it returns.
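The isolation described above can be sketched in a few lines of plain Python. Everything here, including the `run_subagent` function and its canned transcript, is an illustrative stand-in, not the deepagents task tool:

```python
# Illustrative sketch of subagent isolation, not the deepagents `task` tool.
# The subagent gets a fresh, private transcript; the coordinator only ever
# receives the final summary, never the intermediate steps.

def run_subagent(description, system_prompt="You are a helpful assistant."):
    # A private context: nothing in here is visible to the main agent.
    transcript = [
        f"system: {system_prompt}",
        f"objective: {description}",
        # Stand-ins for the subagent's own planning and tool calls:
        "tool: read sales_data.csv (400 lines)",
        "tool: IQR outlier scan (200 lines)",
    ]
    # Only a concise result crosses the boundary back to the coordinator.
    return f"summary after {len(transcript)} private turns: outlier scan complete"

main_context = ["user: analyse sales_data.csv"]
result = run_subagent(
    "Load sales_data.csv (the path must be stated explicitly, since state "
    "is not shared). Detect outliers using the IQR method and return a "
    "plain-text summary."
)
main_context.append(result)

# The main agent's context grew by a single message, not a full transcript.
print(len(main_context))  # → 2
```

Note how the file path is repeated inside the description: because the subagent starts from a clean context, anything it needs to know has to travel through that one argument.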

Automatic Context Compression

On a long task, the conversation history between the agent and its tools grows continuously. Every tool call, every result, every reasoning step adds to it. Left unchecked, this fills the context window and stops the task cold. Automatic context compression prevents this by summarising older parts of the conversation and writing the full history to the filesystem, so the agent can keep working without starting over.

Deep Agents compress context automatically. When the context reaches a threshold, the framework compresses the oldest portion of the conversation into a short summary, saves the full history to a file in the working directory, and continues with the summary in place of the original turns. Let's look at an example. Assume our model generated the following todos:

write_todos(todos=[
    "Load sales_data.csv and inspect its structure",
    "Compute summary statistics for all numeric columns",
    "Detect outliers using the IQR method",
    "Compute pairwise correlations for numeric columns",
    "Analyse categorical columns and their value distributions",
    "Write the final report to report.html"
])

Based on these todos, the agent works through the steps turn by turn until every task on the list is complete. That process will look similar to this:

[turn 1]  user: analyse sales_data.csv and produce a report
[turn 2]  agent: calls write_todos with 6 steps
[turn 3]  tool: todos written

[turn 4]  agent: calls load_csv_info
[turn 5]  tool: returns 400 lines of column metadata
[turn 6]  agent: calls write_file, saves output to structure.txt
[turn 7]  tool: file written
[turn 8]  agent: calls grep on structure.txt to verify column types
[turn 9]  tool: returns matching lines
[turn 10] agent: marks step 1 complete, calls update_todos
[turn 11] tool: todos updated

...        (steps 2 and 3 continue across turns 12–27)

[turn 28] agent: calls compute_correlations
[turn 29] tool: returns 500-line correlation matrix
[turn 30] agent: calls write_file, saves output to correlations.txt
[turn 31] tool: file written, step 4 of 6 in progress

As you can see, we have already gone through 31 turns, but two more tasks still remain. At this point, depending on how we set up our agent, automatic context compression might kick in and compress what the agent has worked on up until now, so the history ends up looking like this instead:

[summary] The agent is analysing sales_data.csv. It has completed steps 1–3:
          loaded the dataset (12,000 rows, 18 columns) and saved structure to
          structure.txt, computed summary statistics saved to summary_stats.txt,
          and detected outliers across all numeric columns saved to outliers.txt.
          Step 4 is in progress: computing pairwise correlations.
[turn 31] tool: file written, step 4 of 6 in progress

The full history up to the compression point is written to the working directory. A summary replaces it in context. The most recent turns are kept as-is, so the agent retains full detail about what it was just doing. The task list and all files in the virtual filesystem are unaffected throughout.
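The trim-and-archive step can be sketched as follows. This is a simplified stand-in: a turn-count threshold replaces a real token threshold, and `summarize` replaces the model-generated summary:

```python
# Illustrative sketch of threshold-based context compression.
# Not the deepagents implementation.

MAX_TURNS = 8    # stand-in for a token threshold
KEEP_RECENT = 3  # the most recent turns are kept verbatim

def summarize(turns):
    # In a real agent an LLM writes this; here we just count what was folded in.
    return f"[summary] {len(turns)} earlier turns compressed."

def compress(history, archive):
    if len(history) <= MAX_TURNS:
        return history  # under the threshold, nothing to do
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    archive.extend(old)               # full history goes to the filesystem
    return [summarize(old)] + recent  # a summary replaces the old turns

archive = []
history = [f"turn {i}" for i in range(1, 11)]  # 10 turns, over the threshold
history = compress(history, archive)

print(history)       # → ['[summary] 7 earlier turns compressed.', 'turn 8', 'turn 9', 'turn 10']
print(len(archive))  # → 7
```

Nothing is lost in the exchange: the archive holds every trimmed turn in order, mirroring how a Deep Agent writes the full history to its working directory before continuing.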

A standard ReAct agent has no equivalent mechanism. When it hits the context limit mid-task, the task fails. A Deep Agent is not racing against that limit. As the conversation grows the framework trims it, and work continues. Combined with the virtual filesystem, this makes large intermediate outputs a non-issue: the agent writes them to files before they accumulate in context, and if the conversation itself gets long, compression handles that too.

After a compression event, the agent can still recover anything from earlier in the task. The full history file in the working directory contains every turn in order, and the agent can use read_file or grep to retrieve specific results without loading the whole thing. The task list and saved output files are never compressed away at all; they live in the filesystem independently of the conversation. This is what makes compression safe: the agent does not need the conversation history to know where it is or what it has already produced.

From Theory to Practice

Deep Agents represent a meaningful step beyond the standard tool-calling loop. By combining structured planning, a virtual filesystem, subagent delegation, and automatic context compression, they address the core limitations that cause basic agents to fall apart on complex, multi-step tasks.

None of these ideas are individually radical. Planning, delegation, and memory management are well-understood concepts in software engineering. What makes the pattern worth paying attention to is how it packages them into a single coherent framework that an LLM can operate within autonomously, without the developer having to wire up each piece by hand.

That said, understanding the architecture is only half the picture. Knowing that a Deep Agent can plan, offload results, and spawn subagents doesn't tell you much about what it looks like to actually build one, or where the rough edges show up in practice. In the next article, we'll move from theory to implementation, demonstrating how to set up a Deep Agent using LangChain's deepagents library, configure its tools, and run it on a real task end to end.

Boris Delovski

Data Science Trainer


Boris is a data science trainer and consultant who is passionate about sharing his knowledge with others.

Before Edlitera, Boris applied his skills in several industries, including neuroimaging and metallurgy, using data science and deep learning to analyze images.