Deep Agents, Part 2: Building a Local Code Debugger with Gemma 4 E4B

Learn how to build an autonomous Python code review agent using LangChain's Deep Agents framework and a locally running model
By Boris Delovski • Updated on Apr 15, 2026

In the first part of this series, we looked at what Deep Agents are and how their four core components solve the problems that cause basic agent loops to break down on complex tasks. In this article, we move from theory to practice and build something concrete. We are going to build a fully autonomous code review agent that reads Python files, identifies bugs and anti-patterns, writes a corrected version, and produces a structured report. For this, we will use Deep Agents as the agent harness, Gemma 4 E4B as the model, and Ollama to serve the model locally.

Running a Model Locally with Ollama

Before writing any agent code, we need a model to run it on. Ollama is an open-source tool that handles the entire lifecycle of working with local models: downloading weights, managing quantized variants, loading them into memory, and exposing them over a REST API. As mentioned previously, in this article we will use Gemma 4 E4B as our model of choice, Google DeepMind's efficient E4B variant, designed to run in 8 GB of RAM with no GPU required. Ollama makes local inference with it frictionless, letting us focus on the agentic system itself and on making sure it works as intended. You can download Ollama here:

https://ollama.com/download/windows

After installing it on your platform of choice, run the following in your terminal to download the Gemma 4 E4B model and prepare it for use:

ollama pull gemma4:e4b

The model should start downloading immediately. To confirm the download succeeded, run the following command in the terminal:

ollama list

This command lists every model available locally. If the download succeeded, gemma4:e4b should appear in that list.

An Intentionally Buggy Python File

Our goal is to implement an autonomous agent capable of code analysis and automated debugging, concluding with a summary report of all resolved issues. Before creating this agent, however, we first need some buggy code to test it on. We will create a Python file with multiple bugs spanning resource management, scoping, naming, error handling, and style. The important design decision is that the file contains no comments or hints: the agent has to discover every issue through genuine analysis before it can fix anything. Let's create this Python file:

from pathlib import Path

BUGGY_CODE = '''
import os, sys, json

USER_DATA = []

def load_users(filepath):
    f = open(filepath)
    data = json.loads(f.read())
    USER_DATA = data
    return data

def get_user(id):
    for user in USER_DATA:
        if user["id"] == id:
            return user

def divide(a, b):
    return a / b

def process_users():
    users = load_users("users.json")
    for i in range(len(users)):
        user = users[i]
        print("Processing: " + str(user))
'''

Path("sample_code.py").write_text(BUGGY_CODE.strip(), encoding="utf-8")

The bugs that appear in this file are:

  • multiple imports on the same line
  • the file handle is opened but never closed
  • USER_DATA = data creates a local variable instead of updating the module-level list (missing global)
  • parameter named id shadows Python's built-in
  • get_user falls off the end without returning or raising, so the caller gets None silently
  • divide crashes with ZeroDivisionError if b is 0
  • for i in range(len(users)) is an anti-pattern in Python
  • string concatenation with + instead of an f-string

All of these are real bugs or anti-patterns you would find in production code. The file is syntactically valid, meaning it will compile without errors. The bugs are logical and stylistic, which means the agent cannot rely on a simple syntax check to find them.


Giving the Agent a Custom Tool

Deep Agents ships with built-in tools for all common file operations. These include:

  •  ls
  •  read_file
  •  write_file
  •  edit_file
  •  glob
  •  grep

These cover everything the agent needs to read the source file, write a corrected version, and save a report. The only capability that is not built in is syntax-checking Python code, which is domain-specific, so we add one custom tool.

import py_compile

def run_syntax_check(filepath: str) -> str:
    """
    Run a Python syntax check (py_compile) on the given file.
    Pass the exact path of the file you want to validate.
    """
    path = Path(filepath)
    if not path.exists():
        return f"ERROR: '{filepath}' not found."
    try:
        py_compile.compile(str(path), doraise=True)
        return f"Syntax OK - '{path}' compiles without errors."
    except py_compile.PyCompileError as e:
        return f"Syntax error in '{path}':\n{e}"

Deep Agents automatically converts any Python function with a docstring into a tool the model can call. The function name becomes the tool name, and the docstring becomes the description the model reads to decide when and how to use it. The type hints are used to build the tool's JSON schema.

Let's take a deeper look at the code above. The function checks whether a Python file is free of syntax errors without actually running it, using py_compile, a module in Python's standard library that can parse a .py file and report any syntax problems. The function takes a single argument, filepath, a string containing the path to the file you want to check, and it returns a string describing the result. That return type matters because the function is designed to be called by an AI agent, which can only read text back from a tool and cannot interpret Python objects or exceptions directly.

The first thing the function does is convert the raw string path into a Path object, then checks whether that file actually exists on disk. If it does not, it returns an error message immediately rather than letting py_compile throw a confusing exception. If the file does exist, it calls py_compile.compile with doraise=True. That flag is important. By default, py_compile silently swallows errors and returns without raising anything, which would make it impossible to detect failures. With doraise=True, it raises a PyCompileError if the file has a syntax problem, which the except block catches and formats into a readable error message.

If no exception is raised, the file parsed successfully and the function returns a confirmation string. If an exception is raised, it returns a description of the syntax error instead.
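To see all three of the tool's branches in isolation, outside the agent, we can exercise it with throwaway files in a temporary directory (the file names here are arbitrary):

```python
import py_compile
import tempfile
from pathlib import Path

def run_syntax_check(filepath: str) -> str:
    """Run a Python syntax check (py_compile) on the given file."""
    path = Path(filepath)
    if not path.exists():
        return f"ERROR: '{filepath}' not found."
    try:
        py_compile.compile(str(path), doraise=True)
        return f"Syntax OK - '{path}' compiles without errors."
    except py_compile.PyCompileError as e:
        return f"Syntax error in '{path}':\n{e}"

# Exercise the success, failure, and missing-file branches.
tmp = Path(tempfile.mkdtemp())
good = tmp / "good.py"
good.write_text("x = 1\n", encoding="utf-8")
bad = tmp / "bad.py"
bad.write_text("def broken(:\n", encoding="utf-8")

print(run_syntax_check(str(good)))               # Syntax OK - ...
print(run_syntax_check(str(bad)))                # Syntax error in ...
print(run_syntax_check(str(tmp / "missing.py")))  # ERROR: ... not found.
```

Every branch returns a plain string, which is exactly the contract the agent needs: whatever happens, the model gets readable text it can reason about on the next loop iteration.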

Writing the System Prompt

The system_prompt parameter in create_deep_agent is appended to the library's built-in system prompt. You do not need to re-explain the built-in tools, as the default prompt already teaches the model when and how to use ls, read_file, write_file, write_todos, and so on. Your custom prompt should contain three things: the agent's role, domain-specific instructions about what to produce, and documentation for any custom tools you have added. Here is our system prompt:

SYSTEM_PROMPT = """You are an expert Python code reviewer and debugger. 

Analyze the target file for bugs, anti-patterns, and style issues.
Write the corrected version as `sample_code_fixed.py`.
Validate it compiles using `run_syntax_check` (pass the exact file path).

## run_syntax_check
Call this tool with the path to a Python file to verify it compiles without syntax errors via `py_compile`.

## Report format
Code Review Report — <filename>

Summary
<2–3 sentences on overall quality>

Issues Found
Critical
<issue + line reference>
Major
<issue + line reference>
Minor / Style
<issue + line reference>

Changes Made
<describe each change and why>

Verdict
<rating: Poor / Fair / Good / Excellent, and key takeaway>"""

This system prompt defines a specialized persona for an AI agent tasked with autonomous Python code quality control. It instructs the model to identify bugs and style violations, generate a corrected file, and perform a mandatory compilation check using a provided syntax tool. Finally, it enforces a structured reporting format to ensure the agent delivers consistent, categorized feedback and a definitive quality rating for every reviewed file.

Notice what is absent from the prompt. There is no workflow section walking through steps like "first use ls, then use read_file". The built-in system prompt already contains detailed instructions for these tools, and re-specifying them wastes tokens and can confuse the model with conflicting instructions.

Understanding the Backend

This is the most important concept in the implementation, and it is the one most likely to cause silent failures if you get it wrong. 

The default backend is StateBackend, which stores everything in memory as part of the LangGraph agent state. It works like a temporary scratchpad, meaning that the agent can write and read files freely during a session, but everything disappears the moment the thread ends. This is the safest default because it gives the agent a place to work without touching anything on disk.

FilesystemBackend is for local development. It maps the agent's file tools directly to a real directory on your machine, so when the agent writes a file, it actually appears on disk. The virtual_mode=True option adds path sandboxing to prevent the agent from reading or writing outside the directory you specify, but there is no deeper process-level isolation, which is why it is not recommended for production.
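The exact sandboxing logic is internal to deepagents, but the core idea behind virtual_mode, resolving every requested path and rejecting anything that escapes the root directory, can be sketched in a few lines (is_inside is a hypothetical helper for illustration, not part of the library):

```python
from pathlib import Path

def is_inside(root: Path, candidate: str) -> bool:
    """Return True only if candidate stays within root after resolution."""
    root = root.resolve()
    target = (root / candidate).resolve()  # collapses any ".." components
    return target == root or root in target.parents

root = Path(".")
print(is_inside(root, "sample_code.py"))   # True: stays inside the root
print(is_inside(root, "../outside.txt"))   # False: escapes the sandbox
```

Note that this is purely a path-level check. A compromised or misbehaving process could still do anything the OS user can do, which is why production deployments should reach for SandboxBackend instead.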

StoreBackend is for situations where you need the agent's work to survive beyond a single session. It backs the filesystem with a database, so files written in one run can be read back in a later one. This is useful for long-running workflows or agents that are meant to accumulate knowledge over time.

SandboxBackend provides proper execution isolation by running the agent inside a container through services like Modal, Daytona, or Deno. Unlike FilesystemBackend, which only restricts paths, a sandbox restricts what the process itself can do. This is the appropriate choice for production deployments where the agent is working with untrusted input or sensitive environments.

CompositeBackend lets you combine the others by routing different file paths to different backends. For example, you could send anything written to a /memory/ path to StoreBackend for persistence, while keeping everything else in the ephemeral StateBackend. It is useful when a single backend does not fit the whole workflow.

In this example, we are using a locally deployed model to debug code found in a file we created and saved on our disk, so our backend of choice will be the FilesystemBackend.

from deepagents.backends import FilesystemBackend

local_backend = FilesystemBackend(root_dir=".", virtual_mode=True)

Creating and Running the Agent

With the tool and the backend defined, creating the agent is straightforward and boils down to just a few lines of code:

from deepagents import create_deep_agent

agent = create_deep_agent(
    model="ollama:gemma4:e4b",
    tools=[run_syntax_check],
    system_prompt=SYSTEM_PROMPT,
    backend=local_backend
)

In the code above, create_deep_agent returns a compiled LangGraph graph, a stateful and checkpointed execution loop. The library automatically adds all built-in tools on top of whatever you provide in tools. Your system_prompt is appended to the library's default. The backend wires up the file tools to your actual working directory.

Running the agent is also very straightforward, and consists of defining a prompt and sending it to the agent:

USER_REQUEST = (
    "Review `sample_code.py`, fix all bugs and anti-patterns, "
    "write the corrected version as `sample_code_fixed.py`, and produce a "
    "structured review report called `review_report.md` that you will write to the same folder as the fixed file."
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": USER_REQUEST}]}
)

print(result["messages"][-1].content)

agent.invoke runs the full ReAct loop synchronously. The input is a standard chat message dictionary. Under the hood, the loop works exactly as described in Part 1. The model reads the system prompt and user message, decides which tool to call, runs the tool, appends the result, and reasons again. A typical run for this task involves 5 to 10 tool calls, moving from ls to read_file to write_todos to write_file (fixed code) to run_syntax_check to write_file (report) and finally a closing message.

If you are running this code on a CPU, it might take a while for the agent to finish working. On the other hand, even though Gemma 4 E4B was designed so that it can run without GPU acceleration, it can still take advantage of a compatible GPU if you have one. Running the agent on a high-end consumer GPU instead of on a CPU will cut down the time needed for the agent to fix the file and generate a report from a few minutes to less than 10 seconds.

Inspecting the Results

After the agent finishes, two new files should exist on disk, sample_code_fixed.py and review_report.md. This is what the fixed code looks like:

import json
from typing import Dict, Any, List, Optional

# Global state management is discouraged. We will pass the loaded data structure
# instead of relying on a global variable.

def load_users(filepath: str) -> Optional[Dict[str, Dict[str, Any]]]:
    """
    Loads user data from a JSON file and converts it into a dictionary
    keyed by user ID for efficient lookup.

    :param filepath: The path to the users JSON file.
    :return: A dictionary {user_id: user_data}, or None if loading fails.
    """
    try:
        # Use 'with' statement for safe file handling (ensures file closure)
        with open(filepath, 'r') as f:
            data: List[Dict[str, Any]] = json.load(f)
    except FileNotFoundError:
        print(f"Error: The file '{filepath}' was not found.")
        return None
    except json.JSONDecodeError:
        print(f"Error: The file '{filepath}' contains invalid JSON.")
        return None
    except Exception as e:
        print(f"An unexpected error occurred while loading users: {e}")
        return None

    # Anti-pattern fix: Convert list to dictionary for O(1) lookup efficiency
    user_dict: Dict[str, Dict[str, Any]] = {user["id"]: user for user in data if "id" in user}
    return user_dict

def get_user(user_dict: Dict[str, Dict[str, Any]], user_id: str) -> Optional[Dict[str, Any]]:
    """
    Retrieves a user by ID from the pre-loaded user dictionary.
    Uses O(1) complexity lookup.

    :param user_dict: Dictionary containing user data keyed by ID.
    :param user_id: The ID of the user to retrieve.
    :return: The user's data dictionary, or None if not found.
    """
    return user_dict.get(user_id)

def divide(a: float, b: float) -> Optional[float]:
    """
    Divides two floating-point numbers, handling division by zero.

    :param a: The numerator.
    :param b: The denominator.
    :return: The result of a / b, or None if b is zero.
    """
    if b == 0:
        print("Warning: Division by zero attempted.")
        return None
    return a / b

def process_users(filepath: str) -> None:
    """
    Loads users, processes them, and simulates work.
    """
    users_dict = load_users(filepath)
    if users_dict is None:
        print("Processing aborted due to loading error.")
        return

    print(f"Successfully loaded {len(users_dict)} users.")

    # Modern Python iteration: Direct iteration is preferred over range(len(list))
    for user_id, user in users_dict.items():
        # Using f-strings for cleaner output formatting
        print(f"Processing user with ID: {user_id}")

# Example usage section (assuming users.json exists and has the right structure)
if __name__ == "__main__":
    # NOTE: This requires a valid 'users.json' file in the execution directory.
    # Example dummy setup (uncomment if you want to test structure):
    # with open("users.json", "w") as f:
    #     json.dump([{"id": "user1", "name": "Alice"}, {"id": "user2", "name": "Bob"}], f)

    # Simulate calling the main process function
    process_users("users.json")

    # Load users_dict directly into the main scope for the examples below
    users_dict = load_users("users.json")

    if users_dict is not None:
        test_user = get_user(users_dict, "user1")
        if test_user:
            print(f"\nRetrieved user1 data: {test_user}")

    result = divide(10.0, 2.0)
    print(f"10.0 / 2.0 = {result}")

    result_zero = divide(10.0, 0.0)
    print(f"10.0 / 0.0 = {result_zero}")

As you can see, Gemma did fix the mistakes. By implementing a with open(...) context manager, the agent resolved a potential resource leak where the file handle was previously left open. It also hardened the script against runtime crashes by adding a zero-division guard in the divide function and replacing silent failures with an explicit dict.get() call, ensuring more predictable return types. The refactoring also prioritized performance and readability through more "Pythonic" idioms: the range(len()) loop was replaced with direct .items() iteration, and multiple imports were split across separate lines for clarity. To prevent unexpected side effects, the agent eliminated the risky global state entirely, opting for structured data passing and renaming the id parameter to user_id to avoid shadowing Python's built-in. Finally, the agent modernized the script's output by replacing clunky string concatenation with f-strings.
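The move from a linear scan to a dictionary keyed by id is worth a closer look, since it changes both the lookup cost and the failure mode. A simplified comparison (illustrative data, not from the agent's output):

```python
users_list = [{"id": "user1", "name": "Alice"}, {"id": "user2", "name": "Bob"}]

# Original pattern: O(n) scan that silently falls off the end on a miss.
def get_user_scan(user_id):
    for user in users_list:
        if user["id"] == user_id:
            return user

# Fixed pattern: key the data by id once, then look up in O(1).
users_dict = {u["id"]: u for u in users_list}

print(get_user_scan("user2"))    # {'id': 'user2', 'name': 'Bob'}
print(users_dict.get("user2"))   # same result, constant time
print(users_dict.get("nobody"))  # None, but now an explicit, documented contract
```

Both versions return None on a miss, but dict.get makes that behavior an intentional part of the interface rather than an accident of control flow.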

However, if you inspect the markdown report thoroughly you will notice it reads more like a general refactoring summary than a thorough accounting of every specific bug that was found and addressed. So while Gemma did solid work on the actual code, its report was incomplete. This is a common characteristic of smaller LLMs. While they are increasingly proficient at recognizing and fixing patterns in code, their reasoning density and instruction-following persistence often hit a ceiling during multi-step tasks.

Finally, it is worth verifying the syntax of the fixed file independently, not because you do not trust the agent, but because agents are probabilistic and your validation code is deterministic.

try:
    py_compile.compile("sample_code_fixed.py", doraise=True)
    print("Syntax OK.")
except py_compile.PyCompileError as e:
    print(f"Syntax error remains: {e}")

This is the same check the agent ran via run_syntax_check. Running it again here is a trust-but-verify pattern that should be standard practice for any agent-generated code, and in this case it prints "Syntax OK.", confirming that the fixed file compiles.
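Syntax is a low bar, though. A useful extension (not part of the article's agent run, just a sketch) is to import the agent-written file and exercise the fixed behavior directly. To keep the snippet self-contained, it writes a miniature stand-in for sample_code_fixed.py first:

```python
import importlib.util
import tempfile
from pathlib import Path

def behavioral_check(module_path: str) -> bool:
    """Import an agent-written module and exercise one of its fixed functions."""
    spec = importlib.util.spec_from_file_location("fixed_module", module_path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)  # __name__ != "__main__", so demo code is skipped
    # The zero-division guard should return None instead of raising.
    return mod.divide(10.0, 2.0) == 5.0 and mod.divide(1.0, 0.0) is None

# Miniature stand-in for the agent's output, so this snippet runs anywhere.
fixed = Path(tempfile.mkdtemp()) / "sample_code_fixed.py"
fixed.write_text(
    "def divide(a, b):\n"
    "    if b == 0:\n"
    "        return None\n"
    "    return a / b\n",
    encoding="utf-8",
)
print(behavioral_check(str(fixed)))  # True
```

A compile check proves the agent wrote valid Python; a behavioral check like this proves it wrote the Python you asked for.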

Practical Considerations When Using Deep Agents

In conclusion, Deep Agents serves as an opinionated harness that handles the heavy lifting of planning, file I/O, and context management out of the box. This architecture allows developers to focus on the creative aspects of agent design: selecting the right model, crafting domain-specific tools, and refining system prompts. However, success depends heavily on a proper understanding of the backend. The most common mistake is using the default StateBackend when you intended FilesystemBackend: the agent runs and reports success, but no files appear on disk because everything was written to memory and discarded.

Efficiency also requires avoiding tool redundancy. Developers should resist the urge to build custom functions that mirror built-in capabilities, such as creating a specialized Python file reader when a general file tool already exists. Providing multiple ways to perform the same task forces the model to choose, often leading to inconsistent results. Instead, leverage the backend to ensure built-in tools interact seamlessly with your specific file structures.

Remember that, when working with smaller models, prompt engineering becomes a critical safety rail. Because these models are prone to shortcuts, using explicit, forceful instructions is essential for compliance. This, combined with independently verifying the output that the agent produced, is the only way to ensure the agent's creative solutions align with functional reality.

Boris Delovski

Data Science Trainer


Boris is a data science trainer and consultant who is passionate about sharing his knowledge with others.

Before Edlitera, Boris applied his skills in several industries, including neuroimaging and metallurgy, using data science and deep learning to analyze images.