    __  __     ____           ___                    __  _
   / / / /__  / / /___       /   | ____ ____  ____  / /_(_)________ _
  / /_/ / _ \/ / / __ \     / /| |/ __ `/ _ \/ __ \/ __/ / ___/ __ `/
 / __  /  __/ / / /_/ /    / ___ / /_/ /  __/ / / / /_/ / /__/ /_/ /
/_/ /_/\___/_/_/\____( )  /_/  |_\__, /\___/_/ /_/\__/_/\___/\__,_/
                     |/         /____/
Like this? Get Agentica to make it.

How do you use Agentica?

Prerequisites:
  • Install agentica
  • Add your AGENTICA_API_KEY
There are two main ways to use Agentica:
  • creating an agentic function
  • spawning an agent with the spawn function
See the references for more details.
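As a taste of both patterns, here is a minimal sketch. The `get_weather` tool, its behaviour, and the prompts are illustrative only (not part of the library); the full, working versions of both patterns appear in the examples and walk-throughs below.
import asyncio
from agentica import agentic, spawn

# Hypothetical tool: any plain Python callable can be handed to an agent.
def get_weather(city: str) -> str:
    return "sunny"

# 1. An agentic function: the docstring is the prompt and the body stays empty.
@agentic(get_weather, model="openai:gpt-4.1")
async def weather_report(city: str) -> str:
    """Summarise today's weather for the given city in one cheerful sentence."""
    ...

# 2. A spawned agent: pass a premise and a scope of objects it may use.
async def main() -> None:
    print(await weather_report("London"))

    agent = await spawn(
        "Answer questions about the weather using the tools provided.",
        scope={"get_weather": get_weather},
    )
    print(await agent.call(str, "What is the weather like in London?"))

asyncio.run(main())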

What can you use Agentica for?

Below are a few examples that we believe highlight some of the best features of Agentica!

Grab and go

Install the prerequisites, copy the code, and off you go.
Prerequisites:
  • Run pip install slack-sdk or uv add slack-sdk
  • Add your SLACK_BOT_TOKEN
Read these instructions to generate a SLACK_BOT_TOKEN!
import os
import asyncio
from agentica import agentic
from slack_sdk import WebClient

SLACK_BOT_TOKEN = os.environ.get("SLACK_BOT_TOKEN")

# We know we will want to send a direct message
slack_conn = WebClient(token=SLACK_BOT_TOKEN)
send_direct_message = slack_conn.chat_postMessage

@agentic(send_direct_message, model="openai:gpt-4.1")
async def send_morning_message(user_name: str) -> None:
    """
    Uses the Slack API to send a direct message to a user. Light and cheerful!
    """
    ...

if __name__ == "__main__":
    asyncio.run(send_morning_message('@Samuel'))
    print("Morning message sent!")
Prerequisites:
  • Run pip install matplotlib pandas ipynb jupyter or uv add matplotlib pandas ipynb jupyter
  • Download the CSV and save it as ./movie_metadata.csv
  • Run jupyter notebook data_science.ipynb
data_science.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
    "from agentica import spawn\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
    "agent = await spawn()\n",
    "result = await agent.call(\n",
    "    dict[str, int],\n",
    "    \"Show the number of movies for each major genre. The results can be in any order.\",\n",
    "    movie_metadata_dataset=pd.read_csv(\"./movie_metadata.csv\").to_dict(),\n",
    ")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "plt.figure(figsize=(12, 8))\n",
    "plt.bar(list(result.keys()), list(result.values()))\n",
    "plt.xticks(rotation=45, ha='right')\n",
    "plt.tight_layout()\n",
    "plt.show()\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
    "result = await agent.call(\n",
    "    dict[str, int],\n",
    "    \"Update the result to only contain the genres that have more than 1000 movies.\",\n",
    ")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
    "plt.figure(figsize=(12, 8))\n",
    "plt.bar(list(result.keys()), list(result.values()))\n",
    "plt.xticks(rotation=45, ha='right')\n",
    "plt.tight_layout()\n",
    "plt.show()\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
    "name": "ipython",
    "version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Prerequisites:
  • If on macOS, install system dependencies with brew install pkg-config cairo meson ninja
  • Run pip install exa-py validators markdown xhtml2pdf
  • Create an EXA account, create an EXA_SERVICE_API_KEY and run export EXA_SERVICE_API_KEY="<your-key-here>"
"""
Deep Research Demo - Multi-agent research with web search and citations.
"""

import asyncio
import json
import re
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Literal

import markdown
from xhtml2pdf import pisa

from agentica.agent import Agent
from agentica.logging import AgentListener
from agentica.std.caption import CaptionLogger
from agentica.std.web import ExaAdmin, ExaClient, SearchResult

type SourceType = Literal[
    "primary",
    "secondary",
    "vendor",
    "press",
    "blog",
    "forum",
    "unknown",
]

LEAD_RESEARCHER_MODEL = "anthropic:claude-opus-4.5"
SUBAGENT_MODEL = "anthropic:claude-sonnet-4.5"
CITATION_MODEL = "openai:gpt-4.1"

CITATION_PREMISE = """
You are a citation agent.

# Task
You must:
1. Review the research report provided to you as `research_report` line by line.
2. Identify which lines of the research report use information that could be from web search results.
3. List the web search results that were used in creating the research report.
4. For each of these lines, use the `load_search_result` function to load the web search result that was used.
5. Add a markdown citation with the URL of the web search result to the claim in the research report by modifying the `research_report` variable.
6. Once this is done, make sure the `research_report` is valid markdown - if not, change the markdown to make it valid.
7. Use the `save_report` function to save the research report to memory as a markdown file at the end.
8. Return saying you have finished.

# Rules
- Your citations MUST be consistent throughout the `research_report`.
- Any URL in the final markdown MUST be formatted as a markdown link, not a bare URL.
- You MUST use the `list_search_results` function to list the web search results that were used in creating the research report
- You MUST use the `load_search_result` function to load the web search results.
- You MUST use the `research_report` variable provided to you to modify the research report by adding citations.
- You MUST make sure the `research_report` is valid markdown.
- You MUST use the `save_report` function to save the research report to memory at the end.
- You MUST inspect the report before saving it to make sure it is valid and what you intended. Iterate until it is valid.

## Citation format
- Prefer inline citations like: `... claim ... ([source](https://example.com))`
- If multiple sources support a sentence, include multiple links: `... ([s1](...), [s2](...))`
"""

LEAD_RESEARCHER_PREMISE = """
You are a lead researcher. You have access to web-search enabled subagents.

# Task
You must:
1. Create a plan to research the user query.
2. Determine how many specialised subagents (with access to the web) are necessary, each with a different specific research task.
3. Call ALL subagents in parallel using asyncio.gather with return_exceptions=True so partial results are preserved.
4. Summarise the results of the subagents in a final research report as markdown. Use sections, sub-sections, lists and formatting to make the report easy to read and understand. The formatting should be consistent and easy to follow.
5. Check the final research report, as this will be shown to the user.
6. Return the final research report using `return` at the very end.

# Rules
- Do NOT construct the final report until you have run the subagents.
- Do NOT return the final report in the REPL until planning, assigning subagents and returning the final report is complete.
- Do NOT add citations to the final research report yourself, this will be done afterwards.
- Do NOT repeat yourself in the final research report.
- You MUST raise an AgentError if you cannot complete the task with what you have available.
- You MUST check the final research report string before returning it to the user.

## Planning
- You MUST write the plan yourself.
- You MUST write the plan before assigning subagents to tasks.
- You MUST break down the task into small individual tasks.

## Subagents
- You MUST assign each small individual task to a subagent.
- For each task, YOU MUST create a **new** SubAgent, and provide it with a task via `.call()`.
- You MUST NOT assign multiple unrelated tasks to the same SubAgent.
- You should only call a SubAgent repeatedly if you feel you failed to get enough information from a single call, instructing them with what they were missing.
- You MUST instruct subagents to use the web_search and save_search_result functions if the task requires it.
- Do NOT ask subagents to cite the web, instead instruct them to use the save_search_result function.
- Subagents MUST be assigned independent tasks.
- IF after subagents have returned their findings more research is needed, you can assign more subagents to tasks.
- DO NOT try to preemptively *parse* the output of the subagents, **just look at the output yourself**.
- Subagents may fail! `asyncio.gather` will raise an exception if any of the subagents fail. Instead, you should pass `return_exceptions=True` to `asyncio.gather` to not lose the results of the successful subagents.

## Final Report
- Do NOT write the final report yourself without running subagents to do so.
- Do NOT add citations to the final research report yourself, this will be done afterwards by another agent.
- Do NOT repeat yourself in the final research report.
- Do NOT return a report with missing information, omitted fields or `N/A` values. If more work needs to be done, you must assign more subagents to tasks, or reuse the necessary subagents to extract more information.
- You MUST load the plan from memory before returning the final research report to check that you have followed the plan.
- You MUST check the final research report before returning it to the user.
- Check the final report for quality, completeness and consistency. If up to standard, return using a single `return` as the sole statement in its very own REPL session.
- Your final report MUST include a short "Sources consulted" section:
  - List each source URL you relied on
  - Include its source_type and 1-2 extracted claims
- Any URL you include MUST be a markdown hyperlink (not a bare URL).
- Do NOT put the whole report in a table.
"""

SUBAGENT_PREMISE = """
You are a helpful assistant.

# Task
You must:
1. Construct a list of things to search for using the web_search function.
2. Execute ALL web_search calls in parallel using asyncio.gather and asyncio.run.
3. For each search result, `print()` relevant sections using SearchResult.content_with_line_numbers(start=..., end=...).
4. Identify which lines of content you are going to use in your report.
5. Use the save_search_result function to save the SearchResult to memory and include the lines of the content that you have used.
   - Include the specific `query` you searched for.
   - Include `extracted_claims`: a list of short claims you will rely on (derived from the saved lines).
   - Include `source_type`: one of ["primary", "secondary", "vendor", "press", "blog", "forum", "unknown"].
     Use your best judgment based on the URL/domain and the content.
   - IMPORTANT: save_search_result returns a saved artifact path; keep it and include it in SourceInfo.artifact_path
6. Condense the search results into a single report with what you have found.
7. Return the report using `return` at the very end in a separate REPL session.

# Rules
- You MUST use `print()` to print the content of each search result by via SearchResult.content_with_line_numbers().
- You MUST use the web_search function if instructed to do so OR if the task requires finding information.
- Do NOT assume that the web_search function will return the information you need, you must go through the content of each search result line by line by combing through the content with SearchResult.content_with_line_numbers(start=, end=).
- Do NOT assume which lines of content you are going to use in your report, you must go through the content of each search result line by line via SearchResult.content_with_line_numbers(start=, end=).
- If you cannot find any information, do NOT provide information yourself, instead raise an error for the lead researcher in the REPL.
- You MUST save the SearchResult of any research that you have used to memory and include the lines of the content that you have used (are relevant).
- When saving, pass `query`, `extracted_claims`, and `source_type` to save_search_result.
- Your returned SubAgentReport MUST include `sources`: one entry per saved source, including url, source_type, query, extracted_claims, artifact_path, and lines_used.
- Return the report using `return` at the very end in a separate REPL session.
"""

STORAGE_DIR = Path("deep_research_test")

@dataclass
class Storage:
    """Centralized storage for all research artifacts."""

    directory: Path = field(default=STORAGE_DIR)
    _result_counts: dict[int, int] = field(default_factory=dict)

    def __post_init__(self):
        self.directory.mkdir(parents=True, exist_ok=True)

    # Plan

    def save_plan(self, plan: str) -> None:
        """Save the research plan."""
        (self.directory / "plan.md").write_text(plan)

    def load_plan(self) -> str:
        """Load the research plan."""
        path = self.directory / "plan.md"
        if not path.exists():
            raise FileNotFoundError("Plan file not created yet.")
        return path.read_text()

    # Search Results

    def save_search_result(
        self,
        subagent_id: int,
        result: SearchResult,
        lines_used: list[tuple[int, int]],
        *,
        query: str | None = None,
        extracted_claims: list[str] | None = None,
        source_type: SourceType | None = None,
        source_notes: str | None = None,
    ) -> str:
        count = self._result_counts.get(subagent_id, 0) + 1
        self._result_counts[subagent_id] = count

        path = self.directory / f"subagent_{subagent_id}" / f"result_{count}.json"
        path.parent.mkdir(parents=True, exist_ok=True)

        # Extract only the relevant lines
        filtered_lines: list[str] = []
        for start, end in lines_used:
            filtered_lines.extend(result.content_lines[start - 1 : end])

        data = {
            "title": result.title,
            "url": result.url,
            "content_lines": filtered_lines,
            "score": result.score,
            # Rich artifact metadata (kept compatible with SearchResult.load()).
            "saved_at": datetime.now().isoformat(),
            "subagent_id": subagent_id,
            "query": query,
            "lines_used": lines_used,
            "extracted_claims": extracted_claims or [],
            "source_type": source_type,
            "source_notes": source_notes,
        }
        path.write_text(json.dumps(data))
        return str(path)

    def load_search_result(self, path: str) -> SearchResult:
        """
        Load a previously saved search-result artifact (JSON) and return it as a SearchResult.

        Note: artifacts may include extra metadata fields, but SearchResult.load() only uses:
        - title
        - url
        - content_lines
        - score
        """
        p = Path(path)
        if not p.is_relative_to(self.directory):
            raise ValueError(f"Path must be within {self.directory}")
        return SearchResult.load(p)

    def list_search_results(self) -> list[str]:
        """List all saved search result paths."""
        files: list[str] = []
        for subagent_dir in self.directory.glob("subagent_*"):
            if not subagent_dir.is_dir():
                continue
            for file in subagent_dir.iterdir():
                if file.suffix == ".json" and re.match(r"^result_\d+$", file.stem):
                    files.append(str(file))
        return files

    # Report

    def save_report(self, md_report: str) -> str:
        """Save the final report as markdown and PDF."""
        md_path = self.directory / "report.md"
        pdf_path = self.directory / "report.pdf"

        md_path.write_text(md_report)

        try:
            html = markdown.markdown(md_report, extensions=['tables'])
            with pdf_path.open("wb") as pdf:
                pisa.CreatePDF(html, dest=pdf)
        except Exception as e:
            print(f"Warning: PDF conversion failed: {e}")

        return str(pdf_path)

    @property
    def report_path(self) -> Path:
        return self.directory / "report.pdf"

    def report_exists(self) -> bool:
        return (self.directory / "report.md").exists()

    # Summary

    def summary(self) -> str:
        """Return a summary of all stored artifacts."""
        lines = [
            "",
            "━" * 40,
            f"📁 Research stored in: {self.directory.resolve()}",
            "━" * 40,
        ]

        if self.report_exists():
            lines.append(f"📄 Report:  {self.report_path.name}")
            if (self.directory / "report.md").exists():
                lines.append(f"           {(self.directory / 'report.md').name}")

        if (self.directory / "plan.md").exists():
            lines.append("📋 Plan:    plan.md")

        search_results = self.list_search_results()
        if search_results:
            lines.append(f"🔍 Search results: {len(search_results)} files")
            by_subagent: dict[str, list[str]] = {}
            for path in search_results:
                p = Path(path)
                subagent = p.parent.name
                by_subagent.setdefault(subagent, []).append(p.name)
            for subagent, files in sorted(by_subagent.items()):
                lines.append(f"           {subagent}/: {len(files)} results")

        lines.append("━" * 40)
        return "\n".join(lines)

storage = Storage()

class SubAgent:
    """
    A subagent with web search capabilities.
    For each task, a subagent must be **created**, then **run** with `.call()`.
    If a subagent needs to be reused, perhaps because it got something wrong, it
    may be run **again** with a second `.call()`, persisting its history.
    """

    _id: int
    _exa: ExaClient | None
    _agent: Agent
    _initialized: bool

    def __init__(self):
        self._id = 0
        self._exa = None

        async def web_search(query: str) -> list[SearchResult]:
            """Tool: search the web for `query`. Returns a small list of SearchResult objects."""
            print(f"Searching: {query}")
            await self._ensure_init()
            assert self._exa is not None
            return await self._exa.search(query, num_results=2)

        def save_search_result(
            result: SearchResult,
            lines_used: list[tuple[int, int]],
            query: str | None = None,
            extracted_claims: list[str] | None = None,
            source_type: SourceType | None = None,
            source_notes: str | None = None,
        ) -> str:
            """
            Tool: save a SearchResult artifact for later citation/inspection.

            Parameters
            ----------
            result:
                The SearchResult you are using.
            lines_used:
                1-indexed (inclusive) line ranges from result.content_lines that support your claims.
            query:
                The web query you used to find this result (optional but recommended).
            extracted_claims:
                Short bullet claims you will rely on, derived from the saved lines.
            source_type:
                Optional coarse label, e.g. "primary", "secondary", "vendor", "press", "blog", "forum", "unknown".
            source_notes:
                Optional brief notes justifying the label / quality.

            Returns
            -------
            str:
                Path to the saved JSON artifact (within the storage directory).
            """
            return storage.save_search_result(
                self._id,
                result,
                lines_used,
                query=query,
                extracted_claims=extracted_claims,
                source_type=source_type,
                source_notes=source_notes,
            )

        self._agent = Agent(
            model=SUBAGENT_MODEL,
            premise=SUBAGENT_PREMISE,
            scope=dict(
                web_search=web_search,
                save_search_result=save_search_result,
                SearchResult=SearchResult,
                SubAgentReport=SubAgentReport,
                SourceInfo=SourceInfo,
            ),
        )
        self._initialized = False

    async def _ensure_init(self) -> None:
        if self._initialized:
            return
        self._initialized = True

        # Get agent ID from listener
        await self._agent._ensure_init()
        if (listener := self._agent._listener) and listener.logger.local_id:
            self._id = int(listener.logger.local_id)
        else:
            raise ValueError("Agent listener not found")

        # Create ephemeral Exa API key for this subagent
        admin = ExaAdmin()
        key_name = f"SubAgent_{self._id}"
        api_key = await admin.create_key(key_name)
        print(f"Created Exa API key for subagent {self._id}: {api_key[:4]}...{api_key[-4:]}")

        self._exa = ExaClient(api_key=api_key)

    async def call(self, task: str) -> 'SubAgentReport':
        """Run the subagent on a task."""
        await self._ensure_init()
        print(f"Running web-search subagent ({self._id})")
        with CaptionLogger():
            return await self._agent.call(SubAgentReport, task)

@dataclass
class SourceInfo:
    """
    A single source you used in your research.

    Fill this out in your SubAgentReport so the coordinator can understand:
    - what URL you relied on,
    - what you searched for to find it,
    - what claims you are taking from it,
    - and an approximate source category (primary/secondary/vendor/press/blog/forum/unknown).
    """

    url: str
    source_type: SourceType | None = None
    query: str | None = None
    extracted_claims: list[str] = field(default_factory=list)
    artifact_path: str | None = None
    lines_used: list[tuple[int, int]] = field(default_factory=list)

@dataclass
class SubAgentReport:
    """
    Your final output for one subagent task.

    Requirements:
    - `content` may be paraphrased, but MUST be supported by the saved `lines_used`.
    - `sources` must include one SourceInfo per source you relied on.
    """

    title: str
    content: str
    sources: list[SourceInfo] = field(default_factory=list)

class CitationAgent:
    """Agent that adds citations to a research report."""

    def __init__(self):
        self._agent = Agent(
            model=CITATION_MODEL,
            premise=CITATION_PREMISE,
            scope=dict(
                list_search_results=storage.list_search_results,
                load_search_result=storage.load_search_result,
                save_report=storage.save_report,
            ),
        )

    async def call(self, md_report: str) -> None:
        """Add citations to a research report."""
        print("Running citation agent")
        return await self._agent.call(
            None,
            f"The `research_report = '{md_report[:10]}...' [truncated]` has been provided to you in the REPL.",
            research_report=md_report,
        )

class DeepResearchSession:
    """Orchestrates a deep research session with multiple agents."""

    def __init__(self):
        self._lead_researcher = Agent(
            model=LEAD_RESEARCHER_MODEL,
            premise=LEAD_RESEARCHER_PREMISE,
            scope=dict(
                save_plan=storage.save_plan,
                load_plan=storage.load_plan,
                list_search_results=storage.list_search_results,
                load_search_result=storage.load_search_result,
                SubAgent=SubAgent,
                SubAgentReport=SubAgentReport,
                SourceInfo=SourceInfo,
                SearchResult=SearchResult,
            ),
            listener=lambda: AgentListener(CaptionLogger("Lead Researcher")),
        )
        self._citation_agent = CitationAgent()

    async def call(self, query: str) -> str:
        """Run the deep research process."""

        try:
            # Research phase
            report = await self._lead_researcher.call(str, query)

            # Citation phase
            with CaptionLogger():
                await self._citation_agent.call(report)

            if not storage.report_exists():
                raise RuntimeError("Report was not created")

            print(storage.summary())

            return f"Check out the research report at {storage.report_path}. Ask me any questions!"
        finally:
            # Clean up ephemeral API keys
            print("Pruning Exa API keys...")
            deleted = await ExaAdmin().prune_keys(prefix="SubAgent_")
            if deleted:
                print(f"Pruned {deleted} key(s)")

if __name__ == "__main__":
    session = DeepResearchSession()
    result = asyncio.run(
        session.call(
            "What are all of the companies in the US working on AI agents in 2025? "
            "Make a list of at least 10. For each, include the name, website and product, "
            "description of what they do, type of agents they build, and their vertical/industry."
        )
    )
    print(result)
View the generated report

Walk-throughs

Prerequisites:
  • Run pip install slack-sdk or uv add slack-sdk
  • Add your SLACK_BOT_TOKEN
Python objects are tools. They are there to be manipulated and used. Agentica lets agents do just that, including using functions, classes and objects from any Python SDK.
As a simple example, let’s say you want to use the Slack client from the Slack SDK to send someone a message with some custom business logic inside it. Let’s start by creating a client.
import os
import asyncio
from agentica import agentic
from slack_sdk import WebClient

SLACK_BOT_TOKEN = os.environ.get("SLACK_BOT_TOKEN")

# We know we will want to send a direct message
slack_conn = WebClient(token=SLACK_BOT_TOKEN)
Read these instructions to generate a SLACK_BOT_TOKEN!
Then isolate the relevant Slack method.
send_direct_message = slack_conn.chat_postMessage
Then simply pass it to your agentic function using the @agentic decorator. Note that the prompt to the model is specified in the docstring and the function body is empty.
@agentic(send_direct_message, model="openai:gpt-4.1")
async def send_morning_message(user_name: str) -> None:
    """
    Uses the Slack API to send a direct message to a user. Light and cheerful!
    """
    ...


if __name__ == "__main__":
    asyncio.run(send_morning_message('@Samuel'))
    print("Morning message sent!")
For more information on what objects you can pass in via the @agentic decorator, see the references. If you prefer a more agentic syntax, try the following:
from agentica import spawn

async def main():
    morning_messenger = await spawn(
        """
        Use the Slack API to send the user a direct message. Light and cheerful!
        """,
        scope={
            "send_direct_message": send_direct_message,
        }
    )
    _ = await morning_messenger(None, "@John") # `None` return-type
    print("Morning message sent!")

asyncio.run(main())
Prerequisites:
  • Run pip install matplotlib pandas ipynb jupyter or uv add matplotlib pandas ipynb jupyter
  • Download the CSV and save it as ./movie_metadata.csv
Let’s take an example from the DSEval benchmark and use an agent in Agentica to answer questions on a dataset in a Jupyter notebook. Let’s start by importing a few things.
from agentica import spawn
import pandas as pd
import matplotlib.pyplot as plt
Now let’s get an agent to help us answer a question on the dataset and stream in its thinking.
agent = await spawn()
result = await agent.call(
    dict[str, int],
    "Show the number of movies for each major genre. The results can be in any order.",
    movie_metadata_dataset=pd.read_csv("./movie_metadata.csv").to_dict(),
)
To determine the number of movies for each major genre, we can follow these steps:

1. Access the `'genres'` field in the `movie_metadata_dataset` dictionary, which should contain the genres of the movies.
2. Initialize a dictionary to keep track of the count of movies in each genre.
3. Iterate over the genres for each movie, and for movies with multiple genres (assuming they are separated by '|'), split the string and count each genre separately.
4. Update the count of each genre in our dictionary.
5. Return the dictionary with the genre counts as the result.

Let's get started by inspecting the `movie_metadata_dataset` to understand its structure and find how genres are stored.

No code was executed. Use ```python code blocks to execute code.
```python
# Inspect the structure of movie_metadata_dataset to locate the genres information
movie_metadata_dataset.keys()
```
dict_keys(['color', 'director_name', 'num_critic_for_reviews', 'duration', 'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name', 'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name', 'movie_title', 'num_voted_users', 'cast_total_facebook_likes', 'actor_3_name', 'facenumber_in_poster', 'plot_keywords', 'movie_imdb_link', 'num_user_for_reviews', 'language', 'country', 'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes', 'imdb_score', 'aspect_ratio', 'movie_facebook_likes'])
The `movie_metadata_dataset` contains a field `'genres'`, which indicates that we can use this to count the number of movies for each genre. Let's examine a few entries from the `'genres'` field to understand its format. This will help us properly split the genres if they are present as a delimited string.

No code was executed. Use ```python code blocks to execute code.
```python
# Look at the first few entries in the 'genres' field
list(movie_metadata_dataset['genres'].values())[:5]
```
['Action|Adventure|Fantasy|Sci-Fi', 'Action|Adventure|Fantasy', 'Action|Adventure|Thriller', 'Action|Thriller', 'Documentary']
The genres are stored as strings, with each genre for a movie separated by a '|'. We will split these strings and count each genre separately.

Let's proceed to compute the number of movies for each genre.

No code was executed. Use ```python code blocks to execute code.
```python
from collections import defaultdict

# Create a defaultdict to store the count of each genre
genre_count = defaultdict(int)

# Iterate over each movie's genres
for genres in movie_metadata_dataset['genres'].values():
    # Split the genres string by '|'
    for genre in genres.split('|'):
        # Increment the count for each genre
        genre_count[genre] += 1

# Convert defaultdict to a regular dictionary for the result
result = dict(genre_count)
```
Let’s print the result.
result
{'Action': 1153, 'Adventure': 923, 'Fantasy': 610, 'Sci-Fi': 616, 'Thriller': 1411, 'Documentary': 121, 'Romance': 1107, 'Animation': 242, 'Comedy': 1872, 'Family': 546, 'Musical': 132, 'Mystery': 500, 'Western': 97, 'Drama': 2594, 'History': 207, 'Sport': 182, 'Crime': 889, 'Horror': 565, 'War': 213, 'Biography': 293, 'Music': 214, 'Game-Show': 1, 'Reality-TV': 2, 'News': 3, 'Short': 5, 'Film-Noir': 6}
Now let’s make a plot with the returned data, since it has passed us back the appropriate object!
plt.figure(figsize=(12, 8))
plt.bar(list(result.keys()), list(result.values()))
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
[Plot: Movies by genre]
But what if we want only the genres with over 1000 movies? Our agent still has access to our result in its execution environment and can manipulate that variable by reference!
result = await agent.call(
    dict[str, int],
    "Update the result to only contain the genres that have more than 1000 movies.",
)
To update the result to contain only the genres with more than 1000 movies, we'll filter the dictionary accordingly. Let's do that now.

No code was executed. Use ```python code blocks to execute code.
```python
# Filter the genre_count dictionary to include only genres with more than 1000 movies
result = {genre: count for genre, count in genre_count.items() if count > 1000}
```
Now we can remake the plot!
plt.figure(figsize=(12, 8))
plt.bar(list(result.keys()), list(result.values()))
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
[Plot: Top 5 movies by genre]
For more information on what objects you can pass to spawn, see the references.
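As a quick illustration of that flexibility, here is a minimal sketch showing that a scope can hold plain functions, classes, and live objects alike. The `Notebook` class, `word_count` helper, premise, and prompt are illustrative only, not part of the library; the deep-research walk-through below passes real functions and classes to an agent in exactly the same way.
import asyncio
from pathlib import Path

from agentica import spawn

class Notebook:
    """Hypothetical helper class the agent may instantiate and use."""

    def __init__(self) -> None:
        self.notes: list[str] = []

    def add(self, note: str) -> None:
        self.notes.append(note)

def word_count(text: str) -> int:
    """Hypothetical helper function."""
    return len(text.split())

async def main() -> None:
    agent = await spawn(
        "Explore the directory you are given and keep short notes as you go.",
        scope={
            "word_count": word_count,    # a plain function
            "Notebook": Notebook,        # a class the agent can instantiate
            "data_dir": Path("./data"),  # a live object (a pathlib.Path)
        },
    )
    print(await agent.call(str, "List the files in data_dir and summarise anything interesting."))

asyncio.run(main())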
Prerequisites:
  • If on macOS, install system dependencies with brew install pkg-config cairo meson ninja
  • Run pip install exa-py validators markdown xhtml2pdf or uv add exa-py validators markdown xhtml2pdf
  • Create an EXA account, create an EXA_SERVICE_API_KEY and run export EXA_SERVICE_API_KEY="<your-key-here>"
Let’s replicate Anthropic’s deep research multi-agent system. The high-level architecture and the iterative process are outlined in the images below.
Let’s start building.
We depend on markdown and xhtml2pdf as external dependencies. Additionally, agentica.std.web exports web-search utilities based on Exa.
  • If you use web_search / web_fetch directly, you need EXA_API_KEY.
  • This demo creates ephemeral Exa keys per subagent, which requires EXA_SERVICE_API_KEY (a short sketch of these utilities follows below).
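Before the full demo, here is a minimal, standalone sketch of those web utilities as the demo below uses them: an ephemeral key created with ExaAdmin, a search via ExaClient, line-numbered content inspection on a SearchResult, and key clean-up. The key name, query, and line range are illustrative; only calls that appear later in this walk-through are assumed.
import asyncio

from agentica.std.web import ExaAdmin, ExaClient, SearchResult

async def demo_search() -> None:
    # Requires EXA_SERVICE_API_KEY in the environment (see the prerequisites above).
    admin = ExaAdmin()
    api_key = await admin.create_key("ExaSketch_demo")  # illustrative key name
    try:
        exa = ExaClient(api_key=api_key)
        results: list[SearchResult] = await exa.search("AI agent companies 2025", num_results=2)
        for result in results:
            print(result.title, result.url, result.score)
            # Inspect a slice of the page content, line by line
            print(result.content_with_line_numbers(start=1, end=20))
    finally:
        deleted = await admin.prune_keys(prefix="ExaSketch_")
        print(f"Pruned {deleted} key(s)")

asyncio.run(demo_search())
With that context, here is the full demo, starting with the imports and a few constants.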
import asyncio
import json
import re
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Literal

import markdown
from xhtml2pdf import pisa

from agentica.agent import Agent
from agentica.logging import AgentListener
from agentica.std.caption import CaptionLogger
from agentica.std.web import ExaAdmin, ExaClient, SearchResult

type SourceType = Literal[
    "primary",
    "secondary",
    "vendor",
    "press",
    "blog",
    "forum",
    "unknown",
]

LEAD_RESEARCHER_MODEL = "anthropic:claude-opus-4.5"
SUBAGENT_MODEL = "anthropic:claude-sonnet-4.5"
CITATION_MODEL = "openai:gpt-4.1"
Let’s create a simple storage class that can save and load research artifacts.
STORAGE_DIR = Path("deep_research_test")


@dataclass
class Storage:
    """Centralized storage for all research artifacts."""

    directory: Path = field(default=STORAGE_DIR)
    _result_counts: dict[int, int] = field(default_factory=dict)

    def __post_init__(self):
        self.directory.mkdir(parents=True, exist_ok=True)
We need to be able to save and read a plan as a markdown file in the storage directory. Likewise, we need to be able to make a PDF out of the final markdown report.
    # Plan

    def save_plan(self, plan: str) -> None:
        """Save the research plan."""
        (self.directory / "plan.md").write_text(plan)

    def load_plan(self) -> str:
        """Load the research plan."""
        path = self.directory / "plan.md"
        if not path.exists():
            raise FileNotFoundError("Plan file not created yet.")
        return path.read_text()

    # Search Results

    def save_search_result(
        self,
        subagent_id: int,
        result: SearchResult,
        lines_used: list[tuple[int, int]],
        *,
        query: str | None = None,
        extracted_claims: list[str] | None = None,
        source_type: SourceType | None = None,
        source_notes: str | None = None,
    ) -> str:
        count = self._result_counts.get(subagent_id, 0) + 1
        self._result_counts[subagent_id] = count

        path = self.directory / f"subagent_{subagent_id}" / f"result_{count}.json"
        path.parent.mkdir(parents=True, exist_ok=True)

        # Extract only the relevant lines
        filtered_lines: list[str] = []
        for start, end in lines_used:
            filtered_lines.extend(result.content_lines[start - 1 : end])

        data = {
            "title": result.title,
            "url": result.url,
            "content_lines": filtered_lines,
            "score": result.score,
            # Rich artifact metadata (kept compatible with SearchResult.load()).
            "saved_at": datetime.now().isoformat(),
            "subagent_id": subagent_id,
            "query": query,
            "lines_used": lines_used,
            "extracted_claims": extracted_claims or [],
            "source_type": source_type,
            "source_notes": source_notes,
        }
        path.write_text(json.dumps(data))
        return str(path)

    def load_search_result(self, path: str) -> SearchResult:
        """
        Load a previously saved search-result artifact (JSON) and return it as a SearchResult.

        Note: artifacts may include extra metadata fields, but SearchResult.load() only uses:
        - title
        - url
        - content_lines
        - score
        """
        p = Path(path)
        if not p.is_relative_to(self.directory):
            raise ValueError(f"Path must be within {self.directory}")
        return SearchResult.load(p)

    def list_search_results(self) -> list[str]:
        """List all saved search result paths."""
        files: list[str] = []
        for subagent_dir in self.directory.glob("subagent_*"):
            if not subagent_dir.is_dir():
                continue
            for file in subagent_dir.iterdir():
                if file.suffix == ".json" and re.match(r"^result_\d+$", file.stem):
                    files.append(str(file))
        return files

    # Report

    def save_report(self, md_report: str) -> str:
        """Save the final report as markdown and PDF."""
        md_path = self.directory / "report.md"
        pdf_path = self.directory / "report.pdf"

        md_path.write_text(md_report)

        try:
            html = markdown.markdown(md_report, extensions=['tables'])
            with pdf_path.open("wb") as pdf:
                pisa.CreatePDF(html, dest=pdf)
        except Exception as e:
            print(f"Warning: PDF conversion failed: {e}")

        return str(pdf_path)

    @property
    def report_path(self) -> Path:
        return self.directory / "report.pdf"

    def report_exists(self) -> bool:
        return (self.directory / "report.md").exists()

    # Summary

    def summary(self) -> str:
        """Return a summary of all stored artifacts."""
        lines = [
            "",
            "━" * 40,
            f"📁 Research stored in: {self.directory.resolve()}",
            "━" * 40,
        ]

        if self.report_exists():
            lines.append(f"📄 Report:  {self.report_path.name}")
            if (self.directory / "report.md").exists():
                lines.append(f"           {(self.directory / 'report.md').name}")

        if (self.directory / "plan.md").exists():
            lines.append("📋 Plan:    plan.md")

        search_results = self.list_search_results()
        if search_results:
            lines.append(f"🔍 Search results: {len(search_results)} files")
            by_subagent: dict[str, list[str]] = {}
            for path in search_results:
                p = Path(path)
                subagent = p.parent.name
                by_subagent.setdefault(subagent, []).append(p.name)
            for subagent, files in sorted(by_subagent.items()):
                lines.append(f"           {subagent}/: {len(files)} results")

        lines.append("━" * 40)
        return "\n".join(lines)


storage = Storage()
The lead researcher should be able to create and run as many subagents as it deems necessary to work on independent tasks (with web search).
Let’s add some bonus features:
  • the lead researcher should have the option to reuse a subagent with persistent context e.g. asking a subagent to redo a task that it got wrong
  • subagents should save the web search results that they use specifying what they have used for the citation agent to review
class SubAgent:
    """
    A subagent with web search capabilities.
    For each task, a subagent must be **created**, then **run** with `.call()`.
    If a subagent needs to be reused, perhaps because it got something wrong, it
    may be run **again** with a second `.call()`, persisting its history.
    """

    _id: int
    _exa: ExaClient | None
    _agent: Agent
    _initialized: bool

    def __init__(self):
        self._id = 0
        self._exa = None

        async def web_search(query: str) -> list[SearchResult]:
            """Tool: search the web for `query`. Returns a small list of SearchResult objects."""
            print(f"Searching: {query}")
            await self._ensure_init()
            assert self._exa is not None
            return await self._exa.search(query, num_results=2)

        def save_search_result(
            result: SearchResult,
            lines_used: list[tuple[int, int]],
            query: str | None = None,
            extracted_claims: list[str] | None = None,
            source_type: SourceType | None = None,
            source_notes: str | None = None,
        ) -> str:
            """
            Tool: save a SearchResult artifact for later citation/inspection.

            Parameters
            ----------
            result:
                The SearchResult you are using.
            lines_used:
                1-indexed (inclusive) line ranges from result.content_lines that support your claims.
            query:
                The web query you used to find this result (optional but recommended).
            extracted_claims:
                Short bullet claims you will rely on, derived from the saved lines.
            source_type:
                Optional coarse label, e.g. "primary", "secondary", "vendor", "press", "blog", "forum", "unknown".
            source_notes:
                Optional brief notes justifying the label / quality.

            Returns
            -------
            str:
                Path to the saved JSON artifact (within the storage directory).
            """
            return storage.save_search_result(
                self._id,
                result,
                lines_used,
                query=query,
                extracted_claims=extracted_claims,
                source_type=source_type,
                source_notes=source_notes,
            )

        self._agent = Agent(
            model=SUBAGENT_MODEL,
            premise=SUBAGENT_PREMISE,
            scope=dict(
                web_search=web_search,
                save_search_result=save_search_result,
                SearchResult=SearchResult,
                SubAgentReport=SubAgentReport,
                SourceInfo=SourceInfo,
            ),
        )
        self._initialized = False

    async def _ensure_init(self) -> None:
        if self._initialized:
            return
        self._initialized = True

        # Get agent ID from listener
        await self._agent._ensure_init()
        if (listener := self._agent._listener) and listener.logger.local_id:
            self._id = int(listener.logger.local_id)
        else:
            raise ValueError("Agent listener not found")

        # Create ephemeral Exa API key for this subagent
        admin = ExaAdmin()
        key_name = f"SubAgent_{self._id}"
        api_key = await admin.create_key(key_name)
        print(f"Created Exa API key for subagent {self._id}: {api_key[:4]}...{api_key[-4:]}")

        self._exa = ExaClient(api_key=api_key)

    async def call(self, task: str) -> 'SubAgentReport':
        """Run the subagent on a task."""
        await self._ensure_init()
        print(f"Running web-search subagent ({self._id})")
        with CaptionLogger():
            return await self._agent.call(SubAgentReport, task)


@dataclass
class SourceInfo:
    """
    A single source you used in your research.

    Fill this out in your SubAgentReport so the coordinator can understand:
    - what URL you relied on,
    - what you searched for to find it,
    - what claims you are taking from it,
    - and an approximate source category (primary/secondary/vendor/press/blog/forum/unknown).
    """

    url: str
    source_type: SourceType | None = None
    query: str | None = None
    extracted_claims: list[str] = field(default_factory=list)
    artifact_path: str | None = None
    lines_used: list[tuple[int, int]] = field(default_factory=list)


@dataclass
class SubAgentReport:
    """
    Your final output for one subagent task.

    Requirements:
    - `content` may be paraphrased, but MUST be supported by the saved `lines_used`.
    - `sources` must include one SourceInfo per source you relied on.
    """

    title: str
    content: str
    sources: list[SourceInfo] = field(default_factory=list)
The citation agent should be able to list and look back through the web searches made by subagents, as well as save the final report as a markdown file.
class CitationAgent:
    """Agent that adds citations to a research report."""

    def __init__(self):
        self._agent = Agent(
            model=CITATION_MODEL,
            premise=CITATION_PREMISE,
            scope=dict(
                list_search_results=storage.list_search_results,
                load_search_result=storage.load_search_result,
                save_report=storage.save_report,
            ),
        )

    async def call(self, md_report: str) -> None:
        """Add citations to a research report."""
        print("Running citation agent")
        return await self._agent.call(
            None,
            f"The `research_report = '{md_report[:10]}...' [truncated]` has been provided to you in the REPL.",
            research_report=md_report,
        )
Finally, let’s put it all together, making sure that
  • the citation agent is always called after the research report is generated by the lead researcher, and
  • the user has the opportunity to ask follow-up questions after receiving the research report.
class DeepResearchSession:
    """Orchestrates a deep research session with multiple agents."""

    def __init__(self):
        self._lead_researcher = Agent(
            model=LEAD_RESEARCHER_MODEL,
            premise=LEAD_RESEARCHER_PREMISE,
            scope=dict(
                save_plan=storage.save_plan,
                load_plan=storage.load_plan,
                list_search_results=storage.list_search_results,
                load_search_result=storage.load_search_result,
                SubAgent=SubAgent,
                SubAgentReport=SubAgentReport,
                SourceInfo=SourceInfo,
                SearchResult=SearchResult,
            ),
            listener=lambda: AgentListener(CaptionLogger("Lead Researcher")),
        )
        self._citation_agent = CitationAgent()

    async def call(self, query: str) -> str:
        """Run the deep research process."""

        try:
            # Research phase
            report = await self._lead_researcher.call(str, query)

            # Citation phase
            with CaptionLogger():
                await self._citation_agent.call(report)

            if not storage.report_exists():
                raise RuntimeError("Report was not created")

            print(storage.summary())

            return f"Check out the research report at {storage.report_path}. Ask me any questions!"
        finally:
            # Clean up ephemeral API keys
            print("Pruning Exa API keys...")
            deleted = await ExaAdmin().prune_keys(prefix="SubAgent_")
            if deleted:
                print(f"Pruned {deleted} key(s)")
Let’s go back and define the premise prompts for all the agents.
CITATION_PREMISE = """
You are a citation agent.

# Task
You must:
1. Review the research report provided to you as `research_report` line by line.
2. Identify which lines of the research report use information that could be from web search results.
3. List the web search results that were used in creating the research report.
4. For each of these lines, use the `load_search_result` function to load the web search result that was used.
5. Add a markdown citation with the URL of the web search result to the claim in the research report by modifying the `research_report` variable.
6. Once this is done, make sure the `research_report` is valid markdown - if not, change the markdown to make it valid.
7. Use the `save_report` function to save the research report to memory as a markdown file at the end.
8. Return saying you have finished.

# Rules
- Your citations MUST be consistent throughout the `research_report`.
- Any URL in the final markdown MUST be formatted as a markdown link, not a bare URL.
- You MUST use the `list_search_results` function to list the web search results that were used in creating the research report
- You MUST use the `load_search_result` function to load the web search results.
- You MUST use the `research_report` variable provided to you to modify the research report by adding citations.
- You MUST make sure the `research_report` is valid markdown.
- You MUST use the `save_report` function to save the research report to memory at the end.
- You MUST inspect the report before saving it to make sure it is valid and what you intended. Iterate until it is valid.

## Citation format
- Prefer inline citations like: `... claim ... ([source](https://example.com))`
- If multiple sources support a sentence, include multiple links: `... ([s1](...), [s2](...))`
"""

LEAD_RESEARCHER_PREMISE = """
You are a lead researcher. You have access to web-search enabled subagents.

# Task
You must:
1. Create a plan to research the user query.
2. Determine how many specialised subagents (with access to the web) are necessary, each with a different specific research task.
3. Call ALL subagents in parallel using asyncio.gather with return_exceptions=True so partial results are preserved.
4. Summarise the results of the subagents in a final research report as markdown. Use sections, sub-sections, lists and formatting to make the report easy to read and understand. The formatting should be consistent and easy to follow.
5. Check the final research report, as this will be shown to the user.
6. Return the final research report using `return` at the very end.

# Rules
- Do NOT construct the final report until you have run the subagents.
- Do NOT return the final report in the REPL until planning, assigning subagents and returning the final report is complete.
- Do NOT add citations to the final research report yourself, this will be done afterwards.
- Do NOT repeat yourself in the final research report.
- You MUST raise an AgentError if you cannot complete the task with what you have available.
- You MUST check the final research report string before returning it to the user.

## Planning
- You MUST write the plan yourself.
- You MUST write the plan before assigning subagents to tasks.
- You MUST break down the task into small individual tasks.

## Subagents
- You MUST assign each small individual task to a subagent.
- For each task, YOU MUST create a **new** SubAgent, and provide it with a task via `.call()`.
- You MUST NOT assign multiple unrelated tasks to the same SubAgent.
- You should only call a SubAgent repeatedly if you feel you failed to get enough information from a single call, instructing them with what they were missing.
- You MUST instruct subagents to use the web_search and save_search_result functions if the task requires it.
- Do NOT ask subagents to cite the web, instead instruct them to use the save_search_result function.
- Subagents MUST be assigned independent tasks.
- IF after subagents have returned their findings more research is needed, you can assign more subagents to tasks.
- DO NOT try to preemptively *parse* the output of the subagents, **just look at the output yourself**.
- Subagents may fail! `asyncio.gather` will raise an exception if any of the subagents fail. Instead, you should pass `return_exceptions=True` to `asyncio.gather` to not lose the results of the successful subagents.

## Final Report
- Do NOT write the final report yourself without running subagents to do so.
- Do NOT add citations to the final research report yourself, this will be done afterwards by another agent.
- Do NOT repeat yourself in the final research report.
- Do NOT return a report with missing information, omitted fields or `N/A` values. If more work needs to be done, you must assign more subagents to tasks, or reuse the necessary subagents to extract more information.
- You MUST load the plan from memory before returning the final research report to check that you have followed the plan.
- You MUST check the final research report before returning it to the user.
- Check the final report for quality, completeness and consistency. If up to standard, return using a single `return` as the sole statement in its very own REPL session.
- Your final report MUST include a short "Sources consulted" section:
  - List each source URL you relied on
  - Include its source_type and 1-2 extracted claims
- Any URL you include MUST be a markdown hyperlink (not a bare URL).
- Do NOT put the whole report in a table.
"""

SUBAGENT_PREMISE = """
You are a helpful assistant.

# Task
You must:
1. Construct a list of things to search for using the web_search function.
2. Execute ALL web_search calls in parallel using asyncio.gather and asyncio.run.
3. For each search result, `print()` relevant sections using SearchResult.content_with_line_numbers(start=..., end=...).
4. Identify which lines of content you are going to use in your report.
5. Use the save_search_result function to save the SearchResult to memory and include the lines of the content that you have used.
   - Include the specific `query` you searched for.
   - Include `extracted_claims`: a list of short claims you will rely on (derived from the saved lines).
   - Include `source_type`: one of ["primary", "secondary", "vendor", "press", "blog", "forum", "unknown"].
     Use your best judgment based on the URL/domain and the content.
   - IMPORTANT: save_search_result returns a saved artifact path; keep it and include it in SourceInfo.artifact_path
6. Condense the search results into a single report with what you have found.
7. Return the report using `return` at the very end in a separate REPL session.

# Rules
- You MUST use `print()` to print the content of each search result by via SearchResult.content_with_line_numbers().
- You MUST use the web_search function if instructed to do so OR if the task requires finding information.
- Do NOT assume that the web_search function will return the information you need, you must go through the content of each search result line by line by combing through the content with SearchResult.content_with_line_numbers(start=, end=).
- Do NOT assume which lines of content you are going to use in your report, you must go through the content of each search result line by line via SearchResult.content_with_line_numbers(start=, end=).
- If you cannot find any information, do NOT provide information yourself, instead raise an error for the lead researcher in the REPL.
- You MUST save the SearchResult of any research that you have used to memory and include the lines of the content that you have used (are relevant).
- When saving, pass `query`, `extracted_claims`, and `source_type` to save_search_result.
- Your returned SubAgentReport MUST include `sources`: one entry per saved source, including url, source_type, query, extracted_claims, artifact_path, and lines_used.
- Return the report using `return` at the very end in a separate REPL session.
"""
We can now run the session with a user query!
if __name__ == "__main__":
    session = DeepResearchSession()
    result = asyncio.run(
        session.call(
            "What are all of the companies in the US working on AI agents in 2025? "
            "Make a list of at least 10. For each, include the name, website and product, "
            "description of what they do, type of agents they build, and their vertical/industry."
        )
    )
    print(result)
View the generated report