
Writing Effective Prompts

Magic functions and agents accept two mutually exclusive parameters on initialization: premise and system.
  • Premise. Adds context to the AI’s system prompt about what its goal is going to be.
  • System. Replaces the AI’s system prompt entirely, giving you completely custom behavior.
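A minimal sketch contrasting the two parameters, using the spawn signature shown in the examples later on this page; the prompt text is illustrative:
from agentica import spawn

# premise: Agentica keeps its built-in system prompt and adds your context to it
helper = await spawn(
    premise="You are a release-notes assistant for a Python SDK."
)

# system: you own the system prompt entirely; task messages arrive with no
# added formatting, so spell out inputs, tools, and the expected return shape.
# Explainer snippets (REPL, return types) can be injected via the {{TEMPLATE}}
# variables covered in Advanced › System Prompt Templating.
extractor = await spawn(
    system="""
    You receive raw changelog text and must return only a JSON list of strings,
    one entry per user-facing change. No prose, no markdown fences.
    """
)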
When the AI is invoked, you provide further “task” messages that specify the goal at that moment. If the system parameter is provided, task messages for agents are passed through with no additional formatting, so you must be explicit about everything the agent has access to and needs to return. Custom prompts benefit from dropping in “explainer” snippets for things such as the REPL and return types; these may be provided via {{TEMPLATE}} variables, explained in detail in Advanced › System Prompt Templating.

While writing prompts for the system message or individual task messages, there are best practices to follow that vary depending on the model you are using. These are laid out in detail, per model, in each provider’s documentation.
When using OpenAI models (openai:gpt-4.1, openai:gpt-5, etc.) with Agentica, apply these proven strategies:

Writing Magic Function Docstrings
from typing import Any

from agentica import magic

# Bad - Vague docstring
@magic()
def analyze(text: str) -> dict:
    """Analyze text"""
    ...

# Good - Clear, specific docstring
@magic()
def analyze(text: str) -> dict[str, Any]:
    """
    Analyze the sentiment, key entities, and main topics in the text.
    Return a dict with 'sentiment' (positive/negative/neutral),
    'entities' (list of names/places/orgs), and 'topics' (list of main themes).
    """
    ...
Crafting Agent Premises
# Bad - Generic premise
agent = await spawn(premise="You are helpful.")

# Good - Specific premise with clear role and constraints
agent = await spawn(
    premise="""
    You are a data analyst specializing in customer feedback.
    Always provide numerical confidence scores (0-1) with your conclusions.
    When uncertain, explicitly state your assumptions.
    """
)
Request Step-by-Step Reasoning

For complex tasks, explicitly ask for reasoning in your docstrings or premises:
@magic()
def solve_problem(problem: str) -> dict[str, Any]:
    """
    Solve the math problem step by step.
    First, identify what's being asked.
    Then, break down the solution into steps.
    Finally, provide the answer with your reasoning.
    """
    ...
Leverage Scope Effectively

Provide focused, relevant tools rather than entire SDKs:
from slack_sdk import WebClient

slack = WebClient(token=TOKEN)

# Good - Extract only what you need
@magic(slack.users_list, slack.chat_postMessage)
async def notify_team(message: str) -> None:
    """Send message to all active team members."""
    ...
Type Hints are Instructions

OpenAI models excel at following type hints. Use them to guide output:
from typing import Literal

@magic()
def classify(text: str) -> Literal['urgent', 'normal', 'low']:
    """Classify the priority of this support ticket."""
    ...
When using Anthropic models (anthropic:claude-sonnet-4.5, anthropic:claude-opus-4.1, etc.) with Agentica, leverage these Claude-specific strengths:

XML Tags in Docstrings

Claude excels at parsing XML structure. Use it in complex magic functions:
@magic()
def extract_and_validate(document: str, schema: dict) -> dict:
    """
    Extract structured data from the document and validate against schema.

    <instructions>
    1. Parse the document and extract fields matching the schema
    2. Validate each field against schema constraints
    3. Return extracted data with validation status
    </instructions>

    <output_format>
    Return dict with 'data' (extracted fields) and 'valid' (bool)
    </output_format>
    """
    ...
Rich Agent Premises

Claude responds well to detailed role definitions:
agent = await spawn(
    premise="""
    You are a senior software architect with expertise in distributed systems.

    <role>
    - Analyze system designs for scalability issues
    - Suggest concrete improvements with trade-offs
    - Consider cost, latency, and reliability
    </role>

    <style>
    - Be direct and technical
    - Provide specific code/config examples when helpful
    - Acknowledge uncertainties explicitly
    </style>
    """,
    model="anthropic:claude-sonnet-4.5"
)
Chain of Thought Prompts

Claude’s reasoning improves dramatically with explicit thinking requests:
@magic()
def debug_issue(code: str, error: str) -> dict[str, str]:
    """
    Debug the code issue by thinking through it step by step.

    Before providing a solution:
    1. Analyze what the code is trying to do
    2. Identify why the error occurs
    3. Consider multiple potential fixes
    4. Choose the best fix with explanation

    Return dict with 'analysis' and 'fix' keys.
    """
    ...
System Prompts for Complete Control

Use system instead of premise when you need to fully specify behavior:
# For agents that must return ONLY structured data
agent = await spawn(
    system="""
    You are a data extraction API. You receive documents and schemas.
    You MUST return valid JSON matching the schema. Never include explanations,
    preambles, or markdown formatting. Just the JSON object.
    """,
    model="anthropic:claude-sonnet-4.5"
)
Long Context Utilization

Claude handles large contexts exceptionally well. Structure them clearly:
@magic()
def analyze_codebase(files: dict[str, str]) -> dict:
    """
    Analyze the entire codebase for security issues.

    <critical_instructions>
    Focus on: SQL injection, XSS, auth bypasses, secrets in code
    </critical_instructions>

    Process each file systematically. Look for patterns across files.

    <critical_instructions>
    Return findings with file paths and severity (critical/high/medium/low)
    </critical_instructions>
    """
    ...
Prompting Style Differences

OpenAI Models (openai:gpt-4.1, openai:gpt-5):
  • Concise instructions. Works well with shorter, direct docstrings and premises.
  • Delimiters. Use ### or """ to separate sections in complex prompts (see the sketch after these lists).
  • Step-by-step explicit. Benefits from phrases like “First… Then… Finally…”
  • Function-oriented. Natural fit for task decomposition and tool use.
Anthropic Models (anthropic:claude-sonnet-4.5, anthropic:claude-opus-4.1):
  • XML tags preferred. Use <instructions>, <context>, <examples> for structure.
  • Detailed premises. Responds well to longer, more elaborate role definitions.
  • Chain of thought. Explicitly request thinking with “Before answering, think step by step…”
  • Long context friendly. Can handle very large prompts and scope effectively.
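A minimal sketch of the delimiter style recommended for OpenAI models above; the function and section names are illustrative:
@magic()
def review_pr(diff: str, guidelines: str) -> str:
    """
    Review the pull request diff against the team guidelines.

    ### Context
    The diff is in unified format; guidelines are free-form prose.

    ### Steps
    First, list the files touched. Then check each change against the
    guidelines. Finally, write a short review comment per violation.

    ### Output
    Return the review comments as plain text, one per line.
    """
    ...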
Universal Agentica Best Practices

Regardless of model choice:

1. Write Clear Docstrings/Descriptions. Be specific about what, how, and what format to return.
# Magic functions: detailed docstrings
@magic()
def process(data: str) -> Result:
    """What, how, and what format to return"""
    ...

# Agents: specific premises
agent = await spawn(premise="Clear role + constraints")
2. Use Strong Type Hints. Types guide the AI and ensure type-safe returns.
from typing import Literal
from pydantic import BaseModel

class Analysis(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']
    confidence: float

@magic()
def analyze(text: str) -> Analysis:
    """Return sentiment analysis"""
    ...
3. Provide Focused Scope. Only include tools/data the AI needs for the specific task, not entire objects or SDKs.
4. Request Reasoning for Complex Tasks. Add “step by step” or “think through” to prompts for better accuracy on hard problems.
5. Test with Real Examples. Validate magic functions and agents with actual use cases before production (see the sketch below).
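For point 5, a minimal pytest sketch against the classify function defined earlier; the tickets and expected labels are illustrative:
import pytest

# Assumes the classify() magic function defined above, which returns
# Literal['urgent', 'normal', 'low'].
@pytest.mark.parametrize("ticket,expected", [
    ("Production database is down for all customers", "urgent"),
    ("Please update my billing address when you get a chance", "normal"),
])
def test_classify_representative_tickets(ticket, expected):
    # Each assertion makes a real AI call, so keep the suite small and focused.
    assert classify(ticket) == expected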

Choosing Models in Agentica

Consider OpenAI when:
  • Your prompts are concise and task-focused
  • You’re using structured delimiters for prompt sections
  • Example: model="openai:gpt-5"
Consider Anthropic when:
  • Your prompts benefit from XML structure
  • You’re working with very large context in scope
  • You want detailed, persona-driven agents
  • Example: model="anthropic:claude-sonnet-4.5"
Prompting Approach:

Prompt Style                   Better Match
Short, focused instructions    OpenAI models
XML-structured prompts         Anthropic models
Large context (50K+ tokens)    Anthropic models
Detailed role definitions      Anthropic models
Task decomposition patterns    OpenAI models

Be Specific

Vague prompts lead to inconsistent results. The AI needs clear instructions about what you want, how you want it, and what format to return. Think of your docstring as a specification, not just a description.

Bad approach: Generic verbs without details.
@magic()
def summarize(text: str) -> str:
    """Summarize the text"""
    ...
Good approach: Specify length, focus, and style.
@magic()
def summarize(text: str) -> str:
    """
    Create a 2-3 sentence summary of the text.
    Focus on the main argument and key supporting points.
    Use objective language without opinion.
    """
    ...
When behavior or style matters beyond just the output structure, specify it clearly. Types tell the AI what to return, but your prompt tells it how to get there.
@magic()
def extract_sql(user_request: str, schema: dict) -> str:
    """
    Generate a SQL query from the user's natural language request.

    Requirements:
    - Use only SELECT statements (no INSERT, UPDATE, DELETE)
    - Always include LIMIT clauses to prevent large result sets
    - Use table aliases for readability
    - When joining tables, prefer INNER JOIN over implicit joins
    - Add comments explaining complex WHERE clauses

    Return valid PostgreSQL syntax.
    """
    ...

Include Examples

When you need specific formatting or a particular style, showing examples is more effective than describing the desired output in words. AI models learn patterns quickly from concrete examples. Use examples for tasks where the output has specific structure, like generating changelog entries:
@magic()
def write_changelog_entry(commit_messages: list[str]) -> str:
    """
    Write a changelog entry from commit messages.

    Example input: ["fix: resolve login timeout", "feat: add dark mode"]
    Example output:
    ### Features
    - Added dark mode support

    ### Bug Fixes
    - Fixed login timeout issue

    Follow this format exactly.
    """
    ...
Multiple examples help establish patterns, especially for formatting that varies by input:
@magic()
def format_currency(amount: float, currency: str) -> str:
    """
    Format a currency amount for display.

    Examples:
    - format_currency(1234.50, "USD") → "$1,234.50"
    - format_currency(999.99, "EUR") → "€999.99"
    - format_currency(50, "GBP") → "£50.00"

    Always include the currency symbol and exactly 2 decimal places.
    """
    ...

Define Constraints

The AI needs explicit rules for handling edge cases and ambiguous inputs. Without clear constraints, you’ll get inconsistent behavior when inputs don’t match the happy path. When your task involves categorization or decision-making, spell out the criteria:
from typing import Literal

@magic()
def categorize_severity(error_message: str, stack_trace: str) -> Literal['critical', 'high', 'medium', 'low']:
    """
    Categorize error severity.

    Constraints:
    - Return 'critical' if: data loss, security breach, system crash
    - Return 'high' if: feature unusable, affects multiple users
    - Return 'medium' if: degraded performance, workaround available
    - Return 'low' if: cosmetic issue, minimal impact

    Edge cases:
    - If stack_trace is empty, base decision on error_message alone
    - Database errors are at minimum 'high' severity
    - Authentication errors are at minimum 'high' severity
    """
    ...
For parsing or extraction tasks, define what constitutes valid input and what to return when inputs are missing or malformed:
@magic()
def parse_date_range(text: str) -> tuple[str, str] | None:
    """
    Extract start and end dates from natural language.

    Return format: (start_date, end_date) as YYYY-MM-DD strings

    Constraints:
    - Start date must be before or equal to end date
    - Return None if no date range found
    - Return None if only one date found (need both)

    Edge cases:
    - "last week" → Monday to Sunday of previous week
    - "Q1 2024" → 2024-01-01 to 2024-03-31
    - Relative dates use today's date as reference
    """
    ...
For multi-step validation workflows, use an agent that progressively checks and adapts based on what it discovers. The agent remembers previous findings when deciding next steps:
from agentica import spawn
from typing import Literal

agent = await spawn(premise="You are a security validator for user queries")

async def validate_input(untrusted_input: str) -> str:
    # Step 1: Agent analyzes input for safety
    safety = await agent.call(
        Literal['safe', 'sql_injection', 'invalid_chars'],
        "Classify this user input for security issues",
        user_input=untrusted_input
    )

    # Step 2: Based on what the agent found, take different actions
    if safety == 'safe':
        # Agent remembers the input it just analyzed
        normalized = await agent.call(
            str,
            "Normalize the input you just validated (lowercase, trim, remove extra spaces)"
        )
        return normalized
    elif safety == 'sql_injection':
        # Agent remembers what SQL patterns it detected
        details = await agent.call(
            str,
            "Explain which SQL patterns you detected and why they're dangerous"
        )
        log_security_violation(details)  # your audit-logging hook
        raise SecurityError(details)     # your exception type
    else:
        # Agent remembers the invalid characters it found
        suggestion = await agent.call(
            str,
            "Suggest what the user should change about their input"
        )
        return f"Invalid input. {suggestion}"

Type Safety

Strong type hints do two things: they guide the AI toward the correct output structure, and they give you type-safe returns in your code. The more specific your types, the more constrained the AI’s output will be. Use literal types to restrict outputs to specific values:
from typing import Literal

@magic()
def classify(text: str) -> Literal['positive', 'negative', 'neutral']:
    """Classify sentiment"""
    ...

# The AI can only return one of these three exact strings
result = classify("Great product!")  # Type is Literal['positive', 'negative', 'neutral']
Use structured types for complex outputs. The AI will match your type structure exactly:
from dataclasses import dataclass
from typing import Literal

@dataclass
class Review:
    rating: Literal[1, 2, 3, 4, 5]
    sentiment: Literal['positive', 'negative', 'neutral']
    categories: list[str]
    summary: str

@magic()
def analyze_review(text: str) -> Review:
    """Analyze a product review"""
    ...

# Returns a fully typed Review object
review = analyze_review("Great product, fast shipping!")
print(review.rating)  # Type-safe access
Combine types with validation for even stronger guarantees. See Error Handling for validation patterns using Pydantic and Zod.
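A minimal sketch of the idea, assuming Pydantic Field constraints (the full patterns live in Error Handling):
from typing import Literal
from pydantic import BaseModel, Field

class Analysis(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']
    # Field constraints reject out-of-range values the bare type would allow
    confidence: float = Field(ge=0.0, le=1.0)

@magic()
def analyze(text: str) -> Analysis:
    """Return sentiment analysis with a 0-1 confidence score."""
    ...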

Security

Credential Management

Never hardcode API keys or secrets. Use environment variables. This keeps credentials out of your codebase and allows different values per environment.
import os

# Good - use environment variables
api_key = os.environ["API_KEY"]
database_url = os.environ.get("DATABASE_URL")

# Bad - hardcoded secrets
api_key = "sk-proj-abc123..."  # Never commit this
Never pass raw API keys to agents. Instead, pass pre-authenticated SDK clients or specific methods. The agent uses the functionality without ever seeing the credentials:
import os

from agentica import spawn
from github import Github

# Good - pass authenticated client methods
gh = Github(os.environ["GITHUB_TOKEN"])
agent = await spawn(premise="You are a GitHub analyst")

result = await agent.call(
    Report,  # your structured return type, e.g. a dataclass or Pydantic model
    "Analyze the repository's recent activity",
    get_repo=gh.get_repo,
    search_issues=gh.search_issues
)
# Agent can use GitHub API without accessing the token

# Bad - passing raw credentials
result = await agent.call(
    Report,
    "Analyze repository",
    github_token=os.environ["GITHUB_TOKEN"]  # Never do this
)

Input Validation

Validate user input before passing it to AI functions. This prevents injection attacks and ensures your magic functions receive clean data.
from agentica import magic

@magic()
def query_database(user_input: str, schema: dict) -> list[dict]:
    """
    Generate and execute a database query based on user input.
    Only generate SELECT queries. Use the schema to validate table/column names.
    """
    ...

def safe_query(user_input: str) -> list[dict]:
    # Validate input length
    if len(user_input) > 500:
        raise ValueError("Input too long")

    # Check for suspicious patterns
    dangerous_keywords = ['drop', 'delete', 'truncate', 'insert', 'update']
    if any(keyword in user_input.lower() for keyword in dangerous_keywords):
        raise ValueError("Invalid query keywords")

    # Now safe to pass to AI
    return query_database(user_input, schema)

File Access Scope

AI that can open arbitrary paths can easily escape its intended sandbox (for example by traversing ../) and read, modify, or delete files across your system. Avoid passing Path objects or unrestricted file paths directly to agents or magic functions. Instead, pre-open only the specific files you want the AI to access and pass those file handles in scope.
from typing import TextIO
from agentica import magic

@magic()
def summarize_report(report_file: TextIO) -> str:
    """
    Read the already-open report_file and summarize its contents.
    """
    ...

with open("/var/reports/weekly.csv", "r", encoding="utf-8") as f:
    # The AI only sees this specific handle, not your whole filesystem
    summary = summarize_report(f)

Rate Limiting

Implement rate limiting to protect against abuse and manage costs. This is especially important for user-facing features.
from collections import defaultdict
from time import time

class RateLimiter:
    def __init__(self, max_calls: int, window_seconds: int):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls: dict[str, list[float]] = defaultdict(list)

    def allow(self, user_id: str) -> bool:
        now = time()
        # Remove old calls outside window
        self.calls[user_id] = [t for t in self.calls[user_id] if now - t < self.window]

        if len(self.calls[user_id]) >= self.max_calls:
            return False

        self.calls[user_id].append(now)
        return True

limiter = RateLimiter(max_calls=10, window_seconds=60)

@magic()
def summarize(text: str) -> str:
    """Summarize the text"""
    ...

def rate_limited_summarize(user_id: str, text: str) -> str:
    if not limiter.allow(user_id):
        raise Exception("Rate limit exceeded. Try again in a minute.")
    return summarize(text)
Exponential backoff handles transient failures when magic functions or agents call external APIs that may be rate-limited.
import asyncio
from agentica import magic
from dataclasses import dataclass
from typing import Literal

@dataclass
class FetchResult:
    status: Literal['success', 'rate_limited', 'error']
    data: list[dict] | None
    message: str

@magic()
async def fetch_github_data(query: str, api_search) -> FetchResult:
    """
    Search GitHub using the provided api_search function.
    If you encounter a rate limit response, return status='rate_limited'.
    If successful, return status='success' with the data.
    If other error, return status='error' with a message.
    """
    ...

async def fetch_with_backoff(query: str, api_search, max_retries: int = 3) -> FetchResult:
    for attempt in range(max_retries):
        result = await fetch_github_data(query, api_search)

        if result.status == 'success':
            return result
        elif result.status == 'rate_limited' and attempt < max_retries - 1:
            # Exponential backoff: 1s, 2s, 4s
            wait_time = 2 ** attempt
            await asyncio.sleep(wait_time)
            continue
        else:
            return result

    return FetchResult('error', None, 'Max retries exceeded')

Monitoring

Track these key metrics in production to understand your AI operations:
  • Latency. How long do magic functions and agents take to respond?
  • Error rates. What percentage of AI calls fail or timeout?
  • Usage patterns. Which functions are called most? By which users?
  • Output quality. Are results meeting expectations? Use sampling to review outputs (see the sketch below).
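A minimal sketch of sample-based quality review, assuming a fixed sampling rate and an in-memory stand-in for your real review queue:
import random

SAMPLE_RATE = 0.05               # review roughly 5% of outputs
review_samples: list[dict] = []  # stand-in for your review queue or dashboard

def maybe_sample(operation: str, output: str) -> None:
    # Randomly sample outputs for later human review instead of checking every call
    if random.random() < SAMPLE_RATE:
        review_samples.append({"operation": operation, "output": output})

result = summarize("Quarterly revenue grew 12% on strong subscription growth.")
maybe_sample("summarize", result)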

Logging

Log AI operations with structured data. Include the operation name, input size, model used, and timing. This helps debug issues and identify patterns.
import logging
import time

logger = logging.getLogger(__name__)

@magic()
def classify(text: str) -> str:
    """Classify sentiment"""
    ...

def monitored_classify(text: str) -> str:
    start = time.time()
    try:
        result = classify(text)
        logger.info("AI operation succeeded", extra={
            "operation": "classify",
            "input_length": len(text),
            "latency_ms": (time.time() - start) * 1000,
            "model": "gpt-4"
        })
        return result
    except Exception as e:
        logger.error("AI operation failed", extra={
            "operation": "classify",
            "error": str(e),
            "input_length": len(text),
            "latency_ms": (time.time() - start) * 1000
        })
        raise
Never log sensitive data. User inputs, API keys, or PII should not appear in logs. See Error Handling › Sensitive Data Handling for examples of safe logging practices.

Performance

Caching

Cache AI responses when the same inputs produce the same outputs. This reduces latency and costs for repeated operations. Use caching for:
  • Reference data that changes infrequently (product descriptions, documentation)
  • Expensive operations called repeatedly with the same inputs
  • Read-heavy workflows where consistency is acceptable
from functools import lru_cache

# Decorate the magic function directly
@lru_cache(maxsize=1000)
@magic()
def categorize_product(description: str) -> str:
    """Categorize product into a department"""
    ...

# Same description returns cached result
category1 = categorize_product("Red cotton t-shirt")  # Calls AI
category2 = categorize_product("Red cotton t-shirt")  # Returns cached
Advanced: Best-of-N caching with retries. Like JIT compilation that eventually compiles hot code paths, you can combine caching with retry strategies to create a “best-of-N” pattern: retry failed operations until you get a high-quality result, then cache that successful response. Future calls skip the retry logic entirely and use the cached “compiled” result. This is particularly useful for expensive operations where you want to pay the retry cost once, then reuse the validated output.
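A minimal sketch of the pattern, assuming a passes_quality_check validation of your own and the summarize magic function from earlier:
_cache: dict[str, str] = {}

def best_of_n_summarize(text: str, n: int = 3) -> str:
    # Cache hit: reuse the previously validated ("compiled") result
    if text in _cache:
        return _cache[text]

    best: str | None = None
    for _ in range(n):
        candidate = summarize(text)          # the magic function defined above
        if passes_quality_check(candidate):  # your own validation logic
            best = candidate
            break
        best = best or candidate             # fall back to the first attempt

    _cache[text] = best  # future calls skip the retry loop entirely
    return best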

Parallel Processing

Process multiple items in parallel when they’re independent. This is faster than sequential processing.
import asyncio

@magic()
async def analyze(text: str) -> dict:
    """Analyze the text"""
    ...

# Process all texts in parallel
texts = ["text 1", "text 2", "text 3"]
results = await asyncio.gather(*[analyze(text) for text in texts])

Stateful Workflows with Agents

Use agents for multi-step workflows where later steps depend on earlier results. Agents maintain context across invocations, allowing them to make decisions based on what they’ve already done. Here’s an agent that debugs code by analyzing, then deciding whether to fix or explain based on what it finds:
from agentica import spawn

agent = await spawn(
    premise="""
    You are a code debugger. When given code with an error:
    1. First analyze the error to understand the root cause
    2. If it's a simple fix (syntax, typo), fix it and return the corrected code
    3. If it's a logic error requiring design changes, explain the issue instead
    """,
    model="openai:gpt-4.1"
)

# First invocation: analyze
await agent.call(None, "Analyze this error", code=broken_code, error=error_msg)

# Second invocation: agent decides to fix or explain based on analysis
result = await agent.call(
    str,
    "Based on your analysis, either fix the code or explain what needs to change"
)
# The agent remembers its analysis and chooses the appropriate action
For truly independent operations, use magic functions and process in parallel. For dependent workflows where context matters, use a single agent across multiple calls.

Cost Optimization

AI operations cost money. Optimize by choosing the right model, caching responses, and using agents only when needed.
  • Choose the right model for the task. Use cheaper models for simple operations, more expensive models for complex reasoning. See Model Selection for guidance.
  • Cache aggressively. Every cache hit is a cost you don’t pay. See Caching above.
  • Keep prompts concise. Longer prompts cost more. Remove unnecessary context or examples once you’ve validated your magic function works.
  • Use agents strategically. Agents maintain conversation history, which grows with each call and costs more. For stateless operations, use magic functions instead.

Bad: Using an agent for independent operations
# Inefficient - agent maintains unnecessary history
agent = await spawn(premise="You are a data processor")

for item in items:
    result = await agent.call(dict, f"Process this item: {item}")
    # Each call adds to history, increasing cost
Good: Using magic function for independent operations
@magic()
def process_item(item: str) -> dict:
    """Process the item"""
    ...

# Each call is independent, no growing history
for item in items:
    result = process_item(item)
Good: Using agent when context matters
# Agent remembers context across steps
agent = await spawn(premise="You are a research assistant")

# Step 1: Find relevant papers
papers = await agent.call(list[str], "Search for papers on quantum computing", web_search=search)

# Step 2: Agent remembers which papers it found
summary = await agent.call(str, "Summarize the key findings from these papers")

# Step 3: Agent has full context to compare
comparison = await agent.call(str, "Which paper has the most practical applications?")

Deployment Checklist

Before deploying AI-powered features to production:

Environment & Configuration
  • Environment variables configured for all environments (dev, staging, prod)
  • API keys secured and not hardcoded
  • Model selections appropriate for each environment (cheaper models for dev/test)

Error Handling & Reliability
  • Try/catch blocks around all AI operations
  • Fallback strategies for critical paths
  • Retry logic for transient failures
  • Validation on AI outputs where needed

Security
  • Input validation on user-provided data
  • Rate limiting implemented for user-facing features
  • Sensitive data excluded from logs
  • Authenticated SDK clients passed instead of raw API keys

Monitoring & Observability
  • Structured logging in place for AI operations
  • Metrics tracked (latency, error rates, usage)
  • Alerts configured for error spikes or high latency
  • Sample-based output quality monitoring

Testing
  • Unit tests for magic functions with representative inputs
  • Integration tests for multi-agent workflows
  • Load tests if serving high-volume traffic
  • Manual review of AI outputs on diverse test cases

Cost Management
  • Caching implemented for repeated operations
  • Model selection optimized (avoid expensive models for simple tasks)
  • Budget alerts configured with your AI provider
  • Rate limits prevent runaway costs

Next Steps