Why AI Tool Calls Fail and How to Fix Them
Model-Side Failures: Wrong Tool, Wrong Arguments
Wrong Tool Selected
The model picks a tool that does not match the user's intent. This happens when tool descriptions are too vague, when multiple tools overlap in purpose, or when the tool set is so large that the model cannot effectively distinguish between options. A user asking "find that customer" might trigger get_customer (lookup by ID) when it should trigger search_customers (search by name or criteria).
The fix is clearer tool descriptions that explicitly state when to use each tool and how similar tools differ. "Retrieves a customer by their exact customer ID. If the user provides a name, email, or other search criteria, use search_customers instead" disambiguates effectively. If selection errors persist after improving descriptions, consider merging overlapping tools into a single tool with a broader interface or using a routing layer to pre-filter the tool set.
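A minimal sketch of what disambiguating descriptions look like in an OpenAI-style tool schema. The tool names come from the example above; the ID format and parameter details are illustrative assumptions:

```python
# Hypothetical tool definitions showing descriptions that state when to
# use each tool and explicitly point to the alternative.
GET_CUSTOMER = {
    "name": "get_customer",
    "description": (
        "Retrieves a customer by their exact customer ID. "
        "If the user provides a name, email, or other search "
        "criteria, use search_customers instead."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                # CUST-00123 is an illustrative ID format
                "description": "Exact customer ID, e.g. CUST-00123",
            }
        },
        "required": ["customer_id"],
    },
}

SEARCH_CUSTOMERS = {
    "name": "search_customers",
    "description": (
        "Searches customers by name, email, or other criteria. "
        "Use this when the user does not have an exact customer ID."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Free-text search, e.g. a name or email",
            }
        },
        "required": ["query"],
    },
}
```

Each description names the other tool, so the model sees the boundary between them at selection time rather than having to infer it.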
Parameter Hallucination
The model generates a parameter value that was never mentioned in the conversation and does not exist. Instead of asking the user for a missing order ID, the model invents one: {"order_id": "ORD-12345"}. This is especially common with required parameters when the user's message does not contain the necessary information.
The fix has two parts. First, minimize required parameters so the model has fewer values it must provide. If a tool can work with partial information (searching by name when an ID is not available), make the ID optional. Second, add instructions in the tool description that tell the model what to do when information is missing: "If the user has not provided an order ID, ask them for it rather than guessing." Some providers also support a "strict" mode that reduces hallucination by constraining output to the schema.
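Both parts of the fix can live in the schema itself. A sketch of a hypothetical order-lookup tool where nothing is required (so the model has fewer values to invent) and the description says what to do when the ID is missing:

```python
# Hypothetical lookup_order tool: order_id is optional so the model can
# fall back to searching by customer name, and the description tells the
# model to ask rather than guess when the ID is missing.
LOOKUP_ORDER = {
    "name": "lookup_order",
    "description": (
        "Looks up an order. If the user has not provided an order ID, "
        "ask them for it rather than guessing. You may search by "
        "customer_name when no ID is available."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order ID, only if the user provided one",
            },
            "customer_name": {
                "type": "string",
                "description": "Customer name, used when order_id is absent",
            },
        },
        "required": [],  # no required fields: less pressure to hallucinate
    },
}
```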
Wrong Parameter Types or Formats
The model passes a string where an integer is expected, formats a date as "May 15th" instead of "2026-05-15", or passes a full name where only a last name is accepted. These errors occur when parameter descriptions do not specify the expected format clearly.
The fix is explicit format documentation in every parameter description. "Date in ISO 8601 format, e.g. 2026-05-15" is better than "date". Use enum for fields with fixed valid values. Use format hints in the JSON schema. Add example values in the description when the format is non-obvious. For numeric parameters, use minimum and maximum constraints.
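An illustrative parameter schema applying all of these rules at once: an explicit format with an example value, an enum for fixed values, and numeric bounds. The tool and field names are hypothetical:

```python
# Parameter schema for a hypothetical booking tool, combining format
# documentation, enum constraints, and numeric min/max bounds.
CREATE_BOOKING_PARAMS = {
    "type": "object",
    "properties": {
        "date": {
            "type": "string",
            # format spelled out with an example, not just "date"
            "description": "Date in ISO 8601 format, e.g. 2026-05-15",
        },
        "room_type": {
            "type": "string",
            "enum": ["single", "double", "suite"],  # fixed valid values
        },
        "guests": {
            "type": "integer",
            "minimum": 1,
            "maximum": 6,  # out-of-range values fail schema validation
        },
    },
    "required": ["date", "room_type", "guests"],
}
```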
Calling Tools Unnecessarily
The model calls a tool when it could have answered from its training data or from context already in the conversation. "What is the capital of France?" triggers a search tool when the model already knows the answer. "What was the order ID we just discussed?" triggers a lookup when the order ID is in the conversation history three messages up.
The fix is adding guidance in the system prompt about when not to use tools. "Only call tools when you need information that is not in your training data or the current conversation. For factual knowledge questions, answer directly." Also ensure that your context management retains recent tool results in the conversation history so the model can reference them rather than re-fetching.
Execution-Side Failures: Right Call, Bad Outcome
Timeouts and Service Unavailability
The external service the tool calls is slow or down. This is the most common runtime failure: a database query times out, an API returns 503, a network connection fails. The user experience depends entirely on how the system handles these failures.
The fix is layered: automatic retries with exponential backoff for transient errors, fallback data sources for critical tools, circuit breakers to stop calling a consistently failing service, and clear error messages to the model so it can communicate the situation to the user. See the error handling guide for detailed implementation.
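A minimal sketch of two of those layers together, retries with exponential backoff plus a circuit breaker, assuming a `TransientError` that your execution layer raises for timeouts and 5xx responses. Thresholds and delays are illustrative:

```python
import random
import time

class TransientError(Exception):
    """Timeouts, 5xx responses, dropped connections."""

class CircuitOpen(Exception):
    """Raised when the breaker refuses to call a failing service."""

class Breaker:
    def __init__(self, threshold=5, cooldown=30.0):
        self.failures = 0          # consecutive transient failures
        self.threshold = threshold # failures before the circuit opens
        self.cooldown = cooldown   # seconds to wait before trying again
        self.opened_at = None

    def call(self, fn, *args, retries=3, base_delay=0.5):
        # refuse immediately while the circuit is open
        if self.opened_at and time.monotonic() - self.opened_at < self.cooldown:
            raise CircuitOpen("service marked unavailable; try again later")
        for attempt in range(retries):
            try:
                result = fn(*args)
                self.failures = 0  # success resets the breaker
                return result
            except TransientError:
                self.failures += 1
                if self.failures >= self.threshold:
                    self.opened_at = time.monotonic()
                    raise CircuitOpen("too many consecutive failures")
                if attempt < retries - 1:
                    # exponential backoff with jitter: ~0.5s, ~1s, ~2s, ...
                    time.sleep(base_delay * 2 ** attempt * random.uniform(0.5, 1.5))
        raise TransientError("retries exhausted")
```

Retries absorb brief blips; the breaker stops a degraded service from being hammered while it recovers, and lets the agent tell the user the service is unavailable instead of hanging.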
Authentication and Permission Errors
The tool call reaches the service but is rejected because the credentials are invalid, expired, or lack the necessary permissions. These errors should not be retried because they indicate a configuration problem, not a transient failure.
The fix is surfacing these errors clearly to the model with guidance: "This operation requires admin permissions that the current session does not have. Inform the user that they need to contact an administrator to perform this action." Monitor authentication errors in production because they often indicate expired API keys or misconfigured service accounts.
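One way to sketch this, assuming a tool execution layer that sees HTTP status codes: map the non-retryable auth codes to model-facing guidance and flag them so the retry logic skips them. The message wording and result shape are illustrative:

```python
# Map auth/permission status codes to guidance the model can relay to
# the user. These are flagged non-retryable: they indicate configuration
# problems, not transient failures.
NON_RETRYABLE = {
    401: ("Authentication failed: the API credentials are invalid or "
          "expired. Inform the user that the integration needs to be "
          "reconnected. Do not retry."),
    403: ("This operation requires admin permissions that the current "
          "session does not have. Inform the user that they need to "
          "contact an administrator to perform this action."),
}

def result_for_model(status_code: int, body: str) -> dict:
    if status_code in NON_RETRYABLE:
        return {"error": NON_RETRYABLE[status_code], "retryable": False}
    if status_code >= 500:
        # transient service failure: eligible for retry/backoff handling
        return {"error": "The service is temporarily unavailable.",
                "retryable": True}
    return {"result": body}
```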
Rate Limiting
The tool's underlying API enforces rate limits, and the agent has exceeded them. This is common when the agent makes many rapid tool calls (parallel fan-out, retries) or when multiple users share the same API credentials.
The fix is respecting rate limits proactively: check the Retry-After header if present and wait the specified duration, implement per-tool rate limiting in your execution layer to stay below the API's limits, and use request queuing to smooth out bursts of tool calls. For agents that heavily use rate-limited APIs, consider caching recent results to reduce call volume.
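A sketch of two of those pieces: a per-tool sliding-window limiter that waits before exceeding a quota, and a helper for the numeric form of the `Retry-After` header. The limits are illustrative, and note that real APIs may also send `Retry-After` as an HTTP-date, which this sketch does not parse:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most max_calls per window_s seconds for one tool."""

    def __init__(self, max_calls: int, window_s: float):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = deque()  # monotonic timestamps of recent calls

    def wait_if_needed(self):
        now = time.monotonic()
        # drop timestamps that have aged out of the window
        while self.calls and now - self.calls[0] > self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # sleep just long enough for the oldest call to age out
            time.sleep(self.window_s - (now - self.calls[0]))
        self.calls.append(time.monotonic())

def retry_after_seconds(headers: dict, default: float = 1.0) -> float:
    """Parse a numeric Retry-After header, falling back to a default."""
    try:
        return float(headers.get("Retry-After", default))
    except ValueError:
        return default  # e.g. an HTTP-date value this sketch ignores
```

Call `wait_if_needed()` before each tool invocation; on a 429 response, sleep for `retry_after_seconds(response_headers)` before retrying.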
Unexpected Response Formats
The tool executes successfully, but the response format does not match what the model or application expects. An API update changes the response structure, a new field is added, a field is renamed, or a null value appears where a value was expected. The tool call "succeeds" but the result confuses the model or crashes the parsing logic.
The fix is defensive result handling: validate tool results against an expected schema before passing them to the model, handle missing and null fields gracefully, and monitor for response format changes. When format changes are detected, log them for developer review and return a simplified version of the result that the model can still work with.
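A minimal sketch of that pattern for a hypothetical order result: check each expected field's presence and type, log anomalies for developer review, and hand the model an explicit, simplified result instead of crashing. The `EXPECTED` spec is an assumption:

```python
import logging

logger = logging.getLogger("tool_results")

# Expected fields and types for a hypothetical order-lookup result.
EXPECTED = {"order_id": str, "status": str, "total": (int, float)}

def sanitize_order_result(raw: dict) -> dict:
    clean, anomalies = {}, []
    for field, types in EXPECTED.items():
        value = raw.get(field)
        if isinstance(value, types):
            clean[field] = value
        else:
            anomalies.append(field)  # missing, null, or wrong type
            clean[field] = None      # the model sees an explicit gap
    if anomalies:
        # format drift: flag for developer review, but keep the agent running
        logger.warning("unexpected order result shape: %s", anomalies)
    return clean
```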
Diagnosing Failures in Production
Log every tool call with its arguments, the execution result (success or failure), the latency, and the context that triggered it. Categorize failures by type (selection, parameter, execution, timeout, auth) and track rates over time. A spike in selection errors often indicates a schema change that introduced ambiguity. A spike in timeout errors often indicates a downstream service degradation. A gradual increase in parameter errors often indicates that the agent's use cases are expanding beyond what the current schemas support.
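A sketch of what that instrumentation might look like, logging each call as a structured record and counting failures by category so rates can be tracked over time. The record shape and log sink are assumptions:

```python
import json
import time
from collections import Counter

failure_counts = Counter()  # failure rates by category, for trend tracking

def log_tool_call(tool, args, started, outcome, category=None):
    record = {
        "tool": tool,
        "args": args,
        "latency_ms": round((time.monotonic() - started) * 1000, 1),
        "outcome": outcome,    # "success" or "failure"
        "category": category,  # selection / parameter / execution / timeout / auth
    }
    if outcome == "failure":
        failure_counts[category] += 1
    print(json.dumps(record))  # stand-in for a real structured log sink
    return record
```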
Memory-powered agents can learn from past failures. When Adaptive Recall stores tool failure outcomes, the agent recalls them in future interactions and avoids repeating the same mistakes. If a tool consistently fails for a specific parameter pattern, the agent remembers this and either adjusts its approach or warns the user proactively.
Build tool-using agents that learn from their mistakes. Adaptive Recall stores failure patterns so your agent avoids repeating errors and improves tool use over time.
Try It Free