Argument Correctness
Validates whether AI agents provide correct and appropriate arguments to tool calls
Overview
Argument Correctness evaluates whether your AI agent provides correct and appropriate parameters to tool calls. This method assumes the tools were already selected and focuses specifically on validating the quality, format, and suitability of the arguments passed to each tool.
Ideal for: Validating API parameter correctness, ensuring arguments align with user intent, debugging parameter extraction issues, and quality assurance for production agents.
What Gets Evaluated
This evaluation analyzes the input parameters for each tool call, not the tool selection itself (see the sketch after this list):
- ✅ Evaluates: "You called get_weather(location='New York') - is 'New York' the right parameter?"
- ✅ Evaluates: "Are the parameters properly formatted and sufficient?"
- ✅ Evaluates: "Do the arguments align with user intent?"
- ❌ Does NOT evaluate: "Should you have called get_weather vs get_forecast?" - tool selection is out of scope; only argument quality is judged
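To make the distinction concrete, here is a minimal sketch of the kind of record the judge inspects for a single tool call. The field names (tool_name, arguments) and the overall shape are illustrative assumptions, not the evaluator's exact log schema.

```python
# Hypothetical log record for one tool call; field names are illustrative only.
user_input = "What's the weather like in New York right now?"

tool_call = {
    "tool_name": "get_weather",             # which tool ran (not judged by this metric)
    "arguments": {"location": "New York"},  # judged: do these match the user's request?
}

# The judge is asked, in effect:
# "Given user_input, are the arguments passed to get_weather correct, well-formed, and sufficient?"
```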
Key Features
- Parameter-Focused Analysis: Validates argument quality without judging tool selection
- Binary Assessment: Each tool call receives a correct/incorrect verdict for its arguments
- Ratio-Based Scoring: The overall score is the fraction of tool calls with correct arguments
- Detailed Reasoning: Provides an explanation for why each tool call's arguments passed or failed (see the example after this list)
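As a rough illustration of the binary assessment paired with reasoning, a per-tool-call verdict might look like the following. This is a hypothetical shape, not the evaluator's actual output schema.

```python
# Hypothetical per-tool-call verdict; keys and values are illustrative only.
verdict = {
    "tool_name": "get_weather",
    "arguments_correct": True,   # binary assessment of the arguments
    "reason": "Location 'New York' matches the city named in the user's request.",
}
```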
How It Works
The evaluation uses an LLM-as-a-judge approach:
- Extract Context: Retrieves user input, agent output, and tool calls from logs
- Analyze Each Tool: The LLM judge examines the arguments for each tool call
- Score Arguments: Each tool call receives a binary assessment: its arguments are either correct or incorrect
- Calculate Overall Score: Combines the per-call verdicts into a single ratio (a scoring sketch follows this list):
score = (tool calls with correct arguments) / (total tool calls)
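A minimal sketch of that scoring step, assuming each tool call has already received a binary verdict from the judge (function and variable names are illustrative):

```python
# Minimal sketch of ratio-based scoring over per-tool-call verdicts.
def argument_correctness_score(verdicts: list[bool]) -> float:
    """Return the fraction of tool calls whose arguments were judged correct."""
    if not verdicts:
        return 0.0  # assumption: no tool calls means nothing to credit
    return sum(verdicts) / len(verdicts)

# Example: 3 of 4 tool calls had correct arguments -> 0.75
print(argument_correctness_score([True, True, False, True]))
```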
💡 Pro Tip: Use Argument Correctness when you need to ensure your agent is passing the right parameters to tools, not when you're concerned about whether it's calling the right tools in the first place.