Tool call validation

Eqho agents use tools (actions) to perform real-world tasks — scheduling appointments, extracting data, transferring calls. Validating that agents call the right tools with correct arguments is critical.

Tool definitions

When you scaffold a project with eqho-eval init, tool definitions are generated automatically from your Eqho campaign's actions and saved in tools/<agent-slug>.json.

Preview what the LLM receives:

eqho-eval render
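The generated file follows the OpenAI function-calling tools schema (the same format that is-valid-openai-tools-call validates). A hypothetical tools/<agent-slug>.json for an agent with a create_appointment action might look like the sketch below; the actual tool names and parameters come from your campaign's actions:

```json
[
  {
    "type": "function",
    "function": {
      "name": "create_appointment",
      "description": "Book an appointment on the calendar",
      "parameters": {
        "type": "object",
        "properties": {
          "start": { "type": "string", "description": "ISO 8601 start time" }
        },
        "required": ["start"]
      }
    }
  }
]
```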

Basic tool call validation

Check that the output is a valid tool call:

assert:
  - type: is-valid-openai-tools-call

Check that specific tools are called:

assert:
  - type: tool-call-f1
    value: [create_appointment]

The tool-call-f1 assertion uses F1 scoring: it checks both precision (no unexpected tools are called) and recall (all expected tools are called).

Validate tool arguments

Use JavaScript assertions for fine-grained argument checking:

assert:
  - type: javascript
    value: |
      const calls = JSON.parse(output);
      const appt = calls.find(c => c.function?.name === 'create_appointment');
      if (!appt) return { pass: false, reason: 'No appointment tool call' };
      const args = JSON.parse(appt.function.arguments);
      if (!args.start) return { pass: false, reason: 'Missing start time' };
      return { pass: true };

Common tool validation patterns

Appointment scheduling

tests:
  - vars:
      message: "Can I schedule a demo for next Tuesday at 2pm?"
    assert:
      - type: is-valid-openai-tools-call
      - type: tool-call-f1
        value: [get_free_slots, create_appointment]
      - type: javascript
        value: |
          const calls = JSON.parse(output);
          return calls.some(c => c.function?.name === 'get_free_slots');

Data extraction (postcall)

tests:
  - vars:
      transcript: "The caller said their email is alice@example.com and they need 5 units."
    assert:
      - type: javascript
        value: |
          const calls = JSON.parse(output);
          const extract = calls.find(c => c.function?.name === 'extract_data');
          if (!extract) return { pass: false, reason: 'No extract_data tool call' };
          const args = JSON.parse(extract.function.arguments);
          return args.email === 'alice@example.com' && args.quantity === 5;

Call transfer

tests:
  - vars:
      message: "I need to speak to a manager right now"
    assert:
      - type: tool-call-f1
        value: [transfer_call]

No tool calls expected

tests:
  - vars:
      message: "What are your business hours?"
    assert:
      - type: javascript
        value: |
          // A plain-text reply (not a JSON array of tool calls) means no tools were invoked.
          try { JSON.parse(output); return false; }
          catch { return true; }

Generating tool call evals automatically

Use eqho-eval action-eval to generate eval configs from real call data:

eqho-eval action-eval --campaign <id> --calls 25

This pulls recent calls, extracts the tools that were actually used, and builds test cases that validate the same behavior.

Multi-provider tool call testing

Not all models handle tool calls equally well. The proxy translates tool call formats for non-OpenAI providers via the Vercel AI Gateway, but testing across providers helps identify gaps:

providers:
  - id: openai:chat:gpt-4.1
    label: GPT-4.1
    config:
      tools: file://tools/agent.json
  - id: openai:chat:anthropic/claude-sonnet-4-20250514
    label: Claude Sonnet
    config:
      tools: file://tools/agent.json
  - id: openai:chat:google/gemini-2.5-pro
    label: Gemini 2.5
    config:
      tools: file://tools/agent.json

Run eqho-eval providers list --tools-only to see which models support tool calling.