CI / CD

Automate evaluations in your CI pipeline to catch regressions before they reach production.

Generate a GitHub Actions workflow

eqho-eval ci
eqho-eval ci --threshold 80          # require 80%+ pass rate
eqho-eval ci -o .github/workflows    # output to workflows directory

This generates a workflow file that:

Installs eqho-eval and promptfoo
Authenticates using a repository secret
Runs evaluations
Fails the build if the pass rate drops below the threshold

Manual workflow setup

# .github/workflows/eval.yml
name: Agent Evaluation

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '22'

      - name: Install tools
        run: |
          npm i -g eqho-eval promptfoo

      - name: Authenticate
        run: eqho-eval auth --key ${{ secrets.EQHO_API_KEY }}

      - name: Run evaluation
        run: |
          cd my-eval
          eqho-eval eval --no-cache --json > results.json

      - name: Check pass rate
        run: |
          PASS_RATE=$(cat results.json | jq '.results.stats.successes / (.results.stats.successes + .results.stats.failures) * 100')
          echo "Pass rate: ${PASS_RATE}%"
          if (( $(echo "$PASS_RATE < 80" | bc -l) )); then
            echo "::error::Pass rate ${PASS_RATE}% is below 80% threshold"
            exit 1
          fi

Repository secrets

Add your Eqho API key as a GitHub repository secret:

Go to Settings → Secrets and variables → Actions
Click New repository secret
Name: EQHO_API_KEY
Value: your Eqho API key

Non-interactive mode

For CI environments, use --yes and environment variables to avoid interactive prompts:

export EQHO_API_KEY=your-key
eqho-eval start --yes --campaign <id>

Or if the project is already scaffolded:

eqho-eval auth --key $EQHO_API_KEY
eqho-eval eval --no-cache

Comparing results across runs

Use eqho-eval diff to compare evaluation results between branches or runs:

eqho-eval diff baseline-results.json pr-results.json --threshold 5

This highlights any test cases that changed from pass to fail (or vice versa) and flags regressions above the threshold.

Caching considerations

Use --no-cache in CI to ensure fresh results every run
In local development, caching speeds up iteration significantly
Run eqho-eval cache clear to reset the cache if you suspect stale results