Execution Traces & Self-Enhancement

Traces are the observability layer for Grace. Every chain execution produces a structured trace recording what ran, how well it performed, and what changed. Over time, traces enable Grace to detect regressions, identify high-performing units, and feed data to automated refinement loops.

Purpose

Traces serve four strategic functions:

Auditability: answer exactly which prompt ran, at which version, with which inputs and outputs
Quality tracking: measure each unit’s performance over time
Regression detection: flag when a unit’s quality degrades unexpectedly
Self-improvement: feed traces into DSPy MIPROv2 to refine prompts automatically (planned)

Trace Format

The trace format is aligned with SPEC-05 and with the IRProgram interface from @stratt/ir, so compiled chains and their execution records share a compatible structure.

Chain-Level Trace

Each execution produces a top-level trace record:

trace_id: "tr-01aryz6s41tsqt0yp91qsydysq"
chain_uri: "stratt://dev/chain/sol-1-boot@0.1.0"
version: "0.1.0"
fingerprint: "blake3:a1b2c3d4e5f6..."
session_id: "sess-2026-04-01T10:00:00Z"
council: "pathfinder"
started_at: "2026-04-01T10:00:00Z"
completed_at: "2026-04-01T10:05:23Z"
duration_ms: 323000
status: "completed"  # completed | failed | gated | timeout
quality_score: 0.82  # 0.0–1.0, heuristic-based
token_counts:
  input: 4200
  output: 1850
  total: 6050
steps: []  # see Step-Level Trace below
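Several fields in this record depend on each other. duration_ms is the gap between started_at and completed_at, and token_counts.total is input plus output. The minimal Python sketch below checks these invariants. It assumes the record has already been loaded into a dict, for example by a YAML parser. The function name check_chain_trace is hypothetical.

```python
from datetime import datetime
from typing import List

# Allowed chain-level statuses, as listed in the trace format above.
CHAIN_STATUSES = {"completed", "failed", "gated", "timeout"}


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_chain_trace(trace: dict) -> List[str]:
    """Return invariant violations for a chain-level trace (empty list if consistent)."""
    problems = []
    elapsed = parse_ts(trace["completed_at"]) - parse_ts(trace["started_at"])
    if round(elapsed.total_seconds() * 1000) != trace["duration_ms"]:
        problems.append("duration_ms does not match timestamps")
    tokens = trace["token_counts"]
    if tokens["input"] + tokens["output"] != tokens["total"]:
        problems.append("token_counts.total != input + output")
    if trace["status"] not in CHAIN_STATUSES:
        problems.append(f"unknown status {trace['status']!r}")
    if not 0.0 <= trace["quality_score"] <= 1.0:
        problems.append("quality_score outside 0.0-1.0")
    return problems
```

The record above passes. 10:00:00 to 10:05:23 is 323 seconds, which matches duration_ms: 323000, and 4200 + 1850 = 6050.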

Step-Level Trace

Each step in the chain produces its own trace entry:

- step_id: "step-01"
  unit_uri: "stratt://dev/task/intake-parse@0.1.0"
  unit_fingerprint: "blake3:a1b2c3d4e5f6..."
  agent: "WATNEY-01"
  gate: false
  started_at: "2026-04-01T10:00:00Z"
  completed_at: "2026-04-01T10:01:12Z"
  duration_ms: 72000
  status: "completed"  # completed | failed | retry_exhausted
  input_hash: "blake3:..."    # hash of actual inputs (for privacy)
  output_hash: "blake3:..."   # hash of actual outputs
  token_counts:
    input: 800
    output: 350
  quality_indicators:
    contract_conformance: true   # output matches declared types
    completeness: 0.9            # share of required outputs present and non-empty
    token_efficiency: 0.75       # 0.0–1.0, derived from useful output tokens vs. input tokens
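input_hash and output_hash are only useful for comparing runs if equal inputs always produce the same hash. One way to guarantee that is to serialize structured data canonically before hashing. The sketch below assumes JSON-serializable inputs. It uses the standard library's BLAKE2b as a stand-in, because Python's hashlib has no BLAKE3; the third-party blake3 package would be substituted to produce the blake3: prefix shown above. content_hash is a hypothetical helper name.

```python
import hashlib
import json
from typing import Any


def canonical_bytes(value: Any) -> bytes:
    # Sorted keys and compact separators make equal data serialize to identical bytes.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def content_hash(value: Any) -> str:
    # Stand-in: BLAKE2b with a 32-byte digest from the standard library, not BLAKE3.
    digest = hashlib.blake2b(canonical_bytes(value), digest_size=32).hexdigest()
    return f"blake2b:{digest}"
```

With canonical serialization, `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` hash identically, so key order in a dict never shows up as a spurious input change.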

Gate Resolution Trace

Gate steps (checkpoints requiring human approval) produce audit records:

- step_id: "step-06"
  unit_uri: "stratt://dev/gate/boot-review@0.1.0"
  agent: "LEWIS-06"
  gate: true
  gate_resolution:
    state: "APPROVED"  # APPROVED | REJECTED | TIMEOUT | ESCALATED
    resolved_by: "LEWIS-06"
    resolved_at: "2026-04-01T10:04:50Z"
    wait_duration_ms: 180000
    reason: null  # populated on REJECTED (why human rejected)
    packet_hash: "blake3:..."  # hash of the unit output being gated
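Gate records are audit evidence, so it helps to reject malformed ones early. The sketch below checks a step against the states and rules listed above. It assumes a REJECTED gate with no reason counts as an error, based on the note that reason is "populated on REJECTED". check_gate_step is a hypothetical name.

```python
from typing import List

# Gate states listed in the trace format above.
GATE_STATES = {"APPROVED", "REJECTED", "TIMEOUT", "ESCALATED"}


def check_gate_step(step: dict) -> List[str]:
    """Return problems with a gate step's audit record (empty list if none)."""
    if not step.get("gate"):
        return []  # only gate steps are checked here
    resolution = step.get("gate_resolution")
    if resolution is None:
        return ["gate step has no gate_resolution"]
    problems = []
    state = resolution.get("state")
    if state not in GATE_STATES:
        problems.append(f"unknown gate state {state!r}")
    # Assumption: a rejection must record why the reviewer rejected it.
    if state == "REJECTED" and not resolution.get("reason"):
        problems.append("REJECTED gate without a reason")
    if resolution.get("wait_duration_ms", 0) < 0:
        problems.append("negative wait_duration_ms")
    return problems
```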

Quality Scoring

A pluggable quality scorer evaluates each execution. The default heuristic uses three factors:

Factor	Weight	Measurement
Contract conformance	40%	Output matches declared types and constraints
Completeness	35%	All required outputs present and non-empty
Token efficiency	25%	Ratio of useful output tokens to input tokens

Formula:

quality_score = (conformance × 0.40) + (completeness × 0.35) + (efficiency × 0.25)
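The formula translates directly into code. The sketch below uses the weights from the table above. It assumes the boolean contract_conformance flag counts as 1.0 when true and 0.0 when false, since the formula multiplies it by a weight. The function name is hypothetical, and it stands in for the pluggable default scorer rather than being its actual implementation.

```python
WEIGHTS = {"conformance": 0.40, "completeness": 0.35, "efficiency": 0.25}


def quality_score(contract_conformance: bool, completeness: float, token_efficiency: float) -> float:
    # Assumption: the boolean conformance flag maps to 1.0 (true) or 0.0 (false).
    conformance = 1.0 if contract_conformance else 0.0
    for name, value in (("completeness", completeness), ("token_efficiency", token_efficiency)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")
    return (
        conformance * WEIGHTS["conformance"]
        + completeness * WEIGHTS["completeness"]
        + token_efficiency * WEIGHTS["efficiency"]
    )


# Step-01 indicators from the step-level trace: 0.40 + 0.315 + 0.1875
print(round(quality_score(True, 0.9, 0.75), 4))  # 0.9025
```

Because conformance carries 40% of the weight, a step that fails its contract can score at most 0.60, which is the lower edge of the Review band.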

Score Interpretation

Range	Meaning	Action
≥ 0.80	✅ Good	No action; unit performing well
0.60–0.79	⚠️ Review	Flag for manual inspection; check for false negatives
< 0.60	❌ Poor	Trigger refinement cycle; regenerate prompts
Drop > 0.05 between versions (delta < −0.05)	⛔ Regression	Auto-trigger Veritas gate; block publish until approved
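The bands above can be expressed as a small classifier. This sketch assumes that a regression against the previous version takes precedence over the absolute band. For example, a unit at 0.81 that dropped from 0.90 is treated as a regression, not as "good". interpret is a hypothetical name.

```python
from typing import Optional

REGRESSION_THRESHOLD = 0.05


def interpret(score: float, previous_score: Optional[float] = None) -> str:
    """Map a quality score (and optional previous-version score) to an action band."""
    if previous_score is not None:
        # Round so float artifacts do not flip results at the exact 0.05 boundary.
        delta = round(score - previous_score, 4)
        if delta < -REGRESSION_THRESHOLD:
            return "regression"  # auto-trigger Veritas gate; block publish
    if score >= 0.80:
        return "good"            # no action
    if score >= 0.60:
        return "review"          # flag for manual inspection
    return "poor"                # trigger refinement cycle
```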

Trace Storage

Location: automation/traces/YYYY-MM-DD-{chain-slug}.yaml

Gitignore policy: Traces contain prompt/response content and are NOT committed to version control

Retention policy:

Local storage: 90 days (rolling window)
Archive to Cloudflare R2: Long-term retention (planned)
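The 90-day rolling window can be enforced from the filename alone, because every trace file starts with its date. The sketch below assumes retention is measured from that filename date rather than from the file's modification time, and that a file dated exactly 90 days ago is kept. expired_traces is a hypothetical helper.

```python
import re
from datetime import date, timedelta
from typing import Iterable, List

RETENTION_DAYS = 90
# Trace files are named YYYY-MM-DD-{chain-slug}.yaml
TRACE_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.yaml$")


def expired_traces(names: Iterable[str], today: date, retention_days: int = RETENTION_DAYS) -> List[str]:
    """Return trace filenames whose date falls before the rolling retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        match = TRACE_NAME.match(name)
        if not match:
            continue  # ignore files that do not follow the naming scheme
        try:
            file_date = date(int(match.group(1)), int(match.group(2)), int(match.group(3)))
        except ValueError:
            continue  # e.g. a month of 13
        if file_date < cutoff:
            expired.append(name)
    return expired
```

In practice the names would come from something like `[p.name for p in Path("automation/traces").iterdir()]`. The expired files could then be archived (once R2 archiving exists) or deleted.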

Sampling strategy:

Default: Trace all executions
At 100+ runs/day: Sample at 0.2 rate (20%), but always trace:
- Failures (status: failed)
- Gate resolutions (gate_resolution.state != APPROVED)
- Quality scores < 0.60
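Failure status, gate outcome, and quality score are only known once a run finishes. The sampling decision therefore has to happen at the end: buffer every trace, then decide whether to persist it. The sketch below applies the rules above and assumes "100+ runs/day" means 100 or more runs so far that day. should_persist is a hypothetical name, and the rng parameter exists so the random choice can be made deterministic in tests.

```python
import random
from typing import Optional

HIGH_VOLUME_RUNS_PER_DAY = 100
SAMPLE_RATE = 0.2


def should_persist(trace: dict, runs_today: int, rng: Optional[random.Random] = None) -> bool:
    """Decide whether a finished trace is kept under the sampling strategy."""
    if runs_today < HIGH_VOLUME_RUNS_PER_DAY:
        return True  # default: trace all executions
    # Always keep failures, low scores, and gate resolutions other than APPROVED.
    if trace.get("status") == "failed":
        return True
    if trace.get("quality_score", 1.0) < 0.60:
        return True
    for step in trace.get("steps", []):
        resolution = step.get("gate_resolution")
        if resolution and resolution.get("state") != "APPROVED":
            return True
    # Otherwise keep a random 20% of traces.
    rng = rng or random.Random()
    return rng.random() < SAMPLE_RATE
```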

Regression Detection

When a unit is updated, compare its quality score against the previous version:

delta = new_version_score - old_version_score

if delta < -0.05:
  → Regression detected
  → Auto-trigger Veritas gate
  → Block publication until human approves or reverts change

Example:

Unit stratt://dev/task/parse@1.0.0 averaged 0.85 over 10 runs
Unit stratt://dev/task/parse@1.1.0 averages 0.79 over 5 runs
Delta = 0.79 − 0.85 = −0.06 (exceeds −0.05 threshold)
Action: Veritas gate blocks publication; human must decide to revert or approve
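The check above compares per-version averages. The sketch below computes it from raw per-run scores. The action label is a hypothetical stand-in for the Veritas gate trigger, not the real interface. The per-run lists in the usage line are illustrative values chosen to reproduce the 0.85 and 0.79 averages from the example. With only a handful of runs an average is noisy, so a caller may want to require a minimum run count before acting.

```python
from statistics import mean
from typing import List

REGRESSION_THRESHOLD = 0.05


def check_regression(old_scores: List[float], new_scores: List[float]) -> dict:
    """Compare mean quality of two unit versions and flag a drop beyond the threshold."""
    old_avg, new_avg = mean(old_scores), mean(new_scores)
    # Round so float noise does not decide cases at exactly -0.05.
    delta = round(new_avg - old_avg, 4)
    regressed = delta < -REGRESSION_THRESHOLD
    return {
        "old_avg": old_avg,
        "new_avg": new_avg,
        "delta": delta,
        # Hypothetical action labels standing in for "trigger Veritas gate, block publish".
        "action": "block_publish_pending_veritas_gate" if regressed else "allow",
    }


# Illustrative: parse@1.0.0 averaging 0.85 over 10 runs, parse@1.1.0 averaging 0.79 over 5.
print(check_regression([0.85] * 10, [0.79] * 5)["delta"])  # -0.06
```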

Evaluation Cycle

Every Monday morning (as part of HEARTBEAT memory maintenance):

Collect: read all traces from the past week
Score: compute average quality per unit and per chain
Compare: check for regressions against the previous week
Identify: flag underperforming units (score < 0.60, or a drop of more than 0.05 from the previous week)
Refine: for flagged units, determine the action:
- Adjust prompt_body content
- Change agent assignment in the chain
- Modify step ordering or gate placement
- Tune gate approval thresholds
Log: write the evaluation summary to memory/YYYY-MM-DD.md
Validate: if a unit was modified, run `stratt validate` and `stratt fingerprint --verify`

This cycle lets Grace learn from execution data and catch quality degradation early.
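The Score, Compare, and Identify steps can be sketched as follows. The sketch assumes a unit's quality is the default weighted formula from Quality Scoring applied to each step's quality_indicators, averaged over the week. Steps without indicators, such as gate steps, are skipped. It keys on the full unit_uri including @version, so the comparison is between weeks for the same version; comparing across versions would require stripping the suffix. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def step_score(indicators: dict) -> float:
    # Default heuristic from Quality Scoring, applied per step (assumption).
    conformance = 1.0 if indicators["contract_conformance"] else 0.0
    return 0.40 * conformance + 0.35 * indicators["completeness"] + 0.25 * indicators["token_efficiency"]


def unit_averages(traces: List[dict]) -> Dict[str, float]:
    """Average step-level quality per unit_uri across a batch of chain traces."""
    scores = defaultdict(list)
    for trace in traces:
        for step in trace.get("steps", []):
            if "quality_indicators" in step:  # skip steps without indicators, e.g. gates
                scores[step["unit_uri"]].append(step_score(step["quality_indicators"]))
    return {uri: mean(values) for uri, values in scores.items()}


def flag_units(this_week: List[dict], last_week: List[dict]) -> Dict[str, List[str]]:
    """Map each flagged unit_uri to the reasons it was flagged."""
    current, previous = unit_averages(this_week), unit_averages(last_week)
    flagged = {}
    for uri, score in current.items():
        reasons = []
        if score < 0.60:
            reasons.append("underperforming")
        if uri in previous and round(score - previous[uri], 4) < -0.05:
            reasons.append("regression")
        if reasons:
            flagged[uri] = reasons
    return flagged
```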

DSPy Export Format

For future integration with DSPy MIPROv2, traces can be exported to JSONL and filtered by score, date, version, chain, and gate data.

Export Command

stratt export-dspy \
  --min-score 0.7 \
  --date-range 2026-03-01..2026-04-01 \
  --version-range 0.1.0..0.2.0 \
  --chain 'stratt://dev/chain/sol-1-boot@*' \
  --exclude-gates \
  > training-data.jsonl

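JSONL Format

Each line of the exported file holds one complete trace object. The record below is pretty-printed for readability; in the file it occupies a single line.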

{
  "trace_id": "tr-01aryz6s41tsqt0yp91qsydysq",
  "chain_uri": "stratt://dev/chain/sol-1-boot@0.1.0",
  "chain_version": "0.1.0",
  "quality_score": 0.82,
  "steps": [
    {
      "step_id": "step-01",
      "unit_uri": "stratt://dev/task/intake-parse@0.1.0",
      "agent": "WATNEY-01",
      "input": "raw user request...",
      "output": "parsed task object...",
      "quality_indicators": {
        "contract_conformance": true,
        "completeness": 0.9,
        "token_efficiency": 0.75
      }
    }
  ]
}
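Pretty-printed JSON like the record above must be collapsed to one line per record before it is valid JSONL. The sketch below shows the round trip using only the standard json module. to_jsonl and from_jsonl are hypothetical helper names. Parsing splits on "\n" only, because str.splitlines() would also split on characters such as U+2028, which json.dumps leaves unescaped when ensure_ascii=False.

```python
import json
from typing import Iterable, List


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize records as JSONL: one compact JSON object per line."""
    # json.dumps without indent emits a single line; newlines inside strings become "\n" escapes.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def from_jsonl(text: str) -> List[dict]:
    """Parse JSONL text back into records, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]
```

Writing the result to a file is then a single `Path("training-data.jsonl").write_text(to_jsonl(records), encoding="utf-8")`.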

Filtering Options

Option	Purpose	Example
`--min-score`	Only export high-quality traces	`--min-score 0.7`
`--date-range`	Date window for collection	`--date-range 2026-03-01..2026-04-01`
`--version-range`	Version window	`--version-range 0.1.0..0.2.0`
`--chain`	Specific chain only	`--chain 'stratt://dev/chain/boot@*'`
`--exclude-gates`	Omit gate resolution data	(useful for privacy)
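The filter semantics of `stratt export-dspy` are not spelled out here. The sketch below shows one way such filters could be applied to in-memory trace records, under these assumptions:

- The score threshold and both ranges are inclusive.
- `--chain` takes a shell-style wildcard.
- Versions are plain numeric `major.minor.patch`.
- `--exclude-gates` strips the gate_resolution field and leaves the step itself in place.

select_traces and parse_version are hypothetical names, not the CLI's internals.

```python
from datetime import date
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    # "0.1.0" -> (0, 1, 0); assumes numeric versions without pre-release tags.
    return tuple(int(part) for part in v.split("."))


def select_traces(
    traces: List[dict],
    min_score: Optional[float] = None,
    date_range: Optional[Tuple[date, date]] = None,
    version_range: Optional[Tuple[str, str]] = None,
    chain_pattern: Optional[str] = None,
    exclude_gates: bool = False,
) -> List[dict]:
    """Apply export filters to trace records; all bounds are inclusive (assumption)."""
    selected = []
    for trace in traces:
        if min_score is not None and trace["quality_score"] < min_score:
            continue
        if date_range is not None:
            day = date.fromisoformat(trace["started_at"][:10])
            if not date_range[0] <= day <= date_range[1]:
                continue
        if version_range is not None:
            low, high = (parse_version(v) for v in version_range)
            if not low <= parse_version(trace["version"]) <= high:
                continue
        if chain_pattern is not None and not fnmatchcase(trace["chain_uri"], chain_pattern):
            continue
        if exclude_gates:
            # Build new step dicts so the caller's records are not mutated.
            steps = [{k: v for k, v in s.items() if k != "gate_resolution"} for s in trace["steps"]]
            trace = {**trace, "steps": steps}
        selected.append(trace)
    return selected
```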

Self-Improvement Cycle

The full self-improvement loop (currently design-phase, BUNDLE-C target):

1. COLLECT
   Gather traces from past week
   ↓
2. ANALYZE
   Compute quality scores per unit
   Detect regressions (score drop > 0.05 from previous version)
   Identify underperformers (score < 0.60)
   ↓
3. EXPORT
   Run `stratt export-dspy` with filters
   Produce JSONL training dataset
   ↓
4. OPTIMIZE
   Feed JSONL into DSPy MIPROv2
   MIPROv2 searches over candidate instructions and few-shot demonstrations
   Produces candidate prompt revisions
   ↓
5. VALIDATE
   CI checks new prompts (contract conformance, fingerprint)
   Manual review (domain expertise check)
   ↓
6. PUBLISH
   Update unit with new prompt_body
   Increment minor version (0.1.0 → 0.2.0)
   Re-fingerprint + publish
   ↓
7. MONITOR
   Run new version in production
   Collect traces for next cycle
   Compare quality against previous version
   ↓
8. REPEAT
   Weekly evaluation, continuous improvement
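Step 6 bumps the minor version. Under Semantic Versioning, a minor bump resets the patch number to 0. The sketch below assumes plain `major.minor.patch` versions. It also rewrites the trailing @version of a stratt:// unit URI. Both helper names are hypothetical.

```python
def bump_minor(version: str) -> str:
    """0.1.0 -> 0.2.0; 1.4.7 -> 1.5.0 (patch resets to 0 on a minor bump)."""
    major, minor, _patch = (int(p) for p in version.split("."))
    return f"{major}.{minor + 1}.0"


def bump_unit_uri(uri: str) -> str:
    """Rewrite the trailing @version of a stratt:// URI with a minor bump."""
    base, sep, version = uri.rpartition("@")
    if not sep:
        raise ValueError(f"URI has no @version suffix: {uri}")
    return f"{base}@{bump_minor(version)}"
```

The new URI differs from the old one, so traces for the revised unit are recorded separately. That separation is what makes the step 7 comparison against the previous version possible.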

IRProgram Compatibility

The trace format is structurally compatible with @stratt/ir interfaces:

IRProgram Field	Trace Equivalent	Purpose
`chainUri`	`chain_uri`	Unique chain identifier
`version`	`version`	Version of chain executed
`meta.council`	`council`	Assigned council for this chain
`steps[].id`	`steps[].step_id`	Step sequence number
`steps[].unitUri`	`steps[].unit_uri`	Which unit ran
`steps[].agent`	`steps[].agent`	Which agent was assigned
`steps[].gate`	`steps[].gate`	Is this a gate checkpoint?
`steps[].inputs`	`steps[].input_hash`	Hashed inputs (for privacy)
`steps[].outputs`	`steps[].output_hash`	Hashed outputs (for privacy)
`edges`	(implicit)	Step ordering and gate dependencies
`failureModes`	`steps[].status`	Transition states (completed, failed, gated)

As a result, a compiled IRProgram can serve directly as the template for trace records during execution. Each field maps by a rename, with inputs and outputs replaced by their hashes, so no separate schema translation is needed.
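Following the mapping table, the sketch below pre-fills a trace record from a compiled program before execution starts. It assumes the IRProgram has been loaded into Python as a dict with the camelCase field names from the table. The real @stratt/ir interface is TypeScript and may carry more fields. Runtime values are left as None to be filled in as steps run. trace_skeleton is a hypothetical name.

```python
from typing import Any, Dict


def trace_skeleton(ir: Dict[str, Any], trace_id: str, session_id: str) -> Dict[str, Any]:
    """Pre-fill a trace record from an IRProgram (dict form), per the mapping table."""
    return {
        "trace_id": trace_id,
        "chain_uri": ir["chainUri"],
        "version": ir["version"],
        "session_id": session_id,
        "council": ir["meta"]["council"],
        # Runtime fields, filled in as execution proceeds.
        "started_at": None,
        "completed_at": None,
        "status": None,
        "steps": [
            {
                "step_id": step["id"],
                "unit_uri": step["unitUri"],
                "agent": step["agent"],
                "gate": step["gate"],
                "status": None,
                # Filled with hashes of the actual inputs/outputs at run time.
                "input_hash": None,
                "output_hash": None,
            }
            for step in ir["steps"]
        ],
    }
```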

Example Trace

Scenario: stratt://dev/chain/sol-1-boot@0.1.0 executes 3 steps: 2 task steps followed by 1 gate

trace_id: tr-2026-04-01-001
chain_uri: stratt://dev/chain/sol-1-boot@0.1.0
version: 0.1.0
fingerprint: blake3:9a8b7c6d5e4f3a2b1c0d9e8f7a6b5c4d3e2f1a0b9c8d7e6f5a4b3c2d1e0f
session_id: sess-2026-04-01T10:00:00Z
council: pathfinder
started_at: 2026-04-01T10:00:00Z
completed_at: 2026-04-01T10:05:23Z
duration_ms: 323000
status: completed
quality_score: 0.82
token_counts:
  input: 4200
  output: 1850
  total: 6050
steps:
  - step_id: step-01
    unit_uri: stratt://dev/task/intake-parse@0.1.0
    unit_fingerprint: blake3:a1...
    agent: WATNEY-01
    gate: false
    started_at: 2026-04-01T10:00:00Z
    completed_at: 2026-04-01T10:01:12Z
    duration_ms: 72000
    status: completed
    input_hash: blake3:8a...
    output_hash: blake3:2d...
    token_counts:
      input: 800
      output: 350
    quality_indicators:
      contract_conformance: true
      completeness: 0.9
      token_efficiency: 0.75

  - step_id: step-02
    unit_uri: stratt://dev/task/process-analysis@0.1.0
    unit_fingerprint: blake3:c3...
    agent: LEWIS-06
    gate: false
    started_at: 2026-04-01T10:01:13Z
    completed_at: 2026-04-01T10:04:45Z
    duration_ms: 212000
    status: completed
    input_hash: blake3:5b...
    output_hash: blake3:7f...
    token_counts:
      input: 3200
      output: 1200
    quality_indicators:
      contract_conformance: true
      completeness: 0.85
      token_efficiency: 0.68

  - step_id: step-03
    unit_uri: stratt://dev/gate/boot-review@0.1.0
    agent: LEWIS-06
    gate: true
    started_at: 2026-04-01T10:04:46Z
    completed_at: 2026-04-01T10:04:50Z
    duration_ms: 4000
    status: completed
    gate_resolution:
      state: APPROVED
      resolved_by: LEWIS-06
      resolved_at: 2026-04-01T10:04:50Z
      wait_duration_ms: 4000
      reason: null
      packet_hash: blake3:7f...