Execution Traces & Self-Enhancement
Traces are the observability layer for Grace. Every chain execution produces a structured trace recording what ran, how well it performed, and what changed. Over time, traces enable Grace to detect regressions, identify high-performing units, and feed data to automated refinement loops.
Purpose
Traces serve four strategic functions:
- Auditability — Answer: “exactly which prompt ran, at what version, with what inputs/outputs?”
- Quality tracking — Measure each unit’s performance over time
- Regression detection — Flag when a unit’s quality degrades unexpectedly
- Self-improvement — Feed traces into DSPy MIPROv2 to automatically refine prompts
Trace Format
The trace format is aligned with SPEC-05 and the IRProgram interface from @stratt/ir, ensuring structural compatibility between compiled chains and their execution records.
Chain-Level Trace
Each execution produces a top-level trace record:

    trace_id: "tr-01aryz6s41tsqt0yp91qsydysq"
    chain_uri: "stratt://dev/chain/sol-1-boot@0.1.0"
    version: "0.1.0"
    fingerprint: "blake3:a1b2c3d4e5f6..."
    session_id: "sess-2026-04-01T10:00:00Z"
    council: "pathfinder"
    started_at: "2026-04-01T10:00:00Z"
    completed_at: "2026-04-01T10:05:23Z"
    duration_ms: 323000
    status: "completed"        # completed | failed | gated | timeout
    quality_score: 0.82        # 0.0–1.0, heuristic-based
    token_counts:
      input: 4200
      output: 1850
      total: 6050
    steps: []                  # see Step-Level Trace below

Step-Level Trace
Each step in the chain produces its own trace entry:

    - step_id: "step-01"
      unit_uri: "stratt://dev/task/intake-parse@0.1.0"
      unit_fingerprint: "blake3:a1b2c3d4e5f6..."
      agent: "WATNEY-01"
      gate: false
      started_at: "2026-04-01T10:00:00Z"
      completed_at: "2026-04-01T10:01:12Z"
      duration_ms: 72000
      status: "completed"            # completed | failed | retry_exhausted
      input_hash: "blake3:..."       # hash of actual inputs (for privacy)
      output_hash: "blake3:..."      # hash of actual outputs
      token_counts:
        input: 800
        output: 350
      quality_indicators:
        contract_conformance: true   # output matches declared types
        completeness: 0.9            # all required fields present
        token_efficiency: 0.75       # output tokens / input tokens

Gate Resolution Trace
Gate steps (checkpoints requiring human approval) produce audit records:

    - step_id: "step-06"
      unit_uri: "stratt://dev/gate/boot-review@0.1.0"
      agent: "LEWIS-06"
      gate: true
      gate_resolution:
        state: "APPROVED"            # APPROVED | REJECTED | TIMEOUT | ESCALATED
        resolved_by: "LEWIS-06"
        resolved_at: "2026-04-01T10:04:50Z"
        wait_duration_ms: 180000
        reason: null                 # populated on REJECTED (why human rejected)
        packet_hash: "blake3:..."    # hash of the unit output being gated

Quality Scoring
A pluggable quality scorer evaluates each execution. The default heuristic uses three factors:
| Factor | Weight | Measurement |
|---|---|---|
| Contract conformance | 40% | Output matches declared types and constraints |
| Completeness | 35% | All required outputs present and non-empty |
| Token efficiency | 25% | Ratio of useful output tokens to input tokens |
Formula:

    quality_score = (conformance × 0.40) + (completeness × 0.35) + (efficiency × 0.25)

Score Interpretation
| Range | Meaning | Action |
|---|---|---|
| ≥ 0.80 | ✅ Good | No action; unit performing well |
| 0.60–0.79 | ⚠️ Review | Flag for manual inspection; check for false negatives |
| < 0.60 | ❌ Poor | Trigger refinement cycle; regenerate prompts |
| Drop > 0.05 between versions (delta < −0.05) | ⛔ Regression | Auto-trigger Veritas gate; block publish until approved |
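
The weighted formula and the score bands above can be sketched in Python as follows. This is a minimal illustration rather than the shipped scorer: the names `QualityIndicators`, `quality_score`, and `interpret` are hypothetical, and treating the boolean `contract_conformance` flag as 1.0 or 0.0 is an assumption.

```python
from dataclasses import dataclass

# Weights taken from the default heuristic table.
WEIGHTS = {"conformance": 0.40, "completeness": 0.35, "efficiency": 0.25}


@dataclass
class QualityIndicators:
    """Hypothetical container mirroring a step's quality_indicators block."""
    contract_conformance: bool
    completeness: float      # 0.0-1.0
    token_efficiency: float  # 0.0-1.0


def quality_score(q: QualityIndicators) -> float:
    """Weighted sum of the three factors, rounded to four decimals."""
    # Assumption: a boolean conformance flag counts as 1.0 or 0.0.
    conformance = 1.0 if q.contract_conformance else 0.0
    score = (
        conformance * WEIGHTS["conformance"]
        + q.completeness * WEIGHTS["completeness"]
        + q.token_efficiency * WEIGHTS["efficiency"]
    )
    return round(score, 4)


def interpret(score: float) -> str:
    """Map a score to the bands in the interpretation table."""
    if score >= 0.80:
        return "good"
    if score >= 0.60:
        return "review"
    return "poor"
```

With the indicators from the step-level example (conformance true, completeness 0.9, token efficiency 0.75), this sketch yields 0.9025, which falls in the "good" band.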
Trace Storage
Location: automation/traces/YYYY-MM-DD-{chain-slug}.yaml
Gitignore policy: Traces contain prompt/response content and are NOT committed to version control
Retention policy:
- Local storage: 90 days (rolling window)
- Archive to Cloudflare R2: Long-term retention (planned)
Sampling strategy:
- Default: Trace all executions
- At 100+ runs/day: Sample at 0.2 rate (20%), but always trace:
  - Failures (`status: failed`)
  - Gate resolutions (`gate_resolution.state != APPROVED`)
  - Quality scores < 0.60
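
The sampling rules above could be expressed as a single decision function. The sketch below is illustrative only: `should_trace` and its parameters are hypothetical names, and the trace is assumed to be a plain dictionary shaped like the chain-level record.

```python
import random
from typing import Optional

SAMPLE_RATE = 0.2        # 20% sampling at high volume
HIGH_VOLUME_RUNS = 100   # runs/day at which sampling starts
POOR_SCORE = 0.60


def should_trace(trace: dict, runs_today: int,
                 rng: Optional[random.Random] = None) -> bool:
    """Decide whether to persist a trace (hypothetical helper)."""
    if runs_today < HIGH_VOLUME_RUNS:
        return True  # default: trace all executions

    # Always keep failures, non-approved gate resolutions, and poor scores.
    if trace.get("status") == "failed":
        return True
    for step in trace.get("steps", []):
        resolution = step.get("gate_resolution")
        if resolution and resolution.get("state") != "APPROVED":
            return True
    if trace.get("quality_score", 1.0) < POOR_SCORE:
        return True

    # Everything else is sampled at the configured rate.
    rng = rng or random.Random()
    return rng.random() < SAMPLE_RATE
```

Passing in the random source keeps the decision reproducible in tests; a seeded `random.Random` gives repeatable sampling.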
Regression Detection
When a unit is updated, compare its quality score against the previous version:

    delta = new_version_score - old_version_score

    if delta < -0.05:
        → Regression detected
        → Auto-trigger Veritas gate
        → Block publication until human approves or reverts change

Example:
- Unit `stratt://dev/task/parse@1.0.0` averaged 0.85 over 10 runs
- Unit `stratt://dev/task/parse@1.1.0` averages 0.79 over 5 runs
- Delta = 0.79 − 0.85 = −0.06 (exceeds −0.05 threshold)
- Action: Veritas gate blocks publication; human must decide to revert or approve
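
The comparison in this example can be reproduced with a few lines of Python. This is a sketch under the assumption that each version's score is the plain mean of its per-run quality scores; `detect_regression` is a hypothetical name, not part of the stratt tooling.

```python
from statistics import mean

REGRESSION_THRESHOLD = -0.05


def detect_regression(old_scores, new_scores):
    """Return (delta, is_regression) for two lists of per-run scores."""
    delta = mean(new_scores) - mean(old_scores)
    # A regression is a drop larger than the threshold magnitude.
    return round(delta, 4), delta < REGRESSION_THRESHOLD


# Ten runs at 0.85 for the old version, five runs at 0.79 for the new one.
delta, regressed = detect_regression([0.85] * 10, [0.79] * 5)
print(delta, regressed)  # -0.06 True
```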
Evaluation Cycle
Every Monday morning (as part of HEARTBEAT memory maintenance):
- Collect — Read all traces from the past week
- Score — Compute average quality per unit and per chain
- Compare — Check for regressions against previous week
- Identify — Flag underperforming units (score < 0.60 or delta < −0.05)
- Refine — For flagged units, determine action:
  - Adjust `prompt_body` content
  - Change agent assignment in chain
  - Modify step ordering or gate placement
  - Tune gate approval thresholds
- Log — Write evaluation summary to `memory/YYYY-MM-DD.md`
- Validate — If a unit was modified, run `stratt validate` + `stratt fingerprint --verify`
This cycle ensures Grace continuously learns from execution data and proactively identifies degradation.
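
The Collect, Score, Compare, and Identify steps might look like the following at chain level. This is a rough sketch, not the actual HEARTBEAT job: `weekly_evaluation` is a hypothetical function, traces are assumed to be already loaded as dictionaries, and per-unit scoring is left out for brevity.

```python
from collections import defaultdict
from statistics import mean

POOR_SCORE = 0.60
REGRESSION_THRESHOLD = -0.05


def weekly_evaluation(traces, previous_averages):
    """Summarise one week of traces per chain and flag underperformers.

    traces: iterable of dicts with "chain_uri" and "quality_score".
    previous_averages: {chain_uri: last week's average score}.
    """
    scores = defaultdict(list)
    for trace in traces:
        scores[trace["chain_uri"]].append(trace["quality_score"])

    report = {}
    for uri, values in scores.items():
        average = mean(values)
        previous = previous_averages.get(uri)
        delta = None if previous is None else average - previous
        flagged = average < POOR_SCORE or (
            delta is not None and delta < REGRESSION_THRESHOLD
        )
        report[uri] = {
            "average": round(average, 4),
            "delta": None if delta is None else round(delta, 4),
            "flagged": flagged,
        }
    return report
```

The resulting report is the kind of summary that could be written to the weekly memory file; the Refine and Validate steps still require human or tool-driven follow-up.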
DSPy Export Format
For future integration with DSPy MIPROv2, traces export to JSONL format with flexible filtering.
Export Command

    # Quote the chain pattern so the shell does not try to expand the wildcard.
    stratt export-dspy \
      --min-score 0.7 \
      --date-range 2026-03-01..2026-04-01 \
      --version-range 0.1.0..0.2.0 \
      --chain 'stratt://dev/chain/sol-1-boot@*' \
      --exclude-gates \
      > training-data.jsonl

JSONL Format

    {
      "trace_id": "tr-01aryz6s41tsqt0yp91qsydysq",
      "chain_uri": "stratt://dev/chain/sol-1-boot@0.1.0",
      "chain_version": "0.1.0",
      "quality_score": 0.82,
      "steps": [
        {
          "step_id": "step-01",
          "unit_uri": "stratt://dev/task/intake-parse@0.1.0",
          "agent": "WATNEY-01",
          "input": "raw user request...",
          "output": "parsed task object...",
          "quality_indicators": {
            "contract_conformance": true,
            "completeness": 0.9,
            "token_efficiency": 0.75
          }
        }
      ]
    }

Filtering Options
| Option | Purpose | Example |
|---|---|---|
| `--min-score` | Only export high-quality traces | `--min-score 0.7` |
| `--date-range` | Date window for collection | `--date-range 2026-03-01..2026-04-01` |
| `--version-range` | Version window | `--version-range 0.1.0..0.2.0` |
| `--chain` | Specific chain only | `--chain stratt://dev/chain/boot@*` |
| `--exclude-gates` | Omit gate resolution data | (useful for privacy) |
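
To make the filter semantics concrete, here is a small Python sketch of how three of these options could behave. It is not the implementation of `stratt export-dspy`: the function names are hypothetical, the wildcard is matched with the standard-library `fnmatch` module, and `exclude_gates` is assumed to strip the `gate_resolution` block from each step.

```python
import json
from fnmatch import fnmatchcase


def filter_traces(traces, min_score=0.0, chain_pattern=None, exclude_gates=False):
    """Yield traces that pass the score and chain filters (illustrative)."""
    for trace in traces:
        if trace.get("quality_score", 0.0) < min_score:
            continue
        if chain_pattern and not fnmatchcase(trace.get("chain_uri", ""), chain_pattern):
            continue
        record = dict(trace)
        if exclude_gates:
            # Assumption: "omit gate resolution data" drops this key only.
            record["steps"] = [
                {k: v for k, v in step.items() if k != "gate_resolution"}
                for step in trace.get("steps", [])
            ]
        yield record


def to_jsonl(records) -> str:
    """Serialise records as JSON Lines: one compact JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)
```

Each exported record occupies exactly one line, which is what distinguishes JSONL from a pretty-printed JSON document.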
Self-Improvement Cycle
The full self-improvement loop (currently design-phase, BUNDLE-C target):

    1. COLLECT    Gather traces from past week
          ↓
    2. ANALYZE    Compute quality scores per unit
                  Detect regressions (delta < −0.05)
                  Identify underperformers (score < 0.60)
          ↓
    3. EXPORT     Run `stratt export-dspy` with filters
                  Produce JSONL training dataset
          ↓
    4. OPTIMIZE   Feed JSONL into DSPy MIPROv2
                  MIPROv2 auto-tunes prompt parameters
                  Produces candidate prompt revisions
          ↓
    5. VALIDATE   CI checks new prompts (contract conformance, fingerprint)
                  Manual review (domain expertise check)
          ↓
    6. PUBLISH    Update unit with new prompt_body
                  Increment minor version (0.1.0 → 0.2.0)
                  Re-fingerprint + publish
          ↓
    7. MONITOR    Run new version in production
                  Collect traces for next cycle
                  Compare quality against previous version
          ↓
    8. REPEAT     Weekly evaluation, continuous improvement

IRProgram Compatibility
The trace format is structurally compatible with @stratt/ir interfaces:
| IRProgram Field | Trace Equivalent | Purpose |
|---|---|---|
| `chainUri` | `chain_uri` | Unique chain identifier |
| `version` | `version` | Version of chain executed |
| `meta.council` | `council` | Assigned council for this chain |
| `steps[].id` | `steps[].step_id` | Step sequence number |
| `steps[].unitUri` | `steps[].unit_uri` | Which unit ran |
| `steps[].agent` | `steps[].agent` | Which agent was assigned |
| `steps[].gate` | `steps[].gate` | Is this a gate checkpoint? |
| `steps[].inputs` | `steps[].input_hash` | Hashed inputs (for privacy) |
| `steps[].outputs` | `steps[].output_hash` | Hashed outputs (for privacy) |
| `edges` | (implicit) | Step ordering and gate dependencies |
| `failureModes` | `steps[].status` | Transition states (completed, failed, gated) |
This means a compiled IRProgram can be directly used as the template for generating trace records during execution—no schema translation needed.
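
As an illustration of that mapping, the sketch below builds an empty trace skeleton from an IRProgram-like dictionary. The input shape is inferred from the table only; the real @stratt/ir interface may differ, and `trace_skeleton` is a hypothetical helper.

```python
def trace_skeleton(ir_program: dict) -> dict:
    """Derive an unfilled trace record from an IRProgram-shaped dict.

    Assumption: the program is a plain dict using the field names
    listed in the compatibility table.
    """
    return {
        "chain_uri": ir_program["chainUri"],
        "version": ir_program["version"],
        "council": ir_program.get("meta", {}).get("council"),
        "status": None,  # set when the chain finishes
        "steps": [
            {
                "step_id": step["id"],
                "unit_uri": step["unitUri"],
                "agent": step.get("agent"),
                "gate": bool(step.get("gate", False)),
                "input_hash": None,   # filled in at run time
                "output_hash": None,  # filled in at run time
                "status": None,
            }
            for step in ir_program.get("steps", [])
        ],
    }
```

Only the key names change between the two structures (camelCase to snake_case); run-time values such as hashes, timestamps, and status are filled in as each step executes.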
Example Trace
Scenario: stratt://dev/chain/sol-1-boot@0.1.0 executes with 3 steps + 1 gate

    trace_id: tr-2026-04-01-001
    chain_uri: stratt://dev/chain/sol-1-boot@0.1.0
    version: 0.1.0
    fingerprint: blake3:9a8b7c6d5e4f3a2b1c0d9e8f7a6b5c4d3e2f1a0b9c8d7e6f5a4b3c2d1e0f
    session_id: sess-2026-04-01T10:00:00Z
    council: pathfinder
    started_at: 2026-04-01T10:00:00Z
    completed_at: 2026-04-01T10:05:23Z
    duration_ms: 323000
    status: completed
    quality_score: 0.82
    token_counts:
      input: 4200
      output: 1850
      total: 6050
    steps:
      - step_id: step-01
        unit_uri: stratt://dev/task/intake-parse@0.1.0
        unit_fingerprint: blake3:a1...
        agent: WATNEY-01
        gate: false
        started_at: 2026-04-01T10:00:00Z
        completed_at: 2026-04-01T10:01:12Z
        duration_ms: 72000
        status: completed
        input_hash: blake3:8a...
        output_hash: blake3:2d...
        token_counts:
          input: 800
          output: 350
        quality_indicators:
          contract_conformance: true
          completeness: 0.9
          token_efficiency: 0.75

      - step_id: step-02
        unit_uri: stratt://dev/task/process-analysis@0.1.0
        unit_fingerprint: blake3:c3...
        agent: LEWIS-06
        gate: false
        started_at: 2026-04-01T10:01:13Z
        completed_at: 2026-04-01T10:04:45Z
        duration_ms: 212000
        status: completed
        input_hash: blake3:5b...
        output_hash: blake3:7f...
        token_counts:
          input: 3200
          output: 1200
        quality_indicators:
          contract_conformance: true
          completeness: 0.85
          token_efficiency: 0.68

      - step_id: step-03
        unit_uri: stratt://dev/gate/boot-review@0.1.0
        agent: LEWIS-06
        gate: true
        started_at: 2026-04-01T10:04:46Z
        completed_at: 2026-04-01T10:04:50Z
        duration_ms: 4000
        status: completed
        gate_resolution:
          state: APPROVED
          resolved_by: LEWIS-06
          resolved_at: 2026-04-01T10:04:50Z
          wait_duration_ms: 4000
          reason: null
          packet_hash: blake3:7f...