Execution Records

Every run produces a single canonical artifact: an ExecutionRecord. It's a structured JSON file containing the complete execution timeline, inputs, outputs, and metadata.

The Artifact

When you run an agent, Paprika creates a JSON file:

plaintext
~/.paprika/traces/
├── abc123def456.json
├── xyz789abc123.json
└── ...

Each file is a complete ExecutionRecord. Open one:

bash
cat ~/.paprika/traces/abc123def456.json | jq .

Example (abbreviated):

json
{
  "schema_version": "1.0",
  "record_id": "abc123def456",
  "agent": {
    "name": "researcher",
    "version": null
  },
  "execution": {
    "started_at": "2024-01-15T14:32:10.123456Z",
    "ended_at": "2024-01-15T14:32:10.248456Z",
    "duration_ms": 125.0,
    "status": "success",
    "termination_reason": null
  },
  "policy": {
    "config": {
      "max_steps": 10,
      "max_tokens": 10000,
      "max_repeat_hashes": 3
    },
    "violation": null
  },
  "totals": {
    "step_count": 3,
    "llm_calls": 2,
    "tool_calls": 1,
    "total_tokens": 142,
    "prompt_tokens": 95,
    "completion_tokens": 47
  },
  "input": {},
  "output": {
    "question": "What is AI?",
    "search_result": "...",
    "summary": "..."
  },
  "error": null,
  "steps": [
    { "step_type": "llm_call", ... },
    { "step_type": "tool_call", ... },
    { "step_type": "llm_call", ... }
  ]
}
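
Since totals is derived from steps[], the aggregates can be cross-checked against the step list. A minimal sketch, assuming the field names shown in the example above (this is not a Paprika API):

```python
def recompute_totals(record: dict) -> dict:
    """Recompute aggregate counts from steps[] (field names as in the example record)."""
    llm_steps = [s for s in record["steps"] if s.get("step_type") == "llm_call"]
    tool_steps = [s for s in record["steps"] if s.get("step_type") == "tool_call"]
    usage = [s.get("token_usage") or {} for s in llm_steps]
    return {
        "step_count": len(record["steps"]),
        "llm_calls": len(llm_steps),
        "tool_calls": len(tool_steps),
        "total_tokens": sum(u.get("total_tokens", 0) for u in usage),
    }
```

If the recomputed values disagree with the stored totals block, the record was likely truncated or hand-edited.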

Top-Level Fields

| Field | Type | Purpose |
|-------|------|---------|
| schema_version | string | Always "1.0" |
| record_id | string | Unique run identifier (UUID format) |
| parent_record_id | string \| null | For derived runs (future use) |
| replay_of | string \| null | If this is a replay, the original run's record_id |
| agent | object | Agent metadata: name, version |
| execution | object | Execution timeline and status |
| policy | object | Policy config snapshot and violation (if any) |
| totals | object | Aggregate counts: steps, tokens, calls |
| input | any | Original input to the agent |
| output | any | Final output from the agent |
| error | string \| null | Error message if execution failed |
| environment | object \| null | Environment metadata (reserved) |
| steps[] | array | Typed steps (LLM calls, tool calls, policy violations) |
| extensions | object | Reserved for future extensions |

Execution Status

execution.status is one of:

  • `"success"` — Agent completed normally
  • `"error"` — Agent raised an exception
  • `"policy_violation"` — A runtime policy was violated (agent halted mid-execution)
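
A status check is usually the first thing a triage script does. A hypothetical helper over a loaded record dict (field access follows the example record above; this is not a built-in Paprika function):

```python
def summarize(record: dict) -> str:
    """One-line summary of a run, keyed off execution.status."""
    status = record["execution"]["status"]
    if status == "success":
        return f"ok ({record['execution']['duration_ms']} ms)"
    if status == "policy_violation":
        violation = record["policy"]["violation"] or {}
        return f"halted by policy: {violation.get('policy_name', 'unknown')}"
    # Remaining case: status == "error"
    return f"error: {record['error']}"
```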

Steps

The steps[] array contains typed execution steps. Each step has:

json
{
  "step_type": "llm_call" | "tool_call" | "policy_violation",
  "step_index": 0,
  "timestamp": "2024-01-15T14:32:10.130000Z",
  "event_id": "uuid"
}

LLM Call Step

json
{
  "step_type": "llm_call",
  "step_index": 0,
  "timestamp": "2024-01-15T14:32:10.130000Z",
  "event_id": "abc123",
  "provider": "openai",
  "model": "gpt-4o",
  "input_data": {
    "messages": [
      { "role": "user", "content": "Your prompt" }
    ]
  },
  "input_hash": "a1b2c3d4e5f60718",
  "output_data": {
    "choices": [
      {
        "message": {
          "role": "assistant",
          "content": "Response text"
        }
      }
    ]
  },
  "token_usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  },
  "duration_ms": 150.0,
  "side_effect": "pure",
  "error": null
}

Fields:

  • provider — LLM provider: "openai", "mock", custom
  • model — Model identifier
  • input_data — Full input dict (exactly what was passed to ctx.llm.call())
  • input_hash — Deterministic hash of input (for mismatch detection)
  • output_data — Full output dict from the LLM
  • token_usage — Token counts if available
  • duration_ms — Wall-clock duration
  • side_effect — Always "pure" (LLM calls have no side effects)
  • error — Error message if the call failed
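
Because input_data and output_data store the full request and response, a conversation transcript can be reconstructed from the llm_call steps alone. A sketch, assuming the OpenAI-style messages/choices shapes shown above:

```python
def transcript(steps: list[dict]) -> list[str]:
    """Flatten llm_call steps into 'role: content' lines (shapes as in the example step)."""
    lines = []
    for step in steps:
        if step.get("step_type") != "llm_call":
            continue
        # Prompt side: the messages sent to the provider.
        for msg in step["input_data"].get("messages", []):
            lines.append(f"{msg['role']}: {msg['content']}")
        # Response side: the assistant messages returned.
        for choice in step["output_data"].get("choices", []):
            msg = choice["message"]
            lines.append(f"{msg['role']}: {msg['content']}")
    return lines
```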

Tool Call Step

json
{
  "step_type": "tool_call",
  "step_index": 1,
  "timestamp": "2024-01-15T14:32:10.180000Z",
  "event_id": "def456",
  "tool_name": "search",
  "args": {
    "query": "AI trends"
  },
  "input_hash": "19f0e1d2c3b4a596",
  "output_data": "Search results...",
  "duration_ms": 45.0,
  "side_effect": null,
  "error": null
}

Fields:

  • tool_name — Name of the registered tool
  • args — Arguments dict (exactly what was passed to ctx.tools.call())
  • input_hash — Deterministic hash of args (for repeat detection)
  • output_data — Return value from the tool
  • duration_ms — Wall-clock duration
  • side_effect — Null (can be "read_only", "write", "irreversible" in future)
  • error — Error message if the tool raised an exception
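
Because each tool call stores its input_hash next to its output_data, a record can double as a lookup table of recorded results. A hedged sketch of that idea (not Paprika's built-in replay mechanism):

```python
def tool_cache(steps: list[dict]) -> dict[str, object]:
    """Map input_hash -> recorded output for successful tool_call steps."""
    return {
        s["input_hash"]: s["output_data"]
        for s in steps
        if s.get("step_type") == "tool_call" and s.get("error") is None
    }
```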

Policy Violation Step

json
{
  "step_type": "policy_violation",
  "step_index": 5,
  "timestamp": "2024-01-15T14:32:10.240000Z",
  "event_id": "ghi789",
  "policy_name": "max_steps",
  "message": "Maximum step count (10) exceeded",
  "details": {
    "limit": 10,
    "current": 11
  }
}

Fields:

  • policy_name — Name of violated policy: "max_steps", "max_tokens", "max_repeat_hashes"
  • message — Human-readable violation description
  • details — Policy-specific details (limit, current value, etc.)

When a policy violation occurs:

  • Execution halts immediately (remaining steps not executed)
  • execution.status = "policy_violation"
  • policy.violation contains the violation details
  • The PolicyViolationStep is added to steps[]
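
Locating the violating step in a halted record can be sketched as follows (field names as documented above; this is an illustrative helper, not part of Paprika):

```python
def find_violation(record: dict):
    """Return the PolicyViolationStep from steps[], or None if the run was not halted."""
    if record["execution"]["status"] != "policy_violation":
        return None
    for step in record["steps"]:
        if step.get("step_type") == "policy_violation":
            return step
    return None
```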

Input Hash

The input_hash field is critical for mismatch detection and repeat detection.

Algorithm:

  1. Take input dict (e.g., {"messages": [...]})
  2. Recursively sort all keys alphabetically
  3. Serialize to compact JSON (no whitespace, sorted keys)
  4. Compute SHA256 hash
  5. Take first 16 hex characters

Example:

python
input_dict = {"messages": [{"role": "user", "content": "hi"}]}
# Sorted and serialized: '{"messages":[{"content":"hi","role":"user"}]}'
# SHA256 hex digest (illustrative): 'a1b2c3d4e5f60718...'
# First 16 chars: 'a1b2c3d4e5f60718'
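
The five steps map directly onto the Python standard library. A sketch, assuming json.dumps with sorted keys and compact separators matches Paprika's canonical serialization:

```python
import hashlib
import json

def input_hash(data) -> str:
    """Sketch of the documented hashing scheme (canonical-JSON details assumed)."""
    # Steps 1-3: recursively sorted keys, compact JSON with no whitespace.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    # Steps 4-5: SHA256, truncated to the first 16 hex characters.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Note that json.dumps(..., sort_keys=True) sorts nested dict keys too, so key order in the input never affects the hash.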

The hash is deterministic:

  • Same input → same hash
  • Different input → different hash (with overwhelming probability)

Used for:

  • Replay mismatch detection — if replayed input hash ≠ original, ReplayMismatchError
  • Repeat detection — if same input hash appears too many times, max_repeat_hashes policy fires
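
Repeat detection can be illustrated over the recorded steps. A sketch; whether the policy fires at or only above the limit is an assumption here:

```python
from collections import Counter

def repeated_hashes(steps: list[dict], max_repeat_hashes: int = 3) -> list[str]:
    """Input hashes that occur max_repeat_hashes or more times across steps."""
    counts = Counter(s["input_hash"] for s in steps if "input_hash" in s)
    return [h for h, n in counts.items() if n >= max_repeat_hashes]
```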

Storage

Default Location

plaintext
~/.paprika/traces/

The directory is created automatically on first run.

Override Directory

Use environment variable:

bash
PAPRIKA_TRACE_DIR=/tmp/paprika python agent.py

Or CLI flag:

bash
paprika runs list --trace-dir /tmp/paprika
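
Resolving the effective trace directory can be sketched as follows (the exact resolution order inside Paprika, and how the CLI flag interacts with the environment variable, are assumptions):

```python
import os
from pathlib import Path

def trace_dir() -> Path:
    """Resolve the trace directory, honoring PAPRIKA_TRACE_DIR if set."""
    override = os.environ.get("PAPRIKA_TRACE_DIR")
    return Path(override) if override else Path.home() / ".paprika" / "traces"
```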

File Naming

Files are named by record_id:

plaintext
abc123def456.json

Run IDs must match the pattern: ^[A-Za-z0-9][A-Za-z0-9._-]*$

(Alphanumeric, dots, dashes, underscores; no path traversal risk)
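
The pattern can be checked with a compiled regex. A sketch using the documented pattern (the function name is illustrative):

```python
import re

RUN_ID_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")

def is_valid_run_id(run_id: str) -> bool:
    """True if run_id matches the documented pattern: alnum first char, then alnum/._- only."""
    return RUN_ID_RE.fullmatch(run_id) is not None
```

Requiring an alphanumeric first character rules out names like `.hidden` and `..`, and the absence of `/` from the character class rules out path separators entirely.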

Security

Path Traversal Prevention:

Run IDs are validated. You cannot escape the trace directory via a run ID like ../../../etc/passwd. Invalid run IDs raise InvalidRunIdError.

No Secrets Storage:

ExecutionRecord stores full inputs and outputs. If your LLM calls or tool calls include sensitive data (API keys, passwords, PII), they will be stored in the trace file. Do not log sensitive data to Paprika traces if you can avoid it. Use the input and output fields only for non-sensitive structured data.
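
If sensitive values must pass through an agent, one option is to scrub them before they reach inputs or outputs. A hypothetical helper; the key list and function name are illustrative, and Paprika provides no such built-in:

```python
# Keys treated as sensitive here are an illustrative choice, not a standard.
SENSITIVE_KEYS = {"api_key", "password", "authorization", "token"}

def redact(data):
    """Return a copy of data with values under sensitive-looking keys masked."""
    if isinstance(data, dict):
        return {
            k: "***" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [redact(v) for v in data]
    return data
```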

Versions

schema_version is always "1.0".

If Paprika updates the schema in a breaking way, the version number will increment (e.g., "2.0"). The code will migrate old traces automatically on load.
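
A loader that refuses records from an unknown major version can be sketched as follows (the major-version comparison is an assumed convention, not documented Paprika behavior):

```python
def assert_supported(record: dict) -> None:
    """Raise if the record's schema_version has an unsupported major version."""
    version = record.get("schema_version", "1.0")
    if version.split(".")[0] != "1":
        raise ValueError(f"Unsupported schema_version: {version}")
```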

Accessing Records Programmatically

python
from paprika import PaprikaRuntime

runtime = PaprikaRuntime()

# Load a record
record = runtime.trace_store.load_record(run_id="abc123def456")

# Access fields
print(record.record_id)
print(record.agent.name)
print(record.execution.status)
print(record.totals.step_count)

# Iterate steps
for step in record.steps:
    if step.step_type == "llm_call":
        print(f"LLM: {step.model} in {step.duration_ms}ms")
    elif step.step_type == "tool_call":
        print(f"Tool: {step.tool_name}")
    elif step.step_type == "policy_violation":
        print(f"Violation: {step.policy_name}")

# Serialize to JSON
json_string = record.model_dump_json_pretty()

Next Steps