Execution Records

Every run produces a single canonical artifact: an ExecutionRecord. It's a structured JSON file containing the complete execution timeline, inputs, outputs, and metadata.

The Artifact

When you run an agent, Paprika creates a JSON file:

plaintext
~/.paprika/traces/
├── abc123def456.json
├── xyz789abc123.json
└── ...

Each file is a complete ExecutionRecord. Open one:

bash
cat ~/.paprika/traces/abc123def456.json | jq .

Example (abbreviated):

json
{
  "schema_version": "1.0",
  "record_id": "abc123def456",
  "agent": {
    "name": "researcher",
    "version": null
  },
  "execution": {
    "started_at": "2024-01-15T14:32:10.123456Z",
    "ended_at": "2024-01-15T14:32:10.248456Z",
    "duration_ms": 125.0,
    "status": "success",
    "termination_reason": null
  },
  "policy": {
    "config": {
      "max_steps": 10,
      "max_tokens": 10000,
      "max_repeat_hashes": 3
    },
    "violation": null
  },
  "totals": {
    "step_count": 3,
    "llm_calls": 2,
    "tool_calls": 1,
    "total_tokens": 142,
    "prompt_tokens": 95,
    "completion_tokens": 47
  },
  "input": {},
  "output": {
    "question": "What is AI?",
    "search_result": "...",
    "summary": "..."
  },
  "error": null,
  "steps": [
    { "step_type": "llm_call", ... },
    { "step_type": "tool_call", ... },
    { "step_type": "llm_call", ... }
  ]
}
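
Since totals is derived from steps[], the aggregates can be cross-checked against the step list. A minimal sketch, assuming the field names shown in the example above (this is not a Paprika API):

```python
def recompute_totals(record: dict) -> dict:
    """Recompute aggregate counts from steps[] (field names as in the example record)."""
    llm_steps = [s for s in record["steps"] if s.get("step_type") == "llm_call"]
    tool_steps = [s for s in record["steps"] if s.get("step_type") == "tool_call"]
    usage = [s.get("token_usage") or {} for s in llm_steps]
    return {
        "step_count": len(record["steps"]),
        "llm_calls": len(llm_steps),
        "tool_calls": len(tool_steps),
        "total_tokens": sum(u.get("total_tokens", 0) for u in usage),
    }
```

If the recomputed values disagree with the stored totals block, the record was likely truncated or hand-edited.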

Top-Level Fields

| Field | Type | Purpose |
|-------|------|---------|
| schema_version | string | Always "1.0" |
| record_id | string | Unique run identifier (UUID format) |
| parent_record_id | string \| null | For derived runs (future use) |
| replay_of | string \| null | If this is a replay, the original run's record_id |
| agent | object | Agent metadata: name, version |
| execution | object | Execution timeline and status |
| policy | object | Policy config snapshot and violation (if any) |
| totals | object | Aggregate counts: steps, tokens, calls |
| input | any | Original input to the agent |
| output | any | Final output from the agent |
| error | string \| null | Error message if execution failed |
| environment | object \| null | Environment metadata (reserved) |
| steps[] | array | Typed steps (LLM calls, tool calls, policy violations) |
| extensions | object | Reserved for future extensions |

Execution Status

execution.status is one of:

  • `"success"` — Agent completed normally
  • `"error"` — Agent raised an exception
  • `"policy_violation"` — A runtime policy was violated (agent halted mid-execution)
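
A status check is usually the first thing a triage script does. A hypothetical helper over a loaded record dict (field access follows the example record above; this is not a built-in Paprika function):

```python
def summarize(record: dict) -> str:
    """One-line summary of a run, keyed off execution.status."""
    status = record["execution"]["status"]
    if status == "success":
        return f"ok ({record['execution']['duration_ms']} ms)"
    if status == "policy_violation":
        violation = record["policy"]["violation"] or {}
        return f"halted by policy: {violation.get('policy_name', 'unknown')}"
    # Remaining case: status == "error"
    return f"error: {record['error']}"
```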

Steps

The steps[] array contains typed execution steps. Each step has:

json
{
  "step_type": "llm_call" | "tool_call" | "policy_violation",
  "step_index": 0,
  "timestamp": "2024-01-15T14:32:10.130000Z",
  "event_id": "uuid"
}

LLM Call Step

json
{
  "step_type": "llm_call",
  "step_index": 0,
  "timestamp": "2024-01-15T14:32:10.130000Z",
  "event_id": "abc123",
  "provider": "openai",
  "model": "gpt-4o",
  "input_data": {
    "messages": [
      { "role": "user", "content": "Your prompt" }
    ]
  },
  "input_hash": "a1b2c3d4e5f60718",
  "output_data": {
    "choices": [
      {
        "message": {
          "role": "assistant",
          "content": "Response text"
        }
      }
    ]
  },
  "token_usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  },
  "duration_ms": 150.0,
  "side_effect": "pure",
  "error": null
}

Fields:

  • provider — LLM provider: "openai", "mock", custom
  • model — Model identifier
  • input_data — Full input dict (exactly what was passed to ctx.llm.call())
  • input_hash — Deterministic hash of input (for mismatch detection)
  • output_data — Full output dict from the LLM
  • token_usage — Token counts if available
  • duration_ms — Wall-clock duration
  • side_effect — Always "pure" (LLM calls have no side effects)
  • error — Error message if the call failed
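
Because input_data and output_data store the full request and response, a conversation transcript can be reconstructed from the llm_call steps alone. A sketch, assuming the OpenAI-style messages/choices shapes shown above:

```python
def transcript(steps: list[dict]) -> list[str]:
    """Flatten llm_call steps into 'role: content' lines (shapes as in the example step)."""
    lines = []
    for step in steps:
        if step.get("step_type") != "llm_call":
            continue
        # Prompt side: the messages sent to the provider.
        for msg in step["input_data"].get("messages", []):
            lines.append(f"{msg['role']}: {msg['content']}")
        # Response side: the assistant messages returned.
        for choice in step["output_data"].get("choices", []):
            msg = choice["message"]
            lines.append(f"{msg['role']}: {msg['content']}")
    return lines
```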

Tool Call Step

json
{
  "step_type": "tool_call",
  "step_index": 1,
  "timestamp": "2024-01-15T14:32:10.180000Z",
  "event_id": "def456",
  "tool_name": "search",
  "args": {
    "query": "AI trends"
  },
  "input_hash": "19f0e1d2c3b4a596",
  "output_data": "Search results...",
  "duration_ms": 45.0,
  "side_effect": null,
  "error": null
}

Fields:

  • tool_name — Name of the registered tool
  • args — Arguments dict (exactly what was passed to ctx.tools.call())
  • input_hash — Deterministic hash of args (for repeat detection)
  • output_data — Return value from the tool
  • duration_ms — Wall-clock duration
  • side_effect — Null (can be "read_only", "write", "irreversible" in future)
  • error — Error message if the tool raised an exception
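
Because each tool call stores its input_hash next to its output_data, a record can double as a lookup table of recorded results. A hedged sketch of that idea (not Paprika's built-in replay mechanism):

```python
def tool_cache(steps: list[dict]) -> dict[str, object]:
    """Map input_hash -> recorded output for successful tool_call steps."""
    return {
        s["input_hash"]: s["output_data"]
        for s in steps
        if s.get("step_type") == "tool_call" and s.get("error") is None
    }
```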

Policy Violation Step

json
{
  "step_type": "policy_violation",
  "step_index": 5,
  "timestamp": "2024-01-15T14:32:10.240000Z",
  "event_id": "ghi789",
  "policy_name": "max_steps",
  "message": "Maximum step count (10) exceeded",
  "details": {
    "limit": 10,
    "current": 11
  }
}

Fields:

  • policy_name — Name of violated policy: "max_steps", "max_tokens", "max_repeat_hashes"
  • message — Human-readable violation description
  • details — Policy-specific details (limit, current value, etc.)

When a policy violation occurs:

  • Execution halts immediately (remaining steps not executed)
  • execution.status = "policy_violation"
  • policy.violation contains the violation details
  • The PolicyViolationStep is added to steps[]
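
Locating the violating step in a halted record can be sketched as follows (field names as documented above; this is an illustrative helper, not part of Paprika):

```python
def find_violation(record: dict):
    """Return the PolicyViolationStep from steps[], or None if the run was not halted."""
    if record["execution"]["status"] != "policy_violation":
        return None
    for step in record["steps"]:
        if step.get("step_type") == "policy_violation":
            return step
    return None
```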

Input Hash

The input_hash field is critical for mismatch detection and repeat detection.

Algorithm:

  1. Take input dict (e.g., {"messages": [...]})
  2. Recursively sort all keys alphabetically
  3. Serialize to compact JSON (no whitespace, sorted keys)
  4. Compute SHA256 hash
  5. Take first 16 hex characters

Example:

python
input_dict = {"messages": [{"role": "user", "content": "hi"}]}
# Sorted and serialized: '{"messages":[{"content":"hi","role":"user"}]}'
# SHA256 hex digest (illustrative): 'a1b2c3d4e5f60718...'
# First 16 chars: 'a1b2c3d4e5f60718'
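
The five steps map directly onto the Python standard library. A sketch, assuming json.dumps with sorted keys and compact separators matches Paprika's canonical serialization:

```python
import hashlib
import json

def input_hash(data) -> str:
    """Sketch of the documented hashing scheme (canonical-JSON details assumed)."""
    # Steps 1-3: recursively sorted keys, compact JSON with no whitespace.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    # Steps 4-5: SHA256, truncated to the first 16 hex characters.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Note that json.dumps(..., sort_keys=True) sorts nested dict keys too, so key order in the input never affects the hash.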

The hash is deterministic:

  • Same input → same hash
  • Different input → different hash (with overwhelming probability)

Used for:

  • Replay mismatch detection — if replayed input hash ≠ original, ReplayMismatchError
  • Repeat detection — if same input hash appears too many times, max_repeat_hashes policy fires
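
Repeat detection can be illustrated over the recorded steps. A sketch; whether the policy fires at or only above the limit is an assumption here:

```python
from collections import Counter

def repeated_hashes(steps: list[dict], max_repeat_hashes: int = 3) -> list[str]:
    """Input hashes that occur max_repeat_hashes or more times across steps."""
    counts = Counter(s["input_hash"] for s in steps if "input_hash" in s)
    return [h for h, n in counts.items() if n >= max_repeat_hashes]
```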

Storage

Default Location

plaintext
~/.paprika/traces/

The directory is created automatically on first run.

Override Directory

Use environment variable:

bash
PAPRIKA_TRACE_DIR=/tmp/paprika python agent.py

Or CLI flag:

bash
paprika runs list --trace-dir /tmp/paprika
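
Resolving the effective trace directory can be sketched as follows (the exact resolution order inside Paprika, and how the CLI flag interacts with the environment variable, are assumptions):

```python
import os
from pathlib import Path

def trace_dir() -> Path:
    """Resolve the trace directory, honoring PAPRIKA_TRACE_DIR if set."""
    override = os.environ.get("PAPRIKA_TRACE_DIR")
    return Path(override) if override else Path.home() / ".paprika" / "traces"
```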

File Naming

Files are named by record_id:

plaintext
abc123def456.json

Run IDs must match the pattern: ^[A-Za-z0-9][A-Za-z0-9._-]*$

(Alphanumeric, dots, dashes, underscores; no path traversal risk)
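
The pattern can be checked with a compiled regex. A sketch using the documented pattern (the function name is illustrative):

```python
import re

RUN_ID_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")

def is_valid_run_id(run_id: str) -> bool:
    """True if run_id matches the documented pattern: alnum first char, then alnum/._- only."""
    return RUN_ID_RE.fullmatch(run_id) is not None
```

Requiring an alphanumeric first character rules out names like `.hidden` and `..`, and the absence of `/` from the character class rules out path separators entirely.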

Security

Path Traversal Prevention:

Run IDs are validated. You cannot escape the trace directory via a run ID like ../../../etc/passwd. Invalid run IDs raise InvalidRunIdError.

No Secrets Storage:

ExecutionRecord stores full inputs and outputs. If your LLM calls or tool calls include sensitive data (API keys, passwords, PII), they will be stored in the trace file. Do not log sensitive data to Paprika traces if you can avoid it. Use the input and output fields only for non-sensitive structured data.
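
If sensitive values must pass through an agent, one option is to scrub them before they reach inputs or outputs. A hypothetical helper; the key list and function name are illustrative, and Paprika provides no such built-in:

```python
# Keys treated as sensitive here are an illustrative choice, not a standard.
SENSITIVE_KEYS = {"api_key", "password", "authorization", "token"}

def redact(data):
    """Return a copy of data with values under sensitive-looking keys masked."""
    if isinstance(data, dict):
        return {
            k: "***" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [redact(v) for v in data]
    return data
```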

Versions

schema_version is always "1.0".

If Paprika updates the schema in a breaking way, the version number will increment (e.g., "2.0"). The code will migrate old traces automatically on load.
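
A loader that refuses records from an unknown major version can be sketched as follows (the major-version comparison is an assumed convention, not documented Paprika behavior):

```python
def assert_supported(record: dict) -> None:
    """Raise if the record's schema_version has an unsupported major version."""
    version = record.get("schema_version", "1.0")
    if version.split(".")[0] != "1":
        raise ValueError(f"Unsupported schema_version: {version}")
```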

Accessing Records Programmatically

python
from paprika import PaprikaRuntime

runtime = PaprikaRuntime()

# Load a record
record = runtime.trace_store.load_record(run_id="abc123def456")

# Access fields
print(record.record_id)
print(record.agent.name)
print(record.execution.status)
print(record.totals.step_count)

# Iterate steps
for step in record.steps:
    if step.step_type == "llm_call":
        print(f"LLM: {step.model} in {step.duration_ms}ms")
    elif step.step_type == "tool_call":
        print(f"Tool: {step.tool_name}")
    elif step.step_type == "policy_violation":
        print(f"Violation: {step.policy_name}")

# Serialize to JSON
json_string = record.model_dump_json_pretty()

Next Steps