Savvi Studio

Platform Architecture Gap Analysis — 2026-04-01

Overview

This document maps current platform capabilities against requirements for autonomous agent workflows. It identifies blocking gaps (P0), critical gaps (P1), and quality-of-life gaps (P2) with implementation recommendations.

Phase 2A Status (2026-04-01): Gap 1 ✅ and Gap 2 ✅ are complete; four P0 gaps (Gaps 3-6) remain.


Part A: Current Capabilities Inventory

✅ Already Implemented

Type System & Schema Foundation

  • Aion framework with declarative module model
  • Zod-based contract enforcement (manifests, objects, diagnostics, envelopes)
  • Contract-aware TypeScript/Zod codegen
  • Module lifecycle hooks (preInstall, postInstall, preUninstall)

MCP Protocol Integration

  • 9 studio-mcp tools: 4 functional (list, inspect, validate, and the health diagnostic) plus 5 stubbed
  • Bootstrap runtime with envelope contracts
  • Deterministic request/response patterns
  • 5 stubbed tools (compile, bundle, codegen, schema, install_plan)

Observability Infrastructure (Partial)

  • studio-mcp observation tools: query_state, get_status, get_errors, list_changes
  • Trace correlation fields: trace_id, operation_id, emitted_at
  • Deterministic error codes (10 total)
  • 62 integration tests validating envelope contracts

Testing & Reproducibility

  • IntegrationFixtures: StudioInstanceTestBuilder with reproducible bootstrap
  • Contract test fixtures for router/db/workflow operations
  • Deterministic module lifecycle execution (ApplyResult/LifecycleObserverEvents)

API Layer

  • tRPC type-safe RPC with procedure-core framework
  • Router contracts (auth, graph, products, sponsors, tasks, teams)
  • TRPCError with deterministic codes

Database Layer

  • Phase 1 schema: BIGINT resources, RDF-style statements, SMALLINT permission bitmask
  • Predicate policies table with permission computation functions
  • Row-level security (RLS) policies
  • Soft-delete via deleted_at column

Part B: P0 Gaps (Blocking for AI-First)

Gap 1: Closed Feedback Loops ✅ COMPLETE

Implemented: ExecutionStep schema added to studio-mcp contracts; modules.query_state extended to return previousSteps; in-memory step cache wired in bootstrap runtime.

Implementation:

  • File: packages/studio-mcp/src/contracts/tools.ts: ExecutionStepSchema (stepId, index, toolName, inputs, outputs, durationMs, statusCode, emittedAt)
  • File: packages/studio-mcp/src/runtime/bootstrap-runtime.ts: recordExecutionStep(), getExecutionSteps(), per-traceId step cache
  • Tests: packages/studio-mcp/tests/integration/execution-steps.integration.test.ts — 4 tests passing

Agent Loop Pattern (enabled):

  1. Observe current state → query_state (returns previousSteps)
  2. Decide next tool call
  3. Execute tool (compile, codegen, etc.)
  4. Observe new state → query_state.previousSteps shows what ran
  5. Decide: continue or exit
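
The loop can be sketched end to end with an in-memory stand-in for the tool transport. Note that `decideNext` and the two-tool plan are hypothetical illustrations; only the `previousSteps` shape mirrors what query_state returns.

```typescript
// Sketch of the observe → decide → act loop. The executor is injected so
// the example stays self-contained; the real loop would call MCP tools.
interface Step { toolName: string; statusCode: number }
interface State { previousSteps: Step[] }

// Decide the next tool from what has already run; null means exit.
function decideNext(state: State): string | null {
  const ran = new Set(state.previousSteps.map((s) => s.toolName));
  if (!ran.has('compile')) return 'compile';
  if (!ran.has('codegen')) return 'codegen';
  return null;
}

function runAgentLoop(
  executeTool: (name: string) => Step,
  maxIterations = 10, // hard stop so a bad policy cannot loop forever
): State {
  const state: State = { previousSteps: [] };
  for (let i = 0; i < maxIterations; i++) {
    const next = decideNext(state); // 2. decide
    if (next === null) break;       // 5. exit
    const step = executeTool(next); // 3. execute
    state.previousSteps.push(step); // 1 + 4. observe: previousSteps grows
  }
  return state;
}
```

The loop exits once both `compile` and `codegen` appear in `previousSteps`, which is exactly the information query_state now exposes.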

Remaining: State delta diffing (filesystem/DB mutations per step) — deferred to Phase 2B.


Gap 2: Declarative Tool Contracts ✅ COMPLETE

Implemented: @savvi-studio/tool-contracts package shipped with full interface and test coverage.

Package: packages/tool-contracts/src/contracts.ts

Key exports:

// Semantic type registry
type SemanticType = 'ltree-path' | 'resource-id' | 'module-ref' | 'uuid'
                 | 'semver' | 'file-path' | 'json-object' | 'unknown';

// Predicate types
interface PreconditionPredicate { description, errorCode, evaluate(ctx) }
interface PostconditionPredicate { description, errorCode, evaluate(ctx) }

// Capability tags
type CapabilityTag = 'READ' | 'WRITE' | 'DELETE' | 'CREATE'
                  | 'EXECUTE' | 'ADMIN' | 'SCHEMA_MUTATION' | 'CODE_EXECUTION';

// Full tool contract
interface DeclaredToolContract {
  toolName, description, inputSchema, outputSchema,
  inputSemanticTypes?, outputSemanticTypes?,
  preconditions?, postconditions?,
  dependencies?, capabilities, resourceLimits?
}

// Dependency graph with cycle detection
interface ToolDependencyGraph { tools, dependencies, validate(): string[] }

Tests: 14 tests passing (packages/tool-contracts/src/contracts.test.ts)
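
For illustration, the kind of cycle detection that `validate(): string[]` implies can be done with a depth-first search and three-state marking. The flat `DependencyMap` shape below is a simplification, not the shipped graph structure.

```typescript
// DFS cycle detection with three states: unvisited, visiting (on the
// current path), and done. Reaching a 'visiting' node again is a cycle.
type DependencyMap = Record<string, string[]>; // tool → tools it depends on

function findCycles(deps: DependencyMap): string[] {
  const errors: string[] = [];
  const state = new Map<string, 'visiting' | 'done'>();

  function visit(tool: string, path: string[]): void {
    if (state.get(tool) === 'done') return;
    if (state.get(tool) === 'visiting') {
      errors.push(`cycle: ${[...path, tool].join(' -> ')}`);
      return;
    }
    state.set(tool, 'visiting');
    for (const dep of deps[tool] ?? []) visit(dep, [...path, tool]);
    state.set(tool, 'done');
  }

  for (const tool of Object.keys(deps)) visit(tool, []);
  return errors; // empty means acyclic, mirroring validate(): string[]
}
```

`findCycles({ compile: ['schema'], schema: ['compile'] })` reports one cycle; an acyclic map returns `[]`.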

Remaining: Wire DeclaredToolContract into studio-mcp tool handlers; add capability checks to bootstrap runtime.


Gap 3: Agent-Safe Primitives ❌

Problem: No guardrails for agentic code execution at scale. Cannot safely expose internal tools to multi-tenant agents.

Current State:

  • MCP tools defined but no resource limits
  • No rate limiting per user/agent
  • No rollback after partial failures (compile OK, codegen fails → stale schema state)
  • No sandbox/resource-bounded execution
  • No capability-based access control (all tools available to all callers)

Requirements for Agent Safety:

Tool Execution Sandbox:
  Input Validation
    ↓
  Capability Check (user allowed SCHEMA_MUTATION?)
    ↓
  Rate Limit (< 10 compilations/min?)
    ↓
  Resource Limit (timeout=30s, memory=512MB, output<10MB)
    ↓
  Execute in subprocess/worker
    ↓
  Capture stdout/stderr/exit code
    ↓
  Attestation (sign result with timestamp + signer)
    ↓
  Cleanup resources
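
The capability-check and rate-limit stages of this pipeline can be sketched as a guard in front of any tool handler. The roles, quotas, and key format below are illustrative placeholders, not the shipped RBAC matrix.

```typescript
// Sketch of the capability → rate-limit front half of the sandbox pipeline.
type Capability = 'READ' | 'WRITE' | 'SCHEMA_MUTATION';

// Hypothetical role matrix; the real one would come from the RBAC store.
const roleCapabilities: Record<string, Capability[]> = {
  viewer: ['READ'],
  builder: ['READ', 'WRITE', 'SCHEMA_MUTATION'],
};

class RateLimiter {
  private calls = new Map<string, number[]>();
  constructor(private maxPerMinute: number) {}

  allow(key: string, now = Date.now()): boolean {
    // Keep only timestamps inside the sliding one-minute window.
    const recent = (this.calls.get(key) ?? []).filter((t) => now - t < 60_000);
    if (recent.length >= this.maxPerMinute) return false;
    recent.push(now);
    this.calls.set(key, recent);
    return true;
  }
}

const limiter = new RateLimiter(10); // < 10 compilations/min, as above

function guardToolCall(
  role: string,
  required: Capability,
  rateKey: string, // e.g. `${userId}:${toolName}`
): { ok: boolean; reason?: string } {
  if (!(roleCapabilities[role] ?? []).includes(required)) {
    return { ok: false, reason: 'capability denied' };
  }
  if (!limiter.allow(rateKey)) return { ok: false, reason: 'rate limited' };
  return { ok: true };
}
```

A real SandboxExecutor would run the guarded handler in a worker with timeout and memory limits; this sketch only covers the admission decision.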

Missing Pieces:

  • ⚠️ Rate limit middleware for studio-mcp server
  • ⚠️ Resource limit enforcement (process resource quotas)
  • ⚠️ Capability-based RBAC matrix (user role → allowed tools)
  • ⚠️ Rollback capability (reverse applied changes)
  • ⚠️ Subprocess/worker execution (isolation)
  • ⚠️ Result attestation (signature + timestamp)
  • ⚠️ Multi-tenant policy enforcement

Implementation Approach:

  1. New package: @savvi-studio/tool-sandbox

    • ResourceLimits interface (cpu, memory, timeout, output size)
    • CapabilityCheck function (true/false based on user role + tool)
    • RateLimiter class (per-user, per-tool quotas)
    • SandboxExecutor using piscina/worker_threads
    • ResultAttestation struct (signature, timestamp, signer)
  2. Extend studio-mcp/server/stdio.ts

    • Before tool registration: apply RateLimitMiddleware
    • Before tool execution: capability check
    • Wrap tool execution in ResourceLimits
    • Add result attestation
    • Cleanup on completion
  3. Add CI gate: blocked tools in production (e.g., system.codegen.all)

Effort: High (3-4 weeks). Blocks: multi-tenant deployment.


Gap 4: Observability & Request Correlation ❌

Problem: No runtime tracing, request correlation, or decision audit. Cannot debug agent failures or prove reproducibility.

Current State:

  • OpenTelemetry deps installed but not instrumented
  • No trace/span emitters in runtime
  • No request-id threading through MCP→engine→DB
  • No decision logs (agent decisions not queryable)
  • No tool execution metrics (latency, error rates, cache hits)
  • Loki + Grafana installed but not configured/used

Requirements for Observability:

Request Flow with Trace/Span:
  Client {traceId}
    ↓ [MCP Server receives]
  studio-mcp.stdio {traceId, spanId=1}
    ↓ [MCP tool invocation]
  graph-module-engine {traceId, spanId=2}
    ↓ [Lifecycle execution]
  DB query {traceId, spanId=3}
    ↓ [Response back through stack]
  Loki {traceId, toolName, stepId, duration, status}
    ↓ [Grafana dashboard displays]
  Operator views: tool timeline, error heatmap, latency percentiles
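
Before the OpenTelemetry SDK is wired in, the essential mechanic — one traceId shared across layers, a fresh spanId per hop, a structured record per span — can be sketched without it. The `emitted` array stands in for the Loki sink; the real implementation would use OpenTelemetry context propagation instead.

```typescript
import { randomUUID } from 'node:crypto';

// Hand-rolled trace context for illustration only.
interface TraceContext { traceId: string; nextSpanId: number }
interface SpanRecord { traceId: string; spanId: number; name: string; durationMs: number }

const emitted: SpanRecord[] = []; // stand-in for the Loki sink

function withSpan<T>(ctx: TraceContext, name: string, fn: () => T): T {
  const spanId = ctx.nextSpanId++;
  const start = Date.now();
  try {
    return fn();
  } finally {
    // Innermost spans close (and are emitted) first.
    emitted.push({ traceId: ctx.traceId, spanId, name, durationMs: Date.now() - start });
  }
}

// One request flowing MCP → engine → DB, all sharing a single traceId.
const ctx: TraceContext = { traceId: randomUUID(), nextSpanId: 1 };
withSpan(ctx, 'studio-mcp.stdio', () =>
  withSpan(ctx, 'graph-module-engine', () =>
    withSpan(ctx, 'db.query', () => 'rows'),
  ),
);
```

After the request, `emitted` holds three records with one shared traceId and spanIds 1-3, which is the shape the Loki schema above wants per log line.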

Missing Pieces:

  • ⚠️ Tracer initialization in graph-module-engine
  • ⚠️ Tracer initialization in studio-mcp server
  • ⚠️ Span emissions for tool invocations
  • ⚠️ Request-id threading through tRPC layer
  • ⚠️ Structured logging to Loki (traceId, stepId, duration, outcome)
  • ⚠️ Grafana dashboards + alerts
  • ⚠️ Debug context propagation in error messages

Implementation Approach:

  1. Activate OpenTelemetry in graph-module-engine

    • Create NodeTracerProvider
    • Wire to OTLPTraceExporter (Loki collector)
    • Emit spans for: lifecycle.execute, module.install, permission.check, query.execute
  2. Activate OpenTelemetry in studio-mcp

    • Create tracer for tool invocations
    • Emit span for: tool.register, tool.invoked, tool.completed
    • Thread trace-id from request envelope through response
  3. Thread request-id through tRPC stack

    • Extract from request context
    • Pass to graph-module-engine via ModuleInstallationEngineDependencies
  4. Configure Loki + Promtail

    • Promtail ships structured logs from containers
    • Loki schema: {traceId, toolName, stepId, userId, elapsed, outcome, errors}
  5. Build Grafana dashboards

    • Tool invocation timeline (x=time, y=tool)
    • Error heatmap (x=error_code, y=frequency)
    • Latency percentiles (p50, p95, p99 per tool)
    • Per-user rate limits

Effort: High (3-4 weeks). Blocks: production observability and agent debugging.


Gap 5: Reproducibility & Deterministic Execution ⚠️

Problem: One-shot execution. Cannot replay/verify agent decisions across environments.

Current Status: ⚠️ Partial

  • Test fixtures exist (IntegrationFixtures)
  • Deterministic module lifecycle (mostly)
  • But: No execution trace format
  • No replay capability
  • No determinism guarantees (codegen order, timestamps)
  • No tool result caching with content addressing

Requirements for Reproducibility:

Replay Workflow:
  1. Capture execution trace: [stepId, toolName, inputs, outputs, timestamp]
  2. Store with content hash: SHA256(inputs) → outputs
  3. Later: replay with same inputs → verify outputs match hash
  4. Or: run diagnostic on failure, capture new trace, replay to point of failure
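
Steps 2 and 3 can be sketched as a content-addressed cache keyed on SHA-256 of the tool name plus canonicalized inputs. The canonicalization here only sorts top-level keys; a production version would need a rigorous canonical JSON form.

```typescript
import { createHash } from 'node:crypto';

// Content-addressed key: SHA-256 over tool name + canonical input JSON.
// JSON.stringify with a sorted replacer array gives stable key order for
// flat inputs; nested objects would need deeper canonicalization.
function contentHash(toolName: string, inputs: Record<string, unknown>): string {
  const canonical = JSON.stringify(inputs, Object.keys(inputs).sort());
  return createHash('sha256').update(toolName + '\n' + canonical).digest('hex');
}

class ResultCache {
  private store = new Map<string, unknown>();

  record(toolName: string, inputs: Record<string, unknown>, outputs: unknown): string {
    const hash = contentHash(toolName, inputs);
    this.store.set(hash, outputs);
    return hash;
  }

  // Replay check: the same inputs must reproduce the recorded outputs.
  verify(toolName: string, inputs: Record<string, unknown>, outputs: unknown): boolean {
    const cached = this.store.get(contentHash(toolName, inputs));
    return JSON.stringify(cached) === JSON.stringify(outputs);
  }
}
```

The same hashing would back the Redis `tool:${toolName}:${SHA256(inputs)}` key scheme described under the implementation approach.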

Missing Pieces:

  • ⚠️ Execution trace format (JSON schema with stepId, toolName, inputs/outputs)
  • ⚠️ Content-addressed caching (input hash → output)
  • ⚠️ Determinism baseline tests (same inputs → same outputs)
  • ⚠️ Replay engine (playback trace + verify)
  • ⚠️ Codegen determinism (no randomness, stable order)
  • ⚠️ Timestamp pinning (use trace timestamp, not now())

Implementation Approach:

  1. Define ExecutionTrace schema

    interface ExecutionTrace {
      traceId: string;
      startedAt: string; // ISO-8601 timestamp
      steps: Step[];
    }
    interface Step {
      stepId: string;
      index: number;
      toolName: string;
      inputs: Record<string, unknown>;
      outputs: Record<string, unknown>;
      durationMs: number;
      statusCode: number;
      contentHash: string; // SHA-256 over inputs + toolName + env
    }
    
  2. Add caching layer (Redis)

    • Key: tool:${toolName}:${SHA256(inputs)}
    • Value: {outputs, digest, timestamp}
    • TTL: configurable (default 24h)
  3. Add determinism tests per tool

    • Run tool twice with same inputs
    • Verify outputs match bit-for-bit
    • Baseline measurements for latency
  4. Add replay engine

    • Load ExecutionTrace
    • For each step: either use cached result or execute
    • Compare digest

Effort: Medium (2-3 weeks). Blocks: reliable agent workflows.


Gap 6: Policy & Guardrails Framework ❌

Problem: No declarative rules engine for tool safety, authorization, or compliance.

Current State:

  • RLS policies exist in DB (code-embedded)
  • Authorization checks in handlers (imperative)
  • No tool-level RBAC
  • No constraint checking for operational goals
  • No pre-approval patterns (codegen proposals → human review → apply)

Requirements for Policy Framework:

Policy Engine Stack:
  Tool Invocation Request
    ↓
  Policy Evaluation Engine
    ├─ Match rules: tool + user + resource + action
    ├─ Evaluate in context: can user approve compiling module X?
    ├─ Check constraints: does this compile exceed monthly quota?
    ├─ Determine approval mode: auto-approve, require-human, deny
    └─ Emit audit: who requested, when, approved/denied, by whom
    ↓
  Execution or Denial
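
The evaluation stage can be sketched as a first-match rules engine over the three approval modes. The rule set and `PolicyContext` fields below are illustrative, not the planned policy DSL.

```typescript
// First-match policy evaluation: each rule pairs a matcher with a mode.
type ApprovalMode = 'AUTO' | 'REQUIRE_HUMAN' | 'DENY';

interface PolicyContext { tool: string; role: string; monthlyCompiles: number }

interface PolicyRule {
  description: string;
  matches(ctx: PolicyContext): boolean;
  mode: ApprovalMode;
}

// Hypothetical rules mirroring the quota and approval checks above.
const rules: PolicyRule[] = [
  {
    description: 'deny when monthly compile quota exceeded',
    matches: (ctx) => ctx.tool === 'modules.compile' && ctx.monthlyCompiles >= 100,
    mode: 'DENY',
  },
  {
    description: 'schema mutations need human approval for non-admins',
    matches: (ctx) => ctx.tool === 'modules.compile' && ctx.role !== 'admin',
    mode: 'REQUIRE_HUMAN',
  },
];

function evaluate(ctx: PolicyContext): { mode: ApprovalMode; reason: string } {
  for (const rule of rules) {
    if (rule.matches(ctx)) return { mode: rule.mode, reason: rule.description };
  }
  return { mode: 'AUTO', reason: 'no rule matched; default allow' };
}
```

The returned reason doubles as the audit message: who was denied or routed to human review, and under which rule.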

Missing Pieces:

  • ⚠️ Policy DSL (Rego-style rules or simpler subset)
  • ⚠️ Tool-level RBAC matrix (role → allowed tools)
  • ⚠️ Policy query engine (evaluate rules in context)
  • ⚠️ Approval workflow integration (async human approval)
  • ⚠️ Pre-approval proposal generation (show diff before apply)
  • ⚠️ Audit logging (immutable record of approvals)
  • ⚠️ Compliance rule library (SOC 2, GDPR, etc.)

Implementation Approach:

  1. New package: @savvi-studio/policy-engine

    • Policy interface (rules + conditions)
    • PolicyEvaluator (evaluate policies in context)
    • ApprovalMode enum (AUTO, REQUIRE_HUMAN, DENY)
    • PolicyDecision result (approved, reason, audit_id)
  2. Embed in studio-mcp tool middleware

    • Before tool execution: call policyEvaluator.evaluate()
    • If REQUIRE_HUMAN: generate proposal, store in DB, return proposal_id
    • If AUTO: execute immediately
    • If DENY: return error
  3. Add approval workflow UI (in studio-web)

    • List pending approvals
    • Show tool name, inputs, predicted outputs
    • Approve/deny with audit comment
  4. Add audit table schema

    • {tool_name, user_id, action, approved_by, timestamp, reason, status}

Effort: High (4-5 weeks). Blocks: enterprise adoption and compliance.


Part C: P1 Gaps (Critical for Production)

Gap 7: Schema-Driven Tool Ecosystem ❌

Problem: Tools operate on objects, not schema-aware contracts. Cannot auto-generate tools from domain model.

Current State:

  • Codegen generates .ts files from frozen schemas
  • Tools are manually created per operation
  • No schema diff capability
  • No template-aware errors

Missing:

  • ⚠️ Schema diff tools (breaking changes detection)
  • ⚠️ Tool builder DSL (declarative CRUD/query generation)
  • ⚠️ Auto-generated mutation tools (from templates)
  • ⚠️ Auto-generated query tools (from classifiers)
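
The breaking-change detection piece, reduced to flat field maps, might look like the sketch below. Real schema diffing would walk Zod schemas rather than plain records.

```typescript
// Classify a schema change as breaking or additive by comparing field maps.
type FieldMap = Record<string, string>; // field name → type name

interface SchemaDiff { breaking: string[]; additive: string[] }

function diffSchemas(before: FieldMap, after: FieldMap): SchemaDiff {
  const breaking: string[] = [];
  const additive: string[] = [];
  for (const [field, type] of Object.entries(before)) {
    if (!(field in after)) breaking.push(`removed field: ${field}`);
    else if (after[field] !== type) breaking.push(`type changed: ${field} ${type} -> ${after[field]}`);
  }
  for (const field of Object.keys(after)) {
    if (!(field in before)) additive.push(`added field: ${field}`);
  }
  return { breaking, additive };
}
```

Removing or retyping a field is breaking; adding one is additive, which is the distinction a CI schema-diff gate would enforce.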

Implementation: New package @savvi-studio/schema-toolgen

Effort: High (4-5 weeks)


Gap 8: Async Patterns for Long-Running Operations ⚠️

Problem: Lifecycle hooks are sync; long-running operations (compilation) block.

Missing:

  • ⚠️ Async lifecycle hooks (preCompile, postCompile async versions)
  • ⚠️ Webhook support (tool completion callback)
  • ⚠️ Event stream support (Kafka/SNS notifications)
  • ⚠️ Polling mechanism (client polls for completion)
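
Of these, the polling mechanism is the simplest to sketch. The `getStatus` callback and status names below are hypothetical stand-ins for a real job-status endpoint.

```typescript
// Poll an async job until it leaves the 'running' state or attempts run out.
type JobStatus = 'running' | 'succeeded' | 'failed';

async function pollUntilDone(
  getStatus: () => Promise<JobStatus>,
  { intervalMs = 100, maxAttempts = 50 } = {},
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status !== 'running') return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`job still running after ${maxAttempts} polls`);
}
```

A webhook or event-stream variant would invert this: the runtime pushes the terminal status instead of the client polling for it.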

Effort: Medium (2-3 weeks)


Gap 9: Tool Result Versioning ⚠️

Problem: Breaking changes to tool output schemas can silently break agent code that consumes them.

Missing:

  • ⚠️ Tool output schema versioning
  • ⚠️ Migration framework
  • ⚠️ Backward compatibility layer

Effort: Low (1 week)


Part D: P2 Gaps (Quality of Life)

Gap 10: Error Context & Breadcrumb Tracking ⚠️

Problem: Errors lose execution path context.

Missing:

  • ⚠️ Breadcrumb collection (tool A → tool B → tool C → error at C)
  • ⚠️ Error context enrichment (add relevant state at each layer)
  • ⚠️ Error deduplication (group similar errors)
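
Breadcrumb collection can be sketched as a wrapper that records each tool name before running it and attaches the accumulated path to any error. This is a sketch of the idea, not the planned enrichment API.

```typescript
// Error type carrying the tool-invocation path that led to the failure.
class ToolChainError extends Error {
  constructor(message: string, public breadcrumbs: string[]) {
    super(`${message} (path: ${breadcrumbs.join(' -> ')})`);
  }
}

// Run tools in order, recording each name before execution so a failure
// at step N reports the full A -> B -> ... -> N path.
function runChain(tools: Array<{ name: string; run: () => void }>): string[] {
  const breadcrumbs: string[] = [];
  for (const tool of tools) {
    breadcrumbs.push(tool.name);
    try {
      tool.run();
    } catch (err) {
      throw new ToolChainError((err as Error).message, [...breadcrumbs]);
    }
  }
  return breadcrumbs;
}
```

A failure in the third tool surfaces as "boom (path: toolA -> toolB -> toolC)", which is exactly the execution-path context the current errors lose.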

Effort: Low (1 week)


Part E: Implementation Roadmap (Priority-Based)

Phase 2A (Weeks 1-2): P0 Foundations

Week 1 Monday-Friday:

  • Day 1-2: Commit Phase 1 studio-mcp
  • Day 2-3: Gap 1 (closed loops) — step observer + step history API
  • Day 4-5: Gap 2 (tool contracts) — preconditions + dependency graph

Week 2 Monday-Friday:

  • Day 1-2: Gap 4 (observability) — OpenTelemetry instrumentation
  • Day 3-4: Gap 3 (agent-safe primitives) — rate limiting + capability check
  • Day 5: Testing + CI integration

Deliverables:

  • Closed-loop workflow demonstration (observe→decide→act→repeat)
  • Tool contract framework in place
  • End-to-end tracing from MCP→engine→DB
  • Rate limiting enforced in CI

Phase 2B (Weeks 3-4): Reproducibility + Policy

Week 3:

  • Gap 5 (reproducibility) — execution trace + caching + replay
  • Gap 6 (policy) — basic RBAC + approval workflow

Week 4:

  • Testing + dashboards (Grafana)
  • Documentation

Deliverables:

  • Deterministic execution baseline established
  • Policy engine with approval workflow
  • Observability dashboards live

Phase 3+ (Weeks 5+): Extended Ecosystem

  • Gap 7 (schema-driven tools)
  • Gap 8 (async patterns)
  • Gap 9 (tool versioning)
  • Gap 10 (error context)

Part F: Success Metrics

| Metric | Target | When |
| --- | --- | --- |
| Closed-loop agent workflows | 1 end-to-end workflow passes | Week 2 |
| Tool execution observability | 100% of tools emitting spans | Week 2 |
| Agent safety | Rate limiting + capability checks enforced | Week 2 |
| Reproducibility | Deterministic execution baseline | Week 4 |
| Policy enforcement | Approval workflows functional | Week 4 |
| Error resolution time | 50% reduction (via tracing) | Week 4 |
| Schema-driven tools | 30% of CRUD tools auto-generated | Week 6 |

Appendix A: Tool Capability Matrix (Current vs Target)

| Tool | Current | Target |
| --- | --- | --- |
| modules.compile | ⚠️ Stubbed | ✅ Real + preconditions + rate limit |
| modules.codegen | ⚠️ Stubbed | ✅ Real + deterministic + caching |
| modules.validate | ⚠️ Basic | ✅ Full schema validation + preconditions |
| graph.install_plan | ❌ Stubbed | ✅ Real planning + policy enforcement |
| NEW: query_state | ✅ Works | ✅ Enhanced with step history |
| NEW: list_changes | ✅ Works | ✅ Enhanced with filesystem tracking |
| NEW: auto_gen_tool | ❌ Missing | ✅ Schema-driven tool creation |

Appendix B: Risk Assessment

| Risk | Probability | Severity | Mitigation |
| --- | --- | --- | --- |
| Observability adds 20% latency | Medium | Medium | Span sampling + async emission |
| Sandbox restricts valid use cases | Low | High | Whitelist safe operations, gradual rollout |
| Determinism breaks existing workflows | Low | High | Backward-compatible trace format |
| Policy rules become bottleneck | Medium | Medium | Cache policy decisions, pre-compute |

Document Status: ✅ Complete | Last Updated: 2026-04-01 | Next Review: after Gap 1-4 implementation (Week 2)