Savvi Studio

Platform Architecture Gap Analysis — 2026-04-01

Overview

This document maps current platform capabilities against requirements for autonomous agent workflows. It identifies blocking gaps (P0), critical gaps (P1), and quality-of-life gaps (P2) with implementation recommendations.

Phase 2A Status (2026-04-01): Gap 1 ✅ and Gap 2 ✅ are complete; four P0 gaps (Gaps 3-6) remain.


Part A: Current Capabilities Inventory

✅ Already Implemented

Type System & Schema Foundation

  • Aion framework with declarative module model
  • Zod-based contract enforcement (manifests, objects, diagnostics, envelopes)
  • Contract-aware TypeScript/Zod codegen
  • Module lifecycle hooks (preInstall, postInstall, preUninstall)

MCP Protocol Integration

  • 9 studio-mcp tools: 4 functional (list, inspect, validate, and the health diagnostic) plus 5 stubbed
  • Bootstrap runtime with envelope contracts
  • Deterministic request/response patterns
  • 5 stubbed tools (compile, bundle, codegen, schema, install_plan)

Observability Infrastructure (Partial)

  • studio-mcp observation tools: query_state, get_status, get_errors, list_changes
  • Trace correlation fields: trace_id, operation_id, emitted_at
  • Deterministic error codes (10 total)
  • 62 integration tests validating envelope contracts

Testing & Reproducibility

  • IntegrationFixtures: StudioInstanceTestBuilder with reproducible bootstrap
  • Contract test fixtures for router/db/workflow operations
  • Deterministic module lifecycle execution (ApplyResult/LifecycleObserverEvents)

API Layer

  • tRPC type-safe RPC with procedure-core framework
  • Router contracts (auth, graph, products, sponsors, tasks, teams)
  • TRPCError with deterministic codes

Database Layer

  • Phase 1 schema: BIGINT resources, RDF-style statements, SMALLINT permission bitmask
  • Predicate policies table with permission computation functions
  • Row-level security (RLS) policies
  • Soft-delete via deleted_at column

Part B: P0 Gaps (Blocking for AI-First)

Gap 1: Closed Feedback Loops ✅ COMPLETE

Implemented: ExecutionStep schema added to studio-mcp contracts; modules.query_state extended to return previousSteps; in-memory step cache wired in bootstrap runtime.

Implementation:

  • File: packages/studio-mcp/src/contracts/tools.ts: ExecutionStepSchema (stepId, index, toolName, inputs, outputs, durationMs, statusCode, emittedAt)
  • File: packages/studio-mcp/src/runtime/bootstrap-runtime.ts: recordExecutionStep(), getExecutionSteps(), per-traceId step cache
  • Tests: packages/studio-mcp/tests/integration/execution-steps.integration.test.ts — 4 tests passing

Agent Loop Pattern (enabled):

  1. Observe current state → query_state (returns previousSteps)
  2. Decide next tool call
  3. Execute tool (compile, codegen, etc.)
  4. Observe new state → query_state.previousSteps shows what ran
  5. Decide: continue or exit
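
The loop can be sketched end to end with an in-memory stand-in for the tool transport. Note that `decideNext` and the two-tool plan are hypothetical illustrations; only the `previousSteps` shape mirrors what query_state returns.

```typescript
// Sketch of the observe → decide → act loop. The executor is injected so
// the example stays self-contained; the real loop would call MCP tools.
interface Step { toolName: string; statusCode: number }
interface State { previousSteps: Step[] }

// Decide the next tool from what has already run; null means exit.
function decideNext(state: State): string | null {
  const ran = new Set(state.previousSteps.map((s) => s.toolName));
  if (!ran.has('compile')) return 'compile';
  if (!ran.has('codegen')) return 'codegen';
  return null;
}

function runAgentLoop(
  executeTool: (name: string) => Step,
  maxIterations = 10, // hard stop so a bad policy cannot loop forever
): State {
  const state: State = { previousSteps: [] };
  for (let i = 0; i < maxIterations; i++) {
    const next = decideNext(state); // 2. decide
    if (next === null) break;       // 5. exit
    const step = executeTool(next); // 3. execute
    state.previousSteps.push(step); // 1 + 4. observe: previousSteps grows
  }
  return state;
}
```

The loop exits once both `compile` and `codegen` appear in `previousSteps`, which is exactly the information query_state now exposes.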

Remaining: State delta diffing (filesystem/DB mutations per step) — deferred to Phase 2B.


Gap 2: Declarative Tool Contracts ✅ COMPLETE

Implemented: @savvi-studio/tool-contracts package shipped with full interface and test coverage.

Package: packages/tool-contracts/src/contracts.ts

Key exports:

// Semantic type registry
type SemanticType = 'ltree-path' | 'resource-id' | 'module-ref' | 'uuid'
                 | 'semver' | 'file-path' | 'json-object' | 'unknown';

// Predicate types
interface PreconditionPredicate { description, errorCode, evaluate(ctx) }
interface PostconditionPredicate { description, errorCode, evaluate(ctx) }

// Capability tags
type CapabilityTag = 'READ' | 'WRITE' | 'DELETE' | 'CREATE'
                  | 'EXECUTE' | 'ADMIN' | 'SCHEMA_MUTATION' | 'CODE_EXECUTION';

// Full tool contract
interface DeclaredToolContract {
  toolName, description, inputSchema, outputSchema,
  inputSemanticTypes?, outputSemanticTypes?,
  preconditions?, postconditions?,
  dependencies?, capabilities, resourceLimits?
}

// Dependency graph with cycle detection
interface ToolDependencyGraph { tools, dependencies, validate(): string[] }

Tests: 14 tests passing (packages/tool-contracts/src/contracts.test.ts)
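
For illustration, the kind of cycle detection that `validate(): string[]` implies can be done with a depth-first search and three-state marking. The flat `DependencyMap` shape below is a simplification, not the shipped graph structure.

```typescript
// DFS cycle detection with three states: unvisited, visiting (on the
// current path), and done. Reaching a 'visiting' node again is a cycle.
type DependencyMap = Record<string, string[]>; // tool → tools it depends on

function findCycles(deps: DependencyMap): string[] {
  const errors: string[] = [];
  const state = new Map<string, 'visiting' | 'done'>();

  function visit(tool: string, path: string[]): void {
    if (state.get(tool) === 'done') return;
    if (state.get(tool) === 'visiting') {
      errors.push(`cycle: ${[...path, tool].join(' -> ')}`);
      return;
    }
    state.set(tool, 'visiting');
    for (const dep of deps[tool] ?? []) visit(dep, [...path, tool]);
    state.set(tool, 'done');
  }

  for (const tool of Object.keys(deps)) visit(tool, []);
  return errors; // empty means acyclic, mirroring validate(): string[]
}
```

`findCycles({ compile: ['schema'], schema: ['compile'] })` reports one cycle; an acyclic map returns `[]`.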

Remaining: Wire DeclaredToolContract into studio-mcp tool handlers; add capability checks to bootstrap runtime.


Gap 3: Agent-Safe Primitives ❌

Problem: No guardrails for agentic code execution at scale. Cannot safely expose internal tools to multi-tenant agents.

Current State:

  • MCP tools defined but no resource limits
  • No rate limiting per user/agent
  • No rollback after partial failures (compile OK, codegen fails → stale schema state)
  • No sandbox/resource-bounded execution
  • No capability-based access control (all tools available to all callers)

Requirements for Agent Safety:

Tool Execution Sandbox:
  Input Validation
    ↓
  Capability Check (user allowed SCHEMA_MUTATION?)
    ↓
  Rate Limit (< 10 compilations/min?)
    ↓
  Resource Limit (timeout=30s, memory=512MB, output<10MB)
    ↓
  Execute in subprocess/worker
    ↓
  Capture stdout/stderr/exit code
    ↓
  Attestation (sign result with timestamp + signer)
    ↓
  Cleanup resources
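
The capability-check and rate-limit stages of this pipeline can be sketched as a guard in front of any tool handler. The roles, quotas, and key format below are illustrative placeholders, not the shipped RBAC matrix.

```typescript
// Sketch of the capability → rate-limit front half of the sandbox pipeline.
type Capability = 'READ' | 'WRITE' | 'SCHEMA_MUTATION';

// Hypothetical role matrix; the real one would come from the RBAC store.
const roleCapabilities: Record<string, Capability[]> = {
  viewer: ['READ'],
  builder: ['READ', 'WRITE', 'SCHEMA_MUTATION'],
};

class RateLimiter {
  private calls = new Map<string, number[]>();
  constructor(private maxPerMinute: number) {}

  allow(key: string, now = Date.now()): boolean {
    // Keep only timestamps inside the sliding one-minute window.
    const recent = (this.calls.get(key) ?? []).filter((t) => now - t < 60_000);
    if (recent.length >= this.maxPerMinute) return false;
    recent.push(now);
    this.calls.set(key, recent);
    return true;
  }
}

const limiter = new RateLimiter(10); // < 10 compilations/min, as above

function guardToolCall(
  role: string,
  required: Capability,
  rateKey: string, // e.g. `${userId}:${toolName}`
): { ok: boolean; reason?: string } {
  if (!(roleCapabilities[role] ?? []).includes(required)) {
    return { ok: false, reason: 'capability denied' };
  }
  if (!limiter.allow(rateKey)) return { ok: false, reason: 'rate limited' };
  return { ok: true };
}
```

A real SandboxExecutor would run the guarded handler in a worker with timeout and memory limits; this sketch only covers the admission decision.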

Missing Pieces:

  • ⚠️ Rate limit middleware for studio-mcp server
  • ⚠️ Resource limit enforcement (process resource quotas)
  • ⚠️ Capability-based RBAC matrix (user role → allowed tools)
  • ⚠️ Rollback capability (reverse applied changes)
  • ⚠️ Subprocess/worker execution (isolation)
  • ⚠️ Result attestation (signature + timestamp)
  • ⚠️ Multi-tenant policy enforcement

Implementation Approach:

  1. New package: @savvi-studio/tool-sandbox

    • ResourceLimits interface (cpu, memory, timeout, output size)
    • CapabilityCheck function (true/false based on user role + tool)
    • RateLimiter class (per-user, per-tool quotas)
    • SandboxExecutor using piscina/worker_threads
    • ResultAttestation struct (signature, timestamp, signer)
  2. Extend studio-mcp/server/stdio.ts

    • Before tool registration: apply RateLimitMiddleware
    • Before tool execution: capability check
    • Wrap tool execution in ResourceLimits
    • Add result attestation
    • Cleanup on completion
  3. Add CI gate: blocked tools in production (e.g., system.codegen.all)

Effort: High (3-4 weeks). Blocks: multi-tenant deployment.


Gap 4: Observability & Request Correlation ❌

Problem: No runtime tracing, request correlation, or decision audit. Cannot debug agent failures or prove reproducibility.

Current State:

  • OpenTelemetry deps installed but not instrumented
  • No trace/span emitters in runtime
  • No request-id threading through MCP→engine→DB
  • No decision logs (agent decisions not queryable)
  • No tool execution metrics (latency, error rates, cache hits)
  • Loki + Grafana installed but not configured/used

Requirements for Observability:

Request Flow with Trace/Span:
  Client {traceId}
    ↓ [MCP Server receives]
  studio-mcp.stdio {traceId, spanId=1}
    ↓ [MCP tool invocation]
  graph-module-engine {traceId, spanId=2}
    ↓ [Lifecycle execution]
  DB query {traceId, spanId=3}
    ↓ [Response back through stack]
  Loki {traceId, toolName, stepId, duration, status}
    ↓ [Grafana dashboard displays]
  Operator views: tool timeline, error heatmap, latency percentiles
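
Before the OpenTelemetry SDK is wired in, the essential mechanic — one traceId shared across layers, a fresh spanId per hop, a structured record per span — can be sketched without it. The `emitted` array stands in for the Loki sink; the real implementation would use OpenTelemetry context propagation instead.

```typescript
import { randomUUID } from 'node:crypto';

// Hand-rolled trace context for illustration only.
interface TraceContext { traceId: string; nextSpanId: number }
interface SpanRecord { traceId: string; spanId: number; name: string; durationMs: number }

const emitted: SpanRecord[] = []; // stand-in for the Loki sink

function withSpan<T>(ctx: TraceContext, name: string, fn: () => T): T {
  const spanId = ctx.nextSpanId++;
  const start = Date.now();
  try {
    return fn();
  } finally {
    // Innermost spans close (and are emitted) first.
    emitted.push({ traceId: ctx.traceId, spanId, name, durationMs: Date.now() - start });
  }
}

// One request flowing MCP → engine → DB, all sharing a single traceId.
const ctx: TraceContext = { traceId: randomUUID(), nextSpanId: 1 };
withSpan(ctx, 'studio-mcp.stdio', () =>
  withSpan(ctx, 'graph-module-engine', () =>
    withSpan(ctx, 'db.query', () => 'rows'),
  ),
);
```

After the request, `emitted` holds three records with one shared traceId and spanIds 1-3, which is the shape the Loki schema above wants per log line.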

Missing Pieces:

  • ⚠️ Tracer initialization in graph-module-engine
  • ⚠️ Tracer initialization in studio-mcp server
  • ⚠️ Span emissions for tool invocations
  • ⚠️ Request-id threading through tRPC layer
  • ⚠️ Structured logging to Loki (traceId, stepId, duration, outcome)
  • ⚠️ Grafana dashboards + alerts
  • ⚠️ Debug context propagation in error messages

Implementation Approach:

  1. Activate OpenTelemetry in graph-module-engine

    • Create NodeTracerProvider
    • Wire to OTLPTraceExporter (Loki collector)
    • Emit spans for: lifecycle.execute, module.install, permission.check, query.execute
  2. Activate OpenTelemetry in studio-mcp

    • Create tracer for tool invocations
    • Emit span for: tool.register, tool.invoked, tool.completed
    • Thread trace-id from request envelope through response
  3. Thread request-id through tRPC stack

    • Extract from request context
    • Pass to graph-module-engine via ModuleInstallationEngineDependencies
  4. Configure Loki + Promtail

    • Promtail ships structured logs from containers
    • Loki schema: {traceId, toolName, stepId, userId, elapsed, outcome, errors}
  5. Build Grafana dashboards

    • Tool invocation timeline (x=time, y=tool)
    • Error heatmap (x=error_code, y=frequency)
    • Latency percentiles (p50, p95, p99 per tool)
    • Per-user rate limits

Effort: High (3-4 weeks). Blocks: production observability and agent debugging.


Gap 5: Reproducibility & Deterministic Execution ⚠️

Problem: One-shot execution. Cannot replay/verify agent decisions across environments.

Current Status: ⚠️ Partial

  • Test fixtures exist (IntegrationFixtures)
  • Deterministic module lifecycle (mostly)
  • But: No execution trace format
  • No replay capability
  • No determinism guarantees (codegen order, timestamps)
  • No tool result caching with content addressing

Requirements for Reproducibility:

Replay Workflow:
  1. Capture execution trace: [stepId, toolName, inputs, outputs, timestamp]
  2. Store with content hash: SHA256(inputs) → outputs
  3. Later: replay with same inputs → verify outputs match hash
  4. Or: run diagnostic on failure, capture new trace, replay to point of failure
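
Steps 2 and 3 can be sketched as a content-addressed cache keyed on SHA-256 of the tool name plus canonicalized inputs. The canonicalization here only sorts top-level keys; a production version would need a rigorous canonical JSON form.

```typescript
import { createHash } from 'node:crypto';

// Content-addressed key: SHA-256 over tool name + canonical input JSON.
// JSON.stringify with a sorted replacer array gives stable key order for
// flat inputs; nested objects would need deeper canonicalization.
function contentHash(toolName: string, inputs: Record<string, unknown>): string {
  const canonical = JSON.stringify(inputs, Object.keys(inputs).sort());
  return createHash('sha256').update(toolName + '\n' + canonical).digest('hex');
}

class ResultCache {
  private store = new Map<string, unknown>();

  record(toolName: string, inputs: Record<string, unknown>, outputs: unknown): string {
    const hash = contentHash(toolName, inputs);
    this.store.set(hash, outputs);
    return hash;
  }

  // Replay check: the same inputs must reproduce the recorded outputs.
  verify(toolName: string, inputs: Record<string, unknown>, outputs: unknown): boolean {
    const cached = this.store.get(contentHash(toolName, inputs));
    return JSON.stringify(cached) === JSON.stringify(outputs);
  }
}
```

The same hashing would back the Redis `tool:${toolName}:${SHA256(inputs)}` key scheme described under the implementation approach.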

Missing Pieces:

  • ⚠️ Execution trace format (JSON schema with stepId, toolName, inputs/outputs)
  • ⚠️ Content-addressed caching (input hash → output)
  • ⚠️ Determinism baseline tests (same inputs → same outputs)
  • ⚠️ Replay engine (playback trace + verify)
  • ⚠️ Codegen determinism (no randomness, stable order)
  • ⚠️ Timestamp pinning (use trace timestamp, not now())

Implementation Approach:

  1. Define ExecutionTrace schema

    interface ExecutionTrace {
      traceId: string;
      startedAt: string; // ISO-8601 timestamp
      steps: Step[];
    }
    interface Step {
      stepId: string;
      index: number;
      toolName: string;
      inputs: Record<string, unknown>;
      outputs: Record<string, unknown>;
      durationMs: number;
      statusCode: number;
      contentHash: string; // SHA-256 over inputs + toolName + env
    }
    
  2. Add caching layer (Redis)

    • Key: tool:${toolName}:${SHA256(inputs)}
    • Value: {outputs, digest, timestamp}
    • TTL: configurable (default 24h)
  3. Add determinism tests per tool

    • Run tool twice with same inputs
    • Verify outputs match bit-for-bit
    • Baseline measurements for latency
  4. Add replay engine

    • Load ExecutionTrace
    • For each step: either use cached result or execute
    • Compare digest

Effort: Medium (2-3 weeks). Blocks: reliable agent workflows.


Gap 6: Policy & Guardrails Framework ❌

Problem: No declarative rules engine for tool safety, authorization, or compliance.

Current State:

  • RLS policies exist in DB (code-embedded)
  • Authorization checks in handlers (imperative)
  • No tool-level RBAC
  • No constraint checking for operational goals
  • No pre-approval patterns (codegen proposals → human review → apply)

Requirements for Policy Framework:

Policy Engine Stack:
  Tool Invocation Request
    ↓
  Policy Evaluation Engine
    ├─ Match rules: tool + user + resource + action
    ├─ Evaluate in context: can user approve compiling module X?
    ├─ Check constraints: does this compile exceed monthly quota?
    ├─ Determine approval mode: auto-approve, require-human, deny
    └─ Emit audit: who requested, when, approved/denied, by whom
    ↓
  Execution or Denial
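
The evaluation stage can be sketched as a first-match rules engine over the three approval modes. The rule set and `PolicyContext` fields below are illustrative, not the planned policy DSL.

```typescript
// First-match policy evaluation: each rule pairs a matcher with a mode.
type ApprovalMode = 'AUTO' | 'REQUIRE_HUMAN' | 'DENY';

interface PolicyContext { tool: string; role: string; monthlyCompiles: number }

interface PolicyRule {
  description: string;
  matches(ctx: PolicyContext): boolean;
  mode: ApprovalMode;
}

// Hypothetical rules mirroring the quota and approval checks above.
const rules: PolicyRule[] = [
  {
    description: 'deny when monthly compile quota exceeded',
    matches: (ctx) => ctx.tool === 'modules.compile' && ctx.monthlyCompiles >= 100,
    mode: 'DENY',
  },
  {
    description: 'schema mutations need human approval for non-admins',
    matches: (ctx) => ctx.tool === 'modules.compile' && ctx.role !== 'admin',
    mode: 'REQUIRE_HUMAN',
  },
];

function evaluate(ctx: PolicyContext): { mode: ApprovalMode; reason: string } {
  for (const rule of rules) {
    if (rule.matches(ctx)) return { mode: rule.mode, reason: rule.description };
  }
  return { mode: 'AUTO', reason: 'no rule matched; default allow' };
}
```

The returned reason doubles as the audit message: who was denied or routed to human review, and under which rule.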

Missing Pieces:

  • ⚠️ Policy DSL (Rego-style rules or simpler subset)
  • ⚠️ Tool-level RBAC matrix (role → allowed tools)
  • ⚠️ Policy query engine (evaluate rules in context)
  • ⚠️ Approval workflow integration (async human approval)
  • ⚠️ Pre-approval proposal generation (show diff before apply)
  • ⚠️ Audit logging (immutable record of approvals)
  • ⚠️ Compliance rule library (SOC 2, GDPR, etc.)

Implementation Approach:

  1. New package: @savvi-studio/policy-engine

    • Policy interface (rules + conditions)
    • PolicyEvaluator (evaluate policies in context)
    • ApprovalMode enum (AUTO, REQUIRE_HUMAN, DENY)
    • PolicyDecision result (approved, reason, audit_id)
  2. Embed in studio-mcp tool middleware

    • Before tool execution: call policyEvaluator.evaluate()
    • If REQUIRE_HUMAN: generate proposal, store in DB, return proposal_id
    • If AUTO: execute immediately
    • If DENY: return error
  3. Add approval workflow UI (in studio-web)

    • List pending approvals
    • Show tool name, inputs, predicted outputs
    • Approve/deny with audit comment
  4. Add audit table schema

    • {tool_name, user_id, action, approved_by, timestamp, reason, status}

Effort: High (4-5 weeks). Blocks: enterprise adoption and compliance.


Part C: P1 Gaps (Critical for Production)

Gap 7: Schema-Driven Tool Ecosystem ❌

Problem: Tools operate on objects, not schema-aware contracts. Cannot auto-generate tools from domain model.

Current State:

  • Codegen generates .ts files from frozen schemas
  • Tools are manually created per operation
  • No schema diff capability
  • No template-aware errors

Missing:

  • ⚠️ Schema diff tools (breaking changes detection)
  • ⚠️ Tool builder DSL (declarative CRUD/query generation)
  • ⚠️ Auto-generated mutation tools (from templates)
  • ⚠️ Auto-generated query tools (from classifiers)
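
The breaking-change detection piece, reduced to flat field maps, might look like the sketch below. Real schema diffing would walk Zod schemas rather than plain records.

```typescript
// Classify a schema change as breaking or additive by comparing field maps.
type FieldMap = Record<string, string>; // field name → type name

interface SchemaDiff { breaking: string[]; additive: string[] }

function diffSchemas(before: FieldMap, after: FieldMap): SchemaDiff {
  const breaking: string[] = [];
  const additive: string[] = [];
  for (const [field, type] of Object.entries(before)) {
    if (!(field in after)) breaking.push(`removed field: ${field}`);
    else if (after[field] !== type) breaking.push(`type changed: ${field} ${type} -> ${after[field]}`);
  }
  for (const field of Object.keys(after)) {
    if (!(field in before)) additive.push(`added field: ${field}`);
  }
  return { breaking, additive };
}
```

Removing or retyping a field is breaking; adding one is additive, which is the distinction a CI schema-diff gate would enforce.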

Implementation: New package @savvi-studio/schema-toolgen

Effort: High (4-5 weeks)


Gap 8: Async Patterns for Long-Running Operations ⚠️

Problem: Lifecycle hooks are sync; long-running operations (compilation) block.

Missing:

  • ⚠️ Async lifecycle hooks (preCompile, postCompile async versions)
  • ⚠️ Webhook support (tool completion callback)
  • ⚠️ Event stream support (Kafka/SNS notifications)
  • ⚠️ Polling mechanism (client polls for completion)
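
Of these, the polling mechanism is the simplest to sketch. The `getStatus` callback and status names below are hypothetical stand-ins for a real job-status endpoint.

```typescript
// Poll an async job until it leaves the 'running' state or attempts run out.
type JobStatus = 'running' | 'succeeded' | 'failed';

async function pollUntilDone(
  getStatus: () => Promise<JobStatus>,
  { intervalMs = 100, maxAttempts = 50 } = {},
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status !== 'running') return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`job still running after ${maxAttempts} polls`);
}
```

A webhook or event-stream variant would invert this: the runtime pushes the terminal status instead of the client polling for it.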

Effort: Medium (2-3 weeks)


Gap 9: Tool Result Versioning ⚠️

Problem: Breaking changes to tool output schemas can silently break agent code that consumes them.

Missing:

  • ⚠️ Tool output schema versioning
  • ⚠️ Migration framework
  • ⚠️ Backward compatibility layer

Effort: Low (1 week)


Part D: P2 Gaps (Quality of Life)

Gap 10: Error Context & Breadcrumb Tracking ⚠️

Problem: Errors lose execution path context.

Missing:

  • ⚠️ Breadcrumb collection (tool A → tool B → tool C → error at C)
  • ⚠️ Error context enrichment (add relevant state at each layer)
  • ⚠️ Error deduplication (group similar errors)
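
Breadcrumb collection can be sketched as a wrapper that records each tool name before running it and attaches the accumulated path to any error. This is a sketch of the idea, not the planned enrichment API.

```typescript
// Error type carrying the tool-invocation path that led to the failure.
class ToolChainError extends Error {
  constructor(message: string, public breadcrumbs: string[]) {
    super(`${message} (path: ${breadcrumbs.join(' -> ')})`);
  }
}

// Run tools in order, recording each name before execution so a failure
// at step N reports the full A -> B -> ... -> N path.
function runChain(tools: Array<{ name: string; run: () => void }>): string[] {
  const breadcrumbs: string[] = [];
  for (const tool of tools) {
    breadcrumbs.push(tool.name);
    try {
      tool.run();
    } catch (err) {
      throw new ToolChainError((err as Error).message, [...breadcrumbs]);
    }
  }
  return breadcrumbs;
}
```

A failure in the third tool surfaces as "boom (path: toolA -> toolB -> toolC)", which is exactly the execution-path context the current errors lose.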

Effort: Low (1 week)


Part E: Implementation Roadmap (Priority-Based)

Phase 2A (Weeks 1-2): P0 Foundations

Week 1 Monday-Friday:

  • Day 1-2: Commit Phase 1 studio-mcp
  • Day 2-3: Gap 1 (closed loops) — step observer + step history API
  • Day 4-5: Gap 2 (tool contracts) — preconditions + dependency graph

Week 2 Monday-Friday:

  • Day 1-2: Gap 4 (observability) — OpenTelemetry instrumentation
  • Day 3-4: Gap 3 (agent-safe primitives) — rate limiting + capability check
  • Day 5: Testing + CI integration

Deliverables:

  • Closed-loop workflow demonstration (observe→decide→act→repeat)
  • Tool contract framework in place
  • End-to-end tracing from MCP→engine→DB
  • Rate limiting enforced in CI

Phase 2B (Weeks 3-4): Reproducibility + Policy

Week 3:

  • Gap 5 (reproducibility) — execution trace + caching + replay
  • Gap 6 (policy) — basic RBAC + approval workflow

Week 4:

  • Testing + dashboards (Grafana)
  • Documentation

Deliverables:

  • Deterministic execution baseline established
  • Policy engine with approval workflow
  • Observability dashboards live

Phase 3+ (Weeks 5+): Extended Ecosystem

  • Gap 7 (schema-driven tools)
  • Gap 8 (async patterns)
  • Gap 9 (tool versioning)
  • Gap 10 (error context)

Part F: Success Metrics

| Metric | Target | When |
| --- | --- | --- |
| Closed-loop agent workflows | 1 end-to-end workflow passes | Week 2 |
| Tool execution observability | 100% of tools emitting spans | Week 2 |
| Agent safety | Rate limiting + capability checks enforced | Week 2 |
| Reproducibility | Deterministic execution baseline | Week 4 |
| Policy enforcement | Approval workflows functional | Week 4 |
| Error resolution time | 50% reduction (via tracing) | Week 4 |
| Schema-driven tools | 30% of CRUD tools auto-generated | Week 6 |

Appendix A: Tool Capability Matrix (Current vs Target)

| Tool | Current | Target |
| --- | --- | --- |
| modules.compile | ⚠️ Stubbed | ✅ Real + preconditions + rate limit |
| modules.codegen | ⚠️ Stubbed | ✅ Real + deterministic + caching |
| modules.validate | ⚠️ Basic | ✅ Full schema validation + preconditions |
| graph.install_plan | ❌ Stubbed | ✅ Real planning + policy enforcement |
| NEW: query_state | ✅ Works | ✅ Enhanced with step history |
| NEW: list_changes | ✅ Works | ✅ Enhanced with filesystem tracking |
| NEW: auto_gen_tool | ❌ Missing | ✅ Schema-driven tool creation |

Appendix B: Risk Assessment

| Risk | Probability | Severity | Mitigation |
| --- | --- | --- | --- |
| Observability adds 20% latency | Medium | Medium | Span sampling + async emission |
| Sandbox restricts valid use cases | Low | High | Whitelist safe operations, gradual rollout |
| Determinism breaks existing workflows | Low | High | Backward-compatible trace format |
| Policy rules become bottleneck | Medium | Medium | Cache policy decisions, pre-compute |

Document Status: ✅ Complete | Last Updated: 2026-04-01 | Next Review: after Gap 1-4 implementation (Week 2)