Platform Architecture Gap Analysis — 2026-04-01
Overview
This document maps current platform capabilities against requirements for autonomous agent workflows. It identifies blocking gaps (P0), critical gaps (P1), and quality-of-life gaps (P2) with implementation recommendations.
Phase 2A Status (2026-04-01): Gap 1 ✅ and Gap 2 ✅ are complete. 4 P0 gaps remain.
Part A: Current Capabilities Inventory
✅ Already Implemented
Type System & Schema Foundation
- Aion framework with declarative module model
- Zod-based contract enforcement (manifests, objects, diagnostics, envelopes)
- Contract-aware TypeScript/Zod codegen
- Module lifecycle hooks (preInstall, postInstall, preUninstall)
MCP Protocol Integration
- 9 studio-mcp tools (3 functional: list/inspect/validate; 1 diagnostic: health; 5 stubbed)
- Bootstrap runtime with envelope contracts
- Deterministic request/response patterns
- 5 stubbed tools (compile, bundle, codegen, schema, install_plan)
Observability Infrastructure (Partial)
- studio-mcp observation tools: query_state, get_status, get_errors, list_changes
- Trace correlation fields: trace_id, operation_id, emitted_at
- Deterministic error codes (10 total)
- 62 integration tests validating envelope contracts
Testing & Reproducibility
- IntegrationFixtures: StudioInstanceTestBuilder with reproducible bootstrap
- Contract test fixtures for router/db/workflow operations
- Deterministic module lifecycle execution (ApplyResult/LifecycleObserverEvents)
API Layer
- tRPC type-safe RPC with procedure-core framework
- Router contracts (auth, graph, products, sponsors, tasks, teams)
- TRPCError with deterministic codes
Database Layer
- Phase 1 schema: BIGINT resources, RDF-style statements, SMALLINT permission bitmask
- Predicate policies table with permission computation functions
- Row-level security (RLS) policies
- Soft-delete via deleted_at column
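To make the SMALLINT bitmask concrete, here is a minimal illustration of how such a mask can be checked and combined. The specific bit assignments below are assumed for the example only; the real layout lives in the predicate policies table and permission computation functions.

```typescript
// Illustrative permission bits for a SMALLINT bitmask. The real bit layout
// is defined by the Phase 1 schema's permission functions; these values are
// assumptions for the sake of the example.
const PERM = { READ: 1 << 0, WRITE: 1 << 1, DELETE: 1 << 2, ADMIN: 1 << 3 } as const;

// A permission is granted iff all of its bits are set in the mask.
function hasPermission(mask: number, perm: number): boolean {
  return (mask & perm) === perm;
}

// Combining grants is a bitwise OR; a SMALLINT comfortably holds 15 flag bits.
const editorMask = PERM.READ | PERM.WRITE; // = 3
```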
Part B: P0 Gaps (Blocking for AI-First)
Gap 1: Closed Feedback Loops ✅ COMPLETE
Implemented: ExecutionStep schema added to studio-mcp contracts; modules.query_state extended to return previousSteps; in-memory step cache wired in bootstrap runtime.
Implementation:
- File: packages/studio-mcp/src/contracts/tools.ts: ExecutionStepSchema (stepId, index, toolName, inputs, outputs, durationMs, statusCode, emittedAt)
- File: packages/studio-mcp/src/runtime/bootstrap-runtime.ts: recordExecutionStep(), getExecutionSteps(), step cache per traceId
- Tests: packages/studio-mcp/tests/integration/execution-steps.integration.test.ts: 4 tests passing
Agent Loop Pattern (enabled):
1. Observe current state → query_state (returns previousSteps)
2. Decide next tool call
3. Execute tool (compile, codegen, etc.)
4. Observe new state → query_state.previousSteps shows what ran
5. Decide: continue or exit
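The five steps above can be sketched as a loop. The `callTool` function below is a stand-in for a real MCP client, and the envelope shape is illustrative rather than the actual studio-mcp contract; the point is the shape of observe→decide→act driven by `query_state.previousSteps`.

```typescript
// Minimal in-memory stand-in for the studio-mcp runtime; every call is
// recorded so query_state can return previousSteps, as in the real design.
interface ExecutionStep { stepId: string; toolName: string; statusCode: number; }

const stepLog: ExecutionStep[] = [];
function callTool(name: string, _inputs: object): { previousSteps: ExecutionStep[] } {
  stepLog.push({ stepId: `s${stepLog.length + 1}`, toolName: name, statusCode: 0 });
  return { previousSteps: [...stepLog] };
}

function agentLoop(maxSteps: number): string[] {
  const ran: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    // 1. Observe: what has already run?
    const state = callTool("modules.query_state", {});
    const done = new Set(state.previousSteps.map((s) => s.toolName));
    // 2. Decide: pick the first pending tool in a fixed plan (illustrative).
    const plan = ["modules.compile", "modules.codegen"];
    const next = plan.find((t) => !done.has(t));
    if (!next) break; // 5. Exit: plan complete.
    // 3. Execute the chosen tool.
    callTool(next, {});
    ran.push(next);
    // 4. The next iteration observes the updated previousSteps.
  }
  return ran;
}
```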
Remaining: State delta diffing (filesystem/DB mutations per step) — deferred to Phase 2B.
Gap 2: Declarative Tool Contracts ✅ COMPLETE
Implemented: @savvi-studio/tool-contracts package shipped with full interface and test coverage.
Package: packages/tool-contracts/src/contracts.ts
Key exports:
// Semantic type registry
type SemanticType = 'ltree-path' | 'resource-id' | 'module-ref' | 'uuid'
| 'semver' | 'file-path' | 'json-object' | 'unknown';
// Predicate types
interface PreconditionPredicate { description, errorCode, evaluate(ctx) }
interface PostconditionPredicate { description, errorCode, evaluate(ctx) }
// Capability tags
type CapabilityTag = 'READ' | 'WRITE' | 'DELETE' | 'CREATE'
| 'EXECUTE' | 'ADMIN' | 'SCHEMA_MUTATION' | 'CODE_EXECUTION';
// Full tool contract
interface DeclaredToolContract {
toolName, description, inputSchema, outputSchema,
inputSemanticTypes?, outputSemanticTypes?,
preconditions?, postconditions?,
dependencies?, capabilities, resourceLimits?
}
// Dependency graph with cycle detection
interface ToolDependencyGraph { tools, dependencies, validate(): string[] }
Tests: 14 tests passing (packages/tool-contracts/src/contracts.test.ts)
Remaining: Wire DeclaredToolContract into studio-mcp tool handlers; add capability checks to bootstrap runtime.
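A sketch of what declaring and checking a contract might look like. The types are re-declared locally so the example runs standalone; in the repo they would come from @savvi-studio/tool-contracts. The predicate body and error code shown are illustrative, not actual contract content.

```typescript
// Local copies mirroring the tool-contracts shapes listed above (subset).
type CapabilityTag = "READ" | "WRITE" | "SCHEMA_MUTATION";

interface PreconditionPredicate {
  description: string;
  errorCode: string;
  evaluate(ctx: Record<string, unknown>): boolean;
}

interface DeclaredToolContract {
  toolName: string;
  description: string;
  capabilities: CapabilityTag[];
  preconditions?: PreconditionPredicate[];
}

// Hypothetical contract for modules.compile.
const compileContract: DeclaredToolContract = {
  toolName: "modules.compile",
  description: "Compile a module manifest into executable artifacts",
  capabilities: ["READ", "SCHEMA_MUTATION"],
  preconditions: [
    {
      description: "manifest must be present in context",
      errorCode: "E_MISSING_MANIFEST",
      evaluate: (ctx) => ctx.manifest !== undefined,
    },
  ],
};

// A handler wrapper can evaluate preconditions before dispatch and return
// the deterministic error codes of any that fail.
function checkPreconditions(c: DeclaredToolContract, ctx: Record<string, unknown>): string[] {
  return (c.preconditions ?? []).filter((p) => !p.evaluate(ctx)).map((p) => p.errorCode);
}
```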
Gap 3: Agent-Safe Primitives ❌
Problem: No guardrails for agentic code execution at scale. Cannot safely expose internal tools to multi-tenant agents.
Current State:
- MCP tools defined but no resource limits
- No rate limiting per user/agent
- No rollback after partial failures (compile OK, codegen fails → stale schema state)
- No sandbox/resource-bounded execution
- No capability-based access control (all tools available to all callers)
Requirements for Agent Safety:
Tool Execution Sandbox:
Input Validation
↓
Capability Check (user allowed SCHEMA_MUTATION?)
↓
Rate Limit (< 10 compilations/min?)
↓
Resource Limit (timeout=30s, memory=512MB, output<10MB)
↓
Execute in subprocess/worker
↓
Capture stdout/stderr/exit code
↓
Attestation (sign result with timestamp + signer)
↓
Cleanup resources
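The first few stages of that pipeline can be sketched in-process. The role→capability mapping, window size, and error strings below are illustrative; the real matrix and quotas are still to be defined, and real execution would happen in a worker with timeout/memory caps rather than inline.

```typescript
type Capability = "READ" | "WRITE" | "SCHEMA_MUTATION";

// Assumed role→capability matrix for the sketch.
const roleCapabilities: Record<string, Capability[]> = {
  viewer: ["READ"],
  developer: ["READ", "WRITE", "SCHEMA_MUTATION"],
};

// Simple per-user, per-tool counter; a real limiter would use a sliding window.
class RateLimiter {
  private counts = new Map<string, number>();
  constructor(private maxPerWindow: number) {}
  allow(userId: string, toolName: string): boolean {
    const key = `${userId}:${toolName}`;
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return n <= this.maxPerWindow;
  }
}

function guardedInvoke(
  user: { id: string; role: string },
  toolName: string,
  required: Capability,
  limiter: RateLimiter,
  run: () => unknown,
): { ok: boolean; error?: string; result?: unknown } {
  // Stage: capability check (is the user allowed SCHEMA_MUTATION?)
  if (!(roleCapabilities[user.role] ?? []).includes(required)) {
    return { ok: false, error: "CAPABILITY_DENIED" };
  }
  // Stage: rate limit (under the per-window quota?)
  if (!limiter.allow(user.id, toolName)) {
    return { ok: false, error: "RATE_LIMITED" };
  }
  // Stage: execute (real version: subprocess/worker with resource limits).
  return { ok: true, result: run() };
}
```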
Missing Pieces:
- ⚠️ Rate limit middleware for studio-mcp server
- ⚠️ Resource limit enforcement (process resource quotas)
- ⚠️ Capability-based RBAC matrix (user role → allowed tools)
- ⚠️ Rollback capability (reverse applied changes)
- ⚠️ Subprocess/worker execution (isolation)
- ⚠️ Result attestation (signature + timestamp)
- ⚠️ Multi-tenant policy enforcement
Implementation Approach:
1. New package: @savvi-studio/tool-sandbox
   - ResourceLimits interface (cpu, memory, timeout, output size)
   - CapabilityCheck function (true/false based on user role + tool)
   - RateLimiter class (per-user, per-tool quotas)
   - SandboxExecutor using piscina/worker_threads
   - ResultAttestation struct (signature, timestamp, signer)
2. Extend studio-mcp/server/stdio.ts
   - Before tool registration: apply RateLimitMiddleware
   - Before tool execution: capability check
   - Wrap tool execution in ResourceLimits
   - Add result attestation
   - Cleanup on completion
3. Add CI gate: blocked tools in production (e.g., system.codegen.all)
Effort: High (3-4 weeks). Blocks: Multi-tenant deployment.
Gap 4: Observability & Request Correlation ❌
Problem: No runtime tracing, request correlation, or decision audit. Cannot debug agent failures or prove reproducibility.
Current State:
- OpenTelemetry deps installed but not instrumented
- No trace/span emitters in runtime
- No request-id threading through MCP→engine→DB
- No decision logs (agent decisions not queryable)
- No tool execution metrics (latency, error rates, cache hits)
- Loki + Grafana installed but not configured/used
Requirements for Observability:
Request Flow with Trace/Span:
Client {traceId}
↓ [MCP Server receives]
studio-mcp.stdio {traceId, spanId=1}
↓ [MCP tool invocation]
graph-module-engine {traceId, spanId=2}
↓ [Lifecycle execution]
DB query {traceId, spanId=3}
↓ [Response back through stack]
Loki {traceId, toolName, stepId, duration, status}
↓ [Grafana dashboard displays]
Operator views: tool timeline, error heatmap, latency percentiles
Missing Pieces:
- ⚠️ Tracer initialization in graph-module-engine
- ⚠️ Tracer initialization in studio-mcp server
- ⚠️ Span emissions for tool invocations
- ⚠️ Request-id threading through tRPC layer
- ⚠️ Structured logging to Loki (traceId, stepId, duration, outcome)
- ⚠️ Grafana dashboards + alerts
- ⚠️ Debug context propagation in error messages
Implementation Approach:
1. Activate OpenTelemetry in graph-module-engine
   - Create NodeTracerProvider
   - Wire to OTLPTraceExporter (Loki collector)
   - Emit spans for: lifecycle.execute, module.install, permission.check, query.execute
2. Activate OpenTelemetry in studio-mcp
   - Create tracer for tool invocations
   - Emit spans for: tool.register, tool.invoked, tool.completed
   - Thread trace-id from request envelope through response
3. Thread request-id through tRPC stack
   - Extract from request context
   - Pass to graph-module-engine via ModuleInstallationEngineDependencies
4. Configure Loki + Promtail
   - Promtail ships structured logs from containers
   - Loki schema: {traceId, toolName, stepId, userId, elapsed, outcome, errors}
5. Build Grafana dashboards
   - Tool invocation timeline (x=time, y=tool)
   - Error heatmap (x=error_code, y=frequency)
   - Latency percentiles (p50, p95, p99 per tool)
   - Per-user rate limits
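The trace/span threading in the request flow above can be sketched without the OTel SDK: one traceId is minted at the MCP boundary, each layer opens a child span, and closing a span emits a record shaped like the Loki schema. The real implementation would use @opentelemetry/api spans; the class and field names here are illustrative.

```typescript
import { randomUUID } from "node:crypto";

// One record per closed span, mirroring the Loki schema fields above.
interface SpanRecord {
  traceId: string;
  spanId: number;
  parentSpanId: number | null;
  toolName: string;
  durationMs: number;
  outcome: "ok" | "error";
}

class TraceContext {
  readonly traceId = randomUUID(); // shared by every span in the request
  private nextSpanId = 1;
  readonly records: SpanRecord[] = [];

  // Run fn inside a span; emit a record on close whether it succeeds or throws.
  span<T>(toolName: string, parentSpanId: number | null, fn: () => T): T {
    const spanId = this.nextSpanId++;
    const start = Date.now();
    const emit = (outcome: "ok" | "error") =>
      this.records.push({ traceId: this.traceId, spanId, parentSpanId, toolName, durationMs: Date.now() - start, outcome });
    try {
      const out = fn();
      emit("ok");
      return out;
    } catch (err) {
      emit("error");
      throw err;
    }
  }
}

// Usage mirroring the request flow: MCP server → engine → DB query.
const ctx = new TraceContext();
ctx.span("studio-mcp.stdio", null, () =>
  ctx.span("graph-module-engine", 1, () =>
    ctx.span("db.query", 2, () => "rows")));
```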
Effort: High (3-4 weeks). Blocks: Production observability and agent debugging.
Gap 5: Reproducibility & Deterministic Execution ⚠️
Problem: One-shot execution. Cannot replay/verify agent decisions across environments.
Current Status: ⚠️ Partial
- Test fixtures exist (IntegrationFixtures)
- Deterministic module lifecycle (mostly)
- But: No execution trace format
- No replay capability
- No determinism guarantees (codegen order, timestamps)
- No tool result caching with content addressing
Requirements for Reproducibility:
Replay Workflow:
1. Capture execution trace: [stepId, toolName, inputs, outputs, timestamp]
2. Store with content hash: SHA256(inputs) → outputs
3. Later: replay with same inputs → verify outputs match hash
4. Or: run diagnostic on failure, capture new trace, replay to point of failure
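Steps 2-3 of the replay workflow can be sketched with content addressing: hash the tool name plus inputs to key the cache, then verify a replay by comparing output digests. JSON.stringify stands in for canonicalization here; a real implementation needs stable key ordering before hashing. Function names are illustrative.

```typescript
import { createHash } from "node:crypto";

// Content address for a step: SHA256 over tool name + serialized inputs.
function contentHash(toolName: string, inputs: unknown): string {
  return createHash("sha256").update(toolName).update(JSON.stringify(inputs)).digest("hex");
}

const cache = new Map<string, { outputs: unknown; digest: string }>();

// Step 2: store the outputs under the input hash, with an output digest.
function recordStep(toolName: string, inputs: unknown, outputs: unknown): string {
  const key = contentHash(toolName, inputs);
  const digest = createHash("sha256").update(JSON.stringify(outputs)).digest("hex");
  cache.set(key, { outputs, digest });
  return key;
}

// Step 3: rerun with the same inputs and check the new outputs against
// the recorded digest; a mismatch means the tool was non-deterministic.
function verifyReplay(toolName: string, inputs: unknown, replayOutputs: unknown): boolean {
  const entry = cache.get(contentHash(toolName, inputs));
  if (!entry) return false;
  const digest = createHash("sha256").update(JSON.stringify(replayOutputs)).digest("hex");
  return digest === entry.digest;
}
```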
Missing Pieces:
- ⚠️ Execution trace format (JSON schema with stepId, toolName, inputs/outputs)
- ⚠️ Content-addressed caching (input hash → output)
- ⚠️ Determinism baseline tests (same inputs → same outputs)
- ⚠️ Replay engine (playback trace + verify)
- ⚠️ Codegen determinism (no randomness, stable order)
- ⚠️ Timestamp pinning (use trace timestamp, not now())
Implementation Approach:
1. Define ExecutionTrace schema

   interface ExecutionTrace {
     traceId: string;
     startedAt: ISO8601;
     steps: Step[];
   }
   interface Step {
     stepId: string;
     index: number;
     toolName: string;
     inputs: Record<string, unknown>;
     outputs: Record<string, unknown>;
     durationMs: number;
     statusCode: number;
     contentHash: string; // SHA256(inputs+toolName+env)
   }

2. Add caching layer (Redis)
   - Key: tool:${toolName}:${SHA256(inputs)}
   - Value: {outputs, digest, timestamp}
   - TTL: configurable (default 24h)
3. Add determinism tests per tool
   - Run tool twice with same inputs
   - Verify outputs match bit-for-bit
   - Baseline measurements for latency
4. Add replay engine
   - Load ExecutionTrace
   - For each step: either use cached result or execute
   - Compare digest
Effort: Medium (2-3 weeks). Blocks: Reliable agent workflows.
Gap 6: Policy & Guardrails Framework ❌
Problem: No declarative rules engine for tool safety, authorization, or compliance.
Current State:
- RLS policies exist in DB (code-embedded)
- Authorization checks in handlers (imperative)
- No tool-level RBAC
- No constraint checking for operational goals
- No pre-approval patterns (codegen proposals → human review → apply)
Requirements for Policy Framework:
Policy Engine Stack:
Tool Invocation Request
↓
Policy Evaluation Engine
├─ Match rules: tool + user + resource + action
├─ Evaluate in context: can user approve compiling module X?
├─ Check constraints: does this compile exceed monthly quota?
├─ Determine approval mode: auto-approve, require-human, deny
└─ Emit audit: who requested, when, approved/denied, by whom
↓
Execution or Denial
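The rule-matching stage of the engine above can be sketched as follows. The rule shape and the specific rules shown are illustrative assumptions; the document only fixes the three approval modes.

```typescript
type ApprovalMode = "AUTO" | "REQUIRE_HUMAN" | "DENY";

// Hypothetical rule shape: match on tool name pattern + caller role.
interface PolicyRule {
  toolPattern: RegExp;
  roles: string[];
  mode: ApprovalMode;
}

// Example rules (illustrative, not the real policy set).
const rules: PolicyRule[] = [
  { toolPattern: /^modules\.query_state$/, roles: ["viewer", "developer"], mode: "AUTO" },
  { toolPattern: /^modules\.compile$/, roles: ["developer"], mode: "AUTO" },
  { toolPattern: /^graph\.install_plan$/, roles: ["developer"], mode: "REQUIRE_HUMAN" },
];

function evaluate(toolName: string, role: string): { mode: ApprovalMode; reason: string } {
  for (const rule of rules) {
    if (rule.toolPattern.test(toolName) && rule.roles.includes(role)) {
      return { mode: rule.mode, reason: `matched ${rule.toolPattern}` };
    }
  }
  // No matching rule: deny by default, with an auditable reason.
  return { mode: "DENY", reason: "no matching policy rule" };
}
```

Deny-by-default keeps the fail-safe property: a tool without an explicit rule cannot run.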
Missing Pieces:
- ⚠️ Policy DSL (Rego-style rules or simpler subset)
- ⚠️ Tool-level RBAC matrix (role → allowed tools)
- ⚠️ Policy query engine (evaluate rules in context)
- ⚠️ Approval workflow integration (async human approval)
- ⚠️ Pre-approval proposal generation (show diff before apply)
- ⚠️ Audit logging (immutable record of approvals)
- ⚠️ Compliance rule library (SOC 2, GDPR, etc.)
Implementation Approach:
1. New package: @savvi-studio/policy-engine
   - Policy interface (rules + conditions)
   - PolicyEvaluator (evaluate policies in context)
   - ApprovalMode enum (AUTO, REQUIRE_HUMAN, DENY)
   - PolicyDecision result (approved, reason, audit_id)
2. Embed in studio-mcp tool middleware
   - Before tool execution: call policyEvaluator.evaluate()
   - If REQUIRE_HUMAN: generate proposal, store in DB, return proposal_id
   - If AUTO: execute immediately
   - If DENY: return error
3. Add approval workflow UI (in studio-web)
   - List pending approvals
   - Show tool name, inputs, predicted outputs
   - Approve/deny with audit comment
4. Add audit table schema: {tool_name, user_id, action, approved_by, timestamp, reason, status}
Effort: High (4-5 weeks). Blocks: Enterprise adoption, compliance.
Part C: P1 Gaps (Critical for Production)
Gap 7: Schema-Driven Tool Ecosystem ❌
Problem: Tools operate on objects, not schema-aware contracts. Cannot auto-generate tools from domain model.
Current State:
- Codegen generates .ts files from frozen schemas
- Tools are manually created per operation
- No schema diff capability
- No template-aware errors
Missing:
- ⚠️ Schema diff tools (breaking changes detection)
- ⚠️ Tool builder DSL (declarative CRUD/query generation)
- ⚠️ Auto-generated mutation tools (from templates)
- ⚠️ Auto-generated query tools (from classifiers)
Implementation: New package @savvi-studio/schema-toolgen
Effort: High (4-5 weeks)
Gap 8: Async Patterns for Long-Running Operations ⚠️
Problem: Lifecycle hooks are sync; long-running operations (compilation) block.
Missing:
- ⚠️ Async lifecycle hooks (preCompile, postCompile async versions)
- ⚠️ Webhook support (tool completion callback)
- ⚠️ Event stream support (Kafka/SNS notifications)
- ⚠️ Polling mechanism (client polls for completion)
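The polling option can be sketched as an operation that returns an `operationId` immediately plus a bounded client-side poll. The in-memory job store and function names are illustrative assumptions, not existing APIs.

```typescript
type JobStatus = "PENDING" | "RUNNING" | "COMPLETE" | "FAILED";

// In-memory job store standing in for a DB-backed operations table.
const jobs = new Map<string, { status: JobStatus; result?: unknown }>();

// Long-running tool returns immediately; a worker would do the real compile.
function startCompile(opId: string): { operationId: string } {
  jobs.set(opId, { status: "RUNNING" });
  return { operationId: opId };
}

function getStatus(opId: string): { status: JobStatus; result?: unknown } {
  return jobs.get(opId) ?? { status: "FAILED" };
}

// Client-side poll with a bounded number of attempts.
async function pollUntilDone(opId: string, maxAttempts: number, intervalMs: number) {
  for (let i = 0; i < maxAttempts; i++) {
    const s = getStatus(opId);
    if (s.status === "COMPLETE" || s.status === "FAILED") return s;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`operation ${opId} did not finish in time`);
}
```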
Effort: Medium (2-3 weeks)
Gap 9: Tool Result Versioning ⚠️
Problem: Breaking changes to tool output risk agent code.
Missing:
- ⚠️ Tool output schema versioning
- ⚠️ Migration framework
- ⚠️ Backward compatibility layer
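One way these three pieces fit together: tool results carry a `schemaVersion`, and old payloads are upgraded through per-version migration steps before agents consume them. The envelope shape, version numbers, and field names below are illustrative assumptions.

```typescript
interface VersionedResult { schemaVersion: number; payload: Record<string, unknown>; }

// Each entry migrates a payload from version N to N+1 (examples are made up).
const migrations: Record<number, (p: Record<string, unknown>) => Record<string, unknown>> = {
  1: (p) => ({ ...p, warnings: p.warnings ?? [] }),  // v1→v2: add warnings field
  2: (p) => ({ ...p, durationMs: p.elapsed ?? 0 }),  // v2→v3: derive durationMs from elapsed
};

// Backward-compatibility layer: chain migrations until the target version.
function upgrade(result: VersionedResult, target: number): VersionedResult {
  let { schemaVersion, payload } = result;
  while (schemaVersion < target) {
    const step = migrations[schemaVersion];
    if (!step) throw new Error(`no migration from v${schemaVersion}`);
    payload = step(payload);
    schemaVersion++;
  }
  return { schemaVersion, payload };
}
```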
Effort: Low (1 week)
Part D: P2 Gaps (Quality of Life)
Gap 10: Error Context & Breadcrumb Tracking ⚠️
Problem: Errors lose execution path context.
Missing:
- ⚠️ Breadcrumb collection (tool A → tool B → tool C → error at C)
- ⚠️ Error context enrichment (add relevant state at each layer)
- ⚠️ Error deduplication (group similar errors)
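A minimal sketch of breadcrumb collection: each tool invocation pushes a breadcrumb, and a failure rethrows the error enriched with the execution path, so "tool A → tool B → tool C → error at C" is visible in the message. Class and field names are illustrative.

```typescript
interface Breadcrumb { toolName: string; at: string; }

class BreadcrumbTrail {
  private crumbs: Breadcrumb[] = [];

  // Record the invocation, then run it; on failure, attach the full path.
  run<T>(toolName: string, fn: () => T): T {
    this.crumbs.push({ toolName, at: new Date().toISOString() });
    try {
      return fn();
    } catch (err) {
      const path = this.crumbs.map((c) => c.toolName).join(" -> ");
      throw new Error(`[path: ${path}] ${(err as Error).message}`);
    }
  }
}
```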
Effort: Low (1 week)
Part E: Implementation Roadmap (Priority-Based)
Phase 2A (Weeks 1-2): P0 Foundations
Week 1 Monday-Friday:
- Day 1-2: Commit Phase 1 studio-mcp
- Day 2-3: Gap 1 (closed loops) — step observer + step history API
- Day 4-5: Gap 2 (tool contracts) — preconditions + dependency graph
Week 2 Monday-Friday:
- Day 1-2: Gap 4 (observability) — OpenTelemetry instrumentation
- Day 3-4: Gap 3 (agent-safe primitives) — rate limiting + capability check
- Day 5: Testing + CI integration
Deliverables:
- Closed-loop workflow demonstration (observe→decide→act→repeat)
- Tool contract framework in place
- End-to-end tracing from MCP→engine→DB
- Rate limiting enforced in CI
Phase 2B (Weeks 3-4): Reproducibility + Policy
Week 3:
- Gap 5 (reproducibility) — execution trace + caching + replay
- Gap 6 (policy) — basic RBAC + approval workflow
Week 4:
- Testing + dashboards (Grafana)
- Documentation
Deliverables:
- Deterministic execution baseline established
- Policy engine with approval workflow
- Observability dashboards live
Phase 3+ (Weeks 5+): Extended Ecosystem
- Gap 7 (schema-driven tools)
- Gap 8 (async patterns)
- Gap 9 (tool versioning)
- Gap 10 (error context)
Part F: Success Metrics
| Metric | Target | When |
|---|---|---|
| Closed-loop agent workflows | 1 end-to-end workflow passes | Week 2 |
| Tool execution observability | 100% of tools emitting spans | Week 2 |
| Agent safety | Rate limiting + capability checks enforced | Week 2 |
| Reproducibility | Deterministic execution baseline | Week 4 |
| Policy enforcement | Approval workflows functional | Week 4 |
| Error resolution time | 50% reduction (tracing) | Week 4 |
| Schema-driven tools | 30% CRUD tools auto-generated | Week 6 |
Appendix A: Tool Capability Matrix (Current vs Target)
| Tool | Current | Target |
|---|---|---|
| modules.compile | ⚠️ Stubbed | ✅ Real + preconditions + rate limit |
| modules.codegen | ⚠️ Stubbed | ✅ Real + deterministic + caching |
| modules.validate | ⚠️ Basic | ✅ Full schema validation + preconditions |
| graph.install_plan | ❌ Stubbed | ✅ Real planning + policy enforcement |
| NEW: query_state | ✅ Works | ✅ Enhanced with step history |
| NEW: list_changes | ✅ Works | ✅ Enhanced with filesystem tracking |
| NEW: auto_gen_tool | ❌ Missing | ✅ Schema-driven tool creation |
Appendix B: Risk Assessment
| Risk | Probability | Severity | Mitigation |
|---|---|---|---|
| Observability adds 20% latency | Medium | Medium | Span sampling + async emission |
| Sandbox restricts valid use cases | Low | High | Whitelist safe operations, gradual rollout |
| Determinism breaks existing workflows | Low | High | Backward-compatible trace format |
| Policy rules become bottleneck | Medium | Medium | Cache policy decisions, pre-compute |
Document Status: ✅ Complete. Last Updated: 2026-04-01. Next Review: After Gap 1-4 implementation (Week 2).