AI-Accelerated Development System¶
Author: Stig-Johnny Stoebakk
Date: March 2026
Status: Production-proven methodology — running in production across 7 repositories
The Core Thesis¶
A small team with the right AI operating system ships faster than a large team without one.
This is not a theoretical framework. It is running in production today — 7 application repositories, 6 autonomous agents, multi-workspace orchestration, automated PR lifecycles, and a communication backbone that binds it all together. The system ships code, reviews code, manages releases, and coordinates across agents without human intervention on routine work.
This document describes the system as it exists, what it produces, and how to apply it to any product.
The governing constraint: every human should be doing work that no AI can yet do reliably. Everything else gets automated, augmented, or eliminated.
1. The Repo Is the Brain¶
Everything an agent needs to do its job lives in the repository. If it is not committed, it does not exist.
Knowledge in Slack threads, verbal conversations, or someone's head is invisible to every agent that runs after the moment it was spoken. This is not a workflow preference — it is a hard constraint of how AI agents operate.
Repo Knowledge Layer¶
/docs
AGENTS.md # Entry point for all AI agents (~100 lines, always current)
architecture/ # System design, ADRs, data flows
domains/ # Per-domain specs and design docs
decisions/ # Architecture Decision Records
quality/ # Standards, testing expectations, edge cases
/.claude
skills/ # Reusable reference files — IDs, URLs, pricing, conventions
CLAUDE.md # Agent instructions, workflow rules, coordination protocol
AGENTS.md (or CLAUDE.md in our implementation) is the most important file in any project. It is short, always up to date, and tells any agent exactly where to find what it needs. Every workspace, every agent, every session reads it first.
Skills are version-controlled knowledge files that any agent can read and any agent can update. When an agent discovers that a stored value is wrong (a bundle ID, a pricing tier, a URL), it updates the skill on its feature branch. Knowledge improves automatically as the system works.
Spec-First Development¶
Every feature begins as a committed markdown spec, not a prompt in a chat window:
- Write the spec and commit it. A short document describing inputs, outputs, state transitions, and edge cases. AI drafts from a conversation; the engineer edits to truth; the result is committed before any code is written.
- Generate tests from the spec. AI writes tests against the committed spec. Tests are committed before implementation begins. Implementation is done when tests pass.
- Generate the implementation. AI produces the code. The engineer reviews for correctness against the spec.
- AI writes the PR description, changelog entry, and updates the relevant domain doc. The spec in `docs/` is updated to reflect what was actually built, not what was planned.
This makes every task legible to agents — now and in every future session.
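The tests-before-implementation step can be sketched end to end. Everything below is hypothetical, not taken from the production system: the spec, the `proratedRefund` function, and the refund rule are invented to show the shape of the workflow.

```typescript
// Hypothetical spec (docs/domains/billing.md): cancelling mid-cycle refunds
// the unused days of the period, rounded down to whole cents.

// Step 2: tests generated from the spec, committed before any implementation.
// Each case is [priceCents, daysInPeriod, daysUsed, expectedRefundCents].
const cases: Array<[number, number, number, number]> = [
  [3000, 30, 10, 2000], // 20 unused days of a 30-day, $30.00 period
  [3000, 30, 30, 0],    // nothing unused, nothing refunded
  [1000, 30, 1, 966],   // rounding goes down, per the spec's edge case
];

// Step 3: implementation; it is done when the committed tests pass.
function proratedRefund(priceCents: number, daysInPeriod: number, daysUsed: number): number {
  if (daysUsed < 0 || daysUsed > daysInPeriod) throw new RangeError("daysUsed out of range");
  return Math.floor((priceCents * (daysInPeriod - daysUsed)) / daysInPeriod);
}
```

Because the cases are committed before the code, a bug born from misreading the spec fails loudly instead of being confirmed by tests that share the misreading.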
2. Multi-Agent Orchestration¶
The Architecture in Production¶
This is not a diagram on a whiteboard. This is running today:
+------------------+
| Orchestrator | Plans work, creates GitHub issues with acceptance
| (Workspace 3) | criteria, monitors PRs, unblocks other agents.
+--------+---------+ Does NOT implement features.
|
| GitHub Issues
v
+------------------+ +------------------+ +------------------+
| Executor 4 | | Executor 5 | | Executor 6 |
| (Workspace 4) | | (Workspace 5) | | (Workspace 6) |
+--------+---------+ +--------+---------+ +--------+---------+
| | |
| Feature Branches | Feature Branches | Feature Branches
v v v
+------------------------------------------------------------------+
| GitHub (shared remote) |
| - Branch protection: require PR, CI pass, review |
| - Squash merge only |
| - Never push directly to main |
+------------------------------------------------------------------+
|
v
+------------------+ +------------------+
| Review Agent | | CI/CD Pipeline |
| (Always-on) | | GitHub Actions |
| Reviews every | | Self-hosted |
| PR within | | runners (macOS |
| seconds | | + Linux k3s) |
+------------------+ +------------------+
Always-On Agents¶
The system runs 24/7 on dedicated hardware. Agents do not require a human to be present:
| Agent | Hardware | Role | Availability |
|---|---|---|---|
| Orchestrator | MacBook Air M4 | Plans, coordinates, unblocks | Session-based |
| Executor(s) | MacBook Air M4 | Feature development, bug fixes | Session-based |
| Review-E | Raspberry Pi 4 | PR review, approval, auto-merge | 24/7 always-on |
| Pi-E | Raspberry Pi 4 | Lightweight tasks, monitoring | 24/7 always-on |
| Volt-E | VPS (Boston) | Security research, non-macOS tasks | 24/7 always-on |
Always-on agents run on OpenClaw (open-source agent runtime). They poll for work, claim issues, create PRs, and respond to review requests without human intervention.
Pull-Based Work Distribution¶
Agents are not assigned tasks. They claim them:
- Orchestrator creates GitHub issues with clear acceptance criteria and priority labels
- Agents check the backlog and pick the top unclaimed issue
- Agent claims the issue (comments on GitHub + posts to Discord + creates feature branch)
- Other agents see the claim and pick the next issue
- Competitive claiming: Pi-E claims in ~16s, Volt-E in ~22s — first to comment wins
This eliminates bottlenecks. No human needs to be online to assign work. The backlog is the queue. Priority labels and issue age determine order.
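The claiming order can be sketched as a sort over the backlog. The `p0`/`p1` label scheme and the issue shape below are assumptions for illustration, not the production schema:

```typescript
interface Issue { number: number; labels: string[]; createdAt: string; claimed: boolean; }

// Hypothetical priority labels "p0" (highest) through "p9"; unlabeled sorts last.
const priority = (i: Issue): number => {
  const p = i.labels.find((l) => /^p[0-9]$/.test(l));
  return p ? Number(p[1]) : 9;
};

// Top unclaimed issue: highest priority first, oldest first within a priority.
function pickNext(backlog: Issue[]): Issue | undefined {
  return backlog
    .filter((i) => !i.claimed)
    .sort((a, b) => priority(a) - priority(b) || a.createdAt.localeCompare(b.createdAt))[0];
}
```

Every agent runs the same deterministic ordering, so two agents polling the same backlog converge on the same "top" issue, and the first to comment wins the claim.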
Session Persistence¶
Agents save their state after every significant action. Other agents read these sessions to know what's in progress:
- What issue each workspace is working on
- What PRs are open
- What blockers exist
- What was learned during the session
This prevents duplicate work and enables seamless handoff between sessions.
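A minimal sketch of what a saved session might look like; the field names here are illustrative, not the production format. The only requirement is that the state is serializable and committed where every other agent can read it:

```typescript
interface SessionState {
  workspace: string;
  issue: number | null;   // what this workspace is working on
  openPrs: number[];
  blockers: string[];
  learnings: string[];    // carried forward into the next session
  updatedAt: string;
}

// Plain JSON round-trip: any agent can read another agent's session.
const save = (s: SessionState): string => JSON.stringify(s, null, 2);
const load = (raw: string): SessionState => JSON.parse(raw) as SessionState;
```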
3. Discord as the Nervous System¶
This is the key architectural insight that makes multi-agent coordination work in practice.
Discord is not a chat tool in this system. It is the event bus. Every significant action produces a Discord message. Every agent reads Discord to understand the current state of the world.
Channel Architecture¶
| Channel | Purpose | Who Posts | Who Reads |
|---|---|---|---|
| #tasks | Task delegation, status updates, PR notifications | All agents | All agents + human |
| #admin | Questions, blockers, human decisions | Agents needing help | Human |
| #review-e | Review agent activity | Review-E | Orchestrator |
| #pi-e | Pi-E private channel | Pi-E | Orchestrator |
| #volt-e | Volt-E private channel | Volt-E | Orchestrator |
The Automated PR Lifecycle¶
This is the full cycle, zero human intervention on routine PRs:
Agent creates PR
|
v
GitHub Action: request-review.yml
--> Posts to Discord #tasks with PR details
|
v
Review-E sees the notification
--> Reads the PR diff
--> Posts review comments or approves
|
v
GitHub Action: auto-merge.yml
--> Waits for CI pass + review approval
--> Resolves Copilot review threads
--> Squash merges to main
|
v
GitHub Action: build-status-to-discord.yml
--> Posts merge confirmation to Discord #tasks
|
v
GitHub Action: release-please.yml
--> Analyzes conventional commits
--> Opens Release PR if releasable changes exist
--> Auto-merges release PR
|
v
GitHub Action: release-notes-to-discord.yml
--> Posts release notes to Discord
Measured result: from PR creation to merge in under 2 minutes for routine changes. An agent can create 7 PRs across 7 repos in under 2 minutes via the GitHub API.
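Multi-repo delivery is fast because opening a PR is one REST call per repo. The owner, repo, and token below are placeholders; the payload builder follows the shape of GitHub's `POST /repos/{owner}/{repo}/pulls` endpoint:

```typescript
// Request body for POST /repos/{owner}/{repo}/pulls (GitHub REST API).
function buildPrPayload(branch: string, base: string, title: string, body: string) {
  return { title, body, head: branch, base };
}

// Placeholder owner/repo/token; one call like this per repository.
async function openPr(owner: string, repo: string, token: string,
                      payload: ReturnType<typeof buildPrPayload>) {
  const res = await fetch(`https://api.github.com/repos/${owner}/${repo}/pulls`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: "application/vnd.github+json",
    },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`PR creation failed: ${res.status}`);
  return res.json();
}
```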
Agent Delegation via Discord¶
The orchestrator delegates work through Discord mentions. Agents are mentioned using Discord user IDs (plain text @name does not work for bots):
"<@BOT_USER_ID> Please pick up issue #123 in repo-name.
Acceptance criteria are in the issue. Create a feature branch
and PR when ready."
The agent sees the mention, reads the issue, claims it, and begins work. The orchestrator monitors progress through Discord messages and GitHub PR status.
Why Discord and Not Slack/Linear/Jira¶
- Free for unlimited messages and channels
- Bot API is simple and well-documented
- Webhook support for GitHub Actions integration
- MCP server available — agents read and write messages programmatically
- Real-time: agents see messages within seconds
- Channels provide natural separation of concerns
- Message history serves as an audit trail
4. The Engineering Rig¶
Role Separation¶
The core principle: the agent that produces a thing cannot approve that thing. Separation is structural, not cultural.
Why test separation matters: An implementation agent writing its own tests will write tests that confirm its assumptions. Bugs born from misunderstanding the requirement will be hidden by tests that share the same misunderstanding.
Why review separation matters: LLMs reviewing their own code exhibit the same rationalization patterns humans do — they read what they intended to write, not what they wrote.
Hard Escalation Triggers¶
These always require human review, no exceptions. Apply to every project:
| Trigger | Reason |
|---|---|
| Changes to authentication or session management | Irreversible security impact |
| Changes to payment processing code | Financial and compliance risk |
| Database schema migrations | Data loss risk |
| Changes to data retention or deletion logic | GDPR/privacy liability |
| Dependency version bumps to major versions | Breaking change risk |
| Any change to a file touched by < 3 total commits | High-risk area, low familiarity |
State Machines for Critical Flows¶
For any multi-step workflow where correctness matters — reservations, payments, onboarding, support escalation — use an explicit state machine (e.g., XState) between the LLM and the system of record.
The principle: LLM handles language, state machine handles logic.
The LLM generates natural language and extracts intent. The state machine controls what transitions are allowed, what data must be collected before proceeding, and when to commit to the database. The LLM cannot skip steps or hallucinate state changes because the state machine does not permit them.
State machine snapshots serialize to a cache layer (Redis), allowing conversations across async channels (email, WhatsApp, SMS) to pause and resume with full state intact.
This pattern applies to any domain:
| Domain | States | Why It Matters |
|---|---|---|
| Reservations | greeting > collection > availability > confirmation > confirmed | Guest data must be complete before committing |
| Payments | initiated > authorized > captured > settled | Financial state must be deterministic |
| Onboarding | signup > configuration > verification > active | Each step has prerequisites |
| Support | received > classified > assigned > resolved | Routing and escalation must follow rules |
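Production uses XState v5; the dependency-free sketch below shows the principle for the reservation flow, with event names invented for illustration. The transition table, not the LLM, decides what is allowed:

```typescript
// Reservation flow: greeting > collection > availability > confirmation > confirmed.
type State = "greeting" | "collection" | "availability" | "confirmation" | "confirmed";
type Event = "START" | "DETAILS_COMPLETE" | "SLOT_FOUND" | "GUEST_CONFIRMED";

// The transition table is the only authority on what may happen next.
const transitions: Record<State, Partial<Record<Event, State>>> = {
  greeting:     { START: "collection" },
  collection:   { DETAILS_COMPLETE: "availability" },
  availability: { SLOT_FOUND: "confirmation" },
  confirmation: { GUEST_CONFIRMED: "confirmed" },
  confirmed:    {},
};

// The LLM extracts intent and proposes an event; illegal events are rejected,
// so it cannot skip steps or hallucinate a state change.
function transition(state: State, event: Event): State {
  const next = transitions[state][event];
  if (!next) throw new Error(`illegal transition: ${event} in ${state}`);
  return next;
}

// Snapshots are plain strings, so they can be parked in Redis and resumed
// later from an async channel (email, WhatsApp, SMS).
const snapshot = (s: State): string => JSON.stringify({ state: s });
const resume = (raw: string): State => (JSON.parse(raw) as { state: State }).state;
```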
CI/CD Infrastructure¶
Self-hosted runners eliminate dependency on GitHub's shared infrastructure:
| Runner Type | Hardware | Use Case |
|---|---|---|
| macOS (ARM64) | MacBook Air M4 | iOS builds, Xcode, per-repo dedicated runners |
| Linux (ARC) | k3s cluster on Dell server | CI for all repos, each repo gets its own ARC scale set |
| Docker-in-Docker | k3s cluster | Integration tests requiring containers |
All repos share identical workflow files: request-review.yml, build-status-to-discord.yml, pr-comments-to-discord.yml, auto-merge.yml, auto-resolve-copilot-conversations.yml, notify-failure.yml, release-please.yml, release-notes-to-discord.yml.
Workflow consistency across repos means a fix to the review process propagates everywhere immediately.
Release Management¶
- Conventional commits enforced — `feat:`, `fix:`, `refactor:`, `docs:`, `chore:`, `ci:`
- release-please runs on merge to main, auto-generates changelog and version bump
- Staging deploy automatic on every merge to main
- Production deploy triggered by merging the release PR
- Feature flags for anything with user-facing impact
Velocity Benchmark¶
A functioning engineering rig should produce approximately 3-4 merged PRs per engineer per day. This is a diagnostic, not an aspiration. If throughput is significantly below this, the rig has a problem: specs are too vague, agent instructions are stale, task decomposition is too coarse, or the review loop is too slow.
Measured in production: A single always-on agent (Pi-E or Volt-E) can deliver 7 PRs across 7 repos in under 2 minutes for well-defined tasks. The bottleneck is never the agent. It is always the quality of the spec.
5. The MCP Ecosystem (Tool Layer)¶
Agents do not operate in isolation. They connect to external services through Model Context Protocol (MCP) servers — a standardized interface that gives AI agents authenticated access to APIs, databases, and tools.
Production MCP Servers¶
| MCP Server | Purpose |
|---|---|
| Discord | Read/write messages, delegate tasks, monitor status |
| GitHub | Issues, PRs, code review, repository management |
| Bitwarden | Secret management — agents retrieve credentials without exposure |
| App Store Connect | iOS app management, builds, TestFlight, submissions |
| RevenueCat | Subscription management, product configuration |
| Playwright | Browser automation for tasks requiring UI interaction |
| Pushbullet SMS | 2FA code retrieval for automated logins |
| TOTP | Time-based one-time passwords for service authentication |
| Cloudflare | DNS, workers, R2 storage, D1 databases |
| Memory | Persistent context across sessions — decisions, learnings, state |
Why MCP Matters¶
Without MCP, every agent integration is custom code. With MCP, agents connect to any service through a consistent interface. Adding a new capability (e.g., "agents can now manage DNS") means deploying one MCP server, not modifying every agent.
Security model: Agents never see raw secrets. Bitwarden MCP provides credentials on demand. Secrets are piped through stdin, never passed as command arguments, never logged, never stored in temp files. Terminal history and conversation logs remain clean.
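The stdin pattern can be sketched in a few lines; here `cat` stands in for whatever CLI actually consumes the credential:

```typescript
import { spawnSync } from "node:child_process";

// The secret travels through stdin: it never appears in the argument list,
// so `ps`, shell history, and anything logging argv stay clean.
function runWithSecret(cmd: string, args: string[], secret: string): string {
  const result = spawnSync(cmd, args, { input: secret, encoding: "utf8" });
  if (result.status !== 0) throw new Error(`${cmd} exited with ${result.status}`);
  return result.stdout;
}
```

Contrast with `someCli --token $TOKEN`, where the secret lands in process listings and shell history.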
Building New MCP Servers¶
For any product with an API, wrapping it as an MCP server makes it immediately accessible to every agent in the system. This is the pattern for making any product AI-native:
- Map API endpoints to MCP tool definitions
- Handle authentication via environment variables
- Deploy as a lightweight Node.js process
- Register in the agent configuration
Time to build: 1-2 hours for a well-documented API. The result: every agent in the fleet can interact with the service.
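The endpoint-to-tool mapping can be sketched as plain data. The tool name, schema, and endpoint below are invented for illustration, and a real server would register tools through the MCP SDK rather than this hand-rolled shape:

```typescript
// Illustrative MCP-style tool definition: one API endpoint, one tool.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
  handler: (args: Record<string, unknown>) => Promise<unknown>;
}

// Authentication comes from the environment, never from the tool arguments.
const apiKey = process.env.EXAMPLE_API_KEY ?? "";

const listInvoices: ToolDef = {
  name: "list_invoices",
  description: "List invoices for a customer (maps to GET /v1/invoices).",
  inputSchema: {
    type: "object",
    properties: { customerId: { type: "string" } },
    required: ["customerId"],
  },
  handler: async (args) =>
    fetch(`https://api.example.com/v1/invoices?customer=${String(args.customerId)}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    }).then((r) => r.json()),
};
```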
6. The Development Stack (Recommended Defaults)¶
These are sensible defaults for most projects. Override per project where justified, and document the decision in an ADR.
| Layer | Default | Rationale |
|---|---|---|
| Monorepo | Turborepo + npm/pnpm workspaces | Shared types, incremental builds, single CI pipeline |
| Language | TypeScript throughout | End-to-end type safety, AI generates TypeScript well |
| API | Fastify v4 | Fastest Node.js framework, native TypeScript, WebSocket support |
| Frontend | Next.js (App Router) or SwiftUI (iOS) | Web or native depending on product |
| Database | PostgreSQL + Prisma ORM | Type-safe queries, migration management, scales to enterprise |
| Cache/Sessions | Redis | Sub-millisecond reads, pub/sub, TTL-based expiry |
| State Management | XState v5 | Deterministic state machines, actor model, serializable snapshots |
| CI/CD | GitHub Actions + self-hosted runners | Full control, no per-minute billing, macOS + Linux |
| Communication | Discord (bot API + webhooks) | Free, real-time, MCP-accessible, audit trail |
| Secrets | Bitwarden (CLI + MCP) | Agents access credentials without exposure |
| Deployment | Cloud Run / Container Apps | Serverless containers, scales to zero, uses cloud credits |
Infrastructure Strategy¶
Match infrastructure to funding:
| Situation | Approach |
|---|---|
| Google for Startups credits | GCP: Cloud Run + Cloud SQL + Memorystore |
| Azure for Startups credits | Azure: Container Apps + Azure PostgreSQL + Azure Cache |
| AWS Activate credits | AWS: ECS Fargate + RDS + ElastiCache |
| No cloud credits | Render or Railway (managed, predictable pricing) |
| Scale (5,000+ users) | Kubernetes on whichever cloud you're already on |
Never pay cash for infrastructure you can get through startup credits. Never run on a managed platform when you have six figures in cloud credits available.
7. AI Leverage Beyond Engineering¶
The engineering rig is the obvious application. The less obvious — and equally important — application is every other business function. A four-person team can operate like a twelve-person team if each function runs with an AI co-pilot.
The Structure-First Principle¶
AI performs reliably when inputs are structured and outputs are evaluable. The fix for every function is the same: define the atomic unit.
| Function | Atomic Unit | Required Inputs | Output Format | Feedback Signal |
|---|---|---|---|---|
| Product | Feature spec | Problem statement, constraints | Markdown with user stories, acceptance criteria | Engineering starts without clarification |
| Sales | Outreach sequence | Prospect, contact role, hook | Email + follow-up + connection note | Reply rate, meeting conversion |
| Support | Ticket resolution | Ticket text, customer history, docs | Resolution message + internal note | CSAT, re-open rate |
| Marketing | Content piece | Audience, tone, CTA, keyword | Markdown with metadata | Publish rate, engagement |
| Analytics | Insight report | Question, data source, date range | SQL + narrative + recommendation | Decision made or escalated |
| Finance | Financial update | Actuals, plan, period | P&L commentary, variance analysis | Founder reviews in < 30 min |
What AI Must Never Own¶
No hedging. These are firm:
| Do Not Automate | Reason |
|---|---|
| Contract signing or financial commitments | Legal and financial liability |
| Hiring and firing decisions | Irreversible impact on people |
| Crisis communication | Reputational risk compounds rapidly |
| Irreversible decisions at meaningful cost | AI errors are probabilistic |
| Key relationship management | Trust cannot be automated |
| Final merge on auth, payment, and data deletion code | Blast radius too high |
| App Store / production submissions | Untested releases waste review time |
What AI Runs Autonomously¶
These require no human approval loop:
- PR review and auto-merge (routine changes) — continuous
- System health monitoring — continuous
- LLM cost-per-task tracking — daily
- Dependency security scanning — on every PR
- Release management (changelog, version bump) — on merge to main
- Discord notifications for all state changes — continuous
- Competitive monitoring digest — weekly
8. Preventing Drift and Maintaining Control¶
AI-augmented orgs have a specific failure mode: gradual drift. AI output quality degrades, humans stop reading carefully, and errors compound invisibly.
The Weekly Rhythm¶
| Day | Control Practice |
|---|---|
| Monday | Founder reviews: competitive digest, health summary, cash position. 30 minutes. |
| Wednesday | CTO reviews: system health, LLM cost trends, sprint velocity, open incidents. Checks that agent instructions reflect the current codebase. 30 minutes. |
| Friday | Team sync (60 min): sprint review, customer stories, one thing AI got wrong this week. |
| Monthly | Full review: Are AI outputs still accurate? Have prompts drifted? What new automation can replace a recurring manual task? |
Drift Detection¶
For every AI-automated function, define a ground truth check:
| AI Function | Ground Truth Check |
|---|---|
| Code review AI | Each sprint: CTO manually reviews one PR the AI marked clean. Compare findings. |
| Customer feedback synthesis | Monthly: founder reads 10 raw tickets. Does the AI brief match? |
| Financial model | Monthly: reconcile AI-generated P&L against bank statements. Variance > 5% triggers prompt review. |
| Outreach personalisation | Weekly: spot-check 3 drafts against actual prospect profile. Flag hallucinated details. |
| Agent instructions accuracy | Every sprint: CTO verifies AGENTS.md / CLAUDE.md matches the actual codebase. Stale entries get updated before new work begins. |
Skill Maintenance (Self-Healing Knowledge)¶
Every agent is responsible for keeping shared knowledge accurate. When an agent encounters a skill file that is wrong or outdated:
- Fix it on the current feature branch
- The fix propagates to all agents on next pull
- Knowledge improves as a side effect of doing work
This means the system gets smarter over time without dedicated maintenance effort.
Model Upgrade Strategy¶
- Pin all production agents to specific model versions — never use "latest"
- New versions tested in shadow environment first
- Upgrade one rig at a time, not all simultaneously
- 30-day rollback window
- Behavioral regression triggers immediate rollback, not a prompt fix
Kill Switches¶
Every rig has a configuration flag that disables AI and falls back to human handling. This must be tested before it is needed.
9. Cost Awareness¶
Three Failure Modes¶
1. Using the largest model everywhere. A classification task does not need Opus. Match model to task. Default to the smallest model that handles the task reliably.
2. Long chains with large contexts. Each hop that passes a large context costs proportionally. Design chains to pass only what the next step needs.
3. No budget visibility. Instrument every rig with token usage logging by function, by task type, and by model. Set alert thresholds. Review monthly.
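Token accounting can be as small as a tagged accumulator. The model names and per-million-token prices below are placeholders, not real rates:

```typescript
// Placeholder per-million-token prices; substitute current provider rates.
const pricePerMTok: Record<string, { in: number; out: number }> = {
  "small-model": { in: 0.25, out: 1.25 },
  "large-model": { in: 3.0, out: 15.0 },
};

interface Usage { fn: string; taskType: string; model: string; inTok: number; outTok: number; }

const costUsd = (u: Usage): number => {
  const p = pricePerMTok[u.model];
  return (u.inTok * p.in + u.outTok * p.out) / 1_000_000;
};

// Aggregate by function, task type, and model for the monthly review.
function totals(log: Usage[]): Map<string, number> {
  const out = new Map<string, number>();
  for (const u of log) {
    const key = `${u.fn}/${u.taskType}/${u.model}`;
    out.set(key, (out.get(key) ?? 0) + costUsd(u));
  }
  return out;
}
```

Alert thresholds are then a comparison against the aggregated map rather than ad hoc queries.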
Reference Costs (Early Stage)¶
| Rig | Monthly Estimate |
|---|---|
| Engineering (active development) | $200-600 |
| Support (500 Tier 1 tickets) | $50-150 |
| Marketing (20 content pieces) | $10-30 |
| Sales outreach (100 prospects) | $20-50 |
| Always-on agents (3 agents, 24/7) | $50-100 |
| Discord + MCP infrastructure | $0 (free tier) |
Infrastructure cost for the agent fleet itself is near zero. Raspberry Pis cost $5/year in electricity. The VPS is the only recurring cost. Discord is free. GitHub Actions on self-hosted runners is free.
10. Building This While Shipping¶
Do not pause to build the operating system. Build it in the order that delivers the most leverage against the constraints of the current stage.
Phase 1 — Now (2-3 people, building product): Stand up the engineering rig. Monorepo, agent instructions, spec-first workflow, role-separated agents, CI/CD, Discord channels. The knowledge layer goes in at the same time as the first feature — not later.
Phase 2 — First customers (3-4 people): Add support rig and onboarding automation. Deploy always-on agents for PR review and routine tasks. Add sales outreach automation for the commercial lead.
Phase 3 — Growth (5-8 people): Add analytics rig, finance automation, competitive monitoring. Each new hire gets an AI rig on day one and agent instructions as their first onboarding document.
At each phase: define the atomic unit, structure the inputs, build the rig, instrument it, hire one human to govern it.
The Three Principles¶
- No automation without a use case. Every AI workflow is built because someone needs to do that task this week. Not before.
- Automate the task, not the job. The goal is to give one person the output of three. Every automation decision is evaluated against this test.
- One new automation per sprint. More creates unreviewed automation that drifts. Less leaves leverage on the table.
11. Applying This to a New Project¶
For any new product, the setup sequence is:
- Create the monorepo with the standard structure and agent instructions
- Set up Discord channels — #tasks, #admin, agent-specific channels
- Deploy CI/CD — GitHub Actions with self-hosted runners, auto-merge, release-please
- Configure MCP servers — Discord, Bitwarden, GitHub, plus any product-specific APIs
- Write the first domain spec in `docs/domains/`
- Record architectural decisions as ADRs in `docs/decisions/`
- Ship the first feature using spec-first development
- Add non-engineering rigs as the team and customer base grow
Time from zero to first merged PR: under 4 hours with the templates and tooling already built.
The system is the constant. The product changes. Every project benefits from the lessons learned in the previous one — the prompts get better, the specs get tighter, the rigs get faster, and the MCP ecosystem grows.
12. What This System Needs Next¶
The system is production-proven but not yet productized. To make it sellable and repeatable:
Packaging¶
| Gap | What's Needed | Effort |
|---|---|---|
| Project template | Cookiecutter/degit template with monorepo structure, AGENTS.md, CI/CD workflows, Discord bot setup, MCP configs | 1-2 days |
| Onboarding guide | Step-by-step for a new team: "from zero to shipping in one day" | 1 day |
| Agent bootstrap script | Single command that provisions Discord channels, GitHub repo, CI runners, MCP servers | 2-3 days |
| Configuration UI | Web dashboard to configure agents, view status, manage skills (currently all file-based) | 1-2 weeks |
Hardening¶
| Gap | What's Needed | Effort |
|---|---|---|
| Cost dashboard | Real-time token usage per agent, per task type, per model. Currently tracked but not visualized. | 2-3 days |
| Agent health monitoring | Automated alerts when an agent stops responding, fails repeatedly, or drifts from expected output patterns | 3-5 days |
| Audit trail | Structured log of every agent action (issue claimed, PR created, review posted, merge executed) queryable by time, agent, and repo | 3-5 days |
| Multi-tenant isolation | Currently all agents share one Discord server. For client deployments, need per-client Discord servers or channel-based isolation | 1 week |
Documentation¶
| Gap | What's Needed | Effort |
|---|---|---|
| Architecture diagrams | C4 diagrams of the full system (agent topology, MCP connections, CI/CD flow, Discord event bus) | 1 day |
| Case studies | Documented examples of the system in action — PR velocity, multi-repo delivery, agent coordination | 1 day |
| Video walkthrough | Screen recording of the full lifecycle: issue created > agent claims > PR > review > merge > release > Discord notification | Half day |
Scale¶
| Gap | What's Needed | Effort |
|---|---|---|
| Agent fleet management | Currently agents are manually provisioned on specific hardware. Need a deployment system that provisions agents on-demand. | 1-2 weeks |
| Cross-project orchestration | Currently each project is independent. For a company running 5 products, need a meta-orchestrator that allocates agent capacity across projects by priority. | 2-3 weeks |
| Client-facing dashboard | For selling this as a service: client sees agent activity, PR velocity, cost breakdown, health status. | 2-3 weeks |
13. The Number That Matters¶
At steady state, a company running this system can operate at significant scale with 8-10 people covering all functions described here.
The engineering rig ships as fast as a 10-person dev team. The support rig handles the ticket volume of a 5-person support team. The marketing rig produces at the pace of a 3-person content team.
This is not a productivity multiplier. It is a structural cost model that no competitor built on traditional headcount can match.
The humans govern. The rigs do the work.
This is not a proposal. It is a system description. The system exists. It ships code every day. The question is not whether it works — it is what to point it at next.