
AI-Accelerated Development System

Author: Stig-Johnny Stoebakk
Date: March 2026
Status: Production-proven methodology, running in production across 7 repositories


The Core Thesis

A small team with the right AI operating system ships faster than a large team without one.

This is not a theoretical framework. It is running in production today — 7 application repositories, 6 autonomous agents, multi-workspace orchestration, automated PR lifecycles, and a communication backbone that binds it all together. The system ships code, reviews code, manages releases, and coordinates across agents without human intervention on routine work.

This document describes the system as it exists, what it produces, and how to apply it to any product.

The governing constraint: every human should be doing work that no AI can yet do reliably. Everything else gets automated, augmented, or eliminated.


1. The Repo Is the Brain

Everything an agent needs to do its job lives in the repository. If it is not committed, it does not exist.

Knowledge in Slack threads, verbal conversations, or someone's head is invisible to every agent that runs after the moment it was spoken. This is not a workflow preference — it is a hard constraint of how AI agents operate.

Repo Knowledge Layer

/docs
  AGENTS.md            # Entry point for all AI agents (~100 lines, always current)
  architecture/        # System design, ADRs, data flows
  domains/             # Per-domain specs and design docs
  decisions/           # Architecture Decision Records
  quality/             # Standards, testing expectations, edge cases
/.claude
  skills/              # Reusable reference files — IDs, URLs, pricing, conventions
  CLAUDE.md            # Agent instructions, workflow rules, coordination protocol

AGENTS.md (or CLAUDE.md in our implementation) is the most important file in any project. It is short, always up to date, and tells any agent exactly where to find what it needs. Every workspace, every agent, every session reads it first.

Skills are version-controlled knowledge files that any agent can read and any agent can update. When an agent discovers that a stored value is wrong (a bundle ID, a pricing tier, a URL), it updates the skill on its feature branch. Knowledge improves automatically as the system works.

Spec-First Development

Every feature begins as a committed markdown spec, not a prompt in a chat window:

  1. Write the spec and commit it. A short document describing inputs, outputs, state transitions, and edge cases. AI drafts from a conversation; the engineer edits to truth; the result is committed before any code is written.
  2. Generate tests from the spec. AI writes tests against the committed spec. Tests are committed before implementation begins. Implementation is done when tests pass.
  3. Generate the implementation. AI produces the code. The engineer reviews for correctness against the spec.
  4. AI writes the PR description, changelog entry, and updates the relevant domain doc. The spec in docs/ is updated to reflect what was actually built, not what was planned.

This makes every task legible to agents — now and in every future session.
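As a concrete illustration, a committed spec might look like the following. The feature, filename, and section headings are illustrative, not a mandated template:

```markdown
<!-- docs/domains/auth/password-reset.md (illustrative example) -->
# Spec: Password Reset Flow

## Inputs
- User email (must match an existing account)

## Outputs
- Time-limited reset token delivered by email

## State transitions
requested > token-issued > token-consumed > password-updated

## Edge cases
- Unknown email: respond identically (no account enumeration)
- Expired token: reject, allow re-request
```

Tests are then generated against these sections before any implementation exists.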


2. Multi-Agent Orchestration

The Architecture in Production

This is not a diagram on a whiteboard. This is running today:

+------------------+
|  Orchestrator    |  Plans work, creates GitHub issues with acceptance
|  (Workspace 3)   |  criteria, monitors PRs, unblocks other agents.
+--------+---------+  Does NOT implement features.
         |
         | GitHub Issues
         v
+-----------------+   +-----------------+   +-----------------+
|  Executor 4     |   |  Executor 5     |   |  Executor 6     |
|  (Workspace 4)  |   |  (Workspace 5)  |   |  (Workspace 6)  |
+--------+--------+   +--------+--------+   +--------+--------+
         |                     |                     |
         | Feature Branches    | Feature Branches    | Feature Branches
         v                     v                     v
+------------------------------------------------------------------+
|  GitHub (shared remote)                                          |
|  - Branch protection: require PR, CI pass, review                |
|  - Squash merge only                                             |
|  - Never push directly to main                                   |
+------------------------------------------------------------------+
         |
         v
+------------------+     +-------------------+
|  Review Agent    |     |  CI/CD Pipeline   |
|  (Always-on)     |     |  GitHub Actions   |
|  Reviews every   |     |  Self-hosted      |
|  PR within       |     |  runners (macOS   |
|  seconds         |     |  + Linux k3s)     |
+------------------+     +-------------------+

Always-On Agents

The system runs 24/7 on dedicated hardware. Agents do not require a human to be present:

| Agent | Hardware | Role | Availability |
|---|---|---|---|
| Orchestrator | MacBook Air M4 | Plans, coordinates, unblocks | Session-based |
| Executor(s) | MacBook Air M4 | Feature development, bug fixes | Session-based |
| Review-E | Raspberry Pi 4 | PR review, approval, auto-merge | 24/7 always-on |
| Pi-E | Raspberry Pi 4 | Lightweight tasks, monitoring | 24/7 always-on |
| Volt-E | VPS (Boston) | Security research, non-macOS tasks | 24/7 always-on |

Always-on agents run on OpenClaw (open-source agent runtime). They poll for work, claim issues, create PRs, and respond to review requests without human intervention.

Pull-Based Work Distribution

Agents are not assigned tasks. They claim them:

  1. Orchestrator creates GitHub issues with clear acceptance criteria and priority labels
  2. Agents check the backlog and pick the top unclaimed issue
  3. Agent claims the issue (comments on GitHub + posts to Discord + creates feature branch)
  4. Other agents see the claim and pick the next issue
  5. Competitive claiming: Pi-E claims in ~16s, Volt-E in ~22s — first to comment wins

This eliminates bottlenecks. No human needs to be online to assign work. The backlog is the queue. Priority labels and issue age determine order.
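The claiming order described above (priority label first, then issue age) can be sketched as a small pure function. The `Issue` shape and the `P0`/`P1`/`P2` label names are illustrative assumptions, not the exact production schema:

```typescript
// Sketch: which unclaimed issue should an agent pick up next?
// Priority labels and the Issue shape are assumptions for illustration.
interface Issue {
  number: number;
  labels: string[];        // e.g. ["P0"], ["P1"]
  createdAt: string;       // ISO timestamp
  claimedBy: string | null;
}

const PRIORITY = ["P0", "P1", "P2"]; // highest first (assumed label scheme)

function priorityRank(issue: Issue): number {
  const i = PRIORITY.findIndex((p) => issue.labels.includes(p));
  return i === -1 ? PRIORITY.length : i; // unlabeled issues sort last
}

// Returns the issue an agent should claim next, or null if the backlog is empty.
function nextIssue(backlog: Issue[]): Issue | null {
  const open = backlog.filter((i) => i.claimedBy === null);
  open.sort(
    (a, b) =>
      priorityRank(a) - priorityRank(b) ||
      a.createdAt.localeCompare(b.createdAt) // tie-break: older first
  );
  return open[0] ?? null;
}
```

In production the "claim" itself is a GitHub comment plus a Discord post; this only models the selection rule.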

Session Persistence

Agents save their state after every significant action. Other agents read these sessions to know what's in progress:

  • What issue each workspace is working on
  • What PRs are open
  • What blockers exist
  • What was learned during the session

This prevents duplicate work and enables seamless handoff between sessions.
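A minimal sketch of what such a session record might contain. The field names and JSON format are illustrative; the production format may differ:

```typescript
// Per-workspace session state, saved after every significant action and
// written to a well-known path so any other agent can read it.
// Field names are illustrative assumptions.
interface SessionState {
  workspace: string;
  currentIssue: number | null;
  openPRs: number[];
  blockers: string[];
  learnings: string[];
  updatedAt: string; // ISO timestamp, refreshed on save
}

function saveSession(state: SessionState): string {
  return JSON.stringify(
    { ...state, updatedAt: new Date().toISOString() },
    null,
    2
  );
}

function loadSession(raw: string): SessionState {
  return JSON.parse(raw) as SessionState;
}
```

Because the state is plain JSON, a fresh session (or a different agent) can reconstruct what is in progress without replaying any conversation.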


3. Discord as the Nervous System

This is the key architectural insight that makes multi-agent coordination work in practice.

Discord is not a chat tool in this system. It is the event bus. Every significant action produces a Discord message. Every agent reads Discord to understand the current state of the world.

Channel Architecture

| Channel | Purpose | Who Posts | Who Reads |
|---|---|---|---|
| #tasks | Task delegation, status updates, PR notifications | All agents | All agents + human |
| #admin | Questions, blockers, human decisions | Agents needing help | Human |
| #review-e | Review agent activity | Review-E | Orchestrator |
| #pi-e | Pi-E private channel | Pi-E | Orchestrator |
| #volt-e | Volt-E private channel | Volt-E | Orchestrator |

The Automated PR Lifecycle

This is the full cycle, zero human intervention on routine PRs:

Agent creates PR
       |
       v
GitHub Action: request-review.yml
  --> Posts to Discord #tasks with PR details
       |
       v
Review-E sees the notification
  --> Reads the PR diff
  --> Posts review comments or approves
       |
       v
GitHub Action: auto-merge.yml
  --> Waits for CI pass + review approval
  --> Resolves Copilot review threads
  --> Squash merges to main
       |
       v
GitHub Action: build-status-to-discord.yml
  --> Posts merge confirmation to Discord #tasks
       |
       v
GitHub Action: release-please.yml
  --> Analyzes conventional commits
  --> Opens Release PR if releasable changes exist
  --> Auto-merges release PR
       |
       v
GitHub Action: release-notes-to-discord.yml
  --> Posts release notes to Discord

Measured result: PR created to merged in under 2 minutes for routine changes. An agent can create 7 PRs across 7 repos in under 2 minutes via the GitHub API.
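The first hop of this chain, posting PR details to #tasks, can be sketched as a payload builder. The field choices are illustrative; the actual workflow files are not reproduced here, but the `content`/`embeds` shape matches Discord's webhook message format:

```typescript
// Sketch of the Discord webhook payload a workflow like request-review.yml
// might post. Repo, PR number, and wording are illustrative.
function reviewRequestPayload(
  repo: string,
  prNumber: number,
  title: string,
  url: string
) {
  return {
    content: `Review requested: ${repo}#${prNumber}`,
    embeds: [
      {
        title,
        url,
        description: "CI must pass and a review must approve before auto-merge.",
      },
    ],
  };
}
```

The review agent watches #tasks for exactly this kind of message, which is what lets the whole lifecycle run without a human in the loop.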

Agent Delegation via Discord

The orchestrator delegates work through Discord mentions. Agents are mentioned using Discord user IDs (plain text @name does not work for bots):

"<@BOT_USER_ID> Please pick up issue #123 in repo-name.
Acceptance criteria are in the issue. Create a feature branch
and PR when ready."

The agent sees the mention, reads the issue, claims it, and begins work. The orchestrator monitors progress through Discord messages and GitHub PR status.
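Composing that delegation message is a one-liner, but the mention syntax matters: it must be the numeric user-ID form (`<@…>`), since plain-text @name is not resolved for bots. A sketch, with the repo and issue values as placeholders:

```typescript
// Builds a delegation message using Discord's user-ID mention syntax.
// botUserId is the target agent's numeric Discord user ID.
function delegationMessage(
  botUserId: string,
  repo: string,
  issue: number
): string {
  return (
    `<@${botUserId}> Please pick up issue #${issue} in ${repo}. ` +
    `Acceptance criteria are in the issue. Create a feature branch ` +
    `and PR when ready.`
  );
}
```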

Why Discord and Not Slack/Linear/Jira

  • Free for unlimited messages and channels
  • Bot API is simple and well-documented
  • Webhook support for GitHub Actions integration
  • MCP server available — agents read and write messages programmatically
  • Real-time: agents see messages within seconds
  • Channels provide natural separation of concerns
  • Message history serves as an audit trail

4. The Engineering Rig

Role Separation

The core principle: the agent that produces a thing cannot approve that thing. Separation is structural, not cultural.

Why test separation matters: An implementation agent writing its own tests will write tests that confirm its assumptions. Bugs born from misunderstanding the requirement will be hidden by tests that share the same misunderstanding.

Why review separation matters: LLMs reviewing their own code exhibit the same rationalization patterns humans do — they read what they intended to write, not what they wrote.

Hard Escalation Triggers

These always require human review, no exceptions. Apply to every project:

| Trigger | Reason |
|---|---|
| Changes to authentication or session management | Irreversible security impact |
| Changes to payment processing code | Financial and compliance risk |
| Database schema migrations | Data loss risk |
| Changes to data retention or deletion logic | GDPR/privacy liability |
| Dependency version bumps to major versions | Breaking change risk |
| Any change to a file touched by < 3 total commits | High-risk area, low familiarity |
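Triggers like these are easy to enforce mechanically in CI: if any changed path matches a protected area, the PR is flagged for mandatory human review. The path patterns below are illustrative assumptions, not the production rules:

```typescript
// Sketch: flag PRs touching protected areas for mandatory human review.
// Patterns are assumptions; real rules would be tuned to the repo layout.
const PROTECTED_PATTERNS: RegExp[] = [
  /auth|session/i,        // authentication / session management
  /payment|billing/i,     // payment processing
  /migrations\//,         // database schema migrations
  /retention|deletion/i,  // data retention / deletion logic
];

function requiresHumanReview(changedFiles: string[]): boolean {
  return changedFiles.some((file) =>
    PROTECTED_PATTERNS.some((pattern) => pattern.test(file))
  );
}
```

A check like this runs before auto-merge, so the review agent can approve routine PRs while escalated ones wait for a human.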

State Machines for Critical Flows

For any multi-step workflow where correctness matters — reservations, payments, onboarding, support escalation — use an explicit state machine (e.g., XState) between the LLM and the system of record.

The principle: LLM handles language, state machine handles logic.

The LLM generates natural language and extracts intent. The state machine controls what transitions are allowed, what data must be collected before proceeding, and when to commit to the database. The LLM cannot skip steps or hallucinate state changes because the state machine does not permit them.

State machine snapshots serialize to a cache layer (Redis), allowing conversations across async channels (email, WhatsApp, SMS) to pause and resume with full state intact.
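The gate between language and logic can be sketched as a minimal hand-rolled state machine. This is an illustration of the principle only; the production system uses XState, and the states and required fields below are assumptions modeled on the reservations example:

```typescript
// Minimal hand-rolled sketch (NOT XState): the machine decides which
// transitions are legal and what data must exist before proceeding.
type State =
  | "greeting"
  | "collection"
  | "availability"
  | "confirmation"
  | "confirmed";

const TRANSITIONS: Record<State, State[]> = {
  greeting: ["collection"],
  collection: ["availability"],
  availability: ["confirmation"],
  confirmation: ["confirmed"],
  confirmed: [],
};

interface Snapshot {
  state: State;
  guest: { name?: string; date?: string };
}

// The LLM proposes a transition; the machine enforces legality and
// completeness. It cannot be talked into skipping a step.
function transition(snap: Snapshot, next: State): Snapshot {
  if (!TRANSITIONS[snap.state].includes(next)) {
    throw new Error(`illegal transition ${snap.state} -> ${next}`);
  }
  if (next === "availability" && (!snap.guest.name || !snap.guest.date)) {
    throw new Error("guest data incomplete; cannot check availability");
  }
  return { ...snap, state: next };
}

// Snapshots are plain JSON, so they can be parked in a cache like Redis
// and resumed later on any channel.
const serialize = (s: Snapshot): string => JSON.stringify(s);
const restore = (raw: string): Snapshot => JSON.parse(raw) as Snapshot;
```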

This pattern applies to any domain:

| Domain | States | Why It Matters |
|---|---|---|
| Reservations | greeting > collection > availability > confirmation > confirmed | Guest data must be complete before committing |
| Payments | initiated > authorized > captured > settled | Financial state must be deterministic |
| Onboarding | signup > configuration > verification > active | Each step has prerequisites |
| Support | received > classified > assigned > resolved | Routing and escalation must follow rules |

CI/CD Infrastructure

Self-hosted runners eliminate dependency on GitHub's shared infrastructure:

| Runner Type | Hardware | Use Case |
|---|---|---|
| macOS (ARM64) | MacBook Air M4 | iOS builds, Xcode, per-repo dedicated runners |
| Linux (ARC) | k3s cluster on Dell server | CI for all repos, each repo gets its own ARC scale set |
| Docker-in-Docker | k3s cluster | Integration tests requiring containers |

All repos share identical workflow files: request-review.yml, build-status-to-discord.yml, pr-comments-to-discord.yml, auto-merge.yml, auto-resolve-copilot-conversations.yml, notify-failure.yml, release-please.yml, release-notes-to-discord.yml.

Workflow consistency across repos means a fix to the review process propagates everywhere immediately.

Release Management

  • Conventional commits enforced — feat:, fix:, refactor:, docs:, chore:, ci:
  • release-please runs on merge to main, auto-generates changelog and version bump
  • Staging deploy automatic on every merge to main
  • Production deploy triggered by merging the release PR
  • Feature flags for anything with user-facing impact

Velocity Benchmark

A functioning engineering rig should produce approximately 3-4 merged PRs per engineer per day. This is a diagnostic, not an aspiration. If throughput is significantly below this, the rig has a problem: specs are too vague, agent instructions are stale, task decomposition is too coarse, or the review loop is too slow.

Measured in production: A single always-on agent (Pi-E or Volt-E) can deliver 7 PRs across 7 repos in under 2 minutes for well-defined tasks. The bottleneck is never the agent. It is always the quality of the spec.


5. The MCP Ecosystem (Tool Layer)

Agents do not operate in isolation. They connect to external services through Model Context Protocol (MCP) servers — a standardized interface that gives AI agents authenticated access to APIs, databases, and tools.

Production MCP Servers

| MCP Server | Purpose |
|---|---|
| Discord | Read/write messages, delegate tasks, monitor status |
| GitHub | Issues, PRs, code review, repository management |
| Bitwarden | Secret management: agents retrieve credentials without exposure |
| App Store Connect | iOS app management, builds, TestFlight, submissions |
| RevenueCat | Subscription management, product configuration |
| Playwright | Browser automation for tasks requiring UI interaction |
| Pushbullet | SMS 2FA code retrieval for automated logins |
| TOTP | Time-based one-time passwords for service authentication |
| Cloudflare | DNS, workers, R2 storage, D1 databases |
| Memory | Persistent context across sessions: decisions, learnings, state |
Why MCP Matters

Without MCP, every agent integration is custom code. With MCP, agents connect to any service through a consistent interface. Adding a new capability (e.g., "agents can now manage DNS") means deploying one MCP server, not modifying every agent.

Security model: Agents never see raw secrets. Bitwarden MCP provides credentials on demand. Secrets are piped through stdin, never passed as command arguments, never logged, never stored in temp files. Terminal history and conversation logs remain clean.

Building New MCP Servers

For any product with an API, wrapping it as an MCP server makes it immediately accessible to every agent in the system. This is the pattern for making any product AI-native:

  1. Map API endpoints to MCP tool definitions
  2. Handle authentication via environment variables
  3. Deploy as a lightweight Node.js process
  4. Register in the agent configuration

Time to build: 1-2 hours for a well-documented API. The result: every agent in the fleet can interact with the service.
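Step 1 of this sequence can be sketched as a plain object: one API endpoint mapped to an MCP-style tool definition with a JSON Schema input. The endpoint, tool name, and fields are hypothetical; a real server would register this with the MCP SDK and implement a handler that calls the underlying API:

```typescript
// Sketch: one hypothetical API endpoint expressed as an MCP tool
// definition (name + description + JSON Schema input). Illustrative only.
const createDnsRecordTool = {
  name: "create_dns_record",
  description: "Create a DNS record in the managed zone",
  inputSchema: {
    type: "object",
    properties: {
      zone: { type: "string", description: "Zone name, e.g. example.com" },
      type: { type: "string", enum: ["A", "AAAA", "CNAME", "TXT"] },
      name: { type: "string" },
      value: { type: "string" },
    },
    required: ["zone", "type", "name", "value"],
  },
};
```

Once a server exposes definitions like this, every agent in the fleet can discover and call the tool through the same interface it already uses for Discord and GitHub.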


6. The Default Stack

These are sensible defaults for most projects. Override per project where justified, and document the decision in an ADR.

| Layer | Default | Rationale |
|---|---|---|
| Monorepo | Turborepo + npm/pnpm workspaces | Shared types, incremental builds, single CI pipeline |
| Language | TypeScript throughout | End-to-end type safety, AI generates TypeScript well |
| API | Fastify v4 | Fastest Node.js framework, native TypeScript, WebSocket support |
| Frontend | Next.js (App Router) or SwiftUI (iOS) | Web or native depending on product |
| Database | PostgreSQL + Prisma ORM | Type-safe queries, migration management, scales to enterprise |
| Cache/Sessions | Redis | Sub-millisecond reads, pub/sub, TTL-based expiry |
| State Management | XState v5 | Deterministic state machines, actor model, serializable snapshots |
| CI/CD | GitHub Actions + self-hosted runners | Full control, no per-minute billing, macOS + Linux |
| Communication | Discord (bot API + webhooks) | Free, real-time, MCP-accessible, audit trail |
| Secrets | Bitwarden (CLI + MCP) | Agents access credentials without exposure |
| Deployment | Cloud Run / Container Apps | Serverless containers, scales to zero, uses cloud credits |

Infrastructure Strategy

Match infrastructure to funding:

| Situation | Approach |
|---|---|
| Google for Startups credits | GCP: Cloud Run + Cloud SQL + Memorystore |
| Azure for Startups credits | Azure: Container Apps + Azure PostgreSQL + Azure Cache |
| AWS Activate credits | AWS: ECS Fargate + RDS + ElastiCache |
| No cloud credits | Render or Railway (managed, predictable pricing) |
| Scale (5,000+ users) | Kubernetes on whichever cloud you're already on |

Never pay cash for infrastructure you can get through startup credits. Never run on a managed platform when you have six figures in cloud credits available.


7. AI Leverage Beyond Engineering

The engineering rig is the obvious application. The less obvious — and equally important — application is every other business function. A four-person team can operate like a twelve-person team if each function runs with an AI co-pilot.

The Structure-First Principle

AI performs reliably when inputs are structured and outputs are evaluable. The fix for every function is the same: define the atomic unit.

| Function | Atomic Unit | Required Inputs | Output Format | Feedback Signal |
|---|---|---|---|---|
| Product | Feature spec | Problem statement, constraints | Markdown with user stories, acceptance criteria | Engineering starts without clarification |
| Sales | Outreach sequence | Prospect, contact role, hook | Email + follow-up + connection note | Reply rate, meeting conversion |
| Support | Ticket resolution | Ticket text, customer history, docs | Resolution message + internal note | CSAT, re-open rate |
| Marketing | Content piece | Audience, tone, CTA, keyword | Markdown with metadata | Publish rate, engagement |
| Analytics | Insight report | Question, data source, date range | SQL + narrative + recommendation | Decision made or escalated |
| Finance | Financial update | Actuals, plan, period | P&L commentary, variance analysis | Founder reviews in < 30 min |

What AI Must Never Own

No hedging. These are firm:

| Do Not Automate | Reason |
|---|---|
| Contract signing or financial commitments | Legal and financial liability |
| Hiring and firing decisions | Irreversible impact on people |
| Crisis communication | Reputational risk compounds rapidly |
| Irreversible decisions at meaningful cost | AI errors are probabilistic |
| Key relationship management | Trust cannot be automated |
| Final merge on auth, payment, and data deletion code | Blast radius too high |
| App Store / production submissions | Untested releases waste review time |

What AI Runs Autonomously

These require no human approval loop:

  • PR review and auto-merge (routine changes) — continuous
  • System health monitoring — continuous
  • LLM cost-per-task tracking — daily
  • Dependency security scanning — on every PR
  • Release management (changelog, version bump) — on merge to main
  • Discord notifications for all state changes — continuous
  • Competitive monitoring digest — weekly

8. Preventing Drift and Maintaining Control

AI-augmented orgs have a specific failure mode: gradual drift. AI output quality degrades, humans stop reading carefully, and errors compound invisibly.

The Weekly Rhythm

| Day | Control Practice |
|---|---|
| Monday | Founder reviews: competitive digest, health summary, cash position. 30 minutes. |
| Wednesday | CTO reviews: system health, LLM cost trends, sprint velocity, open incidents. Checks that agent instructions reflect the current codebase. 30 minutes. |
| Friday | Team sync (60 min): sprint review, customer stories, one thing AI got wrong this week. |
| Monthly | Full review: Are AI outputs still accurate? Have prompts drifted? What new automation can replace a recurring manual task? |

Drift Detection

For every AI-automated function, define a ground truth check:

| AI Function | Ground Truth Check |
|---|---|
| Code review AI | Each sprint: CTO manually reviews one PR the AI marked clean. Compare findings. |
| Customer feedback synthesis | Monthly: founder reads 10 raw tickets. Does the AI brief match? |
| Financial model | Monthly: reconcile AI-generated P&L against bank statements. Variance > 5% triggers prompt review. |
| Outreach personalization | Weekly: spot-check 3 drafts against actual prospect profile. Flag hallucinated details. |
| Agent instructions accuracy | Every sprint: CTO verifies AGENTS.md / CLAUDE.md matches the actual codebase. Stale entries get updated before new work begins. |

Skill Maintenance (Self-Healing Knowledge)

Every agent is responsible for keeping shared knowledge accurate. When an agent encounters a skill file that is wrong or outdated:

  1. Fix it on the current feature branch
  2. The fix propagates to all agents on next pull
  3. Knowledge improves as a side effect of doing work

This means the system gets smarter over time without dedicated maintenance effort.

Model Upgrade Strategy

  1. Pin all production agents to specific model versions — never use "latest"
  2. New versions tested in shadow environment first
  3. Upgrade one rig at a time, not all simultaneously
  4. 30-day rollback window
  5. Behavioral regression triggers immediate rollback, not a prompt fix

Kill Switches

Every rig has a configuration flag that disables AI and falls back to human handling. This must be tested before it is needed.


9. Cost Awareness

Three Failure Modes

1. Using the largest model everywhere. A classification task does not need Opus. Match model to task. Default to the smallest model that handles the task reliably.

2. Long chains with large contexts. Each hop that forwards a large context pays for those tokens again. Design chains to pass only what the next step needs.

3. No budget visibility. Instrument every rig with token usage logging by function, by task type, and by model. Set alert thresholds. Review monthly.
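The instrumentation in point 3 can be sketched as a small cost aggregator. The model names and per-token prices below are placeholders, not real vendor pricing; substitute your own rate card:

```typescript
// Sketch of per-rig cost tracking. Prices are PLACEHOLDERS ($ per 1M
// tokens), not real vendor pricing.
interface UsageEvent {
  rig: string;       // e.g. "engineering"
  taskType: string;  // e.g. "pr-review"
  model: string;     // key into the price table
  inputTokens: number;
  outputTokens: number;
}

const PRICE_PER_M = {
  small: { in: 0.25, out: 1.25 },
  large: { in: 3.0, out: 15.0 },
} as const;

function costUSD(e: UsageEvent): number {
  const p = PRICE_PER_M[e.model as keyof typeof PRICE_PER_M];
  if (!p) throw new Error(`unknown model: ${e.model}`);
  return (e.inputTokens * p.in + e.outputTokens * p.out) / 1_000_000;
}

// Aggregate spend per rig; the same shape works per task type or model.
function totalByRig(events: UsageEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    totals.set(e.rig, (totals.get(e.rig) ?? 0) + costUSD(e));
  }
  return totals;
}
```

Run the aggregation monthly and compare against the alert thresholds; a rig that drifts above its budget is a signal to revisit model selection or chain design.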

Reference Costs (Early Stage)

| Rig | Monthly Estimate |
|---|---|
| Engineering (active development) | $200-600 |
| Support (500 Tier 1 tickets) | $50-150 |
| Marketing (20 content pieces) | $10-30 |
| Sales outreach (100 prospects) | $20-50 |
| Always-on agents (3 agents, 24/7) | $50-100 |
| Discord + MCP infrastructure | $0 (free tier) |

Infrastructure cost for the agent fleet itself is near zero. Raspberry Pis cost $5/year in electricity. The VPS is the only recurring cost. Discord is free. GitHub Actions on self-hosted runners is free.


10. Building This While Shipping

Do not pause to build the operating system. Build it in the order that delivers the most leverage against the constraints of the current stage.

Phase 1 — Now (2-3 people, building product): Stand up the engineering rig. Monorepo, agent instructions, spec-first workflow, role-separated agents, CI/CD, Discord channels. The knowledge layer goes in at the same time as the first feature — not later.

Phase 2 — First customers (3-4 people): Add support rig and onboarding automation. Deploy always-on agents for PR review and routine tasks. Add sales outreach automation for the commercial lead.

Phase 3 — Growth (5-8 people): Add analytics rig, finance automation, competitive monitoring. Each new hire gets an AI rig on day one and agent instructions as their first onboarding document.

At each phase: define the atomic unit, structure the inputs, build the rig, instrument it, hire one human to govern it.

The Three Principles

  1. No automation without a use case. Every AI workflow is built because someone needs to do that task this week. Not before.
  2. Automate the task, not the job. The goal is to give one person the output of three. Every automation decision is evaluated against this test.
  3. One new automation per sprint. More creates unreviewed automation that drifts. Less leaves leverage on the table.

11. Applying This to a New Project

For any new product, the setup sequence is:

  1. Create the monorepo with the standard structure and agent instructions
  2. Set up Discord channels — #tasks, #admin, agent-specific channels
  3. Deploy CI/CD — GitHub Actions with self-hosted runners, auto-merge, release-please
  4. Configure MCP servers — Discord, Bitwarden, GitHub, plus any product-specific APIs
  5. Write the first domain spec in docs/domains/
  6. Record architectural decisions as ADRs in docs/decisions/
  7. Ship the first feature using spec-first development
  8. Add non-engineering rigs as the team and customer base grow

Time from zero to first merged PR: under 4 hours with the templates and tooling already built.

The system is the constant. The product changes. Every project benefits from the lessons learned in the previous one — the prompts get better, the specs get tighter, the rigs get faster, and the MCP ecosystem grows.


12. What This System Needs Next

The system is production-proven but not yet productized. To make it sellable and repeatable:

Packaging

| Gap | What's Needed | Effort |
|---|---|---|
| Project template | Cookiecutter/degit template with monorepo structure, AGENTS.md, CI/CD workflows, Discord bot setup, MCP configs | 1-2 days |
| Onboarding guide | Step-by-step for a new team: "from zero to shipping in one day" | 1 day |
| Agent bootstrap script | Single command that provisions Discord channels, GitHub repo, CI runners, MCP servers | 2-3 days |
| Configuration UI | Web dashboard to configure agents, view status, manage skills (currently all file-based) | 1-2 weeks |

Hardening

| Gap | What's Needed | Effort |
|---|---|---|
| Cost dashboard | Real-time token usage per agent, per task type, per model. Currently tracked but not visualized. | 2-3 days |
| Agent health monitoring | Automated alerts when an agent stops responding, fails repeatedly, or drifts from expected output patterns | 3-5 days |
| Audit trail | Structured log of every agent action (issue claimed, PR created, review posted, merge executed) queryable by time, agent, and repo | 3-5 days |
| Multi-tenant isolation | Currently all agents share one Discord server. For client deployments, need per-client Discord servers or channel-based isolation | 1 week |

Documentation

| Gap | What's Needed | Effort |
|---|---|---|
| Architecture diagrams | C4 diagrams of the full system (agent topology, MCP connections, CI/CD flow, Discord event bus) | 1 day |
| Case studies | Documented examples of the system in action: PR velocity, multi-repo delivery, agent coordination | 1 day |
| Video walkthrough | Screen recording of the full lifecycle: issue created > agent claims > PR > review > merge > release > Discord notification | Half day |

Scale

| Gap | What's Needed | Effort |
|---|---|---|
| Agent fleet management | Currently agents are manually provisioned on specific hardware. Need a deployment system that provisions agents on-demand. | 1-2 weeks |
| Cross-project orchestration | Currently each project is independent. For a company running 5 products, need a meta-orchestrator that allocates agent capacity across projects by priority. | 2-3 weeks |
| Client-facing dashboard | For selling this as a service: client sees agent activity, PR velocity, cost breakdown, health status. | 2-3 weeks |

13. The Number That Matters

At steady state, a company running this system can operate at significant scale with 8-10 people covering all functions described here.

The engineering rig ships as fast as a 10-person dev team. The support rig handles the ticket volume of a 5-person support team. The marketing rig produces at the pace of a 3-person content team.

This is not a productivity multiplier. It is a structural cost model that no competitor built on traditional headcount can match.

The humans govern. The rigs do the work.


This is not a proposal. It is a system description. The system exists. It ships code every day. The question is not whether it works — it is what to point it at next.