AI-Accelerated Development System¶
Author: Stig-Johnny Stoebakk
Date: March 2026
Status: Production-proven methodology — running in production across 7 repositories
The Core Thesis¶
A small team with the right AI operating system ships faster than a large team without one.
This is not a theoretical framework. It is running in production today — 7 application repositories, 6 autonomous agents, multi-workspace orchestration, automated PR lifecycles, and a communication backbone that binds it all together. The system ships code, reviews code, manages releases, and coordinates across agents without human intervention on routine work.
This document describes the system as it exists, what it produces, and how to apply it to any product.
The governing constraint: every human should be doing work that no AI can yet do reliably. Everything else gets automated, augmented, or eliminated.
1. The Repo Is the Brain¶
Everything an agent needs to do its job lives in the repository. If it is not committed, it does not exist.
Knowledge in Slack threads, verbal conversations, or someone's head is invisible to every agent that runs after the moment it was spoken. This is not a workflow preference — it is a hard constraint of how AI agents operate.
Repo Knowledge Layer¶
/docs
AGENTS.md # Entry point for all AI agents (~100 lines, always current)
architecture/ # System design, ADRs, data flows
domains/ # Per-domain specs and design docs
decisions/ # Architecture Decision Records
quality/ # Standards, testing expectations, edge cases
/.claude
skills/ # Reusable reference files — IDs, URLs, pricing, conventions
CLAUDE.md # Agent instructions, workflow rules, coordination protocol
AGENTS.md (or CLAUDE.md in our implementation) is the most important file in any project. It is short, always up to date, and tells any agent exactly where to find what it needs. Every workspace, every agent, every session reads it first.
Skills are version-controlled knowledge files that any agent can read and any agent can update. When an agent discovers that a stored value is wrong (a bundle ID, a pricing tier, a URL), it updates the skill on its feature branch. Knowledge improves automatically as the system works.
Spec-First Development¶
Every feature begins as a committed markdown spec, not a prompt in a chat window:
- Write the spec and commit it. A short document describing inputs, outputs, state transitions, and edge cases. AI drafts from a conversation; the engineer edits to truth; the result is committed before any code is written.
- Generate tests from the spec. AI writes tests against the committed spec. Tests are committed before implementation begins. Implementation is done when tests pass.
- Generate the implementation. AI produces the code. The engineer reviews for correctness against the spec.
- AI writes the PR description, changelog entry, and updates the relevant domain doc. The spec in `docs/` is updated to reflect what was actually built, not what was planned.
This makes every task legible to agents — now and in every future session.
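The tests-before-implementation step can be sketched end to end. Everything below is hypothetical, not taken from the production system: the spec, the `proratedRefund` function, and the refund rule are invented to show the shape of the workflow.

```typescript
// Hypothetical spec (docs/domains/billing.md): cancelling mid-cycle refunds
// the unused days of the period, rounded down to whole cents.

// Step 2: tests generated from the spec, committed before any implementation.
// Each case is [priceCents, daysInPeriod, daysUsed, expectedRefundCents].
const cases: Array<[number, number, number, number]> = [
  [3000, 30, 10, 2000], // 20 unused days of a 30-day, $30.00 period
  [3000, 30, 30, 0],    // nothing unused, nothing refunded
  [1000, 30, 1, 966],   // rounding goes down, per the spec's edge case
];

// Step 3: implementation; it is done when the committed tests pass.
function proratedRefund(priceCents: number, daysInPeriod: number, daysUsed: number): number {
  if (daysUsed < 0 || daysUsed > daysInPeriod) throw new RangeError("daysUsed out of range");
  return Math.floor((priceCents * (daysInPeriod - daysUsed)) / daysInPeriod);
}
```

Because the cases are committed before the code, a bug born from misreading the spec fails loudly instead of being confirmed by tests that share the misreading.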
2. Multi-Agent Orchestration¶
The Architecture in Production¶
This is not a diagram on a whiteboard. This is running today:
+------------------+
| Orchestrator | Plans work, creates GitHub issues with acceptance
| (Workspace 3) | criteria, monitors PRs, unblocks other agents.
+--------+---------+ Does NOT implement features.
|
| GitHub Issues
v
+------------------+ +------------------+ +------------------+
| Executor 4 | | Executor 5 | | Executor 6 |
| (Workspace 4) | | (Workspace 5) | | (Workspace 6) |
+--------+---------+ +--------+---------+ +--------+---------+
| | |
| Feature Branches | Feature Branches | Feature Branches
v v v
+------------------------------------------------------------------+
| GitHub (shared remote) |
| - Branch protection: require PR, CI pass, review |
| - Squash merge only |
| - Never push directly to main |
+------------------------------------------------------------------+
|
v
+------------------+ +------------------+
| Review Agent | | CI/CD Pipeline |
| (Always-on) | | GitHub Actions |
| Reviews every | | Self-hosted |
| PR within | | runners (macOS |
| seconds | | + Linux k3s) |
+------------------+ +------------------+
Always-On Agents¶
The system runs 24/7 on dedicated hardware. Agents do not require a human to be present:
| Agent | Hardware | Role | Availability |
|---|---|---|---|
| Orchestrator | MacBook Air M4 | Plans, coordinates, unblocks | Session-based |
| Executor(s) | MacBook Air M4 | Feature development, bug fixes | Session-based |
| Review-E | Raspberry Pi 4 | PR review, approval, auto-merge | 24/7 always-on |
| Pi-E | Raspberry Pi 4 | Lightweight tasks, monitoring | 24/7 always-on |
| Volt-E | VPS (Boston) | Security research, non-macOS tasks | 24/7 always-on |
Always-on agents run on OpenClaw (open-source agent runtime). They poll for work, claim issues, create PRs, and respond to review requests without human intervention.
Pull-Based Work Distribution¶
Agents are not assigned tasks. They claim them:
- Orchestrator creates GitHub issues with clear acceptance criteria and priority labels
- Agents check the backlog and pick the top unclaimed issue
- Agent claims the issue (comments on GitHub + posts to Discord + creates feature branch)
- Other agents see the claim and pick the next issue
- Competitive claiming: Pi-E claims in ~16s, Volt-E in ~22s — first to comment wins
This eliminates bottlenecks. No human needs to be online to assign work. The backlog is the queue. Priority labels and issue age determine order.
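The claiming order can be sketched as a sort over the backlog. The `p0`/`p1` label scheme and the issue shape below are assumptions for illustration, not the production schema:

```typescript
interface Issue { number: number; labels: string[]; createdAt: string; claimed: boolean; }

// Hypothetical priority labels "p0" (highest) through "p9"; unlabeled sorts last.
const priority = (i: Issue): number => {
  const p = i.labels.find((l) => /^p[0-9]$/.test(l));
  return p ? Number(p[1]) : 9;
};

// Top unclaimed issue: highest priority first, oldest first within a priority.
function pickNext(backlog: Issue[]): Issue | undefined {
  return backlog
    .filter((i) => !i.claimed)
    .sort((a, b) => priority(a) - priority(b) || a.createdAt.localeCompare(b.createdAt))[0];
}
```

Every agent runs the same deterministic ordering, so two agents polling the same backlog converge on the same "top" issue, and the first to comment wins the claim.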
Session Persistence¶
Agents save their state after every significant action. Other agents read these sessions to know what's in progress:
- What issue each workspace is working on
- What PRs are open
- What blockers exist
- What was learned during the session
This prevents duplicate work and enables seamless handoff between sessions.
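A minimal sketch of what a saved session might look like; the field names here are illustrative, not the production format. The only requirement is that the state is serializable and committed where every other agent can read it:

```typescript
interface SessionState {
  workspace: string;
  issue: number | null;   // what this workspace is working on
  openPrs: number[];
  blockers: string[];
  learnings: string[];    // carried forward into the next session
  updatedAt: string;
}

// Plain JSON round-trip: any agent can read another agent's session.
const save = (s: SessionState): string => JSON.stringify(s, null, 2);
const load = (raw: string): SessionState => JSON.parse(raw) as SessionState;
```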
3. Discord as the Nervous System¶
This is the key architectural insight that makes multi-agent coordination work in practice.
Discord is not a chat tool in this system. It is the event bus. Every significant action produces a Discord message. Every agent reads Discord to understand the current state of the world.
Channel Architecture¶
| Channel | Purpose | Who Posts | Who Reads |
|---|---|---|---|
| #tasks | Task delegation, status updates, PR notifications | All agents | All agents + human |
| #admin | Questions, blockers, human decisions | Agents needing help | Human |
| #review-e | Review agent activity | Review-E | Orchestrator |
| #pi-e | Pi-E private channel | Pi-E | Orchestrator |
| #volt-e | Volt-E private channel | Volt-E | Orchestrator |
The Automated PR Lifecycle¶
This is the full cycle, zero human intervention on routine PRs:
Agent creates PR
|
v
GitHub Action: request-review.yml
--> Posts to Discord #tasks with PR details
|
v
Review-E sees the notification
--> Reads the PR diff
--> Posts review comments or approves
|
v
GitHub Action: auto-merge.yml
--> Waits for CI pass + review approval
--> Resolves Copilot review threads
--> Squash merges to main
|
v
GitHub Action: build-status-to-discord.yml
--> Posts merge confirmation to Discord #tasks
|
v
GitHub Action: release-please.yml
--> Analyzes conventional commits
--> Opens Release PR if releasable changes exist
--> Auto-merges release PR
|
v
GitHub Action: release-notes-to-discord.yml
--> Posts release notes to Discord
Measured result: from PR creation to merge in under 2 minutes for routine changes. An agent can create 7 PRs across 7 repos in under 2 minutes via the GitHub API.
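Multi-repo delivery is fast because opening a PR is one REST call per repo. The owner, repo, and token below are placeholders; the payload builder follows the shape of GitHub's `POST /repos/{owner}/{repo}/pulls` endpoint:

```typescript
// Request body for POST /repos/{owner}/{repo}/pulls (GitHub REST API).
function buildPrPayload(branch: string, base: string, title: string, body: string) {
  return { title, body, head: branch, base };
}

// Placeholder owner/repo/token; one call like this per repository.
async function openPr(owner: string, repo: string, token: string,
                      payload: ReturnType<typeof buildPrPayload>) {
  const res = await fetch(`https://api.github.com/repos/${owner}/${repo}/pulls`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: "application/vnd.github+json",
    },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`PR creation failed: ${res.status}`);
  return res.json();
}
```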
Agent Delegation via Discord¶
The orchestrator delegates work through Discord mentions. Agents are mentioned using Discord user IDs (plain text @name does not work for bots):
"<@BOT_USER_ID> Please pick up issue #123 in repo-name.
Acceptance criteria are in the issue. Create a feature branch
and PR when ready."
The agent sees the mention, reads the issue, claims it, and begins work. The orchestrator monitors progress through Discord messages and GitHub PR status.
Why Discord and Not Slack/Linear/Jira¶
- Free for unlimited messages and channels
- Bot API is simple and well-documented
- Webhook support for GitHub Actions integration
- MCP server available — agents read and write messages programmatically
- Real-time: agents see messages within seconds
- Channels provide natural separation of concerns
- Message history serves as an audit trail
4. The Engineering Rig¶
Role Separation¶
The core principle: the agent that produces a thing cannot approve that thing. Separation is structural, not cultural.
Why test separation matters: An implementation agent writing its own tests will write tests that confirm its assumptions. Bugs born from misunderstanding the requirement will be hidden by tests that share the same misunderstanding.
Why review separation matters: LLMs reviewing their own code exhibit the same rationalization patterns humans do — they read what they intended to write, not what they wrote.
Hard Escalation Triggers¶
These always require human review, no exceptions. Apply to every project:
| Trigger | Reason |
|---|---|
| Changes to authentication or session management | Irreversible security impact |
| Changes to payment processing code | Financial and compliance risk |
| Database schema migrations | Data loss risk |
| Changes to data retention or deletion logic | GDPR/privacy liability |
| Dependency version bumps to major versions | Breaking change risk |
| Any change to a file touched by < 3 total commits | High-risk area, low familiarity |
State Machines for Critical Flows¶
For any multi-step workflow where correctness matters — reservations, payments, onboarding, support escalation — use an explicit state machine (e.g., XState) between the LLM and the system of record.
The principle: LLM handles language, state machine handles logic.
The LLM generates natural language and extracts intent. The state machine controls what transitions are allowed, what data must be collected before proceeding, and when to commit to the database. The LLM cannot skip steps or hallucinate state changes because the state machine does not permit them.
State machine snapshots serialize to a cache layer (Redis), allowing conversations across async channels (email, WhatsApp, SMS) to pause and resume with full state intact.
This pattern applies to any domain:
| Domain | States | Why It Matters |
|---|---|---|
| Reservations | greeting > collection > availability > confirmation > confirmed | Guest data must be complete before committing |
| Payments | initiated > authorized > captured > settled | Financial state must be deterministic |
| Onboarding | signup > configuration > verification > active | Each step has prerequisites |
| Support | received > classified > assigned > resolved | Routing and escalation must follow rules |
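Production uses XState v5; the dependency-free sketch below shows the principle for the reservation flow, with event names invented for illustration. The transition table, not the LLM, decides what is allowed:

```typescript
// Reservation flow: greeting > collection > availability > confirmation > confirmed.
type State = "greeting" | "collection" | "availability" | "confirmation" | "confirmed";
type Event = "START" | "DETAILS_COMPLETE" | "SLOT_FOUND" | "GUEST_CONFIRMED";

// The transition table is the only authority on what may happen next.
const transitions: Record<State, Partial<Record<Event, State>>> = {
  greeting:     { START: "collection" },
  collection:   { DETAILS_COMPLETE: "availability" },
  availability: { SLOT_FOUND: "confirmation" },
  confirmation: { GUEST_CONFIRMED: "confirmed" },
  confirmed:    {},
};

// The LLM extracts intent and proposes an event; illegal events are rejected,
// so it cannot skip steps or hallucinate a state change.
function transition(state: State, event: Event): State {
  const next = transitions[state][event];
  if (!next) throw new Error(`illegal transition: ${event} in ${state}`);
  return next;
}

// Snapshots are plain strings, so they can be parked in Redis and resumed
// later from an async channel (email, WhatsApp, SMS).
const snapshot = (s: State): string => JSON.stringify({ state: s });
const resume = (raw: string): State => (JSON.parse(raw) as { state: State }).state;
```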
CI/CD Infrastructure¶
Self-hosted runners eliminate dependency on GitHub's shared infrastructure:
| Runner Type | Hardware | Use Case |
|---|---|---|
| macOS (ARM64) | MacBook Air M4 | iOS builds, Xcode, per-repo dedicated runners |
| Linux (ARC) | k3s cluster on Dell server | CI for all repos, each repo gets its own ARC scale set |
| Docker-in-Docker | k3s cluster | Integration tests requiring containers |
All repos share identical workflow files: request-review.yml, build-status-to-discord.yml, pr-comments-to-discord.yml, auto-merge.yml, auto-resolve-copilot-conversations.yml, notify-failure.yml, release-please.yml, release-notes-to-discord.yml.
Workflow consistency across repos means a fix to the review process propagates everywhere immediately.
Release Management¶
- Conventional commits enforced — `feat:`, `fix:`, `refactor:`, `docs:`, `chore:`, `ci:`
- release-please runs on merge to main, auto-generates changelog and version bump
- Staging deploy automatic on every merge to main
- Production deploy triggered by merging the release PR
- Feature flags for anything with user-facing impact
Velocity Benchmark¶
A functioning engineering rig should produce approximately 3-4 merged PRs per engineer per day. This is a diagnostic, not an aspiration. If throughput is significantly below this, the rig has a problem: specs are too vague, agent instructions are stale, task decomposition is too coarse, or the review loop is too slow.
Measured in production: A single always-on agent (Pi-E or Volt-E) can deliver 7 PRs across 7 repos in under 2 minutes for well-defined tasks. The bottleneck is never the agent. It is always the quality of the spec.
5. The MCP Ecosystem (Tool Layer)¶
Agents do not operate in isolation. They connect to external services through Model Context Protocol (MCP) servers — a standardized interface that gives AI agents authenticated access to APIs, databases, and tools.
Production MCP Servers¶
| MCP Server | Purpose |
|---|---|
| Discord | Read/write messages, delegate tasks, monitor status |
| GitHub | Issues, PRs, code review, repository management |
| Bitwarden | Secret management — agents retrieve credentials without exposure |
| App Store Connect | iOS app management, builds, TestFlight, submissions |
| RevenueCat | Subscription management, product configuration |
| Playwright | Browser automation for tasks requiring UI interaction |
| Pushbullet SMS | 2FA code retrieval for automated logins |
| TOTP | Time-based one-time passwords for service authentication |
| Cloudflare | DNS, workers, R2 storage, D1 databases |
| Memory | Persistent context across sessions — decisions, learnings, state |
Why MCP Matters¶
Without MCP, every agent integration is custom code. With MCP, agents connect to any service through a consistent interface. Adding a new capability (e.g., "agents can now manage DNS") means deploying one MCP server, not modifying every agent.
Security model: Agents never see raw secrets. Bitwarden MCP provides credentials on demand. Secrets are piped through stdin, never passed as command arguments, never logged, never stored in temp files. Terminal history and conversation logs remain clean.
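The stdin pattern can be sketched in a few lines; here `cat` stands in for whatever CLI actually consumes the credential:

```typescript
import { spawnSync } from "node:child_process";

// The secret travels through stdin: it never appears in the argument list,
// so `ps`, shell history, and anything logging argv stay clean.
function runWithSecret(cmd: string, args: string[], secret: string): string {
  const result = spawnSync(cmd, args, { input: secret, encoding: "utf8" });
  if (result.status !== 0) throw new Error(`${cmd} exited with ${result.status}`);
  return result.stdout;
}
```

Contrast with `someCli --token $TOKEN`, where the secret lands in process listings and shell history.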
Building New MCP Servers¶
For any product with an API, wrapping it as an MCP server makes it immediately accessible to every agent in the system. This is the pattern for making any product AI-native:
- Map API endpoints to MCP tool definitions
- Handle authentication via environment variables
- Deploy as a lightweight Node.js process
- Register in the agent configuration
Time to build: 1-2 hours for a well-documented API. The result: every agent in the fleet can interact with the service.
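The endpoint-to-tool mapping can be sketched as plain data. The tool name, schema, and endpoint below are invented for illustration, and a real server would register tools through the MCP SDK rather than this hand-rolled shape:

```typescript
// Illustrative MCP-style tool definition: one API endpoint, one tool.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
  handler: (args: Record<string, unknown>) => Promise<unknown>;
}

// Authentication comes from the environment, never from the tool arguments.
const apiKey = process.env.EXAMPLE_API_KEY ?? "";

const listInvoices: ToolDef = {
  name: "list_invoices",
  description: "List invoices for a customer (maps to GET /v1/invoices).",
  inputSchema: {
    type: "object",
    properties: { customerId: { type: "string" } },
    required: ["customerId"],
  },
  handler: async (args) =>
    fetch(`https://api.example.com/v1/invoices?customer=${String(args.customerId)}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    }).then((r) => r.json()),
};
```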
6. The Development Stack (Recommended Defaults)¶
These are sensible defaults for most projects. Override per project where justified, and document the decision in an ADR.
| Layer | Default | Rationale |
|---|---|---|
| Monorepo | Turborepo + npm/pnpm workspaces | Shared types, incremental builds, single CI pipeline |
| Language | TypeScript throughout | End-to-end type safety, AI generates TypeScript well |
| API | Fastify v4 | Fastest Node.js framework, native TypeScript, WebSocket support |
| Frontend | Next.js (App Router) or SwiftUI (iOS) | Web or native depending on product |
| Database | PostgreSQL + Prisma ORM | Type-safe queries, migration management, scales to enterprise |
| Cache/Sessions | Redis | Sub-millisecond reads, pub/sub, TTL-based expiry |
| State Management | XState v5 | Deterministic state machines, actor model, serializable snapshots |
| CI/CD | GitHub Actions + self-hosted runners | Full control, no per-minute billing, macOS + Linux |
| Communication | Discord (bot API + webhooks) | Free, real-time, MCP-accessible, audit trail |
| Secrets | Bitwarden (CLI + MCP) | Agents access credentials without exposure |
| Deployment | Cloud Run / Container Apps | Serverless containers, scales to zero, uses cloud credits |
Infrastructure Strategy¶
Match infrastructure to funding:
| Situation | Approach |
|---|---|
| Google for Startups credits | GCP: Cloud Run + Cloud SQL + Memorystore |
| Azure for Startups credits | Azure: Container Apps + Azure PostgreSQL + Azure Cache |
| AWS Activate credits | AWS: ECS Fargate + RDS + ElastiCache |
| No cloud credits | Render or Railway (managed, predictable pricing) |
| Scale (5,000+ users) | Kubernetes on whichever cloud you're already on |
Never pay cash for infrastructure you can get through startup credits. Never run on a managed platform when you have six figures in cloud credits available.
7. AI Leverage Beyond Engineering¶
The engineering rig is the obvious application. The less obvious — and equally important — application is every other business function. A four-person team can operate like a twelve-person team if each function runs with an AI co-pilot.
The Structure-First Principle¶
AI performs reliably when inputs are structured and outputs are evaluable. The fix for every function is the same: define the atomic unit.
| Function | Atomic Unit | Required Inputs | Output Format | Feedback Signal |
|---|---|---|---|---|
| Product | Feature spec | Problem statement, constraints | Markdown with user stories, acceptance criteria | Engineering starts without clarification |
| Sales | Outreach sequence | Prospect, contact role, hook | Email + follow-up + connection note | Reply rate, meeting conversion |
| Support | Ticket resolution | Ticket text, customer history, docs | Resolution message + internal note | CSAT, re-open rate |
| Marketing | Content piece | Audience, tone, CTA, keyword | Markdown with metadata | Publish rate, engagement |
| Analytics | Insight report | Question, data source, date range | SQL + narrative + recommendation | Decision made or escalated |
| Finance | Financial update | Actuals, plan, period | P&L commentary, variance analysis | Founder reviews in < 30 min |
What AI Must Never Own¶
No hedging. These are firm:
| Do Not Automate | Reason |
|---|---|
| Contract signing or financial commitments | Legal and financial liability |
| Hiring and firing decisions | Irreversible impact on people |
| Crisis communication | Reputational risk compounds rapidly |
| Irreversible decisions at meaningful cost | AI errors are probabilistic |
| Key relationship management | Trust cannot be automated |
| Final merge on auth, payment, and data deletion code | Blast radius too high |
| App Store / production submissions | Untested releases waste review time |
What AI Runs Autonomously¶
These require no human approval loop:
- PR review and auto-merge (routine changes) — continuous
- System health monitoring — continuous
- LLM cost-per-task tracking — daily
- Dependency security scanning — on every PR
- Release management (changelog, version bump) — on merge to main
- Discord notifications for all state changes — continuous
- Competitive monitoring digest — weekly
8. Preventing Drift and Maintaining Control¶
AI-augmented orgs have a specific failure mode: gradual drift. AI output quality degrades, humans stop reading carefully, and errors compound invisibly.
The Weekly Rhythm¶
| Day | Control Practice |
|---|---|
| Monday | Founder reviews: competitive digest, health summary, cash position. 30 minutes. |
| Wednesday | CTO reviews: system health, LLM cost trends, sprint velocity, open incidents. Checks that agent instructions reflect the current codebase. 30 minutes. |
| Friday | Team sync (60 min): sprint review, customer stories, one thing AI got wrong this week. |
| Monthly | Full review: Are AI outputs still accurate? Have prompts drifted? What new automation can replace a recurring manual task? |
Drift Detection¶
For every AI-automated function, define a ground truth check:
| AI Function | Ground Truth Check |
|---|---|
| Code review AI | Each sprint: CTO manually reviews one PR the AI marked clean. Compare findings. |
| Customer feedback synthesis | Monthly: founder reads 10 raw tickets. Does the AI brief match? |
| Financial model | Monthly: reconcile AI-generated P&L against bank statements. Variance > 5% triggers prompt review. |
| Outreach personalisation | Weekly: spot-check 3 drafts against actual prospect profile. Flag hallucinated details. |
| Agent instructions accuracy | Every sprint: CTO verifies AGENTS.md / CLAUDE.md matches the actual codebase. Stale entries get updated before new work begins. |
Skill Maintenance (Self-Healing Knowledge)¶
Every agent is responsible for keeping shared knowledge accurate. When an agent encounters a skill file that is wrong or outdated:
- Fix it on the current feature branch
- The fix propagates to all agents on next pull
- Knowledge improves as a side effect of doing work
This means the system gets smarter over time without dedicated maintenance effort.
Model Upgrade Strategy¶
- Pin all production agents to specific model versions — never use "latest"
- New versions tested in shadow environment first
- Upgrade one rig at a time, not all simultaneously
- 30-day rollback window
- Behavioral regression triggers immediate rollback, not a prompt fix
Kill Switches¶
Every rig has a configuration flag that disables AI and falls back to human handling. This must be tested before it is needed.
9. Cost Awareness¶
Three Failure Modes¶
1. Using the largest model everywhere. A classification task does not need Opus. Match model to task. Default to the smallest model that handles the task reliably.
2. Long chains with large contexts. Each hop that passes a large context costs proportionally. Design chains to pass only what the next step needs.
3. No budget visibility. Instrument every rig with token usage logging by function, by task type, and by model. Set alert thresholds. Review monthly.
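Token accounting can be as small as a tagged accumulator. The model names and per-million-token prices below are placeholders, not real rates:

```typescript
// Placeholder per-million-token prices; substitute current provider rates.
const pricePerMTok: Record<string, { in: number; out: number }> = {
  "small-model": { in: 0.25, out: 1.25 },
  "large-model": { in: 3.0, out: 15.0 },
};

interface Usage { fn: string; taskType: string; model: string; inTok: number; outTok: number; }

const costUsd = (u: Usage): number => {
  const p = pricePerMTok[u.model];
  return (u.inTok * p.in + u.outTok * p.out) / 1_000_000;
};

// Aggregate by function, task type, and model for the monthly review.
function totals(log: Usage[]): Map<string, number> {
  const out = new Map<string, number>();
  for (const u of log) {
    const key = `${u.fn}/${u.taskType}/${u.model}`;
    out.set(key, (out.get(key) ?? 0) + costUsd(u));
  }
  return out;
}
```

Alert thresholds are then a comparison against the aggregated map rather than ad hoc queries.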
Reference Costs (Early Stage)¶
| Rig | Monthly Estimate |
|---|---|
| Engineering (active development) | $200-600 |
| Support (500 Tier 1 tickets) | $50-150 |
| Marketing (20 content pieces) | $10-30 |
| Sales outreach (100 prospects) | $20-50 |
| Always-on agents (3 agents, 24/7) | $50-100 |
| Discord + MCP infrastructure | $0 (free tier) |
Infrastructure cost for the agent fleet itself is near zero. Raspberry Pis cost $5/year in electricity. The VPS is the only recurring cost. Discord is free. GitHub Actions on self-hosted runners is free.
10. Building This While Shipping¶
Do not pause to build the operating system. Build it in the order that delivers the most leverage against the constraints of the current stage.
Phase 1 — Now (2-3 people, building product): Stand up the engineering rig. Monorepo, agent instructions, spec-first workflow, role-separated agents, CI/CD, Discord channels. The knowledge layer goes in at the same time as the first feature — not later.
Phase 2 — First customers (3-4 people): Add support rig and onboarding automation. Deploy always-on agents for PR review and routine tasks. Add sales outreach automation for the commercial lead.
Phase 3 — Growth (5-8 people): Add analytics rig, finance automation, competitive monitoring. Each new hire gets an AI rig on day one and agent instructions as their first onboarding document.
At each phase: define the atomic unit, structure the inputs, build the rig, instrument it, hire one human to govern it.
The Three Principles¶
- No automation without a use case. Every AI workflow is built because someone needs to do that task this week. Not before.
- Automate the task, not the job. The goal is to give one person the output of three. Every automation decision is evaluated against this test.
- One new automation per sprint. More creates unreviewed automation that drifts. Less leaves leverage on the table.
11. Applying This to a New Project¶
For any new product, the setup sequence is:
- Create the monorepo with the standard structure and agent instructions
- Set up Discord channels — #tasks, #admin, agent-specific channels
- Deploy CI/CD — GitHub Actions with self-hosted runners, auto-merge, release-please
- Configure MCP servers — Discord, Bitwarden, GitHub, plus any product-specific APIs
- Write the first domain spec in `docs/domains/`
- Record architectural decisions as ADRs in `docs/decisions/`
- Ship the first feature using spec-first development
- Add non-engineering rigs as the team and customer base grow
Time from zero to first merged PR: under 4 hours with the templates and tooling already built.
The system is the constant. The product changes. Every project benefits from the lessons learned in the previous one — the prompts get better, the specs get tighter, the rigs get faster, and the MCP ecosystem grows.
12. What This System Needs Next¶
The system is production-proven but not yet productized. To make it sellable and repeatable:
Packaging¶
| Gap | What's Needed | Effort |
|---|---|---|
| Project template | Cookiecutter/degit template with monorepo structure, AGENTS.md, CI/CD workflows, Discord bot setup, MCP configs | 1-2 days |
| Onboarding guide | Step-by-step for a new team: "from zero to shipping in one day" | 1 day |
| Agent bootstrap script | Single command that provisions Discord channels, GitHub repo, CI runners, MCP servers | 2-3 days |
| Configuration UI | Web dashboard to configure agents, view status, manage skills (currently all file-based) | 1-2 weeks |
Hardening¶
| Gap | What's Needed | Effort |
|---|---|---|
| Cost dashboard | Real-time token usage per agent, per task type, per model. Currently tracked but not visualized. | 2-3 days |
| Agent health monitoring | Automated alerts when an agent stops responding, fails repeatedly, or drifts from expected output patterns | 3-5 days |
| Audit trail | Structured log of every agent action (issue claimed, PR created, review posted, merge executed) queryable by time, agent, and repo | 3-5 days |
| Multi-tenant isolation | Currently all agents share one Discord server. For client deployments, need per-client Discord servers or channel-based isolation | 1 week |
Documentation¶
| Gap | What's Needed | Effort |
|---|---|---|
| Architecture diagrams | C4 diagrams of the full system (agent topology, MCP connections, CI/CD flow, Discord event bus) | 1 day |
| Case studies | Documented examples of the system in action — PR velocity, multi-repo delivery, agent coordination | 1 day |
| Video walkthrough | Screen recording of the full lifecycle: issue created > agent claims > PR > review > merge > release > Discord notification | Half day |
Scale¶
| Gap | What's Needed | Effort |
|---|---|---|
| Agent fleet management | Currently agents are manually provisioned on specific hardware. Need a deployment system that provisions agents on-demand. | 1-2 weeks |
| Cross-project orchestration | Currently each project is independent. For a company running 5 products, need a meta-orchestrator that allocates agent capacity across projects by priority. | 2-3 weeks |
| Client-facing dashboard | For selling this as a service: client sees agent activity, PR velocity, cost breakdown, health status. | 2-3 weeks |
13. The Number That Matters¶
At steady state, a company running this system can operate at significant scale with 8-10 people covering all functions described here.
The engineering rig ships as fast as a 10-person dev team. The support rig handles the ticket volume of a 5-person support team. The marketing rig produces at the pace of a 3-person content team.
This is not a productivity multiplier. It is a structural cost model that no competitor built on traditional headcount can match.
The humans govern. The rigs do the work.
This is not a proposal. It is a system description. The system exists. It ships code every day. The question is not whether it works — it is what to point it at next.