Engineering Rig¶
Technical details of how the automated engineering rig works.
Architecture¶
%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph LR
ES["Event Store"] --> COORD["Coordination + Status + History"]
GH["GitHub"] --> CODE["Code + PRs + CI + Dependencies"]
style ES fill:#a78bfa,color:#000
style GH fill:#60a5fa,color:#000
style COORD fill:#a78bfa,color:#000
style CODE fill:#60a5fa,color:#000
Event Store (PostgreSQL): Source of truth for all coordination. Who is working on what, what state each issue is in, full audit trail.
GitHub: Source of truth for code. Repositories, branches, pull requests, CI/CD, blocking relationships between issues.
All agents write events directly to the event store. The Conductor is stateless: if it crashes and restarts, it reads the event store and resumes from the current state.
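As a rough illustration of why a stateless Conductor can recover after a restart, the sketch below rebuilds "who is working on what" purely by replaying an ordered event log. The `Event` shape and the `current_assignments` helper are assumptions made for this example, not the rig's actual schema; only the event names come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal event shape, for illustration only."""
    type: str   # e.g. "ISSUE_ASSIGNED"
    issue: str  # issue identifier
    agent: str  # agent the event refers to


def current_assignments(events):
    """Fold the ordered event log into a map of issue -> assigned agent."""
    assigned = {}
    for event in events:
        if event.type == "ISSUE_ASSIGNED":
            assigned[event.issue] = event.agent
        elif event.type in ("ISSUE_UNASSIGNED", "ISSUE_DONE"):
            # An issue stops being "currently assigned" once it is
            # unassigned or done.
            assigned.pop(event.issue, None)
    return assigned


log = [
    Event("ISSUE_ASSIGNED", "issue-1", "agent-a"),
    Event("ISSUE_ASSIGNED", "issue-2", "agent-b"),
    Event("ISSUE_UNASSIGNED", "issue-1", "agent-a"),
    Event("ISSUE_ASSIGNED", "issue-1", "agent-c"),
    Event("ISSUE_DONE", "issue-2", "agent-b"),
]
print(current_assignments(log))
```

Because the result depends only on the log, replaying it after a crash yields the same view the Conductor had before.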
Dev Agent Lifecycle¶
The dev agent owns each issue end to end, from implementation to verification in production.
1. Implement¶
%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph LR
A["ISSUE_ASSIGNED"] --> B["Create branch"]
B --> C[Implement]
C --> D["Open PR"]
style A fill:#34d399,color:#000
style C fill:#a78bfa,color:#000
style D fill:#a78bfa,color:#000
2. Build, Test, Review¶
%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph LR
A[PR] --> B[Build + Lint]
B --> C[Unit Tests]
C --> D[E2E Tests]
D --> E{Pass?}
E -->|no| F[Agent fixes]
F --> B
E -->|yes| G[Code Review]
G --> H{Human gate?}
H -->|no| I[Auto-merge]
H -->|yes| J[Human review]
J --> I
style E fill:#fbbf24,color:#000
style F fill:#a78bfa,color:#000
style H fill:#fbbf24,color:#000
style J fill:#f59e0b,color:#000
style I fill:#34d399,color:#000
Testing layers:
- Unit tests: Fast, no external dependencies
- E2E tests with mocked integrations: External services (Twilio, Stripe, Claude API) are mocked. Tests full request flows without real API calls. Run before code review.
- Smoke tests (after merge): Run against real staging with real integrations. Catches integration issues before production.
Merge conflicts: the agent rebases onto the latest main and resolves them. If it cannot resolve them, it writes AGENT_STUCK.
3. Deploy¶
%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph LR
A[Merge to main] --> B[Staging]
B --> C{Smoke tests}
C -->|pass| D[Production]
C -->|fail| E[Agent fixes + redeploy]
E -->|max 3 retries| C
E -->|still failing| F[AGENT_STUCK]
D --> G[Verify]
G --> H["ISSUE_DONE"]
style A fill:#34d399,color:#000
style C fill:#fbbf24,color:#000
style D fill:#34d399,color:#000
style E fill:#a78bfa,color:#000
style F fill:#f87171,color:#000
style H fill:#94a3b8,color:#000
Smoke test failure: the agent checks the reason. If reason: "code", it investigates, fixes on top of main (new PR), redeploys, and reruns the smoke tests. It retries at most 3 times before writing AGENT_STUCK. If reason: "external_dependency" (a Twilio, Stripe, or Claude API outage), retries are skipped and the Conductor notifies the CTO directly.
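The smoke test failure rules can be sketched as a small decision function. The function name and the returned action strings are hypothetical labels chosen for this example; the two reason values and the limit of 3 retries come from the text.

```python
MAX_SMOKE_RETRIES = 3  # limit stated in the text


def next_step_after_smoke_failure(reason, retries_so_far):
    """Pick the next step after a failed smoke test run.

    The returned strings are illustrative labels, not real event names.
    """
    if reason == "external_dependency":
        # Outage in a third-party service: retrying will not help.
        return "notify_cto"
    if reason == "code":
        if retries_so_far < MAX_SMOKE_RETRIES:
            return "fix_and_redeploy"
        return "write_agent_stuck"
    raise ValueError(f"unknown smoke failure reason: {reason!r}")


print(next_step_after_smoke_failure("code", 0))
print(next_step_after_smoke_failure("code", 3))
print(next_step_after_smoke_failure("external_dependency", 0))
```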
Color key: purple = automated, yellow = decision, orange = human, green = progression.
Agent Health¶
Dev agents write HEARTBEAT events every 60 seconds while working. The Conductor monitors these.
| Condition | Action |
|---|---|
| No HEARTBEAT for 5 minutes | Conductor writes AGENT_STUCK on agent's behalf |
| WORK_STARTED but no PR_CREATED within 45 minutes (default; adjustable per issue with an sla: label) | Conductor writes AGENT_STUCK |
| PR_CREATED but no CI_PASSED within 15 minutes (fixed) | Conductor writes AGENT_STUCK |
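The three conditions in the table can be expressed as one check over timestamps. This is a minimal sketch: the `stuck_reason` function and its parameters are assumptions for illustration, and it treats a missing heartbeat as "not yet stuck", which the source does not specify.

```python
from datetime import datetime, timedelta

HEARTBEAT_TIMEOUT = timedelta(minutes=5)
IMPL_SLA = timedelta(minutes=45)  # default WORK_STARTED -> PR_CREATED limit
CI_SLA = timedelta(minutes=15)    # PR_CREATED -> CI_PASSED limit


def stuck_reason(now, last_heartbeat, work_started=None,
                 pr_created=None, ci_passed=None):
    """Return why an agent counts as stuck, or None if it looks healthy.

    All arguments are datetimes (or None when the event has not happened).
    """
    if last_heartbeat is not None and now - last_heartbeat > HEARTBEAT_TIMEOUT:
        return "no heartbeat"
    if work_started and not pr_created and now - work_started > IMPL_SLA:
        return "no PR"
    if pr_created and not ci_passed and now - pr_created > CI_SLA:
        return "CI SLA"
    return None


now = datetime(2024, 1, 1, 12, 0)
recent = now - timedelta(seconds=30)
print(stuck_reason(now, now - timedelta(minutes=6)))
print(stuck_reason(now, recent, work_started=now - timedelta(minutes=50)))
```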
Role Separation¶
The agent that writes code does not review it. An AI reviewing its own output will find it acceptable more often than it should. Separation is structural.
Human Gates¶
%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph TD
PR[Pull Request] --> CHECK{What changed?}
CHECK -->|Auth or session code| HUMAN[Human review required]
CHECK -->|Payment code| HUMAN
CHECK -->|Database schema| HUMAN
CHECK -->|Data deletion logic| HUMAN
CHECK -->|New dependency| HUMAN
CHECK -->|Everything else| AUTO[Automated review]
AUTO --> MERGE[Merge]
HUMAN --> MERGE
style HUMAN fill:#f87171,color:#000
style AUTO fill:#34d399,color:#000
Enforced via CODEOWNERS and CI checks, not by convention.
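To make the gate decision concrete, here is a hedged sketch of the kind of CI check that could flag a PR for human review based on changed file paths. The directory layout and patterns are entirely hypothetical; in the rig itself the enforcement lives in CODEOWNERS and CI, and categories such as data deletion logic would likely need more than a path match.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns mapped to the gate categories from the diagram.
GATED_PATTERNS = {
    "src/auth/*": "Auth or session code",
    "src/payments/*": "Payment code",
    "migrations/*": "Database schema",
    "requirements*.txt": "New dependency",
}


def gate_reasons(changed_files):
    """Return the gate categories triggered by a list of changed paths."""
    reasons = []
    for path in changed_files:
        for pattern, reason in GATED_PATTERNS.items():
            if fnmatchcase(path, pattern) and reason not in reasons:
                reasons.append(reason)
    return reasons


def needs_human_review(changed_files):
    """A PR takes the automated path only when no gate is triggered."""
    return bool(gate_reasons(changed_files))


print(gate_reasons(["src/auth/session.py", "README.md"]))
print(needs_human_review(["docs/index.md"]))
```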
Stall prevention: If a human gate has been waiting for 30 minutes, the Conductor sends a Slack reminder and repeats it every 30 minutes. After 4 hours, the Conductor escalates.
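The reminder and escalation timing can be sketched as a pure function of how long the gate has been waiting. The function name and action labels are assumptions for this example; the 30-minute and 4-hour thresholds come from the text. A real caller would also need to avoid escalating the same gate more than once.

```python
from datetime import timedelta

REMINDER_INTERVAL = timedelta(minutes=30)
ESCALATE_AFTER = timedelta(hours=4)


def gate_actions(waiting, since_last_reminder=None):
    """Return the actions due for a human gate that is still unresolved.

    waiting: time since the gate was triggered.
    since_last_reminder: time since the last reminder, or None if none sent.
    """
    actions = []
    reminder_due = (since_last_reminder is None
                    or since_last_reminder >= REMINDER_INTERVAL)
    if waiting >= REMINDER_INTERVAL and reminder_due:
        actions.append("remind")
    if waiting >= ESCALATE_AFTER:
        actions.append("escalate")
    return actions


print(gate_actions(timedelta(minutes=20)))
print(gate_actions(timedelta(minutes=30)))
print(gate_actions(timedelta(hours=4), timedelta(minutes=30)))
```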
Escalation¶
The Conductor reads events written by agents, CI, and monitoring.
| Event | Written by | Conductor action |
|---|---|---|
| AGENT_STUCK | Dev agent / Conductor (timeout) | Reassign to another agent |
| AGENT_STUCK (2nd time, same issue) | Dev agent | Escalate to CTO |
| SMOKE_FAILED (reason: "code", after 3 retries) | Dev agent | Triggers AGENT_STUCK → reassign flow |
| SMOKE_FAILED (reason: "external_dependency") | Dev agent | Notify CTO directly, no retries |
| HUMAN_GATE_TRIGGERED (30 min) | CI | Slack reminder to CTO |
%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph LR
A["Agent stuck (1st time)"] --> B["Reassign to another agent"]
B --> C{"Fixed?"}
C -->|yes| D["Continue"]
C -->|no| E["Escalate to CTO"]
style A fill:#f87171,color:#000
style B fill:#a78bfa,color:#000
style C fill:#fbbf24,color:#000
style D fill:#34d399,color:#000
style E fill:#f59e0b,color:#000
Reassignment rules:
- Conductor writes ISSUE_UNASSIGNED for the old agent, then ISSUE_ASSIGNED for the new one
- Reassignment always means a fresh branch. Previous branch is abandoned and PR closed
- The new agent starts clean, no partial work inherited
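The two-strike rule behind the escalation flow can be sketched as follows. The `on_agent_stuck` helper and the tuple-based event representation are illustrative assumptions; the event names and the "two agents stuck" reason match the ones used on this page.

```python
def on_agent_stuck(issue, agent, stuck_count):
    """Return the events the Conductor would append for one AGENT_STUCK.

    stuck_count is the number of AGENT_STUCK events recorded for this
    issue so far, including the one being handled.
    """
    if stuck_count == 1:
        # First strike: free the issue so it re-enters the assignment queue.
        return [("ISSUE_UNASSIGNED", {"issue": issue, "agent": agent})]
    # Second strike: stop reassigning and hand over to a human.
    return [("ESCALATED", {"issue": issue, "reason": "two agents stuck"})]


print(on_agent_stuck("issue-7", "agent-a", 1))
print(on_agent_stuck("issue-7", "agent-b", 2))
```

The follow-up ISSUE_ASSIGNED for the new agent is not emitted here; in the loop described on this page it comes from the normal assignment step once the issue is back in the queue.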
Conductor Logic¶
The Conductor is a single loop that runs every 60 seconds and reads from the event store and GitHub.
every 60 seconds:

    # Check milestone progress (with VERIFIED validation)
    for milestone in github.get_milestones(state: "open"):
        issues = github.get_issues(milestone)
        # Only counts issues where both VERIFIED and ISSUE_DONE exist
        verified_done = event_store.get_issues_with_verified_done(issues)
        if len(verified_done) == len(issues):
            event_store.append("MILESTONE_COMPLETE", actor: "conductor", data: {milestone})
            asana.update_status(milestone, "complete")
            slack.notify(f"Milestone complete: {milestone.title}")

    # Check agent health (heartbeats + SLA timeouts)
    for assignment in event_store.get_active_assignments():
        sla = get_impl_sla(assignment.issue)  # custom per issue via label, default 45 min

        # Heartbeat check
        last_heartbeat = event_store.get_latest("HEARTBEAT", agent: assignment.agent)
        if last_heartbeat.created_at is older than 5 minutes:
            if not event_store.has_recent_event("AGENT_STUCK", assignment.correlation_id, within: 5_min):
                event_store.append("AGENT_STUCK", actor: "conductor", data: {assignment.issue, reason: "no heartbeat"})

        # WORK_STARTED → PR_CREATED SLA
        work_started = event_store.get_latest("WORK_STARTED", correlation: assignment.correlation_id)
        pr_created = event_store.get_latest("PR_CREATED", correlation: assignment.correlation_id)
        if work_started and not pr_created and work_started.created_at is older than sla:
            if not event_store.has_recent_event("AGENT_STUCK", assignment.correlation_id, within: sla):
                event_store.append("AGENT_STUCK", actor: "conductor", data: {assignment.issue, reason: "no PR"})

        # PR_CREATED → CI_PASSED SLA (fixed, not affected by issue sla: label)
        ci_passed = event_store.get_latest("CI_PASSED", correlation: assignment.correlation_id)
        if pr_created and not ci_passed and pr_created.created_at is older than 15 minutes:
            if not event_store.has_recent_event("AGENT_STUCK", assignment.correlation_id, within: 15_min):
                event_store.append("AGENT_STUCK", actor: "conductor", data: {assignment.issue, reason: "CI SLA"})

    # Handle stuck agents (two strikes then human)
    for event in event_store.get_events("AGENT_STUCK", since: last_check):
        stuck_count = event_store.count_events("AGENT_STUCK", correlation: event.correlation_id)
        if stuck_count == 1:
            event_store.append("ISSUE_UNASSIGNED", actor: "conductor", data: {event.issue, event.agent})
            # Issue re-enters the queue for reassignment
        else:
            event_store.append("ESCALATED", actor: "conductor", data: {event.issue, reason: "two agents stuck"})
            slack.notify_cto(f"Two agents stuck on {event.issue.title}")

    # Handle human gate reminders (deduplicated, every 30 min)
    for event in event_store.get_events("HUMAN_GATE_TRIGGERED", unresolved: True):
        if event.created_at is older than 30 minutes:
            last_reminder = event_store.get_latest("HUMAN_GATE_REMINDER", correlation: event.correlation_id)
            if last_reminder is None or last_reminder.created_at is older than 30 minutes:
                event_store.append("HUMAN_GATE_REMINDER", actor: "conductor", data: {event.issue})
                slack.notify_cto(f"Human gate waiting: {event.issue.title}")
        if event.created_at is older than 4 hours:
            # Escalate once per gate so repeated loops do not append duplicates
            if event_store.get_latest("ESCALATED", correlation: event.correlation_id) is None:
                event_store.append("ESCALATED", actor: "conductor", data: {event.issue, reason: "human gate timeout"})

    # Find assignable issues
    approved = event_store.get_issues_with_event("ISSUE_APPROVED")
    currently_assigned = event_store.get_currently_assigned_issues()  # ASSIGNED without UNASSIGNED or DONE
    unassigned = approved - currently_assigned
    done_issues = event_store.get_issues_with_verified_done()  # fetched once, reused for every issue
    assignable = []
    for issue in unassigned:
        blockers = github.get_blocking_relationships(issue)
        if all(b in done_issues for b in blockers):
            assignable.append(issue)

    # Sort: priority first, then oldest approved
    assignable.sort(key: issue -> (priority_rank(issue), issue.approved_at))

    # Assign to available agents
    for agent in agents.get_idle():
        if assignable is empty:
            break
        issue = assignable.pop_first()
        event_store.append("ISSUE_ASSIGNED", actor: "conductor", data: {issue, agent})
        slack.notify(f"{agent} started {issue.title}")

def priority_rank(issue):
    if "critical" in issue.labels: return 0
    if "high-priority" in issue.labels: return 1
    return 2

def get_impl_sla(issue):
    # Custom implementation SLA via label (e.g. sla:2h, sla:45m)
    # Only affects WORK_STARTED → PR_CREATED timeout
    # CI SLA (PR_CREATED → CI_PASSED) is always fixed at 15 min
    for label in issue.labels:
        if label.startswith("sla:"): return parse_duration(label[4:])
    return parse_duration("45m")  # higher default accounts for pre-PR E2E test time
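The two helpers at the end of the pseudocode are small enough to write out as runnable Python. This is a sketch under assumptions: `parse_duration` is not defined on this page, so the version below only accepts whole minutes or hours (`45m`, `2h`), and the functions take a list of label strings rather than an issue object.

```python
import re
from datetime import timedelta

DEFAULT_IMPL_SLA = timedelta(minutes=45)
_UNITS = {"m": "minutes", "h": "hours"}


def parse_duration(text):
    """Parse strings such as '45m' or '2h' (assumed label format)."""
    match = re.fullmatch(r"(\d+)([mh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def get_impl_sla(labels):
    """Implementation SLA from an 'sla:' label, else the 45 minute default."""
    for label in labels:
        if label.startswith("sla:"):
            return parse_duration(label[4:])
    return DEFAULT_IMPL_SLA


def priority_rank(labels):
    if "critical" in labels:
        return 0
    if "high-priority" in labels:
        return 1
    return 2


# (title, labels, approved_at) with approved_at as a simple ordinal
issues = [
    ("docs", [], 1),
    ("outage", ["critical"], 3),
    ("perf", ["high-priority"], 2),
    ("typo", [], 0),
]
# Priority first, then oldest approved
ordered = sorted(issues, key=lambda i: (priority_rank(i[1]), i[2]))
order = [title for title, _, _ in ordered]
print(order)
```

Sorting on a `(rank, approved_at)` tuple gives the "priority first, then oldest approved" ordering without a custom comparator.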
Design properties:
- Stateless: The Conductor reads all state from the event store and GitHub on every loop, so it can crash and restart without losing data.
- Idempotent: Running the same loop twice with the same state produces the same result.
- No preemption: Agents finish current work before picking up new work. A critical bug waits for the next available agent.
- Milestone fairness: Priority is per-issue, not per-milestone. A high-priority issue in one milestone can delay another.
Phases¶
Phase 1: Before anything is live¶
| In place | Not yet |
|---|---|
| Git + branch protection | Production environment |
| CI on every PR | Dev agents |
| Conventional commits | Conductor |
| Event store | |
Phase 2: Going live¶
Add production environment, release pipeline, monitoring, error tracking.
Phase 3: Conductor + dev agents¶
The Conductor and dev agents join. Event store drives all coordination.
%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph TD
ES["Event Store"] --> COND["Conductor"]
COND --> AGENT["Dev agent"]
AGENT --> GH["GitHub (branch, PR, CI)"]
GH --> PROD["Production"]
PROD --> DONE["ISSUE_DONE"]
DONE --> ES
style ES fill:#a78bfa,color:#000
style COND fill:#34d399,color:#000
style AGENT fill:#a78bfa,color:#000
style GH fill:#60a5fa,color:#000
style PROD fill:#34d399,color:#000
style DONE fill:#a78bfa,color:#000
Phase 4: Flywheel¶
Human feedback and usage data drive product decisions. The rig handles the implementation. The cycle repeats.