Engineering Rig

Technical details of how the automated engineering rig works.

Conductor

Dev Agent

Review Agent

Monitor Agent

Architecture

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph LR
    ES["Event Store"] --> COORD["Coordination + Status + History"]
    GH["GitHub"] --> CODE["Code + PRs + CI + Dependencies"]

    style ES fill:#a78bfa,color:#000
    style GH fill:#60a5fa,color:#000
    style COORD fill:#a78bfa,color:#000
    style CODE fill:#60a5fa,color:#000

Event Store (PostgreSQL): The source of truth for all coordination. It records who is working on what, what state each issue is in, and a full audit trail.

GitHub: The source of truth for code. It holds repositories, branches, pull requests, CI/CD, and blocking relationships between issues.

All agents write events directly to the event store. The Conductor is stateless: if it crashes and restarts, it reads the event store and resumes from the recorded state.
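As an illustration of how coordination state can be derived purely from an append-only log, here is a minimal in-memory sketch. The `Event` shape, the issue and agent identifiers, and the `current_assignments` helper are assumptions made for this example; the actual PostgreSQL schema is not described here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One row of a hypothetical append-only event log."""
    type: str                      # e.g. "ISSUE_ASSIGNED", "ISSUE_UNASSIGNED", "ISSUE_DONE"
    issue: str                     # hypothetical issue identifier
    agent: Optional[str] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def current_assignments(events: List[Event]) -> Dict[str, str]:
    """Replay the log in order and return {issue: agent} for open assignments."""
    assigned: Dict[str, str] = {}
    for event in events:
        if event.type == "ISSUE_ASSIGNED" and event.agent is not None:
            assigned[event.issue] = event.agent
        elif event.type in ("ISSUE_UNASSIGNED", "ISSUE_DONE"):
            assigned.pop(event.issue, None)
    return assigned


log = [
    Event("ISSUE_ASSIGNED", "issue-1", "agent-a"),
    Event("ISSUE_ASSIGNED", "issue-2", "agent-b"),
    Event("ISSUE_UNASSIGNED", "issue-1", "agent-a"),
    Event("ISSUE_ASSIGNED", "issue-1", "agent-c"),
    Event("ISSUE_DONE", "issue-2", "agent-b"),
]
print(current_assignments(log))  # {'issue-1': 'agent-c'}
```

Because the result depends only on the log, a restarted process that replays the same events arrives at the same view, which is the property the stateless Conductor relies on.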

Dev Agent Lifecycle

The dev agent owns the full journey of each issue from code to verified in production.

1. Implement

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph LR
    A["ISSUE_ASSIGNED"] --> B["Create branch"]
    B --> C[Implement]
    C --> D["Open PR"]

    style A fill:#34d399,color:#000
    style C fill:#a78bfa,color:#000
    style D fill:#a78bfa,color:#000

2. Build, Test, Review

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph LR
    A[PR] --> B[Build + Lint]
    B --> C[Unit Tests]
    C --> D[E2E Tests]
    D --> E{Pass?}
    E -->|no| F[Agent fixes]
    F --> B
    E -->|yes| G[Code Review]
    G --> H{Human gate?}
    H -->|no| I[Auto-merge]
    H -->|yes| J[Human review]
    J --> I

    style E fill:#fbbf24,color:#000
    style F fill:#a78bfa,color:#000
    style H fill:#fbbf24,color:#000
    style J fill:#f59e0b,color:#000
    style I fill:#34d399,color:#000

Testing layers:

Unit tests: Fast, with no external dependencies.
E2E tests with mocked integrations: External services (Twilio, Stripe, Claude API) are mocked, so full request flows are tested without real API calls. These run before code review.
Smoke tests (after merge): Run against real staging with real integrations to catch integration issues before production.
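To make the mocked-integration layer concrete, the following sketch uses the standard library's `unittest.mock` to stand in for an external payment service. The `checkout` function and the `create_charge` method are hypothetical names invented for this example; they are not the Stripe API and not the rig's actual code.

```python
from unittest.mock import Mock


def checkout(payment_client, amount_cents: int) -> dict:
    """Hypothetical request handler that depends on an external payment service."""
    charge = payment_client.create_charge(amount=amount_cents, currency="usd")
    status = "paid" if charge["succeeded"] else "failed"
    return {"status": status, "charge_id": charge["id"]}


def test_checkout_with_mocked_payment_client() -> None:
    # The mock replaces the real client, so no network call is made.
    client = Mock()
    client.create_charge.return_value = {"id": "ch_test", "succeeded": True}

    result = checkout(client, 1999)

    assert result == {"status": "paid", "charge_id": "ch_test"}
    client.create_charge.assert_called_once_with(amount=1999, currency="usd")


test_checkout_with_mocked_payment_client()
```

A test like this exercises the full request flow but cannot detect a real integration breaking, which is why the smoke tests against staging exist as a separate layer.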

Merge conflicts: The agent rebases onto the latest main and resolves them. If it cannot resolve them, it writes AGENT_STUCK.

3. Deploy

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph LR
    A[Merge to main] --> B[Staging]
    B --> C{Smoke tests}
    C -->|pass| D[Production]
    C -->|fail| E[Agent fixes + redeploy]
    E -->|max 3 retries| C
    E -->|still failing| F[AGENT_STUCK]
    D --> G[Verify]
    G --> H["ISSUE_DONE"]

    style A fill:#34d399,color:#000
    style C fill:#fbbf24,color:#000
    style D fill:#34d399,color:#000
    style E fill:#a78bfa,color:#000
    style F fill:#f87171,color:#000
    style H fill:#94a3b8,color:#000

Smoke test failure: The agent checks the failure reason. If reason: "code", it investigates, fixes on top of main in a new PR, redeploys, and reruns the smoke tests, with a maximum of 3 retries before it writes AGENT_STUCK. If reason: "external_dependency" (a Twilio, Stripe, or Claude API outage), retries are skipped and the Conductor notifies the CTO directly.
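The branching described above can be sketched as a small decision function. The returned action labels (`"fix_and_redeploy"`, `"write_agent_stuck"`, `"notify_cto"`) and the function name are assumptions for illustration; only the two reason values and the limit of 3 retries come from the text.

```python
MAX_SMOKE_RETRIES = 3  # retry limit stated in the text


def next_step_after_smoke_failure(reason: str, retries_used: int) -> str:
    """Return a hypothetical action label for a failed smoke test."""
    if reason == "external_dependency":
        # Outage in an external service: retrying the code fix would not help.
        return "notify_cto"
    if reason == "code":
        if retries_used < MAX_SMOKE_RETRIES:
            return "fix_and_redeploy"
        return "write_agent_stuck"
    raise ValueError(f"unknown smoke failure reason: {reason!r}")


print(next_step_after_smoke_failure("code", retries_used=0))
print(next_step_after_smoke_failure("code", retries_used=3))
print(next_step_after_smoke_failure("external_dependency", retries_used=0))
```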

Color key: purple = automated, yellow = decision, orange = human, green = progression.

Agent Health

Dev agents write HEARTBEAT events every 60 seconds while working. The Conductor monitors these.

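| Condition | Action |
| --- | --- |
| No HEARTBEAT for 5 minutes | Conductor writes AGENT_STUCK on the agent's behalf |
| WORK_STARTED but no PR_CREATED within 45 minutes (default; adjustable per issue with an sla: label) | Conductor writes AGENT_STUCK |
| PR_CREATED but no CI_PASSED within 15 minutes | Conductor writes AGENT_STUCK |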
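The three conditions in the table can be expressed as one check over timestamps. This is a sketch under stated assumptions: the function name and parameters are invented here, the timestamps are passed in directly instead of being read from the event store, and the function reports only the first matching condition.

```python
from datetime import datetime, timedelta
from typing import Optional

HEARTBEAT_TIMEOUT = timedelta(minutes=5)
IMPL_SLA = timedelta(minutes=45)   # default WORK_STARTED -> PR_CREATED limit
CI_SLA = timedelta(minutes=15)     # fixed PR_CREATED -> CI_PASSED limit


def stuck_reason(
    now: datetime,
    last_heartbeat: datetime,
    work_started: Optional[datetime] = None,
    pr_created: Optional[datetime] = None,
    ci_passed: Optional[datetime] = None,
) -> Optional[str]:
    """Return why an agent counts as stuck, or None if it looks healthy."""
    if now - last_heartbeat > HEARTBEAT_TIMEOUT:
        return "no heartbeat"
    if work_started and not pr_created and now - work_started > IMPL_SLA:
        return "no PR"
    if pr_created and not ci_passed and now - pr_created > CI_SLA:
        return "CI SLA"
    return None


now = datetime(2024, 1, 1, 12, 0)
recent = now - timedelta(minutes=1)
print(stuck_reason(now, now - timedelta(minutes=6)))
print(stuck_reason(now, recent, work_started=now - timedelta(minutes=50)))
```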

Role Separation

Writes code ≠ Reviews code

The agent that writes code does not review it. An AI reviewing its own output will find it acceptable more often than it should. Separation is structural.

Human Gates

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph TD
    PR[Pull Request] --> CHECK{What changed?}
    CHECK -->|Auth or session code| HUMAN[Human review required]
    CHECK -->|Payment code| HUMAN
    CHECK -->|Database schema| HUMAN
    CHECK -->|Data deletion logic| HUMAN
    CHECK -->|New dependency| HUMAN
    CHECK -->|Everything else| AUTO[Automated review]
    AUTO --> MERGE[Merge]
    HUMAN --> MERGE

    style HUMAN fill:#f87171,color:#000
    style AUTO fill:#34d399,color:#000

These gates are enforced via CODEOWNERS and CI checks, not by convention.
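One way a CI check could decide whether a pull request needs a human gate is to match changed file paths against patterns per category. The sketch below is illustrative only: every path pattern is a made-up placeholder, and treating a change to a dependency manifest as a "new dependency" is a rough proxy, not the rig's actual rule.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# Hypothetical path patterns; a real repository would define its own.
HUMAN_GATE_PATTERNS: Dict[str, List[str]] = {
    "auth_or_session": ["src/auth/*", "src/session/*"],
    "payment": ["src/payments/*"],
    "database_schema": ["migrations/*", "*.sql"],
    "data_deletion": ["src/deletion/*"],
    "new_dependency": ["requirements.txt", "package.json"],
}


def human_gate_reasons(changed_paths: Iterable[str]) -> List[str]:
    """Return the sorted gate categories triggered by the changed paths."""
    reasons = set()
    for path in changed_paths:
        for reason, patterns in HUMAN_GATE_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                reasons.add(reason)
    return sorted(reasons)


print(human_gate_reasons(["src/ui/button.py"]))
print(human_gate_reasons(["src/auth/login.py", "requirements.txt"]))
```

An empty result corresponds to the "Everything else" branch in the diagram, where automated review is sufficient.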

Stall prevention: If a human gate has been waiting for 30 minutes, the Conductor sends a Slack reminder and repeats it every 30 minutes. After 4 hours, it writes an ESCALATED event.
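The reminder and escalation timing can be sketched as a pure function of how long the gate has been waiting. The function name, the action labels, and the `already_escalated` flag are assumptions for this example; the 30-minute and 4-hour thresholds come from the text.

```python
from datetime import timedelta
from typing import List, Optional

REMINDER_INTERVAL = timedelta(minutes=30)
ESCALATE_AFTER = timedelta(hours=4)


def gate_actions(
    waiting: timedelta,
    since_last_reminder: Optional[timedelta],
    already_escalated: bool,
) -> List[str]:
    """Return the hypothetical actions due for one waiting human gate."""
    actions: List[str] = []
    if waiting < REMINDER_INTERVAL:
        return actions
    # Remind if no reminder was sent yet, or the last one is at least 30 minutes old.
    if since_last_reminder is None or since_last_reminder >= REMINDER_INTERVAL:
        actions.append("remind")
    # Escalate once the gate has waited 4 hours, but only one time.
    if waiting >= ESCALATE_AFTER and not already_escalated:
        actions.append("escalate")
    return actions


print(gate_actions(timedelta(minutes=31), None, False))
print(gate_actions(timedelta(hours=4), timedelta(minutes=30), False))
```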

Escalation

The Conductor reads events written by agents, CI, and monitoring.

| Event | Written by | Conductor action |
| --- | --- | --- |
| `AGENT_STUCK` | Dev agent / Conductor (timeout) | Reassign to another agent |
| `AGENT_STUCK` (2nd time, same issue) | Dev agent | Escalate to CTO |
| `SMOKE_FAILED` (reason: "code", after 3 retries) | Dev agent | Triggers AGENT_STUCK → reassign flow |
| `SMOKE_FAILED` (reason: "external_dependency") | Dev agent | Notify CTO directly, no retries |
| `HUMAN_GATE_TRIGGERED` (30 min) | CI | Slack reminder to CTO |

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph LR
    A["Agent stuck (1st time)"] --> B["Reassign to another agent"]
    B --> C{"Fixed?"}
    C -->|yes| D["Continue"]
    C -->|no| E["Escalate to CTO"]

    style A fill:#f87171,color:#000
    style B fill:#a78bfa,color:#000
    style C fill:#fbbf24,color:#000
    style D fill:#34d399,color:#000
    style E fill:#f59e0b,color:#000

Reassignment rules:

Conductor writes ISSUE_UNASSIGNED for the old agent, then ISSUE_ASSIGNED for the new one
Reassignment always means a fresh branch. The previous branch is abandoned and its PR is closed
The new agent starts clean and inherits no partial work
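The two-strikes rule and the reassignment events can be sketched as a function that returns the events to append. The function name, the tuple representation of events, and the field names are assumptions for illustration; the event types and the "two agents stuck" reason follow the document.

```python
from typing import Dict, List, Tuple

EventTuple = Tuple[str, Dict[str, str]]


def events_for_stuck(
    issue: str, stuck_agent: str, stuck_count: int, next_agent: str
) -> List[EventTuple]:
    """Return the events a conductor might append after an AGENT_STUCK."""
    if stuck_count >= 2:
        # Second strike on the same issue: hand over to a human.
        return [("ESCALATED", {"issue": issue, "reason": "two agents stuck"})]
    # First strike: release the old agent, then assign a new one on a fresh branch.
    return [
        ("ISSUE_UNASSIGNED", {"issue": issue, "agent": stuck_agent}),
        ("ISSUE_ASSIGNED", {"issue": issue, "agent": next_agent}),
    ]


for event in events_for_stuck("issue-7", "agent-a", 1, "agent-b"):
    print(event)
```

In the Conductor loop itself the two events are written in separate steps: the issue is unassigned first and picked up again by the normal assignment pass.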

Conductor Logic

The Conductor is a single loop that runs every 60 seconds. It reads from the event store and GitHub.

every 60 seconds:

    # Check milestone progress (with VERIFIED validation)
    for milestone in github.get_milestones(state: "open"):
        issues = github.get_issues(milestone)
        verified_done = event_store.get_issues_with_verified_done(issues)
        # Only counts issues where both VERIFIED and ISSUE_DONE exist
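        # Report each milestone once, so repeated loops do not re-notify (has_event is an assumed helper)
        already_reported = event_store.has_event("MILESTONE_COMPLETE", milestone)
        if len(verified_done) == len(issues) and not already_reported:
            event_store.append("MILESTONE_COMPLETE", actor: "conductor", data: {milestone})
            asana.update_status(milestone, "complete")
            slack.notify(f"Milestone complete: {milestone.title}")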

    # Check agent health (heartbeats + SLA timeouts)
    for assignment in event_store.get_active_assignments():
        sla = get_impl_sla(assignment.issue)  # custom per issue via label, default 45 min

        # Heartbeat check
        last_heartbeat = event_store.get_latest("HEARTBEAT", agent: assignment.agent)
        if last_heartbeat is older than 5 minutes:
            if not event_store.has_recent_event("AGENT_STUCK", assignment.correlation_id, within: 5_min):
                event_store.append("AGENT_STUCK", actor: "conductor", data: {assignment.issue, reason: "no heartbeat"})

        # WORK_STARTED → PR_CREATED SLA
        work_started = event_store.get_latest("WORK_STARTED", correlation: assignment.correlation_id)
        pr_created = event_store.get_latest("PR_CREATED", correlation: assignment.correlation_id)
        if work_started and not pr_created and work_started is older than sla:
            if not event_store.has_recent_event("AGENT_STUCK", assignment.correlation_id, within: sla):
                event_store.append("AGENT_STUCK", actor: "conductor", data: {assignment.issue, reason: "no PR"})

        # PR_CREATED → CI_PASSED SLA (fixed, not affected by issue sla: label)
        ci_passed = event_store.get_latest("CI_PASSED", correlation: assignment.correlation_id)
        if pr_created and not ci_passed and pr_created is older than 15 minutes:
            if not event_store.has_recent_event("AGENT_STUCK", assignment.correlation_id, within: 15_min):
                event_store.append("AGENT_STUCK", actor: "conductor", data: {assignment.issue, reason: "CI SLA"})

    # Handle stuck agents (two strikes then human)
    for event in event_store.get_events("AGENT_STUCK", since: last_check):
        stuck_count = event_store.count_events("AGENT_STUCK", correlation: event.correlation_id)
        if stuck_count == 1:
            event_store.append("ISSUE_UNASSIGNED", actor: "conductor", data: {event.issue, event.agent})
            # Issue re-enters the queue for reassignment
        else:
            event_store.append("ESCALATED", actor: "conductor", data: {event.issue, reason: "two agents stuck"})
            slack.notify_cto(f"Two agents stuck on {event.issue.title}")

    # Handle human gate reminders (deduplicated, every 30 min)
    for event in event_store.get_events("HUMAN_GATE_TRIGGERED", unresolved: True):
        if event.created_at is older than 30 minutes:
            last_reminder = event_store.get_latest("HUMAN_GATE_REMINDER", correlation: event.correlation_id)
            if last_reminder is None or last_reminder.created_at is older than 30 minutes:
                event_store.append("HUMAN_GATE_REMINDER", actor: "conductor", data: {event.issue})
                slack.notify_cto(f"Human gate waiting: {event.issue.title}")
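            if event.created_at is older than 4 hours:
                # Escalate only once per gate, so repeated loops do not append duplicate ESCALATED events
                if event_store.get_latest("ESCALATED", correlation: event.correlation_id) is None:
                    event_store.append("ESCALATED", actor: "conductor", data: {event.issue, reason: "human gate timeout"})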

    # Find assignable issues
    approved = event_store.get_issues_with_event("ISSUE_APPROVED")
    currently_assigned = event_store.get_currently_assigned_issues()  # ASSIGNED without UNASSIGNED or DONE
    unassigned = approved - currently_assigned

    assignable = []
    done_issues = event_store.get_issues_with_verified_done()  # fetched once, not once per issue
    for issue in unassigned:
        blockers = github.get_blocking_relationships(issue)
        if all(b in done_issues for b in blockers):
            assignable.append(issue)

    # Sort: priority first, then oldest approved
    assignable.sort(key: issue -> (priority_rank(issue), issue.approved_at))

    # Assign to available agents
    for agent in agents.get_idle():
        if assignable is empty: break
        issue = assignable.pop_first()
        event_store.append("ISSUE_ASSIGNED", actor: "conductor", data: {issue, agent})
        slack.notify(f"{agent} started {issue.title}")

def priority_rank(issue):
    if "critical" in issue.labels: return 0
    if "high-priority" in issue.labels: return 1
    return 2

def get_impl_sla(issue):
    # Custom implementation SLA via label (e.g. sla:2h, sla:45m)
    # Only affects WORK_STARTED → PR_CREATED timeout
    # CI SLA (PR_CREATED → CI_PASSED) is always fixed at 15 min
    for label in issue.labels:
        if label.startswith("sla:"): return parse_duration(label[4:])
    # Default goes through parse_duration so both paths return the same unit
    return parse_duration("45m")  # higher default accounts for pre-PR E2E test time
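The pseudocode leaves `parse_duration` undefined. A minimal runnable sketch is shown here, assuming labels use only whole numbers followed by `m` (minutes) or `h` (hours), as in the `sla:2h` and `sla:45m` examples; any other format is rejected. This version of `get_impl_sla` takes a list of label strings instead of an issue object, which is a simplification for the example.

```python
import re
from datetime import timedelta
from typing import Iterable

_DURATION = re.compile(r"(\d+)([mh])")


def parse_duration(text: str) -> timedelta:
    """Parse strings like '45m' or '2h' into a timedelta."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    return timedelta(minutes=value) if unit == "m" else timedelta(hours=value)


def get_impl_sla(labels: Iterable[str], default: timedelta = timedelta(minutes=45)) -> timedelta:
    """Return the implementation SLA from the first 'sla:' label, else the default."""
    for label in labels:
        if label.startswith("sla:"):
            return parse_duration(label[4:])
    return default


print(get_impl_sla(["bug", "sla:2h"]))
print(get_impl_sla(["bug"]))
```

Returning a `timedelta` from both the label path and the default path avoids mixing bare numbers with parsed durations when the value is compared against event timestamps.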

Design properties:

Stateless: The Conductor reads all state from the event store and GitHub on every loop, so it can crash and restart without losing state.
Idempotent: Running the same loop twice with the same state produces the same result.
No preemption: Agents finish their current work before picking up new work. A critical bug waits for the next available agent.
Milestone fairness: Priority is per issue, not per milestone. A high-priority issue in one milestone can delay work in another milestone.
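To show how per-issue priority plays out, here is a runnable sketch of the selection step: filter out issues with unfinished blockers, then sort by priority rank and approval time. The `Issue` dataclass and the `pick_assignable` name are assumptions for this example; the label names and rank values follow `priority_rank` in the pseudocode.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import AbstractSet, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Issue:
    """Hypothetical issue record for illustration."""
    id: str
    approved_at: datetime
    labels: FrozenSet[str] = frozenset()
    blockers: FrozenSet[str] = frozenset()


def priority_rank(issue: Issue) -> int:
    if "critical" in issue.labels:
        return 0
    if "high-priority" in issue.labels:
        return 1
    return 2


def pick_assignable(unassigned: Iterable[Issue], done_ids: AbstractSet[str]) -> List[Issue]:
    """Keep issues whose blockers are all done, ordered by priority then age."""
    ready = [issue for issue in unassigned if issue.blockers <= done_ids]
    return sorted(ready, key=lambda issue: (priority_rank(issue), issue.approved_at))


issues = [
    Issue("a", datetime(2024, 1, 2)),
    Issue("b", datetime(2024, 1, 1), labels=frozenset({"critical"})),
    Issue("c", datetime(2024, 1, 1), blockers=frozenset({"x"})),
    Issue("d", datetime(2024, 1, 1)),
]
print([issue.id for issue in pick_assignable(issues, done_ids=set())])  # ['b', 'd', 'a']
```

Milestones play no part in the sort key, so a critical issue is picked first regardless of which milestone it belongs to.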

Phases

Phase 1: Before anything is live

| In place | Not yet |
| --- | --- |
| Git + branch protection | Production environment |
| CI on every PR | Dev agents |
| Conventional commits | Conductor |
| Event store | |

Phase 2: Going live

Add production environment, release pipeline, monitoring, error tracking.

Phase 3: Conductor + dev agents

The Conductor and dev agents join. Event store drives all coordination.

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e2e8f0", "primaryTextColor": "#1e293b", "primaryBorderColor": "#64748b", "lineColor": "#94a3b8", "secondaryColor": "#e2e8f0", "tertiaryColor": "#cbd5e1", "background": "#0f172a", "mainBkg": "#e2e8f0", "nodeBorder": "#64748b", "clusterBkg": "#1e293b", "clusterBorder": "#475569", "titleColor": "#e2e8f0", "edgeLabelBackground": "#1e293b", "nodeTextColor": "#1e293b"}}}%%
graph TD
    ES["Event Store"] --> COND["Conductor"]
    COND --> AGENT["Dev agent"]
    AGENT --> GH["GitHub (branch, PR, CI)"]
    GH --> PROD["Production"]
    PROD --> DONE["ISSUE_DONE"]
    DONE --> ES

    style ES fill:#a78bfa,color:#000
    style COND fill:#34d399,color:#000
    style AGENT fill:#a78bfa,color:#000
    style GH fill:#60a5fa,color:#000
    style PROD fill:#34d399,color:#000
    style DONE fill:#a78bfa,color:#000

Phase 4: Flywheel

Human feedback and usage data drive product decisions. The rig handles the implementation. The cycle repeats.