Shipyard

Natural language to production

Describe an outcome. Shipyard's agents read your codebase, plan the work, build and test in isolation, and ship — reporting back into Slack and Jira throughout. You set the direction. Agents handle the run.


How a run works

DESCRIBE
BUILD
VERIFY
SHIP

Describe the outcome; agents plan, build and test in isolation, review and validate, then merge and deploy. The detailed SPEC · test · review · drift stages run underneath; you watch only the four beats.

Live counters: Pipelines Run · Merges Shipped · Crew Runs · Crew Pipelines · Crew Tokens · Pipeline Tokens · claudeAP Tokens · claudeME Tokens · Agents Launched · Memories Stored · Repos Indexed

What Shipyard Does

🤖

Agentic Runs — Intent to Deploy

One natural-language task triggers a full agentic run: agents read your codebase, spec the work, build in an isolated worktree, test, review, merge, deploy, and validate — each step logged and reported back into Slack and Jira. Autonomous end-to-end; halts only for genuine human decisions.

Trigger a run: POST /dashboard/api/pipeline — orchestrators like Maestro dispatch work directly; no separate agent management needed.
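
As a rough illustration of the trigger call, the sketch below builds (but does not send) a POST to /dashboard/api/pipeline with Python's standard library. The host name, the JSON field names (`task`, `repo`), and the lack of auth headers are assumptions made for illustration; the real request schema is not described on this page.

```python
import json
import urllib.request

# Hypothetical base URL; substitute your own Shipyard host.
BASE_URL = "https://shipyard.example.com"


def build_pipeline_request(task: str, repo: str) -> urllib.request.Request:
    """Build a POST request for the pipeline endpoint.

    The payload keys "task" and "repo" are assumed, not confirmed.
    """
    payload = json.dumps({"task": task, "repo": repo}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/dashboard/api/pipeline",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_pipeline_request("Add rate limiting to the login endpoint", "web-app")
# urllib.request.urlopen(req) would send it; left out so the sketch runs offline.
```

An orchestrator would typically wrap a call like this and poll or subscribe for status, but that part depends on endpoints not listed here.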

Dynamic Risk Routing

Tasks are triaged into NANO, FAST, STANDARD, or SAFE paths based on complexity, risk keywords, and source trust. Post-spec re-routing adjusts if the scope changes.
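
To make the triage idea concrete, here is a minimal sketch of a router that picks one of the four paths. The keyword list, the file-count thresholds, and the `Task` fields are all hypothetical; Shipyard's actual routing rules are not published on this page.

```python
from dataclasses import dataclass

# Assumed risk keywords; the real list is not documented here.
RISK_KEYWORDS = {"migration", "auth", "payment", "delete", "secret"}


@dataclass
class Task:
    description: str
    estimated_files: int   # crude complexity proxy (assumption)
    trusted_source: bool   # e.g. a known teammate vs. an external webhook


def route(task: Task) -> str:
    """Return one of NANO, FAST, STANDARD, SAFE."""
    words = set(task.description.lower().split())
    risky = bool(words & RISK_KEYWORDS)
    # Risky wording or an untrusted source always takes the most careful path.
    if risky or not task.trusted_source:
        return "SAFE"
    if task.estimated_files <= 1:
        return "NANO"
    if task.estimated_files <= 5:
        return "FAST"
    return "STANDARD"
```

Post-spec re-routing could be modeled as calling `route` a second time once the spec yields a better `estimated_files` figure.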

Cross-Model Validation

Every spec and build goes through adversarial validators (Haiku), cross-model review (OpenAI o3 vs Claude), and spec-alignment checks. Only high and critical findings block a run; low and medium findings are logged but do not block.
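
The severity gate can be pictured with a short sketch. The finding shape (a dict with a `severity` key) and the function name are assumptions for illustration only.

```python
from typing import Iterable

BLOCKING = {"high", "critical"}


def gate(findings: Iterable[dict]) -> tuple[bool, list[dict]]:
    """Return (blocked, logged_only) for a batch of validator findings."""
    findings = list(findings)
    # Any high or critical finding blocks the run.
    blocked = any(f["severity"] in BLOCKING for f in findings)
    # Everything else is kept for the log and does not block.
    logged_only = [f for f in findings if f["severity"] not in BLOCKING]
    return blocked, logged_only
```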

🔌

Multi-Repo Orchestration

The Dock manages 24 repos across 5 orgs with a tmux session pool, batch dispatch, and fuzzy repo resolution from Slack messages. Pipelines run in parallel, subject to per-org concurrency limits.
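
One common way to enforce per-org concurrency limits is a semaphore per org. The sketch below uses asyncio for that; the org names, limits, and the sleep that stands in for a pipeline are hypothetical, and the Dock itself is described as tmux-based, so treat this as an analogy rather than its implementation.

```python
import asyncio

# Hypothetical per-org limits.
ORG_LIMITS = {"org-a": 2, "org-b": 1}


async def run_all(jobs, limits):
    """Run (org, repo) jobs in parallel, capped per org. Returns (finished, peak)."""
    semaphores = {org: asyncio.Semaphore(n) for org, n in limits.items()}
    running = {org: 0 for org in limits}
    peak = {org: 0 for org in limits}

    async def run_one(org, repo):
        async with semaphores[org]:
            running[org] += 1
            peak[org] = max(peak[org], running[org])
            await asyncio.sleep(0.01)  # stand-in for the actual pipeline
            running[org] -= 1
        return repo

    finished = await asyncio.gather(*(run_one(org, repo) for org, repo in jobs))
    return list(finished), peak


jobs = [("org-a", "api"), ("org-a", "web"), ("org-a", "docs"), ("org-b", "infra"), ("org-b", "cli")]
finished, peak = asyncio.run(run_all(jobs, ORG_LIMITS))
```

`peak` records the highest number of simultaneous jobs seen per org, which is a simple way to check that the cap holds.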

📊

Per-Stage Cost Tracking

Every pipeline stage logs token usage to audit_log. /shipyard cost <id> shows the per-stage breakdown. CLI-first execution on a Max subscription keeps most pipelines at ~$0.
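
A per-stage breakdown amounts to grouping audit rows by stage. The sketch below assumes each audit_log row carries `pipeline_id`, `stage`, and `tokens` fields; those names are guesses, not the real schema.

```python
from collections import defaultdict


def cost_breakdown(audit_rows, pipeline_id):
    """Sum token usage per stage for one pipeline."""
    totals = defaultdict(int)
    for row in audit_rows:
        if row["pipeline_id"] == pipeline_id:
            totals[row["stage"]] += row["tokens"]
    return dict(totals)


# Hypothetical rows, shaped as assumed above.
rows = [
    {"pipeline_id": "p1", "stage": "SPEC", "tokens": 1200},
    {"pipeline_id": "p1", "stage": "BUILD", "tokens": 5400},
    {"pipeline_id": "p1", "stage": "BUILD", "tokens": 800},   # a retry
    {"pipeline_id": "p2", "stage": "SPEC", "tokens": 900},
]
breakdown = cost_breakdown(rows, "p1")
```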

🛠

Self-Healing Pipelines

Auto-retry up to 3x with delta-only prompts (60% token reduction). Also included: no-op build detection, a stall watchdog (15 min), a pipeline deadline (3 h), and chunked BUILD with mid-stream commits.
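
The retry behaviour can be sketched as a loop that sends the full prompt once and only the failure delta afterwards. Reading "up to 3x" as one initial attempt plus three retries is an interpretation, and the prompt wording and `attempt_fn` callback are hypothetical.

```python
from typing import Callable, Optional

MAX_RETRIES = 3  # from the text: auto-retry up to 3x


def run_with_retries(
    full_prompt: str,
    attempt_fn: Callable[[str], Optional[str]],
) -> tuple[bool, int]:
    """Run attempt_fn until it succeeds or retries run out.

    attempt_fn returns None on success, or an error string on failure.
    Returns (succeeded, attempts_made).
    """
    prompt = full_prompt
    attempts = 0
    for _ in range(1 + MAX_RETRIES):
        attempts += 1
        error = attempt_fn(prompt)
        if error is None:
            return True, attempts
        # Delta-only retry: send just the failure, not the whole original prompt.
        prompt = f"Previous attempt failed with: {error}\nFix only this."
    return False, attempts
```

The token saving comes from the retry prompt being much shorter than the original; the 60% figure above is the page's own number and is not something this sketch measures.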

🧠

Agent Memory

pgvector embeddings store every successful pipeline, SPEC risk assessment, and architectural decision. Agents read context from past work and write learnings on every merge.
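
Retrieval from an embedding store usually means ranking stored vectors by similarity to a query vector. The pure-Python sketch below shows that ranking with cosine similarity on toy two-dimensional vectors; in practice pgvector would do this inside Postgres, and the memory records here are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall(query_vec, memories, k=2):
    """Return the text of the k memories most similar to query_vec."""
    ranked = sorted(
        memories,
        key=lambda m: cosine_similarity(query_vec, m["embedding"]),
        reverse=True,
    )
    return [m["text"] for m in ranked[:k]]


# Hypothetical memories with toy embeddings.
memories = [
    {"text": "use feature flags for risky deploys", "embedding": [1.0, 0.0]},
    {"text": "pin Node to the LTS release", "embedding": [0.0, 1.0]},
    {"text": "roll back via a dedicated pipeline", "embedding": [0.9, 0.1]},
]
top = recall([1.0, 0.0], memories)
```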

NEW
📈

Drift Detection

After every merge, an agent compares the original SPEC against the final diff and scores divergence 0–1. Deviations are logged per pipeline so you know whether what shipped matches what was planned.
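
The page says an agent produces the drift score, so the formula is not known. As a stand-in to show what a 0–1 divergence can look like, the sketch below uses Jaccard distance between the files a SPEC planned to touch and the files the final diff changed. Both the metric and the inputs are assumptions.

```python
def drift_score(planned: set[str], changed: set[str]) -> float:
    """Jaccard distance: 0.0 means identical file sets, 1.0 means no overlap."""
    union = planned | changed
    if not union:
        return 0.0  # nothing planned and nothing changed: no drift
    return 1.0 - len(planned & changed) / len(union)
```

For example, planning `a.py` and `b.py` but shipping `a.py` and `c.py` gives one shared file out of three, so the score is 1 - 1/3, about 0.67.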

NEW
📚

Portfolio Playbooks

YAML-defined task templates dispatched across your entire repo fleet in one call. Run security audits, dependency bumps, or Node upgrades across every registered repo simultaneously — one command, one run report.

NEW
🚨

Telemetry Feedback Loop

Production alerts from New Relic, PagerDuty, or any webhook source are correlated against recent deploys. Regressions auto-file a P0 Harbor ticket and optionally trigger a rollback pipeline — closing the loop from ship to incident.
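
Correlating an alert with a deploy can be as simple as finding the most recent deploy of the same service inside a time window. The 30-minute window, the record fields, and the function name below are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(minutes=30)  # assumed correlation window


def suspect_deploy(alert_time: datetime, service: str, deploys: list) -> Optional[dict]:
    """Return the latest deploy of `service` within WINDOW before the alert, if any."""
    candidates = [
        d for d in deploys
        if d["service"] == service
        and timedelta(0) <= alert_time - d["at"] <= WINDOW
    ]
    return max(candidates, key=lambda d: d["at"], default=None)


# Hypothetical deploy history.
deploys = [
    {"service": "api", "sha": "abc123", "at": datetime(2024, 1, 1, 12, 0)},
    {"service": "api", "sha": "def456", "at": datetime(2024, 1, 1, 12, 20)},
    {"service": "web", "sha": "789aaa", "at": datetime(2024, 1, 1, 12, 25)},
]
suspect = suspect_deploy(datetime(2024, 1, 1, 12, 30), "api", deploys)
```

A match like `suspect` is the point where a ticket could be filed or a rollback pipeline triggered.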

NEW
🔍

Repo Search + Org Graph

Semantic search across every file in every registered repo. The cross-org dependency graph maps edges between repos so you can visualize blast radius before running a change. Full code navigator in the dashboard.
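
Blast radius over a dependency graph is a reachability question: which repos depend, directly or transitively, on the one being changed? The sketch below answers it with a breadth-first search over reversed edges. The repo names and the graph shape (`repo -> set of repos it depends on`) are made up for illustration.

```python
from collections import deque


def blast_radius(depends_on: dict[str, set[str]], changed: str) -> set[str]:
    """Return every repo that transitively depends on `changed`."""
    # Invert the edges so we can walk from a repo to its dependents.
    dependents: dict[str, set[str]] = {}
    for repo, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(repo)

    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in seen and nxt != changed:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical graph: web -> api-client -> core, billing -> core.
graph = {
    "web": {"api-client"},
    "api-client": {"core"},
    "billing": {"core"},
    "docs": set(),
}
affected = blast_radius(graph, "core")
```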

NEW
🏛

Knowledge Graph Versioning

Agent memory is versioned. Every successful merge snapshots the repo's knowledge state. Query the memory timeline to see how architectural decisions evolved, or diff two snapshots to understand what changed and when.
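
Diffing two snapshots can be pictured as a three-way comparison of keyed records. The sketch below treats a snapshot as a plain dict of decision name to value; that representation is an assumption, since the real snapshot format is not described here.

```python
def diff_snapshots(old: dict, new: dict) -> dict:
    """Compare two knowledge snapshots keyed by decision name."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots taken after two merges.
before = {"queue": "redis", "orm": "sqlalchemy", "cache": "memcached"}
after = {"queue": "sqs", "orm": "sqlalchemy", "search": "pgvector"}
delta = diff_snapshots(before, after)
```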

NEW
🔗

MCP Server — 30+ Tools

A full Model Context Protocol server exposes Shipyard as a control surface for any MCP-compatible agent. Trigger pipelines, query memory, generate blueprints, run playbooks, read drift scores — all from external tooling.

NEW
🕐

Blueprint Generator

On-demand multi-repo architecture synthesis: reads every repo's structure, dependencies, and README to generate a live architecture brief. Always reflects the actual codebase, not a stale wiki page.

Built and operated by