Describe an outcome. Shipyard's agents read your codebase, plan the work, build and test in isolation, and ship — reporting back into Slack and Jira throughout. You set the direction. Agents handle the run.
How a run works
Describe the outcome — agents plan, build & test in isolation, review & validate, then merge and deploy. The full SPEC · test · review · drift detail runs underneath; you just watch the four beats.
One natural-language task triggers a full agentic run: agents read your codebase, spec the work, build in an isolated worktree, test, review, merge, deploy, and validate — each step logged and reported back into Slack and Jira. Autonomous end-to-end; halts only for genuine human decisions.
Trigger a run: POST /dashboard/api/pipeline — orchestrators like Maestro dispatch work directly; no separate agent management needed.
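As a rough illustration of the trigger call, here is a minimal Python sketch that builds the POST request. Only the path /dashboard/api/pipeline comes from the text above; the host name and the JSON field names ("task", "repo") are assumptions made for the example, not a documented request schema.

```python
import json
import urllib.request


def build_pipeline_request(base_url: str, task: str, repo: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the pipeline endpoint.

    The "task" and "repo" field names are hypothetical placeholders.
    """
    payload = json.dumps({"task": task, "repo": repo}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/dashboard/api/pipeline",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and task text, for illustration only.
req = build_pipeline_request(
    "https://shipyard.example.com",
    "Add rate limiting to the login endpoint",
    "api-gateway",
)
# urllib.request.urlopen(req) would actually send the request.
```

An orchestrator would send this request and then follow the run through the Slack and Jira updates rather than polling.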
Tasks are triaged into NANO, FAST, STANDARD, or SAFE paths based on complexity, risk keywords, and source trust. Post-spec re-routing adjusts the path if the scope changes.
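To make the triage idea concrete, here is a small sketch of a router over the three signals named above. The keyword list, the file-count thresholds, and the rule that risky or untrusted tasks always go to SAFE are all assumptions for illustration; the real routing rules are not described here.

```python
from dataclasses import dataclass

# Hypothetical risk keywords; the real list is not given in the text.
RISK_KEYWORDS = {"migration", "auth", "payment", "delete"}


@dataclass
class Task:
    description: str
    estimated_files: int   # assumed proxy for complexity
    trusted_source: bool   # assumed flag for source trust


def triage(task: Task) -> str:
    """Return one of NANO, FAST, STANDARD, SAFE (thresholds are illustrative)."""
    words = set(task.description.lower().split())
    if words & RISK_KEYWORDS or not task.trusted_source:
        return "SAFE"
    if task.estimated_files <= 1:
        return "NANO"
    if task.estimated_files <= 5:
        return "FAST"
    return "STANDARD"
```

Post-spec re-routing would amount to calling the same function again once the spec has produced a better complexity estimate.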
Every spec and build goes through adversarial validators (Haiku), cross-model review (OpenAI o3 vs Claude), and spec-alignment checks. Only high and critical findings block a run; low and medium findings are logged without blocking.
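The severity rule can be sketched as a simple gate. The finding shape (a dict with "severity" and "message" keys) is an assumption for the example; only the split between blocking and logged severities comes from the text.

```python
BLOCKING_SEVERITIES = {"high", "critical"}


def gate(findings):
    """Split review findings into blocking and log-only groups."""
    blocking = [f for f in findings if f["severity"] in BLOCKING_SEVERITIES]
    logged = [f for f in findings if f["severity"] not in BLOCKING_SEVERITIES]
    return {"blocked": bool(blocking), "blocking": blocking, "logged": logged}


# Hypothetical findings from a review pass.
result = gate([
    {"severity": "low", "message": "Inconsistent naming"},
    {"severity": "critical", "message": "SQL built from user input"},
])
```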
The Dock manages 24 repos across 5 orgs with a tmux session pool, batch dispatch, and fuzzy repo resolution from Slack messages. Parallel pipelines with concurrency limits per org.
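Per-org concurrency limits can be pictured as one semaphore per org. The sketch below is an assumption about how such a limit might work, using asyncio rather than the tmux session pool mentioned above; the limit of 2 and the org and repo names are made up.

```python
import asyncio
from collections import defaultdict


async def run_batch(jobs, limit_per_org=2):
    """Run (org, repo) jobs in parallel, capped per org.

    Returns the finished job names and the peak concurrency seen per org.
    """
    semaphores = defaultdict(lambda: asyncio.Semaphore(limit_per_org))
    running = defaultdict(int)
    peak = defaultdict(int)

    async def run_one(org, repo):
        async with semaphores[org]:
            running[org] += 1
            peak[org] = max(peak[org], running[org])
            await asyncio.sleep(0.01)  # stand-in for a real pipeline run
            running[org] -= 1
            return f"{org}/{repo}"

    results = await asyncio.gather(*(run_one(org, repo) for org, repo in jobs))
    return results, dict(peak)


# Hypothetical batch: four repos in one org, one in another.
jobs = [("acme", "web"), ("acme", "api"), ("acme", "docs"), ("acme", "cli"), ("globex", "core")]
results, peak = asyncio.run(run_batch(jobs))
```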
Every pipeline stage logs token usage to audit_log. /shipyard cost <id> shows the breakdown. CLI-first execution on a Max subscription keeps most pipelines at ~$0.
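A per-stage breakdown like the one the cost command shows could be computed by grouping audit_log rows by stage. The row fields used here ("pipeline_id", "stage", "tokens") are assumptions; the actual audit_log schema is not given.

```python
from collections import defaultdict


def cost_breakdown(rows, pipeline_id):
    """Sum logged tokens per stage for one pipeline (row shape is assumed)."""
    totals = defaultdict(int)
    for row in rows:
        if row["pipeline_id"] == pipeline_id:
            totals[row["stage"]] += row["tokens"]
    return dict(totals)


# Hypothetical audit_log rows.
rows = [
    {"pipeline_id": "p-1", "stage": "SPEC", "tokens": 1200},
    {"pipeline_id": "p-1", "stage": "BUILD", "tokens": 5400},
    {"pipeline_id": "p-1", "stage": "BUILD", "tokens": 600},
    {"pipeline_id": "p-2", "stage": "SPEC", "tokens": 900},
]
breakdown = cost_breakdown(rows, "p-1")
```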
Auto-retry up to 3x with delta-only prompts (60% token reduction). No-op build detection, stall watchdog (15min), pipeline deadline (3h), chunked BUILD with mid-stream commits.
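The retry behavior can be sketched as a loop that resends only the failure context instead of the full prompt. This sketch assumes "up to 3x" means one initial attempt plus at most three retries, and the wording of the delta prompt is invented for the example; the watchdog and deadline are left out.

```python
MAX_RETRIES = 3  # assumed: retries after the first attempt


def run_with_retries(build, full_prompt):
    """Call build(prompt) -> (ok, error) until it succeeds or retries run out."""
    prompt = full_prompt
    attempts = 0
    while True:
        attempts += 1
        ok, error = build(prompt)
        if ok:
            return {"ok": True, "attempts": attempts}
        if attempts > MAX_RETRIES:
            return {"ok": False, "attempts": attempts, "error": error}
        # Delta-only prompt: send just what failed, not the whole spec again.
        prompt = f"Previous attempt failed with: {error}\nFix only this failure."
```

Because each retry carries only the failure, the retry prompts stay short no matter how large the original spec was.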
pgvector embeddings store every successful pipeline, SPEC risk assessment, and architectural decision. Agents read context from past work and write learnings on every merge.
After every merge, an agent compares the original SPEC against the final diff and scores divergence 0–1. Deviations are logged per pipeline so you know whether what shipped matches what was planned.
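To show what a 0–1 divergence score can look like, here is a deliberately simple file-level proxy: the Jaccard distance between the files a SPEC planned to touch and the files the final diff changed. This is a stand-in for illustration, not the agent-based comparison described above.

```python
def drift_score(planned_files, changed_files):
    """Jaccard distance between planned and changed files.

    0.0 means the diff touched exactly the planned files; 1.0 means no overlap.
    """
    planned, changed = set(planned_files), set(changed_files)
    union = planned | changed
    if not union:
        return 0.0
    return 1.0 - len(planned & changed) / len(union)


# Hypothetical run: one planned file untouched, one unplanned file changed.
score = drift_score(["auth.py", "routes.py"], ["auth.py", "models.py"])
```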
YAML-defined task templates dispatched across your entire repo fleet in one call. Run security audits, dependency bumps, or Node upgrades across every registered repo simultaneously — one command, one run report.
Production alerts from New Relic, PagerDuty, or any webhook source are correlated against recent deploys. Regressions auto-file a P0 Harbor ticket and optionally trigger a rollback pipeline — closing the loop from ship to incident.
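Correlating an alert with recent deploys can be sketched as a time-window match. The 30-minute window, the record fields, and the choice of "most recent deploy to the same service" are assumptions for the example.

```python
from datetime import datetime, timedelta


def correlate(alert, deploys, window=timedelta(minutes=30)):
    """Return the latest deploy to the alerting service within the window, or None."""
    candidates = [
        d for d in deploys
        if d["service"] == alert["service"]
        and timedelta(0) <= alert["at"] - d["at"] <= window
    ]
    return max(candidates, key=lambda d: d["at"], default=None)


# Hypothetical deploy history and alert.
deploys = [
    {"id": "d1", "service": "checkout", "at": datetime(2024, 1, 1, 9, 0)},
    {"id": "d2", "service": "checkout", "at": datetime(2024, 1, 1, 11, 50)},
    {"id": "d3", "service": "search", "at": datetime(2024, 1, 1, 11, 55)},
]
alert = {"service": "checkout", "at": datetime(2024, 1, 1, 12, 0)}
suspect = correlate(alert, deploys)
```

A match like this is what would justify filing the ticket or starting a rollback pipeline; no match means the alert is probably not a deploy regression.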
Semantic search across every file in every registered repo. The cross-org dependency graph maps edges between repos so you can visualize blast radius before running a change. Full code navigator in the dashboard.
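Blast radius over a dependency graph is a reachability question: starting from the repo you plan to change, which repos depend on it directly or transitively? Here is a minimal breadth-first sketch; the graph format (repo mapped to the repos it depends on) and the repo names are assumptions.

```python
from collections import deque


def blast_radius(depends_on, changed):
    """Return every repo that directly or transitively depends on `changed`."""
    # Invert the edges: for each repo, who depends on it?
    dependents = {}
    for repo, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(repo)

    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for repo in dependents.get(current, ()):
            if repo not in seen and repo != changed:
                seen.add(repo)
                queue.append(repo)
    return seen


# Hypothetical cross-org graph.
graph = {
    "web": ["api-client"],
    "api-client": ["shared-types"],
    "billing": ["shared-types"],
    "docs": [],
}
affected = blast_radius(graph, "shared-types")
```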
Agent memory is versioned. Every successful merge snapshots the repo's knowledge state. Query the memory timeline to see how architectural decisions evolved, or diff two snapshots to understand what changed and when.
A full Model Context Protocol server exposes Shipyard as a control surface for any MCP-compatible agent. Trigger pipelines, query memory, generate blueprints, run playbooks, read drift scores — all from external tooling.
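From an external agent's side, calling an MCP tool means sending a JSON-RPC 2.0 message with the method "tools/call". The sketch below builds such a message; the tool name "trigger_pipeline" and its arguments are hypothetical, since the tools Shipyard actually exposes are not listed here.

```python
import json


def mcp_tool_call(request_id, tool, arguments):
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical tool name and arguments.
message = mcp_tool_call(1, "trigger_pipeline", {"repo": "api-gateway", "task": "Bump Node to 20"})
```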
On-demand multi-repo architecture synthesis: reads every repo's structure, dependencies, and README to generate a live architecture brief. Always reflects the actual codebase, not a stale wiki page.