One read: what the talks say plus what the audience wants. Each finding states the claim, then its evidence (the receipts) and the implication for Companion.
The headline: the field has stopped arguing about whether to write specs and started arguing about how to keep the AI honest. Did we agree on what to build (interrogation)? Did a second opinion check it (adversarial review)? How do we know it's done (verification / evals)? And the single loudest audience request, everywhere, is "support my tool."
Both point the same way: Companion's job isn't to add ceremony — it's to be the cross-provider layer that surfaces a few high-value gates and stays out of the way otherwise. The evidence below is one-sided on this.
Merges two passes: the 44-video trends report and the audience-demand study (GitHub issues, YouTube comments, ~35 Medium articles, debriefed in full here). Every source is listed in the appendix.
Every rival's issue tracker is flooded with "support MY tool." Companion already does this by design — lead with it.
Spec Kit — Cursor CLI, Warp, Antigravity, Codex; BMAD — OpenCode (41 comments), Augment (31), Warp; OpenSpec — Zed, generic .agents installer; Superpowers — Antigravity, Gemini CLI, Hermes harness. On YouTube, the whole Sonnet-5-vs-Opus comment war is the same instinct.
Let the model interview you in multiple-choice rounds until the spec has no holes. The most common pattern in the videos.
Gui Ferreira's whole talk, Matt Pocock's "grill me," Spec Kit's /clarify, OpenSpec's "explore," Jellypod's 30-question write-plan — everyone converges on a recommended default per question and "keep going until complete."
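The shared loop (multiple-choice rounds, a recommended default per question, stop only when the model has no open questions) is small enough to sketch. Everything below is hypothetical: `Question`, `QaEntry`, `interrogate`, and the `maxRounds` cap are illustrative names and assumptions, not Spec Kit, OpenSpec, or Companion APIs, and the model call is abstracted as a `next` callback.

```typescript
// Hypothetical shapes for a clarify-first interrogation loop (illustrative only).
interface Question {
  id: string;
  prompt: string;
  options: string[]; // multiple-choice answers
  recommended: number; // index of the model's recommended default
}

interface QaEntry {
  round: number;
  questionId: string;
  prompt: string;
  answer: string;
  acceptedDefault: boolean; // true when the user just took the recommendation
}

// `next` stands in for the model: given the answers so far, it returns the
// next round of questions, or [] once it judges the spec complete.
type NextQuestions = (answeredSoFar: readonly QaEntry[]) => Question[];
// `ask` stands in for the UI: an option index, or undefined to accept the default.
type AskUser = (q: Question) => number | undefined;

function interrogate(next: NextQuestions, ask: AskUser, maxRounds = 10): QaEntry[] {
  const trace: QaEntry[] = [];
  for (let round = 1; round <= maxRounds; round++) {
    const questions = next(trace);
    if (questions.length === 0) return trace; // "keep going until complete"
    for (const q of questions) {
      const picked = ask(q);
      const index = picked ?? q.recommended;
      if (index < 0 || index >= q.options.length) {
        throw new Error(`answer ${index} out of range for question "${q.id}"`);
      }
      trace.push({
        round,
        questionId: q.id,
        prompt: q.prompt,
        answer: q.options[index],
        acceptedDefault: picked === undefined,
      });
    }
  }
  // Assumed safety valve: surface an unfinished interrogation instead of looping forever.
  throw new Error(`spec still has open questions after ${maxRounds} rounds`);
}
```

Returning the full `trace` (round, question, answer, whether the default was taken) keeps the Q&A as its own record rather than something that only survives as edits to the spec.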
grill-me recipe (W3·4). Persist the Q&A as a trace artifact, not just a spec edit. Ships as embedded prompt text; maps onto a webview wizard.
"A model reviewing its own plan is an echo chamber." The most Companion-native idea in the whole corpus.
The grill-me + Codex talk caught real bugs by handing the plan to a second model; OpenSpec's adversarial authoring; Waldemar's separate validator.
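Breaking the echo chamber comes down to two rules: the reviewer must come from outside the author's provider, and it should see the plan without being told who wrote it. A minimal TypeScript sketch, assuming a flat list of available models; `ModelRef`, `pickReviewer`, and `reviewPrompt` are hypothetical names, and the prompt wording is illustrative rather than taken from any of the talks.

```typescript
// Hypothetical model reference; provider and model strings are placeholders.
interface ModelRef {
  provider: string;
  model: string;
}

// Pick the first candidate from a different provider than the plan's author.
// A different model from the same provider is treated as still "in the chamber".
function pickReviewer(author: ModelRef, candidates: readonly ModelRef[]): ModelRef {
  const outsider = candidates.find((m) => m.provider !== author.provider);
  if (!outsider) {
    throw new Error(`no reviewer outside provider "${author.provider}"; refusing self-review`);
  }
  return outsider;
}

// Blind review request: the reviewer sees the plan, not who wrote it.
function reviewPrompt(plan: string): string {
  return [
    "You are reviewing an implementation plan written by someone else.",
    "List concrete defects: missing requirements, wrong assumptions, untestable steps.",
    "Reply PASS only if you find none.",
    "",
    plan,
  ].join("\n");
}
```

Refusing outright when no outside provider is configured is a design choice: silently falling back to self-review would reintroduce exactly the failure the pattern exists to prevent.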
Kent Beck's "finger guns": the AI fakes "done." Don't auto-complete on its say-so.
Kent Beck (the AI rationalizes "11 of 19 tests pass" as done); Waldemar's "never let the agent judge itself"; Anthropic's self-verification frontier. Audience echo: "orchestration is just YOLO with better visuals." New GitHub signal: BMAD "excessive noise / AI slop in generated code".
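Refusing the agent's say-so can be made concrete: completion is allowed only on evidence the agent did not author. The TypeScript sketch below assumes three hypothetical evidence kinds (tool counts parsed from a test runner, a reviewer verdict, and the agent's own claim); none of these shapes or the `canMarkComplete` name come from Companion or any framework named above.

```typescript
// Hypothetical evidence shapes (illustrative, not a real verifier format).
interface ToolEvidence {
  kind: "tool"; // counts parsed from the test runner's output, not the agent's summary
  passed: number;
  total: number;
}
interface ReviewEvidence {
  kind: "review";
  reviewerProvider: string;
  verdict: "pass" | "fail";
}
interface SelfReport {
  kind: "self-report"; // the agent saying "done"
  claim: string;
}
type Evidence = ToolEvidence | ReviewEvidence | SelfReport;

interface GateResult {
  allowed: boolean;
  reason: string;
}

function canMarkComplete(authorProvider: string, evidence: readonly Evidence[]): GateResult {
  const toolRuns = evidence.filter((e): e is ToolEvidence => e.kind === "tool");
  // Any incomplete tool run vetoes promotion: 11 of 19 passing is not done.
  const failing = toolRuns.find((t) => t.total === 0 || t.passed < t.total);
  if (failing) {
    return { allowed: false, reason: `tool reported ${failing.passed}/${failing.total} passing` };
  }
  if (toolRuns.length > 0) {
    return { allowed: true, reason: "every tool run passed" };
  }
  // No tool signal: accept a passing verdict only from a different provider.
  const review = evidence.find(
    (e): e is ReviewEvidence =>
      e.kind === "review" && e.verdict === "pass" && e.reviewerProvider !== authorProvider,
  );
  if (review) {
    return { allowed: true, reason: `second-provider review passed (${review.reviewerProvider})` };
  }
  // Self-reports never count toward completion, however confident.
  return { allowed: false, reason: "no independent evidence; self-reported completion ignored" };
}
```

In this sketch a failing tool run always wins over a passing review, and a same-provider "pass" counts for nothing. Whether a failing cross-provider review should also veto a green test run is left open.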
mark-complete is a status promotion. Gate it on a tool-returned pass or a second-model verdict, and surface the AI's silent "judgment calls" for confirmation. Bridges the verifier recipe to the terminal node.
The hottest article thread lands directly on evals: facts / BDD / IDSD all say the durable artifact is an executable check.
Kapil Ahuja: SDD collapses on upstream change → IDSD/ICE. Wasowski: "write facts, not specs" + "BDD is the missing link." Falk Gottlob's rebuttal: "kill both — the prototype is the spec, the eval is the acceptance test." Every voice converges on executable acceptance criteria. Full debrief: the Medium articles.
Spec Kit's #1 issue is literally "can't edit existing specs." Brownfield is where the field is going — and Companion's biggest gap.
Spec Kit — Can't Easily Update or Refine Existing Specs, post-implementation debugging; OpenSpec — /opsx:repair, hierarchical specs, multi-repo/monorepo. Articles: Wasowski's "spec that survives code generation," Breunig's "keeping the triangle in sync."
/analyze drift/repair gate, codebase reverse-engineering for adoption.
"Context rot" is the cross-cutting enemy. Companion's trace is a context-durability layer on top of Spec Kit.
The showdown slots Spec Kit as the "spec layer," GSD as the "context layer"; context rot runs through Rick Hightower, Kapil (ICE), Wasowski, Goecke. Anthropic's managed agents keep an append-only log independent of the executor — a near-exact mirror of Companion's trace.
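What makes such a log resist context rot is that it records why, not just where: each entry can carry the decision and its rationale, and a resumed session is primed from those rather than from the last status alone. A minimal TypeScript sketch; the `TraceEntry` shape is an assumption (the real `.spec-context.json` schema isn't described in this report), and `append` and `resumePreamble` are hypothetical helpers.

```typescript
// Hypothetical trace entry; not the actual .spec-context.json schema.
interface TraceEntry {
  at: string; // timestamp, e.g. ISO 8601
  step: string; // e.g. "plan", "implement"
  status: string;
  decision?: string; // what was chosen
  rationale?: string; // why: the part that usually rots out of context
}

// Append-only: return a new log and never edit past entries.
function append(log: readonly TraceEntry[], entry: TraceEntry): TraceEntry[] {
  return [...log, entry];
}

// On resume, re-inject recent decisions with their reasons, plus the current status.
function resumePreamble(log: readonly TraceEntry[], limit = 20): string {
  const decisions = log.filter((e) => e.decision !== undefined).slice(-limit);
  const lines = decisions.map(
    (e) => `- [${e.step}] ${e.decision}${e.rationale ? ` (why: ${e.rationale})` : ""}`,
  );
  const last = log[log.length - 1];
  const status = last ? `Current status: ${last.step} / ${last.status}` : "No prior work recorded.";
  return [status, "Decisions so far:", ...lines].join("\n");
}
```

Because the log is plain JSON-serializable data, it survives a swap of executor (or provider): any agent can be handed the same preamble.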
.spec-context.json as adding context survival on top of Spec Kit. Steal: capture decisions/rationale (the why), not just status, and re-inject on resume.
Adoption is settled; governance is the new bottleneck. Companion's gates + trace are the oversight surface.
Papalini: governance is the bottleneck once AI writes specs; the 2026-2030 forecast predicts devs shift to approving intent (and cites the study where they were 19% slower while feeling faster).
Auto-mode demand: cmux "hands-off," "24/7 agents," Papalini's "loop engineering."
Papalini's "Loop Engineering Is SDD in Motion"; YouTube's cmux, "Skills + Hermes = 24/7 agents," "Agentic OS."
Lock-in relocated up the stack. Companion's plain-file trace is an anti-lock-in selling point.
Kapil's "Five Dependency Layers" (lock-in moves to your spec format/toolchain) and "SDD will collapse on upstream change."
Fork the spec into parallel implementations and run agent teams — a top request across repos.
Den Delimarsky's worktree "multi-armed bandit." Demand: Superpowers' top open issue Agent Teams (21 comments), BMAD agent teams, Spec Kit multi-agent + spawn-worktree.
Credible voices warn against ceremony; "plans over-specify" is a real complaint. Validates lean-by-default.
"Less is more — 99% just use the baseline," "GSD without ceremony," "it looks like waterfall." Demand: Superpowers' plans over-specify, no room for executor judgment (17 comments). New GitHub signal — model-specific prompt brittleness: Superpowers "Sonnet 5 doesn't like the instructions", OpenSpec "Opus 4.6 skips the step".
Should Companion add evals? Yes, as a trend-watching layer, not a pass/fail gate. An eval isn't a unit test: judge scores are fuzzy and never settle (the speaker's own scores drifted across 94, 96, and 100 with no code change), so you watch the trend. "A single score can lie to you. A trend is much harder to fool."
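Watching the trend can be as simple as comparing a recent window against a baseline and flagging a regression only when the drop exceeds the baseline's own noise. A minimal TypeScript sketch; the window size, the two-standard-deviation band, and the 3-point floor are all assumptions, not values from the talk, and `judgeTrend` is a hypothetical name.

```typescript
function mean(xs: readonly number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Population standard deviation: how much scores wobble with no code change.
function stddev(xs: readonly number[]): number {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)));
}

interface TrendVerdict {
  regressed: boolean;
  baseline: number; // mean of the earlier scores
  recent: number; // mean of the last `window` scores
  tolerance: number; // how far recent may dip before it counts
}

// Compare the last `window` judge scores with everything before them.
function judgeTrend(scores: readonly number[], window = 3, minTolerance = 3): TrendVerdict {
  if (scores.length < window * 2) {
    throw new Error(`need at least ${window * 2} scores to see a trend`);
  }
  const baselineScores = scores.slice(0, -window);
  const recentScores = scores.slice(-window);
  const baseline = mean(baselineScores);
  const recent = mean(recentScores);
  // Noise band from the baseline's spread, floored so a perfectly flat baseline
  // does not turn every one-point dip into an alarm.
  const tolerance = Math.max(minTolerance, 2 * stddev(baselineScores));
  return { regressed: recent < baseline - tolerance, baseline, recent, tolerance };
}
```

With a baseline of 94, 96, 100, a single 90 among recent scores of 90, 97, 98 does not trip the check, while a sustained run of 88, 87, 89 does.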
behavioral-judge) + scorecard, as a dev tool. Productize it downward: acceptance-criteria checks → blind second-provider judge → trend in .spec-context.json. Keep it optional, local, and on-demand (Langfuse ~$300/mo always-on; Evalite free).

| # | Title | Why it lands |
|---|---|---|
| 1 | "Is Spec-Driven Development Already Obsolete?" | Rides the hottest debate; steelman the critiques, land on evals. |
| 2 | "Brownfield SDD: Adopt Specs in an Existing Codebase" | The most-requested capability; barely covered well. |
| 3 | "Stop Trusting 'Done': Verified Completion w/ a Second-Model Review" | Finger-guns + cross-provider review, live. |
| 4 | "Which Model for Which Step?" | Rides the Sonnet-5-vs-Opus war; routing across a pipeline. |
| 5 | "Spec Drift Is Killing Your Codebase" | Keep spec, code, tests in sync (Breunig's triangle). |
| 6 | "One GUI, Every Provider" | Companion demo framed against the "support my tool" flood. |

| # | Opportunity | Roadmap tie |
|---|---|---|
| 1 | Cross-provider as the headline + widen the provider matrix — the #1 demand; the wedge. | positioning + core |
| 2 | Adversarial cross-model plan review — uniquely enabled by cross-provider. | Wave 3 · W3·2 |
| 3 | Clarify-first interrogation gate — the most-validated pattern. | Wave 3 · W3·4 |
| 4 | Verified completion, not self-reported — closes the finger-guns gap. | verifier → mark-complete |
| 5 | Optional per-spec eval layer — reuses the bench judge + cross-provider. | new |
| 6 | Brownfield adopt + spec-editing / drift — the most-requested capability. | Wave 4 |
| 7 | Reposition the trace as context-durability + anti-lock-in. | core trace |
| 8 | Agent teams / "compare implementations" worktree view. | Wave 5 fan-out |
| 9 | Model/effort routing + token-budget pause; gray-box plans. | classify + auto-mode |
| 10 | HTML plan view; glossary; negative-constraints; spec-derived tests; post-implement refactor node. | assorted |

github/spec-kit — #1191 edit existing specs · #442 post-impl debugging · #377 multi-agent · #9 Cursor · #58 Warp · #1213 Antigravity · #30 Codex · #1414 model resolution · #3272 preset not installed
Fission-AI/OpenSpec — #662 hierarchical specs · #821 /opsx:repair · #689 standard skills folder · #1104 .agents installer · #202 Zed · #780 Superpowers skill pack · #869 Opus 4.6 skips the step
obra/superpowers — #429 Agent Teams · #895 plans over-specify · #743 slowness · #1878 Sonnet 5 fights the instructions · #270 Antigravity · #128 Gemini CLI · #1859 Hermes harness
bmad-code-org/BMAD-METHOD — #285 OpenCode · #320 Augment · #383 Warp · #301 Zed · #1613 agent teams · #2538 AI slop in code · #2512 post-impl refactor workflow
The 10 articles you flagged plus ~25 more discovered, grouped by camp with a digest for each: see The Medium Articles, Debriefed.
Key voices: Kapil Ahuja (SDD-collapse → IDSD/ICE), Jarek Wasowski (facts / BDD / 15-framework map), Enrico Papalini (SDD-as-infrastructure / governance), Rick Hightower (framework layers), and Falk Gottlob (the "kill both" rebuttal).
Full per-video map: the playlist index · deep dive: the video trends report.
Lead with the one-two punch only Companion can ship cleanly: a clarify gate up front, and a second-model review before "done" — both riding the cross-provider strength that the whole category is begging for. Then close the brownfield + verified-completion gap where the field is moving fastest.
Component reports: Video Trends · Medium Debrief · refreshed weekly by the demand-radar skill.
Sourcing: GitHub & YouTube signals are firsthand. Most Medium bodies are member-paywalled — those digests lean on previews + titles + author list pages; deep claims behind the wall are flagged as unread. Nothing invented.