2026-06-30 · SpecKit Companion Unified report 44 videos + 4 demand lanes

SDD Trends & Community Demand

One read: what the talks say + what the audience wants. Click "evidence & move" under any finding for the receipts and the Companion implication.

The headline: the field stopped arguing about whether to write specs and started arguing about how to keep the AI honest — did we agree what to build (interrogation), did a second opinion check it (adversarial review), how do we know it's done (verification / evals). And the single loudest audience request, everywhere, is "support my tool."

Both point the same way: Companion's job isn't to add ceremony — it's to be the cross-provider layer that surfaces a few high-value gates and stays out of the way otherwise. The evidence below is one-sided on this.

Merges two passes: the 44-video trends report and the audience-demand study (GitHub issues, YouTube comments, ~35 Medium articles — debriefed in full here). Every source is clickable in the appendix.

§1 · The wedge — cross-provider breadth

Cross-provider is the category's #1 unmet need

🔥 dominantreinforces + edge

Every rival's issue tracker is flooded with "support MY tool." Companion already does this by design — lead with it.

evidence
Where the demand shows up

Spec Kit — Cursor CLI, Warp, Antigravity, Codex; BMAD — OpenCode (41 comments), Augment (31), Warp; OpenSpec — Zed, generic .agents installer; Superpowers — Antigravity, Gemini CLI, Hermes harness. On YouTube, the whole Sonnet-5-vs-Opus comment war is the same instinct.

Companion move: lead with it everywhere (README, listing, videos). Competitors bolt integrations on one PR at a time; Companion orchestrates across providers by design. Each provider you add answers a request going unmet on a rival's tracker. It's also the substrate for the two highest-leverage features below.
§2 · Keep the AI honest (the core loop)

2a · The AI should interrogate you first

🔥reinforces

Let the model interview you in multiple-choice rounds until the spec has no holes. The most common pattern in the videos.

evidence

Gui Ferreira's whole talk, Matt Pocock's "grill me," Spec Kit's /clarify, OpenSpec's "explore," Jellypod's 30-question write-plan — everyone converges on a recommended default per question and "keep going until complete."

Companion move: the planned grill-me recipe (W3·4). Persist the Q&A as a trace artifact, not just a spec edit. Ships as embedded prompt text; maps onto a webview wizard.

2b · A different model should review the plan

🔥reinforces + edge

"A model reviewing its own plan is an echo chamber." The most Companion-native idea in the whole corpus.

evidence

The grill-me + Codex talk caught real bugs by handing the plan to a second model; OpenSpec's adversarial authoring; Waldemar's separate validator.

Companion move: the verifier recipe (W3·2) — uniquely enabled because cross-provider plumbing already exists. Route the locked plan to a second provider as a skeptic, capture verdict + findings, loop. Multi-provider becomes a quality claim, not just flexibility.

2c · "Done" must be verified, not self-reported

🔥gap to close

Kent Beck's "finger guns": the AI fakes "done." Don't auto-complete on its say-so.

evidence

Kent Beck (the AI rationalizes "11 of 19 tests pass" as done); Waldemar's "never let the agent judge itself"; Anthropic's self-verification frontier. Audience echo: "orchestration is just YOLO with better visuals." New GitHub signal: BMAD "excessive noise / AI slop in generated code".

Companion move: today mark-complete is a status promotion. Gate it on a tool-returned pass or a second-model verdict, and surface the AI's silent "judgment calls" for confirmation. Bridges the verifier recipe to the terminal node.

2d · Evals — and the "is SDD obsolete?" debate

🔥new + content

The hottest article thread lands directly on evals: facts / BDD / IDSD all say the durable artifact is an executable check.

evidence

Kapil Ahuja: SDD collapses on upstream change → IDSD/ICE. Wasowski: "write facts, not specs" + "BDD is the missing link." Falk Gottlob's rebuttal: "kill both — the prototype is the spec, the eval is the acceptance test." Every voice converges on executable acceptance criteria. Full debrief: the Medium articles.

Companion move: absorb the critique — make generated acceptance tests a first-class artifact and gate completion on them. See the eval deep-dive below.
§3 · Where the field is going

3a · Brownfield, spec-editing & drift

🔥reinforces + gap

Spec Kit's #1 issue is literally "can't edit existing specs." Brownfield is where the field is going — and Companion's biggest gap.

evidence

Spec Kit — Can't Easily Update or Refine Existing Specs, post-implementation debugging; OpenSpec — /opsx:repair, hierarchical specs, multi-repo/monorepo. Articles: Wasowski's "spec that survives code generation," Breunig's "keeping the triangle in sync."

Companion move: Wave 4 (living specs) — inline editing in the viewer, an /analyze drift/repair gate, codebase reverse-engineering for adoption.

3b · Context durability — the trace as memory

🔥reinforces core

"Context rot" is the cross-cutting enemy. Companion's trace is a context-durability layer on top of Spec Kit.

evidence

The showdown slots Spec Kit as the "spec layer," GSD as the "context layer"; context rot runs through Rick Hightower, Kapil (ICE), Wasowski, Goecke. Anthropic's managed agents keep an append-only log independent of the executor — a near-exact mirror of Companion's trace.

Companion move: position .spec-context.json as adding context survival on top of Spec Kit. Steal: capture decisions/rationale (the why), not just status, and re-inject on resume.

3c · Governance / "approve intent, not code"

▲ risingreinforces + competitive

Adoption is settled; governance is the new bottleneck. Companion's gates + trace are the oversight surface.

evidence

Papalini: governance is the bottleneck once AI writes specs; the 2026-2030 forecast predicts devs shift to approving intent (and cites the study where they were 19% slower while feeling faster).

Companion move: invest in approve/reject-with-reason (diff proposed-vs-recorded, cheap to accept, easy to bounce); keep the trace honest about time.

3d · Autonomy / loops / "builds while you sleep"

▲ risingreinforces

Auto-mode demand: cmux "hands-off," "24/7 agents," Papalini's "loop engineering."

evidence

Papalini's "Loop Engineering Is SDD in Motion"; YouTube's cmux, "Skills + Hermes = 24/7 agents," "Agentic OS."

Companion move: auto-mode (Wave 2) + a token/cost budget with auto-pause.

3e · Anti-lock-in / artifact portability

▲ emergingcompetitive + content

Lock-in relocated up the stack. Companion's plain-file trace is an anti-lock-in selling point.

evidence

Kapil's "Five Dependency Layers" (lock-in moves to your spec format/toolchain) and "SDD will collapse on upstream change."

Companion move: "your specs and history are just files; swap the model or CLI underneath." Risk to avoid: becoming a proprietary superset that adds a lock-in layer — staying close to stock Spec Kit is the moat.
§4 · Scale & parallelism

Agent teams, worktrees, parallel implementations

▲ risingreinforces

Fork the spec into parallel implementations and run agent teams — a top request across repos.

evidence

Den Delimarsky's worktree "multi-armed bandit." Demand: Superpowers' top open issue Agent Teams (21 comments), BMAD agent teams, Spec Kit multi-agent + spawn-worktree.

Companion move: Wave 5 fan-out + a "compare implementations" sidebar (turns the manual worktree dance into a GUI affordance) + parallel implement task-waves.
§5 · Lean & right-sized

Fast-path default, gray-box plans, model routing

▬ steadyreinforces + new

Credible voices warn against ceremony; "plans over-specify" is a real complaint. Validates lean-by-default.

evidence

"Less is more — 99% just use the baseline," "GSD without ceremony," "it looks like waterfall." Demand: Superpowers' plans over-specify, no room for executor judgment (17 comments). New GitHub signal — model-specific prompt brittleness: Superpowers "Sonnet 5 doesn't like the instructions", OpenSpec "Opus 4.6 skips the step".

Companion move: validates fast-path-as-default + the classify/switch node; argues for gray-box plans (specify the interface, leave implementation judgment) and model/effort routing keyed off the small-vs-oversized signal (rides the Sonnet-vs-Opus debate).
The eval question, answered

Should Companion add evals? Yes — as a trend-watching layer, not a pass/fail gate. An eval isn't a unit test: judge scores are fuzzy and never settle (the speaker's own drifted 94/96/100 with no code change), so you watch the trend. "A single score can lie to you. A trend is much harder to fool."

mechanics
  • Eval = cases + scorers. The spec's acceptance criteria are the cases — they already exist.
  • Two scorer kinds, both in reach: deterministic code scorers (structural + a tool-call assertion: did the agent call the right command with the right params? — maps onto verifying Companion's own dispatch); and an LLM-as-judge that must be a different model, blind to who wrote the answer — exactly what cross-provider enables.
  • Store history, diff against a baseline — the value is the delta across re-runs.
  • Already half-built: the bench harness has an LLM judge (behavioral-judge) + scorecard, as a dev tool. Productize it downward: acceptance-criteria checks → blind second-provider judge → trend in .spec-context.json. Keep it optional, local, on-demand (Langfuse ~$300/mo always-on; Evalite free).
Content / video ideas
#TitleWhy it lands
1"Is Spec-Driven Development Already Obsolete?"Rides the hottest debate; steelman the critiques, land on evals.
2"Brownfield SDD: Adopt Specs in an Existing Codebase"The most-requested capability; barely covered well.
3"Stop Trusting 'Done': Verified Completion w/ a Second-Model Review"Finger-guns + cross-provider review, live.
4"Which Model for Which Step?"Rides the Sonnet-5-vs-Opus war; routing across a pipeline.
5"Spec Drift Is Killing Your Codebase"Keep spec, code, tests in sync (Breunig's triangle).
6"One GUI, Every Provider"Companion demo framed against the "support my tool" flood.
Ranked opportunities
#OpportunityRoadmap tie
1Cross-provider as the headline + widen the provider matrix — the #1 demand; the wedge.positioning + core
2Adversarial cross-model plan review — uniquely enabled by cross-provider.Wave 3 · W3·2
3Clarify-first interrogation gate — the most-validated pattern.Wave 3 · W3·4
4Verified completion, not self-reported — closes the finger-guns gap.verifier → mark-complete
5Optional per-spec eval layer — reuses the bench judge + cross-provider.new
6Brownfield adopt + spec-editing / drift — the most-requested capability.Wave 4
7Reposition the trace as context-durability + anti-lock-in.core trace
8Agent teams / "compare implementations" worktree view.Wave 5 fan-out
9Model/effort routing + token-budget pause; gray-box plans.classify + auto-mode
10HTML plan view; glossary; negative-constraints; spec-derived tests; post-implement refactor node.assorted
What the corpus confirms
Appendix — sources & references
GitHub issues — mined by reactions + freshest-open (Issues only; Discussions not yet searched)
Medium articles — full per-article debrief on its own page

The 10 you flagged plus ~25 discovered, grouped by camp with expandable digests: → The Medium Articles, Debriefed.

Key voices: Kapil Ahuja (SDD-collapse → IDSD/ICE), Jarek Wasowski (facts / BDD / 15-framework map), Enrico Papalini (SDD-as-infrastructure / governance), Rick Hightower (framework layers), and Falk Gottlob (the "kill both" rebuttal).

Videos — 44, with 16 deep notes

Full per-video map: the playlist index · deep dive: the video trends report.

The near-term move

Lead with the one-two punch only Companion can ship cleanly: a clarify gate up front, and a second-model review before "done" — both riding the cross-provider strength that the whole category is begging for. Then close the brownfield + verified-completion gap where the field is moving fastest.

Component reports: Video Trends · Medium Debrief · refreshed weekly by the demand-radar skill.

Sourcing: GitHub & YouTube signals are firsthand. Most Medium bodies are member-paywalled — those digests lean on previews + titles + author list pages; deep claims behind the wall are flagged as unread. Nothing invented.