Build Log 2 — Capturing the Whole Lifecycle, and Shipping v1

A magnifier lens and scan beam sweeping a glowing strip of data tape, freezing on a clump of marks collapsed onto one point and flagged with a red X — an inspector catching the data lying

Build Log 1 proved one link in the chain: a spec-kit after_specify hook writes my state file, and the SpecKit Companion GUI lights up from it. One write. That's a spike, not a product.

The problem shows up the second you do anything past specify. Plan, tasks, and implement are never captured, so the moment you move forward the GUI is showing a half-truth. And it gets worse: spec-kit hooks are agent-mediated, so they fire only if the agent runs them. Skip a command, run one out of band, or open an old project that never had the extension, and the hook never fires at all. The GUI doesn't just go stale. It goes blind.

So this log has one job: capture the whole lifecycle, and never lie about state, even when a hook didn't fire. When that holds, v1 ships: tracking and resume on any spec-kit project, with zero template changes.

What I built

Three things stacked on each other: the rest of the capture chain, a safety net for when a hook doesn't fire, and the two commands that finally make the captured state useful.

The lifecycle hooks. Build Log 1 had one hook. Now there are four, each backed by a per-step capture command that reuses the same write-context.py writer.

# 📃 speckit-extension/extension.yml
hooks:
  after_specify: { command: speckit.companion.capture, optional: false }
  after_plan: { command: speckit.companion.capture-plan, optional: false } # 👈 new
  after_tasks: { command: speckit.companion.capture-tasks, optional: false } # 👈 new
  after_implement: { command: speckit.companion.capture-implement, optional: false } # 👈 new

Each one appends an entry to history[], the canonical event journal, and advances currentStep and status. (The old name transitions[] is gone. One journal, one name.) Inside implement, each completed - [x] **T###** task gets its own entry, recorded as a substep so the viewer never mistakes a single task for the whole step finishing.
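To make the append-and-advance behavior concrete, here is a minimal sketch of what a per-step capture could do to the state. It is not the actual write-context.py: the function name append_event, the STEP_ORDER list, and the in-memory dict shape are assumptions for illustration. The entry fields (step, substep, task, kind, by, at) follow the entries shown later in this log, and status handling is left out because its values aren't specified here.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed step ordering, taken from the four hooks above.
STEP_ORDER = ["specify", "plan", "tasks", "implement"]


def now_iso() -> str:
    """UTC timestamp with millisecond precision, shaped like 2026-06-08T02:09:08.437Z."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return stamp.replace("+00:00", "Z")


def append_event(state: dict, step: str, kind: str, by: str,
                 task: Optional[str] = None) -> dict:
    """Append one entry to history[] and advance currentStep, never moving it backward."""
    entry = {"step": step, "kind": kind, "by": by, "at": now_iso()}
    if task is not None:
        # A per-task entry is a substep, so a viewer does not read it as the whole step.
        entry["substep"] = task
        entry["task"] = task
    state.setdefault("history", []).append(entry)

    current = state.get("currentStep")
    # No-backward-clobber: the journal always grows, but currentStep only moves forward.
    if current not in STEP_ORDER or STEP_ORDER.index(step) >= STEP_ORDER.index(current):
        state["currentStep"] = step
    return state
```

Note that a late or out-of-order hook still gets journaled in this sketch; only the currentStep pointer is protected from moving backward.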

The derive-from-files fallback. This is the part that earns "never lie." derive-from-files.py is a stdlib-only reader that reconstructs history[] when no hook ever fired, using the artifacts already on disk (spec.md, plan.md, tasks.md, and which task markers are checked) plus git history. It applies the same no-backward-clobber guard as the live path and tags its entries by: "derive", so you can tell reconstructed state from captured state. Delete the state file, or run on a project that never touched the extension, and the GUI still reflects reality.
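A rough sketch of the derive idea, assuming the artifacts sit side by side in one spec directory. This is not the real derive-from-files.py: derive_state and the regex are hypothetical, and the sketch skips git history entirely, so derived entries carry no timestamps or kind here.

```python
import re
from pathlib import Path

# Matches checked task lines such as "- [x] **T004** Wire the flag".
CHECKED_TASK = re.compile(r"^\s*- \[[xX]\] \*\*(T\d{3})\*\*", re.MULTILINE)

# Artifact that proves each step happened, in pipeline order.
ARTIFACTS = (("specify", "spec.md"), ("plan", "plan.md"), ("tasks", "tasks.md"))


def derive_state(spec_dir: Path) -> dict:
    """Rebuild currentStep and history[] from files alone, tagged by: "derive"."""
    history = []
    current = None
    for step, artifact in ARTIFACTS:
        if not (spec_dir / artifact).is_file():
            break  # stop at the first missing artifact; later steps cannot be trusted
        history.append({"step": step, "by": "derive"})
        current = step

    if current == "tasks":
        text = (spec_dir / "tasks.md").read_text(encoding="utf-8")
        done = CHECKED_TASK.findall(text)
        if done:
            # Any checked task means implement has at least begun.
            current = "implement"
            for task in done:
                history.append({"step": "implement", "substep": task,
                                "task": task, "by": "derive"})
    return {"currentStep": current, "history": history}
```

Calling derive_state(Path("specs/131-fix-opencode-prompt-flag")) on a directory with no state file would return the reconstructed step and journal from the artifacts alone.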

Status and resume. Two commands turn all that captured state into something you actually use:

/speckit.companion.status reads the canonical file (or derives it on a miss) and prints the current step, status, recorded decisions, and the next action.
/speckit.companion.resume picks up at the next unchecked task, carries the recorded decisions[] back into scope, and dispatches the next /speckit.* command.

The Companion sidebar surfaces the same data: the SPECS tree lists every active spec with its current step, refreshed live by the watcher that was already there.

The SpecKit Companion SPECS sidebar tree, with the active spec 131 expanded to its Specification, Plan, and Tasks rows

That's v1. Tracking plus resume on any spec-kit project, no template change. Open a tracked spec and the whole journal renders: each phase with its real duration, every task with the one-line summary it wrote as it landed.

The SpecKit Companion spec viewer showing the four phases (Specify, Plan, Tasks, Implement) with durations, plus the per-task T001 to T004 substeps and their DONE summaries

Does it actually work?

Start with the easy half: the deterministic tests. The capture layer ships 12 stdlib tests covering append-only history, no-backward-clobber, unknown-key preservation, per-task idempotency, and a full derive round-trip. Status and resume add 25 more Python resolver tests and 354 green Jest tests on the GUI side. All passing.

The interesting part is that I wrote an eval, and the eval caught capture lying.

eval-speckit-extension is a re-runnable checker: it asserts the journal is well-formed, the timestamps are monotonic and not backfilled, and the per-task substeps line up with tasks.md. It also prints a timing breakdown, and that breakdown is what blew the whistle. Here's one of my own specs after a full implement run:

# 👇 spec 130 — every task stamped at the same instant, by the hook
implement T001 … T024   by: extension   at: 2026-06-07T23:27:31.648Z
  cadence: BURST — 24 task substeps, 0ms span   ✗

Twenty-four tasks, one timestamp, all written by: extension. The capture said all 24 ran, but with zero time between them. The journal looked complete and was quietly meaningless. The root cause is structural: the after_implement hook fires once, at the end of the step. A hook physically cannot produce cadence inside a step it only sees the end of.
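The burst check itself is simple to sketch. Assuming each implement entry carries task and at fields as in the journal, a cadence check can compute the gaps between consecutive task stamps and flag a run whose total span is zero. The name task_cadence and the returned dict are hypothetical, not the eval's real interface.

```python
from datetime import datetime, timedelta


def parse_at(at: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(at.replace("Z", "+00:00"))


def task_cadence(history: list) -> dict:
    """Summarize spacing between per-task implement entries and flag a 0ms burst."""
    stamps = sorted(
        parse_at(e["at"])
        for e in history
        if e.get("step") == "implement" and e.get("task")
    )
    one_ms = timedelta(milliseconds=1)
    gaps_ms = [(b - a) // one_ms for a, b in zip(stamps, stamps[1:])]
    span_ms = (stamps[-1] - stamps[0]) // one_ms if stamps else 0
    return {
        "tasks": len(stamps),
        "gaps_ms": gaps_ms,
        "span_ms": span_ms,
        # More than one task and no elapsed time: the stamps cannot reflect real work.
        "burst": len(stamps) > 1 and span_ms == 0,
    }
```

Returning the individual gaps as well as the overall flag matters, because a partial burst (several tasks sharing a stamp inside an otherwise spread-out run) only shows up in the gaps.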

So the fix moves the cadence source out of the hook and into the implement preamble. The idea: have the AI journal each task the moment it finishes, authored by: "ai" with a real timestamp, instead of letting one hook stamp them all at the end.

// 📃 .spec-context.json — a live per-task entry, now written by the AI (spec 131)
{
  "step": "implement",
  "substep": "T004",
  "task": "T004",
  "kind": "start",
  "by": "ai",
  "at": "2026-06-08T11:29:35Z"
}

Because each live entry carries its task id, the old hook's task-sync dedupes against them and degrades to a no-op backstop. If the AI skips the journaling, the hook still records everything. Strictly better, never worse.
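The dedupe-and-backstop behavior can be sketched as follows. The function name sync_tasks_from_hook and its arguments are assumptions; the only thing taken from the text is that the hook skips any task id that already has an entry.

```python
def sync_tasks_from_hook(history: list, checked_tasks: list, at: str) -> list:
    """Append by: "extension" entries only for checked tasks with no entry yet.

    Returns the entries that were added, so an empty list means the hook was a no-op.
    """
    seen = {
        e["task"]
        for e in history
        if e.get("step") == "implement" and "task" in e
    }
    added = []
    for task in checked_tasks:
        if task in seen:
            continue  # already journaled live (or by an earlier hook run): leave it alone
        entry = {"step": "implement", "substep": task, "task": task,
                 "by": "extension", "at": at}
        history.append(entry)
        seen.add(task)
        added.append(entry)
    return added
```

If the AI journaled every task, the call adds nothing; if it journaled none, the hook records them all, which is the old behavior.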

Here's the honest part: it isn't fixed yet. I re-ran the eval on the next spec, and it refused to sign off:

✗ [FAIL] timestamps-real: 10/16 look hand-typed (round ms) — capture may be backfilled, not live
✓ [PASS] per-task-substeps: 4 task events; substep==task: True
· [INFO] task-cadence: 4 tasks; gaps 0ms, 0ms, 39.0s
→ 12 pass / 1 fail / 4 info

The entries are by: "ai" now, which is the structural win. But on a small, fast spec the AI still batched three of them at one timestamp, and those round-second stamps look backfilled, not live. The burst isn't gone. It's smaller.
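One plausible way to implement a "round ms" heuristic like timestamps-real is to treat any stamp with no fractional seconds, or an all-zero fraction, as suspect. This is a guess at the idea, not the eval's actual rule; looks_hand_typed and count_hand_typed are hypothetical names, and where to draw the pass/fail threshold is left to the caller.

```python
import re

# Captures the optional fractional-seconds digits after HH:MM:SS.
FRACTION = re.compile(r"T\d{2}:\d{2}:\d{2}(?:\.(\d+))?")


def looks_hand_typed(at: str) -> bool:
    """True when a timestamp has no sub-second precision, or the fraction is all zeros."""
    match = FRACTION.search(at)
    if match is None:
        return True  # no full time component at all
    fraction = match.group(1)
    return fraction is None or set(fraction) == {"0"}


def count_hand_typed(history: list) -> tuple:
    """Return (suspect, total) over every entry that has an at field."""
    stamps = [e["at"] for e in history if "at" in e]
    return sum(looks_hand_typed(s) for s in stamps), len(stamps)
```

A genuinely live stamp will occasionally land on an exact second, so a check like this is better read as a ratio over the whole journal than as a verdict on any single entry.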

And that's the actual point of this beat. I shipped the capture, the fallback, and an eval whose entire job is to refuse the story I want to tell. It caught the original 24-task 0ms burst, and it's still catching the residual one. The cadence quality is a known bug, tracked in the backlog, not hand-waved away in the journal. A state engine that lies confidently is worse than one that admits it isn't done.

Before and after: on the left a dense stack of two dozen task markers collapsed onto one instant (a 0ms burst); on the right the markers journaled by an AI and spread along a timeline, except three still stacked at one point and caught under a magnifier with a red X

The other two proofs are quieter but matter just as much. Delete .spec-context.json on a real spec and derive-from-files.py rebuilds the right step and status from the artifacts and git alone. And /speckit.companion.status reads that journal, or the derived one, and tells you exactly where a spec sits:

$ /speckit.companion.status
Spec: Fix Opencode Prompt Flag   (source: state)
Step: implement   Status: implemented
Decisions: (none recorded)
Next: Pipeline complete  →  —

Its sibling, /speckit.companion.resume, reads that same resolved state, carries the recorded decisions[] into scope, and dispatches the next /speckit.* command, or refuses to invent one when the pipeline is already done. I pointed it at the spec above, which had finished every step:

$ /speckit.companion.resume
The resolution is complete: true.
Pipeline complete — nothing to resume.

That's the terminal branch doing exactly the right thing: read the journal, resolve complete: true, dispatch nothing. It's the unglamorous half of resume, but it's the half that keeps the command from running a finished spec off a cliff. The interesting half (resume re-entering a half-done pipeline and firing the next command) is covered by the 25 resolver tests today, and it earns a proper walkthrough rather than a footnote here. So that gets its own log.
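For orientation, the resolution step can be pictured as a small pure function. The sketch below is an assumption about its shape, not the shipped resolver: resolve_next, its arguments, and the returned keys are hypothetical, and it assumes each step maps to the /speckit.* command of the same name.

```python
from typing import Optional

STEP_ORDER = ["specify", "plan", "tasks", "implement"]


def resolve_next(current_step: Optional[str], step_done: bool,
                 unchecked_tasks: list) -> dict:
    """Decide what resume should dispatch, or report that the pipeline is complete."""
    if current_step is None:
        # Nothing captured or derived yet: start at the beginning.
        return {"complete": False, "command": "/speckit.specify", "next_task": None}

    if current_step == "implement":
        if step_done and not unchecked_tasks:
            # Terminal branch: nothing left to run, so dispatch nothing.
            return {"complete": True, "command": None, "next_task": None}
        next_task = unchecked_tasks[0] if unchecked_tasks else None
        return {"complete": False, "command": "/speckit.implement", "next_task": next_task}

    if not step_done:
        # Re-enter the step that was interrupted.
        return {"complete": False, "command": f"/speckit.{current_step}", "next_task": None}

    following = STEP_ORDER[STEP_ORDER.index(current_step) + 1]
    return {"complete": False, "command": f"/speckit.{following}", "next_task": None}
```

Keeping the decision free of I/O is what makes it cheap to cover with resolver-style tests: every branch, including the terminal one, is just a dict comparison.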

And the final proof is the one I'd been putting off: installing it the way a stranger would. Not the dev symlink I'd been dogfooding all along, but the real thing: specify extension add companion --from <release.zip>, off a public GitHub release. Then I ran /speckit.specify on a fresh feature and watched the release-installed hook write the file itself:

// 📃 specs/131-fix-opencode-prompt-flag/.spec-context.json — written by the release-installed extension
{
  "step": "specify",
  "kind": "start",
  "from": { "step": null },
  "by": "extension",
  "at": "2026-06-08T02:09:08.437Z"
}

by: "extension" means the hook fired, not a manual write; the ms-precision timestamp means it fired live, at the moment specify ran, not backfilled later. The same canonical history[], the same writer, but emitted by a copy of the extension that came down the wire from a release archive. That's the line between "works on my machine" and "works." v1 ships.

Why I built it this way

A few choices I'd defend to anyone copying this:

Reuse the hook mechanism for every lifecycle event. No daemon, no file watcher polling for changes. It rides spec-kit's own prompt-driven execution, the same machinery Build Log 1 already proved.
Derive-from-files is a first-class capture path, not a gap-filler. Hooks are best-effort by design. The fallback means the GUI reflects reality even on a project that never ran the extension. State you can't trust is worse than no state.
Status and resume are thin readers over the one canonical file. No second source of truth to drift out of sync.
Resume dispatches the already-installed /speckit.* commands, not a specify workflow resume CLI. The backlog assumed a >=0.8.5 subcommand, but my workspace runs 0.7.4.dev0. Binding resume to a subcommand that might not exist would break on exactly the stock installs v1 is supposed to support. Dispatching what's already there works everywhere.
Capture and status/resume stay always-on. They're the v1 core the GUI reads from. The differentiator features (the lean preset, complexity, drift, auto mode, agent teams) all come later and ship opt-out. The thing that tracks your work shouldn't be a setting.

What's next

v1 is the boundary this log climbs to: install the Companion on any spec-kit project and get tracking plus resume on your existing flow, no template change. From here the posts stop being plumbing and start being the SDD opinion.

Two things I'm deliberately carrying forward. The cadence still bursts on fast specs, and the eval won't let me forget it. And resume deserves to be shown re-entering a live pipeline, not described. Both get their own treatment next, alongside Build Log 3: the SDD shape, the sdd-lean preset that overrides the templates, and a complexity fast-path that right-sizes the ceremony so a one-line change doesn't drag the full pipeline behind it.

The public backlog tracks every step as it merges. Watch the repo, and I'll see you at the next one.
