Session recovery after unclean shutdown (Phase 4.2 — ultra-long-sessions).
When a persistent agent subprocess is running and the server is killed (crash, OS reboot, task-kill), the subprocess dies with it but the block’s session history is preserved in FileStore. On next startup, we want to:
- Detect that a session was running when the server died, so we can surface it to the user as “this session was interrupted”.
- Let the user resume: the persistent controller already supports --resume <session_id> on the next input, so the recovery UX is simply “click resume, type your next message, get picked up mid-flight”.
The mechanism is a single meta key, session:active_pid, which holds the
PID of the running subprocess. It is set when the subprocess spawns and
cleared when the subprocess exits, whether gracefully or by being
killed. If the key is still set when the server boots, the recorded PID
belongs to a process that died with the previous server, so the startup
scan transfers it to session:was_interrupted = true.
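The flag lifecycle described above can be sketched as follows. This is a minimal illustration, not the module's implementation: BlockMeta is a hypothetical in-memory stand-in for a block's metadata in FileStore, the constant values are assumed to match the key names used in the prose, and "transfer" is read here as "clear the stale PID and set the interrupted flag".

```rust
use std::collections::HashMap;

// Key names come from the prose above; the actual constant values are an assumption.
const META_SESSION_ACTIVE_PID: &str = "session:active_pid";
const META_SESSION_WAS_INTERRUPTED: &str = "session:was_interrupted";

/// Hypothetical stand-in for one block's metadata (the real store is FileStore).
#[derive(Debug, Default)]
struct BlockMeta {
    values: HashMap<String, String>,
}

impl BlockMeta {
    /// Record the PID when the subprocess is spawned.
    fn mark_active_pid(&mut self, pid: u32) {
        self.values
            .insert(META_SESSION_ACTIVE_PID.to_string(), pid.to_string());
    }

    /// Clear the PID when the subprocess exits, for whatever reason.
    fn clear_active_pid(&mut self) {
        self.values.remove(META_SESSION_ACTIVE_PID);
    }

    /// True when a non-zero PID is recorded (0 or missing means no process).
    fn has_active_pid(&self) -> bool {
        self.values
            .get(META_SESSION_ACTIVE_PID)
            .and_then(|v| v.parse::<u32>().ok())
            .map_or(false, |pid| pid != 0)
    }

    /// Startup step: a PID that survived a restart must be stale, so
    /// replace it with the interrupted flag. Returns true if flagged.
    fn recover_on_startup(&mut self) -> bool {
        if !self.has_active_pid() {
            return false;
        }
        self.values.remove(META_SESSION_ACTIVE_PID);
        self.values
            .insert(META_SESSION_WAS_INTERRUPTED.to_string(), "true".to_string());
        true
    }

    /// True when the startup step has flagged this block.
    fn was_interrupted(&self) -> bool {
        self.values
            .get(META_SESSION_WAS_INTERRUPTED)
            .map_or(false, |v| v == "true")
    }
}
```

In this sketch a session that exited cleanly leaves no PID behind, so recover_on_startup returns false and the banner flag is never set; only a PID left over from an unclean shutdown produces was_interrupted.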
session:was_interrupted is a frontend-only signal; the backend doesn’t
consume it. The frontend AgentControlBar renders a banner when it’s set,
and service:update_object_meta clears it when the user dismisses the
banner.
Constants
- META_SESSION_ACTIVE_PID: PID of the currently running subprocess; 0 or missing = no process.
- META_SESSION_WAS_INTERRUPTED: Set to true by the startup scan when a pre-existing active_pid is found.
Functions
- clear_active_pid: Clear session:active_pid. Called when the subprocess exits for any reason.
- mark_active_pid: Record that a subprocess with pid has been spawned for block_id. Best-effort: logs on failure but never panics.
- scan_orphans: Scan all blocks at server startup. For any agent block that still has session:active_pid set, transfer the flag to session:was_interrupted.
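As a rough illustration of the startup scan, the sketch below walks a list of blocks and transfers a leftover PID into the interrupted flag for agent blocks only. The Block and BlockKind types, the string-valued metadata map, and the returned list of ids are all hypothetical; the real scan_orphans signature and its FileStore access are not shown in this documentation.

```rust
use std::collections::HashMap;

// Key names follow the prose; everything else in this sketch is hypothetical.
const ACTIVE_PID_KEY: &str = "session:active_pid";
const WAS_INTERRUPTED_KEY: &str = "session:was_interrupted";

#[derive(Debug, PartialEq)]
enum BlockKind {
    Agent,
    Other,
}

#[derive(Debug)]
struct Block {
    id: String,
    kind: BlockKind,
    meta: HashMap<String, String>,
}

/// Scan every block once at startup and return the ids of the agent
/// blocks that were marked as interrupted.
fn scan_orphans(blocks: &mut [Block]) -> Vec<String> {
    let mut interrupted = Vec::new();
    for block in blocks.iter_mut() {
        // Only agent blocks own a persistent subprocess.
        if block.kind != BlockKind::Agent {
            continue;
        }
        // Removing the key both reads the stale PID and clears it.
        let had_pid = block
            .meta
            .remove(ACTIVE_PID_KEY)
            .and_then(|v| v.parse::<u32>().ok())
            .map_or(false, |pid| pid != 0);
        if had_pid {
            block
                .meta
                .insert(WAS_INTERRUPTED_KEY.to_string(), "true".to_string());
            interrupted.push(block.id.clone());
        }
    }
    interrupted
}
```

Returning the flagged ids is a choice made for this sketch so that a caller could log how many sessions were recovered; the scan itself needs no liveness check on the PID, because the subprocess cannot outlive the server that spawned it.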