Module session_recovery

Session recovery after unclean shutdown (Phase 4.2 — ultra-long-sessions).

When a persistent agent subprocess is running and the server is killed (crash, OS reboot, task-kill), the subprocess dies with it but the block’s session history is preserved in FileStore. On next startup, we want to:

  1. Detect that a session was running when the server died, so we can surface it to the user as “this session was interrupted”.
  2. Let the user resume: the persistent controller already supports --resume <session_id> on the next input, so the recovery UX is simply “click resume, type your next message, get picked up mid-flight”.

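The mechanism is a single meta key, session:active_pid, which holds the PID of the running subprocess (it is not a boolean). It is set on subprocess spawn and cleared whenever the subprocess exits while the server is still running, whether it exited gracefully or was killed. If the key is still set when the server boots, the server itself went down uncleanly and the subprocess is definitely gone (the stored PID belongs to a dead process), so the startup scan transfers it to session:was_interrupted = true.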

session:was_interrupted is a frontend-only signal; the backend doesn’t consume it. The frontend AgentControlBar renders a banner when it is set, and service:update_object_meta clears it when the user dismisses the banner.
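
The flag lifecycle described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: it assumes the two constants hold the key strings named in the text, and it models one block's metadata as an in-memory map. `MetaValue`, `BlockMeta`, `on_spawn`, `on_exit`, and `on_startup` are hypothetical names introduced only for this example; the real module persists metadata through FileStore. Clearing session:active_pid during the transfer is also an assumption.

```rust
use std::collections::HashMap;

// Assumed values: the docs name these meta keys, but the constant
// definitions themselves are not shown on this page.
pub const META_SESSION_ACTIVE_PID: &str = "session:active_pid";
pub const META_SESSION_WAS_INTERRUPTED: &str = "session:was_interrupted";

/// Hypothetical value type for a block's metadata entries.
#[derive(Debug, Clone, PartialEq)]
pub enum MetaValue {
    Pid(u32),
    Flag(bool),
}

/// Hypothetical in-memory stand-in for one block's persisted metadata.
pub type BlockMeta = HashMap<String, MetaValue>;

/// Subprocess spawned: record its PID.
pub fn on_spawn(meta: &mut BlockMeta, pid: u32) {
    meta.insert(META_SESSION_ACTIVE_PID.to_string(), MetaValue::Pid(pid));
}

/// Subprocess exited while the server was still running: clear the PID.
pub fn on_exit(meta: &mut BlockMeta) {
    meta.remove(META_SESSION_ACTIVE_PID);
}

/// Server startup: a leftover non-zero PID means the previous server
/// instance died uncleanly. Removes the PID key (an assumption about
/// what "transfer" means) and sets the interrupted flag.
/// Returns true if the block was marked as interrupted.
pub fn on_startup(meta: &mut BlockMeta) -> bool {
    match meta.remove(META_SESSION_ACTIVE_PID) {
        // 0 or missing means "no process", so only a non-zero PID counts.
        Some(MetaValue::Pid(pid)) if pid != 0 => {
            meta.insert(
                META_SESSION_WAS_INTERRUPTED.to_string(),
                MetaValue::Flag(true),
            );
            true
        }
        _ => false,
    }
}
```

In this sketch, calling `on_spawn` and then `on_startup` without an intervening `on_exit` models a crash: the PID key is gone and the interrupted flag is set. Calling `on_exit` first models a clean shutdown, and `on_startup` leaves the metadata untouched.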

Constants§

META_SESSION_ACTIVE_PID
PID of the currently running subprocess; 0 or missing = no process.
META_SESSION_WAS_INTERRUPTED
Set to true by the startup scan when a leftover session:active_pid is found.

Functions§

clear_active_pid
Clear session:active_pid — called when the subprocess exits for any reason.
mark_active_pid
Record that a subprocess with pid has been spawned for block_id. Best-effort — logs on failure but never panics.
scan_orphans
Scan all blocks at server startup. For any agent block that still has session:active_pid set, transfer the flag to session:was_interrupted.
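
As a rough illustration of the startup scan, the sketch below walks a set of blocks and flags only agent blocks that still carry a PID. The `Block` struct, its fields, and `scan_orphans_sketch` are hypothetical simplifications made up for this example; the real `scan_orphans` signature and the block storage API are not shown on this page.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified view of a stored block. The real block
/// type and its metadata layout are not documented here.
#[derive(Debug, Default)]
pub struct Block {
    pub is_agent: bool,
    /// Stands in for the session:active_pid meta key.
    pub active_pid: Option<u32>,
    /// Stands in for the session:was_interrupted meta key.
    pub was_interrupted: bool,
}

/// Sketch of the startup scan. Returns the IDs of the blocks that were
/// marked as interrupted, in key order.
pub fn scan_orphans_sketch(blocks: &mut BTreeMap<String, Block>) -> Vec<String> {
    let mut interrupted = Vec::new();
    for (id, block) in blocks.iter_mut() {
        // Only agent blocks take part in session recovery.
        if !block.is_agent {
            continue;
        }
        // take() clears the stored PID; a non-zero value means a
        // subprocess was still registered when the server died.
        match block.active_pid.take() {
            Some(pid) if pid != 0 => {
                block.was_interrupted = true;
                interrupted.push(id.clone());
            }
            _ => {}
        }
    }
    interrupted
}
```

Returning the list of affected IDs is a design choice made for this sketch so that a caller could log or count recovered sessions; the actual function may report results differently.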