===== A small package (directory) with a few focused modules: =====
* llama_worker/types.py
** All public TypedDict/dataclasses/enums/protocols from Appendix A.
** No logic.
* llama_worker/worker.py
** LlamaWorker public API implementation.
** Wires together the other components.
** Owns the request table + slot semaphore.
** No low-level subprocess details, no prompt formatting details.
* llama_worker/process.py
** Subprocess lifecycle:
*** spawn llama-server (process group/session)
*** stop/kill group reliably
*** capture logs into ring buffer
** Exposes minimal interface: start(), stop(), pid, is_alive(), recent_logs().
* llama_worker/transport.py
** HTTP client logic to talk to llama-server.
** Implements:
*** readiness probe: GET /v1/models
*** POST /v1/chat/completions streaming request/response handling
** Does not know about slots, tools, BIOS, or orchestration.
* llama_worker/prompting.py
** BIOS prompt generation adapter + message-stack assembly.
** Contains build_message_stack() implementation.
** May contain helpers for combining multiple system prompts if needed.
* llama_worker/tooling.py
** Normal tool loop machinery:
*** detect tool call vs final output
*** invoke ToolRunner
*** append tool-result messages
*** decrement tool budget
** Also includes fallback tool parsing strategy.
* llama_worker/exit_signals.py
** Parsing and recording of exit-tool calls into signals[].
** Must be explicitly non-terminating (records only).
* llama_worker/liveness.py
** Prefill-safe liveness probes:
*** /proc/<pid> CPU-time delta
*** process-alive checks
** Returns timestamps or “evidence of life” booleans.
* llama_worker/timeouts.py
** Implements timeout bookkeeping based on:
*** connect/header timeouts (transport-level)
*** progress timestamps (last_stream_byte_at, last_liveness_at)
** Returns “should_restart” decisions (policy evaluation), but does not restart itself.
* llama_worker/loopdetect.py
** Repeated-line detector.
** Pure incremental API: feed text chunks, ask “triggered?”.
* llama_worker/util.py
** Small shared utilities (ring buffer, monotonic time helpers, etc.).
** Keep tiny.
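
As an illustration of the "pure incremental API" described for loopdetect.py, here is a minimal sketch. The class name RepeatedLineDetector, the max_repeats parameter, and the rule "trigger when the same non-empty line occurs N times in a row" are assumptions made for this example; the source does not specify the detection rule.

```python
from typing import Optional


class RepeatedLineDetector:
    """Hypothetical sketch: triggers once the same non-empty line has
    appeared max_repeats times in a row. The rule is an assumption."""

    def __init__(self, max_repeats: int = 5) -> None:
        if max_repeats < 2:
            raise ValueError("max_repeats must be at least 2")
        self._max_repeats = max_repeats
        self._partial = ""  # text after the last newline, not yet a full line
        self._last_line: Optional[str] = None
        self._run_length = 0
        self._triggered = False

    def feed(self, chunk: str) -> None:
        """Accept an arbitrary text chunk; lines may be split across chunks."""
        self._partial += chunk
        # Everything before the final newline is complete; the tail is kept.
        *complete, self._partial = self._partial.split("\n")
        for line in complete:
            self._observe(line.strip())

    def _observe(self, line: str) -> None:
        if not line:
            return  # assumption: blank lines neither extend nor break a run
        if line == self._last_line:
            self._run_length += 1
        else:
            self._last_line = line
            self._run_length = 1
        if self._run_length >= self._max_repeats:
            self._triggered = True  # latches; never resets in this sketch

    @property
    def triggered(self) -> bool:
        return self._triggered
```

Because the detector holds only its own state and performs no I/O, the caller can feed it streamed chunks of any size and poll triggered after each one.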

This structure ensures each concern can be unit-tested in isolation.
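
To show what isolated testing could look like, the sketch below treats the timeouts.py policy evaluation as a pure function of a progress snapshot and a clock value. Only last_stream_byte_at and last_liveness_at come from the description above; ProgressSnapshot, TimeoutPolicy, request_started_at, stall_timeout_s, and the exact rule are hypothetical names and choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProgressSnapshot:
    """Hypothetical container for progress timestamps (monotonic seconds)."""
    request_started_at: float
    last_stream_byte_at: Optional[float] = None
    last_liveness_at: Optional[float] = None


@dataclass(frozen=True)
class TimeoutPolicy:
    """Hypothetical policy: maximum time allowed without any progress."""
    stall_timeout_s: float = 30.0


def should_restart(snapshot: ProgressSnapshot,
                   policy: TimeoutPolicy,
                   now: float) -> bool:
    """Return a decision only; the caller is responsible for acting on it.

    Assumed rule: the most recent of request start, last streamed byte,
    and last liveness evidence counts as the latest sign of progress.
    """
    candidates = [snapshot.request_started_at]
    if snapshot.last_stream_byte_at is not None:
        candidates.append(snapshot.last_stream_byte_at)
    if snapshot.last_liveness_at is not None:
        candidates.append(snapshot.last_liveness_at)
    return (now - max(candidates)) > policy.stall_timeout_s


# Usage: the clock is passed in, so no sleeping or subprocess is needed.
policy = TimeoutPolicy(stall_timeout_s=10.0)
stalled = ProgressSnapshot(request_started_at=0.0, last_stream_byte_at=5.0)
alive = ProgressSnapshot(request_started_at=0.0, last_stream_byte_at=5.0,
                         last_liveness_at=18.0)
print(should_restart(stalled, policy, now=20.0))  # 15 s since last progress
print(should_restart(alive, policy, now=20.0))    # 2 s since liveness evidence
```

Passing now as an argument, rather than reading the clock inside the function, is the design choice that keeps such a test free of real processes, sockets, and waiting.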