harmony/harmony
Jean-Gabriel Gill-Couture bc2edf4530 feat(podman): init containers with k8s-style run-to-completion semantics
Customer apps frequently need a one-shot setup step (DB migration,
config render, cache warm-up) to succeed before the long-running
service starts. Without init containers, each customer either inlines
the step into the service entrypoint (slow, racy, no failure surface)
or bolts on a sidecar that the platform can't introspect. This change
adds k8s-style init containers at the score layer so the contract is
the same one the customer already knows.

Score:
- New `InitContainer { name, image, args, env, volumes, timeout }`
  in `harmony::modules::podman`.
- `PodmanV0Score.init_containers: Vec<InitContainer>` with
  `#[serde(default)]` — pre-init-container wire payloads parse as an
  empty vec and behave unchanged.
- `DEFAULT_INIT_CONTAINER_TIMEOUT = 300s`; timeout serializes as
  whole seconds for operator readability.
- Idempotency is the customer's contract — documented at module
  level: init containers re-run on every reconcile that needs a
  fresh main container set.
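
A minimal, std-only sketch of the score-side shape described above. The field names and the 300s default come from this message; the field types, the `new` constructor, and the `timeout_wire_secs` / `with_wire_timeout` helpers are hypothetical stand-ins for what the real type presumably does through serde.

```rust
use std::time::Duration;

/// Default applied when a payload omits the timeout (300s per this commit).
pub const DEFAULT_INIT_CONTAINER_TIMEOUT: Duration = Duration::from_secs(300);

/// Field names follow the commit message; the field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct InitContainer {
    pub name: String,
    pub image: String,
    pub args: Vec<String>,
    pub env: Vec<(String, String)>,
    pub volumes: Vec<String>,
    pub timeout: Duration,
}

impl InitContainer {
    /// Hypothetical constructor: everything optional starts empty and the
    /// timeout is hydrated from the default.
    pub fn new(name: &str, image: &str) -> Self {
        InitContainer {
            name: name.to_string(),
            image: image.to_string(),
            args: Vec::new(),
            env: Vec::new(),
            volumes: Vec::new(),
            timeout: DEFAULT_INIT_CONTAINER_TIMEOUT,
        }
    }

    /// Timeout as it would appear on the wire: whole seconds.
    /// This sketch truncates any sub-second part.
    pub fn timeout_wire_secs(&self) -> u64 {
        self.timeout.as_secs()
    }

    /// Hypothetical hydration step: a missing wire value falls back to the default.
    pub fn with_wire_timeout(mut self, secs: Option<u64>) -> Self {
        self.timeout = secs
            .map(Duration::from_secs)
            .unwrap_or(DEFAULT_INIT_CONTAINER_TIMEOUT);
        self
    }
}
```

Under these assumptions, a payload with no timeout hydrates to 300 seconds, and a payload carrying `45` becomes a 45-second `Duration`.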

Runtime contract:
- `ContainerRuntime::run_to_completion(spec, timeout) -> RunOutcome`
  added to the domain trait. `RunOutcome::Exited { exit_code }`
  vs `TimedOut { waited }` — distinct arms because the caller's
  failure path is different (operator gets the exit code for
  actionable diagnosis).
- Init containers are NOT surfaced via `list_managed_services`;
  they're removed after they exit so the host's managed-container
  surface stays bounded to long-running services.
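
To illustrate why the two `RunOutcome` arms are kept distinct, here is a rough sketch of the contract. It is a synchronous simplification (the real trait method is presumably async), and `RunSpec`, `ScriptedRuntime`, `failure_message`, and the `i32` exit code type are assumptions made for this example, not the actual harmony definitions.

```rust
use std::time::Duration;

/// The two arms named in this commit; the `i32` exit code type is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum RunOutcome {
    Exited { exit_code: i32 },
    TimedOut { waited: Duration },
}

/// Hypothetical stand-in for the container spec passed to the runtime.
pub struct RunSpec {
    pub name: String,
    pub image: String,
}

/// Synchronous simplification of the domain trait method.
pub trait ContainerRuntime {
    fn run_to_completion(&self, spec: &RunSpec, timeout: Duration) -> RunOutcome;
}

/// Hypothetical test double that always returns a scripted outcome.
pub struct ScriptedRuntime {
    pub outcome: RunOutcome,
}

impl ContainerRuntime for ScriptedRuntime {
    fn run_to_completion(&self, _spec: &RunSpec, _timeout: Duration) -> RunOutcome {
        self.outcome.clone()
    }
}

/// Caller-side handling: each failure arm yields a different operator message,
/// and only the exit arm can report an exit code.
pub fn failure_message(name: &str, outcome: &RunOutcome) -> Option<String> {
    match outcome {
        RunOutcome::Exited { exit_code: 0 } => None,
        RunOutcome::Exited { exit_code } => Some(format!(
            "init container '{name}' exited with code {exit_code}"
        )),
        RunOutcome::TimedOut { waited } => Some(format!(
            "init container '{name}' timed out after {}s",
            waited.as_secs()
        )),
    }
}
```

Folding both failures into one variant would force the caller to invent an exit code for the timeout case, which is the ambiguity the separate arms avoid.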

PodmanTopology implementation:
- Pre-remove any prior container with the same name (retry-safe).
- Restart policy forced to `No` — a retrying init defeats the
  run-to-completion contract.
- `tokio::time::timeout` around `podman wait`; force-remove + return
  `TimedOut` on deadline.
- Single 200ms retry on inspect for the libpod race where state can
  briefly read `running` between `wait` returning and conmon writing
  the exit code.
- `INIT_CONTAINER_LABEL` on every init container so operators can
  `podman ps -a --filter label=...` to spot init failures.
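
The deadline and single-retry steps above can be sketched as follows. This is not the PodmanTopology code: it swaps `tokio::time::timeout` for `std::sync::mpsc::Receiver::recv_timeout` on a worker thread so the example has no dependencies, and the `wait`, `force_remove`, and `inspect` closures plus the `Inspect` enum are hypothetical stand-ins for the podman calls.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum RunOutcome {
    Exited { exit_code: i32 },
    TimedOut { waited: Duration },
}

/// Runs `wait` (stand-in for `podman wait`) on a worker thread and bounds it
/// with `timeout`. On deadline, calls `force_remove` and reports `TimedOut`.
pub fn wait_with_deadline<W, C>(wait: W, timeout: Duration, force_remove: C) -> RunOutcome
where
    W: FnOnce() -> i32 + Send + 'static,
    C: FnOnce(),
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the deadline fired; ignore that.
        let _ = tx.send(wait());
    });
    match rx.recv_timeout(timeout) {
        Ok(exit_code) => RunOutcome::Exited { exit_code },
        // A disconnected channel (worker panicked) is folded into the same
        // arm as a timeout in this sketch.
        Err(_) => {
            force_remove();
            RunOutcome::TimedOut { waited: timeout }
        }
    }
}

/// Hypothetical view of what inspect reports about the container state.
pub enum Inspect {
    Running,
    Exited(i32),
}

/// Reads the exit code, retrying exactly once after `delay` if the state
/// still reads as running (the commit uses a 200ms delay).
pub fn exit_code_with_single_retry<F>(mut inspect: F, delay: Duration) -> Option<i32>
where
    F: FnMut() -> Inspect,
{
    if let Inspect::Exited(code) = inspect() {
        return Some(code);
    }
    thread::sleep(delay);
    match inspect() {
        Inspect::Exited(code) => Some(code),
        Inspect::Running => None,
    }
}
```

The retry is deliberately bounded to one attempt: if the state still reads `running` after the delay, the sketch returns `None` and leaves the decision to the caller rather than looping.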

Interpret:
- Init containers run sequentially before any service. Non-zero exit
  or timeout fails the deployment with a typed `InterpretError`
  carrying the container name + cause.
- Success message reports both the init-container count and the
  service count.
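
A rough sketch of the sequential, fail-fast flow described above. The real code returns an `InterpretError`; `InitError`, `InitFailure`, and `run_init_containers` here are hypothetical names used only to show an error that carries the container name and the cause, and the `run` closure stands in for the runtime call.

```rust
use std::fmt;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum RunOutcome {
    Exited { exit_code: i32 },
    TimedOut { waited: Duration },
}

/// Hypothetical cause enum: which way the init container failed.
#[derive(Debug, PartialEq)]
pub enum InitFailure {
    NonZeroExit { exit_code: i32 },
    TimedOut { waited: Duration },
}

/// Hypothetical typed error carrying the container name and the cause.
#[derive(Debug, PartialEq)]
pub struct InitError {
    pub container: String,
    pub cause: InitFailure,
}

impl fmt::Display for InitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.cause {
            InitFailure::NonZeroExit { exit_code } => write!(
                f,
                "init container '{}' exited with code {}",
                self.container, exit_code
            ),
            InitFailure::TimedOut { waited } => write!(
                f,
                "init container '{}' timed out after {}s",
                self.container,
                waited.as_secs()
            ),
        }
    }
}

/// Runs init containers in declaration order and stops at the first failure.
/// Returns the number of init containers that completed successfully.
pub fn run_init_containers<F>(names: &[&str], mut run: F) -> Result<usize, InitError>
where
    F: FnMut(&str) -> RunOutcome,
{
    for &name in names {
        let cause = match run(name) {
            RunOutcome::Exited { exit_code: 0 } => continue,
            RunOutcome::Exited { exit_code } => InitFailure::NonZeroExit { exit_code },
            RunOutcome::TimedOut { waited } => InitFailure::TimedOut { waited },
        };
        return Err(InitError { container: name.to_string(), cause });
    }
    Ok(names.len())
}
```

In this sketch, services would be started only after `run_init_containers` returns `Ok`, so a failing init container never leaves a half-started deployment behind it.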

Tests (in tree):
- 3 new wire-format tests in `podman::score`: roundtrip, default
  timeout hydration, ordering preservation.
- All 10 existing podman::score tests still pass; legacy roundtrip
  test now also asserts `init_containers.is_empty()` as a wire-compat
  canary.

Call-site updates (5 sites) — all existing constructors of
`PodmanV0Score` add `init_containers: vec![]`: harmony_apply_deployment
example, fleet_load_test example, operator e2e, vm_deploy_lifecycle
e2e, vm_isolation e2e.

Deferred: per-version "run-once" semantics (customers can implement
this today with a marker file); the agent-side handler for surfacing
init logs to the operator dashboard (covered by the logs companion
PR's deferred work).
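
For the marker-file workaround mentioned above, one possible customer-side shape is sketched here. Nothing in it is part of this change: `run_once_per_version`, the `init-done-<version>` file name, and the assumption that `marker_dir` lives on a volume that survives reconciles are all hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Runs `step` only if no marker exists for `version`, then records the marker.
/// Returns `Ok(true)` if the step ran and `Ok(false)` if it was skipped.
/// `version` is used as part of a file name, so it must not contain path separators.
pub fn run_once_per_version<F>(marker_dir: &Path, version: &str, step: F) -> io::Result<bool>
where
    F: FnOnce() -> io::Result<()>,
{
    let marker = marker_dir.join(format!("init-done-{version}"));
    if marker.exists() {
        // Already completed for this version on an earlier reconcile.
        return Ok(false);
    }
    // The marker is written only after the step succeeds, so a failed step
    // is retried on the next run.
    step()?;
    fs::create_dir_all(marker_dir)?;
    fs::write(&marker, b"")?;
    Ok(true)
}
```

Because the marker is written after the step, a crash between the two reruns the step, so the step itself still needs to be idempotent, in line with the contract stated in the Score section.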
2026-05-24 21:56:39 -04:00