The k3d smoke-test surfaced that the operator chart baked
`fleet-system` into every namespaced manifest (Deployment,
ServiceAccount, Secret) and into the ClusterRoleBinding subject.
Installing into any other namespace failed with a helm
release-namespace mismatch.

Fixed by making the chart genuinely namespace-neutral:

- Removed `namespace` from `ChartOptions` entirely.
- `service_account()` and `operator_deployment(opts)` no longer
set `metadata.namespace`; helm assigns the release namespace at
install time, and the direct-apply path injects the namespace
through `K8sResourceScore::single(.., Some(ns))`.
- `operator_secret(opts)` likewise drops `metadata.namespace`; the
Secret is applied with an explicit namespace by its caller.
- `cluster_role_binding(subject_namespace)` keeps a namespace
argument because the CRB subject must point at a concrete
namespace; the chart path passes the literal helm template
`{{ .Release.Namespace }}` so helm substitutes the release
namespace at install time. The direct-apply path passes the
real namespace string.
- `FleetOperatorScore::new()` defaults its own `namespace` field
(no longer sourced from `ChartOptions::default()`); the chart
itself carries no namespace default at all.
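
The two call paths for the ClusterRoleBinding subject can be sketched as follows. This is a minimal illustration that renders YAML text with the standard library only; the real `cluster_role_binding` presumably builds a typed Kubernetes object, and the `fleet-operator` resource names and the `HELM_RELEASE_NAMESPACE` constant are assumptions made for the example.

```rust
/// Literal helm template; helm substitutes the release namespace at install time.
/// (Constant name is hypothetical.)
const HELM_RELEASE_NAMESPACE: &str = "{{ .Release.Namespace }}";

/// Simplified stand-in for `cluster_role_binding(subject_namespace)`.
/// The `fleet-operator` names are placeholders, not taken from the chart.
fn cluster_role_binding(subject_namespace: &str) -> String {
    format!(
        r#"apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fleet-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fleet-operator
subjects:
  - kind: ServiceAccount
    name: fleet-operator
    namespace: {}
"#,
        subject_namespace
    )
}

fn main() {
    // Chart path: the template placeholder is emitted verbatim for helm to resolve.
    print!("{}", cluster_role_binding(HELM_RELEASE_NAMESPACE));
    // Direct-apply path: the concrete namespace is known up front.
    print!("{}", cluster_role_binding("my-fleet"));
}
```

Because the namespace is an argument rather than a chart default, the same function serves both paths without carrying any namespace state of its own.
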
Verified on k3d by installing the released chart into a
deliberately non-default namespace (`my-fleet`): all resources
land in `my-fleet`, the ClusterRoleBinding subject resolves to
`my-fleet`, and the operator pod runs.

Also adds `ROADMAP/fleet_platform/dashboard_ingress.md` capturing
the three-step dependency chain (build with web-frontend feature →
implement real FleetService → add Service + Ingress to chart) that
the k3d test surfaced when looking for the dashboard. Unnumbered
file per project convention; numbered ones are versioned
milestones.
93 lines
4.4 KiB
Markdown
# Fleet operator dashboard — make shippable and expose via Ingress

## Context

The operator binary has a server-side dashboard (axum + Maud + HTMX
under `fleet/harmony-fleet-operator/src/frontend/`), but it is **not
shippable today**. The k3d smoke-test of the release pipeline made
this concrete: the chart correctly omits any `Service` or `Ingress`
because there is no production-ready dashboard endpoint to point them
at. Three blockers, in order of dependency.

## Work to be done

### 1. Build the production image with the dashboard included

- [ ] Update `fleet/harmony-fleet-operator/Dockerfile` to build with
  `--features web-frontend` (currently
  `cargo build --release --locked -p harmony-fleet-operator`,
  no features).
- [ ] Confirm Tailwind CSS is embedded at build time inside the
  builder stage. The crate doc says the CSS is embedded when
  `tailwindcss` is on PATH at build time, otherwise the bundle is
  empty and `--css-from` must be passed at runtime. Decide: ship
  with embedded CSS (install `tailwindcss` in the builder stage)
  or document the empty-bundle path.
- [ ] Confirm the build still satisfies the cross-compile gating
  added in PR #291 (`ci: fix Windows cross-compile by gating
  unix-only harmony code`) — the `web-frontend` feature must not
  pull in unix-only code on Windows targets if Windows is still a
  CI target.

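
To check whether the builder stage would take the embedded-CSS path, a small PATH probe is enough. The sketch below is an assumption about how such a check could look; it is not the crate's actual detection logic, and `find_on_path` is a hypothetical helper. It ignores platform details such as `.exe` suffixes.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Returns the first `<dir>/<program>` that exists as a file in a
/// PATH-style list of directories. Hypothetical helper for illustration.
fn find_on_path(program: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(program))
        .find(|candidate| candidate.is_file())
}

fn main() {
    let path_var = env::var_os("PATH").unwrap_or_default();
    match find_on_path("tailwindcss", &path_var) {
        Some(bin) => println!("tailwindcss found at {}: CSS can be embedded", bin.display()),
        None => println!("tailwindcss not found: bundle would be empty, pass --css-from at runtime"),
    }
}
```

Running an equivalent check as a step in the builder stage would make a silently empty CSS bundle visible before the image ships.
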
### 2. Replace the mock-only `serve-web` with a real implementation

- [ ] Implement `FleetService` against the real NATS + Kubernetes
  backend (the operator currently uses
  `MockFleetService::default()` and bails when `--mock` is
  not passed: `main.rs:125` — `"serve-web without --mock is not
  implemented yet (real FleetService impl pending)"`).
- [ ] Decide the runtime topology: do the controller and the web
  server share a Pod and a process? Two containers in one Pod?
  Two separate Deployments? Current code suggests "same process,
  different subcommand"; the chart will need to be updated
  whichever way it goes.
- [ ] Wire the Zitadel auth env vars (`FLEET_AUTH_*` from `dev.sh`)
  through the chart's Pod env. These are
  operator-environment-specific (like the existing
  `FLEET_OPERATOR_CREDENTIALS_TOML` Secret) and should likely
  stay out of the redistributable chart, mounted by the deploy
  pipeline.
- [ ] Decide on the `FLEET_OPERATOR_COOKIE_KEY_B64` lifecycle:
  operator-generated on first boot? Deploy-time secret? Document.

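
The current mock-only behavior, and the place where a real backend would slot in, can be sketched as below. The trait shape is an assumption: the real `FleetService` is not shown in this file and is likely async with a richer API, and `device_count` and `select_service` are hypothetical names. Only `MockFleetService::default()` and the error message come from the text above.

```rust
/// Hypothetical, reduced version of the dashboard backend trait.
trait FleetService {
    fn device_count(&self) -> usize;
}

/// Stand-in for the existing mock; the returned value is arbitrary.
#[derive(Default)]
struct MockFleetService;

impl FleetService for MockFleetService {
    fn device_count(&self) -> usize {
        3
    }
}

/// Mirrors the described `serve-web` behavior: only `--mock` is supported today.
fn select_service(mock: bool) -> Result<Box<dyn FleetService>, String> {
    if mock {
        Ok(Box::new(MockFleetService::default()))
    } else {
        // A NATS + Kubernetes backed implementation would be constructed here.
        Err("serve-web without --mock is not implemented yet (real FleetService impl pending)"
            .to_string())
    }
}

fn main() {
    match select_service(true) {
        Ok(service) => println!("mock service reports {} devices", service.device_count()),
        Err(message) => eprintln!("{message}"),
    }
}
```

Keeping the selection behind one function that returns a trait object means the web handlers do not change when the real implementation lands.
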
### 3. Expose the dashboard via Service + Ingress in the chart

- [ ] Add a `Service` resource to `chart.rs` (ClusterIP, target port
  18080 to match the default `serve-web --addr`).
- [ ] Add an `Ingress` resource. Open questions:
  - Ingress class: assume `traefik` (k3d default)? Make it
    configurable via `ChartOptions`?
  - Host: configurable via `ChartOptions` (e.g.,
    `fleet.my-cluster.example.com`); no sensible default.
  - TLS: cert-manager `ClusterIssuer` reference, or expect TLS to be
    terminated upstream? Probably a `ChartOptions.tls_issuer:
    Option<String>` knob — `None` means "no TLS section on the
    Ingress."
- [ ] Decide whether the Ingress is in scope for the chart at all,
  or whether it should live in a separate `*-ingress` chart that
  the deploy layer composes. The first path is simpler;
  the second matches "small composable Scores" from ADR-023.
- [ ] Smoke-test on k3d: install the chart, `curl` the dashboard
  through the k3d LoadBalancer, confirm HTTP 200 and the page
  renders.

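
One possible shape for the Ingress knobs is sketched below, assuming the `tls_issuer: Option<String>` option is adopted. It renders YAML text with the standard library only, whereas `chart.rs` presumably builds typed resources. The `ingress_class` and `ingress_host` field names, the `fleet-dashboard` and `fleet-dashboard-tls` resource names, and the use of 18080 as the Service port are all assumptions made for the example.

```rust
/// Hypothetical Ingress knobs on `ChartOptions`. Only `tls_issuer` is named
/// in the roadmap; the other field names are assumptions.
struct ChartOptions {
    ingress_class: String,
    ingress_host: String,
    tls_issuer: Option<String>,
}

/// Renders an Ingress manifest as YAML text. No `metadata.namespace` is set,
/// in keeping with the namespace-neutral chart.
fn dashboard_ingress(opts: &ChartOptions) -> String {
    let mut yaml = String::from(
        "apiVersion: networking.k8s.io/v1\nkind: Ingress\nmetadata:\n  name: fleet-dashboard\n",
    );
    if let Some(issuer) = &opts.tls_issuer {
        // cert-manager reads this annotation to issue the certificate.
        yaml.push_str(&format!(
            "  annotations:\n    cert-manager.io/cluster-issuer: {}\n",
            issuer
        ));
    }
    yaml.push_str(&format!("spec:\n  ingressClassName: {}\n", opts.ingress_class));
    if opts.tls_issuer.is_some() {
        // `None` means no TLS section on the Ingress at all.
        yaml.push_str(&format!(
            "  tls:\n    - hosts:\n        - {}\n      secretName: fleet-dashboard-tls\n",
            opts.ingress_host
        ));
    }
    yaml.push_str(&format!(
        r#"  rules:
    - host: {}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: fleet-dashboard
                port:
                  number: 18080
"#,
        opts.ingress_host
    ));
    yaml
}

fn main() {
    let opts = ChartOptions {
        ingress_class: "traefik".to_string(),
        ingress_host: "fleet.my-cluster.example.com".to_string(),
        tls_issuer: None,
    };
    print!("{}", dashboard_ingress(&opts));
}
```

With `tls_issuer: None` the output has neither the cert-manager annotation nor a `tls` block, which covers the case where TLS is terminated upstream.
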
## Out of scope here

- Decisions about who hosts the dashboard's auth (Zitadel-only or
  multi-IdP) — that's a product question, not a chart question.
- Operator HA. The current chart is `replicas: 1`. Multi-replica
  needs leader election in the controller, which is its own work.
- Dashboard observability (metrics endpoint, structured access
  logs) — fold in when adding the Service.

## Why this lives in its own roadmap

These three items are dependency-chained (1 → 2 → 3) and each is
non-trivial. Bundling them with the CI release pipeline would couple
unrelated risks and make the PR un-reviewable. Keeping this file
unnumbered (per
[`ROADMAP/fleet_platform/v0_1_plan.md`](v0_1_plan.md) and
[`v0_2_plan.md`](v0_2_plan.md) — numbered files are versioned
milestones) signals that this is a free-floating workstream that
slots into whichever milestone picks it up.