ci/fleet-argo-cd #301

Closed
johnride wants to merge 11 commits from ci/fleet-argo-cd into master
Owner
No description provided.
johnride added 8 commits 2026-05-28 00:16:48 +00:00
One git tag `harmony-fleet-operator-v*` now produces both the
container image and a hydrated helm chart at the same version,
pushed to hub.nationtech.io. release.sh is a 5-line wrapper around a
new `harmony-fleet-operator-release` binary in harmony-fleet-deploy
that orchestrates docker build/push, chart hydration via the
existing `build_chart()`, and `helm package`/`helm push`. CI is
reduced to a thin trigger calling the same script developers run
locally.

- chart.rs: ChartOptions gains an optional chart_version (None
  preserves the previous CARGO_PKG_VERSION behavior).
- operator_release.rs: new binary.
- release.sh: thin wrapper.
- .gitea/workflows/harmony-fleet-operator.yaml: rewritten to fire
  on `harmony-fleet-operator-v*` tags (and workflow_dispatch with a
  manual version input).
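
The `chart_version` fallback described above could look roughly like the following. This is a minimal sketch, not the real `ChartOptions`: the struct is a trimmed-down stand-in, and `effective_chart_version` is a hypothetical helper name. The crate version is passed in as a parameter so the sketch does not depend on `env!("CARGO_PKG_VERSION")`.

```rust
/// Hypothetical, trimmed-down stand-in for the real `ChartOptions`.
#[derive(Debug, Default, Clone)]
pub struct ChartOptions {
    /// Explicit chart version; `None` keeps the crate-version fallback.
    pub chart_version: Option<String>,
}

impl ChartOptions {
    /// Returns the version to stamp on the chart.
    /// `crate_version` stands in for `env!("CARGO_PKG_VERSION")`.
    pub fn effective_chart_version<'a>(&'a self, crate_version: &'a str) -> &'a str {
        self.chart_version.as_deref().unwrap_or(crate_version)
    }
}
```

With `ChartOptions::default()` the helper returns the crate version unchanged, so existing callers would see no difference; only the release path needs to set `chart_version`.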
Two findings from the k3d smoke-test of the release pipeline:

1. `HelmChart::new(name, version)` is misleading — the second arg sets
   appVersion only, while the chart-level `version` is a separate pub
   field defaulting to "0.1.0". The first run produced
   `harmony-fleet-operator-0.1.0.tgz` instead of
   `harmony-fleet-operator-<tag>.tgz`. Set both fields to the released
   tag so one tag → one image + one chart at one matching version.

2. Add `--no-push` to the release binary so the same code path that
   CI exercises is usable locally for k3d smoke-tests without pushing
   to Harbor. The packaged chart tgz is copied into the caller's CWD
   so it survives the binary's tempdir cleanup, and the binary prints
   both artifact paths at the end.

Verified end-to-end on k3d: helm install brings up CRDs, RBAC, and
the operator Deployment; the operator pod reaches Running 1/1 and
starts retrying NATS connect (expected — no NATS deployed in this
smoke-test).
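
The first finding can be illustrated with a small stand-in type. This is a sketch under assumptions: the `HelmChart` below only mirrors the shape described in the commit message (constructor sets `appVersion`, chart `version` is a separate public field defaulting to "0.1.0"), and `package_file_name` and `chart_for_release` are hypothetical helpers, not part of the real module.

```rust
/// Hypothetical stand-in mirroring the shape described above: `new` sets
/// `app_version`, while `version` is a separate public field.
#[derive(Debug, Clone)]
pub struct HelmChart {
    pub name: String,
    pub version: String,
    pub app_version: String,
}

impl HelmChart {
    pub fn new(name: &str, app_version: &str) -> Self {
        Self {
            name: name.to_string(),
            // Chart-level version is NOT taken from the second argument.
            version: "0.1.0".to_string(),
            app_version: app_version.to_string(),
        }
    }

    /// File name `helm package` derives from the chart name and chart version.
    pub fn package_file_name(&self) -> String {
        format!("{}-{}.tgz", self.name, self.version)
    }
}

/// Builds a chart whose chart version and appVersion both equal the release tag.
pub fn chart_for_release(name: &str, tag: &str) -> HelmChart {
    let mut chart = HelmChart::new(name, tag);
    chart.version = tag.to_string();
    chart
}
```

Calling `HelmChart::new` alone reproduces the `-0.1.0.tgz` file name from the first run; `chart_for_release` sets both fields so the packaged file name follows the tag.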
refactor: chart is now namespace-neutral; add dashboard roadmap
Some checks failed
Run Check Script / check (pull_request) Failing after 59s
cc41f190d2
The k3d smoke-test surfaced that the operator chart baked
`fleet-system` into every namespaced manifest (Deployment,
ServiceAccount, Secret) and into the ClusterRoleBinding subject.
Installing into any other namespace failed with helm
release-namespace mismatch.

Fixed by making the chart genuinely namespace-neutral:

- Removed `namespace` from `ChartOptions` entirely.
- `service_account()` and `operator_deployment(opts)` no longer
  set `metadata.namespace`; helm assigns the release namespace at
  install time, and the direct-apply path injects the namespace
  through `K8sResourceScore::single(.., Some(ns))`.
- `operator_secret(opts)` likewise drops `metadata.namespace`; the
  Secret is applied with an explicit namespace by its caller.
- `cluster_role_binding(subject_namespace)` keeps a namespace
  argument because the CRB subject must point at a concrete
  namespace; the chart path passes the literal helm template
  `{{ .Release.Namespace }}` so helm substitutes the release
  namespace at install time. The direct-apply path passes the
  real namespace string.
- `FleetOperatorScore::new()` defaults its own `namespace` field
  (no longer sourced from `ChartOptions::default()`); the chart
  itself carries no namespace default at all.

Verified on k3d by installing the released chart into a
deliberately non-default namespace (`my-fleet`): all resources
land in `my-fleet`, ClusterRoleBinding subject resolves to
`my-fleet`, operator pod runs.
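
The two ways of filling the ClusterRoleBinding subject can be sketched as follows. This is an illustration only: `crb_subjects_yaml` and the constant name are hypothetical, and the real `cluster_role_binding(subject_namespace)` presumably builds a typed Kubernetes object rather than a YAML string.

```rust
/// Literal helm template passed on the chart path; helm substitutes the
/// release namespace when the chart is installed.
pub const RELEASE_NAMESPACE_TEMPLATE: &str = "{{ .Release.Namespace }}";

/// Renders the `subjects` block of a ClusterRoleBinding as YAML.
/// The service account name is a hypothetical parameter for this sketch.
pub fn crb_subjects_yaml(service_account: &str, subject_namespace: &str) -> String {
    format!(
        "subjects:\n  - kind: ServiceAccount\n    name: {}\n    namespace: {}\n",
        service_account, subject_namespace
    )
}
```

The chart path would call `crb_subjects_yaml(name, RELEASE_NAMESPACE_TEMPLATE)` and leave substitution to helm, while the direct-apply path would pass a concrete string such as `"my-fleet"`.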

Also adds `ROADMAP/fleet_platform/dashboard_ingress.md` capturing
the three-step dependency chain (build with web-frontend feature →
implement real FleetService → add Service + Ingress to chart) that
the k3d test surfaced when looking for the dashboard. Unnumbered
file per project convention; numbered ones are versioned
milestones.
style: cargo fmt the operator-release binary
All checks were successful
Run Check Script / check (pull_request) Successful in 2m23s
87e142c73d
Collapses a chained `current_dir()?.join(..)` per rustfmt's
preferred layout. Caught by ./build/check.sh's fmt step; no
behavior change.
refactor: use tracing instead of eprintln in operator-release
All checks were successful
Run Check Script / check (pull_request) Successful in 2m26s
b29d466240
eprintln! was the wrong tool for a long-running CLI invoked from
shell scripts and CI. The rest of the codebase (operator,
fleet-e2e harness, etc.) uses tracing::info!, and a release tool
should honor RUST_LOG.

Initializes a stderr tracing subscriber defaulting to info so the
progress lines show up without configuration, while still letting
operators silence (`RUST_LOG=warn`) or expand (`RUST_LOG=debug`)
the output. The final summary uses structured fields (image=,
chart=) so downstream tools can grep for them deterministically.
refactor: app-scoped release binary harmony-fleet-release
All checks were successful
Run Check Script / check (pull_request) Successful in 2m41s
Release harmony-fleet-operator (image + chart) / release (push) Successful in 4m38s
81bf8a9257
Renames `harmony-fleet-operator-release` to `harmony-fleet-release`
and adds a `--component <operator|agent|nats-callout>` flag. The
binary's surface is now app-scoped (one binary per app, all fleet
components behind a flag) rather than component-scoped (one binary
per component), matching ADR-023's "deploy binary statically lists
its supported components" guidance.

Why: adding the agent and nats-callout release pipelines later
would otherwise mean three near-identical binaries with copy-pasted
docker/helm orchestration. Folding them under one binary keeps the
shared 90% in one place and reduces each new component to:

  - a new `Component` enum variant
  - a `Component::spec` arm naming the Dockerfile + image
  - a `hydrate_chart` arm pointing at the component's `build_chart`

`agent` and `nats-callout` variants exist today but bail with an
actionable error pointing at the roadmap; this keeps `--help`
honest about what's coming without lying about what works.

The per-component `release.sh` wrapper pattern stays: each
component's script (today `fleet/harmony-fleet-operator/release.sh`,
tomorrow agent's and callout's) is a 1-line wrapper that pre-fills
`--component`. This lets a tag like `harmony-fleet-operator-v0.1.0`
route to the right component via the existing CI workflow without
the operator having to remember a flag.

File renamed via `git mv` so blame history is preserved. Verified
on k3d with `--component operator --no-push`: produces the same
image + chart pair as before.
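
A rough sketch of the `Component` surface described above, using only the standard library. The `ComponentSpec` struct, the Dockerfile path, and the error wording are assumptions for illustration; the real binary presumably parses the flag through its CLI library rather than a hand-written `FromStr`.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Component {
    Operator,
    Agent,
    NatsCallout,
}

/// Hypothetical per-component build inputs.
#[derive(Debug, PartialEq, Eq)]
pub struct ComponentSpec {
    pub dockerfile: &'static str,
    pub image: &'static str,
}

impl FromStr for Component {
    type Err = String;

    /// Parses the value given to `--component`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "operator" => Ok(Component::Operator),
            "agent" => Ok(Component::Agent),
            "nats-callout" => Ok(Component::NatsCallout),
            other => Err(format!(
                "unknown component `{}`; expected operator, agent, or nats-callout",
                other
            )),
        }
    }
}

impl Component {
    /// Names the Dockerfile and image for a component, or fails with an
    /// actionable error for components that are not released yet.
    pub fn spec(self) -> Result<ComponentSpec, String> {
        match self {
            Component::Operator => Ok(ComponentSpec {
                // Assumed path, next to the component's release.sh.
                dockerfile: "fleet/harmony-fleet-operator/Dockerfile",
                image: "hub.nationtech.io/harmony/harmony-fleet-operator",
            }),
            Component::Agent | Component::NatsCallout => Err(format!(
                "release for {:?} is not implemented yet; see the roadmap",
                self
            )),
        }
    }
}
```

Adding a component then means one new variant plus one new arm in `spec` (and, in the real binary, one in `hydrate_chart`), which is the per-component cost the commit message lists.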
Add `ingress_class_name: Option<String>` so callers can pin the
class (OKD `openshift-default`, vanilla `nginx`/`traefik`) or
hand `None` to fall through to the cluster's default IngressClass.

Gate the global + redis `securityContext.runAsUser: null` on the
`openshift` flag. The null was an OKD restricted-v2 SCC trick;
on vanilla k8s the same combination (no UID + the chart's
`runAsNonRoot: true`) makes redis CrashLoop with "image will run
as root". Off OKD, emit `{}` so the chart's own defaults stay
in force.

Single pre-existing caller in packaging_deployment.rs preserved
on the OKD path.
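
The gating described above might be sketched like this. Both function names are hypothetical and the output is a plain YAML fragment; where these fragments land in the chart's values is not shown here, and `ingressClassName` is used only because it is the field name in the Kubernetes Ingress spec.

```rust
/// Value for a `securityContext` override.
/// On OKD, `runAsUser: null` clears the chart's fixed UID so the
/// restricted-v2 SCC can assign one; elsewhere `{}` leaves the chart's
/// own defaults in force.
pub fn security_context_yaml(openshift: bool) -> &'static str {
    if openshift {
        "{ runAsUser: null }"
    } else {
        "{}"
    }
}

/// Renders an `ingressClassName` line when a class is pinned, or nothing
/// so the cluster's default IngressClass applies.
pub fn ingress_class_line(ingress_class_name: Option<&str>) -> String {
    match ingress_class_name {
        Some(class) => format!("ingressClassName: {}\n", class),
        None => String::new(),
    }
}
```

A caller on OKD would pass `true` and `Some("openshift-default")`; a vanilla cluster would pass `false` and either a concrete class such as `Some("nginx")` or `None`.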
feat(fleet): Argo-based CD path for the fleet operator
Some checks failed
Run Check Script / check (pull_request) Failing after 56s
445a24f34d
Adds `FleetArgoOperatorAppScore` (Argo `Application` CR pointing
at `oci://hub.nationtech.io/harmony/harmony-fleet-operator:<tag>`,
plus the operator credentials Secret which lives outside the
chart) and `FleetArgoScore` (bootstrap Argo CD via the existing
`ArgoHelmScore` and compose the operator app on top).

CLI gains `--use-argo` to swap the operator path from
direct-helm to Argo and `--operator-chart-version` as the single
knob CI re-runs to deploy/upgrade/rollback. Re-running with a
different version patches the Application's `targetRevision`;
Argo syncs. Roll-forward only.

NATS + agent stay on the direct path in v1; NATS-via-Argo and
callout-via-Argo will follow as their charts and deploy stories
land. Flattens `harmony_cli::Args` into the binary's CliConfig
so `--yes` makes it through one argv parse (without this the
binary refuses to run non-interactively).

Live tested on a fresh k3d cluster against the published
`hub.nationtech.io/harmony/harmony-fleet-operator:0.0.1` artifact:
Argo installs, Application reconciles into `fleet-system`, the
operator pod runs (it then crash-loops on NATS auth — orthogonal,
same behavior as direct-helm without credentials).
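
The "re-run with a different version" step amounts to changing one field on the Application. A minimal sketch, assuming a single-source Application whose version lives at `spec.source.targetRevision`: the function below builds a JSON merge patch for that field. `target_revision_patch` is a hypothetical name, and the real Score may apply the whole manifest rather than a patch.

```rust
/// Builds a JSON merge patch that points a single-source Argo CD
/// Application at a new chart version.
///
/// The version is restricted to characters that are safe to embed in JSON
/// without escaping (alphanumerics, `.`, `-`, `+`).
pub fn target_revision_patch(chart_version: &str) -> Result<String, String> {
    let valid = !chart_version.is_empty()
        && chart_version
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '-' | '+'));
    if !valid {
        return Err(format!("invalid chart version `{}`", chart_version));
    }
    Ok(format!(
        r#"{{"spec":{{"source":{{"targetRevision":"{}"}}}}}}"#,
        chart_version
    ))
}
```

Once `targetRevision` changes, Argo CD notices the difference between the desired and live chart version; whether it syncs automatically depends on the Application's sync policy.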
johnride reviewed 2026-05-28 01:02:04 +00:00
johnride left a comment
Author
Owner

This is about 5x more code than it should be, and 10x more documentation/comments. Do better.

@@ -45,0 +51,4 @@
# wants (e.g. `harmony-fleet-operator-v0.1.0` → `v0.1.0`). On
# manual workflow_dispatch the operator passes `version`
# directly.
- name: Resolve version
Author
Owner

This should be done in Rust. We don't want to depend on bash inlined in YAML: it locks us in, it isn't reusable, and it is brittle.

@@ -45,0 +75,4 @@
# `harmony` workspace from scratch in the Dockerfile's builder
# stage. cargo-chef + `cache-from: type=gha` would help once
# build time becomes the bottleneck.
- name: Build and push image + chart
Author
Owner

It would be better to resolve the version in the bash script here; at least that is not crappy inline bash in CI config YAML that cannot be reused anywhere. Ideally, though, it should happen in Rust: Harmony is an infrastructure tool, so it makes sense for it to understand git well enough to resolve tags and similar refs.

@@ -0,0 +1,418 @@
//! Argo-CD-managed deploy for the fleet stack.
Author
Owner

This makes a simple thing hard and complicated. We already have an Argo module; just use it. Don't rebuild another one on top. It is a complete waste of code, time, and documentation.

@@ -0,0 +40,4 @@
name = "harmony-fleet-release",
about = "Build and push a fleet component's image + helm chart for a tagged release"
)]
struct Cli {
Author
Owner

We will have to make this simpler. We should not have to build a new CLI for every component of every app using Harmony. This should be a macro call or a simple function call; I don't know which yet, but definitely not a 270-line file.

johnride force-pushed ci/fleet-argo-cd from 445a24f34d to f4fd5d312a 2026-05-28 01:23:03 +00:00 Compare
johnride added 1 commit 2026-05-28 01:38:08 +00:00
ci(fleet-operator): extract version resolution to a shared bash script
All checks were successful
Run Check Script / check (pull_request) Successful in 2m22s
80b3cf1c31
Workflow yaml had a 12-line inline bash block computing the release
version from `GITHUB_REF_NAME` or `inputs.version`. Moves the
logic into `.gitea/scripts/resolve-release-version.sh` so the
workflow yaml is back to one-line invocations and the resolver is
reusable by sibling component workflows (agent, callout).

This is the interim form. The real fix is a harmony Rust binary
that understands git refs directly — see PR thread on framework-
level build/package/release ownership.
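
For comparison with the interim bash resolver, the same logic is small in Rust. This is a sketch under assumptions, not the planned harmony binary: the function name, the rule that a manual version wins over the tag, and the error messages are all illustrative. It follows the workflow comment's example of `harmony-fleet-operator-v0.1.0` resolving to `v0.1.0`.

```rust
/// Resolves the release version for a component.
///
/// A non-empty manual `input_version` (workflow_dispatch) wins; otherwise
/// the version is taken from a tag ref named `<component>-v*` by stripping
/// the `<component>-` prefix.
pub fn resolve_release_version(
    component: &str,
    ref_name: Option<&str>,
    input_version: Option<&str>,
) -> Result<String, String> {
    if let Some(v) = input_version.map(str::trim).filter(|v| !v.is_empty()) {
        return Ok(v.to_string());
    }
    let ref_name =
        ref_name.ok_or_else(|| "no tag ref and no manual version given".to_string())?;
    let prefix = format!("{}-", component);
    match ref_name.strip_prefix(prefix.as_str()) {
        // Require at least one character after the leading `v`.
        Some(v) if v.starts_with('v') && v.len() > 1 => Ok(v.to_string()),
        _ => Err(format!(
            "ref `{}` does not match `{}-v*`",
            ref_name, component
        )),
    }
}
```

In CI, `ref_name` would come from `GITHUB_REF_NAME` and `input_version` from `inputs.version`; taking the component as a parameter is what would let sibling workflows (agent, callout) share the function.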
johnride added 1 commit 2026-05-28 02:25:41 +00:00
docs(adr): draft 012-1 — release architecture mechanism
All checks were successful
Run Check Script / check (pull_request) Successful in 2m22s
1349739ed7
Clarification + concrete mechanism for ADR-012 (Project Delivery
Automation). Pulls together two prior attempts (modules/application
RustWebapp + the per-component harmony-fleet-release binary) and
proposes a unified shape: release is a Score driven by Topology
capabilities (ContainerBuilder, OciRegistry, HelmRegistry,
ContinuousDelivery), composed alongside DeployScore /
MonitoringScore into the opinionated pipeline ADR-012 specified.

Kept in drafts as 012-1 rather than a fresh ADR-025 — this
implements ADR-012's intent, doesn't compete with it. Open
questions deliberately left open for further 012-N follow-ups.
johnride closed this pull request 2026-06-01 15:39:20 +00:00
All checks were successful
Run Check Script / check (pull_request) Successful in 2m22s

Reference: NationTech/harmony#301