feat(fleet-deploy): smoke-test contract as a Score companion #292
Open
johnride wants to merge 3 commits from feat/smoke-test-contract into NationTech:master
Phase 0 of the smoke-test contract from ADR-023 P4 ("deploy returns
only after smoke-test success"). Lands inside harmony-fleet-deploy
under companion/smoke/ so the shape can be validated against a real
consumer before promoting to a top-level crate.
Three small types + a wrapper, zero edits to the Score / Interpret /
Maestro public API:

- Probe: single-attempt unit; classifies its own observation
  as Ok / Retry / Fatal via ProbeAttempt.
- SmokeSuite: ordered list of (probe, RetryPolicy) stages; sequential
  by design.
- SmokeTest: companion trait paired with a Score by associated type;
  `SM: SmokeTest<T, Score = S>` is the compile-time lock
  (ADR-024 §2 / JG's compile-time-feedback principle).
- deploy / deploy_with_smoke: free functions binding Score::interpret
  to an optional SmokeTest. The smoke variant returns only after the
  suite passes; a failed probe is DeployError::SmokeFailed
  with the full report attached.
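
To make the type lock concrete, here is a minimal sketch of how the `SM: SmokeTest<T, Score = S>` bound could reject a mismatched pair at compile time. It is not the code in this PR: the trait bodies, the `run` method, the `InterpretFailed` variant, the `String` error payloads, and the `LocalTopology` / `WebScore` / `WebSmoke` demo types are all assumptions made for illustration.

```rust
// Stand-ins for the real Harmony traits. Method names and signatures here
// are hypothetical; only the trait names and the type lock come from the PR.
pub trait Topology {}

pub trait Score<T: Topology> {
    /// Stand-in for the real interpret step.
    fn interpret(&self, topology: &T) -> Result<(), String>;
}

/// Companion trait: the associated type pins each smoke test to one Score.
pub trait SmokeTest<T: Topology> {
    type Score: Score<T>;
    /// Hypothetical entry point; Err carries the failure report.
    fn run(&self, topology: &T) -> Result<(), String>;
}

#[derive(Debug, PartialEq)]
pub enum DeployError {
    /// Assumed variant for a failed interpret step.
    InterpretFailed(String),
    /// Named in the PR description; the String payload is an assumption.
    SmokeFailed(String),
}

/// Plain deploy: interpret only, no smoke test.
pub fn deploy<T: Topology, S: Score<T>>(score: &S, topology: &T) -> Result<(), DeployError> {
    score.interpret(topology).map_err(DeployError::InterpretFailed)
}

/// Smoke variant: returns Ok only after the smoke test passes.
/// `Score = S` is the lock: a smoke test written for another Score
/// fails to type-check at the call site.
pub fn deploy_with_smoke<T, S, SM>(score: &S, smoke: &SM, topology: &T) -> Result<(), DeployError>
where
    T: Topology,
    S: Score<T>,
    SM: SmokeTest<T, Score = S>,
{
    deploy(score, topology)?;
    smoke.run(topology).map_err(DeployError::SmokeFailed)
}

// Demo types, purely illustrative.
pub struct LocalTopology;
impl Topology for LocalTopology {}

pub struct WebScore;
impl Score<LocalTopology> for WebScore {
    fn interpret(&self, _topology: &LocalTopology) -> Result<(), String> {
        Ok(())
    }
}

pub struct WebSmoke {
    pub healthy: bool,
}
impl SmokeTest<LocalTopology> for WebSmoke {
    type Score = WebScore;
    fn run(&self, _topology: &LocalTopology) -> Result<(), String> {
        if self.healthy {
            Ok(())
        } else {
            Err("port 8080 unreachable".to_string())
        }
    }
}
```

Calling `deploy_with_smoke(&WebScore, &WebSmoke { healthy: true }, &LocalTopology)` type-checks because `WebSmoke::Score` is `WebScore`; passing a smoke test whose associated type names a different Score would be a compile error rather than a runtime surprise.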
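Cardinality choices follow JG's *For the Love of Compilers* talk:
ProbeName is a validated newtype (rejected: empty / control-chars / >128
bytes), ProbeAttempt is a three-arm sum because "not ready yet" and
"definitively no" drive different orchestration paths, ProbeFailure
splits Timeout from Rejected because the operator actions are
different. RetryPolicy::polling panics on a zero interval at
construction rather than spinning the executor in production.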
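
The cardinality choices above can be sketched as follows. This is an illustration under assumptions, not the PR's code: the `ProbeNameError` enum, the `new` / `as_str` methods, the `MAX_BYTES` constant, the enum payloads, and the `RetryPolicy` field names are hypothetical; only the type names, the three rejection rules, and the panic on a zero interval come from the description.

```rust
use std::time::Duration;

/// Validated newtype: a ProbeName can only be built through `new`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProbeName(String);

/// Hypothetical error type, one variant per rejection rule.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeNameError {
    Empty,
    ControlChar,
    TooLong,
}

impl ProbeName {
    /// Limit taken from the description (">128 bytes" is rejected).
    pub const MAX_BYTES: usize = 128;

    pub fn new(raw: &str) -> Result<Self, ProbeNameError> {
        if raw.is_empty() {
            return Err(ProbeNameError::Empty);
        }
        // `len()` counts bytes, not characters, matching the byte limit.
        if raw.len() > Self::MAX_BYTES {
            return Err(ProbeNameError::TooLong);
        }
        if raw.chars().any(char::is_control) {
            return Err(ProbeNameError::ControlChar);
        }
        Ok(Self(raw.to_owned()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Three arms: "not ready yet" (Retry) and "definitively no" (Fatal)
/// drive different orchestration paths. Payloads are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeAttempt {
    Ok,
    Retry(String),
    Fatal(String),
}

/// Timeout and Rejected call for different operator actions.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeFailure {
    Timeout,
    Rejected(String),
}

/// Field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RetryPolicy {
    pub interval: Duration,
    pub timeout: Duration,
}

impl RetryPolicy {
    /// Panics on a zero interval so a misconfigured policy fails at
    /// construction instead of busy-looping later.
    pub fn polling(interval: Duration, timeout: Duration) -> Self {
        assert!(
            !interval.is_zero(),
            "RetryPolicy::polling: interval must be non-zero"
        );
        Self { interval, timeout }
    }
}
```

Pushing validation into the constructors means downstream code never has to re-check a `ProbeName` or guard against a zero polling interval.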
One concrete probe ships in this PR — TcpReachable — to validate the
trait shape against real network I/O. run_probe orchestrates retry +
timeout outside the probe itself so each probe stays single-attempt
and unit-testable.
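
The split between a single-attempt probe and an outer retry loop could look like the sketch below. It is deliberately synchronous and std-only so it stays self-contained; the real code presumably runs on an async executor. The `attempt` method, the `run_probe` signature, the mapping of I/O error kinds to Retry / Fatal, the mapping of Fatal to Rejected, and the `ReadyAfter` test double are all assumptions.

```rust
use std::cell::Cell;
use std::io::ErrorKind;
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

pub enum ProbeAttempt {
    Ok,
    Retry(String),
    Fatal(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum ProbeFailure {
    Timeout,
    Rejected(String),
}

/// A probe makes exactly one attempt and classifies what it observed.
pub trait Probe {
    fn attempt(&self) -> ProbeAttempt;
}

/// Single TCP connect; field names are assumptions.
pub struct TcpReachable {
    pub addr: SocketAddr,
    pub connect_timeout: Duration,
}

impl Probe for TcpReachable {
    fn attempt(&self) -> ProbeAttempt {
        match TcpStream::connect_timeout(&self.addr, self.connect_timeout) {
            Ok(_) => ProbeAttempt::Ok,
            // Assumed classification: refused / timed out means "not ready yet".
            Err(e) if matches!(e.kind(), ErrorKind::ConnectionRefused | ErrorKind::TimedOut) => {
                ProbeAttempt::Retry(e.to_string())
            }
            Err(e) => ProbeAttempt::Fatal(e.to_string()),
        }
    }
}

/// Test double: reports Retry until the `ready_on`-th attempt.
pub struct ReadyAfter {
    calls: Cell<u32>,
    ready_on: u32,
}

impl ReadyAfter {
    pub fn new(ready_on: u32) -> Self {
        Self { calls: Cell::new(0), ready_on }
    }
}

impl Probe for ReadyAfter {
    fn attempt(&self) -> ProbeAttempt {
        let n = self.calls.get() + 1;
        self.calls.set(n);
        if n >= self.ready_on {
            ProbeAttempt::Ok
        } else {
            ProbeAttempt::Retry(format!("attempt {}: not ready", n))
        }
    }
}

/// Retry and timeout live here, outside the probe.
/// Returns the number of attempts made on success.
pub fn run_probe(probe: &dyn Probe, interval: Duration, timeout: Duration) -> Result<u32, ProbeFailure> {
    assert!(!interval.is_zero(), "interval must be non-zero");
    let deadline = Instant::now() + timeout;
    let mut attempts: u32 = 0;
    loop {
        attempts += 1;
        match probe.attempt() {
            ProbeAttempt::Ok => return Ok(attempts),
            ProbeAttempt::Fatal(reason) => return Err(ProbeFailure::Rejected(reason)),
            ProbeAttempt::Retry(_) => {
                // Give up if the next attempt would start past the deadline.
                if Instant::now() + interval > deadline {
                    return Err(ProbeFailure::Timeout);
                }
                thread::sleep(interval);
            }
        }
    }
}
```

Because the probe holds no retry state, a unit test can drive `run_probe` with a double such as `ReadyAfter` and never open a socket.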
Stage progress is emitted via tracing::info_span! / info! today.
Adding HarmonyEvent::SmokeStage* variants is deferred to Phase 1
(Score / Interpret stay untouched in Phase 0).
Phase 1 follow-ups: HttpHealthy, K8sPodReady, NatsKvKeyExists probes;
a real FleetOperatorSmokeTest composed of those; optional dashboard
event variants. None of those touch the contract — that's the point
of locking the shape here.
Tests: 21 new (4 in tcp::tests, 8 in probe::tests, 3 in suite::tests,
4 in deploy::tests). Full crate: 39 / 39 passing. No regressions.
build/check.sh equivalents: cargo check --all-targets --all-features
clean, cargo fmt --check clean, cargo clippy with zero findings on
the new module.
Authoritative plan for the last mile before fleet ships to a real customer. Picks up where v0_2_plan.md left the chapter structure.

Twelve chapters, organized in execution order:

1. Dashboard role enforcement (security gap, do right now)
2. Operator restart + aggregator recovery (more critical than smoke)
3. Application log forwarding companion (dashboard utility)
4. Agent self-upgrade, NATS-coordinated, systemd-resident
5. Graceful deployment upgrade (roll-forward only; customer ask)
6. Init containers in PodmanV0Score
7. System upgrade, rollback deferred to v0.4
8. Secrets via Zitadel + OpenBao (blocked on harmony_secret work)
9. Agent time-drift verification
10. Phase 1 smoke wiring
11. CI yaml minimization (longer-term)
12. NATS callout CI hardening (minimal)

Customer constraints baked in: deployments are roll-forward only (no auto-rollback on Deployment failure); the system rollback half of the upgrade ADR is deferred to v0.4 (snapshot is created but not used for revert in v0.3); secrets must go through Zitadel + OpenBao (no plaintext shortcut).

Includes:

- feature checklist as a status table (14 items),
- sequencing table with ordering rationale,
- per-chapter goal / current state with file:line citations / plan / open questions / "done when",
- out-of-scope table with target version + reason,
- cross-cutting open questions Q1–Q5.

Format follows the user's "tables over prose" preference: every multi-item section is either a table or bold-led bullets with nested supporting detail. Scannable at three depths (30-second scroll for bold leads, 2-minute read for nested detail, deep read with code where it matters).

This all looks correct, but I have a feeling we could find a simpler design that does not involve the whole smokesuite -> smoke -> probe -> probeoutcome -> probestatus boilerplate.
It's not too bad, I think the design is mostly sound, the genericity on T is sound and makes things safe to run.
Aside from the associated type on Score I don't have any problem with this design. I'm just not sure it is the correct approach.
Let's take a step back and explore a few out of the box ideas.
@@ -0,0 +52,4 @@
pub trait SmokeTest<T: Topology>: Send + Sync {
    /// The Score this smoke test verifies. The type lock means
    /// `SM::Score = S` is enforced at every call site.
    type Score: Score<T>;

Idea: associate an Interpret type instead? The idea is that we have many scores that point to the same interpret. Of course, locking to a score makes the code smaller and easier to understand, but it will inevitably lead to boilerplate and a lot of repetition when similar scores exist. For example, a smoke test on a HelmChartScore that validates the helm chart is ready would not work with a NatsHelmChartScore, as it is not the same type at the top level, but it would work with both if we use the Interpret type, which is the same for both.
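
One way to picture the Interpret-keyed alternative is the sketch below, in which a single smoke test serves two Scores that share an interpret. It rests on a large assumption: that `Score` exposes its interpret as an associated type, which may not match the real Harmony trait. The method names, the `String` return values, and the `K8sTopology` / `HelmChartInterpret` / `HelmReleaseReady` types are hypothetical; `HelmChartScore` and `NatsHelmChartScore` are the names from the comment, with made-up bodies.

```rust
pub trait Topology {}

pub trait Interpret<T: Topology> {
    /// Hypothetical: returns the name of what was deployed.
    fn execute(&self, topology: &T) -> Result<String, String>;
}

/// Assumption: the Score names its interpret as an associated type.
pub trait Score<T: Topology> {
    type Interpret: Interpret<T>;
    fn create_interpret(&self) -> Self::Interpret;
}

/// The smoke test is keyed on the Interpret, not on the Score.
pub trait SmokeTest<T: Topology> {
    type Interpret: Interpret<T>;
    fn run(&self, topology: &T) -> Result<(), String>;
}

/// Any Score whose interpret matches the smoke test's is accepted.
pub fn deploy_with_smoke<T, S, SM>(score: &S, smoke: &SM, topology: &T) -> Result<String, String>
where
    T: Topology,
    S: Score<T>,
    SM: SmokeTest<T, Interpret = <S as Score<T>>::Interpret>,
{
    let deployed = score.create_interpret().execute(topology)?;
    smoke.run(topology)?;
    Ok(deployed)
}

// Demo types, purely illustrative.
pub struct K8sTopology;
impl Topology for K8sTopology {}

pub struct HelmChartInterpret {
    release: String,
}
impl Interpret<K8sTopology> for HelmChartInterpret {
    fn execute(&self, _topology: &K8sTopology) -> Result<String, String> {
        Ok(self.release.clone())
    }
}

pub struct HelmChartScore {
    pub release: String,
}
impl Score<K8sTopology> for HelmChartScore {
    type Interpret = HelmChartInterpret;
    fn create_interpret(&self) -> HelmChartInterpret {
        HelmChartInterpret { release: self.release.clone() }
    }
}

/// A distinct top-level Score type that reuses the same interpret.
pub struct NatsHelmChartScore;
impl Score<K8sTopology> for NatsHelmChartScore {
    type Interpret = HelmChartInterpret;
    fn create_interpret(&self) -> HelmChartInterpret {
        HelmChartInterpret { release: "nats".to_string() }
    }
}

/// One smoke test, usable with both Scores above.
pub struct HelmReleaseReady;
impl SmokeTest<K8sTopology> for HelmReleaseReady {
    type Interpret = HelmChartInterpret;
    fn run(&self, _topology: &K8sTopology) -> Result<(), String> {
        Ok(())
    }
}
```

The trade-off is a looser lock: the compiler still rejects a smoke test written for an unrelated interpret, but it can no longer tell two Scores apart when they share one.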