harmony/plan.md
Jean-Gabriel Gill-Couture 9d2308eca6
Merge remote-tracking branch 'origin/master' into feature/kvm-module
2026-03-22 10:02:10 -04:00

# Final Plan: S3-Backed Asset Management for Harmony

## Context Summary
Harmony is an infrastructure-as-code framework where Scores (desired state) are interpreted against Topologies (infrastructure capabilities). The existing Url enum (harmony_types/src/net.rs:96) already has LocalFolder(String) and Url(url::Url) variants, but the Url variant is unimplemented (todo!()) in both OPNsense TFTP and HTTP infra layers. Configuration in Harmony follows a "schema in Git, state in the store" pattern via harmony_config -- compile-time structs with values resolved from environment, secret store, or interactive prompt.
## Findings
1. openshift-install is the only OKD binary actually invoked from Rust code (bootstrap_02_bootstrap.rs:139,162). oc and kubectl in data/okd/bin/ are never used by any code path.
2. The Url::Url variant is the designed extension point. The architecture explicitly anticipated remote URL sources but left them as todo!().
3. The k3d crate has a working lazy-download pattern (DownloadableAsset with SHA256 checksum verification, local caching, and HTTP download). This should be generalized.
4. The manual SCP workaround (ipxe.rs:126, bootstrap_02_bootstrap.rs:230) exists because russh is too slow for large file transfers. The S3 approach eliminates this entirely -- the OPNsense box pulls from S3 over HTTP instead.
5. All data/ paths are hardcoded as ./data/... in bootstrap_02_bootstrap.rs:84-88 and ipxe.rs:73.
---
## Phase 1: Create a shared DownloadableAsset crate
Goal: Generalize the k3d download pattern into a reusable crate.
- Extract k3d/src/downloadable_asset.rs into a new shared crate (e.g., harmony_asset or add to harmony_types)
- The struct stays simple: { url, file_name, checksum, local_cache_path }
- Behavior: check local cache first (by checksum), download if missing, verify checksum after download
- The k3d crate becomes a consumer of this shared code
This is a straightforward refactor of ~160 lines of existing, tested code.
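The cache-check/download/verify behavior described above can be sketched as follows. The struct fields come from the plan; the method name and the injected `hash`/`download` closures are illustrative stand-ins for what the real crate would do with `sha2` and an HTTP client such as `reqwest`:

```rust
use std::fs;
use std::path::PathBuf;

/// Sketch of the generalized asset type (fields per the plan above).
struct DownloadableAsset {
    url: String,
    file_name: String,
    checksum: String,
    local_cache_path: PathBuf,
}

impl DownloadableAsset {
    fn cached_path(&self) -> PathBuf {
        self.local_cache_path.join(&self.file_name)
    }

    /// Return the cached file if its checksum matches; otherwise download,
    /// verify, and cache it. `hash` and `download` are injected here only to
    /// keep the sketch self-contained (SHA-256 and HTTP in practice).
    fn ensure<F, D>(&self, hash: F, download: D) -> Result<PathBuf, String>
    where
        F: Fn(&[u8]) -> String,
        D: Fn(&str) -> Vec<u8>,
    {
        let path = self.cached_path();
        if let Ok(bytes) = fs::read(&path) {
            if hash(&bytes) == self.checksum {
                return Ok(path); // cache hit, checksum verified
            }
        }
        let bytes = download(&self.url);
        if hash(&bytes) != self.checksum {
            return Err(format!("checksum mismatch for {}", self.url));
        }
        fs::create_dir_all(&self.local_cache_path).map_err(|e| e.to_string())?;
        fs::write(&path, &bytes).map_err(|e| e.to_string())?;
        Ok(path)
    }
}
```

Injecting the hash and download functions also keeps the crate trivially unit-testable without network access.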
## Phase 2: Define asset metadata as compile-time configuration
Goal: Replace hardcoded ./data/... paths with typed configuration, following the harmony_config pattern.
Create config structs for each asset group:
```rust
#[derive(Config, Serialize, Deserialize, JsonSchema, InteractiveParse)]
struct OkdInstallerConfig {
    pub openshift_install_url: String,
    pub openshift_install_sha256: String,
    pub scos_kernel_url: String,
    pub scos_kernel_sha256: String,
    pub scos_initramfs_url: String,
    pub scos_initramfs_sha256: String,
    pub scos_rootfs_url: String,
    pub scos_rootfs_sha256: String,
}

#[derive(Config, Serialize, Deserialize, JsonSchema, InteractiveParse)]
struct PxeAssetsConfig {
    pub centos_install_img_url: String,
    pub centos_install_img_sha256: String,
    // ... etc
}
```
These structs live in the OKD module. On first run, harmony_config::get_or_prompt will prompt for the S3 URLs and checksums; after that, the values are persisted in the config store (OpenBao or local file). This means:
- No manifest file to maintain separately
- URLs/checksums can be updated per-team/per-environment without code changes
- Defaults can be compiled in for convenience
## Phase 3: Implement Url::Url in the OPNsense infra layer
Goal: Make the OPNsense TFTP/HTTP server pull files from remote URLs.
In harmony/src/infra/opnsense/http.rs and tftp.rs, implement the Url::Url(url) match arm:
- SSH into the OPNsense box
- Run `fetch -o /usr/local/http/{path} {url}` (FreeBSD/OPNsense native) or `curl -o ...`
- This completely replaces the manual SCP workaround for internet-connected environments
For serve_files with a folder of remote assets: the Score would pass individual Url::Url entries rather than a single Url::LocalFolder. This may require the trait to accept a list of URLs or an iterator pattern.
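The command the `Url::Url(url)` arm would run over SSH can be sketched as below. The function name and destination layout are illustrative, not the actual harmony trait signature; `/usr/local/http` is the serve root mentioned above:

```rust
/// Build the shell command to run on the OPNsense box: prefer FreeBSD's
/// native `fetch`, falling back to `curl` if it is installed.
fn remote_fetch_command(url: &str, dest_dir: &str, file_name: &str) -> String {
    format!(
        "fetch -o {dest_dir}/{file_name} {url} || curl -fLo {dest_dir}/{file_name} {url}"
    )
}
```

Building the command as a plain string keeps it easy to log and to replay by hand when debugging connectivity from the OPNsense box.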
## Phase 4: Refactor OKD modules
Goal: Wire up the new patterns in the OKD bootstrap flow.
In bootstrap_02_bootstrap.rs:
- openshift-install: Use the lazy-download pattern (like k3d). On execute(), resolve OkdInstallerConfig from harmony_config, download openshift-install to a local cache, invoke it.
- SCOS images: Pass Url::Url(scos_kernel_url) etc. to the StaticFilesHttpScore, which triggers the OPNsense box to fetch them from S3 directly. No more SCP.
- Remove oc and kubectl from data/okd/bin/ (they are unused).
In ipxe.rs:
- TFTP boot files (ipxe.efi, undionly.kpxe): These are small (~1MB). Either keep them in git (they're not the size problem) or move to S3 and lazy-download.
- HTTP files folder: Replace the folder_to_serve: None / SCP workaround with individual Url::Url entries for each asset.
- Remove the inquire::Confirm SCP prompts.
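Once the binary has been resolved and cached (Phases 1 and 2), the invocation step reduces to a plain `Command`. A minimal sketch, where the helper name is hypothetical but the `create cluster --dir` arguments mirror openshift-install's documented CLI:

```rust
use std::path::Path;
use std::process::{Command, ExitStatus};

/// Run the locally cached openshift-install binary against an install dir.
fn invoke_openshift_install(binary: &Path, install_dir: &Path) -> std::io::Result<ExitStatus> {
    Command::new(binary)
        .args(["create", "cluster", "--dir"])
        .arg(install_dir)
        .status()
}
```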
## Phase 5: Upload assets to Ceph S3
Goal: Populate the S3 bucket and configure defaults.
- Upload all current data/ binaries to your Ceph S3 with a clear path scheme: harmony-assets/okd/v{version}/openshift-install, harmony-assets/pxe/centos-stream-9/install.img, etc.
- Set public-read ACL (or document presigned URL generation)
- Record the S3 URLs and SHA256 checksums
- These become the default values for the config structs (can be hardcoded as defaults or documented)
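A minimal sketch of the checksum-and-upload step. The `aws` CLI with `--endpoint-url` is one common way to target Ceph RGW (s3cmd or mc work equally well); the endpoint is a placeholder, a dummy file stands in for the real binary, and the upload itself is shown commented:

```shell
# Stand-in file so the checksum step is self-contained; in practice this is
# the real binary already present under data/.
mkdir -p data/okd/bin
printf 'placeholder' > data/okd/bin/openshift-install
# Record the SHA-256 that becomes the config-struct default.
sha256sum data/okd/bin/openshift-install | awk '{print $1}' > openshift-install.sha256
cat openshift-install.sha256
# Upload (commented out; endpoint is an assumption, path scheme per the plan):
# aws --endpoint-url https://<ceph-rgw> s3 cp \
#     data/okd/bin/openshift-install \
#     s3://harmony-assets/okd/v{version}/openshift-install --acl public-read
```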
## Phase 6: Remove LFS, clean up git history, publish to GitHub
Goal: Make the repo publishable.
- Remove all LFS-tracked files from the repo
- Update .gitattributes to remove LFS filters
- Keep data/ in .gitignore (it becomes a local cache directory)
- Optionally use git filter-repo or BFG to clean LFS objects from history
- The repo is now small enough for GitHub (code + templates + small configs only)
- Document the setup: "after clone, run the program and it will prompt for asset URLs or download from defaults"
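The `.gitattributes` and `.gitignore` cleanup can be sketched as follows. The sample `.gitattributes` content is illustrative, and the history rewrite is shown commented because it is destructive and requires git-filter-repo to be installed:

```shell
# Illustrative .gitattributes with an LFS rule (stands in for the real file).
printf '*.img filter=lfs diff=lfs merge=lfs -text\n*.md text\n' > .gitattributes
# Drop the LFS filter lines, keep everything else.
sed -i '/filter=lfs/d' .gitattributes
# data/ becomes a local cache directory, so ignore it.
grep -qx 'data/' .gitignore 2>/dev/null || echo 'data/' >> .gitignore
# History rewrite (destructive; back up the repo and coordinate with the team):
# git filter-repo --strip-blobs-bigger-than 10M
cat .gitattributes
```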
---
## Risks and Mitigations

| Risk | Mitigation |
|---|---|
| OPNsense can't reach S3 (network issues) | `Url::LocalFolder` remains as a fallback; populate local `data/` manually for air-gapped environments |
| S3 bucket permissions misconfigured | Test with `curl` from OPNsense before wiring into code |
| Large download times during bootstrap | Progress reporting in the `fetch`/`curl` command; files are cached after first download |
| Breaking change to existing workflows | Phase the rollout; keep `LocalFolder` working throughout |
## What About Upstream URL Resilience?
You mentioned upstream repos sometimes get cleaned up. The S3 bucket is your durable mirror. The config structs could optionally include upstream_url as a fallback source, but the primary retrieval should always be from your S3. Periodically re-uploading new versions to S3 (when upstream releases new images) is a manual but infrequent operation.
---
## Order of Execution
I'd suggest this order:
1. Phase 5 first (upload to S3) -- this is independent of code and gives you the URLs to work with
2. Phase 1 (shared DownloadableAsset crate) -- small, testable refactor
3. Phase 2 (config structs) -- define the schema
4. Phase 3 (Url::Url implementation) -- the core infra change
5. Phase 4 (OKD module refactor) -- wire it all together
6. Phase 6 (LFS removal + GitHub) -- final cleanup
Does this plan align with your vision? Any aspect you'd like me to adjust or elaborate on before implementation?