feat: capture network intent at host discovery #267

Merged
stremblay merged 6 commits from feat/discover-networking into master 2026-04-21 16:20:49 +00:00

Extend the interactive discovery flow so operators record not just the install disk for each host but how it should be networked (bond + mode + interface blacklist), and persist that alongside the role mapping. Downstream scores can consume the intent later; storage now holds at most one mapping row per host.

What's new

  • New prompts after disk selection — when a host has ≥2 NICs: "Configure a bond?", bond member multi-select, bond mode picker (LACP, active-backup, balance-rr/xor, broadcast, balance-tlb/alb), blacklist confirm + multi-select. Skipped automatically for single-NIC hosts.
  • NetworkConfig on HostConfig — { bond: Option<BondConfig { interfaces, mode }>, blacklisted_interfaces }, stored as JSON in a new host_role_mapping.network_config column (migration 20260421000000).
  • Role mapping is now replace-on-write — save_role_mapping runs DELETE + INSERT in a single transaction, and discovery checks for an existing mapping first and prompts Update / Cancel, showing what's already there. Self-heals any pre-existing duplicate rows.
  • harmony_host_discovery example — single-host discovery on 192.168.40.0/24:25000, mirrors harmony_inventory_builder.
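
The shape of the new intent record can be sketched in plain Rust. This is a minimal, std-only illustration, not the PR's code: the field types, the local `LaggProtocol` stand-in (variant names taken from the later refactor commit), and the `should_prompt_for_network` helper are assumptions, and the real types additionally derive serde traits so they can be stored as JSON.

```rust
/// Local stand-in for the shared protocol enum. Variant names follow the
/// refactor commit in this PR; everything else here is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum LaggProtocol {
    Lacp,
    Failover,
    LoadBalance,
    RoundRobin,
}

/// Which interfaces form the bond, and how they cooperate.
#[derive(Debug, Clone, PartialEq)]
pub struct BondConfig {
    pub interfaces: Vec<String>,
    pub mode: LaggProtocol,
}

/// Operator intent captured at discovery time.
/// The default means "no bond, nothing blacklisted".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct NetworkConfig {
    pub bond: Option<BondConfig>,
    pub blacklisted_interfaces: Vec<String>,
}

/// Hypothetical helper: the networking prompts are only shown when the
/// host has at least two NICs.
pub fn should_prompt_for_network(nic_count: usize) -> bool {
    nic_count >= 2
}
```

A single-NIC host would skip the prompts and could simply carry `NetworkConfig::default()`.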

Discovery UX polish

  • PhysicalHost::summary() tightened: storage reads 485 GB [8 GB, 477 GB]; network lists every NIC as [ip, mac] with an N NICs: prefix (or NIC: for one); CPU survives a blank model without stray spaces.
  • New summary_short() (no NIC section) used in the Host: … header above each prompt.
  • Every prompt gets a uniform Host: <summary> header; NICs sorted by name (f0 before f1) at conversion time so display is stable and byte-equal between discoveries.
  • Chipset (vendor + name) from the inventory agent is promoted to a system-product-name label, so the first field of the summary shows "LENOVO 3136" instead of "Server".
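
The storage and network formats listed above can be illustrated with a small std-only sketch. The `Nic` struct, both function names, the use of whole-GB sizes, and the empty string for a host with no NICs are assumptions made for the example; the actual logic lives in `PhysicalHost::summary()`.

```rust
/// Hypothetical, simplified view of a NIC for display purposes.
pub struct Nic {
    pub ipv4: Option<String>,
    pub mac: String,
}

/// Total size, followed by per-disk sizes when there is more than one disk.
pub fn storage_summary(disks_gb: &[u64]) -> String {
    let total: u64 = disks_gb.iter().sum();
    if disks_gb.len() <= 1 {
        return format!("{total} GB");
    }
    let parts: Vec<String> = disks_gb.iter().map(|gb| format!("{gb} GB")).collect();
    format!("{total} GB [{}]", parts.join(", "))
}

/// Every NIC as "[ip, mac]" (or "[mac]" without IPv4), with a count prefix.
pub fn network_summary(nics: &[Nic]) -> String {
    let rendered: Vec<String> = nics
        .iter()
        .map(|nic| match &nic.ipv4 {
            Some(ip) => format!("[{ip}, {}]", nic.mac),
            None => format!("[{}]", nic.mac),
        })
        .collect();
    match nics.len() {
        0 => String::new(), // assumption: nothing is printed for zero NICs
        1 => format!("NIC: {}", rendered[0]),
        n => format!("{n} NICs: {}", rendered.join(", ")),
    }
}
```

With disks of 8 GB and 477 GB, `storage_summary` yields `485 GB [8 GB, 477 GB]`, matching the example in the list above.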

DB hygiene

  • physical_hosts.save() compares the incoming serialized bytes to the latest stored row and skips inserting when unchanged — no more unbounded row growth under continuous mDNS / repeated CIDR scans. Genuine changes still append a version row.
  • SQLite pool opens with journal mode DELETE, so no more .sqlite-wal / .sqlite-shm sidecar files next to the DB. Existing WAL-mode DBs are checkpointed and converted on next open.
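
The skip-when-unchanged rule can be modelled without a database. The sketch below uses an in-memory map in place of the SQLite table; `HostStore` and its methods are hypothetical names, and only the compare-to-latest logic is meant to mirror what the PR describes.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the versioned host table: host_id -> version rows.
#[derive(Default)]
pub struct HostStore {
    versions: HashMap<String, Vec<Vec<u8>>>,
}

impl HostStore {
    /// Append a new version unless the bytes equal the latest stored one.
    /// Returns true when a row was actually written.
    pub fn save(&mut self, host_id: &str, serialized: &[u8]) -> bool {
        let rows = self.versions.entry(host_id.to_string()).or_default();
        if rows.last().map(|latest| latest.as_slice()) == Some(serialized) {
            return false; // unchanged: skip the insert
        }
        rows.push(serialized.to_vec());
        true
    }

    /// Number of version rows currently stored for a host.
    pub fn version_count(&self, host_id: &str) -> usize {
        self.versions.get(host_id).map_or(0, |rows| rows.len())
    }
}
```

Because the comparison is byte-for-byte, it only deduplicates when serialization is deterministic, which is why the stable NIC ordering mentioned in this PR matters.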

Follow-up (not in this PR)

OKDSetupPersistNetworkBondScore / HostNetworkConfigurationScore still bond all detected interfaces implicitly — wiring them to read the new HostConfig.network_config is deliberately left for the next PR, once the on-disk shape has landed.

stremblay added 5 commits 2026-04-21 15:33:44 +00:00
Extend DiscoverHostForRoleScore with three new interactive prompts after
  the installation-disk selection:

  - "Configure a network bond?" (only when host has >= 2 NICs), followed by
    a multi-select of bond members (min 2) and a bond-mode picker
    (LACP / active-backup / balance-rr / balance-xor / broadcast /
    balance-tlb / balance-alb).
  - "Blacklist any remaining interface?", with candidates limited to NICs
    not already claimed by the bond.

  The answers are persisted as a JSON-encoded NetworkConfig on a new
  host_role_mapping.network_config column. HostConfig now exposes
  network_config alongside installation_device so downstream scores can
  honor the user's intent.

  Also adds a new harmony_host_discovery example that discovers a single
  host on 192.168.40.0/24:25000.
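
The two selection rules in this commit (a bond needs at least two members, and blacklist candidates exclude bond members) can be sketched as below. Both function names are hypothetical, and interface names are treated as plain strings for simplicity.

```rust
/// A bond selection is only accepted with at least two member interfaces.
pub fn valid_bond_selection(selected: &[String]) -> bool {
    selected.len() >= 2
}

/// NICs still eligible for the blacklist prompt: everything on the host
/// that was not already claimed by the bond. Input order is preserved.
pub fn blacklist_candidates(all_nics: &[String], bond_members: &[String]) -> Vec<String> {
    all_nics
        .iter()
        .filter(|nic| !bond_members.contains(*nic))
        .cloned()
        .collect()
}
```

If the operator declines the bond, `bond_members` is empty and every NIC remains a blacklist candidate.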
- PhysicalHost::summary() becomes terser and more informative:
    - Storage: "400 GB [8 GB, 477 GB]" (was "400 GB Storage (2 Disks [8 GB, 477 GB])").
      Single-disk collapses to just the total.
    - Network: list every NIC as "[ip, mac]" with a count prefix
      (e.g. "3 NICs: [192.168.40.10, 98:fa:9b:03:17:6f], [00:e0:ed:7a:ec:4d], ...").
      Single-NIC form drops the count and "s": "NIC: [ip, mac]".
      NICs without an IPv4 render as "[mac]".

  - Promote the inventory agent's Chipset { vendor, name } into a
    "system-product-name" label during host conversion (both MDNS and CIDR
    flows), so summary()'s first field shows "LENOVO 3136" instead of
    falling back to the HostCategory string ("Server"). Extracted into
    build_discovered_host_labels() to keep the two conversion sites in
    sync. When the chipset is blank, the old category fallback still
    applies.

  - Print a blank line before every interactive inquire prompt in the
    discovery flow (role pick, disk pick, bond confirm/multi-select/mode,
    blacklist confirm/multi-select) so prompts stand out from the
    preceding log output on the terminal.
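
The label promotion with its category fallback might look roughly like this. It is a sketch only: the `Chipset` struct here is a local stand-in, both function names are invented for the example, and treating whitespace-only fields as blank is an assumption.

```rust
/// Local stand-in for the inventory agent's chipset record.
pub struct Chipset {
    pub vendor: String,
    pub name: String,
}

/// Build the "system-product-name" label value, or None when the chipset
/// carries no usable text.
pub fn system_product_name(chipset: &Chipset) -> Option<String> {
    let joined = format!("{} {}", chipset.vendor.trim(), chipset.name.trim());
    let label = joined.trim();
    if label.is_empty() {
        None
    } else {
        Some(label.to_string())
    }
}

/// First field of the summary: the product label when present, otherwise
/// the host category string.
pub fn first_summary_field(chipset: &Chipset, category: &str) -> String {
    system_product_name(chipset).unwrap_or_else(|| category.to_string())
}
```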
- SqliteInventoryRepository::save() now compares the incoming
    serde_json bytes against the latest stored `data` blob for this
    host_id. If byte-identical, the insert is skipped with an info log
    "Host '<id>' unchanged, skipping save". Genuine changes still
    produce a new version row, preserving the audit trail. Eliminates
    the unbounded row growth from repeated discovery (mDNS is
    continuous, CIDR scans often re-run). Addresses the long-standing
    FIXME in modules/inventory; the comment is now removed.

  - Reworded the caller-side log that fires after repo.save() from
    "Saved [new] host id X, summary: ..." to "Discovered host X,
    summary: ...". The old text claimed "Saved" even when the repo had
    actually skipped the insert, producing contradictory log lines on
    re-runs.

  - Harmonized every host-specific inquire prompt in the discovery
    flow behind a new print_host_header() helper: each prompt is now
    preceded by a blank line and a "Host: <summary>" banner, and the
    redundant host name inside the question text is stripped (disk
    prompt, bond confirm). The node-selection prompt is unchanged --
    it picks *which* host, so there is no current host yet.
- host_role_mapping now holds at most one row per host_id.
    SqliteInventoryRepository::save_role_mapping wraps a DELETE of any
    prior rows for the host and the INSERT of the new one in a single
    transaction, self-healing pre-existing duplicate rows along the way.

  - Before re-prompting for disk and networking, the discovery flow
    looks up the current role mapping via the new
    InventoryRepository::get_role_mapping(host_id) method. If one
    exists, the operator sees a summary (role, install disk, bond
    mode + interfaces, blacklist) and picks between "Update" and
    "Cancel"; cancelling skips the host entirely and continues the
    selection loop without touching the DB. New HostRoleMapping
    domain type carries the returned row back to the caller.

  - Network interfaces are sorted by name at the hwinfo-to-domain
    conversion step (both MDNS and CIDR flows), so f0 always appears
    before f1 in every downstream consumer — host summary, bond
    multi-select, blacklist multi-select. This also makes the
    byte-equality dedup in save() robust against the agent returning
    NICs in different sysfs-walk order across reboots.

  - PhysicalHost::summary() split into summary_parts_through_storage()
    + append_network_summary(), with a new public summary_short()
    variant that omits the NIC list. print_host_header() in the
    discovery prompts now uses summary_short() so the "Host: ..."
    banner fits on one line; full summaries still render in the node
    picker, logs, and Display impl.

  - Fix CPU summary rendering when the agent reports an empty model:
    single-CPU renders as "6c/6t", multi-CPU as "2x CPU (12c/24t)",
    no stray double-space in the pipe-separated summary.

  - Regenerate .sqlx offline cache for the new DELETE and SELECT
    queries.
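
The replace-on-write behaviour of save_role_mapping in this commit can be modelled in memory as follows. The real implementation runs a DELETE and an INSERT inside one SQL transaction; here a `Vec` stands in for the table, and the struct names, fields, and role strings are assumptions for illustration.

```rust
/// Simplified row of the host_role_mapping table (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct RoleMappingRow {
    pub host_id: String,
    pub role: String,
    pub network_config_json: String,
}

/// In-memory stand-in for the table.
#[derive(Default)]
pub struct MappingTable {
    pub rows: Vec<RoleMappingRow>,
}

impl MappingTable {
    /// Remove every prior row for the host, then add the new one.
    /// Any pre-existing duplicates for that host disappear as a side effect.
    pub fn save_role_mapping(&mut self, new_row: RoleMappingRow) {
        self.rows.retain(|row| row.host_id != new_row.host_id);
        self.rows.push(new_row);
    }

    /// Look up the current mapping, used to decide whether to show the
    /// Update / Cancel prompt.
    pub fn get_role_mapping(&self, host_id: &str) -> Option<&RoleMappingRow> {
        self.rows.iter().find(|row| row.host_id == host_id)
    }
}
```

In the database version, wrapping both statements in one transaction is what keeps a failed INSERT from leaving the host with no mapping at all.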
chore(discovery): drop sqlite WAL sidecars, add blank line after prompts
All checks were successful
Run Check Script / check (pull_request) Successful in 2m4s
adb05a0b91
- Switch SqliteInventoryRepository to DELETE journal mode with
    create_if_missing, so `.sqlite-wal` / `.sqlite-shm` files no longer
    appear next to the DB. Existing WAL-mode DBs are checkpointed and
    converted on next open.

  - Print a blank line after prompt_network_config returns so the save
    logs don't stomp on the last answered question.
stremblay added 1 commit 2026-04-21 16:06:02 +00:00
refactor(discovery): use shared LaggProtocol for bond mode
All checks were successful
Run Check Script / check (pull_request) Successful in 2m6s
84a083a012
Replace the Linux-specific BondMode enum with harmony_types'
  LaggProtocol, which is already used by the OPNsense LAGG score.
  "Capabilities are industry concepts, not tools" — the kernel mode
  numbers (BalanceRr/ActiveBackup/…) were the wrong abstraction;
  LaggProtocol's Lacp / Failover / LoadBalance / RoundRobin span
  Linux bonding and BSD lagg uniformly. LaggProtocol now derives
  Deserialize so NetworkConfig can round-trip through SQLite.

  Make SqliteInventoryRepository::get_role_mapping tolerate a
  network_config blob it cannot deserialize: log a warning and
  fall back to NetworkConfig::default() so the operator still sees
  the existing mapping prompt and can pick "Update" to overwrite
  the bad row. This self-heals DBs that were written with the old
  BondMode variant names and gives the repo real resilience for
  future NetworkConfig evolutions.
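
The tolerant read path can be sketched without serde. In this toy version a tiny string decoder stands in for JSON deserialization; `decode`, `decode_or_default`, and the reduced `NetworkConfig` are hypothetical, and only the warn-then-fall-back-to-default pattern reflects the commit.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LaggProtocol {
    Lacp,
    Failover,
    LoadBalance,
    RoundRobin,
}

/// Reduced config for the example: only the bond mode is modelled.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct NetworkConfig {
    pub bond_mode: Option<LaggProtocol>,
}

/// Toy decoder standing in for JSON deserialization: it accepts only the
/// current variant names and rejects anything else.
pub fn decode(blob: &str) -> Result<NetworkConfig, String> {
    let bond_mode = match blob {
        "" => None,
        "Lacp" => Some(LaggProtocol::Lacp),
        "Failover" => Some(LaggProtocol::Failover),
        "LoadBalance" => Some(LaggProtocol::LoadBalance),
        "RoundRobin" => Some(LaggProtocol::RoundRobin),
        other => return Err(format!("unknown bond mode: {other}")),
    };
    Ok(NetworkConfig { bond_mode })
}

/// Log a warning and fall back to the default instead of failing the read,
/// so the caller can still show the mapping and let the operator overwrite it.
pub fn decode_or_default(blob: &str) -> NetworkConfig {
    decode(blob).unwrap_or_else(|err| {
        eprintln!("warning: unreadable network_config ({err}); using default");
        NetworkConfig::default()
    })
}
```

A blob written with an old variant name such as `ActiveBackup` would decode to the default here rather than aborting the lookup.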
stremblay merged commit 503f9eb357 into master 2026-04-21 16:20:49 +00:00
stremblay deleted branch feat/discover-networking 2026-04-21 16:20:50 +00:00
Reference: NationTech/harmony#267