refactor(host_network): extract NetworkManager as a reusable component #183

Merged
letian merged 11 commits from refactor-network-manager into master 2025-11-06 00:02:53 +00:00
Owner

The NetworkManager logic was implemented directly into the HaClusterTopology, which wasn't directly its concern and prevented us from being able to reuse that NetworkManaager implementations in the future for a different Topology.

  • Extract a NetworkManager trait
  • Implement a OpenShiftNmStateNetworkManager for NetworkManager
  • Dynamically instantiate the NetworkManager in the Topology to delegate calls to it
The NetworkManager logic was implemented directly into the `HaClusterTopology`, which wasn't directly its concern and prevented us from being able to reuse that NetworkManaager implementations in the future for a different Topology. * Extract a `NetworkManager` trait * Implement a `OpenShiftNmStateNetworkManager` for `NetworkManager` * Dynamically instantiate the NetworkManager in the Topology to delegate calls to it
letian added 2 commits 2025-11-04 23:01:04 +00:00
adjust logs
Some checks failed
Run Check Script / check (pull_request) Failing after 1m4s
325d7891be
letian added 1 commit 2025-11-05 17:14:25 +00:00
Merge branch 'network-manager' into refactor-network-manager
All checks were successful
Run Check Script / check (pull_request) Successful in 1m15s
148504439e
johnride approved these changes 2025-11-05 21:30:23 +00:00
johnride left a comment
Owner

Overall LGTM, a few considerations here and there but nothing major.

Overall LGTM, a few considerations here and there but nothing major.
@ -378,6 +118,7 @@ impl HAClusterTopology {
bootstrap_host: dummy_host,
control_plane: vec![],
workers: vec![],
network_manager: OnceLock::new(),
Owner

Why OnceLock instead of Oncecell or something else? I'm not familiar with it. I guess it takes a read lock when accessed after initialization?

Why OnceLock instead of Oncecell or something else? I'm not familiar with it. I guess it takes a read lock when accessed after initialization?
Author
Owner

In pure Rust, the main difference is that OnceLock is the multi-threaded version of OnceCell. Considering our initialization process is synchronous in that case, I preferred to use this option rather than tokio::OnceCell (which is particularly useful for async initialization process).

But I can switch to OnceCell from tokio instead if you prefer. No strong opinions here for me.

In pure Rust, the main difference is that `OnceLock` is the multi-threaded version of `OnceCell`. Considering our initialization process is _synchronous_ in that case, I preferred to use this option rather than `tokio::OnceCell` (which is particularly useful for _async_ initialization process). But I can switch to `OnceCell` from `tokio` instead if you prefer. No strong opinions here for me.
@ -548,3 +289,1 @@
async fn configure_host_network(&self, config: &HostNetworkConfig) -> Result<(), SwitchError> {
self.configure_bond(config).await?;
self.configure_port_channel(config).await
async fn configure_port_channel(&self, config: &HostNetworkConfig) -> Result<(), SwitchError> {
Owner

Why is this function and logic in ha_cluster instead of network manager?

Why is this function and logic in ha_cluster instead of network manager?
Author
Owner

it's actually the impl Switch for HAClusterTopology

it's actually the `impl Switch for HAClusterTopology`
letian marked this conversation as resolved
@ -186,0 +190,4 @@
}
#[derive(Debug, Clone, new)]
pub struct NetworkError {
Owner

I really don't like the boilerplate added by having multiple error types for each bit of business logic.

I know passing strings around has its limitations but there's a middle ground for sure.

We should be evaluating these options sooner rather than later :

  • String
  • Specific error types like this
  • thiserror crate
  • anyhow
  • std::Error

See this comment, which I largely agree with https://www.reddit.com/r/rust/comments/1bb7dco/comment/ku7io8o/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

I really don't like the boilerplate added by having multiple error types for each bit of business logic. I know passing strings around has its limitations but there's a middle ground for sure. We should be evaluating these options sooner rather than later : - String - Specific error types like this - thiserror crate - anyhow - std::Error See this comment, which I largely agree with https://www.reddit.com/r/rust/comments/1bb7dco/comment/ku7io8o/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
Author
Owner

Agree with using stuff like thiserror in general, especially for the core.

The tricky is thing is that here, it is not part of the core of Harmony strictly speaking. The Switch and the NetworkManager are extra capabilities that we offer to enrich topologies. Also, Switch and NetworkManager are 2 different capabilities. So they will surely evolve separately and change for different reasons. Which means it's tricky to have a single Error that represent both. It's not designed as such right now but a future refactoring to make it an Enum with different options would make sense to allow user to inspect the error and react accordingly.

Agree with using stuff like `thiserror` in general, especially for the core. The tricky is thing is that here, it is not part of the core of Harmony strictly speaking. The `Switch` and the `NetworkManager` are extra capabilities that we offer to enrich topologies. Also, `Switch` and `NetworkManager` are 2 different capabilities. So they will surely evolve separately and change for different reasons. Which means it's tricky to have a single `Error` that represent both. It's not designed as such right now but a future refactoring to make it an Enum with different options would make sense to allow user to inspect the error and react accordingly.
@ -0,0 +29,4 @@
#[async_trait]
impl NetworkManager for OpenShiftNmStateNetworkManager {
async fn ensure_network_manager_installed(&self) -> Result<(), NetworkError> {
Owner

I guess this is required because the operator from the catalog does not work?

I guess this is required because the operator from the catalog does not work?
letian marked this conversation as resolved
@ -0,0 +36,4 @@
.await?;
debug!("Creating NMState namespace...");
self.k8s_client.apply_url(url::Url::parse("https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.84.0/namespace.yaml
Owner

url::parse can fail at runtime, would hurl! work here?

url::parse can fail at runtime, would hurl! work here?
Author
Owner

I wanted to use hurl! at first but it creates an enum that match either an URL or a local folder. The url would have been nice, but I'm not sure what to do if it's a local folder. So it made the code more complicated and less elegant in the end.

We can definitely revisit the apply_url function if we really want to support both URLs and folders.

I wanted to use `hurl!` at first but it creates an enum that match either an URL or a _local folder_. The url would have been nice, but I'm not sure what to do if it's a _local folder_. So it made the code more complicated and less elegant in the end. We can definitely revisit the `apply_url` function if we really want to support both URLs and _folders_.
Owner

Very nice demo gif !

Very nice demo gif !
letian changed target branch from network-manager to master 2025-11-05 23:38:27 +00:00
letian added 1 commit 2025-11-05 23:48:14 +00:00
Merge branch 'master' into refactor-network-manager
Some checks failed
Run Check Script / check (pull_request) Failing after 19s
79db278060
letian added 1 commit 2025-11-05 23:50:26 +00:00
adjust merge
Some checks failed
Run Check Script / check (pull_request) Failing after 21s
3f6c6da802
letian added 1 commit 2025-11-06 00:00:13 +00:00
fix format
All checks were successful
Run Check Script / check (pull_request) Successful in 1m11s
08125d6a49
letian merged commit 06a004a65d into master 2025-11-06 00:02:53 +00:00
letian deleted branch refactor-network-manager 2025-11-06 00:02:54 +00:00
Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: NationTech/harmony#183
No description provided.