WIP: configure-switch #159
Reference: NationTech/harmony#159
No description provided.
LGTM
@ -175,0 +193,4 @@
pub struct SwitchPort {
pub mac_address: MacAddress,
pub port_name: String,
// FIXME: Should we add speed as well? And other params
Speed, yes; maybe we will need something else, but nothing comes to mind right now.
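For illustration, it could look something like this (just a sketch; the `PortSpeed` enum and the `Option` wrapping are assumptions, not what's in the PR):

```rust
// Sketch only: `PortSpeed` and its variants are illustrative; `MacAddress`
// is the existing type already used by this struct.
pub enum PortSpeed {
    Gbps1,
    Gbps10,
    Gbps25,
    Gbps40,
    Gbps100,
}

pub struct SwitchPort {
    pub mac_address: MacAddress,
    pub port_name: String,
    // Optional, so a port only reports a speed when the switch exposes one.
    pub speed: Option<PortSpeed>,
}
```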
@ -77,6 +77,8 @@ harmony_secret = { path = "../harmony_secret" }
askama.workspace = true
sqlx.workspace = true
inquire.workspace = true
brocade = { version = "0.1.0", path = "../brocade" }
I usually don't put the version in local path dependencies.
@ -90,2 +103,4 @@
}
fn find_master_switch(&self) -> Option<LogicalHost> {
self.switch.first().cloned() // FIXME: Should we be smarter to find the master switch?
I think it is fine for now, but that is vendor-specific logic, so it should probably be entirely hidden inside the brocade crate.
We might have to interact with something higher level than a "switch" for a more generic API but we're not there yet.
It was easy to do, so I pushed this logic inside the Brocade crate.
In a way, that's what we already have with the `SwitchClient`. Maybe something is missing indeed, but for now it does the job well (the core logic has no idea it's Brocade behind the scenes).
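Roughly the kind of boundary we're describing (a sketch only; trait and method names here are illustrative, not the actual `SwitchClient` API):

```rust
use std::error::Error;

// Illustrative only: the real `SwitchClient` in the brocade crate may differ.
// The point is that vendor-specific details (like how a stack elects its
// master switch) stay behind this boundary.
pub trait SwitchClient {
    /// Resolve and connect to the switch to talk to; for Brocade stacks this
    /// is where the "find the master switch" logic would live.
    fn connect(&self) -> Result<(), Box<dyn Error>>;

    /// Configure a port channel / bond for the given MAC addresses.
    fn configure_port_channel(&self, mac_addresses: &[String]) -> Result<(), Box<dyn Error>>;
}

// The core logic only sees the trait, not the Brocade implementation.
fn apply_config(client: &dyn SwitchClient, macs: &[String]) -> Result<(), Box<dyn Error>> {
    client.connect()?;
    client.configure_port_channel(macs)
}
```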
I did not review very deeply, but still have a couple of comments.
I focused on the tests, though, and it looks good so far, but I would also like to see tests for:
In general I feel like this may deserve some more logging at the info level. At the very least, installing a cluster operator must be reported to the user at the info level. I understand it is an implementation detail of the feature, but it has an impact on the entire cluster administration, as it deploys more resources which may cause contention on already tight clusters.
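For example, something as small as this would already help (a sketch, assuming the `info!` macro already used elsewhere in this crate; the function is hypothetical):

```rust
use log::info; // or the tracing equivalent, whichever the crate already uses

// Sketch: surface cluster-wide side effects at info level so they show up in
// normal output, not only when debug logging is enabled.
fn announce_operator_install(operator_name: &str) {
    info!("[ControlPlane] Installing cluster operator '{operator_name}'; this deploys additional resources on the cluster");
}
```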
Good work so far! As you said, we can see the light at the end of the tunnel already.
@ -14,3 +14,2 @@
pub category: HostCategory,
pub network: Vec<NetworkInterface>,
pub storage: Vec<StorageDrive>,
pub network: Vec<NetworkInterface>, // FIXME: Don't use harmony_inventory_agent::NetworkInterface
I understand we don't want to use the "inventory agent" data structures, but I think it's OK to have one base data structure that we try to reuse across crates for basic components like NetworkInterface and these few others.
IMO, the FIXME should be about refactoring the type into a shared types crate or something like that.
Why would that not be correct?
That's one of the options, yes 😉 The `fixme` is just stating the problem: we should not use this type from `harmony_inventory_agent` directly.
Possible solutions are:
1. move the type into a shared crate like `harmony_types`, and have both `harmony_inventory_agent` and `harmony` use it
2. keep `harmony_inventory_agent::NetworkInterface` as is, have a different structure in `harmony`, and map from one to the other

As of now I'm personally 50/50: I have a feeling that the needs of `harmony_inventory_agent` might evolve differently from `harmony`, and thus the `NetworkInterface` might evolve differently, so going with option 1 wouldn't be adequate. But on the other hand, for now they are indeed very similar, so why maintain two structs? 🤷

Anyway, for now it's not really a big issue and won't be hard to fix. So it's more like a note for the future to be aware of.
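To make option 1 concrete, a rough sketch (the crate layout and field names are illustrative assumptions, not an actual proposal):

```rust
// Hypothetical shared crate (e.g. `harmony_types`) that both
// `harmony_inventory_agent` and `harmony` would depend on.
// Field names are illustrative only.
pub struct NetworkInterface {
    pub name: String,
    pub mac_address: String,
    pub speed_mbps: Option<u32>,
}
```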
@ -92,0 +156,4 @@
channel: Some("stable".to_string()),
install_plan_approval: Some(InstallPlanApproval::Automatic),
name: "kubernetes-nmstate-operator".to_string(),
source: "redhat-operators".to_string(),
The redhat-operators catalog is not always available; it depends on how the OKD cluster was brought up. This method may not be as portable as we need.
I think we can move forward for now, but this deserves a comment here in the code to help debug potential issues.
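Something along these lines would do (mirroring the snippet above; the comment wording is only a suggestion):

```rust
channel: Some("stable".to_string()),
install_plan_approval: Some(InstallPlanApproval::Automatic),
name: "kubernetes-nmstate-operator".to_string(),
// NOTE: the "redhat-operators" CatalogSource is not guaranteed to exist; it
// depends on how the OKD cluster was brought up. If this subscription never
// resolves, check that the catalog source is present on the cluster.
source: "redhat-operators".to_string(),
```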
@ -92,0 +183,4 @@
}
fn get_next_bond_id(&self) -> u8 {
42 // FIXME: Find a better way to declare the bond id
There may be a better way but no better answer than 42.
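If we ever need a better way, one option is to derive it from what is already configured (a sketch; `existing_bond_ids` is a hypothetical input, not something in this PR):

```rust
// Sketch only: pick the first bond id not already in use, instead of a constant.
fn get_next_bond_id(existing_bond_ids: &[u8]) -> u8 {
    (1..=u8::MAX)
        .find(|id| !existing_bond_ids.contains(id))
        .unwrap_or(1)
}
```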
@ -216,0 +215,4 @@
) -> Result<(), InterpretError> {
info!("[ControlPlane] Ensuring persistent bonding");
let score = HostNetworkConfigurationScore {
hosts: hosts.clone(), // FIXME: Avoid clone if possible
The clone has been bugging me too. We could use an Rc or Arc as a balance between usability and performance.
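For example (a sketch; `Host` is a stand-in for the real type, and the struct is reduced to the field in question):

```rust
use std::sync::Arc;

// Stand-in for the real Host type from the PR.
struct Host;

// Sketch: share the host list instead of deep-cloning it; cloning an Arc only
// bumps a reference count.
struct HostNetworkConfigurationScore {
    hosts: Arc<Vec<Host>>,
}

fn build_score(hosts: &Arc<Vec<Host>>) -> HostNetworkConfigurationScore {
    HostNetworkConfigurationScore {
        hosts: Arc::clone(hosts), // cheap: no per-host clone
    }
}
```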
@ -0,0 +236,4 @@
#[tokio::test]
async fn port_not_found_for_mac_address_should_not_configure_interface() {
// FIXME: Should it still configure an empty bond/port channel?
If I understand correctly, this is the correct behavior. It is perfectly normal for a host to have mac addresses that are not connected to the switch. Most of them do have disconnected 1 Gb interfaces.
It is also normal to find connected interfaces in the switch that are not part of the cluster. If an admin connects to the switch for debugging, or the switch is used for other purposes than only this cluster.
In that case, what should we do if we realize that not a single mac address was found on the switch? Should we simply ignore it (current solution), or should we clear existing configurations back to a clean slate?
And actually the question will be similar when we approach the "update config" scenario: what do we do with existing bonds? And port-channels? Should we just clear everything first and apply the new config, or find a smarter way to update existing bonds/port-channels?
Ok, I think I figured it out and it's not as complicated as it seems.
The problem is not only the specific replacement logic, it is also idempotency. Does the current code support detecting that the mac address is connected to a port that is already in a port channel with the correct configuration (correct mac + ports assigned to it)?
I think the correct behavior here is:
This behavior should be quite easily testable too 😁
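For instance, something along these lines (a pseudo-test sketch; the helper names are made up to show the intent, not actual test utilities from this PR):

```rust
// Pseudo-test sketch of the idempotency case: a MAC that is already in a port
// channel with the correct configuration should result in no changes being
// sent to the switch. Helper names below are hypothetical.
#[tokio::test]
async fn mac_already_in_correct_port_channel_should_not_reconfigure() {
    let switch = fake_switch_with_port_channel("aa:bb:cc:dd:ee:ff", &["1/1/1", "1/1/2"]);
    let config = host_config_for_mac("aa:bb:cc:dd:ee:ff");

    apply_network_config(&switch, &config).await.unwrap();

    // The already-correct configuration should be a no-op.
    assert!(switch.recorded_changes().is_empty());
}
```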
ad61be277b to 77e09436a9
Pull request closed