harmony/examples/fleet_auth_callout/tests/security_model.rs
Reda Tarzalt 96e7d43b2f
add auth to frontend through lib (#284)
Adds OIDC login support to the harmony-fleet-operator web dashboard using Zitadel SSO.

PKCE was the recommended option here because the app does not need to hold a client secret. The server generates a random code verifier and sends only its SHA-256 hash (the code challenge) to Zitadel. At token exchange the app presents the verifier, and Zitadel checks authenticity by recomputing the hash and comparing the two values.

PKCE auth flow

 1. User visits a protected dashboard route, like /devices.
 2. If no valid harmony_fleet_session cookie exists, the app redirects to /login.
 3. /login creates:
     - random state
     - random pkce_code_verifier
     - derived code_challenge = base64url(sha256(pkce_code_verifier))
 4. The app stores state and pkce_code_verifier in a temporary HTTP-only login-attempt cookie.
 5. The browser is redirected to Zitadel’s authorize endpoint with:
     - client_id
     - redirect_uri
     - scope
     - state
     - code_challenge
     - code_challenge_method=S256
 6. After SSO login, Zitadel redirects back to /auth/callback?code=...&state=....
 7. The callback handler:
     - parses the raw query into a strict success/failure enum
     - reads the temporary login-attempt cookie
     - validates returned state
     - exchanges code + pkce_code_verifier for tokens
     - validates the returned ID token using OIDC discovery/JWKS
     - creates a local harmony_fleet_session cookie
     - redirects to /
 8. Protected routes validate the local dashboard session cookie on each request.
 9. /logout clears the dashboard session cookie and redirects to /login.
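
The "strict success/failure enum" and the state check from step 7 could look roughly like the sketch below. It is a minimal, std-only illustration and not the code from this PR: the names `CallbackQuery`, `CallbackError`, `parse_callback_query` and `state_matches` are hypothetical, percent-decoding is omitted, and the `error` / `error_description` parameters are assumed from the standard OAuth 2.0 error response.

```rust
use std::collections::HashMap;

// Hypothetical types for illustration; the names are not taken from the PR.
#[derive(Debug, PartialEq)]
enum CallbackQuery {
    Success { code: String, state: String },
    Failure { error: String, description: Option<String> },
}

#[derive(Debug, PartialEq)]
enum CallbackError {
    Malformed,
}

/// Parse the raw query of /auth/callback into exactly one of the two variants.
/// Percent-decoding is left out to keep the sketch short.
fn parse_callback_query(raw: &str) -> Result<CallbackQuery, CallbackError> {
    let params: HashMap<&str, &str> = raw
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .collect();

    let code = params.get("code").copied();
    let state = params.get("state").copied();
    let error = params.get("error").copied();

    match (code, state, error) {
        // Success requires both code and state, and no error parameter.
        (Some(code), Some(state), None) => Ok(CallbackQuery::Success {
            code: code.to_string(),
            state: state.to_string(),
        }),
        // Failure requires an error parameter and no code.
        (None, _, Some(error)) => Ok(CallbackQuery::Failure {
            error: error.to_string(),
            description: params.get("error_description").map(|d| d.to_string()),
        }),
        // Anything else (missing state, code and error together) is rejected.
        _ => Err(CallbackError::Malformed),
    }
}

/// Compare the returned state with the one stored in the login-attempt
/// cookie without exiting early on the first differing byte.
fn state_matches(returned: &str, expected: &str) -> bool {
    let (a, b) = (returned.as_bytes(), expected.as_bytes());
    a.len() == b.len() && a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

With this shape, a query that carries both `code` and `error`, or a `code` without `state`, never reaches the token exchange; the handler only proceeds on `Success` when `state_matches` also returns true.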

---

 Auth middleware responses depending on request type:

 - normal browser request: redirect to /login
 - SSE request: 401 authentication required
 - HTMX request: 401 with HX-Redirect: /login (for HTMX requests the HX-Redirect header is more idiomatic than an Axum redirect, since a plain 3xx is followed by the XHR instead of navigating the page)
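
The three cases above can be sketched as a pure decision function. This is an illustrative, std-only sketch and not the middleware from this PR: `AuthRejection`, `header` and `reject_unauthenticated` are hypothetical names, and the detection rules are assumptions (HTMX requests carry `HX-Request: true`; SSE requests carry `text/event-stream` in `Accept`), as is checking HTMX before SSE.

```rust
// Hypothetical response model; the real middleware would build Axum responses.
#[derive(Debug, PartialEq)]
enum AuthRejection {
    /// Normal browser request: redirect to the login page.
    Redirect(&'static str),
    /// SSE request: 401 authentication required.
    Unauthorized,
    /// HTMX request: 401 plus an `HX-Redirect` header with this target.
    UnauthorizedHxRedirect(&'static str),
}

/// Case-insensitive header lookup over a simple (name, value) slice.
fn header<'a>(headers: &[(&str, &'a str)], name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .map(|(_, v)| *v)
}

/// Decide how to answer a request that has no valid session cookie.
fn reject_unauthenticated(headers: &[(&str, &str)]) -> AuthRejection {
    if header(headers, "hx-request") == Some("true") {
        AuthRejection::UnauthorizedHxRedirect("/login")
    } else if header(headers, "accept").is_some_and(|v| v.contains("text/event-stream")) {
        AuthRejection::Unauthorized
    } else {
        AuthRejection::Redirect("/login")
    }
}
```

Keeping the decision separate from the framework types makes each branch testable without spinning up a server.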

Reviewed-on: #284
Reviewed-by: johnride <jg@nationtech.io>
Co-authored-by: Reda Tarzalt <tarzaltreda@gmail.com>
Co-committed-by: Reda Tarzalt <tarzaltreda@gmail.com>
2026-05-19 20:37:08 +00:00

135 lines
4.7 KiB
Rust

//! Real cargo tests proving the IoT fleet security model.
//!
//! All tests share a single bringup of the stack via [`OnceCell`]. The
//! cluster keeps running across the suite, with each test using the
//! cached machine keys to mint Zitadel JWTs and exercise NATS through
//! the auth callout. Three invariants:
//!
//! 1. `admin_can_read_any_device_subject` — fleet-admin sees other devices' state.
//! 2. `device_can_only_access_own_subjects` — sensor-a is denied access to sensor-b's commands.
//! 3. `unknown_role_is_rejected` — a Zitadel-authenticated user with no
//! fleet role cannot connect to NATS.
//!
//! ## Why these tests are real-stack
//!
//! Mocking the OIDC issuer or NATS would only re-prove what the unit tests
//! already cover. The point of this suite is to confirm — in CI, in
//! cargo — that the **deployed** stack on k3d enforces the security
//! model end-to-end. Hidden cluster-level misconfiguration (an unset
//! `auth_callout` block, a wrong issuer pubkey, a CoreDNS rewrite drift,
//! a permissions YAML typo) only shows up here.

use std::sync::Arc;
use std::time::Duration;

use anyhow::{Context, Result};
use async_nats::ConnectOptions;
use example_fleet_auth_callout::{
    StackHandles, bring_up_stack, mint_access_token, scopes_for_project,
};
use futures_util::StreamExt;
use tokio::sync::OnceCell;

static STACK: OnceCell<Arc<StackHandles>> = OnceCell::const_new();

async fn shared_stack() -> Result<Arc<StackHandles>> {
    let cell = STACK
        .get_or_try_init(|| async {
            let handles = bring_up_stack().await?;
            anyhow::Ok(Arc::new(handles))
        })
        .await?;
    Ok(cell.clone())
}

async fn connect_with_role(stack: &StackHandles, key_json: &str) -> Result<async_nats::Client> {
    let token = mint_access_token(
        &stack.zitadel_url,
        key_json,
        &scopes_for_project(&stack.project_id),
    )
    .await
    .context("mint Zitadel access token")?;
    ConnectOptions::with_token(token)
        .connection_timeout(Duration::from_secs(5))
        .connect(&stack.nats_url_external)
        .await
        .map_err(|e| anyhow::anyhow!("NATS connect: {e}"))
}

#[tokio::test]
#[ignore = "requires k3d + docker environment"]
async fn admin_can_read_any_device_subject() -> Result<()> {
    let _ = tracing_subscriber::fmt().with_env_filter("info").try_init();
    let stack = shared_stack().await?;
    let admin = connect_with_role(&stack, &stack.admin_machine_key).await?;
    let device = connect_with_role(&stack, &stack.device_a_machine_key).await?;

    let mut admin_sub = admin.subscribe("device-state.>").await?;
    admin.flush().await?;

    device
        .publish("device-state.sensor-a", "telemetry-payload".into())
        .await?;
    device.flush().await?;

    let msg = tokio::time::timeout(Duration::from_secs(5), admin_sub.next())
        .await
        .context("admin sub timeout")?
        .context("admin sub closed")?;
    assert_eq!(msg.payload.as_ref(), b"telemetry-payload");
    Ok(())
}

#[tokio::test]
#[ignore = "requires k3d + docker environment"]
async fn device_can_only_access_own_subjects() -> Result<()> {
    let _ = tracing_subscriber::fmt().with_env_filter("info").try_init();
    let stack = shared_stack().await?;
    let device_a = connect_with_role(&stack, &stack.device_a_machine_key).await?;
    let device_b = connect_with_role(&stack, &stack.device_b_machine_key).await?;

    let _b_sub = device_b.subscribe("device-commands.sensor-b").await?;
    let mut a_wrong = device_a.subscribe("device-commands.sensor-b").await?;
    device_a.flush().await?;
    device_b.flush().await?;

    // A's SUB on B's subject should already have been rejected by NATS at
    // SUB time, so this publish is expected to be a no-op from A's side.
    // We publish anyway to confirm that none of B's traffic reaches A.
    device_b
        .publish("device-commands.sensor-b", "should-not-leak".into())
        .await?;
    device_b.flush().await?;

    let result = tokio::time::timeout(Duration::from_millis(750), a_wrong.next()).await;
    assert!(
        result.is_err(),
        "device A must not observe device B's commands"
    );
    Ok(())
}

#[tokio::test]
#[ignore = "requires k3d + docker environment"]
async fn unknown_role_is_rejected() -> Result<()> {
    let _ = tracing_subscriber::fmt().with_env_filter("info").try_init();
    let stack = shared_stack().await?;

    // The intruder has a valid Zitadel JWT but no fleet-admin/device role
    // grant. The callout must reject the connection; NATS surfaces that
    // as `authorization violation` at connect time.
    let result = connect_with_role(&stack, &stack.intruder_machine_key).await;
    assert!(
        result.is_err(),
        "JWT without fleet role must not be admitted to NATS"
    );
    Ok(())
}