Some checks failed
Run Check Script / check (pull_request) Failing after 2m15s
Two bugs uncovered while running the full e2e walk end to end: 1. find_user_grant POSTed to /management/v1/users/<id>/grants/_search which Zitadel rejects with 405 Method Not Allowed (the original author's note in the comment hinted at this). The cache previously masked it: first apply created the grant + cached the id; second apply hit the cache and skipped the broken search. The live-query refactor (f4d6fb94) removed the cache short-circuit, surfacing the bug as "Create user grant failed: User grant already exists" on every re-apply. Fix: switch to the collection endpoint /management/v1/users/grants/_search with a userIdQuery filter, matching the Zitadel API that's actually wired up. Now returns the existing grant on re-apply and the create_user_grant fallback is correctly skipped. 2. Operator keyfile mounted as 0o400 owned by root. The operator pod runs as non-root (image USER directive — no fixed runAsUser because we want SCC compatibility). Result: operator boots, tries to load the JSON keyfile from the Secret volume, hits EACCES, fails the credential factory, retries forever. Fix: mode 0o444. World-read inside the pod is fine — single container, no other consumers, the Secret namespace is locked down, and the file never escapes pod-fs. The proper fsGroup-based alternative requires pinning a UID/GID, which conflicts with our SCC-friendly choice of leaving runAsUser unset. Also fixes a stale `git rm` from commit4194baac(harmony-fleet-auth extraction) — the agent's local credentials.rs was deleted from disk but never staged. Verified end to end: * STACK READY in 2 min on warm cluster * Operator pod: "minted fresh Zitadel access token", "NATS connected", "starting Deployment controller", "watching device-info KV" * 2 Device CRs auto-created with full label set * `kubectl apply -f` of a Deployment CR with targetSelector.matchLabels: { group: group-a } produced: - status.aggregate { matched=1, succeeded=1, failed=0 } - HTTP 200 from nginx on vm-device-00:8080 - connection refused from vm-device-01:8080 (correctly excluded)