Compare commits
5 Commits
chore/left
...
feat/teams
| Author | SHA1 | Date | |
|---|---|---|---|
| f96572848a | |||
| 78aadadd22 | |||
| b5c6e1c99d | |||
| f94c899bf7 | |||
| 77eb1228be |
@@ -1,6 +0,0 @@
|
||||
target/
|
||||
Dockerfile
|
||||
.git
|
||||
data
|
||||
target
|
||||
demos
|
||||
2
.gitattributes
vendored
@@ -2,5 +2,3 @@ bootx64.efi filter=lfs diff=lfs merge=lfs -text
|
||||
grubx64.efi filter=lfs diff=lfs merge=lfs -text
|
||||
initrd filter=lfs diff=lfs merge=lfs -text
|
||||
linux filter=lfs diff=lfs merge=lfs -text
|
||||
data/okd/bin/* filter=lfs diff=lfs merge=lfs -text
|
||||
data/okd/installer_image/* filter=lfs diff=lfs merge=lfs -text
|
||||
|
||||
@@ -1,18 +0,0 @@
|
||||
name: Run Check Script
|
||||
on:
|
||||
push:
|
||||
branches:
|
||||
- master
|
||||
pull_request:
|
||||
|
||||
jobs:
|
||||
check:
|
||||
runs-on: docker
|
||||
container:
|
||||
image: hub.nationtech.io/harmony/harmony_composer:latest
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Run check script
|
||||
run: bash build/check.sh
|
||||
@@ -1,95 +0,0 @@
|
||||
name: Compile and package harmony_composer
|
||||
on:
|
||||
push:
|
||||
branches:
|
||||
- master
|
||||
|
||||
jobs:
|
||||
package_harmony_composer:
|
||||
container:
|
||||
image: hub.nationtech.io/harmony/harmony_composer:latest
|
||||
runs-on: dind
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Build for Linux x86_64
|
||||
run: cargo build --release --bin harmony_composer --target x86_64-unknown-linux-gnu
|
||||
|
||||
- name: Build for Windows x86_64 GNU
|
||||
run: cargo build --release --bin harmony_composer --target x86_64-pc-windows-gnu
|
||||
|
||||
- name: Setup log into hub.nationtech.io
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
registry: hub.nationtech.io
|
||||
username: ${{ secrets.HUB_BOT_USER }}
|
||||
password: ${{ secrets.HUB_BOT_PASSWORD }}
|
||||
|
||||
# TODO: build ARM images and MacOS binaries (or other targets) too
|
||||
|
||||
- name: Update snapshot-latest tag
|
||||
run: |
|
||||
git config user.name "Gitea CI"
|
||||
git config user.email "ci@nationtech.io"
|
||||
git tag -f snapshot-latest
|
||||
git push origin snapshot-latest --force
|
||||
|
||||
- name: Install jq
|
||||
run: apt install -y jq # The current image includes apt lists so we don't have to apt update and rm /var/lib/apt... every time. But if the image is optimized it won't work anymore
|
||||
|
||||
- name: Create or update release
|
||||
run: |
|
||||
# First, check if release exists and delete it if it does
|
||||
RELEASE_ID=$(curl -s -X GET \
|
||||
-H "Authorization: token ${{ secrets.GITEATOKEN }}" \
|
||||
"https://git.nationtech.io/api/v1/repos/nationtech/harmony/releases/tags/snapshot-latest" \
|
||||
| jq -r '.id // empty')
|
||||
|
||||
if [ -n "$RELEASE_ID" ]; then
|
||||
# Delete existing release
|
||||
curl -X DELETE \
|
||||
-H "Authorization: token ${{ secrets.GITEATOKEN }}" \
|
||||
"https://git.nationtech.io/api/v1/repos/nationtech/harmony/releases/$RELEASE_ID"
|
||||
fi
|
||||
|
||||
# Create new release
|
||||
RESPONSE=$(curl -X POST \
|
||||
-H "Authorization: token ${{ secrets.GITEATOKEN }}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"tag_name": "snapshot-latest",
|
||||
"name": "Latest Snapshot",
|
||||
"body": "Automated snapshot build from master branch",
|
||||
"draft": false,
|
||||
"prerelease": true
|
||||
}' \
|
||||
"https://git.nationtech.io/api/v1/repos/nationtech/harmony/releases")
|
||||
|
||||
echo "RELEASE_ID=$(echo $RESPONSE | jq -r '.id')" >> $GITHUB_ENV
|
||||
|
||||
- name: Upload Linux binary
|
||||
run: |
|
||||
curl -X POST \
|
||||
-H "Authorization: token ${{ secrets.GITEATOKEN }}" \
|
||||
-H "Content-Type: application/octet-stream" \
|
||||
--data-binary "@target/x86_64-unknown-linux-gnu/release/harmony_composer" \
|
||||
"https://git.nationtech.io/api/v1/repos/nationtech/harmony/releases/${{ env.RELEASE_ID }}/assets?name=harmony_composer"
|
||||
|
||||
- name: Upload Windows binary
|
||||
run: |
|
||||
curl -X POST \
|
||||
-H "Authorization: token ${{ secrets.GITEATOKEN }}" \
|
||||
-H "Content-Type: application/octet-stream" \
|
||||
--data-binary "@target/x86_64-pc-windows-gnu/release/harmony_composer.exe" \
|
||||
"https://git.nationtech.io/api/v1/repos/nationtech/harmony/releases/${{ env.RELEASE_ID }}/assets?name=harmony_composer.exe"
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
|
||||
- name: Build and push
|
||||
uses: docker/build-push-action@v6
|
||||
with:
|
||||
context: .
|
||||
push: true
|
||||
tags: hub.nationtech.io/harmony/harmony_composer:latest
|
||||
37
.gitignore
vendored
@@ -1,34 +1,3 @@
|
||||
### General ###
|
||||
private_repos/
|
||||
|
||||
### Harmony ###
|
||||
harmony.log
|
||||
data/okd/installation_files*
|
||||
|
||||
### Helm ###
|
||||
# Chart dependencies
|
||||
**/charts/*.tgz
|
||||
|
||||
### Rust ###
|
||||
# Generated by Cargo
|
||||
# will have compiled files and executables
|
||||
debug/
|
||||
target/
|
||||
|
||||
# Remove Cargo.lock from gitignore if creating an executable, leave it for libraries
|
||||
# More information here https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html
|
||||
Cargo.lock
|
||||
|
||||
# These are backup files generated by rustfmt
|
||||
**/*.rs.bk
|
||||
|
||||
# MSVC Windows builds of rustc generate these, which store debugging information
|
||||
*.pdb
|
||||
|
||||
.harmony_generated
|
||||
|
||||
# Useful to create ignore folders for temp files and notes
|
||||
ignore
|
||||
|
||||
# Generated book
|
||||
book
|
||||
target
|
||||
private_repos
|
||||
log/
|
||||
|
||||
3
.gitmodules
vendored
@@ -1,3 +0,0 @@
|
||||
[submodule "examples/try_rust_webapp/tryrust.org"]
|
||||
path = examples/try_rust_webapp/tryrust.org
|
||||
url = https://github.com/rust-dd/tryrust.org.git
|
||||
@@ -1,26 +0,0 @@
|
||||
{
|
||||
"db_name": "SQLite",
|
||||
"query": "SELECT host_id, installation_device FROM host_role_mapping WHERE role = ?",
|
||||
"describe": {
|
||||
"columns": [
|
||||
{
|
||||
"name": "host_id",
|
||||
"ordinal": 0,
|
||||
"type_info": "Text"
|
||||
},
|
||||
{
|
||||
"name": "installation_device",
|
||||
"ordinal": 1,
|
||||
"type_info": "Text"
|
||||
}
|
||||
],
|
||||
"parameters": {
|
||||
"Right": 1
|
||||
},
|
||||
"nullable": [
|
||||
false,
|
||||
true
|
||||
]
|
||||
},
|
||||
"hash": "24f719d57144ecf4daa55f0aa5836c165872d70164401c0388e8d625f1b72d7b"
|
||||
}
|
||||
@@ -1,12 +0,0 @@
|
||||
{
|
||||
"db_name": "SQLite",
|
||||
"query": "\n INSERT INTO host_role_mapping (host_id, role, installation_device)\n VALUES (?, ?, ?)\n ",
|
||||
"describe": {
|
||||
"columns": [],
|
||||
"parameters": {
|
||||
"Right": 3
|
||||
},
|
||||
"nullable": []
|
||||
},
|
||||
"hash": "6fcc29cfdbdf3b2cee94a4844e227f09b245dd8f079832a9a7b774151cb03af6"
|
||||
}
|
||||
@@ -1,32 +0,0 @@
|
||||
{
|
||||
"db_name": "SQLite",
|
||||
"query": "\n SELECT\n p1.id,\n p1.version_id,\n p1.data as \"data: Json<PhysicalHost>\"\n FROM\n physical_hosts p1\n INNER JOIN (\n SELECT\n id,\n MAX(version_id) AS max_version\n FROM\n physical_hosts\n GROUP BY\n id\n ) p2 ON p1.id = p2.id AND p1.version_id = p2.max_version\n ",
|
||||
"describe": {
|
||||
"columns": [
|
||||
{
|
||||
"name": "id",
|
||||
"ordinal": 0,
|
||||
"type_info": "Text"
|
||||
},
|
||||
{
|
||||
"name": "version_id",
|
||||
"ordinal": 1,
|
||||
"type_info": "Text"
|
||||
},
|
||||
{
|
||||
"name": "data: Json<PhysicalHost>",
|
||||
"ordinal": 2,
|
||||
"type_info": "Blob"
|
||||
}
|
||||
],
|
||||
"parameters": {
|
||||
"Right": 0
|
||||
},
|
||||
"nullable": [
|
||||
false,
|
||||
false,
|
||||
false
|
||||
]
|
||||
},
|
||||
"hash": "8d247918eca10a88b784ee353db090c94a222115c543231f2140cba27bd0f067"
|
||||
}
|
||||
@@ -1,32 +0,0 @@
|
||||
{
|
||||
"db_name": "SQLite",
|
||||
"query": "SELECT id, version_id, data as \"data: Json<PhysicalHost>\" FROM physical_hosts WHERE id = ? ORDER BY version_id DESC LIMIT 1",
|
||||
"describe": {
|
||||
"columns": [
|
||||
{
|
||||
"name": "id",
|
||||
"ordinal": 0,
|
||||
"type_info": "Text"
|
||||
},
|
||||
{
|
||||
"name": "version_id",
|
||||
"ordinal": 1,
|
||||
"type_info": "Text"
|
||||
},
|
||||
{
|
||||
"name": "data: Json<PhysicalHost>",
|
||||
"ordinal": 2,
|
||||
"type_info": "Null"
|
||||
}
|
||||
],
|
||||
"parameters": {
|
||||
"Right": 1
|
||||
},
|
||||
"nullable": [
|
||||
false,
|
||||
false,
|
||||
false
|
||||
]
|
||||
},
|
||||
"hash": "934035c7ca6e064815393e4e049a7934b0a7fac04a4fe4b2a354f0443d630990"
|
||||
}
|
||||
@@ -1,12 +0,0 @@
|
||||
{
|
||||
"db_name": "SQLite",
|
||||
"query": "INSERT INTO physical_hosts (id, version_id, data) VALUES (?, ?, ?)",
|
||||
"describe": {
|
||||
"columns": [],
|
||||
"parameters": {
|
||||
"Right": 3
|
||||
},
|
||||
"nullable": []
|
||||
},
|
||||
"hash": "f10f615ee42129ffa293e46f2f893d65a237d31d24b74a29c6a8d8420d255ab8"
|
||||
}
|
||||
@@ -1,36 +0,0 @@
|
||||
# Contributing to the Harmony project
|
||||
|
||||
## Write small P-R
|
||||
|
||||
Aim for the smallest piece of work that is mergeable.
|
||||
|
||||
Mergeable means that :
|
||||
|
||||
- it does not break the build
|
||||
- it moves the codebase one step forward
|
||||
|
||||
P-Rs can be many things, they do not have to be complete features.
|
||||
|
||||
### What a P-R **should** be
|
||||
|
||||
- Introduce a new trait : This will be the place to discuss the new trait addition, its design and implementation
|
||||
- A new implementation of a trait : a new concrete implementation of the LoadBalancer trait
|
||||
- A new CI check : something that improves quality, robustness, ci performance
|
||||
- Documentation improvements
|
||||
- Refactoring
|
||||
- Bugfix
|
||||
|
||||
### What a P-R **should not** be
|
||||
|
||||
- Large. Anything over 200 lines (excluding generated lines) should have a very good reason to be this large.
|
||||
- A mix of refactoring, bug fixes and new features.
|
||||
- Introducing multiple new features or ideas at once.
|
||||
- Multiple new implementations of a trait/functionnality at once
|
||||
|
||||
The general idea is to keep P-Rs small and single purpose.
|
||||
|
||||
## Commit message formatting
|
||||
|
||||
We follow conventional commits guidelines.
|
||||
|
||||
https://www.conventionalcommits.org/en/v1.0.0/
|
||||
4901
Cargo.lock
generated
94
Cargo.toml
@@ -1,28 +1,16 @@
|
||||
[workspace]
|
||||
resolver = "2"
|
||||
members = [
|
||||
"examples/*",
|
||||
"private_repos/*",
|
||||
"examples/*",
|
||||
"harmony",
|
||||
"harmony_types",
|
||||
"harmony_macros",
|
||||
"harmony_tui",
|
||||
"harmony_execution",
|
||||
"opnsense-config",
|
||||
"opnsense-config-xml",
|
||||
"harmony_cli",
|
||||
"k3d",
|
||||
"harmony_composer",
|
||||
"harmony_inventory_agent",
|
||||
"harmony_secret_derive",
|
||||
"harmony_secret",
|
||||
"harmony_config_derive",
|
||||
"harmony_config",
|
||||
"brocade",
|
||||
"harmony_agent",
|
||||
"harmony_agent/deploy",
|
||||
"harmony_node_readiness",
|
||||
"harmony-k8s",
|
||||
]
|
||||
|
||||
[workspace.package]
|
||||
@@ -31,58 +19,28 @@ readme = "README.md"
|
||||
license = "GNU AGPL v3"
|
||||
|
||||
[workspace.dependencies]
|
||||
log = { version = "0.4", features = ["kv"] }
|
||||
env_logger = "0.11"
|
||||
derive-new = "0.7"
|
||||
async-trait = "0.1"
|
||||
tokio = { version = "1.40", features = [
|
||||
"io-std",
|
||||
"fs",
|
||||
"macros",
|
||||
"rt-multi-thread",
|
||||
] }
|
||||
tokio-retry = "0.3.0"
|
||||
tokio-util = "0.7.15"
|
||||
cidr = { features = ["serde"], version = "0.2" }
|
||||
russh = "0.45"
|
||||
russh-keys = "0.45"
|
||||
rand = "0.9"
|
||||
url = "2.5"
|
||||
kube = { version = "1.1.0", features = [
|
||||
"config",
|
||||
"client",
|
||||
"runtime",
|
||||
"rustls-tls",
|
||||
"ws",
|
||||
"jsonpatch",
|
||||
] }
|
||||
k8s-openapi = { version = "0.25", features = ["v1_30"] }
|
||||
# TODO replace with https://github.com/bourumir-wyngs/serde-saphyr as serde_yaml is deprecated https://github.com/sebastienrousseau/serde_yml
|
||||
serde_yaml = "0.9"
|
||||
serde-value = "0.7"
|
||||
http = "1.2"
|
||||
inquire = "0.7"
|
||||
convert_case = "0.8"
|
||||
chrono = "0.4"
|
||||
similar = "2"
|
||||
uuid = { version = "1.11", features = ["v4", "fast-rng", "macro-diagnostics"] }
|
||||
pretty_assertions = "1.4.1"
|
||||
tempfile = "3.20.0"
|
||||
bollard = "0.19.1"
|
||||
base64 = "0.22.1"
|
||||
tar = "0.4.44"
|
||||
lazy_static = "1.5.0"
|
||||
directories = "6.0.0"
|
||||
thiserror = "2.0.14"
|
||||
serde = { version = "1.0.209", features = ["derive", "rc"] }
|
||||
serde_json = "1.0.127"
|
||||
askama = "0.14"
|
||||
sqlx = { version = "0.8", features = ["runtime-tokio", "sqlite"] }
|
||||
reqwest = { version = "0.12", features = [
|
||||
"blocking",
|
||||
"stream",
|
||||
"rustls-tls",
|
||||
"http2",
|
||||
"json",
|
||||
], default-features = false }
|
||||
assertor = "0.0.4"
|
||||
log = "0.4.22"
|
||||
env_logger = "0.11.5"
|
||||
derive-new = "0.7.0"
|
||||
async-trait = "0.1.82"
|
||||
tokio = { version = "1.40.0", features = ["io-std", "fs", "macros", "rt-multi-thread"] }
|
||||
cidr = "0.2.3"
|
||||
russh = "0.45.0"
|
||||
russh-keys = "0.45.0"
|
||||
rand = "0.8.5"
|
||||
url = "2.5.4"
|
||||
kube = "0.98.0"
|
||||
k8s-openapi = { version = "0.24.0", features = ["v1_30"] }
|
||||
serde_yaml = "0.9.34"
|
||||
serde-value = "0.7.0"
|
||||
http = "1.2.0"
|
||||
inquire = "0.7.5"
|
||||
convert_case = "0.8.0"
|
||||
|
||||
[workspace.dependencies.uuid]
|
||||
version = "1.11.0"
|
||||
features = [
|
||||
"v4", # Lets you generate random UUIDs
|
||||
"fast-rng", # Use a faster (but still sufficiently random) RNG
|
||||
"macro-diagnostics", # Enable better diagnostics for compile-time UUIDs
|
||||
]
|
||||
|
||||
26
Dockerfile
@@ -1,26 +0,0 @@
|
||||
FROM docker.io/rust:1.89.0 AS build
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
COPY . .
|
||||
|
||||
RUN cargo build --release --bin harmony_composer
|
||||
|
||||
FROM docker.io/rust:1.89.0
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
RUN rustup target add x86_64-pc-windows-gnu
|
||||
RUN rustup target add x86_64-unknown-linux-gnu
|
||||
RUN rustup component add rustfmt
|
||||
RUN rustup component add clippy
|
||||
|
||||
RUN apt update
|
||||
|
||||
# TODO: Consider adding more supported targets
|
||||
# nodejs for checkout action, docker for building containers, mingw for cross-compiling for windows
|
||||
RUN apt install -y nodejs docker.io mingw-w64
|
||||
|
||||
COPY --from=build /app/target/release/harmony_composer .
|
||||
|
||||
ENTRYPOINT ["/app/harmony_composer"]
|
||||
341
README.md
@@ -1,250 +1,171 @@
|
||||
# Harmony
|
||||
# Harmony : Open Infrastructure Orchestration
|
||||
|
||||
**Infrastructure orchestration that treats your platform like first-class code.**
|
||||
## Quick demo
|
||||
|
||||
Harmony is an open-source framework that brings the rigor of software engineering to infrastructure management. Write Rust code to define what you want, and Harmony handles the rest — from local development to production clusters.
|
||||
`cargo run -p example-tui`
|
||||
|
||||
_By [NationTech](https://nationtech.io)_
|
||||
This will launch Harmony's minimalist terminal ui which embeds a few demo scores.
|
||||
|
||||
[](https://git.nationtech.io/NationTech/harmony)
|
||||
[](LICENSE)
|
||||
Usage instructions will be displayed at the bottom of the TUI.
|
||||
|
||||
---
|
||||
`cargo run --bin example-cli -- --help`
|
||||
|
||||
## The Problem Harmony Solves
|
||||
This is the harmony CLI, a minimal implementation
|
||||
|
||||
Modern infrastructure is messy. Your Kubernetes cluster needs monitoring. Your bare-metal servers need provisioning. Your applications need deployments. Each comes with its own tooling, its own configuration format, and its own failure modes.
|
||||
The current help text:
|
||||
|
||||
**What if you could describe your entire platform in one consistent language?**
|
||||
````
|
||||
Usage: example-cli [OPTIONS]
|
||||
|
||||
That's Harmony. It unifies project scaffolding, infrastructure provisioning, application deployment, and day-2 operations into a single strongly-typed Rust codebase.
|
||||
Options:
|
||||
-y, --yes Run score(s) or not
|
||||
-f, --filter <FILTER> Filter query
|
||||
-i, --interactive Run interactive TUI or not
|
||||
-a, --all Run all or nth, defaults to all
|
||||
-n, --number <NUMBER> Run nth matching, zero indexed [default: 0]
|
||||
-l, --list list scores, will also be affected by run filter
|
||||
-h, --help Print help
|
||||
-V, --version Print version```
|
||||
|
||||
---
|
||||
## Core architecture
|
||||
|
||||
## Three Principles That Make the Difference
|
||||

|
||||
````
|
||||
## Supporting a new field in OPNSense `config.xml`
|
||||
|
||||
| Principle | What It Means |
|
||||
|-----------|---------------|
|
||||
| **Infrastructure as Resilient Code** | Stop fighting with YAML and bash. Write type-safe Rust that you can test, version, and refactor like any other code. |
|
||||
| **Prove It Works Before You Deploy** | Harmony verifies at _compile time_ that your application can actually run on your target infrastructure. No more "the config looks right but it doesn't work" surprises. |
|
||||
| **One Unified Model** | Software and infrastructure are one system. Deploy from laptop to production cluster without switching contexts or tools. |
|
||||
Two steps:
|
||||
- Supporting the field in `opnsense-config-xml`
|
||||
- Enabling Harmony to control the field
|
||||
|
||||
---
|
||||
We'll use the `filename` field in the `dhcpcd` section of the file as an example.
|
||||
|
||||
## How It Works: The Core Concepts
|
||||
### Supporting the field
|
||||
|
||||
Harmony is built around three concepts that work together:
|
||||
As type checking if enforced, every field from `config.xml` must be known by the code. Each subsection of `config.xml` has its `.rs` file. For the `dhcpcd` section, we'll modify `opnsense-config-xml/src/data/dhcpd.rs`.
|
||||
|
||||
### Score — "What You Want"
|
||||
When a new field appears in the xml file, an error like this will be thrown and Harmony will panic :
|
||||
```
|
||||
Running `/home/stremblay/nt/dir/harmony/target/debug/example-nanodc`
|
||||
Found unauthorized element filename
|
||||
thread 'main' panicked at opnsense-config-xml/src/data/opnsense.rs:54:14:
|
||||
OPNSense received invalid string, should be full XML: ()
|
||||
|
||||
A `Score` is a declarative description of desired state. Think of it as a "recipe" that says _what_ you want without specifying _how_ to get there.
|
||||
|
||||
```rust
|
||||
// "I want a PostgreSQL cluster running with default settings"
|
||||
let postgres = PostgreSQLScore {
|
||||
config: PostgreSQLConfig {
|
||||
cluster_name: "harmony-postgres-example".to_string(),
|
||||
namespace: "harmony-postgres-example".to_string(),
|
||||
..Default::default()
|
||||
},
|
||||
};
|
||||
```
|
||||
|
||||
### Topology — "Where It Goes"
|
||||
|
||||
A `Topology` represents your infrastructure environment and its capabilities. It answers the question: "What can this environment actually do?"
|
||||
|
||||
```rust
|
||||
// Deploy to a local K3D cluster, or any Kubernetes cluster via environment variables
|
||||
K8sAnywhereTopology::from_env()
|
||||
Define the missing field (`filename`) in the `DhcpInterface` struct of `opnsense-config-xml/src/data/dhcpd.rs`:
|
||||
```
|
||||
pub struct DhcpInterface {
|
||||
...
|
||||
pub filename: Option<String>,
|
||||
```
|
||||
|
||||
### Interpret — "How It Happens"
|
||||
Harmony should now be fixed, build and run.
|
||||
|
||||
An `Interpret` is the execution logic that connects your `Score` to your `Topology`. It translates "what you want" into "what the infrastructure does."
|
||||
### Controlling the field
|
||||
|
||||
**The Compile-Time Check:** Before your code ever runs, Harmony verifies that your `Score` is compatible with your `Topology`. If your application needs a feature your infrastructure doesn't provide, you get a compile error — not a runtime failure.
|
||||
|
||||
---
|
||||
|
||||
## What You Can Deploy
|
||||
|
||||
Harmony ships with ready-made Scores for:
|
||||
|
||||
**Data Services**
|
||||
- PostgreSQL clusters (via CloudNativePG operator)
|
||||
- Multi-site PostgreSQL with failover
|
||||
|
||||
**Kubernetes**
|
||||
- Namespaces, Deployments, Ingress
|
||||
- Helm charts
|
||||
- cert-manager for TLS
|
||||
- Monitoring (Prometheus, alerting, ntfy)
|
||||
|
||||
**Bare Metal / Infrastructure**
|
||||
- OKD clusters from scratch
|
||||
- OPNsense firewalls
|
||||
- Network services (DNS, DHCP, TFTP)
|
||||
- Brocade switch configuration
|
||||
|
||||
**And more:** Application deployment, tenant management, load balancing, and more.
|
||||
|
||||
---
|
||||
|
||||
## Quick Start: Deploy a PostgreSQL Cluster
|
||||
|
||||
This example provisions a local Kubernetes cluster (K3D) and deploys a PostgreSQL cluster on it — no external infrastructure required.
|
||||
|
||||
```rust
|
||||
use harmony::{
|
||||
inventory::Inventory,
|
||||
modules::postgresql::{PostgreSQLScore, capability::PostgreSQLConfig},
|
||||
topology::K8sAnywhereTopology,
|
||||
};
|
||||
|
||||
#[tokio::main]
|
||||
async fn main() {
|
||||
let postgres = PostgreSQLScore {
|
||||
config: PostgreSQLConfig {
|
||||
cluster_name: "harmony-postgres-example".to_string(),
|
||||
namespace: "harmony-postgres-example".to_string(),
|
||||
..Default::default()
|
||||
},
|
||||
};
|
||||
|
||||
harmony_cli::run(
|
||||
Inventory::autoload(),
|
||||
K8sAnywhereTopology::from_env(),
|
||||
vec![Box::new(postgres)],
|
||||
None,
|
||||
)
|
||||
.await
|
||||
.unwrap();
|
||||
}
|
||||
Define the `xml field setter` in `opnsense-config/src/modules/dhcpd.rs`.
|
||||
```
|
||||
impl<'a> DhcpConfig<'a> {
|
||||
...
|
||||
pub fn set_filename(&mut self, filename: &str) {
|
||||
self.enable_netboot();
|
||||
self.get_lan_dhcpd().filename = Some(filename.to_string());
|
||||
}
|
||||
...
|
||||
```
|
||||
|
||||
### What this actually does
|
||||
|
||||
When you compile and run this program:
|
||||
|
||||
1. **Compiles** the Harmony Score into an executable
|
||||
2. **Connects** to `K8sAnywhereTopology` — which auto-provisions a local K3D cluster if none exists
|
||||
3. **Installs** the CloudNativePG operator into the cluster (one-time setup)
|
||||
4. **Creates** a PostgreSQL cluster with 1 instance and 1 GiB of storage
|
||||
5. **Exposes** the PostgreSQL instance as a Kubernetes Service
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- [Rust](https://rust-lang.org/tools/install) (edition 2024)
|
||||
- [Docker](https://docs.docker.com/get-docker/) (for the local K3D cluster)
|
||||
- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) (optional, for inspecting the cluster)
|
||||
|
||||
### Run it
|
||||
|
||||
```bash
|
||||
# Clone the repository
|
||||
git clone https://git.nationtech.io/nationtech/harmony
|
||||
cd harmony
|
||||
|
||||
# Build the project
|
||||
cargo build --release
|
||||
|
||||
# Run the example
|
||||
cargo run -p example-postgresql
|
||||
Define the `value setter` in the `DhcpServer trait` in `domain/topology/network.rs`
|
||||
```
|
||||
#[async_trait]
|
||||
pub trait DhcpServer: Send + Sync {
|
||||
...
|
||||
async fn set_filename(&self, filename: &str) -> Result<(), ExecutorError>;
|
||||
...
|
||||
```
|
||||
|
||||
Harmony will print its progress as it sets up the cluster and deploys PostgreSQL. When complete, you can inspect the deployment:
|
||||
Implement the `value setter` in each `DhcpServer` implementation.
|
||||
`infra/opnsense/dhcp.rs`:
|
||||
```
|
||||
#[async_trait]
|
||||
impl DhcpServer for OPNSenseFirewall {
|
||||
...
|
||||
async fn set_filename(&self, filename: &str) -> Result<(), ExecutorError> {
|
||||
{
|
||||
let mut writable_opnsense = self.opnsense_config.write().await;
|
||||
writable_opnsense.dhcp().set_filename(filename);
|
||||
debug!("OPNsense dhcp server set filename {filename}");
|
||||
}
|
||||
|
||||
```bash
|
||||
kubectl get pods -n harmony-postgres-example
|
||||
kubectl get secret -n harmony-postgres-example harmony-postgres-example-db-user -o jsonpath='{.data.password}' | base64 -d
|
||||
Ok(())
|
||||
}
|
||||
...
|
||||
```
|
||||
|
||||
To connect to the database, forward the port:
|
||||
```bash
|
||||
kubectl port-forward -n harmony-postgres-example svc/harmony-postgres-example-rw 5432:5432
|
||||
psql -h localhost -p 5432 -U postgres
|
||||
`domain/topology/ha_cluster.rs`
|
||||
```
|
||||
#[async_trait]
|
||||
impl DhcpServer for DummyInfra {
|
||||
...
|
||||
async fn set_filename(&self, _filename: &str) -> Result<(), ExecutorError> {
|
||||
unimplemented!("{}", UNIMPLEMENTED_DUMMY_INFRA)
|
||||
}
|
||||
...
|
||||
```
|
||||
|
||||
To clean up, delete the K3D cluster:
|
||||
```bash
|
||||
k3d cluster delete harmony-postgres-example
|
||||
Add the new field to the DhcpScore in `modules/dhcp.rs`
|
||||
```
|
||||
pub struct DhcpScore {
|
||||
...
|
||||
pub filename: Option<String>,
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Environment Variables
|
||||
|
||||
`K8sAnywhereTopology::from_env()` reads the following environment variables to determine where and how to connect:
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `KUBECONFIG` | `~/.kube/config` | Path to your kubeconfig file |
|
||||
| `HARMONY_AUTOINSTALL` | `true` | Auto-provision a local K3D cluster if none found |
|
||||
| `HARMONY_USE_LOCAL_K3D` | `true` | Always prefer local K3D over remote clusters |
|
||||
| `HARMONY_PROFILE` | `dev` | Deployment profile: `dev`, `staging`, or `prod` |
|
||||
| `HARMONY_K8S_CONTEXT` | _none_ | Use a specific kubeconfig context |
|
||||
| `HARMONY_PUBLIC_DOMAIN` | _none_ | Public domain for ingress endpoints |
|
||||
|
||||
To connect to an existing Kubernetes cluster instead of provisioning K3D:
|
||||
|
||||
```bash
|
||||
# Point to your kubeconfig
|
||||
export KUBECONFIG=/path/to/your/kubeconfig
|
||||
export HARMONY_USE_LOCAL_K3D=false
|
||||
export HARMONY_AUTOINSTALL=false
|
||||
|
||||
# Then run
|
||||
cargo run -p example-postgresql
|
||||
Define it in its implementation in `modules/okd/dhcp.rs`
|
||||
```
|
||||
impl OKDDhcpScore {
|
||||
...
|
||||
Self {
|
||||
dhcp_score: DhcpScore {
|
||||
...
|
||||
filename: Some("undionly.kpxe".to_string()),
|
||||
```
|
||||
|
||||
---
|
||||
Define it in its implementation in `modules/okd/bootstrap_dhcp.rs`
|
||||
```
|
||||
impl OKDDhcpScore {
|
||||
...
|
||||
Self {
|
||||
dhcp_score: DhcpScore::new(
|
||||
...
|
||||
Some("undionly.kpxe".to_string()),
|
||||
```
|
||||
|
||||
## Documentation
|
||||
Update the interpret (function called by the `execute` fn of the interpret) so it now updates the `filename` field value in `modules/dhcp.rs`
|
||||
```
|
||||
impl DhcpInterpret {
|
||||
...
|
||||
let filename_outcome = match &self.score.filename {
|
||||
Some(filename) => {
|
||||
let dhcp_server = Arc::new(topology.dhcp_server.clone());
|
||||
dhcp_server.set_filename(&filename).await?;
|
||||
Outcome::new(
|
||||
InterpretStatus::SUCCESS,
|
||||
format!("Dhcp Interpret Set filename to {filename}"),
|
||||
)
|
||||
}
|
||||
None => Outcome::noop(),
|
||||
};
|
||||
|
||||
| I want to... | Start here |
|
||||
|--------------|------------|
|
||||
| Understand the core concepts | [Core Concepts](./docs/concepts.md) |
|
||||
| Deploy my first application | [Getting Started Guide](./docs/guides/getting-started.md) |
|
||||
| Explore available components | [Scores Catalog](./docs/catalogs/scores.md) · [Topologies Catalog](./docs/catalogs/topologies.md) |
|
||||
| See a complete bare-metal deployment | [OKD on Bare Metal](./docs/use-cases/okd-on-bare-metal.md) |
|
||||
| Build my own Score or Topology | [Developer Guide](./docs/guides/developer-guide.md) |
|
||||
if next_server_outcome.status == InterpretStatus::NOOP
|
||||
&& boot_filename_outcome.status == InterpretStatus::NOOP
|
||||
&& filename_outcome.status == InterpretStatus::NOOP
|
||||
|
||||
---
|
||||
...
|
||||
|
||||
## Why Rust?
|
||||
|
||||
We chose Rust for the same reason you might: **reliability through type safety**.
|
||||
|
||||
Infrastructure code runs in production. It needs to be correct. Rust's ownership model and type system let us build a framework where:
|
||||
|
||||
- Invalid configurations fail at compile time, not at 3 AM
|
||||
- Refactoring infrastructure is as safe as refactoring application code
|
||||
- The compiler verifies that your platform can actually fulfill your requirements
|
||||
|
||||
See [ADR-001 · Why Rust](./adr/001-rust.md) for our full rationale.
|
||||
|
||||
---
|
||||
|
||||
## Architecture Decisions
|
||||
|
||||
Harmony's design is documented through Architecture Decision Records (ADRs):
|
||||
|
||||
- [ADR-001 · Why Rust](./adr/001-rust.md)
|
||||
- [ADR-003 · Infrastructure Abstractions](./adr/003-infrastructure-abstractions.md)
|
||||
- [ADR-006 · Secret Management](./adr/006-secret-management.md)
|
||||
- [ADR-011 · Multi-Tenant Cluster](./adr/011-multi-tenant-cluster.md)
|
||||
|
||||
---
|
||||
|
||||
## License
|
||||
|
||||
Harmony is released under the **GNU AGPL v3**.
|
||||
|
||||
> We choose a strong copyleft license to ensure the project—and every improvement to it—remains open and benefits the entire community.
|
||||
|
||||
See [LICENSE](LICENSE) for the full text.
|
||||
|
||||
---
|
||||
|
||||
_Made with ❤️ & 🦀 by NationTech and the Harmony community_
|
||||
Ok(Outcome::new(
|
||||
InterpretStatus::SUCCESS,
|
||||
format!(
|
||||
"Dhcp Interpret Set next boot to [{:?}], boot_filename to [{:?}], filename to [{:?}]",
|
||||
self.score.boot_filename, self.score.boot_filename, self.score.filename
|
||||
)
|
||||
...
|
||||
```
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# Architecture Decision Record: \<Title\>
|
||||
|
||||
Initial Author: \<Name\>
|
||||
Name: \<Name\>
|
||||
|
||||
Initial Date: \<Date\>
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
|
||||
## Status
|
||||
|
||||
Rejected : See ADR 020 ./020-interactive-configuration-crate.md
|
||||
Proposed
|
||||
|
||||
### TODO [#3](https://git.nationtech.io/NationTech/harmony/issues/3):
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# Architecture Decision Record: Helm and Kustomize Handling
|
||||
|
||||
Initial Author: Taha Hawa
|
||||
Name: Taha Hawa
|
||||
|
||||
Initial Date: 2025-04-15
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
# Architecture Decision Record: Monitoring and Alerting
|
||||
|
||||
Initial Author : Willem Rolleman
|
||||
Date : April 28 2025
|
||||
Proposed by: Willem Rolleman
|
||||
Date: April 28 2025
|
||||
|
||||
## Status
|
||||
|
||||
@@ -1,124 +0,0 @@
|
||||
# ADR‑014 · Port Allocation Policy
|
||||
|
||||
**Date:** 2025‑11‑03
|
||||
**Status:** Proposed
|
||||
**Authors:** Infrastructure Team / NationTech Harmony Core
|
||||
|
||||
---
|
||||
|
||||
## 1 · Context
|
||||
|
||||
Harmony orchestrates heterogeneous clusters — from developer laptops to multi‑tenant production environments — through a single, typed Rust codebase.
|
||||
To enable deterministic, automatable service exposure across clusters, we need a **stable, conflict‑free range of TCP/UDP ports** reserved for Harmony‑managed workloads and host‑network services.
|
||||
|
||||
Today, various manual operations (e.g. Ceph RGW on host network) occasionally use ad‑hoc local ports such as `80` or `8080`. This leads to unpredictable overlaps with NodePorts, ephemeral ports, or other daemons.
|
||||
|
||||
Since Harmony aims to unify *manual* and *automated* infrastructure under a single declarative model, we formalize a **reserved port block** that should be used consistently — even for out‑of‑band operations — so everything remains alignment‑ready for future orchestration.
|
||||
|
||||
---
|
||||
|
||||
## 2 · Problem Statement
|
||||
|
||||
We must select a contiguous range of approximately 1000 ports that:
|
||||
|
||||
1. Avoids conflict with:
|
||||
- System / well‑known ports (`0–1023`)
|
||||
- Common registered application ports (`1024–49151`)
|
||||
- Kubernetes NodePort default range (`30 000–32 767`)
|
||||
- Ephemeral port ranges (typically `32 768–60 999` on Linux)
|
||||
2. Feels deterministic and maintainable for long‑term automation.
|
||||
3. Can be mirrored across multiple clusters and bare‑metal hosts.
|
||||
|
||||
---
|
||||
|
||||
## 3 · Decision
|
||||
|
||||
We designate:
|
||||
|
||||
> **Harmony Reserved Port Range:** `25 000 – 25 999`
|
||||
|
||||
### Rationale
|
||||
|
||||
| Criterion | Reasoning |
|
||||
|------------|------------|
|
||||
| **Safety** | Lies below the default ephemeral port start (`32 768`) and above NodePort (`30 000–32 767`). |
|
||||
| **Conflicts** | Very few historical IANA registrations in this span, practically unused in modern systems. |
|
||||
| **Predictability** | Easy to recognize and communicate (`25k‑block` for Harmony). |
|
||||
| **Ergonomics** | Mirrors production service conventions (`25080` for HTTP‑like, `25443` for HTTPS‑like, etc.). |
|
||||
| **Cross‑cluster consistency** | Portable across OKD, K3D, bare‑metal – allows static firewall rules. |
|
||||
|
||||
This range is *registered* (per IANA definition 1024–49151) but not *occupied*. Choosing such a mid‑registered band is intentional: it guarantees good inter‑platform compatibility without interfering with OS‑managed ephemeral allocations.
|
||||
|
||||
---
|
||||
|
||||
## 4 · Integration in Harmony
|
||||
|
||||
Harmony’s topology and orchestration layers should expose a built‑in **`PortAllocator`** resource:
|
||||
|
||||
```rust
|
||||
let orchestrator_ports = harmony::network::PortAllocator::new(25_000, 25_999);
|
||||
let rados_gateway_port = orchestrator_ports.allocate("ceph-rgw")?;
|
||||
```
|
||||
|
||||
In TOML configuration:
|
||||
|
||||
```toml
|
||||
[harmony.network.defaults]
|
||||
orchestrator_port_range = "25000-25999"
|
||||
```
|
||||
|
||||
All `Score` / `Interpret` modules that deploy host‑network or externally visible services should use this allocator to pick ports deterministically.
|
||||
|
||||
---
|
||||
|
||||
## 5 · Operational Guidance
|
||||
|
||||
- **Manual Operations:** Even manually configured host‑network services on clusters we manage (e.g. Ceph RGW, ntfy, reverse proxy tests) should adopt ports within `25 000–25 999`.
|
||||
- **Automation Alignment:** Any such ports are considered *Harmony‑manageable* and may be introspected later by the orchestrator.
|
||||
- **Firewall Rules:** Cluster and site firewalls may pre‑whitelist this block as “Harmony Infra Ports”.
|
||||
|
||||
---
|
||||
|
||||
## 6 · Initial Allocation Table
|
||||
|
||||
| Service / Component | Port | Notes |
|
||||
|----------------------|------|-------|
|
||||
| **Ceph RADOS Gateway** | **25080** | Replacement for `:80`; public HTTP front‑end via RGW (host network). |
|
||||
| **Internal HTTPS Services** | 25443 | Equivalent of `:443` for secure endpoints. |
|
||||
| **Prometheus / Metrics Endpoints** | 25090–25099 | Internal observability. |
|
||||
| **Harmony Control Plane (gRPC)** | 25500–25519 | Inter‑process messaging, API, TUI backend. |
|
||||
| **Reserved for Future Scores** | 25550–25599 | Dedicated expansion slots. |
|
||||
|
||||
These values shall be managed centrally in a `harmony.network.ports` registry file (YAML / TOML) within the repository.
|
||||
|
||||
---
|
||||
|
||||
## 7 · Consequences
|
||||
|
||||
### Positive
|
||||
- Eliminates accidental collisions with NodePorts and ephemeral connections.
|
||||
- Provides a memorable, human‑readable convention suitable for documentation and automation.
|
||||
- Enables Harmony to treat host‑network configuration as first‑class code.
|
||||
|
||||
### Negative / Trade‑offs
|
||||
- Occupies part of the IANA “registered” band (must remain consistent across environments).
|
||||
- Requires operators to adapt legacy services away from lower ports (`80`, `443`) for compliance with this policy.
|
||||
|
||||
---
|
||||
|
||||
## 8 · Future Work
|
||||
- Extend Harmony CLI to display currently allocated ports and detect conflicts.
|
||||
- Expose a compile‑time capability check ensuring clusters in a topology do not re‑use the same port twice.
|
||||
- Consider parallel UDP range reservation when we start exposing datagram services.
|
||||
|
||||
---
|
||||
|
||||
## 9 · References
|
||||
- [IANA Service Name and Transport Protocol Port Number Registry](https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml)
|
||||
- [Kubernetes NodePort default range documentation](https://kubernetes.io/docs/concepts/services-networking/service/)
|
||||
- Harmony ADR‑001 · Why Rust
|
||||
- Harmony ADR‑003 · Infrastructure Abstractions
|
||||
|
||||
---
|
||||
|
||||
*Made with 🦀 by the NationTech Harmony team — ensuring your infrastructure behaves like code.*
|
||||
@@ -1,107 +0,0 @@
|
||||
### Context & Use Case
|
||||
We are implementing a High Availability (HA) Failover Strategy for decentralized "Micro Data Centers." The core challenge is managing stateful workloads (PostgreSQL) over unreliable networks.
|
||||
|
||||
We solve this using a **Local Fencing First** approach, backed by **NATS JetStream Strict Ordering** for the final promotion authority.
|
||||
|
||||
In CAP theorem terms, we are developing a CP system, intentionally sacrificing availability. In practical terms, we expect an average of two primary outages per year, with a failover delay of around 2 minutes. This translates to an uptime of over five nines. To be precise, 2 outages * 2 minutes = 4 minutes per year = 99.99924% uptime.
|
||||
|
||||
### The Algorithm: Local Fencing & Remote Promotion
|
||||
|
||||
The safety (data consistency) of the system relies on the time gap between the **Primary giving up (Fencing)** and the **Replica taking over (Promotion)**.
|
||||
|
||||
To avoid clock skew issues between agents and datastore (nats), all timestamps comparisons will be done using jetstream metadata. I.E. a harmony agent will never use `Instant::now()` to get a timestamp, it will use `my_last_heartbeat.metadata.timestamp` (conceptually).
|
||||
|
||||
#### 1. Configuration
|
||||
* `heartbeat_timeout` (e.g., 1s): Max time allowed for a NATS write/DB check.
|
||||
* `failure_threshold` (e.g., 2): Consecutive failures before self-fencing.
|
||||
* `failover_timeout` (e.g., 5s): Time since last NATS update of Primary heartbeat before Replica promotes.
|
||||
* This timeout must be carefully configured to allow enough time for the primary to fence itself (after `heartbeat_timeout * failure_threshold`) BEFORE the replica gets promoted to avoid a split brain with two primaries.
|
||||
* Implementing this will rely on the actual deployment configuration. For example, a CNPG based PostgreSQL cluster might require a longer gap (such as 30s) than other technologies.
|
||||
* Expires when `replica_heartbeat.metadata.timestamp - primary_heartbeat.metadata.timestamp > failover_timeout`
|
||||
|
||||
#### 2. The Primary (Self-Preservation)
|
||||
|
||||
The Primary is aggressive about killing itself.
|
||||
|
||||
* It attempts a heartbeat.
|
||||
* If the network latency > `heartbeat_timeout`, the attempt is **cancelled locally** because the heartbeat did not make it back in time.
|
||||
* This counts as a failure and increments the `consecutive_failures` counter.
|
||||
* If `consecutive_failures` hit the threshold, **FENCING occurs immediately**. The database is stopped.
|
||||
|
||||
This means that the Primary will fence itself after `heartbeat_timeout * failure_threshold`.
|
||||
|
||||
#### 3. The Replica (The Watchdog)
|
||||
|
||||
The Replica is patient.
|
||||
|
||||
* It watches the NATS stream to measure if `replica_heartbeat.metadata.timestamp - primary_heartbeat.metadata.timestamp > failover_timeout`
|
||||
* It only attempts promotion if the `failover_timeout` (5s) has passed.
|
||||
* **Crucial:** Careful configuration of the failover_timeout is required. This is the only way to avoid a split brain in case of a network partition where the Primary cannot write its heartbeats in time anymore.
|
||||
* In short, `failover_timeout` should be tuned to be `heartbeat_timeout * failure_threshold + safety_margin`. This `safety_margin` will vary by use case. For example, a CNPG cluster may need 30 seconds to demote a Primary to Replica when fencing is triggered, so `safety_margin` should be at least 30s in that setup.
|
||||
|
||||
Since we forcibly fail timeouts after `heartbeat_timeout`, we are guaranteed that the primary will have **started** the fencing process after `heartbeat_timeout * failure_threshold`.
|
||||
|
||||
But, in a network split scenario where the failed primary is still accessible by clients but cannot write its heartbeat successfully, there is no way to know if the demotion has actually **completed**.
|
||||
|
||||
For example, in a CNPG cluster, the failed Primary agent will attempt to change the CNPG cluster state to read-only. But if anything fails after that attempt (permission error, k8s api failure, CNPG bug, etc) it is possible that the PostgreSQL instance keeps accepting writes.
|
||||
|
||||
While this is not a theoretical failure of the agent's algorithm, this is a practical failure where data corruption occurs.
|
||||
|
||||
This can be fixed by detecting the demotion failure and escalating the fencing procedure aggressiveness. Harmony being an infrastructure orchestrator, it can easily exert radical measures if given the proper credentials, such as forcibly powering off a server, disconnecting its network in the switch configuration, forcibly kill a pod/container/process, etc.
|
||||
|
||||
However, these details are out of scope of this algorithm, as they simply fall under the "fencing procedure".
|
||||
|
||||
The implementation of the fencing procedure itself is not relevant. This algorithm's responsibility stops at calling the fencing procedure in the appropriate situation.
|
||||
|
||||
#### 4. The Demotion Handshake (Return to Normalcy)
|
||||
|
||||
When the original Primary recovers:
|
||||
|
||||
1. It becomes healthy locally but sees `current_primary = Replica`. It waits.
|
||||
2. The Replica (current leader) detects the Original Primary is back (via NATS heartbeats).
|
||||
3. Replica performs a **Clean Demotion**:
|
||||
* Stops DB.
|
||||
* Writes `current_primary = None` to NATS.
|
||||
4. Original Primary sees `current_primary = None` and can launch the promotion procedure.
|
||||
|
||||
Depending on the implementation, the promotion procedure may require a transition phase. Typically, for a PostgreSQL use case the promoting primary will make sure it has caught up on WAL replication before starting to accept writes.
|
||||
|
||||
---
|
||||
|
||||
### Failure Modes & Behavior Analysis
|
||||
|
||||
#### Case 1: Immediate Outage (Power Cut)
|
||||
|
||||
* **Primary:** Dies instantly. Fencing is implicit (machine is off).
|
||||
* **Replica:** Waits for `failover_timeout` (5s). Sees staleness. Promotes self.
|
||||
* **Outcome:** Clean failover after 5s.
|
||||
|
||||
// TODO detail what happens when the primary comes back up. We will likely have to tie PostgreSQL's lifecycle (liveness/readiness probes) with the agent to ensure it does not come back up as primary.
|
||||
|
||||
#### Case 2: High Network Latency on the Primary (The "Split Brain" Trap)
|
||||
|
||||
* **Scenario:** Network latency spikes to 5s on the Primary, still below `heartbeat_timeout` on the Replica.
|
||||
* **T=0 to T=2 (Primary):** Tries to write. Latency (5s) > Timeout (1s). Fails twice.
|
||||
* **T=2 (Primary):** `consecutive_failures` = 2. **Primary Fences Self.** (Service is DOWN).
|
||||
* **T=2 to T=5 (Cluster):** **Read-Only Phase.** No Primary exists.
|
||||
* **T=5 (Replica):** `failover_timeout` reached. Replica promotes self.
|
||||
* **Outcome:** Safe failover. The "Read-Only Gap" (T=2 to T=5) ensures no Split Brain occurred.
|
||||
|
||||
#### Case 3: Replica Network Lag (False Positive)
|
||||
|
||||
* **Scenario:** Replica has high latency, greater than `failover_timeout`; Primary is fine.
|
||||
* **Replica:** Thinks Primary is dead. Tries to promote by setting `cluster_state.current_primary = replica_id`.
|
||||
* **NATS:** Rejects the write because the Primary is still updating the sequence numbers successfully.
|
||||
* **Outcome:** Promotion denied. Primary stays leader.
|
||||
|
||||
#### Case 4: Network Instability (Flapping)
|
||||
|
||||
* **Scenario:** Intermittent packet loss.
|
||||
* **Primary:** Fails 1 heartbeat, succeeds the next. `consecutive_failures` resets.
|
||||
* **Replica:** Sees a slight delay in updates, but never reaches `failover_timeout`.
|
||||
* **Outcome:** No Fencing, No Promotion. System rides out the noise.
|
||||
|
||||
## Contextual notes
|
||||
|
||||
* Clock skew : Tokio relies on monotonic clocks. This means that `tokio::time::sleep(...)` will not be affected by system clock corrections (such as NTP). But monotonic clocks are known to jump forward in some cases such as VM live migrations. This could mean a false timeout of a single heartbeat. If `failure_threshold = 1`, this can mean a false negative on the nodes' health, and a potentially useless demotion.
|
||||
|
||||
@@ -1,107 +0,0 @@
|
||||
### Context & Use Case
|
||||
We are implementing a High Availability (HA) Failover Strategy for decentralized "Micro Data Centers." The core challenge is managing stateful workloads (PostgreSQL) over unreliable networks.
|
||||
|
||||
We solve this using a **Local Fencing First** approach, backed by **NATS JetStream Strict Ordering** for the final promotion authority.
|
||||
|
||||
In CAP theorem terms, we are developing a CP system, intentionally sacrificing availability. In practical terms, we expect an average of two primary outages per year, with a failover delay of around 2 minutes. This translates to an uptime of over five nines. To be precise, 2 outages * 2 minutes = 4 minutes per year = 99.99924% uptime.
|
||||
|
||||
### The Algorithm: Local Fencing & Remote Promotion
|
||||
|
||||
The safety (data consistency) of the system relies on the time gap between the **Primary giving up (Fencing)** and the **Replica taking over (Promotion)**.
|
||||
|
||||
To avoid clock skew issues between agents and datastore (nats), all timestamps comparisons will be done using jetstream metadata. I.E. a harmony agent will never use `Instant::now()` to get a timestamp, it will use `my_last_heartbeat.metadata.timestamp` (conceptually).
|
||||
|
||||
#### 1. Configuration
|
||||
* `heartbeat_timeout` (e.g., 1s): Max time allowed for a NATS write/DB check.
|
||||
* `failure_threshold` (e.g., 2): Consecutive failures before self-fencing.
|
||||
* `failover_timeout` (e.g., 5s): Time since last NATS update of Primary heartbeat before Replica promotes.
|
||||
* This timeout must be carefully configured to allow enough time for the primary to fence itself (after `heartbeat_timeout * failure_threshold`) BEFORE the replica gets promoted to avoid a split brain with two primaries.
|
||||
* Implementing this will rely on the actual deployment configuration. For example, a CNPG based PostgreSQL cluster might require a longer gap (such as 30s) than other technologies.
|
||||
* Expires when `replica_heartbeat.metadata.timestamp - primary_heartbeat.metadata.timestamp > failover_timeout`
|
||||
|
||||
#### 2. The Primary (Self-Preservation)
|
||||
|
||||
The Primary is aggressive about killing itself.
|
||||
|
||||
* It attempts a heartbeat.
|
||||
* If the network latency > `heartbeat_timeout`, the attempt is **cancelled locally** because the heartbeat did not make it back in time.
|
||||
* This counts as a failure and increments the `consecutive_failures` counter.
|
||||
* If `consecutive_failures` hit the threshold, **FENCING occurs immediately**. The database is stopped.
|
||||
|
||||
This means that the Primary will fence itself after `heartbeat_timeout * failure_threshold`.
|
||||
|
||||
#### 3. The Replica (The Watchdog)
|
||||
|
||||
The Replica is patient.
|
||||
|
||||
* It watches the NATS stream to measure if `replica_heartbeat.metadata.timestamp - primary_heartbeat.metadata.timestamp > failover_timeout`
|
||||
* It only attempts promotion if the `failover_timeout` (5s) has passed.
|
||||
* **Crucial:** Careful configuration of the failover_timeout is required. This is the only way to avoid a split brain in case of a network partition where the Primary cannot write its heartbeats in time anymore.
|
||||
* In short, `failover_timeout` should be tuned to be `heartbeat_timeout * failure_threshold + safety_margin`. This `safety_margin` will vary by use case. For example, a CNPG cluster may need 30 seconds to demote a Primary to Replica when fencing is triggered, so `safety_margin` should be at least 30s in that setup.
|
||||
|
||||
Since we forcibly fail timeouts after `heartbeat_timeout`, we are guaranteed that the primary will have **started** the fencing process after `heartbeat_timeout * failure_threshold`.
|
||||
|
||||
But, in a network split scenario where the failed primary is still accessible by clients but cannot write its heartbeat successfully, there is no way to know if the demotion has actually **completed**.
|
||||
|
||||
For example, in a CNPG cluster, the failed Primary agent will attempt to change the CNPG cluster state to read-only. But if anything fails after that attempt (permission error, k8s api failure, CNPG bug, etc) it is possible that the PostgreSQL instance keeps accepting writes.
|
||||
|
||||
While this is not a theoretical failure of the agent's algorithm, this is a practical failure where data corruption occurs.
|
||||
|
||||
This can be fixed by detecting the demotion failure and escalating the fencing procedure aggressiveness. Harmony being an infrastructure orchestrator, it can easily exert radical measures if given the proper credentials, such as forcibly powering off a server, disconnecting its network in the switch configuration, forcibly kill a pod/container/process, etc.
|
||||
|
||||
However, these details are out of scope of this algorithm, as they simply fall under the "fencing procedure".
|
||||
|
||||
The implementation of the fencing procedure itself is not relevant. This algorithm's responsibility stops at calling the fencing procedure in the appropriate situation.
|
||||
|
||||
#### 4. The Demotion Handshake (Return to Normalcy)
|
||||
|
||||
When the original Primary recovers:
|
||||
|
||||
1. It becomes healthy locally but sees `current_primary = Replica`. It waits.
|
||||
2. The Replica (current leader) detects the Original Primary is back (via NATS heartbeats).
|
||||
3. Replica performs a **Clean Demotion**:
|
||||
* Stops DB.
|
||||
* Writes `current_primary = None` to NATS.
|
||||
4. Original Primary sees `current_primary = None` and can launch the promotion procedure.
|
||||
|
||||
Depending on the implementation, the promotion procedure may require a transition phase. Typically, for a PostgreSQL use case the promoting primary will make sure it has caught up on WAL replication before starting to accept writes.
|
||||
|
||||
---
|
||||
|
||||
### Failure Modes & Behavior Analysis
|
||||
|
||||
#### Case 1: Immediate Outage (Power Cut)
|
||||
|
||||
* **Primary:** Dies instantly. Fencing is implicit (machine is off).
|
||||
* **Replica:** Waits for `failover_timeout` (5s). Sees staleness. Promotes self.
|
||||
* **Outcome:** Clean failover after 5s.
|
||||
|
||||
// TODO detail what happens when the primary comes back up. We will likely have to tie PostgreSQL's lifecycle (liveness/readiness probes) with the agent to ensure it does not come back up as primary.
|
||||
|
||||
#### Case 2: High Network Latency on the Primary (The "Split Brain" Trap)
|
||||
|
||||
* **Scenario:** Network latency spikes to 5s on the Primary, still below `heartbeat_timeout` on the Replica.
|
||||
* **T=0 to T=2 (Primary):** Tries to write. Latency (5s) > Timeout (1s). Fails twice.
|
||||
* **T=2 (Primary):** `consecutive_failures` = 2. **Primary Fences Self.** (Service is DOWN).
|
||||
* **T=2 to T=5 (Cluster):** **Read-Only Phase.** No Primary exists.
|
||||
* **T=5 (Replica):** `failover_timeout` reached. Replica promotes self.
|
||||
* **Outcome:** Safe failover. The "Read-Only Gap" (T=2 to T=5) ensures no Split Brain occurred.
|
||||
|
||||
#### Case 3: Replica Network Lag (False Positive)
|
||||
|
||||
* **Scenario:** Replica has high latency, greater than `failover_timeout`; Primary is fine.
|
||||
* **Replica:** Thinks Primary is dead. Tries to promote by setting `cluster_state.current_primary = replica_id`.
|
||||
* **NATS:** Rejects the write because the Primary is still updating the sequence numbers successfully.
|
||||
* **Outcome:** Promotion denied. Primary stays leader.
|
||||
|
||||
#### Case 4: Network Instability (Flapping)
|
||||
|
||||
* **Scenario:** Intermittent packet loss.
|
||||
* **Primary:** Fails 1 heartbeat, succeeds the next. `consecutive_failures` resets.
|
||||
* **Replica:** Sees a slight delay in updates, but never reaches `failover_timeout`.
|
||||
* **Outcome:** No Fencing, No Promotion. System rides out the noise.
|
||||
|
||||
## Contextual notes
|
||||
|
||||
* Clock skew : Tokio relies on monotonic clocks. This means that `tokio::time::sleep(...)` will not be affected by system clock corrections (such as NTP). But monotonic clocks are known to jump forward in some cases such as VM live migrations. This could mean a false timeout of a single heartbeat. If `failure_threshold = 1`, this can mean a false negative on the nodes' health, and a potentially useless demotion.
|
||||
|
||||
@@ -1,84 +0,0 @@
|
||||
# ADR 017: Multi-Cluster Federation and Granular Authorization
|
||||
|
||||
* **Status:** Accepted
|
||||
* **Date:** 2025-12-20
|
||||
* **Context:** ADR-016 Harmony Agent and Global Mesh For Decentralized Workload Management
|
||||
|
||||
## Context and Problem Statement
|
||||
|
||||
As Harmony expands to manage workloads across multiple independent clusters, we face two specific requirements regarding network topology and security:
|
||||
|
||||
1. **Granular Authorization:** We need the ability to authorize a specific node (Leaf Node) from the decentralized mesh to access only a specific subset of topics within a cluster. For example, a collaborator connecting to our Public Cloud should not have visibility into other collaborator's data or our internal system events. Same goes the other way, when we connect to a collaborator cluster, we must not gain access to their internal data.
|
||||
2. **Hybrid/Failover Topologies:** Users need to run their own private meshes (e.g., 5 interconnected office locations) with full internal trust, while simultaneously maintaining a connection to the Harmony Public Cloud for failover or bursting. In this scenario, the Public Cloud must not have access to the private mesh's internal data, and the private mesh should only access specific "public" topics on the cloud.
|
||||
|
||||
We need to decide how the Harmony Agent and the underlying NATS infrastructure will handle these "Split-Horizon" and "Zero-Trust" scenarios without complicating the application logic within the Agent itself.
|
||||
|
||||
## Decision Drivers
|
||||
|
||||
* **Security:** Default-deny posture is required for cross-domain connections.
|
||||
* **Isolation:** Private on-premise clusters must remain private; data leakage to the public cloud is unacceptable.
|
||||
* **Simplicity:** The Harmony Agent code should not be burdened with managing multiple complex TCP connections or application-layer routing logic.
|
||||
* **Efficiency:** Failover traffic should only occur when necessary; network chatter (gossip) from the Public Cloud should not propagate to private nodes.
|
||||
|
||||
## Considered Options
|
||||
|
||||
1. **Application-Layer Multi-Homing:** The Harmony Agent maintains two distinct active connections (one to Local, one to Cloud) and handles routing logic in Go code.
|
||||
2. **VPN/VPC Peering:** Connect private networks to the public cloud at the network layer (IPSec/WireGuard).
|
||||
3. **NATS Account Federation & Leaf Nodes:** Utilize NATS native multi-tenancy (Accounts) and topology bridging (Leaf Nodes) to handle routing and permissions at the protocol layer.
|
||||
|
||||
## Decision Outcome
|
||||
|
||||
We will adopt **Option 3: NATS Account Federation & Leaf Nodes**.
|
||||
|
||||
We will leverage NATS built-in "Accounts" to create logical isolation boundaries and "Service Imports/Exports" to bridge specific data streams between private and public clusters.
|
||||
|
||||
### 1. Authorization Strategy (Subject-Based Permissions)
|
||||
|
||||
We will not implement custom authorization logic inside the Harmony Agent for topic access. Instead, we will enforce **Subject-Based Permissions** at the NATS server level.
|
||||
|
||||
* **Mechanism:** When a Leaf Node (e.g., a Customer Site) connects to the Public Cloud, it must authenticate using a scoped credential (JWT/NKey).
|
||||
* **Policy:** This credential will carry a permission policy that strictly defines:
|
||||
* `PUB`: Allowed subjects (e.g., `harmony.public.requests`).
|
||||
* `SUB`: Allowed subjects (e.g., `harmony.public.responses`).
|
||||
* `DENY`: Implicitly everything else (including `_SYS.>`, other customer streams).
|
||||
* **Enforcement:** The NATS server rejects unauthorized subscriptions or publications at the protocol level before they reach the application layer.
|
||||
|
||||
### 2. Hybrid Topology Strategy (Federation)
|
||||
|
||||
We will use **NATS Account Federation** to solve the "Hybrid Cloud" requirement. We will treat the Private Mesh and Public Cloud as separate "Accounts."
|
||||
|
||||
* **The Private Mesh (Business Account):**
|
||||
* Consists of the customer's internal nodes (e.g., 5 locations).
|
||||
* Nodes share full trust and visibility of `local.*` subjects.
|
||||
* Configured as a **Leaf Node** connection to the Public Cloud.
|
||||
* **The Public Cloud (SaaS Account):**
|
||||
* **Exports** a specific stream/service (e.g., `public.compute.rentals`).
|
||||
* Does *not* join the customer's cluster mesh. It acts as a hub.
|
||||
* **The Bridge (Import):**
|
||||
* The Private Mesh **Imports** the public stream.
|
||||
* The Private Mesh maps this import to a local subject, e.g., `external.cloud`.
|
||||
|
||||
### 3. Failover Logic
|
||||
|
||||
The Harmony Agent will implement failover by selecting the target subject, relying on NATS to route the message physically.
|
||||
|
||||
1. **Primary Attempt:** Agent publishes to `local.compute`. NATS routes this only to the 5 internal sites.
|
||||
2. **Failover Condition:** If internal capacity is exhausted (determined by Agent heuristics), the Agent publishes to `external.cloud`.
|
||||
3. **Routing:** NATS transparently routes `external.cloud` messages over the Leaf Node connection to the Public Cloud.
|
||||
4. **Security:** Because the Public Cloud does not import `local.compute`, it never sees internal traffic.
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
* **Zero-Trust by Design:** The Public Cloud cannot spy on the Private Mesh because no streams are exported from Private to Public.
|
||||
* **Reduced Complexity:** The Harmony Agent remains simple; it just publishes to different topics. It does not need to manage connection pools or complex authentication handshakes for multiple clouds.
|
||||
* **Bandwidth Efficiency:** Gossip protocol traffic (cluster topology updates) is contained within the Accounts. The Private Mesh does not receive heartbeat traffic from the massive Public Cloud.
|
||||
|
||||
### Negative
|
||||
* **Configuration Overhead:** Setting up NATS Accounts, Imports, and Exports requires more complex configuration files (server.conf) and understanding of JWT/NKeys compared to a flat mesh.
|
||||
* **Observability:** Tracing a message as it crosses Account boundaries (from Private `external.cloud` to Public `public.compute`) requires centralized monitoring that understands Federation.
|
||||
|
||||
## References
|
||||
* [NATS Documentation: Subject-Based Access Control](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro/subject_access)
|
||||
* [NATS Documentation: Leaf Nodes](https://docs.nats.io/running-a-nats-service/configuration/leafnodes)
|
||||
* [NATS Documentation: Services & Streams (Imports/Exports)](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/accounts)
|
||||
@@ -1,9 +0,0 @@
|
||||
[book]
|
||||
title = "Harmony"
|
||||
description = "Infrastructure orchestration that treats your platform like first-class code"
|
||||
src = "docs"
|
||||
build-dir = "book"
|
||||
authors = ["NationTech"]
|
||||
|
||||
[output.html]
|
||||
mathjax-support = false
|
||||
@@ -1,19 +0,0 @@
|
||||
[package]
|
||||
name = "brocade"
|
||||
edition = "2024"
|
||||
version.workspace = true
|
||||
readme.workspace = true
|
||||
license.workspace = true
|
||||
|
||||
[dependencies]
|
||||
async-trait.workspace = true
|
||||
harmony_types = { path = "../harmony_types" }
|
||||
russh.workspace = true
|
||||
russh-keys.workspace = true
|
||||
tokio.workspace = true
|
||||
log.workspace = true
|
||||
env_logger.workspace = true
|
||||
regex = "1.11.3"
|
||||
harmony_secret = { path = "../harmony_secret" }
|
||||
serde.workspace = true
|
||||
schemars = "0.8"
|
||||
@@ -1,75 +0,0 @@
|
||||
use std::net::{IpAddr, Ipv4Addr};
|
||||
|
||||
use brocade::{BrocadeOptions, ssh};
|
||||
use harmony_secret::{Secret, SecretManager};
|
||||
use harmony_types::switch::PortLocation;
|
||||
use schemars::JsonSchema;
|
||||
use serde::{Deserialize, Serialize};
|
||||
|
||||
#[derive(Secret, Clone, Debug, JsonSchema, Serialize, Deserialize)]
|
||||
struct BrocadeSwitchAuth {
|
||||
username: String,
|
||||
password: String,
|
||||
}
|
||||
|
||||
#[tokio::main]
|
||||
async fn main() {
|
||||
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
|
||||
|
||||
// let ip = IpAddr::V4(Ipv4Addr::new(10, 0, 0, 250)); // old brocade @ ianlet
|
||||
let ip = IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)); // brocade @ sto1
|
||||
// let ip = IpAddr::V4(Ipv4Addr::new(192, 168, 4, 11)); // brocade @ st
|
||||
let switch_addresses = vec![ip];
|
||||
|
||||
let config = SecretManager::get_or_prompt::<BrocadeSwitchAuth>()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
let brocade = brocade::init(
|
||||
&switch_addresses,
|
||||
&config.username,
|
||||
&config.password,
|
||||
&BrocadeOptions {
|
||||
dry_run: true,
|
||||
ssh: ssh::SshOptions {
|
||||
port: 2222,
|
||||
..Default::default()
|
||||
},
|
||||
..Default::default()
|
||||
},
|
||||
)
|
||||
.await
|
||||
.expect("Brocade client failed to connect");
|
||||
|
||||
let entries = brocade.get_stack_topology().await.unwrap();
|
||||
println!("Stack topology: {entries:#?}");
|
||||
|
||||
let entries = brocade.get_interfaces().await.unwrap();
|
||||
println!("Interfaces: {entries:#?}");
|
||||
|
||||
let version = brocade.version().await.unwrap();
|
||||
println!("Version: {version:?}");
|
||||
|
||||
println!("--------------");
|
||||
let mac_adddresses = brocade.get_mac_address_table().await.unwrap();
|
||||
println!("VLAN\tMAC\t\t\tPORT");
|
||||
for mac in mac_adddresses {
|
||||
println!("{}\t{}\t{}", mac.vlan, mac.mac_address, mac.port);
|
||||
}
|
||||
|
||||
println!("--------------");
|
||||
todo!();
|
||||
let channel_name = "1";
|
||||
brocade.clear_port_channel(channel_name).await.unwrap();
|
||||
|
||||
println!("--------------");
|
||||
let channel_id = brocade.find_available_channel_id().await.unwrap();
|
||||
|
||||
println!("--------------");
|
||||
let channel_name = "HARMONY_LAG";
|
||||
let ports = [PortLocation(2, 0, 35)];
|
||||
brocade
|
||||
.create_port_channel(channel_id, channel_name, &ports)
|
||||
.await
|
||||
.unwrap();
|
||||
}
|
||||
@@ -1,228 +0,0 @@
|
||||
use super::BrocadeClient;
|
||||
use crate::{
|
||||
BrocadeInfo, Error, ExecutionMode, InterSwitchLink, InterfaceInfo, MacAddressEntry,
|
||||
PortChannelId, PortOperatingMode, parse_brocade_mac_address, shell::BrocadeShell,
|
||||
};
|
||||
|
||||
use async_trait::async_trait;
|
||||
use harmony_types::switch::{PortDeclaration, PortLocation};
|
||||
use log::{debug, info};
|
||||
use regex::Regex;
|
||||
use std::{collections::HashSet, str::FromStr};
|
||||
|
||||
#[derive(Debug)]
|
||||
pub struct FastIronClient {
|
||||
shell: BrocadeShell,
|
||||
version: BrocadeInfo,
|
||||
}
|
||||
|
||||
impl FastIronClient {
|
||||
pub fn init(mut shell: BrocadeShell, version_info: BrocadeInfo) -> Self {
|
||||
shell.before_all(vec!["skip-page-display".into()]);
|
||||
shell.after_all(vec!["page".into()]);
|
||||
|
||||
Self {
|
||||
shell,
|
||||
version: version_info,
|
||||
}
|
||||
}
|
||||
|
||||
fn parse_mac_entry(&self, line: &str) -> Option<Result<MacAddressEntry, Error>> {
|
||||
debug!("[Brocade] Parsing mac address entry: {line}");
|
||||
let parts: Vec<&str> = line.split_whitespace().collect();
|
||||
if parts.len() < 3 {
|
||||
return None;
|
||||
}
|
||||
|
||||
let (vlan, mac_address, port) = match parts.len() {
|
||||
3 => (
|
||||
u16::from_str(parts[0]).ok()?,
|
||||
parse_brocade_mac_address(parts[1]).ok()?,
|
||||
parts[2].to_string(),
|
||||
),
|
||||
_ => (
|
||||
1,
|
||||
parse_brocade_mac_address(parts[0]).ok()?,
|
||||
parts[1].to_string(),
|
||||
),
|
||||
};
|
||||
|
||||
let port =
|
||||
PortDeclaration::parse(&port).map_err(|e| Error::UnexpectedError(format!("{e}")));
|
||||
|
||||
match port {
|
||||
Ok(p) => Some(Ok(MacAddressEntry {
|
||||
vlan,
|
||||
mac_address,
|
||||
port: p,
|
||||
})),
|
||||
Err(e) => Some(Err(e)),
|
||||
}
|
||||
}
|
||||
|
||||
fn parse_stack_port_entry(&self, line: &str) -> Option<Result<InterSwitchLink, Error>> {
|
||||
debug!("[Brocade] Parsing stack port entry: {line}");
|
||||
let parts: Vec<&str> = line.split_whitespace().collect();
|
||||
if parts.len() < 10 {
|
||||
return None;
|
||||
}
|
||||
|
||||
let local_port = PortLocation::from_str(parts[0]).ok()?;
|
||||
|
||||
Some(Ok(InterSwitchLink {
|
||||
local_port,
|
||||
remote_port: None,
|
||||
}))
|
||||
}
|
||||
|
||||
fn build_port_channel_commands(
|
||||
&self,
|
||||
channel_id: PortChannelId,
|
||||
channel_name: &str,
|
||||
ports: &[PortLocation],
|
||||
) -> Vec<String> {
|
||||
let mut commands = vec![
|
||||
"configure terminal".to_string(),
|
||||
format!("lag {channel_name} static id {channel_id}"),
|
||||
];
|
||||
|
||||
for port in ports {
|
||||
commands.push(format!("ports ethernet {port}"));
|
||||
}
|
||||
|
||||
commands.push(format!("primary-port {}", ports[0]));
|
||||
commands.push("deploy".into());
|
||||
commands.push("exit".into());
|
||||
commands.push("write memory".into());
|
||||
commands.push("exit".into());
|
||||
|
||||
commands
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl BrocadeClient for FastIronClient {
|
||||
async fn version(&self) -> Result<BrocadeInfo, Error> {
|
||||
Ok(self.version.clone())
|
||||
}
|
||||
|
||||
async fn get_mac_address_table(&self) -> Result<Vec<MacAddressEntry>, Error> {
|
||||
info!("[Brocade] Showing MAC address table...");
|
||||
|
||||
let output = self
|
||||
.shell
|
||||
.run_command("show mac-address", ExecutionMode::Regular)
|
||||
.await?;
|
||||
|
||||
output
|
||||
.lines()
|
||||
.skip(2)
|
||||
.filter_map(|line| self.parse_mac_entry(line))
|
||||
.collect()
|
||||
}
|
||||
|
||||
async fn get_stack_topology(&self) -> Result<Vec<InterSwitchLink>, Error> {
|
||||
let output = self
|
||||
.shell
|
||||
.run_command("show interface stack-ports", crate::ExecutionMode::Regular)
|
||||
.await?;
|
||||
|
||||
output
|
||||
.lines()
|
||||
.skip(1)
|
||||
.filter_map(|line| self.parse_stack_port_entry(line))
|
||||
.collect()
|
||||
}
|
||||
|
||||
async fn get_interfaces(&self) -> Result<Vec<InterfaceInfo>, Error> {
|
||||
todo!()
|
||||
}
|
||||
|
||||
async fn configure_interfaces(
|
||||
&self,
|
||||
_interfaces: &Vec<(String, PortOperatingMode)>,
|
||||
) -> Result<(), Error> {
|
||||
todo!()
|
||||
}
|
||||
|
||||
async fn find_available_channel_id(&self) -> Result<PortChannelId, Error> {
|
||||
info!("[Brocade] Finding next available channel id...");
|
||||
|
||||
let output = self
|
||||
.shell
|
||||
.run_command("show lag", ExecutionMode::Regular)
|
||||
.await?;
|
||||
let re = Regex::new(r"=== LAG .* ID\s+(\d+)").expect("Invalid regex");
|
||||
|
||||
let used_ids: HashSet<u8> = output
|
||||
.lines()
|
||||
.filter_map(|line| {
|
||||
re.captures(line)
|
||||
.and_then(|c| c.get(1))
|
||||
.and_then(|id_match| id_match.as_str().parse().ok())
|
||||
})
|
||||
.collect();
|
||||
|
||||
let mut next_id: u8 = 1;
|
||||
loop {
|
||||
if !used_ids.contains(&next_id) {
|
||||
break;
|
||||
}
|
||||
next_id += 1;
|
||||
}
|
||||
|
||||
info!("[Brocade] Found channel id: {next_id}");
|
||||
Ok(next_id)
|
||||
}
|
||||
|
||||
async fn create_port_channel(
|
||||
&self,
|
||||
channel_id: PortChannelId,
|
||||
channel_name: &str,
|
||||
ports: &[PortLocation],
|
||||
) -> Result<(), Error> {
|
||||
info!(
|
||||
"[Brocade] Configuring port-channel '{channel_name} {channel_id}' with ports: {ports:?}"
|
||||
);
|
||||
|
||||
let commands = self.build_port_channel_commands(channel_id, channel_name, ports);
|
||||
self.shell
|
||||
.run_commands(commands, ExecutionMode::Privileged)
|
||||
.await?;
|
||||
|
||||
info!("[Brocade] Port-channel '{channel_name}' configured.");
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn clear_port_channel(&self, channel_name: &str) -> Result<(), Error> {
|
||||
info!("[Brocade] Clearing port-channel: {channel_name}");
|
||||
|
||||
let commands = vec![
|
||||
"configure terminal".to_string(),
|
||||
format!("no lag {channel_name}"),
|
||||
"write memory".to_string(),
|
||||
];
|
||||
self.shell
|
||||
.run_commands(commands, ExecutionMode::Privileged)
|
||||
.await?;
|
||||
|
||||
info!("[Brocade] Port-channel '{channel_name}' cleared.");
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn enable_snmp(&self, user_name: &str, auth: &str, des: &str) -> Result<(), Error> {
|
||||
let commands = vec![
|
||||
"configure terminal".into(),
|
||||
"snmp-server view ALL 1 included".into(),
|
||||
"snmp-server group public v3 priv read ALL".into(),
|
||||
format!(
|
||||
"snmp-server user {user_name} groupname public auth md5 auth-password {auth} priv des priv-password {des}"
|
||||
),
|
||||
"exit".into(),
|
||||
];
|
||||
self.shell
|
||||
.run_commands(commands, ExecutionMode::Regular)
|
||||
.await?;
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
@@ -1,352 +0,0 @@
|
||||
use std::net::IpAddr;
|
||||
use std::{
|
||||
fmt::{self, Display},
|
||||
time::Duration,
|
||||
};
|
||||
|
||||
use crate::network_operating_system::NetworkOperatingSystemClient;
|
||||
use crate::{
|
||||
fast_iron::FastIronClient,
|
||||
shell::{BrocadeSession, BrocadeShell},
|
||||
};
|
||||
|
||||
use async_trait::async_trait;
|
||||
use harmony_types::net::MacAddress;
|
||||
use harmony_types::switch::{PortDeclaration, PortLocation};
|
||||
use regex::Regex;
|
||||
use serde::Serialize;
|
||||
|
||||
mod fast_iron;
|
||||
mod network_operating_system;
|
||||
mod shell;
|
||||
pub mod ssh;
|
||||
|
||||
#[derive(Default, Clone, Debug)]
|
||||
pub struct BrocadeOptions {
|
||||
pub dry_run: bool,
|
||||
pub ssh: ssh::SshOptions,
|
||||
pub timeouts: TimeoutConfig,
|
||||
}
|
||||
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct TimeoutConfig {
|
||||
pub shell_ready: Duration,
|
||||
pub command_execution: Duration,
|
||||
pub command_output: Duration,
|
||||
pub cleanup: Duration,
|
||||
pub message_wait: Duration,
|
||||
}
|
||||
|
||||
impl Default for TimeoutConfig {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
shell_ready: Duration::from_secs(10),
|
||||
command_execution: Duration::from_secs(60), // Commands like `deploy` (for a LAG) can take a while
|
||||
command_output: Duration::from_secs(5), // Delay to start logging "waiting for command output"
|
||||
cleanup: Duration::from_secs(10),
|
||||
message_wait: Duration::from_millis(500),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
enum ExecutionMode {
|
||||
Regular,
|
||||
Privileged,
|
||||
}
|
||||
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct BrocadeInfo {
|
||||
os: BrocadeOs,
|
||||
_version: String,
|
||||
}
|
||||
|
||||
#[derive(Clone, Debug)]
|
||||
pub enum BrocadeOs {
|
||||
NetworkOperatingSystem,
|
||||
FastIron,
|
||||
Unknown,
|
||||
}
|
||||
|
||||
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone)]
|
||||
pub struct MacAddressEntry {
|
||||
pub vlan: u16,
|
||||
pub mac_address: MacAddress,
|
||||
pub port: PortDeclaration,
|
||||
}
|
||||
|
||||
pub type PortChannelId = u8;
|
||||
|
||||
/// Represents a single physical or logical link connecting two switches within a stack or fabric.
|
||||
///
|
||||
/// This structure provides a standardized view of the topology regardless of the
|
||||
/// underlying Brocade OS configuration (stacking vs. fabric).
|
||||
#[derive(Debug, PartialEq, Eq, Clone)]
|
||||
pub struct InterSwitchLink {
|
||||
/// The local port on the switch where the topology command was run.
|
||||
pub local_port: PortLocation,
|
||||
/// The port on the directly connected neighboring switch.
|
||||
pub remote_port: Option<PortLocation>,
|
||||
}
|
||||
|
||||
/// Represents the key running configuration status of a single switch interface.
|
||||
#[derive(Debug, PartialEq, Eq, Clone)]
|
||||
pub struct InterfaceInfo {
|
||||
/// The full configuration name (e.g., "TenGigabitEthernet 1/0/1", "FortyGigabitEthernet 2/0/2").
|
||||
pub name: String,
|
||||
/// The physical location of the interface.
|
||||
pub port_location: PortLocation,
|
||||
/// The parsed type and name prefix of the interface.
|
||||
pub interface_type: InterfaceType,
|
||||
/// The primary configuration mode defining the interface's behavior (L2, L3, Fabric).
|
||||
pub operating_mode: Option<PortOperatingMode>,
|
||||
/// Indicates the current state of the interface.
|
||||
pub status: InterfaceStatus,
|
||||
}
|
||||
|
||||
/// Categorizes the functional type of a switch interface.
|
||||
#[derive(Debug, PartialEq, Eq, Clone)]
|
||||
pub enum InterfaceType {
|
||||
/// Physical or virtual Ethernet interface (e.g., TenGigabitEthernet, FortyGigabitEthernet).
|
||||
Ethernet(String),
|
||||
}
|
||||
|
||||
impl fmt::Display for InterfaceType {
|
||||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
|
||||
match self {
|
||||
InterfaceType::Ethernet(name) => write!(f, "{name}"),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Defines the primary configuration mode of a switch interface, representing mutually exclusive roles.
|
||||
#[derive(Debug, PartialEq, Eq, Clone, Serialize)]
|
||||
pub enum PortOperatingMode {
|
||||
/// The interface is explicitly configured for Brocade fabric roles (ISL or Trunk enabled).
|
||||
Fabric,
|
||||
/// The interface is configured for standard Layer 2 switching as Trunk port (`switchport mode trunk`).
|
||||
Trunk,
|
||||
/// The interface is configured for standard Layer 2 switching as Access port (`switchport` without trunk mode).
|
||||
Access,
|
||||
}
|
||||
|
||||
/// Defines the possible status of an interface.
|
||||
#[derive(Debug, PartialEq, Eq, Clone)]
|
||||
pub enum InterfaceStatus {
|
||||
/// The interface is connected.
|
||||
Connected,
|
||||
/// The interface is not connected and is not expected to be.
|
||||
NotConnected,
|
||||
/// The interface is not connected but is expected to be (configured with `no shutdown`).
|
||||
SfpAbsent,
|
||||
}
|
||||
|
||||
pub async fn init(
|
||||
ip_addresses: &[IpAddr],
|
||||
username: &str,
|
||||
password: &str,
|
||||
options: &BrocadeOptions,
|
||||
) -> Result<Box<dyn BrocadeClient + Send + Sync>, Error> {
|
||||
let shell = BrocadeShell::init(ip_addresses, username, password, options).await?;
|
||||
|
||||
let version_info = shell
|
||||
.with_session(ExecutionMode::Regular, |session| {
|
||||
Box::pin(get_brocade_info(session))
|
||||
})
|
||||
.await?;
|
||||
|
||||
Ok(match version_info.os {
|
||||
BrocadeOs::FastIron => Box::new(FastIronClient::init(shell, version_info)),
|
||||
BrocadeOs::NetworkOperatingSystem => {
|
||||
Box::new(NetworkOperatingSystemClient::init(shell, version_info))
|
||||
}
|
||||
BrocadeOs::Unknown => todo!(),
|
||||
})
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
pub trait BrocadeClient: std::fmt::Debug {
|
||||
/// Retrieves the operating system and version details from the connected Brocade switch.
|
||||
///
|
||||
/// This is typically the first call made after establishing a connection to determine
|
||||
/// the switch OS family (e.g., FastIron, NOS) for feature compatibility.
|
||||
///
|
||||
/// # Returns
|
||||
///
|
||||
/// A `BrocadeInfo` structure containing parsed OS type and version string.
|
||||
async fn version(&self) -> Result<BrocadeInfo, Error>;
|
||||
|
||||
/// Retrieves the dynamically learned MAC address table from the switch.
|
||||
///
|
||||
/// This is crucial for discovering where specific network endpoints (MAC addresses)
|
||||
/// are currently located on the physical ports.
|
||||
///
|
||||
/// # Returns
|
||||
///
|
||||
/// A vector of `MacAddressEntry`, where each entry typically contains VLAN, MAC address,
|
||||
/// and the associated port name/index.
|
||||
async fn get_mac_address_table(&self) -> Result<Vec<MacAddressEntry>, Error>;
|
||||
|
||||
/// Derives the physical connections used to link multiple switches together
|
||||
/// to form a single logical entity (stack, fabric, etc.).
|
||||
///
|
||||
/// This abstracts the underlying configuration (e.g., stack ports, fabric ports)
|
||||
/// to return a standardized view of the topology.
|
||||
///
|
||||
/// # Returns
|
||||
///
|
||||
/// A vector of `InterSwitchLink` structs detailing which ports are used for stacking/fabric.
|
||||
/// If the switch is not stacked, returns an empty vector.
|
||||
async fn get_stack_topology(&self) -> Result<Vec<InterSwitchLink>, Error>;
|
||||
|
||||
/// Retrieves the status for all interfaces
|
||||
///
|
||||
/// # Returns
|
||||
///
|
||||
/// A vector of `InterfaceInfo` structures.
|
||||
async fn get_interfaces(&self) -> Result<Vec<InterfaceInfo>, Error>;
|
||||
|
||||
/// Configures a set of interfaces to be operated with a specified mode (access ports, ISL, etc.).
|
||||
async fn configure_interfaces(
|
||||
&self,
|
||||
interfaces: &Vec<(String, PortOperatingMode)>,
|
||||
) -> Result<(), Error>;
|
||||
|
||||
/// Scans the existing configuration to find the next available (unused)
|
||||
/// Port-Channel ID (`lag` or `trunk`) for assignment.
|
||||
///
|
||||
/// # Returns
|
||||
///
|
||||
/// The smallest, unassigned `PortChannelId` within the supported range.
|
||||
async fn find_available_channel_id(&self) -> Result<PortChannelId, Error>;
|
||||
|
||||
/// Creates and configures a new Port-Channel (Link Aggregation Group or LAG)
|
||||
/// using the specified channel ID and ports.
|
||||
///
|
||||
/// The resulting configuration must be persistent (saved to startup-config).
|
||||
/// Assumes a static LAG configuration mode unless specified otherwise by the implementation.
|
||||
///
|
||||
/// # Parameters
|
||||
///
|
||||
/// * `channel_id`: The ID (e.g., 1-128) for the logical port channel.
|
||||
/// * `channel_name`: A descriptive name for the LAG (used in configuration context).
|
||||
/// * `ports`: A slice of `PortLocation` structs defining the physical member ports.
|
||||
async fn create_port_channel(
|
||||
&self,
|
||||
channel_id: PortChannelId,
|
||||
channel_name: &str,
|
||||
ports: &[PortLocation],
|
||||
) -> Result<(), Error>;
|
||||
|
||||
/// Enables Simple Network Management Protocol (SNMP) server for switch
|
||||
///
|
||||
/// # Parameters
|
||||
///
|
||||
/// * `user_name`: The user name for the snmp server
|
||||
/// * `auth`: The password for authentication process for verifying the identity of a device
|
||||
/// * `des`: The Data Encryption Standard algorithm key
|
||||
async fn enable_snmp(&self, user_name: &str, auth: &str, des: &str) -> Result<(), Error>;
|
||||
|
||||
/// Removes all configuration associated with the specified Port-Channel name.
|
||||
///
|
||||
/// This operation should be idempotent; attempting to clear a non-existent
|
||||
/// channel should succeed (or return a benign error).
|
||||
///
|
||||
/// # Parameters
|
||||
///
|
||||
/// * `channel_name`: The name of the Port-Channel (LAG) to delete.
|
||||
///
|
||||
async fn clear_port_channel(&self, channel_name: &str) -> Result<(), Error>;
|
||||
}
|
||||
|
||||
async fn get_brocade_info(session: &mut BrocadeSession) -> Result<BrocadeInfo, Error> {
|
||||
let output = session.run_command("show version").await?;
|
||||
|
||||
if output.contains("Network Operating System") {
|
||||
let re = Regex::new(r"Network Operating System Version:\s*(?P<version>[a-zA-Z0-9.\-]+)")
|
||||
.expect("Invalid regex");
|
||||
let version = re
|
||||
.captures(&output)
|
||||
.and_then(|cap| cap.name("version"))
|
||||
.map(|m| m.as_str().to_string())
|
||||
.unwrap_or_default();
|
||||
|
||||
return Ok(BrocadeInfo {
|
||||
os: BrocadeOs::NetworkOperatingSystem,
|
||||
_version: version,
|
||||
});
|
||||
} else if output.contains("ICX") {
|
||||
let re = Regex::new(r"(?m)^\s*SW: Version\s*(?P<version>[a-zA-Z0-9.\-]+)")
|
||||
.expect("Invalid regex");
|
||||
let version = re
|
||||
.captures(&output)
|
||||
.and_then(|cap| cap.name("version"))
|
||||
.map(|m| m.as_str().to_string())
|
||||
.unwrap_or_default();
|
||||
|
||||
return Ok(BrocadeInfo {
|
||||
os: BrocadeOs::FastIron,
|
||||
_version: version,
|
||||
});
|
||||
}
|
||||
|
||||
Err(Error::UnexpectedError("Unknown Brocade OS version".into()))
|
||||
}
|
||||
|
||||
fn parse_brocade_mac_address(value: &str) -> Result<MacAddress, String> {
|
||||
let cleaned_mac = value.replace('.', "");
|
||||
|
||||
if cleaned_mac.len() != 12 {
|
||||
return Err(format!("Invalid MAC address: {value}"));
|
||||
}
|
||||
|
||||
let mut bytes = [0u8; 6];
|
||||
for (i, pair) in cleaned_mac.as_bytes().chunks(2).enumerate() {
|
||||
let byte_str = std::str::from_utf8(pair).map_err(|_| "Invalid UTF-8")?;
|
||||
bytes[i] =
|
||||
u8::from_str_radix(byte_str, 16).map_err(|_| format!("Invalid hex in MAC: {value}"))?;
|
||||
}
|
||||
|
||||
Ok(MacAddress(bytes))
|
||||
}
|
||||
|
||||
#[derive(Debug)]
|
||||
pub enum SecurityLevel {
|
||||
AuthPriv(String),
|
||||
}
|
||||
|
||||
#[derive(Debug)]
|
||||
pub enum Error {
|
||||
NetworkError(String),
|
||||
AuthenticationError(String),
|
||||
ConfigurationError(String),
|
||||
TimeoutError(String),
|
||||
UnexpectedError(String),
|
||||
CommandError(String),
|
||||
}
|
||||
|
||||
impl Display for Error {
|
||||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
|
||||
match self {
|
||||
Error::NetworkError(msg) => write!(f, "Network error: {msg}"),
|
||||
Error::AuthenticationError(msg) => write!(f, "Authentication error: {msg}"),
|
||||
Error::ConfigurationError(msg) => write!(f, "Configuration error: {msg}"),
|
||||
Error::TimeoutError(msg) => write!(f, "Timeout error: {msg}"),
|
||||
Error::UnexpectedError(msg) => write!(f, "Unexpected error: {msg}"),
|
||||
Error::CommandError(msg) => write!(f, "{msg}"),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl From<Error> for String {
|
||||
fn from(val: Error) -> Self {
|
||||
format!("{val}")
|
||||
}
|
||||
}
|
||||
|
||||
impl std::error::Error for Error {}
|
||||
|
||||
impl From<russh::Error> for Error {
|
||||
fn from(value: russh::Error) -> Self {
|
||||
Error::NetworkError(format!("Russh client error: {value}"))
|
||||
}
|
||||
}
|
||||
@@ -1,352 +0,0 @@
|
||||
use std::str::FromStr;
|
||||
|
||||
use async_trait::async_trait;
|
||||
use harmony_types::switch::{PortDeclaration, PortLocation};
|
||||
use log::{debug, info};
|
||||
use regex::Regex;
|
||||
|
||||
use crate::{
|
||||
BrocadeClient, BrocadeInfo, Error, ExecutionMode, InterSwitchLink, InterfaceInfo,
|
||||
InterfaceStatus, InterfaceType, MacAddressEntry, PortChannelId, PortOperatingMode,
|
||||
parse_brocade_mac_address, shell::BrocadeShell,
|
||||
};
|
||||
|
||||
#[derive(Debug)]
|
||||
pub struct NetworkOperatingSystemClient {
|
||||
shell: BrocadeShell,
|
||||
version: BrocadeInfo,
|
||||
}
|
||||
|
||||
impl NetworkOperatingSystemClient {
|
||||
pub fn init(mut shell: BrocadeShell, version_info: BrocadeInfo) -> Self {
|
||||
shell.before_all(vec!["terminal length 0".into()]);
|
||||
|
||||
Self {
|
||||
shell,
|
||||
version: version_info,
|
||||
}
|
||||
}
|
||||
|
||||
fn parse_mac_entry(&self, line: &str) -> Option<Result<MacAddressEntry, Error>> {
|
||||
debug!("[Brocade] Parsing mac address entry: {line}");
|
||||
let parts: Vec<&str> = line.split_whitespace().collect();
|
||||
if parts.len() < 5 {
|
||||
return None;
|
||||
}
|
||||
|
||||
let (vlan, mac_address, port) = match parts.len() {
|
||||
5 => (
|
||||
u16::from_str(parts[0]).ok()?,
|
||||
parse_brocade_mac_address(parts[1]).ok()?,
|
||||
parts[4].to_string(),
|
||||
),
|
||||
_ => (
|
||||
u16::from_str(parts[0]).ok()?,
|
||||
parse_brocade_mac_address(parts[1]).ok()?,
|
||||
parts[5].to_string(),
|
||||
),
|
||||
};
|
||||
|
||||
let port =
|
||||
PortDeclaration::parse(&port).map_err(|e| Error::UnexpectedError(format!("{e}")));
|
||||
|
||||
match port {
|
||||
Ok(p) => Some(Ok(MacAddressEntry {
|
||||
vlan,
|
||||
mac_address,
|
||||
port: p,
|
||||
})),
|
||||
Err(e) => Some(Err(e)),
|
||||
}
|
||||
}
|
||||
|
||||
fn parse_inter_switch_link_entry(&self, line: &str) -> Option<Result<InterSwitchLink, Error>> {
|
||||
debug!("[Brocade] Parsing inter switch link entry: {line}");
|
||||
let parts: Vec<&str> = line.split_whitespace().collect();
|
||||
if parts.len() < 10 {
|
||||
return None;
|
||||
}
|
||||
|
||||
let local_port = PortLocation::from_str(parts[2]).ok()?;
|
||||
let remote_port = PortLocation::from_str(parts[5]).ok()?;
|
||||
|
||||
Some(Ok(InterSwitchLink {
|
||||
local_port,
|
||||
remote_port: Some(remote_port),
|
||||
}))
|
||||
}
|
||||
|
||||
fn parse_interface_status_entry(&self, line: &str) -> Option<Result<InterfaceInfo, Error>> {
|
||||
debug!("[Brocade] Parsing interface status entry: {line}");
|
||||
let parts: Vec<&str> = line.split_whitespace().collect();
|
||||
if parts.len() < 6 {
|
||||
return None;
|
||||
}
|
||||
|
||||
let interface_type = match parts[0] {
|
||||
"Fo" => InterfaceType::Ethernet("FortyGigabitEthernet".to_string()),
|
||||
"Te" => InterfaceType::Ethernet("TenGigabitEthernet".to_string()),
|
||||
_ => return None,
|
||||
};
|
||||
let port_location = PortLocation::from_str(parts[1]).ok()?;
|
||||
let status = match parts[2] {
|
||||
"connected" => InterfaceStatus::Connected,
|
||||
"notconnected" => InterfaceStatus::NotConnected,
|
||||
"sfpAbsent" => InterfaceStatus::SfpAbsent,
|
||||
_ => return None,
|
||||
};
|
||||
let operating_mode = match parts[3] {
|
||||
"ISL" => Some(PortOperatingMode::Fabric),
|
||||
"Trunk" => Some(PortOperatingMode::Trunk),
|
||||
"Access" => Some(PortOperatingMode::Access),
|
||||
"--" => None,
|
||||
_ => return None,
|
||||
};
|
||||
|
||||
Some(Ok(InterfaceInfo {
|
||||
name: format!("{interface_type} {port_location}"),
|
||||
port_location,
|
||||
interface_type,
|
||||
operating_mode,
|
||||
status,
|
||||
}))
|
||||
}
|
||||
|
||||
fn map_configure_interfaces_error(&self, err: Error) -> Error {
|
||||
debug!("[Brocade] {err}");
|
||||
|
||||
if let Error::CommandError(message) = &err {
|
||||
if message.contains("switchport")
|
||||
&& message.contains("Cannot configure aggregator member")
|
||||
{
|
||||
let re = Regex::new(r"\(conf-if-([a-zA-Z]+)-([\d/]+)\)#").unwrap();
|
||||
|
||||
if let Some(caps) = re.captures(message) {
|
||||
let interface_type = &caps[1];
|
||||
let port_location = &caps[2];
|
||||
let interface = format!("{interface_type} {port_location}");
|
||||
|
||||
return Error::CommandError(format!(
|
||||
"Cannot configure interface '{interface}', it is a member of a port-channel (LAG)"
|
||||
));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
err
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl BrocadeClient for NetworkOperatingSystemClient {
|
||||
async fn version(&self) -> Result<BrocadeInfo, Error> {
|
||||
Ok(self.version.clone())
|
||||
}
|
||||
|
||||
async fn get_mac_address_table(&self) -> Result<Vec<MacAddressEntry>, Error> {
|
||||
let output = self
|
||||
.shell
|
||||
.run_command("show mac-address-table", ExecutionMode::Regular)
|
||||
.await?;
|
||||
|
||||
output
|
||||
.lines()
|
||||
.skip(1)
|
||||
.filter_map(|line| self.parse_mac_entry(line))
|
||||
.collect()
|
||||
}
|
||||
|
||||
async fn get_stack_topology(&self) -> Result<Vec<InterSwitchLink>, Error> {
|
||||
let output = self
|
||||
.shell
|
||||
.run_command("show fabric isl", ExecutionMode::Regular)
|
||||
.await?;
|
||||
|
||||
output
|
||||
.lines()
|
||||
.skip(6)
|
||||
.filter_map(|line| self.parse_inter_switch_link_entry(line))
|
||||
.collect()
|
||||
}
|
||||
|
||||
async fn get_interfaces(&self) -> Result<Vec<InterfaceInfo>, Error> {
|
||||
let output = self
|
||||
.shell
|
||||
.run_command(
|
||||
"show interface status rbridge-id all",
|
||||
ExecutionMode::Regular,
|
||||
)
|
||||
.await?;
|
||||
|
||||
output
|
||||
.lines()
|
||||
.skip(2)
|
||||
.filter_map(|line| self.parse_interface_status_entry(line))
|
||||
.collect()
|
||||
}
|
||||
|
||||
async fn configure_interfaces(
|
||||
&self,
|
||||
interfaces: &Vec<(String, PortOperatingMode)>,
|
||||
) -> Result<(), Error> {
|
||||
info!("[Brocade] Configuring {} interface(s)...", interfaces.len());
|
||||
|
||||
let mut commands = vec!["configure terminal".to_string()];
|
||||
|
||||
for interface in interfaces {
|
||||
commands.push(format!("interface {}", interface.0));
|
||||
|
||||
match interface.1 {
|
||||
PortOperatingMode::Fabric => {
|
||||
commands.push("fabric isl enable".into());
|
||||
commands.push("fabric trunk enable".into());
|
||||
}
|
||||
PortOperatingMode::Trunk => {
|
||||
commands.push("switchport".into());
|
||||
commands.push("switchport mode trunk".into());
|
||||
commands.push("switchport trunk allowed vlan all".into());
|
||||
commands.push("no switchport trunk tag native-vlan".into());
|
||||
commands.push("spanning-tree shutdown".into());
|
||||
commands.push("no fabric isl enable".into());
|
||||
commands.push("no fabric trunk enable".into());
|
||||
commands.push("no shutdown".into());
|
||||
}
|
||||
PortOperatingMode::Access => {
|
||||
commands.push("switchport".into());
|
||||
commands.push("switchport mode access".into());
|
||||
commands.push("switchport access vlan 1".into());
|
||||
commands.push("no spanning-tree shutdown".into());
|
||||
commands.push("no fabric isl enable".into());
|
||||
commands.push("no fabric trunk enable".into());
|
||||
}
|
||||
}
|
||||
|
||||
commands.push("no shutdown".into());
|
||||
commands.push("exit".into());
|
||||
}
|
||||
|
||||
self.shell
|
||||
.run_commands(commands, ExecutionMode::Regular)
|
||||
.await
|
||||
.map_err(|err| self.map_configure_interfaces_error(err))?;
|
||||
|
||||
info!("[Brocade] Interfaces configured.");
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn find_available_channel_id(&self) -> Result<PortChannelId, Error> {
|
||||
info!("[Brocade] Finding next available channel id...");
|
||||
|
||||
let output = self
|
||||
.shell
|
||||
.run_command("show port-channel summary", ExecutionMode::Regular)
|
||||
.await?;
|
||||
|
||||
let used_ids: Vec<u8> = output
|
||||
.lines()
|
||||
.skip(6)
|
||||
.filter_map(|line| {
|
||||
let parts: Vec<&str> = line.split_whitespace().collect();
|
||||
if parts.len() < 8 {
|
||||
return None;
|
||||
}
|
||||
|
||||
u8::from_str(parts[0]).ok()
|
||||
})
|
||||
.collect();
|
||||
|
||||
let mut next_id: u8 = 1;
|
||||
loop {
|
||||
if !used_ids.contains(&next_id) {
|
||||
break;
|
||||
}
|
||||
next_id += 1;
|
||||
}
|
||||
|
||||
info!("[Brocade] Found channel id: {next_id}");
|
||||
Ok(next_id)
|
||||
}
|
||||
|
||||
async fn create_port_channel(
|
||||
&self,
|
||||
channel_id: PortChannelId,
|
||||
channel_name: &str,
|
||||
ports: &[PortLocation],
|
||||
) -> Result<(), Error> {
|
||||
info!(
|
||||
"[Brocade] Configuring port-channel '{channel_id} {channel_name}' with ports: {}",
|
||||
ports
|
||||
.iter()
|
||||
.map(|p| format!("{p}"))
|
||||
.collect::<Vec<String>>()
|
||||
.join(", ")
|
||||
);
|
||||
|
||||
let interfaces = self.get_interfaces().await?;
|
||||
|
||||
let mut commands = vec![
|
||||
"configure terminal".into(),
|
||||
format!("interface port-channel {}", channel_id),
|
||||
"no shutdown".into(),
|
||||
"exit".into(),
|
||||
];
|
||||
|
||||
for port in ports {
|
||||
let interface = interfaces.iter().find(|i| i.port_location == *port);
|
||||
let Some(interface) = interface else {
|
||||
continue;
|
||||
};
|
||||
|
||||
commands.push(format!("interface {}", interface.name));
|
||||
commands.push("no switchport".into());
|
||||
commands.push("no ip address".into());
|
||||
commands.push("no fabric isl enable".into());
|
||||
commands.push("no fabric trunk enable".into());
|
||||
commands.push(format!("channel-group {channel_id} mode active"));
|
||||
commands.push("no shutdown".into());
|
||||
commands.push("exit".into());
|
||||
}
|
||||
|
||||
self.shell
|
||||
.run_commands(commands, ExecutionMode::Regular)
|
||||
.await?;
|
||||
|
||||
info!("[Brocade] Port-channel '{channel_name}' configured.");
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn clear_port_channel(&self, channel_name: &str) -> Result<(), Error> {
|
||||
info!("[Brocade] Clearing port-channel: {channel_name}");
|
||||
|
||||
let commands = vec![
|
||||
"configure terminal".into(),
|
||||
format!("no interface port-channel {}", channel_name),
|
||||
"exit".into(),
|
||||
];
|
||||
|
||||
self.shell
|
||||
.run_commands(commands, ExecutionMode::Regular)
|
||||
.await?;
|
||||
|
||||
info!("[Brocade] Port-channel '{channel_name}' cleared.");
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn enable_snmp(&self, user_name: &str, auth: &str, des: &str) -> Result<(), Error> {
|
||||
let commands = vec![
|
||||
"configure terminal".into(),
|
||||
"snmp-server view ALL 1 included".into(),
|
||||
"snmp-server group public v3 priv read ALL".into(),
|
||||
format!(
|
||||
"snmp-server user {user_name} groupname public auth md5 auth-password {auth} priv des priv-password {des}"
|
||||
),
|
||||
"exit".into(),
|
||||
];
|
||||
self.shell
|
||||
.run_commands(commands, ExecutionMode::Regular)
|
||||
.await?;
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
@@ -1,367 +0,0 @@
|
||||
use std::net::IpAddr;
|
||||
use std::time::Duration;
|
||||
use std::time::Instant;
|
||||
|
||||
use crate::BrocadeOptions;
|
||||
use crate::Error;
|
||||
use crate::ExecutionMode;
|
||||
use crate::TimeoutConfig;
|
||||
use crate::ssh;
|
||||
|
||||
use log::debug;
|
||||
use log::info;
|
||||
use russh::ChannelMsg;
|
||||
use tokio::time::timeout;
|
||||
|
||||
#[derive(Debug)]
|
||||
pub struct BrocadeShell {
|
||||
ip: IpAddr,
|
||||
username: String,
|
||||
password: String,
|
||||
options: BrocadeOptions,
|
||||
before_all_commands: Vec<String>,
|
||||
after_all_commands: Vec<String>,
|
||||
}
|
||||
|
||||
impl BrocadeShell {
|
||||
pub async fn init(
|
||||
ip_addresses: &[IpAddr],
|
||||
username: &str,
|
||||
password: &str,
|
||||
options: &BrocadeOptions,
|
||||
) -> Result<Self, Error> {
|
||||
let ip = ip_addresses
|
||||
.first()
|
||||
.ok_or_else(|| Error::ConfigurationError("No IP addresses provided".to_string()))?;
|
||||
|
||||
let brocade_ssh_client_options =
|
||||
ssh::try_init_client(username, password, ip, options).await?;
|
||||
|
||||
Ok(Self {
|
||||
ip: *ip,
|
||||
username: username.to_string(),
|
||||
password: password.to_string(),
|
||||
before_all_commands: vec![],
|
||||
after_all_commands: vec![],
|
||||
options: brocade_ssh_client_options,
|
||||
})
|
||||
}
|
||||
|
||||
pub async fn open_session(&self, mode: ExecutionMode) -> Result<BrocadeSession, Error> {
|
||||
BrocadeSession::open(
|
||||
self.ip,
|
||||
self.options.ssh.port,
|
||||
&self.username,
|
||||
&self.password,
|
||||
self.options.clone(),
|
||||
mode,
|
||||
)
|
||||
.await
|
||||
}
|
||||
|
||||
pub async fn with_session<F, R>(&self, mode: ExecutionMode, callback: F) -> Result<R, Error>
|
||||
where
|
||||
F: FnOnce(
|
||||
&mut BrocadeSession,
|
||||
) -> std::pin::Pin<
|
||||
Box<dyn std::future::Future<Output = Result<R, Error>> + Send + '_>,
|
||||
>,
|
||||
{
|
||||
let mut session = self.open_session(mode).await?;
|
||||
|
||||
let _ = session.run_commands(self.before_all_commands.clone()).await;
|
||||
let result = callback(&mut session).await;
|
||||
let _ = session.run_commands(self.after_all_commands.clone()).await;
|
||||
|
||||
session.close().await?;
|
||||
result
|
||||
}
|
||||
|
||||
pub async fn run_command(&self, command: &str, mode: ExecutionMode) -> Result<String, Error> {
|
||||
let mut session = self.open_session(mode).await?;
|
||||
|
||||
let _ = session.run_commands(self.before_all_commands.clone()).await;
|
||||
let result = session.run_command(command).await;
|
||||
let _ = session.run_commands(self.after_all_commands.clone()).await;
|
||||
|
||||
session.close().await?;
|
||||
result
|
||||
}
|
||||
|
||||
pub async fn run_commands(
|
||||
&self,
|
||||
commands: Vec<String>,
|
||||
mode: ExecutionMode,
|
||||
) -> Result<(), Error> {
|
||||
let mut session = self.open_session(mode).await?;
|
||||
|
||||
let _ = session.run_commands(self.before_all_commands.clone()).await;
|
||||
let result = session.run_commands(commands).await;
|
||||
let _ = session.run_commands(self.after_all_commands.clone()).await;
|
||||
|
||||
session.close().await?;
|
||||
result
|
||||
}
|
||||
|
||||
pub fn before_all(&mut self, commands: Vec<String>) {
|
||||
self.before_all_commands = commands;
|
||||
}
|
||||
|
||||
pub fn after_all(&mut self, commands: Vec<String>) {
|
||||
self.after_all_commands = commands;
|
||||
}
|
||||
}
|
||||
|
||||
pub struct BrocadeSession {
|
||||
pub channel: russh::Channel<russh::client::Msg>,
|
||||
pub mode: ExecutionMode,
|
||||
pub options: BrocadeOptions,
|
||||
}
|
||||
|
||||
impl BrocadeSession {
|
||||
pub async fn open(
|
||||
ip: IpAddr,
|
||||
port: u16,
|
||||
username: &str,
|
||||
password: &str,
|
||||
options: BrocadeOptions,
|
||||
mode: ExecutionMode,
|
||||
) -> Result<Self, Error> {
|
||||
let client = ssh::create_client(ip, port, username, password, &options).await?;
|
||||
let mut channel = client.channel_open_session().await?;
|
||||
|
||||
channel
|
||||
.request_pty(false, "vt100", 80, 24, 0, 0, &[])
|
||||
.await?;
|
||||
channel.request_shell(false).await?;
|
||||
|
||||
wait_for_shell_ready(&mut channel, &options.timeouts).await?;
|
||||
|
||||
if let ExecutionMode::Privileged = mode {
|
||||
try_elevate_session(&mut channel, username, password, &options.timeouts).await?;
|
||||
}
|
||||
|
||||
Ok(Self {
|
||||
channel,
|
||||
mode,
|
||||
options,
|
||||
})
|
||||
}
|
||||
|
||||
pub async fn close(&mut self) -> Result<(), Error> {
|
||||
debug!("[Brocade] Closing session...");
|
||||
|
||||
self.channel.data(&b"exit\n"[..]).await?;
|
||||
if let ExecutionMode::Privileged = self.mode {
|
||||
self.channel.data(&b"exit\n"[..]).await?;
|
||||
}
|
||||
|
||||
let start = Instant::now();
|
||||
while start.elapsed() < self.options.timeouts.cleanup {
|
||||
match timeout(self.options.timeouts.message_wait, self.channel.wait()).await {
|
||||
Ok(Some(ChannelMsg::Close)) => break,
|
||||
Ok(Some(_)) => continue,
|
||||
Ok(None) | Err(_) => break,
|
||||
}
|
||||
}
|
||||
|
||||
debug!("[Brocade] Session closed.");
|
||||
Ok(())
|
||||
}
|
||||
|
||||
pub async fn run_command(&mut self, command: &str) -> Result<String, Error> {
|
||||
if self.should_skip_command(command) {
|
||||
return Ok(String::new());
|
||||
}
|
||||
|
||||
debug!("[Brocade] Running command: '{command}'...");
|
||||
|
||||
self.channel
|
||||
.data(format!("{}\n", command).as_bytes())
|
||||
.await?;
|
||||
tokio::time::sleep(Duration::from_millis(100)).await;
|
||||
|
||||
let output = self.collect_command_output().await?;
|
||||
let output = String::from_utf8(output)
|
||||
.map_err(|_| Error::UnexpectedError("Invalid UTF-8 in command output".to_string()))?;
|
||||
|
||||
self.check_for_command_errors(&output, command)?;
|
||||
Ok(output)
|
||||
}
|
||||
|
||||
pub async fn run_commands(&mut self, commands: Vec<String>) -> Result<(), Error> {
|
||||
for command in commands {
|
||||
self.run_command(&command).await?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn should_skip_command(&self, command: &str) -> bool {
|
||||
if (command.starts_with("write") || command.starts_with("deploy")) && self.options.dry_run {
|
||||
info!("[Brocade] Dry-run mode enabled, skipping command: {command}");
|
||||
return true;
|
||||
}
|
||||
false
|
||||
}
|
||||
|
||||
async fn collect_command_output(&mut self) -> Result<Vec<u8>, Error> {
|
||||
let mut output = Vec::new();
|
||||
let start = Instant::now();
|
||||
let read_timeout = Duration::from_millis(500);
|
||||
let log_interval = Duration::from_secs(5);
|
||||
let mut last_log = Instant::now();
|
||||
|
||||
loop {
|
||||
if start.elapsed() > self.options.timeouts.command_execution {
|
||||
return Err(Error::TimeoutError(
|
||||
"Timeout waiting for command completion.".into(),
|
||||
));
|
||||
}
|
||||
|
||||
if start.elapsed() > self.options.timeouts.command_output
|
||||
&& last_log.elapsed() > log_interval
|
||||
{
|
||||
info!("[Brocade] Waiting for command output...");
|
||||
last_log = Instant::now();
|
||||
}
|
||||
|
||||
match timeout(read_timeout, self.channel.wait()).await {
|
||||
Ok(Some(ChannelMsg::Data { data } | ChannelMsg::ExtendedData { data, .. })) => {
|
||||
output.extend_from_slice(&data);
|
||||
let current_output = String::from_utf8_lossy(&output);
|
||||
if current_output.contains('>') || current_output.contains('#') {
|
||||
return Ok(output);
|
||||
}
|
||||
}
|
||||
Ok(Some(ChannelMsg::Eof | ChannelMsg::Close)) => return Ok(output),
|
||||
Ok(Some(ChannelMsg::ExitStatus { exit_status })) => {
|
||||
debug!("[Brocade] Command exit status: {exit_status}");
|
||||
}
|
||||
Ok(Some(_)) => continue,
|
||||
Ok(None) | Err(_) => {
|
||||
if output.is_empty() {
|
||||
if let Ok(None) = timeout(read_timeout, self.channel.wait()).await {
|
||||
break;
|
||||
}
|
||||
continue;
|
||||
}
|
||||
|
||||
tokio::time::sleep(Duration::from_millis(100)).await;
|
||||
let current_output = String::from_utf8_lossy(&output);
|
||||
if current_output.contains('>') || current_output.contains('#') {
|
||||
return Ok(output);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(output)
|
||||
}
|
||||
|
||||
fn check_for_command_errors(&self, output: &str, command: &str) -> Result<(), Error> {
|
||||
const ERROR_PATTERNS: &[&str] = &[
|
||||
"invalid input",
|
||||
"syntax error",
|
||||
"command not found",
|
||||
"unknown command",
|
||||
"permission denied",
|
||||
"access denied",
|
||||
"authentication failed",
|
||||
"configuration error",
|
||||
"failed to",
|
||||
"error:",
|
||||
];
|
||||
|
||||
let output_lower = output.to_lowercase();
|
||||
if ERROR_PATTERNS.iter().any(|&p| output_lower.contains(p)) {
|
||||
return Err(Error::CommandError(format!(
|
||||
"Command error: {}",
|
||||
output.trim()
|
||||
)));
|
||||
}
|
||||
|
||||
if !command.starts_with("show") && output.trim().is_empty() {
|
||||
return Err(Error::CommandError(format!(
|
||||
"Command '{command}' produced no output"
|
||||
)));
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
async fn wait_for_shell_ready(
|
||||
channel: &mut russh::Channel<russh::client::Msg>,
|
||||
timeouts: &TimeoutConfig,
|
||||
) -> Result<(), Error> {
|
||||
let mut buffer = Vec::new();
|
||||
let start = Instant::now();
|
||||
|
||||
while start.elapsed() < timeouts.shell_ready {
|
||||
match timeout(timeouts.message_wait, channel.wait()).await {
|
||||
Ok(Some(ChannelMsg::Data { data })) => {
|
||||
buffer.extend_from_slice(&data);
|
||||
let output = String::from_utf8_lossy(&buffer);
|
||||
let output = output.trim();
|
||||
if output.ends_with('>') || output.ends_with('#') {
|
||||
debug!("[Brocade] Shell ready");
|
||||
return Ok(());
|
||||
}
|
||||
}
|
||||
Ok(Some(_)) => continue,
|
||||
Ok(None) => break,
|
||||
Err(_) => continue,
|
||||
}
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn try_elevate_session(
|
||||
channel: &mut russh::Channel<russh::client::Msg>,
|
||||
username: &str,
|
||||
password: &str,
|
||||
timeouts: &TimeoutConfig,
|
||||
) -> Result<(), Error> {
|
||||
channel.data(&b"enable\n"[..]).await?;
|
||||
let start = Instant::now();
|
||||
let mut buffer = Vec::new();
|
||||
|
||||
while start.elapsed() < timeouts.shell_ready {
|
||||
match timeout(timeouts.message_wait, channel.wait()).await {
|
||||
Ok(Some(ChannelMsg::Data { data })) => {
|
||||
buffer.extend_from_slice(&data);
|
||||
let output = String::from_utf8_lossy(&buffer);
|
||||
|
||||
if output.ends_with('#') {
|
||||
debug!("[Brocade] Privileged mode established");
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
if output.contains("User Name:") {
|
||||
channel.data(format!("{}\n", username).as_bytes()).await?;
|
||||
buffer.clear();
|
||||
} else if output.contains("Password:") {
|
||||
channel.data(format!("{}\n", password).as_bytes()).await?;
|
||||
buffer.clear();
|
||||
} else if output.contains('>') {
|
||||
return Err(Error::AuthenticationError(
|
||||
"Enable authentication failed".into(),
|
||||
));
|
||||
}
|
||||
}
|
||||
Ok(Some(_)) => continue,
|
||||
Ok(None) => break,
|
||||
Err(_) => continue,
|
||||
}
|
||||
}
|
||||
|
||||
let output = String::from_utf8_lossy(&buffer);
|
||||
if output.ends_with('#') {
|
||||
debug!("[Brocade] Privileged mode established");
|
||||
Ok(())
|
||||
} else {
|
||||
Err(Error::AuthenticationError(format!(
|
||||
"Enable failed. Output:\n{output}"
|
||||
)))
|
||||
}
|
||||
}
|
||||
@@ -1,131 +0,0 @@
|
||||
use std::borrow::Cow;
|
||||
use std::sync::Arc;
|
||||
|
||||
use async_trait::async_trait;
|
||||
use log::debug;
|
||||
use russh::client::Handler;
|
||||
use russh::kex::DH_G1_SHA1;
|
||||
use russh::kex::ECDH_SHA2_NISTP256;
|
||||
use russh_keys::key::SSH_RSA;
|
||||
|
||||
use super::BrocadeOptions;
|
||||
use super::Error;
|
||||
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct SshOptions {
|
||||
pub preferred_algorithms: russh::Preferred,
|
||||
pub port: u16,
|
||||
}
|
||||
|
||||
impl Default for SshOptions {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
preferred_algorithms: Default::default(),
|
||||
port: 22,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl SshOptions {
|
||||
fn ecdhsa_sha2_nistp256(port: u16) -> Self {
|
||||
Self {
|
||||
preferred_algorithms: russh::Preferred {
|
||||
kex: Cow::Borrowed(&[ECDH_SHA2_NISTP256]),
|
||||
key: Cow::Borrowed(&[SSH_RSA]),
|
||||
..Default::default()
|
||||
},
|
||||
port,
|
||||
..Default::default()
|
||||
}
|
||||
}
|
||||
|
||||
fn legacy(port: u16) -> Self {
|
||||
Self {
|
||||
preferred_algorithms: russh::Preferred {
|
||||
kex: Cow::Borrowed(&[DH_G1_SHA1]),
|
||||
key: Cow::Borrowed(&[SSH_RSA]),
|
||||
..Default::default()
|
||||
},
|
||||
port,
|
||||
..Default::default()
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub struct Client;
|
||||
|
||||
#[async_trait]
|
||||
impl Handler for Client {
|
||||
type Error = Error;
|
||||
|
||||
async fn check_server_key(
|
||||
&mut self,
|
||||
_server_public_key: &russh_keys::key::PublicKey,
|
||||
) -> Result<bool, Self::Error> {
|
||||
Ok(true)
|
||||
}
|
||||
}
|
||||
|
||||
pub async fn try_init_client(
|
||||
username: &str,
|
||||
password: &str,
|
||||
ip: &std::net::IpAddr,
|
||||
base_options: &BrocadeOptions,
|
||||
) -> Result<BrocadeOptions, Error> {
|
||||
let mut default = SshOptions::default();
|
||||
default.port = base_options.ssh.port;
|
||||
let ssh_options = vec![
|
||||
default,
|
||||
SshOptions::ecdhsa_sha2_nistp256(base_options.ssh.port),
|
||||
SshOptions::legacy(base_options.ssh.port),
|
||||
];
|
||||
|
||||
for ssh in ssh_options {
|
||||
let opts = BrocadeOptions {
|
||||
ssh: ssh.clone(),
|
||||
..base_options.clone()
|
||||
};
|
||||
debug!("Creating client {ip}:{} {username}", ssh.port);
|
||||
let client = create_client(*ip, ssh.port, username, password, &opts).await;
|
||||
|
||||
match client {
|
||||
Ok(_) => {
|
||||
return Ok(opts);
|
||||
}
|
||||
Err(e) => match e {
|
||||
Error::NetworkError(e) => {
|
||||
if e.contains("No common key exchange algorithm") {
|
||||
continue;
|
||||
} else {
|
||||
return Err(Error::NetworkError(e));
|
||||
}
|
||||
}
|
||||
_ => return Err(e),
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
Err(Error::NetworkError(
|
||||
"Could not establish ssh connection: wrong key exchange algorithm)".to_string(),
|
||||
))
|
||||
}
|
||||
|
||||
pub async fn create_client(
|
||||
ip: std::net::IpAddr,
|
||||
port: u16,
|
||||
username: &str,
|
||||
password: &str,
|
||||
options: &BrocadeOptions,
|
||||
) -> Result<russh::client::Handle<Client>, Error> {
|
||||
let config = russh::client::Config {
|
||||
preferred: options.ssh.preferred_algorithms.clone(),
|
||||
..Default::default()
|
||||
};
|
||||
let mut client = russh::client::connect(Arc::new(config), (ip, port), Client {}).await?;
|
||||
if !client.authenticate_password(username, password).await? {
|
||||
return Err(Error::AuthenticationError(
|
||||
"ssh authentication failed".to_string(),
|
||||
));
|
||||
}
|
||||
Ok(client)
|
||||
}
|
||||
@@ -1,11 +0,0 @@
|
||||
#!/bin/sh
|
||||
set -e
|
||||
|
||||
cd "$(dirname "$0")/.."
|
||||
|
||||
cargo install mdbook --locked
|
||||
mdbook build
|
||||
|
||||
test -f book/index.html || (echo "ERROR: book/index.html not found" && exit 1)
|
||||
test -f book/concepts.html || (echo "ERROR: book/concepts.html not found" && exit 1)
|
||||
test -f book/guides/getting-started.html || (echo "ERROR: book/guides/getting-started.html not found" && exit 1)
|
||||
16
build/ci.sh
@@ -1,16 +0,0 @@
|
||||
#!/bin/sh
|
||||
set -e
|
||||
|
||||
cd "$(dirname "$0")/.."
|
||||
|
||||
BRANCH="${1:-main}"
|
||||
|
||||
echo "=== Running CI for branch: $BRANCH ==="
|
||||
|
||||
echo "--- Checking code ---"
|
||||
./build/check.sh
|
||||
|
||||
echo "--- Building book ---"
|
||||
./build/book.sh
|
||||
|
||||
echo "=== CI passed ==="
|
||||
@@ -1,10 +1,5 @@
|
||||
#!/bin/sh
|
||||
set -e
|
||||
|
||||
cd "$(dirname "$0")/.."
|
||||
|
||||
rustc --version
|
||||
cargo check --all-targets --all-features --keep-going
|
||||
cargo fmt --check
|
||||
cargo clippy
|
||||
cargo test
|
||||
BIN
data/okd/bin/kubectl
(Stored with Git LFS)
BIN
data/okd/bin/oc
(Stored with Git LFS)
BIN
data/okd/bin/oc_README.md
(Stored with Git LFS)
BIN
data/okd/bin/openshift-install
(Stored with Git LFS)
BIN
data/okd/bin/openshift-install_README.md
(Stored with Git LFS)
BIN
data/okd/installer_image/scos-9.0.20250510-0-live-initramfs.x86_64.img
(Stored with Git LFS)
BIN
data/okd/installer_image/scos-9.0.20250510-0-live-kernel.x86_64
(Stored with Git LFS)
BIN
data/okd/installer_image/scos-9.0.20250510-0-live-rootfs.x86_64.img
(Stored with Git LFS)
@@ -1 +0,0 @@
|
||||
scos-9.0.20250510-0-live-initramfs.x86_64.img
|
||||
@@ -1 +0,0 @@
|
||||
scos-9.0.20250510-0-live-kernel.x86_64
|
||||
@@ -1 +0,0 @@
|
||||
scos-9.0.20250510-0-live-rootfs.x86_64.img
|
||||
@@ -1,8 +0,0 @@
|
||||
Here lies all the data files required for an OKD cluster PXE boot setup.
|
||||
|
||||
This inclues ISO files, binary boot files, ipxe, etc.
|
||||
|
||||
TODO as of august 2025 :
|
||||
|
||||
- `harmony_inventory_agent` should be downloaded from official releases, this embedded version is practical for now though
|
||||
- The cluster ssh key should be generated and handled by harmony with the private key saved in a secret store
|
||||
9
data/pxe/okd/http_files/.gitattributes
vendored
@@ -1,9 +0,0 @@
|
||||
harmony_inventory_agent filter=lfs diff=lfs merge=lfs -text
|
||||
os filter=lfs diff=lfs merge=lfs -text
|
||||
os/centos-stream-9 filter=lfs diff=lfs merge=lfs -text
|
||||
os/centos-stream-9/images filter=lfs diff=lfs merge=lfs -text
|
||||
os/centos-stream-9/initrd.img filter=lfs diff=lfs merge=lfs -text
|
||||
os/centos-stream-9/vmlinuz filter=lfs diff=lfs merge=lfs -text
|
||||
os/centos-stream-9/images/efiboot.img filter=lfs diff=lfs merge=lfs -text
|
||||
os/centos-stream-9/images/install.img filter=lfs diff=lfs merge=lfs -text
|
||||
os/centos-stream-9/images/pxeboot filter=lfs diff=lfs merge=lfs -text
|
||||
@@ -1 +0,0 @@
|
||||
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBx6bDylvC68cVpjKfEFtLQJ/dOFi6PVS2vsIOqPDJIc jeangab@liliane2
|
||||
BIN
data/pxe/okd/http_files/harmony_inventory_agent
(Stored with Git LFS)
BIN
data/pxe/okd/http_files/os/centos-stream-9/images/efiboot.img
(Stored with Git LFS)
BIN
data/pxe/okd/http_files/os/centos-stream-9/images/install.img
(Stored with Git LFS)
BIN
data/pxe/okd/http_files/os/centos-stream-9/initrd.img
(Stored with Git LFS)
BIN
data/pxe/okd/http_files/os/centos-stream-9/vmlinuz
(Stored with Git LFS)
@@ -1,3 +0,0 @@
|
||||
.terraform
|
||||
*.tfstate
|
||||
venv
|
||||
|
Before Width: | Height: | Size: 72 KiB |
|
Before Width: | Height: | Size: 38 KiB |
|
Before Width: | Height: | Size: 38 KiB |
|
Before Width: | Height: | Size: 52 KiB |
|
Before Width: | Height: | Size: 62 KiB |
|
Before Width: | Height: | Size: 64 KiB |
|
Before Width: | Height: | Size: 100 KiB |
@@ -1,5 +0,0 @@
|
||||
To build :
|
||||
|
||||
```bash
|
||||
npx @marp-team/marp-cli@latest -w slides.md
|
||||
```
|
||||
|
Before Width: | Height: | Size: 11 KiB |
@@ -1,9 +0,0 @@
|
||||
To run this :
|
||||
|
||||
```bash
|
||||
virtualenv venv
|
||||
source venv/bin/activate
|
||||
pip install ansible ansible-dev-tools
|
||||
ansible-lint download.yml
|
||||
ansible-playbook -i localhost download.yml
|
||||
```
|
||||
@@ -1,8 +0,0 @@
|
||||
- name: Test Ansible URL Validation
|
||||
hosts: localhost
|
||||
tasks:
|
||||
- name: Download a file
|
||||
ansible.builtin.get_url:
|
||||
url: "http:/wikipedia.org/"
|
||||
dest: "/tmp/ansible-test/wikipedia.html"
|
||||
mode: '0900'
|
||||
|
Before Width: | Height: | Size: 22 KiB |
|
Before Width: | Height: | Size: 275 KiB |
|
Before Width: | Height: | Size: 212 KiB |
|
Before Width: | Height: | Size: 384 KiB |
|
Before Width: | Height: | Size: 8.3 KiB |
@@ -1,241 +0,0 @@
|
||||
---
|
||||
theme: uncover
|
||||
---
|
||||
|
||||
# Voici l'histoire de Petit Poisson
|
||||
|
||||
---
|
||||
|
||||
<img src="./Happy_swimmer.jpg" width="600"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./happy_landscape_swimmer.jpg" width="1000"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./Happy_swimmer.jpg" width="200"/>
|
||||
|
||||
<img src="./tryrust.org.png" width="600"/>
|
||||
|
||||
[https://tryrust.org](https://tryrust.org)
|
||||
|
||||
---
|
||||
|
||||
<img src="./texto_deploy_prod_1.png" width="600"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./texto_deploy_prod_2.png" width="600"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./texto_deploy_prod_3.png" width="600"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./texto_deploy_prod_4.png" width="600"/>
|
||||
|
||||
---
|
||||
|
||||
## Demo time
|
||||
|
||||
---
|
||||
|
||||
<img src="./Happy_swimmer_sunglasses.jpg" width="1000"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./texto_download_wikipedia.png" width="600"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./ansible.jpg" width="200"/>
|
||||
|
||||
## Ansible❓
|
||||
|
||||
---
|
||||
|
||||
<img src="./Happy_swimmer.jpg" width="200"/>
|
||||
|
||||
```yaml
|
||||
- name: Download wikipedia
|
||||
hosts: localhost
|
||||
tasks:
|
||||
- name: Download a file
|
||||
ansible.builtin.get_url:
|
||||
url: "https:/wikipedia.org/"
|
||||
dest: "/tmp/ansible-test/wikipedia.html"
|
||||
mode: '0900'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
<img src="./Happy_swimmer.jpg" width="200"/>
|
||||
|
||||
```
|
||||
ansible-lint download.yml
|
||||
|
||||
Passed: 0 failure(s), 0 warning(s) on 1 files. Last profile that met the validation criteria was 'production'.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
```
|
||||
git push
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
<img src="./75_years_later.jpg" width="1100"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./texto_download_wikipedia_fail.png" width="600"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./Happy_swimmer_reversed.jpg" width="600"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./ansible_output_fail.jpg" width="1100"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./Happy_swimmer_reversed_1hit.jpg" width="600"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./ansible_crossed_out.jpg" width="400"/>
|
||||
|
||||
---
|
||||
|
||||
|
||||
<img src="./terraform.jpg" width="400"/>
|
||||
|
||||
## Terraform❓❗
|
||||
|
||||
---
|
||||
|
||||
<img src="./Happy_swimmer_reversed_1hit.jpg" width="200"/>
|
||||
<img src="./terraform.jpg" width="200"/>
|
||||
|
||||
```tf
|
||||
provider "docker" {}
|
||||
|
||||
resource "docker_network" "invalid_network" {
|
||||
name = "my-invalid-network"
|
||||
|
||||
ipam_config {
|
||||
subnet = "172.17.0.0/33"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
<img src="./Happy_swimmer_reversed_1hit.jpg" width="100"/>
|
||||
<img src="./terraform.jpg" width="200"/>
|
||||
|
||||
```
|
||||
terraform plan
|
||||
|
||||
Terraform used the selected providers to generate the following execution plan.
|
||||
Resource actions are indicated with the following symbols:
|
||||
+ create
|
||||
|
||||
Terraform will perform the following actions:
|
||||
|
||||
# docker_network.invalid_network will be created
|
||||
+ resource "docker_network" "invalid_network" {
|
||||
+ driver = (known after apply)
|
||||
+ id = (known after apply)
|
||||
+ internal = (known after apply)
|
||||
+ ipam_driver = "default"
|
||||
+ name = "my-invalid-network"
|
||||
+ options = (known after apply)
|
||||
+ scope = (known after apply)
|
||||
|
||||
+ ipam_config {
|
||||
+ subnet = "172.17.0.0/33"
|
||||
# (2 unchanged attributes hidden)
|
||||
}
|
||||
}
|
||||
|
||||
Plan: 1 to add, 0 to change, 0 to destroy.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
✅
|
||||
|
||||
---
|
||||
|
||||
```
|
||||
terraform apply
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
```
|
||||
Plan: 1 to add, 0 to change, 0 to destroy.
|
||||
|
||||
Do you want to perform these actions?
|
||||
Terraform will perform the actions described above.
|
||||
Only 'yes' will be accepted to approve.
|
||||
|
||||
Enter a value: yes
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
```
|
||||
docker_network.invalid_network: Creating...
|
||||
╷
|
||||
│ Error: Unable to create network: Error response from daemon: invalid network config:
|
||||
│ invalid subnet 172.17.0.0/33: invalid CIDR block notation
|
||||
│
|
||||
│ with docker_network.invalid_network,
|
||||
│ on main.tf line 11, in resource "docker_network" "invalid_network":
|
||||
│ 11: resource "docker_network" "invalid_network" {
|
||||
│
|
||||
╵
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
|
||||
<img src="./Happy_swimmer_reversed_fullhit.jpg" width="1100"/>
|
||||
|
||||
---
|
||||
|
||||
<img src="./ansible_crossed_out.jpg" width="300"/>
|
||||
<img src="./terraform_crossed_out.jpg" width="400"/>
|
||||
<img src="./Happy_swimmer_reversed_fullhit.jpg" width="300"/>
|
||||
|
||||
---
|
||||
|
||||
## Harmony❓❗
|
||||
|
||||
---
|
||||
|
||||
Demo time
|
||||
|
||||
---
|
||||
|
||||
<img src="./Happy_swimmer.jpg" width="300"/>
|
||||
|
||||
---
|
||||
|
||||
# 🎼
|
||||
|
||||
Harmony : [https://git.nationtech.io/nationtech/harmony](https://git.nationtech.io/nationtech/harmony)
|
||||
|
||||
|
||||
<img src="./qrcode_gitea_nationtech.png" width="120"/>
|
||||
|
||||
|
||||
LinkedIn : [https://www.linkedin.com/in/jean-gabriel-gill-couture/](https://www.linkedin.com/in/jean-gabriel-gill-couture/)
|
||||
|
||||
Courriel : [jg@nationtech.io](mailto:jg@nationtech.io)
|
||||
@@ -1,132 +0,0 @@
|
||||
# Harmony, Orchestrateur d'infrastructure open-source
|
||||
|
||||
**Target Duration:** 25 minutes\
|
||||
**Tone:** Friendly, expert-to-expert, inspiring.
|
||||
|
||||
---
|
||||
|
||||
#### **Slide 1: Title Slide**
|
||||
|
||||
- **Visual:** Clean and simple. Your company logo (NationTech) and the Harmony logo.
|
||||
|
||||
---
|
||||
|
||||
#### **Slide 2: The YAML Labyrinth**
|
||||
|
||||
**Goal:** Get every head in the room nodding in agreement. Start with their world, not yours.
|
||||
|
||||
- **Visual:**
|
||||
- Option A: "The Pull Request from Hell". A screenshot of a GitHub pull request for a seemingly minor change that touches dozens of YAML files across multiple directories. A sea of red and green diffs that is visually overwhelming.
|
||||
- Option B: A complex flowchart connecting dozens of logos: Terraform, Ansible, K8s, Helm, etc.
|
||||
- **Narration:**\
|
||||
[...ADD SOMETHING FOR INTRODUCTION...]\
|
||||
"We love the power that tools like Kubernetes and the CNCF landscape have given us. But let's be honest... when did our infrastructure code start looking like _this_?"\
|
||||
"We have GitOps, which is great. But it often means we're managing this fragile cathedral of YAML, Helm charts, and brittle scripts. We spend more time debugging indentation and tracing variables than we do building truly resilient systems."
|
||||
|
||||
---
|
||||
|
||||
#### **Slide 3: The Real Cost of Infrastructure**
|
||||
|
||||
- **Visual:** "The Jenga Tower of Tools". A tall, precarious Jenga tower where each block is the logo of a different tool (Terraform, K8s, Helm, Ansible, Prometheus, ArgoCD, etc.). One block near the bottom is being nervously pulled out.
|
||||
- **Narration:**
|
||||
"The real cost isn't just complexity; it's the constant need to choose, learn, integrate, and operate a dozen different tools, each with its own syntax and failure modes. It's the nagging fear that a tiny typo in a config file could bring everything down. Click-ops isn't the answer, but the current state of IaC feels like we've traded one problem for another."
|
||||
|
||||
---
|
||||
|
||||
#### **Slide 4: The Broken Promise of "Code"**
|
||||
|
||||
**Goal:** Introduce the core idea before introducing the product. This makes the solution feel inevitable.
|
||||
|
||||
- **(Initial Visual):** A two-panel slide.
|
||||
- **Left Panel Title: "The Plan"** - A terminal showing a green, successful `terraform plan` output.
|
||||
- **Right Panel Title: "The Reality"** - The _next_ screen in the terminal, showing the `terraform apply` failing with a cascade of red error text.
|
||||
- **Narration:**
|
||||
"We call our discipline **Infrastructure as Code**. And we've all been here. Our 'compiler' is a `terraform plan` that says everything looks perfect. We get the green light."
|
||||
(Pause for a beat)
|
||||
"And then we `apply`, and reality hits. It fails halfway through, at runtime, when it's most expensive and painful to fix."
|
||||
|
||||
**(Click to transition the slide)**
|
||||
|
||||
- **(New Visual):** The entire slide is replaced by a clean screenshot of a code editor (like nvim 😉) showing Harmony's Rust DSL. A red squiggly line is under a config line. The error message is clear in the "Problems" panel: `error: Incompatible deployment. Production target 'gcp-prod-cluster' requires a StorageClass with 'snapshots' capability, but 'standard-sc' does not provide it.`
|
||||
- **Narration (continued):**
|
||||
"In software development, we solved these problems years ago. We don't accept 'it compiled, but crashed on startup'. We have real tools, type systems, compilers, test frameworks, and IDEs that catch our mistakes before they ever reach production. **So, what if we could treat our entire infrastructure... like a modern, compiled application?**"
|
||||
"What if your infrastructure code could get compile-time checks, straight into the editor... instead of runtime panics and failures at 3 AM in production?"
|
||||
|
||||
---
|
||||
|
||||
#### **Slide 5: Introducing Harmony**
|
||||
|
||||
**Goal:** Introduce Harmony as the answer to the "What If?" question.
|
||||
|
||||
- **Visual:** The Harmony logo, large and centered.
|
||||
- **Tagline:** `Infrastructure in type-safe Rust. No YAML required.`
|
||||
- **Narration:**
|
||||
"This is Harmony. It's an open-source orchestrator that lets you define your entire stack — from a dev laptop to a multi-site bare-metal cluster—in a single, type-safe Rust codebase."
|
||||
|
||||
---
|
||||
|
||||
#### **Slide 6: Before & After**
|
||||
|
||||
- **Visual:** A side-by-side comparison. Left side: A screen full of complex, nested YAML. Right side: 10-15 lines of clean, readable Harmony Rust DSL that accomplishes the same thing.
|
||||
- **Narration:**
|
||||
"This is the difference. On the left, the fragile world of strings and templates. On the right, a portable, verifiable program that describes your apps, your infra, and your operations. We unify scaffolding, provisioning, and Day-2 ops, all verified by the Rust compiler. But enough slides... let's see it in action."
|
||||
|
||||
---
|
||||
|
||||
#### **Slide 7: Live Demo: Zero to Monitored App**
|
||||
|
||||
**Goal:** Show, don't just tell. Make it look effortless. This is where you build the "dream."
|
||||
|
||||
- **Visual:** Your terminal/IDE, ready to go.
|
||||
- **Narration Guide:**
|
||||
"Okay, for this demo, we're going to take a standard web app from GitHub. Nothing special about it."
|
||||
_(Show the repo)_
|
||||
"Now, let's bring it into Harmony. This is the entire definition we need to describe the application and its needs."
|
||||
_(Show the Rust DSL)_
|
||||
"First, let's run it locally on k3d. The exact same definition for dev as for prod."
|
||||
_(Deploy locally, show it works)_
|
||||
"Cool. But a real app needs monitoring. In Harmony, that's just adding a feature to our code."
|
||||
_(Uncomment one line: `.with_feature(Monitoring)` and redeploy)_
|
||||
"And just like that, we have a fully configured Prometheus and Grafana stack, scraping our app. No YAML, no extra config."
|
||||
"Finally, let's push this to our production staging cluster. We just change the target and specify our multi-site Ceph storage."
|
||||
_(Deploy to the remote cluster)_
|
||||
"And there it is. We've gone from a simple web app to a monitored, enterprise-grade service in minutes."
|
||||
|
||||
---
|
||||
|
||||
#### **Slide 8: Live Demo: Embracing Chaos**
|
||||
|
||||
**Goal:** Prove the "predictable" and "resilient" claims in the most dramatic way possible.
|
||||
|
||||
- **Visual:** A slide showing a map or diagram of your distributed infrastructure (the different data centers). Then switch back to your terminal.
|
||||
- **Narration Guide:**
|
||||
"This is great when things are sunny. But production is chaos. So... let's break things. On purpose."
|
||||
"First, a network failure." _(Kill a switch/link, show app is still up)_
|
||||
"Now, let's power off a storage server." _(Force off a server, show Ceph healing and the app is unaffected)_
|
||||
"How about a control plane node?" _(Force off a k8s control plane, show the cluster is still running)_
|
||||
"Okay, for the grand finale. What if we have a cascading failure? I'm going to kill _another_ storage server. This should cause a total failure in this data center."
|
||||
_(Force off the second server, narrate what's happening)_
|
||||
"And there it is... Ceph has lost quorum in this site... and Harmony has automatically failed everything over to our other datacenter. The app is still running."
|
||||
|
||||
---
|
||||
|
||||
#### **Slide 9: The New Reality**
|
||||
|
||||
**Goal:** Summarize the dream and tell the audience what you want them to do.
|
||||
|
||||
- **Visual:** The clean, simple Harmony Rust DSL code from Slide 6. A summary of what was just accomplished is listed next to it: `✓ GitHub to Prod in minutes`, `✓ Type-Safe Validation`, `✓ Built-in Monitoring`, `✓ Automated Multi-Site Failover`.
|
||||
- **Narration:**
|
||||
"So, in just a few minutes, we went from a simple web app to a multi-site, monitored, and chaos-proof production deployment. We did it with a small amount of code that is easy to read, easy to verify, and completely portable. This is our vision: to offload the complexity, and make infrastructure simple, predictable, and even fun again."
|
||||
|
||||
---
|
||||
|
||||
#### **Slide 10: Join Us**
|
||||
|
||||
- **Visual:** A clean, final slide with QR codes and links.
|
||||
- GitHub Repo (`github.com/nation-tech/harmony`)
|
||||
- Website (`harmony.sh` or similar)
|
||||
- Your contact info (`jg@nation.tech` / LinkedIn / Twitter)
|
||||
- **Narration:**
|
||||
"Harmony is open-source, AGPLv3. We believe this is the future, but we're just getting started. We know this crowd has great infrastructure minds out there, and we need your feedback. Please, check out the project on GitHub. Star it if you like what you see. Tell us what's missing. Let's build this future together. Thank you."
|
||||
|
||||
**(Open for Q&A)**
|
||||