Add Micro–DC/minimum-toolset/policies-and-compliance/minimum-toolset-profile.toon
@@ -0,0 +1,160 @@
meta:
  format: toon
  version: "1.0"
  kind: "toolset_profile"
  name: "minimum_toolset_mvp_v1"
  generated_by: "AI Council OS — ORIGINZERO"
  status: "draft"

context:
  objective: >
    Define a minimum viable toolset for sovereign micro-DC operations that
    balances sovereignty, safety, and sustainability while remaining operable
    by small teams. This profile constrains the allowed tools for IaC, GitOps,
    policy-as-code, observability, and network verification.
  scope:
    - "All sovereign micro-DC modules following the global blueprint"
    - "All environments: lab, staging, production"
  non_goals:
    - "Describing every optional or future tool"
    - "Replacing detailed runbooks or vendor documentation"

profiles:
  - id: "minimum_mvp"
    label: "Minimum Viable Toolset"
    description: >
      Base toolset used by default for all new micro-DC deployments. Additional
      tools require a formal RFC and approval from the CI/CD & GitOps Governance
      Lead plus Security Architect.

owners:
  primary_role: "CI/CD & GitOps Governance Lead"
  supporting_roles:
    - "Principal SRE/DevOps Architect"
    - "Sovereign Compliance & Sustainability Lead"
    - "Security Architect"

constraints:
  - "Each category (IaC, GitOps, policy, observability, network verification) must have exactly one canonical tool or stack."
  - "Introducing non-canonical tools into production requires an RFC and risk assessment."
  - "All production changes must flow through Git and CI/CD; no manual configuration drift."

categories:

  iac:
    canonical_tools:
      - name: "Terraform"
        purpose: "Declarative infrastructure as code for network, security, and DCIM/inventory where APIs exist."
        scope:
          - "L2/L3 network configuration (where API-driven)"
          - "Firewall and load balancer policies"
          - "IPAM and DNS records"
          - "Cloud or virtual resources, if applicable"
        conventions:
          modules:
            - "Shared core modules for naming, tagging, and security baselines."
            - "Per-site root modules (e.g. network/terraform/sites/<SITE_CODE>)."
          state_management:
            - "Remote state with encryption at rest."
            - "State backends must respect data residency and sovereignty rules."
      - name: "Ansible"
        purpose: "Host configuration management and day-0/day-1 bootstrap of bare metal, hypervisors, and K8s nodes."
        scope:
          - "OS install and base hardening"
          - "Hypervisor configuration (e.g. Proxmox VE)"
          - "K8s node bootstrap and joining clusters"
          - "Ceph node configuration and basic cluster bring-up"
        conventions:
          - "All playbooks and roles live in infra-foundation/hypervisor/ansible and baremetal/profiles."
          - "Ansible is not used for logical network topology that is already managed by Terraform."

  gitops:
    canonical_tools:
      - name: "Argo CD"
        purpose: "Declarative GitOps controller for Kubernetes and platform-level configuration."
        scope:
          - "K8s cluster bootstrapping and addons"
          - "Platform services (monitoring, logging, ingress, policy engines)"
          - "Tenant workloads and namespaces"
        conventions:
          - "App-of-apps pattern per site: k8s/clusters/<SITE_CODE>/apps.yaml."
          - "Separate Argo Projects for infra/platform vs tenant workloads."
          - "Write access to Argo-managed namespaces is via Git only (no kubectl apply to prod)."

  policy_as_code:
    canonical_tools:
      - name: "Kyverno"
        purpose: "Kubernetes-native policy engine for admission control, data residency, and security policies."
        scope:
          - "Enforce namespace conventions and labels (e.g. data_classification, country_code)."
          - "Enforce storageClass and namespace bindings for residency."
          - "Require NetworkPolicies in non-public namespaces."
          - "Baseline security controls (e.g. no privileged containers by default)."
        ci_integration:
          - "Policies are stored in policies-and-compliance/opa-policies-or-kyverno/."
          - "CI pipeline runs Kyverno CLI tests on pull requests before merge."
        notes:
          - "If OPA/Gatekeeper is already deeply adopted, a separate profile may be defined; this MVP profile assumes Kyverno."

  observability:
    canonical_tools:
      - name: "Prometheus"
        purpose: "Metrics collection for infrastructure and platforms."
        scope:
          - "K8s cluster metrics"
          - "Node and application metrics"
          - "Facility and sustainability metrics (PDU, UPS, temperature, PUE/WUE)"
      - name: "Alertmanager"
        purpose: "Routing alerts to on-call and NOC."
      - name: "Loki"
        purpose: "Centralized log aggregation for clusters and platform services."
      - name: "Tempo"
        purpose: "Distributed tracing backend for platform and key workloads."
      - name: "Grafana"
        purpose: "Dashboards for operations, SLOs, and sustainability KPIs."
    conventions:
      - "Core observability stack runs on a designated infra cluster per country or region."
      - "Data flows must respect data residency and classification rules; logs or traces from CRITICAL_SOVEREIGN workloads stay in-country."
      - "All SLO/SLI definitions (including AI/ML fabric SLOs) are captured as dashboards with alerting rules stored in Git."

  network_verification:
    canonical_tools:
      - name: "Batfish"
        purpose: "Static analysis of network configurations to verify reachability, isolation, and policy compliance."
        scope:
          - "Pre-change analysis for routing, ACLs, VRFs, and inter-site connectivity."
          - "Verification of sovereignty boundaries (no unintended leakage of CRITICAL_SOVEREIGN networks)."
        conventions:
          - "Each pull request that modifies network configuration must run Batfish tests in the CI pipeline."
          - "Test suites are stored under infra-foundation/network/tests/batfish/."
      - name: "Synthetic Probes"
        purpose: "Runtime validation of network paths using basic tools (ping, traceroute, HTTP checks, throughput probes)."
        implementation:
          - "K8s Jobs or DaemonSets scheduled in key namespaces."
          - "Results exported as Prometheus metrics for alerting."

change_control:
  principles:
    - "All changes must originate from a Git commit and flow through CI/CD."
    - "Tool configuration is code; direct UI changes on tools (e.g. Grafana, Argo, Proxmox) must be codified into Git promptly."
    - "Non-canonical tools may be used in labs, but must not become production dependencies without a toolset RFC."
  rfc_required_when:
    - "Introducing a new observability stack (e.g. alternate log store)."
    - "Adding a second GitOps controller."
    - "Replacing Terraform, Ansible, Argo CD, Kyverno, Prometheus, Loki, Tempo, Grafana, or Batfish in production."

rollout_strategy:
  phases:
    - id: "T0"
      name: "Assessment & Inventory"
      description: "Map current tools in use and identify overlaps with the MVP profile."
    - id: "T1"
      name: "Pilot Site Adoption"
      description: "Apply the minimum toolset at one non-critical site; refine patterns and runbooks."
    - id: "T2"
      name: "Template Hardening"
      description: "Codify patterns into reusable modules and templates; finalize documentation and SLOs."
    - id: "T3"
      name: "Broad Adoption"
      description: "Adopt the MVP toolset as the default for all new sites and progressively migrate existing ones."
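
The profile's first constraint (every category has a canonical tool or stack) and its `rfc_required_when` rules could be checked mechanically in CI. The following is a minimal sketch and not part of this commit. It assumes the profile has already been parsed into a plain dictionary whose category keys each hold a `canonical_tools` list of `name` entries; the function names are hypothetical, and "Flux" is used only as an example of a tool the profile does not list. Because several categories declare more than one tool, the sketch reads "exactly one canonical tool or stack" as "at least one canonical entry per category".

```python
"""Sketch: check a parsed toolset profile against its own constraints."""

# Categories named in the profile's first constraint, spelled as the
# keys that appear under `categories:`.
REQUIRED_CATEGORIES = (
    "iac",
    "gitops",
    "policy_as_code",
    "observability",
    "network_verification",
)


def canonical_tools(categories: dict) -> dict:
    """Map each category key to the list of canonical tool names it declares."""
    return {
        key: [tool["name"] for tool in body.get("canonical_tools", [])]
        for key, body in categories.items()
    }


def missing_categories(categories: dict) -> list:
    """Return required categories that are absent or declare no canonical tool."""
    tools = canonical_tools(categories)
    return [key for key in REQUIRED_CATEGORIES if not tools.get(key)]


def requires_rfc(tool_name: str, categories: dict) -> bool:
    """True when a tool is not canonical in any category, which under the
    profile's change-control rules calls for an RFC before production use."""
    known = {
        name
        for names in canonical_tools(categories).values()
        for name in names
    }
    return tool_name not in known


# Hand-written excerpt of the profile (assumption: a real check would load
# the file with whatever parser the repository uses, which is not shown here).
EXAMPLE_CATEGORIES = {
    "iac": {"canonical_tools": [{"name": "Terraform"}, {"name": "Ansible"}]},
    "gitops": {"canonical_tools": [{"name": "Argo CD"}]},
    "policy_as_code": {"canonical_tools": [{"name": "Kyverno"}]},
    "observability": {
        "canonical_tools": [
            {"name": n}
            for n in ("Prometheus", "Alertmanager", "Loki", "Tempo", "Grafana")
        ]
    },
    "network_verification": {
        "canonical_tools": [{"name": "Batfish"}, {"name": "Synthetic Probes"}]
    },
}

if __name__ == "__main__":
    print("missing categories:", missing_categories(EXAMPLE_CATEGORIES))
    print("Flux needs RFC:", requires_rfc("Flux", EXAMPLE_CATEGORIES))
```

A check like this only compares names; it cannot tell whether a second tool in a category forms a stack with the first or competes with it, so that judgment would still sit with the RFC reviewers named in the profile.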