Alright, sovereignty & compliance time 👮‍♀️📜

Below is **D4 - Sovereignty & Compliance Labs Manual**, built to sit on top of D1-D3.

We'll treat D4 as a **focused lab block** rather than a long calendar: three micro-labs you can run as half-day sessions or as one intense full day:

1. **Lab 1 - Data Classification & Sovereign Namespaces**
2. **Lab 2 - Backup & Cross-Border Data Residency**
3. **Lab 3 - Admin Access, JIT & Audit Evidence**

Everything assumes you already have:

* Repos: `infra-foundation`, `platform-clusters`, `policies-and-compliance`
* Pipelines: `lint_and_unit`, `policy_gates`, `integration_test`, `site_rollout`
* Site: **EU-PAR-FR01**, jurisdiction FR / EU-EEA, with data classifications including `CRITICAL_SOVEREIGN_FR`

---

# D4 - Sovereignty & Compliance Labs Manual

## Shared Foundations for All Labs

### Roles at the (virtual) table

* **Sovereign Compliance Lead / DPO**
* **Security Architect**
* **Platform/SRE engineer**
* **Tenant/Product representative** (for realistic requirements)

### Common Pre-Reqs

* `policies-and-compliance` contains at least:

  * `data-classification.yaml`
  * `opa-policies/data_residency.rego`
  * `opa-policies/rbac.rego`
* `platform-clusters`:

  * The K8s mgmt cluster for EU-PAR-FR01 is up and GitOps-managed.
  * Namespaces + StorageClasses for FR tenants exist or can be created.
* Trainees know how to:

  * Branch → MR → CI → merge → `site_rollout`
  * Run `./scripts/lint.sh` and `./scripts/run_opa.sh` locally.

---

## Lab 1 - Data Classification & Sovereign Namespaces

**Theme:**
Turn abstract GDPR / sovereignty rules into **concrete namespace & label design**, then enforce them via policy.

### Learning Objectives

By the end of Lab 1, trainees can:

* Map **business data types** to classification levels (PUBLIC / PERSONAL / CRITICAL_SOVEREIGN_FR, etc.).
* Design namespaces and labels that encode classification and jurisdiction.
* Demonstrate how mislabeling is caught by `policy_gates`.

### Timebox

~2-3 hours.

### Step 0 - Scenario

A new tenant, *Justice Ministry - Case Analytics*, wants to run workloads in EU-PAR-FR01. They process:

* Criminal case data
* Personal identifiers
* Sensitive categories (e.g., ethnicity, health markers)

Compliance decision:

* This tenant's data is treated as **`CRITICAL_SOVEREIGN_FR`**.
* It must never leave **France**; backups remain in FR as well.

### Step 1 - Update / Confirm Classification Rules

**Repo:** `policies-and-compliance`
**Branch:** `feat/d4-lab1-justice-classification`

1. Open `data-classification.yaml` and verify it has:

   ```yaml
   levels:
     - name: PUBLIC
     - name: INTERNAL
     - name: PERSONAL
     - name: SENSITIVE_PERSONAL
     - name: CRITICAL_SOVEREIGN_FR

   residency:
     CRITICAL_SOVEREIGN_FR:
       must_stay_in_country: FR
     SENSITIVE_PERSONAL:
       must_stay_in_region: EU_EEA
   ```

2. If you need a tenant-specific label, optionally add:

   ```yaml
   tenant_overlays:
     justice_case_analytics:
       base_level: CRITICAL_SOVEREIGN_FR
       notes: "Justice ministry workloads with case data and identifiers"
   ```

3. Run:

   ```bash
   ./scripts/lint.sh
   ./scripts/run_opa.sh   # should still pass
   ```

4. Push branch, open MR, ensure `policy_gates` passes.
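If trainees ask how the YAML actually reaches policy, you can show a tiny bridge rule. A minimal sketch, assuming `run_opa.sh` loads `data-classification.yaml` under `data.classification` (that wiring and the package name are illustrative, not a given in the repo):

```rego
package classification_levels

# Deny any labeled manifest whose data_classification is not a declared level.
# Assumes data-classification.yaml is supplied via `opa eval --data` and is
# visible as data.classification - adapt to however run_opa.sh wires data in.
deny[msg] {
  lvl := input.metadata.labels["data_classification"]
  not known_level(lvl)
  msg := sprintf("unknown data_classification %q - add it to data-classification.yaml first", [lvl])
}

known_level(lvl) {
  data.classification.levels[_].name == lvl
}
```

The nice property: adding a level is an MR against `data-classification.yaml`, and the gate picks it up automatically, so classification changes and enforcement live in the same repo.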
---

### Step 2 - Create a Sovereign Namespace

**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab1-justice-namespace`

Add:

`k8s/clusters/eu-par-fr01/namespaces/fr-critical-sovereign-justice.yaml`:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: fr-critical-sovereign-justice
  labels:
    data_classification: CRITICAL_SOVEREIGN_FR
    country: FR
    tenant: justice_case_analytics
```

Run local checks:

```bash
./scripts/lint.sh
./scripts/run_opa.sh
```

If you've got a naming policy such as:

```rego
deny[msg] {
  input.kind == "Namespace"
  startswith(input.metadata.name, "fr-critical-sovereign-")
  input.metadata.labels["data_classification"] != "CRITICAL_SOVEREIGN_FR"
  msg := sprintf("namespace %v must be labeled CRITICAL_SOVEREIGN_FR", [input.metadata.name])
}
```

it should now pass.

---

### Step 3 - Intentional Mislabel (Training Failure)

To make the lab real:

1. Temporarily change the label in your branch:

   ```yaml
   data_classification: SENSITIVE_PERSONAL # WRONG
   ```

2. Re-run `./scripts/run_opa.sh` and observe the **deny** from the naming/data-classification policy.

3. Fix it back to `CRITICAL_SOVEREIGN_FR`, rerun, and confirm it passes.

---

### Step 4 - Deploy Namespace via GitOps

Push branch, open MR, and once CI is green:

* Merge to `main`.
* Trigger `site_rollout EU-PAR-FR01` (or let an environment pipeline do it).
* Verify via:

  ```bash
  kubectl get ns fr-critical-sovereign-justice -o yaml
  ```

  (Read-only verification; all changes still flow through Git, never `kubectl edit`.)

### Lab 1 Definition of Done

* The `fr-critical-sovereign-justice` namespace exists in the cluster with correct labels.
* Policy prevents mislabeling for namespaces that follow the `fr-critical-sovereign-*` pattern.
* Trainees understand how **classification → namespace → policy** flows end-to-end.

---

## Lab 2 - Backup & Cross-Border Data Residency

**Theme:**
Backups are the classic place where sovereignty gets broken. This lab turns that into a controlled exercise.

### Learning Objectives

By the end of Lab 2, trainees can:

* Model backup policies for sovereign data.
* Explain how residency policies block non-compliant targets.
* Fix issues without relaxing policy (no “disable OPA” shortcuts).

### Timebox

~2-3 hours.

### Step 0 - Scenario

The Justice tenant wants hourly backups of all critical sovereign namespaces:

* `fr-critical-sovereign-justice`
* `fr-critical-sovereign-ai` (from earlier labs)

However, a platform engineer mistakenly configures backups to region `eu-central-1`, whose storage nodes sit in Germany.

### Step 1 - Confirm Residency Policy

**Repo:** `policies-and-compliance`
Check that you have `opa-policies/data_residency.rego` with *at least*:

```rego
package data_residency

deny[msg] {
  input.kind == "BackupPolicy"
  input.metadata.labels["data_classification"] == "CRITICAL_SOVEREIGN_FR"
  not input.spec.target.region == "fr-central"
  msg := sprintf("critical FR data must backup to fr-central, got %v", [input.spec.target.region])
}
```

(The `not ... ==` form is deliberate: unlike `!=`, it also denies when `region` is missing entirely.)

If it's missing, create it and wire `run_opa.sh` to use it.
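Optionally, pin the rule's behavior with a small `opa test` file beside it. A sketch, assuming the file lives at `opa-policies/data_residency_test.rego` (that name is illustrative):

```rego
package data_residency

# Run with: opa test opa-policies/
test_deny_backup_outside_fr {
  deny[_] with input as {
    "kind": "BackupPolicy",
    "metadata": {"labels": {"data_classification": "CRITICAL_SOVEREIGN_FR"}},
    "spec": {"target": {"region": "eu-central-1"}}
  }
}

test_allow_backup_in_fr {
  count(deny) == 0 with input as {
    "kind": "BackupPolicy",
    "metadata": {"labels": {"data_classification": "CRITICAL_SOVEREIGN_FR"}},
    "spec": {"target": {"region": "fr-central"}}
  }
}
```

Trainees then see the Step 2 failure twice: once in unit tests, once at `policy_gates`.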
---

### Step 2 - Add a Non-Compliant BackupPolicy (Failure Injection)

**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab2-backup-residency`

`k8s/clusters/eu-par-fr01/backups/fr-critical-sovereign-backup.yaml`:

```yaml
apiVersion: backup.example.io/v1
kind: BackupPolicy
metadata:
  name: fr-critical-sovereign-backup
  labels:
    data_classification: CRITICAL_SOVEREIGN_FR
    tenant: justice_case_analytics
spec:
  schedule: "0 * * * *"    # hourly
  target:
    provider: "object-storage"
    region: "eu-central-1" # INTENTIONALLY WRONG
```

Run:

```bash
./scripts/lint.sh
./scripts/run_opa.sh
```

You should see a deny message from `data_residency`.

Ask each trainee to:

* Point to **which file** is being rejected.
* Identify **which policy** raised the error.
* Explain the business reason (FR-only backups for critical data).

---

### Step 3 - Fix the Backup Target

Correct it:

```yaml
    region: "fr-central"
```

Re-run:

```bash
./scripts/run_opa.sh # should pass now
```

Push branch, open MR, confirm `policy_gates` passes.

---

### Step 4 - Deploy and Verify Backups

After merge:

1. Trigger `site_rollout EU-PAR-FR01`.
2. Verify in the backup system UI / CRs that:

   * The policy is applied to the correct namespace(s).
   * The target region is `fr-central`.
3. Run a **test backup** (e.g., using a “dry run” or test snapshot) and confirm the controller reports no errors.

### Step 5 - Compliance Evidence Hook

As part of the lab, show how you'd gather audit evidence:

* Screenshot / log extract of:

  * `policy_gates` success for `fr-critical-sovereign-backup.yaml`.
  * Backup controller logs confirming the target region.
* Optional: put a short Markdown file in `policies-and-compliance/docs/evidence/`:

  ```markdown
  # EU-PAR-FR01 - Critical Sovereign Backup Residency Evidence

  - BackupPolicy: fr-critical-sovereign-backup
  - Data classification: CRITICAL_SOVEREIGN_FR
  - Target region: fr-central
  - Evidence:
    - CI job: link-to-pipeline
    - Backup job logs: location/path
  ```

This becomes a pattern for real audits.

### Lab 2 Definition of Done

* The non-compliant backup policy was **blocked by CI**, not discovered in production.
* The fixed backup policy is deployed with an FR-only region.
* There is a repeatable way to gather **audit evidence** that backups respect residency.

---

## Lab 3 - Admin Access, JIT & Audit Evidence

**Theme:**
Admin access is high-risk. We want JIT elevation, limited scope, and strong logging, all enforced via policy and pipelines.

### Learning Objectives

By the end of Lab 3, trainees can:

* Encode admin access constraints as OPA policies.
* Configure a “normal” ops group vs. a “JIT elevated” group.
* Show an audit trail for a temporary elevation.

### Timebox

~3 hours.

### Step 0 - Scenario

* Default: only `sovereign-ops-admins@sovereign-ops.fr` can have cluster-admin rights.
* In emergencies, an on-call engineer can get **time-limited elevation** via a “JIT admin” role, but:

  * Elevated engineers still appear as a **group** from the IdP (never as direct user bindings).
  * Every elevation must leave an audit trace (ticket, MR, pipeline).

We simulate:

1. A bad direct binding to an external user.
2. A corrected binding to a JIT group with clear expiry & evidence.
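The “expiry & evidence” part can itself be policy. As a preview of where Steps 1-4 end up, here's a sketch that requires the `jit-*` annotations (introduced in Step 4) whenever the JIT group is bound; treat it as an optional extension, not a prescribed rule:

```rego
package rbac

# Optional extension: any cluster-admin binding of the JIT group must carry
# the jit-* audit annotations. Annotation keys match Step 4's manifest.
deny[msg] {
  input.kind == "ClusterRoleBinding"
  input.metadata.name == "cluster-admin"
  s := input.subjects[_]
  s.name == "sovereign-ops-jit-admins@sovereign-ops.fr"
  required := {"jit-elevation", "jit-ticket", "jit-expiry"}
  missing := required - {k | input.metadata.annotations[k]}
  count(missing) > 0
  msg := sprintf("JIT cluster-admin binding is missing annotations: %v", [missing])
}
```

With this in place, a binding that names the JIT group but skips the ticket or expiry never reaches the cluster.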
---

### Step 1 - Check RBAC Policy

**Repo:** `policies-and-compliance`
Ensure `opa-policies/rbac.rego` contains something like:

```rego
package rbac

deny[msg] {
  input.kind == "ClusterRoleBinding"
  input.metadata.name == "cluster-admin"
  s := input.subjects[_]
  s.kind == "User"
  not endswith(s.name, "@sovereign-ops.fr")
  msg := "cluster-admin bindings must target sovereign-ops.fr principals only"
}
```

We'll extend this in a moment to handle JIT groups.

---

### Step 2 - Add a Bad Binding (Failure Injection)

**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab3-rbac`

Manifest:

```yaml
# k8s/clusters/eu-par-fr01/rbac/bad-cluster-admin.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-admin
subjects:
  - kind: User
    name: temp-admin@example.com # WRONG - external domain
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```

Run:

```bash
./scripts/run_opa.sh
```

Confirm it fails and that trainees can identify:

* `kind: ClusterRoleBinding`
* `metadata.name: cluster-admin`
* subject `temp-admin@example.com` violates the rule.

---

### Step 3 - Design the JIT Admin Pattern

Work with the group to define:

* A JIT group name, e.g. `sovereign-ops-jit-admins@sovereign-ops.fr`
* A **process**:

  * JIT elevation is created by an MR with:

    * A reference to the incident ticket.
    * A time-bound comment / annotation.
  * Removal is *another MR* reverting or removing the binding.

Extend the RBAC policy to allow either:

* The default admins group, or
* The JIT group.

Example (keep the Step 1 user rule in the same file):

```rego
package rbac

deny[msg] {
  input.kind == "ClusterRoleBinding"
  input.metadata.name == "cluster-admin"
  s := input.subjects[_]
  s.kind == "Group"
  not allowed_admin_group(s.name)
  msg := sprintf("cluster-admin binding must target an allowed admin group, got %v", [s.name])
}

allowed_admin_group(name) {
  name == "sovereign-ops-admins@sovereign-ops.fr"
}

allowed_admin_group(name) {
  name == "sovereign-ops-jit-admins@sovereign-ops.fr"
}
```

---

### Step 4 - Add a Correct Binding with Annotation

Replace the bad binding with:

```yaml
# k8s/clusters/eu-par-fr01/rbac/cluster-admin-jit.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-admin
  annotations:
    jit-elevation: "true"
    jit-ticket: "INC-2026-0001"
    jit-expiry: "2026-12-31T23:59:59Z"
subjects:
  - kind: Group
    name: sovereign-ops-jit-admins@sovereign-ops.fr
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```

Run:

```bash
./scripts/run_opa.sh # should pass
```

Push branch, open MR. Require:

* Approval from **Security** and **Compliance** for this MR.

After merge, run `site_rollout`.

---

### Step 5 - JIT Elevation & De-Elevation (Audit Drill)

Simulate:

1. An incident occurs (fake incident ID).
2. The JIT group gets populated on the IdP side (out of scope here; assume done).
3. The above binding is live in K8s after `site_rollout`.
4. After “incident resolution”, create a **follow-up MR** that:

   * Removes or comments out `cluster-admin-jit.yaml`, or
   * Changes it to bind only the permanent admin group.

Collect **evidence**:

* CI & Git history:

  * The MR that introduced the JIT binding with `jit-*` annotations.
  * The MR that removed it.
* Optional: Add a small audit entry in `policies-and-compliance/docs/evidence/jit-admin-elevations.md`:

  ```markdown
  # JIT Admin Elevation - INC-2026-0001

  - Incident ID: INC-2026-0001
  - Site: EU-PAR-FR01
  - JIT group: sovereign-ops-jit-admins@sovereign-ops.fr
  - Elevation MR: link-to-MR
  - Expiry: 2026-12-31T23:59:59Z
  - De-elevation MR: link-to-MR
  ```

### Lab 3 Definition of Done

* The bad per-user binding is caught in CI and never deployed.
* The JIT admin pattern (group + annotations + approvals) is implemented and enforced.
* There is a **documented pattern** for audit-ready JIT elevation.

---

## D4 Overall Definition of Done

When you've run Labs 1-3, you should have:

1. **Sovereign-aware namespaces & labels**

   * Critical sovereign FR workloads sit in correctly labeled namespaces.
   * Mislabel attempts are blocked by policy.

2. **Residency-safe backup policies**

   * Backups for `CRITICAL_SOVEREIGN_FR` workloads target FR-only regions.
   * Cross-border misconfigurations are blocked at `policy_gates`.

3. **Controlled admin access model**

   * Only approved groups can be cluster admins.
   * JIT elevation is controlled, auditable, and time-bound.

4. **Audit evidence patterns**

   * Simple Markdown docs + pipeline logs are used as audit artefacts.
   * The team knows how to demonstrate compliance, not just configure it.

---
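If you want a one-shot way to snapshot the artefacts above, here is a small helper sketch. It assumes your kubeconfig targets the EU-PAR-FR01 mgmt cluster and that the `BackupPolicy` CRD from Lab 2 is installed; the output path and resource names are illustrative:

```bash
#!/usr/bin/env bash
# Sketch: dump the live objects the D4 labs rely on as audit evidence.
# Pair the output with CI pipeline links in docs/evidence/ (see Labs 2-3).
set -euo pipefail

OUT="docs/evidence/eu-par-fr01-$(date +%Y-%m-%d)"
mkdir -p "$OUT"

# Lab 1: namespace labels prove classification.
kubectl get ns fr-critical-sovereign-justice -o yaml > "$OUT/namespace.yaml"

# Lab 2: backup target region proves residency (hypothetical CRD name).
kubectl get backuppolicy fr-critical-sovereign-backup -o yaml > "$OUT/backup-policy.yaml"

# Lab 3: the cluster-admin binding proves the access model.
kubectl get clusterrolebinding cluster-admin -o yaml > "$OUT/cluster-admin.yaml"

echo "Evidence written to $OUT - commit it with the matching MR links."
```

Committed alongside the Markdown evidence notes, this gives auditors both the declared intent (Git history) and the observed state (cluster dumps).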