Alright, sovereignty & compliance time 👮‍♀️📜

Below is **D4 - Sovereignty & Compliance Labs Manual**, built to sit on top of D1-D3.

We'll treat D4 as a **focused lab block** rather than a long calendar: three micro-labs you can run as half-day sessions or as one intense full day:

1. **Lab 1 - Data Classification & Sovereign Namespaces**
2. **Lab 2 - Backup & Cross-Border Data Residency**
3. **Lab 3 - Admin Access, JIT & Audit Evidence**

Everything assumes you already have:

* Repos: `infra-foundation`, `platform-clusters`, `policies-and-compliance`
* Pipelines: `lint_and_unit`, `policy_gates`, `integration_test`, `site_rollout`
* Site: **EU-PAR-FR01**, jurisdiction FR / EU-EEA, with data classifications including `CRITICAL_SOVEREIGN_FR`

---
# D4 - Sovereignty & Compliance Labs Manual

## Shared Foundations for All Labs

### Roles at the (virtual) table

* **Sovereign Compliance Lead / DPO**
* **Security Architect**
* **Platform/SRE engineer**
* **Tenant/Product representative** (for realistic requirements)

### Common Pre-Reqs

* `policies-and-compliance` contains at least:
  * `data-classification.yaml`
  * `opa-policies/data_residency.rego`
  * `opa-policies/rbac.rego`
* `platform-clusters`:
  * K8s mgmt cluster for EU-PAR-FR01 is up and GitOps-managed.
  * Namespaces + StorageClasses for FR tenants exist or can be created.
* Trainees know how to:
  * Branch → MR → CI → merge → `site_rollout`
  * Run `./scripts/lint.sh` and `./scripts/run_opa.sh` locally.
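If your `run_opa.sh` isn't wired up yet, a minimal sketch looks like this (assuming the `conftest` CLI is installed and that `policies-and-compliance` is checked out next to `platform-clusters`; the paths are illustrative, adjust to your layout):

```bash
#!/usr/bin/env bash
# run_opa.sh - evaluate all cluster manifests against the shared OPA policies.
# Hypothetical layout: run from the platform-clusters repo root.
set -euo pipefail

POLICY_DIR="../policies-and-compliance/opa-policies"
MANIFEST_DIR="k8s/clusters"

# --all-namespaces makes conftest evaluate deny/violation rules from every
# Rego package (rbac, data_residency, ...), not just package main.
find "$MANIFEST_DIR" -name '*.yaml' -print0 \
  | xargs -0 conftest test --policy "$POLICY_DIR" --all-namespaces
```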
---
## Lab 1 - Data Classification & Sovereign Namespaces

**Theme:**
Turn abstract GDPR / sovereignty rules into **concrete namespace & label design**, then enforce them via policy.

### Learning Objectives

By the end of Lab 1, trainees can:

* Map **business data types** to classification levels (PUBLIC / PERSONAL / CRITICAL_SOVEREIGN_FR, etc.).
* Design namespaces and labels that encode classification and jurisdiction.
* See how mislabeling is caught by `policy_gates`.

### Timebox

~2-3 hours.

### Step 0 - Scenario

A new tenant, *Justice Ministry - Case Analytics*, wants to run workloads in EU-PAR-FR01. They process:

* Criminal case data
* Personal identifiers
* Sensitive categories (e.g., ethnicity, health markers)

Compliance decision:

* This tenant's data is treated as **`CRITICAL_SOVEREIGN_FR`**
* Must never leave **France**; backups remain in FR.

### Step 1 - Update / Confirm Classification Rules

**Repo:** `policies-and-compliance`
**Branch:** `feat/d4-lab1-justice-classification`

1. Open `data-classification.yaml` and verify it has:

   ```yaml
   levels:
     - name: PUBLIC
     - name: INTERNAL
     - name: PERSONAL
     - name: SENSITIVE_PERSONAL
     - name: CRITICAL_SOVEREIGN_FR

   residency:
     CRITICAL_SOVEREIGN_FR:
       must_stay_in_country: FR
     SENSITIVE_PERSONAL:
       must_stay_in_region: EU_EEA
   ```

2. If you need a tenant-specific label, optionally add:

   ```yaml
   tenant_overlays:
     justice_case_analytics:
       base_level: CRITICAL_SOVEREIGN_FR
       notes: "Justice ministry workloads with case data and identifiers"
   ```

3. Run:

   ```bash
   ./scripts/lint.sh
   ./scripts/run_opa.sh   # should still pass
   ```

4. Push the branch, open an MR, and ensure `policy_gates` passes. (An optional local sanity check for the classification file is sketched below.)
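The sanity check mentioned in step 4, for anyone who wants to poke the YAML directly (assumes the Go-based `yq` v4 CLI; the expected level name comes from the scenario above):

```bash
# List all classification level names, then assert the sovereign level exists.
yq '.levels[].name' data-classification.yaml
yq -e '.levels[] | select(.name == "CRITICAL_SOVEREIGN_FR")' data-classification.yaml \
  && echo "CRITICAL_SOVEREIGN_FR is defined"
```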
---
### Step 2 - Create a Sovereign Namespace

**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab1-justice-namespace`

Add `k8s/clusters/eu-par-fr01/namespaces/fr-critical-sovereign-justice.yaml`:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: fr-critical-sovereign-justice
  labels:
    data_classification: CRITICAL_SOVEREIGN_FR
    country: FR
    tenant: justice_case_analytics
```

Run local checks:

```bash
./scripts/lint.sh
./scripts/run_opa.sh
```

If you've got a naming policy such as:

```rego
deny[msg] {
  input.kind == "Namespace"
  startswith(input.metadata.name, "fr-critical-sovereign-")
  input.metadata.labels["data_classification"] != "CRITICAL_SOVEREIGN_FR"
  msg := sprintf("namespace %v must be labeled CRITICAL_SOVEREIGN_FR", [input.metadata.name])
}
```

it should now pass.
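To run just this rule against the new manifest without the full script, one option (assumes `yq` v4 and the `opa` CLI, plus a hypothetical package name `namespace_naming` for the rule above; adjust to wherever the rule actually lives):

```bash
# Convert the manifest to JSON and feed it to OPA as input.
yq -o=json '.' k8s/clusters/eu-par-fr01/namespaces/fr-critical-sovereign-justice.yaml \
  | opa eval --stdin-input \
      --data ../policies-and-compliance/opa-policies/ \
      --format pretty \
      'data.namespace_naming.deny'
# An empty set means the namespace passes the naming rule.
```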
---
### Step 3 - Intentional Mislabel (Training Failure)

To make the lab real:

1. Temporarily change the label in your branch:

   ```yaml
   data_classification: SENSITIVE_PERSONAL # WRONG
   ```

2. Re-run `./scripts/run_opa.sh` and observe the **deny** from the naming/data-class policy.

3. Fix back to `CRITICAL_SOVEREIGN_FR`, rerun, confirm pass.

---

### Step 4 - Deploy Namespace via GitOps

Push branch, open MR, and once CI is green:

* Merge to `main`
* Trigger `site_rollout EU-PAR-FR01` (or let an environment pipeline do it).
* Verify via:

  ```bash
  kubectl get ns fr-critical-sovereign-justice -o yaml
  ```

*Only for observation; no manual updates.*
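If you want a tighter check than eyeballing the full YAML, a couple of one-liners do it (standard `kubectl`; the label keys are the ones from the manifest above):

```bash
# Print just the labels the sovereignty policy cares about.
kubectl get ns fr-critical-sovereign-justice \
  -o jsonpath='{.metadata.labels.data_classification}{"\n"}{.metadata.labels.country}{"\n"}'
# Expected output:
#   CRITICAL_SOVEREIGN_FR
#   FR
```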
### Lab 1 Definition of Done

* `fr-critical-sovereign-justice` namespace exists in the cluster with correct labels.
* Policy prevents mislabeling for namespaces that follow the `fr-critical-sovereign-*` pattern.
* Trainees understand how **classification → namespace → policy** flows end-to-end.

---

## Lab 2 - Backup & Cross-Border Data Residency

**Theme:**
Backups are the classic place sovereignty gets broken. This lab turns that into a controlled exercise.

### Learning Objectives

By the end of Lab 2, trainees can:

* Model backup policies for sovereign data.
* Understand how residency policies block illegal targets.
* Fix issues without relaxing policy (no “disable OPA” shortcuts).

### Timebox

~2-3 hours.

### Step 0 - Scenario

The Justice tenant wants hourly backups of all critical sovereign namespaces:

* `fr-critical-sovereign-justice`
* `fr-critical-sovereign-ai` (from earlier labs)

However, a platform engineer mistakenly configures backups to region `eu-central-1` (with nodes in DE).

### Step 1 - Confirm Residency Policy

**Repo:** `policies-and-compliance`

Check that you have `opa-policies/data_residency.rego` with *at least*:

```rego
package data_residency

deny[msg] {
  input.kind == "BackupPolicy"
  input.metadata.labels["data_classification"] == "CRITICAL_SOVEREIGN_FR"
  not input.spec.target.region == "fr-central"
  msg := sprintf("critical FR data must backup to fr-central, got %v", [input.spec.target.region])
}
```

If missing, create it and wire `run_opa.sh` to use it.
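Before touching any manifests, you can watch the deny fire against a handcrafted input (assumes the `opa` CLI; the input mirrors the BackupPolicy shape this policy expects):

```bash
# A deliberately non-compliant input: sovereign data, non-FR region.
cat > /tmp/bad-backup.json <<'EOF'
{
  "kind": "BackupPolicy",
  "metadata": {"labels": {"data_classification": "CRITICAL_SOVEREIGN_FR"}},
  "spec": {"target": {"region": "eu-central-1"}}
}
EOF

opa eval --input /tmp/bad-backup.json \
  --data opa-policies/data_residency.rego \
  --format pretty 'data.data_residency.deny'
# Expect one deny message mentioning eu-central-1.
```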
---
### Step 2 - Add a Non-Compliant BackupPolicy (Failure Injection)

**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab2-backup-residency`

Add `k8s/clusters/eu-par-fr01/backups/fr-critical-sovereign-backup.yaml`:

```yaml
apiVersion: backup.example.io/v1
kind: BackupPolicy
metadata:
  name: fr-critical-sovereign-backup
  labels:
    data_classification: CRITICAL_SOVEREIGN_FR
    tenant: justice_case_analytics
spec:
  schedule: "0 * * * *"
  target:
    provider: "object-storage"
    region: "eu-central-1" # INTENTIONALLY WRONG
```

Run:

```bash
./scripts/lint.sh
./scripts/run_opa.sh
```

You should see a deny message from `data_residency`.

Ask each trainee to:

* Point to **which file** is being rejected.
* Identify **which policy** raised the error.
* Explain the business reason (FR-only backups for critical data).

---

### Step 3 - Fix the Backup Target

Correct it:

```yaml
region: "fr-central"
```

Re-run:

```bash
./scripts/run_opa.sh   # should pass now
```

Push the branch, open an MR, and confirm `policy_gates` passes.

---

### Step 4 - Deploy and Verify Backups

After merge:

1. Trigger `site_rollout EU-PAR-FR01`.
2. Verify in the backup system UI / CRs that:
   * Policy is applied to the correct namespace(s).
   * Target region is `fr-central` (a CLI spot-check is sketched below).
3. Run a **test backup** (e.g., using a “dry run” or test snapshot) and ensure no errors from the controller.
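For the region spot-check in step 2, something like this works (hypothetical: the `BackupPolicy` CRD from `backup.example.io/v1` above; adjust the resource name to your backup operator's actual CRD):

```bash
# Read the effective target region straight from the applied CR.
kubectl get backuppolicies.backup.example.io fr-critical-sovereign-backup \
  -o jsonpath='{.spec.target.region}{"\n"}'
# Expected output: fr-central
```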
### Step 5 - Compliance Evidence Hook

As part of the lab, show how you'd gather audit evidence:

* Screenshot / log extract of:
  * `policy_gates` success for `fr-critical-sovereign-backup.yaml`.
  * Backup controller logs confirming the target region.
* Optional: put a short Markdown file in `policies-and-compliance/docs/evidence/`:

```markdown
# EU-PAR-FR01 - Critical Sovereign Backup Residency Evidence

- BackupPolicy: fr-critical-sovereign-backup
- Data classification: CRITICAL_SOVEREIGN_FR
- Target region: fr-central
- Evidence:
  - CI job: link-to-pipeline
  - Backup job logs: location/path
```

This becomes a pattern for real audits.

### Lab 2 Definition of Done

* Non-compliant backup policy was **blocked by CI**, not discovered in production.
* Fixed backup policy is deployed with FR-only region.
* There is a repeatable way to gather **audit evidence** that backups respect residency.

---
## Lab 3 - Admin Access, JIT & Audit Evidence

**Theme:**
Admin access is high-risk. We want JIT, limited scope, and strong logging — all enforced via policy and pipelines.

### Learning Objectives

By the end of Lab 3, trainees can:

* Encode admin access constraints as OPA policies.
* Configure a “normal” ops group vs a “JIT elevated” group.
* Show an audit trail for a temporary elevation.

### Timebox

~3 hours.

### Step 0 - Scenario

* Default: only `sovereign-ops-admins@sovereign-ops.fr` can have cluster admin rights.
* In emergencies, an on-call engineer can get **time-limited elevation** via a “JIT admin” role, but:
  * Elevation is still granted via a **group** from the IdP, never a per-user binding.
  * Every elevation must leave an audit trace (ticket, MR, pipeline).

We simulate:

1. A bad direct binding to an external user.
2. A corrected binding to a JIT group with clear expiry & evidence.

---

### Step 1 - Check RBAC Policy

**Repo:** `policies-and-compliance`

Ensure `opa-policies/rbac.rego` is something like:

```rego
package rbac

deny[msg] {
  input.kind == "ClusterRoleBinding"
  input.metadata.name == "cluster-admin"
  s := input.subjects[_]
  s.kind == "User"
  not endswith(s.name, "@sovereign-ops.fr")
  msg := "cluster-admin bindings must target sovereign-ops.fr principals only"
}
```

We'll extend this in a moment to handle JIT groups.
---

### Step 2 - Add a Bad Binding (Failure Injection)

**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab3-rbac`

Manifest:

```yaml
# k8s/clusters/eu-par-fr01/rbac/bad-cluster-admin.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-admin
subjects:
  - kind: User
    name: temp-admin@example.com # WRONG - external domain
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```

Run:

```bash
./scripts/run_opa.sh
```

Confirm it fails and trainees can identify:

* `kind: ClusterRoleBinding`
* `metadata.name: cluster-admin`
* subject `temp-admin@example.com` violates the rule.
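To target just this manifest during the discussion, `conftest` can scope the run (assuming `policies-and-compliance` is checked out alongside, as in the `run_opa.sh` sketch earlier; `--namespace rbac` selects only the `rbac` Rego package):

```bash
conftest test \
  --policy ../policies-and-compliance/opa-policies/ \
  --namespace rbac \
  k8s/clusters/eu-par-fr01/rbac/bad-cluster-admin.yaml
# Expect: 1 failure, quoting the cluster-admin binding message.
```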
---
### Step 3 - Design the JIT Admin Pattern

Work with the group to define:

* A JIT group name, e.g. `sovereign-ops-jit-admins@sovereign-ops.fr`
* A **process**:
  * JIT elevation is created by MR with:
    * Reference to incident ticket.
    * Time-bound comment / annotation.
  * Removal is *another MR* reverting or removing the binding.

Extend the RBAC policy to allow either:

* The default admins group, or
* The JIT group.

Example:

```rego
package rbac

deny[msg] {
  input.kind == "ClusterRoleBinding"
  input.metadata.name == "cluster-admin"
  s := input.subjects[_]
  s.kind == "Group"
  not allowed_admin_group(s.name)
  msg := sprintf("cluster-admin binding must target an allowed admin group, got %v", [s.name])
}

allowed_admin_group(name) {
  name == "sovereign-ops-admins@sovereign-ops.fr"
}

allowed_admin_group(name) {
  name == "sovereign-ops-jit-admins@sovereign-ops.fr"
}
```
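A small unit test keeps the extended rule honest (a sketch using `opa test`; the test file name and inputs are illustrative, and the syntax matches the pre-1.0 Rego style used above):

```bash
cat > opa-policies/rbac_test.rego <<'EOF'
package rbac

# The permanent and JIT groups are both accepted.
test_permanent_group_allowed {
  allowed_admin_group("sovereign-ops-admins@sovereign-ops.fr")
}

test_jit_group_allowed {
  allowed_admin_group("sovereign-ops-jit-admins@sovereign-ops.fr")
}

# Any other group bound to cluster-admin is denied.
test_external_group_denied {
  count(deny) > 0 with input as {
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "cluster-admin"},
    "subjects": [{"kind": "Group", "name": "contractors@example.com"}]
  }
}
EOF

opa test opa-policies/ -v
```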
---
### Step 4 - Add a Correct Binding with Annotations

Replace the bad binding with:

```yaml
# k8s/clusters/eu-par-fr01/rbac/cluster-admin-jit.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-admin
  annotations:
    jit-elevation: "true"
    jit-ticket: "INC-2026-0001"
    jit-expiry: "2026-12-31T23:59:59Z"
subjects:
  - kind: Group
    name: sovereign-ops-jit-admins@sovereign-ops.fr
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```

Run:

```bash
./scripts/run_opa.sh   # should pass
```

Push the branch and open an MR. Require:

* Approval from **Security** and **Compliance** for this MR.

After merge, run `site_rollout`.
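During the drill it's worth showing that the expiry annotation is machine-checkable, e.g. (standard `kubectl`; relies on ISO-8601 UTC timestamps comparing correctly as strings):

```bash
expiry=$(kubectl get clusterrolebinding cluster-admin \
  -o jsonpath="{.metadata.annotations['jit-expiry']}")
now=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Lexicographic comparison is safe for ISO-8601 UTC timestamps.
if [ "$now" \< "$expiry" ]; then
  echo "JIT elevation still within its window (expires $expiry)"
else
  echo "JIT elevation EXPIRED at $expiry - raise the de-elevation MR"
fi
```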
---
### Step 5 - JIT Elevation & De-Elevation (Audit Drill)

Simulate:

1. Incident occurs (fake incident ID).
2. JIT group gets populated at the IdP side (out of scope here, assume done).
3. The above binding is live in K8s after `site_rollout`.
4. After “incident resolution”, create a **follow-up MR** that:
   * Removes or comments out `cluster-admin-jit.yaml`, or
   * Changes it to bind only the permanent admin group.

Collect **evidence**:

* CI & Git history:
  * MR that introduced JIT binding with `jit-*` annotations.
  * MR that removed it.
* Optional: add a small audit entry in `policies-and-compliance/docs/evidence/jit-admin-elevations.md`:

```markdown
# JIT Admin Elevation - INC-2026-0001

- Incident ID: INC-2026-0001
- Site: EU-PAR-FR01
- JIT group: sovereign-ops-jit-admins@sovereign-ops.fr
- Elevation MR: link-to-MR
- Expiry: 2026-12-31T23:59:59Z
- De-elevation MR: link-to-MR
```
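Git itself already holds most of this evidence; a minimal extraction sketch (plain `git`, run inside `platform-clusters`):

```bash
# Full change history of the JIT binding: who raised it, when, and the
# commits/MRs that introduced and removed it.
git log --follow --date=iso --pretty=format:'%h %ad %an %s' \
  -- k8s/clusters/eu-par-fr01/rbac/cluster-admin-jit.yaml
```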
### Lab 3 Definition of Done

* Bad per-user binding is caught and never deployed.
* JIT admin pattern (group + annotations + approvals) is implemented and enforced.
* There is a **documented pattern** for audit-ready JIT elevation.

---

## D4 Overall Definition of Done

When you've run Labs 1-3, you should have:

1. **Sovereign-aware namespaces & labels**
   * Critical sovereign FR workloads sit in correctly labeled namespaces.
   * Mislabel attempts are blocked by policy.

2. **Residency-safe backup policies**
   * Backups for `CRITICAL_SOVEREIGN_FR` workloads target FR-only regions.
   * Cross-border misconfigs are blocked at `policy_gates`.

3. **Controlled admin access model**
   * Only approved groups can be cluster admins.
   * JIT elevation is controlled, auditable, and time-bound.

4. **Audit evidence patterns**
   * Simple Markdown docs + pipeline logs used as audit artefacts.
   * Team knows how to demonstrate compliance, not just configure it.

---