Alright, sovereignty & compliance time 👮‍♀️📜
Below is **D4 — Sovereignty & Compliance Labs Manual**, built to sit on top of D1-D3.
We'll treat D4 as a **focused lab block** rather than a long calendar:
three micro-labs you can run as half-day sessions or combine into one intense full day:
1. **Lab 1 - Data Classification & Sovereign Namespaces**
2. **Lab 2 - Backup & Cross-Border Data Residency**
3. **Lab 3 - Admin Access, JIT & Audit Evidence**
Everything assumes you already have:
* Repos: `infra-foundation`, `platform-clusters`, `policies-and-compliance`
* Pipelines: `lint_and_unit`, `policy_gates`, `integration_test`, `site_rollout`
* Site: **EU-PAR-FR01**, jurisdiction FR / EU-EEA, with data classifications including `CRITICAL_SOVEREIGN_FR`
---
# D4 - Sovereignty & Compliance Labs Manual
## Shared Foundations for All Labs
### Roles at the (virtual) table
* **Sovereign Compliance Lead / DPO**
* **Security Architect**
* **Platform/SRE engineer**
* **Tenant/Product representative** (for realistic requirements)
### Common Pre-Reqs
* `policies-and-compliance` contains at least:
  * `data-classification.yaml`
  * `opa-policies/data_residency.rego`
  * `opa-policies/rbac.rego`
* `platform-clusters`:
  * K8s mgmt cluster for EU-PAR-FR01 is up and GitOps-managed.
  * Namespaces + StorageClasses for FR tenants exist or can be created.
* Trainees know how to:
  * Branch → MR → CI → merge → `site_rollout`
  * Run `./scripts/lint.sh` and `./scripts/run_opa.sh` locally (a quick sketch of this loop follows).
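If anyone needs a refresher on that loop, it looks roughly like this; the branch name and commit message are only examples, and the MR itself is opened in your Git hosting UI:
```bash
# Illustrative only - adapt branch names and commit messages to the lab at hand.
git checkout -b feat/d4-lab1-justice-classification

# ... edit files ...

./scripts/lint.sh        # style / schema checks
./scripts/run_opa.sh     # policy checks (same gates CI runs)

git add -A
git commit -m "D4 Lab 1: classify justice tenant as CRITICAL_SOVEREIGN_FR"
git push -u origin feat/d4-lab1-justice-classification
# then open the MR and watch lint_and_unit / policy_gates before merging
```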
---
## Lab 1 - Data Classification & Sovereign Namespaces
**Theme:**
Turn abstract GDPR / sovereignty rules into **concrete namespace & label design**, then enforce them via policy.
### Learning Objectives
By the end of Lab 1, trainees can:
* Map **business data types** to classification levels (PUBLIC / PERSONAL / CRITICAL_SOVEREIGN_FR, etc.).
* Design namespaces and labels that encode classification and jurisdiction.
* See how mislabeling is caught by `policy_gates`.
### Timebox
~2-3 hours.
### Step 0 - Scenario
A new tenant *Justice Ministry - Case Analytics* wants to run workloads in EU-PAR-FR01. They process:
* Criminal case data
* Personal identifiers
* Sensitive categories (e.g., ethnicity, health markers)
Compliance decision:
* This tenant's data is treated as **`CRITICAL_SOVEREIGN_FR`**
* Must never leave **France**; backups remain in FR.
### Step 1 - Update / Confirm Classification Rules
**Repo:** `policies-and-compliance`
**Branch:** `feat/d4-lab1-justice-classification`
1. Open `data-classification.yaml` and verify it has:
```yaml
levels:
  - name: PUBLIC
  - name: INTERNAL
  - name: PERSONAL
  - name: SENSITIVE_PERSONAL
  - name: CRITICAL_SOVEREIGN_FR

residency:
  CRITICAL_SOVEREIGN_FR:
    must_stay_in_country: FR
  SENSITIVE_PERSONAL:
    must_stay_in_region: EU_EEA
```
2. If you need a tenant-specific label, optionally add:
```yaml
tenant_overlays:
  justice_case_analytics:
    base_level: CRITICAL_SOVEREIGN_FR
    notes: "Justice ministry workloads with case data and identifiers"
```
3. Run:
```bash
./scripts/lint.sh
./scripts/run_opa.sh # should still pass
```
4. Push branch, open MR, ensure `policy_gates` passes.
---
### Step 2 - Create a Sovereign Namespace
**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab1-justice-namespace`
Add:
`k8s/clusters/eu-par-fr01/namespaces/fr-critical-sovereign-justice.yaml`:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: fr-critical-sovereign-justice
  labels:
    data_classification: CRITICAL_SOVEREIGN_FR
    country: FR
    tenant: justice_case_analytics
```
Run local checks:
```bash
./scripts/lint.sh
./scripts/run_opa.sh
```
If you've got a naming policy such as:
```rego
deny[msg] {
  input.kind == "Namespace"
  startswith(input.metadata.name, "fr-critical-sovereign-")
  input.metadata.labels["data_classification"] != "CRITICAL_SOVEREIGN_FR"
  msg := sprintf("namespace %v must be labeled CRITICAL_SOVEREIGN_FR", [input.metadata.name])
}
```
it should now pass.
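How `run_opa.sh` is wired internally is repo-specific; if it wraps conftest, a manual spot check of just this manifest might look like the sketch below (assuming `policies-and-compliance` is checked out next to `platform-clusters`):
```bash
# Manual spot check - paths and the conftest wrapper are assumptions about this repo's layout.
# --all-namespaces evaluates deny rules from every Rego package, not just "main".
conftest test --all-namespaces \
  --policy ../policies-and-compliance/opa-policies \
  k8s/clusters/eu-par-fr01/namespaces/fr-critical-sovereign-justice.yaml
```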
---
### Step 3 - Intentional Mislabel (Training Failure)
To make the lab real:
1. Temporarily change the label in your branch:
```yaml
data_classification: SENSITIVE_PERSONAL # WRONG
```
2. Re-run `./scripts/run_opa.sh` and observe the **deny** from the naming/data-class policy.
3. Fix back to `CRITICAL_SOVEREIGN_FR`, rerun, confirm pass.
---
### Step 4 - Deploy Namespace via GitOps
Push branch, open MR, and once CI is green:
* Merge to `main`
* Trigger `site_rollout EU-PAR-FR01` (or let an environment pipeline do it).
* Verify via:
```bash
kubectl get ns fr-critical-sovereign-justice -o yaml
```
*Only for observation; no manual updates.*
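If you only want to eyeball the labels rather than the full YAML, a read-only spot check (same caveat: observe, never edit by hand):
```bash
# Read-only: confirm the labels GitOps applied to the sovereign namespace.
kubectl get namespace fr-critical-sovereign-justice \
  -o jsonpath='{.metadata.labels.data_classification}{"\n"}{.metadata.labels.country}{"\n"}'
# Expected: CRITICAL_SOVEREIGN_FR and FR
```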
### Lab 1 Definition of Done
* `fr-critical-sovereign-justice` namespace exists in the cluster with correct labels.
* Policy prevents mislabeling for namespaces that follow the `fr-critical-sovereign-*` pattern.
* Trainees understand how **classification → namespace → policy** flows end-to-end.
---
## Lab 2 - Backup & Cross-Border Data Residency
**Theme:**
Backups are the classic place sovereignty gets broken. This lab turns that into a controlled exercise.
### Learning Objectives
By the end of Lab 2, trainees can:
* Model backup policies for sovereign data.
* Understand how residency policies block illegal targets.
* Fix issues without relaxing policy (no “disable OPA” shortcuts).
### Timebox
~2-3 hours.
### Step 0 - Scenario
Justice tenant wants hourly backups of all critical sovereign namespaces:
* `fr-critical-sovereign-justice`
* `fr-critical-sovereign-ai` (from earlier labs)
However, a platform engineer mistakenly configures backups to region `eu-central-1` (with nodes in DE).
### Step 1 - Confirm Residency Policy
**Repo:** `policies-and-compliance`
Check that you have `opa-policies/data_residency.rego` with *at least*:
```rego
package data_residency

deny[msg] {
  input.kind == "BackupPolicy"
  input.metadata.labels["data_classification"] == "CRITICAL_SOVEREIGN_FR"
  not input.spec.target.region == "fr-central"
  msg := sprintf("critical FR data must backup to fr-central, got %v", [input.spec.target.region])
}
```
If missing, create it and wire `run_opa.sh` to use it.
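If you did have to create it and the `opa` CLI is available locally, you can sanity-check the new file before committing; a minimal sketch:
```bash
# Compile / syntax check of the new policy file (no input document needed).
opa check opa-policies/data_residency.rego

# Optional: normalise formatting before opening the MR.
opa fmt --write opa-policies/data_residency.rego
```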
---
### Step 2 - Add a Non-Compliant BackupPolicy (Failure Injection)
**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab2-backup-residency`
`k8s/clusters/eu-par-fr01/backups/fr-critical-sovereign-backup.yaml`:
```yaml
apiVersion: backup.example.io/v1
kind: BackupPolicy
metadata:
  name: fr-critical-sovereign-backup
  labels:
    data_classification: CRITICAL_SOVEREIGN_FR
    tenant: justice_case_analytics
spec:
  schedule: "0 * * * *"
  target:
    provider: "object-storage"
    region: "eu-central-1" # INTENTIONALLY WRONG
```
Run:
```bash
./scripts/lint.sh
./scripts/run_opa.sh
```
You should see a deny message from `data_residency`.
Ask each trainee to:
* Point to **which file** is being rejected.
* Identify **which policy** raised the error.
* Explain the business reason (FR-only backups for critical data).
---
### Step 3 - Fix the Backup Target
Correct it:
```yaml
region: "fr-central"
```
Re-run:
```bash
./scripts/run_opa.sh # should pass now
```
Push branch, open MR, confirm `policy_gates` passes.
---
### Step 4 - Deploy and Verify Backups
After merge:
1. Trigger `site_rollout EU-PAR-FR01`.
2. Verify in backup system UI / CRs that:
   * Policy is applied to the correct namespace(s).
   * Target region is `fr-central`.
3. Run a **test backup** (e.g., using a “dry run” or test snapshot) and ensure there are no errors from the controller (a read-only CR spot check is sketched below).
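A minimal read-only check of the deployed policy, assuming the `BackupPolicy` CRD from Step 2 is queryable via kubectl (adjust for your backup controller):
```bash
# Read-only: confirm the live policy targets the FR region.
# Add -n <namespace> if the CRD is namespaced in your setup.
kubectl get backuppolicies.backup.example.io fr-critical-sovereign-backup \
  -o jsonpath='{.spec.target.region}{"\n"}'
# Expected: fr-central
```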
### Step 5 - Compliance Evidence Hook
As part of the lab, show how you'd gather audit evidence:
* Screenshot / log extract of:
  * `policy_gates` success for `fr-critical-sovereign-backup.yaml`.
  * Backup controller logs confirming target region.
* Optional: put a short Markdown in `policies-and-compliance/docs/evidence/`:
```markdown
# EU-PAR-FR01 - Critical Sovereign Backup Residency Evidence

- BackupPolicy: fr-critical-sovereign-backup
- Data classification: CRITICAL_SOVEREIGN_FR
- Target region: fr-central
- Evidence:
  - CI job: link-to-pipeline
  - Backup job logs: location/path
```
This becomes a pattern for real audits.
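If you prefer to scaffold that note from a pipeline rather than by hand, here is a sketch assuming GitLab-style CI variables such as `CI_PIPELINE_URL` (substitute whatever your CI system exposes):
```bash
# Sketch: generate an evidence stub from CI.
# CI_PIPELINE_URL is a GitLab CI variable - swap in your own CI's equivalent.
mkdir -p docs/evidence
cat > docs/evidence/eu-par-fr01-backup-residency.md <<EOF
# EU-PAR-FR01 - Critical Sovereign Backup Residency Evidence

- BackupPolicy: fr-critical-sovereign-backup
- Data classification: CRITICAL_SOVEREIGN_FR
- Target region: fr-central
- Evidence:
  - CI job: ${CI_PIPELINE_URL:-link-to-pipeline}
  - Backup job logs: location/path
EOF
```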
### Lab 2 Definition of Done
* Non-compliant backup policy was **blocked by CI**, not discovered in production.
* Fixed backup policy is deployed with FR-only region.
* There is a repeatable way to gather **audit evidence** that backups respect residency.
---
## Lab 3 - Admin Access, JIT & Audit Evidence
**Theme:**
Admin access is high-risk. We want just-in-time (JIT) access, limited scope, and strong logging — all enforced via policy and pipelines.
### Learning Objectives
By the end of Lab 3, trainees can:
* Encode admin access constraints as OPA policies.
* Configure a “normal” ops group vs a “JIT elevated” group.
* Show an audit trail for a temporary elevation.
### Timebox
~3 hours.
### Step 0 - Scenario
* Default: Only `sovereign-ops-admins@sovereign-ops.fr` can have cluster admin rights.
* In emergencies, an on-call engineer can get **time-limited elevation** via a “JIT admin” role, but:
  * Access is still granted via a **group** from the IdP, never a direct user binding.
  * Every elevation must leave an audit trace (ticket, MR, pipeline).
We simulate:
1. A bad direct binding to an external user.
2. A corrected binding to a JIT group with clear expiry & evidence.
---
### Step 1 - Check RBAC Policy
**Repo:** `policies-and-compliance`
Ensure `opa-policies/rbac.rego` is something like:
```rego
package rbac

deny[msg] {
  input.kind == "ClusterRoleBinding"
  input.metadata.name == "cluster-admin"
  s := input.subjects[_]
  s.kind == "User"
  not endswith(s.name, "@sovereign-ops.fr")
  msg := "cluster-admin bindings must target sovereign-ops.fr principals only"
}
```
We'll extend this in a moment to handle JIT groups.
---
### Step 2 - Add a Bad Binding (Failure Injection)
**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab3-rbac`
Manifest:
```yaml
# k8s/clusters/eu-par-fr01/rbac/bad-cluster-admin.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-admin
subjects:
  - kind: User
    name: temp-admin@example.com # WRONG - external domain
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```
Run:
```bash
./scripts/run_opa.sh
```
Confirm it fails and trainees can identify:
* `kind: ClusterRoleBinding`
* `metadata.name: cluster-admin`
* subject `temp-admin@example.com` violates the rule.
---
### Step 3 - Design the JIT Admin Pattern
Work with the group to define:
* A JIT group name, e.g. `sovereign-ops-jit-admins@sovereign-ops.fr`
* A **process**:
  * JIT elevation is created by an MR with:
    * A reference to the incident ticket.
    * A time-bound comment / annotation (expiry).
  * Removal is *another MR* reverting or removing the binding.
Extend RBAC policy to allow either:
* The default admins group, or
* The JIT group.
Example:
```rego
package rbac

deny[msg] {
  input.kind == "ClusterRoleBinding"
  input.metadata.name == "cluster-admin"
  s := input.subjects[_]
  s.kind == "Group"
  not allowed_admin_group(s.name)
  msg := sprintf("cluster-admin binding must target an allowed admin group, got %v", [s.name])
}

allowed_admin_group(name) {
  name == "sovereign-ops-admins@sovereign-ops.fr"
}

allowed_admin_group(name) {
  name == "sovereign-ops-jit-admins@sovereign-ops.fr"
}
```
---
### Step 4 - Add a Correct Binding with Annotation
Replace bad binding with:
```yaml
# k8s/clusters/eu-par-fr01/rbac/cluster-admin-jit.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-admin
  annotations:
    jit-elevation: "true"
    jit-ticket: "INC-2026-0001"
    jit-expiry: "2026-12-31T23:59:59Z"
subjects:
  - kind: Group
    name: sovereign-ops-jit-admins@sovereign-ops.fr
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```
Run:
```bash
./scripts/run_opa.sh # should pass
```
Push branch, open MR. Require:
* Approval from **Security** and **Compliance** for this MR.
After merge, run `site_rollout`.
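Once the rollout has applied it, a quick read-only confirmation of what is actually bound:
```bash
# Read-only: check that the live binding targets the JIT group and carries the jit-* annotations.
kubectl get clusterrolebinding cluster-admin -o yaml \
  | grep -E 'sovereign-ops-jit-admins|jit-(elevation|ticket|expiry)'
```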
---
### Step 5 - JIT Elevation & De-Elevation (Audit Drill)
Simulate:
1. Incident occurs (fake incident ID).
2. JIT group gets populated at the IdP side (out of scope here, assume done).
3. The above binding is live in K8s after `site_rollout`.
4. After “incident resolution”, create a **follow-up MR** that:
   * Removes or comments out `cluster-admin-jit.yaml` (see the revert sketch below), or
   * Changes it to bind only the permanent admin group.
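The de-elevation MR can simply revert the elevation commit; a minimal sketch, where the SHA placeholder is whatever your elevation MR merged as:
```bash
# Sketch: de-elevation as a revert MR.
git checkout -b fix/d4-lab3-jit-de-elevation
git revert <elevation-commit-sha>   # use -m 1 if it is a merge commit
git push -u origin fix/d4-lab3-jit-de-elevation
# open the MR, get Security + Compliance approval, merge, run site_rollout
```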
Collect **evidence**:
* CI & Git history:
  * MR that introduced JIT binding with `jit-*` annotations.
  * MR that removed it.
* Optional: Add a small audit entry in `policies-and-compliance/docs/evidence/jit-admin-elevations.md`:
```markdown
# JIT Admin Elevation - INC-2026-0001
- Incident ID: INC-2026-0001
- Site: EU-PAR-FR01
- JIT group: sovereign-ops-jit-admins@sovereign-ops.fr
- Elevation MR: link-to-MR
- Expiry: 2026-12-31T23:59:59Z
- De-elevation MR: link-to-MR
```
### Lab 3 Definition of Done
* Bad per-user binding is caught and never deployed.
* JIT admin pattern (group + annotations + approvals) is implemented and enforced.
* There is a **documented pattern** for audit-ready JIT elevation.
---
## D4 Overall Definition of Done
When you've run Labs 1-3, you should have:
1. **Sovereign-aware namespaces & labels**
   * Critical sovereign FR workloads sit in correctly labeled namespaces.
   * Mislabel attempts are blocked by policy.
2. **Residency-safe backup policies**
   * Backups for `CRITICAL_SOVEREIGN_FR` workloads target FR-only regions.
   * Cross-border misconfigs are blocked at `policy_gates`.
3. **Controlled admin access model**
   * Only approved groups can be cluster admins.
   * JIT elevation is controlled, auditable, and time-bound.
4. **Audit evidence patterns**
   * Simple Markdown docs + pipeline logs used as audit artefacts.
   * Team knows how to demonstrate compliance, not just configure it.
---