Alright, sovereignty & compliance time 👮‍♀️📜
Below is **D4 — Sovereignty & Compliance Labs Manual**, built to sit on top of D1-D3.
We'll treat D4 as a **focused lab block** rather than a long calendar:
three micro-labs you can run as half-day sessions or combine into one intense full day:
1. **Lab 1 - Data Classification & Sovereign Namespaces**
2. **Lab 2 - Backup & Cross-Border Data Residency**
3. **Lab 3 - Admin Access, JIT & Audit Evidence**
Everything assumes you already have:
* Repos: `infra-foundation`, `platform-clusters`, `policies-and-compliance`
* Pipelines: `lint_and_unit`, `policy_gates`, `integration_test`, `site_rollout`
* Site: **EU-PAR-FR01**, jurisdiction FR / EU-EEA, with data classifications including `CRITICAL_SOVEREIGN_FR`
---
# D4 - Sovereignty & Compliance Labs Manual
## Shared Foundations for All Labs
### Roles at the (virtual) table
* **Sovereign Compliance Lead / DPO**
* **Security Architect**
* **Platform/SRE engineer**
* **Tenant/Product representative** (for realistic requirements)
### Common Pre-Reqs
* `policies-and-compliance` contains at least:
  * `data-classification.yaml`
  * `opa-policies/data_residency.rego`
  * `opa-policies/rbac.rego`
* `platform-clusters`:
  * K8s mgmt cluster for EU-PAR-FR01 is up and GitOps-managed.
  * Namespaces + StorageClasses for FR tenants exist or can be created.
* Trainees know how to:
  * Branch → MR → CI → merge → `site_rollout`
  * Run `./scripts/lint.sh` and `./scripts/run_opa.sh` locally (a quick sketch of this loop follows).
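If anyone needs a refresher on that loop, it looks roughly like this; the branch name and commit message are only examples, and the MR itself is opened in your Git hosting UI:
```bash
# Illustrative only - adapt branch names and commit messages to the lab at hand.
git checkout -b feat/d4-lab1-justice-classification

# ... edit files ...

./scripts/lint.sh        # style / schema checks
./scripts/run_opa.sh     # policy checks (same gates CI runs)

git add -A
git commit -m "D4 Lab 1: classify justice tenant as CRITICAL_SOVEREIGN_FR"
git push -u origin feat/d4-lab1-justice-classification
# then open the MR and watch lint_and_unit / policy_gates before merging
```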
---
## Lab 1 - Data Classification & Sovereign Namespaces
**Theme:**
Turn abstract GDPR / sovereignty rules into **concrete namespace & label design**, then enforce them via policy.
### Learning Objectives
By the end of Lab 1, trainees can:
* Map **business data types** to classification levels (PUBLIC / PERSONAL / CRITICAL_SOVEREIGN_FR, etc.).
* Design namespaces and labels that encode classification and jurisdiction.
* See how mislabeling is caught by `policy_gates`.
### Timebox
~2-3 hours.
### Step 0 - Scenario
A new tenant *Justice Ministry - Case Analytics* wants to run workloads in EU-PAR-FR01. They process:
* Criminal case data
* Personal identifiers
* Sensitive categories (e.g., ethnicity, health markers)
Compliance decision:
* This tenant's data is treated as **`CRITICAL_SOVEREIGN_FR`**
* Must never leave **France**; backups remain in FR.
### Step 1 - Update / Confirm Classification Rules
**Repo:** `policies-and-compliance`
**Branch:** `feat/d4-lab1-justice-classification`
1. Open `data-classification.yaml` and verify it has:
```yaml
levels:
  - name: PUBLIC
  - name: INTERNAL
  - name: PERSONAL
  - name: SENSITIVE_PERSONAL
  - name: CRITICAL_SOVEREIGN_FR

residency:
  CRITICAL_SOVEREIGN_FR:
    must_stay_in_country: FR
  SENSITIVE_PERSONAL:
    must_stay_in_region: EU_EEA
```
2. If you need a tenant-specific label, optionally add:
```yaml
tenant_overlays:
  justice_case_analytics:
    base_level: CRITICAL_SOVEREIGN_FR
    notes: "Justice ministry workloads with case data and identifiers"
```
3. Run:
```bash
./scripts/lint.sh
./scripts/run_opa.sh # should still pass
```
4. Push branch, open MR, ensure `policy_gates` passes.
---
### Step 2 - Create a Sovereign Namespace
**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab1-justice-namespace`
Add:
`k8s/clusters/eu-par-fr01/namespaces/fr-critical-sovereign-justice.yaml`:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: fr-critical-sovereign-justice
  labels:
    data_classification: CRITICAL_SOVEREIGN_FR
    country: FR
    tenant: justice_case_analytics
```
Run local checks:
```bash
./scripts/lint.sh
./scripts/run_opa.sh
```
If you've got a naming policy such as:
```rego
deny[msg] {
  input.kind == "Namespace"
  startswith(input.metadata.name, "fr-critical-sovereign-")
  input.metadata.labels["data_classification"] != "CRITICAL_SOVEREIGN_FR"
  msg := sprintf("namespace %v must be labeled CRITICAL_SOVEREIGN_FR", [input.metadata.name])
}
```
it should now pass.
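How `run_opa.sh` is wired internally is repo-specific; if it wraps conftest, a manual spot check of just this manifest might look like the sketch below (assuming `policies-and-compliance` is checked out next to `platform-clusters`):
```bash
# Manual spot check - paths and the conftest wrapper are assumptions about this repo's layout.
# --all-namespaces evaluates deny rules from every Rego package, not just "main".
conftest test --all-namespaces \
  --policy ../policies-and-compliance/opa-policies \
  k8s/clusters/eu-par-fr01/namespaces/fr-critical-sovereign-justice.yaml
```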
---
### Step 3 - Intentional Mislabel (Training Failure)
To make the lab real:
1. Temporarily change the label in your branch:
```yaml
data_classification: SENSITIVE_PERSONAL # WRONG
```
2. Re-run `./scripts/run_opa.sh` and observe the **deny** from the naming/data-class policy.
3. Fix back to `CRITICAL_SOVEREIGN_FR`, rerun, confirm pass.
---
### Step 4 - Deploy Namespace via GitOps
Push branch, open MR, and once CI is green:
* Merge to `main`
* Trigger `site_rollout EU-PAR-FR01` (or let an environment pipeline do it).
* Verify via:
```bash
kubectl get ns fr-critical-sovereign-justice -o yaml
```
*Only for observation; no manual updates.*
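If you only want to eyeball the labels rather than the full YAML, a read-only spot check (same caveat: observe, never edit by hand):
```bash
# Read-only: confirm the labels GitOps applied to the sovereign namespace.
kubectl get namespace fr-critical-sovereign-justice \
  -o jsonpath='{.metadata.labels.data_classification}{"\n"}{.metadata.labels.country}{"\n"}'
# Expected: CRITICAL_SOVEREIGN_FR and FR
```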
### Lab 1 Definition of Done
* `fr-critical-sovereign-justice` namespace exists in the cluster with correct labels.
* Policy prevents mislabeling for namespaces that follow the `fr-critical-sovereign-*` pattern.
* Trainees understand how **classification → namespace → policy** flows end-to-end.
---
## Lab 2 - Backup & Cross-Border Data Residency
**Theme:**
Backups are the classic place sovereignty gets broken. This lab turns that into a controlled exercise.
### Learning Objectives
By the end of Lab 2, trainees can:
* Model backup policies for sovereign data.
* Understand how residency policies block illegal targets.
* Fix issues without relaxing policy (no “disable OPA” shortcuts).
### Timebox
~2-3 hours.
### Step 0 - Scenario
Justice tenant wants hourly backups of all critical sovereign namespaces:
* `fr-critical-sovereign-justice`
* `fr-critical-sovereign-ai` (from earlier labs)
However, a platform engineer mistakenly configures backups to region `eu-central-1` (with nodes in DE).
### Step 1 - Confirm Residency Policy
**Repo:** `policies-and-compliance`
Check that you have `opa-policies/data_residency.rego` with *at least*:
```rego
package data_residency

deny[msg] {
  input.kind == "BackupPolicy"
  input.metadata.labels["data_classification"] == "CRITICAL_SOVEREIGN_FR"
  not input.spec.target.region == "fr-central"
  msg := sprintf("critical FR data must backup to fr-central, got %v", [input.spec.target.region])
}
```
If missing, create it and wire `run_opa.sh` to use it.
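If you did have to create it and the `opa` CLI is available locally, you can sanity-check the new file before committing; a minimal sketch:
```bash
# Compile / syntax check of the new policy file (no input document needed).
opa check opa-policies/data_residency.rego

# Optional: normalise formatting before opening the MR.
opa fmt --write opa-policies/data_residency.rego
```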
---
### Step 2 - Add a Non-Compliant BackupPolicy (Failure Injection)
**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab2-backup-residency`
`k8s/clusters/eu-par-fr01/backups/fr-critical-sovereign-backup.yaml`:
```yaml
apiVersion: backup.example.io/v1
kind: BackupPolicy
metadata:
  name: fr-critical-sovereign-backup
  labels:
    data_classification: CRITICAL_SOVEREIGN_FR
    tenant: justice_case_analytics
spec:
  schedule: "0 * * * *"
  target:
    provider: "object-storage"
    region: "eu-central-1" # INTENTIONALLY WRONG
```
Run:
```bash
./scripts/lint.sh
./scripts/run_opa.sh
```
You should see a deny message from `data_residency`.
Ask each trainee to:
* Point to **which file** is being rejected.
* Identify **which policy** raised the error.
* Explain the business reason (FR-only backups for critical data).
---
### Step 3 - Fix the Backup Target
Correct it:
```yaml
region: "fr-central"
```
Re-run:
```bash
./scripts/run_opa.sh # should pass now
```
Push branch, open MR, confirm `policy_gates` passes.
---
### Step 4 - Deploy and Verify Backups
After merge:
1. Trigger `site_rollout EU-PAR-FR01`.
2. Verify in backup system UI / CRs that:
   * Policy is applied to the correct namespace(s).
   * Target region is `fr-central`.
3. Run a **test backup** (e.g., using a “dry run” or test snapshot) and ensure there are no errors from the controller (a read-only CR spot check is sketched below).
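A minimal read-only check of the deployed policy, assuming the `BackupPolicy` CRD from Step 2 is queryable via kubectl (adjust for your backup controller):
```bash
# Read-only: confirm the live policy targets the FR region.
# Add -n <namespace> if the CRD is namespaced in your setup.
kubectl get backuppolicies.backup.example.io fr-critical-sovereign-backup \
  -o jsonpath='{.spec.target.region}{"\n"}'
# Expected: fr-central
```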
### Step 5 - Compliance Evidence Hook
As part of the lab, show how you'd gather audit evidence:
* Screenshot / log extract of:
  * `policy_gates` success for `fr-critical-sovereign-backup.yaml`.
  * Backup controller logs confirming target region.
* Optional: put a short Markdown in `policies-and-compliance/docs/evidence/`:
```markdown
# EU-PAR-FR01 - Critical Sovereign Backup Residency Evidence

- BackupPolicy: fr-critical-sovereign-backup
- Data classification: CRITICAL_SOVEREIGN_FR
- Target region: fr-central
- Evidence:
  - CI job: link-to-pipeline
  - Backup job logs: location/path
```
This becomes a pattern for real audits.
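If you prefer to scaffold that note from a pipeline rather than by hand, here is a sketch assuming GitLab-style CI variables such as `CI_PIPELINE_URL` (substitute whatever your CI system exposes):
```bash
# Sketch: generate an evidence stub from CI.
# CI_PIPELINE_URL is a GitLab CI variable - swap in your own CI's equivalent.
mkdir -p docs/evidence
cat > docs/evidence/eu-par-fr01-backup-residency.md <<EOF
# EU-PAR-FR01 - Critical Sovereign Backup Residency Evidence

- BackupPolicy: fr-critical-sovereign-backup
- Data classification: CRITICAL_SOVEREIGN_FR
- Target region: fr-central
- Evidence:
  - CI job: ${CI_PIPELINE_URL:-link-to-pipeline}
  - Backup job logs: location/path
EOF
```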
### Lab 2 Definition of Done
* Non-compliant backup policy was **blocked by CI**, not discovered in production.
* Fixed backup policy is deployed with FR-only region.
* There is a repeatable way to gather **audit evidence** that backups respect residency.
---
## Lab 3 - Admin Access, JIT & Audit Evidence
**Theme:**
Admin access is high-risk. We want just-in-time (JIT) access, limited scope, and strong logging — all enforced via policy and pipelines.
### Learning Objectives
By the end of Lab 3, trainees can:
* Encode admin access constraints as OPA policies.
* Configure a “normal” ops group vs a “JIT elevated” group.
* Show an audit trail for a temporary elevation.
### Timebox
~3 hours.
### Step 0 - Scenario
* Default: Only `sovereign-ops-admins@sovereign-ops.fr` can have cluster admin rights.
* In emergencies, an on-call engineer can get **time-limited elevation** via a “JIT admin” role, but:
  * Access is still granted via a **group** from the IdP, never a direct user binding.
  * Every elevation must leave an audit trace (ticket, MR, pipeline).
We simulate:
1. A bad direct binding to an external user.
2. A corrected binding to a JIT group with clear expiry & evidence.
---
### Step 1 - Check RBAC Policy
**Repo:** `policies-and-compliance`
Ensure `opa-policies/rbac.rego` is something like:
```rego
package rbac

deny[msg] {
  input.kind == "ClusterRoleBinding"
  input.metadata.name == "cluster-admin"
  s := input.subjects[_]
  s.kind == "User"
  not endswith(s.name, "@sovereign-ops.fr")
  msg := "cluster-admin bindings must target sovereign-ops.fr principals only"
}
```
We'll extend this in a moment to handle JIT groups.
---
### Step 2 - Add a Bad Binding (Failure Injection)
**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab3-rbac`
Manifest:
```yaml
# k8s/clusters/eu-par-fr01/rbac/bad-cluster-admin.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-admin
subjects:
  - kind: User
    name: temp-admin@example.com # WRONG - external domain
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```
Run:
```bash
./scripts/run_opa.sh
```
Confirm it fails and trainees can identify:
* `kind: ClusterRoleBinding`
* `metadata.name: cluster-admin`
* subject `temp-admin@example.com` violates the rule.
---
### Step 3 - Design the JIT Admin Pattern
Work with the group to define:
* A JIT group name, e.g. `sovereign-ops-jit-admins@sovereign-ops.fr`
* A **process**:
  * JIT elevation is created by an MR with:
    * A reference to the incident ticket.
    * A time-bound comment / annotation (expiry).
  * Removal is *another MR* reverting or removing the binding.
Extend RBAC policy to allow either:
* The default admins group, or
* The JIT group.
Example:
```rego
package rbac

deny[msg] {
  input.kind == "ClusterRoleBinding"
  input.metadata.name == "cluster-admin"
  s := input.subjects[_]
  s.kind == "Group"
  not allowed_admin_group(s.name)
  msg := sprintf("cluster-admin binding must target an allowed admin group, got %v", [s.name])
}

allowed_admin_group(name) {
  name == "sovereign-ops-admins@sovereign-ops.fr"
}

allowed_admin_group(name) {
  name == "sovereign-ops-jit-admins@sovereign-ops.fr"
}
```
---
### Step 4 - Add a Correct Binding with Annotation
Replace bad binding with:
```yaml
# k8s/clusters/eu-par-fr01/rbac/cluster-admin-jit.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-admin
  annotations:
    jit-elevation: "true"
    jit-ticket: "INC-2026-0001"
    jit-expiry: "2026-12-31T23:59:59Z"
subjects:
  - kind: Group
    name: sovereign-ops-jit-admins@sovereign-ops.fr
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```
Run:
```bash
./scripts/run_opa.sh # should pass
```
Push branch, open MR. Require:
* Approval from **Security** and **Compliance** for this MR.
After merge, run `site_rollout`.
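Once the rollout has applied it, a quick read-only confirmation of what is actually bound:
```bash
# Read-only: check that the live binding targets the JIT group and carries the jit-* annotations.
kubectl get clusterrolebinding cluster-admin -o yaml \
  | grep -E 'sovereign-ops-jit-admins|jit-(elevation|ticket|expiry)'
```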
---
### Step 5 - JIT Elevation & De-Elevation (Audit Drill)
Simulate:
1. Incident occurs (fake incident ID).
2. JIT group gets populated at the IdP side (out of scope here, assume done).
3. The above binding is live in K8s after `site_rollout`.
4. After “incident resolution”, create a **follow-up MR** that:
   * Removes or comments out `cluster-admin-jit.yaml` (see the revert sketch below), or
   * Changes it to bind only the permanent admin group.
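The de-elevation MR can simply revert the elevation commit; a minimal sketch, where the SHA placeholder is whatever your elevation MR merged as:
```bash
# Sketch: de-elevation as a revert MR.
git checkout -b fix/d4-lab3-jit-de-elevation
git revert <elevation-commit-sha>   # use -m 1 if it is a merge commit
git push -u origin fix/d4-lab3-jit-de-elevation
# open the MR, get Security + Compliance approval, merge, run site_rollout
```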
Collect **evidence**:
* CI & Git history:
  * MR that introduced JIT binding with `jit-*` annotations.
  * MR that removed it.
* Optional: Add a small audit entry in `policies-and-compliance/docs/evidence/jit-admin-elevations.md`:
```markdown
# JIT Admin Elevation - INC-2026-0001
- Incident ID: INC-2026-0001
- Site: EU-PAR-FR01
- JIT group: sovereign-ops-jit-admins@sovereign-ops.fr
- Elevation MR: link-to-MR
- Expiry: 2026-12-31T23:59:59Z
- De-elevation MR: link-to-MR
```
### Lab 3 Definition of Done
* Bad per-user binding is caught and never deployed.
* JIT admin pattern (group + annotations + approvals) is implemented and enforced.
* There is a **documented pattern** for audit-ready JIT elevation.
---
## D4 Overall Definition of Done
When you've run Labs 1-3, you should have:
1. **Sovereign-aware namespaces & labels**
   * Critical sovereign FR workloads sit in correctly labeled namespaces.
   * Mislabel attempts are blocked by policy.
2. **Residency-safe backup policies**
   * Backups for `CRITICAL_SOVEREIGN_FR` workloads target FR-only regions.
   * Cross-border misconfigs are blocked at `policy_gates`.
3. **Controlled admin access model**
   * Only approved groups can be cluster admins.
   * JIT elevation is controlled, auditable, and time-bound.
4. **Audit evidence patterns**
   * Simple Markdown docs + pipeline logs used as audit artefacts.
   * Team knows how to demonstrate compliance, not just configure it.
---