Alright, sovereignty & compliance time 👮‍♀️📜

Below is **D4 - Sovereignty & Compliance Labs Manual**, built to sit on top of D1-D3.

We'll treat D4 as a **focused lab block** rather than a long calendar: three micro-labs you can run as half-day sessions or as one intense full day:

1. **Lab 1 - Data Classification & Sovereign Namespaces**
2. **Lab 2 - Backup & Cross-Border Data Residency**
3. **Lab 3 - Admin Access, JIT & Audit Evidence**

Everything assumes you already have:

* Repos: `infra-foundation`, `platform-clusters`, `policies-and-compliance`
* Pipelines: `lint_and_unit`, `policy_gates`, `integration_test`, `site_rollout`
* Site: **EU-PAR-FR01**, jurisdiction FR / EU-EEA, with data classifications including `CRITICAL_SOVEREIGN_FR`

---
# D4 - Sovereignty & Compliance Labs Manual

## Shared Foundations for All Labs

### Roles at the (virtual) table

* **Sovereign Compliance Lead / DPO**
* **Security Architect**
* **Platform/SRE engineer**
* **Tenant/Product representative** (for realistic requirements)

### Common Pre-Reqs

* `policies-and-compliance` contains at least:
  * `data-classification.yaml`
  * `opa-policies/data_residency.rego`
  * `opa-policies/rbac.rego`
* `platform-clusters`:
  * K8s mgmt cluster for EU-PAR-FR01 is up and GitOps-managed.
  * Namespaces + StorageClasses for FR tenants exist or can be created.
* Trainees know how to:
  * Branch → MR → CI → merge → `site_rollout`
  * Run `./scripts/lint.sh` and `./scripts/run_opa.sh` locally.
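If your `run_opa.sh` isn't wired up yet, a minimal sketch looks like this (assuming the `conftest` CLI is installed and that `policies-and-compliance` is checked out next to `platform-clusters`; the paths are illustrative, adjust to your layout):

```bash
#!/usr/bin/env bash
# run_opa.sh - evaluate all cluster manifests against the shared OPA policies.
# Hypothetical layout: run from the platform-clusters repo root.
set -euo pipefail

POLICY_DIR="../policies-and-compliance/opa-policies"
MANIFEST_DIR="k8s/clusters"

# --all-namespaces makes conftest evaluate deny/violation rules from every
# Rego package (rbac, data_residency, ...), not just package main.
find "$MANIFEST_DIR" -name '*.yaml' -print0 \
  | xargs -0 conftest test --policy "$POLICY_DIR" --all-namespaces
```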
---
## Lab 1 - Data Classification & Sovereign Namespaces

**Theme:**
Turn abstract GDPR / sovereignty rules into **concrete namespace & label design**, then enforce them via policy.

### Learning Objectives

By the end of Lab 1, trainees can:

* Map **business data types** to classification levels (PUBLIC / PERSONAL / CRITICAL_SOVEREIGN_FR, etc.).
* Design namespaces and labels that encode classification and jurisdiction.
* See how mislabeling is caught by `policy_gates`.

### Timebox

~2-3 hours.

### Step 0 - Scenario

A new tenant, *Justice Ministry - Case Analytics*, wants to run workloads in EU-PAR-FR01. They process:

* Criminal case data
* Personal identifiers
* Sensitive categories (e.g., ethnicity, health markers)

Compliance decision:

* This tenant's data is treated as **`CRITICAL_SOVEREIGN_FR`**
* Must never leave **France**; backups remain in FR.

### Step 1 - Update / Confirm Classification Rules

**Repo:** `policies-and-compliance`
**Branch:** `feat/d4-lab1-justice-classification`

1. Open `data-classification.yaml` and verify it has:

   ```yaml
   levels:
     - name: PUBLIC
     - name: INTERNAL
     - name: PERSONAL
     - name: SENSITIVE_PERSONAL
     - name: CRITICAL_SOVEREIGN_FR

   residency:
     CRITICAL_SOVEREIGN_FR:
       must_stay_in_country: FR
     SENSITIVE_PERSONAL:
       must_stay_in_region: EU_EEA
   ```

2. If you need a tenant-specific label, optionally add:

   ```yaml
   tenant_overlays:
     justice_case_analytics:
       base_level: CRITICAL_SOVEREIGN_FR
       notes: "Justice ministry workloads with case data and identifiers"
   ```

3. Run:

   ```bash
   ./scripts/lint.sh
   ./scripts/run_opa.sh   # should still pass
   ```

4. Push the branch, open an MR, and ensure `policy_gates` passes. (An optional local sanity check for the classification file is sketched below.)
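The sanity check mentioned in step 4, for anyone who wants to poke the YAML directly (assumes the Go-based `yq` v4 CLI; the expected level name comes from the scenario above):

```bash
# List all classification level names, then assert the sovereign level exists.
yq '.levels[].name' data-classification.yaml
yq -e '.levels[] | select(.name == "CRITICAL_SOVEREIGN_FR")' data-classification.yaml \
  && echo "CRITICAL_SOVEREIGN_FR is defined"
```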
---
### Step 2 - Create a Sovereign Namespace

**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab1-justice-namespace`

Add `k8s/clusters/eu-par-fr01/namespaces/fr-critical-sovereign-justice.yaml`:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: fr-critical-sovereign-justice
  labels:
    data_classification: CRITICAL_SOVEREIGN_FR
    country: FR
    tenant: justice_case_analytics
```

Run local checks:

```bash
./scripts/lint.sh
./scripts/run_opa.sh
```

If you've got a naming policy such as:

```rego
deny[msg] {
  input.kind == "Namespace"
  startswith(input.metadata.name, "fr-critical-sovereign-")
  input.metadata.labels["data_classification"] != "CRITICAL_SOVEREIGN_FR"
  msg := sprintf("namespace %v must be labeled CRITICAL_SOVEREIGN_FR", [input.metadata.name])
}
```

it should now pass.
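To run just this rule against the new manifest without the full script, one option (assumes `yq` v4 and the `opa` CLI, plus a hypothetical package name `namespace_naming` for the rule above; adjust to wherever the rule actually lives):

```bash
# Convert the manifest to JSON and feed it to OPA as input.
yq -o=json '.' k8s/clusters/eu-par-fr01/namespaces/fr-critical-sovereign-justice.yaml \
  | opa eval --stdin-input \
      --data ../policies-and-compliance/opa-policies/ \
      --format pretty \
      'data.namespace_naming.deny'
# An empty set means the namespace passes the naming rule.
```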
---
### Step 3 - Intentional Mislabel (Training Failure)

To make the lab real:

1. Temporarily change the label in your branch:

   ```yaml
   data_classification: SENSITIVE_PERSONAL # WRONG
   ```

2. Re-run `./scripts/run_opa.sh` and observe the **deny** from the naming/data-class policy.

3. Fix back to `CRITICAL_SOVEREIGN_FR`, rerun, confirm pass.

---

### Step 4 - Deploy Namespace via GitOps

Push branch, open MR, and once CI is green:

* Merge to `main`
* Trigger `site_rollout EU-PAR-FR01` (or let an environment pipeline do it).
* Verify via:

  ```bash
  kubectl get ns fr-critical-sovereign-justice -o yaml
  ```

*Only for observation; no manual updates.*
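If you want a tighter check than eyeballing the full YAML, a couple of one-liners do it (standard `kubectl`; the label keys are the ones from the manifest above):

```bash
# Print just the labels the sovereignty policy cares about.
kubectl get ns fr-critical-sovereign-justice \
  -o jsonpath='{.metadata.labels.data_classification}{"\n"}{.metadata.labels.country}{"\n"}'
# Expected output:
#   CRITICAL_SOVEREIGN_FR
#   FR
```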
### Lab 1 Definition of Done

* `fr-critical-sovereign-justice` namespace exists in the cluster with correct labels.
* Policy prevents mislabeling for namespaces that follow the `fr-critical-sovereign-*` pattern.
* Trainees understand how **classification → namespace → policy** flows end-to-end.

---

## Lab 2 - Backup & Cross-Border Data Residency

**Theme:**
Backups are the classic place sovereignty gets broken. This lab turns that into a controlled exercise.

### Learning Objectives

By the end of Lab 2, trainees can:

* Model backup policies for sovereign data.
* Understand how residency policies block illegal targets.
* Fix issues without relaxing policy (no “disable OPA” shortcuts).

### Timebox

~2-3 hours.

### Step 0 - Scenario

The Justice tenant wants hourly backups of all critical sovereign namespaces:

* `fr-critical-sovereign-justice`
* `fr-critical-sovereign-ai` (from earlier labs)

However, a platform engineer mistakenly configures backups to region `eu-central-1` (with nodes in DE).

### Step 1 - Confirm Residency Policy

**Repo:** `policies-and-compliance`

Check that you have `opa-policies/data_residency.rego` with *at least*:

```rego
package data_residency

deny[msg] {
  input.kind == "BackupPolicy"
  input.metadata.labels["data_classification"] == "CRITICAL_SOVEREIGN_FR"
  not input.spec.target.region == "fr-central"
  msg := sprintf("critical FR data must backup to fr-central, got %v", [input.spec.target.region])
}
```

If missing, create it and wire `run_opa.sh` to use it.
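Before touching any manifests, you can watch the deny fire against a handcrafted input (assumes the `opa` CLI; the input mirrors the BackupPolicy shape this policy expects):

```bash
# A deliberately non-compliant input: sovereign data, non-FR region.
cat > /tmp/bad-backup.json <<'EOF'
{
  "kind": "BackupPolicy",
  "metadata": {"labels": {"data_classification": "CRITICAL_SOVEREIGN_FR"}},
  "spec": {"target": {"region": "eu-central-1"}}
}
EOF

opa eval --input /tmp/bad-backup.json \
  --data opa-policies/data_residency.rego \
  --format pretty 'data.data_residency.deny'
# Expect one deny message mentioning eu-central-1.
```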
---
### Step 2 - Add a Non-Compliant BackupPolicy (Failure Injection)

**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab2-backup-residency`

Add `k8s/clusters/eu-par-fr01/backups/fr-critical-sovereign-backup.yaml`:

```yaml
apiVersion: backup.example.io/v1
kind: BackupPolicy
metadata:
  name: fr-critical-sovereign-backup
  labels:
    data_classification: CRITICAL_SOVEREIGN_FR
    tenant: justice_case_analytics
spec:
  schedule: "0 * * * *"
  target:
    provider: "object-storage"
    region: "eu-central-1" # INTENTIONALLY WRONG
```

Run:

```bash
./scripts/lint.sh
./scripts/run_opa.sh
```

You should see a deny message from `data_residency`.

Ask each trainee to:

* Point to **which file** is being rejected.
* Identify **which policy** raised the error.
* Explain the business reason (FR-only backups for critical data).

---

### Step 3 - Fix the Backup Target

Correct it:

```yaml
region: "fr-central"
```

Re-run:

```bash
./scripts/run_opa.sh   # should pass now
```

Push the branch, open an MR, and confirm `policy_gates` passes.

---

### Step 4 - Deploy and Verify Backups

After merge:

1. Trigger `site_rollout EU-PAR-FR01`.
2. Verify in the backup system UI / CRs that:
   * Policy is applied to the correct namespace(s).
   * Target region is `fr-central` (a CLI spot-check is sketched below).
3. Run a **test backup** (e.g., using a “dry run” or test snapshot) and ensure no errors from the controller.
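For the region spot-check in step 2, something like this works (hypothetical: the `BackupPolicy` CRD from `backup.example.io/v1` above; adjust the resource name to your backup operator's actual CRD):

```bash
# Read the effective target region straight from the applied CR.
kubectl get backuppolicies.backup.example.io fr-critical-sovereign-backup \
  -o jsonpath='{.spec.target.region}{"\n"}'
# Expected output: fr-central
```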
### Step 5 - Compliance Evidence Hook

As part of the lab, show how you'd gather audit evidence:

* Screenshot / log extract of:
  * `policy_gates` success for `fr-critical-sovereign-backup.yaml`.
  * Backup controller logs confirming the target region.
* Optional: put a short Markdown file in `policies-and-compliance/docs/evidence/`:

```markdown
# EU-PAR-FR01 - Critical Sovereign Backup Residency Evidence

- BackupPolicy: fr-critical-sovereign-backup
- Data classification: CRITICAL_SOVEREIGN_FR
- Target region: fr-central
- Evidence:
  - CI job: link-to-pipeline
  - Backup job logs: location/path
```

This becomes a pattern for real audits.

### Lab 2 Definition of Done

* Non-compliant backup policy was **blocked by CI**, not discovered in production.
* Fixed backup policy is deployed with FR-only region.
* There is a repeatable way to gather **audit evidence** that backups respect residency.

---
## Lab 3 - Admin Access, JIT & Audit Evidence

**Theme:**
Admin access is high-risk. We want JIT, limited scope, and strong logging — all enforced via policy and pipelines.

### Learning Objectives

By the end of Lab 3, trainees can:

* Encode admin access constraints as OPA policies.
* Configure a “normal” ops group vs a “JIT elevated” group.
* Show an audit trail for a temporary elevation.

### Timebox

~3 hours.

### Step 0 - Scenario

* Default: only `sovereign-ops-admins@sovereign-ops.fr` can have cluster admin rights.
* In emergencies, an on-call engineer can get **time-limited elevation** via a “JIT admin” role, but:
  * Elevation is still granted via a **group** from the IdP, never a per-user binding.
  * Every elevation must leave an audit trace (ticket, MR, pipeline).

We simulate:

1. A bad direct binding to an external user.
2. A corrected binding to a JIT group with clear expiry & evidence.

---

### Step 1 - Check RBAC Policy

**Repo:** `policies-and-compliance`

Ensure `opa-policies/rbac.rego` is something like:

```rego
package rbac

deny[msg] {
  input.kind == "ClusterRoleBinding"
  input.metadata.name == "cluster-admin"
  s := input.subjects[_]
  s.kind == "User"
  not endswith(s.name, "@sovereign-ops.fr")
  msg := "cluster-admin bindings must target sovereign-ops.fr principals only"
}
```

We'll extend this in a moment to handle JIT groups.
---

### Step 2 - Add a Bad Binding (Failure Injection)

**Repo:** `platform-clusters`
**Branch:** `feat/d4-lab3-rbac`

Manifest:

```yaml
# k8s/clusters/eu-par-fr01/rbac/bad-cluster-admin.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-admin
subjects:
  - kind: User
    name: temp-admin@example.com # WRONG - external domain
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```

Run:

```bash
./scripts/run_opa.sh
```

Confirm it fails and trainees can identify:

* `kind: ClusterRoleBinding`
* `metadata.name: cluster-admin`
* subject `temp-admin@example.com` violates the rule.
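To target just this manifest during the discussion, `conftest` can scope the run (assuming `policies-and-compliance` is checked out alongside, as in the `run_opa.sh` sketch earlier; `--namespace rbac` selects only the `rbac` Rego package):

```bash
conftest test \
  --policy ../policies-and-compliance/opa-policies/ \
  --namespace rbac \
  k8s/clusters/eu-par-fr01/rbac/bad-cluster-admin.yaml
# Expect: 1 failure, quoting the cluster-admin binding message.
```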
---
### Step 3 - Design the JIT Admin Pattern

Work with the group to define:

* A JIT group name, e.g. `sovereign-ops-jit-admins@sovereign-ops.fr`
* A **process**:
  * JIT elevation is created by MR with:
    * Reference to incident ticket.
    * Time-bound comment / annotation.
  * Removal is *another MR* reverting or removing the binding.

Extend the RBAC policy to allow either:

* The default admins group, or
* The JIT group.

Example:

```rego
package rbac

deny[msg] {
  input.kind == "ClusterRoleBinding"
  input.metadata.name == "cluster-admin"
  s := input.subjects[_]
  s.kind == "Group"
  not allowed_admin_group(s.name)
  msg := sprintf("cluster-admin binding must target an allowed admin group, got %v", [s.name])
}

allowed_admin_group(name) {
  name == "sovereign-ops-admins@sovereign-ops.fr"
}

allowed_admin_group(name) {
  name == "sovereign-ops-jit-admins@sovereign-ops.fr"
}
```
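A small unit test keeps the extended rule honest (a sketch using `opa test`; the test file name and inputs are illustrative, and the syntax matches the pre-1.0 Rego style used above):

```bash
cat > opa-policies/rbac_test.rego <<'EOF'
package rbac

# The permanent and JIT groups are both accepted.
test_permanent_group_allowed {
  allowed_admin_group("sovereign-ops-admins@sovereign-ops.fr")
}

test_jit_group_allowed {
  allowed_admin_group("sovereign-ops-jit-admins@sovereign-ops.fr")
}

# Any other group bound to cluster-admin is denied.
test_external_group_denied {
  count(deny) > 0 with input as {
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "cluster-admin"},
    "subjects": [{"kind": "Group", "name": "contractors@example.com"}]
  }
}
EOF

opa test opa-policies/ -v
```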
---
### Step 4 - Add a Correct Binding with Annotations

Replace the bad binding with:

```yaml
# k8s/clusters/eu-par-fr01/rbac/cluster-admin-jit.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-admin
  annotations:
    jit-elevation: "true"
    jit-ticket: "INC-2026-0001"
    jit-expiry: "2026-12-31T23:59:59Z"
subjects:
  - kind: Group
    name: sovereign-ops-jit-admins@sovereign-ops.fr
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```

Run:

```bash
./scripts/run_opa.sh   # should pass
```

Push the branch and open an MR. Require:

* Approval from **Security** and **Compliance** for this MR.

After merge, run `site_rollout`.
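During the drill it's worth showing that the expiry annotation is machine-checkable, e.g. (standard `kubectl`; relies on ISO-8601 UTC timestamps comparing correctly as strings):

```bash
expiry=$(kubectl get clusterrolebinding cluster-admin \
  -o jsonpath="{.metadata.annotations['jit-expiry']}")
now=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Lexicographic comparison is safe for ISO-8601 UTC timestamps.
if [ "$now" \< "$expiry" ]; then
  echo "JIT elevation still within its window (expires $expiry)"
else
  echo "JIT elevation EXPIRED at $expiry - raise the de-elevation MR"
fi
```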
---
### Step 5 - JIT Elevation & De-Elevation (Audit Drill)

Simulate:

1. Incident occurs (fake incident ID).
2. JIT group gets populated at the IdP side (out of scope here, assume done).
3. The above binding is live in K8s after `site_rollout`.
4. After “incident resolution”, create a **follow-up MR** that:
   * Removes or comments out `cluster-admin-jit.yaml`, or
   * Changes it to bind only the permanent admin group.

Collect **evidence**:

* CI & Git history:
  * MR that introduced JIT binding with `jit-*` annotations.
  * MR that removed it.
* Optional: add a small audit entry in `policies-and-compliance/docs/evidence/jit-admin-elevations.md`:

```markdown
# JIT Admin Elevation - INC-2026-0001

- Incident ID: INC-2026-0001
- Site: EU-PAR-FR01
- JIT group: sovereign-ops-jit-admins@sovereign-ops.fr
- Elevation MR: link-to-MR
- Expiry: 2026-12-31T23:59:59Z
- De-elevation MR: link-to-MR
```
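Git itself already holds most of this evidence; a minimal extraction sketch (plain `git`, run inside `platform-clusters`):

```bash
# Full change history of the JIT binding: who raised it, when, and the
# commits/MRs that introduced and removed it.
git log --follow --date=iso --pretty=format:'%h %ad %an %s' \
  -- k8s/clusters/eu-par-fr01/rbac/cluster-admin-jit.yaml
```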
### Lab 3 Definition of Done

* Bad per-user binding is caught and never deployed.
* JIT admin pattern (group + annotations + approvals) is implemented and enforced.
* There is a **documented pattern** for audit-ready JIT elevation.

---

## D4 Overall Definition of Done

When you've run Labs 1-3, you should have:

1. **Sovereign-aware namespaces & labels**
   * Critical sovereign FR workloads sit in correctly labeled namespaces.
   * Mislabel attempts are blocked by policy.

2. **Residency-safe backup policies**
   * Backups for `CRITICAL_SOVEREIGN_FR` workloads target FR-only regions.
   * Cross-border misconfigs are blocked at `policy_gates`.

3. **Controlled admin access model**
   * Only approved groups can be cluster admins.
   * JIT elevation is controlled, auditable, and time-bound.

4. **Audit evidence patterns**
   * Simple Markdown docs + pipeline logs used as audit artefacts.
   * Team knows how to demonstrate compliance, not just configure it.

---