🔐

AUTH

P0 · Foundation Wave 2 + slice-3 shipped · 18 migrations · JWT/JWKS · admin REST · bootstrap CLI · 22-role RBAC · TOTP MFA · HIBP · OIDC + SAML + Passkey SSO · GeoIP impossible-travel with per-tenant policy + CIDR allowlist + sticky-suppression + admin REST · SAML XML-DSig verify with exc-c14n Owner: CSO (vacant) → interim CEO

The shared identity plane. Wave 2 + slice-3 shipped 2026-05-18 — JWT RS256 + JWKS + admin REST + bootstrap CLI + 22-role closed-enum RBAC + TOTP MFA + HIBP breach check + OIDC Authorization Code + PKCE + WebAuthn passkey enrolment/login + SAML 2.0 SP-initiated SSO with full XML-DSig verify (RSA-SHA256 + exclusive-c14n) + GeoIP impossible-travel detector (slice-2: MaxMind GeoLite2 + cross-continent + haversine 1000km/h) + per-tenant policy + CIDR allowlist + sticky-challenge suppression + admin REST for policy mutation. All four login flows (password, OIDC, SAML, Passkey) honour the policy via assess_login. Tenant identity for Lumi's BRAIN at Stage 3+.

AUTH is the most sensitive single piece of CyberOS infrastructure — and per the 2026-05-14 research review (archived; see cyberos/CHANGELOG.md for that decision entry) it is not P0 #1. The reordered P0 build sequence is AI Gateway @ P0 · slice 1 → AUTH @ P0 · slice 2 → MCP Gateway @ P0 · slice 3 → OBS at P0 · start (parallel) → CHAT + CUO Phase 1 @ P0 · exit. AUTH at P0 · slice 2 is a stub: magic-link login + TOTP MFA + an RBAC predicate that hard-codes the 5 essential roles. The full 7,000 LoC build (WebAuthn L3 passkeys, per-tenant authz server, JWKS rotation, all 22 RBAC roles, KMS-wrapped signing keys, impossible-travel detection) is delivered across slices 2–5 in services/auth/RFC.md, landing fully at P3 (P3 · exit). The architecture below is the P3-full target; the §"P0 · slice 2 stub vs P3 full" section documents the staged delivery. AUTH owns the identity database, signs the JWTs every other module trusts, and applies the same predicate to a human and to an agent so the platform never has a "second-class" auth path. The agent-equal property is non-negotiable: a persona-versioned agent JWT is evaluated by the same code path as a human one. Lumi's BRAIN (the cloud-hosted org tenant per BRAIN_AUTOSYNC_DESIGN.md Stage 3+) consumes AUTH's tenant-scoped JWT to enforce row-level isolation. Sign-in UI mockup implements the design system's Liquid Glass + Umber/Ochre brand.

AUTH is the identity and authorisation service for every actor inside CyberOS — humans (CEO, CFO, members, clients) and agents (CUO persona-versioned tokens, Skill bot accounts, scheduled tasks). It speaks OAuth 2.1 + PKCE to the world, OIDC for SSO federation, WebAuthn L3 for passkey enrolment, and TOTP / RFC 6238 for the soft-MFA fallback. Inside, it sits on PostgreSQL with row-level security keyed by tenant_id, a Redis hot-path session cache, and an HSM-resident RS256 signing key. Every authentication and authorisation decision lands as a chained audit row in BRAIN — so "did this agent really have permission to delete that record?" is never an interpretation, it is a replay.

P0 · slice 2 stub status
Designed · RFC drafted
5 slices · 5 open Qs in RFC §6
P3 full status
Designed · P3 · exit target
full WebAuthn L3 + per-tenant authz
P0 · slice 2 stub LoC
~1,500
slice 1: tenant + subject CRUD + RLS
P3 full LoC
~7,000
Rust (axum) + sqlx · 5 slices
Planned tests
120+
incl. WebAuthn / OAuth flow suite
RBAC roles
5 (stub) → 22 (full)
closed catalogue · see §RBAC
JWT
RS256 · 15 min
refresh: 30 d rotating · KMS at slice 5
MFA
TOTP (P0 · slice 2) → WebAuthn (P3)
passkey mandatory at T2+ once full
Depends on
BRAIN · OBS · AI Gateway
audit + telemetry sink + cost-of-everything
Enables
Lumi's BRAIN tenant isolation
Stage 3 of universal protocol
0

The bigger picture — three strategic moves

AUTH on this page is presented at three time horizons simultaneously: the P0 · slice 2 stub that unblocks the rest of P0, the P3 full identity platform, and the Lumi's BRAIN tenant identity that turns BRAIN's universal protocol into a multi-tenant cloud product. Holding all three lenses lets engineers reading the page cold understand both what ships first and what the end-state looks like — without conflating them.

Move 1 · P0 · slice 2 stub
🚀
Unblock the rest of P0

Magic-link login + TOTP MFA + 5 hard-coded RBAC roles + JWT RS256 with dev keys. ~1,500 LoC. Lands at P0 · slice 2 so CHAT/CUO Phase 1/OBS instrumentation can identify the actor on every cross-module call. Per research review §2.4 — AI Gateway is P0 #1, not AUTH.

Scope: tenant + subject CRUD with RLS · admin REST · audit-chain bridge · 15 tests

Move 2 · P3 full
🔐
Production-grade identity

WebAuthn L3 passkeys · per-tenant OAuth 2.1 authz server with PKCE · OIDC SSO · KMS-wrapped signing keys with JWKS rotation · all 22 RBAC roles · Scope Contract Grants · impossible-travel detection · HIBP password-breach check · audit-chain emission on every decision. ~7,000 LoC across slices 2–5. P3 (P3 · exit) target.

Slices 2–5 in RFC §3 · 120+ tests · OAuth 2.1 conformance suite

Move 3 · Lumi tenant identity
☁️
Multi-tenant Lumi's BRAIN

Issues tenant-scoped JWTs that Lumi's BRAIN (cloud-hosted org tenant) consumes for row-level isolation. Stage 3 of the universal BRAIN protocol depends on this. Per-tenant residency pinning (vn-hanoi-1 / sg-1 / eu-fra-1) flows through tenant config to AUTH session config to BRAIN read scoping.

Per BRAIN_AUTOSYNC_DESIGN.md §6 · enables Lumi's BRAIN deployment

The reordered P0 build sequence (per research review §2.4)

gantt title P0 module build sequence — reordered per research review dateFormat YYYY-MM-DD axisFormat M+%w section Existing BRAIN (shipped) :done, brain, 2026-05-01, 1d SKILL (shipped) :done, skill, 2026-05-01, 1d OBS instrumentation :active, obs, 2026-05-01, 90d section P0 · slice 1 AI Gateway slice 1 :crit, ai, 2026-06-01, 30d section P0 · slice 2 AUTH slice 1 (stub) :crit, auth1, after ai, 30d MCP Gateway slice 1 :mcp, after auth1, 15d section P0 · slice 3 → P0 · exit CHAT slice 1 :chat, after mcp, 30d CUO Phase 1 (shipped) :done, cuo, 2026-05-01, 1d section P0 · exit → P2 · exit AUTH slices 2-4 (full) :auth_full, after chat, 180d section P3 phase AUTH slice 5 (KMS+OIDC) :auth5, after auth_full, 60d Lumi's BRAIN Stage 3 :crit, lumi, after auth5, 30d

Why this order: AI Gateway is the cost-of-everything-else gate (every CHAT @genie, every CUO Phase 2, every BRAIN semantic search depends on it). AUTH at P0 · slice 2 is a stub because for the first 6 months CyberSkill has 10 Members and one tenant — full WebAuthn + 22-role RBAC at P0 · slice 1 is over-engineering. Per reviewer: "AUTH at P0 · slice 2 can be magic-link + TOTP with a stub RBAC. WebAuthn passkeys for the Founder and a per-tenant authz server can land at P0 · exit without blocking any other module."

1

Why AUTH exists

Per-module auth is one of the easiest mistakes to make and one of the hardest to unmake. The first version of any platform ships with one module that has "owns its own login", the second module copies the pattern, the third inherits a subtle bug, and within a year there are four MFA implementations, three session models, and zero coherent answer to "who can do what". CyberOS treats this as a P0 design constraint: a single authentication service, a single authorisation predicate, a single audit trail. Every module that wants to know "can this actor do X to that resource?" asks AUTH the same way.

🆔
One identity, many shapes

Humans, agents, service accounts, and API keys all materialise as Subject rows. Same predicate; different scopes.

⚖️
Agent-equal evaluation

An agent JWT is not a magic backdoor. The RBAC engine evaluates it with exactly the same code path as a human JWT — just with a smaller, persona-bound scope.

📜
Every decision auditable

Login, logout, MFA challenge, role grant, scope decision — each one writes a chained row to BRAIN. "Did Compliance permit this?" → grep auth.decision.

The bet is the same bet BRAIN makes about audit: pay the cost once at the infrastructure layer, and every module inherits the property for free. Without AUTH as a shared plane, every module re-implements login, every module ships its own MFA bug, and "who has access to what" is a vibes question. With AUTH as a shared plane, the answer is a row in auth.decision and a JWT claim — the regulator can read it, the auditor can verify it, the engineer can replay it.

2

What it does — 5W1H2C5M

A structured decomposition of AUTH's scope. Every cell traces back to + §9.6 + §11.2.3.

AxisQuestionAnswer
5W · WhatWhat is AUTH?A Postgres-backed OAuth 2.1 + OIDC server with WebAuthn / TOTP MFA, an RBAC engine, a Scope Contract Grant evaluator, and an HSM-resident JWT signing key. Single Rust binary, exposed over gRPC internally and HTTPS externally.
5W · WhoWho authenticates?Humans: founders, members, clients, employees, contractors. Agents: CUO persona-stamped tokens, Skill bot accounts, scheduled tasks. Services: module-to-module mTLS clients (separate cert root). Owner: CSO seat (interim CEO until filled).
5W · WhenWhen does auth happen?(a) at session start (login + MFA); (b) at every cross-module API call (RBAC predicate); (c) at every MCP tool invocation (Scope Contract Grant); (d) on token refresh (rotation + reuse detection). Hot-path RBAC is sub-millisecond against Redis cache.
5W · WhereWhere does it run?P0: single region (Singapore SG-1) backed by AWS RDS Postgres + ElastiCache Redis. P3: multi-region active-active with eventual JWT consistency. WebAuthn / TOTP secrets at rest in KMS-wrapped column-level encryption.
5W · WhyWhy a separate layer?Because per-module auth is the single biggest source of platform-wide CVE class regression. Centralise the primitive, never the policy.
1H · HowHow does it work?OAuth 2.1 with PKCE for browser flows; client_credentials for service-to-service; refresh_token with rotation + reuse detection for long sessions; WebAuthn discoverable credentials for passkey login; TOTP RFC 6238 for fallback; OIDC for SSO federation with Google / Microsoft 365 / Okta. RBAC is a single Rego-like predicate evaluated at every cross-module call. Each decision is a chained audit row.
2C · CostCost budget?P0: ~$60/month (RDS t4g.small + ElastiCache cache.t4g.micro + one Fargate task). 50-tenant: ~$220/month. Per-login cost ~$0.000002 amortised; RBAC predicate ~$0.00000005.
2C · ConstraintsConstraints?(a) FIDO Alliance L3 WebAuthn for elevated roles. (b) Vietnamese Decree 53: data-residency proof for VN tenants. (c) Vietnam Decree 53 Art. 26 — incident notification ≤ 24h to MoPS. (d) PDPL Art. 14 — DSAR data export. (e) Per-module auth is strictly forbidden.
5M · MaterialsStack?Rust 1.81 · axum 0.7 · sqlx · PostgreSQL 16 · Redis 7 · WebAuthn-rs · totp-rs · rsa for JWT signing · AWS KMS for signing-key wrap · OpenTelemetry SDK · LiteFS for read replicas at P3+.
5M · MethodsMethod choices?OAuth 2.1 (not 2.0 — PKCE mandatory). RBAC + Scope Contract Grants (no ABAC complexity at P0). JWT RS256 (not HS256 — separates signer from verifier). Argon2id for password hashing. Refresh-token rotation with reuse detection (RFC 6749 §10.4). Row-level security on every identity table keyed by tenant_id.
5M · MachinesDeployment?Fargate task in SG-1 (P0). Multi-AZ Postgres RDS. Redis cluster mode at P3+. WebAuthn relying-party ID = the company root domain (P0: cyberos.com).
5M · ManpowerWho maintains?0.5 FTE today (covered by CEO). By P1: CSO seat owns 100% capacity + 24/7 on-call rotation with CTO.
5M · MeasurementHow measured?N(FR pending)..012 — zero tenant-data leakage, zero compensation in BRAIN, mandatory mTLS, OWASP Gen AI Top-10 mitigations all green. Per-tenant auth-decision dashboard. Annual penetration test (P2+).
2.5

P0 · slice 2 stub vs P3 full — what ships when

Per the reviewer's reorder, AUTH ships in two distinct shapes separated by 10 months. The P0 · slice 2 stub is mergeable in one week; it unblocks every other P0 module by giving them an actor identity. The P3 full AUTH ships over slices 2–5 across P0 · exit through P3 · exit and is what the SOC 2 Type II auditor evaluates. Confusing the two is the most common bug in module-page reading; this table is the disambiguator.

CapabilityP0 · slice 2 stub (slice 1)P3 full (slices 2–5)
Login mechanismMagic-link via email · valid 15 min · single-useWebAuthn L3 passkeys (mandatory at T2+) · OAuth 2.1 + PKCE · OIDC SSO (Google / Microsoft 365 / Okta)
MFATOTP (RFC 6238) optional · enforced for tenant adminsTOTP + WebAuthn · passkey mandatory for T2+ subjects · impossible-travel challenge · HIBP password-breach check
RBAC catalogue5 hard-coded roles: root-admin, tenant-admin, tenant-member, service-account, agent-persona22-role closed catalogue (see §RBAC) · Scope Contract Grants · agent-equal predicate evaluation
JWT signingRS256 with dev keys on disk · 15 min access · 30 d refresh · rotatingRS256 via AWS KMS (key never touches disk) · JWKS rotation endpoint · per-tenant signing-key scope at slice 5
Tenant isolationPostgres RLS on every identity table keyed by tenant_id · admin REST scoped to caller's tenantSame RLS · plus per-tenant residency pinning (vn-hanoi-1 / sg-1 / eu-fra-1) · per-tenant authz server endpoint (slice 4)
Audit-chain emissionEvery login, every RBAC decision, every session revocation appends a row to BRAIN via the canonical Writer (subprocess shim acceptable)Same emission · PyO3 binding for < 1ms p99 audit-bridge latency · streaming append at high decision rate
Admin surfacesREST: tenant CRUD · subject CRUD · role grant · session revokeSame REST · plus gRPC RBAC.Check internal API · MCP tool catalogue (token-issue, RBAC-check) · CLI cyberos-auth 25 subcommands
Cost~$60/month (RDS t4g.small + ElastiCache cache.t4g.micro + 1 Fargate task)~$220/month at 50 tenants (P3) · ~$1.2k/month at 500 tenants · KMS adds ~$10/month per active signing-key version
LoC~1,500 Rust~7,000 Rust (cumulative over slices 1–5)
Tests15 (tenant + subject CRUD + RLS isolation)120+ (OAuth 2.1 conformance + WebAuthn flow + RBAC predicate matrix + audit-chain integration)
Lumi's BRAIN integrationJWT carries tenant_id claim; Lumi's BRAIN deployment (Stage 3) comes online in parallelTenant residency claim · agent-persona claim · Scope Contract Grant claim · all consumed by Lumi's BRAIN read scoping
SOC 2 / ISO 27001 evidenceNot auditor-ready · stub mode is internal-onlyP1 SOC 2 Type I evidence window opens at P0 · exit · P2 ISO 27001:2022 + Type II at P2 · exit · ISO 42001 AIMS at P3 · exit

Migration discipline

The transition from P0 · slice 2 stub to P3 full is not a rewrite — it is a layered addition. Each slice adds capability without changing the P0 · slice 2 contract. Pre-existing tenants/subjects/JWT claims continue to validate at every slice boundary. The conformance test suite runs both old and new code paths through P3 · exit; once Type I evidence is collected, the stub-only paths are retired.

What the P0 · slice 2 stub does not compromise on: (a) agent-equal evaluation (the agent-persona role is in the 5-role stub catalogue), (b) audit-chain emission (every decision is chain-linked from day one), (c) RLS isolation (the database scope is set correctly from slice 1, even if WebAuthn isn't there). What it does defer: signing key in KMS, OIDC SSO, full 22-role catalogue, JWKS rotation, passkey enrolment, impossible-travel challenge, HIBP integration.

2.6

The 22-role RBAC catalogue (closed)

The closed RBAC catalogue is the same shape per the original AUTH spec, refactored slightly so the P0 · slice 2 stub catalogue is a strict subset. Adding a new role to the catalogue is an ADR — not a code change. The 5 stub roles are the first 5; the remaining 17 land across P0→P3.

#RoleScope summaryStub (P0 · slice 2)?Lands at
1root-adminCross-tenant superuser. Reserved for CyberSkill operators. Cannot be self-assigned.✅ yesslice 1
2tenant-adminFull admin within one tenant. Manage subjects, roles, billing, residency.✅ yesslice 1
3tenant-memberRegular member. Read all shareable+, write to own personal scopes, MFA-required for sensitive ops.✅ yesslice 1
4service-accountNon-human identity. Module-to-module mTLS clients. Token-exchange via slice-5 flow.✅ yesslice 1
5agent-personaPersona-versioned agent identity (CUO + 47 C-suite persona workflows). JWT stamped with persona-version.✅ yesslice 1
6founderFounder-CEO equivalent. WebAuthn passkey required. Cross-module privileged read.slice 3 (WebAuthn)
7cfoCFO seat. Read all financial · approve disbursements · ESOP grant signoff.slice 4 (RBAC catalogue)
8ctoCTO seat. Tech-debt signoff · security advisory ack · OBS digest target.slice 4
9cooCOO seat. Cross-module status digests · blocker triage · process changes.slice 4
10chroCHRO seat. HR records · onboarding · performance review · PII-read elevated.slice 4
11cmoCMO seat. Campaign briefs · content calendars · external comms approval.slice 4
12cpoCPO seat. Product-brief signoff · roadmap commits · feature-request-author canonical role.slice 4
13csoCSO (Strategy) seat. OKR cascade · scenario modelling · competitive intel read.slice 4
14csecoCSO (Security) seat. Security review · key rotation approval · vulnerability triage.slice 4
15cloCLO (Compliance/Legal) seat. Contract redline · DSAR triage · regulatory drift signoff.slice 4
16cdoCDO (Data) seat. Data quality · lineage · residency review · BRAIN owner-of-record at P1+.slice 4
17dpoData Protection Officer. DSAR fulfilment · breach-notification authority · purge approval.slice 4
18caioChief AI Officer (emerging P2+). AI Gateway budget owner · synthesis sub-skill review.slice 5
19client-portal-userExternal-facing tenant user (PORTAL module · P4). Read shareable scopes via PORTAL filter only.slice 5 + PORTAL
20auditorExternal auditor (SOC 2 / ISO). Read-only · time-bounded · scope-pinned to evidence window.slice 5
21regulatorExternal regulatory authority (MoPS, GDT, PDPC). Read-only · DSAR + breach-evidence scopes.slice 5
22billing-systemStripe / VietQR / Momo webhook identity. Write-restricted to billing event schema.slice 5 + TEN

Adding a new role requires an ADR (Architecture Decision Record) with: business rationale, scope-creep risk assessment, audit-trail implications, and explicit DPO + CSEC review. Removing a role requires a deprecation window of 90 days with shadow-monitoring of the role's actual usage. The 22-role boundary is a design assertion — ABAC complexity is deferred indefinitely.

2.7

AUTH ↔ Lumi's BRAIN — tenant identity for the universal protocol

Per BRAIN_AUTOSYNC_DESIGN.md §6, Lumi's BRAIN is the cloud-hosted org-tenant instance of the BRAIN protocol. It depends on AUTH for two non-negotiable properties: (1) a tenant-scoped JWT issued by AUTH that identifies which tenant a sync request belongs to, (2) a subject-scoped JWT with the agent-persona claim that identifies which user within that tenant is reading or writing. Without both, multi-tenant isolation is impossible.

The JWT claim shape

{
  "iss": "https://auth.cyberos.com",
  "sub": "user:stephen@cyberskill.world",
  "aud": ["brain.cyberos.com"],
  "iat": 1778765128,
  "exp": 1778766028,                          // 15 min
  "jti": "",
  // === Tenant identity ===
  "tenant_id": "org:cyberskill",
  "tenant_residency": "sg-1",                 // sg-1 | vn-hanoi-1 | eu-fra-1
  "tenant_plan": "pro",                       // free | pro | enterprise
  // === Subject scope ===
  "role": ["tenant-member", "cpo"],
  "rbac_grants": ["feature-request-author:execute", "kb:read", "brain:read:personal:*"],
  // === Persona stamp (when agent-issued) ===
  "agent_persona": null,                      // or "cuo-cpo@0.4.1" for agent JWTs
  "agent_persona_version": null,
  // === Scope Contract Grants (slice 4+) ===
  "scope_grants": [
    {"resource": "lumi:org:cyberskill:shareable", "rights": ["read"]},
    {"resource": "lumi:org:cyberskill:fr-decisions", "rights": ["read", "write"]}
  ]
}

How Lumi's BRAIN enforces the JWT

sequenceDiagram autonumber participant U as User (sync client) participant A as AUTH (auth.cyberos.com) participant L as Lumi's BRAIN (brain.cyberos.com) participant P as Postgres (RLS-scoped) participant W as Writer U->>A: POST /oauth/token (refresh) A->>A: verify refresh · rotate · sign new JWT A-->>U: {access_token, expires_in: 900} U->>L: POST /v1/sync/push with Bearer JWT L->>L: verify JWT signature against JWKS (cached) L->>L: extract tenant_id, role, residency alt residency mismatch L-->>U: 403 (wrong shard for this tenant) end L->>L: extract sync payload · validate sync_class L->>P: SET LOCAL app.current_tenant_id = '<tenant_id>' L->>W: write through canonical Writer (lumi-scoped) W->>P: INSERT with RLS · tenant_id auto-bound P-->>W: row inserted W-->>L: lumi_chain_hash L-->>U: 201 + lumi_chain_hash + sync_confirmation

What this contract requires from AUTH

  • tenant_id is non-removable. Every JWT — human, agent, service account — carries a tenant_id. Tenant-less JWTs are rejected at issuance.
  • JWKS endpoint is reachable from Lumi's BRAIN cloud region. ~1 RTT cost on cold cache; cached for the JWT's lifetime.
  • Refresh-token reuse detection is enforced at AUTH. A reused refresh token revokes the whole session (per AUTH RFC §3 slice 2). Lumi's BRAIN's pull queue must handle 401 by triggering re-auth, not by silently failing.
  • Agent-persona claims survive the agent-equal property. When the CUO router issues a JWT on a user's behalf, the agent_persona claim is set; Lumi's BRAIN's read scope is then determined by the intersection of user RBAC + persona Scope Contract Grant.
  • Tenant residency pinning flows through. A vn-hanoi-1-pinned tenant's JWT is rejected at the sg-1 Lumi's BRAIN shard. Cross-shard sync requires a special cross-shard-migrate claim (CDO + CLO approval).
2.8

RFC open questions — proposed defaults

The services/auth/RFC.md §6 lists 5 open decisions that block slice 1. The proposed defaults below were drafted from the research review §2 + the BRAIN_AUTOSYNC_DESIGN.md Stage 3 dependencies. They land as ADRs once Stephen signs off.

RFC §6 questionProposed defaultRationale
Q1 — Workspace membershipNew repo-root Cargo workspace with skill/ and services/auth/ as members. Lift the workspace root to cyberos/Cargo.toml.Skill module already has its own workspace at cyberos/modules/skill/Cargo.toml. Lifting to repo-root lets services/* share dependency resolution + dev-dep with skill (e.g. tokio, tracing, sqlx). Common-dependency drift gets caught at build time.
Q2 — Memory bridge timingSubprocess shim at slice 4 → PyO3 at slice 5. Slice 4 (RBAC + audit-chain bridge) accepts cyberos --store $BRAIN put … via stdin; slice 5 evaluates PyO3 binding for < 1ms p99.Subprocess overhead acceptable at audit-chain write rate (≤ 100 writes/sec in P0). Mergeable in 1 week vs PyO3 binding's ~2 week effort. Defer the optimisation to when its absence is felt.
Q3 — Tenant-0 bootstrapcyberos-auth bootstrap CLI subcommand runs as root, seeds tenant 0 + the first root-admin subject with a one-time enrolment URL.Avoids the chicken-and-egg "no admin to create the first admin" problem. Single-use URL with 1-hour expiry minimises bootstrap-credential risk window. CLI requires sudo + audit-emit.
Q4 — HIBP integration toggleEnabled by default; per-tenant opt-out via tenant.security_config.hibp_check = false.Outbound HTTPS to api.pwnedpasswords.com is one extra dependency but k-anonymity-protected (only first 5 chars of password hash leave the host). For Vietnamese-residency-paranoid tenants (Decree 53), the opt-out exists. Default-on protects ≥ 95% of users at near-zero engineering cost.
Q5 — OBS deferralSlice 1 emits structured tracing logs to stdout; slice 5 switches to OTLP once OBS lands.OBS at P0 · start (parallel to BRAIN) means OTLP is available by AUTH slice 5 at P3 · exit. Slice 1 stdout logs are sufficient for engineer-local development. Slice 5 single-config swap to OTLP doesn't change AUTH's design.

Once Stephen signs off, each default becomes a numbered ADR (services/auth/decisions/DEC-AUTH-001..005.md) — same as the existing DEC-058 pattern. Slice 1 unblocks immediately after Q1 + Q3 are signed (workspace + bootstrap are mechanical); Q2 + Q4 + Q5 can land later without code rewrites.

3

Architecture

AUTH is one Rust service with five surfaces (OIDC issuer, OAuth 2.1 token endpoint, WebAuthn endpoints, RBAC gRPC API, admin REST), three stores (Postgres for identity / role / session, Redis for hot-path session and RBAC cache, KMS for signing-key wrap), and a single audit sink (BRAIN). The diagram below shows the canonical request flow for a cross-module call.

graph TB subgraph CLIENTS ["Clients"] BROWSER["Browser SPA
(OIDC + PKCE)"] MOBILE["Mobile (P3)"] AGENT["CUO agent
persona-stamped JWT"] SVC["Module service
(mTLS · client_credentials)"] end subgraph EDGE ["Edge"] AR["Apollo Router
verifies JWT @JwtAuth"] MCP["MCP Gateway
verifies OAuth 2.1 PRM"] end subgraph AUTH ["AUTH service (Rust · axum)"] OIDC["OIDC issuer
/.well-known/openid-configuration"] OAUTH["OAuth 2.1 token endpoint
PKCE · audience-bound"] WA["WebAuthn / TOTP
passkey + 2FA"] RBAC["RBAC engine
predicate evaluator"] SCOPE["Scope Contract
Grant resolver"] ADMIN["Admin REST
role / tenant CRUD"] end subgraph STORES ["Stores"] PG[("PostgreSQL
identity · roles · sessions
RLS by tenant_id")] RED[("Redis 7
session + RBAC cache
TTL ≤ 15 min")] KMS[("AWS KMS
RS256 signing key
wrapped on disk")] end subgraph SINKS ["Audit & telemetry"] BRAIN["🧠 BRAIN
auth.decision rows"] OBS["👁 OBS
traces + metrics"] end BROWSER --> OIDC MOBILE --> OIDC OIDC --> OAUTH AGENT --> OAUTH SVC --> OAUTH OAUTH --> WA WA --> PG OAUTH --> PG OAUTH --> KMS AR --> RBAC MCP --> RBAC RBAC --> SCOPE SCOPE --> PG RBAC --> RED ADMIN --> PG OAUTH --> BRAIN RBAC --> BRAIN AUTH --> OBS classDef planned fill:#cba88a,stroke:#4338ca classDef store fill:#f5f3ff,stroke:#7c3aed classDef sink fill:#f5ede6,stroke:#45210e class OIDC,OAUTH,WA,RBAC,SCOPE,ADMIN,AR,MCP planned class PG,RED,KMS store class BRAIN,OBS sink

Internal components

ComponentPath (planned)Responsibility
oidc.rsservices/auth/src/oidc.rsOIDC issuer — discovery doc, JWKS endpoint, ID-token issuance. Pure OpenID Connect Core 1.0 compliance.
oauth.rsservices/auth/src/oauth.rsOAuth 2.1 token endpoint — PKCE-required, audience-bound, refresh-token rotation, reuse detection.
webauthn.rsservices/auth/src/webauthn.rsWebAuthn L3 — credential creation, assertion verification, attestation parsing. Wraps webauthn-rs.
totp.rsservices/auth/src/totp.rsRFC 6238 TOTP — enrollment QR generation, code verification, replay window enforcement.
rbac.rsservices/auth/src/rbac.rsRBAC predicate evaluator. Input: (subject, action, resource). Output: allow / deny + audit record. Hot-path cache in Redis.
scope.rsservices/auth/src/scope.rsScope Contract Grant resolver. Reads agent-persona scope sheets from BRAIN; emits effective scope set per JWT.
session.rsservices/auth/src/session.rsSession manager — issue, validate, revoke. Device tracking + impossible-travel detection.
jwt.rsservices/auth/src/jwt.rsJWT issuance / verification. RS256 via KMS; key rotation; JWKS publication.
impossible_travel.rsservices/auth/src/impossible_travel.rsGeographic-velocity check on consecutive logins ((FR pending)); challenge if velocity > 1,000 km/h.
password.rsservices/auth/src/password.rsArgon2id hashing, password-policy enforcement, breach-list check via HIBP k-anonymity API.
device.rsservices/auth/src/device.rsDevice fingerprinting ((FR pending)), new-device email, force-logout endpoint.
magic_link.rsservices/auth/src/magic_link.rsMagic-link onboarding flow ((FR pending)) — single-use, time-bound, onboarding-only.
audit_bridge.rsservices/auth/src/audit_bridge.rsWrites every decision to BRAIN via the canonical writer. Includes actor, action, resource, decision, reason.
admin.rsservices/auth/src/admin.rsAdmin REST — create tenant, assign role, revoke session, view audit. CSO + CEO scope only.
migrations/services/auth/migrations/sqlx migrations. Every table has RLS by tenant_id. Indices for hot-path queries.
4

Data model

Identity, role, and session live in PostgreSQL with row-level security keyed by tenant_id. The schema is normalised but optimised for hot-path RBAC: the subject_role and role_permission tables are de-normalised into a materialised subject_effective_permission view that Redis caches with a 15-minute TTL.

erDiagram TENANT ||--o{ SUBJECT: "owns" TENANT ||--o{ ROLE: "defines" SUBJECT ||--o{ SESSION: "creates" SUBJECT ||--o{ API_KEY: "owns" SUBJECT ||--o{ WEBAUTHN_CREDENTIAL: "registers" SUBJECT ||--o| TOTP_SECRET: "has" SUBJECT ||--o{ SUBJECT_ROLE: "holds" ROLE ||--o{ SUBJECT_ROLE: "granted" ROLE ||--o{ ROLE_PERMISSION: "carries" PERMISSION ||--o{ ROLE_PERMISSION: "grants" SUBJECT ||--o{ SCOPE_CONTRACT_GRANT: "has agent scope" SUBJECT ||--o{ AUTH_DECISION: "subject of" SUBJECT ||--o{ DEVICE: "trusts" SESSION }o--|| DEVICE: "issued from" SERVICE_ACCOUNT ||--|| SUBJECT: "is a" TENANT { uuid id PK string slug string display_name string country "VN or SG or other" string data_residency "vn-hanoi or sg-1" timestamp created_at } SUBJECT { uuid id PK uuid tenant_id FK string kind "human or agent or service or api_key" string email string display_name string status "active or locked or disabled" string password_hash "argon2id - null for agents" timestamp created_at timestamp last_login_at } ROLE { uuid id PK uuid tenant_id FK string code "CEO or CFO or CHRO or CTO or other" string display_name string scope_level "platform or tenant or project" } PERMISSION { string code PK "rew.write_run or brain.put or other" string action "read or write or delete or invoke" string resource_class } ROLE_PERMISSION { uuid role_id FK string permission_code FK } SUBJECT_ROLE { uuid subject_id FK uuid role_id FK timestamp granted_at timestamp expires_at uuid granted_by FK } SESSION { uuid id PK uuid subject_id FK string jti "JWT ID" string refresh_jti timestamp issued_at timestamp expires_at string ip_address string user_agent uuid device_id FK string status "active or revoked" } API_KEY { uuid id PK uuid subject_id FK string prefix "ck_live_…" string secret_hash "argon2id" string scopes "comma-separated" timestamp last_used_at timestamp expires_at } WEBAUTHN_CREDENTIAL { bytea credential_id PK uuid subject_id FK bytea public_key bigint sign_count string aaguid string transports timestamp created_at } TOTP_SECRET { uuid subject_id PK bytea secret_encrypted "KMS-wrapped" timestamp enrolled_at } DEVICE { uuid id PK uuid subject_id FK string fingerprint_sha256 string label timestamp first_seen_at timestamp last_seen_at bool trusted } SCOPE_CONTRACT_GRANT { uuid id PK uuid subject_id FK "agent persona" string persona_version "cuo-v2.3.1" string scope_set "comma-separated permission codes" timestamp valid_from timestamp valid_to } SERVICE_ACCOUNT { uuid subject_id PK string client_id bytea client_secret_hash string allowed_audiences } AUTH_DECISION { uuid id PK uuid tenant_id FK uuid subject_id FK string action string resource string decision "allow or deny or challenge" string reason_code timestamp ts string brain_chain "linked audit row chain hash" }

RBAC role catalogue ( — closed)

Role codeDisplayScope levelExamples of permissions
founderFounder / CEOplatformAll; sign authority.
cfoChief Financial Officertenantfinance.*, rew.read_run (not write).
chroChief Human Resources Officertenanthr.*, rew.write_run (with CFO co-sign).
ctoChief Technology Officertenantengineering.*, auth.admin.
csoChief Security Officertenantauth.*, obs.audit_read, key-rotation.
cloChief Legal Officertenantlegal.*, brain.delete_purge_approve.
dpoData Protection Officertenantbrain.dsar_export, auth.audit_read.
cdoChief Data Officertenantbrain.admin, obs.read.
cpoChief Product Officertenantproduct.*, kb.write.
cmoChief Marketing Officertenantmarketing.*, crm.read.
ccoChief Customer Officertenantsupport.*, crm.write.
memberOperating MembertenantScope-narrowed per module; default chat.* + brain.read_own.
contributorExternal ContributorprojectProject-scoped read + comment only.
clientClient UserprojectPer-project shared workspace; no cross-tenant access.
serviceService Accounttenantclient_credentials only; explicit audience.
agent.cuoCUO AgenttenantRouting only; never destructive without human gate.
agent.skillSkill AgenttenantPer-skill scope; capability-broker gated.
agent.scheduledScheduled TasktenantSingle-purpose; cron-defined scope.
+ 4 more(auditor, intern, vendor, partner)tenant / projectSee.
5

API surface

Four surfaces: an OIDC + OAuth 2.1 HTTPS edge for browsers and agents, a gRPC API for internal RBAC predicate evaluation, an MCP tool catalogue (the security-sensitive subset), and a small admin CLI surface.

GraphQL subgraph (federated)

AUTH publishes a thin federated subgraph for cross-module identity queries. Sensitive writes (role grant, session revoke) remain on the admin REST API.

extend schema
 @link(url: "https://specs.apollo.dev/federation/v2.5", import: ["@key", "@external", "@shareable", "@requiresScopes"])

type Subject @key(fields: "id") {
 id: ID!
 tenantId: ID!
 kind: SubjectKind!
 email: String
 displayName: String!
 status: SubjectStatus!
 roles: [Role!]! @requiresScopes(scopes: [["auth.read"]])
 effectivePermissions: [String!]! @requiresScopes(scopes: [["auth.read"]])
 lastLoginAt: DateTime
 createdAt: DateTime!
}

type Role @key(fields: "code tenantId") {
 code: String!
 tenantId: ID!
 displayName: String!
 scopeLevel: ScopeLevel!
}

type Session @key(fields: "id") @requiresScopes(scopes: [["auth.session_read"]]) {
 id: ID!
 subjectId: ID!
 issuedAt: DateTime!
 expiresAt: DateTime!
 status: SessionStatus!
 device: Device
}

type Device {
 id: ID!
 label: String!
 trusted: Boolean!
 lastSeenAt: DateTime!
}

type AuthDecision @key(fields: "id") @requiresScopes(scopes: [["auth.audit_read"]]) {
 id: ID!
 subjectId: ID!
 action: String!
 resource: String!
 decision: Decision!
 reasonCode: String!
 ts: DateTime!
 brainChain: String!
}

enum SubjectKind { HUMAN AGENT SERVICE API_KEY }
enum SubjectStatus { ACTIVE LOCKED DISABLED }
enum ScopeLevel { PLATFORM TENANT PROJECT }
enum SessionStatus { ACTIVE REVOKED EXPIRED }
enum Decision { ALLOW DENY CHALLENGE }

type Query {
 me: Subject!
 subject(id: ID!): Subject
 decisions(subjectId: ID, since: DateTime, limit: Int = 50): [AuthDecision!]!
 @requiresScopes(scopes: [["auth.audit_read"]])
}

type Mutation {
 revokeSession(id: ID!): Boolean!
 @requiresScopes(scopes: [["auth.session_revoke"]])
 grantRole(subjectId: ID!, roleCode: String!, expiresAt: DateTime): Boolean!
 @requiresScopes(scopes: [["auth.role_grant"]])
 rotateApiKey(id: ID!): ApiKeyRotation!
 @requiresScopes(scopes: [["auth.api_key_rotate"]])
}

REST + OAuth surface (planned)

MethodPathPurpose
GET/.well-known/openid-configurationOIDC discovery document.
GET/.well-known/oauth-authorization-serverOAuth 2.1 metadata (RFC 8414).
GET/.well-known/jwks.jsonJWKS for JWT verification.
GET/oauth/authorizeAuthorisation endpoint (PKCE-required).
POST/oauth/tokenToken endpoint — authorization_code · refresh_token · client_credentials.
POST/oauth/revokeToken revocation (RFC 7009).
POST/oauth/introspectToken introspection (RFC 7662) — internal mTLS only.
POST/webauthn/register/beginStart passkey registration.
POST/webauthn/register/finishFinish passkey registration.
POST/webauthn/authenticate/beginStart passkey assertion.
POST/webauthn/authenticate/finishFinish passkey assertion.
POST/totp/enrollBegin TOTP enrolment (QR generation).
POST/totp/verifyVerify TOTP code.
POST/magic-link/requestRequest onboarding magic link.
GET/magic-link/consume?token=…Consume magic link.
GET/admin/sessionsList sessions (admin scope).
POST/admin/sessions/{id}/revokeRevoke session.
POST/admin/roles/grantGrant role.
POST/admin/sso/saml/configConfigure SAML IdP (P1).

gRPC RBAC API (internal)

syntax = "proto3";
package cyberos.auth.v1;

service RBAC {
 / Hot-path predicate. Sub-millisecond Redis cache.
 rpc Check(CheckRequest) returns (CheckResponse);/ Batch variant for module gateways.
 rpc CheckBatch(CheckBatchRequest) returns (CheckBatchResponse);/ Resolve effective permissions for a subject.
 rpc EffectivePermissions(SubjectRef) returns (PermissionSet);
}

message CheckRequest {
 string subject_jwt = 1;/ verified upstream
 string action = 2;/ "rew.write_run"
 string resource = 3;/ "rew/run/2026-05"
 map<string, string> attrs = 4;/ optional ABAC hints (P3+)
}

message CheckResponse {
 bool allow = 1;
 string reason_code = 2;/ "policy.role.cfo_required" | "ok"
 string brain_chain = 3;/ audit row chain hash
}

MCP tool catalogue

Tool nameInputsOutputsAnnotations
cyberos.auth.whoamiSubjectreadonly · scope=auth.read
cyberos.auth.check_permissionaction, resource{allow, reason}readonly
cyberos.auth.list_sessionssubject_idSessionreadonly · scope=auth.session_read
cyberos.auth.revoke_sessionsession_id{ok}destructive · scope=auth.session_revoke · human-confirm
cyberos.auth.grant_rolesubject_id, role_code, expires_at?{ok, audit_row}destructive · scope=auth.role_grant · human-confirm
cyberos.auth.list_decisionssubject_id?, since?, limitAuthDecisionreadonly · scope=auth.audit_read
cyberos.auth.rotate_api_keyapi_key_id{new_key, prefix}destructive · scope=auth.api_key_rotate
6

Key flows

Flow 1 — Human login with WebAuthn passkey

sequenceDiagram autonumber participant U as User (browser) participant SPA as CyberOS SPA participant A as AUTH /oauth/authorize participant W as AUTH /webauthn/* participant PG as PostgreSQL participant K as AWS KMS participant B as BRAIN audit U->>SPA: open app SPA->>A: GET /oauth/authorize?client=cyberos-spa&PKCE_S256=… A-->>SPA: redirect to login form SPA->>W: POST /webauthn/authenticate/begin {email} W->>PG: lookup credential by email PG-->>W: credential metadata W-->>SPA: assertion challenge SPA->>U: navigator.credentials.get(challenge) U->>SPA: signed assertion SPA->>W: POST /webauthn/authenticate/finish {assertion} W->>W: verify signature against stored public key W->>PG: update sign_count, last_seen_at W-->>A: subject_id authenticated A->>K: sign JWT (RS256, 15-min lifetime) K-->>A: signed access + refresh tokens A->>B: append auth.decision {subject, action:"login", decision:"allow"} A-->>SPA: 302 redirect with auth code SPA->>A: POST /oauth/token (code + PKCE verifier) A-->>SPA: {access_token, refresh_token} Note over SPA: subsequent calls carry Bearer JWT

WebAuthn replaces the password leg entirely for elevated roles (CEO / CFO / CHRO / CSO / CLO). TOTP remains as fallback for member-tier accounts ((FR pending)).

Flow 2 — TOTP MFA login (member tier)

sequenceDiagram autonumber participant U as User participant SPA as SPA participant A as AUTH /oauth/authorize participant P as AUTH /password participant T as AUTH /totp/verify participant K as KMS participant B as BRAIN U->>SPA: open app SPA->>A: GET /oauth/authorize A-->>SPA: login form U->>SPA: email + password SPA->>P: verify password (argon2id) P-->>SPA: 200 — TOTP required SPA-->>U: show TOTP prompt U->>SPA: 6-digit code SPA->>T: POST /totp/verify {code} T->>T: window-check (±1 step, replay-list check) alt valid + not replayed T-->>A: subject authenticated A->>K: sign JWT K-->>A: tokens A->>B: auth.decision {decision:"allow"} A-->>SPA: tokens else replayed code T->>B: auth.decision {decision:"deny", reason:"totp.replay"} T-->>SPA: 401 end

Flow 3 — Agent authentication (CUO persona-stamped token)

sequenceDiagram autonumber participant CUO as CUO router participant A as AUTH /oauth/token participant SC as scope.rs (Scope Contract resolver) participant BR as 🧠 BRAIN read participant K as KMS participant B as BRAIN write Note over CUO: CUO needs to invoke a skill on behalf of stephen@… CUO->>A: POST /oauth/token
grant_type=urn:cyberos:agent-impersonation
actor_jwt=cuo-service-jwt
on_behalf_of=stephen@…
persona_version=cuo-v2.3.1 A->>SC: resolve scope for persona cuo-v2.3.1 SC->>BR: read meta/persona/cuo/v2.3.1/scope.md BR-->>SC: {scope_set, valid_to} SC-->>A: effective scope ⊂ user's scope ∩ persona scope A->>K: sign JWT with claims {sub:stephen, act:cuo-v2.3.1, scope:…} K-->>A: signed token A->>B: auth.decision {subject:cuo, action:"impersonate", on_behalf_of:stephen, allow} A-->>CUO: {access_token, ttl:15min} Note over CUO: every downstream call carries this persona-stamped JWT;
BRAIN audit rows capture both `actor` and `on_behalf_of`.

The agent token's effective scope is the intersection of (a) what the user could do, (b) what the persona is granted, and (c) what AUTH's policy allows. Three-way narrowing means an agent can never exceed its user, nor its persona contract.

Flow 4 — RBAC predicate on cross-module call

sequenceDiagram autonumber participant M as Module (e.g. REW) participant AR as Apollo Router participant R as AUTH RBAC.Check (gRPC) participant RC as Redis cache participant PG as PostgreSQL participant B as BRAIN M->>AR: GraphQL mutation rewriteRun(…) AR->>AR: verify JWT signature + audience AR->>R: Check{subject_jwt, action:"rew.write_run", resource:"rew/run/2026-05"} R->>RC: GET subject_effective_permissions: alt cache hit RC-->>R: permission set else cache miss R->>PG: SELECT * FROM subject_effective_permission WHERE subject_id=… PG-->>R: rows R->>RC: SET subject_effective_permissions: TTL=15min end alt allowed R-->>AR: {allow:true, reason:"ok"} AR->>M: forward request R->>B: auth.decision (async) else denied R-->>AR: {allow:false, reason:"policy.role.cfo_required"} AR-->>M: 403 Forbidden R->>B: auth.decision {decision:"deny"} end

Hot-path latency budget: ≤ 8 ms p95 with cache hit, ≤ 25 ms p95 with miss. Audit write is fire-and-forget over a bounded channel.

Flow 5 — Service-account token-exchange (module-to-module)

sequenceDiagram autonumber participant SVC as CHAT service participant A as AUTH /oauth/token participant K as KMS participant B as BRAIN Note over SVC: CHAT needs to call BRAIN's put endpoint SVC->>A: POST /oauth/token
grant_type=client_credentials
client_id=chat-service
client_secret=…
scope=brain.put
audience=brain.cyberos.internal A->>A: verify client secret (argon2id) A->>A: check audience ∈ allowed_audiences A->>K: sign JWT {sub:chat-service, aud:brain, scope:brain.put, exp:15m} K-->>A: token A->>B: auth.decision {subject:chat-service, action:"token_exchange"} A-->>SVC: {access_token, ttl:900}

Service-to-service calls always use client_credentials with an audience claim — receiving services reject tokens not aimed at them, eliminating confused-deputy attacks.

7

Session lifecycle

A session traverses six states from issuance to expiry, with three terminal states (revoked, expired, replaced). Every transition writes an auth.decision row to BRAIN.

stateDiagram-v2 [*] --> Authenticating: user submits credentials Authenticating --> MFA_Required: password verified, 2FA outstanding Authenticating --> Locked: too many failures MFA_Required --> Active: TOTP / WebAuthn assertion verified MFA_Required --> Locked: too many 2FA failures Active --> Refreshing: access_token within 1 min of expiry Refreshing --> Active: rotated refresh + new access issued Refreshing --> Revoked: reuse detected (refresh replay) Active --> Revoked: admin revoke OR impossible-travel challenge Active --> Expired: refresh_token TTL reached (30 d) Active --> Replaced: user logs in from a new device, old session optionally evicted Revoked --> [*] Expired --> [*] Replaced --> [*] Locked --> Active: lockout cleared (15 min auto OR admin unlock)

Token lifetime budget

Token typeLifetimeRotationNotes
Access token (JWT RS256)15 minutesnone (issue new)(FR pending) — short-lived
Refresh token30 daysevery use(FR pending) — replay invalidates session
OAuth authorization code60 secondssingle-usePKCE-bound
Magic link15 minutessingle-use(FR pending) — onboarding only
Agent persona token15 minuteseach tool call may refreshscope ⊂ user × persona × policy
API keyconfigurable (default 90 d)manualargon2id-hashed; ck_live_… prefix
WebAuthn credentialindefiniteuser-initiated revokesign_count tracked to detect cloning
8

Functional Requirements

Wave 2 shipped five FRs (FR-AUTH-001 through FR-AUTH-005) on 2026-05-18. Eleven more land sequentially before P3 · exit. All AUTH FRs land via the feature-request-author + feature-request-audit Agent Skill pair.

FRTitleStatus
FR-AUTH-001Tenant + signing-key migrations (0001_tenants.sql, 0006_signing_keys.sql)shipped · 2026-05-18
FR-AUTH-002Subject + RLS migrations (0003_subjects.sql, 0004_rls_roles.sql, 0005_rls_enable_on_tables.sql) — USING + WITH CHECK on every tenant-scoped tableshipped · 2026-05-18
FR-AUTH-003Admin REST + idempotency layer (POST /v1/admin/tenants, POST /v1/admin/subjects, idempotency-key replay protection via 0002_admin_idempotency.sql)shipped · 2026-05-18
FR-AUTH-004JWT (RS256) + JWKS — POST /v1/auth/token accepts grant_type=password and grant_type=refresh_token; GET /.well-known/jwks.json publishes public key; auto-bootstrap RSA-2048 on AppState boot (keygen.rs) — and the JWT-verification middleware (middleware.rs · verify_jwt + require_scope) gating every admin endpointshipped · 2026-05-18
FR-AUTH-005Admin REST list + revoke + unrevoke — GET /v1/admin/tenants, GET /v1/admin/subjects, POST /v1/admin/subjects/{id}/revoke, POST /v1/admin/subjects/{id}/unrevokeshipped · 2026-05-18
FR-AUTH-10122-role RBAC catalogue (closed enum + permission matrix)planned · next

Additional FRs (WebAuthn enrolment, OIDC SSO, KMS rotation, impossible-travel, HIBP integration, Scope Contract Grants) land sequentially per services/auth/RFC.md slice plan.

9

Non-Functional Requirements

security NFRs all flow through AUTH. Cross-referenced at nfr-catalog.html#auth.

NFR IDConcernTargetMeasurement
N(FR pending)Tenant data leakage incidents= 0 (sev-0)quarterly pen-test + automated cross-tenant test gate
N(FR pending)P1 base-salary system-reduction= 0 — legal commitmentpolicy gate on REW + audit replay
N(FR pending)Agent auto-act on irreversible op without confirm= 0 — runtime checkMCP gateway annotation check; CI test
N(FR pending)Prompt-injection exfiltration via email/document= 0 — CaMeL enforcedCaMeL test suite + red-team
N(FR pending)TLS for all inter-service trafficmTLS in cluster; HTTPS externalconfig inspection + Istio audit
N(FR pending)Penetration-test cadenceannual (P2+); after major releasescontract evidence
N(FR pending)Vulnerability remediation SLOCritical ≤ 24 h; High ≤ 7 dJIRA SLA tracking
N(FR pending)Sub-processor list public on Trust Centeralwaysweb-page diff alert
N(FR pending)OWASP Gen AI Top-10 mitigationsall addressedannual attestation
N(FR pending)RBAC predicate p95 (cache hit)≤ 8 msbench/rbac.rs · nightly
N(FR pending)RBAC predicate p95 (cache miss)≤ 25 msbench/rbac.rs · nightly
N(FR pending)OAuth token endpoint p95≤ 250 msk6 load test
N(FR pending)WebAuthn assertion verify p95≤ 50 msk6 load test
N(FR pending)AUTH availability (28-day)≥ 99.95%SLO monitor (N(FR pending))
N(FR pending)Auth-decision durability0 dropped rows under crashchaos test + BRAIN ledger walk
N(FR pending)Refresh-token reuse detection rate100% (zero false negatives)property-based test
N(FR pending)Session revoke propagation≤ 5 s end-to-end(FR pending) test
10

Dependencies

AUTH depends on three primitives (BRAIN for audit, OBS for telemetry, KMS for the signing key) and is depended on by every module that does anything cross-tenant or cross-module — which is to say, all of them.

graph LR subgraph upstream ["AUTH depends on"] BRAIN["🧠 BRAIN
auth.decision rows"] OBS["👁 OBS
traces + alerts"] KMS["🔑 AWS KMS
RS256 signing key"] PG["🗄 PostgreSQL
identity store"] REDIS["⚡ Redis 7
session + RBAC cache"] end AUTH["🔐 AUTH"] subgraph downstream ["Used by all 22 modules"] AR["Apollo Router
JWT verify @ edge"] MCP["🔌 MCP Gateway
tool-call gating"] AI["🧠 AI Gateway
per-tenant routing"] CHAT["💬 CHAT"] REW["💎 REW"] HR["👥 HR"] CRM["🏢 CRM"] OTHERS["…16 more"] end BRAIN --> AUTH OBS --> AUTH KMS --> AUTH PG --> AUTH REDIS --> AUTH AUTH --> AR AUTH --> MCP AUTH --> AI AUTH --> CHAT AUTH --> REW AUTH --> HR AUTH --> CRM AUTH --> OTHERS classDef shipped fill:#f5ede6,stroke:#45210e classDef planned fill:#fef6e0,stroke:#9c750a class BRAIN shipped class AUTH,AR,MCP,AI,CHAT,REW,HR,CRM,OTHERS,OBS planned class KMS,PG,REDIS shipped
11

Compliance scope

AUTH is the regulator's first stop. The audit chain it produces — co-owned with BRAIN — answers every identity, access, and breach-notification question across PDPL, GDPR, ISO 27001, and SOC 2.

Regulation / standardArticle / clauseAUTH feature that satisfies it
Vietnam PDPL (Law 91/2025)Art. 4 — Lawful processing basisEach subject row records consent basis; each auth.decision carries reason_code.
Vietnam PDPLArt. 14 — DSARcyberos-auth dsar-export bundles identity + decisions for a subject.
Vietnam Decree 13/2023Art. 17 — Personal data processing logauth.decision rows are the processing log for identity events.
Vietnam Decree 53/2022Art. 26 — Data localisation; breach notification ≤ 24hPer-tenant data-residency tag; auto-page MoPS template generator (planned P1).
GDPR (EU 2016/679)Art. 32 — Security of processingWebAuthn L3 · TLS 1.3 · KMS-wrapped signing · row-level security · Argon2id.
GDPRArt. 33 — Breach notificationauth.decision chain provides forensic timeline; alert templates auto-fill.
EU AI Act (Reg. 2024/1689)Art. 14 — Human oversightDestructive tool calls require human-confirm; agent JWTs cannot escalate.
ISO/IEC 27001:2022A.5.16 — Identity managementClosed role catalogue + scope contract grants.
ISO/IEC 27001:2022A.5.17 — Authentication informationArgon2id hashes; KMS-wrapped TOTP secrets.
ISO/IEC 27001:2022A.8.5 — Secure authenticationOAuth 2.1 + PKCE + MFA mandatory.
SOC 2 Type IICC6.1 — Logical accessRBAC predicate at every API boundary + audit chain.
SOC 2 Type IICC6.6 — Restricted accessRLS + scope contract grants narrow agents to a subset of users.
OWASP Gen AI Top-10 (2025)LLM02: Insecure output handlingMCP destructive-confirm gating routed through AUTH RBAC.
OWASP Gen AI Top-10 (2025)LLM08: Excessive agencyThree-way scope narrowing (user × persona × policy).
NIST SP 800-63BAAL2 / AAL3TOTP for AAL2; WebAuthn (phishing-resistant) for AAL3.
12

Risk entries

AUTH-specific risks tracked in the risk register. AUTH carries the highest single-module risk weight; one bug here is platform-wide.

IDRiskLikelihoodImpactOwnerMitigation
R-AUTH-001JWT signing-key compromiseLowCatastrophicCSOKMS-wrapped at rest; rotation every 90 d; old JWKS retained 30 d for verification.
R-AUTH-002Refresh-token replay leading to permanent session hijackLowHighCSORotation-on-use with reuse detection. Reuse → all sessions for that subject revoked.
R-AUTH-003Cross-tenant RLS bypass via subqueryLowCatastrophicCTORLS enabled on every table at migration time; CI test verifies cross-tenant read fails.
R-AUTH-004Agent JWT scope-escalation via prompt injectionMediumHighCSOScope is policy-resolved at AUTH not prompt-derived; injection cannot widen.
R-AUTH-005WebAuthn relying-party-ID mismatch breaks logins after domain changeLowMediumCTORP-ID stored per credential; migration tooling re-enrols passkeys with new RP-ID.
R-AUTH-006HIBP outage during signup blocks new usersMediumLowCTOHIBP check is best-effort; on timeout, log + allow with policy reviewer note.
R-AUTH-007TOTP secret leakage via DB backupLowHighCSOColumn-level KMS encryption; backups encrypted with separate keys.
R-AUTH-008OAuth implementation CVE (e.g., authorization-code injection)MediumHighCSOOAuth 2.1-only (mandatory PKCE); annual pen-test; subscribe to RFC drafts watchlist.
R-AUTH-009Compensation co-sign bypass via direct DB writeLowCatastrophicCSO(FR pending) enforced at REW boundary; direct DB write requires CSO+CFO co-sign; audit alarm.
R-AUTH-010Audit-chain write outage blinds the SOCLowHighCTOBounded local buffer + retry; alert on backlog > 60 s; AUTH refuses new logins after 5 min buffer.
Reordering + stub + Lumi integration risks (per research review)
R-AUTH-011P0 · slice 2 stub stays in production past P3 — the team relies on magic-link for too long, never adopts WebAuthn passkeys; SOC 2 evidence window opens with the stub still primaryHighMediumCTOSOC 2 readiness gate at P0 · exit explicitly requires WebAuthn enrolment for T2+ subjects. P1 exit blocked if > 50% of T2+ subjects haven't enrolled a passkey. Stub-only paths retire at P1 · exit per migration discipline.
R-AUTH-012AI-Gateway-before-AUTH reorder regret — early modules embed mock auth that becomes hard to replace at P0 · slice 2MediumMediumCTOMock-auth at P0 · slice 1 (during AI Gateway slice 1) MUST use a single shared "mock-AUTH" library that lives in services/auth/mock/ with the SAME API as the real AUTH. At P0 · slice 2 slice 1, the mock library imports the real AUTH via feature flag; mock-only paths retire at P0 · exit.
R-AUTH-013Lumi's BRAIN tenant-id claim spoofing — attacker steals an AUTH-issued JWT, modifies tenant_id, presents to Lumi's BRAINLowCatastrophicCSOJWT is RS256-signed; modifying any claim invalidates signature. JWKS endpoint over HTTPS-only with certificate pinning. Lumi's BRAIN re-verifies signature on every request (cached JWKS for JWT-lifetime ≤ 15 min). Audit-chain emission on every Lumi push includes the verifying key's kid.
R-AUTH-014Cross-shard JWT replay — a tenant pinned to sg-1 has its JWT presented to the eu-fra-1 Lumi's BRAIN shardMediumHighCTOJWT carries tenant_residency claim. Each Lumi's BRAIN shard rejects JWTs whose tenant_residency doesn't match its own shard ID. Cross-shard migration requires explicit cross-shard-migrate claim (CDO + CLO co-signed, 1-hour TTL, one-time-use).
R-AUTH-015Sub-process audit-bridge slow path — slice-4 subprocess shim becomes a bottleneck at > 100 auth decisions/sec, audit-chain backlog growsMediumMediumCTOSlice 5 PyO3 binding target < 1ms p99. If slice 4 subprocess hits > 60 s backlog at sustained load, AUTH refuses new logins (per R-AUTH-010). Bench at every slice ship; alert if p95 of subprocess invocation > 50 ms.
R-AUTH-016Tenant-0 bootstrap leak — the one-time enrolment URL for tenant 0's first root-admin is shared, intercepted, or leaks via logsLowCatastrophicCSOOne-time URL has 1-hour expiry and single-use semantics. Bootstrap CLI requires sudo + audit-emit. URL is printed once to stdout (never logged to file). All bootstrap actions audit-chained with the originating shell session ID.
R-AUTH-017PDPL Art. 38 SME grace lapse — tenant ages out of SME 5-year grace (post-2031-01-01) and AUTH hasn't gated the DPO-required workflows by thenHighMediumCLOAUTH tenant config has pdpl_sme_grace_expires_at. At 90 days before expiry, AUTH emits a "DPO required" advisory to the tenant-admin role. At expiry, AUTH refuses to issue tenant-admin login until a DPO subject is enrolled.
13

KPIs

AUTH health rolls up into 9 KPIs covering authentication success, authorisation correctness, performance, and compliance posture.

KPIFormulaSourceTarget
Successful login ratelogins.success / logins.totalauth.decision≥ 99.5% / day
MFA challenge successmfa.success / mfa.totalauth.decision≥ 98%
RBAC predicate p95 (cache hit)Prometheus histogramOBS≤ 8 ms
Token endpoint p95histogramOBS≤ 250 ms
Refresh-reuse detectionscount / 28 dauth.decisiontracked; expect < 5/month
Impossible-travel challengescount / 28 dauth.decisiontracked; alert on > 50/day
Auth-decision durabilityrows_in_brain / rows_emittedchaos test100%
Session revoke p95 (end-to-end)histogramOBS≤ 5 s ((FR pending))
Penetration-test critical findingscountannual report= 0 critical · ≤ 2 high
P0 · slice 2-stub-vs-P3-full + Lumi-integration KPIs
Stub-to-full migration coveragesubjects_with_webauthn / total_T2+_subjectsauth subjects table≥ 50% by P1 · start · ≥ 95% by P1 · exit (P1 exit gate)
Mock-AUTH retirementdays_since_module_X_consumed_mock_authmodule dependency graph + CI0 modules consuming mock-AUTH by P0 · exit (all migrated to real AUTH)
Lumi tenant-id verification success rateverified_jwts / total_lumi_sync_requestsLumi's BRAIN access logs≥ 99.99% — non-verification means JWKS outage
Cross-shard rejection raterejected_cross_shard / totalLumi's BRAIN edge proxy0% under steady state; spike means tenant residency misconfigured
Audit-bridge p99 latencysubprocess shim or PyO3OBSslice 4: ≤ 50 ms · slice 5: ≤ 1 ms (PyO3 target)
SME-grace lapsed tenantscountauth tenant config0 lapsed without DPO enrolled (R-AUTH-017 mitigation)
22-role catalogue stabilityADR-validated role changes / total schema changesgit log + ADRs100% — every role addition is an ADR
14

RACI matrix

AUTH is owned by the CSO seat. Today (CSO vacant), the CEO is interim accountable with the CTO as engineering owner.

ActivityCEOCTOCSOCDOCLODPO
Service design + specACRICI
ImplementationIARIII
On-call rotationIRAIII
Penetration testingCCA/RIII
Role catalogue changesAIRICI
Key rotation (JWT signing)ICA/RIII
Incident response (auth breach)ARRCCR
Decree 53 MoPS notificationCIRIAR
DSAR fulfilment (auth scope)IICRCA

R Responsible · A Accountable · C Consulted · I Informed.

15

Planned CLI surface

A single admin CLI cyberos-auth for tenant operators. Every destructive command writes a chained audit row before exit.

1. Create a tenant

$ cyberos-auth tenant create \
 --slug stephen-personal \
 --display "Stephen Personal" \
 --country VN \
 --residency vn-hanoi-1

[tenant created]
 id: 01HZJ8R4M2K7QXP3F9D8YN7B2T
 slug: stephen-personal
 residency: vn-hanoi-1
 audit: brain seq=14823 chain=9f3e2a1b…8d4c

2. Enrol a passkey (interactive)

$ cyberos-auth webauthn enrol --subject stephen@cyberskill.com

→ open https://auth.cyberos.com/webauthn/enrol?token=eyJhbGc… on the device
✓ credential registered
 credential_id: AaJ9K… (TouchID, MacBook Pro 16, 2023)
 aaguid: adce0002-35bc-c60a-648b-0b25f1f05503

3. Grant a role

$ cyberos-auth role grant \
 --subject acme-cfo@acme.com \
 --role cfo \
 --expires 2027-12-31T00:00:00Z

[role granted] subject=acme-cfo@acme.com role=cfo expires=2027-12-31
[audit] brain seq=14831 chain=b8d4…e3f7

4. Revoke a session

$ cyberos-auth session revoke --id 01HZJ8…JTC

[revoke] session 01HZJ8…JTC → status=revoked
[propagated] Redis cache invalidated; Apollo Router will reject within 5 s
[audit] brain seq=14832 chain=c9e5…f4a8

5. Read auth decisions for a subject (DSAR-style)

$ cyberos-auth decisions list --subject stephen@cyberskill.com --since 7d --format jsonl | head -3
{"ts":"2026-05-13T09:21:08Z","action":"login","decision":"allow","reason":"webauthn.aal3","device":"MacBook Pro 16"}
{"ts":"2026-05-13T09:21:09Z","action":"brain.put","decision":"allow","reason":"role.founder","resource":"memories/decisions/holdco-flip.md"}
{"ts":"2026-05-13T11:04:32Z","action":"rew.write_run","decision":"deny","reason":"policy.cfo_co_sign_required","resource":"rew/run/2026-05"}

6. Rotate the JWT signing key

$ cyberos-auth keys rotate --reason "quarterly-scheduled"

[rotate] new key id: 2026-q2-sig
[kms] old key id 2026-q1-sig retained for verification (30 d)
[jwks] /.well-known/jwks.json updated; downstream propagation: ≤ 60 s
[audit] brain seq=14841 chain=d1f3…a8c7

7. Export DSAR bundle

$ cyberos-auth dsar-export --subject acme-contact@acme.com --output dsar.zip

[dsar] subject: acme-contact@acme.com
[dsar] identity: 1 row
[dsar] decisions: 4,217 rows (28 d)
[dsar] sessions: 12 rows
[dsar] devices: 3 rows
[dsar] webauthn: 0 credentials (none registered)
[dsar] written: dsar.zip (412 KB)
16

Phase status & estimates

Status
Planned
P0 design phase · P0 · slice 1 start
Est. LoC (Rust)
~7,000
services/auth + sqlx migrations
Planned tests
120+
unit · integration · OAuth conformance
External libs
~12
axum · sqlx · webauthn-rs · totp-rs · rsa
CLI subcommands
~25 planned
cyberos-auth entrypoint
P0 budget
~$60/mo
RDS + Redis + Fargate
CapabilityStatus
OAuth 2.1 + PKCE token endpointplanned · P0
WebAuthn L3 (passkeys)planned · P0
TOTP MFAplanned · P0
RBAC predicate gRPC APIplanned · P0
JWT RS256 with KMS signingplanned · P0
Refresh-token rotation + reuse detectionplanned · P0
Service-account client_credentialsplanned · P0
Magic-link onboardingplanned · P0
Audit-chain integration (auth.decision)planned · P0
Agent token-exchange (persona-stamped)planned · P0
OIDC SSO (Google · M365 · Okta)planned · P1
Impossible-travel detectionplanned · P1
Device tracking + new-device emailplanned · P1
Decree 53 MoPS-notification automationplanned · P1
Multi-region active-activeplanned · P3+
SAML 2.0 enterprise SSOplanned · P2+
FIDO2 hardware-key attestationplanned · P2+
17

References

  • services/auth/RFC.md (drafted 2026-05-14) — implementation RFC: 5-slice ship plan · 5 open questions · audit-chain bridge design · DoD. The source-of-truth for everything P0 · slice 2 → P3.
  • services/auth/mockups/sign-in.htmlfirst AUTH UI mockup — Liquid Glass + Umber/Ochre · Be Vietnam Pro · passkey-first flow · MFA chips · BRAIN audit-chain trust footnote. Implements design system Part 21.
  • BRAIN_AUTOSYNC_DESIGN.md §6 — Lumi's BRAIN spec. AUTH issues the tenant-scoped JWT that Lumi consumes for row-level isolation. AUTH is gated by AI Gateway + TEN per the reordered P0 sequence.
  • archive/2026-05-14/RESEARCH_REVIEW.md §2.4 — archive/2026-05-14/RESEARCH_REVIEW.md (archived; see cyberos/CHANGELOG.md). The "AUTH not P0 #1" reorder rationale. Section 2.4 verbatim: "AUTH at P0 · slice 2 can be magic-link + TOTP with a stub RBAC. WebAuthn passkeys for the Founder and a per-tenant authz server can land at P0 · exit without blocking any other module."
  • archive/2026-05-14/AUDIT_AND_PLAN.mdarchive/2026-05-14/AUDIT_AND_PLAN.md (archived; see cyberos/CHANGELOG.md). AUTH's 5-slice progression and slice ordering relative to AI Gateway / MCP Gateway / OBS / CHAT.
  • AUTHORING_DISCIPLINE.mdFR authoring playbook (feature-request-author + audit pair). AUTH FRs land here; FR-AUTH-001 through FR-AUTH-005 all ship via this discipline.
  • AGENTS.md §3.6 + §11 — BRAIN protocol. allowed_brain_scopes + trust model · the basis for AUTH's audit-chain bridge contract.
  • Decree 53/2022/NĐ-CP (Vietnam) — Cybersecurity Law implementing decree; data-residency + breach notification.
  • Decree 13/2023/NĐ-CP (Vietnam) — Personal data processing protection.
  • Law 91/2025/QH15 (Vietnam PDPL) — Personal Data Protection Law.
  • RFC 6749 / draft-ietf-oauth-v2-1 — OAuth 2.1 specification.
  • RFC 7636 — Proof Key for Code Exchange (PKCE).
  • RFC 6238 — Time-Based One-Time Password.
  • RFC 7519 — JSON Web Token (JWT).
  • RFC 8414 — OAuth Authorization Server Metadata.
  • WebAuthn Level 3 — W3C Recommendation 2023.
  • NIST SP 800-63B — Digital Identity Authenticator Assurance Levels.
  • Architecture context: infrastructure.html#auth.

Personas & skill bundles that touch AUTH

AUTH issues 47 distinct agent-persona JWT shapes (one per active CUO persona, plus the human roles in the 22-role RBAC catalogue). The personas below govern AUTH's policy itself, while every other persona consumes the JWTs AUTH issues.

Persona governance (5 of 47)
Skill bundles AUTH gates

Every other CUO persona (42 of 47) consumes AUTH JWTs to read scope-limited BRAIN data per the Scope Contract Grant mechanism. See § RBAC catalogue above.