RFC0055 Identity-Aware Routing #535

Merged
ameowlia merged 76 commits into develop from feature/app-to-app-mtls-routing on Jun 16, 2026

rkoster commented Mar 5, 2026

Contributor

RFC0055: Identity-Aware mTLS Routing

Implements Phase 1 (1a + 1b) of RFC0055: App-to-App mTLS Routing.

Tracking: cloudfoundry/community#1481
Acceptance Testing Guide: https://gist.github.com/rkoster/5b252b0edca606f10be2dbdcb81a796f

What This Does

Enables GoRouter to enforce mutual TLS and identity-based authorization on a per-domain basis. Apps calling routes on configured mTLS domains must present a valid Diego instance identity certificate. GoRouter extracts the caller's app/space/org identity and checks it against route policies before forwarding the request.

Phase 1a: mTLS Infrastructure

  • Per-domain TLS configuration via GetConfigForClient callback (SNI-based)
  • Domain-specific client certificate validation against configurable CA
  • Domain-aware XFCC header handling with two formats:
    • raw: base64-encoded full certificate (~1.5KB)
    • envoy: compact Hash=...;Subject="..." format (~250 bytes)
  • SNI/Host mismatch protection (prevents connection reuse attacks across domains)
  • BOSH job properties for router.domains
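
The SNI-based selection in Phase 1a can be sketched roughly as follows. This is a minimal illustration built on the standard `crypto/tls` `GetConfigForClient` hook, not the code in this PR: the `mtlsDomain` type, the helper names, and the rule that `*.` matches exactly one extra left-most label are all assumptions made for the example.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"strings"
)

// mtlsDomain is a hypothetical stand-in for one router.domains entry.
type mtlsDomain struct {
	name      string         // e.g. "*.apps.identity"
	clientCAs *x509.CertPool // pool built from the entry's ca_certs
}

// matches reports whether the SNI value falls under the domain pattern.
// Assumption: a leading "*." matches exactly one extra left-most label.
func (d mtlsDomain) matches(sni string) bool {
	host, pattern := strings.ToLower(sni), strings.ToLower(d.name)
	if strings.HasPrefix(pattern, "*.") {
		parts := strings.SplitN(host, ".", 2)
		return len(parts) == 2 && parts[0] != "" && parts[1] == pattern[2:]
	}
	return host == pattern
}

// newGetConfigForClient returns a callback suitable for tls.Config.GetConfigForClient.
// For SNI values on an mTLS domain it returns a clone of base that requires a
// client certificate signed by that domain's CA pool.
func newGetConfigForClient(base *tls.Config, domains []mtlsDomain) func(*tls.ClientHelloInfo) (*tls.Config, error) {
	return func(hello *tls.ClientHelloInfo) (*tls.Config, error) {
		for _, d := range domains {
			if d.matches(hello.ServerName) {
				cfg := base.Clone()
				cfg.ClientAuth = tls.RequireAndVerifyClientCert
				cfg.ClientCAs = d.clientCAs
				return cfg, nil
			}
		}
		// Returning (nil, nil) tells crypto/tls to keep using the original config.
		return nil, nil
	}
}

// requiresClientCert is a small helper for exercising the callback without a network.
func requiresClientCert(domains []mtlsDomain, sni string) bool {
	getConfig := newGetConfigForClient(&tls.Config{}, domains)
	cfg, err := getConfig(&tls.ClientHelloInfo{ServerName: sni})
	return err == nil && cfg != nil && cfg.ClientAuth == tls.RequireAndVerifyClientCert
}

func main() {
	domains := []mtlsDomain{{name: "*.apps.identity", clientCAs: x509.NewCertPool()}}
	for _, sni := range []string{"backend.apps.identity", "www.example.com"} {
		fmt.Printf("%s requires client cert: %v\n", sni, requiresClientCert(domains, sni))
	}
}
```

Cloning the base config per matching handshake keeps non-mTLS domains on the unmodified listener settings, which is in line with the "no regression on existing traffic" goal stated in this description.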

Phase 1b: Authorization

  • Identity extraction from Diego instance identity certificates (Subject DN OUs + SPIFFE URIs)
  • Pre-selection auth: validates mTLS domain, client cert presence, identity extraction
  • Post-selection auth: enforces route policies (scope and allowed_sources) against selected endpoint
  • Supports authorization at app, space, and org granularity
  • Default deny when no route policies are configured
  • RTR access logs emitted for denied requests (401/403)
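
As a rough illustration of the identity extraction step, the sketch below reads app, space, and org GUIDs from the Subject OUs of a client certificate. It assumes the OUs use the prefixes `app:`, `space:`, and `organization:`; the type and function names are hypothetical, and the SPIFFE URI path is left out for brevity.

```go
package main

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"strings"
)

// callerIdentity is a hypothetical container for the extracted GUIDs.
type callerIdentity struct {
	App, Space, Org string
}

// identityFromOUs parses "<kind>:<guid>" organizational units.
// Assumption: the kinds are "app", "space", and "organization".
// The second return value is false when no app GUID was found.
func identityFromOUs(ous []string) (callerIdentity, bool) {
	var id callerIdentity
	for _, ou := range ous {
		parts := strings.SplitN(ou, ":", 2)
		if len(parts) != 2 {
			continue
		}
		switch parts[0] {
		case "app":
			id.App = parts[1]
		case "space":
			id.Space = parts[1]
		case "organization":
			id.Org = parts[1]
		}
	}
	return id, id.App != ""
}

// identityFromCert applies identityFromOUs to a parsed client certificate.
func identityFromCert(cert *x509.Certificate) (callerIdentity, bool) {
	return identityFromOUs(cert.Subject.OrganizationalUnit)
}

func main() {
	cert := &x509.Certificate{Subject: pkix.Name{
		OrganizationalUnit: []string{"organization:org-guid", "space:space-guid", "app:app-guid"},
	}}
	id, ok := identityFromCert(cert)
	fmt.Printf("%+v ok=%v\n", id, ok)
}
```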

Key Design Decisions

  • Two-layer authorization: Pre-selection (before endpoint is chosen) handles domain/cert/identity checks. Post-selection (after load balancer picks a backend) handles scope and route-policy checks against the specific endpoint's tags.
  • Feature is dormant by default: No behavior change unless router.domains is configured in the BOSH manifest and a shared domain with --enforce-access-rules is created.
  • No regression on existing traffic: Non-mTLS domains are completely unaffected.
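
To make the post-selection layer concrete, here is a small sketch of a scope check followed by a route-policy check with default deny. The scope values `any`, `org`, and `space` and the `cf:app:` and `cf:space:` prefixes appear in this PR; the `cf:org:` prefix, the types, and the function names are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"net/http"
)

// caller and backend are hypothetical views of the extracted identity
// and of the selected endpoint's tags.
type caller struct{ App, Space, Org string }
type backend struct{ Space, Org string }

// scopeAllows enforces the org/space boundary for the selected backend.
func scopeAllows(scope string, c caller, b backend) bool {
	switch scope {
	case "any":
		return true
	case "org":
		return c.Org != "" && c.Org == b.Org
	case "space":
		return c.Space != "" && c.Space == b.Space
	default:
		return false // unknown scope: deny
	}
}

// policiesAllow returns true only when some policy names the caller.
// An empty policy list denies everything (default deny).
func policiesAllow(policies []string, c caller) bool {
	for _, p := range policies {
		switch {
		case c.App != "" && p == "cf:app:"+c.App:
			return true
		case c.Space != "" && p == "cf:space:"+c.Space:
			return true
		case c.Org != "" && p == "cf:org:"+c.Org: // assumed prefix
			return true
		}
	}
	return false
}

// authorize returns the HTTP status the router would answer with.
func authorize(scope string, policies []string, c caller, b backend) int {
	if !scopeAllows(scope, c, b) || !policiesAllow(policies, c) {
		return http.StatusForbidden
	}
	return http.StatusOK
}

func main() {
	c := caller{App: "app-1", Space: "s1", Org: "o1"}
	b := backend{Space: "s2", Org: "o1"}
	fmt.Println(authorize("org", []string{"cf:app:app-1"}, c, b))   // same org, policy matches
	fmt.Println(authorize("space", []string{"cf:app:app-1"}, c, b)) // different space
	fmt.Println(authorize("any", nil, c, b))                        // no policies: default deny
}
```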

Testing

  • Unit tests for all new handlers and config validation
  • Integration tests for end-to-end mTLS routing flows
  • BOSH template tests for configuration rendering
  • CI runs go fmt, go vet, staticcheck, ginkgo with --race

Configuration Example

# BOSH manifest (via ops-file)
router:
  domains:
    - name: "*.apps.identity"
      ca_certs: "((diego_instance_identity_ca.certificate))"
      forwarded_client_cert: sanitize_set
      xfcc_format: envoy

Related PRs

| Component | PR | Status |
| --- | --- | --- |
| cloud_controller_ng | cloudfoundry/cloud_controller_ng#4910 | Open |
| capi-release | cloudfoundry/capi-release#625 | Open |
| CLI | cloudfoundry/cli#3758 | Draft |

Merge Ordering

All PRs are independently safe to merge — the feature is dormant without the ops-file and domain creation. No strict ordering required. Recommend merging around the same time once all are approved.

AI Disclosure

This PR was developed with AI assistance. All code has been read and verified manually. Each error path, branch, and edge case has corresponding test coverage.

rkoster commented Apr 16, 2026

Contributor Author

Latest Update: RFC-Compliant Post-Selection Authorization

Implemented a breaking change that replaces pre-selection authorization with strict post-selection enforcement, per RFC lines 475-517.

Key Changes (commit cbf0695)

Architecture:

  • ✅ Composable PostSelectionHandler interface for middleware pipeline
  • ✅ Separation of pre-selection checks (SNI, route lookup, identity) from post-selection authorization
  • ✅ Immediate 403 on authorization failure (non-retriable, per RFC)
  • ✅ Post-selection scope checking with :post-selection suffix in metrics

Implementation:

  • handlers/post_selection_pipeline.go - Infrastructure for composable checks
  • handlers/mtls_scope_auth.go - Org/space boundary enforcement
  • handlers/mtls_access_rules_auth.go - Access rules evaluation (cf:app:, cf:space:, etc.)
  • handlers/mtls_pre_auth.go - Pre-selection checks only
  • handlers/mtls_auth_error.go - Custom error type with Rule/Reason/HTTPStatus
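
A composable pipeline along these lines could look like the sketch below. Only the `PostSelectionHandler` name, the `Check` method name, and the Rule/Reason/HTTPStatus fields come from this PR; the `selection` type, the method signature, `CheckFunc`, `Pipeline`, and `statusFor` are hypothetical and chosen to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// AuthError carries the rule that denied the request, a reason, and the
// HTTP status to return. The exact layout here is an assumption.
type AuthError struct {
	Rule       string
	Reason     string
	HTTPStatus int
}

func (e *AuthError) Error() string {
	return fmt.Sprintf("authorization denied by %s: %s", e.Rule, e.Reason)
}

// selection is a hypothetical view of the caller and the chosen endpoint.
type selection struct{ CallerOrg, BackendOrg string }

// PostSelectionHandler is one check that runs after endpoint selection.
type PostSelectionHandler interface {
	Check(s selection) error
}

// CheckFunc adapts a plain function to PostSelectionHandler.
type CheckFunc func(s selection) error

func (f CheckFunc) Check(s selection) error { return f(s) }

// Pipeline runs checks in order and stops at the first failure.
type Pipeline []PostSelectionHandler

func (p Pipeline) Check(s selection) error {
	for _, h := range p {
		if err := h.Check(s); err != nil {
			return err
		}
	}
	return nil
}

// orgScope is an example check enforcing the org boundary.
func orgScope(s selection) error {
	if s.CallerOrg != s.BackendOrg {
		return &AuthError{
			Rule:       "scope",
			Reason:     fmt.Sprintf("caller org %s does not match selected backend org %s", s.CallerOrg, s.BackendOrg),
			HTTPStatus: http.StatusForbidden,
		}
	}
	return nil
}

// statusFor maps a pipeline result to a response status; an *AuthError is
// answered immediately with its own status instead of being retried.
func statusFor(err error) int {
	if err == nil {
		return http.StatusOK
	}
	var authErr *AuthError
	if errors.As(err, &authErr) {
		return authErr.HTTPStatus
	}
	return http.StatusBadGateway
}

func main() {
	p := Pipeline{CheckFunc(orgScope)}
	err := p.Check(selection{CallerOrg: "o1", BackendOrg: "o2"})
	fmt.Println(statusFor(err), err)
}
```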

Test Coverage:

  • +44 new tests (14 scope + 17 access rules + 13 pipeline)
  • +4 integration tests for shared route scenarios
  • All 393 tests passing

RFC Compliance

  • Intermittent 403s: Expected for shared routes across scope boundaries (RFC-compliant)
  • Error messages: Include "caller org X does not match selected backend org Y"
  • Strict enforcement: Prevents unauthorized cross-scope access

Breaking Change

⚠️ This replaces the permissive pre-selection authorization entirely. No feature flag is provided because this is a security improvement required by the RFC.

Deprecated:

  • handlers/mtls_authorization.go (old implementation with migration notes)
  • route/pool.go EndpointOrgIDs/SpaceIDs methods

Integration Test Results

All integration tests compile successfully. Shared route scenarios validate:

  • Intermittent 403s with scope=space (different spaces in same org)
  • Always succeed with scope=org (same org, different spaces)
  • Always fail with scope=org (different orgs)
  • Per-endpoint access rules with intermittent behavior

Ready for full integration test run and review.

rkoster commented Apr 16, 2026

Contributor Author

Refactoring: AuthError for Future Extensibility

Commit: 4ff64b9

Renamed MtlsAuthError to AuthError to prepare for future authentication methods beyond mTLS, such as SPIFFE JWT tokens.

Changes

  • ✅ Renamed handlers/mtls_auth_error.go → handlers/auth_error.go
  • ✅ Updated struct, constructor functions, and all references
  • ✅ Changed error messages from "mTLS authorization denied" to "authorization denied"
  • ✅ Updated all test files

Benefits

  • 🔮 Future-proof: Ready for SPIFFE JWT token authentication
  • 🏗️ Generic design: Error type not tied to specific auth mechanism
  • 🧩 Reusable: Can be used across different authentication methods
  • Clean: Better naming convention for authorization errors

No functional changes - pure refactoring for extensibility.

rkoster force-pushed the feature/app-to-app-mtls-routing branch 3 times, most recently from 1f9b804 to 79271b7 on April 17, 2026 12:12
rkoster force-pushed the feature/app-to-app-mtls-routing branch from 5cc4170 to b875867 on April 20, 2026 09:18
rkoster added 23 commits June 10, 2026 18:38
…line

- Fix dead-code bug: skip internal error handler for *AuthError in
  proxy_round_tripper so ReverseProxy.ErrorHandler can write the 403
- Fix error leak: replace err.Error() with generic status text in
  fallback error handler to avoid exposing internal details
- Extract handleReverseProxyError() as testable package-level function
- Add unit tests for handleReverseProxyError (proxy_error_handler_test.go)
- Add post-selection pipeline tests in proxy_round_tripper_test.go
- Add Layer 0 security branch test in mtls_pre_auth_test.go

Add ERB template validation that raises a deployment error when
xfcc_format is configured alongside forwarded_client_cert: always_forward
on an mTLS domain. In always_forward mode the XFCC header is passed
through untouched, so xfcc_format has no effect and the combination
indicates operator misconfiguration.

Add rspec coverage for the new validation and surrounding valid
combinations (sanitize_set+envoy, always_forward alone, xfcc_format
without explicit forwarded_client_cert).

Previously this combination was only rejected by the BOSH template at
deploy time. With gorouter now used outside of BOSH (cf-on-kind), the
Go config must also enforce this constraint.

Also removes dead code in GetMtlsDomainConfig wildcard matching where
the strings.Contains check was redundant due to SplitN guarantees.
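
A Go-side check for the constraint described above might look like the following sketch. The struct, field names, and error texts are hypothetical; the accepted values (`always_forward`, `forward`, `sanitize_set` and `raw`, `envoy`) are the ones named in this PR and in gorouter's existing forwarded_client_cert property, and lowercasing the input in Go is an assumption borrowed from the ERB behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// domainConfig is a hypothetical mirror of one router.domains entry.
type domainConfig struct {
	Name                string
	CACerts             string
	ForwardedClientCert string
	XFCCFormat          string
}

// validateDomain rejects incomplete entries, unknown values, and the
// always_forward + xfcc_format combination, where xfcc_format has no effect.
func validateDomain(d domainConfig) error {
	if d.Name == "" {
		return errors.New("domain name must not be empty")
	}
	if d.CACerts == "" {
		return errors.New("ca_certs must not be empty")
	}
	mode := strings.ToLower(d.ForwardedClientCert)
	switch mode {
	case "", "always_forward", "forward", "sanitize_set":
	default:
		return fmt.Errorf("invalid forwarded_client_cert %q", d.ForwardedClientCert)
	}
	format := strings.ToLower(d.XFCCFormat)
	switch format {
	case "", "raw", "envoy":
	default:
		return fmt.Errorf("invalid xfcc_format %q", d.XFCCFormat)
	}
	if mode == "always_forward" && format != "" {
		return errors.New("xfcc_format has no effect with forwarded_client_cert: always_forward")
	}
	return nil
}

func main() {
	bad := domainConfig{Name: "*.apps.identity", CACerts: "pem", ForwardedClientCert: "always_forward", XFCCFormat: "envoy"}
	good := domainConfig{Name: "*.apps.identity", CACerts: "pem", ForwardedClientCert: "sanitize_set", XFCCFormat: "envoy"}
	fmt.Println(validateDomain(bad))
	fmt.Println(validateDomain(good))
}
```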
…ort/readyreader to routing-api BOSH package spec

- Rename caller_app/space/org → caller_cf_app/space/org for clarity
- Remove auth, auth_rule, auth_denied_reason fields (not needed)
- Always emit tls_sni and caller_cf_* fields with "-" when empty
- Removes conditional emission that caused inconsistent log output

Per-request denial log statements (mtls-route-policies-denied,
mtls-pre-auth-denied, mtls-scope-auth-denied, post-selection-auth-denied)
now log at DEBUG level to avoid log volume amplification in production.

The access log already captures all denial information via caller_cf_*
fields and HTTP status codes. These DEBUG logs remain available for
local debugging when operators enable debug-level logging.

…ation tests

- Update router.client_cert_validation description to note that router.domains
  enforce mTLS independently
- Update router.domains description to clarify relationship with
  router.client_cert_validation
- Add rspec tests for all ERB template validation branches: non-array input,
  non-hash entry, missing/empty name, missing/empty ca_certs, invalid
  forwarded_client_cert mode, and invalid xfcc_format value

Addresses PR #535 review threads 1-8.

- Rename identityHandler to cfIdentityHandler / NewCfIdentity to clarify
  it is specific to CF app instance identity certificates (thread 9)
- Guard identity extraction: only run when (1) TLS was used and (2) the
  host is a configured mTLS domain, preventing XFCC header spoofing on
  non-mTLS routes (thread 10)
- Move MtlsPreAuth handler above ClientCert in the proxy chain so a 421
  response skips unnecessary certificate processing (thread 11)
- Use configured xfcc_format from domain config instead of auto-detecting
  format at runtime; reject if format doesn't match (thread 12)

All 386 handler tests and 179 proxy tests passing.

Split MtlsPreAuth into MtlsSniCheck (early 421) and MtlsPreAuth
(post-CfIdentity 403) to fix the handler ordering regression from ac2e87e
where moving MtlsPreAuth above ClientCert/CfIdentity caused CallerIdentity
to always be nil, denying all mTLS app-to-app requests with 403.

Handler chain order is now:
  Lookup → MtlsSniCheck → ClientCert → CfIdentity → MtlsPreAuth

Additional PR #535 review feedback addressed:
- Add route_policy field to access logs (renamed from auth_rule, always
  emitted with '-' when empty)
- Remove per-request denial/granted log statements entirely (they
  duplicate access log information)
- Move routePolicies/routePolicyScope from endpoint to pool-level fields
  to avoid stale data and reduce mutex contention

All handler, access log, route, and proxy tests passing.
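
The ordering constraint in the commit above can be illustrated with plain `net/http` middleware. This is a simplified sketch, not gorouter's handler chain: Lookup and ClientCert are omitted, the port in the Host header is ignored, the `app:` OU prefix is assumed, and all function names are hypothetical. It shows why the identity step has to run before the pre-auth step, while the SNI check can reject early with 421.

```go
package main

import (
	"context"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type identityKey struct{}

// mtlsSniCheck rejects requests whose Host header differs from the TLS SNI value.
func mtlsSniCheck(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.TLS != nil && !strings.EqualFold(r.TLS.ServerName, r.Host) {
			http.Error(w, "SNI and Host do not match", http.StatusMisdirectedRequest)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// cfIdentity stores the caller's app GUID (from an "app:" OU) in the request context.
func cfIdentity(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.TLS != nil && len(r.TLS.PeerCertificates) > 0 {
			for _, ou := range r.TLS.PeerCertificates[0].Subject.OrganizationalUnit {
				if strings.HasPrefix(ou, "app:") {
					ctx := context.WithValue(r.Context(), identityKey{}, strings.TrimPrefix(ou, "app:"))
					r = r.WithContext(ctx)
				}
			}
		}
		next.ServeHTTP(w, r)
	})
}

// mtlsPreAuth denies requests that arrive without an extracted identity.
// If it ran before cfIdentity, every request would be denied.
func mtlsPreAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, ok := r.Context().Value(identityKey{}).(string); !ok {
			http.Error(w, "caller identity required", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// chain wires the handlers in the order MtlsSniCheck → CfIdentity → MtlsPreAuth.
func chain() http.Handler {
	backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return mtlsSniCheck(cfIdentity(mtlsPreAuth(backend)))
}

// statusFor sends one synthetic request through the chain and returns the status.
func statusFor(sni, host string, ous []string) int {
	req := httptest.NewRequest(http.MethodGet, "https://"+host+"/", nil)
	req.TLS = &tls.ConnectionState{
		ServerName:       sni,
		PeerCertificates: []*x509.Certificate{{Subject: pkix.Name{OrganizationalUnit: ous}}},
	}
	rec := httptest.NewRecorder()
	chain().ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("backend.apps.identity", "backend.apps.identity", []string{"app:a1"}))
	fmt.Println(statusFor("other.apps.identity", "backend.apps.identity", []string{"app:a1"}))
	fmt.Println(statusFor("backend.apps.identity", "backend.apps.identity", nil))
}
```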
Cover the behavior introduced when moving route policy fields from
endpoint to pool level: initial state, Put updates, re-Put updates,
persistence after Remove, and default-deny (empty policies with scope).

DNS hostnames are case-insensitive per RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt),
but IsMtlsDomain() and GetMtlsDomainConfig() used case-sensitive map lookups.
This caused mTLS domain matching to fail when clients sent uppercase or
mixed-case hostnames in the Host header or SNI field.

Fix by normalizing domain names to lowercase both when storing in
mtlsDomainMap (in processMtlsDomains) and when looking up in
GetMtlsDomainConfig.

Added unit tests covering:
- Wildcard domain matching with uppercase host
- Exact domain matching with mixed case host
- Matching with uppercase host and port
- IsMtlsDomain with various case combinations
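
The normalization described in this commit can be sketched as below: keys are lowercased when the map is built, and the host is lowercased (and stripped of any port) before lookup. The function names and the `map[string]bool` shape are hypothetical; the real `mtlsDomainMap` stores richer per-domain configuration.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// normalizeHost strips an optional port and lowercases the host.
func normalizeHost(hostport string) string {
	host := hostport
	if h, _, err := net.SplitHostPort(hostport); err == nil {
		host = h
	}
	return strings.ToLower(host)
}

// buildDomainMap stores every configured domain under a lowercase key.
func buildDomainMap(names []string) map[string]bool {
	m := make(map[string]bool, len(names))
	for _, n := range names {
		m[strings.ToLower(n)] = true
	}
	return m
}

// isMtlsDomain checks for an exact match first, then for a wildcard entry
// covering the host's parent domain.
func isMtlsDomain(m map[string]bool, hostport string) bool {
	host := normalizeHost(hostport)
	if m[host] {
		return true
	}
	if i := strings.Index(host, "."); i > 0 {
		return m["*"+host[i:]]
	}
	return false
}

func main() {
	m := buildDomainMap([]string{"*.Apps.Identity", "Exact.Example.Com"})
	for _, h := range []string{"BACKEND.APPS.IDENTITY:443", "exact.EXAMPLE.com", "www.example.com"} {
		fmt.Printf("%s -> %v\n", h, isMtlsDomain(m, h))
	}
}
```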
The route policies auth handler was using pool-level policies instead of
endpoint-level policies. This caused authorization failures when multiple
endpoints on the same route have different route policies (e.g., backend-1
allows app-1, backend-2 allows app-2).

Now uses the selected endpoint's RoutePolicies field which is already
passed to the Check method, enabling per-endpoint authorization decisions.

Fixes CI test: allows only the specified app and denies others (per-endpoint rules)

Address PR #535 review threads 14, 15, 16:

- Normalize domain.Domain to lowercase when storing in mtlsDomainMap,
  not just the map key (thread 16)
- Make domainMatches() case-insensitive by lowercasing both hostname
  and pattern before comparison (thread 15)

Added 9 new tests covering mixed-case domain configuration and
hostname matching. All 569 tests passing (175 config + 394 handlers).

…g standards

Remove duplicative per-request logs that violate gorouter logging standards.
Access logs already capture all necessary information via status codes
and the caller_cf_* fields.

Removed log statements:
- clientcert.go: using-mtls-domain-xfcc-config (Debug)
- mtls_sni_check.go: mtls-enforcement-mismatch (Warn) x2
- mtls_scope_auth.go: mtls-scope-auth-no-route-pool (Error)
- mtls_route_policies_auth.go: mtls-route-policies-auth-no-route-pool (Error)
- router.go: mtls-domain-detected (Debug)

Addresses PR #535 comment: #535 (comment)

mTLS handler constructors now return NoopHandler/NoopPostSelectionHandler
when len(cfg.Domains) == 0, avoiding unnecessary handler instantiation.

This keeps the conditional logic encapsulated in the handler package
rather than coupling proxy setup to handler internals.

Handlers affected:
- NewMtlsSniCheck -> NoopHandler
- NewCfIdentity -> NoopHandler
- NewMtlsPreAuth -> NoopHandler
- NewMtlsScopeAuth -> NoopPostSelectionHandler
- NewMtlsRoutePoliciesAuth -> NoopPostSelectionHandler

Tests added to each handler's test file verifying constructor behavior.

Move tls_sni, caller_cf_app, caller_cf_space, caller_cf_org, and
route_policy from always-present to the router.logging.extra_access_log_fields
opt-in mechanism. Foundations not using the mTLS feature will no longer
see these fields (always '-') in every access log line, preventing
breakage of existing log parsers.

Update the spec description to list all five new available field names.
Add three test cases covering populated values, empty (dash) values,
and absence from output when not listed in ExtraFields.

Addresses PR #535 review thread 18.
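
A sketch of the opt-in behaviour described above: fields are emitted only when listed, and a listed field with no value is written as a dash. The record type, the method name, and the `name:"value"` rendering are assumptions for illustration and may differ from gorouter's actual access log writer.

```go
package main

import (
	"fmt"
	"strings"
)

// accessLogRecord holds only the five new fields; names follow this PR.
type accessLogRecord struct {
	TlsSNI, CallerCFApp, CallerCFSpace, CallerCFOrg, RoutePolicy string
}

// extraFields renders the fields named in enabled, in the order given.
// Unlisted fields are omitted; listed fields with no value render as "-".
func (r accessLogRecord) extraFields(enabled []string) string {
	values := map[string]string{
		"tls_sni":         r.TlsSNI,
		"caller_cf_app":   r.CallerCFApp,
		"caller_cf_space": r.CallerCFSpace,
		"caller_cf_org":   r.CallerCFOrg,
		"route_policy":    r.RoutePolicy,
	}
	var out []string
	for _, name := range enabled {
		v, known := values[name]
		if !known {
			continue // not one of the fields handled in this sketch
		}
		if v == "" {
			v = "-"
		}
		out = append(out, fmt.Sprintf("%s:%q", name, v))
	}
	return strings.Join(out, " ")
}

func main() {
	rec := accessLogRecord{TlsSNI: "backend.apps.identity"}
	fmt.Println(rec.extraFields([]string{"tls_sni", "caller_cf_app"}))
	fmt.Printf("%q\n", rec.extraFields(nil))
}
```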
…in ERB

Add test cases for all five previously untested error paths in
processMtlsDomains (threads 19-23):
- invalid forwarded_client_cert value
- invalid xfcc_format value
- ca_certs containing invalid PEM data
- ca_certs empty/missing
- domain name empty

Normalize forwarded_client_cert and xfcc_format values to lowercase in
the ERB template before validation, so mixed-case input from operators
(e.g. 'Sanitize_Set') is accepted consistently.

Also remove debug annotation from GetMtlsDomainConfig test context
string (thread 24).

Addresses PR #535 review threads 19-24.

Add tests for the three new fields populated in the access log handler
from RequestInfo (thread 25):
- CallerIdentity -> CallerCFApp, CallerCFSpace, CallerCFOrg
- AuthResult -> RoutePolicy
- TlsSNI -> TlsSNI

Each field is tested with a value present and with the source nil/empty
to verify correct zero-value behaviour.

Addresses PR #535 review thread 25.

Addresses PR review feedback:

- Move post-selection authorization (scope auth, route-policy auth,
  pipeline, AuthError, no-op handler) into a dedicated
  handlers/postselection package. RequestInfo, CallerIdentity, and
  AuthResult remain in handlers to avoid an import cycle.
- fix: authorize route policies against the pool-level
  RoutePool.RoutePolicies() instead of the selected endpoint's
  per-endpoint copy, which can be stale on routes shared across
  backends.
- fix: log unrecognized route-policy rules at warn level
  (malformed-route-policy) instead of silently skipping them.
- test: assert CfIdentity leaves CallerIdentity unset on non-mTLS
  domains and when no RequestInfo is present; rewrite shared-route
  specs to verify pool-level authorization and stale per-endpoint
  denial.

The postselection refactor moved handlers into a new
code.cloudfoundry.org/gorouter/handlers/postselection package, but the
gorouter BOSH package spec was not updated to vendor it. This caused the
compile step to fail with:

  proxy.go:27:2: cannot find module providing package
  code.cloudfoundry.org/gorouter/handlers/postselection:
  import lookup disabled by -mod=vendor

Add the missing gosub entry so the package is included in the release.

Comment thread src/code.cloudfoundry.org/gorouter/integration/identity_aware_routing_test.go Outdated
Comment thread src/code.cloudfoundry.org/gorouter/integration/identity_aware_routing_test.go Outdated

rkoster added 2 commits June 12, 2026 20:39
ContainSubstring("tls") matched any TLS error; pin to the exact
Go TLS alert names for clearer test intent:
- "tls: certificate required"  (no client cert)
- "tls: unknown certificate authority" (cert from unknown CA)

The test constructed a scenario where two backends of the same route
have different route policies, which cannot happen in production —
CAPI enforces policies per-route at registration time so all backends
of a route always carry identical policies.

The pool's routePolicies field is last-writer-wins: after backend-2
registered, the pool held only ["cf:app:allowed-app-2"], so a caller
with allowed-app-1 would always get 403. The test asserted intermittent
allow/deny behaviour that was impossible under the current pool
implementation and would always fail.

Add a comment in pool.go explaining the last-writer-wins invariant.
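
The last-writer-wins behaviour can be sketched with a toy pool. This is not the code in pool.go: the type, the `Put` and `Remove` signatures, and the endpoint addresses are hypothetical, and the only point is that each registration overwrites the pool-level policies while removal leaves them in place.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a toy route pool holding endpoints plus pool-level route policies.
type pool struct {
	mu            sync.RWMutex
	endpoints     map[string]struct{}
	routePolicies []string
}

func newPool() *pool {
	return &pool{endpoints: map[string]struct{}{}}
}

// Put registers an endpoint and overwrites the pool-level policies with the
// ones carried by this registration (last writer wins).
func (p *pool) Put(addr string, policies []string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.endpoints[addr] = struct{}{}
	p.routePolicies = append([]string(nil), policies...)
}

// Remove drops an endpoint but leaves the pool-level policies untouched.
func (p *pool) Remove(addr string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.endpoints, addr)
}

// RoutePolicies returns a copy of the current pool-level policies.
func (p *pool) RoutePolicies() []string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return append([]string(nil), p.routePolicies...)
}

func main() {
	p := newPool()
	p.Put("10.0.0.1:61000", []string{"cf:app:allowed-app-1"})
	p.Put("10.0.0.2:61000", []string{"cf:app:allowed-app-2"})
	fmt.Println(p.RoutePolicies()) // only the policies from the second Put remain
}
```

Because all backends of a route are expected to carry identical policies, overwriting on every registration is harmless in practice and avoids tracking a per-endpoint copy.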
@ameowlia

Member

✅ After identity domain is created, but before route policies are created, it shows up in the gorouter routing table...

  "backend.apps.identity": [
    {
      "address": "10.10.0.10:61000",
      "availability_zone": "null",
      "protocol": "http1",
      "tls": true,
      "ttl": 120,
      "tags": {
        "app_id": "07a9f703-300b-40e3-81be-2f0060f7f676",
        "app_name": "backend",
        "component": "route-emitter",
        "instance_id": "0",
        "organization_id": "f77f7566-0895-42f9-b0a7-cde95989fb1b",
        "organization_name": "o",
        "process_id": "07a9f703-300b-40e3-81be-2f0060f7f676",
        "process_instance_id": "8124e938-9e7c-4409-7732-81a1",
        "process_type": "web",
        "source_id": "07a9f703-300b-40e3-81be-2f0060f7f676",
        "space_id": "bf561092-f646-408c-ab70-3e95f597f076",
        "space_name": "s"
      },
      "private_instance_id": "8124e938-9e7c-4409-7732-81a1",
      "server_cert_domain_san": "8124e938-9e7c-4409-7732-81a1",
      "load_balancing_algorithm": "round-robin",
      "route_policy_scope": "any"
    }
  ],

ameowlia commented Jun 16, 2026

Member

✅ After route policies are created, it shows up in the gorouter routing table...

  "backend.apps.identity": [
    {
      "address": "10.10.0.10:61000",
      "availability_zone": "null",
      "protocol": "http1",
      "tls": true,
      "ttl": 120,
      "tags": {
        "app_id": "07a9f703-300b-40e3-81be-2f0060f7f676",
        "app_name": "backend",
        "component": "route-emitter",
        "instance_id": "0",
        "organization_id": "f77f7566-0895-42f9-b0a7-cde95989fb1b",
        "organization_name": "o",
        "process_id": "07a9f703-300b-40e3-81be-2f0060f7f676",
        "process_instance_id": "8124e938-9e7c-4409-7732-81a1",
        "process_type": "web",
        "source_id": "07a9f703-300b-40e3-81be-2f0060f7f676",
        "space_id": "bf561092-f646-408c-ab70-3e95f597f076",
        "space_name": "s"
      },
      "private_instance_id": "8124e938-9e7c-4409-7732-81a1",
      "server_cert_domain_san": "8124e938-9e7c-4409-7732-81a1",
      "load_balancing_algorithm": "round-robin",
      "route_policy_scope": "any",
      "route_policies": [
        "cf:app:59f7aef6-9bb2-4c28-9daf-bd9bbbc1733a"
      ]
    }
  ],
