
feat: Add registration performance tooling and LB cache infrastructure#230

Open
pablomh wants to merge 4 commits into redhat-performance:main from pablomh:registration-metrics

Conversation


pablomh (Contributor) commented Apr 4, 2026

Summary

  • registration_metrics.py — standalone Python script (stdlib only) that parses production.log to extract per-registration timing and call-count metrics correlated by consumer UUID. Each metric maps to a specific in-flight optimization PR so improvements can be measured objectively. Supports --inventory (Ansible INI with SSH, journald fallback for foremanctl), --sosreport-dir (local or HTTP URL including workdir-exporter), --no-verify-ssl for internal self-signed certs, --compare, and --cache-stats. Groups output by: Satellite (ssh), Standalone capsules (ssh/mqtt), Load-balanced capsules (ssh).
  • HAProxy health check — upgrades the host_registration backend (port 9090) from TCP check to HTTP check against the smart-proxy's GET /register/health endpoint. haproxy_registration_fall defaults to 9999 (never removes capsule from rotation during stress tests); set to 3 for production failover.
  • registration_cache Ansible role — installs Redis (RHEL 9) or Valkey (RHEL 10+, redis:// wire-compatible) on capsule_lbs hosts for smart-proxy shared script cache. Configures bind address, maxmemory, opens firewall port 6379.
  • Capsule :cache_url configuration — sets :cache_url: redis://<lb_private_ip>:6379/0 in each capsule's smart-proxy registration settings when a load balancer exists in the same location.

Requirements

  • Smart-proxy companion PR for GET /register/health endpoint must be deployed on capsules before the HAProxy health check is useful.
  • Smart-proxy companion PR for :cache_url (shared Redis cache) must be deployed on capsules before the registration_cache role takes effect.
  • Redis (RHEL 9) or Valkey (RHEL 10+) packages available from AppStream on capsule_lbs hosts.
  • private_ip must be defined for capsule_lbs hosts in the inventory location groups.

Test plan

  • ./scripts/registration_metrics.py -i conf/contperf/inventory.red.ini produces metrics grouped by Satellite/Standalone-ssh/Standalone-mqtt
  • --sosreport-dir https://workdir-exporter.../sosreport/ --no-verify-ssl fetches and processes archives
  • --compare before/ after/ shows percentage changes per metric
  • After running capsule_lbs.yaml: Redis/Valkey running on LB host, port 6379 open, bound to private IP
  • After running capsules.yaml: /etc/foreman-proxy/settings.d/registration.yml contains :cache_url: on LB-backed capsules
  • HAProxy stats page shows HTTP health check results for host_registration backend

pablomh and others added 4 commits April 4, 2026 02:03
Adds a standalone Python script (stdlib only, no pip install required)
that parses Foreman production.log to extract per-registration timing
and call-count metrics, correlated by consumer UUID.

Each metric maps to a specific in-flight PR so improvements can be
measured objectively as changes land:

  POST /rhsm/consumers duration  -> foreman#XXXXX + katello#XXXXX
  GET  /compliance call count    -> katello#XXXXX (compliance caching)
  GET  /rhsm/status call count   -> katello#XXXXX (status caching)
  GET  /rhsm/consumers redundant -> katello#XXXXX (eliminate redundant GETs)
  GET  /register P99             -> smart-proxy#XXXXX (script caching)

Input modes:
  --inventory FILE   SSH to satellite6 hosts via Ansible INI inventory
  --sosreport PATH   single .tar.xz archive or extracted directory
  --sosreport-dir    local dir or HTTP URL with multiple archives
                     (works directly with workdir-exporter URLs)
  --log FILE         direct path to production.log (plain or .gz)
  --compare A B      compare two sources and print a diff table

Rotated and gzipped log files handled transparently. HTTP sosreport
archives streamed without writing to disk.
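The correlation approach can be sketched as follows. This is a minimal illustration, not the script itself: it assumes the standard Foreman production.log request format (`Started <VERB> "<path>"` and `Completed <status> ... in <N>ms` lines sharing a request id) and correlates completed requests back to the consumer UUID embedded in `/rhsm/consumers/<uuid>` paths. The real script additionally handles rotated/gzipped files and many more metrics.

```python
import re
from collections import defaultdict

# Assumed Foreman request-log shapes (illustrative, not the script's actual regexes)
STARTED = re.compile(r'\[I\|app\|(?P<rid>\w+)\] Started (?P<verb>\w+) "(?P<path>[^"]+)"')
COMPLETED = re.compile(r'\[I\|app\|(?P<rid>\w+)\] Completed \d+ .* in (?P<ms>\d+)ms')
CONSUMER_UUID = re.compile(r'/rhsm/consumers/(?P<uuid>[0-9a-f-]{36})')

def per_consumer_timings(lines):
    """Return {consumer_uuid: [(verb, path, duration_ms), ...]}."""
    open_requests = {}            # request id -> (verb, path) of in-flight request
    by_uuid = defaultdict(list)
    for line in lines:
        m = STARTED.search(line)
        if m:
            open_requests[m['rid']] = (m['verb'], m['path'])
            continue
        m = COMPLETED.search(line)
        if m and m['rid'] in open_requests:
            verb, path = open_requests.pop(m['rid'])
            u = CONSUMER_UUID.search(path)
            if u:  # only keep requests attributable to a consumer
                by_uuid[u['uuid']].append((verb, path, int(m['ms'])))
    return dict(by_uuid)
```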

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Upgrades the host_registration backend (port 9090) from a bare TCP check
to an application-level HTTP health check against the smart-proxy's new
GET /register/health endpoint.  The endpoint returns 200 if the capsule
can reach Foreman, 503 if not.

Key design: haproxy_registration_fall defaults to 9999 (effectively
never removes a capsule from rotation).  During stress tests a capsule
under heavy load may transiently fail the health check even though it is
still functioning — removing it would concentrate load on the remaining
capsules and cause cascading failures.

The health check data is still visible in HAProxy stats, making it
useful for observability without causing load-induced false failover.

To enable production-grade automatic failover (~90s at 30s interval):
  haproxy_registration_fall: 3   # in inventory or group_vars

ssl verify none skips certificate chain validation for the health check
connection. The smart-proxy uses a Foreman-issued cert which is not
trusted by system CAs; skipping verification avoids distributing the
Foreman CA to LB hosts. The connection is still TLS-encrypted and is
internal to a trusted private network.
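Put together, the resulting backend stanza would look roughly like this. The directives are standard HAProxy syntax; server names, addresses, and the 30s interval are illustrative, and the exact template in the role may differ:

```
backend host_registration
    # Application-level check against the smart-proxy health endpoint
    option httpchk GET /register/health
    # check-ssl verify none: TLS health check without cert chain validation
    # fall 9999 keeps a stressed capsule in rotation; set
    # haproxy_registration_fall: 3 for ~90s production failover
    server capsule01 192.0.2.11:9090 check check-ssl verify none inter 30s fall 9999 rise 2
```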

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Adds an Ansible role that installs and configures Redis or Valkey on
capsule_lbs hosts to serve as the shared script cache for the smart-proxy
registration module (:cache_url setting).

When multiple capsule nodes serve the same registration parameters behind
a load balancer, a single warm request on any node populates the shared
cache so all other nodes can skip the Foreman round-trip immediately.
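The cache-aside behavior this enables can be sketched in a few lines of Python; a plain dict stands in for the shared Redis/Valkey instance, and `fetch_script_from_foreman` is a hypothetical stand-in for the Foreman round-trip that cached nodes get to skip:

```python
calls = {"foreman": 0}  # counts round-trips to Foreman (for illustration)

def fetch_script_from_foreman(params):
    """Hypothetical stand-in for the expensive Foreman round-trip."""
    calls["foreman"] += 1
    return f"#!/bin/sh\n# registration script for {params}\n"

shared_cache = {}  # stand-in for the one Redis/Valkey instance on the LB host

def get_registration_script(node, params):
    # Any capsule node may populate the cache; every node reads the same store,
    # so a warm request on one node spares all the others the round-trip.
    if params not in shared_cache:
        shared_cache[params] = fetch_script_from_foreman(params)
    return shared_cache[params]
```

First request on any node misses and fills the cache; the same parameters requested from a different node are then served without touching Foreman.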

Package selection is automatic based on os_major_release:
  RHEL 9 and earlier → redis   (from AppStream)
  RHEL 10 and later  → valkey  (Redis fork shipped in RHEL 10; Redis
                                 removed due to SSPL license change)
Both packages use port 6379 and the same redis:// wire protocol —
no changes needed in capsule configuration when upgrading RHEL.

Configuration applied:
  bind 127.0.0.1 <private_ip>   allows capsule nodes on private network
  maxmemory 64mb                 registration scripts are ~5-10 KB each
  maxmemory-policy allkeys-lru   evict least-recently-used on full

Firewall: port 6379/tcp opened on the internal zone.
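In redis.conf syntax the applied settings amount to the following (the private IP is illustrative; valkey reads the same directives from its own config file):

```
# /etc/redis/redis.conf on RHEL 9 (valkey uses /etc/valkey/valkey.conf)
bind 127.0.0.1 192.0.2.10      # loopback plus the LB host's private_ip
maxmemory 64mb                 # scripts are ~5-10 KB each
maxmemory-policy allkeys-lru   # evict least-recently-used when full
```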

After running this playbook, configure each capsule's smart-proxy:
  # /etc/foreman-proxy/settings.d/registration.yml
  :cache_url: redis://<lb_private_ip>:6379/0

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

After installation, set :cache_url in /etc/foreman-proxy/settings.d/
registration.yml on capsules that sit behind a load balancer. This
connects each capsule's smart-proxy to the shared Redis/Valkey cache
running on the LB host (installed by the registration_cache role), so
one warm GET /register request on any capsule benefits all nodes in
the pool.

The setting is only applied when the capsule's location has an entry
in the capsule_lbs group, matching the existing conditional pattern
used elsewhere in this role for LB-specific configuration. The LB's
private_ip is used (not its hostname) to avoid DNS round-trips on the
private network.

Cache URL format: redis://<lb_private_ip>:6379/0
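As a rough sketch, the conditional application could look like the task below. Task, variable, and group names here are assumptions for illustration, not the role's actual ones:

```yaml
# Illustrative only -- variable names are assumptions, not the role's own
- name: Point smart-proxy registration at the shared LB cache
  ansible.builtin.lineinfile:
    path: /etc/foreman-proxy/settings.d/registration.yml
    regexp: '^:cache_url:'
    line: ':cache_url: redis://{{ lb_private_ip }}:6379/0'
  when: lb_private_ip is defined   # i.e. the location has a capsule_lbs entry
```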

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>