feat: Add registration performance tooling and LB cache infrastructure #230

Open

pablomh wants to merge 4 commits into redhat-performance:main
Conversation
Adds a standalone Python script (stdlib only, no pip install required)
that parses Foreman production.log to extract per-registration timing
and call-count metrics, correlated by consumer UUID.
Each metric maps to a specific in-flight PR so improvements can be
measured objectively as changes land:
POST /rhsm/consumers duration -> foreman#XXXXX + katello#XXXXX
GET /compliance call count -> katello#XXXXX (compliance caching)
GET /rhsm/status call count -> katello#XXXXX (status caching)
GET /rhsm/consumers redundant -> katello#XXXXX (eliminate redundant GETs)
GET /register P99 -> smart-proxy#XXXXX (script caching)
Input modes:
--inventory FILE SSH to satellite6 hosts via Ansible INI inventory
--sosreport PATH single .tar.xz archive or extracted directory
--sosreport-dir local dir or HTTP URL with multiple archives
(works directly with workdir-exporter URLs)
--log FILE direct path to production.log (plain or .gz)
--compare A B compare two sources and print a diff table
Rotated and gzipped log files handled transparently. HTTP sosreport
archives streamed without writing to disk.
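The UUID-correlation step can be sketched in stdlib-only Python. This is a minimal illustration, not the script itself: real production.log entries are multi-line Rails logs, and here a simplified one-line format "<uuid> <METHOD> <path> <duration_ms>" is assumed purely for demonstration.

```python
import re
from collections import defaultdict

# Hypothetical simplified log format; the real parser handles Rails-style
# multi-line entries and rotated/gzipped files.
LINE_RE = re.compile(
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<method>[A-Z]+)\s+(?P<path>\S+)\s+(?P<ms>\d+)"
)

def collect_metrics(lines):
    """Group per-registration call counts and durations by consumer UUID."""
    metrics = defaultdict(lambda: {"calls": defaultdict(int), "ms": {}})
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip unrelated log lines
        key = (m["method"], m["path"])
        entry = metrics[m["uuid"]]
        entry["calls"][key] += 1       # call-count metrics (e.g. GET /compliance)
        entry["ms"][key] = int(m["ms"])  # last-seen duration per endpoint
    return metrics
```

With metrics keyed by (method, path) per UUID, redundant calls such as repeated GET /compliance requests for one registration show up directly as a count greater than one.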
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Upgrades the host_registration backend (port 9090) from a bare TCP check
to an application-level HTTP health check against the smart-proxy's new
GET /register/health endpoint. The endpoint returns 200 if the capsule
can reach Foreman, 503 if not.

Key design: haproxy_registration_fall defaults to 9999 (effectively
never removes a capsule from rotation). During stress tests a capsule
under heavy load may transiently fail the health check even though it is
still functioning; removing it would concentrate load on the remaining
capsules and cause cascading failures. The health check data is still
visible in HAProxy stats, making it useful for observability without
causing load-induced false failover.

To enable production-grade automatic failover (~90s at 30s interval):

    haproxy_registration_fall: 3   # in inventory or group_vars

ssl verify none skips certificate chain validation for the health check
connection. The smart-proxy uses a Foreman-issued cert which is not
trusted by system CAs; skipping verification avoids distributing the
Foreman CA to LB hosts. The connection is still TLS-encrypted and is
internal to a trusted private network.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
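For orientation, the resulting backend stanza could look roughly like the sketch below. Server names and addresses are placeholders; the actual config is produced by the role's template and may differ in detail.

```
backend host_registration
    mode tcp
    option httpchk GET /register/health
    http-check expect status 200
    # fall 9999 effectively never ejects a capsule; set fall 3 for
    # ~90s automatic failover at a 30s check interval
    server capsule-1 192.0.2.11:9090 check check-ssl verify none inter 30s fall 9999 rise 1
    server capsule-2 192.0.2.12:9090 check check-ssl verify none inter 30s fall 9999 rise 1
```

Note the backend stays in TCP mode for traffic; only the health probe is HTTP(S), which is why check-ssl verify none applies to the probe connection alone.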
Adds an Ansible role that installs and configures Redis or Valkey on
capsule_lbs hosts to serve as the shared script cache for the smart-proxy
registration module (:cache_url setting).
When multiple capsule nodes serve the same registration parameters behind
a load balancer, a single warm request on any node populates the shared
cache so all other nodes can skip the Foreman round-trip immediately.
Package selection is automatic based on os_major_release:
RHEL 9 and earlier → redis (from AppStream)
RHEL 10 and later → valkey (Redis fork shipped in RHEL 10; Redis
removed due to SSPL license change)
Both packages use port 6379 and the same redis:// wire protocol —
no changes needed in capsule configuration when upgrading RHEL.
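The release-based selection described above can be expressed as a single Ansible task; this is an illustrative sketch (task and variable names assumed, apart from os_major_release which the text mentions), not the role's actual task file.

```yaml
# Sketch only: pick valkey on RHEL 10+, redis on RHEL 9 and earlier.
- name: Install the cache server package
  ansible.builtin.package:
    name: "{{ 'valkey' if os_major_release | int >= 10 else 'redis' }}"
    state: present
```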
Configuration applied:
bind 127.0.0.1 <private_ip> allows capsule nodes on private network
maxmemory 64mb registration scripts are ~5-10 KB each
maxmemory-policy allkeys-lru evict least-recently-used on full
Firewall: port 6379/tcp opened on the internal zone.
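The applied settings correspond to a config fragment like the following; 192.0.2.5 stands in for the LB host's private_ip, and the exact file path depends on whether redis or valkey is installed.

```
# Illustrative redis.conf / valkey.conf fragment
bind 127.0.0.1 192.0.2.5        # loopback + private interface for capsules
maxmemory 64mb                   # registration scripts are ~5-10 KB each
maxmemory-policy allkeys-lru     # evict least-recently-used keys when full
```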
After running this playbook, configure each capsule's smart-proxy:
# /etc/foreman-proxy/settings.d/registration.yml
:cache_url: redis://<lb_private_ip>:6379/0
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
After installation, set :cache_url in
/etc/foreman-proxy/settings.d/registration.yml on capsules that sit
behind a load balancer. This connects each capsule's smart-proxy to the
shared Redis/Valkey cache running on the LB host (installed by the
registration_cache role), so one warm GET /register request on any
capsule benefits all nodes in the pool.

The setting is only applied when the capsule's location has an entry in
the capsule_lbs group, matching the existing conditional pattern used
elsewhere in this role for LB-specific configuration. The LB's
private_ip is used (not its hostname) to avoid DNS round-trips on the
private network.

Cache URL format: redis://<lb_private_ip>:6379/0

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
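The conditional wiring could be sketched as below; the group and variable names other than capsule_lbs and private_ip are assumptions, and the role's real tasks may use a different lookup for the matching location.

```yaml
# Hypothetical sketch: apply :cache_url only when an LB exists for this
# capsule's location.
- name: Point smart-proxy at the shared LB cache
  ansible.builtin.lineinfile:
    path: /etc/foreman-proxy/settings.d/registration.yml
    regexp: '^:cache_url:'
    line: ":cache_url: redis://{{ hostvars[groups['capsule_lbs'] | first].private_ip }}:6379/0"
  when: groups['capsule_lbs'] | default([]) | length > 0
```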
Summary

- Adds a standalone Python script (stdlib only) that parses production.log to extract per-registration timing and call-count metrics correlated by consumer UUID. Each metric maps to a specific in-flight optimization PR so improvements can be measured objectively. Supports --inventory (Ansible INI with SSH, journald fallback for foremanctl), --sosreport-dir (local or HTTP URL including workdir-exporter), --no-verify-ssl for internal self-signed certs, --compare, and --cache-stats. Groups output by: Satellite (ssh), Standalone capsules (ssh/mqtt), Load-balanced capsules (ssh).
- Upgrades the host_registration backend (port 9090) from a TCP check to an HTTP check against the smart-proxy's GET /register/health endpoint. haproxy_registration_fall defaults to 9999 (never removes a capsule from rotation during stress tests); set it to 3 for production failover.
- Adds a role installing Redis/Valkey on capsule_lbs hosts for the smart-proxy shared script cache. Configures the bind address and maxmemory, and opens firewall port 6379.
- Adds :cache_url configuration: sets :cache_url: redis://<lb_private_ip>:6379/0 in each capsule's smart-proxy registration settings when a load balancer exists in the same location.

Requirements

- The GET /register/health endpoint must be deployed on capsules before the HAProxy health check is useful.
- :cache_url support (shared Redis cache) must be deployed on capsules before the registration_cache role takes effect.
- capsule_lbs hosts.
- private_ip must be defined for capsule_lbs hosts in the inventory location groups.

Test plan

- ./scripts/registration_metrics.py -i conf/contperf/inventory.red.ini produces metrics grouped by Satellite/Standalone-ssh/Standalone-mqtt
- --sosreport-dir https://workdir-exporter.../sosreport/ --no-verify-ssl fetches and processes archives
- --compare before/ after/ shows percentage changes per metric
- capsule_lbs.yaml: Redis/Valkey running on the LB host, port 6379 open, bound to the private IP
- capsules.yaml: /etc/foreman-proxy/settings.d/registration.yml contains :cache_url: on LB-backed capsules