feat: Add registration performance tooling and LB cache infrastructure #230

Open

pablomh wants to merge 4 commits into redhat-performance:main
Conversation
Adds a standalone Python script (stdlib only, no pip install required)
that parses Foreman production.log to extract per-registration timing
and call-count metrics, correlated by consumer UUID.
Each metric maps to a specific in-flight PR so improvements can be
measured objectively as changes land:
POST /rhsm/consumers duration -> foreman#XXXXX + katello#XXXXX
GET /compliance call count -> katello#XXXXX (compliance caching)
GET /rhsm/status call count -> katello#XXXXX (status caching)
GET /rhsm/consumers redundant -> katello#XXXXX (eliminate redundant GETs)
GET /register P99 -> smart-proxy#XXXXX (script caching)
Input modes:
--inventory FILE SSH to satellite6 hosts via Ansible INI inventory
--sosreport PATH single .tar.xz archive or extracted directory
--sosreport-dir local dir or HTTP URL with multiple archives
(works directly with workdir-exporter URLs)
--log FILE direct path to production.log (plain or .gz)
--compare A B compare two sources and print a diff table
Rotated and gzipped log files handled transparently. HTTP sosreport
archives streamed without writing to disk.
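The UUID-correlation step can be sketched in stdlib-only Python. This is a minimal illustration, not the script itself: real production.log entries are multi-line Rails logs, and here a simplified one-line format "<uuid> <METHOD> <path> <duration_ms>" is assumed purely for demonstration.

```python
import re
from collections import defaultdict

# Hypothetical simplified log format; the real parser handles Rails-style
# multi-line entries and rotated/gzipped files.
LINE_RE = re.compile(
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<method>[A-Z]+)\s+(?P<path>\S+)\s+(?P<ms>\d+)"
)

def collect_metrics(lines):
    """Group per-registration call counts and durations by consumer UUID."""
    metrics = defaultdict(lambda: {"calls": defaultdict(int), "ms": {}})
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip unrelated log lines
        key = (m["method"], m["path"])
        entry = metrics[m["uuid"]]
        entry["calls"][key] += 1       # call-count metrics (e.g. GET /compliance)
        entry["ms"][key] = int(m["ms"])  # last-seen duration per endpoint
    return metrics
```

With metrics keyed by (method, path) per UUID, redundant calls such as repeated GET /compliance requests for one registration show up directly as a count greater than one.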
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Upgrades the host_registration backend (port 9090) from a bare TCP check
to an application-level HTTP health check against the smart-proxy's new
GET /register/health endpoint. The endpoint returns 200 if the capsule
can reach Foreman, 503 if not.

Key design: haproxy_registration_fall defaults to 9999 (effectively
never removes a capsule from rotation). During stress tests a capsule
under heavy load may transiently fail the health check even though it is
still functioning; removing it would concentrate load on the remaining
capsules and cause cascading failures. The health check data is still
visible in HAProxy stats, making it useful for observability without
causing load-induced false failover.

To enable production-grade automatic failover (~90s at 30s interval):

    haproxy_registration_fall: 3   # in inventory or group_vars

ssl verify none skips certificate chain validation for the health check
connection. The smart-proxy uses a Foreman-issued cert which is not
trusted by system CAs; skipping verification avoids distributing the
Foreman CA to LB hosts. The connection is still TLS-encrypted and is
internal to a trusted private network.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
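For orientation, the resulting backend stanza could look roughly like the sketch below. Server names and addresses are placeholders; the actual config is produced by the role's template and may differ in detail.

```
backend host_registration
    mode tcp
    option httpchk GET /register/health
    http-check expect status 200
    # fall 9999 effectively never ejects a capsule; set fall 3 for
    # ~90s automatic failover at a 30s check interval
    server capsule-1 192.0.2.11:9090 check check-ssl verify none inter 30s fall 9999 rise 1
    server capsule-2 192.0.2.12:9090 check check-ssl verify none inter 30s fall 9999 rise 1
```

Note the backend stays in TCP mode for traffic; only the health probe is HTTP(S), which is why check-ssl verify none applies to the probe connection alone.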
Adds an Ansible role that installs and configures Redis or Valkey on
capsule_lbs hosts to serve as the shared script cache for the smart-proxy
registration module (:cache_url setting).
When multiple capsule nodes serve the same registration parameters behind
a load balancer, a single warm request on any node populates the shared
cache so all other nodes can skip the Foreman round-trip immediately.
Package selection is automatic based on os_major_release:
RHEL 9 and earlier → redis (from AppStream)
RHEL 10 and later → valkey (Redis fork shipped in RHEL 10; Redis
removed due to SSPL license change)
Both packages use port 6379 and the same redis:// wire protocol —
no changes needed in capsule configuration when upgrading RHEL.
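The release-based selection described above can be expressed as a single Ansible task; this is an illustrative sketch (task and variable names assumed, apart from os_major_release which the text mentions), not the role's actual task file.

```yaml
# Sketch only: pick valkey on RHEL 10+, redis on RHEL 9 and earlier.
- name: Install the cache server package
  ansible.builtin.package:
    name: "{{ 'valkey' if os_major_release | int >= 10 else 'redis' }}"
    state: present
```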
Configuration applied:
bind 127.0.0.1 <private_ip> allows capsule nodes on private network
maxmemory 64mb registration scripts are ~5-10 KB each
maxmemory-policy allkeys-lru evict least-recently-used on full
Firewall: port 6379/tcp opened on the internal zone.
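The applied settings correspond to a config fragment like the following; 192.0.2.5 stands in for the LB host's private_ip, and the exact file path depends on whether redis or valkey is installed.

```
# Illustrative redis.conf / valkey.conf fragment
bind 127.0.0.1 192.0.2.5        # loopback + private interface for capsules
maxmemory 64mb                   # registration scripts are ~5-10 KB each
maxmemory-policy allkeys-lru     # evict least-recently-used keys when full
```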
After running this playbook, configure each capsule's smart-proxy:
# /etc/foreman-proxy/settings.d/registration.yml
:cache_url: redis://<lb_private_ip>:6379/0
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
After installation, set :cache_url in
/etc/foreman-proxy/settings.d/registration.yml on capsules that sit
behind a load balancer. This connects each capsule's smart-proxy to the
shared Redis/Valkey cache running on the LB host (installed by the
registration_cache role), so one warm GET /register request on any
capsule benefits all nodes in the pool.

The setting is only applied when the capsule's location has an entry in
the capsule_lbs group, matching the existing conditional pattern used
elsewhere in this role for LB-specific configuration. The LB's
private_ip is used (not its hostname) to avoid DNS round-trips on the
private network.

Cache URL format: redis://<lb_private_ip>:6379/0

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
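The conditional wiring could be sketched as below; the group and variable names other than capsule_lbs and private_ip are assumptions, and the role's real tasks may use a different lookup for the matching location.

```yaml
# Hypothetical sketch: apply :cache_url only when an LB exists for this
# capsule's location.
- name: Point smart-proxy at the shared LB cache
  ansible.builtin.lineinfile:
    path: /etc/foreman-proxy/settings.d/registration.yml
    regexp: '^:cache_url:'
    line: ":cache_url: redis://{{ hostvars[groups['capsule_lbs'] | first].private_ip }}:6379/0"
  when: groups['capsule_lbs'] | default([]) | length > 0
```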
Summary

- Adds a standalone Python script (stdlib only) that parses production.log to extract per-registration timing and call-count metrics correlated by consumer UUID. Each metric maps to a specific in-flight optimization PR so improvements can be measured objectively. Supports --inventory (Ansible INI with SSH, journald fallback for foremanctl), --sosreport-dir (local or HTTP URL including workdir-exporter), --no-verify-ssl for internal self-signed certs, --compare, and --cache-stats. Groups output by: Satellite (ssh), Standalone capsules (ssh/mqtt), Load-balanced capsules (ssh).
- Upgrades the host_registration backend (port 9090) from a TCP check to an HTTP check against the smart-proxy's GET /register/health endpoint. haproxy_registration_fall defaults to 9999 (never removes a capsule from rotation during stress tests); set it to 3 for production failover.
- Adds a role installing Redis/Valkey on capsule_lbs hosts for the smart-proxy shared script cache. Configures the bind address and maxmemory, and opens firewall port 6379.
- Adds :cache_url configuration: sets :cache_url: redis://<lb_private_ip>:6379/0 in each capsule's smart-proxy registration settings when a load balancer exists in the same location.

Requirements

- The GET /register/health endpoint must be deployed on capsules before the HAProxy health check is useful.
- :cache_url support (shared Redis cache) must be deployed on capsules before the registration_cache role takes effect.
- capsule_lbs hosts.
- private_ip must be defined for capsule_lbs hosts in the inventory location groups.

Test plan

- ./scripts/registration_metrics.py -i conf/contperf/inventory.red.ini produces metrics grouped by Satellite/Standalone-ssh/Standalone-mqtt
- --sosreport-dir https://workdir-exporter.../sosreport/ --no-verify-ssl fetches and processes archives
- --compare before/ after/ shows percentage changes per metric
- capsule_lbs.yaml: Redis/Valkey running on the LB host, port 6379 open, bound to the private IP
- capsules.yaml: /etc/foreman-proxy/settings.d/registration.yml contains :cache_url: on LB-backed capsules