feat: decoupled prometheus exporter's calculation and output #12383


Merged

Conversation

SkyeYoung (Member) commented Jun 26, 2025:

Description

This PR decouples the calculation and output processes of the Prometheus exporter. The "calculation" is performed in the privileged agent process at intervals defined by refresh_interval (default 15s), with the result written to a shared dict, while the "output" (i.e., the /apisix/prometheus/metrics API) is moved to the worker process, which only reads and returns the cached data from the shared dict.
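The split can be sketched roughly as follows. This is a simplified illustration rather than the PR's exact code: names such as collect_metrics, metrics_api, and CACHED_METRICS_KEY are assumptions, and it only runs inside the OpenResty runtime.

```lua
-- Sketch of the decoupling: the privileged agent computes, workers only read.
local ngx_process = require("ngx.process")

local shdict = ngx.shared["prometheus-cache"]
local CACHED_METRICS_KEY = "metrics"
local refresh_interval = 15  -- seconds; the refresh_interval attribute

-- "calculation": runs only in the privileged agent, rescheduled via ngx.timer.at
local function exporter_timer(premature)
    if premature then
        return
    end
    local res = collect_metrics()  -- the expensive full metrics rendering
    shdict:set(CACHED_METRICS_KEY, res)
    ngx.timer.at(refresh_interval, exporter_timer)
end

-- in init_worker:
if ngx_process.type() == "privileged agent" then
    ngx.timer.at(0, exporter_timer)
end

-- "output": the /apisix/prometheus/metrics handler in any worker
local function metrics_api()
    ngx.header["Content-Type"] = "text/plain"
    return 200, shdict:get(CACHED_METRICS_KEY)
end
```

The point of the split is that the O(number-of-metrics) rendering cost is paid once per interval in one process, instead of on every scrape in every worker.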

The above are just the core changes. In practice I ran into many other problems; those are commented or annotated at the corresponding places and will not be repeated here.

For the testing part, since the Prometheus exporter now refreshes data only every 15 seconds, I used a smaller interval in the relevant tests so the original tests still pass.

Which issue(s) this PR fixes:

Fixes #

Stress Testing

How to

  1. install wrk2, git clone apisix
  2. deploy etcd(./test.sh init-etcd), nginx(as upstream, ./test.sh start-nginx)
  3. make run
  4. create 10k routes(./test.sh create)
  5. enable prometheus(./test.sh enable-prometheus)
  6. run benchmark(./test.sh benchmark)

Results

Key Performance Indicators

| Scenario | CPU Usage | Memory Usage | P50 Latency | P90 Latency | P99 Latency |
|---|---|---|---|---|---|
| Prometheus Disabled | 33.0% | 144.3MB | 3.58ms | 5.99ms | 8.99ms |
| Prometheus Enabled | 38.3% (+5.2%) | 100.2MB (-44.1MB) | 4.22ms (+0.64ms) | 7.07ms (+1.08ms) | 10.66ms (+1.67ms) |
| 1 Metrics Reqs | 38.6% (+5.6%) | 101.3MB (-42.9MB) | 4.25ms (+0.68ms) | 7.87ms (+1.87ms) | 12.17ms (+3.18ms) |
| 3 Metrics Reqs | 39.1% (+6.1%) | 100.1MB (-44.2MB) | 3.88ms (+0.30ms) | 7.17ms (+1.18ms) | 11.92ms (+2.93ms) |

Performance Impact Summary

| Scenario | CPU Impact | Memory Impact | P50 Latency Impact | P90 Latency Impact | P99 Latency Impact |
|---|---|---|---|---|---|
| Prometheus Disabled (Baseline) | 0.0% | 0.0MB | 0.00ms | 0.00ms | 0.00ms |
| Prometheus Enabled | +5.2% | -44.1MB | +0.64ms | +1.08ms | +1.67ms |
| 1 Metrics Reqs | +5.6% | -42.9MB | +0.68ms | +1.87ms | +3.18ms |
| 3 Metrics Reqs | +6.1% | -44.2MB | +0.30ms | +1.18ms | +2.93ms |
(Figures: worker_performance_individual, latency_metrics_individual)

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@@ -454,10 +458,11 @@ local function collect(ctx, stream_only)
     local config = core.config.new()

     -- config server status
-    local vars = ngx.var or {}
-    local hostname = vars.hostname or ""
+    local hostname = core.utils.gethostname() or ""
SkyeYoung (Member, Author) commented Jun 27, 2025:

Because of "API disabled in the context of ngx.timer": ngx.var is unavailable in the timer context.

Comment on lines 224 to 230
-- FIXME:
-- Now the HTTP subsystem loads the stream plugin unintentionally, which shouldn't happen.
-- It breaks the initialization logic of the plugin,
-- here it is temporarily fixed using a workaround.
if ngx.config.subsystem ~= "stream" then
return
end
SkyeYoung (Member, Author) commented Jul 22, 2025:

As mentioned in the comments, the http subsystem also loads the stream plugins. This is an issue that needs to be resolved.


bzp2010 (Contributor) commented Jul 22, 2025:

Please create an issue for this. thx @SkyeYoung

SkyeYoung (Member, Author) commented Jul 23, 2025:

I'll create it a little later

Comment on lines -360 to -367
local enabled = core.table.array_find(http_plugin_names, "prometheus") ~= nil
local active = exporter.get_prometheus() ~= nil
if not enabled then
exporter.destroy()
end
if enabled and not active then
exporter.http_init()
end
Contributor:

Add some description under this comment explaining why we removed it and moved to init and destroy hooks.

SkyeYoung (Member, Author) commented Jul 22, 2025:

The original code skipped the plugin.init() and old_plugin.destroy() hooks used in https://github.com/apache/apisix/blob/6fb9bf94281525c1fca397f681b4890b69440369/apisix/plugin.lua and implemented its own reload of the prometheus plugin, for reasons I have not yet understood (perhaps because prometheus.lua originally did not contain the two functions init and destroy).

The immediate trigger was that even after separating the init_prometheus part and placing it at the end of init_worker, directly calling exporter_timer() would still raise an error. After debugging, I found this other initialization path here, which is clearly redundant.

Currently, we provide init and destroy functions in prometheus.lua, allowing the initialization and reloading of the prometheus plugin to be handled within the plugin's own files, reducing coupling.

This also allows the prometheus plugin to revert to the mechanism provided by plugin.lua, reducing special cases, lowering the cost of understanding, and making the code easier to maintain.

-        require("apisix.plugins.prometheus.exporter").http_init(prometheus_enabled_in_stream)
-    elseif not is_http and core.table.array_find(stream_plugin_names, "prometheus") then
-        require("apisix.plugins.prometheus.exporter").stream_init()
+    if is_http and (enabled_in_http or enabled_in_stream) then
bzp2010 (Contributor) commented Jul 22, 2025:

NOTE

We will always only handle metrics generation in the http subsystem.

  1. This will ensure that there is no duplication of execution on http and stream to waste compute resources.
  2. This simplifies the design.
  3. Whether or not the user has http enabled (i.e., even in stream-only mode), an http block for the Prometheus export API and its server block (:9091) will always be present; otherwise Prometheus would be pointless. This means we can always rely on an http-subsystem context for the periodic timers and metrics generation, even in stream-only mode.

bzp2010 (Contributor) commented Jul 22, 2025:

Please add some comments to the code to document the design intent. @SkyeYoung

SkyeYoung (Member, Author):

done.

Contributor:

It might be better to add a link to this PR comment here.

https://github.com/apache/apisix/pull/12383/files#r2221993953

SkyeYoung (Member, Author):

@bzp2010 I think this part of the code can be found in the modification history, just like the old code.

@@ -35,6 +34,7 @@ local _M = {
     priority = 500,
     name = plugin_name,
     log = exporter.http_log,
+    destroy = exporter.destroy,
Contributor:

NOTE

This will always destroy the plugin (the prometheus instance in it) when it is reloaded via the Admin API, then load it again based on the latest configuration.
If a reload is performed after the plugin has been removed from the config list, the plugin will not be restored until the next reload.

Technically, exporter.destroy just backs up that instance of the prometheus module and copies it to another variable.
This will cause the export API to stop working, at which point it will always return a {}, which is consistent with the current behavior.
Under the hood, the timer will also stop working, no longer generating metrics based on interval timing, and the metrics computation overhead introduced by APISIX is completely eliminated.
When the next plugin reload occurs, if prometheus is re-enabled, the timer will resume running.

Regarding the background timer introduced by the prometheus third-party library, unfortunately, it never stops running.
It is registered with ngx.timer.every to perform the task of synchronizing the shdict at regular intervals, and this overhead cannot be paused or resumed by external intervention unless we fork and modify the library itself.

So this "destruction" does not mean that the prometheus instance is actually destroyed, the synchronization timer stopped, or the shdict cleared: none of that happens.

@@ -55,4 +55,11 @@ function _M.api()
end


function _M.init()
Contributor:

NOTE

We turned to using the built-in hooks of the plugin system, namely init to initialize the prometheus instance and prometheus metrics registration.

Note, however, that this initialization only registers the metrics; it does not actually populate the data.
Data population happens the first time the plugin starts (usually when the worker starts, i.e. the init_prometheus call in init.lua's http_init_worker) and on every timer run.

function _M.init()
local local_conf = core.config.local_conf()
local enabled_in_stream = core.table.array_find(local_conf.stream_plugins, "prometheus")
exporter.http_init(enabled_in_stream)
Contributor:

NOTE

The prometheus plugin, loaded by the http subsystem, will register http metrics there, and will decide whether to register stream metrics (xrpc) depending on whether the stream subsystem has been started.
This is mainly for metrics generation needs in privileged processes, stream data is not really reported at any phase in the http subsystem.

local version, err = config:server_version()
if version then
metrics.etcd_reachable:set(1)
if yieldable then
Contributor:

NOTE

The metrics include the etcd reachability report and the etcd latest modified index report, both of which rely on communication with etcd.
Under openresty's restrictions, yielding is prohibited in the init_worker phase, i.e. cosocket-based communication with etcd is not allowed there.
So we skip this capture here until the timer performs it.
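That skip can be sketched as follows (an illustration under assumed names, not the PR's exact code; it requires the OpenResty runtime):

```lua
-- Sketch: etcd-dependent metrics need cosockets, which init_worker forbids.
-- yieldable is false on the first capture (init_worker) and true inside
-- timer callbacks, where cosockets are allowed.
local function collect_etcd_metrics(yieldable)
    if not yieldable then
        -- first capture in init_worker: skip; the timer will fill this in
        return
    end
    local version, err = config:server_version()  -- cosocket call to etcd
    if version then
        metrics.etcd_reachable:set(1)
    else
        metrics.etcd_reachable:set(0)
        core.log.error("failed to reach etcd: ", err)
    end
end
```

Until the first timer run completes, the etcd-related series simply hold their initial values.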

return
end

if not prometheus then
Contributor:

NOTE

This is used for dynamically disabling the plugin (via the plugin reload API), i.e. what exporter.destroy does.
This is where we stop if the prometheus instance has been "destroyed"; as you can see, this happens before the next timer task is scheduled, which means the timer will stop.

Technically, this is the advantage of ngx.timer.at over ngx.timer.every: every is not terminable, because the developer cannot get a handle on the timer to pause or stop it.
By using ngx.timer.at, we can precisely control whether to schedule the next task, which lets us stop the timer. To resume it, just re-execute ngx.timer.at(0).
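The stop/resume behaviour described here can be sketched like this (an illustration, assuming a module-level prometheus variable that exporter.destroy sets to nil and a refresh_cached_metrics helper; not the PR's exact code):

```lua
-- Sketch: an ngx.timer.at-based loop that stops itself when the plugin
-- has been "destroyed" (prometheus set to nil by exporter.destroy).
local function exporter_timer(premature)
    if premature then
        return  -- the worker is exiting
    end
    if not prometheus then
        return  -- plugin disabled: don't reschedule, so the timer stops
    end

    refresh_cached_metrics()  -- render metrics and write them to the shdict

    local ok, err = ngx.timer.at(refresh_interval, exporter_timer)
    if not ok then
        core.log.error("failed to schedule the next metrics refresh: ", err)
    end
end

-- initial kick-off; re-enabling the plugin just runs this line again
ngx.timer.at(0, exporter_timer)
```

With ngx.timer.every the callback would keep firing forever; the at-based chain makes "stop" a simple early return.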

return
end

exporter_timer(false, false)
bzp2010 (Contributor) commented Jul 22, 2025:

NOTE

The initialization of the timer will perform an acquisition task synchronously, i.e. the first acquisition will always occur in the init_worker phase, which provides initial access to the metrics data.

If at any time the metrics data (the string in the prometheus-cache shdict) is unavailable, the API will report an error and log it. By design, this is very unlikely to happen.

local cached_metrics_text = shdict_prometheus_cache:get(CACHED_METRICS_KEY)
if not cached_metrics_text then
core.log.error("Failed to retrieve cached metrics: data is nil")
return 500, "Failed to retrieve metrics: no data available"
SkyeYoung (Member, Author):

done

if not prometheus then
core.response.exit(200, "{}")
return core.response.exit(200, "{}")
Contributor:

JFI, this behavior seems to be inconsistent with what is in get_cached_metrics and we need to confirm which mode should be used. cc @membphis

BTW, prometheus being nil will happen when the plugin is dynamically disabled.

Member:

the current way is good to me


@@ -170,6 +170,7 @@ nginx_config: # Config for render the template to generate n
   meta:
     lua_shared_dict: # Nginx Lua shared memory zone. Size units are m or k.
       prometheus-metrics: 15m
+      prometheus-cache: 10m
Member:

pls add some comments, tell users when they need to modify it

SkyeYoung (Member, Author):

done.

core.log.error("Failed to collect metrics: ", res)
return
end
shdict_prometheus_cache:set(CACHED_METRICS_KEY, res)
Member:

need to capture the return value, it may fail

if there is an err, tell the user the reason, and tell them to increase the default size if the shdict is too small

SkyeYoung (Member, Author):

done

@membphis membphis self-requested a review July 23, 2025 01:34
@SkyeYoung SkyeYoung requested a review from bzp2010 July 23, 2025 01:44
local _, err, forcible = shdict_prometheus_cache:set(CACHED_METRICS_KEY, res)

if err then
core.log.error("Failed to save metrics to shdict: ", err)
Member:

need more information, eg: the name of shdict and the size of value

Member:

if the value is too large, the call to the set method will fail
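A minimal sketch of the handling being requested (the log wording and the forcible warning are assumptions; shdict:set does return success, err, forcible per the lua_shared_dict API):

```lua
-- Sketch: surface shdict:set failures with enough context to act on.
local ok, err, forcible = shdict_prometheus_cache:set(CACHED_METRICS_KEY,
                                                      res, cache_exptime)
if not ok then
    core.log.error("failed to save metrics to shdict prometheus-cache: ", err,
                   ", value size: ", #res, " bytes; consider increasing the ",
                   "lua_shared_dict prometheus-cache size in config.yaml")
elseif forcible then
    -- set succeeded but evicted other valid entries: the zone is under-sized
    core.log.warn("prometheus-cache shdict is low on memory, ",
                  "consider increasing its size")
end
```

Checking forcible as well as err catches the case where the write succeeds only by evicting other entries, which is an early warning before outright "no memory" failures.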

core.response.set_header("content_type", "text/plain")
return 200, core.table.concat(prometheus:metric_data())
local cached_metrics_text = shdict_prometheus_cache:get(CACHED_METRICS_KEY)
Member:

just confirm, no "err" is returned?

SkyeYoung (Member, Author) commented Jul 23, 2025:

I checked that part of the code and found that the second return value, documented as "flags", actually carries an error when an error occurs.

Let me fix this part of the code.

@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jul 23, 2025
@SkyeYoung SkyeYoung requested a review from membphis July 23, 2025 04:43
membphis
membphis previously approved these changes Jul 23, 2025
membphis (Member) left a comment:

LGTM


-- Clear the cached data after cache_exptime to prevent stale data in case of an error.
local _, err, forcible = shdict_prometheus_cache:set(CACHED_METRICS_KEY, res, cache_exptime)

Member:

remove this blank line

SkyeYoung (Member, Author):

done


@SkyeYoung SkyeYoung merged commit 30ae5df into apache:master Jul 25, 2025
39 of 40 checks passed
@SkyeYoung SkyeYoung deleted the young/perf/prometheus-exporter-concurrency branch July 25, 2025 01:30
Labels
enhancement New feature or request size:XL This PR changes 500-999 lines, ignoring generated files.
4 participants