Add new metrics #342

Merged 10 commits on Feb 21, 2022

13 changes: 13 additions & 0 deletions .github/workflows/test.yml
@@ -18,15 +18,28 @@ jobs:
- "2.5"
- "2.6"
- "2.7"
- "2.8"
cartridge: [ "", "1.2.0", "2.1.2", "2.4.0", "2.5.1", "2.6.0", "2.7.3" ]
include:
- tarantool: "2.x-latest"
cartridge: "2.7.3"
- tarantool: "2.x-latest"
cartridge: ""
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2

- uses: tarantool/setup-tarantool@v1
if: matrix.tarantool != '2.x-latest'
with:
tarantool-version: ${{ matrix.tarantool }}

- name: Install latest pre-release Tarantool 2.x
if: matrix.tarantool == '2.x-latest'
run: |
curl -L https://tarantool.io/pre-release/2/installer.sh | bash
sudo apt-get -y install tarantool

- name: lint
run: make lint
env:

10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -28,6 +28,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `tnt_cpu_number` (same as `tnt_cpu_count`)
- `tnt_cpu_time` (same as `tnt_cpu_total`)
- `tnt_vinyl_scheduler_dump_total` (same as `tnt_vinyl_scheduler_dump_count`)
- `tnt_replication_lag`
- `tnt_vinyl_regulator_blocked_writers`
- `tnt_net_requests_in_progress_total`
- `tnt_net_requests_in_progress_current`
- `tnt_net_requests_in_stream_total`
- `tnt_net_requests_in_stream_current`
- `tnt_replication_lsn`

### Deprecated

@@ -43,6 +50,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `tnt_cpu_count`
- `tnt_cpu_total`
- `tnt_vinyl_scheduler_dump_count`
- `tnt_replication_<id>_lag`
- `tnt_replication_master_<id>_lsn`
- `tnt_replication_replica_<id>_lsn`

## [0.12.0] - 2021-11-18
### Changed

2 changes: 1 addition & 1 deletion Makefile
@@ -4,7 +4,7 @@ rpm:

 .rocks: metrics-scm-1.rockspec
 	tarantoolctl rocks make
-	tarantoolctl rocks install luatest 0.5.6
+	tarantoolctl rocks install luatest 0.5.7
 	tarantoolctl rocks install luacov 0.13.0
 	tarantoolctl rocks install luacheck 0.26.0
 	if [ -z $(CARTRIDGE_VERSION) ]; then \

46 changes: 40 additions & 6 deletions doc/monitoring/metrics_reference.rst
@@ -219,6 +219,33 @@ Requests:
* - ``tnt_net_requests_current``
- Number of pending network requests

Requests in progress:

.. container:: table

.. list-table::
:widths: 25 75
:header-rows: 0

* - ``tnt_net_requests_in_progress_total``
- Total count of requests processed by tx thread
Member

`requests_in_progress` is described as "count of requests processed".

It seems like either a mistake or bad naming.

* - ``tnt_net_requests_in_progress_current``
- Count of requests currently being processed in the tx thread

Requests placed in queues of streams:

.. container:: table

.. list-table::
:widths: 25 75
:header-rows: 0

* - ``tnt_net_requests_in_stream_total``
- Total count of requests, which was placed in queues of streams
for all time
Member

Suggested change
- for all time
+ for the whole time

Member

Or "for the instance lifetime". We probably have many metrics like this; choose something consistent.

* - ``tnt_net_requests_in_stream_current``
- Count of requests currently waiting in queues of streams

.. _metrics-reference-fibers:

Fibers
@@ -316,12 +343,16 @@ Learn more about :ref:`replication in Tarantool <replication-mechanism>`.
- LSN number in vclock.
This metric always has the label ``{id="id"}``,
where ``id`` is the instance's number in the replica set.
* - ``tnt_replication_replica_<id>_lsn`` / ``tnt_replication_master_<id>_lsn``
- LSN of the master/replica, where
``id`` is the instance's number in the replica set.
* - ``tnt_replication_<id>_lag``
- Replication lag value in seconds, where
``id`` is the instance's number in the replica set.
* - ``tnt_replication_lsn``
- LSN of the tarantool instance.
Member

Suggested change
- - LSN of the tarantool instance.
+ - LSN of a tarantool instance.

This metric always has labels ``{id="id", type="type"}``, where
``id`` is the instance's number in the replica set,
``type`` is ``master`` or ``replica``.
* - ``tnt_replication_lag``
- Replication lag value in seconds.
This metric always has labels ``{id="id", stream="stream"}``,
where ``id`` is the instance's number in the replica set,
``stream`` is ``downstream`` or ``upstream``.

.. _metrics-reference-runtime:

@@ -556,6 +587,9 @@ efficient.
The value is slightly smaller
than the amount of memory allocated for vinyl trees,
reflected in the :ref:`vinyl_memory <cfg_storage-vinyl_memory>` parameter.
* - ``tnt_vinyl_regulator_blocked_writers``
- The number of fibers that are blocked waiting
for Vinyl level0 memory quota.

Transactional activity
~~~~~~~~~~~~~~~~~~~~~~

3 changes: 2 additions & 1 deletion metrics/init.lua
@@ -126,7 +126,8 @@ return {
     enable_default_metrics = function(include, exclude)
         log.warn('metrics tnt_net_sent_rps, tnt_net_received_rps, tnt_net_connections_rps, '..
             'tnt_net_requests_rps, tnt_stats_op_rps, tnt_space_count, tnt_fiber_count, ' ..
-            'lj_gc_total, tnt_cpu_count, tnt_cpu_total, tnt_vinyl_scheduler_dump_count ' ..
+            'lj_gc_total, tnt_cpu_count, tnt_cpu_total, tnt_vinyl_scheduler_dump_count, ' ..
+            'tnt_replication_<id>_lag, tnt_replication_master/replica_<id>_lsn ' ..
             'are deprecated and will be removed in next releases.')
         require('metrics.tarantool').enable(include, exclude)
     end,

12 changes: 9 additions & 3 deletions metrics/tarantool/info.lua
@@ -23,9 +23,15 @@ local function update_info_metrics()

     for k, v in ipairs(info.replication) do
         if v.upstream ~= nil then
-            local metric_name = 'replication_' .. k .. '_lag'
-            collectors_list[metric_name] =
-                utils.set_gauge(metric_name, 'Replication lag for instance ' .. k, v.upstream.lag)
+            local metric_name_old = 'replication_' .. k .. '_lag'
+            collectors_list[metric_name_old] =
+                utils.set_gauge(metric_name_old, 'Replication lag for instance ' .. k, v.upstream.lag)
+            collectors_list.replication_lag =
+                utils.set_gauge('replication_lag', 'Replication lag', v.upstream.lag, {stream = 'upstream', id = k})
         end
+        if v.downstream ~= nil then
+            collectors_list.replication_lag =
+                utils.set_gauge('replication_lag', 'Replication lag', v.downstream.lag, {stream = 'downstream', id = k})
+        end
     end
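
The change above replaces per-instance metric names such as `replication_<id>_lag` with a single `replication_lag` series distinguished by labels. The sketch below illustrates that idea in plain Lua. The `set_gauge` and `format_sample` functions are hypothetical stand-ins, not the repository's `utils.set_gauge`, and the lag values are made up for illustration.

```lua
-- Hypothetical in-memory registry: metric name -> list of {labels, value} samples.
local registry = {}

-- Stand-in for a labeled gauge setter (NOT the repository's utils.set_gauge).
local function set_gauge(name, value, labels)
    registry[name] = registry[name] or {}
    table.insert(registry[name], {labels = labels or {}, value = value})
end

-- Render one sample as `name{k="v",...} value`, with label keys sorted
-- so that the output is deterministic.
local function format_sample(name, sample)
    local keys = {}
    for k in pairs(sample.labels) do
        table.insert(keys, k)
    end
    table.sort(keys)
    local parts = {}
    for _, k in ipairs(keys) do
        table.insert(parts, string.format('%s="%s"', k, tostring(sample.labels[k])))
    end
    return string.format('%s{%s} %s', name, table.concat(parts, ','), tostring(sample.value))
end

-- Made-up replication info shaped like the table the loop in the diff iterates over.
local replication = {
    {upstream = {lag = 0.5}},
    {downstream = {lag = 1.5}},
}

for k, v in ipairs(replication) do
    if v.upstream ~= nil then
        set_gauge('replication_lag', v.upstream.lag, {stream = 'upstream', id = k})
    end
    if v.downstream ~= nil then
        set_gauge('replication_lag', v.downstream.lag, {stream = 'downstream', id = k})
    end
end

local lines = {}
for _, sample in ipairs(registry.replication_lag) do
    table.insert(lines, format_sample('replication_lag', sample))
end
print(table.concat(lines, '\n'))
```

With labels, a dashboard query can select or aggregate over `id` and `stream` without knowing instance numbers in advance, which is not possible when the id is baked into the metric name.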

19 changes: 19 additions & 0 deletions metrics/tarantool/network.lua
@@ -41,6 +41,25 @@ local function update_network_metrics()
collectors_list.net_requests_current =
utils.set_gauge('net_requests_current', 'Pending requests', box_stat_net.REQUESTS.current)
end

if box_stat_net.REQUESTS_IN_PROGRESS ~= nil then
collectors_list.net_requests_in_progress_total =
utils.set_counter('net_requests_in_progress_total', 'Requests in progress total amount',
box_stat_net.REQUESTS_IN_PROGRESS.total)
collectors_list.net_requests_in_progress_current =
utils.set_gauge('net_requests_in_progress_current',
'Count of requests currently being processed in the tx thread', box_stat_net.REQUESTS_IN_PROGRESS.current)
end

if box_stat_net.REQUESTS_IN_STREAM_QUEUE ~= nil then
collectors_list.net_requests_in_stream_total =
utils.set_counter('net_requests_in_stream_queue_total',
'Total count of requests, which was placed in queues of streams',
box_stat_net.REQUESTS_IN_STREAM_QUEUE.total)
collectors_list.net_requests_in_stream_current =
utils.set_gauge('net_requests_in_stream_queue_current',
'count of requests currently waiting in queues of streams', box_stat_net.REQUESTS_IN_STREAM_QUEUE.current)
end
end
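
In the hunk above, the `collectors_list` keys for the stream-queue metrics (`net_requests_in_stream_total`, `net_requests_in_stream_current`) differ from the names passed to `utils.set_counter` and `utils.set_gauge` (`net_requests_in_stream_queue_total`, `net_requests_in_stream_queue_current`). One way to prevent that kind of drift is to derive the key from the metric name. This is a minimal sketch, assuming a hypothetical `register` wrapper and stand-in setter functions rather than the repository's real `utils` module; the stat values are made up.

```lua
-- Stand-in setters that only record what they were called with
-- (NOT the repository's utils.set_counter / utils.set_gauge).
local function set_counter(name, help, value)
    return {kind = 'counter', name = name, help = help, value = value}
end

local function set_gauge(name, help, value)
    return {kind = 'gauge', name = name, help = help, value = value}
end

-- Hypothetical wrapper: store the collector under the same name that is
-- passed to the setter, so the key and the metric name cannot diverge.
local function register(collectors, setter, name, help, value)
    collectors[name] = setter(name, help, value)
    return collectors[name]
end

-- Made-up stand-in for the REQUESTS_IN_STREAM_QUEUE entry read in the diff.
local box_stat_net = {
    REQUESTS_IN_STREAM_QUEUE = {total = 42, current = 3},
}

local collectors_list = {}
local stat = box_stat_net.REQUESTS_IN_STREAM_QUEUE
if stat ~= nil then
    register(collectors_list, set_counter, 'net_requests_in_stream_queue_total',
        'Total count of requests placed in stream queues', stat.total)
    register(collectors_list, set_gauge, 'net_requests_in_stream_queue_current',
        'Count of requests currently waiting in stream queues', stat.current)
end

-- Every key now equals the registered metric name.
for key, collector in pairs(collectors_list) do
    assert(key == collector.name)
    print(key, collector.kind, collector.value)
end
```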


9 changes: 9 additions & 0 deletions metrics/tarantool/replicas.lua
@@ -16,6 +16,9 @@ local function update_replicas_metrics()
local lsn = replication_info.lsn
local metric_name = 'replication_replica_' .. k .. '_lsn'
collectors_list[metric_name] = utils.set_gauge(metric_name, 'lsn for replica ' .. k, lsn - v)

collectors_list.replication_lsn =
utils.set_gauge('replication_lsn', 'lsn for instance', lsn - v, {type = 'replica', id = k})
end
end
else
@@ -29,6 +32,12 @@ local function update_replicas_metrics()
'lsn for master ' .. k,
current_box_info.lsn - lsn
)
collectors_list.replication_lsn = utils.set_gauge(
'replication_lsn',
'lsn for instance',
current_box_info.lsn - lsn,
{type = 'master', id = k}
)
end
end
end

5 changes: 5 additions & 0 deletions metrics/tarantool/vinyl.lua
@@ -24,6 +24,11 @@ local function update()
collectors_list.vinyl_regulator_dump_watermark =
utils.set_gauge('vinyl_regulator_dump_watermark', 'Point when dumping must occur',
vinyl_stat.regulator.dump_watermark)
if vinyl_stat.regulator.blocked_writers ~= nil then
collectors_list.vinyl_regulator_blocked_writers =
utils.set_gauge('vinyl_regulator_blocked_writers', 'The number of fibers that are blocked waiting ' ..
'for Vinyl level0 memory quota', vinyl_stat.regulator.blocked_writers)
end

collectors_list.vinyl_tx_conflict =
utils.set_gauge('vinyl_tx_conflict', 'Count of transaction conflicts', vinyl_stat.tx.conflict)

7 changes: 6 additions & 1 deletion test/tarantool/vinyl_test.lua
@@ -26,5 +26,10 @@ g.test_vinyl_metrics_present = function()
     local metrics_cnt = fun.iter(metrics.collect()):filter(function(x)
         return x.metric_name:find('tnt_vinyl')
     end):length()
-    t.assert_equals(metrics_cnt, 20)
+    if utils.is_version_less(_TARANTOOL, '2.8.3')
+    and utils.is_version_greater(_TARANTOOL, '2.0.0') then
+        t.assert_equals(metrics_cnt, 20)
+    else
+        t.assert_equals(metrics_cnt, 21)
Member

Well, should we really get there if Tarantool is 1.10.x? The condition says that we will.

Contributor Author

Yes, it seems that was backported to the latest 1.10 version.

Member

Oh, that's interesting. But I still think we should add a strict version comparison here for 1.10.

+    end
 end
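
The review thread above questions gating the expected count on version ranges. An alternative, shown here only as a sketch and not as what the PR does, is to derive the expectation from the same nil check that `metrics/tarantool/vinyl.lua` applies to `regulator.blocked_writers`. The `expected_vinyl_metrics` helper and the stat tables below are hypothetical; in a real test the table would presumably come from `box.stat.vinyl()`.

```lua
-- Count asserted by the test before this PR.
local BASE_VINYL_METRICS = 20

-- Hypothetical helper: expected number of tnt_vinyl* metrics, given a table
-- shaped like the vinyl statistics read in metrics/tarantool/vinyl.lua.
local function expected_vinyl_metrics(vinyl_stat)
    local expected = BASE_VINYL_METRICS
    -- The collector registers the extra gauge only when the field exists,
    -- so the test can mirror that check instead of comparing versions.
    if vinyl_stat.regulator.blocked_writers ~= nil then
        expected = expected + 1
    end
    return expected
end

-- Made-up stat tables standing in for two different Tarantool builds.
local without_field = {regulator = {dump_watermark = 0}}
local with_field = {regulator = {dump_watermark = 0, blocked_writers = 0}}

print(expected_vinyl_metrics(without_field))
print(expected_vinyl_metrics(with_field))
```

The trade-off is that such a test mirrors the collector's own condition, so it cannot catch a mistake in that condition; the strict version comparison the reviewer asks for would.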

12 changes: 8 additions & 4 deletions test/utils.lua
@@ -62,9 +62,13 @@ function utils.find_metric(metric_name, metrics_data)
     return #m > 0 and m or nil
 end

+local function to_number_multiple(...)
+    return unpack(fun.map(tonumber, {...}):totable())
+end
+
 function utils.is_version_less(ver_str, reference_ver_str)
-    local major, minor, patch = string.match(ver_str, '^(%d+).(%d+).(%d+)')
-    local ref_major, ref_minor, ref_patch = string.match(reference_ver_str, '^(%d+).(%d+).(%d+)')
+    local major, minor, patch = to_number_multiple(string.match(ver_str, '^(%d+).(%d+).(%d+)'))
+    local ref_major, ref_minor, ref_patch = to_number_multiple(string.match(reference_ver_str, '^(%d+).(%d+).(%d+)'))

     if ( major < ref_major ) or ( major == ref_major and minor < ref_minor) or
         ( major == ref_major and minor == ref_minor and patch < ref_patch) then
@@ -75,8 +79,8 @@ function utils.is_version_less(ver_str, reference_ver_str)
 end

 function utils.is_version_greater(ver_str, reference_ver_str)
-    local major, minor, patch = string.match(ver_str, '^(%d+).(%d+).(%d+)')
-    local ref_major, ref_minor, ref_patch = string.match(reference_ver_str, '^(%d+).(%d+).(%d+)')
+    local major, minor, patch = to_number_multiple(string.match(ver_str, '^(%d+).(%d+).(%d+)'))
+    local ref_major, ref_minor, ref_patch = to_number_multiple(string.match(reference_ver_str, '^(%d+).(%d+).(%d+)'))

     if ( major > ref_major ) or ( major == ref_major and minor > ref_minor) or
         ( major == ref_major and minor == ref_minor and patch > ref_patch) then
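
The `to_number_multiple` change above matters because `string.match` returns its captures as strings, and Lua compares strings lexicographically. Below is a minimal standalone sketch of the same idea without the `fun` library. `parse_version` is a hypothetical helper, and the `%.` in its pattern (a literal dot) is a deviation from the PR's pattern, where `.` matches any character.

```lua
-- Hypothetical helper, not part of the PR: parse "major.minor.patch" into numbers.
local function parse_version(ver_str)
    -- "%." matches a literal dot; a bare "." would match any character.
    local major, minor, patch = string.match(ver_str, '^(%d+)%.(%d+)%.(%d+)')
    return tonumber(major), tonumber(minor), tonumber(patch)
end

-- Numeric, component-by-component comparison.
local function is_version_less(ver_str, reference_ver_str)
    local major, minor, patch = parse_version(ver_str)
    local ref_major, ref_minor, ref_patch = parse_version(reference_ver_str)
    if major ~= ref_major then
        return major < ref_major
    end
    if minor ~= ref_minor then
        return minor < ref_minor
    end
    return patch < ref_patch
end

-- As strings, "10" sorts before "8" because "1" < "8".
print('10' < '8')
-- As numbers, 2.10.0 is not less than 2.8.3.
print(is_version_less('2.10.0', '2.8.3'))
```

Without the numeric conversion, a two-digit minor or patch component would compare incorrectly against a one-digit one.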