fix: ensure integer comparison for disk fsync latency in etcd validation#3085
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: redscholar The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
e793680 to
43840dc
Compare
43840dc to
f052fdf
Compare
Signed-off-by: redscholar <blacktiledhouse@gmail.com>
f052fdf to
67fee65
Compare
There was a problem hiding this comment.
Code Review
This pull request updates the ETCD disk fsync latency assertion by adding integer casting to the comparison. The review highlights two critical issues: the task was incorrectly moved out of its conditional block, which may lead to template errors if dependencies are missing, and the comparison logic uses 'ge' instead of 'le', meaning the check currently fails when latency is acceptable and passes when it exceeds the maximum allowed threshold.
| - name: ETCD | Assert disk fsync latency meets requirements | ||
| assert: | ||
| that: (index (.fio_result.stdout.jobs | first) "sync" "lat_ns" "percentile" "90.000000") | ge .cluster_require.etcd_disk_wal_fysnc_duration_seconds | ||
| that: (index (.fio_result.stdout.jobs | first) "sync" "lat_ns" "percentile" "90.000000") | int | ge (int .cluster_require.etcd_disk_wal_fysnc_duration_seconds) | ||
| fail_msg: >- | ||
| The 90th percentile fsync latency is {{ index (.fio_result.stdout.jobs | first) "sync" "lat_ns" "percentile" "90.000000" }}ns, which exceeds the maximum allowed: {{ .cluster_require.etcd_disk_wal_fysnc_duration_seconds }}ns. |
There was a problem hiding this comment.
There are two critical issues introduced or maintained in this change:
- Indentation/Logic Flow: The assertion task has been moved out of the conditional
block(line 38). Since it depends on.fio_result(registered at line 43), it will now cause a template error iffiois not installed or the test is skipped. It should remain indented within the inner block to ensure it only runs when the test results are available. - Comparison Logic: The use of
ge(greater than or equal) is incorrect for a performance check where a maximum latency is specified. The assertion currently passes if the latency is higher than the limit and fails if it is lower. It should usele(less than or equal) to ensure the measured latency does not exceed the threshold.
While the addition of integer casting correctly addresses the type mismatch issue, these structural and logic errors must be resolved.
- name: ETCD | Assert disk fsync latency meets requirements
assert:
that: (index (.fio_result.stdout.jobs | first) "sync" "lat_ns" "percentile" "90.000000") | int | le (int .cluster_require.etcd_disk_wal_fysnc_duration_seconds)
fail_msg: >-
The 90th percentile fsync latency is {{ index (.fio_result.stdout.jobs | first) "sync" "lat_ns" "percentile" "90.000000" }}ns, which exceeds the maximum allowed: {{ .cluster_require.etcd_disk_wal_fysnc_duration_seconds }}ns.
|



What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for reviewers:
Does this PR introduced a user-facing change?
Additional documentation, usage docs, etc.: