
Conversation

bmastbergen (Owner)

No description provided.

@bmastbergen force-pushed the bmastbergen_ciqcbr7_9-kabi-test branch from 32cca66 to 06a09e1 on August 12, 2025 at 14:15
github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
JIRA: https://issues.redhat.com/browse/RHEL-95318

commit c5b6aba
Author: Maxim Levitsky <[email protected]>
Date:   Mon May 12 14:04:02 2025 -0400

    locking/mutex: implement mutex_trylock_nested

    Although several lockdep-related checks are skipped when the trylock*
    versions of the locking primitives (for example mutex_trylock) are used,
    each successful acquisition still places a held_lock onto the lockdep
    stack: __lock_acquire() is called regardless of whether the trylock* or
    the regular locking API was used.

    This means that if the caller successfully acquires more than
    MAX_LOCK_DEPTH locks of the same class, even when using mutex_trylock,
    lockdep will still complain that the maximum depth of the held lock stack
    has been reached and disable itself.

    For example, the following error currently occurs in the ARM version
    of KVM once the code tries to lock all vCPUs of a VM configured with more
    than MAX_LOCK_DEPTH vCPUs. This can easily happen on modern systems, where
    having more than 48 CPUs is common and VMs with vCPU counts approaching
    that number are also common:

    [  328.171264] BUG: MAX_LOCK_DEPTH too low!
    [  328.175227] turning off the locking correctness validator.
    [  328.180726] Please attach the output of /proc/lock_stat to the bug report
    [  328.187531] depth: 48  max: 48!
    [  328.190678] 48 locks held by qemu-kvm/11664:
    [  328.194957]  #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0
    [  328.204048]  #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.212521]  #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.220991]  #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.229463]  #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.237934]  #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.246405]  #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0

    Luckily, in all instances that require locking all vCPUs, 'kvm->lock' is
    taken a priori, which makes it possible to use a little-known lockdep
    feature called a 'nest_lock' to avoid this warning and the subsequent
    lockdep self-disablement.

    When a 'nest_lock' is passed to lockdep's lock_acquire(), lockdep detects
    that the top of the held lock stack already contains a lock of the same
    class and increments that entry's reference counter instead of pushing a
    new held_lock item onto the stack.

    See __lock_acquire for more information.

    Signed-off-by: Maxim Levitsky <[email protected]>
    Acked-by: Peter Zijlstra (Intel) <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>

Signed-off-by: Maxim Levitsky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
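
To make the mechanism above concrete, the sketch below shows the locking pattern the commit enables. It is a minimal illustration, not the upstream implementation; it assumes a trylock variant that accepts the outer lock as a nest_lock annotation (written here as mutex_trylock_nest_lock(), by analogy with the existing mutex_lock_nest_lock(); the exact name and signature added by this commit are an assumption), and kvm_lock, vcpu_mutex[] and NR_VCPUS are placeholder names:

#include <linux/mutex.h>
#include <linux/lockdep.h>
#include <linux/errno.h>

#define NR_VCPUS 64				/* placeholder count */
static DEFINE_MUTEX(kvm_lock);			/* stand-in for kvm->lock */
static struct mutex vcpu_mutex[NR_VCPUS];	/* stand-ins for each vcpu->mutex (mutex_init'ed elsewhere) */

/*
 * Illustrative sketch only.  All vcpu_mutex[] locks belong to the same lock
 * class; because kvm_lock is passed as the nest_lock, lockdep bumps a
 * reference count on the existing held_lock entry for that class instead of
 * pushing one entry per vCPU, so MAX_LOCK_DEPTH is not exceeded even with
 * many vCPUs.
 */
static int try_lock_all_vcpus_sketch(void)
{
	int i;

	lockdep_assert_held(&kvm_lock);		/* the outer lock is held a priori */

	for (i = 0; i < NR_VCPUS; i++) {
		if (!mutex_trylock_nest_lock(&vcpu_mutex[i], &kvm_lock)) {
			while (--i >= 0)	/* roll back on contention */
				mutex_unlock(&vcpu_mutex[i]);
			return -EBUSY;
		}
	}
	return 0;
}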
github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
JIRA: https://issues.redhat.com/browse/RHEL-95318

commit b586c5d
Author: Maxim Levitsky <[email protected]>
Date:   Mon May 12 14:04:06 2025 -0400

    KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs

    Use kvm_trylock_all_vcpus instead of a custom implementation when locking
    all vCPUs of a VM, to avoid triggering a lockdep warning when the VM is
    configured with more than MAX_LOCK_DEPTH vCPUs.

    This fixes the following false lockdep warning:

    [  328.171264] BUG: MAX_LOCK_DEPTH too low!
    [  328.175227] turning off the locking correctness validator.
    [  328.180726] Please attach the output of /proc/lock_stat to the bug report
    [  328.187531] depth: 48  max: 48!
    [  328.190678] 48 locks held by qemu-kvm/11664:
    [  328.194957]  #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0
    [  328.204048]  #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.212521]  #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.220991]  #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.229463]  #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.237934]  #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.246405]  #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0

    Suggested-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Maxim Levitsky <[email protected]>
    Acked-by: Marc Zyngier <[email protected]>
    Acked-by: Peter Zijlstra (Intel) <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>

Signed-off-by: Maxim Levitsky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
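
For context, here is a hedged sketch of how an arm64 call site looks after the change, assuming kvm_trylock_all_vcpus()/kvm_unlock_all_vcpus() are the common helpers this series relies on; the real call sites and their error handling in arch/arm64/kvm differ in detail:

#include <linux/kvm_host.h>
#include <linux/lockdep.h>

/*
 * Illustrative only: the caller already holds kvm->lock, so the helper can
 * hand it to lockdep as the nest_lock while try-locking every vcpu->mutex.
 */
static int with_all_vcpus_locked(struct kvm *kvm)
{
	int ret;

	lockdep_assert_held(&kvm->lock);

	ret = kvm_trylock_all_vcpus(kvm);	/* assumed to return 0 on success */
	if (ret)
		return ret;

	/* ... operate while no vCPU can run ... */

	kvm_unlock_all_vcpus(kvm);
	return 0;
}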
github-actions bot pushed a commit that referenced this pull request Aug 20, 2025
JIRA: https://issues.redhat.com/browse/RHEL-71129

upstream
========
commit ea04fe1
Author: Aditya Bodkhe <[email protected]>
Date: Tue Apr 29 12:21:32 2025 +0530

description
===========
perf script tests fail with a segmentation fault as below:

  92: perf script tests:
  --- start ---
  test child forked, pid 103769
  DB test
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB /tmp/perf-test-script.7rbftEpOzX/perf.data (9 samples) ]
  /usr/libexec/perf-core/tests/shell/script.sh: line 35:
  103780 Segmentation fault      (core dumped)
  perf script -i "${perfdatafile}" -s "${db_test}"
  --- Cleaning up ---
  ---- end(-1) ----
  92: perf script tests                                               : FAILED!

Backtrace pointed to:
	#0  0x0000000010247dd0 in maps.machine ()
	#1  0x00000000101d178c in db_export.sample ()
	#2  0x00000000103412c8 in python_process_event ()
	#3  0x000000001004eb28 in process_sample_event ()
	#4  0x000000001024fcd0 in machines.deliver_event ()
	#5  0x000000001025005c in perf_session.deliver_event ()
	#6  0x00000000102568b0 in __ordered_events__flush.part.0 ()
	#7  0x0000000010251618 in perf_session.process_events ()
	#8  0x0000000010053620 in cmd_script ()
	#9  0x00000000100b5a28 in run_builtin ()
	#10 0x00000000100b5f94 in handle_internal_command ()
	#11 0x0000000010011114 in main ()

Further investigation reveals that this occurs in the `perf script tests`
because they use the `db_test.py` script, which sets `perf_db_export_mode = True`.

With `perf_db_export_mode` enabled, if a sample originates from a hypervisor,
perf doesn't set maps for the "[H]" sample. Consequently, `al->maps` remains NULL
when `maps__machine(al->maps)` is called from `db_export__sample`.

Since al->maps can be NULL for hypervisor samples, use thread->maps instead,
because even for a hypervisor sample the machine should exist. If the machine
is missing for some reason, return -1 to avoid the segmentation fault.

Reported-by: Disha Goel <[email protected]>
Signed-off-by: Aditya Bodkhe <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Tested-by: Disha Goel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Adrian Hunter <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>

Signed-off-by: Jakub Brnak <[email protected]>
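
A minimal sketch of the fix described above, as it might appear inside db_export__sample(); this is a paraphrased fragment assuming the maps__machine() and thread__maps() accessors, not the exact upstream diff:

	/*
	 * Hypervisor ("[H]") samples leave al->maps NULL, so derive the
	 * machine from the thread instead of al->maps, and bail out if it
	 * is still missing.
	 */
	struct machine *machine = maps__machine(thread__maps(thread));

	if (!machine)
		return -1;	/* avoids the NULL dereference behind the segfault */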