[rocky8_10] History rebuild for kernel-4.18.0-553.58.1.el8_10 #384

PlaidCat · 2025-07-01T13:06:30Z

General Process:

Download all unprocessed src.rpm files
for each src.rpm:
- Find all commits in the changelog up to the last known tag (in this case 4.18.0-553)
- Replay the commits in reverse order (oldest in the changelog to newest) with git cherry-pick
- After the replay, replace the ENTIRE code in the branch with the output of rpmbuild -bp from the corresponding src.rpm
- Tag the rebuild branch
Do a local build with https://github.com/ctrliq/kernel-src-tree/wiki/Kernel-Make,-KABI,-Install,-and-Reboot-script
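The replay step above can be sketched as a dry run that prints the cherry-pick commands without running them. This is a minimal sketch, not the actual rebuild tooling. It assumes the changelog SHAs have already been extracted one per line in changelog order (newest first). `replay_plan` is a hypothetical name.

```shell
#!/bin/sh
# replay_plan: hypothetical dry-run helper.
# Reads upstream commit SHAs on stdin in RPM changelog order (newest first),
# skips blank lines, and prints one `git cherry-pick` command per SHA,
# oldest first, so history is replayed in the order it was applied.
replay_plan() {
    awk 'NF { shas[++n] = $1 }
         END { for (i = n; i >= 1; i--) print "git cherry-pick " shas[i] }'
}

# Example with placeholder SHAs:
#   printf '%s\n' newest middle oldest | replay_plan
# prints:
#   git cherry-pick oldest
#   git cherry-pick middle
#   git cherry-pick newest
```

Printing the plan first lets you review it before anything touches the branch. The later steps (rpmbuild -bp and tagging) are not covered here.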

Checking Rebuild Commits for Potentially Missing Commits:

[jmaple@devbox kernel-src-tree]$ cat ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..kernel-mainline: 553283
Number of commits in rpm: 59
Number of commits matched with upstream: 52 (88.14%)
Number of commits in upstream but not in rpm: 553231
Number of commits NOT found in upstream: 7 (11.86%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.58.1.el8_10 for kernel-4.18.0-553.58.1.el8_10
Clean Cherry Picks: 32 (61.54%)
Empty Cherry Picks: 20 (38.46%)
_______________________________

__EMPTY COMMITS__________________________
0d48566d4b58946c8e1b0baac0347616060a81c9 s390/pci: rename lock member in struct zpci_dev
bcb5d6c769039c8358a2359e7c3ea5d97ce93108 s390/pci: introduce lock to synchronize state of zpci_dev's
6ee600bfbe0f818ffb7748d99e9b0c89d0d9f02a s390/pci: remove hotplug slot when releasing the device
c4a585e952ca403a370586d3f16e8331a7564901 s390/pci: Fix potential double remove of hotplug slot
05a2538f2b48500cf4e8a0a0ce76623cc5bafcf1 s390/pci: Fix duplicate pci_dev_put() in disable_slot() when PF has child VFs
d76f9633296785343d45f85199f4138cb724b6d2 s390/pci: Remove redundant bus removal and disable from zpci_release_device()
47c397844869ad0e6738afb5879c7492f4691122 s390/pci: Prevent self deletion in disable_slot()
4b1815a52d7eb03b3e0e6742c6728bc16a4b2d1d s390/pci: Allow re-add of a reserved but not yet removed device
774a1fa880bc949d88b5ddec9494a13be733dfa8 s390/pci: Serialize device addition and removal
6e9b01909a811555ff3326cf80a5847169c57806 net: remove gfp_mask from napi_alloc_skb()
4309363f19598999b25a1e55fccf688daa4cc220 idpf: remove legacy Page Pool Ethtool stats
e4891e4687c8dd136d80d6c1b857a02931ed6fc8 idpf: split &idpf_queue into 4 strictly-typed queue structures
bf9bf7042a38ebd2485592467772db50605bd4a2 idpf: avoid bloating &idpf_q_vector with big %NR_CPUS
14f662b43bf8c765114f73d184af2702b2280436 idpf: merge singleq and splitq &net_device_ops
f771314d6b75181de7079c3c7d666293e4ed2b22 idpf: compile singleq code only under default-n CONFIG_IDPF_SINGLEQ
3cc88e8405b8d55e0ff035e31971aadd6baee2b6 idpf: fix memleak in vport interrupt configuration
e4b398dd82f5d5867bc5f442c43abc8fba30ed2c idpf: fix netdev Tx queue stop/wake
407e0efdf8baf1672876d5948b75049860a93e59 idpf: fix idpf_vport_splitq_napi_poll()
7292af042bcf22e2c18b96ed250f78498a5b28ab idpf: fix a race in txq wakeup
680811c67906191b237bbafe7dabbbad64649b39 idpf: check error for register_netdev() on init

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values
redhat/configs: set CONFIG_IDPF_SINGLEQ as disabled
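The percentages in rebuild.details.txt can be re-derived from the raw counts. The two lists agree with those counts: there are 20 entries under __EMPTY COMMITS__ and 7 under __CHANGES NOT IN UPSTREAM__. Note that the clean/empty cherry-pick percentages are computed over the 52 commits matched upstream, not over all 59 rpm commits. A small check, assuming only POSIX `sh` and `awk`; `pct` is a hypothetical helper:

```shell
#!/bin/sh
# pct: print part/whole as a percentage rounded to two decimals,
# in the same format rebuild.details.txt uses.
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.2f%%\n", 100 * part / whole }'
}

# Raw counts copied from rebuild.details.txt.
rpm_commits=59
matched=52
not_upstream=7
clean=32
empty=20

# The two splits must add up to their respective totals.
[ $((matched + not_upstream)) -eq "$rpm_commits" ] &&
    [ $((clean + empty)) -eq "$matched" ] && echo "counts consistent"

pct "$matched" "$rpm_commits"       # 88.14%  share of rpm commits found upstream
pct "$not_upstream" "$rpm_commits"  # 11.86%
pct "$clean" "$matched"             # 61.54%  cherry-pick split is over matched commits
pct "$empty" "$matched"             # 38.46%
```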

BUILD

[jmaple@devbox code]$ egrep -B 5 -A 5 "\[TIMER\]|^Starting Build" kbuild.resf_kernel-4.18.0-553.58.1.el8_10.log
/mnt/code/kernel-src-tree
no .config file found, moving on
[TIMER]{MRPROPER}: 0s
x86_64 architecture detected, copying config
'configs/kernel-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rocky8_10_rebuild-2e416d167715"
Making olddefconfig
--
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --olddefconfig Kconfig
#
# configuration written to .config
#
Starting Build
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
--
  LD [M]  sound/usb/usx2y/snd-usb-usx2y.ko
  LD [M]  sound/virtio/virtio_snd.ko
  LD [M]  sound/x86/snd-hdmi-lpe-audio.ko
  LD [M]  sound/xen/snd_xen_front.ko
  LD [M]  virt/lib/irqbypass.ko
[TIMER]{BUILD}: 1537s
Making Modules
  INSTALL arch/x86/crypto/blowfish-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx2.ko
  INSTALL arch/x86/crypto/camellia-x86_64.ko
--
  INSTALL sound/virtio/virtio_snd.ko
  INSTALL sound/x86/snd-hdmi-lpe-audio.ko
  INSTALL sound/xen/snd_xen_front.ko
  INSTALL virt/lib/irqbypass.ko
  DEPMOD  4.18.0-rocky8_10_rebuild-2e416d167715+
[TIMER]{MODULES}: 13s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-rocky8_10_rebuild-2e416d167715+ arch/x86/boot/bzImage \
	System.map "/boot"
[TIMER]{INSTALL}: 20s
Checking kABI
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-4.18.0-rocky8_10_rebuild-2e416d167715+ and Index to 0
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 0s
[TIMER]{BUILD}: 1537s
[TIMER]{MODULES}: 13s
[TIMER]{INSTALL}: 20s
[TIMER]{TOTAL} 1575s
Rebooting in 10 seconds
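Each [TIMER] line appears twice in the log: once when its stage finishes and again in the closing summary. A naive sum would therefore double count. The sketch below keeps one value per stage and prints the total next to the reported TOTAL. It assumes the log format shown above, and `timer_summary` is a hypothetical name.

```shell
#!/bin/sh
# timer_summary: read a kbuild log (file arguments, or stdin) and report the
# sum of the per-stage [TIMER] values alongside [TIMER]{TOTAL}.
# Splitting fields on '{' and '}' turns "[TIMER]{BUILD}: 1537s" into
# $1="[TIMER]", $2="BUILD", $3=": 1537s".
timer_summary() {
    awk -F'[{}]' '
        $1 == "[TIMER]" {
            secs = $3
            gsub(/[^0-9]/, "", secs)   # keep only the digits of ": 1537s" or " 1575s"
            if ($2 == "TOTAL") total = secs
            else stage[$2] = secs      # a repeated stage overwrites, so it counts once
        }
        END {
            sum = 0
            for (s in stage) sum += stage[s]
            printf "stages=%ds total=%ds\n", sum, total
        }' "$@"
}

# Usage: timer_summary kbuild.resf_kernel-4.18.0-553.58.1.el8_10.log
```

For the log above this gives `stages=1570s total=1575s`. The 5s gap presumably comes from steps that no stage timer covers, such as the kABI check, although the log does not say so.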

KSelfTest

[jmaple@devbox code]$ ls ~/workspace/vms/r8-sigcloud/builder/code/kselftest.4.18.0-rocky8_10_rebuild-6f9106f46020+.log kselftest.4.18.0-rocky8_10_rebuild-2e416d167715+.log | while read line; do echo $line; grep '^ok ' $line | wc -l; done
/home/jmaple/workspace/vms/r8-sigcloud/builder/code/kselftest.4.18.0-rocky8_10_rebuild-6f9106f46020+.log
205
kselftest.4.18.0-rocky8_10_rebuild-2e416d167715+.log
206
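The raw `^ok ` counts show 206 passing top-level tests on this build versus 205 on the `6f9106f46020` build, but they do not show which test changed. The sketch below lists tests whose pass status differs between two logs. It assumes the TAP-style `ok N <name>` lines counted above, and `passed_tests` and `ok_diff` are hypothetical helpers.

```shell
#!/bin/sh
# passed_tests: print the names of top-level passing tests in a kselftest log,
# e.g. "ok 12 selftests: net: foo" -> "selftests: net: foo", sorted for comm(1).
passed_tests() {
    sed -n 's/^ok [0-9][0-9]* //p' "$1" | sort
}

# ok_diff OLD NEW: lines prefixed "- " passed only in OLD,
# lines prefixed "+ " passed only in NEW.
ok_diff() {
    old_list=$(mktemp) new_list=$(mktemp)
    passed_tests "$1" > "$old_list"
    passed_tests "$2" > "$new_list"
    comm -23 "$old_list" "$new_list" | sed 's/^/- /'
    comm -13 "$old_list" "$new_list" | sed 's/^/+ /'
    rm -f "$old_list" "$new_list"
}

# Usage:
#   ok_diff kselftest.4.18.0-rocky8_10_rebuild-6f9106f46020+.log \
#           kselftest.4.18.0-rocky8_10_rebuild-2e416d167715+.log
```

kselftest reports skipped tests as `ok ... # SKIP`. Both the raw count and this diff therefore include skips.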

jira LE-3467
cve CVE-2024-50301
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Chen Ridong <[email protected]>
commit 4a74da0

KASAN reports an out of bounds read:
BUG: KASAN: slab-out-of-bounds in __kuid_val include/linux/uidgid.h:36
BUG: KASAN: slab-out-of-bounds in uid_eq include/linux/uidgid.h:63 [inline]
BUG: KASAN: slab-out-of-bounds in key_task_permission+0x394/0x410 security/keys/permission.c:54
Read of size 4 at addr ffff88813c3ab618 by task stress-ng/4362

CPU: 2 PID: 4362 Comm: stress-ng Not tainted 5.10.0-14930-gafbffd6c3ede #15
Call Trace:
 __dump_stack lib/dump_stack.c:82 [inline]
 dump_stack+0x107/0x167 lib/dump_stack.c:123
 print_address_description.constprop.0+0x19/0x170 mm/kasan/report.c:400
 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560
 kasan_report+0x3a/0x50 mm/kasan/report.c:585
 __kuid_val include/linux/uidgid.h:36 [inline]
 uid_eq include/linux/uidgid.h:63 [inline]
 key_task_permission+0x394/0x410 security/keys/permission.c:54
 search_nested_keyrings+0x90e/0xe90 security/keys/keyring.c:793

This issue was also reported by syzbot.

It can be reproduced by following these steps (more details [1]):
1. Obtain more than 32 inputs that have similar hashes, which ends with the pattern '0xxxxxxxe6'.
2. Reboot and add the keys obtained in step 1.

The reproducer demonstrates how this issue happened:
1. In the search_nested_keyrings function, when it iterates through the slots in a node (below tag ascend_to_node), if the slot pointer is meta and node->back_pointer != NULL (it means a root), it will proceed to descend_to_node. However, there is an exception. If node is the root, and one of the slots points to a shortcut, it will be treated as a keyring.
2. Whether the ptr is keyring decided by keyring_ptr_is_keyring function. However, KEYRING_PTR_SUBTYPE is 0x2UL, the same as ASSOC_ARRAY_PTR_SUBTYPE_MASK.
3. When 32 keys with the similar hashes are added to the tree, the ROOT has keys with hashes that are not similar (e.g. slot 0) and it splits NODE A without using a shortcut. When NODE A is filled with keys that all hashes are xxe6, the keys are similar, NODE A will split with a shortcut. Finally, it forms the tree as shown below, where slot 6 points to a shortcut.
   [Tree diagram lost in extraction: ROOT slot 0 holds an xxxx key, slot 6 points through a shortcut to NODE A, remaining ROOT slots hold xxe6 keys; NODE A slots 0..f hold xxe6 keys.]
4. As mentioned above, If a slot (slot 6) of the root points to a shortcut, it may be mistakenly transferred to a key*, leading to a read out-of-bounds read.

To fix this issue, one should jump to descend_to_node if the ptr is a shortcut, regardless of whether the node is root or not.

[1] https://lore.kernel.org/linux-kernel/[email protected]/

[jarkko: tweaked the commit message a bit to have an appropriate closes tag.]
Fixes: b2a4df2 ("KEYS: Expand the capacity of a keyring")
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]/T/
Signed-off-by: Chen Ridong <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
(cherry picked from commit 4a74da0)
Signed-off-by: Jonathan Maple <[email protected]>
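Step 2 of the analysis above comes down to low-bit pointer tagging. The keyring code and the associative array both use bit 0x2 of a slot pointer, but give it different meanings. The sketch below models the tag tests with shell arithmetic on plain integers. The bit values are assumptions taken from include/linux/assoc_array_priv.h and security/keys/keyring.c and should be re-checked against the tree. The helper names are hypothetical stand-ins for the kernel's inline functions.

```shell
#!/bin/sh
# Assumed tag values (verify in the tree being patched):
META_TYPE=1          # ASSOC_ARRAY_PTR_META_TYPE: slot points to a node or shortcut
SHORTCUT_SUBTYPE=2   # ASSOC_ARRAY_PTR_SHORTCUT_SUBTYPE (subtype mask is 0x2)
KEYRING_SUBTYPE=2    # KEYRING_PTR_SUBTYPE: leaf pointer refers to a keyring

# Each helper succeeds (exit 0) when the tag bit is set on the integer "pointer".
is_meta()     { [ $(( $1 & META_TYPE )) -ne 0 ]; }
is_shortcut() { is_meta "$1" && [ $(( $1 & SHORTCUT_SUBTYPE )) -ne 0 ]; }
is_keyring()  { [ $(( $1 & KEYRING_SUBTYPE )) -ne 0 ]; }   # only meaningful for leaves

addr=4096                                            # aligned fake address
keyring_leaf=$(( addr | KEYRING_SUBTYPE ))           # 0x1002
shortcut=$(( addr | META_TYPE | SHORTCUT_SUBTYPE ))  # 0x1003

# A shortcut also passes the keyring test, which is the confusion from step 2.
# Checking is_shortcut first, as the fix does, avoids reading it as a key.
is_keyring "$shortcut" && echo "shortcut passes the keyring test"
is_shortcut "$shortcut" && echo "shortcut is a shortcut"
is_shortcut "$keyring_leaf" || echo "keyring leaf is not a shortcut"
```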

jira LE-3467
cve CVE-2022-48919
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Ronnie Sahlberg <[email protected]>
commit 3d6cc98

When cifs_get_root() fails during cifs_smb3_do_mount() we call deactivate_locked_super() which eventually will call delayed_free() which will free the context. In this situation we should not proceed to enter the out: section in cifs_smb3_do_mount() and free the same resources a second time.

[Thu Feb 10 12:59:06 2022] BUG: KASAN: use-after-free in rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022] Read of size 8 at addr ffff888364f4d110 by task swapper/1/0
[Thu Feb 10 12:59:06 2022] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G OE 5.17.0-rc3+ #4
[Thu Feb 10 12:59:06 2022] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
[Thu Feb 10 12:59:06 2022] Call Trace:
[Thu Feb 10 12:59:06 2022]  <IRQ>
[Thu Feb 10 12:59:06 2022]  dump_stack_lvl+0x5d/0x78
[Thu Feb 10 12:59:06 2022]  print_address_description.constprop.0+0x24/0x150
[Thu Feb 10 12:59:06 2022]  ? rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022]  kasan_report.cold+0x7d/0x117
[Thu Feb 10 12:59:06 2022]  ? rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022]  __asan_load8+0x86/0xa0
[Thu Feb 10 12:59:06 2022]  rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022]  rcu_core+0x547/0xca0
[Thu Feb 10 12:59:06 2022]  ? call_rcu+0x3c0/0x3c0
[Thu Feb 10 12:59:06 2022]  ? __this_cpu_preempt_check+0x13/0x20
[Thu Feb 10 12:59:06 2022]  ? lock_is_held_type+0xea/0x140
[Thu Feb 10 12:59:06 2022]  rcu_core_si+0xe/0x10
[Thu Feb 10 12:59:06 2022]  __do_softirq+0x1d4/0x67b
[Thu Feb 10 12:59:06 2022]  __irq_exit_rcu+0x100/0x150
[Thu Feb 10 12:59:06 2022]  irq_exit_rcu+0xe/0x30
[Thu Feb 10 12:59:06 2022]  sysvec_hyperv_stimer0+0x9d/0xc0
...
[Thu Feb 10 12:59:07 2022] Freed by task 58179:
[Thu Feb 10 12:59:07 2022]  kasan_save_stack+0x26/0x50
[Thu Feb 10 12:59:07 2022]  kasan_set_track+0x25/0x30
[Thu Feb 10 12:59:07 2022]  kasan_set_free_info+0x24/0x40
[Thu Feb 10 12:59:07 2022]  ____kasan_slab_free+0x137/0x170
[Thu Feb 10 12:59:07 2022]  __kasan_slab_free+0x12/0x20
[Thu Feb 10 12:59:07 2022]  slab_free_freelist_hook+0xb3/0x1d0
[Thu Feb 10 12:59:07 2022]  kfree+0xcd/0x520
[Thu Feb 10 12:59:07 2022]  cifs_smb3_do_mount+0x149/0xbe0 [cifs]
[Thu Feb 10 12:59:07 2022]  smb3_get_tree+0x1a0/0x2e0 [cifs]
[Thu Feb 10 12:59:07 2022]  vfs_get_tree+0x52/0x140
[Thu Feb 10 12:59:07 2022]  path_mount+0x635/0x10c0
[Thu Feb 10 12:59:07 2022]  __x64_sys_mount+0x1bf/0x210
[Thu Feb 10 12:59:07 2022]  do_syscall_64+0x5c/0xc0
[Thu Feb 10 12:59:07 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[Thu Feb 10 12:59:07 2022] Last potentially related work creation:
[Thu Feb 10 12:59:07 2022]  kasan_save_stack+0x26/0x50
[Thu Feb 10 12:59:07 2022]  __kasan_record_aux_stack+0xb6/0xc0
[Thu Feb 10 12:59:07 2022]  kasan_record_aux_stack_noalloc+0xb/0x10
[Thu Feb 10 12:59:07 2022]  call_rcu+0x76/0x3c0
[Thu Feb 10 12:59:07 2022]  cifs_umount+0xce/0xe0 [cifs]
[Thu Feb 10 12:59:07 2022]  cifs_kill_sb+0xc8/0xe0 [cifs]
[Thu Feb 10 12:59:07 2022]  deactivate_locked_super+0x5d/0xd0
[Thu Feb 10 12:59:07 2022]  cifs_smb3_do_mount+0xab9/0xbe0 [cifs]
[Thu Feb 10 12:59:07 2022]  smb3_get_tree+0x1a0/0x2e0 [cifs]
[Thu Feb 10 12:59:07 2022]  vfs_get_tree+0x52/0x140
[Thu Feb 10 12:59:07 2022]  path_mount+0x635/0x10c0
[Thu Feb 10 12:59:07 2022]  __x64_sys_mount+0x1bf/0x210
[Thu Feb 10 12:59:07 2022]  do_syscall_64+0x5c/0xc0
[Thu Feb 10 12:59:07 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-by: Shyam Prasad N <[email protected]>
Reviewed-by: Shyam Prasad N <[email protected]>
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 3d6cc98)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Paulo Alcantara <[email protected]>
commit 12c30f3

This fixes the following warning reported by kernel test robot

  fs/smb/client/cifsfs.c:982 cifs_smb3_do_mount() warn: possible memory leak of 'cifs_sb'

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 12c30f3)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Gerd Bayer <[email protected]>
commit 0d48566
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/0d48566d.failed

Since this guards only the Function Measurement Block, rename from generic lock to fmb_lock in preparation to introduce another lock that guards the state member

Signed-off-by: Gerd Bayer <[email protected]>
Reviewed-by: Niklas Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit 0d48566)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	arch/s390/pci/pci.c
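Every Empty-Commit entry in this PR points at a `.failed` reference whose file name is the first eight hex digits of the upstream SHA. For example, `0d48566d.failed` corresponds to `0d48566d4b58...`. The hypothetical helper below rebuilds that path from the naming seen in this PR. It can be used to open the reference for any commit in the __EMPTY COMMITS__ list.

```shell
#!/bin/sh
# failed_ref_path VERSION SHA: hypothetical helper mirroring the naming
#   ciq/ciq_backports/<kernel version>/<first 8 hex digits of SHA>.failed
failed_ref_path() {
    version=$1 sha=$2
    short=$(printf '%s' "$sha" | cut -c1-8)
    printf 'ciq/ciq_backports/%s/%s.failed\n' "$version" "$short"
}

# Usage:
#   less "$(failed_ref_path kernel-4.18.0-553.58.1.el8_10 \
#            0d48566d4b58946c8e1b0baac0347616060a81c9)"
```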

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Gerd Bayer <[email protected]>
commit bcb5d6c
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/bcb5d6c7.failed

There's a number of tasks that need the state of a zpci device to be stable. Other tasks need to be synchronized as they change the state.

State changes could be generated by the system as availability or error events, or be requested by the user through manipulations in sysfs. Some other actions accessible through sysfs - like device resets - need the state to be stable.

Unsynchronized state handling could lead to unusable devices. This has been observed in cases of concurrent state changes through systemd udev rules and DPM boot control. Some breakage can be provoked by artificial tests, e.g. through repetitively injecting "recover" on a PCI function through sysfs while running a "hotplug remove/add" in a loop through a PCI slot's "power" attribute in sysfs. After a few iterations this could result in a kernel oops.

So introduce a new mutex "state_lock" to guard the state property of the struct zpci_dev. Acquire this lock in all tasks that modify the state:
- hotplug add and remove, through the PCI hotplug slot entry,
- availability events, as reported by the platform,
- error events, as reported by the platform,
- during device resets, explicit through sysfs requests or implicit through the common PCI layer.

Break out an inner _do_recover() routine out of recover_store() to separate the necessary synchronizations from the actual manipulations of the zpci_dev required for the reset.

With the following changes I was able to run the inject loops for hours without hitting an error.

Signed-off-by: Gerd Bayer <[email protected]>
Reviewed-by: Niklas Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit bcb5d6c)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	arch/s390/pci/pci.c
#	arch/s390/pci/pci_sysfs.c

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Gerd Bayer <[email protected]>
commit 6ee600b
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/6ee600bf.failed

Centralize the removal so all paths are covered and the hotplug slot will remain active until the device is really destroyed.

Signed-off-by: Gerd Bayer <[email protected]>
Reviewed-by: Niklas Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit 6ee600b)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	arch/s390/pci/pci.c

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Niklas Schnelle <[email protected]>
commit c4a585e
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/c4a585e9.failed

In commit 6ee600b ("s390/pci: remove hotplug slot when releasing the device") the zpci_exit_slot() was moved from zpci_device_reserved() to zpci_release_device() with the intention of keeping the hotplug slot around until the device is actually removed.

Now zpci_release_device() is only called once all references are dropped. Since the zPCI subsystem only drops its reference once the device is in the reserved state it follows that zpci_release_device() must only deal with devices in the reserved state. Despite that it contains code to tear down from both configured and standby state. For the standby case this already includes the removal of the hotplug slot so would cause a double removal if a device was ever removed in either configured or standby state.

Instead of causing a potential double removal in a case that should never happen explicitly WARN_ON() if a device in non-reserved state is released and get rid of the dead code cases.

Fixes: 6ee600b ("s390/pci: remove hotplug slot when releasing the device")
Reviewed-by: Matthew Rosato <[email protected]>
Reviewed-by: Gerd Bayer <[email protected]>
Tested-by: Gerd Bayer <[email protected]>
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit c4a585e)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	arch/s390/pci/pci.c

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Niklas Schnelle <[email protected]>
commit 42420c5

The zpci_create_device() function returns an error pointer that needs to be checked before dereferencing it as a struct zpci_dev pointer. Add the missing check in __clp_add() where it was missed when adding the scan_list in the fixed commit. Simply not adding the device to the scan list results in the previous behavior.

Cc: [email protected]
Fixes: 0467cdd ("s390/pci: Sort PCI functions prior to creating virtual busses")
Signed-off-by: Niklas Schnelle <[email protected]>
Reviewed-by: Gerd Bayer <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit 42420c5)
Signed-off-by: Jonathan Maple <[email protected]>
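The fix above depends on the kernel's error-pointer convention. A function such as zpci_create_device() can return `ERR_PTR(-errno)` instead of a valid pointer, so the caller must test it with `IS_ERR()` before dereferencing it. `IS_ERR()` treats the last `MAX_ERRNO` (4095) unsigned pointer values as errors. Shell integers are signed, and in signed terms that range is simply -4095..-1, which makes it easy to model. The sketch below uses hypothetical helper names, and ENOMEM is only an example errno.

```shell
#!/bin/sh
MAX_ERRNO=4095   # as in include/linux/err.h

# err_ptr ERRNO: model ERR_PTR(-errno); the "pointer" is just the negative errno.
err_ptr() { echo $(( -$1 )); }

# is_err VALUE: model IS_ERR(). As unsigned 64-bit values the error range is
# the last MAX_ERRNO addresses; read as signed values that is -MAX_ERRNO..-1.
is_err() { [ "$1" -ge $(( -MAX_ERRNO )) ] && [ "$1" -le -1 ]; }

ENOMEM=12
zdev=$(err_ptr "$ENOMEM")   # what a failing create call could hand back
if is_err "$zdev"; then
    # Mirrors the fix: skip the scan list instead of dereferencing zdev.
    echo "not adding to scan list: error $zdev"
fi
```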

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Niklas Schnelle <[email protected]>
commit 05a2538
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/05a2538f.failed

With commit bcb5d6c ("s390/pci: introduce lock to synchronize state of zpci_dev's") the code to ignore power off of a PF that has child VFs was changed from a direct return to a goto to the unlock and pci_dev_put() section. The change however left the existing pci_dev_put() untouched resulting in a double put. This can subsequently cause a use after free if the struct pci_dev is released in an unexpected state. Fix this by removing the extra pci_dev_put().

Cc: [email protected]
Fixes: bcb5d6c ("s390/pci: introduce lock to synchronize state of zpci_dev's")
Signed-off-by: Niklas Schnelle <[email protected]>
Reviewed-by: Gerd Bayer <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit 05a2538)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/pci/hotplug/s390_pci_hpc.c

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Niklas Schnelle <[email protected]>
commit d76f963
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/d76f9633.failed

Remove zpci_bus_remove_device() and zpci_disable_device() calls from zpci_release_device(). These calls were done when the device transitioned into the ZPCI_FN_STATE_STANDBY state which is guaranteed to happen before it enters the ZPCI_FN_STATE_RESERVED state. When zpci_release_device() is called the device is known to be in the ZPCI_FN_STATE_RESERVED state which is also checked by a WARN_ON().

Cc: [email protected]
Fixes: a46044a ("s390/pci: fix zpci_zdev_put() on reserve")
Reviewed-by: Gerd Bayer <[email protected]>
Reviewed-by: Julian Ruess <[email protected]>
Tested-by: Gerd Bayer <[email protected]>
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit d76f963)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	arch/s390/pci/pci.c

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Niklas Schnelle <[email protected]>
commit 47c3978
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/47c39784.failed

As disable_slot() takes a struct zpci_dev from the Configured to the Standby state. In Standby there is still a hotplug slot so this is not usually a case of sysfs self deletion. This is important because self deletion gets very hairy in terms of locking (see for example recover_store() in arch/s390/pci/pci_sysfs.c).

Because the pci_dev_put() is not within the critical section of the zdev->state_lock however, disable_slot() can turn into a case of self deletion if zPCI device event handling slips between the mutex_unlock() and the pci_dev_put(). If the latter is the last put and zpci_release_device() is called this then tries to remove the hotplug slot via zpci_exit_slot() which will try to remove the hotplug slot directory the disable_slot() is part of i.e. self deletion.

Prevent this by widening the zdev->state_lock critical section to include the pci_dev_put() which is then guaranteed to happen with the struct zpci_dev still in Standby state ensuring it will not lead to a zpci_release_device() call as at least the zPCI event handling code still holds a reference.

Cc: [email protected]
Fixes: a46044a ("s390/pci: fix zpci_zdev_put() on reserve")
Reviewed-by: Gerd Bayer <[email protected]>
Tested-by: Gerd Bayer <[email protected]>
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit 47c3978)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/pci/hotplug/s390_pci_hpc.c

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Niklas Schnelle <[email protected]>
commit 4b1815a
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/4b1815a5.failed

The architecture assumes that PCI functions can be removed synchronously as PCI events are processed. This however clashes with the reference counting of struct pci_dev which allows device drivers to hold on to a struct pci_dev reference even as the underlying device is removed. To bridge this gap commit 2a671f7 ("s390/pci: fix use after free of zpci_dev") keeps the struct zpci_dev in ZPCI_FN_STATE_RESERVED state until common code releases the struct pci_dev. Only when all references are dropped, the struct zpci_dev can be removed and freed.

Later commit a46044a ("s390/pci: fix zpci_zdev_put() on reserve") moved the deletion of the struct zpci_dev from the zpci_list in zpci_release_device() to the point where the device is reserved. This was done to prevent handling events for a device that is already being removed, e.g. when the platform generates both PCI event codes 0x304 and 0x308. In retrospect, deletion from the zpci_list in the release function without holding the zpci_list_lock was also racy.

A side effect of this handling is that if the underlying device re-appears while the struct zpci_dev is in the ZPCI_FN_STATE_RESERVED state, the new and old instances of the struct zpci_dev and/or struct pci_dev may clash. For example when trying to create the IOMMU sysfs files for the new instance. In this case, re-adding the new instance is aborted. The old instance is removed, and the device will remain absent until the platform issues another event.

Fix this by allowing the struct zpci_dev to be brought back up right until it is finally removed. To this end also keep the struct zpci_dev in the zpci_list until it is finally released when all references have been dropped.

Deletion from the zpci_list from within the release function is made safe by using kref_put_lock() with the zpci_list_lock. This ensures that the releasing code holds the last reference.

Cc: [email protected]
Fixes: a46044a ("s390/pci: fix zpci_zdev_put() on reserve")
Reviewed-by: Gerd Bayer <[email protected]>
Tested-by: Gerd Bayer <[email protected]>
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit 4b1815a)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	arch/s390/pci/pci.c

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Niklas Schnelle <[email protected]>
commit 774a1fa
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/774a1fa8.failed

Prior changes ensured that when zpci_release_device() is called and it removed the zdev from the zpci_list this instance can not be found via the zpci_list anymore even while allowing re-add of reserved devices. This only accounts for the overall lifetime and zpci_list addition and removal, it does not yet prevent concurrent add of a new instance for the same underlying device. Such concurrent add would subsequently cause issues such as attempted re-use of the same IOMMU sysfs directory and is generally undesired.

Introduce a new zpci_add_remove_lock mutex to serialize adding a new device with removal. Together this ensures that if a struct zpci_dev is not found in the zpci_list it was either already removed and torn down, or its removal and tear down is in progress with the zpci_add_remove_lock held.

Cc: [email protected]
Fixes: a46044a ("s390/pci: fix zpci_zdev_put() on reserve")
Reviewed-by: Gerd Bayer <[email protected]>
Tested-by: Gerd Bayer <[email protected]>
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit 774a1fa)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	arch/s390/pci/pci.c

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Alexander Lobakin <[email protected]>
commit c00d33f

To ease maintaining of virtchnl2.h, which already is messy enough, make it self-contained by adding missing if_ether.h include due to %ETH_ALEN usage. At the same time, virtchnl2_lan_desc.h is not used anywhere in the file, so move this include to idpf_txrx.h to speed up C preprocessing.

Acked-by: Kees Cook <[email protected]>
Acked-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit c00d33f)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Larysa Zaremba <[email protected]>
commit 5e7695e

Unlike ice, idpf does not check, if user has requested at least 1 combined channel. Instead, it relies on a check in the core code. Unfortunately, the check does not trigger for us because of the hacky .set_channels() interpretation logic that is not consistent with the core code.

This naturally leads to user being able to trigger a crash with an invalid input. This is how:

1. ethtool -l <IFNAME> -> combined: 40
2. ethtool -L <IFNAME> rx 0 tx 0
   combined number is not specified, so command becomes {rx_count = 0, tx_count = 0, combined_count = 40}.
3. ethnl_set_channels checks, if there is at least 1 RX and 1 TX channel, comparing (combined_count + rx_count) and (combined_count + tx_count) to zero. Obviously, (40 + 0) is greater than zero, so the core code deems the input OK.
4. idpf interprets `rx 0 tx 0` as 0 channels and tries to proceed with such configuration.

The issue has to be solved fundamentally, as current logic is also known to cause AF_XDP problems in ice [0].

Interpret the command in a way that is more consistent with ethtool manual [1] (--show-channels and --set-channels) and new ice logic.

Considering that in the idpf driver only the difference between RX and TX queues forms dedicated channels, change the correct way to set number of channels to:

ethtool -L <IFNAME> combined 10 /* For symmetric queues */
ethtool -L <IFNAME> combined 8 tx 2 rx 0 /* For asymmetric queues */

[0] https://lore.kernel.org/netdev/[email protected]/
[1] https://man7.org/linux/man-pages/man8/ethtool.8.html

Fixes: 02cbfba ("idpf: add ethtool callbacks")
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Igor Bagnucki <[email protected]>
Signed-off-by: Larysa Zaremba <[email protected]>
Tested-by: Krishneil Singh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
(cherry picked from commit 5e7695e)
Signed-off-by: Jonathan Maple <[email protected]>
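The loophole in step 3 is plain arithmetic. The core check only requires `combined + rx` and `combined + tx` to be non-zero, and the unspecified combined count stays at its current value. The sketch below models that check. It also models the queue counts implied by the new interpretation, assuming the usual ethtool(8) convention: each combined channel carries one Rx and one Tx queue, and `rx`/`tx` add dedicated queues. It covers only the arithmetic described in the commit message, not the driver code, and the helper names are hypothetical.

```shell
#!/bin/sh
# core_check COMBINED RX TX: model of the core sanity check described in step 3;
# succeeds when there is at least one Rx and one Tx channel overall.
core_check() {
    [ $(( $1 + $2 )) -gt 0 ] && [ $(( $1 + $3 )) -gt 0 ]
}

# queue_counts COMBINED RX TX: total queues under the assumed convention.
queue_counts() {
    echo "tx=$(( $1 + $3 )) rx=$(( $1 + $2 ))"
}

# `ethtool -L <IFNAME> rx 0 tx 0` with combined left at 40:
core_check 40 0 0 && echo "accepted by core check"   # passes even though rx/tx are 0

# The two forms recommended in the commit message:
queue_counts 10 0 0   # combined 10           -> tx=10 rx=10
queue_counts 8 0 2    # combined 8 tx 2 rx 0  -> tx=10 rx=8
```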

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Alexander Lobakin <[email protected]>
commit d514c8b

Currently, idpf enables NAPI and interrupts prior to allocating Rx buffers. This may lead to frame loss (there are no buffers to place incoming frames) and even crashes on quick ifup-ifdown. Interrupts must be enabled only after all the resources are here and available.

Split interrupt init into two phases: initialization and enabling, and perform the second only after the queues are fully initialized. Note that we can't just move interrupt initialization down the init process, as the queues must have a correct ::q_vector pointer set and NAPI already added in order to allocate buffers correctly.

Also, during the deinit process, disable HW interrupts first and only then disable NAPI. Otherwise, there can be a HW event leading to napi_schedule(), but the NAPI will already be unavailable.

Fixes: d4d5587 ("idpf: initialize interrupts and enable vport")
Reported-by: Michal Kubiak <[email protected]>
Reviewed-by: Wojciech Drewek <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Krishneil Singh <[email protected]>
Signed-off-by: Jacob Keller <[email protected]>
Link: https://lore.kernel.org/r/20240523-net-2024-05-23-intel-net-fixes-v1-1-17a923e0bb5f@intel.com
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit d514c8b)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Alexander Lobakin <[email protected]> commit 66c27e3 In C, we have structures and unions. Casting `void *` via macros is not only error-prone, but also looks confusing and awful in general. In preparation for splitting the queue structs, replace it with a union and direct array dereferences. Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Mina Almasry <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit 66c27e3) Signed-off-by: Jonathan Maple <[email protected]>
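A sketch of the "union instead of casting `void *` via macros" idea, assuming C11 anonymous unions. The descriptor layouts and `struct queue_model` are hypothetical; the real idpf descriptor formats differ. The ring is addressed through a typed member (`q.tx[i]`) rather than a cast such as `((struct tx_desc *)q->desc_ring)[i]` hidden in a macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor layouts, for illustration only. */
struct rx_desc { uint64_t pkt_addr; uint64_t hdr_addr; };
struct tx_desc { uint64_t buf_addr; uint64_t cmd_type; };

struct queue_model {
	/* One ring pointer, viewed through the type matching the queue. */
	union {
		void *desc_ring;
		struct rx_desc *rx;
		struct tx_desc *tx;
	};
	unsigned int desc_count;
};
```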

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 Rebuild_CHGLOG: - net: remove gfp_mask from napi_alloc_skb() [idpf] (Michal Schmidt) [RHEL-71182] Rebuild_FUZZ: 92.31% commit-author Jakub Kicinski <[email protected]> commit 6e9b019 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/6e9b0190.failed __napi_alloc_skb() is napi_alloc_skb() with the added flexibility of choosing gfp_mask. This is a NAPI function, so GFP_ATOMIC is implied. The only practical choice the caller has is whether to set __GFP_NOWARN. But that's a false choice, too, allocation failures in atomic context will happen, and printing warnings in logs, effectively for a packet drop, is both too much and very likely non-actionable. This leads me to a conclusion that most uses of napi_alloc_skb() are simply misguided, and should use __GFP_NOWARN in the first place. We also have a "standard" way of reporting allocation failures via the queue stat API (qstats::rx-alloc-fail). The direct motivation for this patch is that one of the drivers used at Meta calls napi_alloc_skb() (so prior to this patch without __GFP_NOWARN), and the resulting OOM warning is the top networking warning in our fleet. Reviewed-by: Alexander Lobakin <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 6e9b019) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # Documentation/translations/zh_CN/mm/page_frags.rst # Documentation/vm/page_frags.rst # drivers/net/ethernet/intel/ice/ice_txrx.c # drivers/net/ethernet/stmicro/stmmac/stmmac_main.c

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Alexander Lobakin <[email protected]> commit 4309363 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/4309363f.failed Page Pool Ethtool stats are deprecated since the Netlink Page Pool interface introduction. idpf receives big changes in Rx buffer management, including &page_pool layout, so keeping these deprecated stats does only harm, not to mention that CONFIG_IDPF selects CONFIG_PAGE_POOL_STATS unconditionally, while the latter is often turned off for better performance. Remove all the references to PP stats from the Ethtool code. The stats are still available in full via the generic Netlink interface. Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit 4309363) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # drivers/net/ethernet/intel/idpf/Kconfig # drivers/net/ethernet/intel/idpf/idpf_ethtool.c

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Alexander Lobakin <[email protected]> commit e4891e4 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/e4891e46.failed Currently, sizeof(struct idpf_queue) is 32 Kb. This is due to the 12-bit hashtable declaration at the end of the queue. This HT is needed only for Tx queues when the flow scheduling mode is enabled. But &idpf_queue is unified for all of the queue types, provoking excessive memory usage. The unified structure in general makes the code less effective via suboptimal field placement. You can't avoid that unless you put every 2 fields into a union. Even then, differing field alignment etc. doesn't allow you to optimize things to the limit. Split &idpf_queue into 4 structures corresponding to the queue types: RQ (Rx queue), SQ (Tx queue), FQ (buffer queue), and CQ (completion queue). Place only the needed fields and handy hotpath shortcuts there. Allocate the above-mentioned hashtable dynamically and only when needed, keeping &idpf_tx_queue relatively short (192 bytes, same as Rx). This HT is used only for OOO completions, which aren't really hotpath anyway. Note that this change must be done atomically, otherwise it's really easy to get lost and miss something. Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit e4891e4) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # drivers/net/ethernet/intel/idpf/idpf.h # drivers/net/ethernet/intel/idpf/idpf_ethtool.c # drivers/net/ethernet/intel/idpf/idpf_txrx.c # drivers/net/ethernet/intel/idpf/idpf_txrx.h
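Where the 32 Kb comes from, plus the "allocate only when needed" fix, as a userspace sketch. A 12-bit hashtable has 4096 buckets, and each bucket is a one-pointer list head (8 bytes on 64-bit), so 4096 × 8 = 32768 bytes. The struct names and the `stash` member are hypothetical stand-ins, not the split idpf structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SCHED_HT_BITS 12	/* "12-bit hashtable" from the text */

struct hlist_head_model { void *first; };	/* stand-in for hlist_head */

/* Before: every queue type embeds the table, Rx/buffer/completion too. */
struct unified_queue_model {
	unsigned int desc_count;
	struct hlist_head_model sched_buf_hash[1 << SCHED_HT_BITS];
};

/* After: a Tx-only struct with a pointer, set only for flow scheduling. */
struct tx_queue_model {
	unsigned int desc_count;
	struct hlist_head_model *stash;	/* NULL unless flow scheduling */
};

static int tx_queue_init(struct tx_queue_model *txq, bool flow_sched)
{
	txq->stash = NULL;
	if (!flow_sched)
		return 0;
	txq->stash = calloc(1u << SCHED_HT_BITS, sizeof(*txq->stash));
	return txq->stash ? 0 : -1;
}

static void tx_queue_release(struct tx_queue_model *txq)
{
	free(txq->stash);	/* free(NULL) is a no-op */
	txq->stash = NULL;
}
```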

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Alexander Lobakin <[email protected]> commit bf9bf70 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/bf9bf704.failed With CONFIG_MAXSMP, sizeof(cpumask_t) is 1 Kb. The queue vector structure has them embedded, which means 1 additional Kb of not really hotpath data. We have cpumask_var_t, which is either an embedded cpumask or a pointer for allocating it dynamically when it's big. Use it instead of plain cpumasks and put &idpf_q_vector on a good diet. Also remove redundant pointer to the interrupt name from the structure. request_irq() saves it and free_irq() returns it on deinit, so that you can free the memory. Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit bf9bf70) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # drivers/net/ethernet/intel/idpf/idpf_txrx.c # drivers/net/ethernet/intel/idpf/idpf_txrx.h

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Alexander Lobakin <[email protected]> commit 14f662b Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/14f662b4.failed It makes no sense to have a second &net_device_ops struct (800 bytes of rodata) with only one difference in .ndo_start_xmit, which can easily be just one `if`. This `if` is a drop in the ocean and you won't see any difference. Define unified idpf_xmit_start(). The preparation for sending is the same, just call either idpf_tx_splitq_frame() or idpf_tx_singleq_frame() depending on the active model to actually map and send the skb. Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit 14f662b) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c # drivers/net/ethernet/intel/idpf/idpf_txrx.h
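A sketch of the single-entry-point transmit described above: shared preparation, then one `if` on the active queue model, instead of a second `net_device_ops` table. All names here (`xmit_start`, `tx_splitq_frame`, `tx_singleq_frame`, the model structs) are hypothetical stand-ins whose bodies only count calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; names echo the text, bodies do not. */
struct skb_model { unsigned int len; };
struct txq_model { bool splitq; unsigned int sent_split, sent_single; };

static int tx_splitq_frame(struct txq_model *q, struct skb_model *skb)
{
	(void)skb;
	q->sent_split++;
	return 0;
}

static int tx_singleq_frame(struct txq_model *q, struct skb_model *skb)
{
	(void)skb;
	q->sent_single++;
	return 0;
}

/* One entry point: common checks, then a single branch on the model. */
static int xmit_start(struct txq_model *q, struct skb_model *skb)
{
	if (!skb->len)		/* shared sanity check */
		return -1;

	return q->splitq ? tx_splitq_frame(q, skb) : tx_singleq_frame(q, skb);
}
```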

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Alexander Lobakin <[email protected]> commit f771314 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/f771314d.failed Currently, all HW supporting idpf supports the singleq model, but none of it advertises it by default, as splitq is supported and preferred for multiple reasons. Still, this almost dead code often times adds hotpath branches and redundant cacheline accesses. While it can't currently be removed, add CONFIG_IDPF_SINGLEQ and build the singleq code only when it's enabled manually. This corresponds to -10 Kb of object code size and a good bunch of hotpath checks. idpf_is_queue_model_split() works as a gate and compiles out to `true` when the config option is disabled. Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit f771314) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # drivers/net/ethernet/intel/Kconfig
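The config gate can be sketched as below, assuming the macro is named after the `CONFIG_IDPF_SINGLEQ` option from the text. The enum and `is_queue_model_split()` are hypothetical stand-ins for `idpf_is_queue_model_split()`. With the option undefined, the helper is a constant `true`, so singleq branches become dead code.

```c
#include <assert.h>
#include <stdbool.h>

enum queue_model { QUEUE_MODEL_SINGLE, QUEUE_MODEL_SPLIT };	/* hypothetical */

#ifdef CONFIG_IDPF_SINGLEQ
static inline bool is_queue_model_split(enum queue_model m)
{
	return m == QUEUE_MODEL_SPLIT;
}
#else
/* singleq code not built: every `if (!is_queue_model_split(...))` path
 * is compiled out because the gate is a compile-time constant. */
static inline bool is_queue_model_split(enum queue_model m)
{
	(void)m;
	return true;
}
#endif
```

The test below assumes the option is left undefined.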

jira LE-3467 cve CVE-2024-44964 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Alexander Lobakin <[email protected]> commit f01032a The second tagged commit introduced a UAF, as it removed restoring q_vector->vport pointers after reinitializing the structures. This is because all queue allocation functions are performed here with the new temporary vport structure and those functions rewrite the backpointers to the vport. Then, this new struct is freed and the pointers start leading to nowhere. But generally speaking, the current logic is very fragile. It claims to be more reliable when the system is low on memory, but in fact, it consumes twice as much memory, as at the moment of running this function there are two vports allocated with their queues and vectors. Moreover, it claims to prevent the driver from running into "bad state", but in fact, any error during the rebuild leaves the old vport in the partially allocated state. Finally, if the interface is down when the function is called, it always allocates a new queue set, but when the user decides to enable the interface later on, vport_open() allocates them once again, IOW there's a clear memory leak here. Just don't allocate a new queue set when performing a reset, that solves crashes and memory leaks. Re-add the old queue number and reopen the interface on rollback - that solves limbo states when the device is left disabled and/or without HW queues enabled. Fixes: 02cbfba ("idpf: add ethtool callbacks") Fixes: e4891e4 ("idpf: split &idpf_queue into 4 strictly-typed queue structures") Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Krishneil Singh <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit f01032a) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Michal Kubiak <[email protected]> commit 3cc88e8 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/3cc88e84.failed The initialization of vport interrupt consists of two functions: 1) idpf_vport_intr_init() where a generic configuration is done 2) idpf_vport_intr_req_irq() where the irq for each q_vector is requested. The first function used to create a base name for each interrupt using "kasprintf()" call. Unfortunately, although that call allocated memory for a text buffer, that memory was never released. Fix this by removing creating the interrupt base name in 1). Instead, always create a full interrupt name in the function 2), because there is no need to create a base name separately, considering that the function 2) is never called out of idpf_vport_intr_init() context. Fixes: d4d5587 ("idpf: initialize interrupts and enable vport") Cc: [email protected] # 6.7 Signed-off-by: Michal Kubiak <[email protected]> Reviewed-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Krishneil Singh <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 3cc88e8) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # drivers/net/ethernet/intel/idpf/idpf_txrx.c

jira LE-3467 cve CVE-2024-44932 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Alexander Lobakin <[email protected]> commit 290f1c0 The second tagged commit started occasionally (very rarely, but possibly) throwing WARNs from net/core/page_pool.c:page_pool_disable_direct_recycling(). Turned out idpf frees interrupt vectors with embedded NAPIs *before* freeing the queues making page_pools' NAPI pointers lead to freed memory before these pools are destroyed by libeth. It's not clear whether there are other accesses to the freed vectors when destroying the queues, but anyway, we usually free queue/interrupt vectors only when the queues are destroyed and the NAPIs are guaranteed to not be referenced anywhere. Invert the allocation and freeing logic making queue/interrupt vectors be allocated first and freed last. Vectors don't require queues to be present, so this is safe. Additionally, this change allows removing that useless queue->q_vector pointer cleanup, as vectors are still valid when freeing the queues (+ both are freed within one function, so it's not clear why nullify the pointers at all). Fixes: 1c325aa ("idpf: configure resources for TX queues") Fixes: 90912f9 ("idpf: convert header split mode to libeth + napi_build_skb()") Reported-by: Michal Kubiak <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Krishneil Singh <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 290f1c0) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Michal Kubiak <[email protected]> commit e4b398d Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/e4b398dd.failed netif_txq_maybe_stop() returns -1, 0, or 1, while idpf_tx_maybe_stop_common() says it returns 0 or -EBUSY. As a result, there are sometimes Tx queue timeout warnings despite the queue being empty or having at least enough space to restart it. Make idpf_tx_maybe_stop_common() inline and have it return true or false, handling the return of netif_txq_maybe_stop() properly. Use a correct goto in idpf_tx_maybe_stop_splitq() to avoid stopping the queue or incrementing the stops counter twice. Fixes: 6818c4d ("idpf: add splitq start_xmit") Fixes: a5ab9ee ("idpf: add singleq start_xmit and napi poll") Cc: [email protected] # 6.7+ Signed-off-by: Michal Kubiak <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit e4b398d) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # drivers/net/ethernet/intel/idpf/idpf_txrx.c # drivers/net/ethernet/intel/idpf/idpf_txrx.h
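A sketch of mapping the tri-state result onto a bool. It assumes the return convention documented for `netif_txq_maybe_stop()` in `include/net/netdev_queues.h`: 0 = queue stopped, 1 = left enabled, -1 = stopped and then re-enabled after racing with a completion. Verify this against the target tree. `txq_maybe_stop_model()` is a hypothetical stand-in, not the kernel macro. Treating any non-zero value as "busy" would wrongly report both 1 and -1 as stopped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of netif_txq_maybe_stop()'s result:
 *    1  enough descriptors, queue left enabled
 *    0  queue stopped
 *   -1  queue stopped, then re-enabled by a racing completion */
static int txq_maybe_stop_model(unsigned int unused, unsigned int needed,
				bool completion_raced)
{
	if (unused >= needed)
		return 1;
	return completion_raced ? -1 : 0;
}

/* Fixed helper: true only if the queue actually stayed stopped. */
static inline bool tx_maybe_stop_common(unsigned int unused, unsigned int needed,
					bool raced)
{
	return !txq_maybe_stop_model(unused, needed, raced);
}
```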

jira LE-3467 cve CVE-2024-50274 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Pavan Kumar Linga <[email protected]> commit 81d2fb4 When the device control plane is removed or the platform running device control plane is rebooted, a reset is detected on the driver. On driver reset, it releases the resources and waits for the reset to complete. If the reset fails, it takes the error path and releases the vport lock. At this time, if monitoring tools try to access link settings, a call trace is emitted for accessing the released vport pointer. To avoid it, move link_speed_mbps to netdev_priv structure which removes the dependency on vport pointer and the vport lock in idpf_get_link_ksettings. Also use netif_carrier_ok() to check the link status and adjust the offsetof to use link_up instead of link_speed_mbps. Fixes: 02cbfba ("idpf: add ethtool callbacks") Cc: [email protected] # 6.7+ Reviewed-by: Tarun K Singh <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Tested-by: Krishneil Singh <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit 81d2fb4) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467 cve CVE-2024-53064 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Pavan Kumar Linga <[email protected]> commit 9b58031 In an event where the platform running the device control plane is rebooted, reset is detected on the driver. It releases all the resources and waits for the reset to complete. Once the reset is done, it tries to build the resources back. At this time if the device control plane is not yet started, then the driver times out on the virtchnl message and retries to establish the mailbox again. In the retry flow, mailbox is deinitialized but the mailbox workqueue is still alive and polling for the mailbox message. This results in accessing the released control queue leading to null-ptr-deref. Fix it by unrolling the work queue cancellation and mailbox deinitialization in the reverse order in which they were initialized. Fixes: 4930fbf ("idpf: add core init and interrupt request") Fixes: 34c21fa ("idpf: implement virtchnl transaction manager") Cc: [email protected] # 6.9+ Reviewed-by: Tarun K Singh <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Tested-by: Krishneil Singh <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit 9b58031) Signed-off-by: Jonathan Maple <[email protected]>
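The reverse-order unwind can be modelled with two flags. This sketch is an assumption-laden simplification: `struct adapter_model`, `mbox_work_poll()` and both deinit variants are hypothetical. In the kernel, "deactivate the poller" would be a synchronous work cancellation rather than a flag. Init brings up the mailbox and then the work that polls it. Deinit must cancel the work first, otherwise a late poll touches the torn-down mailbox.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a mailbox plus a polling work item that reads it. */
struct adapter_model {
	bool mbox_ready;
	bool mbox_work_active;
	bool use_after_free;	/* set if the work touches a dead mailbox */
};

static void mbox_work_poll(struct adapter_model *a)
{
	if (a->mbox_work_active && !a->mbox_ready)
		a->use_after_free = true;
}

static void core_init(struct adapter_model *a)
{
	a->mbox_ready = true;		/* 1st: mailbox */
	a->mbox_work_active = true;	/* 2nd: work polling it */
}

/* Buggy order: mailbox released while the poller is still alive. */
static void core_deinit_wrong(struct adapter_model *a)
{
	a->mbox_ready = false;
	mbox_work_poll(a);
	a->mbox_work_active = false;
}

/* Reverse of init: stop the poller, then release what it polls. */
static void core_deinit(struct adapter_model *a)
{
	a->mbox_work_active = false;
	mbox_work_poll(a);		/* a late run now finds nothing to do */
	a->mbox_ready = false;
}
```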

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Joshua Hay <[email protected]> commit 52c11d3 On initial driver load, alloc_etherdev_mqs is called with whatever max queue values are provided by the control plane. However, if the driver is loaded on a system where num_online_cpus() returns fewer than the max queues, the netdev will think there are more queues than are actually available. Only num_online_cpus() will be allocated, but skb_get_queue_mapping(skb) could possibly return an index beyond the range of allocated queues. Consequently, the packet is silently dropped and it appears as if TX is broken. Set the real number of queues during open so the netdev knows how many queues will be allocated. Fixes: 1c325aa ("idpf: configure resources for TX queues") Signed-off-by: Joshua Hay <[email protected]> Reviewed-by: Madhu Chittim <[email protected]> Tested-by: Samuel Salin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit 52c11d3) Signed-off-by: Jonathan Maple <[email protected]>
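A simplified model of why the real queue count matters. The netdev is sized for the control plane's maximum, but only `min(max, online CPUs)` queues exist. In the kernel the count is published with `netif_set_real_num_tx_queues()`. Here `struct netdev_model`, `alloc_txqs()` and `pick_txq()` are hypothetical, and the stack's flow-to-queue selection is reduced to a plain modulo for illustration.

```c
#include <assert.h>

/* Hypothetical model of the two queue counts a netdev carries. */
struct netdev_model {
	unsigned int num_tx_queues;	 /* sized at alloc_etherdev_mqs() */
	unsigned int real_num_tx_queues; /* what queue selection uses */
};

/* Only this many queues are actually allocated. */
static unsigned int alloc_txqs(unsigned int max_q, unsigned int online_cpus)
{
	return max_q < online_cpus ? max_q : online_cpus;
}

/* Simplified stand-in for the stack's queue selection. */
static unsigned int pick_txq(const struct netdev_model *nd, unsigned int flow_hash)
{
	return flow_hash % nd->real_num_tx_queues;
}
```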

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Emil Tantilov <[email protected]> commit 137da75 Restore the call to idpf_vc_xn_shutdown() at the beginning of idpf_vc_core_deinit() provided the function is not called on remove. In the reset path the mailbox is destroyed, leading to all transactions timing out. Fixes: 09d0fb5 ("idpf: deinit virtchnl transaction manager after vport and vectors") Reviewed-by: Larysa Zaremba <[email protected]> Signed-off-by: Emil Tantilov <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Krishneil Singh <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit 137da75) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Manoj Vishwanathan <[email protected]> commit d15fe4e The transaction salt was being accessed before acquiring the idpf_vc_xn_lock when idpf has to forward the virtchnl reply. Fixes: 34c21fa ("idpf: implement virtchnl transaction manager") Signed-off-by: Manoj Vishwanathan <[email protected]> Signed-off-by: David Decotigny <[email protected]> Signed-off-by: Brian Vazquez <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Pavan Kumar Linga <[email protected]> Tested-by: Krishneil Singh <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit d15fe4e) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467 cve CVE-2024-58057 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Marco Leogrande <[email protected]> commit 9a5b021 When a workqueue is created with `WQ_UNBOUND`, its work items are served by special worker-pools, whose host workers are not bound to any specific CPU. In the default configuration (i.e. when `queue_delayed_work` and friends do not specify which CPU to run the work item on), `WQ_UNBOUND` allows the work item to be executed on any CPU in the same node of the CPU it was enqueued on. While this solution potentially sacrifices locality, it avoids contention with other processes that might dominate the CPU time of the processor the work item was scheduled on. This is not just a theoretical problem: in a particular scenario a misconfigured process was hogging most of the time from CPU0, leaving less than 0.5% of its CPU time to the kworker. The IDPF workqueues that were using the kworker on CPU0 suffered large completion delays as a result, causing performance degradation, timeouts and eventual system crash. Tested: * I have also run a manual test to gauge the performance improvement. The test consists of an antagonist process (`./stress --cpu 2`) consuming as much of CPU 0 as possible. This process is run under `taskset 01` to bind it to CPU0, and its priority is changed with `chrt -pQ 9900 10000 ${pid}` and `renice -n -20 ${pid}` after start. Then, the IDPF driver is forced to prefer CPU0 by editing all calls to `queue_delayed_work`, `mod_delayed_work`, etc... to use CPU 0. Finally, `ktraces` for the workqueue events are collected. Without the current patch, the antagonist process can force arbitrary delays between `workqueue_queue_work` and `workqueue_execute_start`, that in my tests were as high as `30ms`. With the current patch applied, the workqueue can be migrated to another unloaded CPU in the same node, and, keeping everything else equal, the maximum delay I could see was `6us`. 
Fixes: 0fe4546 ("idpf: add create vport and netdev configuration") Signed-off-by: Marco Leogrande <[email protected]> Signed-off-by: Manoj Vishwanathan <[email protected]> Signed-off-by: Brian Vazquez <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Pavan Kumar Linga <[email protected]> Tested-by: Krishneil Singh <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit 9a5b021) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Manoj Vishwanathan <[email protected]> commit d0ea9eb Add more information related to the transaction like cookie, vc_op, salt when transaction times out and include similar information when transaction salt does not match. Info output for transaction timeout: ------------------- (op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms) ------------------- before it was: ------------------- (op 5015, 60000ms) ------------------- Signed-off-by: Manoj Vishwanathan <[email protected]> Signed-off-by: Brian Vazquez <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Pavan Kumar Linga <[email protected]> Reviewed-by: Paul Menzel <[email protected]> Tested-by: Krishneil Singh <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit d0ea9eb) Signed-off-by: Jonathan Maple <[email protected]>
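A sketch that reproduces the sample timeout line above. The format string is reconstructed from the example output and is an assumption, not the exact driver string. The cookie layout (salt in the high byte, transaction index in the low byte) is likewise inferred from the sample: 0x45fe matches salt 0x45 and index 0xfe. `make_cookie()` and `format_timeout()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cookie layout inferred from the sample output. */
static unsigned int make_cookie(unsigned int salt, unsigned int idx)
{
	return ((salt & 0xff) << 8) | (idx & 0xff);
}

/* Format string reconstructed from the sample, not copied from idpf. */
static int format_timeout(char *buf, size_t len, unsigned int op,
			  unsigned int cookie, unsigned int vc_op,
			  unsigned int salt, unsigned int timeout_ms)
{
	return snprintf(buf, len,
			"(op:%u cookie:%04x vc_op:%u salt:%02x timeout:%ums)",
			op, cookie, vc_op, salt, timeout_ms);
}
```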

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Sridhar Samudrala <[email protected]> commit 69ab25a Handle an RSC packet with a single segment the same as a multi-segment RSC packet so that CHECKSUM_PARTIAL is set in the skb->ip_summed field. The current code is passing CHECKSUM_NONE resulting in TCP GRO layer doing checksum in SW and hiding the issue. This will fail when using dmabufs as payload buffers as skb frag would be unreadable. Fixes: 3a8845a ("idpf: add RX splitq napi poll support") Signed-off-by: Sridhar Samudrala <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Tested-by: Samuel Salin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit 69ab25a) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467 cve CVE-2025-21890 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Eric Dumazet <[email protected]> commit 674fcb4 idpf_rx_rsc() uses skb_transport_offset(skb) while the transport header is not set yet. This triggers the following warning for CONFIG_DEBUG_NET=y builds. DEBUG_NET_WARN_ON_ONCE(!skb_transport_header_was_set(skb)) [ 69.261620] WARNING: CPU: 7 PID: 0 at ./include/linux/skbuff.h:3020 idpf_vport_splitq_napi_poll (include/linux/skbuff.h:3020) idpf [ 69.261629] Modules linked in: vfat fat dummy bridge intel_uncore_frequency_tpmi intel_uncore_frequency_common intel_vsec_tpmi idpf intel_vsec cdc_ncm cdc_eem cdc_ether usbnet mii xhci_pci xhci_hcd ehci_pci ehci_hcd libeth [ 69.261644] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Tainted: G S W 6.14.0-smp-DEV #1697 [ 69.261648] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN [ 69.261650] RIP: 0010:idpf_vport_splitq_napi_poll (include/linux/skbuff.h:3020) idpf [ 69.261677] ? __warn (kernel/panic.c:242 kernel/panic.c:748) [ 69.261682] ? idpf_vport_splitq_napi_poll (include/linux/skbuff.h:3020) idpf [ 69.261687] ? report_bug (lib/bug.c:?) [ 69.261690] ? handle_bug (arch/x86/kernel/traps.c:285) [ 69.261694] ? exc_invalid_op (arch/x86/kernel/traps.c:309) [ 69.261697] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) [ 69.261700] ? __pfx_idpf_vport_splitq_napi_poll (drivers/net/ethernet/intel/idpf/idpf_txrx.c:4011) idpf [ 69.261704] ? idpf_vport_splitq_napi_poll (include/linux/skbuff.h:3020) idpf [ 69.261708] ? idpf_vport_splitq_napi_poll (drivers/net/ethernet/intel/idpf/idpf_txrx.c:3072) idpf [ 69.261712] __napi_poll (net/core/dev.c:7194) [ 69.261716] net_rx_action (net/core/dev.c:7265) [ 69.261718] ? __qdisc_run (net/sched/sch_generic.c:293) [ 69.261721] ? sched_clock (arch/x86/include/asm/preempt.h:84 arch/x86/kernel/tsc.c:288) [ 69.261726] handle_softirqs (kernel/softirq.c:561) Fixes: 3a8845a ("idpf: add RX splitq napi poll support") Signed-off-by: Eric Dumazet <[email protected]> Cc: Alan Brady <[email protected]> Cc: Joshua Hay <[email protected]> Cc: Willem de Bruijn <[email protected]> Acked-by: Przemek Kitszel <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 674fcb4) Signed-off-by: Jonathan Maple <[email protected]>
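A simplified model of the "offset read before the header is set" bug. It assumes, as in `sk_buff`, that an all-ones header offset means "not set", which is what `skb_transport_header_was_set()` checks. `struct skb_hdrs`, `HDR_UNSET` and `transport_offset()` are hypothetical, and `-1` stands in for the `DEBUG_NET_WARN_ON_ONCE()` splat. The fix is to set the transport header from the parsed L3 length before any offset is computed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified header offsets, measured from the buffer head. */
struct skb_hdrs {
	uint16_t mac_header;
	uint16_t network_header;
	uint16_t transport_header;
};

#define HDR_UNSET ((uint16_t)~0U)	/* all-ones: "not set" */

static bool transport_header_was_set(const struct skb_hdrs *s)
{
	return s->transport_header != HDR_UNSET;
}

/* data: offset of skb->data from the head. Returns -1 where the kernel
 * would hit DEBUG_NET_WARN_ON_ONCE(). */
static int transport_offset(const struct skb_hdrs *s, uint16_t data)
{
	if (!transport_header_was_set(s))
		return -1;
	return s->transport_header - data;
}
```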

jira LE-3467 cve CVE-2025-22065 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Emil Tantilov <[email protected]> commit 4c9106f With SRIOV enabled, idpf ends up calling into idpf_remove() twice. First via idpf_shutdown() and then again when idpf_remove() calls into sriov_disable(), because the VF devices use the idpf driver, hence the same remove routine. When that happens, it is possible for the adapter to be NULL from the first call to idpf_remove(), leading to a NULL pointer dereference. echo 1 > /sys/class/net/<netif>/device/sriov_numvfs reboot BUG: kernel NULL pointer dereference, address: 0000000000000020 ... RIP: 0010:idpf_remove+0x22/0x1f0 [idpf] ... ? idpf_remove+0x22/0x1f0 [idpf] ? idpf_remove+0x1e4/0x1f0 [idpf] pci_device_remove+0x3f/0xb0 device_release_driver_internal+0x19f/0x200 pci_stop_bus_device+0x6d/0x90 pci_stop_and_remove_bus_device+0x12/0x20 pci_iov_remove_virtfn+0xbe/0x120 sriov_disable+0x34/0xe0 idpf_sriov_configure+0x58/0x140 [idpf] idpf_remove+0x1b9/0x1f0 [idpf] idpf_shutdown+0x12/0x30 [idpf] pci_device_shutdown+0x35/0x60 device_shutdown+0x156/0x200 ... Replace the direct idpf_remove() call in idpf_shutdown() with idpf_vc_core_deinit() and idpf_deinit_dflt_mbx(), which perform the bulk of the cleanup, such as stopping the init task, freeing IRQs, destroying the vports and freeing the mailbox. This avoids the calls to sriov_disable() in addition to a small netdev cleanup, and destroying workqueues, which don't seem to be required on shutdown. Reported-by: Yuying Ma <[email protected]> Fixes: e850efe ("idpf: add module register and probe functionality") Reviewed-by: Madhu Chittim <[email protected]> Signed-off-by: Emil Tantilov <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Samuel Salin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit 4c9106f) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Madhu Chittim <[email protected]> commit 713dd6c Split offloads into csum, tso and other offloads so that tunneled packets do not by default have all the offloads enabled. Stateless offloads for encapsulated packets are not yet supported in firmware/software but in the driver we were setting the same features as for non-encapsulated packets. Fixed naming to clarify CSUM bits are being checked for Tx. Let VLAN interfaces inherit the netdev features as well. Fixes: 0fe4546 ("idpf: add create vport and netdev configuration") Reviewed-by: Sridhar Samudrala <[email protected]> Signed-off-by: Madhu Chittim <[email protected]> Tested-by: Zachary Goldstein <[email protected]> Tested-by: Samuel Salin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 713dd6c) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Michal Swiatkowski <[email protected]> commit 8a558cb In case of failing on rss_data->rss_key allocation the function is freeing vport without freeing earlier allocated q_vector_idxs. Fix it. Move from freeing in error branch to goto scheme. Fixes: d4d5587 ("idpf: initialize interrupts and enable vport") Reviewed-by: Pavan Kumar Linga <[email protected]> Reviewed-by: Aleksandr Loktionov <[email protected]> Suggested-by: Pavan Kumar Linga <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Samuel Salin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit 8a558cb) Signed-off-by: Jonathan Maple <[email protected]>
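The goto-based error unwind this fix moves to, as a userspace sketch with failure injection. `struct vport_m`, `try_alloc()`, `tracked_free()` and the buffer sizes are hypothetical. Each failure jumps to the label that frees exactly what has already succeeded, in reverse order. The leak described above corresponds to jumping straight to `free_vport` when the RSS key allocation fails.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical vport with two dependent allocations. */
struct vport_m {
	unsigned short *q_vector_idxs;
	unsigned char *rss_key;
};

static int live, attempt, fail_at;	/* fail_at: make Nth allocation fail */

static void *try_alloc(size_t n)
{
	void *p;

	if (++attempt == fail_at)
		return NULL;		/* injected failure */
	p = malloc(n);
	if (p)
		live++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live--;
		free(p);
	}
}

static struct vport_m *vport_alloc(void)
{
	struct vport_m *vp = try_alloc(sizeof(*vp));

	if (!vp)
		return NULL;

	vp->q_vector_idxs = try_alloc(16 * sizeof(*vp->q_vector_idxs));
	if (!vp->q_vector_idxs)
		goto free_vport;

	vp->rss_key = try_alloc(40);	/* arbitrary size */
	if (!vp->rss_key)
		goto free_vector_idxs;

	return vp;

free_vector_idxs:
	tracked_free(vp->q_vector_idxs);
free_vport:
	tracked_free(vp);
	return NULL;
}
```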

jira LE-3467 Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10 commit-author Larysa Zaremba <[email protected]> commit ed375b1 Before the referenced commit, the shutdown just called idpf_remove(); this way, IDPF_REMOVE_IN_PROG was protecting us from the serv_task rescheduling reset. Without this flag set, the shutdown process is vulnerable to HW reset or any other triggering conditions (such as the default mailbox being destroyed). When one of the conditions checked in idpf_service_task becomes true, vc_event_task can be rescheduled during shutdown, which leads to accessing freed memory, e.g. idpf_req_rel_vector_indexes() trying to read vport->q_vector_idxs. This in turn causes the system to become defunct during e.g. systemctl kexec. Considering that using IDPF_REMOVE_IN_PROG would lead to a heavier shutdown process, instead just cancel adapter->serv_task before cancelling adapter->vc_event_task to ensure that reset will not be scheduled while we are doing a shutdown. Fixes: 4c9106f ("idpf: fix adapter NULL pointer dereference on reboot") Reviewed-by: Michal Swiatkowski <[email protected]> Signed-off-by: Larysa Zaremba <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Emil Tantilov <[email protected]> Tested-by: Samuel Salin <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> (cherry picked from commit ed375b1) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Pavan Kumar Linga <[email protected]>
commit 2dabe34

idpf_features_check is used to validate the TX packet. skb header length is compared with the hardware supported value received from the device control plane. The value is stored in the adapter structure and to access it, vport pointer is used. During reset all the vports are released and the vport pointer that the netdev private structure points to is NULL.

To avoid null-ptr-deref, store the max header length value in netdev private structure. This also helps to cache the value and avoid accessing adapter pointer in hot path.

BUG: kernel NULL pointer dereference, address: 0000000000000068
...
RIP: 0010:idpf_features_check+0x6d/0xe0 [idpf]
Call Trace:
 <TASK>
 ? __die+0x23/0x70
 ? page_fault_oops+0x154/0x520
 ? exc_page_fault+0x76/0x190
 ? asm_exc_page_fault+0x26/0x30
 ? idpf_features_check+0x6d/0xe0 [idpf]
 netif_skb_features+0x88/0x310
 validate_xmit_skb+0x2a/0x2b0
 validate_xmit_skb_list+0x4c/0x70
 sch_direct_xmit+0x19d/0x3a0
 __dev_queue_xmit+0xb74/0xe70
 ...

Fixes: a251eee ("idpf: add SRIOV support and other ndo_ops")
Reviewed-by: Madhu Chititm <[email protected]>
Signed-off-by: Pavan Kumar Linga <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Samuel Salin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
(cherry picked from commit 2dabe34)
Signed-off-by: Jonathan Maple <[email protected]>
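The caching idea can be modelled in userspace. All names here (`struct netdev_priv_model`, `max_tx_hdr_size`, `FEAT_CSUM`, `FEAT_GSO`) are hypothetical stand-ins rather than the real idpf fields or netdev feature masks: the limit is copied into the private struct while the vport is valid, and the hot path never dereferences `vport`, so it keeps working after a reset sets that pointer to NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for the checksum/GSO masks. */
#define FEAT_CSUM 0x1u
#define FEAT_GSO  0x2u

struct adapter_model { uint16_t max_tx_hdr_size; };
struct vport_model { struct adapter_model *adapter; };

/* Netdev private data: vport may be NULL while a reset is in progress. */
struct netdev_priv_model {
	struct vport_model *vport;
	uint16_t max_tx_hdr_size;	/* cached copy, survives resets */
};

/* Cache the limit once, while vport and adapter are known to be valid. */
static void priv_cache_limits(struct netdev_priv_model *np)
{
	np->max_tx_hdr_size = np->vport->adapter->max_tx_hdr_size;
}

/* Hot path: reads only the cached value, never np->vport. */
static unsigned int features_check(const struct netdev_priv_model *np,
				   size_t hdr_len, unsigned int features)
{
	if (hdr_len > np->max_tx_hdr_size)
		return features & ~(FEAT_CSUM | FEAT_GSO);
	return features;
}
```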

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Eric Dumazet <[email protected]>
commit 407e0ef
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/407e0efd.failed

idpf_vport_splitq_napi_poll() can incorrectly return @budget after napi_complete_done() has been called.

This violates NAPI rules, because after napi_complete_done(), current thread lost napi ownership.

Move the test against POLL_MODE before the napi_complete_done().

Fixes: c2d548c ("idpf: add TX splitq napi poll support")
Reported-by: Peter Newman <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/T/#u
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Joshua Hay <[email protected]>
Cc: Alan Brady <[email protected]>
Cc: Madhu Chittim <[email protected]>
Cc: Phani Burra <[email protected]>
Cc: Pavan Kumar Linga <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 407e0ef)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/net/ethernet/intel/idpf/idpf_txrx.c
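The reordering can be illustrated with a userspace model. `struct napi_model`, `napi_complete_done_model()`, `poll_fixed()` and `poll_buggy()` are hypothetical stand-ins, not kernel APIs. The point being modelled: returning the full budget tells the NAPI core to keep polling, which is only legal while the poller still owns the instance, so the POLL_MODE check must come before ownership is given up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; not the kernel's struct napi_struct. */
struct napi_model {
	bool owned;			/* poller currently owns the instance */
	bool poll_mode;			/* stand-in for the POLL_MODE queue flag */
	bool returned_budget_unowned;	/* records the rule violation */
};

/* Stand-in for napi_complete_done(): gives up ownership. */
static void napi_complete_done_model(struct napi_model *n)
{
	n->owned = false;
}

static int min_int(int a, int b) { return a < b ? a : b; }

/* Fixed ordering: test POLL_MODE while ownership is still held. */
static int poll_fixed(struct napi_model *n, int work_done, int budget)
{
	if (n->poll_mode)
		return budget;		/* still owned, so this is legal */

	work_done = min_int(work_done, budget - 1);
	napi_complete_done_model(n);
	return work_done;
}

/* Old ordering, for comparison: the test runs after completion. */
static int poll_buggy(struct napi_model *n, int work_done, int budget)
{
	work_done = min_int(work_done, budget - 1);
	napi_complete_done_model(n);
	if (n->poll_mode) {
		n->returned_budget_unowned = !n->owned;
		return budget;		/* violates the NAPI contract */
	}
	return work_done;
}
```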

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Brian Vazquez <[email protected]>
commit 7292af0
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/7292af04.failed

Add a helper function to correctly handle the lockless synchronization when the sender needs to block. The paradigm is

	if (no_resources()) {
		stop_queue();
		barrier();
		if (!no_resources())
			restart_queue();
	}

netif_subqueue_maybe_stop already handles the paradigm correctly, but the code split the check for resources into three parts: the first one (descriptors) followed the protocol, but the other two (completions and tx_buf) were only doing the first part and so were race prone.

Luckily, the netif_subqueue_maybe_stop macro already allows you to use a function to evaluate the start/stop conditions, so the fix only requires the right helper function to evaluate all the conditions at once.

The patch removes idpf_tx_maybe_stop_common since it's no longer needed and instead adjusts separately the conditions for singleq and splitq.

Note that idpf_tx_buf_hw_update doesn't need to check for resources since that will be covered in idpf_tx_splitq_frame.

To reproduce:

Reduce the threshold for pending completions to increase the chances of hitting this pause by changing your kernel:

drivers/net/ethernet/intel/idpf/idpf_txrx.h

-#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq) ((txcq)->desc_count >> 1)
+#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq) ((txcq)->desc_count >> 4)

Use pktgen to force the host to push small pkts very aggressively:

./pktgen_sample02_multiqueue.sh -i eth1 -s 100 -6 -d $IP -m $MAC \
  -p 10000-10000 -t 16 -n 0 -v -x -c 64

Fixes: 6818c4d ("idpf: add splitq start_xmit")
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Madhu Chittim <[email protected]>
Signed-off-by: Josh Hay <[email protected]>
Signed-off-by: Brian Vazquez <[email protected]>
Signed-off-by: Luigi Rizzo <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Samuel Salin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
(cherry picked from commit 7292af0)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/net/ethernet/intel/idpf/idpf_txrx.c
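The stop/re-check/restart paradigm, with a single helper that evaluates all three resource conditions at once, can be sketched with C11 atomics. This is an illustrative userspace model, not `netif_subqueue_maybe_stop` or the idpf helper: `struct txq_model`, `txq_has_room()`, `txq_maybe_stop()`, `txq_complete()` and the thresholds are hypothetical, and `atomic_thread_fence()` plays the role of `barrier()` in the paradigm.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical TX queue with the three resources the commit lists. */
struct txq_model {
	atomic_uint free_descs;
	atomic_uint pending_compls;
	atomic_uint free_bufs;
	atomic_bool stopped;
};

/* Arbitrary illustration thresholds. */
#define MIN_DESCS    8u
#define MAX_PENDING 32u
#define MIN_BUFS     4u

static void txq_init(struct txq_model *q, unsigned int descs,
		     unsigned int pending, unsigned int bufs)
{
	atomic_init(&q->free_descs, descs);
	atomic_init(&q->pending_compls, pending);
	atomic_init(&q->free_bufs, bufs);
	atomic_init(&q->stopped, false);
}

/* One helper evaluating every condition, so none can skip the protocol. */
static bool txq_has_room(struct txq_model *q)
{
	return atomic_load(&q->free_descs) >= MIN_DESCS &&
	       atomic_load(&q->pending_compls) < MAX_PENDING &&
	       atomic_load(&q->free_bufs) >= MIN_BUFS;
}

/* Sender side. Returns true if the queue is left stopped. */
static bool txq_maybe_stop(struct txq_model *q)
{
	if (txq_has_room(q))
		return false;

	atomic_store(&q->stopped, true);
	/* Mirrors barrier(): the default seq_cst atomics already order this,
	 * the fence just makes the step explicit. */
	atomic_thread_fence(memory_order_seq_cst);

	if (txq_has_room(q)) {
		atomic_store(&q->stopped, false);	/* raced with a completion */
		return false;
	}
	return true;
}

/* Completion side: release resources, then wake the queue if needed. */
static void txq_complete(struct txq_model *q, unsigned int descs,
			 unsigned int compls, unsigned int bufs)
{
	atomic_fetch_add(&q->free_descs, descs);
	atomic_fetch_sub(&q->pending_compls, compls);
	atomic_fetch_add(&q->free_bufs, bufs);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load(&q->stopped) && txq_has_room(q))
		atomic_store(&q->stopped, false);
}
```

The re-check after stopping is what makes the pattern safe: either the completer sees `stopped` and wakes the queue, or the sender sees the freed resources and restarts it itself.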

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Emil Tantilov <[email protected]>
commit 9dc63d8

Mailbox operations are not possible while the driver is in reset. Operations that require MBX exchange with the control plane will result in long delays if executed while a reset is in progress:

ethtool -L <inf> combined 8&
echo 1 > /sys/class/net/<inf>/device/reset
idpf 0000:83:00.0: HW reset detected
idpf 0000:83:00.0: Device HW Reset initiated
idpf 0000:83:00.0: Transaction timed-out (op:504 cookie:be00 vc_op:504 salt:be timeout:2000ms)
idpf 0000:83:00.0: Transaction timed-out (op:508 cookie:bf00 vc_op:508 salt:bf timeout:2000ms)
idpf 0000:83:00.0: Transaction timed-out (op:512 cookie:c000 vc_op:512 salt:c0 timeout:2000ms)
idpf 0000:83:00.0: Transaction timed-out (op:510 cookie:c100 vc_op:510 salt:c1 timeout:2000ms)
idpf 0000:83:00.0: Transaction timed-out (op:509 cookie:c200 vc_op:509 salt:c2 timeout:60000ms)
idpf 0000:83:00.0: Transaction timed-out (op:509 cookie:c300 vc_op:509 salt:c3 timeout:60000ms)
idpf 0000:83:00.0: Transaction timed-out (op:505 cookie:c400 vc_op:505 salt:c4 timeout:60000ms)
idpf 0000:83:00.0: Failed to configure queues for vport 0, -62

Disable mailbox communication in case of a reset, unless it's done during a driver load, where the virtchnl operations are needed to configure the device.

Fixes: 8077c72 ("idpf: add controlq init and reset checks")
Co-developed-by: Joshua Hay <[email protected]>
Signed-off-by: Joshua Hay <[email protected]>
Signed-off-by: Emil Tantilov <[email protected]>
Reviewed-by: Ahmed Zaki <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Samuel Salin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
(cherry picked from commit 9dc63d8)
Signed-off-by: Jonathan Maple <[email protected]>
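The gating rule ("no mailbox traffic during a reset, unless the reset is part of driver load") can be expressed as a small predicate. This is a hedged userspace sketch: `struct adapter_model`, `mbx_allowed()`, `mbx_send()` and the choice of `-EBUSY` are hypothetical and do not reflect the actual idpf flags, functions or error codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical adapter state; not the idpf flag names. */
struct adapter_model {
	bool reset_in_progress;
	bool driver_load;	/* reset performed as part of initial load */
	int sent;		/* messages actually handed to the mailbox */
};

/* Mailbox is usable unless a reset is in progress outside driver load. */
static bool mbx_allowed(const struct adapter_model *a)
{
	return !a->reset_in_progress || a->driver_load;
}

/*
 * Fail fast instead of queuing a transaction that would only time out
 * (the commit's log shows 2000 ms and 60000 ms waits per operation).
 */
static int mbx_send(struct adapter_model *a)
{
	if (!mbx_allowed(a))
		return -EBUSY;	/* hypothetical error code choice */
	a->sent++;
	return 0;
}
```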

jira LE-3467
cve CVE-2025-22116
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Emil Tantilov <[email protected]>
commit 680811c
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/680811c6.failed

Current init logic ignores the error code from register_netdev(), which will cause WARN_ON() on attempt to unregister it, if there was one, and there is no info for the user that the creation of the netdev failed.

WARNING: CPU: 89 PID: 6902 at net/core/dev.c:11512 unregister_netdevice_many_notify+0x211/0x1a10
...
[ 3707.563641]  unregister_netdev+0x1c/0x30
[ 3707.563656]  idpf_vport_dealloc+0x5cf/0xce0 [idpf]
[ 3707.563684]  idpf_deinit_task+0xef/0x160 [idpf]
[ 3707.563712]  idpf_vc_core_deinit+0x84/0x320 [idpf]
[ 3707.563739]  idpf_remove+0xbf/0x780 [idpf]
[ 3707.563769]  pci_device_remove+0xab/0x1e0
[ 3707.563786]  device_release_driver_internal+0x371/0x530
[ 3707.563803]  driver_detach+0xbf/0x180
[ 3707.563816]  bus_remove_driver+0x11b/0x2a0
[ 3707.563829]  pci_unregister_driver+0x2a/0x250

Introduce an error check and log the vport number and error code. On removal make sure to check VPORT_REG_NETDEV flag prior to calling unregister and free on the netdev.

Add local variables for idx, vport_config and netdev for readability.

Fixes: 0fe4546 ("idpf: add create vport and netdev configuration")
Suggested-by: Tony Nguyen <[email protected]>
Signed-off-by: Emil Tantilov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Samuel Salin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
(cherry picked from commit 680811c)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/net/ethernet/intel/idpf/idpf_lib.c
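The two halves of this fix (check the registration result, and only unregister what was registered) can be modelled in userspace. `struct vport_model`, `register_netdev_model()`, `vport_init()` and `vport_dealloc()` are hypothetical; the `reg_netdev` boolean stands in for the VPORT_REG_NETDEV flag, and a call counter replaces the real unregister.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model; names only echo the commit message. */
struct vport_model {
	int idx;
	bool reg_netdev;	/* stand-in for the VPORT_REG_NETDEV flag */
	int unregister_calls;
};

/* Stand-in for register_netdev(): returns 0 or a negative errno. */
static int register_netdev_model(int simulated_err)
{
	return simulated_err;
}

static int vport_init(struct vport_model *vport, int simulated_err)
{
	int err = register_netdev_model(simulated_err);

	if (err) {
		/* Tell the user which vport failed and why. */
		fprintf(stderr, "failed to register netdev for vport %d: %d\n",
			vport->idx, err);
		return err;
	}
	vport->reg_netdev = true;
	return 0;
}

static void vport_dealloc(struct vport_model *vport)
{
	/* Only unregister a netdev that was actually registered. */
	if (vport->reg_netdev) {
		vport->unregister_calls++;
		vport->reg_netdev = false;
	}
}
```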

…_rcu()

jira LE-3467
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Jiri Pirko <[email protected]>
commit 2034d90

Make the net pointer stored in possible_net_t structure annotated as an RCU pointer. Change the access helpers to treat it as such. Introduce read_pnet_rcu() helper to allow caller to dereference the net pointer under RCU read lock.

Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 2034d90)
Signed-off-by: Jonathan Maple <[email protected]>
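The publish/read discipline behind an RCU-annotated pointer can be approximated with C11 atomics. This is only an analogy under stated assumptions: `possible_net_model`, `write_pnet_model()` and `read_pnet_rcu_model()` are hypothetical, release/acquire ordering stands in for the RCU pointer-publication and dereference primitives, and nothing here models grace periods or `rcu_read_lock()`, which is what actually keeps the old object alive in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct net_model { int id; };

/* Userspace analogue of a possible_net_t holding an RCU-style pointer. */
typedef struct {
	_Atomic(struct net_model *) net;
} possible_net_model;

/* Writer: publish with release ordering so readers see an initialized
 * object (the role rcu_assign_pointer() plays in the kernel). */
static void write_pnet_model(possible_net_model *pnet, struct net_model *net)
{
	atomic_store_explicit(&pnet->net, net, memory_order_release);
}

/* Reader: acquire load, a conservative stand-in for rcu_dereference().
 * In the kernel this read must happen inside an RCU read-side section. */
static struct net_model *read_pnet_rcu_model(possible_net_model *pnet)
{
	return atomic_load_explicit(&pnet->net, memory_order_acquire);
}
```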

jira LE-3467
cve CVE-2025-21765
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Eric Dumazet <[email protected]>
commit 482ad2a

dev->nd_net can change, readers should either use rcu_read_lock() or RTNL.

We currently use a generic helper, dev_net() with no debugging support. We probably have many hidden bugs.

Add dev_net_rcu() helper for callers using rcu_read_lock() protection.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 482ad2a)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-3467
cve CVE-2025-21765
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Eric Dumazet <[email protected]>
commit 3c8ffcd

ip6_default_advmss() needs rcu protection to make sure the net structure it reads does not disappear.

Fixes: 5578689 ("[NETNS][IPV6] route6 - make route6 per namespace")
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 3c8ffcd)
Signed-off-by: Jonathan Maple <[email protected]>
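To show what `ip6_default_advmss()` reads from the per-namespace `net` structure, and so what the added RCU section protects, here is a userspace model of the advertised-MSS computation. It is a hedged sketch: `struct net_model`, `default_advmss_model()` and the no-op `rcu_read_lock_model()`/`rcu_read_unlock_model()` stubs are hypothetical; they only mark where the kernel holds the RCU read lock while dereferencing `net`. The test uses 1220 (1280 minus the 40-byte IPv6 and 20-byte TCP headers) as the namespace's minimum advmss.

```c
#include <assert.h>

/* IPv6 fixed header is 40 bytes; TCP header without options is 20. */
#define IPV6_HDR_LEN  40u
#define TCP_HDR_LEN   20u
#define IPV6_MAXPLEN  65535u

/* Hypothetical stand-in for the per-namespace sysctl being read. */
struct net_model { unsigned int ip6_rt_min_advmss; };

/* No-op stubs marking the critical section the fix adds. */
static void rcu_read_lock_model(void) {}
static void rcu_read_unlock_model(void) {}

static unsigned int default_advmss_model(unsigned int dst_mtu,
					 const struct net_model *net)
{
	unsigned int mtu;

	rcu_read_lock_model();	/* net must stay valid while we read it */
	mtu = dst_mtu - IPV6_HDR_LEN - TCP_HDR_LEN;
	if (mtu < net->ip6_rt_min_advmss)
		mtu = net->ip6_rt_min_advmss;
	/* IPV6_MAXPLEN means "any MSS, rely only on PMTU discovery". */
	if (mtu > IPV6_MAXPLEN - TCP_HDR_LEN)
		mtu = IPV6_MAXPLEN;
	rcu_read_unlock_model();

	return mtu;
}
```

For a standard 1500-byte Ethernet MTU this yields 1500 - 40 - 20 = 1440.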

jira LE-3467
cve CVE-2025-21764
Rebuild_History Non-Buildable kernel-4.18.0-553.58.1.el8_10
commit-author Eric Dumazet <[email protected]>
commit 628e6d1

ndisc_alloc_skb() can be called without RTNL or RCU being held.

Add RCU protection to avoid possible UAF.

Fixes: de09334 ("ndisc: Introduce ndisc_alloc_skb() helper.")
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 628e6d1)
Signed-off-by: Jonathan Maple <[email protected]>

Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..kernel-mainline: 553283
Number of commits in rpm: 59
Number of commits matched with upstream: 52 (88.14%)
Number of commits in upstream but not in rpm: 553231
Number of commits NOT found in upstream: 7 (11.86%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.58.1.el8_10 for kernel-4.18.0-553.58.1.el8_10
Clean Cherry Picks: 32 (61.54%)
Empty Cherry Picks: 20 (38.46%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.58.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures are contained in the same directory.
The git message for each empty commit has the path to its failed cherry-pick.
File names are the first 8 characters of the upstream SHA.

thefossguy-ciq

🚤

bmastbergen

🥌

PlaidCat added 30 commits June 27, 2025 18:56

PlaidCat added 20 commits June 27, 2025 18:57

PlaidCat requested review from jdieter, juphoff, kerneltoast, bmastbergen and thefossguy-ciq July 1, 2025 13:06

PlaidCat self-assigned this Jul 1, 2025

thefossguy-ciq approved these changes Jul 1, 2025


bmastbergen approved these changes Jul 1, 2025


PlaidCat merged commit 2e416d1 into rocky8_10 Jul 1, 2025
2 checks passed

PlaidCat deleted the rocky8_10_rebuild branch July 1, 2025 14:52
