Skip to content

Conversation

bmastbergen
Copy link
Collaborator

@bmastbergen bmastbergen commented Sep 22, 2025

Background

Commits

    bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb()

    jira VULN-66054
    cve CVE-2022-49840
    commit-author Baisong Zhong <[email protected]>
    commit d3fd203f36d46aa29600a72d57a1b61af80e4a25
    bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type()

    jira VULN-55840
    cve CVE-2025-21867
    commit-author Shigeru Yoshida <[email protected]>
    commit 6b3d638ca897e099fa99bd6d02189d3176f80a47
    RDMA/iwcm: Fix a use-after-free related to destroying CM IDs

    jira VULN-38769
    cve CVE-2024-42285
    commit-author Bart Van Assche <[email protected]>
    commit aee2424246f9f1dadc33faa78990c1e2eb7826e4
    RDMA/iwcm: Fix use-after-free of work objects after cm_id destruction

    jira VULN-72085
    cve CVE-2025-38211
    commit-author Shin'ichiro Kawasaki <[email protected]>
    commit 6883b680e703c6b2efddb4e7a8d891ce1803d06b
    RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency

    jira VULN-45117
    cve CVE-2024-47696
    commit-author Zhu Yanjun <[email protected]>
    commit 86dfdd8288907f03c18b7fb462e0e232c4f98d89
    net: forward_alloc_get depends on CONFIG_MPTCP

    jira VULN-65244
    cve-pre CVE-2025-22058
    commit-author Eric Dumazet <[email protected]>
    commit 6c302e799a0d4be1362f505453b714fe05d91f2a
    net: annotate data-races around sk->sk_forward_alloc

    jira VULN-65244
    cve-pre CVE-2025-22058
    commit-author Eric Dumazet <[email protected]>
    commit 5e6300e7b3a4ab5b72a82079753868e91fbf9efc
    udp: Fix memory accounting leak.

    jira VULN-65244
    cve CVE-2025-22058
    commit-author Kuniyuki Iwashima <[email protected]>
    commit df207de9d9e7a4d92f8567e2c539d9c8c12fd99d

Build Log

/home/brett/kernel-src-tree
Running make mrproper...
[TIMER]{MRPROPER}: 11s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1"
Making olddefconfig
--
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
Starting Build
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
--
  LD [M]  virt/lib/irqbypass.ko
  BTF [M] sound/x86/snd-hdmi-lpe-audio.ko
  BTF [M] virt/lib/irqbypass.ko
  BTF [M] sound/xen/snd_xen_front.ko
  BTF [M] sound/virtio/virtio_snd.ko
[TIMER]{BUILD}: 942s
Making Modules
  INSTALL /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+/kernel/arch/x86/crypto/blowfish-x86_64.ko
  INSTALL /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+/kernel/arch/x86/crypto/blake2s-x86_64.ko
  INSTALL /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+/kernel/arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+/kernel/arch/x86/crypto/camellia-aesni-avx2.ko
--
  SIGN    /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+/kernel/sound/virtio/virtio_snd.ko
  SIGN    /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+/kernel/sound/xen/snd_xen_front.ko
  SIGN    /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+/kernel/sound/x86/snd-hdmi-lpe-audio.ko
  SIGN    /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+/kernel/virt/lib/irqbypass.ko
  DEPMOD  /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+
[TIMER]{MODULES}: 8s
Making Install
sh ./arch/x86/boot/install.sh \
	5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+ arch/x86/boot/bzImage \
	System.map "/boot"
[TIMER]{INSTALL}: 46s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+ and Index to 2
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 11s
[TIMER]{BUILD}: 942s
[TIMER]{MODULES}: 8s
[TIMER]{INSTALL}: 46s
[TIMER]{TOTAL} 1027s
Rebooting in 10 seconds

Testing

selftest-5.14.0-284.30.1.el9_2.92ciq_lts.10.1.x86_64.log

selftest-5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+.log

brett@lycia ~/ciq/many-92-vulns-9-22-25 % grep ^ok selftest-5.14.0-284.30.1.el9_2.92ciq_lts.10.1.x86_64.log | wc -l
297
brett@lycia ~/ciq/many-92-vulns-9-22-25 % grep ^ok selftest-5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-22-25-d61e53eee6e1+.log | wc -l
297
brett@lycia ~/ciq/many-92-vulns-9-22-25 %

jira VULN-66054
cve CVE-2022-49840
commit-author Baisong Zhong <[email protected]>
commit d3fd203

We got a syzkaller problem because of aarch64 alignment fault
if KFENCE enabled. When the size from user bpf program is an odd
number, like 399, 407, etc, it will cause the struct skb_shared_info's
unaligned access. As seen below:

  BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032

  Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
   __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
   arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
   arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
   atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
   __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
   skb_clone+0xf4/0x214 net/core/skbuff.c:1481
   ____bpf_clone_redirect net/core/filter.c:2433 [inline]
   bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
   bpf_prog_d3839dd9068ceb51+0x80/0x330
   bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
   bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
   bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381

  kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512

  allocated by task 15074 on cpu 0 at 1342.585390s:
   kmalloc include/linux/slab.h:568 [inline]
   kzalloc include/linux/slab.h:675 [inline]
   bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
   bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
   bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
   __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
   __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
   __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381

To fix the problem, we adjust @SiZe so that (@SiZe + @hearoom) is a
multiple of SMP_CACHE_BYTES. So we make sure the struct skb_shared_info
is aligned to a cache line.

Fixes: 1cf1cae ("bpf: introduce BPF_PROG_TEST_RUN command")
	Signed-off-by: Baisong Zhong <[email protected]>
	Signed-off-by: Daniel Borkmann <[email protected]>
	Cc: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
(cherry picked from commit d3fd203)
	Signed-off-by: Brett Mastbergen <[email protected]>
jira VULN-55840
cve CVE-2025-21867
commit-author Shigeru Yoshida <[email protected]>
commit 6b3d638

KMSAN reported a use-after-free issue in eth_skb_pkt_type()[1]. The
cause of the issue was that eth_skb_pkt_type() accessed skb's data
that didn't contain an Ethernet header. This occurs when
bpf_prog_test_run_xdp() passes an invalid value as the user_data
argument to bpf_test_init().

Fix this by returning an error when user_data is less than ETH_HLEN in
bpf_test_init(). Additionally, remove the check for "if (user_size >
size)" as it is unnecessary.

[1]
BUG: KMSAN: use-after-free in eth_skb_pkt_type include/linux/etherdevice.h:627 [inline]
BUG: KMSAN: use-after-free in eth_type_trans+0x4ee/0x980 net/ethernet/eth.c:165
 eth_skb_pkt_type include/linux/etherdevice.h:627 [inline]
 eth_type_trans+0x4ee/0x980 net/ethernet/eth.c:165
 __xdp_build_skb_from_frame+0x5a8/0xa50 net/core/xdp.c:635
 xdp_recv_frames net/bpf/test_run.c:272 [inline]
 xdp_test_run_batch net/bpf/test_run.c:361 [inline]
 bpf_test_run_xdp_live+0x2954/0x3330 net/bpf/test_run.c:390
 bpf_prog_test_run_xdp+0x148e/0x1b10 net/bpf/test_run.c:1318
 bpf_prog_test_run+0x5b7/0xa30 kernel/bpf/syscall.c:4371
 __sys_bpf+0x6a6/0xe20 kernel/bpf/syscall.c:5777
 __do_sys_bpf kernel/bpf/syscall.c:5866 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5864 [inline]
 __x64_sys_bpf+0xa4/0xf0 kernel/bpf/syscall.c:5864
 x64_sys_call+0x2ea0/0x3d90 arch/x86/include/generated/asm/syscalls_64.h:322
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xd9/0x1d0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
 free_pages_prepare mm/page_alloc.c:1056 [inline]
 free_unref_page+0x156/0x1320 mm/page_alloc.c:2657
 __free_pages+0xa3/0x1b0 mm/page_alloc.c:4838
 bpf_ringbuf_free kernel/bpf/ringbuf.c:226 [inline]
 ringbuf_map_free+0xff/0x1e0 kernel/bpf/ringbuf.c:235
 bpf_map_free kernel/bpf/syscall.c:838 [inline]
 bpf_map_free_deferred+0x17c/0x310 kernel/bpf/syscall.c:862
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0xa2b/0x1b60 kernel/workqueue.c:3310
 worker_thread+0xedf/0x1550 kernel/workqueue.c:3391
 kthread+0x535/0x6b0 kernel/kthread.c:389
 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

CPU: 1 UID: 0 PID: 17276 Comm: syz.1.16450 Not tainted 6.12.0-05490-g9bb88c659673 #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014

Fixes: be3d72a ("bpf: move user_size out of bpf_test_init")
	Reported-by: syzkaller <[email protected]>
	Suggested-by: Martin KaFai Lau <[email protected]>
	Signed-off-by: Shigeru Yoshida <[email protected]>
	Signed-off-by: Martin KaFai Lau <[email protected]>
	Acked-by: Stanislav Fomichev <[email protected]>
	Acked-by: Daniel Borkmann <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Alexei Starovoitov <[email protected]>
(cherry picked from commit 6b3d638)
	Signed-off-by: Brett Mastbergen <[email protected]>
jira VULN-38769
cve CVE-2024-42285
commit-author Bart Van Assche <[email protected]>
commit aee2424

iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) with
an existing struct iw_cm_id (cm_id) as follows:

        conn_id->cm_id.iw = cm_id;
        cm_id->context = conn_id;
        cm_id->cm_handler = cma_iw_handler;

rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make
sure that cm_work_handler() does not trigger a use-after-free by only
freeing of the struct rdma_id_private after all pending work has finished.

	Cc: [email protected]
Fixes: 59c68ac ("iw_cm: free cm_id resources on the last deref")
	Reviewed-by: Zhu Yanjun <[email protected]>
	Tested-by: Shin'ichiro Kawasaki <[email protected]>
	Signed-off-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Leon Romanovsky <[email protected]>
(cherry picked from commit aee2424)
	Signed-off-by: Brett Mastbergen <[email protected]>
jira VULN-72085
cve CVE-2025-38211
commit-author Shin'ichiro Kawasaki <[email protected]>
commit 6883b68

The commit 59c68ac ("iw_cm: free cm_id resources on the last
deref") simplified cm_id resource management by freeing cm_id once all
references to the cm_id were removed. The references are removed either
upon completion of iw_cm event handlers or when the application destroys
the cm_id. This commit introduced the use-after-free condition where
cm_id_private object could still be in use by event handler works during
the destruction of cm_id. The commit aee2424 ("RDMA/iwcm: Fix a
use-after-free related to destroying CM IDs") addressed this use-after-
free by flushing all pending works at the cm_id destruction.

However, still another use-after-free possibility remained. It happens
with the work objects allocated for each cm_id_priv within
alloc_work_entries() during cm_id creation, and subsequently freed in
dealloc_work_entries() once all references to the cm_id are removed.
If the cm_id's last reference is decremented in the event handler work,
the work object for the work itself gets removed, and causes the use-
after-free BUG below:

  BUG: KASAN: slab-use-after-free in __pwq_activate_work+0x1ff/0x250
  Read of size 8 at addr ffff88811f9cf800 by task kworker/u16:1/147091

  CPU: 2 UID: 0 PID: 147091 Comm: kworker/u16:1 Not tainted 6.15.0-rc2+ #27 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
  Workqueue:  0x0 (iw_cm_wq)
  Call Trace:
   <TASK>
   dump_stack_lvl+0x6a/0x90
   print_report+0x174/0x554
   ? __virt_addr_valid+0x208/0x430
   ? __pwq_activate_work+0x1ff/0x250
   kasan_report+0xae/0x170
   ? __pwq_activate_work+0x1ff/0x250
   __pwq_activate_work+0x1ff/0x250
   pwq_dec_nr_in_flight+0x8c5/0xfb0
   process_one_work+0xc11/0x1460
   ? __pfx_process_one_work+0x10/0x10
   ? assign_work+0x16c/0x240
   worker_thread+0x5ef/0xfd0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0x3b0/0x770
   ? __pfx_kthread+0x10/0x10
   ? rcu_is_watching+0x11/0xb0
   ? _raw_spin_unlock_irq+0x24/0x50
   ? rcu_is_watching+0x11/0xb0
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x30/0x70
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>

  Allocated by task 147416:
   kasan_save_stack+0x2c/0x50
   kasan_save_track+0x10/0x30
   __kasan_kmalloc+0xa6/0xb0
   alloc_work_entries+0xa9/0x260 [iw_cm]
   iw_cm_connect+0x23/0x4a0 [iw_cm]
   rdma_connect_locked+0xbfd/0x1920 [rdma_cm]
   nvme_rdma_cm_handler+0x8e5/0x1b60 [nvme_rdma]
   cma_cm_event_handler+0xae/0x320 [rdma_cm]
   cma_work_handler+0x106/0x1b0 [rdma_cm]
   process_one_work+0x84f/0x1460
   worker_thread+0x5ef/0xfd0
   kthread+0x3b0/0x770
   ret_from_fork+0x30/0x70
   ret_from_fork_asm+0x1a/0x30

  Freed by task 147091:
   kasan_save_stack+0x2c/0x50
   kasan_save_track+0x10/0x30
   kasan_save_free_info+0x37/0x60
   __kasan_slab_free+0x4b/0x70
   kfree+0x13a/0x4b0
   dealloc_work_entries+0x125/0x1f0 [iw_cm]
   iwcm_deref_id+0x6f/0xa0 [iw_cm]
   cm_work_handler+0x136/0x1ba0 [iw_cm]
   process_one_work+0x84f/0x1460
   worker_thread+0x5ef/0xfd0
   kthread+0x3b0/0x770
   ret_from_fork+0x30/0x70
   ret_from_fork_asm+0x1a/0x30

  Last potentially related work creation:
   kasan_save_stack+0x2c/0x50
   kasan_record_aux_stack+0xa3/0xb0
   __queue_work+0x2ff/0x1390
   queue_work_on+0x67/0xc0
   cm_event_handler+0x46a/0x820 [iw_cm]
   siw_cm_upcall+0x330/0x650 [siw]
   siw_cm_work_handler+0x6b9/0x2b20 [siw]
   process_one_work+0x84f/0x1460
   worker_thread+0x5ef/0xfd0
   kthread+0x3b0/0x770
   ret_from_fork+0x30/0x70
   ret_from_fork_asm+0x1a/0x30

This BUG is reproducible by repeating the blktests test case nvme/061
for the rdma transport and the siw driver.

To avoid the use-after-free of cm_id_private work objects, ensure that
the last reference to the cm_id is decremented not in the event handler
works, but in the cm_id destruction context. For that purpose, move
iwcm_deref_id() call from destroy_cm_id() to the callers of
destroy_cm_id(). In iw_destroy_cm_id(), call iwcm_deref_id() after
flushing the pending works.

During the fix work, I noticed that iw_destroy_cm_id() is called from
cm_work_handler() and process_event() context. However, the comment of
iw_destroy_cm_id() notes that the function "cannot be called by the
event thread". Drop the false comment.

Closes: https://lore.kernel.org/linux-rdma/r5676e754sv35aq7cdsqrlnvyhiq5zktteaurl7vmfih35efko@z6lay7uypy3c/
Fixes: 59c68ac ("iw_cm: free cm_id resources on the last deref")
	Cc: [email protected]
	Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Reviewed-by: Zhu Yanjun <[email protected]>
	Signed-off-by: Leon Romanovsky <[email protected]>
(cherry picked from commit 6883b68)
	Signed-off-by: Brett Mastbergen <[email protected]>
jira VULN-45117
cve CVE-2024-47696
commit-author Zhu Yanjun <[email protected]>
commit 86dfdd8

In the commit aee2424 ("RDMA/iwcm: Fix a use-after-free related to
destroying CM IDs"), the function flush_workqueue is invoked to flush the
work queue iwcm_wq.

But at that time, the work queue iwcm_wq was created via the function
alloc_ordered_workqueue without the flag WQ_MEM_RECLAIM.

Because the current process is trying to flush the whole iwcm_wq, if
iwcm_wq doesn't have the flag WQ_MEM_RECLAIM, verify that the current
process is not reclaiming memory or running on a workqueue which doesn't
have the flag WQ_MEM_RECLAIM as that can break forward-progress guarantee
leading to a deadlock.

The call trace is as below:

[  125.350876][ T1430] Call Trace:
[  125.356281][ T1430]  <TASK>
[ 125.361285][ T1430] ? __warn (kernel/panic.c:693)
[ 125.367640][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.375689][ T1430] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 125.382505][ T1430] ? handle_bug (arch/x86/kernel/traps.c:239)
[ 125.388987][ T1430] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
[ 125.395831][ T1430] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 125.403125][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.410984][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.418764][ T1430] __flush_workqueue (kernel/workqueue.c:3970)
[ 125.426021][ T1430] ? __pfx___might_resched (kernel/sched/core.c:10151)
[ 125.433431][ T1430] ? destroy_cm_id (drivers/infiniband/core/iwcm.c:375) iw_cm
[ 125.441209][ T1430] ? __pfx___flush_workqueue (kernel/workqueue.c:3910)
[ 125.473900][ T1430] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 125.473909][ T1430] ? __pfx__raw_spin_lock_irqsave (kernel/locking/spinlock.c:161)
[ 125.482537][ T1430] _destroy_id (drivers/infiniband/core/cma.c:2044) rdma_cm
[ 125.495072][ T1430] nvme_rdma_free_queue (drivers/nvme/host/rdma.c:656 drivers/nvme/host/rdma.c:650) nvme_rdma
[ 125.505827][ T1430] nvme_rdma_reset_ctrl_work (drivers/nvme/host/rdma.c:2180) nvme_rdma
[ 125.505831][ T1430] process_one_work (kernel/workqueue.c:3231)
[ 125.515122][ T1430] worker_thread (kernel/workqueue.c:3306 kernel/workqueue.c:3393)
[ 125.515127][ T1430] ? __pfx_worker_thread (kernel/workqueue.c:3339)
[ 125.531837][ T1430] kthread (kernel/kthread.c:389)
[ 125.539864][ T1430] ? __pfx_kthread (kernel/kthread.c:342)
[ 125.550628][ T1430] ret_from_fork (arch/x86/kernel/process.c:147)
[ 125.558840][ T1430] ? __pfx_kthread (kernel/kthread.c:342)
[ 125.558844][ T1430] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[  125.566487][ T1430]  </TASK>
[  125.566488][ T1430] ---[ end trace 0000000000000000 ]---

Fixes: aee2424 ("RDMA/iwcm: Fix a use-after-free related to destroying CM IDs")
Link: https://patch.msgid.link/r/[email protected]
	Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
	Tested-by: kernel test robot <[email protected]>
	Signed-off-by: Zhu Yanjun <[email protected]>
	Reviewed-by: Bart Van Assche <[email protected]>
	Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 86dfdd8)
	Signed-off-by: Brett Mastbergen <[email protected]>
jira VULN-65244
cve-pre CVE-2025-22058
commit-author Eric Dumazet <[email protected]>
commit 6c302e7

(struct proto)->sk_forward_alloc is currently only used by MPTCP.

	Signed-off-by: Eric Dumazet <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 6c302e7)
	Signed-off-by: Brett Mastbergen <[email protected]>
jira VULN-65244
cve-pre CVE-2025-22058
commit-author Eric Dumazet <[email protected]>
commit 5e6300e

Every time sk->sk_forward_alloc is read locklessly,
add a READ_ONCE().

Add sk_forward_alloc_add() helper to centralize updates,
to reduce number of WRITE_ONCE().

Fixes: 1da177e ("Linux-2.6.12-rc2")
	Signed-off-by: Eric Dumazet <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 5e6300e)
	Signed-off-by: Brett Mastbergen <[email protected]>
jira VULN-65244
cve CVE-2025-22058
commit-author Kuniyuki Iwashima <[email protected]>
commit df207de

Matt Dowling reported a weird UDP memory usage issue.

Under normal operation, the UDP memory usage reported in /proc/net/sockstat
remains close to zero.  However, it occasionally spiked to 524,288 pages
and never dropped.  Moreover, the value doubled when the application was
terminated.  Finally, it caused intermittent packet drops.

We can reproduce the issue with the script below [0]:

  1. /proc/net/sockstat reports 0 pages

    # cat /proc/net/sockstat | grep UDP:
    UDP: inuse 1 mem 0

  2. Run the script till the report reaches 524,288

    # python3 test.py & sleep 5
    # cat /proc/net/sockstat | grep UDP:
    UDP: inuse 3 mem 524288  <-- (INT_MAX + 1) >> PAGE_SHIFT

  3. Kill the socket and confirm the number never drops

    # pkill python3 && sleep 5
    # cat /proc/net/sockstat | grep UDP:
    UDP: inuse 1 mem 524288

  4. (necessary since v6.0) Trigger proto_memory_pcpu_drain()

    # python3 test.py & sleep 1 && pkill python3

  5. The number doubles

    # cat /proc/net/sockstat | grep UDP:
    UDP: inuse 1 mem 1048577

The application set INT_MAX to SO_RCVBUF, which triggered an integer
overflow in udp_rmem_release().

When a socket is close()d, udp_destruct_common() purges its receive
queue and sums up skb->truesize in the queue.  This total is calculated
and stored in a local unsigned integer variable.

The total size is then passed to udp_rmem_release() to adjust memory
accounting.  However, because the function takes a signed integer
argument, the total size can wrap around, causing an overflow.

Then, the released amount is calculated as follows:

  1) Add size to sk->sk_forward_alloc.
  2) Round down sk->sk_forward_alloc to the nearest lower multiple of
      PAGE_SIZE and assign it to amount.
  3) Subtract amount from sk->sk_forward_alloc.
  4) Pass amount >> PAGE_SHIFT to __sk_mem_reduce_allocated().

When the issue occurred, the total in udp_destruct_common() was 2147484480
(INT_MAX + 833), which was cast to -2147482816 in udp_rmem_release().

At 1) sk->sk_forward_alloc is changed from 3264 to -2147479552, and
2) sets -2147479552 to amount.  3) reverts the wraparound, so we don't
see a warning in inet_sock_destruct().  However, udp_memory_allocated
ends up doubling at 4).

Since commit 3cd3399 ("net: implement per-cpu reserves for
memory_allocated"), memory usage no longer doubles immediately after
a socket is close()d because __sk_mem_reduce_allocated() caches the
amount in udp_memory_per_cpu_fw_alloc.  However, the next time a UDP
socket receives a packet, the subtraction takes effect, causing UDP
memory usage to double.

This issue makes further memory allocation fail once the socket's
sk->sk_rmem_alloc exceeds net.ipv4.udp_rmem_min, resulting in packet
drops.

To prevent this issue, let's use unsigned int for the calculation and
call sk_forward_alloc_add() only once for the small delta.

Note that first_packet_length() also potentially has the same problem.

[0]:
from socket import *

SO_RCVBUFFORCE = 33
INT_MAX = (2 ** 31) - 1

s = socket(AF_INET, SOCK_DGRAM)
s.bind(('', 0))
s.setsockopt(SOL_SOCKET, SO_RCVBUFFORCE, INT_MAX)

c = socket(AF_INET, SOCK_DGRAM)
c.connect(s.getsockname())

data = b'a' * 100

while True:
    c.send(data)

Fixes: f970bd9 ("udp: implement memory accounting helpers")
	Reported-by: Matt Dowling <[email protected]>
	Signed-off-by: Kuniyuki Iwashima <[email protected]>
	Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit df207de)
	Signed-off-by: Brett Mastbergen <[email protected]>
Copy link

🔍 Upstream Linux Kernel Commit Check

  • ⚠️ PR commit ec99fcb7a448 (RDMA/iwcm: Fix a use-after-free related to destroying CM IDs) references upstream commit
    aee2424246f9 which has been referenced by a Fixes: tag in the upstream
    Linux kernel:
    86dfdd828890 RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency (Zhu Yanjun)

This is an automated message from the kernel commit checker workflow.

@bmastbergen
Copy link
Collaborator Author

🔍 Upstream Linux Kernel Commit Check

  • ⚠️ PR commit ec99fcb7a448 (RDMA/iwcm: Fix a use-after-free related to destroying CM IDs) references upstream commit
    aee2424246f9 which has been referenced by a Fixes: tag in the upstream
    Linux kernel:
    86dfdd828890 RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency (Zhu Yanjun)

This is an automated message from the kernel commit checker workflow.

Already included in this PR.

Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants