Open
Conversation
This reverts commit 3acf4be. The minimum supported version of clang is clang 10.0.1. Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Fangrui Song <maskray@google.com> Cc: Marco Elver <elver@google.com> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/20200902225911.209899-5-ndesaulniers@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: Ia41a00493adb16ab93b98f41e0c95f30d904d322 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
The -nostdlib option requests the compiler to not use the standard system startup files or libraries when linking. It is effective only when $(CC) is used as a linker driver. Since commit 691efbe ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO"), $(LD) is directly used, hence -nostdlib is unneeded. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20211107161802.323125-1-masahiroy@kernel.org Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: I2790be770723facab0b00e6bfcd5d9b2388b9306 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
Binutils added support for this instruction in commit e797f7e0b2bedc9328d4a9a0ebc63ca7a2dbbebc which shipped in 2.24 (just missing the 2.23 release) but was cherry-picked into 2.23 in commit 27a50d6755bae906bc73b4ec1a8b448467f0bea1. Thanks to Christian and Simon for helping me with the patch archaeology. According to Documentation/process/changes.rst, the minimum supported version of binutils is 2.23. Since all supported versions of GAS support this instruction, drop the assembler invocation, preprocessor flags/guards, and the cross assembler macro that's now unused. This also avoids a recursive self reference in a follow up cleanup patch. Cc: Christian Biesinger <cbiesinger@google.com> Cc: Simon Marchi <simon.marchi@polymtl.ca> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20211019223646.1146945-2-ndesaulniers@google.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: I6c35160a44f66e5224c324c5bbc649ff5e5791b1 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
As Arnd points out: gcc-4.8 already supported -march=armv8, and we require gcc-5.1 now, so both this #if/#else construct and the corresponding "cc32-option,-march=armv8-a" check should be obsolete now. Link: https://lore.kernel.org/lkml/CAK8P3a3UBEJ0Py2ycz=rHfgog8g3mCOeQOwO0Gmp-iz6Uxkapg@mail.gmail.com/ Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20211019223646.1146945-3-ndesaulniers@google.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: I2604d7e38e844b8cbcb588b498c1a679e9f1f6af Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
arm64 descends into each vdso directory twice; first in vdso_prepare, second during the ordinary build process. PPC mimicked it and uncovered a problem [1]. In the first descend, Kbuild directly visits the vdso directories, therefore it does not inherit subdir-ccflags-y from upper directories. This means the command line parameters may differ between the two. If it happens, the offset values in the generated headers might be different from real offsets of vdso.so in the kernel. This potential danger should be avoided. The vdso directories are built in the vdso_prepare stage, so the second descend is unneeded. [1]: https://lore.kernel.org/linux-kbuild/CAK7LNARAkJ3_-4gX0VA2UkapbOftuzfSTVMBbgbw=HD8n7N+7w@mail.gmail.com/T/#ma10dcb961fda13f36d42d58fa6cb2da988b7e73a Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20201218024540.1102650-1-masahiroy@kernel.org Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: Ieb09c4a968cb049c043a3ecc13ed34d05deea029 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
commit a5b8ca97fbf8 ("arm64: do not descend to vdso directories twice")
changes the cleaning behavior of arm64's vdso files, in that vdso.lds,
vdso.so, and vdso.so.dbg are not removed upon a 'make clean/mrproper':
$ make defconfig ARCH=arm64
$ make ARCH=arm64
$ make mrproper ARCH=arm64
$ git clean -nxdf
Would remove arch/arm64/kernel/vdso/vdso.lds
Would remove arch/arm64/kernel/vdso/vdso.so
Would remove arch/arm64/kernel/vdso/vdso.so.dbg
To remedy this, manually descend into arch/arm64/kernel/vdso upon
cleaning.
After this commit:
$ make defconfig ARCH=arm64
$ make ARCH=arm64
$ make mrproper ARCH=arm64
$ git clean -nxdf
<empty>
Similar results are obtained for the vdso32 equivalent.
Signed-off-by: Andrew Delgadillo <adelg@google.com>
Cc: stable@vger.kernel.org
Fixes: a5b8ca97fbf8 ("arm64: do not descend to vdso directories twice")
Link: https://lore.kernel.org/r/20210810231755.1743524-1-adelg@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
Change-Id: I9cbb4e21b492dac84a3d7b6858e54ebe0f21cbba
Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
The vdso linker script is preprocessed on demand. Adding it to 'targets' is enough to include the .cmd file. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Greentime Hu <green.hu@gmail.com> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: If2c940a04bc7e4f98d9e37f013e6ae45ebf27dc9 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
Allow the compat vdso (32b) to be compiled as either THUMB2 (default) or
ARM.
For THUMB2, the register r7 is reserved for the frame pointer, but
code in arch/arm64/include/asm/vdso/compat_gettimeofday.h
uses r7. Explicitly set -fomit-frame-pointer, since unwinding through
interworked THUMB2 and ARM is unreliable anyways. See also how
CONFIG_UNWINDER_FRAME_POINTER cannot be selected for
CONFIG_THUMB2_KERNEL for ARCH=arm.
This also helps toolchains that differ in their implicit value if the
choice of -f{no-}omit-frame-pointer is left unspecified, to not error on
the use of r7.
2019 Q4 ARM AAPCS seeks to standardize the use of r11 as the reserved
frame pointer register, but no production compiler that can compile the
Linux kernel currently implements this. We're actively discussing such
a transition with ARM toolchain developers currently.
Reported-by: Luis Lozano <llozano@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Manoj Gupta <manojgupta@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Stephen Boyd <swboyd@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Link: https://static.docs.arm.com/ihi0042/i/aapcs32.pdf
Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1084372
Link: https://lore.kernel.org/r/20200608205711.109418-1-ndesaulniers@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
Change-Id: I30e987f3946c1117ec21323074b84a31d568295e
Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
There's no need to allocate the compat vDSO and signal pages using GFP_ATOMIC allocations, so use GFP_KERNEL instead. Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20210318170738.7756-2-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: I466e82f8ad54a8c9c1dc4ea10055bdd7f0f578e0 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
The aarch32_vdso_pages[] array is unnecessarily confusing. We only ever use the C_VECTORS and C_SIGPAGE slots, and the other slots are unused despite having corresponding mappings (sharing pages with the AArch64 vDSO). Let's make this clearer by using separate variables for the vectors page and the sigreturn page. A subsequent patch will clean up the C_* naming and conflation of pages with mappings. Note that since both the vectors page and sig page are single pages, and the mapping is a single page long, their pages array do not need to be NULL-terminated (and this was not the case with the existing code for the sig page as it was the last entry in the aarch32_vdso_pages array). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200428164921.41641-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: Ib254c5d1b1c1d91bc65747fc97186f1cff89f709 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
Currently we have some ifdeffery to determine the number of elements in
enum arch_vdso_type as VDSO_TYPES, rather that the usual pattern of
having the enum define this:
| enum foo_type {
| FOO_TYPE_A,
| FOO_TYPE_B,
| #ifdef CONFIG_C
| FOO_TYPE_C,
| #endif
| NR_FOO_TYPES
| }
... however, given we only use this number to size the vdso_lookup[]
array, this is redundant anyway as the compiler can automatically size
the array to fit all defined elements.
So let's remove the VDSO_TYPES to simplify the code.
At the same time, let's use designated initializers for the array
elements so that these are guarnateed to be at the expected indices,
regardless of how we modify the structure. For clariy the redundant
explicit initialization of the enum elements is dropped.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200428164921.41641-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
Change-Id: Iffa8bcf3efda4bfaffa68205075b9000045cd641
Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
The current code doesn't use a consistent naming scheme for structures, enums, or variables, making it harder than necessary to determine the relationship between these. Let's make this easier by consistently using 'vdso_abi' nomenclature. The 'vdso_lookup' array is renamed to 'vdso_info' to describe what it contains rather than how it is consumed. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200428164921.41641-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: I77fe7e85ec4e07c5738722805f22cfa37ffc5b58 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
The current code doesn't use a consistent naming scheme for structures, enums, or variables, making it harder than necessary to determine the relationship between these. Let's make this easier by consistently using 'map' nomenclature for mappings created in userspace, minimizing redundant comments, and using designated array initializers to tie indices to their respective elements. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200428164921.41641-5-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: I77eda5b4275e15597c3c2a59a0396b9e82380d80 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
In preparation for removing the signal trampoline from the compat vDSO, allow the sigpage and the compat vDSO to co-exist. For the moment the vDSO signal trampoline will still be used when built. Subsequent patches will move to the sigpage consistently. Acked-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: I9019411be031f1710b11372228bd30342384cce5 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
The 32-bit sigreturn trampoline in the compat sigpage matches the binary representation of the arch/arm/ sigpage exactly. This is important for debuggers (e.g. GDB) and unwinders (e.g. libunwind) since they rely on matching the instruction sequence in order to identify that they are unwinding through a signal. The same cannot be said for the sigreturn trampoline in the compat vDSO, which defeats the unwinder heuristics and instead attempts to use unwind directives for the unwinding. This is in contrast to arch/arm/, which never uses the vDSO for sigreturn. Ensure compatibility with arch/arm/ and existing unwinders by always using the sigpage for the sigreturn trampoline, regardless of the presence of the compat vDSO. Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: I528b0f2d1e8356e437c323e94573dd9322efade9 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
The sigreturn code in the compat vDSO is unused. Remove it. Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: I9661da6a7293f84fa3075c15c6ae45e29b5a4730 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
Most of the compat vDSO code can be built and guarded using IS_ENABLED, so drop the unnecessary #ifdefs. Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: I4a20b867b4362bf5a485a46a935cb089b03a1289 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
There's really no need to put every parameter on a new line when calling a function with a long name, so reformat the *setup_additional_pages() functions in the vDSO setup code to follow the usual conventions. Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: I862531295e6398fc3917cad0d7f669018028db17 Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
For compatability with 32-bit Arm, allow the compat signal page to be remapped via mremap(). Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20210318170738.7756-4-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com> Change-Id: Ibfa46525e7f2382fdafffd31b3472029a45da65b Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
There is currently no dependency for vdso*-wrap.S on vdso*.so, which means that
you can get a build that uses a stale vdso*-wrap.o.
In commit a5b8ca97fbf8, the file that includes the vdso.so was moved and renamed
from arch/arm64/kernel/vdso/vdso.S to arch/arm64/kernel/vdso-wrap.S, when this
happened the Makefile was not updated to force the dependcy on vdso.so.
Fixes: a5b8ca97fbf8 ("arm64: do not descend to vdso directories twice")
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220510102721.50811-1-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
Change-Id: I70f4930c0b20a5eb872029081ef0b83265d72acd
Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
adithya2306
pushed a commit
that referenced
this pull request
Jan 30, 2023
…tion commit 07fd5b6cdf3cc30bfde8fe0f644771688be04447 upstream. Each cset (css_set) is pinned by its tasks. When we're moving tasks around across csets for a migration, we need to hold the source and destination csets to ensure that they don't go away while we're moving tasks about. This is done by linking cset->mg_preload_node on either the mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the same cset->mg_preload_node for both the src and dst lists was deemed okay as a cset can't be both the source and destination at the same time. Unfortunately, this overloading becomes problematic when multiple tasks are involved in a migration and some of them are identity noop migrations while others are actually moving across cgroups. For example, this can happen with the following sequence on cgroup1: #1> mkdir -p /sys/fs/cgroup/misc/a/b #2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs #3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS & #4> PID=$! #5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks #6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs the process including the group leader back into a. In this final migration, non-leader threads would be doing identity migration while the group leader is doing an actual one. After #3, let's say the whole process was in cset A, and that after #4, the leader moves to cset B. Then, during #6, the following happens: 1. cgroup_migrate_add_src() is called on B for the leader. 2. cgroup_migrate_add_src() is called on A for the other threads. 3. cgroup_migrate_prepare_dst() is called. It scans the src list. 4. It notices that B wants to migrate to A, so it tries to A to the dst list but realizes that its ->mg_preload_node is already busy. 5. and then it notices A wants to migrate to A as it's an identity migration, it culls it by list_del_init()'ing its ->mg_preload_node and putting references accordingly. 6. The rest of migration takes place with B on the src list but nothing on the dst list. This means that A isn't held while migration is in progress. If all tasks leave A before the migration finishes and the incoming task pins it, the cset will be destroyed leading to use-after-free. This is caused by overloading cset->mg_preload_node for both src and dst preload lists. We wanted to exclude the cset from the src list but ended up inadvertently excluding it from the dst list too. This patch fixes the issue by separating out cset->mg_preload_node into ->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst preloadings don't interfere with each other. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> Reported-by: shisiyuan <shisiyuan19870131@gmail.com> Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com Link: https://www.spinics.net/lists/cgroups/msg33313.html Fixes: f817de9 ("cgroup: prepare migration path for unified hierarchy") Cc: stable@vger.kernel.org # v3.16+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KazuDante89
pushed a commit
to KazuDante89/android_kernel_ghost_lisa
that referenced
this pull request
Jan 30, 2023
[ Upstream commit 6b9dbedbe3499fef862c4dff5217cf91f34e43b3 ]
pty_write() invokes kmalloc() which may invoke a normal printk() to print
failure message. This can cause a deadlock in the scenario reported by
syz-bot below:
CPU0 CPU1 CPU2
---- ---- ----
lock(console_owner);
lock(&port_lock_key);
lock(&port->lock);
lock(&port_lock_key);
lock(&port->lock);
lock(console_owner);
As commit dbdda84 ("printk: Add console owner and waiter logic to
load balance console writes") said, such deadlock can be prevented by
using printk_deferred() in kmalloc() (which is invoked in the section
guarded by the port->lock). But there are too many printk() on the
kmalloc() path, and kmalloc() can be called from anywhere, so changing
printk() to printk_deferred() is too complicated and inelegant.
Therefore, this patch chooses to specify __GFP_NOWARN to kmalloc(), so
that printk() will not be called, and this deadlock problem can be
avoided.
Syzbot reported the following lockdep error:
======================================================
WARNING: possible circular locking dependency detected
5.4.143-00237-g08ccc19a-dirty #10 Not tainted
------------------------------------------------------
syz-executor.4/29420 is trying to acquire lock:
ffffffff8aedb2a0 (console_owner){....}-{0:0}, at: console_trylock_spinning kernel/printk/printk.c:1752 [inline]
ffffffff8aedb2a0 (console_owner){....}-{0:0}, at: vprintk_emit+0x2ca/0x470 kernel/printk/printk.c:2023
but task is already holding lock:
ffff8880119c9158 (&port->lock){-.-.}-{2:2}, at: pty_write+0xf4/0x1f0 drivers/tty/pty.c:120
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> adithya2306#2 (&port->lock){-.-.}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x35/0x50 kernel/locking/spinlock.c:159
tty_port_tty_get drivers/tty/tty_port.c:288 [inline] <-- lock(&port->lock);
tty_port_default_wakeup+0x1d/0xb0 drivers/tty/tty_port.c:47
serial8250_tx_chars+0x530/0xa80 drivers/tty/serial/8250/8250_port.c:1767
serial8250_handle_irq.part.0+0x31f/0x3d0 drivers/tty/serial/8250/8250_port.c:1854
serial8250_handle_irq drivers/tty/serial/8250/8250_port.c:1827 [inline] <-- lock(&port_lock_key);
serial8250_default_handle_irq+0xb2/0x220 drivers/tty/serial/8250/8250_port.c:1870
serial8250_interrupt+0xfd/0x200 drivers/tty/serial/8250/8250_core.c:126
__handle_irq_event_percpu+0x109/0xa50 kernel/irq/handle.c:156
[...]
-> adithya2306#1 (&port_lock_key){-.-.}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x35/0x50 kernel/locking/spinlock.c:159
serial8250_console_write+0x184/0xa40 drivers/tty/serial/8250/8250_port.c:3198
<-- lock(&port_lock_key);
call_console_drivers kernel/printk/printk.c:1819 [inline]
console_unlock+0x8cb/0xd00 kernel/printk/printk.c:2504
vprintk_emit+0x1b5/0x470 kernel/printk/printk.c:2024 <-- lock(console_owner);
vprintk_func+0x8d/0x250 kernel/printk/printk_safe.c:394
printk+0xba/0xed kernel/printk/printk.c:2084
register_console+0x8b3/0xc10 kernel/printk/printk.c:2829
univ8250_console_init+0x3a/0x46 drivers/tty/serial/8250/8250_core.c:681
console_init+0x49d/0x6d3 kernel/printk/printk.c:2915
start_kernel+0x5e9/0x879 init/main.c:713
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
-> #0 (console_owner){....}-{0:0}:
[...]
lock_acquire+0x127/0x340 kernel/locking/lockdep.c:4734
console_trylock_spinning kernel/printk/printk.c:1773 [inline] <-- lock(console_owner);
vprintk_emit+0x307/0x470 kernel/printk/printk.c:2023
vprintk_func+0x8d/0x250 kernel/printk/printk_safe.c:394
printk+0xba/0xed kernel/printk/printk.c:2084
fail_dump lib/fault-inject.c:45 [inline]
should_fail+0x67b/0x7c0 lib/fault-inject.c:144
__should_failslab+0x152/0x1c0 mm/failslab.c:33
should_failslab+0x5/0x10 mm/slab_common.c:1224
slab_pre_alloc_hook mm/slab.h:468 [inline]
slab_alloc_node mm/slub.c:2723 [inline]
slab_alloc mm/slub.c:2807 [inline]
__kmalloc+0x72/0x300 mm/slub.c:3871
kmalloc include/linux/slab.h:582 [inline]
tty_buffer_alloc+0x23f/0x2a0 drivers/tty/tty_buffer.c:175
__tty_buffer_request_room+0x156/0x2a0 drivers/tty/tty_buffer.c:273
tty_insert_flip_string_fixed_flag+0x93/0x250 drivers/tty/tty_buffer.c:318
tty_insert_flip_string include/linux/tty_flip.h:37 [inline]
pty_write+0x126/0x1f0 drivers/tty/pty.c:122 <-- lock(&port->lock);
n_tty_write+0xa7a/0xfc0 drivers/tty/n_tty.c:2356
do_tty_write drivers/tty/tty_io.c:961 [inline]
tty_write+0x512/0x930 drivers/tty/tty_io.c:1045
__vfs_write+0x76/0x100 fs/read_write.c:494
[...]
other info that might help us debug this:
Chain exists of:
console_owner --> &port_lock_key --> &port->lock
Link: https://lkml.kernel.org/r/20220511061951.1114-2-zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/20220510113809.80626-2-zhengqi.arch@bytedance.com
Fixes: b6da31b ("tty: Fix data race in tty_insert_flip_string_fixed_flag")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Jiri Slaby <jirislaby@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
KazuDante89
pushed a commit
to KazuDante89/android_kernel_ghost_lisa
that referenced
this pull request
Jan 30, 2023
commit 6213f5d4d23c50d393a31dc8e351e63a1fd10dbe upstream. Let's avoid false-alarmed lockdep warning. [ 58.914674] [T1501146] -> adithya2306#2 (&sb->s_type->i_mutex_key#20){+.+.}-{3:3}: [ 58.915975] [T1501146] system_server: down_write+0x7c/0xe0 [ 58.916738] [T1501146] system_server: f2fs_quota_sync+0x60/0x1a8 [ 58.917563] [T1501146] system_server: block_operations+0x16c/0x43c [ 58.918410] [T1501146] system_server: f2fs_write_checkpoint+0x114/0x318 [ 58.919312] [T1501146] system_server: f2fs_issue_checkpoint+0x178/0x21c [ 58.920214] [T1501146] system_server: f2fs_sync_fs+0x48/0x6c [ 58.920999] [T1501146] system_server: f2fs_do_sync_file+0x334/0x738 [ 58.921862] [T1501146] system_server: f2fs_sync_file+0x30/0x48 [ 58.922667] [T1501146] system_server: __arm64_sys_fsync+0x84/0xf8 [ 58.923506] [T1501146] system_server: el0_svc_common.llvm.12821150825140585682+0xd8/0x20c [ 58.924604] [T1501146] system_server: do_el0_svc+0x28/0xa0 [ 58.925366] [T1501146] system_server: el0_svc+0x24/0x38 [ 58.926094] [T1501146] system_server: el0_sync_handler+0x88/0xec [ 58.926920] [T1501146] system_server: el0_sync+0x1b4/0x1c0 [ 58.927681] [T1501146] -> adithya2306#1 (&sbi->cp_global_sem){+.+.}-{3:3}: [ 58.928889] [T1501146] system_server: down_write+0x7c/0xe0 [ 58.929650] [T1501146] system_server: f2fs_write_checkpoint+0xbc/0x318 [ 58.930541] [T1501146] system_server: f2fs_issue_checkpoint+0x178/0x21c [ 58.931443] [T1501146] system_server: f2fs_sync_fs+0x48/0x6c [ 58.932226] [T1501146] system_server: sync_filesystem+0xac/0x130 [ 58.933053] [T1501146] system_server: generic_shutdown_super+0x38/0x150 [ 58.933958] [T1501146] system_server: kill_block_super+0x24/0x58 [ 58.934791] [T1501146] system_server: kill_f2fs_super+0xcc/0x124 [ 58.935618] [T1501146] system_server: deactivate_locked_super+0x90/0x120 [ 58.936529] [T1501146] system_server: deactivate_super+0x74/0xac [ 58.937356] [T1501146] system_server: cleanup_mnt+0x128/0x168 [ 58.938150] [T1501146] system_server: __cleanup_mnt+0x18/0x28 [ 58.938944] [T1501146] system_server: task_work_run+0xb8/0x14c [ 58.939749] [T1501146] system_server: do_notify_resume+0x114/0x1e8 [ 58.940595] [T1501146] system_server: work_pending+0xc/0x5f0 [ 58.941375] [T1501146] -> #0 (&sbi->gc_lock){+.+.}-{3:3}: [ 58.942519] [T1501146] system_server: __lock_acquire+0x1270/0x2868 [ 58.943366] [T1501146] system_server: lock_acquire+0x114/0x294 [ 58.944169] [T1501146] system_server: down_write+0x7c/0xe0 [ 58.944930] [T1501146] system_server: f2fs_issue_checkpoint+0x13c/0x21c [ 58.945831] [T1501146] system_server: f2fs_sync_fs+0x48/0x6c [ 58.946614] [T1501146] system_server: f2fs_do_sync_file+0x334/0x738 [ 58.947472] [T1501146] system_server: f2fs_ioc_commit_atomic_write+0xc8/0x14c [ 58.948439] [T1501146] system_server: __f2fs_ioctl+0x674/0x154c [ 58.949253] [T1501146] system_server: f2fs_ioctl+0x54/0x88 [ 58.950018] [T1501146] system_server: __arm64_sys_ioctl+0xa8/0x110 [ 58.950865] [T1501146] system_server: el0_svc_common.llvm.12821150825140585682+0xd8/0x20c [ 58.951965] [T1501146] system_server: do_el0_svc+0x28/0xa0 [ 58.952727] [T1501146] system_server: el0_svc+0x24/0x38 [ 58.953454] [T1501146] system_server: el0_sync_handler+0x88/0xec [ 58.954279] [T1501146] system_server: el0_sync+0x1b4/0x1c0 Cc: stable@vger.kernel.org Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KazuDante89
pushed a commit
to KazuDante89/android_kernel_ghost_lisa
that referenced
this pull request
Jan 30, 2023
commit 520778042ccca019f3ffa136dd0ca565c486cedd upstream. Since 3e135cd ("netfilter: nft_dynset: dynamic stateful expression instantiation"), it is possible to attach stateful expressions to set elements. cd5125d ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") introduces conditional destruction on the object to accomodate transaction semantics. nft_expr_init() calls expr->ops->init() first, then check for NFT_STATEFUL_EXPR, this stills allows to initialize a non-stateful lookup expressions which points to a set, which might lead to UAF since the set is not properly detached from the set->binding for this case. Anyway, this combination is non-sense from nf_tables perspective. This patch fixes this problem by checking for NFT_STATEFUL_EXPR before expr->ops->init() is called. The reporter provides a KASAN splat and a poc reproducer (similar to those autogenerated by syzbot to report use-after-free errors). It is unknown to me if they are using syzbot or if they use similar automated tool to locate the bug that they are reporting. For the record, this is the KASAN splat. [ 85.431824] ================================================================== [ 85.432901] BUG: KASAN: use-after-free in nf_tables_bind_set+0x81b/0xa20 [ 85.433825] Write of size 8 at addr ffff8880286f0e98 by task poc/776 [ 85.434756] [ 85.434999] CPU: 1 PID: 776 Comm: poc Tainted: G W 5.18.0+ adithya2306#2 [ 85.436023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Fixes: 0b2d8a7 ("netfilter: nf_tables: add helper functions for expression handling") Reported-and-tested-by: Aaron Adams <edg-e@nccgroup.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> [Ajay: Regenerated the patch for v5.4.y] Signed-off-by: Ajay Kaher <akaher@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KazuDante89
pushed a commit
to KazuDante89/android_kernel_ghost_lisa
that referenced
this pull request
Jan 30, 2023
commit 863e0d81b6683c4cbc588ad831f560c90e494bef upstream. When user_dlm_destroy_lock failed, it didn't clean up the flags it set before exit. For USER_LOCK_IN_TEARDOWN, if this function fails because of lock is still in used, next time when unlink invokes this function, it will return succeed, and then unlink will remove inode and dentry if lock is not in used(file closed), but the dlm lock is still linked in dlm lock resource, then when bast come in, it will trigger a panic due to user-after-free. See the following panic call trace. To fix this, USER_LOCK_IN_TEARDOWN should be reverted if fail. And also error should be returned if USER_LOCK_IN_TEARDOWN is set to let user know that unlink fail. For the case of ocfs2_dlm_unlock failure, besides USER_LOCK_IN_TEARDOWN, USER_LOCK_BUSY is also required to be cleared. Even though spin lock is released in between, but USER_LOCK_IN_TEARDOWN is still set, for USER_LOCK_BUSY, if before every place that waits on this flag, USER_LOCK_IN_TEARDOWN is checked to bail out, that will make sure no flow waits on the busy flag set by user_dlm_destroy_lock(), then we can simplely revert USER_LOCK_BUSY when ocfs2_dlm_unlock fails. Fix user_dlm_cluster_lock() which is the only function not following this. [ 941.336392] (python,26174,16):dlmfs_unlink:562 ERROR: unlink 004fb0000060000b5a90b8c847b72e1, error -16 from destroy [ 989.757536] ------------[ cut here ]------------ [ 989.757709] kernel BUG at fs/ocfs2/dlmfs/userdlm.c:173! [ 989.757876] invalid opcode: 0000 [adithya2306#1] SMP [ 989.758027] Modules linked in: ksplice_2zhuk2jr_ib_ipoib_new(O) ksplice_2zhuk2jr(O) mptctl mptbase xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn cdc_ether usbnet mii ocfs2 jbd2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfsv3 nfs_acl nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc fcoe libfcoe libfc scsi_transport_fc sunrpc ipmi_devintf bridge stp llc rds_rdma rds bonding ib_sdp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm falcon_lsm_serviceable(PE) falcon_nf_netcontain(PE) mlx4_vnic falcon_kal(E) falcon_lsm_pinned_13402(E) mlx4_ib ib_sa ib_mad ib_core ib_addr xenfs xen_privcmd dm_multipath iTCO_wdt iTCO_vendor_support pcspkr sb_edac edac_core i2c_i801 lpc_ich mfd_core ipmi_ssif i2c_core ipmi_si ipmi_msghandler [ 989.760686] ioatdma sg ext3 jbd mbcache sd_mod ahci libahci ixgbe dca ptp pps_core vxlan udp_tunnel ip6_udp_tunnel megaraid_sas mlx4_core crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ksplice_2zhuk2jr_ib_ipoib_old] [ 989.761987] CPU: 10 PID: 19102 Comm: dlm_thread Tainted: P OE 4.1.12-124.57.1.el6uek.x86_64 adithya2306#2 [ 989.762290] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30350100 06/17/2021 [ 989.762599] task: ffff880178af6200 ti: ffff88017f7c8000 task.ti: ffff88017f7c8000 [ 989.762848] RIP: e030:[<ffffffffc07d4316>] [<ffffffffc07d4316>] __user_dlm_queue_lockres.part.4+0x76/0x80 [ocfs2_dlmfs] [ 989.763185] RSP: e02b:ffff88017f7cbcb8 EFLAGS: 00010246 [ 989.763353] RAX: 0000000000000000 RBX: ffff880174d48008 RCX: 0000000000000003 [ 989.763565] RDX: 0000000000120012 RSI: 0000000000000003 RDI: ffff880174d48170 [ 989.763778] RBP: ffff88017f7cbcc8 R08: ffff88021f4293b0 R09: 0000000000000000 [ 989.763991] R10: ffff880179c8c000 R11: 0000000000000003 R12: ffff880174d48008 [ 989.764204] R13: 0000000000000003 R14: ffff880179c8c000 R15: ffff88021db7a000 [ 989.764422] FS: 0000000000000000(0000) GS:ffff880247480000(0000) knlGS:ffff880247480000 [ 989.764685] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 989.764865] CR2: ffff8000007f6800 CR3: 0000000001ae0000 CR4: 0000000000042660 [ 989.765081] Stack: [ 989.765167] 0000000000000003 ffff880174d48040 ffff88017f7cbd18 ffffffffc07d455f [ 989.765442] ffff88017f7cbd88 ffffffff816fb639 ffff88017f7cbd38 ffff8800361b5600 [ 989.765717] ffff88021db7a000 ffff88021f429380 0000000000000003 ffffffffc0453020 [ 989.765991] Call Trace: [ 989.766093] [<ffffffffc07d455f>] user_bast+0x5f/0xf0 [ocfs2_dlmfs] [ 989.766287] [<ffffffff816fb639>] ? schedule_timeout+0x169/0x2d0 [ 989.766475] [<ffffffffc0453020>] ? o2dlm_lock_ast_wrapper+0x20/0x20 [ocfs2_stack_o2cb] [ 989.766738] [<ffffffffc045303a>] o2dlm_blocking_ast_wrapper+0x1a/0x20 [ocfs2_stack_o2cb] [ 989.767010] [<ffffffffc0864ec6>] dlm_do_local_bast+0x46/0xe0 [ocfs2_dlm] [ 989.767217] [<ffffffffc084f5cc>] ? dlm_lockres_calc_usage+0x4c/0x60 [ocfs2_dlm] [ 989.767466] [<ffffffffc08501f1>] dlm_thread+0xa31/0x1140 [ocfs2_dlm] [ 989.767662] [<ffffffff816f78da>] ? __schedule+0x24a/0x810 [ 989.767834] [<ffffffff816f78ce>] ? __schedule+0x23e/0x810 [ 989.768006] [<ffffffff816f78da>] ? __schedule+0x24a/0x810 [ 989.768178] [<ffffffff816f78ce>] ? __schedule+0x23e/0x810 [ 989.768349] [<ffffffff816f78da>] ? __schedule+0x24a/0x810 [ 989.768521] [<ffffffff816f78ce>] ? __schedule+0x23e/0x810 [ 989.768693] [<ffffffff816f78da>] ? __schedule+0x24a/0x810 [ 989.768893] [<ffffffff816f78ce>] ? __schedule+0x23e/0x810 [ 989.769067] [<ffffffff816f78da>] ? __schedule+0x24a/0x810 [ 989.769241] [<ffffffff810ce4d0>] ? wait_woken+0x90/0x90 [ 989.769411] [<ffffffffc084f7c0>] ? dlm_kick_thread+0x80/0x80 [ocfs2_dlm] [ 989.769617] [<ffffffff810a8bbb>] kthread+0xcb/0xf0 [ 989.769774] [<ffffffff816f78da>] ? __schedule+0x24a/0x810 [ 989.769945] [<ffffffff816f78da>] ? __schedule+0x24a/0x810 [ 989.770117] [<ffffffff810a8af0>] ? kthread_create_on_node+0x180/0x180 [ 989.770321] [<ffffffff816fdaa1>] ret_from_fork+0x61/0x90 [ 989.770492] [<ffffffff810a8af0>] ? kthread_create_on_node+0x180/0x180 [ 989.770689] Code: d0 00 00 00 f0 45 7d c0 bf 00 20 00 00 48 89 83 c0 00 00 00 48 89 83 c8 00 00 00 e8 55 c1 8c c0 83 4b 04 10 48 83 c4 08 5b 5d c3 <0f> 0b 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 41 54 53 48 83 [ 989.771892] RIP [<ffffffffc07d4316>] __user_dlm_queue_lockres.part.4+0x76/0x80 [ocfs2_dlmfs] [ 989.772174] RSP <ffff88017f7cbcb8> [ 989.772704] ---[ end trace ebd1e38cebcc93a8 ]--- [ 989.772907] Kernel panic - not syncing: Fatal exception [ 989.773173] Kernel Offset: disabled Link: https://lkml.kernel.org/r/20220518235224.87100-2-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
adithya2306
pushed a commit
that referenced
this pull request
Mar 7, 2023
[ Upstream commit fb8396aeda5872369a8ed6d2301e2c86e303c520 ] The driver incorrectly frees client instance and subsequent i40e module removal leads to kernel crash. Reproducer: 1. Do ethtool offline test followed immediately by another one host# ethtool -t eth0 offline; ethtool -t eth0 offline 2. Remove recursively irdma module that also removes i40e module host# modprobe -r irdma Result: [ 8675.035651] i40e 0000:3d:00.0 eno1: offline testing starting [ 8675.193774] i40e 0000:3d:00.0 eno1: testing finished [ 8675.201316] i40e 0000:3d:00.0 eno1: offline testing starting [ 8675.358921] i40e 0000:3d:00.0 eno1: testing finished [ 8675.496921] i40e 0000:3d:00.0: IRDMA hardware initialization FAILED init_state=2 status=-110 [ 8686.188955] i40e 0000:3d:00.1: i40e_ptp_stop: removed PHC on eno2 [ 8686.943890] i40e 0000:3d:00.1: Deleted LAN device PF1 bus=0x3d dev=0x00 func=0x01 [ 8686.952669] i40e 0000:3d:00.0: i40e_ptp_stop: removed PHC on eno1 [ 8687.761787] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 8687.768755] #PF: supervisor read access in kernel mode [ 8687.773895] #PF: error_code(0x0000) - not-present page [ 8687.779034] PGD 0 P4D 0 [ 8687.781575] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 8687.785935] CPU: 51 PID: 172891 Comm: rmmod Kdump: loaded Tainted: G W I 5.19.0+ #2 [ 8687.794800] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0X.02.0001.051420190324 05/14/2019 [ 8687.805222] RIP: 0010:i40e_lan_del_device+0x13/0xb0 [i40e] [ 8687.810719] Code: d4 84 c0 0f 84 b8 25 01 00 e9 9c 25 01 00 41 bc f4 ff ff ff eb 91 90 0f 1f 44 00 00 41 54 55 53 48 8b 87 58 08 00 00 48 89 fb <48> 8b 68 30 48 89 ef e8 21 8a 0f d5 48 89 ef e8 a9 78 0f d5 48 8b [ 8687.829462] RSP: 0018:ffffa604072efce0 EFLAGS: 00010202 [ 8687.834689] RAX: 0000000000000000 RBX: ffff8f43833b2000 RCX: 0000000000000000 [ 8687.841821] RDX: 0000000000000000 RSI: ffff8f4b0545b298 RDI: ffff8f43833b2000 [ 8687.848955] RBP: ffff8f43833b2000 R08: 0000000000000001 R09: 0000000000000000 [ 8687.856086] R10: 0000000000000000 R11: 000ffffffffff000 R12: ffff8f43833b2ef0 [ 8687.863218] R13: ffff8f43833b2ef0 R14: ffff915103966000 R15: ffff8f43833b2008 [ 8687.870342] FS: 00007f79501c3740(0000) GS:ffff8f4adffc0000(0000) knlGS:0000000000000000 [ 8687.878427] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8687.884174] CR2: 0000000000000030 CR3: 000000014276e004 CR4: 00000000007706e0 [ 8687.891306] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8687.898441] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8687.905572] PKRU: 55555554 [ 8687.908286] Call Trace: [ 8687.910737] <TASK> [ 8687.912843] i40e_remove+0x2c0/0x330 [i40e] [ 8687.917040] pci_device_remove+0x33/0xa0 [ 8687.920962] device_release_driver_internal+0x1aa/0x230 [ 8687.926188] driver_detach+0x44/0x90 [ 8687.929770] bus_remove_driver+0x55/0xe0 [ 8687.933693] pci_unregister_driver+0x2a/0xb0 [ 8687.937967] i40e_exit_module+0xc/0xf48 [i40e] Two offline tests cause IRDMA driver failure (ETIMEDOUT) and this failure is indicated back to i40e_client_subtask() that calls i40e_client_del_instance() to free client instance referenced by pf->cinst and sets this pointer to NULL. During the module removal i40e_remove() calls i40e_lan_del_device() that dereferences pf->cinst that is NULL -> crash. Do not remove client instance when client open callbacks fails and just clear __I40E_CLIENT_INSTANCE_OPENED bit. The driver also needs to take care about this situation (when netdev is up and client is NOT opened) in i40e_notify_client_of_netdev_close() and calls client close callback only when __I40E_CLIENT_INSTANCE_OPENED is set. Fixes: 0ef2d5a ("i40e: KISS the client interface") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
adithya2306
pushed a commit
that referenced
this pull request
Mar 7, 2023
[ Upstream commit 84a53580c5d2138c7361c7c3eea5b31827e63b35 ] The SRv6 layer allows defining HMAC data that can later be used to sign IPv6 Segment Routing Headers. This configuration is realised via netlink through four attributes: SEG6_ATTR_HMACKEYID, SEG6_ATTR_SECRET, SEG6_ATTR_SECRETLEN and SEG6_ATTR_ALGID. Because the SECRETLEN attribute is decoupled from the actual length of the SECRET attribute, it is possible to provide invalid combinations (e.g., secret = "", secretlen = 64). This case is not checked in the code and with an appropriately crafted netlink message, an out-of-bounds read of up to 64 bytes (max secret length) can occur past the skb end pointer and into skb_shared_info: Breakpoint 1, seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208 208 memcpy(hinfo->secret, secret, slen); (gdb) bt #0 seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208 #1 0xffffffff81e012e9 in genl_family_rcv_msg_doit (skb=skb@entry=0xffff88800b1f9f00, nlh=nlh@entry=0xffff88800b1b7600, extack=extack@entry=0xffffc90000ba7af0, ops=ops@entry=0xffffc90000ba7a80, hdrlen=4, net=0xffffffff84237580 <init_net>, family=<optimized out>, family=<optimized out>) at net/netlink/genetlink.c:731 #2 0xffffffff81e01435 in genl_family_rcv_msg (extack=0xffffc90000ba7af0, nlh=0xffff88800b1b7600, skb=0xffff88800b1f9f00, family=0xffffffff82fef6c0 <seg6_genl_family>) at net/netlink/genetlink.c:775 #3 genl_rcv_msg (skb=0xffff88800b1f9f00, nlh=0xffff88800b1b7600, extack=0xffffc90000ba7af0) at net/netlink/genetlink.c:792 #4 0xffffffff81dfffc3 in netlink_rcv_skb (skb=skb@entry=0xffff88800b1f9f00, cb=cb@entry=0xffffffff81e01350 <genl_rcv_msg>) at net/netlink/af_netlink.c:2501 #5 0xffffffff81e00919 in genl_rcv (skb=0xffff88800b1f9f00) at net/netlink/genetlink.c:803 #6 0xffffffff81dff6ae in netlink_unicast_kernel (ssk=0xffff888010eec800, skb=0xffff88800b1f9f00, sk=0xffff888004aed000) at net/netlink/af_netlink.c:1319 #7 netlink_unicast (ssk=ssk@entry=0xffff888010eec800, skb=skb@entry=0xffff88800b1f9f00, portid=portid@entry=0, nonblock=<optimized out>) at net/netlink/af_netlink.c:1345 #8 0xffffffff81dff9a4 in netlink_sendmsg (sock=<optimized out>, msg=0xffffc90000ba7e48, len=<optimized out>) at net/netlink/af_netlink.c:1921 ... (gdb) p/x ((struct sk_buff *)0xffff88800b1f9f00)->head + ((struct sk_buff *)0xffff88800b1f9f00)->end $1 = 0xffff88800b1b76c0 (gdb) p/x secret $2 = 0xffff88800b1b76c0 (gdb) p slen $3 = 64 '@' The OOB data can then be read back from userspace by dumping HMAC state. This commit fixes this by ensuring SECRETLEN cannot exceed the actual length of SECRET. Reported-by: Lucas Leong <wmliang.tw@gmail.com> Tested: verified that EINVAL is correctly returned when secretlen > len(secret) Fixes: 4f4853d ("ipv6: sr: implement API to control SR HMAC structure") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
adithya2306
pushed a commit
that referenced
this pull request
Apr 30, 2023
commit b740d806166979488e798e41743aaec051f2443f upstream.
Syzbot reported the following lockdep splat
======================================================
WARNING: possible circular locking dependency detected
6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0 Not tainted
------------------------------------------------------
syz-executor307/3029 is trying to acquire lock:
ffff0000c02525d8 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x54/0xb4 mm/memory.c:5576
but task is already holding lock:
ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock fs/btrfs/locking.c:134 [inline]
ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: btrfs_tree_read_lock fs/btrfs/locking.c:140 [inline]
ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: btrfs_read_lock_root_node+0x13c/0x1c0 fs/btrfs/locking.c:279
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (btrfs-root-00){++++}-{3:3}:
down_read_nested+0x64/0x84 kernel/locking/rwsem.c:1624
__btrfs_tree_read_lock fs/btrfs/locking.c:134 [inline]
btrfs_tree_read_lock fs/btrfs/locking.c:140 [inline]
btrfs_read_lock_root_node+0x13c/0x1c0 fs/btrfs/locking.c:279
btrfs_search_slot_get_root+0x74/0x338 fs/btrfs/ctree.c:1637
btrfs_search_slot+0x1b0/0xfd8 fs/btrfs/ctree.c:1944
btrfs_update_root+0x6c/0x5a0 fs/btrfs/root-tree.c:132
commit_fs_roots+0x1f0/0x33c fs/btrfs/transaction.c:1459
btrfs_commit_transaction+0x89c/0x12d8 fs/btrfs/transaction.c:2343
flush_space+0x66c/0x738 fs/btrfs/space-info.c:786
btrfs_async_reclaim_metadata_space+0x43c/0x4e0 fs/btrfs/space-info.c:1059
process_one_work+0x2d8/0x504 kernel/workqueue.c:2289
worker_thread+0x340/0x610 kernel/workqueue.c:2436
kthread+0x12c/0x158 kernel/kthread.c:376
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
-> #2 (&fs_info->reloc_mutex){+.+.}-{3:3}:
__mutex_lock_common+0xd4/0xca8 kernel/locking/mutex.c:603
__mutex_lock kernel/locking/mutex.c:747 [inline]
mutex_lock_nested+0x38/0x44 kernel/locking/mutex.c:799
btrfs_record_root_in_trans fs/btrfs/transaction.c:516 [inline]
start_transaction+0x248/0x944 fs/btrfs/transaction.c:752
btrfs_start_transaction+0x34/0x44 fs/btrfs/transaction.c:781
btrfs_create_common+0xf0/0x1b4 fs/btrfs/inode.c:6651
btrfs_create+0x8c/0xb0 fs/btrfs/inode.c:6697
lookup_open fs/namei.c:3413 [inline]
open_last_lookups fs/namei.c:3481 [inline]
path_openat+0x804/0x11c4 fs/namei.c:3688
do_filp_open+0xdc/0x1b8 fs/namei.c:3718
do_sys_openat2+0xb8/0x22c fs/open.c:1313
do_sys_open fs/open.c:1329 [inline]
__do_sys_openat fs/open.c:1345 [inline]
__se_sys_openat fs/open.c:1340 [inline]
__arm64_sys_openat+0xb0/0xe0 fs/open.c:1340
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581
-> #1 (sb_internal#2){.+.+}-{0:0}:
percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
__sb_start_write include/linux/fs.h:1826 [inline]
sb_start_intwrite include/linux/fs.h:1948 [inline]
start_transaction+0x360/0x944 fs/btrfs/transaction.c:683
btrfs_join_transaction+0x30/0x40 fs/btrfs/transaction.c:795
btrfs_dirty_inode+0x50/0x140 fs/btrfs/inode.c:6103
btrfs_update_time+0x1c0/0x1e8 fs/btrfs/inode.c:6145
inode_update_time fs/inode.c:1872 [inline]
touch_atime+0x1f0/0x4a8 fs/inode.c:1945
file_accessed include/linux/fs.h:2516 [inline]
btrfs_file_mmap+0x50/0x88 fs/btrfs/file.c:2407
call_mmap include/linux/fs.h:2192 [inline]
mmap_region+0x7fc/0xc14 mm/mmap.c:1752
do_mmap+0x644/0x97c mm/mmap.c:1540
vm_mmap_pgoff+0xe8/0x1d0 mm/util.c:552
ksys_mmap_pgoff+0x1cc/0x278 mm/mmap.c:1586
__do_sys_mmap arch/arm64/kernel/sys.c:28 [inline]
__se_sys_mmap arch/arm64/kernel/sys.c:21 [inline]
__arm64_sys_mmap+0x58/0x6c arch/arm64/kernel/sys.c:21
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581
-> #0 (&mm->mmap_lock){++++}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3095 [inline]
check_prevs_add kernel/locking/lockdep.c:3214 [inline]
validate_chain kernel/locking/lockdep.c:3829 [inline]
__lock_acquire+0x1530/0x30a4 kernel/locking/lockdep.c:5053
lock_acquire+0x100/0x1f8 kernel/locking/lockdep.c:5666
__might_fault+0x7c/0xb4 mm/memory.c:5577
_copy_to_user include/linux/uaccess.h:134 [inline]
copy_to_user include/linux/uaccess.h:160 [inline]
btrfs_ioctl_get_subvol_rootref+0x3a8/0x4bc fs/btrfs/ioctl.c:3203
btrfs_ioctl+0xa08/0xa64 fs/btrfs/ioctl.c:5556
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__arm64_sys_ioctl+0xd0/0x140 fs/ioctl.c:856
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581
other info that might help us debug this:
Chain exists of:
&mm->mmap_lock --> &fs_info->reloc_mutex --> btrfs-root-00
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(btrfs-root-00);
lock(&fs_info->reloc_mutex);
lock(btrfs-root-00);
lock(&mm->mmap_lock);
*** DEADLOCK ***
1 lock held by syz-executor307/3029:
#0: ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock fs/btrfs/locking.c:134 [inline]
#0: ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: btrfs_tree_read_lock fs/btrfs/locking.c:140 [inline]
#0: ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: btrfs_read_lock_root_node+0x13c/0x1c0 fs/btrfs/locking.c:279
stack backtrace:
CPU: 0 PID: 3029 Comm: syz-executor307 Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
Call trace:
dump_backtrace+0x1c4/0x1f0 arch/arm64/kernel/stacktrace.c:156
show_stack+0x2c/0x54 arch/arm64/kernel/stacktrace.c:163
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x104/0x16c lib/dump_stack.c:106
dump_stack+0x1c/0x58 lib/dump_stack.c:113
print_circular_bug+0x2c4/0x2c8 kernel/locking/lockdep.c:2053
check_noncircular+0x14c/0x154 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3095 [inline]
check_prevs_add kernel/locking/lockdep.c:3214 [inline]
validate_chain kernel/locking/lockdep.c:3829 [inline]
__lock_acquire+0x1530/0x30a4 kernel/locking/lockdep.c:5053
lock_acquire+0x100/0x1f8 kernel/locking/lockdep.c:5666
__might_fault+0x7c/0xb4 mm/memory.c:5577
_copy_to_user include/linux/uaccess.h:134 [inline]
copy_to_user include/linux/uaccess.h:160 [inline]
btrfs_ioctl_get_subvol_rootref+0x3a8/0x4bc fs/btrfs/ioctl.c:3203
btrfs_ioctl+0xa08/0xa64 fs/btrfs/ioctl.c:5556
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__arm64_sys_ioctl+0xd0/0x140 fs/ioctl.c:856
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581
We do generally the right thing here, copying the references into a
temporary buffer, however we are still holding the path when we do
copy_to_user from the temporary buffer. Fix this by freeing the path
before we copy to user space.
Reported-by: syzbot+4ef9e52e464c6ff47d9d@syzkaller.appspotmail.com
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
adithya2306
pushed a commit
that referenced
this pull request
Apr 30, 2023
… css_sets for migration Each cset (css_set) is pinned by its tasks. When we're moving tasks around across csets for a migration, we need to hold the source and destination csets to ensure that they don't go away while we're moving tasks about. This is done by linking cset->mg_preload_node on either the mgctx->preloaded_dst_csets or mgctx->preloaded_dst_csets list. Using the same cset->mg_preload_node for both the src and dst lists was deemed okay as a cset can't be both the source and destination at the same time. Unfortunately, this overloading becomes problematic when multiple tasks are involved in a migration and some of them are identity noop migrations while others are actually moving across cgroups. For example, this can happen with the following sequence on cgroup1: #1> mkdir -p /sys/fs/cgroup/misc/a/b #2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs #3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS & #4> PID=$! #5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks #6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs the process including the group leader back into a. In this final migration, non-leader threads would be doing identity migration while the group leader is doing an actual one. After #3, let's say the whole process was in cset A, and that after #4, the leader moves to cset B. Then, during #6, the following happens: 1. cgroup_migrate_add_src() is called on B for the leader. 2. cgroup_migrate_add_src() is called on A for the other threads. 3. cgroup_migrate_prepare_dst() is called. It scans the src list. 3. It notices that B wants to migrate to A, so it tries to A to the dst list but realizes that its ->mg_preload_node is already busy. 4. and then it notices A wants to migrate to A as it's an identity migration, it culls it by list_del_init()'ing its ->mg_preload_node and putting references accordingly. 5. The rest of migration takes place with B on the src list but nothing on the dst list. This means that A isn't held while migration is in progress. If all tasks leave A before the migration finishes and the incoming task pins it, the cset will be destroyed leading to use-after-free. This is caused by overloading cset->mg_preload_node for both src and dst preload lists. We wanted to exclude the cset from the src list but ended up inadvertently excluding it from the dst list too. This patch fixes the issue by separating out cset->mg_preload_node into ->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst preloadings don't interfere with each other. Bug: 236582926 Change-Id: Ieaf1c0c8fc23753570897fd6e48a54335ab939ce Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> Reported-by: shisiyuan <shisiyuan19870131@gmail.com> Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com Link: https://lore.kernel.org/lkml/Yh+RGIJ0f3nrqIiN@slm.duckdns.org/#t Fixes: f817de9 ("cgroup: prepare migration path for unified hierarchy") Cc: stable@vger.kernel.org # v3.16+ (cherry picked from commit 07fd5b6cdf3cc30bfde8fe0f644771688be04447 https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-5.19-fixes) Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> [mojha: Move the two new list heads into a wrapper ext_css_set struct to ensure ABI doesn't break and also defined a macro init_css_set which will be replaced with init_ext_css_set.cset to avoid too much code changes] Git-commit: e8fce594347a77af0481c18b6b56509b954fa771 Git-repo: https://android.googlesource.com/kernel/common/ Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com>
adithya2306
pushed a commit
that referenced
this pull request
Aug 23, 2023
[ Upstream commit b18cba09e374637a0a3759d856a6bca94c133952 ] Commit 9130b8d ("SUNRPC: allow for upcalls for the same uid but different gss service") introduced `auth` argument to __gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL since it (and auth->service) was not (yet) determined. When multiple upcalls with the same uid and different service are ongoing, it could happen that __gss_find_upcall(), which returns the first match found in the pipe->in_downcall list, could not find the correct gss_msg corresponding to the downcall we are looking for. Moreover, it might return a msg which is not sent to rpc.gssd yet. We could see mount.nfs process hung in D state with multiple mount.nfs are executed in parallel. The call trace below is of CentOS 7.9 kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the same hang w/ elrepo kernel-ml-6.0.7-1.el7. PID: 71258 TASK: ffff91ebd4be0000 CPU: 36 COMMAND: "mount.nfs" #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9 #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss] #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc [sunrpc] #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss] #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc] #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc] #7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc] #8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc] #9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc] The scenario is like this. Let's say there are two upcalls for services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe. When rpc.gssd reads pipe to get the upcall msg corresponding to service B from pipe->pipe and then writes the response, in gss_pipe_downcall the msg corresponding to service A will be picked because only uid is used to find the msg and it is before the one for B in pipe->in_downcall. And the process waiting for the msg corresponding to service A will be woken up. Actual scheduing of that process might be after rpc.gssd processes the next msg. In rpc_pipe_generic_upcall it clears msg->errno (for A). The process is scheduled to see gss_msg->ctx == NULL and gss_msg->msg.errno == 0, therefore it cannot break the loop in gss_create_upcall and is never woken up after that. This patch adds a simple check to ensure that a msg which is not sent to rpc.gssd yet is not chosen as the matching upcall upon receiving a downcall. Signed-off-by: minoura makoto <minoura@valinux.co.jp> Signed-off-by: Hiroshi Shimamoto <h-shimamoto@nec.com> Tested-by: Hiroshi Shimamoto <h-shimamoto@nec.com> Cc: Trond Myklebust <trondmy@hammerspace.com> Fixes: 9130b8d ("SUNRPC: allow for upcalls for same uid but different gss service") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
adithya2306
pushed a commit
that referenced
this pull request
Aug 23, 2023
commit 88956177db179e4eba7cd590971961857d1565b8 upstream.
When sending packets between nodes in netns, it calls tipc_lxc_xmit() for
peer node to receive the packets where tipc_sk_mcast_rcv()/tipc_sk_rcv()
might be called, and it's pretty much like in tipc_rcv().
Currently the local 'node rw lock' is held during calling tipc_lxc_xmit()
to protect the peer_net not being freed by another thread. However, when
receiving these packets, tipc_node_add_conn() might be called where the
peer 'node rw lock' is acquired. Then a dead lock warning is triggered by
lockdep detector, although it is not a real dead lock:
WARNING: possible recursive locking detected
--------------------------------------------
conn_server/1086 is trying to acquire lock:
ffff8880065cb020 (&n->lock#2){++--}-{2:2}, \
at: tipc_node_add_conn.cold.76+0xaa/0x211 [tipc]
but task is already holding lock:
ffff8880065cd020 (&n->lock#2){++--}-{2:2}, \
at: tipc_node_xmit+0x285/0xb30 [tipc]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&n->lock#2);
lock(&n->lock#2);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by conn_server/1086:
#0: ffff8880036d1e40 (sk_lock-AF_TIPC){+.+.}-{0:0}, \
at: tipc_accept+0x9c0/0x10b0 [tipc]
#1: ffff8880036d5f80 (sk_lock-AF_TIPC/1){+.+.}-{0:0}, \
at: tipc_accept+0x363/0x10b0 [tipc]
#2: ffff8880065cd020 (&n->lock#2){++--}-{2:2}, \
at: tipc_node_xmit+0x285/0xb30 [tipc]
#3: ffff888012e13370 (slock-AF_TIPC){+...}-{2:2}, \
at: tipc_sk_rcv+0x2da/0x1b40 [tipc]
Call Trace:
<TASK>
dump_stack_lvl+0x44/0x5b
__lock_acquire.cold.77+0x1f2/0x3d7
lock_acquire+0x1d2/0x610
_raw_write_lock_bh+0x38/0x80
tipc_node_add_conn.cold.76+0xaa/0x211 [tipc]
tipc_sk_finish_conn+0x21e/0x640 [tipc]
tipc_sk_filter_rcv+0x147b/0x3030 [tipc]
tipc_sk_rcv+0xbb4/0x1b40 [tipc]
tipc_lxc_xmit+0x225/0x26b [tipc]
tipc_node_xmit.cold.82+0x4a/0x102 [tipc]
__tipc_sendstream+0x879/0xff0 [tipc]
tipc_accept+0x966/0x10b0 [tipc]
do_accept+0x37d/0x590
This patch avoids this warning by not holding the 'node rw lock' before
calling tipc_lxc_xmit(). As to protect the 'peer_net', rcu_read_lock()
should be enough, as in cleanup_net() when freeing the netns, it calls
synchronize_rcu() before the free is continued.
Also since tipc_lxc_xmit() is like the RX path in tipc_rcv(), it makes
sense to call it under rcu_read_lock(). Note that the right lock order
must be:
rcu_read_lock();
tipc_node_read_lock(n);
tipc_node_read_unlock(n);
tipc_lxc_xmit();
rcu_read_unlock();
instead of:
tipc_node_read_lock(n);
rcu_read_lock();
tipc_node_read_unlock(n);
tipc_lxc_xmit();
rcu_read_unlock();
and we have to call tipc_node_read_lock/unlock() twice in
tipc_node_xmit().
Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/5bdd1f8fee9db695cfff4528a48c9b9d0523fb00.1670110641.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
adithya2306
pushed a commit
that referenced
this pull request
Aug 23, 2023
commit 27dada070d59c28a441f1907d2cec891b17dcb26 upstream. The defer ops code has been finishing items in the wrong order -- if a top level defer op creates items A and B, and finishing item A creates more defer ops A1 and A2, we'll put the new items on the end of the chain and process them in the order A B A1 A2. This is kind of weird, since it's convenient for programmers to be able to think of A and B as an ordered sequence where all the sub-tasks for A must finish before we move on to B, e.g. A A1 A2 D. Right now, our log intent items are not so complex that this matters, but this will become important for the atomic extent swapping patchset. In order to maintain correct reference counting of extents, we have to unmap and remap extents in that order, and we want to complete that work before moving on to the next range that the user wants to swap. This patch fixes defer ops to satsify that requirement. The primary symptom of the incorrect order was noticed in an early performance analysis of the atomic extent swap code. An astonishingly large number of deferred work items accumulated when userspace requested an atomic update of two very fragmented files. The cause of this was traced to the same ordering bug in the inner loop of xfs_defer_finish_noroll. If the ->finish_item method of a deferred operation queues new deferred operations, those new deferred ops are appended to the tail of the pending work list. To illustrate, say that a caller creates a transaction t0 with four deferred operations D0-D3. The first thing defer ops does is roll the transaction to t1, leaving us with: t1: D0(t0), D1(t0), D2(t0), D3(t0) Let's say that finishing each of D0-D3 will create two new deferred ops. After finish D0 and roll, we'll have the following chain: t2: D1(t0), D2(t0), D3(t0), d4(t1), d5(t1) d4 and d5 were logged to t1. Notice that while we're about to start work on D1, we haven't actually completed all the work implied by D0 being finished. So far we've been careful (or lucky) to structure the dfops callers such that D1 doesn't depend on d4 or d5 being finished, but this is a potential logic bomb. There's a second problem lurking. Let's see what happens as we finish D1-D3: t3: D2(t0), D3(t0), d4(t1), d5(t1), d6(t2), d7(t2) t4: D3(t0), d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3) t5: d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4) Let's say that d4-d11 are simple work items that don't queue any other operations, which means that we can complete each d4 and roll to t6: t6: d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4) t7: d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4) ... t11: d10(t4), d11(t4) t12: d11(t4) <done> When we try to roll to transaction #12, we're holding defer op d11, which we logged way back in t4. This means that the tail of the log is pinned at t4. If the log is very small or there are a lot of other threads updating metadata, this means that we might have wrapped the log and cannot get roll to t11 because there isn't enough space left before we'd run into t4. Let's shift back to the original failure. I mentioned before that I discovered this flaw while developing the atomic file update code. In that scenario, we have a defer op (D0) that finds a range of file blocks to remap, creates a handful of new defer ops to do that, and then asks to be continued with however much work remains. So, D0 is the original swapext deferred op. The first thing defer ops does is rolls to t1: t1: D0(t0) We try to finish D0, logging d1 and d2 in the process, but can't get all the work done. We log a done item and a new intent item for the work that D0 still has to do, and roll to t2: t2: D0'(t1), d1(t1), d2(t1) We roll and try to finish D0', but still can't get all the work done, so we log a done item and a new intent item for it, requeue D0 a second time, and roll to t3: t3: D0''(t2), d1(t1), d2(t1), d3(t2), d4(t2) If it takes 48 more rolls to complete D0, then we'll finally dispense with D0 in t50: t50: D<fifty primes>(t49), d1(t1), ..., d102(t50) We then try to roll again to get a chain like this: t51: d1(t1), d2(t1), ..., d101(t50), d102(t50) ... t152: d102(t50) <done> Notice that in rolling to transaction #51, we're holding on to a log intent item for d1 that was logged in transaction #1. This means that the tail of the log is pinned at t1. If the log is very small or there are a lot of other threads updating metadata, this means that we might have wrapped the log and cannot roll to t51 because there isn't enough space left before we'd run into t1. This is of course problem #2 again. But notice the third problem with this scenario: we have 102 defer ops tied to this transaction! Each of these items are backed by pinned kernel memory, which means that we risk OOM if the chains get too long. Yikes. Problem #1 is a subtle logic bomb that could hit someone in the future; problem #2 applies (rarely) to the current upstream, and problem This is not how incremental deferred operations were supposed to work. The dfops design of logging in the same transaction an intent-done item and a new intent item for the work remaining was to make it so that we only have to juggle enough deferred work items to finish that one small piece of work. Deferred log item recovery will find that first unfinished work item and restart it, no matter how many other intent items might follow it in the log. Therefore, it's ok to put the new intents at the start of the dfops chain. For the first example, the chains look like this: t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0) t3: d5(t1), D1(t0), D2(t0), D3(t0) ... t9: d9(t7), D3(t0) t10: D3(t0) t11: d10(t10), d11(t10) t12: d11(t10) For the second example, the chains look like this: t1: D0(t0) t2: d1(t1), d2(t1), D0'(t1) t3: d2(t1), D0'(t1) t4: D0'(t1) t5: d1(t4), d2(t4), D0''(t4) ... t148: D0<50 primes>(t147) t149: d101(t148), d102(t148) t150: d102(t148) <done> This actually sucks more for pinning the log tail (we try to roll to t10 while holding an intent item that was logged in t1) but we've solved problem #1. We've also reduced the maximum chain length from: sum(all the new items) + nr_original_items to: max(new items that each original item creates) + nr_original_items This solves problem #3 by sharply reducing the number of defer ops that can be attached to a transaction at any given time. The change makes the problem of log tail pinning worse, but is improvement we need to solve problem #2. Actually solving #2, however, is left to the next patch. Note that a subsequent analysis of some hard-to-trigger reflink and COW livelocks on extremely fragmented filesystems (or systems running a lot of IO threads) showed the same symptoms -- uncomfortably large numbers of incore deferred work items and occasional stalls in the transaction grant code while waiting for log reservations. I think this patch and the next one will also solve these problems. As originally written, the code used list_splice_tail_init instead of list_splice_init, so change that, and leave a short comment explaining our actions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
adithya2306
pushed a commit
that referenced
this pull request
Aug 23, 2023
… css_sets for migration Each cset (css_set) is pinned by its tasks. When we're moving tasks around across csets for a migration, we need to hold the source and destination csets to ensure that they don't go away while we're moving tasks about. This is done by linking cset->mg_preload_node on either the mgctx->preloaded_dst_csets or mgctx->preloaded_dst_csets list. Using the same cset->mg_preload_node for both the src and dst lists was deemed okay as a cset can't be both the source and destination at the same time. Unfortunately, this overloading becomes problematic when multiple tasks are involved in a migration and some of them are identity noop migrations while others are actually moving across cgroups. For example, this can happen with the following sequence on cgroup1: #1> mkdir -p /sys/fs/cgroup/misc/a/b #2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs #3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS & #4> PID=$! #5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks #6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs the process including the group leader back into a. In this final migration, non-leader threads would be doing identity migration while the group leader is doing an actual one. After #3, let's say the whole process was in cset A, and that after #4, the leader moves to cset B. Then, during #6, the following happens: 1. cgroup_migrate_add_src() is called on B for the leader. 2. cgroup_migrate_add_src() is called on A for the other threads. 3. cgroup_migrate_prepare_dst() is called. It scans the src list. 3. It notices that B wants to migrate to A, so it tries to A to the dst list but realizes that its ->mg_preload_node is already busy. 4. and then it notices A wants to migrate to A as it's an identity migration, it culls it by list_del_init()'ing its ->mg_preload_node and putting references accordingly. 5. The rest of migration takes place with B on the src list but nothing on the dst list. This means that A isn't held while migration is in progress. If all tasks leave A before the migration finishes and the incoming task pins it, the cset will be destroyed leading to use-after-free. This is caused by overloading cset->mg_preload_node for both src and dst preload lists. We wanted to exclude the cset from the src list but ended up inadvertently excluding it from the dst list too. This patch fixes the issue by separating out cset->mg_preload_node into ->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst preloadings don't interfere with each other. Bug: 236582926 Change-Id: Ieaf1c0c8fc23753570897fd6e48a54335ab939ce Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> Reported-by: shisiyuan <shisiyuan19870131@gmail.com> Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com Link: https://lore.kernel.org/lkml/Yh+RGIJ0f3nrqIiN@slm.duckdns.org/#t Fixes: f817de9 ("cgroup: prepare migration path for unified hierarchy") Cc: stable@vger.kernel.org # v3.16+ (cherry picked from commit 07fd5b6cdf3cc30bfde8fe0f644771688be04447 https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-5.19-fixes) Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> [mojha: Move the two new list heads into a wrapper ext_css_set struct to ensure ABI doesn't break and also defined a macro init_css_set which will be replaced with init_ext_css_set.cset to avoid too much code changes] Signed-off-by: Srinivasarao Pathipati <quic_spathi@quicinc.com>
adithya2306
pushed a commit
that referenced
this pull request
Oct 28, 2023
…eues().
commit 62ec33b44e0f7168ff2886520fec6fb62d03b5a3 upstream.
Christoph Paasch reported that commit b5fc29233d28 ("inet6: Remove
inet6_destroy_sock() in sk->sk_prot->destroy().") started triggering
WARN_ON_ONCE(sk->sk_forward_alloc) in sk_stream_kill_queues(). [0 - 2]
Also, we can reproduce it by a program in [3].
In the commit, we delay freeing ipv6_pinfo.pktoptions from sk->destroy()
to sk->sk_destruct(), so sk->sk_forward_alloc is no longer zero in
inet_csk_destroy_sock().
The same check has been in inet_sock_destruct() from at least v2.6,
we can just remove the WARN_ON_ONCE(). However, among the users of
sk_stream_kill_queues(), only CAIF is not calling inet_sock_destruct().
Thus, we add the same WARN_ON_ONCE() to caif_sock_destructor().
[0]: https://lore.kernel.org/netdev/39725AB4-88F1-41B3-B07F-949C5CAEFF4F@icloud.com/
[1]: multipath-tcp/mptcp_net-next#341
[2]:
WARNING: CPU: 0 PID: 3232 at net/core/stream.c:212 sk_stream_kill_queues+0x2f9/0x3e0
Modules linked in:
CPU: 0 PID: 3232 Comm: syz-executor.0 Not tainted 6.2.0-rc5ab24eb4698afbe147b424149c529e2a43ec24eb5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:sk_stream_kill_queues+0x2f9/0x3e0
Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e ec 00 00 00 8b ab 08 01 00 00 e9 60 ff ff ff e8 d0 5f b6 fe 0f 0b eb 97 e8 c7 5f b6 fe <0f> 0b eb a0 e8 be 5f b6 fe 0f 0b e9 6a fe ff ff e8 02 07 e3 fe e9
RSP: 0018:ffff88810570fc68 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888101f38f40 RSI: ffffffff8285e529 RDI: 0000000000000005
RBP: 0000000000000ce0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000ce0 R11: 0000000000000001 R12: ffff8881009e9488
R13: ffffffff84af2cc0 R14: 0000000000000000 R15: ffff8881009e9458
FS: 00007f7fdfbd5800(0000) GS:ffff88811b600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32923000 CR3: 00000001062fc006 CR4: 0000000000170ef0
Call Trace:
<TASK>
inet_csk_destroy_sock+0x1a1/0x320
__tcp_close+0xab6/0xe90
tcp_close+0x30/0xc0
inet_release+0xe9/0x1f0
inet6_release+0x4c/0x70
__sock_release+0xd2/0x280
sock_close+0x15/0x20
__fput+0x252/0xa20
task_work_run+0x169/0x250
exit_to_user_mode_prepare+0x113/0x120
syscall_exit_to_user_mode+0x1d/0x40
do_syscall_64+0x48/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f7fdf7ae28d
Code: c1 20 00 00 75 10 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 37 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00000000007dfbb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f7fdf7ae28d
RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 0000000000000003
RBP: 0000000000000000 R08: 000000007f338e0f R09: 0000000000000e0f
R10: 000000007f338e13 R11: 0000000000000293 R12: 00007f7fdefff000
R13: 00007f7fdefffcd8 R14: 00007f7fdefffce0 R15: 00007f7fdefffcd8
</TASK>
[3]: https://lore.kernel.org/netdev/20230208004245.83497-1-kuniyu@amazon.com/
Fixes: b5fc29233d28 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Christoph Paasch <christophpaasch@icloud.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
adithya2306
pushed a commit
that referenced
this pull request
Oct 28, 2023
[ Upstream commit e64e71056f323a1e178dccf04d4c0f032d84436c ] pnv_ioda_setup_pe_res() calls opal to map a resource with a PE. However, the code assumes the resource is allocated and it uses the resource address to find out the segment(s) which need to be mapped to the PE. In the unlikely case where the resource hasn't been allocated, the computation for the segment number is garbage, which can lead to invalid memory access and potentially a kernel crash, such as: [ ] pci_bus 0002:02: Configuring PE for bus [ ] pci 0002:02 : [PE# fc] Secondary bus 0x0000000000000002..0x0000000000000002 associated with PE#fc [ ] BUG: Kernel NULL pointer dereference on write at 0x00000000 [ ] Faulting instruction address: 0xc00000000005eac4 [ ] Oops: Kernel access of bad area, sig: 7 [#1] [ ] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV [ ] Modules linked in: [ ] CPU: 12 PID: 1 Comm: swapper/20 Not tainted 5.10.50-openpower1 #2 [ ] NIP: c00000000005eac4 LR: c00000000005ea44 CTR: 0000000030061b9c [ ] REGS: c000200007383650 TRAP: 0300 Not tainted (5.10.50-openpower1) [ ] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44000224 XER: 20040000 [ ] CFAR: c00000000005eaa0 DAR: 0000000000000000 DSISR: 02080000 IRQMASK: 0 [ ] GPR00: c00000000005dd98 c0002000073838e0 c00000000185de00 c000200fff018960 [ ] GPR04: 00000000000000fc 0000000000000003 0000000000000000 0000000000000000 [ ] GPR08: 0000000000000000 0000000000000000 0000000000000000 9000000000001033 [ ] GPR12: 0000000031cb0000 c000000ffffe6a80 c000000000010a58 0000000000000000 [ ] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ ] GPR20: 0000000000000000 0000000000000000 0000000000000000 c00000000711e200 [ ] GPR24: 0000000000000100 c000200009501120 c00020000cee2800 00000000000003ff [ ] GPR28: c000200fff018960 0000000000000000 c000200ffcb7fd00 0000000000000000 [ ] NIP [c00000000005eac4] pnv_ioda_setup_pe_res+0x94/0x1a0 [ ] LR [c00000000005ea44] pnv_ioda_setup_pe_res+0x14/0x1a0 [ ] Call Trace: [ ] [c0002000073838e0] [c00000000005eb98] pnv_ioda_setup_pe_res+0x168/0x1a0 (unreliable) [ ] [c000200007383970] [c00000000005dd98] pnv_pci_ioda_dma_dev_setup+0x43c/0x970 [ ] [c000200007383a60] [c000000000032cdc] pcibios_bus_add_device+0x78/0x18c [ ] [c000200007383aa0] [c00000000028f2bc] pci_bus_add_device+0x28/0xbc [ ] [c000200007383b10] [c00000000028f3a0] pci_bus_add_devices+0x50/0x7c [ ] [c000200007383b50] [c00000000028f3c4] pci_bus_add_devices+0x74/0x7c [ ] [c000200007383b90] [c00000000028f3c4] pci_bus_add_devices+0x74/0x7c [ ] [c000200007383bd0] [c00000000069ad0c] pcibios_init+0xf0/0x104 [ ] [c000200007383c50] [c0000000000106d8] do_one_initcall+0x84/0x1c4 [ ] [c000200007383d20] [c0000000006910b8] kernel_init_freeable+0x264/0x268 [ ] [c000200007383dc0] [c000000000010a68] kernel_init+0x18/0x138 [ ] [c000200007383e20] [c00000000000cbfc] ret_from_kernel_thread+0x5c/0x80 [ ] Instruction dump: [ ] 7f89e840 409d000c 7fbbf840 409c000c 38210090 4848f448 809c002c e95e0120 [ ] 7ba91764 38a00003 57a7043e 38c00000 <7c8a492e> 5484043e e87e0018 4bff23bd Hitting the problem is not that easy. It was seen with a (semi-bogus) PCI device with a class code of 0. The generic PCI framework doesn't allocate resources in such a case. The patch is simply skipping resources which are still flagged with IORESOURCE_UNSET. We don't have the problem with 64-bit mem resources, as the address of the resource is checked to be within the range of the 64-bit mmio window. See pnv_ioda_reserve_dev_m64_pe() and pnv_pci_is_m64(). Reported-by: Andrew Jeffery <andrew@aj.id.au> Fixes: 23e7942 ("powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()") Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230120093215.19496-1-fbarrat@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
adithya2306
pushed a commit
that referenced
this pull request
Oct 28, 2023
…dler commit 42e19e6f04984088b6f9f0507c4c89a8152d9730 upstream. Recent test_kprobe_missed kprobes kunit test uncovers the following error (reported when CONFIG_DEBUG_ATOMIC_SLEEP is enabled): BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 662, name: kunit_try_catch preempt_count: 0, expected: 0 RCU nest depth: 0, expected: 0 no locks held by kunit_try_catch/662. irq event stamp: 280 hardirqs last enabled at (279): [<00000003e60a3d42>] __do_pgm_check+0x17a/0x1c0 hardirqs last disabled at (280): [<00000003e3bd774a>] kprobe_exceptions_notify+0x27a/0x318 softirqs last enabled at (0): [<00000003e3c5c890>] copy_process+0x14a8/0x4c80 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 46 PID: 662 Comm: kunit_try_catch Tainted: G N 6.2.0-173644-g44c18d77f0c0 #2 Hardware name: IBM 3931 A01 704 (LPAR) Call Trace: [<00000003e60a3a00>] dump_stack_lvl+0x120/0x198 [<00000003e3d02e82>] __might_resched+0x60a/0x668 [<00000003e60b9908>] __mutex_lock+0xc0/0x14e0 [<00000003e60bad5a>] mutex_lock_nested+0x32/0x40 [<00000003e3f7b460>] unregister_kprobe+0x30/0xd8 [<00000003e51b2602>] test_kprobe_missed+0xf2/0x268 [<00000003e51b5406>] kunit_try_run_case+0x10e/0x290 [<00000003e51b7dfa>] kunit_generic_run_threadfn_adapter+0x62/0xb8 [<00000003e3ce30f8>] kthread+0x2d0/0x398 [<00000003e3b96afa>] __ret_from_fork+0x8a/0xe8 [<00000003e60ccada>] ret_from_fork+0xa/0x40 The reason for this error report is that kprobes handling code failed to restore irqs. The problem is that when kprobe is triggered from another kprobe post_handler current sequence of enable_singlestep / disable_singlestep is the following: enable_singlestep <- original kprobe (saves kprobe_saved_imask) enable_singlestep <- kprobe triggered from post_handler (clobbers kprobe_saved_imask) disable_singlestep <- kprobe triggered from post_handler (restores kprobe_saved_imask) disable_singlestep <- original kprobe (restores wrong clobbered kprobe_saved_imask) There is just one kprobe_ctlblk per cpu and both calls saves and loads irq mask to kprobe_saved_imask. To fix the problem simply move resume_execution (which calls disable_singlestep) before calling post_handler. This also fixes the problem that post_handler is called with pt_regs which were not yet adjusted after single-stepping. Cc: stable@vger.kernel.org Fixes: 4ba069b ("[S390] add kprobes support.") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
adithya2306
pushed a commit
that referenced
this pull request
Oct 28, 2023
commit 60eed1e3d45045623e46944ebc7c42c30a4350f0 upstream.
code path:
ocfs2_ioctl_move_extents
ocfs2_move_extents
ocfs2_defrag_extent
__ocfs2_move_extent
+ ocfs2_journal_access_di
+ ocfs2_split_extent //sub-paths call jbd2_journal_restart
+ ocfs2_journal_dirty //crash by jbs2 ASSERT
crash stacks:
PID: 11297 TASK: ffff974a676dcd00 CPU: 67 COMMAND: "defragfs.ocfs2"
#0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01
#1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d
#2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d
#3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f
#4 [ffffb25d8dad3a58] do_trap at ffffffff83833205
#5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6
#6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18
[exception RIP: jbd2_journal_dirty_metadata+0x2ba]
RIP: ffffffffc09ca54a RSP: ffffb25d8dad3b70 RFLAGS: 00010207
RAX: 0000000000000000 RBX: ffff9706eedc5248 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff97337029ea28 RDI: ffff9706eedc5250
RBP: ffff9703c3520200 R8: 000000000f46b0b2 R9: 0000000000000000
R10: 0000000000000001 R11: 00000001000000fe R12: ffff97337029ea28
R13: 0000000000000000 R14: ffff9703de59bf60 R15: ffff9706eedc5250
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2]
#8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2]
#9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2]
Analysis
This bug has the same root cause of 'commit 7f27ec9 ("ocfs2: call
ocfs2_journal_access_di() before ocfs2_journal_dirty() in
ocfs2_write_end_nolock()")'. For this bug, jbd2_journal_restart() is
called by ocfs2_split_extent() during defragmenting.
How to fix
For ocfs2_split_extent() can handle journal operations totally by itself.
Caller doesn't need to call journal access/dirty pair, and caller only
needs to call journal start/stop pair. The fix method is to remove
journal access/dirty from __ocfs2_move_extent().
The discussion for this patch:
https://oss.oracle.com/pipermail/ocfs2-devel/2023-February/000647.html
Link: https://lkml.kernel.org/r/20230217003717.32469-1-heming.zhao@suse.com
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
[ Upstream commit cba6cfdc7c3f1516f0d08ddfb24e689af0932573 ] An automated bot told me that there was a potential lockdep problem with regulators. This was on the chromeos-5.15 kernel, but I see nothing that would be different downstream compared to upstream. The bot said: ============================================ WARNING: possible recursive locking detected 5.15.104-lockdep-17461-gc1e499ed6604 adithya2306#1 Not tainted -------------------------------------------- kworker/u16:4/115 is trying to acquire lock: ffffff8083110170 (regulator_ww_class_mutex){+.+.}-{3:3}, at: create_regulator+0x398/0x7ec but task is already holding lock: ffffff808378e170 (regulator_ww_class_mutex){+.+.}-{3:3}, at: ww_mutex_trylock+0x3c/0x7b8 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(regulator_ww_class_mutex); lock(regulator_ww_class_mutex); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by kworker/u16:4/115: #0: ffffff808006a948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x520/0x1348 adithya2306#1: ffffffc00e0a7cc0 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x55c/0x1348 adithya2306#2: ffffff80828a2260 (&dev->mutex){....}-{3:3}, at: __device_attach_async_helper+0xd0/0x2a4 adithya2306#3: ffffff808378e170 (regulator_ww_class_mutex){+.+.}-{3:3}, at: ww_mutex_trylock+0x3c/0x7b8 stack backtrace: CPU: 2 PID: 115 Comm: kworker/u16:4 Not tainted 5.15.104-lockdep-17461-gc1e499ed6604 adithya2306#1 9292e52fa83c0e23762b2b3aa1bacf5787a4d5da Hardware name: Google Quackingstick (rev0+) (DT) Workqueue: events_unbound async_run_entry_fn Call trace: dump_backtrace+0x0/0x4ec show_stack+0x34/0x50 dump_stack_lvl+0xdc/0x11c dump_stack+0x1c/0x48 __lock_acquire+0x16d4/0x6c74 lock_acquire+0x208/0x750 __mutex_lock_common+0x11c/0x11f8 ww_mutex_lock+0xc0/0x440 create_regulator+0x398/0x7ec regulator_resolve_supply+0x654/0x7c4 regulator_register_resolve_supply+0x30/0x120 class_for_each_device+0x1b8/0x230 regulator_register+0x17a4/0x1f40 devm_regulator_register+0x60/0xd0 reg_fixed_voltage_probe+0x728/0xaec platform_probe+0x150/0x1c8 really_probe+0x274/0xa20 __driver_probe_device+0x1dc/0x3f4 driver_probe_device+0x78/0x1c0 __device_attach_driver+0x1ac/0x2c8 bus_for_each_drv+0x11c/0x190 __device_attach_async_helper+0x1e4/0x2a4 async_run_entry_fn+0xa0/0x3ac process_one_work+0x638/0x1348 worker_thread+0x4a8/0x9c4 kthread+0x2e4/0x3a0 ret_from_fork+0x10/0x20 The problem was first reported soon after we made many of the regulators probe asynchronously, though nothing I've seen implies that the problems couldn't have also happened even without that. I haven't personally been able to reproduce the lockdep issue, but the issue does look somewhat legitimate. Specifically, it looks like in regulator_resolve_supply() we are holding a "rdev" lock while calling set_supply() -> create_regulator() which grabs the lock of a _different_ "rdev" (the one for our supply). This is not necessarily safe from a lockdep perspective since there is no documented ordering between these two locks. In reality, we should always be locking a regulator before the supplying regulator, so I don't expect there to be any real deadlocks in practice. However, the regulator framework in general doesn't express this to lockdep. Let's fix the issue by simply grabbing the two locks involved in the same way we grab multiple locks elsewhere in the regulator framework: using the "wound/wait" mechanisms. Fixes: eaa7995c529b ("regulator: core: avoid regulator_resolve_supply() race condition") Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20230329143317.RFC.v2.2.I30d8e1ca10cfbe5403884cdd192253a2e063eb9e@changeid Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
[ Upstream commit cbe16f35bee6880becca6f20d2ebf6b457148552 ] Many drivers don't want interrupts enabled automatically via request_irq(). So they are handling this issue by either way of the below two: (1) irq_set_status_flags(irq, IRQ_NOAUTOEN); request_irq(dev, irq...); (2) request_irq(dev, irq...); disable_irq(irq); The code in the second way is silly and unsafe. In the small time gap between request_irq() and disable_irq(), interrupts can still come. The code in the first way is safe though it's subobtimal. Add a new IRQF_NO_AUTOEN flag which can be handed in by drivers to request_irq() and request_nmi(). It prevents the automatic enabling of the requested interrupt/nmi in the same safe way as adithya2306#1 above. With that the various usage sites of adithya2306#1 and adithya2306#2 above can be simplified and corrected. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: dmitry.torokhov@gmail.com Link: https://lore.kernel.org/r/20210302224916.13980-2-song.bao.hua@hisilicon.com Stable-dep-of: 39db65a0a17b ("ASoC: es8316: Handle optional IRQ assignment") Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
…rors
commit 085a9f43433f30cbe8a1ade62d9d7827c3217f4d upstream.
Use down_read_nested() and down_write_nested() when taking the
ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in
the path to the PCI root bus as lock subclass parameter.
This fixes the following false-positive lockdep report when unplugging a
Lenovo X1C8 from a Lenovo 2nd gen TB3 dock:
pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
============================================
WARNING: possible recursive locking detected
5.16.0-rc2+ #621 Not tainted
--------------------------------------------
irq/124-pciehp/86 is trying to acquire lock:
ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80
but task is already holding lock:
ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&ctrl->reset_lock);
lock(&ctrl->reset_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by irq/124-pciehp/86:
#0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
adithya2306#1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
adithya2306#2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40
stack backtrace:
CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621
Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
Call Trace:
<TASK>
dump_stack_lvl+0x59/0x73
__lock_acquire.cold+0xc5/0x2c6
lock_acquire+0xb5/0x2b0
down_read+0x3e/0x50
pciehp_check_presence+0x23/0x80
pciehp_runtime_resume+0x5c/0xa0
device_for_each_child+0x45/0x70
pcie_port_device_runtime_resume+0x20/0x30
pci_pm_runtime_resume+0xa7/0xc0
__rpm_callback+0x41/0x110
rpm_callback+0x59/0x70
rpm_resume+0x512/0x7b0
__pm_runtime_resume+0x4a/0x90
__device_release_driver+0x28/0x240
device_release_driver+0x26/0x40
pci_stop_bus_device+0x68/0x90
pci_stop_bus_device+0x2c/0x90
pci_stop_and_remove_bus_device+0xe/0x20
pciehp_unconfigure_device+0x6c/0x110
pciehp_disable_slot+0x5b/0xe0
pciehp_handle_presence_or_link_change+0xc3/0x2f0
pciehp_ist+0x179/0x180
This lockdep warning is triggered because with Thunderbolt, hotplug ports
are nested. When removing multiple devices in a daisy-chain, each hotplug
port's reset_lock may be acquired recursively. It's never the same lock, so
the lockdep splat is a false positive.
Because locks at the same hierarchy level are never acquired recursively, a
per-level lockdep class is sufficient to fix the lockdep warning.
The choice to use one lockdep subclass per pcie-hotplug controller in the
path to the root-bus was made to conserve class keys because their number
is limited and the complexity grows quadratically with number of keys
according to Documentation/locking/lockdep-design.rst.
Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/
Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855
Reported-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
[lukas: backport to v5.4-stable]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
[ Upstream commit dfd9248c071a3710c24365897459538551cb7167 ] KCSAN found a data race in sock_recv_cmsgs() where the read access to sk->sk_stamp needs READ_ONCE(). BUG: KCSAN: data-race in packet_recvmsg / packet_recvmsg write (marked) to 0xffff88803c81f258 of 8 bytes by task 19171 on cpu 0: sock_write_timestamp include/net/sock.h:2670 [inline] sock_recv_cmsgs include/net/sock.h:2722 [inline] packet_recvmsg+0xb97/0xd00 net/packet/af_packet.c:3489 sock_recvmsg_nosec net/socket.c:1019 [inline] sock_recvmsg+0x11a/0x130 net/socket.c:1040 sock_read_iter+0x176/0x220 net/socket.c:1118 call_read_iter include/linux/fs.h:1845 [inline] new_sync_read fs/read_write.c:389 [inline] vfs_read+0x5e0/0x630 fs/read_write.c:470 ksys_read+0x163/0x1a0 fs/read_write.c:613 __do_sys_read fs/read_write.c:623 [inline] __se_sys_read fs/read_write.c:621 [inline] __x64_sys_read+0x41/0x50 fs/read_write.c:621 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc read to 0xffff88803c81f258 of 8 bytes by task 19183 on cpu 1: sock_recv_cmsgs include/net/sock.h:2721 [inline] packet_recvmsg+0xb64/0xd00 net/packet/af_packet.c:3489 sock_recvmsg_nosec net/socket.c:1019 [inline] sock_recvmsg+0x11a/0x130 net/socket.c:1040 sock_read_iter+0x176/0x220 net/socket.c:1118 call_read_iter include/linux/fs.h:1845 [inline] new_sync_read fs/read_write.c:389 [inline] vfs_read+0x5e0/0x630 fs/read_write.c:470 ksys_read+0x163/0x1a0 fs/read_write.c:613 __do_sys_read fs/read_write.c:623 [inline] __se_sys_read fs/read_write.c:621 [inline] __x64_sys_read+0x41/0x50 fs/read_write.c:621 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc value changed: 0xffffffffc4653600 -> 0x0000000000000000 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 19183 Comm: syz-executor.5 Not tainted 6.3.0-rc7-02330-gca6270c12e20 adithya2306#2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: 6c7c98b ("sock: avoid dirtying sk_stamp, if possible") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230508175543.55756-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
[ Upstream commit 679ed006d416ea0cecfe24a99d365d1dea69c683 ] KCSAN found a data race of sk->sk_receive_queue->qlen where recvmsg() updates qlen under the queue lock and sendmsg() checks qlen under unix_state_sock(), not the queue lock, so the reader side needs READ_ONCE(). BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_wait_for_peer write (marked) to 0xffff888019fe7c68 of 4 bytes by task 49792 on cpu 0: __skb_unlink include/linux/skbuff.h:2347 [inline] __skb_try_recv_from_queue+0x3de/0x470 net/core/datagram.c:197 __skb_try_recv_datagram+0xf7/0x390 net/core/datagram.c:263 __unix_dgram_recvmsg+0x109/0x8a0 net/unix/af_unix.c:2452 unix_dgram_recvmsg+0x94/0xa0 net/unix/af_unix.c:2549 sock_recvmsg_nosec net/socket.c:1019 [inline] ____sys_recvmsg+0x3a3/0x3b0 net/socket.c:2720 ___sys_recvmsg+0xc8/0x150 net/socket.c:2764 do_recvmmsg+0x182/0x560 net/socket.c:2858 __sys_recvmmsg net/socket.c:2937 [inline] __do_sys_recvmmsg net/socket.c:2960 [inline] __se_sys_recvmmsg net/socket.c:2953 [inline] __x64_sys_recvmmsg+0x153/0x170 net/socket.c:2953 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc read to 0xffff888019fe7c68 of 4 bytes by task 49793 on cpu 1: skb_queue_len include/linux/skbuff.h:2127 [inline] unix_recvq_full net/unix/af_unix.c:229 [inline] unix_wait_for_peer+0x154/0x1a0 net/unix/af_unix.c:1445 unix_dgram_sendmsg+0x13bc/0x14b0 net/unix/af_unix.c:2048 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0x148/0x160 net/socket.c:747 ____sys_sendmsg+0x20e/0x620 net/socket.c:2503 ___sys_sendmsg+0xc6/0x140 net/socket.c:2557 __sys_sendmmsg+0x11d/0x370 net/socket.c:2643 __do_sys_sendmmsg net/socket.c:2672 [inline] __se_sys_sendmmsg net/socket.c:2669 [inline] __x64_sys_sendmmsg+0x58/0x70 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc value changed: 0x0000000b -> 0x00000001 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 49793 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 adithya2306#2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: 1da177e ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
[ Upstream commit e1d09c2c2f5793474556b60f83900e088d0d366d ] KCSAN found a data race around sk->sk_shutdown where unix_release_sock() and unix_shutdown() update it under unix_state_lock(), OTOH unix_poll() and unix_dgram_poll() read it locklessly. We need to annotate the writes and reads with WRITE_ONCE() and READ_ONCE(). BUG: KCSAN: data-race in unix_poll / unix_release_sock write to 0xffff88800d0f8aec of 1 bytes by task 264 on cpu 0: unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631 unix_release+0x59/0x80 net/unix/af_unix.c:1042 __sock_release+0x7d/0x170 net/socket.c:653 sock_close+0x19/0x30 net/socket.c:1397 __fput+0x179/0x5e0 fs/file_table.c:321 ____fput+0x15/0x20 fs/file_table.c:349 task_work_run+0x116/0x1a0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc read to 0xffff88800d0f8aec of 1 bytes by task 222 on cpu 1: unix_poll+0xa3/0x2a0 net/unix/af_unix.c:3170 sock_poll+0xcf/0x2b0 net/socket.c:1385 vfs_poll include/linux/poll.h:88 [inline] ep_item_poll.isra.0+0x78/0xc0 fs/eventpoll.c:855 ep_send_events fs/eventpoll.c:1694 [inline] ep_poll fs/eventpoll.c:1823 [inline] do_epoll_wait+0x6c4/0xea0 fs/eventpoll.c:2258 __do_sys_epoll_wait fs/eventpoll.c:2270 [inline] __se_sys_epoll_wait fs/eventpoll.c:2265 [inline] __x64_sys_epoll_wait+0xcc/0x190 fs/eventpoll.c:2265 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc value changed: 0x00 -> 0x03 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 222 Comm: dbus-broker Not tainted 6.3.0-rc7-02330-gca6270c12e20 adithya2306#2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: 3c73419 ("af_unix: fix 'poll for write'/ connected DGRAM sockets") Fixes: 1da177e ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
[ Upstream commit d2c48b2387eb89e0bf2a2e06e30987cf410acad4 ]
Running a preempt-rt (v6.2-rc3-rt1) based kernel on an Ampere Altra
triggers:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 24, name: cpuhp/0
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by cpuhp/0/24:
#0: ffffda30217c70d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248
adithya2306#1: ffffda30217c7120 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248
adithya2306#2: ffffda3021c711f0 (sdei_list_lock){....}-{3:3}, at: sdei_cpuhp_up+0x3c/0x130
irq event stamp: 36
hardirqs last enabled at (35): [<ffffda301e85b7bc>] finish_task_switch+0xb4/0x2b0
hardirqs last disabled at (36): [<ffffda301e812fec>] cpuhp_thread_fun+0x21c/0x248
softirqs last enabled at (0): [<ffffda301e80b184>] copy_process+0x63c/0x1ac0
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 0 PID: 24 Comm: cpuhp/0 Not tainted 5.19.0-rc3-rt5-[...]
Hardware name: WIWYNN Mt.Jade Server [...]
Call trace:
dump_backtrace+0x114/0x120
show_stack+0x20/0x70
dump_stack_lvl+0x9c/0xd8
dump_stack+0x18/0x34
__might_resched+0x188/0x228
rt_spin_lock+0x70/0x120
sdei_cpuhp_up+0x3c/0x130
cpuhp_invoke_callback+0x250/0xf08
cpuhp_thread_fun+0x120/0x248
smpboot_thread_fn+0x280/0x320
kthread+0x130/0x140
ret_from_fork+0x10/0x20
sdei_cpuhp_up() is called in the STARTING hotplug section,
which runs with interrupts disabled. Use a CPUHP_AP_ONLINE_DYN entry
instead to execute the cpuhp cb later, with preemption enabled.
SDEI originally got its own cpuhp slot to allow interacting
with perf. It got superseded by pNMI and this early slot is not
relevant anymore. [1]
Some SDEI calls (e.g. SDEI_1_0_FN_SDEI_PE_MASK) take actions on the
calling CPU. It is checked that preemption is disabled for them.
_ONLINE cpuhp cb are executed in the 'per CPU hotplug thread'.
Preemption is enabled in those threads, but their cpumask is limited
to 1 CPU.
Move 'WARN_ON_ONCE(preemptible())' statements so that SDEI cpuhp cb
don't trigger them.
Also add a check for the SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI call
which acts on the calling CPU.
[1]:
https://lore.kernel.org/all/5813b8c5-ae3e-87fd-fccc-94c9cd08816d@arm.com/
Suggested-by: James Morse <james.morse@arm.com>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20230216084920.144064-1-pierre.gondois@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
[ Upstream commit 05bb0167c80b8f93c6a4e0451b7da9b96db990c2 ] ACPICA commit 770653e3ba67c30a629ca7d12e352d83c2541b1e Before this change we see the following UBSAN stack trace in Fuchsia: #0 0x000021e4213b3302 in acpi_ds_init_aml_walk(struct acpi_walk_state*, union acpi_parse_object*, struct acpi_namespace_node*, u8*, u32, struct acpi_evaluate_info*, u8) ../../third_party/acpica/source/components/dispatcher/dswstate.c:682 <platform-bus-x86.so>+0x233302 adithya2306#1.2 0x000020d0f660777f in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x3d77f adithya2306#1.1 0x000020d0f660777f in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x3d77f adithya2306#1 0x000020d0f660777f in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:387 <libclang_rt.asan.so>+0x3d77f adithya2306#2 0x000020d0f660b96d in handlepointer_overflow_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:809 <libclang_rt.asan.so>+0x4196d adithya2306#3 0x000020d0f660b50d in compiler-rt/lib/ubsan/ubsan_handlers.cpp:815 <libclang_rt.asan.so>+0x4150d adithya2306#4 0x000021e4213b3302 in acpi_ds_init_aml_walk(struct acpi_walk_state*, union acpi_parse_object*, struct acpi_namespace_node*, u8*, u32, struct acpi_evaluate_info*, u8) ../../third_party/acpica/source/components/dispatcher/dswstate.c:682 <platform-bus-x86.so>+0x233302 #5 0x000021e4213e2369 in acpi_ds_call_control_method(struct acpi_thread_state*, struct acpi_walk_state*, union acpi_parse_object*) ../../third_party/acpica/source/components/dispatcher/dsmethod.c:605 <platform-bus-x86.so>+0x262369 #6 0x000021e421437fac in acpi_ps_parse_aml(struct acpi_walk_state*) ../../third_party/acpica/source/components/parser/psparse.c:550 <platform-bus-x86.so>+0x2b7fac #7 0x000021e4214464d2 in acpi_ps_execute_method(struct acpi_evaluate_info*) ../../third_party/acpica/source/components/parser/psxface.c:244 <platform-bus-x86.so>+0x2c64d2 #8 0x000021e4213aa052 in acpi_ns_evaluate(struct acpi_evaluate_info*) ../../third_party/acpica/source/components/namespace/nseval.c:250 <platform-bus-x86.so>+0x22a052 #9 0x000021e421413dd8 in acpi_ns_init_one_device(acpi_handle, u32, void*, void**) ../../third_party/acpica/source/components/namespace/nsinit.c:735 <platform-bus-x86.so>+0x293dd8 #10 0x000021e421429e98 in acpi_ns_walk_namespace(acpi_object_type, acpi_handle, u32, u32, acpi_walk_callback, acpi_walk_callback, void*, void**) ../../third_party/acpica/source/components/namespace/nswalk.c:298 <platform-bus-x86.so>+0x2a9e98 #11 0x000021e4214131ac in acpi_ns_initialize_devices(u32) ../../third_party/acpica/source/components/namespace/nsinit.c:268 <platform-bus-x86.so>+0x2931ac #12 0x000021e42147c40d in acpi_initialize_objects(u32) ../../third_party/acpica/source/components/utilities/utxfinit.c:304 <platform-bus-x86.so>+0x2fc40d #13 0x000021e42126d603 in acpi::acpi_impl::initialize_acpi(acpi::acpi_impl*) ../../src/devices/board/lib/acpi/acpi-impl.cc:224 <platform-bus-x86.so>+0xed603 Add a simple check that avoids incrementing a pointer by zero, but otherwise behaves as before. Note that our findings are against ACPICA 20221020, but the same code exists on master. Link: acpica/acpica@770653e3 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
commit 597441b3436a43011f31ce71dc0a6c0bf5ce958a upstream.
Our CI system caught a lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc7+ #1167 Not tainted
------------------------------------------------------
kswapd0/46 is trying to acquire lock:
ffff8c6543abd650 (sb_internal#2){++++}-{0:0}, at: btrfs_commit_inode_delayed_inode+0x5f/0x120
but task is already holding lock:
ffffffffabe61b40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x4aa/0x7a0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> adithya2306#1 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire+0xa5/0xe0
kmem_cache_alloc+0x31/0x2c0
alloc_extent_state+0x1d/0xd0
__clear_extent_bit+0x2e0/0x4f0
try_release_extent_mapping+0x216/0x280
btrfs_release_folio+0x2e/0x90
invalidate_inode_pages2_range+0x397/0x470
btrfs_cleanup_dirty_bgs+0x9e/0x210
btrfs_cleanup_one_transaction+0x22/0x760
btrfs_commit_transaction+0x3b7/0x13a0
create_subvol+0x59b/0x970
btrfs_mksubvol+0x435/0x4f0
__btrfs_ioctl_snap_create+0x11e/0x1b0
btrfs_ioctl_snap_create_v2+0xbf/0x140
btrfs_ioctl+0xa45/0x28f0
__x64_sys_ioctl+0x88/0xc0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
-> #0 (sb_internal#2){++++}-{0:0}:
__lock_acquire+0x1435/0x21a0
lock_acquire+0xc2/0x2b0
start_transaction+0x401/0x730
btrfs_commit_inode_delayed_inode+0x5f/0x120
btrfs_evict_inode+0x292/0x3d0
evict+0xcc/0x1d0
inode_lru_isolate+0x14d/0x1e0
__list_lru_walk_one+0xbe/0x1c0
list_lru_walk_one+0x58/0x80
prune_icache_sb+0x39/0x60
super_cache_scan+0x161/0x1f0
do_shrink_slab+0x163/0x340
shrink_slab+0x1d3/0x290
shrink_node+0x300/0x720
balance_pgdat+0x35c/0x7a0
kswapd+0x205/0x410
kthread+0xf0/0x120
ret_from_fork+0x29/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
lock(sb_internal#2);
lock(fs_reclaim);
lock(sb_internal#2);
*** DEADLOCK ***
3 locks held by kswapd0/46:
#0: ffffffffabe61b40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x4aa/0x7a0
adithya2306#1: ffffffffabe50270 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x113/0x290
adithya2306#2: ffff8c6543abd0e0 (&type->s_umount_key#44){++++}-{3:3}, at: super_cache_scan+0x38/0x1f0
stack backtrace:
CPU: 0 PID: 46 Comm: kswapd0 Not tainted 6.3.0-rc7+ #1167
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x58/0x90
check_noncircular+0xd6/0x100
? save_trace+0x3f/0x310
? add_lock_to_list+0x97/0x120
__lock_acquire+0x1435/0x21a0
lock_acquire+0xc2/0x2b0
? btrfs_commit_inode_delayed_inode+0x5f/0x120
start_transaction+0x401/0x730
? btrfs_commit_inode_delayed_inode+0x5f/0x120
btrfs_commit_inode_delayed_inode+0x5f/0x120
btrfs_evict_inode+0x292/0x3d0
? lock_release+0x134/0x270
? __pfx_wake_bit_function+0x10/0x10
evict+0xcc/0x1d0
inode_lru_isolate+0x14d/0x1e0
__list_lru_walk_one+0xbe/0x1c0
? __pfx_inode_lru_isolate+0x10/0x10
? __pfx_inode_lru_isolate+0x10/0x10
list_lru_walk_one+0x58/0x80
prune_icache_sb+0x39/0x60
super_cache_scan+0x161/0x1f0
do_shrink_slab+0x163/0x340
shrink_slab+0x1d3/0x290
shrink_node+0x300/0x720
balance_pgdat+0x35c/0x7a0
kswapd+0x205/0x410
? __pfx_autoremove_wake_function+0x10/0x10
? __pfx_kswapd+0x10/0x10
kthread+0xf0/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
This happens because when we abort the transaction in the transaction
commit path we call invalidate_inode_pages2_range on our block group
cache inodes (if we have space cache v1) and any delalloc inodes we may
have. The plain invalidate_inode_pages2_range() call passes through
GFP_KERNEL, which makes sense in most cases, but not here. Wrap these
two invalidate callees with memalloc_nofs_save/memalloc_nofs_restore to
make sure we don't end up with the fs reclaim dependency under the
transaction dependency.
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
…phys
[ Upstream commit 85d38d5810e285d5aec7fb5283107d1da70c12a9 ]
When booting with "intremap=off" and "x2apic_phys" on the kernel command
line, the physical x2APIC driver ends up being used even when x2APIC
mode is disabled ("intremap=off" disables x2APIC mode). This happens
because the first compound condition check in x2apic_phys_probe() is
false due to x2apic_mode == 0 and so the following one returns true
after default_acpi_madt_oem_check() having already selected the physical
x2APIC driver.
This results in the following panic:
kernel BUG at arch/x86/kernel/apic/io_apic.c:2409!
invalid opcode: 0000 [adithya2306#1] PREEMPT SMP NOPTI
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc2-ver4.1rc2 adithya2306#2
Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021
RIP: 0010:setup_IO_APIC+0x9c/0xaf0
Call Trace:
<TASK>
? native_read_msr
apic_intr_mode_init
x86_late_time_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
secondary_startup_64_no_verify
</TASK>
which is:
setup_IO_APIC:
apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n");
for_each_ioapic(ioapic)
BUG_ON(mp_irqdomain_create(ioapic));
Return 0 to denote that x2APIC has not been enabled when probing the
physical x2APIC driver.
[ bp: Massage commit message heavily. ]
Fixes: 9ebd680 ("x86, apic: Use probe routines to simplify apic selection")
Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230616212236.1389-1-dheerajkumar.srivastava@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
[ Upstream commit ce3aee7114c575fab32a5e9e939d4bbb3dcca79f ] syzkaller reported use-after-free in __gtp_encap_destroy(). [0] It shows the same process freed sk and touched it illegally. Commit e198987 ("gtp: fix suspicious RCU usage") added lock_sock() and release_sock() in __gtp_encap_destroy() to protect sk->sk_user_data, but release_sock() is called after sock_put() releases the last refcnt. [0]: BUG: KASAN: slab-use-after-free in instrument_atomic_read_write include/linux/instrumented.h:96 [inline] BUG: KASAN: slab-use-after-free in atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:541 [inline] BUG: KASAN: slab-use-after-free in queued_spin_lock include/asm-generic/qspinlock.h:111 [inline] BUG: KASAN: slab-use-after-free in do_raw_spin_lock include/linux/spinlock.h:186 [inline] BUG: KASAN: slab-use-after-free in __raw_spin_lock_bh include/linux/spinlock_api_smp.h:127 [inline] BUG: KASAN: slab-use-after-free in _raw_spin_lock_bh+0x75/0xe0 kernel/locking/spinlock.c:178 Write of size 4 at addr ffff88800dbef398 by task syz-executor.2/2401 CPU: 1 PID: 2401 Comm: syz-executor.2 Not tainted 6.4.0-rc5-01219-gfa0e21fa4443 adithya2306#2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:351 [inline] print_report+0xcc/0x620 mm/kasan/report.c:462 kasan_report+0xb2/0xe0 mm/kasan/report.c:572 check_region_inline mm/kasan/generic.c:181 [inline] kasan_check_range+0x39/0x1c0 mm/kasan/generic.c:187 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:541 [inline] queued_spin_lock include/asm-generic/qspinlock.h:111 [inline] do_raw_spin_lock include/linux/spinlock.h:186 [inline] __raw_spin_lock_bh include/linux/spinlock_api_smp.h:127 [inline] _raw_spin_lock_bh+0x75/0xe0 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:355 [inline] release_sock+0x1f/0x1a0 net/core/sock.c:3526 gtp_encap_disable_sock drivers/net/gtp.c:651 [inline] gtp_encap_disable+0xb9/0x220 drivers/net/gtp.c:664 gtp_dev_uninit+0x19/0x50 drivers/net/gtp.c:728 unregister_netdevice_many_notify+0x97e/0x1520 net/core/dev.c:10841 rtnl_delete_link net/core/rtnetlink.c:3216 [inline] rtnl_dellink+0x3c0/0xb30 net/core/rtnetlink.c:3268 rtnetlink_rcv_msg+0x450/0xb10 net/core/rtnetlink.c:6423 netlink_rcv_skb+0x15d/0x450 net/netlink/af_netlink.c:2548 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x700/0x930 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x91c/0xe30 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0x1b7/0x200 net/socket.c:747 ____sys_sendmsg+0x75a/0x990 net/socket.c:2493 ___sys_sendmsg+0x11d/0x1c0 net/socket.c:2547 __sys_sendmsg+0xfe/0x1d0 net/socket.c:2576 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f1168b1fe5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007f1167edccc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f1168b1fe5d RDX: 0000000000000000 RSI: 00000000200002c0 RDI: 0000000000000003 RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007f1168b80530 R15: 0000000000000000 </TASK> Allocated by task 1483: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 __kasan_slab_alloc+0x59/0x70 mm/kasan/common.c:328 kasan_slab_alloc include/linux/kasan.h:186 [inline] slab_post_alloc_hook mm/slab.h:711 [inline] slab_alloc_node mm/slub.c:3451 [inline] slab_alloc mm/slub.c:3459 [inline] __kmem_cache_alloc_lru mm/slub.c:3466 [inline] kmem_cache_alloc+0x16d/0x340 mm/slub.c:3475 sk_prot_alloc+0x5f/0x280 net/core/sock.c:2073 sk_alloc+0x34/0x6c0 net/core/sock.c:2132 inet6_create net/ipv6/af_inet6.c:192 [inline] inet6_create+0x2c7/0xf20 net/ipv6/af_inet6.c:119 __sock_create+0x2a1/0x530 net/socket.c:1535 sock_create net/socket.c:1586 [inline] __sys_socket_create net/socket.c:1623 [inline] __sys_socket_create net/socket.c:1608 [inline] __sys_socket+0x137/0x250 net/socket.c:1651 __do_sys_socket net/socket.c:1664 [inline] __se_sys_socket net/socket.c:1662 [inline] __x64_sys_socket+0x72/0xb0 net/socket.c:1662 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 2401: kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free mm/kasan/common.c:200 [inline] __kasan_slab_free+0x10c/0x1b0 mm/kasan/common.c:244 kasan_slab_free include/linux/kasan.h:162 [inline] slab_free_hook mm/slub.c:1781 [inline] slab_free_freelist_hook mm/slub.c:1807 [inline] slab_free mm/slub.c:3786 [inline] kmem_cache_free+0xb4/0x490 mm/slub.c:3808 sk_prot_free net/core/sock.c:2113 [inline] __sk_destruct+0x500/0x720 net/core/sock.c:2207 sk_destruct+0xc1/0xe0 net/core/sock.c:2222 __sk_free+0xed/0x3d0 net/core/sock.c:2233 sk_free+0x7c/0xa0 net/core/sock.c:2244 sock_put include/net/sock.h:1981 [inline] __gtp_encap_destroy+0x165/0x1b0 drivers/net/gtp.c:634 gtp_encap_disable_sock drivers/net/gtp.c:651 [inline] gtp_encap_disable+0xb9/0x220 drivers/net/gtp.c:664 gtp_dev_uninit+0x19/0x50 drivers/net/gtp.c:728 unregister_netdevice_many_notify+0x97e/0x1520 net/core/dev.c:10841 rtnl_delete_link net/core/rtnetlink.c:3216 [inline] rtnl_dellink+0x3c0/0xb30 net/core/rtnetlink.c:3268 rtnetlink_rcv_msg+0x450/0xb10 net/core/rtnetlink.c:6423 netlink_rcv_skb+0x15d/0x450 net/netlink/af_netlink.c:2548 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x700/0x930 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x91c/0xe30 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0x1b7/0x200 net/socket.c:747 ____sys_sendmsg+0x75a/0x990 net/socket.c:2493 ___sys_sendmsg+0x11d/0x1c0 net/socket.c:2547 __sys_sendmsg+0xfe/0x1d0 net/socket.c:2576 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc The buggy address belongs to the object at ffff88800dbef300 which belongs to the cache UDPv6 of size 1344 The buggy address is located 152 bytes inside of freed 1344-byte region [ffff88800dbef300, ffff88800dbef840) The buggy address belongs to the physical page: page:00000000d31bfed5 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88800dbeed40 pfn:0xdbe8 head:00000000d31bfed5 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 memcg:ffff888008ee0801 flags: 0x100000000010200(slab|head|node=0|zone=1) page_type: 0xffffffff() raw: 0100000000010200 ffff88800c7a3000 dead000000000122 0000000000000000 raw: ffff88800dbeed40 0000000080160015 00000001ffffffff ffff888008ee0801 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88800dbef280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88800dbef300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88800dbef380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88800dbef400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88800dbef480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: e198987 ("gtp: fix suspicious RCU usage") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/r/20230622213231.24651-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
[ Upstream commit 6f67fbf8192da80c4db01a1800c7fceaca9cf1f9 ] The `shift` variable which indicates the offset in the string at which to start matching the pattern is initialized to `bm->patlen - 1`, but it is not reset when a new block is retrieved. This means the implemen- tation may start looking at later and later positions in each successive block and miss occurrences of the pattern at the beginning. E.g., consider a HTTP packet held in a non-linear skb, where the HTTP request line occurs in the second block: [... 52 bytes of packet headers ...] GET /bmtest HTTP/1.1\r\nHost: www.example.com\r\n\r\n and the pattern is "GET /bmtest". Once the first block comprising the packet headers has been examined, `shift` will be pointing to somewhere near the end of the block, and so when the second block is examined the request line at the beginning will be missed. Reinitialize the variable for each new block. Fixes: 8082e4e ("[LIB]: Boyer-Moore extension for textsearch infrastructure strike adithya2306#2") Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1390 Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
[ Upstream commit 5f4fa1672d98fe99d2297b03add35346f1685d6b ]
We do netif_napi_add() for all allocated q_vectors[], but potentially
do netif_napi_del() for part of them, then kfree q_vectors and leave
invalid pointers at dev->napi_list.
Reproducer:
[root@host ~]# cat repro.sh
#!/bin/bash
pf_dbsf="0000:41:00.0"
vf0_dbsf="0000:41:02.0"
g_pids=()
function do_set_numvf()
{
echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
sleep $((RANDOM%3+1))
echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
sleep $((RANDOM%3+1))
}
function do_set_channel()
{
local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/)
[ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; }
ifconfig $nic 192.168.18.5 netmask 255.255.255.0
ifconfig $nic up
ethtool -L $nic combined 1
ethtool -L $nic combined 4
sleep $((RANDOM%3))
}
function on_exit()
{
local pid
for pid in "${g_pids[@]}"; do
kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null
done
g_pids=()
}
trap "on_exit; exit" EXIT
while :; do do_set_numvf ; done &
g_pids+=($!)
while :; do do_set_channel ; done &
g_pids+=($!)
wait
Result:
[ 4093.900222] ==================================================================
[ 4093.900230] BUG: KASAN: use-after-free in free_netdev+0x308/0x390
[ 4093.900232] Read of size 8 at addr ffff88b4dc145640 by task repro.sh/6699
[ 4093.900233]
[ 4093.900236] CPU: 10 PID: 6699 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 adithya2306#1
[ 4093.900238] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021
[ 4093.900239] Call Trace:
[ 4093.900244] dump_stack+0x71/0xab
[ 4093.900249] print_address_description+0x6b/0x290
[ 4093.900251] ? free_netdev+0x308/0x390
[ 4093.900252] kasan_report+0x14a/0x2b0
[ 4093.900254] free_netdev+0x308/0x390
[ 4093.900261] iavf_remove+0x825/0xd20 [iavf]
[ 4093.900265] pci_device_remove+0xa8/0x1f0
[ 4093.900268] device_release_driver_internal+0x1c6/0x460
[ 4093.900271] pci_stop_bus_device+0x101/0x150
[ 4093.900273] pci_stop_and_remove_bus_device+0xe/0x20
[ 4093.900275] pci_iov_remove_virtfn+0x187/0x420
[ 4093.900277] ? pci_iov_add_virtfn+0xe10/0xe10
[ 4093.900278] ? pci_get_subsys+0x90/0x90
[ 4093.900280] sriov_disable+0xed/0x3e0
[ 4093.900282] ? bus_find_device+0x12d/0x1a0
[ 4093.900290] i40e_free_vfs+0x754/0x1210 [i40e]
[ 4093.900298] ? i40e_reset_all_vfs+0x880/0x880 [i40e]
[ 4093.900299] ? pci_get_device+0x7c/0x90
[ 4093.900300] ? pci_get_subsys+0x90/0x90
[ 4093.900306] ? pci_vfs_assigned.part.7+0x144/0x210
[ 4093.900309] ? __mutex_lock_slowpath+0x10/0x10
[ 4093.900315] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
[ 4093.900318] sriov_numvfs_store+0x214/0x290
[ 4093.900320] ? sriov_totalvfs_show+0x30/0x30
[ 4093.900321] ? __mutex_lock_slowpath+0x10/0x10
[ 4093.900323] ? __check_object_size+0x15a/0x350
[ 4093.900326] kernfs_fop_write+0x280/0x3f0
[ 4093.900329] vfs_write+0x145/0x440
[ 4093.900330] ksys_write+0xab/0x160
[ 4093.900332] ? __ia32_sys_read+0xb0/0xb0
[ 4093.900334] ? fput_many+0x1a/0x120
[ 4093.900335] ? filp_close+0xf0/0x130
[ 4093.900338] do_syscall_64+0xa0/0x370
[ 4093.900339] ? page_fault+0x8/0x30
[ 4093.900341] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 4093.900357] RIP: 0033:0x7f16ad4d22c0
[ 4093.900359] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24
[ 4093.900360] RSP: 002b:00007ffd6491b7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 4093.900362] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f16ad4d22c0
[ 4093.900363] RDX: 0000000000000002 RSI: 0000000001a41408 RDI: 0000000000000001
[ 4093.900364] RBP: 0000000001a41408 R08: 00007f16ad7a1780 R09: 00007f16ae1f2700
[ 4093.900364] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
[ 4093.900365] R13: 0000000000000001 R14: 00007f16ad7a0620 R15: 0000000000000001
[ 4093.900367]
[ 4093.900368] Allocated by task 820:
[ 4093.900371] kasan_kmalloc+0xa6/0xd0
[ 4093.900373] __kmalloc+0xfb/0x200
[ 4093.900376] iavf_init_interrupt_scheme+0x63b/0x1320 [iavf]
[ 4093.900380] iavf_watchdog_task+0x3d51/0x52c0 [iavf]
[ 4093.900382] process_one_work+0x56a/0x11f0
[ 4093.900383] worker_thread+0x8f/0xf40
[ 4093.900384] kthread+0x2a0/0x390
[ 4093.900385] ret_from_fork+0x1f/0x40
[ 4093.900387] 0xffffffffffffffff
[ 4093.900387]
[ 4093.900388] Freed by task 6699:
[ 4093.900390] __kasan_slab_free+0x137/0x190
[ 4093.900391] kfree+0x8b/0x1b0
[ 4093.900394] iavf_free_q_vectors+0x11d/0x1a0 [iavf]
[ 4093.900397] iavf_remove+0x35a/0xd20 [iavf]
[ 4093.900399] pci_device_remove+0xa8/0x1f0
[ 4093.900400] device_release_driver_internal+0x1c6/0x460
[ 4093.900401] pci_stop_bus_device+0x101/0x150
[ 4093.900402] pci_stop_and_remove_bus_device+0xe/0x20
[ 4093.900403] pci_iov_remove_virtfn+0x187/0x420
[ 4093.900404] sriov_disable+0xed/0x3e0
[ 4093.900409] i40e_free_vfs+0x754/0x1210 [i40e]
[ 4093.900415] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
[ 4093.900416] sriov_numvfs_store+0x214/0x290
[ 4093.900417] kernfs_fop_write+0x280/0x3f0
[ 4093.900418] vfs_write+0x145/0x440
[ 4093.900419] ksys_write+0xab/0x160
[ 4093.900420] do_syscall_64+0xa0/0x370
[ 4093.900421] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 4093.900422] 0xffffffffffffffff
[ 4093.900422]
[ 4093.900424] The buggy address belongs to the object at ffff88b4dc144200
which belongs to the cache kmalloc-8k of size 8192
[ 4093.900425] The buggy address is located 5184 bytes inside of
8192-byte region [ffff88b4dc144200, ffff88b4dc146200)
[ 4093.900425] The buggy address belongs to the page:
[ 4093.900427] page:ffffea00d3705000 refcount:1 mapcount:0 mapping:ffff88bf04415c80 index:0x0 compound_mapcount: 0
[ 4093.900430] flags: 0x10000000008100(slab|head)
[ 4093.900433] raw: 0010000000008100 dead000000000100 dead000000000200 ffff88bf04415c80
[ 4093.900434] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000
[ 4093.900434] page dumped because: kasan: bad access detected
[ 4093.900435]
[ 4093.900435] Memory state around the buggy address:
[ 4093.900436] ffff88b4dc145500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900437] ffff88b4dc145580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900438] >ffff88b4dc145600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900438] ^
[ 4093.900439] ffff88b4dc145680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900440] ffff88b4dc145700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900440] ==================================================================
Although the patch adithya2306#2 (of 2) can avoid the issue triggered by this
repro.sh, there still are other potential risks that if num_active_queues
is changed to less than allocated q_vectors[] by unexpected, the
mismatched netif_napi_add/del() can also cause UAF.
Since we actually call netif_napi_add() for all allocated q_vectors
unconditionally in iavf_alloc_q_vectors(), so we should fix it by
letting netif_napi_del() match to netif_napi_add().
Fixes: 5eae00c ("i40evf: main driver core")
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
Cc: Huang Cun <huangcun@sangfor.com.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 17, 2024
Running a rt-kernel base on 6.2.0-rc3-rt1 on an Ampere Altra outputs
the following:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9, name: kworker/u320:0
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by kworker/u320:0/9:
#0: ffff3fff8c27d128 ((wq_completion)efi_rts_wq){+.+.}-{0:0}, at: process_one_work (./include/linux/atomic/atomic-long.h:41)
adithya2306#1: ffff80000861bdd0 ((work_completion)(&efi_rts_work.work)){+.+.}-{0:0}, at: process_one_work (./include/linux/atomic/atomic-long.h:41)
adithya2306#2: ffffdf7e1ed3e460 (efi_rt_lock){+.+.}-{3:3}, at: efi_call_rts (drivers/firmware/efi/runtime-wrappers.c:101)
Preemption disabled at:
efi_virtmap_load (./arch/arm64/include/asm/mmu_context.h:248)
CPU: 0 PID: 9 Comm: kworker/u320:0 Tainted: G W 6.2.0-rc3-rt1
Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
Workqueue: efi_rts_wq efi_call_rts
Call trace:
dump_backtrace (arch/arm64/kernel/stacktrace.c:158)
show_stack (arch/arm64/kernel/stacktrace.c:165)
dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
dump_stack (lib/dump_stack.c:114)
__might_resched (kernel/sched/core.c:10134)
rt_spin_lock (kernel/locking/rtmutex.c:1769 (discriminator 4))
efi_call_rts (drivers/firmware/efi/runtime-wrappers.c:101)
[...]
This seems to come from commit ff7a167961d1 ("arm64: efi: Execute
runtime services from a dedicated stack") which adds a spinlock. This
spinlock is taken through:
efi_call_rts()
\-efi_call_virt()
\-efi_call_virt_pointer()
\-arch_efi_call_virt_setup()
Make 'efi_rt_lock' a raw_spinlock to avoid being preempted.
[ardb: The EFI runtime services are called with a different set of
translation tables, and are permitted to use the SIMD registers.
The context switch code preserves/restores neither, and so EFI
calls must be made with preemption disabled, rather than only
disabling migration.]
Bug: 254441685
Fixes: ff7a167961d1 ("arm64: efi: Execute runtime services from a dedicated stack")
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Cc: <stable@vger.kernel.org> # v6.1+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 0e68b5517d3767562889f1d83fdb828c26adb24f)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Idcd18ddcc099aa063e042677984ce7f4914659ed
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 18, 2024
commit 4f3175979e62de3b929bfa54a0db4b87d36257a7 upstream. With hardened usercopy enabled (CONFIG_HARDENED_USERCOPY=y), using the /proc/powerpc/rtas/firmware_update interface to prepare a system firmware update yields a BUG(): kernel BUG at mm/usercopy.c:102! Oops: Exception in kernel mode, sig: 5 [adithya2306#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 0 PID: 2232 Comm: dd Not tainted 6.5.0-rc3+ adithya2306#2 Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 of:IBM,FW860.50 (SV860_146) hv:phyp pSeries NIP: c0000000005991d0 LR: c0000000005991cc CTR: 0000000000000000 REGS: c0000000148c76a0 TRAP: 0700 Not tainted (6.5.0-rc3+) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 24002242 XER: 0000000c CFAR: c0000000001fbd34 IRQMASK: 0 [ ... GPRs omitted ... ] NIP usercopy_abort+0xa0/0xb0 LR usercopy_abort+0x9c/0xb0 Call Trace: usercopy_abort+0x9c/0xb0 (unreliable) __check_heap_object+0x1b4/0x1d0 __check_object_size+0x2d0/0x380 rtas_flash_write+0xe4/0x250 proc_reg_write+0xfc/0x160 vfs_write+0xfc/0x4e0 ksys_write+0x90/0x160 system_call_exception+0x178/0x320 system_call_common+0x160/0x2c4 The blocks of the firmware image are copied directly from user memory to objects allocated from flash_block_cache, so flash_block_cache must be created using kmem_cache_create_usercopy() to mark it safe for user access. Fixes: 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> [mpe: Trim and indent oops] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230810-rtas-flash-vs-hardened-usercopy-v2-1-dcf63793a938@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 18, 2024
[ Upstream commit c1f848f12103920ca165758aedb1c10904e193e1 ] When the tdm lane mask is computed, the driver currently fills the 1st lane before moving on to the next. If the stream has less channels than the lanes can accommodate, slots will be disabled on the last lanes. Unfortunately, the HW distribute channels in a different way. It distribute channels in pair on each lanes before moving on the next slots. This difference leads to problems if a device has an interface with more than 1 lane and with more than 2 slots per lane. For example: a playback interface with 2 lanes and 4 slots each (total 8 slots - zero based numbering) - Playing a 8ch stream: - All slots activated by the driver - channel adithya2306#2 will be played on lane adithya2306#1 - slot #0 following HW placement - Playing a 4ch stream: - Lane adithya2306#1 disabled by the driver - channel adithya2306#2 will be played on lane #0 - slot adithya2306#2 This behaviour is obviously not desirable. Change the way slots are activated on the TDM lanes to follow what the HW does and make sure each channel always get mapped to the same slot/lane. Fixes: 1a11d88 ("ASoC: meson: add tdm formatter base driver") Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20230809171931.1244502-1-jbrunet@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 18, 2024
[ Upstream commit 94741eba63c23b0f1527b0ae0125e6b553bde10e ] Current code enables TCSR.TE and RCSR.RE together, and disable TCSR.TE and RCSR.RE together in trigger(), which only supports one operation mode: 1. Rx synchronous with Tx: TE is last enabled and first disabled Other operation mode need to be considered also: 2. Tx synchronous with Rx: RE is last enabled and first disabled. 3. Asynchronous mode: Tx and Rx are independent. So the enable TCSR.TE and RCSR.RE sequence and the disable sequence need to be refined accordingly for adithya2306#2 and adithya2306#3. There is slightly against what RM recommennds with this change. For example in Rx synchronous with Tx mode, case "aplay 1.wav; arecord 2.wav" enable TE before RE. But it should be safe to do so, judging by years of testing results. Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Reviewed-by: Nicolin Chen <nicoleotsuka@gmail.com> Link: https://lore.kernel.org/r/20200805063413.4610-2-shengjiu.wang@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org> Stable-dep-of: 269f399dc19f ("ASoC: fsl_sai: Disable bit clock with transmitter") Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 18, 2024
…eate commit da62cb7230f0871c30dc9789071f63229158d261 upstream. I got a use-after-free report when doing some fuzz test: If ttm_bo_init() fails, the "gbo" and "gbo->bo.base" will be freed by ttm_buffer_object_destroy() in ttm_bo_init(). But then drm_gem_vram_create() and drm_gem_vram_init() will free "gbo" and "gbo->bo.base" again. BUG: KMSAN: use-after-free in drm_vma_offset_remove+0xb3/0x150 CPU: 0 PID: 24282 Comm: syz-executor.1 Tainted: G B W 5.7.0-rc4-msan adithya2306#2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack dump_stack+0x1c9/0x220 kmsan_report+0xf7/0x1e0 __msan_warning+0x58/0xa0 drm_vma_offset_remove+0xb3/0x150 drm_gem_free_mmap_offset drm_gem_object_release+0x159/0x180 drm_gem_vram_init drm_gem_vram_create+0x7c5/0x990 drm_gem_vram_fill_create_dumb drm_gem_vram_driver_dumb_create+0x238/0x590 drm_mode_create_dumb drm_mode_create_dumb_ioctl+0x41d/0x450 drm_ioctl_kernel+0x5a4/0x710 drm_ioctl+0xc6f/0x1240 vfs_ioctl ksys_ioctl __do_sys_ioctl __se_sys_ioctl+0x2e9/0x410 __x64_sys_ioctl+0x4a/0x70 do_syscall_64+0xb8/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x4689b9 Code: fd e0 fa ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb e0 fa ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f368fa4dc98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 000000000076bf00 RCX: 00000000004689b9 RDX: 0000000020000240 RSI: 00000000c02064b2 RDI: 0000000000000003 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00000000004d17e0 R14: 00007f368fa4e6d4 R15: 000000000076bf0c Uninit was created at: kmsan_save_stack_with_flags kmsan_internal_poison_shadow+0x66/0xd0 kmsan_slab_free+0x6e/0xb0 slab_free_freelist_hook slab_free kfree+0x571/0x30a0 drm_gem_vram_destroy ttm_buffer_object_destroy+0xc8/0x130 ttm_bo_release kref_put ttm_bo_put+0x117d/0x23e0 ttm_bo_init_reserved+0x11c0/0x11d0 ttm_bo_init+0x289/0x3f0 drm_gem_vram_init drm_gem_vram_create+0x775/0x990 drm_gem_vram_fill_create_dumb drm_gem_vram_driver_dumb_create+0x238/0x590 drm_mode_create_dumb drm_mode_create_dumb_ioctl+0x41d/0x450 drm_ioctl_kernel+0x5a4/0x710 drm_ioctl+0xc6f/0x1240 vfs_ioctl ksys_ioctl __do_sys_ioctl __se_sys_ioctl+0x2e9/0x410 __x64_sys_ioctl+0x4a/0x70 do_syscall_64+0xb8/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 If ttm_bo_init() fails, the "gbo" will be freed by ttm_buffer_object_destroy() in ttm_bo_init(). But then drm_gem_vram_create() and drm_gem_vram_init() will free "gbo" again. Reported-by: Hulk Robot <hulkci@huawei.com> Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Cc: x kaneiki <xkaneiki@gmail.com> Signed-off-by: Jia Yang <jiayang5@huawei.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20200714083238.28479-2-tzimmermann@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 18, 2024
[ Upstream commit ef23cb593304bde0cc046fd4cc83ae7ea2e24f16 ] While debugging a segfault on 'perf lock contention' without an available perf.data file I noticed that it was basically calling: perf_session__delete(ERR_PTR(-1)) Resulting in: (gdb) run lock contention Starting program: /root/bin/perf lock contention [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". failed to open perf.data: No such file or directory (try 'perf record' first) Initializing perf session failed Program received signal SIGSEGV, Segmentation fault. 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 2858 if (!session->auxtrace) (gdb) p session $1 = (struct perf_session *) 0xffffffffffffffff (gdb) bt #0 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 adithya2306#1 0x000000000057bb4d in perf_session__delete (session=0xffffffffffffffff) at util/session.c:300 adithya2306#2 0x000000000047c421 in __cmd_contention (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2161 adithya2306#3 0x000000000047dc95 in cmd_lock (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2604 adithya2306#4 0x0000000000501466 in run_builtin (p=0xe597a8 <commands+552>, argc=2, argv=0x7fffffffe200) at perf.c:322 #5 0x00000000005016d5 in handle_internal_command (argc=2, argv=0x7fffffffe200) at perf.c:375 #6 0x0000000000501824 in run_argv (argcp=0x7fffffffe02c, argv=0x7fffffffe020) at perf.c:419 #7 0x0000000000501b11 in main (argc=2, argv=0x7fffffffe200) at perf.c:535 (gdb) So just set it to NULL after using PTR_ERR(session) to decode the error as perf_session__delete(NULL) is supported. The same problem was found in 'perf top' after an audit of all perf_session__new() failure handling. Fixes: 6ef81c5 ("perf session: Return error code for perf_session__new() function on failure") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mukesh Ojha <mojha@codeaurora.org> Cc: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Shawn Landden <shawn@git.icu> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com> Link: https://lore.kernel.org/lkml/ZN4Q2rxxsL08A8rd@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 18, 2024
… delayed items
commit e110f8911ddb93e6f55da14ccbbe705397b30d0b upstream.
When running delayed items we are holding a delayed node's mutex and then
we will attempt to modify a subvolume btree to insert/update/delete the
delayed items. However if have an error during the insertions for example,
btrfs_insert_delayed_items() may return with a path that has locked extent
buffers (a leaf at the very least), and then we attempt to release the
delayed node at __btrfs_run_delayed_items(), which requires taking the
delayed node's mutex, causing an ABBA type of deadlock. This was reported
by syzbot and the lockdep splat is the following:
WARNING: possible circular locking dependency detected
6.5.0-rc7-syzkaller-00024-g93f5de5f648d #0 Not tainted
------------------------------------------------------
syz-executor.2/13257 is trying to acquire lock:
ffff88801835c0c0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256
but task is already holding lock:
ffff88802a5ab8e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x3c/0x2a0 fs/btrfs/locking.c:198
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> adithya2306#1 (btrfs-tree-00){++++}-{3:3}:
__lock_release kernel/locking/lockdep.c:5475 [inline]
lock_release+0x36f/0x9d0 kernel/locking/lockdep.c:5781
up_write+0x79/0x580 kernel/locking/rwsem.c:1625
btrfs_tree_unlock_rw fs/btrfs/locking.h:189 [inline]
btrfs_unlock_up_safe+0x179/0x3b0 fs/btrfs/locking.c:239
search_leaf fs/btrfs/ctree.c:1986 [inline]
btrfs_search_slot+0x2511/0x2f80 fs/btrfs/ctree.c:2230
btrfs_insert_empty_items+0x9c/0x180 fs/btrfs/ctree.c:4376
btrfs_insert_delayed_item fs/btrfs/delayed-inode.c:746 [inline]
btrfs_insert_delayed_items fs/btrfs/delayed-inode.c:824 [inline]
__btrfs_commit_inode_delayed_items+0xd24/0x2410 fs/btrfs/delayed-inode.c:1111
__btrfs_run_delayed_items+0x1db/0x430 fs/btrfs/delayed-inode.c:1153
flush_space+0x269/0xe70 fs/btrfs/space-info.c:723
btrfs_async_reclaim_metadata_space+0x106/0x350 fs/btrfs/space-info.c:1078
process_one_work+0x92c/0x12c0 kernel/workqueue.c:2600
worker_thread+0xa63/0x1210 kernel/workqueue.c:2751
kthread+0x2b8/0x350 kernel/kthread.c:389
ret_from_fork+0x2e/0x60 arch/x86/kernel/process.c:145
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
-> #0 (&delayed_node->mutex){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
__mutex_lock_common+0x1d8/0x2530 kernel/locking/mutex.c:603
__mutex_lock kernel/locking/mutex.c:747 [inline]
mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:799
__btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256
btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline]
__btrfs_run_delayed_items+0x2b5/0x430 fs/btrfs/delayed-inode.c:1156
btrfs_commit_transaction+0x859/0x2ff0 fs/btrfs/transaction.c:2276
btrfs_sync_file+0xf56/0x1330 fs/btrfs/file.c:1988
vfs_fsync_range fs/sync.c:188 [inline]
vfs_fsync fs/sync.c:202 [inline]
do_fsync fs/sync.c:212 [inline]
__do_sys_fsync fs/sync.c:220 [inline]
__se_sys_fsync fs/sync.c:218 [inline]
__x64_sys_fsync+0x196/0x1e0 fs/sync.c:218
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(btrfs-tree-00);
lock(&delayed_node->mutex);
lock(btrfs-tree-00);
lock(&delayed_node->mutex);
*** DEADLOCK ***
3 locks held by syz-executor.2/13257:
#0: ffff88802c1ee370 (btrfs_trans_num_writers){++++}-{0:0}, at: spin_unlock include/linux/spinlock.h:391 [inline]
#0: ffff88802c1ee370 (btrfs_trans_num_writers){++++}-{0:0}, at: join_transaction+0xb87/0xe00 fs/btrfs/transaction.c:287
adithya2306#1: ffff88802c1ee398 (btrfs_trans_num_extwriters){++++}-{0:0}, at: join_transaction+0xbb2/0xe00 fs/btrfs/transaction.c:288
adithya2306#2: ffff88802a5ab8e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x3c/0x2a0 fs/btrfs/locking.c:198
stack backtrace:
CPU: 0 PID: 13257 Comm: syz-executor.2 Not tainted 6.5.0-rc7-syzkaller-00024-g93f5de5f648d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
check_noncircular+0x375/0x4a0 kernel/locking/lockdep.c:2195
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
__mutex_lock_common+0x1d8/0x2530 kernel/locking/mutex.c:603
__mutex_lock kernel/locking/mutex.c:747 [inline]
mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:799
__btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256
btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline]
__btrfs_run_delayed_items+0x2b5/0x430 fs/btrfs/delayed-inode.c:1156
btrfs_commit_transaction+0x859/0x2ff0 fs/btrfs/transaction.c:2276
btrfs_sync_file+0xf56/0x1330 fs/btrfs/file.c:1988
vfs_fsync_range fs/sync.c:188 [inline]
vfs_fsync fs/sync.c:202 [inline]
do_fsync fs/sync.c:212 [inline]
__do_sys_fsync fs/sync.c:220 [inline]
__se_sys_fsync fs/sync.c:218 [inline]
__x64_sys_fsync+0x196/0x1e0 fs/sync.c:218
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f3ad047cae9
Code: 28 00 00 00 75 (...)
RSP: 002b:00007f3ad12510c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
RAX: ffffffffffffffda RBX: 00007f3ad059bf80 RCX: 00007f3ad047cae9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 00007f3ad04c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f3ad059bf80 R15: 00007ffe56af92f8
</TASK>
------------[ cut here ]------------
Fix this by releasing the path before releasing the delayed node in the
error path at __btrfs_run_delayed_items().
Reported-by: syzbot+a379155f07c134ea9879@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/000000000000abba27060403b5bd@google.com/
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 18, 2024
[ Upstream commit a154f5f643c6ecddd44847217a7a3845b4350003 ] The following call trace shows a deadlock issue due to recursive locking of mutex "device_mutex". First lock acquire is in target_for_each_device() and second in target_free_device(). PID: 148266 TASK: ffff8be21ffb5d00 CPU: 10 COMMAND: "iscsi_ttx" #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f adithya2306#1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224 adithya2306#2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee adithya2306#3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7 adithya2306#4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3 #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod] #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod] #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod] #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod] #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod] #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod] #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod] #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod] #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod] #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364 Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 18, 2024
commit 5a22fbcc10f3f7d94c5d88afbbffa240a3677057 upstream.
When LAN9303 is MDIO-connected two callchains exist into
mdio->bus->write():
1. switch ports 1&2 ("physical" PHYs):
virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})->
lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested
2. LAN9303 virtual PHY:
virtual MDIO bus (lan9303_phy_{read|write}) ->
lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write}
If the latter functions just take
mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP
false-positive splat. It's false-positive because the first
mdio_lock in the second callchain above belongs to virtual MDIO bus, the
second mdio_lock belongs to physical MDIO bus.
Consequent annotation in lan9303_mdio_{read|write} as nested lock
(similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus)
prevents the following splat:
WARNING: possible circular locking dependency detected
5.15.71 adithya2306#1 Not tainted
------------------------------------------------------
kworker/u4:3/609 is trying to acquire lock:
ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex
but task is already holding lock:
ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> adithya2306#1 (&bus->mdio_lock){+.+.}-{3:3}:
lock_acquire
__mutex_lock
mutex_lock_nested
lan9303_mdio_read
_regmap_read
regmap_read
lan9303_probe
lan9303_mdio_probe
mdio_probe
really_probe
__driver_probe_device
driver_probe_device
__device_attach_driver
bus_for_each_drv
__device_attach
device_initial_probe
bus_probe_device
deferred_probe_work_func
process_one_work
worker_thread
kthread
ret_from_fork
-> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}:
__lock_acquire
lock_acquire.part.0
lock_acquire
__mutex_lock
mutex_lock_nested
regmap_lock_mutex
regmap_read
lan9303_phy_read
dsa_slave_phy_read
__mdiobus_read
mdiobus_read
get_phy_device
mdiobus_scan
__mdiobus_register
dsa_register_switch
lan9303_probe
lan9303_mdio_probe
mdio_probe
really_probe
__driver_probe_device
driver_probe_device
__device_attach_driver
bus_for_each_drv
__device_attach
device_initial_probe
bus_probe_device
deferred_probe_work_func
process_one_work
worker_thread
kthread
ret_from_fork
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&bus->mdio_lock);
lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock);
lock(&bus->mdio_lock);
lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock);
*** DEADLOCK ***
5 locks held by kworker/u4:3/609:
#0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work
adithya2306#1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work
adithya2306#2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach
adithya2306#3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch
adithya2306#4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read
stack backtrace:
CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 adithya2306#1
Workqueue: events_unbound deferred_probe_work_func
Call trace:
dump_backtrace
show_stack
dump_stack_lvl
dump_stack
print_circular_bug
check_noncircular
__lock_acquire
lock_acquire.part.0
lock_acquire
__mutex_lock
mutex_lock_nested
regmap_lock_mutex
regmap_read
lan9303_phy_read
dsa_slave_phy_read
__mdiobus_read
mdiobus_read
get_phy_device
mdiobus_scan
__mdiobus_register
dsa_register_switch
lan9303_probe
lan9303_mdio_probe
...
Cc: stable@vger.kernel.org
Fixes: dc70058 ("net: dsa: LAN9303: add MDIO managed mode support")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231027065741.534971-1-alexander.sverdlin@siemens.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pascoato
pushed a commit
to Pascoato/android_kernel_xiaomi_lahaina
that referenced
this pull request
Feb 18, 2024
[ Upstream commit 15319a4e8ee4b098118591c6ccbd17237f841613 ] As &card->tx_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock adithya2306#2> pclose() --> spin_lock(&card->tx_queue_lock) <interrupt> --> solos_bh() --> fpga_tx() --> spin_lock(&card->tx_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on &card->tx_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 213e85d ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.