[AutoBump] Merge with ff99af7e (Feb 20) (62) #606

Open · wants to merge 123 commits into base: bump_to_d57479cf

Conversation

jorickert

No description provided.

DavidGoldman and others added 30 commits February 19, 2025 13:21
- Support finding implementors of a protocol and discovering subclasses for ObjC interfaces via the implementations call
- Support jumping to the overridden method when you trigger goto definition on an override
- Properly find references to overridden methods
…inimum(maximum)_num (llvm#127711)

For targets that support IEEE fminimum_num/fmaximum_num, the
corresponding *_min_num_fXY/*_max_num_fXY instructions already
canonicalize their inputs, so we do not need to explicitly canonicalize
the inputs for fminnum/fmaxnum.
…`modulemap` (llvm#127839)

b41b86a added a new textual header
`clang/Lex/HLSLRootSignatureTokenKinds.def` but did not add it to
`clang`'s module map. This causes a build failure when building llvm
with `-DLLVM_ENABLE_MODULES=ON`. This PR adds the new textual header to
the module map and fixes the build break.

Fixing rdar://145148093.
Addresses llvm#125604 

- Implements `and` as an HLSL builtin function
- The `and` HLSL builtin function gets lowered to the LLVM `and`
instruction
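A rough, hypothetical sketch of what that lowering amounts to (the actual clang CodeGen for the builtin is not shown here; names are invented):

```cpp
#include "llvm/IR/IRBuilder.h"

// Hypothetical helper: a boolean HLSL `and` becomes a plain LLVM `and`
// instruction on the (already-emitted) operand values.
llvm::Value *emitHLSLAnd(llvm::IRBuilder<> &B, llvm::Value *L, llvm::Value *R) {
  return B.CreateAnd(L, R, "hlsl.and");
}
```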
…store address. (llvm#127151)

SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.
    
One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array, then two different SHXADD instructions will be used to
add the scaled indices. Finally, each SHXADD result will be used by a
different load instruction. We can remove the ADDI by folding the
offset into each load.
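A hypothetical C++ rendering of that pattern (names invented for illustration):

```cpp
// Two loads from the same in-struct array with different indices. The
// +16 struct offset is materialized once by an ADDI that both scaled
// index computations (SHXADD) share, so isel alone cannot fold it into
// the loads.
struct S {
  long pad[2];  // &s->arr is s + 16; the ADDI materializes this offset
  long arr[64];
};

long sum(const S *s, long i, long j) {
  // base = ADDI s, 16; sh3add scales i and j; the two loads then use
  // the SHXADD results as addresses.
  return s->arr[i] + s->arr[j];
}
```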
    
This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.
    
This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.

This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.
…llvm#126876)

Using the LLVM build itself for PGO training is convenient and a great
starting point, but it also has several issues:

* The LLVM build implicitly depends on tools other than CMake and the
C/C++ compiler, and if those tools aren't available in PATH, the build
will fail.
* The LLVM build also requires standard headers and libraries, which may
not always be available in the default location, requiring an explicit
sysroot.
* Building a single configuration (-DCMAKE_BUILD_TYPE=Release) only
exercises the -O3 pipeline and can pessimize other configurations.
* Building for the host target doesn't exercise the other targets.
* Since LLVMSupport is a static library, this doesn't exercise the
linker (beyond what CMake itself does).

Rather than using the LLVM build, ideally we would provide a more
minimal, purpose-built corpus. While we work on building such a corpus,
provide a CMake option that lets vendors disable the use of the LLVM
build for PGO training.
…27844)

There was a discrepancy between the type-converter and rewrite-pattern
parts of conversion to LLVM used in various GPU targets, at least ROCDL
and NVVM:
- The TypeConverter part was handling vectors of arbitrary rank,
converting them to nests of `!llvm.array< ... >` with a vector at the
inner-most dimension:
https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp#L629-L655
- The rewrite pattern part was not handling `llvm.array`:
https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp#L594-L596

That led to conversion failures when lowering `math` dialect ops on
rank-2 vectors, as in the testcase being added in this PR.

This PR fixes this by reusing a shared utility already used in other
conversions to LLVM:

https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/LLVMCommon/VectorPattern.cpp#L80-L104

---------

Signed-off-by: Benoit Jacob <[email protected]>
This patch implements a new subclass of the Value class used for Sandbox
IR Values that we don't support, like metadata or inline asm. The goal
is to never have null sandboxir::Value objects, because this is not the
expected behavior.
…to load/store address. (llvm#127151)"

This reverts commit c3ebbfd.

Seeing some test failures on the build bot.
…ic dims (llvm#118208)

This PR fixes how the iteration domain of linalg.generic is collapsed
when fusing with tensor.expand_shape. Previously, the output_shape for
tensor.expand_shape was inferred, which only works in some special
cases.

This patch makes the logic explicitly set the bounds of the new
collapsed iteration domain, because we already know them.

---------

Co-authored-by: Jakub Kuderski <[email protected]>
…into load/store address. (llvm#127151)"

Tests have been re-generated with recent scheduler changes.

Original message:

SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.

One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array, then two different SHXADD instructions will be used to
add the scaled indices. Finally, each SHXADD result will be used by a
different load instruction. We can remove the ADDI by folding the
offset into each load.

This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.

This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.

This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.
…s. NFC (llvm#127678)

In removePartiallyOverlappedStores we iterate over
InstOverlapIntervalsTy, which is a DenseMap. Change that map to a
MapVector to ensure that we apply the transforms in a deterministic
order. I've only seen the order matter when using names for the
instructions created by the transforms, but such nondeterminism is
annoying when debugging etc.
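A minimal standalone sketch of the ordering difference, assuming the LLVM ADT headers are available (the DSE code maps instructions to overlap intervals rather than ints):

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"
#include <cstdio>

int main() {
  llvm::DenseMap<int, int> Unordered; // iteration order is unspecified
  llvm::MapVector<int, int> Ordered;  // iterates in insertion order
  for (int K : {3, 1, 2}) {
    Unordered[K] = K;
    Ordered[K] = K;
  }
  for (auto &KV : Ordered) // deterministically prints 3, 1, 2
    std::printf("%d\n", KV.first);
}
```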
…s. NFC (llvm#127848)

This is only used to get the Module and the LLVMContext. We can get both
of those from the GlobalVariable*.
In order to facilitate cloning of recursive cycles, we first identify
backedges using a standard DFS search from the root callers, then
initially defer recursively invoking the cloning function via those
edges. This is because the cloning opportunity along a backedge may
not be exposed until the current node is cloned for other, non-backedge
callers that are cold after the earlier recursive cloning, resulting
in a cold predecessor of the backedge. So we recursively invoke the
cloning function for the backedges during the cloning of the current
node for its caller edges (which were sorted to handle cold callers
first).

There was no significant time or memory overhead measured for several
large applications.
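A generic sketch of the backedge-identification step, using an invented toy node type rather than MemProf's actual context call graph:

```cpp
#include <utility>
#include <vector>

struct Node {
  enum Color { White, Grey, Black };
  Color State = White;
  std::vector<Node *> Callees;
};

// Standard DFS back-edge detection: an edge into a node that is still
// on the DFS stack (Grey) closes a cycle. Cloning along such edges is
// what gets deferred initially.
void findBackedges(Node *N,
                   std::vector<std::pair<Node *, Node *>> &Backedges) {
  N->State = Node::Grey;
  for (Node *Callee : N->Callees) {
    if (Callee->State == Node::Grey)
      Backedges.push_back({N, Callee});
    else if (Callee->State == Node::White)
      findBackedges(Callee, Backedges);
  }
  N->State = Node::Black;
}
```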
…rnal() (llvm#127891)

Move the most common if statement to the top and the least common ones
to the bottom. This should save CPU cycles during compilation.

This patch also prefixes the LLVM variables with an `LLVM` prefix to
make the naming convention in this function more uniform, e.g. `C`
becomes `LLVMC`.
…ariable checkers. (llvm#127570)

Like a C++ member variable, every Objective-C++ instance variable must
be a RefPtr, Ref, CheckedPtr, or CheckedRef to an object, not a raw
pointer or reference.
…122951)

Add the option `AllowedTypes`, which allows users to specify types they
want to exclude from the const-correctness check.

Small real-world example:
```cpp
#include <mutex>

int main() {
  std::mutex m;
  std::lock_guard<std::mutex> l(m); // we want to ignore it since std::lock_guard is already immutable.
}
```

Closes issue llvm#122592
…116465)

This patch adds codegen for all Armv9.6-A compare-and-branch
instructions that operate on full w or x registers. The instruction
variants operating on half-words (cbh) and bytes (cbb) are added in a
subsequent patch.

Since CB doesn't use the standard 4-bit Arm condition codes but a
reduced set of conditions encoded in 3 bits, some conditions are
expressed by modifying operands, namely incrementing or decrementing
immediate operands and swapping register operands. To invert a CB
instruction it's therefore not enough to just modify the condition
code, which doesn't play particularly well with how the backend is
currently organized. We therefore introduce a number of pseudos which
operate on the standard 4-bit condition codes and lower them late
during codegen.
This patch fixes:

  llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:3409:8:
  error: unused variable 'I' [-Werror,-Wunused-variable]
)

Add support for sm101 and sm120 target architectures. It requires CUDA
12.8.

---------

Co-authored-by: Sebastian Jodlowski <[email protected]>
Summary:
The scan operation implemented here only works if there are contiguous
ones in the execution mask that can be used to propagate the result.
There are two solutions to this: one is to enter 'whole-wave-mode' and
forcibly turn the lanes back on; the other is to do this serially. This
implementation does the latter because it's more portable, but it
checks whether the parallel fast path is applicable.

Needs to be backported for correct behavior and because it fixes a
failing libc test.
It looks like llvm#124936 was reverted
(for modifying JSON output), but the test for JSON output with errors
was deleted in llvm#126587 (in an
attempt to fix a failing build).
This adds back that test along with a new one for llvm-dwarfdump to
validate the JSON for errors: one case where the sub-categories will
eventually appear and another where they will not.

test plan:
ninja check-llvm-tools-llvm-dwarfdump
lukel97 and others added 30 commits February 20, 2025 20:17
…llvm#127972)

checkUsers currently does two things: (a) work out the minimum VL read
by every user, and (b) check that the operand info of the MI and its
users match.

getMinimumVLForUser handles most of (a), with the exception of the
check for instructions that read past VL, e.g. vrgather, which is still
in checkUsers.

This moves it into getMinimumVLForUser to keep all that logic in one
place and simplifies an upcoming patch.
The SDAG version uses fminnum/fmaxnum; in converting it to fcmp+select,
it appears the order of the operands was chosen badly. This switches
the conditions used to keep the constant on the RHS.
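As a source-level analogy (not the GlobalISel code itself), the compare+select form being preserved keeps the constant on the right-hand side of the comparison:

```cpp
// fminnum(x, 5.0f) as compare+select: the constant stays on the RHS of
// the comparison, i.e. `x < C ? x : C`.
float fminWithConstant(float X) {
  return X < 5.0f ? X : 5.0f;
}
```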
I suspect it was fixed by llvm#127059. aarch64 is the only Windows bot we have now, so we can't be certain it's fixed everywhere, but I also have no reason to believe otherwise.

Fixes llvm#43774.
…lvm#126751)

Fold:

    (select (not m),  1, 0) -> (select m, 0,  1)
    (select (not m), -1, 0) -> (select m, 0, -1)
Fixes llvm#127553.
x86_64 failed to run readcyclecounter.ll when expensive checks were
enabled, erroring with "Using an undefined physical register".
This patch adds benchmarks for the copy family of algorithms (copy,
copy_n, copy_if, copy_backward).
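A minimal sketch of what one such benchmark might look like, assuming the Google Benchmark framework that libc++'s benchmarks build on (names invented):

```cpp
#include <algorithm>
#include <vector>
#include <benchmark/benchmark.h>

// Times std::copy over a range of input sizes; DoNotOptimize keeps the
// copy from being optimized away.
static void BM_copy(benchmark::State &state) {
  std::vector<int> Src(state.range(0), 42);
  std::vector<int> Dst(Src.size());
  for (auto _ : state) {
    std::copy(Src.begin(), Src.end(), Dst.begin());
    benchmark::DoNotOptimize(Dst.data());
  }
}
BENCHMARK(BM_copy)->Range(8, 1 << 16);
BENCHMARK_MAIN();
```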
…tem from the headers (llvm#117764)

Many parts of the locale base API are only required when building the
shared/static library, but not from the headers. Document those
functions and carve out a few of those that don't work when
_XOPEN_SOURCE is defined to something old.

Fixes llvm#117630
…o find additional (free) extract_subvector nodes
This patch adds the -fvectorize and -fno-vectorize flags to flang. 

Note that this also changes the behaviour of `flang -fc1` to match that
of `clang -cc1`, which is that vectorization is only enabled in the
presence of the `-vectorize-loops` flag.

Additionally, this patch changes the behaviour of the default
optimisation levels to match clang, such that vectorization only happens
at the same levels as it does there.

This patch is in draft while I write an RFC to discuss the above two
changes.
Change getDependenceDistanceStrideAndSize to scale strides by
TypeByteSize, scaling the returned CommonStride and MaxStride. Even
though there is a seemingly-functional change of setting CommonStride
when scaled strides are equal, it ends up being a non-functional change
due to aggressive HasSameSize checking.
Several users of (mostly math/) gentype.inc rely on types other than
the 'gentype'. This is commonly intN, as several math builtins expose
it as a return or parameter type. We were previously explicitly
defining this type for every gentype.

Other implementations rely on integer types of the same size and element
width as the gentype, such as short/ushort for half, long/ulong for
double, etc.

Users might also rely on as_type or convert_type builtins to/from these
types.

The previous method we used to define intN was unscalable if we wanted
to expose more types and helpers.

This commit introduces a simpler system whereby several macros are
defined at the beginning of gentype.inc. These rely on concatenating
with the vector size. To facilitate this system, scalar gentypes now
define an empty vector size. It was previously undefined, which was
dangerous. An added benefit is that it matches how the integer
gentype.inc vector size has been working.

These macros will be especially helpful for the definitions of
logb/ilogb in an upcoming patch.
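A sketch of the concatenation scheme (macro names here are illustrative, not the exact gentype.inc spellings):

```cpp
// Token-pasting helpers; the indirection lets __CLC_VECSIZE expand
// before pasting.
#define __CLC_XCONCAT(a, b) a##b
#define __CLC_CONCAT(a, b) __CLC_XCONCAT(a, b)

// For a vector gentype such as float4:
#define __CLC_VECSIZE 4
// For scalar gentypes the macro is defined but empty, so the pasting
// below yields the plain scalar type name instead.

// One definition now covers every gentype: int4, short4, ... for
// vectors; int, short, ... for scalars.
#define __CLC_INTN __CLC_CONCAT(int, __CLC_VECSIZE)     /* -> int4 */
#define __CLC_SHORTN __CLC_CONCAT(short, __CLC_VECSIZE) /* -> short4 */
```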
If we have a shuffle mask which can be represented as two slides + some
conditional masking, we can emit a VLA sequence which is at most
O(2*LMUL). This is essentially a generalization of the existing
isElementRotate, but is staged to only introduce the new match for the
moment. A follow-up change will start consolidating code - see the
notes below.

A couple of notes:
1) I'm excluding bit rotates mostly to keep the diffs manageable. 
2) The existing isElementRotate logic is nearly redundant after this
   change.  However, we have some intersection between the bit rotate
   and element rotate matching.  To keep things simple, I left that in
   place for now, and will merge/cleanup in a separate change.
3) The individual asVSlideup and asVSlidedown are closely related, but
the former looks through extracts and the latter changes VL. I'm
leaving these in place for now, but hope to common them up a bit as
well.
…l is UB (llvm#127979)

Proof: https://alive2.llvm.org/ce/z/mzVj-u
I will add some follow-up patches to avoid duplicate code, support more
memory instructions, and bypass gep instructions.
The assertion was left over from a time when VPBBs still had an
associated condition bit. This is not the case any more (the comment
was stale). In case a branch on condition is needed, a BranchOnCond
VPInstruction is added when constructing recipes; that is also where it
is checked whether the condition is available.

Exposed by 38376de.
…127907)

Add a facility to check whether there is any intersection between two
sets. This will be used in some follow-on changes to MemProf.
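A minimal sketch of the idea over std::set (the actual facility is an LLVM ADT helper; the point is bailing out at the first shared element rather than materializing an intersection):

```cpp
#include <set>

// Returns true as soon as any element is common to both sets. Probing
// with the smaller set keeps the number of lookups down.
template <typename T>
bool setsIntersect(const std::set<T> &A, const std::set<T> &B) {
  const std::set<T> &Small = A.size() <= B.size() ? A : B;
  const std::set<T> &Large = A.size() <= B.size() ? B : A;
  for (const T &E : Small)
    if (Large.count(E))
      return true;
  return false;
}
```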
PR llvm#123902 broke the Python
bindings for `tensor.pack`/`unpack`. This PR fixes that. It also

1. adds convenience wrappers for pack/unpack
2. cleans up matmul-like ops in the linalg bindings
3. fixes linalg docs missing pack/unpack
This was accidentally introduced in llvm#126876.
…sters (llvm#124475)

Extending the conditionals in `AugmentRegisterInfo` to support
alternative names for lldb.

Fixes llvm#124023

There is an exception for register `X8`, which is not covered here;
more details can be found in issue
llvm#127900.
…lvm#127922)

Recent versions of the system changed where these functions live.
This patch fixes:

  llvm/lib/Target/RISCV/RISCVISelLowering.cpp:5723:24: error: captured
  structured bindings are a C++20 extension
  [-Werror,-Wc++20-extensions]

  llvm/lib/Target/RISCV/RISCVISelLowering.cpp:5728:76: error: captured
  structured bindings are a C++20 extension
  [-Werror,-Wc++20-extensions]
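For context, a hypothetical reproduction of the diagnostic and the usual C++17-safe fix:

```cpp
#include <utility>

int example(std::pair<int, int> P) {
  auto [First, Second] = P;
  // Capturing a structured binding is a C++20 extension, so this trips
  // -Werror,-Wc++20-extensions in a C++17 build:
  //   auto Bad = [First] { return First; };
  int FirstCopy = First; // copy into a named local first
  auto Good = [FirstCopy] { return FirstCopy; };
  return Good() + Second;
}
```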
…NFC (llvm#127968)

Use nonstatic member instead. This requires explicit conversions, but
many will go away as we continue converting unsigned to Register.

In a few places where it was simple, I changed unsigned to Register.