[AutoBump] Merge with 27fe2c95 (Feb 18) (50) #594

jorickert · 2025-06-17T14:23:56Z

No description provided.

…1 for big endian (llvm#126933) These relocations apply to 16-bit Thumb instructions, so reading 16 bits rather than 32 bits ensures the correct bits are masked and written back. This fixes the incorrect masking and aligns the relocation logic with the instruction encoding. Before this patch, 32 bits were read from the ELF object. This did not align with the instruction size of 16 bits, but the masking incidentally made it all work nonetheless. However, this was the case only in little endian. In big endian mode, the read 32-bit word had to have its bytes reversed. With this byte reordering, the masking would be applied to the wrong bits, hence causing the incorrect encoding to be produced as a result of the relocation resolution. The added test checks the result for both little and big endian modes.
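The endianness problem described above can be sketched with a simplified model. This is not the linker's actual code: the 8-bit immediate field in the low byte and the helper names are assumptions chosen for illustration. The sketch patches the first of two adjacent 16-bit instructions, once with a 16-bit read and once with a 32-bit read.

```python
import struct

def patch_imm8_read16(buf, offset, imm8, big_endian):
    """Read exactly one 16-bit instruction, replace its low 8 bits, write it back."""
    fmt = ">H" if big_endian else "<H"
    (insn,) = struct.unpack_from(fmt, buf, offset)
    insn = (insn & 0xFF00) | (imm8 & 0xFF)
    struct.pack_into(fmt, buf, offset, insn)

def patch_imm8_read32(buf, offset, imm8, big_endian):
    """Problematic shape: read 32 bits and mask the low 8 bits of the word."""
    fmt = ">I" if big_endian else "<I"
    (word,) = struct.unpack_from(fmt, buf, offset)
    word = (word & 0xFFFFFF00) | (imm8 & 0xFF)
    struct.pack_into(fmt, buf, offset, word)

def encode_pair(first, second, big_endian):
    """Lay out two 16-bit instructions back to back in the given byte order."""
    return bytearray(struct.pack(">HH" if big_endian else "<HH", first, second))

def decode_pair(buf, big_endian):
    """Decode the two 16-bit instructions again."""
    return struct.unpack(">HH" if big_endian else "<HH", bytes(buf))
```

In little endian the low 8 bits of the 32-bit word happen to belong to the first instruction, so both variants agree. In big endian the low 8 bits of the word belong to the second instruction, so the 32-bit variant patches the wrong one.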

…2nd attempt) (llvm#127406) My previous attempt (llvm#126913) at fixing the flaky case was on the right track when I used the begin locations as a stable ordering. However, I forgot to consider the case where the begin locations are the same among the Exprs. In an `EXPENSIVE_CHECKS` build, arrays are randomly shuffled prior to sorting them. This exposed the flaky behavior much more often, basically breaking the "stability" of the vector, as it should. Because of this, I had to revert the previous fix attempt in llvm#127034. To fix this, I now use `Expr::getID` as a stable ID for an Expr. Hopefully fixes llvm#126619 Hopefully fixes llvm#126804
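Why a tie-breaker is needed can be shown with a small sketch. The `Expr` record and its fields below are hypothetical stand-ins, not Clang's classes. Sorting only by begin location leaves the relative order of ties dependent on the input order, while adding a unique ID to the key makes the result independent of any prior shuffling.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Expr:
    begin_loc: int  # hypothetical stand-in for a source location
    expr_id: int    # hypothetical stand-in for a unique, stable ID

def distinct_orderings(exprs, key):
    """Sort every permutation of the input and collect the distinct results."""
    return {tuple(sorted(p, key=key)) for p in permutations(exprs)}

# Two expressions share a begin location, so location alone cannot order them.
EXPRS = [Expr(10, 1), Expr(10, 2), Expr(20, 3)]

by_loc_only = distinct_orderings(EXPRS, key=lambda e: e.begin_loc)
by_loc_and_id = distinct_orderings(EXPRS, key=lambda e: (e.begin_loc, e.expr_id))
```

With the location-only key, two different orderings are possible depending on the input permutation; with the combined key there is exactly one.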

`computeStaticLoopSizes()` is functionally identical to `getStaticLoopRanges()`. Replace all uses of `computeStaticLoopSizes()` by `getStaticLoopRanges()` and remove the former.

…lvm#127456) Properly reset to the last ID and return the current ID from getCurrentDecl().

…. NFC.

Moves `PackOp` and `UnPackOp` from the Tensor dialect to Linalg. This change was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg

This change involves significant churn but only relocates existing code - no new functionality is added.

**Note for Downstream Users**

Downstream users must update references to `PackOp` and `UnPackOp` as follows:
* Code: `s/tensor::(Un)PackOp/linalg::(Un)PackOp/g`
* Tests: `s/tensor.(un)pack/linalg.(un)pack/g`

No other modifications should be required.
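For downstream projects that prefer a script over `sed`, the two substitutions can be sketched as follows. The function names are hypothetical, and the patterns are one possible reading of the shorthand in the note (an optional `Un`/`un` prefix); review the resulting diff before committing.

```python
import re

# C++ references: tensor::PackOp / tensor::UnPackOp
CODE_RE = re.compile(r"\btensor::((?:Un)?PackOp)\b")
# MLIR test references: tensor.pack / tensor.unpack
TEST_RE = re.compile(r"\btensor\.((?:un)?pack)\b")

def migrate_cpp(source: str) -> str:
    """Rewrite C++ references from the tensor namespace to linalg."""
    return CODE_RE.sub(r"linalg::\1", source)

def migrate_mlir(source: str) -> str:
    """Rewrite MLIR op names from the tensor dialect to linalg."""
    return TEST_RE.sub(r"linalg.\1", source)
```

Other `tensor` ops, such as `tensor.empty` or `tensor::PadOp`, are left untouched because the patterns match only the pack and unpack names.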

…compiler (llvm#127457) This PR adds `cmd-options` to the `gpu-lower-to-nvvm-pipeline` pipeline and the `nvvm-attach-target` pass, allowing users to pass flags to the downstream compiler, *ptxas*. Example:

```
mlir-opt -gpu-lower-to-nvvm-pipeline="cubin-chip=sm_80 ptxas-cmd-options='-v --register-usage-level=8'"
```


…lvm#126737) The pattern was returning success() by default, which made the greedy pattern application act as if the IR had been modified even though nothing was changed. This can prevent the driver from converging for no legitimate reason. The patch makes the rewrite pattern return failure() by default and success() if and only if the IR changed. An example of the unexpected behavior: running `mlir-opt input.mlir --linalg-specialize-generic-ops` produces an empty MLIR module as output when `input.mlir` is as follows:

```
#map = affine_map<(d0) -> (d0)>
func.func @f(%arg0: tensor<8xi32>, %arg1: tensor<8xi32>) -> tensor<8xi32> {
  %0 = tensor.empty() : tensor<8xi32>
  %1 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%arg0, %arg1: tensor<8xi32>, tensor<8xi32>) outs(%0: tensor<8xi32>) {
  ^bb0(%in: i32, %in_0: i32, %out: i32):
    %2 = arith.addi %in, %in_0: i32
    linalg.yield %2: i32
  } -> tensor<8xi32>
  return %1 : tensor<8xi32>
}
```
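The convergence issue can be illustrated with a toy fixed-point driver. This is a sketch under stated assumptions, not MLIR's greedy driver: the IR is modelled as a list of op names, and `buggy_specialize`, `fixed_specialize`, and the op names are hypothetical. It only models the non-convergence, not the empty output.

```python
def apply_patterns_greedily(ir, patterns, max_iterations=10):
    """Reapply patterns until none reports a change.

    Returns True if a fixed point is reached within max_iterations.
    """
    for _ in range(max_iterations):
        changed = False
        for pattern in patterns:
            if pattern(ir):
                changed = True
        if not changed:
            return True
    return False

def buggy_specialize(ir):
    """Reports a change unconditionally, even when nothing was rewritten."""
    return True

def fixed_specialize(ir):
    """Reports a change if and only if at least one op was rewritten."""
    changed = False
    for i, op in enumerate(ir):
        if op == "generic_add":
            ir[i] = "named_add"
            changed = True
    return changed
```

With `fixed_specialize` the driver rewrites once and then converges; with `buggy_specialize` it exhausts its iteration budget without ever reaching a fixed point.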

This patch adds support for describing per-write resource cycle counts for ReadAdvance records via a new optional field called `tunables`. This makes it possible to declare ReadAdvance records such as: def : ReadAdvance<Read_C, 1, [Write_A, Write_B], [2]>; The above will effectively declare two entries in the ReadAdvance table for Read_C, one for Write_A with a cycle count of 1+2, and one for Write_B with a cycle count of 1+0 (omitted values are assumed 0). The field `tunables` provides a list of deltas relative to the base `cycle` count of the ReadAdvance. Since the field is optional and defaults to a list of 0's, this change doesn't affect current targets.
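The expansion rule described above (base `cycle` count plus a per-write delta, with omitted deltas treated as 0) can be sketched as below. The function name and the dictionary representation are assumptions for illustration, not TableGen's implementation.

```python
def expand_read_advance(read, cycles, writes, tunables=()):
    """Expand one ReadAdvance-style record into per-write cycle counts."""
    entries = {}
    for i, write in enumerate(writes):
        # Deltas missing from the end of the list default to 0.
        delta = tunables[i] if i < len(tunables) else 0
        entries[(read, write)] = cycles + delta
    return entries

# Mirrors the record from the commit message:
#   def : ReadAdvance<Read_C, 1, [Write_A, Write_B], [2]>;
table = expand_read_advance("Read_C", 1, ["Write_A", "Write_B"], [2])
```

For the record in the commit message this yields 3 cycles for `Write_A` (1+2) and 1 cycle for `Write_B` (1+0).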

…127096) In the soft floating-point ABI, this function takes the double argument as a pair of registers r0 and r1. The ordering of these two registers follows the endianness rules, therefore the register on which the bit flipping must happen depends on the endianness.
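A small model makes the register dependence concrete. It assumes the flipped bit is the sign bit of the double (the commit message only says "bit flipping") and that r0 holds the low word in little endian and the high word in big endian; the helper names are hypothetical.

```python
import struct

SIGN_BIT = 0x80000000

def split_double(x, big_endian):
    """Split a double into the (r0, r1) register pair for the given endianness."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    hi, lo = bits >> 32, bits & 0xFFFFFFFF
    return (hi, lo) if big_endian else (lo, hi)

def join_double(r0, r1, big_endian):
    """Reassemble a double from the (r0, r1) register pair."""
    hi, lo = (r0, r1) if big_endian else (r1, r0)
    (x,) = struct.unpack("<d", struct.pack("<Q", (hi << 32) | lo))
    return x

def flip_sign(r0, r1, big_endian):
    """Flip the sign bit in whichever register holds the high word."""
    return (r0 ^ SIGN_BIT, r1) if big_endian else (r0, r1 ^ SIGN_BIT)
```

Flipping the top bit of the wrong register alters the mantissa instead of the sign, which is why the register choice has to follow the endianness.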

On non-Windows platforms, also create a dynamic library version of the runtime. The build of each version of the library can be switched on using FLANG_RT_ENABLE_STATIC=ON and FLANG_RT_ENABLE_SHARED=ON, respectively. The default is to build only the static library, consistent with previous behaviour. This is because, given the way the flang driver invokes the linker, most linkers choose the dynamic library by default if it is available. Building the dynamic library therefore causes flang-built executables to depend on `libflang_rt.so`, unless explicitly told otherwise.

Exhaustively test makeExactICmpRegion by comparing makeAllowedICmpRegion against makeSatisfyingICmpRegion for all APInts.

On z/OS, dlopen is guarded by _XOPEN_SOURCE=600 so define it when checking for the symbol.

…m#125412)

…lvm#127460) This fixes a false positive caused by llvm#114044. For `GSLPointer*` types, it's less clear whether the lifetime issue is about the GSLPointer object itself or the owner it points to. To avoid false positives, we take a conservative approach in our heuristic. Fixes llvm#127195 (This will be backported to release 20).

Adds the equivalent watchOS and tvOS version checks to check for support for aligned_alloc; we already have macOS and iOS checks.

…tor = ` (NFC) We should avoid specifying it manually and instead rely on TableGen, see also cleanups in llvm#127403

These cannot be static compile errors, and should be treated as poison. Invalid casts may be introduced which are dynamically dead. For example:

```
void foo(volatile generic int* x) {
  __builtin_assume(is_shared(x));
  *x = 4;
}

void bar() {
  private int y;
  foo(&y); // violation, wrong address space
}
```

This could produce a compile time backend error or not depending on the optimization level. Similarly, the new test demonstrates a failure on a lowered atomicrmw which required inserting runtime address space checks. The invalid cases are dynamically dead, we should not error, and the AtomicExpand pass shouldn't have to consider the details of the incoming pointer to produce valid IR. This should go to the release branch. This fixes broken -O0 compiles with 64-bit atomics which would have started failing in 1d03708.

The `applyOpPatternsAndFold` is deprecated, use `applyOpPatternsGreedily` instead.

Extends generic `loop` directive support by supporting the `bind` clause. Since semantic checking does the heavy lifting of verifying the proper usage of the clause modifier, we can simply enable code-gen for `teams loop bind(...)` without the need to differentiate between the values the clause can accept.

…BLOCK streams (llvm#127049) This PR closes llvm#124474. Calling `read` or `recv` on a non-blocking file descriptor or an invalid file descriptor (`-1`) does not block inside a critical section. This commit checks for non-blocking file descriptors assigned by the `open` function with the `O_NONBLOCK` flag. --------- Co-authored-by: Balazs Benics <[email protected]>

This is necessary to enable composing subregisters in peephole-opt. For now use a brute force table to find the return value. The worst case target is AMDGPU with a 399 x 399 entry table.

After llvm#117558 landed, this code would assert "Value is not an N-bit unsigned value" in getConstant(), from a test case in zig. Co-authored-by: Craig Topper <[email protected]> Fixes llvm#127296

Update VPWidenPHIRecipe to use the predecessors in VPlan to determine the incoming blocks instead of tracking them separately. This brings VPWidenPHIRecipe in line with the other phi recipes. PR: llvm#126388

…nctions (llvm#126958) This is a fix for llvm#126949. There are two issues being fixed here. First, in some cases, OMPIRBuilder generates empty target task proxy functions. This happens when the target kernel doesn't use any stack-allocated data (either no data or only globals). The second problem is encountered when the target task, i.e. the code that makes the target call, spans a single basic block. This usually happens when we do not generate a target or device kernel launch and instead fall back to the host. In such cases, we end up not outlining the target task at all. This can cause us to call the target kernel twice: once via the target task proxy function and a second time via the host fallback. This PR fixes both of these problems and updates some tests to catch these problems should this patch fail.

…END_VECTOR_INREG(y)) -> EXTEND_VECTOR_INREG(unpack(x,y)) (llvm#127502) Concat/unpack the src subvectors together in the bottom 128-bit vector and then extend with a single EXTEND/EXTEND_VECTOR_INREG instruction Required the getEXTEND_VECTOR_INREG helper to be tweaked to accept EXTEND_VECTOR_INREG opcodes as well to avoid us having to remap the opcode between both types.

The OpenCL C specification states that for out-of-range dimension indices, `get_num_groups` must return 1 instead of 0.
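The required behaviour can be modelled in a few lines. This is a sketch of the semantics only, with the dispatch's per-dimension group counts passed in as a plain tuple (an assumption for illustration), not the SPIR-V backend's lowering.

```python
def get_num_groups(dimindx, num_groups):
    """Model of get_num_groups: out-of-range dimension indices yield 1, not 0."""
    if 0 <= dimindx < len(num_groups):
        return num_groups[dimindx]
    return 1

# A hypothetical 2-D dispatch with 4 x 2 work-groups.
groups = (4, 2)
```

Here `get_num_groups(0, groups)` and `get_num_groups(1, groups)` return the real counts, while any other index returns 1.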

… stack is enabled (llvm#127592) The `-fcf-protection=[full|return]` flag enables shadow stack implementation based on RISC-V Zicfiss extension. This patch adds the `__riscv_shadow_stack` predefined macro to preprocessing when such a shadow stack implementation is enabled.

Adds AVX512 bf16 conversion from packed f32 to bf16 elements. Tests are slightly refactored to better follow file's convention.

…127129) It does not change the estimate because getInstSizeInBytes() already returns 0 for meta instructions, but added a test and early bail.
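Why the early bail is NFC can be seen in a toy model. The `Inst` record and both functions are hypothetical; the only property taken from the commit message is that meta instructions already report a size of 0.

```python
from dataclasses import dataclass

@dataclass
class Inst:
    size_in_bytes: int
    is_meta: bool = False

def code_size_without_bail(insts):
    """Sum every instruction's size; meta instructions contribute 0."""
    return sum(inst.size_in_bytes for inst in insts)

def code_size_with_early_bail(insts):
    """Skip meta instructions before querying their size."""
    total = 0
    for inst in insts:
        if inst.is_meta:
            continue
        total += inst.size_in_bytes
    return total

body = [Inst(4), Inst(0, is_meta=True), Inst(8), Inst(0, is_meta=True)]
```

Both functions return the same estimate for `body`; the second simply avoids the size query for instructions known to contribute nothing.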

…lvm#127484) It was also too permissive for a more general utility, only return the original immediate if there is no subregister.

) Currently there are many casts that are not modeled (i.e. ignored) by the analyzer, which can cause paradox states (e.g. negative value stored in `unsigned` variable) and false positive reports from various checkers, e.g. from `security.ArrayBound`. Unfortunately this issue is deeply rooted in the architectural limitations of the analyzer (if we started to model the casts, it would break other things). For details see the umbrella ticket llvm#39492 This commit adds an ugly hack in `security.ArrayBound` to silence most of the false positives caused by this shortcoming of the engine. Fixes llvm#126884

…lvm#127485)

The stripGetElementPtr function is mysteriously named, and calls into another mysterious getGEPInductionOperand which does something complicated with GEP indices. The real purpose of the badly-named stripGetElementPtr function is to get a loop-variant GEP index, if there is one. The getGEPInductionOperand is totally redundant, as stripping off zeros from the end of GEP indices has no effect on computing the loop-variant GEP index, as constant zeros are always loop-invariant. Moreover, the GEP induction operand is simply the first non-zero index from the end, which stripGetElementPtr returns when it finds that any of the GEP indices are loop-variant: this is a completely unrelated value to the GEP index that is loop-variant. The implicit assumption here is that there is only ever one loop-variant index, and it is the first non-zero one from the end. The logic is unnecessarily complicated for what stripGetElementPtr wants to achieve, and the header comments are confusing as well. Strip getGEPInductionOperand, rework and rename stripGetElementPtr.

See llvm#126411 / llvm#127055, the test isn't expected to fold in a single instcombine iteration, needing instcombine->cse->instcombine.

…#127320) * Remove `vector.create_mask` from tests. Instead, pass masks as arguments. This simplifies the tests without sacrificing test coverage. * Update `@xfer_read_minor_identity_tranposed_with_mask_scalable` to use similar shapes as other tests and to avoid using test Ops (e.g. `@test.some_use`). This improves consistency between tests. * Fix some comment typos.

… and SPIR-V friendly builtins for Image Read/Write instructions (llvm#127242) This PR improves built-in variables and functions support: * extends mapping from an OpenCL C built-in function to the SPIR-V BuiltIn variables as in https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Env.html#_built_in_variables, and * adds SPIR-V friendly builtins for Image Read/Write instructions. Test cases are extended accordingly.

…ACTERS (llvm#126924) `mbstate_t` needs to be visible to libcpp, even when it is not providing wide character functionality (i.e. `_LIBCPP_HAS_WIDE_CHARACTERS` is turned off) and thus not using any of the C library's wide character functions. There are C libraries (such as newlib-nano/nanolib/picolibc) which do provide their definition of `mbstate_t` in `<wchar.h>` even though they do not come with wide character functions. Since there is a way to conditionally include the C library's `<wchar.h>` only if it exists, we should rely on the fact that if it exists, it will provide `mbstate_t`. Removing this guard will allow using libc++ on top of newlib-nano/picolibc while not breaking the cases where it is used on top of a C library which doesn't provide `<wchar.h>` (since it would then still go look for `<uchar.h>` or error out).

…irectables. Updates JITLinkRedirectableSymbolManager to take alias flags into account when setting the scope and linkage of the created stubs (weak aliases now get weak linkage, hidden stubs get hidden visibility). Updates lazyReexports to propagate alias flags (rather than trampoline flags) when building the initial destinations map for the redirectable symbols manager. Together these changes allow the LazyObjectLinkingLayer to link objects containing weak and hidden symbols.

…17309) Legacy pass used to provide the advisor, so this extracts that logic into a provider class used by both analysis passes. All three (Default, Release, Development) legacy passes `*AdvisorAnalysis` are basically renamed to `*AdvisorProvider`, so the actual legacy wrapper passes are `*AdvisorAnalysisLegacy`. There is only one NPM analysis `RegAllocEvictionAnalysis` that switches between the three providers in the `::run` method, to be cached by the NPM. Also adds `RequireAnalysis<RegAllocEvictionAnalysis>` to the optimized target reg alloc codegen builder.

The BUILD file changes in llvm#127544 add `LinalgInterfaces`, which is incomplete without `LinalgDialect`. For now, just add `LinalgDialect` as a dependency to tests which do not otherwise depend on it (but depend on `LinalgInterfaces` through e.g. `TensorDialect`). This is a temporary solution until the dependency of `TensorDialect` is trimmed to just the `linalg::RelayoutOpInterface`, but not the other linalg interfaces. See llvm#127544 (review).

…7586)

libclc uses llvm-link to link together all of the individually built libclc builtins files into one module. Some of these builtins files are compiled from source by clang whilst others are converted from LLVM IR directly to bytecode.

When llvm-link links a 'source' module into a 'destination' module, it warns if the two modules have differing data layouts. The LLVM IR files libclc links either have no data layout (shared submodule files) or an explicit data layout in the case of certain amdgcn/r600 files. The warnings are very noisy and largely inconsequential.

We can suppress them by exploiting a specific behaviour exhibited by llvm-link. When the destination module has no data layout, it is given the source module's data layout. Thus, if we link together all IR files first, followed by the clang-compiled modules, 99% of the warnings are suppressed as they arose from linking an empty data layout into a non-empty one.

The remaining warnings came from the amdgcn and r600 targets. Some of these were because the data layouts were out of date compared with what clang currently produced, so those could have been updated. However, even with those changes and by grouping the IR files together, the linker may still link explicit data layouts with empty ones depending on the order the IR files are processed.

As it happens, the data layouts aren't essential. With the changes to the link line we can rely on those IR files receiving the correct data layout from the clang-compiled modules later in the link line. This also makes the previously AMDGPU-specific IR files available to be used by all targets in a generic capacity in the future.
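The effect of link order on the warnings can be sketched with a toy model of the behaviour described above. It is not llvm-link itself: modules are `(name, data_layout)` pairs, `None` stands for a missing data layout, and the file names and layout string are made up.

```python
def link(modules):
    """Link modules in order; return (final data layout, names that warned)."""
    dest_layout = None
    warnings = []
    for name, layout in modules:
        if dest_layout is None:
            # A destination without a data layout adopts the source's.
            dest_layout = layout
        elif layout != dest_layout:
            # Differing data layouts trigger a warning.
            warnings.append(name)
    return dest_layout, warnings

ir_files = [("a.ll", None), ("b.ll", None)]
clang_modules = [("c.bc", "e-p:64:64"), ("d.bc", "e-p:64:64")]

ir_first = link(ir_files + clang_modules)
clang_first = link(clang_modules + ir_files)
```

Linking the layout-less IR files first produces no warnings in this model, whereas linking them after the clang-compiled modules warns once per IR file.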

Although the operation is deprecated in the most recent version of the SPIR-V spec, it is still used by older shaders, so having it defined is valuable and incurs negligible maintenance overhead, due to op simplicity.

…27594) The same literal can be used multiple times in an instruction, not just once. We were not tracking the used value to verify this, so correct this. This helps avoid regressions in a future patch.
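The distinction between counting literal operands and tracking the literal value can be sketched as follows. The operand encoding, the function names, and the limit of one distinct literal per instruction are assumptions for illustration, not the AMDGPU backend's rules.

```python
def legal_by_count(operands, max_literals=1):
    """Over-strict check: counts literal operands without looking at values."""
    return sum(1 for kind, _ in operands if kind == "literal") <= max_literals

def legal_by_value(operands, max_distinct_literals=1):
    """Tracks the literal values used, so a repeated literal counts once."""
    used = set()
    for kind, value in operands:
        if kind != "literal":
            continue
        used.add(value)
        if len(used) > max_distinct_literals:
            return False
    return True

same_literal_twice = [("literal", 0x1234), ("reg", "v0"), ("literal", 0x1234)]
two_literals = [("literal", 1), ("literal", 2)]
```

The count-based check rejects `same_literal_twice`, while the value-tracking check accepts it; both reject `two_literals`.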

If the target type is a pointer type.

…lvm#127600) Following on from llvm#126737, adds a negative test that: * prior to llvm#126737, would incorrectly generate empty output, * with the fix in-tree, simply outputs the input IR (i.e. the specialization "fails"). I've also made minor editorial changes.

This patch continues the work that was started here https://reviews.llvm.org/D99426 to correctly open text files in text mode.

This fixes build errors on macOS.

vhscampos and others added 30 commits February 17, 2025 10:10

[mlir][linalg] Remove computeStaticLoopSizes (llvm#124778)

b302829

`computeStaticLoopSizes()` is functionally identical to `getStaticLoopRanges()`. Replace all uses of `computeStaticLoopSizes()` by `getStaticLoopRanges()` and remove the former.

[clang][bytecode] Restructure Program::CurrentDeclaration handling (l…

f09fd94

…lvm#127456) Properly reset to the last ID and return the current ID from getCurrentDecl().

[X86] combineConcatVectorOps - remove duplicate DAG.getContext() call…

9d24f94

…. NFC.

Reformat reglists in SystemZMCTargetDesc.cpp (NFC) (llvm#127472)

02c44ce

Fix typo in LangImpl03.rst (llvm#127389)

9c9157b

ConstRange: exhaustively test makeExactICmpRegion (llvm#127058)

8eba128

Exhaustively test makeExactICmpRegion by comparing makeAllowedICmpRegion against makeSatisfyingICmpRegion for all APInts.

[SystemZ][z/OS] Define _XOPEN_SOURCE=600 for dlopen (llvm#127254)

81a8b20

On z/OS, dlopen is guarded by _XOPEN_SOURCE=600 so define it when checking for the symbol.

[libunwind] Silence -Wunused-parameter warnings in Unwind-wasm.c (llv…

f4206f9

…m#125412)

[libc++] Add watchOS and tvOS checks for aligned_alloc (llvm#126862)

949e404

Adds the equivalent watchOS and tvOS version checks to check for support for aligned_alloc; we already have macOS and iOS checks.

[MLIR][Doc] Update the pass infra doc to advise against `let construc…

d25beca

…tor = ` (NFC) We should avoid specifying it manually and instead rely on TableGen, see also cleanups in llvm#127403

[mlir] Update docs for Greedy Pattern Rewrite Driver(NFC) (llvm#126701)

4e41e9a

The `applyOpPatternsAndFold` is deprecated, use `applyOpPatternsGreedily` instead.

TableGen: Generate reverseComposeSubRegIndices (llvm#127050)

ab2d330

This is necessary to enable composing subregisters in peephole-opt. For now use a brute force table to find the return value. The worst case target is AMDGPU with a 399 x 399 entry table.

[libc++] Synchronize status pages with Github issues list

fb29f19

[libc++] Synchronize a few remaining status page rows with Github issues

ec54403

[Hexagon] Explicitly truncate constant in UAddSubO (llvm#127360)

788cb72

After llvm#117558 landed, this code would assert "Value is not an N-bit unsigned value" in getConstant(), from a test case in zig. Co-authored-by: Craig Topper <[email protected]> Fixes llvm#127296

[VPlan] Use VPlan predecessors in VPWidenPHIRecipe (NFC). (llvm#126388)

6c62783

Update VPWidenPHIRecipe to use the predecessors in VPlan to determine the incoming blocks instead of tracking them separately. This brings VPWidenPHIRecipe in line with the other phi recipes. PR: llvm#126388

svenvh and others added 30 commits February 18, 2025 10:24

[SPIR-V] Fix out-of-range value for NumWorkgroups builtin (llvm#127198)

61ab476

The OpenCL C specification states that for out-of-range dimension indices, `get_num_groups` must return 1 instead of 0.

[mlir][x86vector] AVX512-BF16 Convert packed F32 to BF16 (llvm#125685)

2b71df5

Adds AVX512 bf16 conversion from packed f32 to bf16 elements. Tests are slightly refactored to better follow file's convention.

[AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC. (llvm#…

bc4f05d

…127129) It does not change the estimate because getInstSizeInBytes() already returns 0 for meta instructions, but added a test and early bail.

AMDGPU: Extract lambda used in foldImmediate into a helper function (l…

7c03865

…lvm#127484) It was also too permissive for a more general utility, only return the original immediate if there is no subregister.

AMDGPU: Handle subregister uses in SIFoldOperands constant folding (l…

cd10c01

…lvm#127485)

[bolt][bazel] Port llvm@e235fcb.

ef21831

[AArch64] Add a phase-ordering test for dividing vscale. NFC

c71f914

See llvm#126411 / llvm#127055, the test isn't expected to fold in a single instcombine iteration, needing instcombine->cse->instcombine.

[FunctionAttrs] Fix typo in getArgumentAccessInfo name (NFC)

719c46b

[MLIR] Update operator<< in objects of DataFlowFramework.h (llvm#12…

91ef371

…7586)

[llvm][docs] Fix typo in Backporting section of GitHub.rst.

df300a4

[mlir][spirv] Add definition for OpKill (llvm#126554)

93d3e20

Although the operation is deprecated in the most recent version of the SPIR-V spec, it is still used by older shaders, so having it defined is valuable and incurs negligible maintenance overhead, due to op simplicity.

AMDGPU: Correct legal literal operand logic for multiple uses (llvm#1…

eb7c947

…27594) The same literal can be used multiple times in an instruction, not just once. We were not tracking the used value to verify this, so correct this. This helps avoid regressions in a future patch.

[gn] Move write_target_def_file to its own .gni file

e5ce1d3

[gn] port e235fcb (bolt TargetConfig.def)

09c2441

[clang][bytecode] Allow up/down casts of nullptr (llvm#127615)

5fbb6d9

If the target type is a pointer type.

[SystemZ][z/OS] Mark text files as text in ClangScanDeps (llvm#127514)

3b6cc94

This patch continues the work that was started here https://reviews.llvm.org/D99426 to correctly open text files in text mode.

[BasicAA] Add test for llvm#126670 (NFC)

0d66659

[bazel]Move HAVE_GETAUXVAL from config.h to config.bzl (llvm#127637)

27fe2c9

This fixes build errors on macOS.

[AutoBump] Merge with 27fe2c9 (Feb 18)

6666d7e
