Skip to content

[AutoBump] Merge with 27fe2c95 (Feb 18) (50) #594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 128 commits into
base: bump_to_c1a22925
Choose a base branch
from

Conversation

jorickert
Copy link

No description provided.

vhscampos and others added 30 commits February 17, 2025 10:10
…1 for big endian (llvm#126933)

These relocations apply to 16-bit Thumb instructions, so reading 16 bits
rather than 32 bits ensures the correct bits are masked and written
back. This fixes the incorrect masking and aligns the relocation logic
with the instruction encoding.

Before this patch, 32 bits were read from the ELF object. This did not
align with the instruction size of 16 bits, but the masking incidentally
made it all work nonetheless. However, this was the case only in little
endian.

In big endian mode, the read 32-bit word had to have its bytes reversed.
With this byte reordering, the masking would be applied to the wrong
bits, hence causing the incorrect encoding to be produced as a result of
the relocation resolution.

The added test checks the result for both little and big endian modes.
…2nd attempt) (llvm#127406)

In my previous attempt (llvm#126913) of fixing the flaky case was on a good
track when I used the begin locations as a stable ordering. However, I
forgot to consider the case when the begin locations are the same among
the Exprs.

In an `EXPENSIVE_CHECKS` build, arrays are randomly shuffled prior to
sorting them. This exposed the flaky behavior much more often basically
breaking the "stability" of the vector - as it should.
Because of this, I had to revert the previous fix attempt in llvm#127034.

To fix this, I use this time `Expr::getID` for a stable ID for an Expr.

Hopefully fixes llvm#126619
Hopefully fixes llvm#126804
`computeStaticLoopSizes()` is functionally identical to `getStaticLoopRanges()`.
Replace all uses of `computeStaticLoopSizes()` by `getStaticLoopRanges()` and remove the former.
…lvm#127456)

Properly reset to the last ID and return the current ID from
getCurrentDecl().
Moves `PackOp` and `UnPackOp` from the Tensor dialect to Linalg. This change
was discussed in the following RFC:
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg

This change involves significant churn but only relocates existing code - no new
functionality is added.

**Note for Downstream Users**
Downstream users must update references to `PackOp` and `UnPackOp` as follows:
  * Code: `s/tensor::(Up)PackOp/linalg::(Un)PackOp/g`
  * Tests: `s/tensor.(un)pack/linalg.(un)pack/g`

No other modifications should be required.
…compiler (llvm#127457)

This PR adds `cmd-options` to the `gpu-lower-to-nvvm-pipeline` pipeline
and the `nvvm-attach-target` pass, allowing users to pass flags to the
downstream compiler, *ptxas*.

Example:
```
mlir-opt -gpu-lower-to-nvvm-pipeline="cubin-chip=sm_80 ptxas-cmd-options='-v --register-usage-level=8'"
```
…lvm#126737)

The pattern was returning success() by default which made the greedy
pattern application act as if the IR was modified and even though
nothing was changed and thus it can prevent it from converging for no
legitimate reason.

The patch makes the rewrite pattern return failure() by default and
success() if and only if the IR changed.

An example of unexpected behavior is by running `mlir-opt input.mlir
--linalg-specialize-generic-ops`, we obtain an empty mlir as output with
`input.mlir` as follows:
```
#map = affine_map<(d0) -> (d0)>
func.func @f(%arg0: tensor<8xi32>, %arg1: tensor<8xi32>) -> tensor<8xi32> {
  %0 = tensor.empty() : tensor<8xi32>
  %1 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%arg0, %arg1: tensor<8xi32>, tensor<8xi32>) outs(%0: tensor<8xi32>) {
    ^bb0(%in: i32, %in_0: i32, %out: i32):
      %2 = arith.addi %in, %in_0: i32
      linalg.yield %2: i32
  } -> tensor<8xi32>
  return %1 : tensor<8xi32>
}
```
This patch adds support for describing per-write resource cycle counts
for ReadAdvance records via a new optional field called `tunables`.
    
This makes it possible to declare ReadAdvance records such as:
    
      def : ReadAdvance<Read_C, 1, [Write_A, Write_B], [2]>;
    
The above will effectively declare two entries in the ReadAdvance
table for Read_C, one for Write_A with a cycle count of 1+2, and one for
Write_B with a cycle count of 1+0 (omitted values are assumed 0).
    
The field `tunables` provides a list of deltas relative to the base
`cycle` count of the ReadAdvance. Since the field is optional and
defaults to a list of 0's, this change doesn't affect current targets.
…127096)

In soft floating-point ABI, this function takes the double argument as a
pair of registers r0 and r1.

The ordering of these two registers follow the endianness rules,
therefore the register on which the bit flipping must happen depends on
the endianness.
Under non-Windows platforms, also create a dynamic library version of
the runtime. Build of either version of the library can be switched on
using FLANG_RT_ENABLE_STATIC=ON respectively FLANG_RT_ENABLE_SHARED=ON.
Default is to build only the static library, consistent with previous
behaviour. This is because the way the flang driver invokes the linker,
most linkers choose the dynamic library by default, if available.
Building the dynamic library therefore causes flang-built executables to
depend on `libflang_rt.so`, unless explicitly told otherwise.
Exhaustively test makeExactICmpRegion by comparing makeAllowedICmpRegion
against makeSatisfyingICmpRegion for all APInts.
On z/OS, dlopen is guarded by _XOPEN_SOURCE=600 so define it when
checking for the symbol.
…lvm#127460)

This fixes a false positive caused by llvm#114044.

For `GSLPointer*` types, it's less clear whether the lifetime issue is
about the GSLPointer object itself or the owner it points to. To avoid
false positives, we take a conservative approach in our heuristic.

Fixes llvm#127195

(This will be backported to release 20).
Adds the equivalent watchOS and tvOS version checks to check for support
for aligned_alloc, we already have macOS and iOS checks.
…tor = ` (NFC)

We should avoid specifying it manually and instead rely on TableGen, see also
cleanups in llvm#127403
These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.

For example:

```
  void foo(volatile generic int* x) {
    __builtin_assume(is_shared(x));
    *x = 4;
  }

  void bar() {
    private int y;
    foo(&y); // violation, wrong address space
  }
```

This could produce a compile time backend error or not depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw which required inserting runtime address
space checks. The invalid cases are dynamically dead, we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.

This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d03708.
The `applyOpPatternsAndFold` is deprecated, use
`applyOpPatternsGreedily` instead.
Extends generic `loop` directive support by supporting the `bind`
clause. Since semantic checking does the heavy lifting of verifying the
proper usage of the clause modifier, we can simply enable code-gen for
`teams loop bind(...)` without the need to differentiate between the
values the the clause can accept.
…BLOCK streams (llvm#127049)

this PR close llvm#124474 
when calling `read` and `recv` function for a non-block file descriptor
or a invalid file descriptor(`-1`), it will not cause block inside a
critical section.
this commit checks for non-block file descriptor assigned by `open`
function with `O_NONBLOCK` flag.

---------

Co-authored-by: Balazs Benics <[email protected]>
This is necessary to enable composing subregisters in peephole-opt.
For now use a brute force table to find the return value. The worst
case target is AMDGPU with a 399 x 399 entry table.
After llvm#117558 landed, this code would assert "Value is not an N-bit
unsigned value" in getConstant(), from a test case in zig.

Co-authored-by:  Craig Topper <[email protected]>
Fixes llvm#127296
Update VPWidenPHIRecipe to use the predecessors in VPlan to determine
the incoming blocks instead of tracking them separately. This brings
VPWidenPHIRecipe in line with the other phi recipes.

PR: llvm#126388
…nctions (llvm#126958)

This is a fix for llvm#126949
There are two issues being fixed here.
First, in some cases, OMPIRBuilder generates empty target task proxy functions. This happens
when the target kernel doesn't use any stack-allocated data (either no data or only globals).

The second problem is encountered when the target task i.e the code that makes the target call
spans a single basic block. This usually happens when we do not generate a target or device kernel
launch and instead fall back to the host. In such cases, we end up not outlining the target task entirely. 
This can cause us to call target kernel twice - once via the target task proxy function and a second time
via the host fallback

This PR fixes both of these problems and updates some tests to catch these problems should this patch fail.
…END_VECTOR_INREG(y)) -> EXTEND_VECTOR_INREG(unpack(x,y)) (llvm#127502)

Concat/unpack the src subvectors together in the bottom 128-bit vector and then extend with a single EXTEND/EXTEND_VECTOR_INREG instruction

Required the getEXTEND_VECTOR_INREG helper to be tweaked to accept EXTEND_VECTOR_INREG opcodes as well to avoid us having to remap the opcode between both types.
svenvh and others added 30 commits February 18, 2025 10:24
The OpenCL C specification states that for out-of-range dimension
indices, `get_num_groups` must return 1 instead of 0.
… stack is enabled (llvm#127592)

The `-fcf-protection=[full|return]` flag enables shadow stack
implementation based on RISC-V Zicfiss extension. This patch adds the
`__riscv_shadow_stack` predefined macro to preprocessing when such a
shadow stack implementation is enabled.
Adds AVX512 bf16 conversion from packed f32 to bf16 elements.

Tests are slightly refactored to better follow file's convention.
…127129)

It does not change the estimate because getInstSizeInBytes() already
returns 0 for meta instructions, but added a test and early bail.
…lvm#127484)

It was also too permissive for a more general utilty, only return
the original immediate if there is no subregister.
)

Currently there are many casts that are not modeled (i.e. ignored) by
the analyzer, which can cause paradox states (e.g. negative value stored
in `unsigned` variable) and false positive reports from various
checkers, e.g. from `security.ArrayBound`.

Unfortunately this issue is deeply rooted in the architectural
limitations of the analyzer (if we started to model the casts, it would
break other things). For details see the umbrella ticket
llvm#39492

This commit adds an ugly hack in `security.ArrayBound` to silence most
of the false positives caused by this shortcoming of the engine.

Fixes llvm#126884
The stripGetElementPtr function is mysteriously named, and calls into
another mysterious getGEPInductionOperand which does something
complicated with GEP indices. The real purpose of the badly-named
stripGetElementPtr function is to get a loop-variant GEP index, if there
is one. The getGEPInductionOperand is totally redundant, as stripping
off zeros from the end of GEP indices has no effect on computing the
loop-variant GEP index, as constant zeros are always loop-invariant.
Moreover, the GEP induction operand is simply the first non-zero index
from the end, which stripGetElementPtr returns when it finds that any of
the GEP indices are loop-variant: this is a completely unrelated value
to the GEP index that is loop-variant. The implicit assumption here is
that there is only ever one loop-variant index, and it is the first
non-zero one from the end.

The logic is unnecessarily complicated for what stripGetElementPtr wants
to achieve, and the header comments are confusing as well. Strip
getGEPInductionOperand, rework and rename stripGetElementPtr.
See llvm#126411 / llvm#127055, the test isn't expected to fold in a single instcombine
iteration, needing instcombine->cse->instcombine.
…#127320)

* Remove `vector.create_mask` from tests. Instead, pass masks as
  arguments. This simplifies the tests without sacrificing test
  coverage.
* Update `@xfer_read_minor_identity_tranposed_with_mask_scalable` to use
  similar shapes as other tests and to avoid using test Ops (e.g.
  `@test.some_use`). This improves consistency between tests.
* Fix some comment typos.
… and SPIR-V friendly builtins for Image Read/Write instructions (llvm#127242)

This PR improves built-in variables and functions support:
* extends mapping from an OpenCL C built-in function to the SPIR-V
BuiltIn variables as in
https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Env.html#_built_in_variables,
and
* adds SPIR-V friendly builtins for Image Read/Write instructions.

Test cases are extended accordingly.
…ACTERS (llvm#126924)

`mbstate_t` needs to be visible to libcpp, even when it is not providing
wide
character functionality (i.e. `_LIBCPP_HAS_WIDE_CHARACTERS` is turned
off)
and thus not using any of the C library's wide character functions.

There are C libraries (such as newlib-nano/nanolib/picolibc) which do
provide their definition of `mbstate_t` in `<wchar.h>` even though they
do not
come with wide character functions.

Since there is a way to conditionally include the C library's
`<wchar.h>`
only if it exists, we should rely on the fact that if it exists, it will
provide `mbstate_t`. Removing this guard will allow using libc++ on top
of
newlib-nano/picolibc while not breaking the cases where it is used on
top
of a C library which doesn't provide `<wchar.h>` (since it would then
still
go look for `<uchar.h>` or error out).
…irectables.

Updates JITLinkRedirectableSymbolManager to take alias flags into account when
setting the scope and linkage of the created stubs (weak aliases get now get weak
linkage, hidden stubs get hidden visibility).

Updates lazyReexports to propagate alias flags (rather than trampoline flags)
when building the initial destinations map for the redirectable symbols manager.

Together these changes allow the LazyObjectLinkingLayer to link objects
containing weak and hidden symbols.
…17309)

Legacy pass used to provide the advisor, so this extracts that logic
into a provider class used by both analysis passes.

All three (Default, Release, Development) legacy passes
`*AdvisorAnalysis` are basically renamed to `*AdvisorProvider`, so the
actual legacy wrapper passes are `*AdvisorAnalysisLegacy`.

There is only one NPM analysis `RegAllocEvictionAnalysis` that switches
between the three providers in the `::run` method, to be cached by the
NPM.

Also adds `RequireAnalysis<RegAllocEvictionAnalysis>` to the optimized
target reg alloc codegen builder.
The BUILD file changes in llvm#127544 adds `LinalgInterfaces` which is incomplete without `LinalgDialect`.

For now, just add the `LinalgDialect` as dependency to tests which do not otherwise depend on it (but depend on `LinalgInterfaces` through e.g. `TensorDialect`).

This is a temporary solution until the dependency of `TensorDialect` is trimmed to just the `linalg::RelayoutOpInterface`, but not the other linalg interfaces. See llvm#127544 (review).
libclc uses llvm-link to link together all of the individually built
libclc builtins files into one module. Some of these builtins files are
compiled from source by clang whilst others are converted from LLVM IR
directly to bytecode.

When llvm-link links a 'source' module into a 'destination' module, it
warns if the two modules have differing data layouts.

The LLVM IR files libclc links either have no data layout (shared
submodule files) or an explicit data layout in the case of certain
amdgcn/r600 files.

The warnings are very noisy and largely inconsequential. We can suppress
them exploiting a specific behaviours exhibited by llvm-link. When the
destination module has no data layout, it is given the source module's
data layout. Thus, if we link together all IR files first, followed by
the clang-compiled modules, 99% of the warnings are suppressed as they
arose from linking an empty data layout into a non-empty one.

The remaining warnings came from the amdgcn and r600 targets. Some of
these were because the data layouts were out of date compared with what
clang currently produced, so those could have been updated.

However, even with those changes and by grouping the IR files together,
the linker may still link explicit data layouts with empty ones
depending on the order the IR files are processed.

As it happens, the data layouts aren't essential. With the changes to
the link line we can rely on those IR files receiving the correct data
layout from the clang-compiled modules later in the link line. This also
makes the previously AMDGPU-specific IR files available to be used by
all targets in a generic capacity in the future.
Although the operation is deprecated in the most recent version of the
SPIR-V spec, it is still used by older shaders, so having it defined is
valuable and incurs negligible maintenance overhead, due to op
simplicity.
…27594)

The same literal can be used multiple times in an instruction,
not just once. We were not tracking the used value to verify this,
so correct this.

This helps avoid regressions in a future patch.
…lvm#127600)

Following on from llvm#126737, adds a negative test that:
* prior to llvm#126737, would incorrectly generated empty output,
* with the fix in-tree, simply outputs the input IR (i.e. the
  specialization "fails").

I've also made minor editorial changes.
This patch continues the work that was started here
https://reviews.llvm.org/D99426 to correctly open text files in text
mode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.