[AutoBump] Merge with ff99af7e (Feb 20) (62) #606

Open · wants to merge 123 commits into base: bump_to_d57479cf

Conversation

jorickert

No description provided.

DavidGoldman and others added 30 commits February 19, 2025 13:21
- Support finding implementors of a protocol and discovering subclasses for ObjC interfaces via the implementations call
- Support jumping to the overridden method when you trigger goto definition on an override
- Properly find references to overridden methods
…inimum(maximum)_num (llvm#127711)

For targets that support IEEE fminimum_num/fmaximum_num, the
corresponding *_min_num_fXY/*_max_num_fXY instructions already
canonicalize their inputs, so we do not need to explicitly canonicalize
the inputs for fminnum/fmaxnum.
…`modulemap` (llvm#127839)

b41b86a added a new textual header
`clang/Lex/HLSLRootSignatureTokenKinds.def` but did not add it to
`clang`'s module map. This causes a build failure when building llvm
with `-DLLVM_ENABLE_MODULES=ON`. This PR adds the new textual header to
the module map and fixes the build break.

Fixing rdar://145148093.
Addresses llvm#125604 

- Implements `and` as an HLSL builtin function
- The `and` HLSL builtin function gets lowered to the LLVM `and`
instruction
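A rough, hypothetical sketch of what that lowering amounts to (the actual clang CodeGen for the builtin is not shown here; names are invented):

```cpp
#include "llvm/IR/IRBuilder.h"

// Hypothetical helper: a boolean HLSL `and` becomes a plain LLVM `and`
// instruction on the (already-emitted) operand values.
llvm::Value *emitHLSLAnd(llvm::IRBuilder<> &B, llvm::Value *L, llvm::Value *R) {
  return B.CreateAnd(L, R, "hlsl.and");
}
```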
…store address. (llvm#127151)

SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.
    
One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array, then two different SHXADD instructions will be used to
add the scaled indices. Finally, each SHXADD result will be used by a
different load instruction. We can remove the ADDI by folding the
offset into each load.
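A hypothetical C++ rendering of that pattern (names invented for illustration):

```cpp
// Two loads from the same in-struct array with different indices. The
// +16 struct offset is materialized once by an ADDI that both scaled
// index computations (SHXADD) share, so isel alone cannot fold it into
// the loads.
struct S {
  long pad[2];  // &s->arr is s + 16; the ADDI materializes this offset
  long arr[64];
};

long sum(const S *s, long i, long j) {
  // base = ADDI s, 16; sh3add scales i and j; the two loads then use
  // the SHXADD results as addresses.
  return s->arr[i] + s->arr[j];
}
```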
    
This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.
    
This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.

This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.
…llvm#126876)

Using the LLVM build itself for PGO training is convenient and a great
starting point, but it also has several issues:

* The LLVM build implicitly depends on tools other than CMake and the
C/C++ compiler, and if those tools aren't available in PATH, the build
will fail.
* The LLVM build also requires standard headers and libraries, which may
not always be available in the default location, requiring an explicit
sysroot.
* Building a single configuration (-DCMAKE_BUILD_TYPE=Release) only
exercises the -O3 pipeline and can pessimize other configurations.
* Building for the host target doesn't exercise the other targets.
* Since LLVMSupport is a static library, this doesn't exercise the
linker (beyond what CMake itself does).

Rather than using the LLVM build, ideally we would provide a more
minimal, purpose-built corpus. While we work on building such a corpus,
provide a CMake option that lets vendors disable the use of the LLVM
build for PGO training.
…27844)

There was a discrepancy between the type-converter and rewrite-pattern
parts of conversion to LLVM used in various GPU targets, at least ROCDL
and NVVM:
- The TypeConverter part was handling vectors of arbitrary rank,
converting them to nests of `!llvm.array< ... >` with a vector at the
inner-most dimension:
https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp#L629-L655
- The rewrite pattern part was not handling `llvm.array`:
https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp#L594-L596

That led to conversion failures when lowering `math` dialect ops on
rank-2 vectors, as in the testcase being added in this PR.

This PR fixes this by reusing a shared utility already used in other
conversions to LLVM:

https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/LLVMCommon/VectorPattern.cpp#L80-L104

---------

Signed-off-by: Benoit Jacob <[email protected]>
This patch implements a new subclass of the Value class used for Sandbox
IR Values that we don't support, like metadata or inline asm. The goal
is to never have null sandboxir::Value objects, because this is not the
expected behavior.
…to load/store address. (llvm#127151)"

This reverts commit c3ebbfd.

Seeing some test failures on the build bot.
…ic dims (llvm#118208)

This PR fixes how the iteration domain of linalg.generic is collapsed
when fusing with tensor.expand_shape. Previously, the output_shape for
tensor.expand_shape was inferred, which only works in some special
cases.

This patch makes the logic explicitly set the bounds of the new
collapsed iteration domain, because we already know them.

---------

Co-authored-by: Jakub Kuderski <[email protected]>
…into load/store address. (llvm#127151)"

Tests have been re-generated with recent scheduler changes.

Original message:

SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.

One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array, then two different SHXADD instructions will be used to
add the scaled indices. Finally, each SHXADD result will be used by a
different load instruction. We can remove the ADDI by folding the
offset into each load.

This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.

This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.

This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.
…s. NFC (llvm#127678)

In removePartiallyOverlappedStores we iterate over
InstOverlapIntervalsTy, which is a DenseMap. Change that map to a
MapVector to ensure that we apply the transforms in a deterministic
order. I've only seen the order matter when using names for the
instructions created by the transforms, but such nondeterminism is
annoying when debugging etc.
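A minimal standalone sketch of the ordering difference, assuming the LLVM ADT headers are available (the DSE code maps instructions to overlap intervals rather than ints):

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"
#include <cstdio>

int main() {
  llvm::DenseMap<int, int> Unordered; // iteration order is unspecified
  llvm::MapVector<int, int> Ordered;  // iterates in insertion order
  for (int K : {3, 1, 2}) {
    Unordered[K] = K;
    Ordered[K] = K;
  }
  for (auto &KV : Ordered) // deterministically prints 3, 1, 2
    std::printf("%d\n", KV.first);
}
```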
…s. NFC (llvm#127848)

This is only used to get the Module and the LLVMContext. We can get both
of those from the GlobalVariable*.
In order to facilitate cloning of recursive cycles, we first identify
backedges using a standard DFS search from the root callers, then
initially defer recursively invoking the cloning function via those
edges. This is because the cloning opportunity along a backedge may
not be exposed until the current node is cloned for other, non-backedge
callers that are cold after the earlier recursive cloning, resulting
in a cold predecessor of the backedge. So we recursively invoke the
cloning function for the backedges during the cloning of the current
node for its caller edges (which were sorted to handle cold callers
first).

There was no significant time or memory overhead measured for several
large applications.
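A generic sketch of the backedge-identification step, using an invented toy node type rather than MemProf's actual context call graph:

```cpp
#include <utility>
#include <vector>

struct Node {
  enum Color { White, Grey, Black };
  Color State = White;
  std::vector<Node *> Callees;
};

// Standard DFS back-edge detection: an edge into a node that is still
// on the DFS stack (Grey) closes a cycle. Cloning along such edges is
// what gets deferred initially.
void findBackedges(Node *N,
                   std::vector<std::pair<Node *, Node *>> &Backedges) {
  N->State = Node::Grey;
  for (Node *Callee : N->Callees) {
    if (Callee->State == Node::Grey)
      Backedges.push_back({N, Callee});
    else if (Callee->State == Node::White)
      findBackedges(Callee, Backedges);
  }
  N->State = Node::Black;
}
```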
…rnal() (llvm#127891)

Move the most common if statement to the top and the least common ones
to the bottom. This should save CPU cycles during compilation.

This patch also prefixes the LLVM variables with an `LLVM` prefix to
make the naming convention in this function more uniform, e.g. `C`
becomes `LLVMC`.
…ariable checkers. (llvm#127570)

Like a C++ member variable, every Objective-C++ instance variable must
be a RefPtr, Ref, CheckedPtr, or CheckedRef to an object, not a raw
pointer or reference.
…122951)

Add the option `AllowedTypes`, which allows users to specify types they
want to exclude from the const-correctness check.

Small real-world example:
```cpp
#include <mutex>

int main() {
  std::mutex m;
  std::lock_guard<std::mutex> l(m); // we want to ignore it since std::lock_guard is already immutable.
}
```

Closes issue llvm#122592
…116465)

This patch adds codegen for all Armv9.6-A compare-and-branch
instructions that operate on full w or x registers. The instruction
variants operating on half-words (cbh) and bytes (cbb) are added in a
subsequent patch.

Since CB doesn't use the standard 4-bit Arm condition codes but a
reduced set of conditions encoded in 3 bits, some conditions are
expressed by modifying operands, namely incrementing or decrementing
immediate operands and swapping register operands. To invert a CB
instruction it's therefore not enough to just modify the condition
code, which doesn't play particularly well with how the backend is
currently organized. We therefore introduce a number of pseudos which
operate on the standard 4-bit condition codes and lower them late
during codegen.
This patch fixes:

  llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:3409:8:
  error: unused variable 'I' [-Werror,-Wunused-variable]
)

Add support for sm101 and sm120 target architectures. It requires CUDA
12.8.

---------

Co-authored-by: Sebastian Jodlowski <[email protected]>
Summary:
The scan operation implemented here only works if there are contiguous
ones in the execution mask that can be used to propagate the result.
There are two solutions to this: one is to enter 'whole-wave-mode' and
forcibly turn the lanes back on; the other is to do this serially. This
implementation does the latter because it's more portable, but it
checks whether the parallel fast path is applicable.

Needs to be backported for correct behavior and because it fixes a
failing libc test.
It looks like llvm#124936 was reverted
(for modifying JSON output), but the test for JSON output with errors
was deleted in llvm#126587 (in an
attempt to fix a failing build).
This adds back that test along with a new one for llvm-dwarfdump to
validate the JSON for errors: one case where the sub-categories will
eventually appear and another where they will not.

test plan:
ninja check-llvm-tools-llvm-dwarfdump
lukel97 and others added 30 commits February 20, 2025 20:17
…llvm#127972)

checkUsers currently does two things: (a) work out the minimum VL read
by every user, and (b) check that the operand info of the MI and its
users match.

getMinimumVLForUser handles most of (a), with the exception of the
check for instructions that read past VL, e.g. vrgather, which is still
in checkUsers.

This moves it into getMinimumVLForUser to keep all that logic in one
place and simplifies an upcoming patch.
The SDAG version uses fminnum/fmaxnum; in converting it to fcmp+select,
it appears the order of the operands was chosen badly. This switches
the conditions used to keep the constant on the RHS.
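As a source-level analogy (not the GlobalISel code itself), the compare+select form being preserved keeps the constant on the right-hand side of the comparison:

```cpp
// fminnum(x, 5.0f) as compare+select: the constant stays on the RHS of
// the comparison, i.e. `x < C ? x : C`.
float fminWithConstant(float X) {
  return X < 5.0f ? X : 5.0f;
}
```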
I suspect it was fixed by llvm#127059. aarch64 is the only Windows bot we have now, so we can't be certain it's fixed everywhere, but I also have no reason to believe otherwise.

Fixes llvm#43774.
…lvm#126751)

Fold:

    (select (not m),  1, 0) -> (select m, 0,  1)
    (select (not m), -1, 0) -> (select m, 0, -1)
Fixes llvm#127553.
x86_64 failed to run readcyclecounter.ll when expensive checks were
enabled, erroring with "Using an undefined physical register".
This patch adds benchmarks for the copy family of algorithms (copy,
copy_n, copy_if, copy_backward).
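A minimal sketch of what one such benchmark might look like, assuming the Google Benchmark framework that libc++'s benchmarks build on (names invented):

```cpp
#include <algorithm>
#include <vector>
#include <benchmark/benchmark.h>

// Times std::copy over a range of input sizes; DoNotOptimize keeps the
// copy from being optimized away.
static void BM_copy(benchmark::State &state) {
  std::vector<int> Src(state.range(0), 42);
  std::vector<int> Dst(Src.size());
  for (auto _ : state) {
    std::copy(Src.begin(), Src.end(), Dst.begin());
    benchmark::DoNotOptimize(Dst.data());
  }
}
BENCHMARK(BM_copy)->Range(8, 1 << 16);
BENCHMARK_MAIN();
```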
…tem from the headers (llvm#117764)

Many parts of the locale base API are only required when building the
shared/static library, but not from the headers. Document those
functions and carve out a few of those that don't work when
_XOPEN_SOURCE is defined to something old.

Fixes llvm#117630
…o find additional (free) extract_subvector nodes
This patch adds the -fvectorize and -fno-vectorize flags to flang. 

Note that this also changes the behaviour of `flang -fc1` to match that
of `clang -cc1`, which is that vectorization is only enabled in the
presence of the `-vectorize-loops` flag.

Additionally, this patch changes the behaviour of the default
optimisation levels to match clang, such that vectorization only happens
at the same levels as it does there.

This patch is in draft while I write an RFC to discuss the above two
changes.
Change getDependenceDistanceStrideAndSize to scale strides by
TypeByteSize, scaling the returned CommonStride and MaxStride. Even
though there is a seemingly-functional change of setting CommonStride
when scaled strides are equal, it ends up being a non-functional change
due to aggressive HasSameSize checking.
Several users of (mostly math/) gentype.inc rely on types other than
the 'gentype'. This is commonly intN, as several math builtins expose
it as a return or parameter type. We were previously explicitly
defining this type for every gentype.

Other implementations rely on integer types of the same size and element
width as the gentype, such as short/ushort for half, long/ulong for
double, etc.

Users might also rely on as_type or convert_type builtins to/from these
types.

The previous method we used to define intN was unscalable if we wanted
to expose more types and helpers.

This commit introduces a simpler system whereby several macros are
defined at the beginning of gentype.inc. These rely on concatenating
with the vector size. To facilitate this system, scalar gentypes now
define an empty vector size. It was previously undefined, which was
dangerous. An added benefit is that it matches how the integer
gentype.inc vector size has been working.

These macros will be especially helpful for the definitions of
logb/ilogb in an upcoming patch.
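A sketch of the concatenation scheme (macro names here are illustrative, not the exact gentype.inc spellings):

```cpp
// Token-pasting helpers; the indirection lets __CLC_VECSIZE expand
// before pasting.
#define __CLC_XCONCAT(a, b) a##b
#define __CLC_CONCAT(a, b) __CLC_XCONCAT(a, b)

// For a vector gentype such as float4:
#define __CLC_VECSIZE 4
// For scalar gentypes the macro is defined but empty, so the pasting
// below yields the plain scalar type name instead.

// One definition now covers every gentype: int4, short4, ... for
// vectors; int, short, ... for scalars.
#define __CLC_INTN __CLC_CONCAT(int, __CLC_VECSIZE)     /* -> int4 */
#define __CLC_SHORTN __CLC_CONCAT(short, __CLC_VECSIZE) /* -> short4 */
```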
If we have a shuffle mask which can be represented as two slides + some
conditional masking, we can emit a VLA sequence which is at most
O(2*LMUL). This is essentially a generalization of the existing
isElementRotate, but is staged to only introduce the new match for the
moment. A follow-up change will start consolidating code - see the
notes below.

A couple of notes:
1) I'm excluding bit rotates mostly to keep the diffs manageable. 
2) The existing isElementRotate logic is nearly redundant after this
   change.  However, we have some intersection between the bit rotate
   and element rotate matching.  To keep things simple, I left that in
   place for now, and will merge/cleanup in a separate change.
3) The individual asVSlideup and asVSlidedown are closely related, but
the former looks through extracts and the latter changes VL. I'm
leaving these in place for now, but hope to common them up a bit as
well.
…l is UB (llvm#127979)

Proof: https://alive2.llvm.org/ce/z/mzVj-u
I will add some follow-up patches to avoid duplicate code, support more
memory instructions, and bypass gep instructions.
The assertion was left over from a time when VPBBs still had an
associated condition bit. This is not the case any more (the comment
was stale). In case a branch on condition is needed, a BranchOnCond
VPInstruction is added when constructing recipes; that is also where it
is checked whether the condition is available.

Exposed by 38376de.
…127907)

Add a facility to check whether there is any intersection between two
sets. This will be used in some follow-on changes to MemProf.
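A minimal sketch of the idea over std::set (the actual facility is an LLVM ADT helper; the point is bailing out at the first shared element rather than materializing an intersection):

```cpp
#include <set>

// Returns true as soon as any element is common to both sets. Probing
// with the smaller set keeps the number of lookups down.
template <typename T>
bool setsIntersect(const std::set<T> &A, const std::set<T> &B) {
  const std::set<T> &Small = A.size() <= B.size() ? A : B;
  const std::set<T> &Large = A.size() <= B.size() ? B : A;
  for (const T &E : Small)
    if (Large.count(E))
      return true;
  return false;
}
```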
PR llvm#123902 broke the Python
bindings for `tensor.pack`/`unpack`. This PR fixes that. It also

1. adds convenience wrappers for pack/unpack
2. cleans up matmul-like ops in the linalg bindings
3. fixes linalg docs missing pack/unpack
This was accidentally introduced in llvm#126876.
…sters (llvm#124475)

Extending the conditionals in `AugmentRegisterInfo` to support
alternative names for lldb.

Fixes llvm#124023

There is an exception for register `X8`, which is not covered here;
more details can be found in issue
llvm#127900.
…lvm#127922)

Recent versions of the system changed where these functions live.
This patch fixes:

  llvm/lib/Target/RISCV/RISCVISelLowering.cpp:5723:24: error: captured
  structured bindings are a C++20 extension
  [-Werror,-Wc++20-extensions]

  llvm/lib/Target/RISCV/RISCVISelLowering.cpp:5728:76: error: captured
  structured bindings are a C++20 extension
  [-Werror,-Wc++20-extensions]
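For context, a hypothetical reproduction of the diagnostic and the usual C++17-safe fix:

```cpp
#include <utility>

int example(std::pair<int, int> P) {
  auto [First, Second] = P;
  // Capturing a structured binding is a C++20 extension, so this trips
  // -Werror,-Wc++20-extensions in a C++17 build:
  //   auto Bad = [First] { return First; };
  int FirstCopy = First; // copy into a named local first
  auto Good = [FirstCopy] { return FirstCopy; };
  return Good() + Second;
}
```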
…NFC (llvm#127968)

Use nonstatic member instead. This requires explicit conversions, but
many will go away as we continue converting unsigned to Register.

In a few places where it was simple, I changed unsigned to Register.