forked from llvm/llvm-project
[AutoBump] Merge with ff99af7e (Feb 20) (62) #606
Open: jorickert wants to merge 123 commits into bump_to_d57479cf from bump_to_ff99af7e
- Support finding implementors of a protocol and discovering subclasses for ObjC interfaces via the implementations call
- Support jumping to the overridden method when you trigger goto definition on an override
- Properly find references to overridden methods
…inimum(maximum)_num (llvm#127711) For targets that support IEEE fminimum_num/fmaximum_num, the corresponding *_min_num_fXY/*_max_num_fXY instructions already canonicalize their inputs themselves. As a result, we do not need to explicitly canonicalize the inputs for fminnum/fmaxnum.
…`modulemap` (llvm#127839) b41b86a added a new textual header `clang/Lex/HLSLRootSignatureTokenKinds.def` but did not add it to `clang`'s module map. This causes a build failure when building LLVM with `-DLLVM_ENABLE_MODULES=ON`. This PR adds the new textual header to the module map and fixes the build break. Fixing rdar://145148093.
Addresses llvm#125604
- Implements `and` as an HLSL builtin function
- The `and` HLSL builtin function gets lowered to the LLVM `and` instruction
) This is similar to llvm#125748
…store address. (llvm#127151) SelectionDAG will not reassociate adds to the end of a chain if there are multiple users of later additions. This prevents isel from folding the immediate into a load/store address. One easy way to see this is accessing an array in a struct with two different indices. An ADDI will be used to get to the start of the array then 2 different SHXADD instructions will be used to add the scaled indices. Finally the SHXADD will be used by different load instructions. We can remove the ADDI by folding the offset into each load. This patch adds a new pass that analyzes how an ADDI constant propagates through address arithmetic. If the arithmetic is only used by a load/store and the offset is small enough, we can adjust the load/store offset and remove the ADDI. This pass is placed before MachineCSE to allow cleanups if some instructions become common after removing offsets from their inputs. This pass gives ~3% improvement on dynamic instruction count on 541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's a ~1% improvement on 557.xz_r.
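As a rough illustration of the legality check described above, here is a toy model of the fold. The `MemAccess` struct and `foldAddiIntoMemOffsets` function are hypothetical and are not the pass's code; the model ignores everything a real MachineFunction pass must handle (other users of the intermediate SHXADDs, multiple definitions, MachineCSE interaction). It assumes only that RISC-V load/store offsets are 12-bit signed immediates and that `(base + C) + (idx << k)` equals `base + (idx << k) + C`, so the constant can move into each memory instruction's offset.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified user of an ADDI result (possibly reached through
// SHXADD-style adds): either a load/store with its own immediate offset,
// or some other instruction that blocks the fold.
struct MemAccess {
  bool IsLoadOrStore; // false for any non-memory user
  int64_t Offset;     // current immediate offset of the load/store
};

// RISC-V load/store offsets are 12-bit signed immediates: [-2048, 2047].
inline bool fitsInSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

// Try to remove an ADDI with constant `Imm` by adding `Imm` into every
// user's offset. Returns false and leaves `Users` unchanged if any user is
// not a load/store or any folded offset would not fit in 12 bits.
bool foldAddiIntoMemOffsets(int64_t Imm, std::vector<MemAccess> &Users) {
  for (const MemAccess &U : Users)
    if (!U.IsLoadOrStore || !fitsInSImm12(U.Offset + Imm))
      return false;
  for (MemAccess &U : Users)
    U.Offset += Imm;
  return true;
}
```

Checking every user before mutating any of them keeps the transformation all-or-nothing, which mirrors the "only used by a load/store and the offset is small enough" condition in the commit message.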
…llvm#126876) Using the LLVM build itself for PGO training is convenient and a great starting point, but it also has several issues:
* The LLVM build implicitly depends on tools other than CMake and a C/C++ compiler, and if those tools aren't available in PATH, the build will fail.
* The LLVM build also requires standard headers and libraries, which may not always be available in the default location, requiring an explicit sysroot.
* Building a single configuration (-DCMAKE_BUILD_TYPE=Release) only exercises the -O3 pipeline and can pessimize other configurations.
* Building for the host target doesn't exercise all other targets.
* Since LLVMSupport is a static library, this doesn't exercise the linker (beyond what CMake itself does).
Rather than using the LLVM build, ideally we would provide a more minimal, purpose-built corpus. While we're working on building such a corpus, provide a CMake option that lets vendors disable the use of the LLVM build for PGO training.
…27844) There was a discrepancy between the type-converter and rewrite-pattern parts of conversion to LLVM used in various GPU targets, at least ROCDL and NVVM:
- The TypeConverter part was handling vectors of arbitrary rank, converting them to nests of `!llvm.array< ... >` with a vector at the innermost dimension: https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp#L629-L655
- The rewrite-pattern part was not handling `llvm.array`: https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp#L594-L596
That led to conversion failures when lowering `math` dialect ops on rank-2 vectors, as in the testcase being added in this PR. This PR fixes this by reusing a shared utility already used in other conversions to LLVM: https://github.com/llvm/llvm-project/blob/8337d01e3058e7f47675f5b2b908b4e7821895d7/mlir/lib/Conversion/LLVMCommon/VectorPattern.cpp#L80-L104
--------- Signed-off-by: Benoit Jacob
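To make the TypeConverter behavior described above concrete, the sketch below builds the textual type that a rank-N vector is converted to: every outer dimension becomes a nested array and only the innermost dimension stays a vector. `convertVectorType` is a hypothetical string-level helper, not MLIR's TypeConverter API, and the printed syntax is an approximation of how the LLVM dialect prints nested types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: textual LLVM-dialect type for a vector of the given
// shape. A rank-1 vector stays a builtin vector; outer dimensions of a
// higher-rank vector are wrapped, innermost first, in array<...> types.
std::string convertVectorType(const std::vector<int64_t> &Shape,
                              const std::string &ElemTy) {
  assert(!Shape.empty() && "expected a vector of rank >= 1");
  // Innermost dimension remains a vector.
  std::string Ty =
      "vector<" + std::to_string(Shape.back()) + "x" + ElemTy + ">";
  if (Shape.size() == 1)
    return Ty;
  // Walk the outer dimensions from the inside out.
  for (size_t I = Shape.size() - 1; I-- > 0;)
    Ty = "array<" + std::to_string(Shape[I]) + " x " + Ty + ">";
  return "!llvm." + Ty;
}
```

For example, a `vector<2x3xf32>` maps to `!llvm.array<2 x vector<3xf32>>`. A rewrite pattern that only accepts vector operands has no way to handle such an array, which is the mismatch the commit describes.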
This patch implements a new subclass of the Value class used for Sandbox IR Values that we don't support, like metadata or inline asm. The goal is to never have null sandboxir::Value objects, because this is not the expected behavior.
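The "never return a null Value" idea is essentially the null-object pattern. A minimal sketch, in which every name (`Value`, `UnsupportedValue`, `wrap`) is hypothetical and not the Sandbox IR API: anything the wrapper cannot model gets a dedicated subclass instead of `nullptr`, so callers can always dereference the result and dispatch on its kind.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class standing in for a wrapper-IR value.
class Value {
public:
  enum class Kind { Instruction, Argument, Unsupported };
  explicit Value(Kind K) : K(K) {}
  virtual ~Value() = default;
  Kind getKind() const { return K; }

private:
  Kind K;
};

// Hypothetical subclass for underlying values we don't model
// (e.g. metadata or inline asm); it replaces what would otherwise be null.
class UnsupportedValue final : public Value {
public:
  explicit UnsupportedValue(std::string Desc)
      : Value(Kind::Unsupported), Desc(std::move(Desc)) {}
  const std::string &getDescription() const { return Desc; }

private:
  std::string Desc;
};

// Hypothetical factory: never returns nullptr.
std::unique_ptr<Value> wrap(const std::string &UnderlyingKind) {
  if (UnderlyingKind == "instruction")
    return std::make_unique<Value>(Value::Kind::Instruction);
  if (UnderlyingKind == "argument")
    return std::make_unique<Value>(Value::Kind::Argument);
  return std::make_unique<UnsupportedValue>(UnderlyingKind);
}
```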
…to load/store address. (llvm#127151)" This reverts commit c3ebbfd. Seeing some test failures on the build bot.
…ic dims (llvm#118208) This PR fixes how the iteration domain of linalg.generic is collapsed when fusing with tensor.expand_shape. Previously, the output_shape for tensor.expand_shape was inferred, which only works in some special cases. This patch makes the logic explicitly set the bounds of the new collapsed iteration domain, because we already know them. --------- Co-authored-by: Jakub Kuderski
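When the bounds are known, setting them explicitly amounts to taking one product per reassociation group. The helper below is a hypothetical sketch on plain integers; in the actual MLIR code the bounds may be dynamic, so the product would be materialized as IR rather than computed as a constant.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: given the bounds of the original iteration domain
// and a reassociation grouping contiguous loop dimensions, compute the
// bounds of the collapsed domain. Each collapsed dimension's bound is the
// product of the bounds in its group, so nothing has to be inferred from
// an expand_shape's output_shape.
std::vector<int64_t>
collapseBounds(const std::vector<int64_t> &Bounds,
               const std::vector<std::vector<int64_t>> &Reassociation) {
  std::vector<int64_t> Collapsed;
  for (const auto &Group : Reassociation) {
    int64_t Product = 1;
    for (int64_t Dim : Group)
      Product *= Bounds[Dim];
    Collapsed.push_back(Product);
  }
  return Collapsed;
}
```

For a domain `[2, 3, 4]` with reassociation `[[0, 1], [2]]`, the collapsed domain is `[6, 4]`.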
…into load/store address. (llvm#127151)" Tests have been re-generated with recent scheduler changes. Original message: SelectionDAG will not reassociate adds to the end of a chain if there are multiple users of later additions. This prevents isel from folding the immediate into a load/store address. One easy way to see this is accessing an array in a struct with two different indices. An ADDI will be used to get to the start of the array then 2 different SHXADD instructions will be used to add the scaled indices. Finally the SHXADD will be used by different load instructions. We can remove the ADDI by folding the offset into each load. This patch adds a new pass that analyzes how an ADDI constant propagates through address arithmetic. If the arithmetic is only used by a load/store and the offset is small enough, we can adjust the load/store offset and remove the ADDI. This pass is placed before MachineCSE to allow cleanups if some instructions become common after removing offsets from their inputs. This pass gives ~3% improvement on dynamic instruction count on 541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's a ~1% improvement on 557.xz_r.
…s. NFC (llvm#127678) In removePartiallyOverlappedStores we iterate over InstOverlapIntervalsTy, which is a DenseMap. Change that map to a MapVector to ensure that we apply the transforms in a deterministic order. I've only seen the order matter when starting to use names for the instructions created by the transforms, but such nondeterminism is a bit annoying when debugging.
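The reason a MapVector gives a deterministic order is that it pairs a hash map (for lookup) with a vector (for iteration in insertion order). The class below is a minimal hypothetical sketch of that idea, not `llvm::MapVector`'s real implementation. Iteration order no longer depends on hash values or pointer addresses, only on the order in which keys were first inserted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical insertion-ordered map: O(1) expected lookup through the
// index, deterministic iteration through the entries vector.
template <typename K, typename V> class InsertionOrderedMap {
public:
  V &operator[](const K &Key) {
    auto [It, Inserted] = Index.try_emplace(Key, Entries.size());
    if (Inserted)
      Entries.emplace_back(Key, V());
    return Entries[It->second].second;
  }
  auto begin() { return Entries.begin(); }
  auto end() { return Entries.end(); }

private:
  std::unordered_map<K, std::size_t> Index; // key -> position in Entries
  std::vector<std::pair<K, V>> Entries;     // insertion-ordered storage
};
```

The price is extra memory and more expensive erasure, which is acceptable here since the map is only built and then iterated.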
…s. NFC (llvm#127848) This is only used to get the Module and the LLVMContext. We can get both of those from the GlobalVariable*.
In order to facilitate cloning of recursive cycles, we first identify backedges using a standard DFS search from the root callers, then initially defer recursively invoking the cloning function via those edges. This is because the cloning opportunity along the backedge may not be exposed until the current node is cloned for other non-backedge callers that are cold after the earlier recursive cloning, resulting in a cold predecessor of the backedge. So we recursively invoke the cloning function for the backedges during the cloning of the current node for its caller edges (which were sorted to enable handling cold callers first). There was no significant time or memory overhead measured for several large applications.
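The "standard DFS search" for backedges can be sketched on a plain adjacency-list graph as follows. The names are hypothetical and this is not the MemProf context-graph code. It shows only the classic three-state DFS (unvisited, on the stack, finished), in which an edge to a node that is still on the stack closes a cycle and is therefore a backedge.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<int>>; // node -> successor nodes

// State: 0 = unvisited, 1 = on the DFS stack, 2 = finished.
static void dfs(const Graph &G, int U, std::vector<int> &State,
                std::set<std::pair<int, int>> &Backedges) {
  State[U] = 1;
  for (int V : G[U]) {
    if (State[V] == 1)
      Backedges.insert({U, V}); // V is an ancestor of U: edge closes a cycle
    else if (State[V] == 0)
      dfs(G, V, State, Backedges);
  }
  State[U] = 2;
}

// Runs the DFS from each root (e.g. the root callers) and collects backedges.
std::set<std::pair<int, int>> findBackedges(const Graph &G,
                                            const std::vector<int> &Roots) {
  std::vector<int> State(G.size(), 0);
  std::set<std::pair<int, int>> Backedges;
  for (int R : Roots)
    if (State[R] == 0)
      dfs(G, R, State, Backedges);
  return Backedges;
}
```

Once backedges are known, a cloning routine can skip them on the first pass and revisit them later, which is the deferral the commit message describes.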
…rnal() (llvm#127891) Move the most common if statement to the top and the least common ones to the bottom. This should save CPU cycles during compilation. This patch also prefixes the LLVM variables with `LLVM` to make the naming convention in this function more uniform, for example renaming `C` to `LLVMC`.
…ariable checkers. (llvm#127570) Like a C++ member variable, every Objective-C++ instance variable must be a RefPtr, Ref, CheckedPtr, or CheckedRef to an object, not a raw pointer or reference.
…122951) Add option `AllowedTypes`, which allows users to specify types they want to exclude from the const-correctness check. Small real-world example:
```cpp
#include <mutex>

int main() {
  std::mutex m;
  std::lock_guard<std::mutex> l(m); // we want to ignore it since std::lock_guard is already immutable.
}
```
Closes issue llvm#122592
…116465) This patch adds codegen for all Armv9.6-A compare-and-branch instructions that operate on full W or X registers. The instruction variants operating on half-words (cbh) and bytes (cbb) are added in a subsequent patch. Since CB doesn't use the standard 4-bit Arm condition codes but a reduced set of conditions encoded in 3 bits, some conditions are expressed by modifying operands, namely incrementing or decrementing immediate operands and swapping register operands. To invert a CB instruction it's therefore not enough to just modify the condition code, which doesn't play particularly well with how the backend is currently organized. We therefore introduce a number of pseudos which operate on the standard 4-bit condition codes and lower them late during codegen.
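The operand-adjustment trick can be illustrated with a toy model. This sketch assumes a hypothetical instruction that encodes only EQ, NE, GT, and LT with an immediate in a hypothetical range [0, 63]; the real CB condition set and immediate range differ. It shows two rewrites. First, `x >= imm` becomes `x > imm - 1` and `x <= imm` becomes `x < imm + 1`, failing if the adjusted immediate leaves the encodable range. Second, for register-register compares, swapping the operands mirrors the condition.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical condition set; GE and LE are assumed not directly encodable.
enum class Cond { EQ, NE, GT, LT, GE, LE };

struct CmpImm {
  Cond C;
  int64_t Imm;
};

// Hypothetical encodable immediate range.
constexpr int64_t kMinImm = 0, kMaxImm = 63;

// Rewrite `x <C> Imm` to use only EQ/NE/GT/LT by adjusting the immediate.
// Returns std::nullopt if the adjusted immediate is not encodable.
std::optional<CmpImm> legalizeImmCompare(CmpImm In) {
  CmpImm Out = In;
  if (In.C == Cond::GE)      // x >= Imm  <=>  x > Imm - 1
    Out = {Cond::GT, In.Imm - 1};
  else if (In.C == Cond::LE) // x <= Imm  <=>  x < Imm + 1
    Out = {Cond::LT, In.Imm + 1};
  if (Out.Imm < kMinImm || Out.Imm > kMaxImm)
    return std::nullopt;
  return Out;
}

// Register-register form: swapping operands mirrors the condition,
// e.g. a < b <=> b > a and a <= b <=> b >= a.
Cond mirror(Cond C) {
  switch (C) {
  case Cond::GT: return Cond::LT;
  case Cond::LT: return Cond::GT;
  case Cond::GE: return Cond::LE;
  case Cond::LE: return Cond::GE;
  default: return C; // EQ and NE are symmetric
  }
}

// Reference semantics used to check that rewrites preserve meaning.
bool evalCompare(Cond C, int64_t X, int64_t Y) {
  switch (C) {
  case Cond::EQ: return X == Y;
  case Cond::NE: return X != Y;
  case Cond::GT: return X > Y;
  case Cond::LT: return X < Y;
  case Cond::GE: return X >= Y;
  case Cond::LE: return X <= Y;
  }
  return false;
}
```

Because the encoded immediate has been adjusted, inverting such a branch means recomputing both the condition and the operand. That explains why the patch keeps standard 4-bit condition codes in pseudos and lowers them late.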
Fixes llvm#126603 --------- Signed-off-by: ZakyHermawan
This patch fixes: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:3409:8: error: unused variable 'I' [-Werror,-Wunused-variable]
) Add support for sm101 and sm120 target architectures. It requires CUDA 12.8. --------- Co-authored-by: Sebastian Jodlowski
Summary: The scan operation implemented here only works if there are contiguous ones in the execution mask that can be used to propagate the result. There are two solutions to this: one is to enter 'whole-wave-mode' and forcibly turn the inactive lanes back on, the other is to do the scan serially. This implementation does the latter because it's more portable, but checks whether the parallel fast path is applicable. Needs to be backported for correct behavior and because it fixes a failing libc test.
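Both halves of that logic can be modeled on the host. The sketch uses hypothetical names and is not the libc GPU code. `isContiguousMask` tests whether the active lanes form one unbroken run, which is the precondition for the parallel fast path. `serialInclusiveScan` is the portable fallback that walks the active lanes in order. The fast path itself is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if the set bits of a non-zero lane mask form one contiguous run.
// Dividing by the lowest set bit (Mask & -Mask) shifts the run down to
// bit 0; a contiguous run then has the form 2^k - 1.
bool isContiguousMask(uint64_t Mask) {
  if (Mask == 0)
    return false;
  uint64_t Shifted = Mask / (Mask & (~Mask + 1));
  return (Shifted & (Shifted + 1)) == 0;
}

// Serial fallback: inclusive prefix sum over the active lanes only, in lane
// order. Inactive lanes keep their input value in this model.
std::vector<int> serialInclusiveScan(uint64_t Mask, std::vector<int> Lanes) {
  int Acc = 0;
  for (std::size_t I = 0; I < Lanes.size() && I < 64; ++I) {
    if (!((Mask >> I) & 1))
      continue;
    Acc += Lanes[I];
    Lanes[I] = Acc;
  }
  return Lanes;
}
```

For mask `0b1011` and lane values `{1, 2, 3, 4}`, lane 2 is inactive, so the scan yields `{1, 3, 3, 7}`. A parallel scan relying on neighboring lanes would have tried to propagate through lane 2 instead of skipping it.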
It looks like llvm#124936 was reverted (for modifying the JSON output), but the test for JSON output with errors was deleted in llvm#126587 (in an attempt to fix a failing build). This adds that test back, plus a new one for llvm-dwarfdump, to validate the JSON output for errors: one case where the sub-categories will eventually appear and another where they will not. Test plan: ninja check-llvm-tools-llvm-dwarfdump
…llvm#127972) checkUsers currently does two things, a) work out the minimum VL read by every user and b) check that the operand info of the MI and users match. getMinimumVLForUser handles most of a), with the exception of the check for instructions that read past VL e.g. vrgather which is still in checkUsers. This moves it into getMinimumVLForUser to keep all that logic in one place and simplifies an upcoming patch.
The SDAG version uses fminnum/fmaxnum; when converting it to fcmp+select, it appears the order of the operands was chosen badly. This switches the conditions used so that the constant stays on the RHS.
I suspect it was fixed by llvm#127059. aarch64 is the only Windows bot we have now, so I can't be certain it's fixed everywhere, but I also have no reason to believe otherwise. Fixes llvm#43774.
…lvm#126751) Fold:
(select (not m), 1, 0) -> (select m, 0, 1)
(select (not m), -1, 0) -> (select m, 0, -1)
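Both folds rely on one identity: selecting on an inverted mask equals selecting on the original mask with the true and false operands swapped, which removes the `not`. The scalar model below checks this exhaustively for a single lane; the function names are hypothetical, and a vector mask applies the same identity per lane.

```cpp
#include <cassert>
#include <initializer_list>

int selectNotMaskOneZero(bool M) { return !M ? 1 : 0; }       // (select (not m), 1, 0)
int selectMaskZeroOne(bool M) { return M ? 0 : 1; }           // (select m, 0, 1)
int selectNotMaskMinusOneZero(bool M) { return !M ? -1 : 0; } // (select (not m), -1, 0)
int selectMaskZeroMinusOne(bool M) { return M ? 0 : -1; }     // (select m, 0, -1)

// Exhaustively checks both folds over the two possible mask values.
bool foldsAreEquivalent() {
  for (bool M : {false, true}) {
    if (selectNotMaskOneZero(M) != selectMaskZeroOne(M))
      return false;
    if (selectNotMaskMinusOneZero(M) != selectMaskZeroMinusOne(M))
      return false;
  }
  return true;
}
```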
Fix PR llvm#127553. On x86_64, readcyclecounter.ll failed when run with expensive checks enabled, with the error "Using an undefined physical register".
…emented (llvm#127980) Implemented in abc8812
This patch adds benchmarks for the copy family of algorithms (copy, copy_n, copy_if, copy_backward).
…tem from the headers (llvm#117764) Many parts of the locale base API are only required when building the shared/static library, but not from the headers. Document those functions and carve out a few of those that don't work when _XOPEN_SOURCE is defined to something old. Fixes llvm#117630
…o find additional (free) extract_subvector nodes
This patch adds the -fvectorize and -fno-vectorize flags to flang. Note that this also changes the behaviour of `flang -fc1` to match that of `clang -cc1`, which is that vectorization is only enabled in the presence of the `-vectorize-loops` flag. Additionally, this patch changes the behaviour of the default optimisation levels to match clang, such that vectorization only happens at the same levels as it does there. This patch is in draft while I write an RFC to discuss the above two changes.
…6545) This change introduces support of `OpTypeStructContinuedINTEL` instruction. Specification: https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_long_composites.html
Change getDependenceDistanceStrideAndSize to scale strides by TypeByteSize, scaling the returned CommonStride and MaxStride. Even though there is a seemingly-functional change of setting CommonStride when scaled strides are equal, it ends up being a non-functional change due to aggressive HasSameSize checking.
Several users of (mostly math/) gentype.inc rely on types other than the 'gentype'. This is commonly intN, as several maths builtins expose this as a return or parameter type. We were previously explicitly defining this type for every gentype. Other implementations rely on integer types of the same size and element width as the gentype, such as short/ushort for half, long/ulong for double, etc. Users might also rely on as_type or convert_type builtins to/from these types. The previous method we used to define intN was unscalable if we wanted to expose more types and helpers. This commit introduces a simpler system whereby several macros are defined at the beginning of gentype.inc. These rely on concatenating with the vector size. To facilitate this system, scalar gentypes now define an empty vector size. It was previously undefined, which was dangerous. An added benefit is that it matches how the integer gentype.inc vector size has been working. These macros will be especially helpful for the definitions of logb/ilogb in an upcoming patch.
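The reason an *empty* vector size is useful is that token pasting with an empty argument simply yields the other token, so one macro covers both scalar and vector types. The sketch below shows this with hypothetical macro names (the real gentype.inc names may differ). It is written as C++, but the preprocessor rules are the same in the OpenCL C used by libclc. The two-level `XCONCAT`/`XSTR` indirection forces macro arguments to expand before `##` and `#` are applied.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper macros.
#define CLC_CONCAT(A, B) A##B
#define CLC_XCONCAT(A, B) CLC_CONCAT(A, B) // expand A and B before pasting
#define CLC_STR(X) #X
#define CLC_XSTR(X) CLC_STR(X) // expand X before stringizing

// Integer type matching the current gentype's vector size.
#define CLC_INTN CLC_XCONCAT(int, CLC_VECSIZE)

// Scalar gentype: the vector size is defined, but empty -> "int".
#define CLC_VECSIZE
static const char *const kScalarIntN = CLC_XSTR(CLC_INTN);
#undef CLC_VECSIZE

// Vector gentype with 4 elements -> "int4".
#define CLC_VECSIZE 4
static const char *const kVec4IntN = CLC_XSTR(CLC_INTN);
#undef CLC_VECSIZE
```

If the vector size were left undefined instead of empty, `CLC_INTN` would paste the literal token `CLC_VECSIZE` onto `int`, silently producing a bogus identifier. That is one way the previous "undefined" state was dangerous.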
In preparation for llvm#127543
If we have a shuffle mask which can be represented as two slides + some conditional masking, we can emit a VLA sequence which is at most O(2*LMUL). This is essentially a generalization of the existing isElementRotate, but is staged to only introduce the new match for the moment. A follow-up change will start consolidating code; see the notes below. A couple of notes:
1) I'm excluding bit rotates mostly to keep the diffs manageable.
2) The existing isElementRotate logic is nearly redundant after this change. However, there is some intersection between the bit rotate and element rotate matching. To keep things simple, I left that in place for now, and will merge/clean up in a separate change.
3) The individual asVSlideup and asVSlidedown are closely related, but the former looks through extracts and the latter changes VL. I'm leaving these in place for now, but hope to common them up a bit as well.
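The "two slides + conditional masking" shape can be recognized with a simple displacement check. The sketch below is hypothetical and is not the RISC-V lowering code; it ignores VL, LMUL, and cost. It works on a two-source shuffle mask of length N, in which entries in [0, N) pick from V1, entries in [N, 2N) pick from V2, and -1 is undefined. If every V1 element sits at the same displacement from its result position, and likewise for every V2 element, the result is `slide(V1)` merged with `slide(V2)` under a per-element select.

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct TwoSlides {
  int SlideV1 = 0;          // source index - result index for V1 elements
  int SlideV2 = 0;          // same, for V2 elements (negative = slide up)
  std::vector<bool> FromV2; // per-element select mask
};

std::optional<TwoSlides> matchTwoSlides(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  std::optional<int> D1, D2;
  TwoSlides R;
  R.FromV2.assign(N, false);
  for (int I = 0; I < N; ++I) {
    int M = Mask[I];
    if (M < 0)
      continue; // undefined element: compatible with any slide
    bool IsV2 = M >= N;
    int Delta = (IsV2 ? M - N : M) - I;
    std::optional<int> &D = IsV2 ? D2 : D1;
    if (D && *D != Delta)
      return std::nullopt; // inconsistent displacement: not two slides
    D = Delta;
    R.FromV2[I] = IsV2;
  }
  R.SlideV1 = D1.value_or(0);
  R.SlideV2 = D2.value_or(0);
  return R;
}
```

An element rotate such as `{1, 2, 3, 4}` for N = 4 matches with a slide-down by 1 of V1 and a slide-up by 3 of V2, which suggests why the commit describes this match as a generalization of isElementRotate.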
…l is UB (llvm#127979) Proof: https://alive2.llvm.org/ce/z/mzVj-u I will add some follow-up patches to avoid duplicate code, support more memory instructions, and bypass gep instructions.
Co-authored-by: Christian Ulmann
The assertion was left over from a time when VPBBs still had an associated condition bit. This is not the case any more (comment was stale). In case a branch on condition is needed, a BranchOnCond VPInstruction is added when constructing recipes. That's also where it is checked if the condition is available. Exposed by 38376de.
…127907) Add a facility to check if there is any intersection between 2 sets. This will be used in some follow on changes to MemProf.
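A "do these two sets share any element?" check can be written as below. The helper name and signature are hypothetical, and the actual LLVM facility may differ. Iterating over the smaller set and probing the larger one bounds the work by the smaller set's size, and the loop returns as soon as it finds a common element, without materializing the intersection.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical helper: true if A and B have at least one element in common.
template <typename T>
bool setsIntersect(const std::unordered_set<T> &A,
                   const std::unordered_set<T> &B) {
  const auto &Small = A.size() <= B.size() ? A : B;
  const auto &Large = A.size() <= B.size() ? B : A;
  for (const T &Elem : Small)
    if (Large.count(Elem))
      return true; // early exit on the first shared element
  return false;
}
```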
This PR llvm#123902 broke Python bindings for `tensor.pack`/`unpack`. This PR fixes that. It also:
1. adds convenience wrappers for pack/unpack
2. cleans up matmul-like ops in the linalg bindings
3. fixes linalg docs missing pack/unpack
This was accidentally introduced in llvm#126876.
…sters (llvm#124475) Extend the conditionals in `AugmentRegisterInfo` to support alternative names for lldb. Fixes llvm#124023. Register `X8` is an exception that is not covered here; more details can be found in llvm#127900.
…lvm#127922) Recent versions of the system changed where these functions live.
This patch fixes: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:5723:24: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions] llvm/lib/Target/RISCV/RISCVISelLowering.cpp:5728:76: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions]
…NFC (llvm#127968) Use nonstatic member instead. This requires explicit conversions, but many will go away as we continue converting unsigned to Register. In a few places where it was simple, I changed unsigned to Register.
No description provided.