
[AutoBump] Merge with f7d03707 (Feb 18) (52) #596


Open: wants to merge 65 commits into base: bump_to_4cc7d60f


jorickert

No description provided.

tbaederr and others added 30 commits February 18, 2025 15:36
…es (llvm#127625)

Handle different src/mask operand ordering of X86ISD::VPERMV nodes
This commit is a broad update across libclc to use the CLC conversion
builtins in CLC functions, even those with a '__clc' prefix in the
generic folder. This better prepares them for an official move to the
CLC library in time.

The CLC conversion builtins have an additional benefit in that they
support scalars, unlike the __builtin_convertvector builtin which we
were using previously. This allows us to simplify some shared
definitions.

There is one change to the IR, in the scalar upsample(char, uchar)
builtin. It now sign-extends the first argument to i16, where before it
zero-extended it. This appears to be correct, and matches the vector
behaviour.
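
As a rough illustration of the extension change described above, the sketch below models the scalar builtin in Python. It assumes the usual OpenCL definition `result = ((short)hi << 8) | lo`; the helper names (`to_signed`, `upsample_sext`, `upsample_zext`) are hypothetical and are not libclc functions. Under that assumption the two extension choices give the same 16-bit result, because the extended upper bits are shifted out of the i16 value.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def upsample_sext(hi: int, lo: int) -> int:
    """Model of upsample(char hi, uchar lo) with hi sign-extended to 16 bits."""
    return to_signed((to_signed(hi, 8) << 8) | (lo & 0xFF), 16)


def upsample_zext(hi: int, lo: int) -> int:
    """Same model, but with hi zero-extended to 16 bits before the shift."""
    return to_signed(((hi & 0xFF) << 8) | (lo & 0xFF), 16)


if __name__ == "__main__":
    print(upsample_sext(-1, 0x34), upsample_zext(-1, 0x34))
```

This only models the arithmetic; the actual IR emitted by libclc may be structured differently.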
…orceTailAgnostic. NFC (llvm#127575)

Add a policy operand to set the tail agnostic policy instead of using
ForceTailAgnostic. The masked to unmasked transforms had to be updated
to drop the policy operand when converting to unmasked.
llvm#124432)

Use HCFGBuilder to build an initial VPlan 0, which wraps all input
instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes
to replace the VPInstructions with widened recipes.

At the moment, widened recipes are created based on the underlying
instruction of the VPInstruction. Masks are also still created based on
the input IR basic blocks and the loop CFG is flattened in the main loop
processing the VPInstructions.

This patch also includes support for Switch instructions in HCFGBuilder
using just a VPInstruction with Instruction::Switch opcode.

There are multiple follow-ups planned:
 * Perform predication on the VPlan directly,
 * Unify code constructing VPlan 0 to be shared by both inner and outer
   loop code paths.
 * Construct VPlan 0 once, clone subsequent ones for VFs

PR: llvm#124432
Details:

Make LLDB's TelemetryManager a "plugin" so that vendors can supply an
appropriate implementation.
The rest of the LLDB code will simply call `TelemetryManager::getInstance`.

---------

Co-authored-by: Pavel Labath <[email protected]>
This patch adds the erfc op to the math dialect. It also does lowering
of the math.erfc op to libm calls. There is also a f32 polynomial
approximation for the function based on

https://stackoverflow.com/questions/35966695/vectorizable-implementation-of-complementary-error-function-erfcf
This is in turn based on
M. M. Shepherd and J. G. Laframboise, "Chebyshev Approximation of
(1+2x)exp(x^2)erfc x in 0 <= x < INF", Mathematics of Computation, Vol.
36, No. 153, January 1981, pp. 249-253.
The code has a ULP error less than 3, which was tested, and MLIR test
values were verified against the C implementation.
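
One way to check a ULP bound like the one mentioned above is to compare float32 bit patterns. The sketch below is not the test used in the patch. It is a minimal helper, with the hypothetical names `ordered_bits`, `ulp_distance` and `max_ulp_error`, that counts how many representable f32 values lie between two results after rounding both to f32.

```python
import math
import struct


def ordered_bits(x: float) -> int:
    """Round x to float32 and map its bit pattern to a monotonically ordered integer."""
    (u,) = struct.unpack("<I", struct.pack("<f", x))
    # float32 is sign-magnitude: negative values count downwards from zero.
    return -(u & 0x7FFFFFFF) if u & 0x80000000 else u


def ulp_distance(a: float, b: float) -> int:
    """Number of float32 steps between a and b (both rounded to float32 first)."""
    return abs(ordered_bits(a) - ordered_bits(b))


def max_ulp_error(approx, reference, samples) -> int:
    """Largest ULP distance between approx(x) and reference(x) over the samples."""
    return max(ulp_distance(approx(x), reference(x)) for x in samples)


if __name__ == "__main__":
    # Comparing the reference against itself is a trivial sanity check.
    print(max_ulp_error(math.erfc, math.erfc, [0.0, 0.5, 1.0, 3.0]))
```

To use it on a real approximation, pass that function as `approx` and `math.erfc` as `reference`; the sample points are the caller's choice.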
…eshold

Need to check if the tree is too small before attempting to vectorize the tree to prevent hanging on small trees with phis only.
…m#114500)

Implement new pseudos with the suffix _t16 for FLAT_LOAD which have
VGPR_16 as the load dst. Lower the pseudos to the existing real
instructions with VGPR_32 src or dst (which makes them consistent with
the hardware encoding). This patch reduces VGPR usage by making hi
halves of VGPRs available for other values.

There are more 8/16-bit ld/st instructions to be supported in
upcoming patches.
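
To picture why 16-bit destinations free up register space, the sketch below treats a 32-bit register as two independent 16-bit halves. It is a bit-level model only, not backend code; the helper names are hypothetical, and it assumes a write to one half leaves the other half untouched.

```python
def write_lo16(reg: int, value: int) -> int:
    """Replace the low 16 bits of a 32-bit register value, keeping the high half."""
    return (reg & 0xFFFF0000) | (value & 0xFFFF)


def write_hi16(reg: int, value: int) -> int:
    """Replace the high 16 bits of a 32-bit register value, keeping the low half."""
    return (reg & 0x0000FFFF) | ((value & 0xFFFF) << 16)


def read_lo16(reg: int) -> int:
    return reg & 0xFFFF


def read_hi16(reg: int) -> int:
    return (reg >> 16) & 0xFFFF


if __name__ == "__main__":
    reg = write_hi16(0, 0xBEEF)    # some other value lives in the high half
    reg = write_lo16(reg, 0x1234)  # a 16-bit load result lands in the low half
    print(hex(reg))
```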
)

This commit adds "instantiated_from" to the AST dump for EnumDecl,
improving consistency with CXXRecordDecl and FunctionDecl, which also
include this information. To achieve this, TextNodeDumper::VisitEnumDecl
is updated with analogous lines found in
TextNodeDumper::VisitFunctionDecl and
TextNodeDumper::VisitCXXRecordDecl.
This patch fixes:

  third-party/unittest/googletest/include/gtest/gtest.h:1379:11:
  error: comparison of integers of different signs: 'const int' and
  'const unsigned long' [-Werror,-Wsign-compare]
This patch adds the OMP.DeclareMapperOp to MLIR.
The HLFIR/FIR lowering for Declare Mapper is available here llvm#117046.
…ion (llvm#114500)"

This reverts commit f7a5f06.

Fails to build with:

llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:126:37: error: no member named 'OPERAND_LAST' in 'llvm::AMDGPU::OpName'
  126 |   uint16_t OpName = AMDGPU::OpName::OPERAND_LAST;
Need to check if the block is reachable before comparing phis from it to
avoid compiler crash when requesting node.

Fixes report in llvm#110529 (comment)
…n RISCVTargetParser.h. NFC (llvm#127585)

The VLMUL and policy enums originally lived in RISCVBaseInfo.h in the
backend which is where everything else in the RISCVII namespace is
defined.

RISCVTargetParser.h is used by much more of the compiler and it
doesn't really make sense to have 2 different namespaces exposed.
These enums are both associated with VTYPE so using the RISCVVType
namespace seems like a good home for them.
…ve (llvm#117046)

This patch adds HLFIR/FIR lowering support for OpenMP Declare Mapper
directive.
Depends on llvm#117045.
Unifies imports to use a single insertion point, `globalInsertionOp`,
for global values.
Refactors insertion point setup into `setGlobalInsertionPoint`, which
sets insertion point after `globalInsertionOp` or defaults to the start
of the module if it is not set.
Inspired by llvm#90738 (although that is a clang codegen issue)
topperc and others added 30 commits February 18, 2025 09:05
This patch adds the mapper field to the omp.map.info op.

Depends on llvm#117046.
…clause (llvm#121001)

Add Lowering support for OpenMP mapper field in mapInfoOp.

NOTE: This patch only supports explicit mapper lowering. I'll add a
separate PR soon which handles implicit default mapper recognition.

Depends on llvm#120994.
…P DeclareMapper (llvm#121005)

Add conversion support from FIR to LLVM Dialect for OMP DeclareMapper.

Depends on llvm#121001
…pers (llvm#124746)

This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare
Mapper directive.

Since both MLIR and Clang now support custom mappers, I've changed the
respective function params to no longer be optional as well.

Depends on llvm#121005
The last uses of these functions were removed in:

  commit 58bc98c
  Author: Arthur Eubanks <[email protected]>
  Date:   Fri Jul 12 10:02:50 2024 -0700
The last use was removed in:

  commit fa6ea7a
  Author: Arthur Eubanks <[email protected]>
  Date:   Mon Mar 20 11:18:35 2023 -0700
… Pass (llvm#127646)

Add cmdOption suffix consumer function in GpuModuleToBinary Pass, which
can tokenize and remove a specific suffix of cmdOption.
)

AdaptedConstIterator currently doesn't have iterator traits, so I can't
use STL algorithms with containers like WatchpointList.

This patch replaces AdaptedConstIterator and AdaptedIterable with
llvm::iterator_adaptor_base and llvm::iterator_range respectively.
See
https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792
for detailed introduction.

This is a follow up PR of llvm#121187, by integrating OpAsmTypeInterface
with AsmPrinter. There are a few conditions when OpAsmTypeInterface
comes into play

* There is no OpAsmOpInterface
* Or OpAsmOpInterface::getAsmResultName/getBlockArgumentName does not
invoke `setName` (i.e. the default impl)
* All results have OpAsmTypeInterface (otherwise we can not handle
result grouping behavior)
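
The conditions above can be read as a single predicate. The sketch below encodes one reading of the list: the first two bullets are alternatives, and the third must hold in addition. The `OpInfo` record and its field names are hypothetical and do not correspond to MLIR classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OpInfo:
    has_op_asm_interface: bool              # op implements OpAsmOpInterface
    op_interface_set_name: bool             # its hook actually called setName
    results_have_type_interface: List[bool]  # per result: type has OpAsmTypeInterface


def uses_type_interface_names(op: OpInfo) -> bool:
    """True when result names would come from the type interface (assumed reading)."""
    named_by_op = op.has_op_asm_interface and op.op_interface_set_name
    return (not named_by_op) and all(op.results_have_type_interface)


if __name__ == "__main__":
    print(uses_type_interface_names(OpInfo(False, False, [True, True])))
```

Note that `all([])` is true, so an op with no results trivially passes the last condition in this model.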

Cc @River707 @jpienaar @ftynse for review.
See
https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792
for detailed introduction.

This PR adds

* Definition of `OpAsmAttrInterface`
* Integration of `OpAsmAttrInterface` with `AsmPrinter`

In
llvm#121187 (comment)
I mentioned splitting them into two PRs, but I realized that a PR with
only definition of `OpAsmAttrInterface` is hard to test as it requires a
custom Dialect with `OpAsmDialectInterface` to hook with `AsmPrinter`,
so I just put them together to have an e2e test.

Cc @River707 @jpienaar @ftynse for review.
)

Once LLVM 20 is released, clang-18 will no longer be supported.
This avoids dozens of regressions in a future patch. These
primarily manifested as assertions where we had copies of 64-bit
registers to 32-bit registers.

This is testable in principle with hand written MIR, but that's
a bit too much x86 for me.
On some (Linux) systems /etc/localtime is not a symlink to the time
zone, but contains a copy of the binary time zone file. In these cases
there usually is a file named /etc/timezone which contains the text for
the current time zone name.

Instead of throwing when /etc/localtime does not exist or is not a
symlink, use this fallback.
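
The lookup order described above can be sketched as follows. This is an illustration in Python, not the libc++ implementation: the function name `current_zone_name`, the `etc_dir` parameter (added so the logic can be exercised on a scratch directory) and the `zoneinfo/` marker used to extract the name from the symlink target are all assumptions.

```python
import os
from pathlib import Path


def current_zone_name(etc_dir: str = "/etc") -> str:
    """Return the time zone name, preferring the localtime symlink."""
    localtime = Path(etc_dir) / "localtime"
    if localtime.is_symlink():
        target = os.readlink(localtime)
        marker = "zoneinfo/"
        pos = target.find(marker)
        if pos != -1:
            # e.g. /usr/share/zoneinfo/Europe/Amsterdam -> Europe/Amsterdam
            return target[pos + len(marker):]
    # Fallback: a plain-text file holding the zone name.
    # (Falling through here for an unrecognised symlink target is an assumption.)
    timezone = Path(etc_dir) / "timezone"
    if timezone.is_file():
        return timezone.read_text().strip()
    raise RuntimeError("cannot determine the current time zone")
```

Calling `current_zone_name()` with no argument inspects the real `/etc`; passing a temporary directory makes the fallback easy to test.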

Fixes: llvm#105634

---------

Co-authored-by: Louis Dionne <[email protected]>
See
https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792
for detailed introduction.

This PR should be rebased once llvm#124721 is merged.

This PR adds

* Definition of `getAlias` for `OpAsmTypeInterface`
* Integration of `OpAsmTypeInterface` with `AsmPrinter` alias handling
part

This is partly in response to
https://github.com/llvm/llvm-project/pull/124721/files#r1940399862

Cc @River707 for review.
…ame as canonicalizing (llvm#127670)

Fixes this crash: https://hlsl.godbolt.org/z/9aP74s4bP
Which happens because the de-sugared type is the same as the
canonicalized type.
Check if the de-sugared type is canonical before getting the
ArrayParameterType of the canonical type.
Add AST test to ensure crash doesn't happen.
…lvm#127361)

This patch implements dependency maintenance upon receiving the
notification that an instruction gets deleted.
…7132)

This patch moves the seed collection logic from the BottomUpVec pass
into a new Sandbox IR Function pass. The new "seed-collection" pass
collects the seeds, builds a region and runs the region pass pipeline.
Users can use the version inherited from MCRegisterInfo.

This version was added by e7694f3 to return a Register. It was
later changed to return MCRegister by bab72dd making it identical
to the MCRegisterInfo version.
llvm#126329)

Fixes llvm#126162

I checked locally that it works without warning when:
- neither option is defined
- both are defined to the same value

And I checked that it warns if:
- only one is defined
- they are defined to different values
)

This PR adds a few more tests to validate some error scenarios of root
signature metadata representation.

Closes: llvm#127280

---------

Co-authored-by: joaosaffran <[email protected]>
Move LinalgInterfaces.cpp from LinalgInterfaces to LinalgDialect target.

This allows TensorDialect to use header-only RelayoutOpInterface without introducing a hidden dependency on LinalgDialect (producing a `no-allow-shlib-undefined` error if a target depends on TensorDialect but not LinalgDialect).

Also reverts llvm@d64f177 because it's no longer needed.
…calls

This can happen when using a LTO build of compiler-rt for ARM and the
program uses 64-bit division.
The 64-bit division function in compiler-rt (__aeabi_ldivmod) is written
in assembly and calls the C function __divmoddi4, which works fine in
non-LTO links. However, when building with LTO the call inside
__aeabi_ldivmod is replaced with a jump to address zero, which then
crashes the program.

Building with -pie generates an error instead of a jump to address zero,
and surprisingly just declaring the __aeabi_ldivmod function (but not
calling it) in the input IR also avoids this issue.

Reported as llvm#127284

Co-authored-by: Fangrui Song <[email protected]>

Reviewed By: MaskRay

Pull Request: llvm#127286
…7510)

Corresponding GitHub issues will be created shortly.
)

Attempting to pass a `ptr addrspace(7)` to functions that take `ptr`
arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7)
to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP
operations on buffer resources, which can't be GEP'd. (However, note
that, while unimplemented, addrspacecast from ptr addrspace(7) to ptr
is legal - it's just an effective address computation)

To resolve this problem, and thus prevent illegal
`getelementptr T, ptr addrspace(8) %x, ...`s from being produced, this
commit extends amdgcn.make.buffer.rsrc to also be variadic in its result
type, auto-upgrading old manglings.

The logic for handling a make.buffer.rsrc in instruction selection
remains untouched and expects the output type to be a ptr addrspace(8),
as does the Clang lowering for its builtin (the pointer-to-pointer
version might want a different name in clang). LowerBufferFatPointers
has been updated to lower
amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p* .

This'll also make exposing buffer fat pointers in Clang easier, since
you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.