Added RISC-V V extension intrinsics for LLVM #18182
Conversation
Force-pushed from 997c57a to d150dd9
cc @cbalint13, can you help take a look?
Thank you very much for this work!
It looks good; nice to see RVV enhancements.
One note: in the aprofile arch parser, could we reuse this global function here?
cpp: llvm_get_vector_width(target)
py: _ffi_api.llvm_get_vector_width(target)
It was introduced in:
https://github.com/apache/tvm/pull/17641/files
If not, that is fine for now; we can improve/simplify the parser later.
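For context, a hedged sketch of what such reuse could look like on the Python side. In TVM the value would come from `_ffi_api.llvm_get_vector_width(target)` (introduced in PR #17641); here the LLVM query is mimicked by parsing `-mattr` feature strings, which is purely an assumption for demonstration, not the actual parser code.

```python
# Illustrative sketch only: how an RVV arch parser could derive the vector
# width. The real value would come from _ffi_api.llvm_get_vector_width;
# parsing -mattr strings here is a stand-in for demonstration.
def rvv_vector_width(mattr):
    """Return VLEN in bits from features like '+zvl256b'.

    Falls back to 128, the RVV spec minimum VLEN when only '+v' is given.
    """
    for feature in mattr:
        if feature.startswith("+zvl") and feature.endswith("b"):
            return int(feature[len("+zvl"):-1])
    return 128

print(rvv_vector_width(["+v", "+zvl256b"]))  # prints 256
```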
I have some work that can be added on top of this for spacemit's IME.
@fzi-peccia, can you look at the i386 CI failure?
Allow me a change proposal on how to avoid aprofile (which serves ARM only); I don't know whether it will be kept in the future.
I am all in to see this merged; it is a very good start for future IME tensorization, beyond what LLVM supports (or will support).
Later update:
This is nice work, but it requires adaptation to the current TVM infra.
There is a review comment regarding the integer variants and their use-case coverage.
Sorry all, I was on vacation; I will tackle these comments this week.
Force-pushed from 6f5aec2 to f9b2667
Hi @cbalint13. Thank you very much for the feedback and the diff. I implemented the changes you suggested and also rebased on main. Regarding the mixed-dtype cases: the original idea was to support them, and this kernel_dtype is a mistake left over from those days. I have now replaced it with input_dtype. Maybe for this version we could merge without the mixed cases and add that feature in the future. What do you think?
I think this code should be fine now; I started to run some real tests on my side.
- Proposed to merge this as experimental for a while
- Aligned intrinsic initialization to happen only once, as for other arches
- Fixed an inconsistency between the generated and consumed intrinsic variants
Let's have some rounds of tests with real networks in order to promote this to non-experimental. Meanwhile, in subsequent PRs we can add IME XSMTVDot (which promises up to 2 TOPS on the spacemit-x60) and/or RVV 0.7.1 backward compatibility for the T-Head boards. As a separate PR we can also give the int8 case mixed-dtype combinations.
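The "initialize only once" point above can be sketched with a module-level guard. This is a minimal illustration, assuming a registration callback; the names and the `register_fn` parameter are hypothetical, not the actual TVM registration API.

```python
# Hypothetical sketch of one-time intrinsic registration. register_fn stands
# in for a real registration call (e.g. TVM's TensorIntrin.register), and the
# intrinsic names below are made up for illustration.
_RVV_INTRIN_REGISTERED = False

def register_rvv_intrinsics(register_fn):
    """Register all RVV tensor intrinsics exactly once; later calls no-op."""
    global _RVV_INTRIN_REGISTERED
    if _RVV_INTRIN_REGISTERED:
        return False
    for name in ("rvv_dot_fp32", "rvv_dot_int8"):
        register_fn(name)  # the generated variant name matches the consumed one
    _RVV_INTRIN_REGISTERED = True
    return True
```

A second call returns False without touching the registry, which is the behavior the review asks for.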
Tests were done by tuning a resnet18 model. In an RPC setup, I used the provided
$ ./msch-database-tir-parse.py
Parsed #5000 records
No tensorized schedules found.
This needs investigation.
Based on the #18224 investigation, it seems the RVV intrinsic templates need a double check (see the example fix in that issue).
Based on the latest real tests and investigations here, this still needs the changes shown.
To maintain this long term, ideally the tensorization templates should have some testcases.
Further, I investigated the correctness of the proposed tensorization kernels. All tests here need #18232.
For this working sample, 4 x (4x4) -> 4xlanes for VLEN=256 @ fp32 is the maximum for a fully occupied RVV machine. Now, besides the template-matching issues due to the relax flow (exemplified with a working dense/matmul testcase), the numerical implementation of the kernels themselves is also wrong, and personally I don't see how they fully exploit the RVV machine (a working testcase is also provided).
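Such numerical issues are exactly what the standalone testcases suggested above would catch. A minimal sketch, assuming the micro-kernel computes C[4x4] = A[4x4] @ B[4x4] with per-k broadcast accumulation (the order an RVV vfmacc-style kernel would use, with one B row held in a 4-lane fp32 vector); all names are illustrative, not the PR's actual kernels.

```python
# Illustrative reference testcase for a 4x4 fp32 micro-kernel.
def matmul_ref(A, B):
    """Plain triple-loop matmul used as ground truth."""
    M, K, N = len(A), len(A[0]), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]

def microkernel(A, B):
    """Per-k outer-product accumulation, as a vfmacc-based kernel would do:
    broadcast A[i][k] and multiply-accumulate into a 4-lane row of C."""
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for k in range(K):            # one row B[k][:] lives in a vector register
        for i in range(M):        # A[i][k] is the broadcast scalar
            for j in range(N):    # 4 fp32 lanes per accumulator row
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
B = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
assert microkernel(A, B) == matmul_ref(A, B)
```

Checking the tensorized kernel against such a reference for every dtype variant would make regressions like the one reported here immediately visible.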
I don't know how to help move this forward; feel free to reuse this working draft.
Implementation of the paper "Tensor Program Optimization for the RISC-V Vector Extension Using Probabilistic Programs".