Detecting wasted slots due to instruction dependency chain #7
Comments
Hello,
Here is a demo using
Now this is high-IPC code so it is not the data-dependency case you are after. The You can tweak that to run your app and then share output using
In one program, an instruction X is stuck and nothing else can proceed until X is unstuck, e.g. when the data arrives from memory or a computational resource becomes available. The pipeline is full of instructions in various stages of execution, but nothing can move forward because X has not completed. In another program, instruction Y is stuck for the same reason, but the CPU keeps executing other instructions because there is enough ILP. Both programs will show a high value in the corresponding Core Bound metric. Is it possible to read some CPU counters to figure out what percentage of cycles/slots the CPU was idle because it had nothing to do?
The program with instruction X might be either Memory Bound or Core Bound, depending on whether the RS (reservation station) got drained while the memory load from X was pending. This is key in the TMA split of Backend Bound. There are multiple counters. Which pipeline stage are you after in your counter quest?
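To make the top-level TMA split a bit more concrete, here is a minimal sketch of the level-1 breakdown. It assumes a 4-wide Intel core and the commonly published level-1 formulas; the function name `tma_level1` and all counter values are hypothetical, and the right event names and pipeline width depend on the microarchitecture.

```python
# Sketch of the TMA level-1 breakdown. Assumptions: a 4-wide Intel core
# (4 issue slots per cycle) and invented counter values for illustration.

PIPELINE_WIDTH = 4  # assumption: issue slots per cycle on the target core


def tma_level1(clk_unhalted, idq_uops_not_delivered, uops_issued,
               uops_retired_slots, recovery_cycles):
    """Return the four level-1 TMA categories as fractions of all slots."""
    slots = PIPELINE_WIDTH * clk_unhalted
    # Slots where the frontend failed to deliver uops to the backend.
    frontend = idq_uops_not_delivered / slots
    # Slots spent on uops that never retired, plus recovery bubbles.
    bad_spec = (uops_issued - uops_retired_slots
                + PIPELINE_WIDTH * recovery_cycles) / slots
    # Slots that did useful work.
    retiring = uops_retired_slots / slots
    # Whatever remains is attributed to backend stalls.
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_spec,
        "retiring": retiring,
        "backend_bound": backend,
    }


# Hypothetical counter readings, not measurements from any real run.
sample = tma_level1(
    clk_unhalted=1_000_000,
    idq_uops_not_delivered=200_000,
    uops_issued=1_300_000,
    uops_retired_slots=1_200_000,
    recovery_cycles=25_000,
)

for name, value in sample.items():
    print(f"{name}: {value:.1%}")
```

This only yields the Backend Bound total. Splitting it further into Memory Bound and Core Bound needs additional stall counters that this sketch does not cover.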
Hi!
Is there a way to detect wasted slots due to instruction dependency chains in my code? The code is high on data cache misses, and to make things worse, there are loop-carried dependencies and little available instruction-level parallelism. There are two approaches to fix this: decrease data cache misses or increase available ILP. How can I detect which one applies?
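For readers unfamiliar with the term, here is a minimal sketch of what a loop-carried dependency looks like and how splitting it into independent chains exposes more ILP. Python is used only to show the shape of the transformation; the function names are hypothetical, and any real effect would have to be measured on compiled code, where the hardware can overlap the independent chains.

```python
def sum_single_chain(values):
    """One accumulator: every addition depends on the previous one."""
    total = 0
    for v in values:
        total += v  # loop-carried dependency on `total`
    return total


def sum_four_chains(values):
    """Four accumulators: four independent dependency chains."""
    acc0 = acc1 = acc2 = acc3 = 0
    full = len(values) - len(values) % 4  # largest multiple of 4
    for i in range(0, full, 4):
        # These four additions do not depend on each other, so an
        # out-of-order core could execute them in parallel.
        acc0 += values[i]
        acc1 += values[i + 1]
        acc2 += values[i + 2]
        acc3 += values[i + 3]
    # Handle the leftover elements when the length is not a multiple of 4.
    tail = sum(values[full:])
    return acc0 + acc1 + acc2 + acc3 + tail


data = list(range(1, 1002))  # integers, so reassociation is exact
single = sum_single_chain(data)
split = sum_four_chains(data)
print(single, split)
```

Integers are used on purpose: with floating-point values, reordering the additions can change the rounded result.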