[MCA] Inaccuracy in small snippet #99395

Open
@boomanaiden154

Take the small snippet:

incq %r15
addq $0x4, %r13
cmpq $0x3f, %r15

Running this through MCA on skylake/skylake-avx512 produces the following:

Iterations:        100
Instructions:      300
Total Cycles:      104
Total uOps:        300

Dispatch Width:    6
uOps Per Cycle:    2.88
IPC:               2.88
Block RThroughput: 0.8


Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      1     0.25                        incq	%r15
 1      1     0.25                        addq	$4, %r13
 1      1     0.25                        cmpq	$63, %r15


Resources:
[0]   - SKXDivider
[1]   - SKXFPDivider
[2]   - SKXPort0
[3]   - SKXPort1
[4]   - SKXPort2
[5]   - SKXPort3
[6]   - SKXPort4
[7]   - SKXPort5
[8]   - SKXPort6
[9]   - SKXPort7


Resource pressure per iteration:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]
 -      -     0.75   0.75    -      -      -     0.75   0.75    -

Resource pressure by instruction:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    Instructions:
 -      -     0.24   0.25    -      -      -     0.26   0.25    -     incq	%r15
 -      -     0.25   0.25    -      -      -     0.25   0.25    -     addq	$4, %r13
 -      -     0.26   0.25    -      -      -     0.24   0.25    -     cmpq	$63, %r15

However, running this within llvm-exegesis (llvm-exegesis -snippets-file=/tmp/test.s --mode=latency) produces the following:

---
mode:            latency
key:
  instructions:
    - 'INC64r R15 R15'
    - 'ADD64ri8 R13 R13 i_0x4'
    - 'CMP64ri8 R15 i_0x3f'
  config:          ''
  register_initial_values:
    - 'R15=0x123456'
    - 'R13=0x123456'
cpu_name:        skylake-avx512
llvm_triple:     x86_64-pc-linux-gnu
min_instructions: 10000
measurements:
  - { key: latency, value: 0.4234, per_snippet_value: 1.26995, validation_counters: {} }
error:           ''
info:            ''
assembled_snippet: 4157415549BF563412000000000049BD563412000000000049FFC74983C5044983FF3F49FFC74983C5044983FF3F49FFC74983C5044983FF3F49FFC74983C5044983FF3F415D415FC3
...

The throughput predicted by llvm-mca (a Block RThroughput of 0.8 cycles per iteration) is almost 40% lower than the experimental value (a per-snippet measurement of about 1.27 cycles). uiCA seems to agree with the experimental value, predicting 1.25 cycles/iteration as the reciprocal throughput.
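For reference, here is a minimal sketch of the arithmetic behind that comparison, using only the numbers pasted above. It assumes the "almost 40%" figure compares llvm-mca's Block RThroughput against the llvm-exegesis per_snippet_value. The helper name `relative_gap` is hypothetical, and the Total Cycles / Iterations figure is included only as an alternative reading of the llvm-mca output.

```python
# Figures copied from the llvm-mca, llvm-exegesis and uiCA results in this issue.
mca_block_rthroughput = 0.8      # llvm-mca "Block RThroughput"
mca_total_cycles = 104           # llvm-mca "Total Cycles"
mca_iterations = 100             # llvm-mca "Iterations"
exegesis_per_snippet = 1.26995   # llvm-exegesis "per_snippet_value"
uica_prediction = 1.25           # uiCA cycles/iteration


def relative_gap(predicted: float, measured: float) -> float:
    """Fraction by which `predicted` falls short of `measured` (hypothetical helper)."""
    return (measured - predicted) / measured


# Alternative reading of the llvm-mca output: simulated cycles per iteration.
mca_cycles_per_iter = mca_total_cycles / mca_iterations

gap_rthroughput = relative_gap(mca_block_rthroughput, exegesis_per_snippet)
gap_cycles = relative_gap(mca_cycles_per_iter, exegesis_per_snippet)
gap_uica = relative_gap(uica_prediction, exegesis_per_snippet)

print(f"llvm-mca Block RThroughput vs measured: {gap_rthroughput:.1%} lower")
print(f"llvm-mca cycles/iteration vs measured:  {gap_cycles:.1%} lower")
print(f"uiCA prediction vs measured:            {gap_uica:.1%} lower")
```

Under that assumption the Block RThroughput gap comes out at roughly 37%, which matches the "almost 40%" above, while the uiCA prediction lands within about 2% of the measurement.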
