Releases: ROCm/TransferBench

rocm-7.1.1

26 Nov 06:33
a824bc1

ROCm release v7.1.1

TransferBench v1.65.00

13 Nov 05:56
3f8d00d

v1.65.00

Added

  • Added warp-level dispatch support via the GFX_SE_TYPE environment variable
    • GFX_SE_TYPE=0 (default): Threadblock-level dispatch, each subexecutor is a threadblock
    • GFX_SE_TYPE=1: Warp-level dispatch, each subexecutor is a single warp
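
As a rough illustration of selecting the dispatch mode from a launcher script, the Python sketch below builds a process environment with GFX_SE_TYPE set. The helper name env_with_se_type is hypothetical and not part of TransferBench; only the variable name and its two values come from the notes above.

```python
import os

# Values documented in the v1.65.00 notes.
SE_TYPE_THREADBLOCK = 0  # default: each subexecutor is a threadblock
SE_TYPE_WARP = 1         # each subexecutor is a single warp


def env_with_se_type(se_type, base_env=None):
    """Return a copy of the environment with GFX_SE_TYPE set.

    Hypothetical helper for illustration; it rejects values other than
    the two documented ones.
    """
    if se_type not in (SE_TYPE_THREADBLOCK, SE_TYPE_WARP):
        raise ValueError("GFX_SE_TYPE must be 0 or 1")
    env = dict(os.environ if base_env is None else base_env)
    env["GFX_SE_TYPE"] = str(se_type)
    return env
```

The returned mapping could then be passed as env= to subprocess.run when launching a TransferBench binary; the binary path and arguments depend on the local build and are not shown here.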

rocm-7.1.0

30 Oct 05:21
a824bc1

ROCm release v7.1.0

rocm-7.0.2

10 Oct 12:08
6bcbcf4

ROCm release v7.0.2

rocm-6.4.4

24 Sep 14:00
0fbfbdd

ROCm release v6.4.4

rocm-7.0.1

17 Sep 16:40
6bcbcf4

ROCm release v7.0.1

rocm-7.0.0

16 Sep 06:36
6bcbcf4

ROCm release v7.0.0

TransferBench v1.64.00

04 Sep 22:43
a824bc1

v1.64.00

Added

  • Added BLOCKSIZES to the a2asweep preset so that it can also sweep over threadblock sizes
  • Added FILL_COMPRESS to allow more control over input data pattern
    • FILL_COMPRESS takes a comma-separated list of integer percentages (which must add up to 100)
      that set the share of 64B lines filled with each of the random/1B0/2B0/4B0/32B0 data patterns
      • Bins:
        • 0 - random
        • 1 - 1B0: upper 1 byte of each aligned 2 bytes is 0
        • 2 - 2B0: upper 2 bytes of each aligned 4 bytes are 0
        • 3 - 4B0: upper 4 bytes of each aligned 8 bytes are 0
        • 4 - 32B0: upper 32 bytes of each aligned 64-byte line are 0
    • FILL_PATTERN will be ignored if FILL_COMPRESS is specified
  • Additional details about the generated data patterns are printed if the debug env var DUMP_LINES is
    set to a non-zero value; that value is also the number of 64-byte lines printed
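
To make the FILL_COMPRESS bins concrete, here is a minimal Python sketch of how such a percentage list and the five line patterns could be modeled. It is not TransferBench's implementation. The function names are hypothetical, the sketch assumes exactly five percentages are given, and it assumes "upper" bytes are the highest-addressed bytes of each aligned group (little-endian view).

```python
import random

LINE_BYTES = 64

# Bin index -> (name, zeroed bytes per group, group size in bytes),
# following the bin list in the release notes.
BINS = [
    ("random", 0, 1),
    ("1B0", 1, 2),
    ("2B0", 2, 4),
    ("4B0", 4, 8),
    ("32B0", 32, 64),
]


def parse_fill_compress(value):
    """Parse a comma-separated percentage list that must sum to 100.

    Assumption: one entry per bin is required (five in total).
    """
    percents = [int(tok) for tok in value.split(",")]
    if len(percents) != len(BINS):
        raise ValueError("expected one percentage per bin")
    if any(p < 0 for p in percents) or sum(percents) != 100:
        raise ValueError("percentages must be non-negative and sum to 100")
    return percents


def make_line(bin_index, rng):
    """Build one 64-byte line for the given bin."""
    _, zero_bytes, group_bytes = BINS[bin_index]
    line = bytearray(rng.getrandbits(8) for _ in range(LINE_BYTES))
    for start in range(0, LINE_BYTES, group_bytes):
        # Zero the upper (highest-addressed, by assumption) bytes of the group.
        for i in range(start + group_bytes - zero_bytes, start + group_bytes):
            line[i] = 0
    return bytes(line)
```

For example, parse_fill_compress("20,20,20,20,20") would request an even split across the five bins, and make_line(4, random.Random(0)) returns a line whose last 32 bytes are zero.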

Modified

  • Increased GFX_BLOCKSIZE limit from 512 to 1024 (still requires multiple of 64)
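
The constraint above can be expressed as a small check. This Python sketch is illustrative only: the helper names are hypothetical, and the lower bound of 64 is an assumption that follows from requiring a positive multiple of 64.

```python
MAX_GFX_BLOCKSIZE = 1024   # limit as of v1.64.00 (previously 512)
BLOCKSIZE_MULTIPLE = 64    # block size must be a multiple of 64


def is_valid_blocksize(n):
    """True if n is a positive multiple of 64 no larger than 1024."""
    return 0 < n <= MAX_GFX_BLOCKSIZE and n % BLOCKSIZE_MULTIPLE == 0


def candidate_blocksizes():
    """All block sizes that satisfy the constraint, in ascending order."""
    return list(range(BLOCKSIZE_MULTIPLE, MAX_GFX_BLOCKSIZE + 1, BLOCKSIZE_MULTIPLE))
```

A list like the one returned by candidate_blocksizes() is the kind of range a threadblock-size sweep might cover.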

Fixed

  • Fixed bug when using BYTE_OFFSET

TransferBench v1.63.00

08 Aug 22:58
023ce41

v1.63.00

Added

  • Added gfx950, gfx1150, and gfx1151 to default GPU targets list in CMake builds

Modified

  • Removed self-GPU check for DMA engine copies
  • Switched to amdclang++ as primary compiler
  • The healthcheck preset now includes HBM testing and supports more MI3XX variants

Fixed

  • Fixed issue when using "P" memory type and specific DMA subengines
  • Fixed issue with subiteration timing reports

rocm-6.4.3

07 Aug 14:19
0fbfbdd

ROCm release v6.4.3