Pm/preseg reorder sharded #4256

Draft · wants to merge 19 commits into main

Conversation

Priya2698
Collaborator

No description provided.

@Priya2698
Collaborator Author

!test


Description

  • Added support for resharding in ReorderShardedAxisPass

  • Introduced getReshardingIdPair to identify resharding between producer and consumer (a usage sketch follows this list)

  • Updated axisIndex to work with any domain vector

  • Added tests for AllgatherLoopSplit and ReduceScatterLoopSplit
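
For orientation, here is a minimal usage sketch of the two new utilities, assembled only from the signatures quoted in the reviewer guide below. The wrapper function, header path, and the final reordering step are hypothetical and are not code from this PR.

#include <multidevice/utils.h>  // assumed location of the new declarations

// Hypothetical caller (illustration only): check whether a producer/consumer
// pair reshards, and locate the resharded axis in each loop domain.
void inspectResharding(TensorView* producer, TensorView* consumer, ValGraph& graph) {
  // getReshardingIdPair (per the quoted signature) returns the producer/consumer
  // IterDomain pair whose device sharding changes, or std::nullopt otherwise.
  std::optional<std::pair<IterDomain*, IterDomain*>> pair =
      getReshardingIdPair(producer, consumer, graph);
  if (!pair.has_value()) {
    return;
  }
  auto [p_id, c_id] = *pair;
  // axisIndex (per the quoted signature) finds an IterDomain's position within
  // any domain vector, here the loop domains of the two tensors.
  int64_t p_index = axisIndex(producer->getLoopDomain(), p_id);
  int64_t c_index = axisIndex(consumer->getLoopDomain(), c_id);
  // ReorderShardedAxisPass would then use these indices to move the sharded
  // axis where the communication lowering expects it (pass-specific logic).
  (void)p_index;
  (void)c_index;
}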


Changes walkthrough 📝

Relevant files

Enhancement

csrc/multidevice/utils.cpp: Add resharding support and utility functions (+57/-13)

  • Added axisIndex function to find axis index in a domain vector
  • Introduced getReshardingIdPair to identify resharding between producer and consumer
  • Updated isInnerResharding to use axisIndex

csrc/preseg_passes/reorder_sharded_axis.cpp: Implement resharding logic in pass (+49/-3)

  • Added IdModel and ValGraph for exact graph building
  • Implemented resharding logic in ReorderShardedAxisPass
  • Added propagateTransform function to propagate transformations

csrc/multidevice/utils.h: Add function declarations for resharding (+14/-0)

  • Added axisIndex declaration
  • Added getReshardingIdPair declaration
  • Added getInputsInTargetDomain declaration

Tests

tests/cpp/test_multidevice_communications.cpp: Add loop split tests for communications (+128/-0)

  • Added AllgatherLoopSplit test case (a minimal sketch of the loop-split sharding pattern follows this walkthrough)
  • Added ReduceScatterLoopSplit test case
  • Added AllreduceLoopSplit test case

tests/cpp/test_multidevice_lower_communication.cpp: Update communication lowering tests (+24/-17)

  • Disabled AllgatherLoopSplit_Noncontig test
  • Added ReorderShardedAxisPass to test setup
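
As referenced above, a hypothetical minimal shape of the new loop-split communication tests. This is an illustration, not one of the actual test bodies from this PR; the fixture and test name are placeholders, and the calls mirror the disabled test quoted further below.

TEST_P(LowerCollectiveTest, AllgatherLoopSplit_Sketch) {
  auto fusion = std::make_unique<Fusion>();
  FusionGuard fg(fusion.get());

  const auto d = communicator_->size();
  auto mesh = DeviceMesh::createForNumDevices(d);

  TensorView* tv0 = makeConcreteTensor({d * 3});
  TensorView* tv1 = set(tv0);
  fusion->addInput(tv0);
  fusion->addOutput(tv1);

  // Outer-split the axis by the mesh size on both tensors.
  for (auto tv : {tv0, tv1}) {
    tv->setDeviceMesh(mesh);
    tv->outer_split(0, d);
  }
  // Input is sharded across devices; output keeps the split but stays serial,
  // so the set lowers to an allgather across the mesh.
  tv0->axis(0)->parallelize(ParallelType::DIDx);
  tv1->axis(0)->parallelize(ParallelType::Serial);
}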

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Possible Issue

The function axisIndex does not correctly handle the case where find_id is not found in the domain. It should return -1 in such cases, but the current implementation returns the index of the last non-trivial dimension instead.

    int64_t axisIndex(std::vector<IterDomain*> domain, IterDomain* find_id) {
      int64_t index = 0;
      for (auto* id : domain) {
        if (id == find_id) {
          return index;
        }
        if (!id->isDeviceDim() && !id->isReduction() &&
            !id->isBroadcast()) {
          index++;
        }
      }
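
The quoted snippet is truncated at the end of the loop; one possible fix, sketched here for illustration (not code from the PR), is to return -1 explicitly once the loop finishes without finding find_id:

int64_t axisIndex(std::vector<IterDomain*> domain, IterDomain* find_id) {
  int64_t index = 0;
  for (auto* id : domain) {
    if (id == find_id) {
      return index;
    }
    // Count only dimensions that contribute to the allocation layout.
    if (!id->isDeviceDim() && !id->isReduction() && !id->isBroadcast()) {
      index++;
    }
  }
  // find_id is not part of this domain.
  return -1;
}
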
    Redundant Code

The function getReshardingIdPair checks that neither the producer nor the consumer ID is a reduction or broadcast axis. This check is redundant because it is already ensured in the previous loop.

    std::optional<std::pair<IterDomain*, IterDomain*>> getReshardingIdPair(TensorView* producer, TensorView* consumer, ValGraph& graph) {
      auto p_loop_domain = producer->getLoopDomain();
      auto c_loop_domain = consumer->getLoopDomain();
      auto p2c_map = graph.buildMapBetween(
                p_loop_domain, c_loop_domain);
    
      std::vector<std::pair<IterDomain*, IterDomain*>> resharding_id_pairs;
    
      bool has_sharding_changes = false;
    
      IterDomain* resharded_p_id = nullptr;
      IterDomain* resharded_c_id = nullptr;
    
      for (auto [p_val, c_vals] : p2c_map) {
        auto p_id = p_val->as<IterDomain>();
        auto c_id = c_vals.front()->as<IterDomain>();
    
        if (!p_id->isDeviceDim() && !c_id->isDeviceDim()) {
          continue;
        }
    
        // No reordering for reduction and broadcast axes.
        if (p_id->isReduction() || p_id->isBroadcast()) {
          continue;
        }
        if (c_id->isReduction() || c_id->isBroadcast()) {
          continue;
        }
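
If reduction and broadcast IDs are indeed filtered out before this loop, the guard could shrink to the device-dimension check alone; a hypothetical simplification of the loop body shown above:

for (auto [p_val, c_vals] : p2c_map) {
  auto* p_id = p_val->as<IterDomain>();
  auto* c_id = c_vals.front()->as<IterDomain>();
  // Keep only pairs where at least one side is device-parallel; the
  // reduction/broadcast filtering is assumed to have happened earlier.
  if (!p_id->isDeviceDim() && !c_id->isDeviceDim()) {
    continue;
  }
  // ... the rest of the resharding detection stays as in the PR.
  resharded_p_id = p_id;
  resharded_c_id = c_id;
}
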
Disabled Test

The test AllgatherLoopSplit_Noncontig is disabled. It should be enabled and verified to ensure that the new functionality works as expected.

    TEST_P(LowerCollectiveTest, DISABLED_AllgatherLoopSplit_Noncontig) {
      auto fusion = std::make_unique<Fusion>();
      FusionGuard fg(fusion.get());
    
      // ProcessGroupNCCL requires the gathered axis to be outermost.
      // We change the allocation of tensorviews to reflect this.
      // We do not modify the logical shape of the tensorview.
      // This would still require one copy on each device if the input tensor is in
      // a different layout.
      const auto d = communicator_->size();
      auto mesh = DeviceMesh::createForNumDevices(d);
    
      TensorView* tv0 = makeConcreteTensor({5, d * 3});
      tv0->outer_split(1, d);
      tv0->axis(1)->parallelize(ParallelType::DIDx);
      // tv0->reorder({{1, 0}, {2, 1}, {0, 2}});
      // tv0: Logical = [5, d*3], Loop/Allocation = [DIDx(d), 3, 5]
    
      TensorView* tv1 = set(tv0);
      tv1->outer_split(1, d);
      tv1->axis(1)->parallelize(ParallelType::Serial);
      // tv1->reorder({{1, 0}, {2, 1}, {0, 2}});
      // tv1: Logical = [5, d*3], Loop/Allocation = [Serial(d), 3, 5]
    
      for (auto tv : {tv0, tv1}) {
        tv->setDeviceMesh(mesh);
        // tv->setAllocationDomain(tv->getLoopDomain(), true);
      }
    
      fusion->addInput(tv0);
      fusion->addOutput(tv1);
    
      preseg_passes::OptimizationPass<preseg_passes::ReorderShardedAxisPass>::runPass(fusion.get());
      for (auto tv : fusion->allTvs()) {
        debug() << tv->toString() << std::endl;
        debug() << tv->getMaybeAllocationDomain() << std::endl;
      }
    
      // at::Tensor unsharded_in_tensor = at::randn({5, d * 3}, tensor_options);
      // at::Tensor in_tensor = shardTensor(unsharded_in_tensor, 1, mesh);
    
      // FusionExecutorCache executor_cache(std::move(fusion));
      // at::Tensor out_tensor =
      //     executor_cache.runFusionWithInputs({in_tensor})[0].as<at::Tensor>();
    
      // testValidate(
      //     executor_cache.fusion(),
      //     {out_tensor},
      //     {in_tensor},
      //     {unsharded_in_tensor.transpose(0, 1)},
      //     __LINE__,
      //     __FILE__);
    }
