
Conversation

@zyang6 (Contributor) commented Sep 21, 2025

This PR adds Ascend NPU support for the wan2.1 functionality, with the following key implementations:

  1. NPU Platform Integration:

    • Added dedicated platform interface for Ascend NPU in platforms/ directory
    • Implemented NPU-specific initialization and device management logic
  2. Communicator Enhancement:

    • Developed NPU-optimized communicator module for efficient data transmission
    • Added support for collective communication operations on Ascend chips
  3. End-to-End Functionality:

    • Integrated the above components to fully enable wan2.1 features on Ascend platform

This implementation allows wan2.1 to run natively on Ascend NPUs.
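For orientation, a rough sketch of the shape such a platform interface might take (the class and attribute names here are illustrative assumptions, not necessarily the PR's actual code):

```python
# Illustrative sketch only: the real interface lives in fastvideo's
# platforms/ directory and may differ. Requires the torch_npu package,
# which registers the torch.npu namespace on import.
import torch
import torch_npu  # noqa: F401


class NPUPlatform:
    """Platform hooks for Huawei Ascend NPUs (hypothetical sketch)."""

    device_name: str = "npu"
    dist_backend: str = "hccl"

    @classmethod
    def is_npu(cls) -> bool:
        return True

    @classmethod
    def set_device(cls, device: torch.device) -> None:
        torch.npu.set_device(device)

    @classmethod
    def synchronize(cls) -> None:
        torch.npu.synchronize()
```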

@zyang6 zyang6 changed the title Add wan2.1 functionality support for Ascend platform Add wan2.1 functionality support for Ascend NPU platform Sep 21, 2025
@@ -0,0 +1,74 @@
# SPDX-License-Identifier: Apache-2.0
# Adapted from https://github.com/vllm-project/vllm/blob/v0.7.3/vllm/distributed/device_communicators/cuda_communicator.py

Remove this. DeviceCommunicatorBase is also defined here, and your code is based on the NPU implementation of this class.

@@ -0,0 +1,165 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.

Delete these comments.

@@ -0,0 +1,250 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.

Delete these comments as well.

else:
    backend = "nccl"
    logger.info("Using nccl backend for CUDA platform")
# if backend == "nccl" and not current_platform.is_cuda_alike():

Remove commented-out code. I won't repeat this comment below; please do a comprehensive check for other occurrences.

    # Use gloo backend for non-CUDA platforms (MPS, CPU)
    backend = "gloo"
    logger.info("Using gloo backend for %s platform",
if backend == "nccl" or backend == "hccl":

Where is backend ever assigned "hccl"? I couldn't find any possible assignment.
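For comparison, a minimal sketch of per-platform backend selection in which "hccl" actually gets assigned (assuming the current_platform object exposes is_npu() and device_name as elsewhere in this PR):

```python
# Sketch: assign the distributed backend per platform so that the
# later `backend == "hccl"` check can actually be reached.
if current_platform.is_cuda_alike():
    backend = "nccl"
    logger.info("Using nccl backend for CUDA platform")
elif current_platform.is_npu():
    backend = "hccl"
    logger.info("Using hccl backend for NPU platform")
else:
    # Fall back to gloo for non-accelerator platforms (MPS, CPU).
    backend = "gloo"
    logger.info("Using gloo backend for %s platform",
                current_platform.device_name)
```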

if current_platform.is_cuda_alike():
    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)
if current_platform.is_npu():

This should be elif.

    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)
if current_platform.is_npu():
    device = torch.device(f"npu:{local_rank}")

This is duplicate code. Try another branching approach with less code duplication.
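One possible shape with less duplication, assuming the platform object exposes a device_name attribute and a set_device hook (a sketch, not the PR's final code):

```python
# Sketch: derive the device string from the platform instead of
# repeating the torch.device(...)/set_device(...) pair per branch.
device = torch.device(f"{current_platform.device_name}:{local_rank}")
current_platform.set_device(device)
```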

def get_attn_backend_cls(cls, selected_backend: AttentionBackendEnum | None,
                         head_size: int, dtype: torch.dtype) -> str:
    # the NPU only supports Flash Attention
    # TODO(will): Other tasks will be synchronized in subsequent updates.

Remove the TODO.

@classmethod
def is_pin_memory_available(cls):
    return True

Standardize the number of blank lines.

from fastvideo.training.training_pipeline import TrainingPipeline
from fastvideo.utils import is_vsa_available
from fastvideo.platforms import current_platform
if current_platform.is_npu():

Don't use this. It's not advisable to replace APIs wholesale without careful consideration. We need to analyze the adaptation points one by one and replace them individually.
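To illustrate the per-call-site approach being asked for here, each CUDA-specific call would be branched individually rather than swapping modules wholesale at import time (a hedged sketch; the real adaptation points depend on the surrounding code):

```python
# Sketch: adapt one call site at a time instead of globally
# replacing APIs behind a platform check at import time.
if current_platform.is_npu():
    import torch_npu  # noqa: F401  (registers the torch.npu namespace)
    torch.npu.empty_cache()
else:
    torch.cuda.empty_cache()
```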

@SolitaryThinker (Collaborator) left a comment

Hi, thanks for your contribution! I've left some comments. Please let me know when this PR is ready for CI tests. Meanwhile you can install/run our pre-commit linters using the following commands:

# Linting, formatting and static type checking
pre-commit install --hook-type pre-commit --hook-type commit-msg

# You can manually run pre-commit with
pre-commit run --all-files

.gitignore Outdated
preprocess_output_text/
=======
log/
>>>>>>> 1a6592a4 (add: npu platform)
Collaborator:

fix please

"hcclComm_t",
"aclrtStream_t",
"buffer_type",
] No newline at end of file
Collaborator:

add newline character to end of last line please

    torch.npu.reset_peak_memory_stats()

@classmethod
def get_attn_backend_cls(cls, selected_backend: AttentionBackendEnum | None,
Collaborator:

Does the Ascend NPU support all of these attention backends? It would be good to remove any that aren't supported yet and fall back to torch SDPA.

Contributor Author:

Okay, currently only SDPA is supported, and modifications have been made.
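A minimal sketch of an SDPA-only selection with an explicit fallback (the backend class path and the SDPA enum member are illustrative assumptions, not necessarily the repo's actual names):

```python
@classmethod
def get_attn_backend_cls(cls, selected_backend: AttentionBackendEnum | None,
                         head_size: int, dtype: torch.dtype) -> str:
    # Sketch: the Ascend NPU currently supports only torch SDPA, so
    # any other requested backend falls back to it with a warning.
    if selected_backend not in (None, AttentionBackendEnum.SDPA):
        logger.warning("%s is not supported on NPU; falling back to SDPA.",
                       selected_backend)
    # The import path below is illustrative, not the repo's actual one.
    return "fastvideo.attention.backends.sdpa.SDPABackend"
```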


def all_reduce(self, input_, op: torch.distributed.ReduceOp | None = None):
    pyhccl_comm = self.pyhccl_comm
    assert pyhccl_comm is not None

Add an assertion failure message.
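For example (the message wording is just a suggestion):

```python
pyhccl_comm = self.pyhccl_comm
assert pyhccl_comm is not None, (
    "pyhccl communicator is not initialized; it must be created "
    "before calling all_reduce")
```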

@zyang6 (Contributor Author) commented Sep 23, 2025

> Hi, thanks for your contribution! I've left some comments. Please let me know when this PR is ready for CI tests. Meanwhile you can install/run our pre-commit linters using the following commands:
>
> # Linting, formatting and static type checking
> pre-commit install --hook-type pre-commit --hook-type commit-msg
>
> # You can manually run pre-commit with
> pre-commit run --all-files

Thank you for your review and feedback! I'll address all the comments promptly and let you know once the PR is ready for CI tests.
I'll also run the pre-commit linters using the provided commands to ensure code quality before finalizing the changes.

try:
    self.hccl = HCCLLibrary(library_path)
except Exception:
    # disable because of missing HCCL library

add error message

Contributor Author:

ok, the error message has been added

stream = current_stream()
if src == self.rank:
    buffer = buffer_type(tensor.data_ptr())
else:

The if and else branches contain the same code.
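Since both branches wrap tensor.data_ptr() the same way, the branch can simply be dropped (a sketch assuming the elided else body was indeed identical):

```python
stream = current_stream()
# Both the source and destination ranks build the buffer from the
# tensor's data pointer, so no src-rank branch is needed here.
buffer = buffer_type(tensor.data_ptr())
```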

@zyang6 zyang6 requested a review from tardis-key September 27, 2025 09:47
collate_fn=passthrough,
num_workers=num_data_workers,
pin_memory=True,
pin_memory_device = current_platform.device_name,

No spaces around the equals sign for keyword arguments.
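i.e., PEP 8 keyword-argument style:

```python
pin_memory_device=current_platform.device_name,
```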


try:
    self.hccl = HCCLLibrary(library_path)
except Exception:
    print("disable hccl because of missing HCCL library")

use logger.error or warning instead
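A sketch of the logger-based version (assumes this module already has a logger; how the disabled state is recorded downstream is also an assumption):

```python
try:
    self.hccl = HCCLLibrary(library_path)
except Exception as e:
    # Log why HCCL is being disabled instead of printing to stdout.
    logger.warning("Disabling HCCL: failed to load the HCCL library (%s)", e)
    self.hccl = None  # sketch: callers would need to handle this case
```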

@zyang6 zyang6 marked this pull request as ready for review September 28, 2025 02:12
logger.info("NPU is available")
else:
logger.info("NPU is not available")
except Exception as e:

Please catch a more specific exception type if possible.
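For instance, narrowing the except clause to the failures NPU detection can plausibly raise (the exact exception types torch_npu surfaces are an assumption here):

```python
try:
    import torch_npu  # noqa: F401  (registers the torch.npu namespace)
    if torch.npu.is_available():
        logger.info("NPU is available")
    else:
        logger.info("NPU is not available")
except (ImportError, RuntimeError) as e:
    # Sketch: catch only the expected failures -- torch_npu missing,
    # or driver/runtime initialization errors.
    logger.warning("NPU detection failed: %s", e)
```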

    raise NotImplementedError

@classmethod
def get_torch_device(cls) -> Any:

Can we replace this "Any" with a more specific type? If we can't, just drop the annotation for this function.
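One option, if the function returns the torch.npu module object, is to annotate it as a module rather than Any (a sketch under that assumption):

```python
from types import ModuleType


@classmethod
def get_torch_device(cls) -> ModuleType:
    # Sketch: torch.npu is a module namespace registered by torch_npu,
    # so ModuleType is a more precise annotation than Any.
    import torch_npu  # noqa: F401
    return torch.npu
```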

@SolitaryThinker SolitaryThinker added the go Trigger Buildkite CI label Oct 1, 2025
@SolitaryThinker (Collaborator):

Hi, please run pre-commit and address any lint errors

@SolitaryThinker SolitaryThinker merged commit 87489f0 into hao-ai-lab:main Oct 9, 2025
1 check failed
qimcis pushed a commit to qimcis/FastVideo that referenced this pull request Oct 30, 2025