
Conversation

@zyang6 (Contributor) commented Sep 21, 2025

This PR adds Ascend NPU support for the wan2.1 functionality, with the following key implementations:

  1. NPU Platform Integration:

    • Added dedicated platform interface for Ascend NPU in platforms/ directory
    • Implemented NPU-specific initialization and device management logic
  2. Communicator Enhancement:

    • Developed NPU-optimized communicator module for efficient data transmission
    • Added support for collective communication operations on Ascend chips
  3. End-to-End Functionality:

    • Integrated the above components to fully enable wan2.1 features on Ascend platform

This implementation allows wan2.1 to run natively on Ascend NPUs.
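For orientation, a rough sketch of the shape such a platform interface might take (the class and attribute names here are illustrative assumptions, not necessarily the PR's actual code):

```python
# Illustrative sketch only: the real interface lives in fastvideo's
# platforms/ directory and may differ. Requires the torch_npu package,
# which registers the torch.npu namespace on import.
import torch
import torch_npu  # noqa: F401


class NPUPlatform:
    """Platform hooks for Huawei Ascend NPUs (hypothetical sketch)."""

    device_name: str = "npu"
    dist_backend: str = "hccl"

    @classmethod
    def is_npu(cls) -> bool:
        return True

    @classmethod
    def set_device(cls, device: torch.device) -> None:
        torch.npu.set_device(device)

    @classmethod
    def synchronize(cls) -> None:
        torch.npu.synchronize()
```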

@zyang6 zyang6 changed the title Add wan2.1 functionality support for Ascend platform Add wan2.1 functionality support for Ascend NPU platform Sep 21, 2025
@@ -0,0 +1,74 @@
# SPDX-License-Identifier: Apache-2.0
# Adapted from https://github.com/vllm-project/vllm/blob/v0.7.3/vllm/distributed/device_communicators/cuda_communicator.py

Remove this. DeviceCommunicatorBase is also defined here, and your code is based on the NPU implementation of this class.

@@ -0,0 +1,165 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.

Delete these comments.

@@ -0,0 +1,250 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.

Delete these comments as well.

else:
    backend = "nccl"
    logger.info("Using nccl backend for CUDA platform")
# if backend == "nccl" and not current_platform.is_cuda_alike():

Remove commented-out code. I won't repeat this comment below; please do a comprehensive check for other occurrences.

    # Use gloo backend for non-CUDA platforms (MPS, CPU)
    backend = "gloo"
    logger.info("Using gloo backend for %s platform",
if backend == "nccl" or backend == "hccl":

Where is backend ever assigned "hccl"? I couldn't find any possible assignment.
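For comparison, a minimal sketch of per-platform backend selection in which "hccl" actually gets assigned (assuming the current_platform object exposes is_npu() and device_name as elsewhere in this PR):

```python
# Sketch: assign the distributed backend per platform so that the
# later `backend == "hccl"` check can actually be reached.
if current_platform.is_cuda_alike():
    backend = "nccl"
    logger.info("Using nccl backend for CUDA platform")
elif current_platform.is_npu():
    backend = "hccl"
    logger.info("Using hccl backend for NPU platform")
else:
    # Fall back to gloo for non-accelerator platforms (MPS, CPU).
    backend = "gloo"
    logger.info("Using gloo backend for %s platform",
                current_platform.device_name)
```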

if current_platform.is_cuda_alike():
    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)
if current_platform.is_npu():

This should be elif.

    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)
if current_platform.is_npu():
    device = torch.device(f"npu:{local_rank}")

This is duplicate code. Try another branching approach with less code duplication.
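One possible shape with less duplication, assuming the platform object exposes a device_name attribute and a set_device hook (a sketch, not the PR's final code):

```python
# Sketch: derive the device string from the platform instead of
# repeating the torch.device(...)/set_device(...) pair per branch.
device = torch.device(f"{current_platform.device_name}:{local_rank}")
current_platform.set_device(device)
```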

def get_attn_backend_cls(cls, selected_backend: AttentionBackendEnum | None,
                         head_size: int, dtype: torch.dtype) -> str:
    # the NPU only supports Flash Attention
    # TODO(will): Other tasks will be synchronized in subsequent updates.

Remove the TODO.

@classmethod
def is_pin_memory_available(cls):
    return True

Standardize the number of blank lines.

from fastvideo.training.training_pipeline import TrainingPipeline
from fastvideo.utils import is_vsa_available
from fastvideo.platforms import current_platform
if current_platform.is_npu():

Don't use this. It's not advisable to replace APIs wholesale without careful consideration. We need to analyze the adaptation points one by one and replace them individually.
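To illustrate the per-call-site approach being asked for here, each CUDA-specific call would be branched individually rather than swapping modules wholesale at import time (a hedged sketch; the real adaptation points depend on the surrounding code):

```python
# Sketch: adapt one call site at a time instead of globally
# replacing APIs behind a platform check at import time.
if current_platform.is_npu():
    import torch_npu  # noqa: F401  (registers the torch.npu namespace)
    torch.npu.empty_cache()
else:
    torch.cuda.empty_cache()
```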

@SolitaryThinker (Collaborator) left a comment

Hi, thanks for your contribution! I've left some comments. Please let me know when this PR is ready for CI tests. Meanwhile you can install/run our pre-commit linters using the following commands:

# Linting, formatting and static type checking
pre-commit install --hook-type pre-commit --hook-type commit-msg

# You can manually run pre-commit with
pre-commit run --all-files

.gitignore Outdated
preprocess_output_text/
=======
log/
>>>>>>> 1a6592a4 (add: npu platform)
Collaborator:

fix please

"hcclComm_t",
"aclrtStream_t",
"buffer_type",
] No newline at end of file
Collaborator:

add newline character to end of last line please

    torch.npu.reset_peak_memory_stats()

@classmethod
def get_attn_backend_cls(cls, selected_backend: AttentionBackendEnum | None,
Collaborator:

Does the Ascend NPU support all of these attention backends? It would be good to remove any that aren't supported yet and fall back to torch SDPA.

Contributor Author:

Okay, currently only SDPA is supported, and modifications have been made.
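A minimal sketch of an SDPA-only selection with an explicit fallback (the backend class path and the SDPA enum member are illustrative assumptions, not necessarily the repo's actual names):

```python
@classmethod
def get_attn_backend_cls(cls, selected_backend: AttentionBackendEnum | None,
                         head_size: int, dtype: torch.dtype) -> str:
    # Sketch: the Ascend NPU currently supports only torch SDPA, so
    # any other requested backend falls back to it with a warning.
    if selected_backend not in (None, AttentionBackendEnum.SDPA):
        logger.warning("%s is not supported on NPU; falling back to SDPA.",
                       selected_backend)
    # The import path below is illustrative, not the repo's actual one.
    return "fastvideo.attention.backends.sdpa.SDPABackend"
```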


def all_reduce(self, input_, op: torch.distributed.ReduceOp | None = None):
    pyhccl_comm = self.pyhccl_comm
    assert pyhccl_comm is not None

Add an assertion failure message.
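For example (the message wording is just a suggestion):

```python
pyhccl_comm = self.pyhccl_comm
assert pyhccl_comm is not None, (
    "pyhccl communicator is not initialized; it must be created "
    "before calling all_reduce")
```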

@zyang6 (Contributor Author) commented Sep 23, 2025

> Hi, thanks for your contribution! I've left some comments. Please let me know when this PR is ready for CI tests. Meanwhile you can install/run our pre-commit linters using the following commands:
>
> # Linting, formatting and static type checking
> pre-commit install --hook-type pre-commit --hook-type commit-msg
>
> # You can manually run pre-commit with
> pre-commit run --all-files

Thank you for your review and feedback! I'll address all the comments promptly and let you know once the PR is ready for CI tests.
I'll also run the pre-commit linters using the provided commands to ensure code quality before finalizing the changes.

try:
    self.hccl = HCCLLibrary(library_path)
except Exception:
    # disable because of missing HCCL library

add error message

Contributor Author:

ok, the error message has been added

stream = current_stream()
if src == self.rank:
    buffer = buffer_type(tensor.data_ptr())
else:

The if and else branches contain the same code.
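Since both branches wrap tensor.data_ptr() the same way, the branch can simply be dropped (a sketch assuming the elided else body was indeed identical):

```python
stream = current_stream()
# Both the source and destination ranks build the buffer from the
# tensor's data pointer, so no src-rank branch is needed here.
buffer = buffer_type(tensor.data_ptr())
```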

@zyang6 zyang6 requested a review from tardis-key September 27, 2025 09:47
collate_fn=passthrough,
num_workers=num_data_workers,
pin_memory=True,
pin_memory_device = current_platform.device_name,

No spaces around the equals sign for keyword arguments.
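i.e., PEP 8 keyword-argument style:

```python
pin_memory_device=current_platform.device_name,
```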


try:
    self.hccl = HCCLLibrary(library_path)
except Exception:
    print("disable hccl because of missing HCCL library")

use logger.error or warning instead
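A sketch of the logger-based version (assumes this module already has a logger; how the disabled state is recorded downstream is also an assumption):

```python
try:
    self.hccl = HCCLLibrary(library_path)
except Exception as e:
    # Log why HCCL is being disabled instead of printing to stdout.
    logger.warning("Disabling HCCL: failed to load the HCCL library (%s)", e)
    self.hccl = None  # sketch: callers would need to handle this case
```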

@zyang6 zyang6 marked this pull request as ready for review September 28, 2025 02:12
logger.info("NPU is available")
else:
logger.info("NPU is not available")
except Exception as e:

Please catch a more specific exception type if possible.
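For instance, narrowing the except clause to the failures NPU detection can plausibly raise (the exact exception types torch_npu surfaces are an assumption here):

```python
try:
    import torch_npu  # noqa: F401  (registers the torch.npu namespace)
    if torch.npu.is_available():
        logger.info("NPU is available")
    else:
        logger.info("NPU is not available")
except (ImportError, RuntimeError) as e:
    # Sketch: catch only the expected failures -- torch_npu missing,
    # or driver/runtime initialization errors.
    logger.warning("NPU detection failed: %s", e)
```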

    raise NotImplementedError

@classmethod
def get_torch_device(cls) -> Any:

Can we replace this "Any" with a more specific type? If we can't, just drop the annotation for this function.
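One option, if the function returns the torch.npu module object, is to annotate it as a module rather than Any (a sketch under that assumption):

```python
from types import ModuleType


@classmethod
def get_torch_device(cls) -> ModuleType:
    # Sketch: torch.npu is a module namespace registered by torch_npu,
    # so ModuleType is a more precise annotation than Any.
    import torch_npu  # noqa: F401
    return torch.npu
```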

@SolitaryThinker SolitaryThinker added the go Trigger Buildkite CI label Oct 1, 2025
@SolitaryThinker (Collaborator):

Hi, please run pre-commit and address any lint errors

@SolitaryThinker SolitaryThinker merged commit 87489f0 into hao-ai-lab:main Oct 9, 2025
1 check failed
qimcis pushed a commit to qimcis/FastVideo that referenced this pull request Oct 30, 2025