
fix: replace inf with max or min finite value, then do softmax #3059

Open · wants to merge 1 commit into main
Conversation

KenForever1

Motivation

When I deploy a large model for inference, an inf value can appear in the scores, and calling softmax on them produces nan values. This can cause errors downstream, for example:

import torch
import torch_npu

# inf_tensor = torch.full((1, 10), float('inf'), dtype=torch.float16)
# or
inf_tensor = torch.tensor([[1, 2, 3, 4, float('inf')]], dtype=torch.float16)

inf_tensor = inf_tensor.npu()
print(inf_tensor)

# softmax over a row containing inf yields nan
# res_nan = inf_tensor.softmax(1)
# print(res_nan)

# fix by replacing inf with the max finite value before softmax
res = _softmax_scores(inf_tensor)
print(res)

# without the fix, sampling from the nan probabilities raises an error
# sampled_index = torch.multinomial(res_nan,
#                                   num_samples=1,
#                                   replacement=True)
# print(sampled_index)

Modification

I added a _softmax_scores function that wraps softmax: if the scores contain inf, it replaces the inf entries with the maximum or minimum finite value, then applies softmax.
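A minimal sketch of what the wrapper could look like (the exact code is in the review hunks below; the use of torch.finfo for the finite bounds is my assumption here, not necessarily the PR's exact choice):

import torch

def _softmax_scores(scores: torch.Tensor):
    """softmax scores."""
    # if score has inf, replace it with max or min finite value, then do softmax
    if torch.isinf(scores).any():
        finfo = torch.finfo(scores.dtype)
        device = scores.device
        scores = torch.where(scores == float('inf'),
                             torch.tensor(finfo.max, dtype=scores.dtype, device=device),
                             scores)
        scores = torch.where(scores == float('-inf'),
                             torch.tensor(finfo.min, dtype=scores.dtype, device=device),
                             scores)
    return scores.softmax(dim=-1)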

lvhan028 requested a review from grimoire on January 21, 2025 04:43
@lvhan028 (Collaborator) commented Jan 21, 2025

Seems that this issue happens on NPU devices.
Would it be better if this patch were applied in dlinfer?
cc @jinminxi104

@jinminxi104 (Collaborator)

We faced this issue when the temperature was set to 0. Could you check the value of temperature in your case?

@KenForever1 (Author)

We faced this issue when the temperature was set to 0. Could you check the value of temperature in your case?

When I encountered this problem, I had the following configuration:

(Pdb) p gen_config
GenerationConfig(n=1, max_new_tokens=1024, do_sample=True, top_p=1.0, top_k=40, min_p=0.0, temperature=0.0, repetition_penalty=1.0, ignore_eos=False, random_seed=None, stop_words=None, bad_words=None, stop_token_ids=None, bad_token_ids=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None, response_format=None, logits_processors=None)

def _softmax_scores(scores: torch.Tensor):
    """softmax scores."""
    # if score has inf, replace it with max or min finite value, then do softmax
    if torch.isinf(scores).any():
Collaborator:
any() would synchronize the stream, and harm the performance.


        device = scores.device

        scores = torch.where(scores == float('inf'),
                             torch.tensor(max_finite_value, dtype=dtype, device=device),
                             scores)
Collaborator:
clamp should be better.
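For reference, a clamp-based variant along the lines of this suggestion (an illustration only, not code from this PR) would avoid both the torch.where calls and the stream-synchronizing any():

import torch

def _softmax_scores_clamped(scores: torch.Tensor) -> torch.Tensor:
    # hypothetical alternative: clamp to the finite range of the dtype,
    # which maps +inf/-inf to the max/min finite value without a host-side isinf().any() check
    finfo = torch.finfo(scores.dtype)
    return scores.clamp(min=finfo.min, max=finfo.max).softmax(dim=-1)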

@jinminxi104 (Collaborator)

temperature=0.0

Please set a non-zero temperature (check _process_temperature_(scores: torch.Tensor, temperature: torch.Tensor) in lmdeploy and set a proper value of temperature).

Checking for inf reduces performance, in my opinion.

@KenForever1 (Author)

temperature=0.0

Please set a non-zero temperature (check _process_temperature_ in lmdeploy and set a proper value of temperature).
Checking for inf reduces performance, in my opinion.

You're right: when I set temperature=0, it is actually clamped to temperature=1e-6, and after this function is called the scores become inf.
I set temperature=0.1 and the result is correct. But theoretically, what temperature should be set so that inference on the same prompt stays deterministic and the result does not change?
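For illustration (not code from lmdeploy), this is why a clamped temperature of 1e-6 produces inf scores in float16: dividing the logits by such a small temperature overflows the fp16 range (max ~65504), and softmax over a row of inf then yields nan:

import torch

scores = torch.tensor([[1.0, 2.0, 3.0]], dtype=torch.float16)
temperature = 1e-6  # what temperature=0 is clamped to

scaled = scores / temperature            # 1e6, 2e6, 3e6 overflow fp16 -> inf
print(scaled)                            # tensor([[inf, inf, inf]], dtype=torch.float16)
print(scaled.float().softmax(dim=1))     # tensor([[nan, nan, nan]])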
