
fix: replace inf with max or min finite value, then do softmax #3059

Open · wants to merge 1 commit into main
Conversation

KenForever1

Motivation

When I deploy a large model for inference, an inf value can appear in the scores, and calling softmax on them produces nan values. This can cause errors downstream, for example:

import torch
import torch_npu

# inf_tensor = torch.full((1, 10), float('inf'), dtype=torch.float16)
# or
inf_tensor = torch.tensor([[1, 2, 3, 4, float('inf')]], dtype=torch.float16)

inf_tensor = inf_tensor.npu()
print(inf_tensor)

# softmax over a row containing inf yields nan
# res_nan = inf_tensor.softmax(1)
# print(res_nan)

# fix by replacing inf with the max finite value before softmax
res = _softmax_scores(inf_tensor)
print(res)

# without the fix, sampling from the nan probabilities raises an error
# sampled_index = torch.multinomial(res_nan,
#                                   num_samples=1,
#                                   replacement=True)
# print(sampled_index)

Modification

I added a _softmax_scores function that wraps softmax: if the scores contain inf, it replaces the inf entries with the maximum or minimum finite value, then applies softmax.
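A minimal sketch of what the wrapper could look like (the exact code is in the review hunks below; the use of torch.finfo for the finite bounds is my assumption here, not necessarily the PR's exact choice):

import torch

def _softmax_scores(scores: torch.Tensor):
    """softmax scores."""
    # if score has inf, replace it with max or min finite value, then do softmax
    if torch.isinf(scores).any():
        finfo = torch.finfo(scores.dtype)
        device = scores.device
        scores = torch.where(scores == float('inf'),
                             torch.tensor(finfo.max, dtype=scores.dtype, device=device),
                             scores)
        scores = torch.where(scores == float('-inf'),
                             torch.tensor(finfo.min, dtype=scores.dtype, device=device),
                             scores)
    return scores.softmax(dim=-1)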

lvhan028 requested a review from grimoire on January 21, 2025 04:43
@lvhan028 (Collaborator) commented Jan 21, 2025

Seems that this issue happens on NPU devices.
Would it be better if this patch were applied in dlinfer?
cc @jinminxi104

@jinminxi104 (Collaborator)

We faced this issue when the temperature was set to 0. Could you check the value of temperature in your case?

@KenForever1 (Author)

We faced this issue when the temperature was set to 0. Could you check the value of temperature in your case?

When I encountered this problem, I had the following configuration:

(Pdb) p gen_config
GenerationConfig(n=1, max_new_tokens=1024, do_sample=True, top_p=1.0, top_k=40, min_p=0.0, temperature=0.0, repetition_penalty=1.0, ignore_eos=False, random_seed=None, stop_words=None, bad_words=None, stop_token_ids=None, bad_token_ids=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None, response_format=None, logits_processors=None)

def _softmax_scores(scores: torch.Tensor):
    """softmax scores."""
    # if score has inf, replace it with max or min finite value, then do softmax
    if torch.isinf(scores).any():
Collaborator:
any() would synchronize the stream, and harm the performance.


        device = scores.device

        scores = torch.where(scores == float('inf'),
                             torch.tensor(max_finite_value, dtype=dtype, device=device),
                             scores)
Collaborator:
clamp should be better.
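For reference, a clamp-based variant along the lines of this suggestion (an illustration only, not code from this PR) would avoid both the torch.where calls and the stream-synchronizing any():

import torch

def _softmax_scores_clamped(scores: torch.Tensor) -> torch.Tensor:
    # hypothetical alternative: clamp to the finite range of the dtype,
    # which maps +inf/-inf to the max/min finite value without a host-side isinf().any() check
    finfo = torch.finfo(scores.dtype)
    return scores.clamp(min=finfo.min, max=finfo.max).softmax(dim=-1)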

@jinminxi104 (Collaborator)

temperature=0.0

Please set a non-zero temperature (check _process_temperature_(scores: torch.Tensor, temperature: torch.Tensor) in lmdeploy and set a proper value of temperature).

Checking for inf reduces performance, in my opinion.

@KenForever1 (Author)

temperature=0.0

Please set a non-zero temperature (check _process_temperature_ in lmdeploy and set a proper value of temperature).
Checking for inf reduces performance, in my opinion.

You're right: when I set temperature=0, it is actually clamped to temperature=1e-6, and after this function is called the scores become inf.
I set temperature=0.1 and the result is correct. But theoretically, what temperature should be set so that inference on the same prompt stays deterministic and the result does not change?
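For illustration (not code from lmdeploy), this is why a clamped temperature of 1e-6 produces inf scores in float16: dividing the logits by such a small temperature overflows the fp16 range (max ~65504), and softmax over a row of inf then yields nan:

import torch

scores = torch.tensor([[1.0, 2.0, 3.0]], dtype=torch.float16)
temperature = 1e-6  # what temperature=0 is clamped to

scaled = scores / temperature            # 1e6, 2e6, 3e6 overflow fp16 -> inf
print(scaled)                            # tensor([[inf, inf, inf]], dtype=torch.float16)
print(scaled.float().softmax(dim=1))     # tensor([[nan, nan, nan]])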
