Hover translation: incorrect indentation in translated hover documentation #256
Labels: bug (Something isn't working)
Comments
Please provide the environment needed to reproduce this.
Operating system: macOS Sequoia 15.0.1
"""
Converts a list of dictionaries with `"role"` and `"content"` keys to a list of token
ids. This method is intended for use with chat models, and will read the tokenizer's chat_template attribute to
determine the format and control tokens to use when converting.
Args:
conversation (Union[List[Dict[str, str]], List[List[Dict[str, str]]]]): A list of dicts
with "role" and "content" keys, representing the chat history so far.
tools (`List[Dict]`, *optional*):
A list of tools (callable functions) that will be accessible to the model. If the template does not
support function calling, this argument will have no effect. Each tool should be passed as a JSON Schema,
giving the name, description and argument types for the tool. See our
[chat templating guide](https://huggingface.co/docs/transformers/main/en/chat_templating#automated-function-conversion-for-tool-use)
for more information.
documents (`List[Dict[str, str]]`, *optional*):
A list of dicts representing documents that will be accessible to the model if it is performing RAG
(retrieval-augmented generation). If the template does not support RAG, this argument will have no
effect. We recommend that each document should be a dict containing "title" and "text" keys. Please
see the RAG section of the [chat templating guide](https://huggingface.co/docs/transformers/main/en/chat_templating#arguments-for-RAG)
for examples of passing documents with chat templates.
chat_template (`str`, *optional*):
A Jinja template to use for this conversion. It is usually not necessary to pass anything to this
argument, as the model's template will be used by default.
add_generation_prompt (bool, *optional*):
If this is set, a prompt with the token(s) that indicate
the start of an assistant message will be appended to the formatted output. This is useful when you want to generate a response from the model.
Note that this argument will be passed to the chat template, and so it must be supported in the
template for this argument to have any effect.
continue_final_message (bool, *optional*):
If this is set, the chat will be formatted so that the final
message in the chat is open-ended, without any EOS tokens. The model will continue this message
rather than starting a new one. This allows you to "prefill" part of
the model's response for it. Cannot be used at the same time as `add_generation_prompt`.
tokenize (`bool`, defaults to `True`):
Whether to tokenize the output. If `False`, the output will be a string.
padding (`bool`, defaults to `False`):
Whether to pad sequences to the maximum length. Has no effect if tokenize is `False`.
truncation (`bool`, defaults to `False`):
Whether to truncate sequences at the maximum length. Has no effect if tokenize is `False`.
max_length (`int`, *optional*):
Maximum length (in tokens) to use for padding or truncation. Has no effect if tokenize is `False`. If
not specified, the tokenizer's `max_length` attribute will be used as a default.
return_tensors (`str` or [`~utils.TensorType`], *optional*):
If set, will return tensors of a particular framework. Has no effect if tokenize is `False`. Acceptable
values are:
- `'tf'`: Return TensorFlow `tf.Tensor` objects.
- `'pt'`: Return PyTorch `torch.Tensor` objects.
- `'np'`: Return NumPy `np.ndarray` objects.
- `'jax'`: Return JAX `jnp.ndarray` objects.
return_dict (`bool`, defaults to `False`):
Whether to return a dictionary with named outputs. Has no effect if tokenize is `False`.
tokenizer_kwargs (`Dict[str: Any]`, *optional*): Additional kwargs to pass to the tokenizer.
return_assistant_tokens_mask (`bool`, defaults to `False`):
Whether to return a mask of the assistant generated tokens. For tokens generated by the assistant,
the mask will contain 1. For user and system tokens, the mask will contain 0.
This functionality is only available for chat templates that support it via the `{% generation %}` keyword.
**kwargs: Additional kwargs to pass to the template renderer. Will be accessible by the chat template.
Returns:
`Union[List[int], Dict]`: A list of token ids representing the tokenized chat so far, including control tokens. This
output is ready to pass to the model, either directly or via methods like `generate()`. If `return_dict` is
set, will return a dict of tokenizer outputs instead.
"""
dart?
Python 3.12
For long documents, line breaks are turned into indentation, which makes the translated text hard to read.
![image](https://private-user-images.githubusercontent.com/151734202/402833064-1b34487f-a4fc-414b-b51c-66c7d9f5c356.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0NjEzMTMsIm5iZiI6MTczOTQ2MTAxMywicGF0aCI6Ii8xNTE3MzQyMDIvNDAyODMzMDY0LTFiMzQ0ODdmLWE0ZmMtNDE0Yi1iNTFjLTY2YzdkOWY1YzM1Ni5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjEzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxM1QxNTM2NTNaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zNzFhNTYwNzA4YjIxMWY2ZThlOWUwZDRkMzkyOTJmYTFhNDFmNzEwYzUxZWJiYmJlNmEwMzI1ODA0ZDRkN2Y1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.9cdjEbEsNBBgRTwvVAn7bOx7ifz58pJStFXN8bZ1s38)
![image](https://private-user-images.githubusercontent.com/151734202/402833185-9794ba70-afd7-4fc1-9bf1-2affe5430370.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0NjEzMTMsIm5iZiI6MTczOTQ2MTAxMywicGF0aCI6Ii8xNTE3MzQyMDIvNDAyODMzMTg1LTk3OTRiYTcwLWFmZDctNGZjMS05YmYxLTJhZmZlNTQzMDM3MC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjEzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxM1QxNTM2NTNaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0yMWY1MjJjZmQ3NGI5MjM4OTkyNjFiNjdmMjhkMzc3Nzg5OTZiOWI2ZDg2NzE0ZTEwODVhNDljMDA1OGViY2M3JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.luTP4mxHwtgTuNoSrkSIjK4r7wa2U87TAOGdirrRuAc)
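A minimal sketch of layout-preserving translation, in Python since the reported sample is a Python docstring. It assumes the hover text arrives as the raw docstring and that some translation callable is available; `translate_preserving_layout` and the `str.upper` stand-in for a translator are hypothetical, not this extension's actual code. It uses the standard library's `inspect.cleandoc` to remove the docstring's common indentation, then translates each line's text while re-attaching that line's leading spaces and keeping the line break:

```python
import inspect
from typing import Callable


def translate_preserving_layout(raw: str, translate: Callable[[str], str]) -> str:
    """Hypothetical helper: translate a docstring without losing its line breaks."""
    # cleandoc strips the first line's leading whitespace, removes the uniform
    # indentation from the remaining lines, and drops leading/trailing blank lines.
    lines = inspect.cleandoc(raw).split("\n")
    out = []
    for line in lines:
        stripped = line.lstrip(" ")
        indent = line[: len(line) - len(stripped)]
        # Translate only the text, then re-attach the original indentation, so a
        # line break stays a line break instead of becoming extra indentation.
        out.append((indent + translate(stripped)) if stripped else "")
    return "\n".join(out)


sample = '''
    Converts a list of dictionaries with "role" and "content" keys to a list of token
    ids.
    Args:
        tools (`List[Dict]`, *optional*):
            A list of tools.
'''

if __name__ == "__main__":
    # str.upper stands in for a real translator (assumption for illustration).
    print(translate_preserving_layout(sample, str.upper))
```

Translating line by line keeps the layout exact but can split a sentence that wraps across lines (as "token" / "ids." does here); grouping consecutive lines at the same indentation level before translating would be one possible refinement.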