
Commit adff279

Remove references to video_load_backend and video_fps for processor
Signed-off-by: cyy <[email protected]>
Parent: 44b3888

File tree

7 files changed (+9, −18 lines)


docs/source/en/chat_templating_multimodal.md

Lines changed: 2 additions & 8 deletions
@@ -195,10 +195,6 @@ messages = [
 
 Pass `messages` to [`~ProcessorMixin.apply_chat_template`] to tokenize the input content. There are a few extra parameters to include in [`~ProcessorMixin.apply_chat_template`] that control the sampling process.
 
-The `video_load_backend` parameter refers to a specific framework to load a video. It supports [PyAV](https://pyav.basswood-io.com/docs/stable/), [Decord](https://github.com/dmlc/decord), [OpenCV](https://github.com/opencv/opencv), and [torchvision](https://pytorch.org/vision/stable/index.html).
-
-The examples below use Decord as the backend because it is a bit faster than PyAV.
-
 <hfoptions id="sampling">
 <hfoption id="fixed number of frames">
 
@@ -213,7 +209,6 @@ processed_chat = processor.apply_chat_template(
     return_dict=True,
     return_tensors="pt",
     num_frames=32,
-    video_load_backend="decord",
 )
 print(processed_chat.keys())
 ```
@@ -223,16 +218,15 @@ These inputs are now ready to be used in [`~GenerationMixin.generate`].
 </hfoption>
 <hfoption id="fps">
 
-For longer videos, it may be better to sample more frames for better representation with the `video_fps` parameter. This determines how many frames per second to extract. As an example, if a video is 10 seconds long and `video_fps=2`, then the model samples 20 frames. In other words, 2 frames are uniformly sampled every 10 seconds.
+For longer videos, it may be better to sample more frames for a better representation with the `fps` parameter. This determines how many frames per second to extract. As an example, if a video is 10 seconds long and `fps=2`, then the model samples 20 frames; in other words, 2 frames are uniformly sampled from each second.
 
 ```py
 processed_chat = processor.apply_chat_template(
     messages,
     add_generation_prompt=True,
     tokenize=True,
     return_dict=True,
-    video_fps=16,
-    video_load_backend="decord",
+    fps=16,
 )
 print(processed_chat.keys())
 ```
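
The sampling arithmetic in the rewritten paragraph is easy to sanity-check. A minimal sketch in Python, assuming uniform sampling; `frames_to_sample` is a hypothetical helper for illustration, not a transformers API:

```py
# Sketch of the uniform-sampling arithmetic described above.
# `frames_to_sample` is a hypothetical helper, not part of transformers.
def frames_to_sample(duration_s: float, fps: float) -> int:
    # `fps` frames are taken uniformly from each second of video
    return int(duration_s * fps)

assert frames_to_sample(10, 2) == 20  # 10 s at fps=2 -> 20 frames
```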

docs/source/en/model_doc/qwen2_5_omni.md

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ inputs = processor.apply_chat_template(
     tokenize=True,
     return_dict=True,
     return_tensors="pt",
-    video_fps=1,
+    fps=1,
 
     # kwargs to be passed to `Qwen2-5-OmniProcessor`
     padding=True,
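
For downstream callers, the migration is a one-line rename. A minimal sketch, assuming an illustrative checkpoint and conversation (neither is taken from this commit):

```py
from transformers import AutoProcessor

# Illustrative checkpoint; any video-capable processor takes the same kwarg.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

conversation = [
    {"role": "user", "content": [
        {"type": "video", "path": "sample_video.mp4"},  # hypothetical local file
        {"type": "text", "text": "Describe this video."},
    ]},
]

inputs = processor.apply_chat_template(
    conversation,
    fps=1,  # was `video_fps=1` before this commit
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
```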

docs/source/en/model_doc/qwen2_5_vl.md

Lines changed: 1 addition & 1 deletion
@@ -146,7 +146,7 @@ model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
 
 inputs = processor.apply_chat_template(
     conversation,
-    video_fps=1,
+    fps=1,
     add_generation_prompt=True,
     tokenize=True,
     return_dict=True,

docs/source/en/model_doc/qwen2_vl.md

Lines changed: 2 additions & 2 deletions
@@ -99,7 +99,7 @@ conversation = [
 
 inputs = processor.apply_chat_template(
     conversation,
-    video_fps=1,
+    fps=1,
     add_generation_prompt=True,
     tokenize=True,
     return_dict=True,
@@ -169,7 +169,7 @@ conversations = [conversation1, conversation2, conversation3, conversation4]
 # Preparation for batch inference
 inputs = processor.apply_chat_template(
     conversations,
-    video_fps=1,
+    fps=1,
     add_generation_prompt=True,
     tokenize=True,
     return_dict=True,

docs/source/ko/model_doc/qwen2_vl.md

Lines changed: 2 additions & 2 deletions
@@ -97,7 +97,7 @@ conversation = [
 
 inputs = processor.apply_chat_template(
     conversation,
-    video_fps=1,
+    fps=1,
     add_generation_prompt=True,
     tokenize=True,
     return_dict=True,
@@ -167,7 +167,7 @@ conversations = [conversation1, conversation2, conversation3, conversation4]
 # Preparation for batch inference
 inputs = processor.apply_chat_template(
     conversations,
-    video_fps=1,
+    fps=1,
     add_generation_prompt=True,
     tokenize=True,
     return_dict=True,

src/transformers/processing_utils.py

Lines changed: 1 addition & 2 deletions
@@ -1448,7 +1448,6 @@ def validate_init_kwargs(processor_config, valid_kwargs):
 
         return unused_kwargs, valid_kwargs
 
-    @deprecate_kwarg("video_fps", version="4.58", new_name="fps")
     @deprecate_kwarg(
         "video_load_backend",
         version="4.59",
@@ -1622,7 +1621,7 @@ def apply_chat_template(
             if self.tokenizer.bos_token is not None and single_prompt.startswith(self.tokenizer.bos_token):
                 kwargs["add_special_tokens"] = False
 
-        # Always sample frames by default unless explicitly set to `False` by users. If users do not pass `num_frames`/`video_fps`
+        # Always sample frames by default unless explicitly set to `False` by users. If users do not pass `num_frames`/`fps`,
         # sampling should not be done for BC.
         if "do_sample_frames" not in kwargs and (
             kwargs.get("fps") is not None or kwargs.get("num_frames") is not None
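
The deleted decorator is what used to remap `video_fps` to `fps` at call time; with it gone, `video_fps` now reaches `apply_chat_template` as an ordinary unrecognized kwarg. As context, here is a minimal sketch of what such a kwarg-renaming decorator does (illustrative only, not the actual `deprecate_kwarg` implementation in transformers):

```py
import functools
import warnings

def rename_kwarg(old_name, new_name, version):
    # Illustrative sketch, not the real `deprecate_kwarg` helper.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                warnings.warn(
                    f"`{old_name}` is deprecated and removed in v{version}; "
                    f"use `{new_name}` instead.",
                    FutureWarning,
                )
                # Prefer an explicitly passed new kwarg over the old one.
                kwargs.setdefault(new_name, kwargs.pop(old_name))
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

The retained `do_sample_frames` check preserves backward compatibility: frame sampling is only switched on by default when the caller actually passes `fps` or `num_frames`.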

tests/models/perception_lm/test_modeling_perception_lm.py

Lines changed: 0 additions & 2 deletions
@@ -436,7 +436,6 @@ def test_small_model_integration_test(self):
             tokenize=True,
             return_dict=True,
             return_tensors="pt",
-            video_load_backend="decord",
             padding=True,
             padding_side="left",
         ).to(torch_device)
@@ -462,7 +461,6 @@ def test_small_model_integration_test_batched(self):
             tokenize=True,
             return_dict=True,
             return_tensors="pt",
-            video_load_backend="decord",
             padding=True,
             padding_side="left",
         ).to(torch_device)
