[WIP] add deepseek ocr #41797

molbap · 2025-10-22T21:04:03Z

What does this PR do?

As per the title. Architecturally, Llava-next serves as the skeleton, with a modified SamModel and a modified ClipVisionModel. The deepseekV2 decoder is left untouched (loaded through AutoModel) and is adapted through config only.
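To make the layout concrete, here is a minimal, library-free sketch of the idea: a composite config holding two vision sub-configs plus a text sub-config, and a decoder picked from a registry by `model_type` so that only config values change. Every name and number here (`VisionSubConfig`, `CompositeOcrConfig`, `ToyDecoder`, `build_decoder`, the hidden sizes) is hypothetical and only illustrates the pattern; none of it is the actual code in this PR or the transformers API.

```python
from dataclasses import dataclass, field

# Hypothetical registry mapping a model_type string to a decoder class,
# loosely imitating config-driven dispatch. Not the transformers API.
DECODER_REGISTRY = {}


def register_decoder(model_type):
    def wrap(cls):
        DECODER_REGISTRY[model_type] = cls
        return cls
    return wrap


@dataclass
class VisionSubConfig:
    # Illustrative stand-in for one vision tower's settings.
    model_type: str
    hidden_size: int


@dataclass
class CompositeOcrConfig:
    # Two vision sub-configs and one text sub-config; values are placeholders.
    sam_config: VisionSubConfig = field(
        default_factory=lambda: VisionSubConfig("sam_like", 768)
    )
    clip_config: VisionSubConfig = field(
        default_factory=lambda: VisionSubConfig("clip_like", 1024)
    )
    text_config: dict = field(
        default_factory=lambda: {"model_type": "toy_decoder", "hidden_size": 1280}
    )


@register_decoder("toy_decoder")
class ToyDecoder:
    # The decoder class is never subclassed; it only reads its config.
    def __init__(self, config):
        self.hidden_size = config["hidden_size"]


def build_decoder(text_config):
    # Look up the decoder class from the config alone.
    return DECODER_REGISTRY[text_config["model_type"]](text_config)


config = CompositeOcrConfig()
decoder = build_decoder(config.text_config)
```

Changing the decoder's behaviour then means passing a different `text_config` rather than editing `ToyDecoder`, which is the sense in which the decoder stays untouched.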

Working config + random weights init
Modular draft with subconfigs (two vision configs)
Conversion from original checkpoint done
Modular model finished
Integration tests/OCR tests working as in original codebase
Make modular slimmer
Make processor faster
Complete test suite for transformers

github-actions · 2025-10-22T21:05:13Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto

molbap added 9 commits October 22, 2025 12:05

hop (e931114)

iterate (60a825b)

fix (72c640c)

fixup (690455e)

make things simple (e2182c3)

update conversion (20e3f6c)

I believe this is not needed (7099e23)

imports breathing better (edbbd0a)

650 loc modular (20c5e0f)
