Blaizzy (Owner) commented Nov 19, 2025

Summary:
This PR removes the dependency on torch, torchvision, and transformers by porting the necessary processors directly into mlx-vlm. It also restructures pyproject.toml to support optional installations.

Changes:

  • Removed Dependencies: Core installation no longer requires Torch or Transformers.
  • New Extras: Added optional flags for [trainer], [server], and [audio].
  • Refactoring:
    • Replaced mlx-audio with soundfile.
    • Moved audio imports to be lazy-loaded within functions to avoid crashes for users without audio dependencies.
    • Cleaned up redundant imports in utils.py.
  • Docs: Added installation instructions for optional dependencies to the README.
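The lazy-loading change above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `load_audio` and the `module_name` parameter are hypothetical, and the only library call assumed is `soundfile.read`, which returns the sample data and the sample rate.

```python
import importlib


def load_audio(path, module_name="soundfile"):
    """Read an audio file, importing the audio backend only when called.

    Hypothetical helper: `load_audio` and `module_name` are illustrative
    names, not identifiers taken from mlx-vlm.
    """
    try:
        # Deferred import: the enclosing module can be imported even when
        # the optional audio dependency is not installed.
        backend = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Audio support requires '{module_name}'. "
            "Install the optional audio dependencies to use this function."
        ) from exc

    # soundfile.read returns a (data, samplerate) tuple.
    data, sample_rate = backend.read(path)
    return data, sample_rate
```

With this shape, users who never call the audio path never touch the import, and users who call it without the dependency get an actionable error instead of a crash at module import time.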

- Added new optional dependencies: `trainer` for dataset tooling and `server` for FastAPI support.
- Updated `audio` dependency to include `soundfile`.
- Enhanced README with a detailed table of optional dependencies and installation commands.

altaic commented Nov 21, 2025

Sort of related: have you considered replacing py-opencv, which pulls in a rather hefty set of deps (120+)? It looks like it's currently only used to load and resize video frames.



3 participants