A beautiful, native macOS application for generating AI videos with synchronized audio from text prompts, powered by the LTX-2 model running on Apple Silicon with MLX.
- Native macOS App - Built with SwiftUI for a seamless Mac experience
- Apple Silicon Native - Uses MLX framework for optimal performance on M-series chips
- Text-to-Video Generation - Transform text prompts into video clips
- Image-to-Video - Animate images into videos
- Built-in Audio Generation - Available model variants generate synchronized audio with video automatically
- Voiceover Narration - Add TTS voiceover using ElevenLabs (cloud) or MLX-Audio (local)
- Background Music - Generate instrumental music with 54 genre presets via ElevenLabs Music API
- Auto Package Installer - Missing Python packages are detected and can be installed with one click
- Generation Queue - Queue multiple generations with real-time progress tracking
- History Management - Browse, preview, and manage all your generated videos
- Presets - Save and load generation parameter presets
- Customizable Parameters - Fine-tune resolution, frames, steps, guidance scale, and more
- macOS 14.0 or later
- Apple Silicon Mac (M1, M2, M3, M4 series)
- 32GB RAM minimum (64GB+ recommended for higher resolutions)
- Python 3.10+ installed (via Homebrew, pyenv, or system)
- ~20-42GB disk space for model weights (depends on selected model)
Download the latest release from the Releases page.
- Open LTX Video Generator
- Go to Preferences (⌘,)
- Click Auto Detect to find your Python installation, or manually set the path
- Click Validate Setup - the app will check for required packages
If packages are missing, the app will show an "Install Missing Packages" button. Click it to automatically install:

```
mlx mlx-vlm mlx-video-with-audio transformers safetensors huggingface_hub numpy opencv-python tqdm
```

Or install manually:

```shell
pip install mlx mlx-vlm mlx-video-with-audio transformers safetensors huggingface_hub numpy opencv-python tqdm
```

The mlx-video-with-audio package is available on PyPI and provides the unified audio-video generation.
Important: On first generation, the app downloads your selected model from Hugging Face. This is a one-time download that may take 15-30 minutes depending on model size and internet connection.
The model is cached in ~/.cache/huggingface/ and will not be re-downloaded on subsequent runs.
Progress is shown in the app during download.
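If you want to check from a script whether the weights are already cached, this minimal sketch assumes the standard Hugging Face hub cache layout, where each repo lives under `models--{org}--{name}` inside `~/.cache/huggingface/hub`:

```python
from pathlib import Path


def hf_cache_dir(repo_id: str, cache_root: str = "~/.cache/huggingface/hub") -> Path:
    """Return the expected local cache directory for a Hugging Face model repo.

    The hub cache stores each repo under models--{org}--{name}.
    """
    return Path(cache_root).expanduser() / ("models--" + repo_id.replace("/", "--"))


# Example: where the LTX-2 unified weights would be cached
model_dir = hf_cache_dir("notapalindrome/ltx2-mlx-av")
print(model_dir.name)      # models--notapalindrome--ltx2-mlx-av
print(model_dir.exists())  # True once the first generation has downloaded the weights
```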
Available models:
- LTX-2 Unified (`notapalindrome/ltx2-mlx-av`, ~42GB)
- LTX-2.3 Distilled Q4 (`dgrauet/ltx-2.3-mlx-distilled-q4`, ~19.4GB)
- Enter a descriptive prompt in the text field
- Adjust parameters using presets or manual controls
- Click Generate to start
- Watch progress in the Queue sidebar
- Find completed videos in your configured output directory (default: Application Support)
When enabled in Settings > Generation, Gemma rewrites your prompt before generation—expanding short descriptions into detailed, LTX-2–optimized prompts with visuals, audio, camera movement, and style. Use the Preview enhanced prompt button to see the rewritten prompt before generating.
Note: This enhancer is optional.
The core text encoder used for generation embeddings is still required even when prompt enhancement is off.
- Go to Settings > Generation
- Turn on Enable Gemma Prompt Enhancement
- First run downloads the Gemma enhancer (~7GB)
- In the prompt view, expand Prompt Enhancement (Gemma) and adjust sliders (Repetition Penalty, Top-P) if desired
- Click Preview enhanced prompt to see the enhanced version before generating
- Generate as usual—the enhanced prompt is used automatically
If enhancement fails for any reason, generation automatically falls back to your original prompt.
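That fallback behavior can be pictured as a small wrapper; this is an illustrative sketch, where `enhance` is a hypothetical stand-in for the Gemma call, not the app's actual API:

```python
from typing import Callable, Optional


def resolve_prompt(prompt: str, enhance: Optional[Callable[[str], str]]) -> str:
    """Use the enhanced prompt when enhancement succeeds, else the original."""
    if enhance is None:
        return prompt
    try:
        enhanced = enhance(prompt)
        # An empty result is treated as a failure too.
        return enhanced if enhanced.strip() else prompt
    except Exception:
        # Any enhancement failure falls back to the user's original prompt.
        return prompt


def broken_enhancer(prompt: str) -> str:
    raise RuntimeError("model not loaded")


print(resolve_prompt("a river at dawn", broken_enhancer))  # a river at dawn
```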
- Be descriptive: "A river flowing through a misty forest at dawn" works better than "river forest"
- Use camera directions: "The camera slowly pans across..."
- Specify lighting: "golden hour lighting", "dramatic shadows"
- Include motion: "waves crashing", "leaves falling"
For more detailed, copy-paste-ready prompts, see Example Prompts.
Both available model variants generate synchronized audio alongside video automatically. No additional configuration is needed - just generate, and your video will have audio.
For best speech/lip-sync alignment, use 24 FPS.
You can still layer additional voiceover or background music on top of the built-in audio if desired.
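As a quick rule of thumb, clip duration is simply frame count divided by frame rate. This illustrative snippet uses the frame counts recommended later in this README:

```python
def clip_duration(frames: int, fps: int = 24) -> float:
    """Duration in seconds of a clip with the given frame count and frame rate."""
    return frames / fps


# Recommended frame counts at the 24 FPS suggested for speech alignment
for frames in (25, 33, 49):
    print(f"{frames} frames @ 24 FPS = {clip_duration(frames):.2f}s")
```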
Add text-to-speech voiceover to your videos:
- Expand Voiceover / Narration in the generation view
- Choose your source: MLX-Audio (local, free) or ElevenLabs (cloud, requires API key)
- Select a voice from the dropdown
- Enter your narration text
- Audio generates with your video or can be added later from History
Add AI-generated instrumental music (requires ElevenLabs API key):
- Expand Background Music in the generation view
- Toggle Generate background music
- Choose from 54 genre presets:
- Electronic: EDM, House, Techno, Ambient, Synthwave, etc.
- Hip-Hop/R&B: Trap, Lo-Fi, Boom Bap, Soul, etc.
- Rock: Classic, Alternative, Indie, Metal, etc.
- Pop: Modern, Indie, Dance, Acoustic
- Jazz/Blues: Smooth Jazz, Bebop, Lounge, Blues
- Classical/Cinematic: Orchestral, Piano, Epic, Tense, Uplifting
- World: Latin, Reggae, Afrobeat, Middle Eastern, Asian
- Country/Folk: Modern, Classic, Acoustic, Indie
- Functional: Corporate, Motivational, Relaxing, Suspense, Action, Romantic, etc.
Music automatically matches your video length and is mixed at background volume (30%) or ducked further (20%) when combined with voiceover.
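The mix levels above reduce to a simple rule. A minimal sketch of that logic (the 0.30 and 0.20 gains come from this README; the function names are illustrative, not the app's code):

```python
def music_gain(has_voiceover: bool) -> float:
    """Background-music gain: 30% normally, ducked to 20% under voiceover."""
    return 0.20 if has_voiceover else 0.30


def mix_sample(video_audio: float, music: float, has_voiceover: bool) -> float:
    """Mix one sample of the video's built-in audio with background music."""
    return video_audio + music * music_gain(has_voiceover)


print(music_gain(False))  # 0.3
print(music_gain(True))   # 0.2
```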
Right-click any video thumbnail in Video Archive and select Add Audio to add voiceover, music, or both to previously generated videos.
Here's an example video generated with LTX Video Generator:
E0B09876-A6BE-4A70-A1F3-09D7152F5003.mp4
Prompt used:
Create a 15-second cinematic product commercial for a sleek, premium TIME MACHINE device called "ChronoShift One."
Overall style: glossy tech product ad, filmed in 4K, smooth dolly and slider shots, soft studio lighting, subtle retro‑futuristic aesthetic (think brushed aluminum, glowing rings, clean UI). The time machine looks like a compact desktop appliance about the size of a toaster: brushed metal body, circular time dial with glowing blue light, small display, and a single illuminated control knob.
And a second run produced this one:
1BB3F818-F52F-4E5F-B52F-2DB9C18717FE.mp4
Prompt used:
Scene tone: quiet, reflective, fragmented memory. Cinematic realism, muted natural colors. Overcast but DRY weather. No rain, no raindrops, no wet falling precipitation.
START FRAME (0-2.5s)
Extreme close-up (85mm) of the elderly man's face. He breathes slowly. A tiny tremor in the lower eyelid. Strands of white hair drift gently in a light breeze.
Dialogue (man, barely above a whisper):
"I remember."
Motion: micro push-in only.
JUMP CUT 1 (2.5-5s)
Hard cut to an extreme close-up of his hands: weathered fingers rubbing a small object (a coin / pebble / ring) in his palm.
Dialogue (man):
"Not the day..."
Motion: hands move slowly, deliberately.
JUMP CUT 2 (5-7.5s)
Hard cut to close-up (50-85mm) of his boots stepping into soft mud at the lake edge. The movement is careful, almost hesitant. No splashing, just a quiet press into wet ground.
Dialogue (man):
"The feeling."
Motion: one slow step, then stillness.
JUMP CUT 3 (7.5-10s)
Hard cut to close-up of the lake surface: perfectly still water with faint ripples spreading outward (from a dropped pebble or a gentle touch).
Dialogue (man):
"It stayed."
```shell
# Clone the repository
git clone https://github.com/james-see/ltx-video-mac.git
cd ltx-video-mac

# Open in Xcode
open LTXVideoGenerator/LTXVideoGenerator.xcodeproj

# Or build from command line
./scripts/build-local.sh
```

- Frontend: SwiftUI
- Python Bridge: Subprocess execution with progress streaming
- ML Framework: MLX (Apple's machine learning framework)
- Models:
- LTX-2 Unified (~42GB, synchronized audio+video)
- LTX-2.3 Distilled Q4 (~19.4GB, synchronized audio+video)
- Precision: bfloat16
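One common way such a subprocess bridge works is for the Python side to print machine-readable progress lines that the host app parses from stdout. This is an illustrative sketch of the pattern, not the app's actual protocol:

```python
import re


def report_progress(step: int, total: int) -> None:
    """Emit a progress line a host app can parse from stdout."""
    print(f"PROGRESS {100 * step // total}", flush=True)


def parse_progress(line: str):
    """Parse a progress line; returns the percentage, or None for other output."""
    m = re.match(r"PROGRESS (\d+)$", line.strip())
    return int(m.group(1)) if m else None


report_progress(12, 48)               # prints "PROGRESS 25"
print(parse_progress("PROGRESS 25"))  # 25
print(parse_progress("loading VAE"))  # None
```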
Generation uses a 2-stage pipeline:
- Stage 1: Generate at half resolution
- Stage 2: Upsample and refine to full resolution
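Concretely, stage 1 renders at half the target resolution. This sketch assumes dimensions are kept at multiples of 32 (an illustrative rounding rule, not confirmed from the pipeline code):

```python
def stage1_resolution(width: int, height: int, multiple: int = 32):
    """Half the target resolution, rounded down to a safe multiple."""
    def snap(x: int) -> int:
        return max(multiple, (x // 2) // multiple * multiple)
    return snap(width), snap(height)


print(stage1_resolution(1024, 640))  # (512, 320)
```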
The download progress updates every 1%. Download time depends on selected model size (~19.4GB or ~42GB). Be patient.
- Reduce resolution (512x320 is fastest)
- Reduce frame count (25/33/49 recommended)
- Use 24 FPS
- Set VAE tiling to aggressive
- Close other applications
- 32GB RAM minimum, 64GB recommended
- Install Python via Homebrew: `brew install python@3.12`
- Or use pyenv: `pyenv install 3.12`
- Then click "Auto Detect" in Preferences
- This app supports multiple AV model repos, including `notapalindrome/ltx2-mlx-av` and `dgrauet/ltx-2.3-mlx-distilled-q4`.
- Converting additional upstream checkpoints can require package-level updates in `mlx-video-with-audio` before they run reliably here.
- Standard LTX LoRA workflows are not guaranteed to transfer directly to the MLX-converted AV path without conversion tooling support.
MIT License - see LICENSE for details.
- Lightricks for the LTX-2 model
- mlx-video-with-audio for unified audio-video generation
- MLX Community for the MLX-converted weights
- Blaizzy/mlx-video for the original MLX video generation code
- Hugging Face for model hosting
