[Bug] Model unloading does not release VRAM properly when using TensorRT provider #238

@Umstead

Description

When using the TensorRT execution provider, disabling a feature toggle (e.g., Occluder, FaceParser) does not release the associated model from GPU memory (VRAM). Repeated setting changes therefore exhaust VRAM quickly, even when "Keep Loaded Models in Memory" is disabled.

Environment

  • VisoMaster Version: 3.0.0
  • Execution Provider: TensorRT
  • GPU: NVIDIA (CUDA-enabled)
  • OS: Windows/Linux

Steps to Reproduce

  1. Start VisoMaster with the TensorRT provider enabled
  2. Enable any feature that loads a model (e.g., OccluderEnableToggle, FaceParserEnableToggle)
  3. Observe VRAM usage increases
  4. Disable the feature toggle
  5. Expected: VRAM should be released
  6. Actual: VRAM remains occupied
  7. Repeat steps 2-4 multiple times → VRAM is completely exhausted
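
To put numbers on steps 3 to 7, VRAM usage can be sampled before enabling a feature and again after disabling it. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names (`parse_used_mib`, `query_used_mib`, `leaked_mib`) are hypothetical and not part of VisoMaster.

```python
import subprocess


def parse_used_mib(smi_output: str) -> list[int]:
    """Parse nvidia-smi 'memory.used' output: one integer (MiB) per GPU line."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_used_mib() -> list[int]:
    """Ask nvidia-smi for the used VRAM of each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_used_mib(result.stdout)


def leaked_mib(before: list[int], after: list[int]) -> list[int]:
    """Per-GPU growth in used VRAM between two samples."""
    return [a - b for b, a in zip(before, after)]
```

Calling `query_used_mib()` before step 2 and again after step 4, then passing both samples to `leaked_mib()`, gives the amount of VRAM that was not returned in one toggle cycle. Other processes on the same GPU will affect the reading.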

Root Cause Analysis

This is a regression from version 1.0.0.

In version 1.0.0, the project used a custom TensorRTPredictor class with an explicit cleanup() method that properly released TensorRT engine and context resources:

# From v1.0.0 - models_processor.py
def unload_model(self, model_name_to_unload):
    # ...
    if isinstance(trt_model, TensorRTPredictor):
        trt_model.cleanup()  # Explicit cleanup!
    # ...

In version 3.0.0, the architecture was refactored to use ONNX Runtime's TensorRT Execution Provider directly. However, deleting the InferenceSession does not immediately release the underlying TensorRT resources, and the explicit cleanup step was lost in the refactor.
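
For context on why deletion alone is not an explicit cleanup: in Python, `del` removes one reference, and the object is only destroyed once no references remain. The sketch below illustrates this with a hypothetical `FakeSession` class standing in for an inference session; it shows the Python side only and does not demonstrate how ONNX Runtime or TensorRT release GPU memory.

```python
import gc
import weakref


class FakeSession:
    """Hypothetical stand-in for an inference session that owns GPU resources."""


models = {"occluder": FakeSession()}
probe = weakref.ref(models["occluder"])  # observes the object without keeping it alive

model_instance = models["occluder"]  # second reference, like the local in unload_model()
models["occluder"] = None            # drop the registry reference
still_alive_before_del = probe() is not None  # the local name still holds the object

del model_instance                   # drop the last reference
gc.collect()                         # also breaks reference cycles, if there are any
freed_after_del = probe() is None    # the object has now been destroyed
```

Any other holder of the session (a cache, a closure, a worker thread) would keep it alive in the same way, which is one reason an explicit `cleanup()` method is easier to reason about than relying on deletion.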

Proposed Fix

Add explicit CUDA synchronization and multiple garbage collection/cache clearing passes to the model unloading functions.

Changes needed in app/processors/models_processor.py:

  1. unload_model() function:

    • Add torch.cuda.synchronize() and torch.cuda.empty_cache() before deleting the model instance
    • Perform 3 passes of gc.collect() + torch.cuda.synchronize() + torch.cuda.empty_cache() after unloading
  2. unload_dfm_model() function:

    • Same changes as above
  3. delete_models() and delete_models_dfm() functions:

    • Add final cleanup passes after all models are unloaded
  4. clear_gpu_memory() function:

    • Enhance cleanup with 5 passes of garbage collection and cache clearing
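
Since all four changes repeat the same pass, it could be factored into one helper. This is a minimal sketch under assumptions: `release_gpu_memory` is a hypothetical name, not an existing VisoMaster function, and `torch` is treated as optional so the helper still runs on machines without it.

```python
import gc

try:
    import torch  # optional: only needed for the CUDA steps
except ImportError:
    torch = None


def release_gpu_memory(passes: int = 3) -> int:
    """Run `passes` rounds of gc + CUDA sync + cache clear; return rounds completed."""
    cuda_ok = torch is not None and torch.cuda.is_available()
    completed = 0
    for _ in range(max(passes, 0)):
        gc.collect()  # destroy unreachable Python objects first
        if cuda_ok:
            torch.cuda.synchronize()  # wait for pending GPU work to finish
            torch.cuda.empty_cache()  # return PyTorch's cached blocks to the driver
        completed += 1
    return completed
```

With this, `unload_model()` and `unload_dfm_model()` would call `release_gpu_memory(3)` after unloading, and `clear_gpu_memory()` would call `release_gpu_memory(5)`. Note that `torch.cuda.empty_cache()` only frees memory held by PyTorch's caching allocator; whether these passes also cause ONNX Runtime to release its TensorRT allocations is the hypothesis of this issue and should be confirmed by measurement.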

Example implementation:

import gc
import torch

def unload_model(self, model_name_to_unload):
    # ... existing checks ...
    with self.model_lock:
        unloaded = False
        if model_name_to_unload and model_name_to_unload in self.models:
            model_instance = self.models[model_name_to_unload]
            if model_instance is not None:
                # FIX: Force CUDA synchronization before unloading
                if torch.cuda.is_available():
                    torch.cuda.synchronize()
                    torch.cuda.empty_cache()
                
                self.models[model_name_to_unload] = None
                del model_instance
                unloaded = True

        if unloaded:
            # FIX: Multiple passes to ensure complete memory release
            for _ in range(3):
                gc.collect()
                if torch.cuda.is_available():
                    torch.cuda.synchronize()
                    torch.cuda.empty_cache()

Workaround (Until Fixed)

  • Use the CUDA execution provider instead of TensorRT
  • Or manually click the "CLEAR GPU" button to force release of all VRAM

Impact

  • Severity: High - the application crashes or becomes unusable after multiple model switches
  • Affected Users: All users of the TensorRT provider
  • Regression: Yes - worked correctly in v1.0.0

Additional Context

The issue is not related to the "Keep Loaded Models in Memory" toggle; it occurs even when this setting is disabled. The manual "CLEAR GPU" button works because it calls the clear_gpu_memory() function, which bypasses the normal unloading checks.


Labels: bug, regression, tensorrt, memory-leak, high-priority
