Description
When using the TensorRT execution provider, disabling a feature toggle (e.g., Occluder, FaceParser) does not release the associated model from GPU memory (VRAM). As a result, VRAM is rapidly exhausted after multiple setting changes, even when "Keep Loaded Models in Memory" is disabled.
Environment
- VisoMaster Version: 3.0.0
- Execution Provider: TensorRT
- GPU: NVIDIA (CUDA-enabled)
- OS: Windows/Linux
Steps to Reproduce
1. Start VisoMaster with the TensorRT provider enabled
2. Enable any feature that loads a model (e.g., OccluderEnableToggle, FaceParserEnableToggle)
3. Observe that VRAM usage increases
4. Disable the feature toggle
   - Expected: VRAM should be released
   - Actual: VRAM remains occupied
5. Repeat steps 2-4 multiple times → VRAM is completely exhausted
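To make the reproduction measurable, VRAM usage can be sampled after each toggle cycle. The following is a minimal sketch, not part of VisoMaster: the helper names and the 64 MiB tolerance are hypothetical, and it assumes `nvidia-smi` is available on the PATH.

```python
import subprocess
from typing import List


def parse_used_mib(smi_output: str) -> List[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi CSV output without header or units."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_used_mib() -> List[int]:
    """Return used VRAM in MiB for each GPU. Requires nvidia-smi on the PATH."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_used_mib(result.stdout)


def has_leak(readings_mib: List[int], tolerance_mib: int = 64) -> bool:
    """True if the last reading exceeds the first (baseline) by more than the tolerance.

    The tolerance is an arbitrary illustrative value, not a measured threshold.
    """
    return len(readings_mib) >= 2 and readings_mib[-1] - readings_mib[0] > tolerance_mib
```

Take one reading with the feature disabled as the baseline, then one reading after each enable/disable cycle, and pass the list to `has_leak()`.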
Root Cause Analysis
This is a regression from version 1.0.0.
In version 1.0.0, the project used a custom TensorRTPredictor class with an explicit cleanup() method that properly released TensorRT engine and context resources:
# From v1.0.0 - models_processor.py
def unload_model(self, model_name_to_unload):
    # ...
    if isinstance(trt_model, TensorRTPredictor):
        trt_model.cleanup()  # Explicit cleanup!
    # ...
In version 3.0.0, the architecture was refactored to use ONNX Runtime's TensorRT Execution Provider directly. However, the InferenceSession deletion does not immediately release underlying TensorRT resources, and the explicit cleanup mechanism was lost.
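One general Python mechanism that can delay release after a reference is dropped is a reference cycle: the object then survives until the cyclic garbage collector runs. The sketch below illustrates only that mechanism with a hypothetical `FakeSession` class. Whether ONNX Runtime's `InferenceSession` objects in VisoMaster are actually held in cycles is an assumption and has not been verified.

```python
import gc
import weakref


class FakeSession:
    """Hypothetical stand-in for an object that owns native resources."""

    def __init__(self, name):
        self.name = name
        # Deliberate reference cycle: the object refers to itself indirectly.
        self.binding = {"owner": self}


def unload(models, name):
    """Drop the registry's reference and return a weakref to observe the object."""
    probe = weakref.ref(models[name])
    models[name] = None
    return probe


gc.disable()  # Keep the demo deterministic: no automatic collection in between.
models = {"occluder": FakeSession("occluder")}
probe = unload(models, "occluder")

# The registry no longer references the object, but the cycle keeps it alive.
alive_before_gc = probe() is not None

gc.collect()  # The cyclic collector finds and frees the unreachable cycle.
alive_after_gc = probe() is not None
gc.enable()
```

In this sketch the object is only finalized after `gc.collect()`, which is one reason an explicit collection pass after unloading can matter.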
Proposed Fix
Add explicit CUDA synchronization and multiple garbage collection/cache clearing passes in the model unloading functions:
Changes needed in app/processors/models_processor.py:
- unload_model() function:
  - Add torch.cuda.synchronize() and torch.cuda.empty_cache() before deleting the model instance
  - Perform 3 passes of gc.collect() + torch.cuda.synchronize() + torch.cuda.empty_cache() after unloading
- unload_dfm_model() function:
- delete_models() and delete_models_dfm() functions:
  - Add final cleanup passes after all models are unloaded
- clear_gpu_memory() function:
  - Enhance cleanup with 5 passes of garbage collection and cache clearing
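Because the same cleanup sequence is repeated in several functions, it could be factored into one helper. This is a sketch under assumptions: `release_gpu_memory` is a hypothetical name that does not exist in the codebase, and the guarded import is only there so the snippet runs without PyTorch installed. Note that `torch.cuda.empty_cache()` releases memory cached by PyTorch's own allocator; whether it also frees memory held by ONNX Runtime sessions is not confirmed here.

```python
import gc

try:
    import torch  # Optional in this sketch; the application itself already depends on it.
except ImportError:
    torch = None


def release_gpu_memory(passes: int = 3) -> int:
    """Run `passes` rounds of gc.collect() plus CUDA synchronize and cache clearing.

    Returns the number of rounds in which the CUDA steps actually ran
    (0 when PyTorch or CUDA is unavailable).
    """
    if passes < 1:
        raise ValueError("passes must be >= 1")

    cuda_ok = torch is not None and torch.cuda.is_available()
    cuda_passes = 0
    for _ in range(passes):
        gc.collect()
        if cuda_ok:
            torch.cuda.synchronize()
            torch.cuda.empty_cache()
            cuda_passes += 1
    return cuda_passes
```

With such a helper, unload_model() would call `release_gpu_memory(3)` after unloading and clear_gpu_memory() would call `release_gpu_memory(5)`, matching the pass counts proposed above.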
Example implementation:
import gc
import torch

def unload_model(self, model_name_to_unload):
    # ... existing checks ...
    with self.model_lock:
        unloaded = False
        if model_name_to_unload and model_name_to_unload in self.models:
            model_instance = self.models[model_name_to_unload]
            if model_instance is not None:
                # FIX: Force CUDA synchronization before unloading
                if torch.cuda.is_available():
                    torch.cuda.synchronize()
                    torch.cuda.empty_cache()
                # Drop the registry reference, then the local one
                self.models[model_name_to_unload] = None
                del model_instance
                unloaded = True
        if unloaded:
            # FIX: Multiple passes to ensure complete memory release
            for _ in range(3):
                gc.collect()
                if torch.cuda.is_available():
                    torch.cuda.synchronize()
                    torch.cuda.empty_cache()
Workaround (Until Fixed)
- Use the CUDA execution provider instead of TensorRT
- Or manually click the "CLEAR GPU" button to force the release of all VRAM
Impact
- Severity: High - causes application to crash or become unusable after multiple model switches
- Affected Users: All users of the TensorRT provider
- Regression: Yes - worked correctly in v1.0.0
Additional Context
The issue is not related to the "Keep Loaded Models in Memory" toggle: the problem occurs even when this setting is disabled. The manual "CLEAR GPU" button works because it calls the clear_gpu_memory() function, which bypasses the normal unloading checks.
Labels: bug, regression, tensorrt, memory-leak, high-priority