[Bug] Model unloading does not release VRAM properly when using TensorRT provider #238

## Description

When using the TensorRT execution provider, disabling a feature toggle (e.g., Occluder, FaceParser) does not release the associated model from GPU memory (VRAM). As a result, VRAM is quickly exhausted after a few setting changes, even when "Keep Loaded Models in Memory" is disabled.

## Environment

- **VisoMaster Version:** 3.0.0
- **Execution Provider:** TensorRT
- **GPU:** NVIDIA (CUDA-enabled)
- **OS:** Windows/Linux

## Steps to Reproduce

1. Start VisoMaster with the TensorRT provider enabled.
2. Enable any feature that loads a model (e.g., `OccluderEnableToggle`, `FaceParserEnableToggle`).
3. Observe that VRAM usage increases.
4. Disable the feature toggle.
5. Repeat steps 2-4 several times.

**Expected:** VRAM is released each time the toggle is disabled.

**Actual:** VRAM remains occupied; after several repetitions it is completely exhausted.
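A small harness can make the reproduction steps measurable instead of relying on eyeballing a monitor. This is a hypothetical sketch: `measure_free_memory` and `looks_like_leak` are illustrative names, and the `load`/`unload` callables stand in for toggling a feature. On a real machine `free_bytes` could be `lambda: torch.cuda.mem_get_info()[0]`, which reports device-wide free memory and therefore also reflects allocations made by ONNX Runtime and TensorRT, unlike `torch.cuda.memory_allocated()`, which only counts PyTorch tensors. The example below uses a simulated device so it runs without a GPU.

```python
from typing import Callable, List


def measure_free_memory(
    load: Callable[[], None],
    unload: Callable[[], None],
    free_bytes: Callable[[], int],
    cycles: int = 5,
) -> List[int]:
    """Record free device memory before the first load and after every unload."""
    readings = [free_bytes()]  # baseline
    for _ in range(cycles):
        load()    # e.g. enable the Occluder toggle
        unload()  # e.g. disable it again
        readings.append(free_bytes())
    return readings


def looks_like_leak(readings: List[int], tolerance: int = 0) -> bool:
    """True if free memory ends more than `tolerance` bytes below the baseline."""
    return readings[0] - readings[-1] > tolerance


# Simulated device that shows the bug: each load takes 100 MiB, unload returns nothing.
MiB = 1024 ** 2
device = {"free": 8 * 1024 * MiB}
readings = measure_free_memory(
    load=lambda: device.update(free=device["free"] - 100 * MiB),
    unload=lambda: None,
    free_bytes=lambda: device["free"],
    cycles=3,
)
print(looks_like_leak(readings, tolerance=50 * MiB))  # True
```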

## Root Cause Analysis

This is a **regression from version 1.0.0**. 

In version 1.0.0, the project used a custom `TensorRTPredictor` class with an explicit `cleanup()` method that properly released TensorRT engine and context resources:

```python
# From v1.0.0 - models_processor.py
def unload_model(self, model_name_to_unload):
    # ...
    if isinstance(trt_model, TensorRTPredictor):
        trt_model.cleanup()  # Explicit cleanup!
    # ...
```

In version 3.0.0, the architecture was refactored to use ONNX Runtime's TensorRT Execution Provider directly, and the explicit cleanup step was lost in the process. Deleting the `InferenceSession` does not immediately release the underlying TensorRT engine and context resources.
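Dropping a reference is not the same as destroying the object, which is why the proposed fix leans on `gc.collect()`. The stdlib-only snippet below illustrates the Python side of this (`FakeSession` is a hypothetical stand-in, not ONNX Runtime code): clearing a registry entry frees a session only if nothing else references it, and an object caught in a reference cycle survives `del` until the cycle collector runs. Even after the Python object is gone, when the TensorRT memory actually returns to the device is up to ONNX Runtime.

```python
import gc
import weakref


class FakeSession:
    """Hypothetical stand-in for an onnxruntime.InferenceSession."""

    def __init__(self, name):
        self.name = name
        self.self_ref = None  # used below to build a reference cycle


def freed(probe):
    """True once the object behind the weak reference has been destroyed."""
    return probe() is None


models = {}

# Case 1: the dict holds the only reference, so clearing it frees the object at once.
models["occluder"] = FakeSession("occluder")
probe = weakref.ref(models["occluder"])
models["occluder"] = None
plain_freed = freed(probe)

# Case 2: another component still holds a reference, so "unloading" frees nothing.
models["faceparser"] = FakeSession("faceparser")
stale_handle = models["faceparser"]
probe = weakref.ref(models["faceparser"])
models["faceparser"] = None
aliased_freed = freed(probe)

# Case 3: a self-referencing object survives `del` until the cycle collector runs.
gc.disable()  # keep automatic collection from running mid-demo
cyclic = FakeSession("cyclic")
cyclic.self_ref = cyclic
probe = weakref.ref(cyclic)
del cyclic
cycle_freed_before_gc = freed(probe)
gc.collect()
cycle_freed_after_gc = freed(probe)
gc.enable()

print(plain_freed, aliased_freed, cycle_freed_before_gc, cycle_freed_after_gc)
# True False False True
```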

## Proposed Fix

Add explicit CUDA synchronization and multiple garbage collection/cache clearing passes in the model unloading functions:

### Changes needed in `app/processors/models_processor.py`:

1. **`unload_model()` function:**
   - Add `torch.cuda.synchronize()` and `torch.cuda.empty_cache()` before deleting the model instance
   - Perform 3 passes of `gc.collect()` + `torch.cuda.synchronize()` + `torch.cuda.empty_cache()` after unloading

2. **`unload_dfm_model()` function:**
   - Same changes as above

3. **`delete_models()` and `delete_models_dfm()` functions:**
   - Add final cleanup passes after all models are unloaded

4. **`clear_gpu_memory()` function:**
   - Enhance cleanup with 5 passes of garbage collection and cache clearing
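All four changes repeat the same collect/synchronize/empty-cache loop with different pass counts, so one option is to factor it into a single helper. The sketch below is hypothetical (`release_gpu_memory` is not an existing VisoMaster function); the CUDA calls are passed in so the helper stays importable and testable without a GPU, and callers would still guard it with `torch.cuda.is_available()`. One caveat: `torch.cuda.empty_cache()` only returns blocks cached by PyTorch's own allocator, so it cannot free memory that a live ONNX Runtime session still holds.

```python
import gc
from typing import Callable, Optional


def release_gpu_memory(
    passes: int,
    synchronize: Optional[Callable[[], None]] = None,
    empty_cache: Optional[Callable[[], None]] = None,
) -> int:
    """Run `passes` rounds of gc.collect(), then synchronize, then empty_cache.

    Returns the total number of unreachable objects reported by gc.collect().
    """
    collected = 0
    for _ in range(passes):
        collected += gc.collect()  # free Python objects, including reference cycles
        if synchronize is not None:
            synchronize()  # wait for queued GPU work to finish
        if empty_cache is not None:
            empty_cache()  # hand unused cached blocks back to the driver
    return collected


# Hypothetical wiring inside models_processor.py:
# unload_model() / unload_dfm_model():
#     release_gpu_memory(3, torch.cuda.synchronize, torch.cuda.empty_cache)
# clear_gpu_memory():
#     release_gpu_memory(5, torch.cuda.synchronize, torch.cuda.empty_cache)
```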

### Example implementation:

```python
import gc

import torch


def unload_model(self, model_name_to_unload):
    # ... existing checks ...
    with self.model_lock:
        unloaded = False
        if model_name_to_unload and model_name_to_unload in self.models:
            model_instance = self.models[model_name_to_unload]
            if model_instance is not None:
                # FIX: Wait for queued CUDA work to finish before releasing the
                # session, then return any unused PyTorch-cached blocks
                if torch.cuda.is_available():
                    torch.cuda.synchronize()
                    torch.cuda.empty_cache()

                # Drop both the registry entry and the local reference so the
                # session becomes unreachable
                self.models[model_name_to_unload] = None
                del model_instance
                unloaded = True

        if unloaded:
            # FIX: Repeat collection and cache clearing so objects freed in one
            # pass (e.g. reference cycles) can release their memory in the next
            for _ in range(3):
                gc.collect()
                if torch.cuda.is_available():
                    torch.cuda.synchronize()
                    torch.cuda.empty_cache()
```
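Synchronizing and clearing caches cannot help while some other reference keeps the session alive, so it may be worth verifying that unloading actually destroys the object. The sketch below is hypothetical (`ModelStore` and its `Optional[bool]` return value are illustrative, not VisoMaster's API): it records a `weakref` before dropping the entry and reports whether the object survived. A `False` result points at a leftover reference rather than at the CUDA cache.

```python
import gc
import threading
import weakref
from typing import Optional


class ModelStore:
    """Hypothetical, minimal stand-in for the model registry in models_processor.py."""

    def __init__(self):
        self.models = {}
        self.model_lock = threading.Lock()

    def unload_model(self, name: str) -> Optional[bool]:
        """Unload `name`; return None if nothing was loaded, else whether the object was destroyed."""
        with self.model_lock:
            instance = self.models.get(name)
            if instance is None:
                return None
            probe = weakref.ref(instance)  # observe the object without keeping it alive
            self.models[name] = None
            del instance  # drop the local reference as well
        gc.collect()  # reclaim objects kept alive only by reference cycles
        return probe() is None


class FakeSession:
    """Stand-in for an onnxruntime.InferenceSession."""


store = ModelStore()
store.models["occluder"] = FakeSession()
print(store.unload_model("occluder"))  # True

store.models["faceparser"] = FakeSession()
stale = store.models["faceparser"]  # an extra reference held elsewhere
print(store.unload_model("faceparser"))  # False: the session is still alive
```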

## Workaround (Until Fixed)

- Use the CUDA execution provider instead of TensorRT.
- Alternatively, click the "CLEAR GPU" button to force-release all VRAM.
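For code paths that create sessions directly, the first workaround amounts to leaving `TensorrtExecutionProvider` out of the `providers` list passed to `onnxruntime.InferenceSession`. `select_providers` below is a hypothetical helper, not VisoMaster's actual provider-selection code. ONNX Runtime assigns graph nodes to providers in list order, so the CPU provider stays last as the fallback.

```python
from typing import List

TENSORRT = "TensorrtExecutionProvider"
CUDA = "CUDAExecutionProvider"
CPU = "CPUExecutionProvider"


def select_providers(use_tensorrt: bool, available: List[str]) -> List[str]:
    """Return the preferred providers, in priority order, that are actually available."""
    preferred = [TENSORRT, CUDA, CPU] if use_tensorrt else [CUDA, CPU]
    return [p for p in preferred if p in available]


# With ONNX Runtime installed, `available` would come from
# onnxruntime.get_available_providers(), e.g.:
# session = onnxruntime.InferenceSession(
#     "model.onnx", providers=select_providers(False, onnxruntime.get_available_providers()))
print(select_providers(False, [TENSORRT, CUDA, CPU]))
# ['CUDAExecutionProvider', 'CPUExecutionProvider']
```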

## Impact

- **Severity:** High - the application crashes or becomes unusable after several model switches
- **Affected Users:** All users of the TensorRT provider
- **Regression:** Yes - worked correctly in v1.0.0

## Additional Context

This issue is independent of the "Keep Loaded Models in Memory" toggle; it occurs even when that setting is disabled. The manual "CLEAR GPU" button works because it calls `clear_gpu_memory()`, which bypasses the normal unloading checks.

---

**Labels:** bug, regression, tensorrt, memory-leak, high-priority
