MILo_rtx50 — CUDA 12.8 / RTX 50 Series Local Compilation and Execution Guide (Ubuntu 24.04 + uv + PyTorch 2.7.1+cu128)

This README documents our local build adaptations, key modifications, and reproducible experiment steps for our fork of Anttwo/MILo on RTX 50 series GPUs with CUDA 12.8. Goal: complete submodule compilation, training, mesh extraction, rendering, and evaluation using uv + venv, without Conda.

Environment

OS: Ubuntu 24.04
GPU: RTX 50 Series (Blackwell)
CUDA Toolkit: 12.8 (NVCC /usr/local/cuda-12.8/bin/nvcc)
Python: 3.12.3 (venv management, package management with uv)
PyTorch: 2.7.1+cu128 (official binary, C++11 ABI=1)
C/C++: GCC 13.3
CMake: System version (apt)

Important environment variables (commonly used during training/extraction/rendering):

export NVDIFRAST_BACKEND=cuda
export TORCH_CUDA_ARCH_LIST="12.0+PTX"
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True,max_split_size_mb:32,garbage_collection_threshold:0.6"
export CUDA_DEVICE_MAX_CONNECTIONS=1
# Mesh regularization grid resolution scaling, smaller value saves VRAM
export MILO_MESH_RES_SCALE=0.3
# (Optional) Triangle chunk size to mitigate nvdiffrast CUDA backend VRAM peaks
export MILO_RAST_TRI_CHUNK=150000
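
As a quick sanity check before launching a run, the variables above can be verified from Python. This is a minimal sketch and not part of the repository; the helper name `missing_env_vars` and the decision to treat `MILO_RAST_TRI_CHUNK` as optional are assumptions made for illustration.

```python
import os

# Variables this guide exports before training/extraction/rendering.
# MILO_RAST_TRI_CHUNK is marked optional in the guide, so it is not checked.
REQUIRED_VARS = (
    "NVDIFRAST_BACKEND",
    "TORCH_CUDA_ARCH_LIST",
    "PYTORCH_CUDA_ALLOC_CONF",
    "CUDA_DEVICE_MAX_CONNECTIONS",
    "MILO_MESH_RES_SCALE",
)


def missing_env_vars(env=None):
    """Return the names from REQUIRED_VARS that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All expected variables are set.")
```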

Submodules Installation

We have verified these can be successfully compiled/installed with CUDA 12.8 + PyTorch 2.7.1.

1) Install Gaussian Splatting Submodules (via pip)

pip install submodules/diff-gaussian-rasterization_ms
pip install submodules/diff-gaussian-rasterization
pip install submodules/diff-gaussian-rasterization_gof
pip install submodules/simple-knn
pip install submodules/fused-ssim
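
After the pip installs finish, it can help to confirm that the extensions are visible to the interpreter. The sketch below only checks that a module can be located; it does not load the CUDA code. The import names in the example list are assumptions (check each submodule's setup.py for the real ones), and `find_missing_modules` is a hypothetical helper.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; the _ms and _gof variants may install under other names.
    candidates = ["diff_gaussian_rasterization", "simple_knn", "fused_ssim"]
    print("Missing:", find_missing_modules(candidates))
```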

Note: nvdiffrast is JIT-compiled (triggered at runtime by PyTorch cpp_extension). The OpenGL (GL) backend requires system headers: sudo apt install -y libegl-dev libopengl-dev libgles2-mesa-dev ninja-build. For simplicity we switched to the CUDA backend, which needs no EGL headers: export NVDIFRAST_BACKEND=cuda.

2) Install System Dependencies for `tetra_triangulation` (Delaunay Triangulation)

The original project uses conda to install system-level C/C++ dependencies (cmake/gmp/cgal). Since we use uv for Python packages only, we need to install these C/C++ libraries via apt (system package manager):

# Install C/C++ dependencies via apt (Ubuntu 24.04)
sudo apt update
sudo apt install -y \
  build-essential \
  cmake ninja-build \
  libgmp-dev libmpfr-dev libcgal-dev \
  libboost-all-dev

# (Optional) May be needed:
# sudo apt install -y libeigen3-dev

Notes:

libcgal-dev provides CGAL headers (header-only on Ubuntu 24.04)
libgmp-dev and libmpfr-dev are numerical backends for CGAL
uv only manages Python packages; C/C++ dependencies must be installed via system package managers (apt/brew/pacman)
For macOS: brew install cmake cgal gmp mpfr boost
For Arch Linux: sudo pacman -S cgal gmp mpfr boost cmake base-devel

3) Compile `tetra_triangulation` with ABI Alignment

Important: This module requires ABI alignment with PyTorch 2.7.1 (C++11 ABI=1). We use a header file approach to enforce this.

a) Create ABI enforcement header:

Create submodules/tetra_triangulation/src/force_abi.h:

#pragma once
// Force new ABI before any STL headers
#if defined(_GLIBCXX_USE_CXX11_ABI)
#  undef _GLIBCXX_USE_CXX11_ABI
#endif
#define _GLIBCXX_USE_CXX11_ABI 1

b) Modify source files:

Add #include "force_abi.h" as the first line of:

submodules/tetra_triangulation/src/py_binding.cpp
submodules/tetra_triangulation/src/triangulation.cpp
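
The edit in step b) can also be applied with a small script so that it is repeatable after a fresh checkout. This is a sketch under the assumption that the include should sit on the very first line of each file; `ensure_first_include` is a hypothetical helper, not something shipped with the repository.

```python
from pathlib import Path

INCLUDE_LINE = '#include "force_abi.h"'


def ensure_first_include(path, include_line=INCLUDE_LINE):
    """Prepend `include_line` to the file unless it is already the first line.

    Returns True if the file was modified, False if it was left untouched.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    lines = text.splitlines()
    if lines and lines[0].strip() == include_line:
        return False  # already patched; keep the operation idempotent
    path.write_text(include_line + "\n" + text, encoding="utf-8")
    return True
```

Calling `ensure_first_include` once for each of the two source files listed above patches them; running it a second time leaves them unchanged.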

c) Build and install:

cd submodules/tetra_triangulation
rm -rf build CMakeCache.txt CMakeFiles tetranerf/utils/extension/tetranerf_cpp_extension*.so

# Point to current PyTorch's CMake prefix/dynamic library path
export CMAKE_PREFIX_PATH="$(python - <<'PY'
import torch; print(torch.utils.cmake_prefix_path)
PY
)"
export TORCH_LIB_DIR="$(python - <<'PY'
import os, torch; print(os.path.join(os.path.dirname(torch.__file__), 'lib'))
PY
)"
export LD_LIBRARY_PATH="$TORCH_LIB_DIR:$LD_LIBRARY_PATH"

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="$CMAKE_PREFIX_PATH" .
cmake --build . -j"$(nproc)"

# Install (optional, convenient for editable reference)
uv pip install -e .
cd ../../

Note: For troubleshooting ABI issues, see Key Issue 1 section below.

Key Issue 1: `tetra_triangulation` ABI Mismatch (Resolved)

Symptom: Running `from tetranerf.utils import extension as ext` raises:

undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKSs

The trailing RKSs (a const reference to the old-ABI std::string) indicates the old ABI (_GLIBCXX_USE_CXX11_ABI=0), while our PyTorch 2.7.1 build uses the new ABI (=1).
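
The same reading of the symbol can be automated. The sketch below is a substring heuristic rather than a demangler: `Ss` is the Itanium mangling abbreviation for the old-ABI std::string, while new-ABI signatures spell out std::__cxx11::basic_string. The function name `guess_libstdcxx_abi` is hypothetical.

```python
def guess_libstdcxx_abi(mangled_symbol):
    """Guess the libstdc++ string ABI from an Itanium-mangled symbol name.

    Returns 1 for the new ABI, 0 for the old ABI, or None if the symbol
    carries no std::string marker. Heuristic only; false positives are possible.
    """
    if "__cxx11" in mangled_symbol:
        return 1  # std::__cxx11::basic_string appears in new-ABI signatures
    if "Ss" in mangled_symbol:
        return 0  # "Ss" abbreviates the old-ABI std::string
    return None


if __name__ == "__main__":
    symbol = "_ZN3c106detail14torchCheckFailEPKcS2_jRKSs"
    print("ABI guess for missing symbol:", guess_libstdcxx_abi(symbol))
```

On the PyTorch side, `torch.compiled_with_cxx11_abi()` reports which ABI the installed binary was built with, so the two values can be compared directly.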

Fix: Add a header file in submodules/tetra_triangulation that pins the build to ABI=1:

Create file: src/force_abi.h

#pragma once
// Force new ABI before any STL headers
#if defined(_GLIBCXX_USE_CXX11_ABI)
#  undef _GLIBCXX_USE_CXX11_ABI
#endif
#define _GLIBCXX_USE_CXX11_ABI 1

Modify: Add #include "force_abi.h" as the first line of src/py_binding.cpp and src/triangulation.cpp
```
#include "force_abi.h"
```

Note: This header file approach is sufficient to enforce ABI=1. No additional CMakeLists.txt modifications are needed.

Build Commands (in-source, outputs to package path)

cd submodules/tetra_triangulation
rm -rf build CMakeCache.txt CMakeFiles tetranerf/utils/extension/tetranerf_cpp_extension*.so

# Point to current PyTorch's CMake prefix/dynamic library path
export CMAKE_PREFIX_PATH="$(python - <<'PY'
import torch; print(torch.utils.cmake_prefix_path)
PY
)"
export TORCH_LIB_DIR="$(python - <<'PY'
import os, torch; print(os.path.join(os.path.dirname(torch.__file__), 'lib'))
PY
)"
export LD_LIBRARY_PATH="$TORCH_LIB_DIR:$LD_LIBRARY_PATH"

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="$CMAKE_PREFIX_PATH" .
cmake --build . -j"$(nproc)"

# Install (optional, convenient for editable reference)
uv pip install -e .

Key Issue 2: nvdiffrast GL Backend Missing `EGL/egl.h` (Bypassed)

Option A: sudo apt install -y libegl-dev libopengl-dev libgles2-mesa-dev and continue with GL.
Option B (We adopted): Switch to CUDA backend: export NVDIFRAST_BACKEND=cuda, no EGL header dependency.

Key Issue 3: nvdiffrast CUDA Backend VRAM OOM (Resolved)

Symptom: cudaMalloc(&m_gpuPtr, bytes) fails with OOM (error: 2), especially during the mesh regularization phase.

Fix (two parts)

Replace the nvdiff_rasterization implementation in milo/scene/mesh.py:

- Support triangle chunking (the environment variable MILO_RAST_TRI_CHUNK sets the chunk size)
- Pass ranges as a CPU tensor, which the nvdiffrast CUDA backend requires (dr.rasterize(..., ranges=<CPU tensor>))

Modified function:

def nvdiff_rasterization(
    camera,
    image_height: int,
    image_width: int,
    verts: torch.Tensor,
    faces: torch.Tensor,
    return_indices_only: bool = False,
    glctx=None,
    return_rast_out: bool = False,
    return_positions: bool = False,
):
    """
    Replacement version equivalent to original function, supports triangle chunking (env: MILO_RAST_TRI_CHUNK),
    and fixes: nvdiffrast CUDA backend's `ranges` must be on CPU.
    """
    import os
    import torch
    import nvdiffrast.torch as dr

    device = verts.device
    dtype = verts.dtype

    cam_mtx = camera.full_proj_transform
    pos = torch.cat([verts, torch.ones([verts.shape[0], 1], device=device, dtype=dtype)], dim=1)
    pos = torch.matmul(pos, cam_mtx)[None]  # [1,V,4]

    faces = faces.to(torch.int32).contiguous()
    faces_dev = faces.to(pos.device)

    H, W = int(image_height), int(image_width)
    chunk = int(os.getenv("MILO_RAST_TRI_CHUNK", "0") or "0")
    use_chunking = chunk > 0 and faces.shape[0] > chunk

    if not use_chunking:
        rast_out, _ = dr.rasterize(glctx, pos=pos, tri=faces_dev, resolution=[H, W])
        bary_coords = rast_out[..., :2]
        zbuf = rast_out[..., 2]
        pix_to_face = rast_out[..., 3].to(torch.int32) - 1
        if return_indices_only:
            return pix_to_face
        _out = (bary_coords, zbuf, pix_to_face)
        if return_rast_out:
            _out += (rast_out,)
        if return_positions:
            _out += (pos,)
        return _out

    z_ndc = (pos[..., 2:3] / (pos[..., 3:4] + 1e-20)).contiguous()

    best_rast, best_depth = None, None
    n_faces, start = int(faces.shape[0]), 0

    def _normalize_tri_id(rast_chunk, start_idx, count_idx):
        tri_raw = rast_chunk[..., 3:4].to(torch.int64)
        if tri_raw.numel() == 0:
            return rast_chunk[..., 3:4]
        maxid = int(tri_raw.max().item())
        if maxid == 0:
            return rast_chunk[..., 3:4]
        if maxid <= count_idx:
            tri_adj = torch.where(tri_raw > 0, tri_raw + start_idx, tri_raw)
        else:
            tri_adj = tri_raw
        return tri_adj.to(rast_chunk.dtype)

    while start < n_faces:
        count = min(chunk, n_faces - start)
        # ranges must be on CPU
        ranges_cpu = torch.tensor([[start, count]], device="cpu", dtype=torch.int32)

        rast_chunk, _ = dr.rasterize(glctx, pos=pos, tri=faces_dev, resolution=[H, W], ranges=ranges_cpu)
        depth_chunk, _ = dr.interpolate(z_ndc, rast_chunk, faces_dev)
        tri_id_adj = _normalize_tri_id(rast_chunk, start, count)

        if best_rast is None:
            best_rast = torch.zeros_like(rast_chunk)
            best_depth = torch.full_like(depth_chunk, float("inf"))

        hit = (tri_id_adj > 0)
        prev_hit = (best_rast[..., 3:4] > 0)
        closer = hit & (~prev_hit | (depth_chunk < best_depth))

        rast_chunk = torch.cat([rast_chunk[..., :3], tri_id_adj], dim=-1)

        best_depth = torch.where(closer, depth_chunk, best_depth)
        best_rast = torch.where(closer.expand_as(best_rast), rast_chunk, best_rast)

        start += count

    rast_out = best_rast
    bary_coords = rast_out[..., :2]
    zbuf = rast_out[..., 2]
    pix_to_face = rast_out[..., 3].to(torch.int32) - 1

    if return_indices_only:
        return pix_to_face

    _output = (bary_coords, zbuf, pix_to_face)
    if return_rast_out:
        _output += (rast_out,)
    if return_positions:
        _output += (pos,)
    return _output
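
The chunking loop and the per-pixel depth merge in the function above can be followed more easily on plain Python data. The sketch below is a simplified illustration, not the repository code: it uses lists of (tri_id, depth) pairs instead of tensors, and the names `chunk_ranges` and `merge_by_depth` are hypothetical.

```python
def chunk_ranges(n_faces, chunk):
    """Yield (start, count) pairs that cover n_faces triangles in order."""
    start = 0
    while start < n_faces:
        count = min(chunk, n_faces - start)
        yield start, count
        start += count


def merge_by_depth(best, candidate):
    """Merge two per-pixel lists of (tri_id, depth); tri_id 0 means "no hit".

    A candidate pixel wins if it hit a triangle and either nothing was
    recorded yet or its depth is smaller than the recorded one.
    """
    merged = []
    for (best_id, best_z), (cand_id, cand_z) in zip(best, candidate):
        hit = cand_id > 0
        prev_hit = best_id > 0
        closer = hit and (not prev_hit or cand_z < best_z)
        merged.append((cand_id, cand_z) if closer else (best_id, best_z))
    return merged


if __name__ == "__main__":
    print(list(chunk_ranges(10, 4)))
```

Each (start, count) pair corresponds to one `ranges` row in the loop, and `merge_by_depth` mirrors the `closer` mask that selects between `rast_chunk` and `best_rast`.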

Reduce memory peak at runtime:
- MILO_MESH_RES_SCALE=0.3 (mesh regularization resolution scaling)
- MILO_RAST_TRI_CHUNK=150000 (triangle chunk size)
- --data_device cpu (cameras/data on CPU)

Reproduction Steps (Ignatius)

Data path: ./data/Ignatius
Output directory: ./output/Ignatius

1) Training

cd milo
export NVDIFRAST_BACKEND=cuda
export TORCH_CUDA_ARCH_LIST="12.0+PTX"
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True,max_split_size_mb:32,garbage_collection_threshold:0.6"
export CUDA_DEVICE_MAX_CONNECTIONS=1
export MILO_MESH_RES_SCALE=0.3
export MILO_RAST_TRI_CHUNK=150000

python train.py -s ./data/Ignatius -m ./output/Ignatius \
  --imp_metric outdoor \
  --rasterizer gof \
  --mesh_config verylowres \
  --sampling_factor 0.2 \
  --data_device cpu \
  --log_interval 200
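
For scripted or repeated runs, the exports and the train.py call above can be assembled in Python and handed to `subprocess`. This is a sketch only: the values are copied from the commands in this step, while `RUN_ENV`, `training_env`, and `training_argv` are hypothetical names introduced for illustration.

```python
import os
import sys

# Values copied from the export lines in this guide.
RUN_ENV = {
    "NVDIFRAST_BACKEND": "cuda",
    "TORCH_CUDA_ARCH_LIST": "12.0+PTX",
    "PYTORCH_CUDA_ALLOC_CONF": (
        "expandable_segments:True,max_split_size_mb:32,"
        "garbage_collection_threshold:0.6"
    ),
    "CUDA_DEVICE_MAX_CONNECTIONS": "1",
    "MILO_MESH_RES_SCALE": "0.3",
    "MILO_RAST_TRI_CHUNK": "150000",
}


def training_env(base_env=None):
    """Return a copy of the base environment with the guide's variables applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(RUN_ENV)
    return env


def training_argv(scene_dir, model_dir):
    """Build the argument list for the train.py invocation shown above."""
    return [
        sys.executable, "train.py",
        "-s", scene_dir,
        "-m", model_dir,
        "--imp_metric", "outdoor",
        "--rasterizer", "gof",
        "--mesh_config", "verylowres",
        "--sampling_factor", "0.2",
        "--data_device", "cpu",
        "--log_interval", "200",
    ]
```

A run would then look like `subprocess.run(training_argv("./data/Ignatius", "./output/Ignatius"), env=training_env(), cwd="milo", check=True)`.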

Output (located in ./output/Ignatius)

Trained scene (Gaussians + learnable SDF and mesh regularization state, etc.)
Logs and intermediate files (as configured by script, console prints training progress)

2) Mesh Extraction (SDF)

python mesh_extract_sdf.py \
  -s ./data/Ignatius \
  -m ./output/Ignatius \
  --rasterizer gof \
  --config verylowres \
  --data_device cpu

Output

./output/Ignatius/mesh_learnable_sdf.ply (confirmed to open normally in MeshLab)

3) Rendering

python render.py \
  -m ./output/Ignatius \
  -s ./data/Ignatius \
  --rasterizer gof \
  --eval

Output

Rendered images (train/test views), saved to the rendering subdirectory in the model output directory (as indicated by script output)

4) Image Metrics

python metrics.py -m ./output/Ignatius

Output

PSNR/SSIM printed to the console; the repo script also saves result files to the model directory (file names depend on the script's implementation).

5) PLY Format Conversion (Optional)

The extracted PLY mesh can be converted to other common 3D formats (OBJ/GLB) using the clean_convert_mesh.py script for use in various 3D software. The script also provides optional mesh cleaning functionality.

Install Additional Dependencies

pip install pymeshlab trimesh plyfile

Basic Usage

# Basic conversion (outputs PLY, OBJ, GLB)
python clean_convert_mesh.py --in ./output/Ignatius/mesh_learnable_sdf.ply

# Convert and simplify to 300k triangles
python clean_convert_mesh.py --in ./output/Ignatius/mesh_learnable_sdf.ply --simplify 300000

# Clean small components during conversion (default 0.02 = remove components with diameter < 2% bbox diagonal)
python clean_convert_mesh.py --in ./output/Ignatius/mesh_learnable_sdf.ply --keep-components 0.02

# Output only specific formats
python clean_convert_mesh.py --in ./output/Ignatius/mesh_learnable_sdf.ply --no-glb  # Skip GLB
python clean_convert_mesh.py --in ./output/Ignatius/mesh_learnable_sdf.ply --no-obj  # Skip OBJ

# Specify output directory and filename
python clean_convert_mesh.py --in ./output/Ignatius/mesh_learnable_sdf.ply \
  --out-dir ./output/Ignatius/converted \
  --stem mesh_final

Main Features

Format Conversion: Convert PLY to OBJ and GLB formats (suitable for different 3D software and Web display)
Optional Cleaning: Remove duplicate vertices/faces, fix non-manifold edges, remove small floating components
Optional Simplification: Shape-preserving simplification based on Quadric decimation
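
To make the `--keep-components` fraction concrete, the sketch below computes the size threshold it implies: components whose diameter falls below `fraction` times the bounding-box diagonal would be dropped. It illustrates the arithmetic only and is not the code inside clean_convert_mesh.py; both function names are hypothetical.

```python
import math


def bbox_diagonal(vertices):
    """Length of the axis-aligned bounding-box diagonal of (x, y, z) points."""
    xs, ys, zs = zip(*vertices)
    return math.dist((min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs)))


def min_component_diameter(vertices, fraction=0.02):
    """Smallest component diameter that survives cleaning at the given fraction."""
    return fraction * bbox_diagonal(vertices)


if __name__ == "__main__":
    verts = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (1.0, 1.0, 0.0)]
    print("threshold:", min_component_diameter(verts))
```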

Output (saved in input file directory by default)

mesh_clean.ply - Converted PLY mesh (with vertex colors)
mesh_clean.obj - OBJ format (Note: OBJ doesn't support vertex colors)
mesh_clean.glb - GLB format (suitable for Web display and import into Blender/Unity etc.)

Our Modifications (Relative to Upstream)

submodules/tetra_triangulation
- Added src/force_abi.h and put #include "force_abi.h" as the first line of src/py_binding.cpp and src/triangulation.cpp, forcing the C++11 new ABI (=1)
milo/scene/mesh.py
- Replaced nvdiff_rasterization:
  - Support MILO_RAST_TRI_CHUNK triangle chunking
  - Pass ranges to the CUDA backend as a CPU tensor, as it requires
  - Other behavior remains consistent with original function
Runtime Configuration
- Default to nvdiffrast CUDA backend (NVDIFRAST_BACKEND=cuda), avoiding EGL dependency
- Specify TORCH_CUDA_ARCH_LIST="12.0+PTX" for Blackwell
- Reduce peak VRAM: MILO_MESH_RES_SCALE=0.3, MILO_RAST_TRI_CHUNK=150000, --data_device cpu

Common Issues and Troubleshooting

- `undefined symbol: ... torchCheckFail ... RKSs`: This is an ABI=0 symbol; recompile tetra_triangulation with the force_abi.h patch from Key Issue 1.
- `fatal error: EGL/egl.h: No such file or directory`: To stay on the GL path, run sudo apt install -y libegl-dev libopengl-dev libgles2-mesa-dev ninja-build; otherwise switch to the CUDA path with export NVDIFRAST_BACKEND=cuda.
- nvdiffrast JIT compilation failure / wrong architecture: Confirm that TORCH_CUDA_ARCH_LIST="12.0+PTX" is exported, then clear the cache: rm -rf ~/.cache/torch_extensions.
- VRAM OOM: Reduce MILO_MESH_RES_SCALE (e.g., 0.5 → 0.3 → 0.25), enable triangle chunking via MILO_RAST_TRI_CHUNK, and use --data_device cpu.
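
For the wrong-architecture case, a small check can tell whether the exported arch list covers the GPU. The sketch below parses only plain `major.minor` entries with an optional `+PTX` suffix and treats a PTX entry as covering any equal or newer capability; named or suffixed entries are skipped. `arch_list_covers` is a hypothetical helper.

```python
def arch_list_covers(arch_list, capability):
    """Return True if a TORCH_CUDA_ARCH_LIST string covers a (major, minor) capability."""
    target = tuple(capability)
    for entry in arch_list.replace(";", " ").split():
        has_ptx = entry.endswith("+PTX")
        version = entry[:-4] if has_ptx else entry
        try:
            major, minor = (int(part) for part in version.split("."))
        except ValueError:
            continue  # skip named or suffixed entries this sketch does not parse
        if (major, minor) == target:
            return True  # native code for exactly this architecture
        if has_ptx and (major, minor) <= target:
            return True  # PTX can be JIT-compiled for newer devices
    return False
```

With PyTorch installed, the capability of the current device is available as `torch.cuda.get_device_capability()`, so the check becomes `arch_list_covers(os.environ["TORCH_CUDA_ARCH_LIST"], torch.cuda.get_device_capability())`.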

Results Summary (This Ignatius Pipeline)

Training (train.py): Completed, output model directory ./output/Ignatius (contains training state and logs).
Mesh Extraction (mesh_extract_sdf.py): Obtained mesh_learnable_sdf.ply, verified visualization in MeshLab.
Rendering (render.py): Obtained rendered images from train/test views (saved in rendering subdirectory of output directory).
Metrics (metrics.py): PSNR/SSIM printed to the console and saved to the model directory (file name depends on the script's implementation).

For Tanks&Temples evaluation, you can symlink mesh_learnable_sdf.ply as recon.ply, then run evaluation scripts.
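
The symlink step can be scripted as follows. This is a sketch that assumes the evaluation scripts look for recon.ply next to the extracted mesh; adjust the location if they expect it elsewhere. `link_recon` is a hypothetical helper.

```python
from pathlib import Path


def link_recon(model_dir, mesh_name="mesh_learnable_sdf.ply", link_name="recon.ply"):
    """Create (or refresh) a relative symlink link_name -> mesh_name inside model_dir."""
    model_dir = Path(model_dir)
    target = model_dir / mesh_name
    link = model_dir / link_name
    if not target.is_file():
        raise FileNotFoundError(target)
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale link or file from an earlier run
    link.symlink_to(mesh_name)  # relative link, so the directory can be moved
    return link
```

For the pipeline in this guide the call would be `link_recon("./output/Ignatius")`.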

License and Acknowledgments

This repository is an adaptation and engineering supplement to the original MILo project in the CUDA 12.8 / RTX 50 environment, retaining the original project license and attribution. Thanks to the original authors and all submodule authors (Tetra-NeRF, nvdiffrast, 3D Gaussian Splatting, etc.) for their excellent work.
