DISCOVERSE unifies real-world captures, 3D AIGC, and any existing 3D assets in the formats of 3DGS (.ply), mesh (.obj/.stl), and MJCF physics models (.xml), enabling their use as interactive scene nodes (objects and robots) or as the background node (scene). We use 3DGS as the universal visual representation and integrate laser scanning, state-of-the-art generative models, and physically based relighting to boost the geometry and appearance fidelity of the reconstructed radiance fields.
This repo is tested with Ubuntu 18.04+.
To setup the Python environment for DiffusionLight (Step 3) & Mesh2GS (Step 5), run:
conda create -n mesh2gs python=3.10
conda activate mesh2gs
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # replace with the wheel matching your CUDA version
pip install LitMesh2GS/submodules/diff-gaussian-rasterization
pip install LitMesh2GS/submodules/simple-knn
Please manually install the other dependencies described in requirements.txt.
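To quickly sanity-check the setup (a generic check, not specific to this repo), you can verify that the installed PyTorch build matches your CUDA toolkit and sees the GPU:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"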
To set up the Python environment for TRELLIS (Step 1), we recommend creating a new, separate environment following the official guidelines to avoid any conflicts.
Also, install Blender (recommended version: 3.1.2) for Step 4. We strongly recommend running the related scripts (blender_renderer/glb_render.py and blender_renderer/obj_render.py) in the Scripting panel of the Blender executable. We do NOT recommend running them through a standalone Blender Python API (bpy) installation due to version mismatch issues.
Step 1: Image-to-3D Generation with TRELLIS
Generate object-level interactive scene nodes as high-quality textured meshes from a single RGB image.
Firstly, capture one RGB image of the target object. The object should be located at the center of the image and should not be too small (covering more than 50% of the pixels). Note that the object does not need to be captured in the exact scene used for simulation; we only need the background to be as clean as possible (for easy instance segmentation) and the environmental lighting to be white, uniform, and bright.
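If you want a quick check of whether a capture satisfies these constraints, one option (using the third-party rembg package, which is not part of this repo; file names are placeholders) is to run a rough foreground segmentation and measure the pixel coverage:
from PIL import Image
from rembg import remove  # pip install rembg
import numpy as np

img = Image.open("capture.jpg").convert("RGB")
fg = remove(img)                       # RGBA output; alpha approximates the foreground mask
alpha = np.asarray(fg)[..., 3] > 127   # binarize the matte
coverage = alpha.mean()
print(f"object coverage: {coverage:.1%}")  # aim for > 50%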
Then, run a state-of-the-art image-to-3D generation approach to reconstruct a textured mesh from the captured RGB image.
- TRELLIS is the latest open-source, state-of-the-art 3D generative model that generates high-quality textured meshes, 3DGS fields, or radiance fields. We recommend setting up a new environment for TRELLIS and running image-to-3D generation following the official guidelines (a minimal usage sketch is given after this list). We recommend exporting the textured meshes as .glb files to be compatible with the subsequent lighting estimation, Blender relighting, and Mesh2GS steps. Note that, for a quick setup, if you do NOT want to align the appearance of the object with the background, you can directly generate 3DGS (.ply) assets for DISCOVERSE and skip Steps 3-5.
- For 3D generation with higher quality, we recommend commercial software such as Deemos Rodin (CLAY), Meshy, TRIPO, etc. All of them offer free trials.
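For reference, a minimal image-to-3D sketch adapted from the official TRELLIS example is shown below; class names and arguments may differ across TRELLIS versions, so treat the official guidelines as the source of truth and the file names here as placeholders:
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import postprocessing_utils

# Load the pretrained image-to-3D pipeline (weights are downloaded on first use).
pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()

# Run generation on the captured RGB image of the target object.
image = Image.open("your_object.png")
outputs = pipeline.run(image, seed=1)

# Export a textured .glb mesh for the relighting + Mesh2GS pipeline (Steps 3-5).
glb = postprocessing_utils.to_glb(
    outputs["gaussian"][0], outputs["mesh"][0],
    simplify=0.95, texture_size=1024,
)
glb.export("your_object.glb")

# Or, for the quick-setup path, export a 3DGS .ply asset directly.
outputs["gaussian"][0].save_ply("your_object.ply")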
Step 2: Background Node Reconstruction
Reconstruct the background node as a 3DGS field using a scanner or multi-view RGB captures.
We recommend using the LixelKity K1 scanner and Lixel CyberColor to generate a high-quality 3DGS field that serves as the background node. If the scanner is not available, you can use vanilla 3DGS for scene reconstruction.
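With multi-view RGB captures, a typical vanilla 3DGS reconstruction (a sketch assuming the original gaussian-splatting codebase and a working COLMAP installation; YourScenePath is a placeholder) looks like:
cd gaussian-splatting
python convert.py -s YourScenePath   # run COLMAP SfM on the multi-view captures
python train.py -s YourScenePath     # optimize the 3DGS field of the background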
Step 3: Lighting Estimation with DiffusionLight
Estimate an HDR environment map from a single RGB image, preparing for Step 4, i.e., aligning the object appearance with the reconstructed background node.
Note: If you do NOT want to align the appearance of the object with the background, you can simply download an arbitrary HDR map in .exr format from PolyHeaven, skip the following process, and move on to Step 4.
If you cannot connect to Hugging Face due to network/VPN issues, please manually download the pretrained models from this link (Code: 61i2). Then, manually modify the model paths (SD_MODELS, VAE_MODELS, CONTROLNET_MODELS, DEPTH_ESTIMATOR) in DiffusionLight/relighting/argument.py to the absolute paths of your downloaded model folders.
Firstly, prepare the input image(s). Capture one RGB image for each target background and resize the input image(s) to 1024x1024. To achieve this, we recommend cropping the image(s) to contain as much background information as possible. As an alternative, you can also pad the image(s) with a black border.
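As an illustration, a minimal preprocessing sketch with Pillow (file names are placeholders) that implements both the cropping and the black-padding options:
from PIL import Image, ImageOps

def crop_to_1024(path_in, path_out):
    # Center-crop to a square (keeping as much background as possible), then resize.
    img = Image.open(path_in).convert("RGB")
    side = min(img.size)
    ImageOps.fit(img, (side, side), centering=(0.5, 0.5)).resize((1024, 1024), Image.LANCZOS).save(path_out)

def pad_to_1024(path_in, path_out):
    # Alternative: pad with a black border instead of cropping.
    img = Image.open(path_in).convert("RGB")
    side = max(img.size)
    canvas = Image.new("RGB", (side, side), (0, 0, 0))
    canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    canvas.resize((1024, 1024), Image.LANCZOS).save(path_out)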
Organize all the processed image(s) into a folder and specify the absolute path of that folder as YourInputPath. Specify YourOutputPath as a folder for saving your results. Then, run:
cd DiffusionLight
python inpaint.py --dataset YourInputPath --output_dir YourOutputPath
python ball2envmap.py --ball_dir YourOutputPath/square --envmap_dir YourOutputPath/envmap
python exposure2hdr.py --input_dir YourOutputPath/envmap --output_dir YourOutputPath/hdr
The final .exr results (saved in YourOutputPath/hdr/) will be used for the subsequent Blender PBR.
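To sanity-check an estimated environment map, a small sketch (assuming an OpenCV build with OpenEXR support; the file name is a placeholder) that loads the .exr, applies simple gamma tone-mapping, and saves an LDR preview:
import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # required by recent OpenCV builds
import cv2
import numpy as np

hdr = cv2.imread("YourOutputPath/hdr/your_image.exr",
                 cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)  # float32, BGR
ldr = np.clip(np.power(np.maximum(hdr, 0), 1 / 2.2), 0, 1)   # simple gamma tone-mapping
cv2.imwrite("hdr_preview.png", (ldr * 255).astype(np.uint8))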
Step 4: Multi-View Rendering and Relighting with Blender
Render the mesh of the target object into multi-view images for 3DGS optimization, by uniformly sampling cameras on a sphere and performing (pre-)physically-based relighting in Blender (bpy) with a customized environment HDR map (distant lighting).
Note that this is NOT real PBR functionality, since it simply bakes the lighting into the SH appearance of the 3DGS to mimic the hue of the background scene.
Organize all the hdr maps for (Pre-)PBR into a single folder like:
YourHDRPath
├── hdr_name_0.exr
├── hdr_name_1.exr
├── hdr_name_2.exr
...
└── hdr_name_n.exr
We strongly recommend using .glb 3D mesh assets similar to Objaverse. All of the .glb 3D assets to be converted should be put together into a single folder like:
YourInputPath
├── model_or_part_name_0.glb
├── model_or_part_name_1.glb
├── model_or_part_name_2.glb
...
└── model_or_part_name_n.glb
Then, paste and run blender_renderer/glb_render.py in the Scripting panel of the Blender executable and pass in the following arguments:
--root_in_path YourInputPath
--root_hdr_path YourHDRPath
--root_out_path YourOutputPath
The results will be saved at YourOutputPath, in which each folder (named {hdr_name_i}_{model_or_part_name_i}) stores the rendered RGB images, depth maps, camera parameters, and .obj geometry for one of the 3D models under one of the lightings.
There are several other parameters to tune if the renderings are not satisfactory.
- lit_strength: strength of the environment lighting; a larger value leads to a brighter rendering.
- lens: focal length of the camera. If the object is too small in the rendering (i.e., too many pixels are wasted), try increasing this value; conversely, when only a fraction of the object is rendered, try decreasing it.
- resolution: rendering resolution (default 512x512); a larger resolution leads to much slower rendering.
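Since glb_render.py and obj_render.py are executed from the Scripting panel rather than from a command line, the arguments cannot be passed in the usual way. If the script reads its options from sys.argv via argparse (an assumption; check the script header and adapt accordingly), one way is to set sys.argv at the top of the pasted script before its body runs:
# Paste above the script body in Blender's Scripting panel.
# ASSUMPTION: the script parses these options from sys.argv with argparse.
import sys
sys.argv = [
    "glb_render.py",
    "--root_in_path", "/abs/path/to/YourInputPath",
    "--root_hdr_path", "/abs/path/to/YourHDRPath",
    "--root_out_path", "/abs/path/to/YourOutputPath",
]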
If you are dealing with .obj assets, e.g., robot models, each model comes with several texture and material maps, and the data should be organized into an individual folder for each model, as follows:
YourInputPath
├── model_or_part_name_0
│ ├── obj_name_0.obj
│ ├── mtl_name_0.mtl
│ ├── tex_name_0.png
│ └── ...
├── model_or_part_name_1
│ ├── obj_name_1.obj
│ ├── mtl_name_1.mtl
│ ├── tex_name_1.png
│ └── ...
├── model_or_part_name_2
...
└── model_or_part_name_n
The robot models developed by DISCOVER LAB, including MMK2, AirBot, DJI, RM2, etc., can be accessed through this link (Code: 94po).
Then, paste and run blender_renderer/obj_render.py in the Scripting panel of the Blender executable and pass in the following arguments:
--root_in_path YourInputPath
--root_hdr_path YourHDRPath
--root_out_path YourOutputPath
Note that the parameter arguments are the same as for blender_renderer/glb_render.py.
Convert the camera parameters from the Blender renderings to the COLMAP format by:
cd blender_renderer
python models2colmap.py --root_path YourOutputPath
Make sure to set the intrinsics (i.e., --resolution, --lens, --sensor_size) strictly the same when running obj_render.py / glb_render.py and models2colmap.py.
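For example, if Step 4 was rendered at the default 512x512 resolution, the same values must be repeated here (the lens and sensor-size numbers below are hypothetical placeholders; use whatever you set in Step 4):
python models2colmap.py --root_path YourOutputPath --resolution 512 --lens 50 --sensor_size 36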
Step 5: Mesh2GS Conversion
Convert the textured meshes to 3DGS fields.
Run Mesh2GS for each 3D asset one-by-one:
cd LitMesh2GS
python train.py -s YourOutputPath/model_or_part_name_i -m YourOutputPath/model_or_part_name_i/mesh2gs --data_device cuda --densify_grad_threshold 0.0002 -r 1
The 3DGS results will be saved in a new folder mesh2gs under YourOutputPath/model_or_part_name_i for each 3D asset.
Since 3DGS is memory-inefficient by nature, we recommend specifying --densification_interval to roughly control the number of resulting 3DGS points. A larger value leads to a sparser 3DGS field that consumes less memory.
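If you have many assets, a simple shell loop (a sketch assuming every subfolder of YourOutputPath is one rendered asset from Step 4) can batch the conversion:
cd LitMesh2GS
for d in YourOutputPath/*/; do
  python train.py -s "$d" -m "${d}mesh2gs" --data_device cuda --densify_grad_threshold 0.0002 -r 1
done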
