How to use semantic segmentation tokenizer for precomputing tokens for this modality? #24

HITESH2002-JAIN commented Aug 12, 2024

I am working on precomputing tokens for each modality in my 4M training pipeline. I’m using grayscale semantic segmentation masks as input, but I’m encountering an issue where the regenerated output does not match the original mask.

[Screenshot (2024-08-12): original segmentation mask vs. regenerated output]

This is the code I am using to precompute the tokens:

import torch
from PIL import Image
from torchvision import transforms

from fourm.vq.vqvae import VQVAE

# Resize to the tokenizer's training resolution and convert the masks to tensors
transform = transforms.ToTensor()
resize = transforms.Resize((224, 224))

tok = VQVAE.from_pretrained('EPFL-VILAB/4M_tokenizers_semseg_4k_224-448').cuda()

tensors_b3hw = []
for image_path in selected_images:
    image = Image.open(image_path)
    rgb_b3hw = transform(resize(image)).unsqueeze(0)
    tensors_b3hw.append(rgb_b3hw)

# Stack into a batch and cast to int (intended to be the segmentation class indices)
stacked_tensors_b3hw = torch.cat(tensors_b3hw, dim=0).int()
# Drop the channel dimension so the tensor is B x H x W
squeezed_tensor = torch.squeeze(stacked_tensors_b3hw, dim=1)
print(squeezed_tensor.shape)

# Encode to discrete tokens, then decode them back as a sanity check
_, _, tokens = tok.encode(squeezed_tensor.cuda())
image_size = rgb_b3hw.shape[-1]
output_rgb_b3hw = tok.decode_tokens(tokens, image_size=image_size)
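
For the actual precomputation step, I then save the token indices to disk so I can load them later in my 4M training pipeline. The directory layout and file naming below are just my own convention, not anything from the 4M repo:

import os
import numpy as np

# My own (ad hoc) output location for the precomputed semseg tokens
token_dir = 'precomputed_tokens/semseg'
os.makedirs(token_dir, exist_ok=True)

# One token sequence per input image, saved under the image's base name
tokens_np = tokens.cpu().numpy()
for image_path, token_seq in zip(selected_images, tokens_np):
    name = os.path.splitext(os.path.basename(image_path))[0]
    np.save(os.path.join(token_dir, name + '.npy'), token_seq)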

As a sanity check, I decode the tokens back. The resulting output_rgb_b3hw tensor has 134 channels, which does not match the original mask I passed to the tokenizer: I expected the regenerated output to have the same number of channels (and the same values) as the input mask.
Am I missing something in the preprocessing or tokenization steps? Is there a step I need to adjust so that the regenerated output matches the original mask?
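
In case it matters, my current guess is that the 134 channels are per-class logits, so I have been recovering a class map with an argmax over the channel dimension before comparing against the original mask. This is only my assumption about how the decoder output should be interpreted, so please correct me if that is not the intended usage:

# Assuming output_rgb_b3hw holds per-class logits of shape (B, 134, H, W),
# take the argmax over the class dimension to get a (B, H, W) class map
pred_classes = output_rgb_b3hw.argmax(dim=1)

# Rough comparison against the integer masks that went into the encoder
matches = (pred_classes.cpu() == squeezed_tensor.cpu())
print('pixel agreement:', matches.float().mean().item())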

Any guidance or suggestions would be appreciated. Thank you!

@garjania Please help me out. Am I doing something wrong here?
