I am working on precomputing tokens for each modality in my 4M training pipeline. I’m using grayscale semantic segmentation masks as input, but I’m encountering an issue where the regenerated output does not match the original mask.
This is the code I am using to precompute the tokens:
import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms import Normalize

from fourm.vq.vqvae import VQVAE, DiVAE
from fourm.utils import denormalize, IMAGENET_INCEPTION_MEAN, IMAGENET_INCEPTION_STD

transform = transforms.ToTensor()
resize = transforms.Resize((224, 224))

# Pretrained semantic segmentation tokenizer
tok = VQVAE.from_pretrained('EPFL-VILAB/4M_tokenizers_semseg_4k_224-448').cuda()

tensors_b3hw = []
# selected_images is a list of paths to the grayscale segmentation masks
for image_path in selected_images:
    image = Image.open(image_path)
    # For an 8-bit mask, ToTensor returns a (1, H, W) float tensor with values scaled to [0, 1]
    rgb_b3hw = transform(resize(image)).unsqueeze(0)
    tensors_b3hw.append(rgb_b3hw)

# Batch the masks and cast to int: (B, 1, 224, 224)
stacked_tensors_b3hw = torch.cat(tensors_b3hw, dim=0).int()
# Drop the singleton channel dimension: (B, 224, 224)
squeezed_tensor = torch.squeeze(stacked_tensors_b3hw, dim=1)
print(squeezed_tensor.shape)

# Encode to discrete tokens, then decode back
_, _, tokens = tok.encode(squeezed_tensor.cuda())
image_size = rgb_b3hw.shape[-1]
output_rgb_b3hw = tok.decode_tokens(tokens, image_size=image_size)
The regenerated output, output_rgb_b3hw, has 134 channels, whereas the grayscale mask I passed to the tokenizer has a single channel. I expected the decoded output to have the same number of channels as the input mask.
Am I missing something in the preprocessing or tokenization process? Is there a step I need to adjust to ensure that the regenerated output matches the original mask in terms of channel dimensions?
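My current guess is that the decoder returns per-class logits rather than a class-index map, so I would need to take an argmax over the channel dimension to recover a single-channel mask, roughly like the sketch below (this is an assumption on my part, not something I have confirmed from the tokenizer code):

# Assumption: output_rgb_b3hw holds per-class logits of shape (B, 134, H, W).
# If so, an argmax over the channel dimension should give back a (B, H, W)
# map of class indices comparable to the input mask.
predicted_mask = output_rgb_b3hw.argmax(dim=1)
print(predicted_mask.shape)  # expected (B, 224, 224)

Is that the intended way to compare the decoded output against the original mask, or is the mismatch coming from my preprocessing?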
Any guidance or suggestions would be appreciated. Thank you!
@garjania Please help me out. Am I doing something wrong here?