Hi, I have a similar problem to #24, but I'm using audio shorter than 6 seconds. Two calls to get_audio_embeddings on the same file return different embeddings.
MWE:
from msclap import CLAP
import torch
import subprocess
with torch.no_grad():
    clap_model = CLAP(version="2023", use_cuda=False)
    f = "/home/simon.mandlik/test.wav"
    audio_embeddings_1 = clap_model.get_audio_embeddings([f])
    audio_embeddings_2 = clap_model.get_audio_embeddings([f])
    print(audio_embeddings_1)
    print(audio_embeddings_2)
    mse = torch.mean((audio_embeddings_1 - audio_embeddings_2)**2)
    print(mse)
print(subprocess.check_output(['ffprobe', f, '-hide_banner']))
print(clap_model.args)
Output:
tensor([[-1.5895, -0.9305, 0.0572, ..., 1.6071, -0.0361, 0.6508]])
tensor([[-1.5228, -1.0532, 0.0794, ..., 1.6698, -0.0152, 0.4471]])
tensor(0.0190)
Input #0, wav, from '/home/simon.mandlik/test.wav':
Metadata:
encoder : Lavf61.1.100
Duration: 00:00:06.00, bitrate: 1536 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, 2 channels, s16, 1536 kb/s
b''
Namespace(text_model='gpt2', text_len=77, transformer_embed_dim=768, freeze_text_encoder_weights=True, audioenc_name='HTSAT', out_emb=768, sampling_rate=44100, duration=7, fmin=50, fmax=8000, n_fft=1024, hop_size=320, mel_bins=64, window_size=1024, d_proj=1024, temperature=0.003, num_classes=527, batch_size=1024, demo=False)
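One thing that might be worth checking is how the clip length compares to the model window, using only the numbers printed above (6 s, 2 channels from ffprobe; duration=7 and sampling_rate=44100 from clap_model.args). The sketch below is plain arithmetic, not msclap code. It assumes the clip is resampled to the model's rate, and the "flattened" case is purely a hypothesis that both channels end up in one 1-D series; neither has been verified against the msclap source.

```python
# Numbers copied from the ffprobe output and clap_model.args above.
clip_seconds = 6
clip_channels = 2
model_sampling_rate = 44100
model_duration = 7

# Number of samples the model expects per clip.
window_samples = model_duration * model_sampling_rate

# Samples per channel, ASSUMING the clip is resampled to the model's rate.
mono_samples = clip_seconds * model_sampling_rate

# HYPOTHETICAL case (not checked in msclap): both channels are flattened
# into a single 1-D series, which doubles the sample count.
flattened_samples = mono_samples * clip_channels


def exceeds_window(n_samples, window=window_samples):
    """Return True if a series of n_samples is longer than the model window."""
    return n_samples > window


print("window:", window_samples)
print("mono:", mono_samples, "exceeds window:", exceeds_window(mono_samples))
print("flattened:", flattened_samples,
      "exceeds window:", exceeds_window(flattened_samples))
```

If the flattened case turned out to apply, the clip would count as longer than the 7 second window even though each channel is shorter, so converting the file to mono before embedding could be a quick way to test this.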