Does it support Chinese? #11

lucasjinreal · 2024-11-23T14:24:05Z

When two speakers' voices are concatenated extremely closely, would DiaPer be able to separate them?

fnlandini · 2024-11-23T14:46:52Z

Hi,
Diarization should be language agnostic, so in principle it should work on any language. However, we have a few models fine-tuned specifically on Mandarin data:
On AISHELL-4
On AliMeeting (close-talk)
On AliMeeting far-field
On MagicData-RAMC

Regarding separation: this is a diarization system, not a separation system. It will not separate the speakers, but it can find the parts where they overlap fairly accurately.
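To illustrate what "finding the parts where the speakers overlap" can mean in practice, here is a minimal sketch. It assumes the diarization output is available as per-frame activity scores, one score per speaker; the `overlap_regions` function, the 0.5 threshold and the frame shift are hypothetical and are not taken from DiaPer.

```python
from typing import List, Tuple


def overlap_regions(
    activity: List[List[float]],
    frame_shift: float = 0.1,
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds where two or more speakers are active.

    activity[t][s] is the (hypothetical) activity score of speaker s at frame t.
    """
    regions = []
    start = None  # index of the first frame of the current overlap, if any
    for t, frame in enumerate(activity):
        n_active = sum(score > threshold for score in frame)
        if n_active >= 2 and start is None:
            start = t
        elif n_active < 2 and start is not None:
            regions.append((start * frame_shift, t * frame_shift))
            start = None
    if start is not None:  # overlap runs until the end of the recording
        regions.append((start * frame_shift, len(activity) * frame_shift))
    return regions


# Four frames, two speakers: both are active in frames 1 and 2.
scores = [
    [0.9, 0.1],
    [0.8, 0.7],
    [0.6, 0.9],
    [0.2, 0.8],
]
print(overlap_regions(scores, frame_shift=0.5))
```

The audio itself is left untouched: the result is only a list of time spans, which is the difference between diarization and separation.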

lucasjinreal · 2024-11-23T14:53:54Z

Thanks! This is exactly what I want!

My audio looks like speaker1_speaker2, with the two voices extremely close together. What is the easiest code I can borrow to resolve this? (I just need the times of speaker1 and of speaker2. There may be more than two speakers, and it does not even have to be accurate right before and after an overlap.)

PS: my audio was not recorded in a meeting with a far-field microphone array; it is ordinary audio from the internet. Will it still work?

fnlandini · 2024-11-23T15:12:47Z

I recommend you take a look at this script
You will need to modify this file to have n_attractors: 20, and put in models_path the model you want to use (you can try them all). Also set rttms_dir: to the path where you want the output to be generated. If your data are not 16kHz, also modify sampling_rate: 16000
If your data are not far-field, then probably the "close-talk" version will work the best.
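Once the output has been generated, the start and end times of each speaker can be read back with a few lines of Python. This is a minimal sketch, assuming the files written to rttms_dir follow the standard RTTM layout (onset in the fourth field, duration in the fifth, speaker label in the eighth); the `speaker_segments` helper and the example lines are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def speaker_segments(rttm_text: str) -> Dict[str, List[Tuple[float, float]]]:
    """Group RTTM SPEAKER lines into (start, end) times per speaker label."""
    segments = defaultdict(list)
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset, duration = float(fields[3]), float(fields[4])
        segments[fields[7]].append((onset, onset + duration))
    return dict(segments)


# Made-up RTTM content; a real file would be read from rttms_dir.
example = """\
SPEAKER demo 1 0.00 2.50 <NA> <NA> spk0 <NA> <NA>
SPEAKER demo 1 2.25 3.00 <NA> <NA> spk1 <NA> <NA>
SPEAKER demo 1 6.00 1.50 <NA> <NA> spk0 <NA> <NA>
"""

for speaker, turns in speaker_segments(example).items():
    print(speaker, turns)
```

Segments of different speakers may overlap in time, as the first two lines of the example do.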

lucasjinreal · 2024-11-23T15:19:40Z

Hi, I use this model:

models_path: models/20attractors/SC_LibriSpeech_2spk_adapted1-10_finetuneAISHELL4mix/models
# n_attractors: 10
n_attractors: 20

I also changed n_attractors to 20,

But I got:

RuntimeError: Error(s) in loading state_dict for DataParallel:
size mismatch for module.latents2attractors.weights: copying a param with shape torch.Size([128, 10]) from checkpoint, the shape in current model is torch.Size([128, 20]).
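
The message says the checkpoint stores a [128, 10] tensor while the model was built for [128, 20], which suggests the loaded checkpoint was trained with 10 attractors even though the config asks for 20. A quick way to check a checkpoint before choosing n_attractors is sketched below. The key name is taken from the error message; reading the attractor count from the second dimension is an assumption based on that message, and `FakeTensor` is only a stand-in so the sketch runs without PyTorch.

```python
from typing import Mapping

# Parameter name copied from the error message above.
ATTRACTOR_KEY = "module.latents2attractors.weights"


def attractors_in_checkpoint(state_dict: Mapping, key: str = ATTRACTOR_KEY) -> int:
    """Return the attractor count implied by the stored tensor's second dimension."""
    shape = tuple(state_dict[key].shape)
    return shape[1]


class FakeTensor:
    """Stand-in for a torch.Tensor: only the .shape attribute is needed here."""

    def __init__(self, *shape: int) -> None:
        self.shape = shape


# With PyTorch installed, the mapping would instead come from something like
# torch.load(path, map_location="cpu"); the checkpoint file may nest the
# state dict under another key (assumption, not checked against DiaPer).
checkpoint = {ATTRACTOR_KEY: FakeTensor(128, 10)}
configured = 20

found = attractors_in_checkpoint(checkpoint)
if found != configured:
    print(f"checkpoint has {found} attractors but the config asks for {configured}")
```

If the two numbers differ, either n_attractors or models_path has to change so that they agree.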

When I run inference on my own audio, I get this error:

samplerate: 44100
44100
Resampled to 16000
[[ 1.86886905e-02  1.81345058e-02]
 [-5.93926263e-03 -5.80426085e-03]
 [-4.50214473e-06 -1.67586145e-04]
 ...
 [ 3.55157934e-01  3.55434361e-01]
 [ 2.42458805e-01  2.41786196e-01]
 [ 1.11629131e-01  1.12115863e-01]]
/Users/xx/miniforge3/envs/basenew/lib/python3.10/site-packages/librosa/core/spectrum.py:266: UserWarning: n_fft=512 is too large for input signal of length=2
  warnings.warn(
Traceback (most recent call last):
  File "/Users/xx/dev/codes/ai/xx/xx/vendor/DiaPer/diaper/infer_single_file.py", line 282, in <module>
    Y = transform(Y, args.sampling_rate, args.feature_dim, args.input_transform, False)
  File "/Users/xx/dev/codes/ai/xx/xx/vendor/DiaPer/diaper/common_utils/features.py", line 174, in transform
    Y = np.dot(Y ** 2, mel_basis.T)
ValueError: shapes (1,257,19388) and (257,40) not aligned: 19388 (dim 2) != 257 (dim 0)
Check the output in /Users/xx/dev/codes/ai/xx/xx/vendor/DiaPer/examples

How can I resolve this error?

@fnlandini need help!

fnlandini · 2024-12-22T16:30:20Z

Hi @lucasjinreal, sorry for the delay.
You can see here that Y needs to be (n_frames, n_bins)-shaped.
I don't know where the error starts, but here you should be getting something with two dimensions, so you can start checking there.

I hope this helps
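
One possible cause, judging only from the log in the earlier comment: the printed array has two columns and librosa warns about an input signal of length 2, which suggests the audio is stereo with shape (n_samples, 2) rather than a 1-D mono signal. This is a guess and is not confirmed in this thread. Under that assumption, a minimal sketch of downmixing to mono before feature extraction could look like this (`to_mono` is a hypothetical helper, not part of DiaPer):

```python
import numpy as np


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Collapse (n_samples, n_channels) audio to a 1-D mono signal."""
    if audio.ndim == 1:
        return audio  # already mono
    if audio.ndim == 2:
        return audio.mean(axis=1)  # average the channels sample by sample
    raise ValueError(f"unexpected audio shape {audio.shape}")


# Three stereo samples, laid out like the array printed in the log.
stereo = np.array([[0.2, 0.4], [-0.1, 0.1], [0.5, 0.5]])
mono = to_mono(stereo)
print(stereo.shape, "->", mono.shape)
```

Printing the shape of the signal right after loading, and again after the STFT, is a cheap way to find where the extra dimension first appears.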

lucasjinreal · 2024-12-24T13:58:58Z

Hello, I solved the error previously. However, the result is unsatisfactory in my case. I am looking for an alternative method. Do you have any suggestions that you might be aware of?

fnlandini · 2024-12-29T09:55:24Z

I can recommend DiariZen, developed by a colleague of mine.

skesiraju closed this as completed Feb 21, 2025
