Preparing Train Dataset (mixing strategy) #81

@kthworks

Thank you for your excellent work.

I am in the process of training the EnCodec model and have some questions regarding the mixing strategy.

I would like to understand how the full training dataset is put together. The paper describes four sampling strategies for building the training/validation set:
(s1) Sampling a single source from Jamendo with a probability of 0.32;
(s2) Sampling a single source from other datasets with the same probability;
(s3) Mixing two sources from all datasets with a probability of 0.24;
(s4) Mixing three sources from all datasets except music with a probability of 0.12.

Does this mean that the training/validation dataset is composed of segments in the ratio s1/s2/s3/s4 = 32%/32%/24%/12%? Table 1 in the appendix lists the Jamendo dataset at 919 hours but Common Voice at 9,096 hours. Did you not use all the samples from Common Voice?
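For concreteness, here is a minimal sketch of the reading I am asking about: one strategy is drawn independently per segment with the probabilities listed above. The names `STRATEGIES`, `PROBS`, `sample_strategy`, and `empirical_ratios` are my own and hypothetical; this is not taken from the EnCodec code, and the actual data loader may work differently.

```python
import random
from collections import Counter

# Strategy labels and probabilities as listed in the question (s1..s4).
STRATEGIES = ["s1", "s2", "s3", "s4"]
PROBS = [0.32, 0.32, 0.24, 0.12]


def sample_strategy(rng):
    """Draw one mixing strategy for a single training segment."""
    return rng.choices(STRATEGIES, weights=PROBS, k=1)[0]


def empirical_ratios(n, seed=0):
    """Draw n segments and return the observed fraction per strategy."""
    rng = random.Random(seed)
    counts = Counter(sample_strategy(rng) for _ in range(n))
    return {s: counts[s] / n for s in STRATEGIES}
```

Under this reading the ratios hold per sampled segment, regardless of how many hours each underlying dataset contains, so a larger dataset such as Common Voice would simply be visited less exhaustively.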

I would also like to know more about the process of applying reverberation. Apart from the samples available in DNS, how do you apply reverberation to samples from other datasets? Is there a way to calculate the room impulse response? I would appreciate it if you could let me know where I can refer to any related implementations.
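To show what I mean by applying reverberation, here is a minimal sketch that convolves a dry signal with a room impulse response (RIR). The RIR here is synthetic (exponentially decaying white noise), which is purely my assumption for illustration; the paper may instead use measured or simulated RIRs. `synthetic_rir`, `apply_reverb`, and the `rt60` parameter are hypothetical names, not part of EnCodec.

```python
import numpy as np


def synthetic_rir(sample_rate, rt60=0.4, seed=0):
    """Hypothetical RIR: white noise whose envelope drops 60 dB after rt60 seconds."""
    rng = np.random.default_rng(seed)
    n = int(sample_rate * rt60)
    t = np.arange(n) / sample_rate
    decay = 10.0 ** (-3.0 * t / rt60)  # amplitude factor 1e-3 (-60 dB) at t = rt60
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))


def apply_reverb(dry, rir):
    """Convolve the dry signal with the RIR, keep the original length, peak-normalize."""
    wet = np.convolve(dry, rir)[: len(dry)]
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet
```

A measured RIR loaded from a file could be passed to `apply_reverb` in place of the synthetic one; what I am unsure about is which RIRs were used for the datasets other than DNS.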

If anyone can help with this, please leave a comment. Thank you :)
