The paper introduces MR-MT3, an enhancement of the MT3 model for multi-instrument automatic music transcription. It addresses instrument leakage with three techniques: a memory retention mechanism, prior token sampling, and token shuffling. Evaluated on the Slakh2100 dataset, these enhancements improve onset F1 scores and reduce instrument leakage. The study also introduces two new metrics, the instrument leakage ratio and the instrument detection F1 score, for a more comprehensive assessment. The proposed methods aim to maintain musical context across audio segments, which improves transcription quality.
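To make the two instrument-level metrics more concrete, here is a minimal Python sketch. The summary does not give the paper's formulas, so both definitions are assumptions: instrument detection F1 is computed here as a set-based F1 over instrument labels, and the leakage ratio is taken as the number of predicted instruments divided by the number of reference instruments for a single track. The function names and the example labels are hypothetical.

```python
def instrument_detection_f1(reference, predicted):
    """Set-based F1 over instrument labels (assumed definition)."""
    ref, pred = set(reference), set(predicted)
    true_positives = len(ref & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


def instrument_leakage_ratio(reference, predicted):
    """Predicted instrument count over reference count (assumed definition).

    Under this assumption, a value above 1 means the transcription spreads
    notes across more instruments than the track actually contains.
    """
    ref, pred = set(reference), set(predicted)
    if not ref:
        raise ValueError("reference must contain at least one instrument")
    return len(pred) / len(ref)


# Hypothetical track: three real instruments, six predicted.
reference = ["piano", "bass", "drums"]
predicted = ["piano", "bass", "drums", "guitar", "strings", "organ"]

f1 = instrument_detection_f1(reference, predicted)
ratio = instrument_leakage_ratio(reference, predicted)
```

In this made-up example every real instrument is found, so recall is perfect, yet the three spurious instruments halve the precision and double the leakage ratio. That is the kind of failure an onset F1 score alone would not expose.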
arxiv.org