MUSCAT: A Multimodal mUSic Collection for Automatic Transcription of real recordings and image scores

Alejandro Galán-Cuenca

Jose J. Valero-Mas

Juan C. Martínez-Sevilla

Antonio Hidalgo-Centeno

Antonio Pertusa

Jorge Calvo-Zaragoza

Pattern Recognition and Artificial Intelligence Group (PRAIG).

University Institute for Computing Research (IUII), University of Alicante, Spain

This dataset was created for the paper:

MUSCAT: A Multimodal mUSic Collection for Automatic Transcription of real recordings and image scores
Alejandro Galán-Cuenca, Jose J. Valero-Mas, Juan C. Martínez-Sevilla, Antonio Hidalgo-Centeno, Antonio Pertusa, Jorge Calvo-Zaragoza
Accepted for oral presentation at ACM Multimedia 2024

About the dataset

MUSCAT is an assortment of acoustic recordings, sheet music images, and their score-level annotations in several notation formats.

Despite the large body of work in Automatic Music Transcription (AMT), end-to-end Audio-to-Score (A2S) transcription efforts remain scarce, and benchmark corpora are consequently lacking, particularly for real data.

This dataset comprises almost 80 hours of real recordings with varied instrumentation and degrees of polyphony (from piano to orchestral music), 1251 scanned sheets, and 880 symbolic scores by 37 composers. Beyond transcription, the data may also be used in other tasks that involve metadata, such as instrument identification or composer recognition.

A fragmented subset of this collection focused exclusively on acoustic data for score-level AMT (the MUSCUTS assortment) is also presented, together with baseline experiments on short audio excerpts. Finally, a web-based service is provided so that the collections can be expanded collaboratively.

  • Code and data download
License

The dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.


Acknowledgement

This work is part of the I+D+i MultiScore project (PID2020-118447RA-I00), funded by MCIN/AEI/10.13039/501100011033.

    https://www.ciencia.gob.es/en/