Real Saxophone Recordings for Audio-to-Score Music Transcription

Juan C. Martínez-Sevilla

María Alfaro-Contreras

Jose J. Valero-Mas

Jorge Calvo-Zaragoza

University Institute for Computing Research (IUII), University of Alicante, Spain

This dataset was created for the paper:

Insights into end-to-end audio-to-score transcription with real recordings: A case study with saxophone works
Juan C. Martínez-Sevilla, María Alfaro-Contreras, Jose J. Valero-Mas, Jorge Calvo-Zaragoza
INTERSPEECH, 2023

The code for the paper can be found here.

About the dataset

Despite a large number of existing works in the Automatic Music Transcription (AMT) field, there is a shortage of end-to-end Audio-to-Score (A2S) transcription efforts, leading to a lack of benchmark corpora, particularly when dealing with real data.

We present a compilation of recorded saxophone performances together with their digital music scores.

The collection includes a total of 1,026 recordings of real performances played on two types of saxophone: tenor and alto. Each recording is paired with its corresponding score in the Humdrum **kern format. The compositions in the dataset, which amount to approximately 3 hours in total duration, encompass a variety of melodies, exercises, and scales, as well as a small number of music incipits extracted from the PrIMuS dataset.

Data description in terms of the average duration, mean number of symbol annotations per score, and pitch range for the contemplated saxophone types.

Saxophone | Average duration (s) | Mean symbols per score | Pitch range (transposed to C)
Tenor     | 10.8 ± 2.8           | 27.0 ± 8.6             | A♭2 - G5
Alto      |                      |                        | D♭3 - C6
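
The duration statistics in the table could be recomputed from the audio with a sketch like the one below. It assumes the recordings are uncompressed PCM WAV files that Python's standard `wave` module can read, and it uses the population standard deviation; the page does not say which variant was used for the reported values. The function name `duration_stats` and its `folder` argument are illustrative only.

```python
import statistics
import wave
from pathlib import Path


def duration_stats(folder):
    """Return (mean, std) of the durations, in seconds, of the .wav files in folder."""
    durations = []
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration = number of audio frames / frames per second.
            durations.append(wav.getnframes() / wav.getframerate())
    # Population standard deviation (an assumption, see the note above).
    return statistics.mean(durations), statistics.pstdev(durations)
```

For example, `duration_stats("real/tenor")` would summarise the real tenor recordings, assuming the archive has been extracted into the current directory.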

  • Recording Process
  • All pieces were recorded in a home studio by musicians with proficient training on the instrument. Different tempi, styles, and meters were considered to increase the variability of the data, and a metronome was used to avoid significant tempo deviations. Note that the saxophone is a transposing instrument, i.e., its music is not notated at concert pitch. This prevents the direct use of the written notation in A2S, since a given note token does not represent the same pitch for all variants of this family of instruments.
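
To make the transposition issue concrete, here is a minimal sketch that maps a written saxophone note to its sounding (concert) pitch using MIDI note numbers. The offsets are the standard ones for these instruments: the alto saxophone (in E♭) sounds a major sixth below the written note, and the tenor saxophone (in B♭) sounds a major ninth below. The function and constant names are illustrative and do not come from the paper's code.

```python
# Semitone offsets from written pitch to concert (sounding) pitch.
# Alto sax (E-flat) sounds a major sixth (9 semitones) below written pitch;
# tenor sax (B-flat) sounds a major ninth (14 semitones) below.
WRITTEN_TO_CONCERT = {"alto": -9, "tenor": -14}

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(midi: int) -> str:
    """Name a MIDI note number, taking middle C (60) as C4."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def written_to_concert(written_midi: int, sax: str) -> int:
    """Return the sounding MIDI pitch for a written saxophone note."""
    return written_midi + WRITTEN_TO_CONCERT[sax]


written_c5 = 72  # the same written note, C5, on both instruments
for sax in ("alto", "tenor"):
    print(sax, midi_to_name(written_to_concert(written_c5, sax)))
# alto D#4
# tenor A#3
```

The same written token therefore yields two different sounding pitches. As a sanity check, the lowest standard written note, B♭3 (MIDI 58), maps to C♯3/D♭3 on alto and G♯2/A♭2 on tenor, which agrees with the lower bounds of the pitch ranges reported in the table.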

    The following equipment was used:

  • Download
  • Click here to download the dataset (TGZ file). It has the following structure:

            real_a2s_sax_dataset.tgz/
            |- fluidsyth/
            |----| alto/
            |--------| .wav files
            |----| tenor/
            |--------| .wav files
            |- mididdsp/
            |----| alto/
            |--------| .wav files
            |----| tenor/
            |--------| .wav files
            |- real/
            |----| alto/
            |--------| .wav files
            |----| tenor/
            |--------| .wav files
            |- krn/
            |----| alto/
            |--------| .krn and .skm files
            |----| tenor/
            |--------| .krn and .skm files
            |- Alto_Sax_Index.csv
            |- Dataset_Index.xlsx
            |- Tenor_Sax_Index.csv
            

    Three indexes are included along with the audio files and their corresponding scores.
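
Given the layout above, audio files and scores could be paired with a short sketch such as the following. It assumes, without confirmation from this page, that a recording and its score share the same file stem (for example `x.wav` and `x.krn`); if they do not, the index files would be the place to look up the mapping. The function name `pair_recordings` and the `root` argument (the folder the archive was extracted into) are hypothetical.

```python
from pathlib import Path


def pair_recordings(root, source="real", sax="alto"):
    """Pair each .wav under <root>/<source>/<sax>/ with a same-named .krn score.

    Assumes (not confirmed by the dataset page) that an audio file and its
    score share the same file stem.
    """
    root = Path(root)
    # Index the scores of the chosen saxophone by file stem.
    scores = {p.stem: p for p in (root / "krn" / sax).glob("*.krn")}
    pairs = []
    for wav in sorted((root / source / sax).glob("*.wav")):
        score = scores.get(wav.stem)
        if score is not None:  # skip audio without a matching score
            pairs.append((wav, score))
    return pairs
```

Passing `source="fluidsyth"` or `source="mididdsp"` would select the other two audio folders shown in the tree.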

  • License
  • The dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
