Sheet Music Transformer Datasets

Antonio Ríos-Vila, Jorge Calvo-Zaragoza, Thierry Paquet

Pattern Recognition and Artificial Intelligence Group, University of Alicante, Spain

LITIS Laboratory - EA 4108, Rouen University, France

The datasets on this website were presented in the following paper:

Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription
Antonio Ríos-Vila, Jorge Calvo-Zaragoza, Thierry Paquet
ICDAR, 2024

The implementation of the model is available on GitHub.

About the Datasets

Two datasets of polyphonic music scores were used in our experiments.

GrandStaff

The GrandStaff dataset is a corpus of 53,882 printed images of single-line (single-system) pianoform scores, along with their digital score encoding. The dataset comprises both original works by six authors, taken from the Humdrum repository, and synthetic augmentations of the music encodings, which provide a greater variety of musical sequences and patterns. The dataset comes with an official partition in which the 7,000 original scores are used as the test set and the 46,882 samples generated from augmentations form the training set. An alternative version of the dataset introduces distortions into the images so that they resemble low-quality photocopies. This version is referred to from here on as Camera GrandStaff.

[Images: GrandStaff sample, Camera GrandStaff sample]

Quartets

In this paper, we also introduce the Quartets dataset. Quartets is a well-known collection employed in the Audio-to-Score field. As the dataset provides the Humdrum **kern transcriptions of its music excerpts, we produced a single-system transcription version of it. The dataset provides excerpts that were obtained by randomly splitting the original audio recordings of the pieces into portions of approximately seven seconds, resulting in a total of 38,051 excerpts. These excerpts were rendered into printed music images using the Verovio tool. Once the music images had been generated, we distorted them using the same operations as those employed for Camera GrandStaff and added a further distortion that simulates old printed ink, with bleeding and erasing errors. Each distorted image was finally fused with a random texture drawn from a set of images of old paper. We followed the same partitions as those provided in the original dataset. Specifically, the dataset contains 18,162 samples for Haydn, 7,435 samples for Mozart and 12,454 samples for Beethoven, which add up to the 38,051 excerpts. Each composer's corpus is divided into three splits at piece level: train (70%), validation (15%), and test (15%); these splits are then combined to obtain the partitions of the whole corpus.
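
To make the idea of a piece-level split concrete, the following Python sketch assigns whole pieces to train, validation and test, so that excerpts cut from the same piece never end up in different partitions. It is only an illustration: the released partitions follow the original dataset and should be used as they are. The function name, the excerpt and piece identifiers, the shuffling and the seed are all assumptions made for this example.

```python
import random


def split_by_piece(excerpt_to_piece, train=0.70, val=0.15, seed=0):
    """Assign excerpts to splits so that all excerpts of a piece stay together.

    excerpt_to_piece: dict mapping a (hypothetical) excerpt ID to its piece ID.
    Returns a dict with keys "train", "val" and "test", each holding a sorted
    list of excerpt IDs. Pieces not assigned to train or val go to test.
    """
    # Shuffle the piece IDs (not the excerpts) with a fixed seed.
    pieces = sorted(set(excerpt_to_piece.values()))
    random.Random(seed).shuffle(pieces)

    n_train = round(len(pieces) * train)
    n_val = round(len(pieces) * val)

    # Decide the split once per piece.
    split_of_piece = {}
    for i, piece in enumerate(pieces):
        if i < n_train:
            split_of_piece[piece] = "train"
        elif i < n_train + n_val:
            split_of_piece[piece] = "val"
        else:
            split_of_piece[piece] = "test"

    # Every excerpt inherits the split of the piece it was cut from.
    splits = {"train": [], "val": [], "test": []}
    for excerpt, piece in sorted(excerpt_to_piece.items()):
        splits[split_of_piece[piece]].append(excerpt)
    return splits
```

Applying such a function to each composer's corpus separately and concatenating the three resulting train lists (and likewise the validation and test lists) would mirror the combination step described above. Note that the percentages then hold over pieces, so the share of excerpts in each split can deviate slightly from 70/15/15.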

Download

You can download parts of the dataset or the entire dataset using the links below:

GrandStaff Quartets
Once downloaded, place the data in the following directory structure:
        ├── data
        │   ├── GrandStaff
        │   │   ├── grandstaff_dataset
        │   │   └── partitions_grandstaff
        │   └── Quartets
        │       ├── quartets_dataset
        │       └── partitions_quartets
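
As a quick sanity check after unpacking the archives, a short Python sketch such as the one below can confirm that the four folders from the tree above are in place. The folder names are taken from the tree; the helper name `missing_dirs` and the assumption that `data` sits directly under the directory the script is run from are ours.

```python
from pathlib import Path

# Sub-directories listed in the directory tree, relative to the project root.
EXPECTED_DIRS = [
    "data/GrandStaff/grandstaff_dataset",
    "data/GrandStaff/partitions_grandstaff",
    "data/Quartets/quartets_dataset",
    "data/Quartets/partitions_quartets",
]


def missing_dirs(root="."):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("Directory layout looks complete.")
```

If only one of the two datasets was downloaded, the folders of the other one will simply be reported as missing.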
        

Contact

If you have any questions or suggestions, please reach out to us at arios@dlsi.ua.es.