Authors: Simon Corbillé, Elisa H Barney Smith

Affiliation: Machine Learning, Luleå Tekniska Universitet

Description: 1 - Data specification
The images are resized and padded to a fixed size in pixels (chosen from the mean height and width of the dataset). The training data is split randomly into a training set (80%) and a validation set (20%). During training, we apply affine augmentation to the training data.
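A minimal sketch of this preprocessing, assuming PyTorch/torchvision; the target size (64 x 1024), the affine parameter ranges and the name full_dataset are illustrative placeholders, not our actual configuration.

    import torchvision.transforms.functional as TF
    from torchvision import transforms
    from torch.utils.data import random_split

    TARGET_H, TARGET_W = 64, 1024  # placeholder fixed size (from mean height/width)

    def resize_and_pad(img):
        # Resize the line image to the target height (keeping the aspect
        # ratio), then pad on the right up to the fixed width.
        _, h, w = img.shape
        new_w = min(max(1, round(w * TARGET_H / h)), TARGET_W)
        img = TF.resize(img, [TARGET_H, new_w])
        return TF.pad(img, [0, 0, TARGET_W - new_w, 0])

    # Random 80/20 train/validation split; full_dataset is a placeholder Dataset.
    n_train = int(0.8 * len(full_dataset))
    train_set, val_set = random_split(full_dataset, [n_train, len(full_dataset) - n_train])

    # Affine augmentation, applied to the training split only.
    train_augment = transforms.RandomAffine(
        degrees=2, translate=(0.02, 0.05), scale=(0.9, 1.1), shear=5)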
We found empirically that training on a combination of the cipher datasets improves recognition performance. For task 2A, we train the model on a combination of the Borg and BNF datasets. For task 2B, we train the model on a combination of Borg, Copiale and BNF. For task 3A, we train the model on a combination of Copiale and BNF. For task 3B, we train the model on a combination of the Borg, Copiale and Ramanacoil datasets and keep only the classes with more than 10 samples in the training set.
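For example, the task 3B mixture could be assembled as below, assuming each cipher is exposed as a PyTorch Dataset yielding (image, transcript) pairs; the dataset class names are hypothetical.

    from collections import Counter
    from torch.utils.data import ConcatDataset

    # Hypothetical Dataset classes, one per cipher.
    train_3b = ConcatDataset([BorgDataset(), CopialeDataset(), RamanacoilDataset()])

    # Keep only the classes with more than 10 training samples.
    counts = Counter()
    for i in range(len(train_3b)):
        _, transcript = train_3b[i]
        counts.update(transcript)  # transcript: sequence of symbol labels
    kept_classes = {sym for sym, n in counts.items() if n > 10}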

2 - Method
We use a Sequence-to-Sequence model, one of the state-of-the-art architectures for handwriting recognition. It is composed of an encoder, an attention component and a decoder. The encoder uses a CRNN architecture: convolutional layers extract spatial features and LSTM layers extract temporal features. The attention module focuses the decoder on a specific part of the features extracted by the encoder to predict the transcription character by character.
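A compact sketch of such a CRNN encoder, assuming PyTorch; the layer sizes and input height (64 pixels) are illustrative placeholders, and the attention decoder is only summarized in the comments.

    import torch.nn as nn

    class CRNNEncoder(nn.Module):
        def __init__(self, n_classes, hidden=256):
            super().__init__()
            # Convolutional layers extract spatial features from the line image.
            self.conv = nn.Sequential(
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
            # Bidirectional LSTMs model the horizontal (temporal) dimension.
            self.lstm = nn.LSTM(128 * 16, hidden, num_layers=2,
                                bidirectional=True, batch_first=True)
            self.ctc_head = nn.Linear(2 * hidden, n_classes + 1)  # +1: CTC blank

        def forward(self, x):                     # x: (B, 1, 64, W) grayscale
            f = self.conv(x)                      # (B, 128, 16, W // 4)
            f = f.permute(0, 3, 1, 2).flatten(2)  # (B, W // 4, 128 * 16)
            h, _ = self.lstm(f)                   # (B, W // 4, 2 * hidden)
            # The attention decoder (not shown) attends over h at each step
            # to emit one character at a time.
            return h, self.ctc_head(h)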
The model is trained with a hybrid loss (CTC loss for the encoder and cross-entropy loss for the decoder).
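A hedged sketch of this hybrid objective, assuming PyTorch; the equal weighting and the reuse of the same padded targets for both terms are simplifying assumptions (in practice the decoder targets also carry start/end markers).

    import torch.nn.functional as F

    def hybrid_loss(ctc_logits, input_lens, dec_logits, targets, target_lens,
                    blank=0, pad=0, alpha=0.5):
        # CTC loss on the encoder logits; F.ctc_loss expects (T, B, C) log-probs.
        log_probs = ctc_logits.log_softmax(-1).permute(1, 0, 2)
        ctc = F.ctc_loss(log_probs, targets, input_lens, target_lens, blank=blank)
        # Cross-entropy on the decoder logits, ignoring padded positions.
        ce = F.cross_entropy(dec_logits.reshape(-1, dec_logits.size(-1)),
                             targets.reshape(-1), ignore_index=pad)
        return alpha * ctc + (1 - alpha) * ce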

3 - Results
We evaluate our model on the validation set with the Character Error Rate (CER) metric. Here a character can be a letter (A, B, C, …), a symbol (Libra, Saturn, …) or a letter with a diacritic. Note that the number of samples in the validation set is low, so the results should be interpreted with caution.
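CER is the symbol-level edit distance between the prediction and the reference, divided by the reference length. A minimal self-contained implementation, treating each symbol (e.g. Libra) as one token:

    def cer(reference, hypothesis):
        # Levenshtein distance between token sequences, single-row DP.
        r, h = list(reference), list(hypothesis)
        d = list(range(len(h) + 1))
        for i, rc in enumerate(r, 1):
            prev, d[0] = d[0], i
            for j, hc in enumerate(h, 1):
                prev, d[j] = d[j], min(d[j] + 1,           # deletion
                                       d[j - 1] + 1,       # insertion
                                       prev + (rc != hc))  # substitution
        return d[len(h)] / max(len(r), 1)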

We obtain:
Task 2A: 7.23% CER
Task 2B: 0.75% CER
Task 3A: 1.55% CER
Task 3B: 4.12% CER

We can note:
- Symbols are clearly separated and the writing is of good quality.
- Task 2A contains images with a fold at the beginning or the end of the line.
- In task 3B, the lines are not clearly segmented and can contain parts of the previous and/or next line.

Authors: The HR-Ciphers 2024 organizers

Affiliation: Computer Vision Center

Description: A Long Short-Term Memory (LSTM) Recurrent Neural Network model inspired by Baró et al., "Optical Music Recognition by Long Short-Term Memory Networks", GREC 2017.

[Figure: ranking graphic]