Results - ICDAR 2024 Competition on Handwriting Recognition of Historical Ciphers

method: Baseline - Long Short-Term Memory - Variant A

Date: 2024-05-27

Authors: The HR-Ciphers 2024 organizers

Affiliation: Computer Vision Center

Description: A Long Short-Term Memory (LSTM) recurrent neural network model inspired by Baró et al., "Optical Music Recognition by Long Short-Term Memory Networks", GREC 2017.

Baró, A., Riba, P., Calvo-Zaragoza, J., Fornés, A. (2018). Optical Music Recognition by Long Short-Term Memory Networks. In: Fornés, A., Lamiroy, B. (eds) Graphics Recognition. Current Trends and Evolutions. GREC 2017. Lecture Notes in Computer Science, vol 11009. Springer, Cham. https://doi.org/10.1007/978-3-030-02284-6_7

method: cnn_encoder_strings_860_0311

Date: 2024-03-13

Authors: Jiaqianwen

Description: Using r34_encoder_step1 as the pretrained model, the dictionaries were modified and the network parameters fine-tuned. The training data was augmented, for example with rotation.

method: Character Detection and Classification with Transformer Architecture

Date: 2024-05-07

Authors: Raphael Baena, Syrine Kalleli, Mathieu Aubry

Affiliation: ENPC Imagine

Description: We employ a transformer-based architecture that detects characters in parallel, ensuring fast and accurate predictions. For each character, it provides a bounding box and a likelihood, which are then used for Optical Character Recognition (OCR). Notably, this approach does not rely on any language prior.
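As an illustration of how per-character detections could be turned into a transcription, here is a minimal sketch. It is not the authors' code: the detection format (`box`, `label`, `score`), the `min_score` threshold, and the left-to-right reading order are all assumptions made for this example.

```python
# Hypothetical sketch: turn character detections into a line transcription.
# Assumed detection format (not from the submission):
#   {"box": (x_min, y_min, x_max, y_max), "label": str, "score": float}

def decode_detections(detections, min_score=0.5):
    """Keep confident detections and read them left to right."""
    # Drop detections whose likelihood is below the (assumed) threshold.
    kept = [d for d in detections if d["score"] >= min_score]
    # Order by the horizontal center of each bounding box.
    kept.sort(key=lambda d: (d["box"][0] + d["box"][2]) / 2)
    # Symbols are joined with spaces because cipher symbols may have
    # multi-character names.
    return " ".join(d["label"] for d in kept)


example = [
    {"box": (40, 5, 60, 30), "label": "c", "score": 0.91},
    {"box": (0, 5, 18, 30), "label": "a", "score": 0.88},
    {"box": (20, 5, 38, 30), "label": "b", "score": 0.35},  # low confidence
    {"box": (62, 5, 80, 30), "label": "d", "score": 0.77},
]
print(decode_detections(example))
```

No language model is involved at any point, which matches the statement that the approach uses no language prior.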

We first pre-train the architecture on synthetic data consisting of text lines with characters from various fonts. We use standard classification and bounding box positioning losses for this process.

We then fine-tune the architecture on real datasets. Unlike synthetic data, these datasets do not include ground truth bounding boxes, but only text transcriptions. Therefore, we can't use the same training losses as before. Instead, we use the pre-trained model to detect the characters' bounding boxes. We then order the characters based on these bounding boxes and compute the Connectionist Temporal Classification (CTC) loss. During fine-tuning, our approach demonstrates the ability to learn the bounding boxes of new characters.
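The idea of ordering detections and scoring them against a transcription with CTC can be sketched as follows. This is a plain-Python illustration of the standard CTC forward algorithm, not the authors' implementation. The detection format (`x`, `probs`), the blank index 0, and the function names are assumptions for this example; a real training setup would use a differentiable loss from a deep learning framework.

```python
import math

# Hypothetical sketch: each detection carries a horizontal position "x" and
# a probability vector "probs" over classes, with class 0 as the CTC blank.

def order_by_x(detections):
    """Return per-detection class probabilities in left-to-right order."""
    return [d["probs"] for d in sorted(detections, key=lambda d: d["x"])]


def ctc_neg_log_likelihood(probs, target, blank=0):
    """CTC loss for one sequence.

    probs:  list of T probability vectors (one per ordered detection).
    target: list of label ids from the transcription (no blanks).
    """
    # Extended target: blanks around and between the labels.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    # alpha[s]: total probability of alignments of the frames seen so far
    # that end at position s of the extended target.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the final blank.
    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf


detections = [
    {"x": 30.0, "probs": [0.5, 0.5]},  # classes: 0 = blank, 1 = symbol "a"
    {"x": 10.0, "probs": [0.5, 0.5]},
]
loss = ctc_neg_log_likelihood(order_by_x(detections), target=[1])
print(loss)
```

Because CTC sums over all alignments, the transcription alone is enough supervision: no ground truth bounding boxes are needed, only an ordering of the detections.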

Ranking Table


Date	Method	CER
2024-05-27	Baseline - Long Short-Term Memory - Variant A	0.0783
2024-03-13	cnn_encoder_strings_860_0311	0.0925
2024-05-07	Character Detection and Classification with Transformer Architecture	0.1188
2024-05-14	Baseline - Long Short-Term Memory - Variant B	0.1191
2024-05-09	Baseline - Sequence to Sequence	0.1275
2024-03-04	layer3-step2	0.1655
2024-03-05	r34_encoder_step2	0.1768
2024-04-02	cnn_encoder(6layer4heads)_130_0327	0.1943
2024-03-20	cnn_encoder(6layer8heads)_285_0318	0.1979
2024-03-18	cnn_encoder(6layer8heads)_30_0315	0.2021
2024-02-22	ctrans	0.2264
2024-03-04	r34_encoder_step1	0.2361
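The table reports CER, where lower is better. The page does not spell out how it is computed. The sketch below uses a common definition of character (or symbol) error rate, namely total edit distance divided by total reference length. The competition's exact scoring may differ, for example in how cipher symbols are tokenized.

```python
# Hypothetical sketch of a common CER definition, not the official scorer.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or symbol lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def cer(references, hypotheses):
    """Total edit distance over total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total


print(cer(["abcd"], ["abed"]))  # one substitution over four symbols
```

Passing lists of symbol names instead of strings gives a symbol-level rate, which may be closer to what is needed for ciphers whose symbols are not single characters.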
