- Task 1 - E2E Complex Entity Linking
- Task 2 - E2E Complex Entity Labeling
- Task 3 - E2E Zero-shot Structured Text Extraction
- Task 4 - E2E Few-shot Structured Text Extraction
method: Super_KVer (2023-03-16)
Authors: Lele Xie, Zuming Huang, Boqian Xia, Yu Wang, Yadong Li, Hongbin Wang, Jingdong Chen
Affiliation: Ant Group
Email: yule.xll@antgroup.com
Description: An ensemble of discriminative and generative models. The former is a multimodal method that uses text, layout, and image features; we train it with two different sequence lengths, 2048 and 512. The texts and boxes are produced by independent OCR models. The latter is an end-to-end method that directly generates K-V pairs from an input image.
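The description does not say how the two models' outputs are combined. A minimal sketch of one plausible merge strategy, assuming each model emits a dictionary of key-value pairs with confidences (the function name, data shapes, and confidence tie-break are all illustrative assumptions, not the authors' actual procedure):

```python
def merge_kv_predictions(kv_disc, kv_gen):
    """Union the K-V pairs from both models; when both predict a value
    for the same key, keep the one with the higher confidence."""
    merged = dict(kv_disc)  # key -> (value, confidence)
    for key, (value, conf) in kv_gen.items():
        if key not in merged or conf > merged[key][1]:
            merged[key] = (value, conf)
    return merged

# Toy example: the discriminative model is more confident on "Invoice No",
# and only the generative model found "Date".
disc = {"Invoice No": ("A-1001", 0.92)}
gen = {"Invoice No": ("A-1001", 0.88), "Date": ("2023-03-16", 0.80)}
print(merge_kv_predictions(disc, gen))
```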
method: Meituan OCR V5 (2025-08-18)
Authors: Jianqiang Liu, Boming Chen, Kai Zhou, Chen Duan, Shuaishuai Chang, Ran Wei, Shan Guo
Affiliation: Meituan
Description: We are the Meituan-OCR team. Our method follows the LiLT framework and uses a joint training scheme in which the SER and RE tasks are optimized together, which benefits both tasks. We train this model only on the SVRD-2023 dataset. In the post-processing stage, we apply rules to merge keys that point to the same value; in particular, the table recognition result serves as auxiliary information for handling table-format pages. We also design an adaptive-inference scheme for cases where relations between long-range keys and values are missed because the key and value fall into different 512-token inference windows.
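The adaptive-inference details are not given; a common way to reduce cross-window relation misses is overlapping sliding windows, so a long-range key and its value co-occur in at least one window. A sketch under that assumption (window and stride values are illustrative):

```python
def sliding_windows(tokens, window=512, stride=256):
    """Split a long token sequence into overlapping windows so a key and
    its value are more likely to appear together in some window."""
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return windows

# A 1000-token document yields 3 windows of <= 512 tokens,
# with 256 tokens shared between consecutive windows.
wins = sliding_windows(list(range(1000)))
print([len(w) for w in wins])
```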
method: End-to-end document relationship extraction (single-model) (2023-03-15)
Authors: Huiyan Wu, Pengfei Li, Can Li, Liang Qiao
Affiliation: Davar-Lab
Description: Our method performs end-to-end information extraction (single model) through OCR, NER, and RE. Text extracted by OCR and image features are jointly fed to the NER module to identify key and value entities; the RE module then extracts entity-pair relationships via multi-classification.
Both NER and RE are based on LayoutLMv3, and our training dataset is HUST-CELL.
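The RE step described above scores candidate entity pairs. A minimal sketch of that enumerate-and-classify pattern, with a toy same-line heuristic standing in for the learned LayoutLMv3 pair classifier (all names and the entity representation are illustrative assumptions):

```python
from itertools import product

def extract_relations(keys, values, score_fn, threshold=0.5):
    """Enumerate all (key, value) entity pairs and keep those the
    pair classifier scores above a threshold."""
    return [(k["text"], v["text"]) for k, v in product(keys, values)
            if score_fn(k, v) > threshold]

# Toy entities with a layout feature; a real system would score pairs
# with a trained classification head over multimodal features.
keys = [{"text": "Name", "line": 0}, {"text": "Date", "line": 1}]
values = [{"text": "Alice", "line": 0}, {"text": "2023-03-15", "line": 1}]
same_line = lambda k, v: 1.0 if k["line"] == v["line"] else 0.0
print(extract_relations(keys, values, same_line))
# → [('Name', 'Alice'), ('Date', '2023-03-15')]
```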
Date | Method | Score1 | Score2 | Score
---|---|---|---|---
2023-03-16 | Super_KVer | 49.93% | 62.97% | 56.45%
2025-08-18 | Meituan OCR V5 | 45.28% | 57.50% | 51.39%
2023-03-15 | End-to-end document relationship extraction (single-model) | 43.55% | 57.90% | 50.73%
2023-03-16 | sample-3 | 42.52% | 56.68% | 49.60%
2023-03-16 | sample-1 | 42.13% | 56.36% | 49.25%
2023-03-16 | Pre-trained model based fullpipe pair extraction (opti_v3, no inf_aug) | 42.17% | 55.63% | 48.90%
2023-03-16 | Pre-trained model based fullpipe pair extraction (opti_v2, no inf_aug) | 42.10% | 55.56% | 48.83%
2023-03-16 | Pre-trained model based fullpipe pair extraction (opti_v2, inf_aug) | 42.01% | 55.50% | 48.76%
2023-03-15 | Pre-trained model based fullpipe pair extraction (opti_v1) | 41.56% | 55.34% | 48.45%
2023-03-16 | Meituan OCR V4 | 41.10% | 54.55% | 47.83%
2023-03-16 | Meituan OCR V3 | 40.67% | 54.17% | 47.42%
2023-03-15 | Meituan OCR V2 | 40.97% | 53.47% | 47.22%
2023-03-16 | submit-trainall | 40.65% | 52.98% | 46.82%
2023-03-16 | submit-ocrkie-8to2 | 40.15% | 52.97% | 46.56%
2023-03-14 | Meituan OCR | 39.85% | 52.46% | 46.15%
2023-03-16 | f2 | 41.07% | 50.82% | 45.94%
2023-03-16 | final | 41.05% | 50.80% | 45.93%
2023-03-16 | submit-8finetune2 | 39.58% | 51.93% | 45.75%
2023-03-15 | new-model | 39.38% | 48.59% | 43.99%
2023-03-15 | 800-fix2 | 37.06% | 46.46% | 41.76%
2023-03-11 | add-pplssm | 36.45% | 43.83% | 40.14%
2023-03-16 | LayoutLM & StrucTexT Based Method | 33.09% | 45.92% | 39.51%
2023-03-15 | bug-800 | 34.17% | 43.91% | 39.04%
2023-03-16 | LayoutLMv3 | 29.81% | 41.45% | 35.63%
2023-03-15 | old-500-fix1 | 27.64% | 35.52% | 31.58%
2023-03-15 | Data Association 2 | 23.26% | 35.07% | 29.16%
2023-03-16 | Process t | 17.34% | 26.92% | 22.13%
2023-03-16 | refinet | 17.11% | 26.60% | 21.86%
2023-03-12 | FirstResult | 16.51% | 26.12% | 21.32%
2023-03-16 | Result without processing t | 16.39% | 25.56% | 20.97%
2023-03-15 | Table structure analysis + layout result_0315 | 16.25% | 25.38% | 20.81%
2023-03-15 | Data Association | 16.75% | 24.48% | 20.61%
2023-03-16 | Ant-FinCV | 14.44% | 22.68% | 18.56%
2023-03-16 | Ant-FinCV | 14.32% | 22.70% | 18.51%
2023-03-16 | Ant-FinCV | 14.38% | 22.62% | 18.50%
2023-03-16 | Ant-FinCV | 14.21% | 22.35% | 18.28%
2023-03-16 | Ant-FinCV | 13.79% | 21.75% | 17.77%
2023-03-15 | layoutxlm-relation and ppstructure box level | 12.86% | 21.56% | 17.21%
2023-03-15 | vocr | 11.71% | 19.13% | 15.42%
2023-03-13 | Fine-tuned DONUT | 13.06% | 17.15% | 15.11%
2023-03-14 | Layoutlm relation extraction | 10.99% | 19.22% | 15.10%
2023-03-14 | layoutxlm and ppstructure | 11.63% | 18.43% | 15.03%
2023-03-15 | layoutxlm-relation and ppstructure token level | 11.51% | 18.26% | 14.89%
2023-03-14 | vocr | 10.31% | 17.53% | 13.92%
2023-03-16 | Ant-FinCV | 8.96% | 14.84% | 11.90%
2023-03-14 | e2e | 1.77% | 3.44% | 2.60%
2023-03-13 | first commit | 1.22% | 2.33% | 1.78%
2023-03-14 | e2e | 0.55% | 1.01% | 0.78%
2023-03-10 | test | 0.00% | 0.00% | 0.00%
2023-03-11 | test_t1 | 0.00% | 0.00% | 0.00%
2023-03-13 | intime | 0.00% | 0.00% | 0.00%
2023-03-13 | test2 | 0.00% | 0.00% | 0.00%
2023-09-14 | Graph Attention | 0.00% | 0.00% | 0.00%