- Task 1 - Text Localization
- Task 2 - Script identification
- Task 3 - Joint text detection and script identification
method: AntAI-Cognition (2020-04-22)
Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu
Affiliation: Ant Group & PKU
Email: qingpei.gqp@antgroup.com
Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. Each model follows the Mask R-CNN framework [1] with a different backbone: ResNeXt101-64x4d [2], CBNet [3], or ResNeXt101-32x32d_wsl [4]. A GBDT [5] is trained to normalize confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all text detection models. Multi-scale training and testing are adopted for all base models. We also add the ICDAR19 MLT datasets to the training set; both training and validation sets are used to get the final result.
[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
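The box-selection step of the AntAI-Cognition entry can be illustrated with a rough sketch. This is not the authors' implementation: the entry gives no code, so the names `Detection`, `box_iou`, and `merge_detections` are hypothetical. Two simplifications are assumed. First, the trained GBDT that normalizes scores across models is replaced by the raw confidences, which a real ensemble would likely need to calibrate first. Second, overlap is measured on the axis-aligned bounds of each quadrilateral rather than on the polygons themselves. The sketch pools detections from several models and keeps the highest-scoring box in each overlapping group (greedy non-maximum suppression).

```python
from typing import List, Tuple

Quad = List[Tuple[float, float]]   # four (x, y) corners
Detection = Tuple[Quad, float]     # (quadrilateral, confidence score)


def bounds(quad: Quad) -> Tuple[float, float, float, float]:
    """Axis-aligned bounding box (x0, y0, x1, y1) of a quadrilateral."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)


def box_iou(a: Quad, b: Quad) -> float:
    """IoU of the axis-aligned bounds of two quadrilaterals (a simplification)."""
    ax0, ay0, ax1, ay1 = bounds(a)
    bx0, by0, bx1, by1 = bounds(b)
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(per_model: List[List[Detection]],
                     iou_thresh: float = 0.5) -> List[Detection]:
    """Pool all models' detections and apply greedy NMS on raw scores.

    Assumption: raw scores are comparable across models. The entry above
    trains a GBDT precisely because they are not; that step is omitted here.
    """
    pooled = sorted((d for dets in per_model for d in dets),
                    key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for quad, score in pooled:
        # Keep a box only if it does not overlap an already kept, higher-scoring box.
        if all(box_iou(quad, k[0]) < iou_thresh for k in kept):
            kept.append((quad, score))
    return kept


# Toy input: two models detect the same word; the second also finds another one.
model_a = [([(0, 0), (10, 0), (10, 5), (0, 5)], 0.9)]
model_b = [([(1, 0), (11, 0), (11, 5), (1, 5)], 0.8),
           ([(20, 20), (30, 20), (30, 25), (20, 25)], 0.7)]
merged = merge_detections([model_a, model_b])
print(merged)
```

On the toy input, the two overlapping boxes collapse to the one scored 0.9, and the isolated box scored 0.7 is kept, so `merged` holds two detections.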
method: TH (2020-04-16)
Authors: Tsinghua University and Hyundai Motor Group AIRS Company
Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn
Description: We have built an end-to-end scene text spotter based on Mask R-CNN & Transformer. The ResNeXt-101 backbone and multiscale training/testing are used.
method: Sogou_OCR (2019-11-08)
Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su
Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as our backbone, and multi-scale training and testing are adopted to get the final results.
Date | Method | Hmean | Precision | Recall | Average Precision
---|---|---|---|---|---
2020-04-22 | AntAI-Cognition | 84.45% | 88.55% | 80.72% | 77.19%
2020-04-16 | TH | 84.36% | 89.66% | 79.65% | 77.33%
2019-11-08 | Sogou_OCR | 83.93% | 89.95% | 78.66% | 75.91%
2019-08-08 | JDAI | 82.82% | 87.83% | 78.35% | 76.15%
2019-06-02 | NJU-ImagineLab | 82.74% | 86.62% | 79.19% | 76.32%
2019-05-30 | PMTD | 82.12% | 87.05% | 77.72% | 75.22%
2019-06-11 | 4Paradigm-Data-Intelligence | 81.60% | 85.27% | 78.22% | 66.62%
2021-03-21 | OSKDet | 81.43% | 87.66% | 76.02% | 73.09%
2019-05-23 | 4Paradigm-Data-Intelligence | 80.99% | 85.33% | 77.08% | 65.66%
2019-05-08 | Baidu-VIS | 80.65% | 86.31% | 75.68% | 65.15%
2019-11-05 | baseline_maskrcnn | 80.24% | 86.62% | 74.74% | 71.18%
2019-03-23 | PMTD | 80.18% | 85.20% | 75.72% | 72.28%
2019-12-13 | BDN | 79.47% | 82.75% | 76.44% | 63.08%
2019-08-20 | juxinli | 78.51% | 85.13% | 72.84% | 69.66%
2021-11-02 | fpa | 78.48% | 85.09% | 72.82% | 69.62%
2024-03-14 | gts | 78.32% | 89.27% | 69.76% | 68.01%
2021-05-03 | NCU_MSP | 78.23% | 84.54% | 72.79% | 61.57%
2021-03-25 | NCU_MSP | 77.93% | 84.23% | 72.51% | 61.07%
2021-05-17 | NCU_FPN | 77.49% | 80.25% | 74.90% | 59.99%
2022-04-22 | TextBPN++(ResNet-50 with DCN) | 77.48% | 83.74% | 72.10% | 60.47%
2021-05-03 | adapt | 77.39% | 80.96% | 74.13% | 60.05%
2018-12-22 | PKU_VDIG | 77.29% | 78.73% | 75.90% | 71.41%
2018-11-15 | USTC-NELSLIP | 76.85% | 79.33% | 74.51% | 69.04%
2021-12-12 | a | 76.38% | 80.42% | 72.73% | 58.88%
2021-12-12 | b | 76.36% | 80.87% | 72.32% | 58.76%
2020-09-28 | DCLNet | 76.29% | 81.93% | 71.37% | 58.77%
2018-10-29 | Amap-CVLab | 76.08% | 80.91% | 71.79% | 67.72%
2021-03-03 | NCU_MSP_light | 75.82% | 82.54% | 70.12% | 57.88%
2019-03-19 | ccnet single scale | 75.77% | 81.27% | 70.97% | 61.97%
2023-05-22 | DeepSolo++ (ResNet-50) | 75.55% | 86.22% | 67.22% | 65.06%
2018-08-23 | Sogou_MM | 75.13% | 80.35% | 70.56% | 66.33%
2020-10-21 | gccnet-ensemble | 75.13% | 79.25% | 71.41% | 66.18%
2020-10-16 | Drew | 75.09% | 83.41% | 68.29% | 64.53%
2018-11-20 | Pixel-Anchor | 74.79% | 84.24% | 67.24% | 56.83%
2020-12-08 | cascade | 74.77% | 84.68% | 66.94% | 64.23%
2019-03-29 | GNNets (single scale) | 74.55% | 81.23% | 68.89% | 62.05%
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 74.13% | 80.61% | 68.62% | 55.20%
2018-11-28 | CRAFT | 74.03% | 80.82% | 68.30% | 55.17%
2019-01-08 | ALGCD_CP | 73.84% | 80.84% | 67.96% | 57.13%
2018-03-12 | ATL Cangjie OCR | 73.52% | 78.88% | 68.84% | 64.30%
2018-01-22 | FOTS_v2 | 73.31% | 83.06% | 65.61% | 59.93%
2017-11-09 | EAST++ | 72.86% | 80.42% | 66.61% | 54.94%
2021-12-31 | TextPMs | 72.49% | 80.95% | 65.64% | 53.30%
2020-12-08 | corner | 72.45% | 81.43% | 65.25% | 62.10%
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 72.45% | 77.01% | 68.40% | 52.51%
2023-12-17 | mlt_ch_03 | 72.37% | 81.60% | 65.02% | 53.19%
2022-04-11 | TextBPN++(ResNet-50) | 72.33% | 80.49% | 65.67% | 53.05%
2019-07-15 | stela | 71.50% | 78.68% | 65.52% | 60.26%
2018-12-13 | AutoCV | 71.41% | 72.40% | 70.46% | 62.63%
2018-12-02 | Shape-Aware Based Scene Text Detector (single scale) | 70.39% | 76.55% | 65.16% | 49.79%
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 70.00% | 73.40% | 66.89% | 49.02%
2018-12-05 | EPTN-SJTU | 67.58% | 75.71% | 61.02% | 49.59%
2019-05-30 | Thesis-SE | 67.22% | 75.68% | 60.47% | 47.30%
2024-04-02 | FPDIoU | 66.04% | 84.19% | 54.33% | 45.86%
2019-09-18 | mask RCNN Augment+ | 66.02% | 80.80% | 55.82% | 51.50%
2017-06-28 | SCUT_DLVClab1 | 64.96% | 80.28% | 54.54% | 50.34%
2017-06-30 | Sensetime OCR | 62.56% | 56.93% | 69.43% | 61.24%
2017-06-29 | SARI_FDU_RRPN_v1 | 62.37% | 71.17% | 55.50% | 50.33%
2017-06-28 | SARI_FDU_RRPN_v0 | 60.66% | 67.07% | 55.37% | 48.76%
2022-01-05 | dbnet_resnet18 | 60.60% | 64.62% | 57.05% | 47.71%
2019-01-03 | YY AI OCR Group | 52.60% | 64.77% | 44.28% | 29.67%
2017-06-30 | TH-DL | 45.97% | 67.75% | 34.78% | 30.88%
2017-06-30 | linkage-ER-Flow | 32.49% | 44.48% | 25.59% | 15.47%
2019-10-14 | TextSnake | 21.31% | 28.99% | 16.85% | 4.89%
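The page does not define the Hmean column. Assuming it is the harmonic mean of Precision and Recall, as is usual for text detection benchmarks, the short check below recomputes it for the top three rows of the table. The helper name `hmean` is ours, and because the listed precision and recall are already rounded to two decimals, a recomputed value could differ in the last digit for other rows.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, precision %, recall %) copied from the first three table rows.
rows = [
    ("AntAI-Cognition", 88.55, 80.72),
    ("TH", 89.66, 79.65),
    ("Sogou_OCR", 89.95, 78.66),
]

for name, p, r in rows:
    print(f"{name}: Hmean = {hmean(p, r):.2f}%")
```

For these three rows the recomputed values round to 84.45%, 84.36%, and 83.93%, matching the Hmean column, which is also the column the table is sorted by.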