Results - Document Visual Question Answering

method: INF-FormRetriever (2025-07-09)

Authors: Yinghua Hong, Yichen Yao, Junhan Yang, Jiahe Wan, Yuxin Hong, Mengna Zhang, Wei Chu, Yinghui Xu, Yuan Qi

Affiliation: INF

Description: This method targets large-scale form retrieval. First, Infinity-Parser-7B converts each document image into structured content and rewrites the queries. Next, three retrievers (the dense retriever Inf-Retriever-V1-1.5B, BM25, and SQL-based querying) each retrieve candidate forms, and their results are fused with reciprocal rank fusion (RRF) to select the most relevant forms. Finally, Qwen2.5-VL-7B generates the final answer from the retrieved forms.

@misc{infly-ai_2025, title={inf-retriever-v1 (Revision 5f469d7)}, author={Junhan Yang and Jiahe Wan and Yichen Yao and Wei Chu and Yinghui Xu and Yuan Qi}, year={2025}, doi={10.57967/hf/4262}, publisher={Hugging Face}, url={https://huggingface.co/infly/inf-retriever-v1}, }

@misc{wang2025infinityparserlayoutaware, title={Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing}, author={Baode Wang and Biao Wu and Weizhen Li and Meng Fang and Yanjie Liang and Zuming Huang and Haozhe Wang and Jun Huang and Ling Chen and Wei Chu and Yuan Qi}, year={2025}, eprint={2506.03197}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2506.03197}, }
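
The fusion step named in the INF-FormRetriever description (reciprocal rank fusion, RRF) can be illustrated with a minimal sketch. It is not the authors' code: the function name, the document ids, and the constant k=60 (a commonly used default) are assumptions made for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document receives sum(1 / (k + rank)) over the lists in which it
    appears, with rank starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of the three retrievers for one query.
dense_hits = ["form_12", "form_07", "form_33"]
bm25_hits = ["form_07", "form_12", "form_41"]
sql_hits = ["form_07", "form_33"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits, sql_hits])
print(fused)
```

A form that ranks well in several lists rises to the top even if no single retriever ranks it first everywhere, which is the usual motivation for fusing heterogeneous retrievers by rank rather than by raw score.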

method: Mybank-DocRetrieval (2021-11-18)

Authors: Members

Affiliation: Mybank

Description: We train BERT-large on a classification task and use its logits as ranking scores to rerank the documents. The reranked documents are then filtered using the information extracted from the questions. Finally, a QA model trained on Task 1 extracts the answer.
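
The rerank-then-filter flow in this description might look roughly like the sketch below. The candidate records, field names, logit values, and the case-insensitive exact-match filter are all hypothetical; the submission does not state how its filtering is implemented.

```python
def rerank_and_filter(candidates, constraints):
    """Sort candidates by classifier logit, then keep those matching the question.

    candidates: list of dicts with 'doc_id', 'logit', and extracted 'fields'.
    constraints: field -> value pairs assumed to be extracted from the question.
    """
    ranked = sorted(candidates, key=lambda c: c["logit"], reverse=True)
    return [
        c["doc_id"]
        for c in ranked
        if all(
            c["fields"].get(name, "").lower() == value.lower()
            for name, value in constraints.items()
        )
    ]


# Hypothetical candidates with made-up logits and extracted fields.
candidates = [
    {"doc_id": "doc_a", "logit": 1.2, "fields": {"party": "Democratic", "office": "Mayor"}},
    {"doc_id": "doc_b", "logit": 3.4, "fields": {"party": "Republican", "office": "Mayor"}},
    {"doc_id": "doc_c", "logit": 2.1, "fields": {"party": "Democratic", "office": "Mayor"}},
]

kept = rerank_and_filter(candidates, {"party": "democratic"})
print(kept)
```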

method: Infrrd-RADAR (2021-04-01)

Authors: JiangLong He, Aditya Kumar Sarda, Deepak Kumar, Cesar Duran

Affiliation: Infrrd.ai

Description: Infrrd-RADAR (Retrieval of Answers by Document Analysis and Re-ranking) performs OCR on the images in the dataset. The OCR output is used together with the image to extract key information from each form, such as the pdc filed date, candidate name, office, and party. The extracted information is stored in a CSV file; 28 fields in total are extracted from the forms. The natural language questions are parsed using spaCy. Chunks are categorized as subject, object, or dependency object, and entities as person, geo-political entity, or organization. Using this categorized information, each question is converted into a set of SQL queries. The SQL queries are combined with a fuzzy-search algorithm to retrieve a set of relevant documents. A BERT-Large based model then reranks the retrieved documents. The reranked document IDs are used to filter the extracted information. Based on the parsed question, the relevant field is selected and returned as the answer.
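
The combination of SQL queries and fuzzy search described here could be sketched as follows. This is an illustration under stated assumptions, not the Infrrd implementation: the table schema, the sample rows, and the helper `find_documents` are hypothetical, the spaCy parsing step is skipped, and Python's standard `difflib` stands in for whatever fuzzy-search algorithm the authors used.

```python
import difflib
import sqlite3

# Hypothetical extracted fields; the described system stores 28 fields per form.
rows = [
    ("doc_001", "Jane Holloway", "City Council", "Democratic"),
    ("doc_002", "Mark Delgado", "Mayor", "Republican"),
    ("doc_003", "Peter Osei", "School Board", "Independent"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forms (doc_id TEXT, candidate_name TEXT, office TEXT, party TEXT)"
)
conn.executemany("INSERT INTO forms VALUES (?, ?, ?, ?)", rows)


def find_documents(conn, field, value, cutoff=0.8):
    """Return ids of forms whose `field` value approximately matches `value`."""
    allowed = {"candidate_name", "office", "party"}
    if field not in allowed:
        raise ValueError(f"unknown field: {field}")
    # Collect the distinct stored values, then fuzzy-match the question value.
    known = [r[0] for r in conn.execute(f"SELECT DISTINCT {field} FROM forms")]
    matches = difflib.get_close_matches(value, known, n=5, cutoff=cutoff)
    if not matches:
        return []
    # Query with the matched values as bound parameters.
    placeholders = ", ".join("?" for _ in matches)
    query = f"SELECT doc_id FROM forms WHERE {field} IN ({placeholders}) ORDER BY doc_id"
    return [r[0] for r in conn.execute(query, matches)]


# A misspelled name, as might come from a question or from OCR noise.
result = find_documents(conn, "candidate_name", "Jane Holoway")
print(result)
```

Fuzzy matching before the SQL lookup lets a misspelled or OCR-damaged name still resolve to the stored value, while the column name is checked against a fixed allowlist because it cannot be passed as a bound parameter.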

Ranking Table


Date	Method	ANLSL	Retrieval MAP
2025-07-09	INF-FormRetriever	0.9058	89.45%
2021-11-18	Mybank-DocRetrieval	0.7930	80.90%
2021-04-01	Infrrd-RADAR	0.7743	74.66%
2021-04-12	(Baseline) Database	0.7068	71.06%
2021-04-12	(Baseline) Text spotting - BERT	0.4513	72.84%
2020-05-16	PingAn-OneConnect-Gammalab-DQA	0.0000	80.90%
2020-05-05	iFLYTEK-DOCR	0.0000	79.15%
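
For readers unfamiliar with the Retrieval MAP column, the sketch below computes mean average precision using the standard textbook definition. The challenge's official evaluation script may differ in details, and the queries and document ids here are made up for illustration.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@rank taken at each rank where a relevant doc appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries with their ranked results and ground-truth documents.
runs = [
    (["d1", "d2", "d3", "d4"], ["d1", "d3"]),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6", "d7"], ["d6"]),              # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
print(round(map_score, 4))
```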
