Computer-Aided Optically Scanned Document Information Extraction System
2020 (English). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
This thesis introduces a Computer-Aided Optically Scanned Document Information Extraction System. To meet the needs of customs declaration companies, the system extracts information such as the invoice number, issue date, and buyer from optically scanned documents and writes the structured result to a relational database. First, a software architecture is designed for extracting information from optically scanned documents with diverse structures. In this architecture, each document is classified first; if its template is pre-defined in the system, the document is passed to template-based extraction to improve extraction performance. Second, an image enhancement method is proposed to improve image classification. The method aims to raise the accuracy of the neural network model by extracting template-related features and actively removing unrelated ones. Finally, the system is implemented in Python, a cross-platform language. It comprises three parts (a classification module, template-based extraction, and non-template extraction), each of which has an API and can be run independently. This makes the system flexible and easy to customize for future requirements. The system was evaluated on 445 real-world customs document images. The results show that non-template extraction provides support for diverse documents and template-based extraction achieves high overall performance, indicating that the goal was largely achieved.
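The classify-then-route flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes each document has already been converted to text, replaces the neural-network image classifier with a hypothetical keyword check (`classify`), substitutes simple regular expressions for both the template-based and the non-template extraction methods, and assumes that documents without a pre-defined template fall back to non-template extraction. The template registry, field patterns, and the SQLite `documents` table are all hypothetical.

```python
import re
import sqlite3

# Hypothetical template registry: class label -> {field: regex with one capture group}.
# The real templates and field set used in the thesis are not reproduced here.
TEMPLATES = {
    "invoice_a": {
        "invoice_no": r"Invoice No\.?\s*:\s*(\S+)",
        "issued_date": r"Date\s*:\s*(\d{4}-\d{2}-\d{2})",
        "buyer": r"Buyer\s*:\s*(.+)",
    },
}

# Assumed looser, case-insensitive patterns used when no template is known.
GENERIC_FIELDS = {
    "invoice_no": r"invoice\s*(?:no|number)\.?\s*[:#]?\s*(\S+)",
    "issued_date": r"date\s*[:#]?\s*(\S+)",
    "buyer": r"(?:buyer|sold to)\s*[:#]?\s*(.+)",
}


def classify(text):
    """Hypothetical stand-in for the image classifier: a template label or None."""
    return "invoice_a" if "INVOICE TYPE A" in text else None


def extract_fields(text, patterns, flags=0):
    """Apply one regex per field; fields that are not found become None."""
    record = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text, flags)
        record[field] = match.group(1).strip() if match else None
    return record


def store(conn, doc_id, record):
    """Write one extracted record as a row of a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(doc_id TEXT PRIMARY KEY, invoice_no TEXT, issued_date TEXT, buyer TEXT)"
    )
    conn.execute(
        "INSERT INTO documents VALUES (?, ?, ?, ?)",
        (doc_id, record["invoice_no"], record["issued_date"], record["buyer"]),
    )
    conn.commit()


def process(conn, doc_id, text):
    """Classify first; use template-based extraction only if a template is defined."""
    label = classify(text)
    if label in TEMPLATES:
        record = extract_fields(text, TEMPLATES[label])
    else:
        # Assumed fallback: non-template extraction for unknown layouts.
        record = extract_fields(text, GENERIC_FIELDS, re.IGNORECASE)
    store(conn, doc_id, record)
    return label, record


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    known = "INVOICE TYPE A\nInvoice No: INV-001\nDate: 2020-05-01\nBuyer: Acme AB"
    unknown = "Commercial invoice\ninvoice number # 42\nsold to: Foo Ltd\ndate: 01/02/2020"
    print(process(conn, "doc-1", known))
    print(process(conn, "doc-2", unknown))
```

Keeping classification, extraction, and storage as separate functions loosely mirrors the abstract's point that each module has its own API and can be run independently; in practice the stand-ins would be replaced by the image classifier and the actual extractors.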
Place, publisher, year, edition, pages
2020, 73 pages
Keywords [en]
information extraction system, image enhancement, image classification, template matching
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:miun:diva-39190
Local ID: DT-V20-A2-008
OAI: oai:DiVA.org:miun-39190
DiVA, id: diva2:1441477
Subject / course
Computer Engineering DT1
Available from: 2020-06-17. Created: 2020-06-16. Last updated: 2020-06-17. Bibliographically approved.