Mid Sweden University

Computer-Aided Optically Scanned Document Information Extraction System
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
2020 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

This paper introduces a Computer-Aided Optically Scanned Document Information Extraction System. It extracts information such as invoice number, issue date, and buyer from optically scanned documents to meet the needs of customs declaration companies, and it writes the structured information to a relational database. First, a software architecture for extracting information from optically scanned documents with diverse structures is designed. In this system, the original document is first classified; if its template is predefined in the system, it is passed to template-based extraction to improve extraction performance. Second, an image enhancement method that improves image classification is proposed. This method aims to improve the accuracy of the neural network model by extracting template-related features and actively removing unrelated features. Lastly, the system is implemented. It is programmed in Python, a cross-platform language, and comprises three parts: a classification module, template-based extraction, and non-template extraction, all of which have APIs and can be run independently. This makes the system flexible and easy to customize for future demands. The system was evaluated on 445 real-world customs document images. The results show that the system supports diverse documents through non-template extraction and reaches high overall performance with template-based extraction, indicating that the goal was largely achieved.
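The routing described in the abstract (classify first, use template-based extraction when a template is predefined, otherwise fall back to non-template extraction) can be sketched roughly as follows. This is a minimal illustration only, not the thesis implementation: the names `extract_document`, `ExtractionResult`, `classify`, `template_extractors`, and `fallback_extractor` are hypothetical, and the classifier and extractors are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Fields = Dict[str, str]


@dataclass
class ExtractionResult:
    route: str      # "template" or "non-template" (hypothetical labels)
    fields: Fields  # extracted key/value pairs, e.g. invoice number, buyer


def extract_document(
    image: Any,
    classify: Callable[[Any], Optional[str]],
    template_extractors: Dict[str, Callable[[Any], Fields]],
    fallback_extractor: Callable[[Any], Fields],
) -> ExtractionResult:
    """Route one scanned document through classification, then extraction.

    Assumption: `classify` returns a template identifier, or None when the
    document matches no known template.
    """
    template_id = classify(image)
    extractor = None
    if template_id is not None:
        # Only templates predefined in the system have a dedicated extractor.
        extractor = template_extractors.get(template_id)
    if extractor is not None:
        return ExtractionResult("template", extractor(image))
    # No predefined template: use the generic, non-template extraction path.
    return ExtractionResult("non-template", fallback_extractor(image))
```

Keeping the three parts as separate callables mirrors the abstract's statement that each module can be run independently; a caller could swap in a different classifier or add a template extractor without touching the routing logic.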

Place, publisher, year, edition, pages
2020. p. 73
Keywords [en]
information extraction system, image enhancement, image classification, template matching
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:miun:diva-39190
Local ID: DT-V20-A2-008
OAI: oai:DiVA.org:miun-39190
DiVA, id: diva2:1441477
Subject / course
Computer Engineering DT1
Supervisors
Examiners
Available from: 2020-06-17 Created: 2020-06-16 Last updated: 2020-06-17
Bibliographically approved

Open Access in DiVA

fulltext (1053 kB), 975 downloads
File information
File name: FULLTEXT01.pdf
File size: 1053 kB
Checksum (SHA-512):
364fb85bc583ab9865ad248c3c98988d277591fe6e4e9388199c6d23dc449b89d4739997871254c9a02d5b3f854140f6d99dfe43dcdbbdc85c43b9cf96aca441
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Mei, Zhijie

Total: 975 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 397 hits