Mid Sweden University

Low-resource Language Question Answering System with BERT
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
2021 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

The complexity of staying at the forefront of information retrieval systems is constantly increasing. A recent natural language processing technology called BERT has reached superhuman performance on reading comprehension tasks in high-resource languages. However, several researchers have stated that multilingual models are not sufficient for low-resource languages, since they lack a thorough understanding of those languages. Recently, a Swedish pre-trained BERT model has been introduced, trained on significantly more Swedish data than the multilingual models currently available. This study compares multilingual and Swedish monolingual BERT-derived models for question answering, fine-tuned on both an English and a machine-translated Swedish SQuADv2 data set. The models are evaluated on the SQuADv2 benchmark and within an implemented question answering system built upon the classical retriever-reader methodology. This study introduces both a naive and a more robust prediction method for the proposed question answering system, as well as finding a sweet spot for each individual model approach integrated into the system. The question answering system is evaluated against another leading question answering library in the area, using a custom-crafted Swedish evaluation data set. The results show that the model fine-tuned from the Swedish pre-trained model on the Swedish SQuADv2 data set was superior on all evaluation metrics except speed. The comparison between the systems showed a higher evaluation score but a slower prediction time for this study's system.
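To make the retriever-reader methodology mentioned above concrete, the following is a minimal sketch of such a pipeline in Python. It is an illustration only, not the thesis's implementation: it assumes the Hugging Face transformers question-answering pipeline and a TF-IDF retriever from scikit-learn, and the model identifier is a hypothetical stand-in for a BERT checkpoint fine-tuned on SQuADv2 (Swedish monolingual or multilingual, as compared in the study).

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

# Hypothetical placeholder: substitute any BERT checkpoint fine-tuned
# for extractive question answering on SQuADv2.
MODEL_NAME = "bert-finetuned-on-squadv2"

documents = [
    "BERT är en förtränad språkmodell som kan finjusteras för frågebesvaring.",
    "SQuADv2 innehåller även frågor som saknar svar i texten.",
]

reader = pipeline("question-answering", model=MODEL_NAME)

def answer(question, top_k=1):
    # Retriever step: rank the document collection by TF-IDF cosine
    # similarity to the question and keep the top_k passages.
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(documents)
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    best = scores.argsort()[::-1][:top_k]

    # Reader step: run extractive QA over each retrieved passage and
    # return the answer span with the highest confidence score.
    candidates = [reader(question=question, context=documents[i]) for i in best]
    return max(candidates, key=lambda c: c["score"])

print(answer("Vad innehåller SQuADv2?", top_k=2))

Reading several retrieved passages and keeping only the highest-scoring span (top_k > 1) gestures at the kind of naive-versus-robust prediction trade-off the abstract describes, although the thesis's exact prediction methods and sweet-spot tuning are not reproduced here.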

Place, publisher, year, edition, pages
2021, p. 74.
Keywords [en]
BERT, Question Answering system, Reading Comprehension, Low resource language, SQuADv2
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:miun:diva-42317
Local ID: DT-V21-A2-005
OAI: oai:DiVA.org:miun-42317
DiVA, id: diva2:1568732
Subject / course
Computer Engineering DT1
Educational program
Master of Science in Engineering - Computer Engineering TDTEA 300 higher education credits
Available from: 2021-06-18. Created: 2021-06-18. Last updated: 2021-06-18. Bibliographically approved.

Open Access in DiVA

fulltext (1830 kB), 681 downloads
File information
File name: FULLTEXT01.pdf. File size: 1830 kB. Checksum: SHA-512.
301200e1d23e36db699a5af8cda970b159bb6ed7e8f1d7c5e644f058f017f8b2192957abb24fa676d96ef66472e9acc9df2d2eb3b24ba0c26a31af922594d48b
Type: fulltext. Mimetype: application/pdf.

By author/editor
Jansson, Herman
By organisation
Department of Information Systems and Technology
