Mittuniversitetet

Low-resource Language Question Answering System with BERT
Mittuniversitetet, Fakulteten för naturvetenskap, teknik och medier, Institutionen för informationssystem och –teknologi.
2021 (English). Independent thesis, advanced level (professional degree), 20 credits / 30 HE credits. Student thesis (degree project)
Abstract [en]

The complexity of staying at the forefront of information retrieval systems is constantly increasing. BERT, a recent natural language processing technology, has reached superhuman performance on reading comprehension tasks in high-resource languages. However, several researchers have stated that multilingual models are not enough for low-resource languages, since they lack a thorough understanding of those languages. Recently, a Swedish pre-trained BERT model has been introduced, which is trained on significantly more Swedish data than the multilingual models currently available. This study compares multilingual and Swedish monolingual BERT-derived models for question answering, using both an English and a machine-translated Swedish SQuADv2 data set during fine-tuning. The models are evaluated on the SQuADv2 benchmark and within an implemented question answering system built upon the classical retriever-reader methodology. This study introduces a naive and more robust prediction method for the proposed question answering system, as well as finding a sweet spot for each individual model approach integrated into the system. The question answering system is evaluated and compared against another question answering library at the leading edge of the area, using a custom-crafted Swedish evaluation data set. The results show that the fine-tuned model based on the Swedish pre-trained model and the Swedish SQuADv2 data set was superior in all evaluation metrics except speed. The comparison between the different systems resulted in a higher evaluation score but a slower prediction time for this study's system.
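For readers unfamiliar with how answers are scored on SQuAD-style benchmarks, the two standard metrics are exact match (EM) and token-level F1. The following is a minimal sketch of those metrics, not the thesis's own evaluation code. The function names are illustrative, and the article-stripping step assumes English text; a Swedish evaluation would likely need a different normalization.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Assumption: English articles only; Swedish would need its own rule.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> int:
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # SQuADv2 convention: an unanswerable question has an empty gold answer,
    # so the score is 1 only if the prediction is empty as well.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

An empty string stands for "no answer", so a model that correctly abstains on an unanswerable question scores 1 on both metrics, while any non-empty prediction for such a question scores 0.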

Place, publisher, year, edition, pages
2021, p. 74
Keywords [en]
BERT, Question Answering system, Reading Comprehension, Low resource language, SQuADv2
National category
Computer Systems
Identifiers
URN: urn:nbn:se:miun:diva-42317
Local ID: DT-V21-A2-005
OAI: oai:DiVA.org:miun-42317
DiVA, id: diva2:1568732
Subject / course
Computer Engineering DT1
Educational program
Master of Science in Engineering - Computer Engineering TDTEA 300 higher education credits
Supervisors
Examiners
Available from: 2021-06-18. Created: 2021-06-18. Last updated: 2025-09-25. Bibliographically approved

Open Access in DiVA

fulltext (1830 kB), 762 downloads
File information
File name: FULLTEXT01.pdf. File size: 1830 kB. Checksum: SHA-512
301200e1d23e36db699a5af8cda970b159bb6ed7e8f1d7c5e644f058f017f8b2192957abb24fa676d96ef66472e9acc9df2d2eb3b24ba0c26a31af922594d48b
Type: fulltext. Mimetype: application/pdf

Search further in DiVA

By author/editor
Jansson, Herman
By organisation
Institutionen för informationssystem och –teknologi
Computer Systems
Total: 763 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.
Total: 1163 hits