Mid Sweden University

Lexeme Extraction for Wikidata: A proof of concept study for Swedish lexeme extraction
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
2020 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Wikipedia has a problem with organizing and managing data as well as references. As a solution, Wikidata was created to make it possible for machines to interpret these data with the help of lexemes. A lexeme is an abstract lexical unit that consists of a word’s lemmas and its word class. The objective of this paper is to present one possible way to provide Swedish lexeme data to Wikidata. This was implemented in two phases: the first phase identified the lemmas and their word classes, and the second phase processed these words to create coherent lexemes. The developed model was able to process large amounts of words from the data source but barely succeeded in generating coherent lexemes. Although the lexemes were supposed to provide an efficient way for machines to understand data, the obtained results lead to the conclusion that the developed model did not achieve the anticipated results. This is due to the number of words found in relation to the number of words processed. A way to import lexeme data to Wikidata from another data source is needed.

Place, publisher, year, edition, pages
2020. p. 39
Keywords [en]
Lexeme, Lemma, Wikidata
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:miun:diva-40023
Local ID: DT-V20-G3-035
OAI: oai:DiVA.org:miun-40023
DiVA, id: diva2:1473740
Subject / course
Computer Engineering DT1
Educational program
Master of Science in Engineering - Computer Engineering TDTEA 300 higher education credits
Supervisors
Examiners
Available from: 2020-10-07 Created: 2020-10-07 Last updated: 2020-10-07 Bibliographically approved

Open Access in DiVA

fulltext (875 kB), 304 downloads
File information
File name FULLTEXT01.pdf
File size 875 kB
Checksum SHA-512
1eb652706703a8083b565638f012ebacd23299c14eeac8b9806c925cc9ebb42e3cf18f9f645c6400a3179f2f4b97f5ac991a65314a9164bd6dfbdaced4cb3e4a
Type fulltext
Mimetype application/pdf

Author
Samzelius, Simon