Mid Sweden University

From literature to relationship network: Automated relationship extraction with a natural language processing pipeline for Resource constrained computers.
Mid Sweden University, Faculty of Science, Technology and Media, Department of Computer and Electrical Engineering (2023-).
2024 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [sv] (English translation)

The aim of this study is to create a social network using NER (Named Entity Recognition) and coreference AI models. This is done in the hope of showing the advantages and disadvantages of implementing a new feature in note-taking software. The paper focuses heavily on using the Python library spaCy and its tools to complete this task. A prototype pipeline will be created and tested to identify any difficulties with using AI NLP (natural language processing) on a simple laptop. Results are gathered using F-score and a timer, by taking unsorted text from popular novels and passing it through a pipeline that creates a document object. The resulting object will be searched with a matching function to find relationships between entities labeled PERSON. The results from the prototype show that, due to low computational power, sorting the text takes up to a few hours for large novels (200 000 words). With a lower reading speed, larger novels take longer. Smaller novels (20 000 words), however, can take around 13 minutes. The study shows the weaknesses and the usefulness of being able to use AI locally to create a functioning social network.

Abstract [en]

The objective of this study is to create a social network using NER (Named Entity Recognition) and coreference AI models. This is done in the hope of showing the pros and cons of implementing a new function in note-taking software. The thesis focuses heavily on the Python library spaCy and its tools to complete this task. A prototype pipeline is created and tested to find any difficulties with using AI NLP (natural language processing) on a simple laptop. Results are gathered with an F-score and a timer: unsorted text from popular novels is passed through a pipeline that creates a doc object. The resulting object is then searched with a matcher function to find relations between entities labeled PERSON. The results from the prototype show that, due to low computational power, sorting the text takes up to a few hours for large novels (200 000 words). Reading speed decreases as the novel gets larger. Smaller novels (20 000 words), however, can take around 13 minutes. The study shows the weaknesses and usefulness of being able to use AI locally to create a functional social network.
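The last two steps described in the abstract, turning PERSON mentions into a relationship network and scoring the result with an F-score, can be sketched in plain Python. This is a minimal illustration and not the thesis's implementation: it assumes the PERSON entities have already been extracted (for example by an NER model), it links two persons whenever they appear in the same sentence rather than using a spaCy matcher, and the names `build_network` and `f_score`, the sample sentences, and the gold edges are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_network(sentences):
    """Count how often each pair of persons shares a sentence.

    `sentences` is a list of lists of person names, one inner list per
    sentence. It stands in for PERSON entities produced by an NER model.
    """
    edges = Counter()
    for persons in sentences:
        # Deduplicate and sort so (A, B) and (B, A) map to the same edge.
        for pair in combinations(sorted(set(persons)), 2):
            edges[pair] += 1
    return edges


def f_score(predicted, gold):
    """F1 score (harmonic mean of precision and recall) of two sets."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical PERSON mentions per sentence, invented for illustration.
sentences = [
    ["Elizabeth", "Darcy"],
    ["Elizabeth", "Jane"],
    ["Darcy", "Elizabeth", "Bingley"],
    ["Jane"],
]
network = build_network(sentences)

# Hypothetical hand-labeled relationships used as the reference.
gold_edges = {
    ("Darcy", "Elizabeth"),
    ("Elizabeth", "Jane"),
    ("Bingley", "Jane"),
}
score = f_score(set(network), gold_edges)
```

Sentence-level co-occurrence is a deliberately crude stand-in for relation matching: it is cheap enough for a low-powered laptop, but it also links persons who are merely mentioned together, which lowers precision.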

Place, publisher, year, edition, pages
2024. p. 47
Keywords [en]
NER, coreference, spacy, social network, python
Keywords [sv]
NER, coreference, spacy, sociala nätverk, python
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:miun:diva-52348
Local ID: DT-V24-G3-008
OAI: oai:DiVA.org:miun-52348
DiVA, id: diva2:1894393
Subject / course
Computer Engineering DT1
Educational program
Master of Science in Engineering - Computer Engineering TDTEA 300 higher education credits
Supervisors
Examiners
Available from: 2024-09-03. Created: 2024-09-03. Last updated: 2024-09-03. Bibliographically approved.

Open Access in DiVA

fulltext (839 kB), 81 downloads
File information
File name: FULLTEXT01.pdf
File size: 839 kB
Checksum (SHA-512): 483a5c691b1f71dc441bff88ce504f8fc978e8c56ea98a9b98b126e159f20979a7bdd4bad6b1eb4eb89ba549a5edd386d54eefc24a485f8aa97b5f512be2d1d0
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Kemell, Theo
By organisation
Department of Computer and Electrical Engineering (2023-)
Software Engineering

Total: 81 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 80 hits