Evaluating Combinations of Classification Algorithms and Paragraph Vectors for News Article Classification
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology. ORCID iD: 0000-0002-1797-1095
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
2018 (English). In: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems / [ed] Maria Ganzha, Leszek Maciaszek, Marcin Paprzycki. Warsaw: Polskie Towarzystwo Informatyczne, 2018, pp. 489-495, article id 8511213. Conference paper, published paper (refereed).
Abstract [en]

News companies need to automate the process of writing about new and popular events and make it more efficient. Current technologies include robot programs that fill in values in templates, and website listeners that notify editors of changes so that they can read up on the change at the source website. Editors could produce news faster and better if they were given abstracts of the external sources directly, together with categorical metadata describing what each text is about. This article evaluates how modifications of critical parameters affect four classification algorithms (Decision Tree, Random Forest, Multilayer Perceptron, and Long Short-Term Memory) when combined with the paragraph vector algorithms Distributed Memory and Distributed Bag of Words, with the aim of categorising news articles. The results show that Decision Tree and Multilayer Perceptron are stable within a narrow interval, while Random Forest depends more strongly on the best-split and number-of-trees parameters. The most accurate model is the Long Short-Term Memory model, which achieves an accuracy of 71%.

Place, publisher, year, edition, pages
Warsaw: Polskie Towarzystwo Informatyczne, 2018, pp. 489-495, article id 8511213
Series
Annals of Computer Science and Information Systems, ISSN 2300-5963
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:miun:diva-34767
DOI: 10.15439/2018F110
ISI: 000454652300071
Scopus ID: 2-s2.0-85057226648
ISBN: 978-83-949419-7-0 (print)
OAI: oai:DiVA.org:miun-34767
DiVA, id: diva2:1257989
Conference
3rd International Workshop on Language Technologies and Applications (LTA'18) at 2018 Federated Conference on Computer Science and Information Systems, FedCSIS 2018; Poznan; Poland; 9 September 2018 through 12 September 2018
Projects
SMART (Smart systems and services for an efficient and innovative society)
Available from: 2018-10-23 Created: 2018-10-23 Last updated: 2019-09-09
Bibliographically approved

Open Access in DiVA

No full text in DiVA


Authority records

Lindén, Johannes; Forsström, Stefan; Zhang, Tingting