miun.se Publications
Lindén, Johannes
Publications (3 of 3)
Lindén, J., Wang, X., Forsström, S. & Zhang, T. (2020). Productify news article classification model with Sagemaker. Advances in Science, Technology and Engineering Systems, 5(2), 13-18
2020 (English). In: Advances in Science, Technology and Engineering Systems, ISSN 2415-6698, Vol. 5, no. 2, p. 13-18. Article in journal (Refereed), Published
Abstract [en]

News companies need to automate the process of writing about popular and new events and make it more efficient. Current technologies involve robotic programs that fill in values in templates, and website listeners that notify editors when changes are made so that the editor can read up on the source change on the actual website. Editors can provide news faster and better if they are directly provided with abstracts of the external sources and with categorical metadata describing what the text is about. To make categorical metadata a reality, an auto-categorization model was created and optimized for Swedish articles written by local news journalists. The problem was that this model was not scalable enough to use out of the box. Instead of keeping this local model, which made good predictions on the text documents, the model is deployed in the cloud and an API is created for it. The API can be accessed from the tools in which the articles are written, so these tools can automatically assign categories to an article once the journalist has finished writing it. To scale to several thousand simultaneously categorized articles, and at the same time make the workflow for deploying new models easier, the API is uploaded to Sagemaker, where several models are trained. Once an improved model is found, that model is used in production, so that the system organically adapts to newly written articles. An evaluation of the Sagemaker API was performed, and it was concluded that the complexity of this solution was polynomial.
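The deployment loop described in the abstract (several models are trained, and an improved one replaces the model in production) can be illustrated with a minimal Python sketch. It does not use the Sagemaker API and is not the authors' implementation: `CandidateModel`, `CategorizationService`, `consider` and `keyword_model` are hypothetical names, validation accuracy is assumed to be the promotion criterion, and the keyword "models" and accuracy values are toy placeholders rather than results from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: a predictor maps article text to a category label.
Predictor = Callable[[str], str]


@dataclass
class CandidateModel:
    """A trained categorization model and its validation accuracy (assumed metric)."""
    name: str
    accuracy: float
    predict: Predictor


@dataclass
class CategorizationService:
    """Stand-in for the cloud API: serves one production model and replaces it
    only when a newly trained candidate has higher validation accuracy."""
    production: CandidateModel
    history: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Record the initial production model.
        self.history.append(self.production.name)

    def consider(self, candidates: List[CandidateModel]) -> bool:
        """Promote the best candidate if it beats the current production model."""
        if not candidates:
            return False
        best = max(candidates, key=lambda m: m.accuracy)
        if best.accuracy > self.production.accuracy:
            self.production = best
            self.history.append(best.name)
            return True
        return False

    def categorize(self, article: str) -> str:
        """The call an editing tool would make once an article is finished."""
        return self.production.predict(article)


def keyword_model(keyword: str, label: str) -> Predictor:
    # Toy predictor used only to exercise the service; not a real classifier.
    return lambda text: label if keyword in text.lower() else "other"


if __name__ == "__main__":
    # Accuracy values are placeholders, not results from the paper.
    service = CategorizationService(CandidateModel("v1", 0.60, keyword_model("match", "sport")))
    promoted = service.consider([
        CandidateModel("v2", 0.55, keyword_model("election", "politics")),
        CandidateModel("v3", 0.65, keyword_model("goal", "sport")),
    ])
    print(promoted, service.production.name)  # True v3
    print(service.categorize("A late goal decided the match"))  # sport
```

Keeping the promotion rule separate from the serving call means the editing tools always call the same `categorize` entry point while the model behind it changes, which is the property the abstract relies on for organic adaptation.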

Keywords
Big data, Data mining, Editors, Journalists, Machine learning, Natural language processing, News events, NLP, Paragraph vectors, Text analysis
National Category
Media and Communications
Identifiers
urn:nbn:se:miun:diva-38834 (URN), 10.25046/aj050202 (DOI)
Available from: 2020-04-07. Created: 2020-04-07. Last updated: 2020-04-07. Bibliographically approved
Lindén, J., Wang, X., Forsström, S. & Zhang, T. (2019). Bilingual Auto-Categorization Comparison of two LSTM Text Classifiers. In: 2019 8th International Congress on Advanced Applied Informatics (IIAI-AAI). Paper presented at the 8th International Congress on Advanced Applied Informatics, Toyama, Japan, July 7-11 (Main Event) & 12 (Forum), 2019.
2019 (English). In: 2019 8th International Congress on Advanced Applied Informatics (IIAI-AAI), 2019. Conference paper, Published paper (Refereed)
Abstract [en]

Multilingual problems such as auto-categorization are not easy to solve. One approach is to train a separate model for each language; another is to build the model in one base language and automatically translate text from other languages into that base language. Each language is biased toward its own grammar and syntax, which makes some content difficult to express in other languages. Translating from one language into a non-verbal language could potentially have a positive impact on the categorization results. A non-verbal language could, for example, be pure information in the form of knowledge-graph relations extracted from the text. In this article, Chinese and Swedish are compared. Two categorization models are developed and validated on each dataset. The purpose is to make an auto-categorization model that works for any language. One model is built upon LSTM and optimized for Swedish, and the other is an improved Bidirectional-LSTM Convolution model optimized for Chinese. The improved algorithm is trained on both languages and compared with the LSTM algorithm. The Bidirectional-LSTM algorithm performs approximately 20 percentage points better than the LSTM algorithm, which is significant.

National Category
Computer Sciences
Identifiers
urn:nbn:se:miun:diva-37261 (URN), 10.1109/IIAI-AAI.2019.00127 (DOI), 2-s2.0-85080902973 (Scopus ID), 978-1-7281-2627-2 (ISBN)
Conference
8th International Congress on Advanced Applied Informatics, Toyama, Japan, July 7-11 (Main Event) & 12 (Forum), 2019
Projects
SMART (Smarta system och tjänster för ett effektivt och innovativt samhälle; Smart systems and services for an efficient and innovative society)
Available from: 2019-09-19. Created: 2019-09-19. Last updated: 2020-03-19. Bibliographically approved
Lindén, J., Forsström, S. & Zhang, T. (2018). Evaluating Combinations of Classification Algorithms and Paragraph Vectors for News Article Classification. In: Maria Ganzha, Leszek Maciaszek, Marcin Paprzycki (Eds.), Proceedings of the 2018 Federated Conference on Computer Science and Information Systems. Paper presented at the 3rd International Workshop on Language Technologies and Applications (LTA'18) at the 2018 Federated Conference on Computer Science and Information Systems, FedCSIS 2018; Poznan, Poland; 9 September 2018 through 12 September 2018 (pp. 489-495). Warsaw: Polskie Towarzystwo Informatyczne, Article ID 8511213.
2018 (English). In: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems / [ed] Maria Ganzha, Leszek Maciaszek, Marcin Paprzycki, Warsaw: Polskie Towarzystwo Informatyczne, 2018, p. 489-495, article id 8511213. Conference paper, Published paper (Refereed)
Abstract [en]

News companies need to automate the process of writing about popular and new events and make it more efficient. Current technologies involve robotic programs that fill in values in templates, and website listeners that notify editors when changes are made so that the editor can read up on the source change on the actual website. Editors can provide news faster and better if they are directly provided with abstracts of the external sources and with categorical metadata describing what the text is about. This article focuses on the importance of evaluating critical parameter modifications of four classification algorithms, Decision Tree, Random Forest, Multilayer Perceptron and Long Short-Term Memory (LSTM), in combination with the paragraph vector algorithms Distributed Memory and Distributed Bag of Words, with the aim of categorizing news articles. The results show that Decision Tree and Multilayer Perceptron are stable within a short interval, while Random Forest depends more on the parameters best split and number of trees. The most accurate model is the LSTM model, which achieves an accuracy of 71%.

Place, publisher, year, edition, pages
Warsaw: Polskie Towarzystwo Informatyczne, 2018
Series
Annals of Computer Science and Information Systems, ISSN 2300-5963
National Category
Computer Sciences
Identifiers
urn:nbn:se:miun:diva-34767 (URN), 10.15439/2018F110 (DOI), 000454652300071, 2-s2.0-85057226648 (Scopus ID), 978-83-949419-7-0 (ISBN)
Conference
3rd International Workshop on Language Technologies and Applications (LTA'18) at 2018 Federated Conference on Computer Science and Information Systems, FedCSIS 2018; Poznan; Poland; 9 September 2018 through 12 September 2018
Projects
SMART (Smarta system och tjänster för ett effektivt och innovativt samhälle; Smart systems and services for an efficient and innovative society)
Available from: 2018-10-23. Created: 2018-10-23. Last updated: 2019-09-09. Bibliographically approved