Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
The crucial parts of text classification with TensorFlow.js and categorisation of news articles
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för programvaruteknik.
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för programvaruteknik.
2020 (Engelska)Självständigt arbete på grundnivå (kandidatexamen), 10 poäng / 15 hpStudentuppsats (Examensarbete)
Abstract [en]

Text classification is a subset of machine learning which is used to classify texts such as tweets, email, news headlines or articles, with tags or categories. As news publishing can have uncertainty in their categorisations, text classification could categorise articles autonomously and distinguish unclear categorisations. The library TensorFlow helps with operations and tools for the machine learning workflow. 

This paper takes focus on the crucial parts of working with machine learning using TensorFlow.js and to what extent this model can categorise a news article. The authors evaluates different models to analyse how optimising the settings will affect the accuracy of the model.

Results of this paper was researched with a literature study of official documentations and peer reviewed reports. An empirical experiment where machine learning models were trained in TensorFlow.js was also performed.

The results showed that the model with the highest accuracy with 87.17% accuracy was trained with 1000 articles using Relu and Softmax activation functions and the Mean squared error loss function. While the model with lowest accuracy had 75.5% using Sigmoid activation functions and Categorical cross-entropy on the 5000 articles training set. Crucial parts for this development were: optimizer function, loss function, batch size, activation functions, training data and test data with labels, normalise function, shapes of layers and computing power.

There are several parts and functions to take in consideration when developing a machine learning model with text classification in TensorFlow.js. The training process needs to be performed multiple times as there are many parameters which has an affect on the model results. The model results can be improved by optimising and finding the best combination between different functions and parameters.

Ort, förlag, år, upplaga, sidor
2020. , s. 29
Nyckelord [en]
TensorFlow.js, Machine learning, Text classification, JavaScript
Nationell ämneskategori
Programvaruteknik
Identifikatorer
URN: urn:nbn:se:bth-19813OAI: oai:DiVA.org:bth-19813DiVA, id: diva2:1442910
Ämne / kurs
PA1445 Kandidatkurs i Programvaruteknik
Utbildningsprogram
PAGWE Webbprogrammering
Handledare
Examinatorer
Tillgänglig från: 2020-06-21 Skapad: 2020-06-17 Senast uppdaterad: 2020-06-21Bibliografiskt granskad

Open Access i DiVA

The crucial parts of text classification with TensorFlow.js and categorisation of news articles(2980 kB)2325 nedladdningar
Filinformation
Filnamn FULLTEXT01.pdfFilstorlek 2980 kBChecksumma SHA-512
1cd73d75238baa798b9f40caf455c382a289f1848fc9956ddce7029ff5e8b315240882fc2645caf076820965fe9cfd713deec1722372796d69126af38e1deaa0
Typ fulltextMimetyp application/pdf

Av organisationen
Institutionen för programvaruteknik
Programvaruteknik

Sök vidare utanför DiVA

GoogleGoogle Scholar
Totalt: 2325 nedladdningar
Antalet nedladdningar är summan av nedladdningar för alla fulltexter. Det kan inkludera t.ex tidigare versioner som nu inte längre är tillgängliga.

urn-nbn

Altmetricpoäng

urn-nbn
Totalt: 676 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf