Endre søk
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
The crucial parts of text classification with TensorFlow.js and categorisation of news articles
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för programvaruteknik.
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för programvaruteknik.
2020 (engelsk)Independent thesis Basic level (degree of Bachelor), 10 poäng / 15 hpOppgave
Abstract [en]

Text classification is a subset of machine learning which is used to classify texts such as tweets, email, news headlines or articles, with tags or categories. As news publishing can have uncertainty in their categorisations, text classification could categorise articles autonomously and distinguish unclear categorisations. The library TensorFlow helps with operations and tools for the machine learning workflow. 

This paper takes focus on the crucial parts of working with machine learning using TensorFlow.js and to what extent this model can categorise a news article. The authors evaluates different models to analyse how optimising the settings will affect the accuracy of the model.

Results of this paper was researched with a literature study of official documentations and peer reviewed reports. An empirical experiment where machine learning models were trained in TensorFlow.js was also performed.

The results showed that the model with the highest accuracy with 87.17% accuracy was trained with 1000 articles using Relu and Softmax activation functions and the Mean squared error loss function. While the model with lowest accuracy had 75.5% using Sigmoid activation functions and Categorical cross-entropy on the 5000 articles training set. Crucial parts for this development were: optimizer function, loss function, batch size, activation functions, training data and test data with labels, normalise function, shapes of layers and computing power.

There are several parts and functions to take in consideration when developing a machine learning model with text classification in TensorFlow.js. The training process needs to be performed multiple times as there are many parameters which has an affect on the model results. The model results can be improved by optimising and finding the best combination between different functions and parameters.

sted, utgiver, år, opplag, sider
2020. , s. 29
Emneord [en]
TensorFlow.js, Machine learning, Text classification, JavaScript
HSV kategori
Identifikatorer
URN: urn:nbn:se:bth-19813OAI: oai:DiVA.org:bth-19813DiVA, id: diva2:1442910
Fag / kurs
PA1445 Kandidatkurs i Programvaruteknik
Utdanningsprogram
PAGWE Web Programming
Veileder
Examiner
Tilgjengelig fra: 2020-06-21 Laget: 2020-06-17 Sist oppdatert: 2020-06-21bibliografisk kontrollert

Open Access i DiVA

The crucial parts of text classification with TensorFlow.js and categorisation of news articles(2980 kB)2325 nedlastinger
Filinformasjon
Fil FULLTEXT01.pdfFilstørrelse 2980 kBChecksum SHA-512
1cd73d75238baa798b9f40caf455c382a289f1848fc9956ddce7029ff5e8b315240882fc2645caf076820965fe9cfd713deec1722372796d69126af38e1deaa0
Type fulltextMimetype application/pdf

Av organisasjonen

Søk utenfor DiVA

GoogleGoogle Scholar
Totalt: 2325 nedlastinger
Antall nedlastinger er summen av alle nedlastinger av alle fulltekster. Det kan for eksempel være tidligere versjoner som er ikke lenger tilgjengelige

urn-nbn

Altmetric

urn-nbn
Totalt: 678 treff
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf