Sentiment Analysis Of IMDB Movie Reviews: A comparative study of Lexicon based approach and BERT Neural Network model
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
2023 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Background: Movies have become an important marketing and advertising tool that can influence consumer behaviour and trends. Reading film reviews is an important part of watching a movie: reviews help viewers gain a general understanding of the film and give filmmakers feedback on how their work is being received. Sentiment analysis is a method of determining whether a review expresses positive or negative sentiment, and this study investigates a machine learning method for classifying sentiment in film reviews.

Objectives: This thesis aims to perform a comparative sentiment analysis of textual IMDb movie reviews using a lexicon-based approach and a BERT neural network model. Performance evaluation metrics are then used to identify the most effective learning model.

Methods: This thesis employs a quantitative research technique, with data analysed using traditional machine learning. The labelled data set of movie reviews comes from the Kaggle website (https://www.kaggle.com/datasets). The lexicon-based approach and the BERT neural network are trained on the chosen IMDb movie reviews data set. To discover which model performs best at predicting sentiment, the constructed models are assessed on the test set using accuracy, precision, recall and F1 score.
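The four evaluation metrics named above follow from the counts of a binary confusion matrix. The sketch below is illustrative only and is not taken from the thesis: the function name `classification_metrics`, the label encoding (1 = positive, 0 = negative) and the toy labels are all assumptions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task.

    Assumption: labels are encoded so that `positive` marks a positive review.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for illustration (not thesis data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Scores reported for a whole model may be averaged across both classes, so they need not equal the single-class values computed this way.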

Results: In the experiments conducted, the BERT neural network model is the most efficient algorithm for classifying the IMDb movie reviews into positive and negative sentiment. It achieved the highest accuracy over the trained data set, 90.67%, followed by the BoW model with 79.15% and the TF-IDF model with 78.98%. The BERT model also has the best precision and recall, 0.88 and 0.92 respectively. The BoW model has a precision and a recall of 0.79, and the TF-IDF model has a precision of 0.79 and a recall of 0.78. The BERT model has the highest F1 score, 0.88, followed by the BoW model with 0.79 and the TF-IDF model with 0.78.
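The BoW and TF-IDF models named in the results rest on two standard text representations. The sketch below shows the textbook forms of both, assuming whitespace tokenisation and an unsmoothed logarithmic IDF. The abstract does not state the thesis's actual tokenisation, weighting variant or classifier, so every function name and parameter here is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenisation, no stop-word removal.
    return text.lower().split()


def bag_of_words(docs):
    """Return a sorted vocabulary and one raw term-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight each count by term frequency times log(N / document frequency)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = [sum(1 for row in counts if row[j] > 0)
                for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = []
    for row in counts:
        total = sum(row) or 1  # avoid division by zero for empty documents
        weighted.append([(c / total) * idf[j] for j, c in enumerate(row)])
    return vocab, weighted


# Toy reviews, invented for illustration (not thesis data).
docs = ["great movie great cast", "boring movie"]
vocab, bow = bag_of_words(docs)
_, tfidf = tf_idf(docs)
print(vocab)
print(bow)
print(tfidf)
```

A term that occurs in every document, such as "movie" here, receives a TF-IDF weight of zero, whereas BoW keeps its raw count.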

Conclusions: Of the two approaches evaluated, the lexicon-based approach and the BERT transformer neural network, the BERT neural network is the more efficient, with good scores on the measured performance criteria.

Place, publisher, year, edition, pages
2023. p. 60
Keywords [en]
Bag of Words (BoW), Deep Learning, IMDb Movie Reviews, Machine Learning, Natural Language Processing (NLP), Sentiment Analysis, Term Frequency-Inverse Document Frequency (TF-IDF)
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-25144
OAI: oai:DiVA.org:bth-25144
DiVA, id: diva2:1779708
Subject / course
DV1478 Bachelor Thesis in Computer Science
Educational program
DVGDT Bachelor Qualification Plan in Computer Science 60.0 hp
Presentation
2023-05-26, J1640, Campus Gräsvik, Karlskrona, 10:30 (English)
Supervisors
Examiners
Available from: 2023-07-05 Created: 2023-07-04 Last updated: 2023-07-05 Bibliographically approved

Open Access in DiVA

Sentiment Analysis Of IMDB Movie Reviews - A comparative study of Lexicon based approach and BERT Neural Network model (956 kB), 3865 downloads
File information
File name: FULLTEXT02.pdf
File size: 956 kB
Checksum (SHA-512): 700f35bb69302ca320220bac531cf31a979264c47784b190ed2be2c94f082e6437923d287d4c8ef2f8adae35c8e7235c223ba86da91b5910f6f7d7ce8480b647
Type: fulltext
Mimetype: application/pdf

Total: 3865 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 1895 hits