Predicting the Helpfulness of Online Product Reviews
2021 (English)Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]
Review helpfulness prediction has attracted growing attention of researchers that proposed various solutions using Machine Learning (ML) techniques. Most of the studies used online reviews from Amazon to predict helpfulness where each review is accompanied with information indicating how many people found the review helpful. This research aims to analyze the complete process of modelling review helpfulness from several perspectives. Experiments are conducted comparing different methods for representing the review text as well as analyzing the importance of data sampling for regression compared to using non-sampled datasets. Additionally, a set of review, review meta-data and product features are evaluated on their ability to capture the helpfulness of reviews. Two Amazon product review datasets are utilized for the experiments and two of the most widely used machine-learning algorithms, Linear Regression and Convolutional Neural Network (CNN). The experiments empirically demonstrate that the choice of representation of the textual data has an impact on performance with tf-idf and word2Vec obtaining the lowest Mean Squared Error (MSE) values. The importance of data sampling is also evident from the experiments as the imbalanced ratios in the unsampled dataset negatively affected the performance of both models with bias predictions in favor of the majority group of high ratios in the dataset. Lastly, the findings suggest that review features such as unigrams of review text and title, length of review text in words, polarity of title along with rating as review meta-data feature are the most influential features for determining helpfulness of reviews.
Place, publisher, year, edition, pages
2021. , p. 27
Keywords [en]
Review helpfulness prediction, product reviews, machine learning, data sampling, regression
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-22144OAI: oai:DiVA.org:bth-22144DiVA, id: diva2:1595730
Subject / course
PA1445 Kandidatkurs i Programvaruteknik
Educational program
PAGPT Software Engineering
Supervisors
Examiners
2021-09-272021-09-202021-09-27Bibliographically approved