Automated Detection of Fake News in Natural Language Processing: A Comparative Study of TF-IDF and Lexical-Based Stance Detection with Logistic Regression
2024 (English)
Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]
The proliferation of fake news in the digital age has become a critical concern, calling for effective countermeasures that uphold information integrity and promote media literacy. This thesis addresses the problem of detecting fake news with natural language processing, aiming to contribute to the ongoing effort against misinformation. We present a comparative analysis of two methods for fake news detection: the TF-IDF model and lexical-based stance detection with logistic regression.
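For reference, a common textbook form of the TF-IDF weight of a term t in a document d, for a corpus of N documents, is shown below. Here tf(t, d) is the number of times t occurs in d and df(t) is the number of documents that contain t. The abstract does not say which variant the thesis uses (raw or log-scaled term frequency, IDF smoothing, vector normalization), so this form is an assumption for illustration.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d)\,\log\frac{N}{\mathrm{df}(t)}

% Worked example (natural log, hypothetical counts): N = 4 articles,
% "hoax" occurs twice in d and appears in only one article of the corpus.
\mathrm{tfidf}(\text{hoax}, d) = 2 \cdot \ln\frac{4}{1} = 2\ln 4 \approx 2.77
```

Under this form, a word that occurs in every article gets weight log 1 = 0, which is how TF-IDF down-weights common, uninformative terms relative to plain bag-of-words counts.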
Our methodology uses a dataset obtained from Kaggle that contains labeled news articles, each categorized by stance as agree, disagree, discuss, or unrelated. We preprocess the data to reduce noise and inconsistencies, and extract features using bag-of-words (BoW) and n-gram modeling. In addition, we apply a lexical-based approach that uses sentiment lexicons and dictionaries to determine the stance of a text toward a specific topic or claim.
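The feature extraction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's code: the tokenizer, the unigram-plus-bigram range, and the tiny POSITIVE and NEGATIVE word sets are hypothetical placeholders standing in for the sentiment lexicons and dictionaries the thesis uses, and the claim-overlap feature is an added assumption.

```python
import re
from collections import Counter

# Hypothetical placeholder lexicons. The thesis uses sentiment lexicons and
# dictionaries whose contents are not given here; these tiny sets only
# illustrate the mechanics.
POSITIVE = {"confirm", "confirmed", "true", "support", "supports"}
NEGATIVE = {"false", "hoax", "fake", "deny", "denies", "debunked"}


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, n_min=1, n_max=2):
    """Bag-of-n-grams: count each contiguous n-gram for n_min <= n <= n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def lexical_stance_features(claim, text):
    """Hypothetical lexical features describing how `text` relates to `claim`."""
    claim_tokens = set(tokenize(claim))
    text_tokens = tokenize(text)
    n = max(len(text_tokens), 1)  # guard against empty text
    return {
        # Share of text tokens found in each placeholder lexicon.
        "positive_ratio": sum(t in POSITIVE for t in text_tokens) / n,
        "negative_ratio": sum(t in NEGATIVE for t in text_tokens) / n,
        # Share of claim words that reappear in the text (crude relatedness).
        "claim_overlap": len(claim_tokens & set(text_tokens)) / max(len(claim_tokens), 1),
    }


if __name__ == "__main__":
    text = "Officials say the viral photo is a hoax"
    claim = "Viral photo shows shark on highway"
    print(ngram_counts(tokenize(text))["viral photo"])  # 1
    print(lexical_stance_features(claim, text))
    # {'positive_ratio': 0.0, 'negative_ratio': 0.125, 'claim_overlap': 0.3333333333333333}
```

The resulting dictionaries could be merged into one feature vector per article. The overlap score is one simple way to help separate unrelated pairs from the other three stances, but it is not taken from the thesis.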
We then train a logistic regression classifier to predict the stance of a text from the extracted lexical features, and evaluate its performance using accuracy, precision, recall, and F1-score. By comparing this approach with the TF-IDF model, we aim to clarify the strengths and limitations of each method for classifying news articles and sentences.
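As a rough sketch of the classification and evaluation step, the code below trains a binary logistic regression by batch gradient descent on the log loss and then computes accuracy, precision, recall, and F1-score. It is an illustration under assumptions, not the thesis's implementation: the learning rate, epoch count, 0.5 decision threshold, toy feature vectors, and binary labels (1 = fake, 0 = real) are hypothetical choices. The four stance labels would need a multinomial (softmax) or one-vs-rest extension, and in practice a library implementation would normally be used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the mean log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # For one example, the log-loss gradient is (p - y) * x for w and (p - y) for b.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, X, threshold=0.5):
    """Label 1 when the predicted probability reaches the threshold, else 0."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


def evaluate(y_true, y_pred):
    """Accuracy, plus precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy, hypothetical two-dimensional feature vectors and labels.
    X = [[0.0, 0.9], [0.1, 0.8], [0.9, 0.1], [0.8, 0.0]]
    y = [0, 0, 1, 1]
    w, b = train_logistic_regression(X, y)
    # Training-set scores only; a real evaluation would use held-out data.
    print(evaluate(y, predict(w, b, X)))
    # {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
```

Plain gradient descent keeps the example dependency-free; the same feature vectors could be fed either from lexical features or from TF-IDF weights, which is what makes a side-by-side comparison of the two representations possible.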
Overall, our research emphasizes the need for reliable fake news detection systems that help keep information online accurate and reliable. By evaluating and comparing different approaches, we aim to contribute to robust methods that combat misinformation, promote public safety, optimize resources, and restore trust in online platforms. This thesis is a step toward addressing the challenges posed by fake news and giving individuals access to accurate and trustworthy information.
Place, publisher, year, edition, pages
2024, p. 60
Keywords [en]
Fake news detection; Machine learning; Feature extraction; Logistic regression; Stance detection
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:bth-26700
OAI: oai:DiVA.org:bth-26700
DiVA, id: diva2:1883087
Subject / course
DV1478 Bachelor Thesis in Computer Science
Educational program
DVGDT Bachelor Qualification Plan in Computer Science 60.0 hp
Presentation
2024-05-22, 10:15 (English)
2024-08-06, 2024-07-08, 2024-08-06
Bibliographically approved