Automated Detection of Fake News in Natural Language Processing: A Comparative Study of TF-IDF and Lexical-Based Stance Detection with Logistic Regression
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
2024 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

The proliferation of fake news in the digital age has become a critical concern, necessitating effective countermeasures to uphold information integrity and promote media literacy. This thesis addresses the detection of fake news with natural language processing, aiming to contribute to the ongoing effort against misinformation. We compare two methods for fake news detection: the TF-IDF model and Lexical-Based Stance Detection with Logistic Regression.

Our research methodology involves acquiring a diverse dataset from Kaggle, encompassing labeled news articles categorized into various stances such as agree, disagree, discuss, or unrelated. We meticulously preprocess the data, employing techniques like bag-of-words (BoW) and n-gram modeling to extract relevant features while mitigating noise and inconsistencies. Furthermore, we leverage a lexical-based approach to discern the stance of the text regarding specific topics or claims, utilizing sentiment lexicons and dictionaries.
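The feature extraction described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the thesis's actual pipeline: the two small lexicons (`AGREE_WORDS`, `DISAGREE_WORDS`), the function names, and the example sentence are all hypothetical, since the abstract does not name the lexicons or tools that were used.

```python
from collections import Counter
import re

# Hypothetical toy lexicons; the abstract does not list the ones actually used.
AGREE_WORDS = {"confirms", "true", "supports"}
DISAGREE_WORDS = {"false", "hoax", "denies"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n_max=2):
    """Bag-of-n-grams: count every n-gram of length 1..n_max."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def lexicon_features(tokens):
    """Count how many tokens appear in each stance lexicon."""
    return {
        "agree_hits": sum(t in AGREE_WORDS for t in tokens),
        "disagree_hits": sum(t in DISAGREE_WORDS for t in tokens),
    }


# Example: unigram and bigram counts plus lexicon hits for one sentence.
tokens = tokenize("Report denies the claim, calls the story a hoax")
bow = ngram_counts(tokens)
lex = lexicon_features(tokens)
```

In a setup like this, the n-gram counts and the lexicon hit counts would be combined into one feature vector per text before being passed to a classifier.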

Additionally, we train a logistic regression classifier to predict the stance of a text from the extracted lexical features. We evaluate the classifier using accuracy, precision, recall, and F1-score. By comparing the lexical-based logistic regression approach with the TF-IDF model, we aim to identify the strengths and limitations of each method for classifying news articles and sentences.
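For reference, the four evaluation metrics named above can be computed from predicted and true labels as in the sketch below. It assumes a simplified binary setting with a hypothetical positive label `"fake"`, whereas the thesis dataset uses several stance labels; the function name and the example labels are illustrative only and are not results from the thesis.

```python
def binary_metrics(y_true, y_pred, positive="fake"):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = ["fake", "fake", "real", "real", "fake"]
y_pred = ["fake", "real", "real", "fake", "fake"]
scores = binary_metrics(y_true, y_pred)
```

For a multi-class stance task, one common option is to compute these values per class (treating each stance in turn as the positive label) and then average them.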

Overall, our research emphasizes the urgent need for reliable fake news detection systems and strives to bridge the gap between information accuracy and reliability in the digital landscape. By evaluating and comparing different approaches, we aim to contribute to the development of robust methodologies that combat misinformation, promote public safety, optimize resources, and restore trust in online platforms. This thesis represents a significant step forward in addressing the challenges posed by fake news and empowering individuals with accurate and trustworthy information.

Place, publisher, year, edition, pages
2024. p. 60
Keywords [en]
Fake news detection; Machine learning; Feature extraction; Logistic Regression; Stance Detection
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:bth-26700
OAI: oai:DiVA.org:bth-26700
DiVA, id: diva2:1883087
Subject / course
DV1478 Bachelor Thesis in Computer Science
Educational program
DVGDT Bachelor Qualification Plan in Computer Science 60.0 hp
Presentation
2024-05-22, 10:15 (English)
Supervisors
Examiners
Available from: 2024-08-06. Created: 2024-07-08. Last updated: 2024-08-06. Bibliographically approved.

Open Access in DiVA

Automated Detection of Fake News in Natural Language Processing: A Comparative Study of TF-IDF and Lexical-Based Stance Detection with Logistic Regression (948 kB), 160 downloads
File information
File name: FULLTEXT01.pdf
File size: 948 kB
Checksum (SHA-512): 7c4c9fb3fd2b8e90d51558a1a92e48b3359910f6a122bde69f218a1387e627c16fcbcb74b4c3810bb179b00c7e2e9cc4ede7066b88517bd90269a8ff1366dd25
Type: fulltext
Mimetype: application/pdf
