Email Classification with Machine Learning and Word Embeddings for Improved Customer Support
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
2018 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Classifying emails into distinct labels can have a great impact on customer support. By using machine learning to label emails, the system can set up queues that each contain emails of a specific category. This enables support personnel to handle requests more quickly and easily by selecting a queue that matches their expertise.
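
The queueing idea described above can be sketched in a few lines of Python. This is only an illustration, not the system built in the thesis: the function name `build_queues`, the message ids and the labels `billing` and `technical` are all hypothetical, and the labels are assumed to come from some upstream classifier.

```python
from collections import defaultdict


def build_queues(labelled_emails):
    """Group (email_id, label) pairs into one queue (list) per label."""
    queues = defaultdict(list)
    for email_id, label in labelled_emails:
        queues[label].append(email_id)
    # Return a plain dict so that looking up an unknown label raises KeyError.
    return dict(queues)


# Hypothetical ids and labels, for illustration only.
predictions = [
    ("msg-1", "billing"),
    ("msg-2", "technical"),
    ("msg-3", "billing"),
]
queues = build_queues(predictions)
print(queues["billing"])
```

A support agent specialising in billing would then work through `queues["billing"]` only, in the order the emails arrived.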

This study aims to improve on the manually defined rule-based algorithm currently implemented at a large telecom company by using machine learning. The proposed model should achieve a higher F1-score and classification rate. Integrating with, or migrating from, the manually defined rule-based model to a machine learning model should also reduce administrative and maintenance work and make the model more flexible.
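
For readers unfamiliar with the F1-score used as the target metric, the following is a minimal sketch of how it can be computed for a multi-label-set classifier. The abstract does not state which averaging scheme the authors used, so the macro average shown here is an assumption, and the function names and example labels are hypothetical.

```python
def f1_per_label(y_true, y_pred):
    """Return a dict mapping each label to its F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1-scores (assumed averaging)."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical gold labels and predictions, for illustration only.
y_true = ["billing", "billing", "technical", "technical"]
y_pred = ["billing", "technical", "technical", "technical"]
print(f1_per_label(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro averaging weights every label equally, which matters when some email categories are much rarer than others; a micro or weighted average would give different numbers on imbalanced data.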

Using the frameworks TensorFlow, Scikit-learn and Gensim, the authors conduct five experiments to test the performance of several common machine learning algorithms, text representations and word embeddings, and how they work together.

In this work, a web-based interface was implemented that classifies emails into 33 different labels with an F1-score of 0.91 using a Long Short Term Memory network.

The authors conclude that Long Short Term Memory networks outperform non-sequential models such as Support Vector Machines and AdaBoost when predicting labels for emails.

Place, publisher, year, edition, pages
2018.
Keywords [en]
Email Classification, Machine Learning, Long Short Term Memory, Natural Language Processing
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-15946
OAI: oai:DiVA.org:bth-15946
DiVA, id: diva2:1189491
External cooperation
Telenor
Subject / course
Degree Project in Master of Science in Engineering 30.0
Educational program
DVACD Master of Science in Computer Security
Supervisors
Examiners
Available from: 2018-03-12. Created: 2018-03-11. Last updated: 2022-05-12. Bibliographically approved.

Open Access in DiVA

fulltext (1302 kB), 16184 downloads
File information
File name: FULLTEXT01.pdf
File size: 1302 kB
Checksum (SHA-512):
b0464368f817bf30ce90e4242db93017d09640ddd95e49ad2518324db4ec0d54b52880d7958c54df264c7410939c417d6bed49aedca845eb7768ad6750f1822b
Type: fulltext
Mimetype: application/pdf
