E-mail classification with machine learning and word embeddings for improved customer support
2021 (English) In: Neural Computing & Applications, ISSN 0941-0643, E-ISSN 1433-3058, Vol. 33, no 6, p. 1881-1902
Article in journal (Refereed) Published
Abstract [en]
Classifying e-mails into distinct labels can have a great impact on customer support. By using machine learning to label e-mails, the system can set up queues containing e-mails of a specific category. This enables support personnel to handle requests more quickly and easily by selecting a queue that matches their expertise. This study aims to improve a manually defined rule-based algorithm, currently implemented at a large telecom company, by using machine learning. The proposed model should have a higher F1-score and classification rate. Integrating or migrating from a manually defined rule-based model to a machine learning model should also reduce the administrative and maintenance work and make the model more flexible. Using the frameworks TensorFlow, Scikit-learn, and Gensim, the authors conduct a number of experiments to test the performance of several common machine learning algorithms, text representations, and word embeddings, and to investigate how they work together. A long short-term memory (LSTM) network showed the best classification performance, with an F1-score of 0.91. The authors conclude that long short-term memory networks outperform non-sequential models such as support vector machines and AdaBoost when predicting labels for e-mails. Further, the study also presents a Web-based interface that was implemented around the LSTM network, which can classify e-mails into 33 different labels. © 2020, The Author(s).
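As a minimal illustration of two ideas in the abstract, grouping e-mails into per-label queues and scoring a classifier by F1, the following Python sketch may help. It is not the authors' implementation: the function names are hypothetical, and macro-averaging is an assumption, since the abstract does not state how the reported F1-score of 0.91 was averaged.

```python
from collections import defaultdict


def build_queues(emails, predicted_labels):
    """Group e-mails into queues keyed by their predicted label."""
    queues = defaultdict(list)
    for email, label in zip(emails, predicted_labels):
        queues[label].append(email)
    return dict(queues)


def f1_per_label(y_true, y_pred):
    """Return {label: F1} from parallel lists of true and predicted labels."""
    tp = defaultdict(int)  # true positives per label
    fp = defaultdict(int)  # false positives per label
    fn = defaultdict(int)  # false negatives per label
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores (assumed averaging scheme)."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

In a setup like the one described, `build_queues` would be fed the labels predicted by the trained model, and `macro_f1` would be evaluated on a held-out set of manually labeled e-mails.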
Place, publisher, year, edition, pages
Springer, 2021. Vol. 33, no 6, p. 1881-1902
Keywords [en]
E-mail classification, Long short-term memory, Machine learning, Natural language processing, Adaptive boosting, Brain, Electronic mail, Embeddings, Learning systems, Multimedia systems, Support vector machines, Classification performance, Classification rates, Email classification, Machine learning models, Rule based algorithms, Rule-based models, Text representation, Web-based interface
National Category
Language Technology (Computational Linguistics); Computer Sciences
Identifiers
URN: urn:nbn:se:bth-20122
DOI: 10.1007/s00521-020-05058-4
ISI: 000541326700002
Scopus ID: 2-s2.0-85086707161
OAI: oai:DiVA.org:bth-20122
DiVA, id: diva2:1452273
Note
Open access
2020-07-06 2020-07-06 2022-05-04 Bibliographically approved