Language Models to Support Multi-Label Classification of Industrial Data
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0001-8142-9631
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0003-4118-0952
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0003-3567-9300
University College Dublin, Ireland.
2025 (English). In: Proceedings - 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 45-55. Conference paper, Published paper (Refereed)
Abstract [en]

Background:

Multi-label requirements classification is an inherently challenging task, especially when dealing with numerous classes at varying levels of abstraction. The task becomes even more difficult when a limited number of requirements is available to train a supervised classifier. Zero-shot learning does not require training data and can potentially address this problem.

Objective:

This paper investigates the performance of zero-shot classifiers on a multi-label industrial dataset. The study focuses on classifying requirements according to a hierarchical taxonomy designed to support requirements tracing.
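
As an illustration only, one common way to build an embedding-based zero-shot multi-label classifier is to embed the requirement and each taxonomy label, then assign every label whose similarity exceeds a threshold. The sketch below assumes this approach; it is not taken from the paper. The function names, the toy vectors, the label names, and the 0.5 threshold are all hypothetical, and in practice the vectors would come from a language model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def zero_shot_labels(requirement_vec, label_vecs, threshold=0.5):
    """Return every label whose embedding is similar enough to the requirement.

    No training data is used: the decision relies only on the embeddings.
    The threshold value is an assumption chosen for this example.
    """
    scores = {label: cosine(requirement_vec, vec) for label, vec in label_vecs.items()}
    return sorted(label for label, score in scores.items() if score >= threshold)

# Toy, hand-made vectors standing in for language-model embeddings (hypothetical).
label_vecs = {
    "door": [1.0, 0.0, 0.0],
    "lock": [0.8, 0.6, 0.0],
    "road": [0.0, 0.0, 1.0],
}
requirement_vec = [1.0, 0.2, 0.0]

predicted = zero_shot_labels(requirement_vec, label_vecs)
print(predicted)
```

Because several labels can pass the threshold at once, the same procedure naturally yields multi-label output.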

Method:

We compare multiple variants of zero-shot classifiers using different embeddings, including 9 language models (LMs) with a reduced number of parameters (up to 3B), e.g., BERT, and 5 large LMs (LLMs) with a large number of parameters (up to 70B), e.g., Llama. Our ground truth includes 377 requirements and 1968 labels from 6 output spaces. For the evaluation, we adopt traditional metrics, i.e., precision, recall, $F_1$, and $F_\beta$, as well as a novel label distance metric $D_n$. The $D_n$ metric aims to better capture the classification's hierarchical nature and to provide a more nuanced evaluation of how far the results are from the ground truth.
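
For readers unfamiliar with the traditional metrics named above, the following minimal sketch computes precision, recall, and $F_\beta$ for a single requirement from its true and predicted label sets, using the standard definition $F_\beta = (1+\beta^2)\,\frac{P \cdot R}{\beta^2 P + R}$. The function name, the example labels, and the choice of $\beta = 2$ are assumptions for illustration; the abstract does not state the value of $\beta$ used, and the $D_n$ metric is not defined here, so it is not sketched.

```python
def precision_recall_fbeta(true_labels, predicted_labels, beta=1.0):
    """Set-based precision, recall, and F-beta for one multi-label prediction."""
    true_set, pred_set = set(true_labels), set(predicted_labels)
    true_positives = len(true_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    beta_sq = beta * beta
    f_beta = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
    return precision, recall, f_beta

# Hypothetical labels: 2 of 3 predictions are correct, 2 of 4 true labels are found.
true_labels = {"a", "b", "c", "d"}
predicted_labels = {"a", "b", "e"}

p, r, f1 = precision_recall_fbeta(true_labels, predicted_labels, beta=1.0)
_, _, f2 = precision_recall_fbeta(true_labels, predicted_labels, beta=2.0)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f} F2={f2:.3f}")
```

With $\beta > 1$ recall is weighted more heavily than precision, which is why $F_2$ is lower than $F_1$ in this example, where recall is the weaker of the two scores.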

Results:

1) The top-performing model on 5 out of 6 output spaces is T5-xl, with maximum $F_\beta = 0.78$ and $D_n = 0.04$, while BERT base outperformed the other models in one case, with maximum $F_\beta = 0.83$ and $D_n = 0.04$. 2) LMs with fewer parameters produce better classification results than LLMs. Thus, addressing the problem in practice is feasible, as only limited computing power is needed. 3) The model architecture (autoencoding, autoregression, and sentence-to-sentence) significantly affects the classifier's performance.

Contribution:

We conclude that using zero-shot learning for multi-label requirements classification offers promising results. We also present a novel metric that can be used to select the top-performing model for this problem.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025. p. 45-55
Series
Proceedings of the ... European Conference on Software Maintenance and Reengineering, ISSN 1534-5351
Keywords [en]
multi-label, requirements classification, taxonomy, language models
National Category
Natural Language Processing; Software Engineering
Research subject
Software Engineering
Identifiers
URN: urn:nbn:se:bth-27813
DOI: 10.1109/SANER64311.2025.00013
ISI: 001506888600005
Scopus ID: 2-s2.0-105007293644
ISBN: 9798331535100 (print)
OAI: oai:DiVA.org:bth-27813
DiVA, id: diva2:1957172
Conference
32nd IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025, Montreal, Mar 4-7, 2025
Part of project
SERT - Software Engineering ReThought, Knowledge Foundation
Funder
Knowledge Foundation, 20180010
Available from: 2025-05-08 Created: 2025-05-08 Last updated: 2025-09-30 Bibliographically approved
In thesis
1. Taxonomic Trace Links in Requirements Engineering
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Background: Software engineering is a knowledge-intensive activity that requires engineers to manage information to collaborate efficiently and effectively. Within Software Engineering, the Requirements Engineering process bridges the knowledge gap between the customer and the development team by eliciting, managing, and communicating product requirements. The traceability of these requirements supports developers in producing higher-quality software that aligns with customer needs. In addition, traceability supports other activities, such as change impact analysis, software quality assurance, and requirements-based verification.

Problem: Despite decades of research on traceability, practical challenges still hinder the adoption of traceability in practice. This signals a need for new ways of practicing traceability that fit real-world needs. 

Goal: Building on previous work, this thesis instantiates, develops, and empirically evaluates Taxonomic Trace Links, a new way to trace requirements to various software artifacts through domain knowledge captured in a taxonomy. 

Method: The studies included in this thesis follow mixed research methods: case studies, systematic mapping studies, a validation study, controlled experiments, and focus groups.

Results: The current state of practice in customer-supplier communication shows persistent challenges that we mapped to solutions in the literature. Our literature study shows that traceability through domain-specific taxonomies has not been empirically evaluated. Our development and evaluation of the technical solution for taxonomic trace links show that semi-automation of trace link creation and maintenance is possible. Finally, our empirical evaluation of taxonomic trace links shows that the solution is feasible in practice and can create trace links for multiple purposes.

Conclusion: Traceability between software artifacts has more benefits than currently realized by practitioners. However, current traceability solutions, based on direct trace links, do not appear to be easily adapted in different scenarios to trace different artifacts. Taxonomic trace links are an alternative approach that could overcome the shortcomings of direct trace links. 

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2025. p. 187
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 2025:08
Keywords
requirements, traceability, domain-knowledge, taxonomy
National Category
Software Engineering
Research subject
Software Engineering
Identifiers
urn:nbn:se:bth-28451 (URN)
978-91-7295-504-2 (ISBN)
Public defence
2025-10-07, C413A, Karlskrona, 13:00 (English)
Available from: 2025-08-07 Created: 2025-08-07 Last updated: 2025-09-30 Bibliographically approved

Open Access in DiVA

fulltext (1547 kB), 54 downloads
File information
File name: FULLTEXT01.pdf
File size: 1547 kB
Checksum (SHA-512):
abb9d003b239e5b1c78fddf94575b104ad965f8c7df9ba3f3b02c591f02f8b53fc6b6e906806b6393372fa483c6b4513868896a398f6c9d830b314d9051ecc40
Type: fulltext
Mimetype: application/pdf

Authority records

Abdeen, Waleed; Unterkalmsteiner, Michael; Wnuk, Krzysztof
