Exploring the use of LLMs for the selection phase in systematic literature studies
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0002-8674-657X
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0003-3177-6138
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0003-0619-6027
2025 (English). In: Information and Software Technology, ISSN 0950-5849, E-ISSN 1873-6025, Vol. 184, article id 107757. Article in journal (Refereed). Published.
Abstract [en]

Context: Systematic literature studies, such as secondary studies, are crucial to aggregate evidence. An essential part of these studies is the selection of relevant studies. This, however, is time-consuming, resource-intensive, and error-prone, as it highly depends on manual labor and domain expertise. The increasing popularity of Large Language Models (LLMs) raises the question of to what extent these manual study selection tasks could be supported in an automated manner.

Objectives: In this manuscript, we report on our effort to explore and evaluate the use of state-of-the-art LLMs to automate the selection phase in systematic literature studies.

Method: We evaluated LLMs for the selection phase using two published systematic literature studies in software engineering as ground truth. Three prompts were designed and applied across five LLMs to the studies’ titles and abstracts, based on their inclusion and exclusion criteria. Additionally, we analyzed combining two LLMs to replicate a practical selection phase. We analyzed recall and precision, and reflected upon the accuracy of the LLMs and upon whether the ground-truth studies were conducted by early-career scholars or by more experienced ones.
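The evaluation described above can be illustrated with a minimal sketch (not the authors' code): given ground-truth include/exclude labels and an LLM's per-study decisions, recall and precision follow from the true-positive, false-negative, and false-positive counts. The study identifiers and example data here are hypothetical.

```python
def recall_precision(ground_truth: dict[str, bool],
                     llm_decision: dict[str, bool]) -> tuple[float, float]:
    """Recall and precision of LLM include/exclude decisions vs. ground truth."""
    tp = sum(1 for s, inc in ground_truth.items() if inc and llm_decision[s])
    fn = sum(1 for s, inc in ground_truth.items() if inc and not llm_decision[s])
    fp = sum(1 for s, inc in ground_truth.items() if not inc and llm_decision[s])
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Hypothetical example: four candidate studies, two truly relevant.
ground_truth = {"s1": True, "s2": True, "s3": False, "s4": False}
llm_decision = {"s1": True, "s2": False, "s3": True, "s4": False}
print(recall_precision(ground_truth, llm_decision))  # (0.5, 0.5)
```

In a selection phase, recall matters most: a missed relevant study (false negative) cannot be recovered later, whereas false positives only cost extra screening effort.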

Results: Our results show a high average recall of up to 98%, combined with a precision of 27%, in a single-LLM approach, and an average recall of 99% with a precision of 27% in a two-model approach replicating a two-reviewer procedure. Further, the Llama 2 models showed the highest average recall (98%) across all prompt templates and datasets, while GPT4-turbo had the highest average precision (72%).

Conclusions: Our results demonstrate how LLMs could support a selection phase in the future. We recommend a two-LLM approach to achieve a higher recall. However, we also critically reflect that further studies, using other models and prompts on more datasets, are required to strengthen the confidence in our presented approach. © 2025 The Authors
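The recommended two-LLM approach can be sketched as follows. This is an assumption about the combination rule, not necessarily the authors' exact procedure: an OR-combination (a study advances if either model votes to include it) is the natural rule consistent with the reported recall gain of the two-model setup over a single model.

```python
def combine_two_models(votes_a: dict[str, bool],
                       votes_b: dict[str, bool]) -> dict[str, bool]:
    """Include a study if EITHER model includes it (recall-favoring OR rule)."""
    return {s: votes_a[s] or votes_b[s] for s in votes_a}

# Hypothetical votes from two different LLMs on three candidate studies.
votes_a = {"s1": True, "s2": False, "s3": False}
votes_b = {"s1": False, "s2": True, "s3": False}
print(combine_two_models(votes_a, votes_b))  # {'s1': True, 's2': True, 's3': False}
```

The OR rule mirrors an inclusive two-reviewer procedure: a disagreement defaults to inclusion, trading precision for recall, which is the safer direction for study selection.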

Place, publisher, year, edition, pages
Elsevier, 2025. Vol. 184, article id 107757
Keywords [en]
Automation, Large language models, Systematic literature studies
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-27884
DOI: 10.1016/j.infsof.2025.107757
ISI: 001491965200001
Scopus ID: 2-s2.0-105004904751
OAI: oai:DiVA.org:bth-27884
DiVA, id: diva2:1960537
Part of project
SERT – Software Engineering ReThought, Knowledge Foundation
GIST – Gaining actionable Insights from Software Testing, Knowledge Foundation
Funder
ELLIIT – The Linköping‐Lund Initiative on IT and Mobile Communications
Knowledge Foundation, 20180010
Knowledge Foundation, 20220235
Available from: 2025-05-23. Created: 2025-05-23. Last updated: 2025-06-02. Bibliographically approved.

Open Access in DiVA

fulltext (1067 kB), 28 downloads
File name: FULLTEXT01.pdf
File size: 1067 kB
Checksum (SHA-512): 8afe8260bd10c36cbd88cb92180d1b2d2ecdd7dd8c7dbd12789913d9ab448f6f2a3882d2f168b3f046c79f860c338bde044205c100a6ee71c975391db3920d33
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text · Scopus

Authority records

Thode, Lukas; Iftikhar, Umar; Mendez, Daniel

