Evaluating Large Language Models for User Story Mining in Technology News
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
2025 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Background. Starting requirements elicitation for a project remains a significant challenge in software development; most of the time, organizations rely on structured inputs or direct stakeholder interaction to derive their user stories. Unstructured sources, such as news articles, represent a potentially rich but underutilized resource for identifying emerging user needs and technological trends. Large Language Models (LLMs) offer a new possibility for automating the extraction of requirements artifacts from such articles.

Objective. This research work investigates the effectiveness of using state-of-the-art LLMs to generate user stories from Information Technology (IT) news articles. It further compares the quality and characteristics of the LLM-generated user stories against those authored by human practitioners in the field of requirements engineering and development, with the aim of understanding their potential utility in downstream development activities.

Methodology. We employed a mixed-methods approach. Three prominent LLMs (Grok-3-Preview-02-24, Gemini-2.0-Pro-Exp-02-05, ChatGPT-4o-latest) were prompted using a standardized template created to extract user stories from a selected and filtered set of IT news articles. These outputs were compared with user stories generated independently by human experts working with the same articles. The comparison involved automated quality assessment using the AQUAS framework and readability metrics, alongside a blind evaluation conducted by human experts who rated the stories on criteria that include Decomposition Potential, Clarity and Specificity, Traceability to Source, Innovation, and Overall Development Utility.

Results. Our findings indicate that LLMs can effectively generate a substantial volume of syntactically sound user stories from news articles, often exceeding the quantity produced by human experts. However, model performance varied, with Grok and ChatGPT demonstrating stronger adherence to instructions and higher syntactic quality than Gemini. Significant qualitative differences were also observed: LLM-generated stories tended to be more technically detailed, atomic, and closely tied to the source text's implementation specifics. Human-authored stories, on the other hand, were often more strategic, contextual, and occasionally combined related needs. In the blind human expert evaluation, LLM-generated user stories were consistently rated significantly higher than the human-authored ones across all assessed dimensions, suggesting a high perceived value for subsequent development tasks.

Conclusion. We conclude that LLMs demonstrate considerable potential as tools to augment requirements elicitation by rapidly generating detailed candidate user stories from unstructured text like IT news articles. While careful model selection and human oversight are crucial for validation, refinement, and contextualization, the structural and detailed nature of LLM outputs appears highly beneficial for downstream requirements engineering activities. LLMs and human experts offer complementary strengths, which suggests a hybrid approach may be most effective for leveraging this technology in practice.

Place, publisher, year, edition, pages
2025. p. 65
Keywords [en]
LLM, Technology News, Requirement Mining, Requirement Elicitation, User Story, Comparative Analysis
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:bth-28200
OAI: oai:DiVA.org:bth-28200
DiVA, id: diva2:1977131
Subject / course
PA2534 Master's Thesis (120 credits) in Software Engineering
Educational program
PAADA Master Qualification Plan in Software Engineering 120,0 hp
Presentation
2025-05-28, J1650, Valhallavägen 1, Karlskrona, 16:53 (English)
Supervisors
Examiners
Available from: 2025-06-30 Created: 2025-06-25 Last updated: 2025-09-30 Bibliographically approved

Open Access in DiVA

fulltext (1975 kB), 214 downloads
File information
File name FULLTEXT01.pdf
File size 1975 kB
Checksum SHA-512
f266f19311d14f38326e69c5d6fb8f9ebdd51b740dbde68ef299bd80eff0dc5c519266766d1c4e6ccdf27f6074e9b14d4aa8253046a8b205cf87a4331acd6ac2
Type fulltext
Mimetype application/pdf

Total: 215 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
Total: 250 hits