Usage of Generative AI Based Plugin in Unit Testing: Evaluating the Trustworthiness of Generated Test Cases by Codiumate, an IDE Plugin Powered by GPT-3.5 & 4
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. (16)
2024 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Background: Unit testing is essential in software development, ensuring the functionality of individual components like functions and classes. However, manual creation of unit test cases is time-consuming and tedious, impacting testing efficiency and reliability.

Problem: Automated unit test generation tools such as EvoSuite and Randoop have addressed some challenges, but they are limited by language specificity and predefined algorithms. Generative AI tools like ChatGPT and GitHub Copilot, powered by OpenAI's GPT-3.5/4, offer alternatives but face limitations such as reliance on user input and operational inconveniences.

Solution: CodiumAI's Codiumate IDE plugin aims to mitigate these limitations, making code quality assurance easier for developers. This study evaluates Codiumate's trustworthiness in generating unit tests for Python functions.

Method: We randomly selected thirty functions from OpenAI's HumanEval dataset and wrote selection criteria for relevant test cases based on each function's docstring, then evaluated Codiumate's trustworthiness using metrics such as Relevance Score, false positive rate, and result consistency rate.
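The three metrics named above can be sketched as simple ratios over the set of generated tests. This is a hedged illustration only, not the thesis's actual code: the `GeneratedTest` fields and the exact metric formulas are assumptions based on the abstract (e.g., that consistency means identical pass/fail results across all five executions).

```python
# Hypothetical sketch of the evaluation metrics described in the Method
# section. Field names and formulas are assumptions, not the authors' code.
from dataclasses import dataclass

@dataclass
class GeneratedTest:
    relevant: bool        # meets the docstring-based selection criteria
    false_positive: bool  # wrongly flags correct code as failing
    passed_runs: int      # number of passing runs out of total_runs
    total_runs: int = 5   # the abstract mentions five test executions

def relevance_score(tests):
    """Share of suggested test cases judged relevant."""
    return sum(t.relevant for t in tests) / len(tests)

def false_positive_rate(tests):
    """Share of relevant tests that wrongly fail correct code."""
    relevant = [t for t in tests if t.relevant]
    return sum(t.false_positive for t in relevant) / len(relevant)

def consistency_rate(tests):
    """Share of tests with the same result in every execution."""
    consistent = sum(t.passed_runs in (0, t.total_runs) for t in tests)
    return consistent / len(tests)

# Toy data to show the shape of the computation
tests = [
    GeneratedTest(relevant=True, false_positive=False, passed_runs=5),
    GeneratedTest(relevant=True, false_positive=True, passed_runs=5),
    GeneratedTest(relevant=False, false_positive=False, passed_runs=3),
    GeneratedTest(relevant=True, false_positive=False, passed_runs=0),
]
print(round(relevance_score(tests), 2))      # 0.75
print(round(false_positive_rate(tests), 2))  # 0.33
print(round(consistency_rate(tests), 2))     # 0.75
```

In this reading, the reported 48% Relevance Score would be the first ratio computed over all 433-odd suggested test cases, and the 15% false positive rate the second ratio over the 208 relevant ones.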

Result: Of all the test cases suggested by Codiumate, 208 unit tests (48% of the total) were relevant. 70% of the assertions in these test cases strictly met the selection criteria, while the remaining 30%, though relevant, were selected based on our judgment and experience in software testing. The average false positive rate is 15%. Function groups with higher Relevance Scores are those of a non-mathematical nature with simple dependencies. High false positive rates arise in functions with string and float parameters. All generated unit tests are free of syntax errors; 80% passed and 20% failed in all five test executions.

Conclusion: Codiumate demonstrates potential in automating unit test generation, offering a convenient means to support developers. However, it is not yet fully reliable for critical applications without developer oversight. Continued refinement and exploration of its capabilities are essential for Codiumate to become an indispensable asset in unit test generation, enhancing its trustworthiness and effectiveness in the software development process.

Place, publisher, year, edition, pages
2024, p. 32
Keywords [en]
Codiumate, ChatGPT, trustworthiness, unit test, test case generation
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-26473
OAI: oai:DiVA.org:bth-26473
DiVA, id: diva2:1874264
Subject / course
DV1446 Bachelor Thesis in Computer Science
Educational program
PAGPT Software Engineering
Presentation
2024-05-29, Rum J1630, Valhallavägen 10, 371 79, Karlskrona, 13:00 (English)
Supervisors
Examiners
Available from: 2024-06-24. Created: 2024-06-19. Last updated: 2024-06-24. Bibliographically approved.

Open Access in DiVA

fulltext (4955 kB), 269 downloads
File information
File name: FULLTEXT01.pdf
File size: 4955 kB
Checksum: SHA-512
0d2023d552a1a7f9cf89d718aeaf189c94f5ef67d1fabe7286a9122ad775d015d588bf9a264934b517a54f8cdd4e86a55d275e0a01267502d28392e9999c939e
Type: fulltext. Mimetype: application/pdf.

By organisation
Department of Software Engineering
Software Engineering

Total: 269 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 464 hits