Context: Test cases and test suites (TCS) are central to software testing. High-quality TCS are essential for boosting practitioners’ confidence in testing. However, the quality of a test suite (a collection of test cases) is not merely the sum of the quality of individual test cases, as suite-level factors must also be considered. Achieving high-quality TCS requires defining relevant quality attributes, establishing appropriate measures for their assessment, and determining their importance within different testing contexts.
Objective: This thesis aims to (1) provide a consolidated view of TCS quality in terms of quality attributes, quality measures, and context information, (2) determine the relative importance of the quality attributes in practice, and (3) develop a reliable approach for assessing a highly prioritized quality attribute identified by practitioners.
Method: We conducted an exploratory study and a tertiary literature review for the first objective, a personal opinion survey for the second, and a comparative experiment with a small-scale evaluation study for the third.
Results: We developed a comprehensive TCS quality model grounded in practitioner insights and existing literature. In the survey, maintainability emerged as a critical quality attribute for which practitioners need further support. A well-known indicator of poor test design that can negatively impact test-case maintainability is the Eager Test smell, which is defined as “when a test method checks several methods of the object to be tested” or “when a test verifies too much functionality.” We found the results of existing detection tools for eager tests to be inconsistent and unreliable. To better support practitioners in assessing test-case maintainability, we proposed a novel, unambiguous definition of the Eager Test smell, developed a heuristic to operationalize it, and implemented a detection tool to automate its identification in practice. Our systematic approach in the tertiary review also yielded insights into constructing and validating automated search results using a quasi-gold standard. We generalized these insights into recommendations for enhancing the current search validation approach.
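As a purely illustrative sketch of the first quoted definition (“when a test method checks several methods of the object to be tested”), the following example contrasts one eager test with focused alternatives. The `Account` class, the test names, and the `run_tests` helper are hypothetical and are not taken from the thesis; the example does not reproduce the refined definition, the heuristic, or the detection tool proposed there.

```python
import unittest


class Account:
    """Hypothetical class under test (an assumption for illustration only)."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class EagerAccountTest(unittest.TestCase):
    # Eager under the quoted definition: a single test method checks
    # both deposit() and withdraw(), including the error path.
    def test_account(self):
        account = Account()
        account.deposit(100)
        self.assertEqual(account.balance, 100)
        account.withdraw(30)
        self.assertEqual(account.balance, 70)
        with self.assertRaises(ValueError):
            account.withdraw(1000)


class FocusedAccountTest(unittest.TestCase):
    # Each test method verifies one behavior; deposit() appears in the
    # withdraw tests only as setup and is not itself asserted on.
    def test_deposit_increases_balance(self):
        account = Account()
        account.deposit(100)
        self.assertEqual(account.balance, 100)

    def test_withdraw_decreases_balance(self):
        account = Account()
        account.deposit(100)
        account.withdraw(30)
        self.assertEqual(account.balance, 70)

    def test_withdraw_beyond_balance_raises(self):
        account = Account()
        with self.assertRaises(ValueError):
            account.withdraw(1)


def run_tests(test_case):
    """Hypothetical helper: run one TestCase class and return its TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Both classes exercise the same behavior. In the focused variant, however, a failing test points to a single method of `Account`, which is one reason the smell is commonly associated with maintainability.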
Conclusions: The thesis makes three main contributions: (1) at the abstract level, a comprehensive quality model to help practitioners and researchers develop guidelines, templates, or tools for designing new test cases and test suites and assessing existing ones; (2) at the strategic level, the identification of contextually important quality attributes; and (3) at the operational level, a refined definition of the Eager Test smell, a detection heuristic, and a tool prototype implementing the heuristic, which together advance maintainability assessment in software testing.