The performance of software defect prediction (SDP) models is known to depend on the datasets used for training the models. Evolving data in a dynamic software development environment, such as significant refactoring and organizational changes, introduces new concepts to the prediction model, thus making improved classification performance difficult. In this study, we investigate and assess the existence and impact of concept drift on SDP performance. We empirically assess the prediction performance of five models by conducting cross-version experiments using fifty-five releases of five open-source projects. Prediction performance fluctuated as the training datasets changed over time. Our results indicate that the quality and the reliability of defect prediction models fluctuate over time and that this instability should be considered by software quality teams when using historical datasets. The performance of a static predictor constructed with data from historical versions may degrade over time due to the challenges posed by concept drift. © 2020 IEEE.
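As a rough illustration of what a cross-version setup can look like, the sketch below pairs each release with its immediate successor and reports how a score changes from one pair to the next. The abstract does not state how releases were paired, so training on release k and testing on release k+1 is an assumption, and the function names `cross_version_pairs` and `score_changes` are hypothetical.

```python
def cross_version_pairs(releases):
    """Pair each release with its immediate successor as (train, test).

    Assumption: consecutive pairing; the study may have paired releases differently.
    """
    return list(zip(releases, releases[1:]))


def score_changes(scores):
    """Return the difference between consecutive scores.

    A negative value marks a drop in prediction performance between
    two successive train/test pairs.
    """
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


if __name__ == "__main__":
    # Hypothetical release labels, used only to show the pairing.
    for train, test in cross_version_pairs(["1.0", "1.1", "1.2", "1.3"]):
        print(f"train on {train}, test on {test}")
```

A run of negative values from `score_changes` would be one simple sign that a model built on older releases is losing accuracy on newer ones.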
Cross‐project defect prediction (CPDP), where data from different software projects are used to predict defects, has been proposed as a way to provide data for software projects that lack historical data. Evaluations of CPDP models using the Nearest Neighbour (NN) Filter approach have shown promising results in recent studies. A key challenge with defect‐prediction datasets is class imbalance, that is, highly skewed datasets where nonbuggy modules dominate the buggy modules. In the past, data resampling approaches have been applied to within‐project defect prediction models to help alleviate the negative effects of class imbalance in the datasets. To address the class imbalance issue in CPDP, the authors assess the impact of data resampling approaches on CPDP models after the NN Filter is applied. The impact on prediction performance of five oversampling approaches (MAHAKIL, SMOTE, Borderline‐SMOTE, Random Oversampling and ADASYN) and three undersampling approaches (Random Undersampling, Tomek Links and One‐sided selection) is investigated, and results are compared to approaches without data resampling. The authors examined six defect prediction models on 34 datasets extracted from the PROMISE repository. The authors' results show that there is a significant positive effect of data resampling on CPDP performance, suggesting that software quality teams and researchers should consider applying data resampling approaches for improved recall (pd) and g‐measure prediction performance. However, if the goal is to improve precision and reduce false alarm (pf), then data resampling approaches should be avoided.
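The trade-off described above can be made concrete with a small sketch that computes pd, pf, precision and g-measure from confusion-matrix counts. The abstract does not define g-measure, so the form used here, the harmonic mean of pd and (1 - pf), is an assumption based on common usage in defect prediction. The counts in the example are hypothetical and chosen only to show the direction of the effect.

```python
def confusion_rates(tp, fp, tn, fn):
    """Return (pd, pf, precision) from confusion-matrix counts.

    pd        = recall / probability of detection = TP / (TP + FN)
    pf        = probability of false alarm        = FP / (FP + TN)
    precision = TP / (TP + FP)
    """
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return pd, pf, precision


def g_measure(pd, pf):
    """Harmonic mean of pd and (1 - pf); assumed definition, not taken from the abstract."""
    denom = pd + (1.0 - pf)
    return 2.0 * pd * (1.0 - pf) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts: a model without resampling vs. the same model with resampling.
    for label, counts in [
        ("no resampling", dict(tp=20, fp=10, tn=140, fn=30)),
        ("with resampling", dict(tp=40, fp=50, tn=100, fn=10)),
    ]:
        pd, pf, precision = confusion_rates(**counts)
        print(label, round(pd, 3), round(pf, 3), round(precision, 3),
              round(g_measure(pd, pf), 3))
```

In this made-up case, resampling finds more buggy modules (higher pd and g-measure) but raises more false alarms (higher pf, lower precision), which is the pattern the abstract reports.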
Conway’s law assumes a strong association between a system’s architecture and the communication structure of the organization that designs it. In the light of contemporary software development, in which many companies rely on geographically distributed teams that are often temporarily composed and thus have a frequently changing communication structure, the importance of Conway’s law and the work it has inspired grows. In this paper, we examine empirical research related to Conway’s law and its application to cross-site coordination. Based on the results obtained, we conjecture that changes in the communication structure alone sooner or later trigger changes in the design structure of the software products to return the socio-technical system to a state of congruence. This is further used to formulate the concept of a rubber band effect and to propose a replication study that goes beyond the original idea of Conway’s law by investigating the evolution of socio-technical congruence over time.
Effort estimation is a project management activity that is mandatory for the execution of software projects. Despite its importance, there have been just a few studies published on such activities within the Agile Global Software Development (AGSD) context. Their aggregated results were recently published as part of a secondary study that reported the state of the art on effort estimation in AGSD. This study aims to complement the above-mentioned secondary study by means of an empirical investigation on the state of the practice on effort estimation in AGSD. To do so, a survey was carried out using an on-line questionnaire as the instrument and a sample comprising software practitioners experienced in effort estimation within the AGSD context. Results show that the effort estimation techniques used within the AGSD and collocated contexts remained unchanged, with planning poker being the one employed the most. Sourcing strategies were found to have no or a small influence upon the choice of estimation techniques. With regard to effort predictors, global challenges such as cultural and time zone differences were reported, in addition to factors that are commonly considered in the collocated context, such as team experience. Finally, many challenges that impact the accuracy of the effort estimates were reported by the respondents, such as problems with the software requirements and the fact that the communication effort between sites is not properly accounted for.
The combination of scale and distribution in software projects makes the onboarding of new developers problematic. To the best of our knowledge, there is no research on the relationship between onboarding strategies and the performance evolution of newcomers in large-scale, globally distributed projects. Furthermore, there are no approaches to support the development of strategies to systematically onboard developers. In this paper, we address these gaps by means of an industrial case study. We identified that the following aspects seem to be related to the observed onboarding results: the distance to mentors, the formal training approach used, the allocation of large and distributed tasks in the early stages of the onboarding process, and team instability. We conclude that onboarding must be planned well ahead and should consider avoiding the aspects mentioned above. Based on the results of this investigation, we propose a process to strategize and evaluate onboarding. To develop the process, we used business process modeling. We conducted a static validation of the proposed process utilizing interviews with experts. The static validation of the process indicates that it can help companies to deal with the challenges associated with the onboarding of newcomers through more systematic, effective, and repeatable onboarding strategies. © 2020 Elsevier Inc.
Large-scale distributed software projects with long life cycles often involve a considerable amount of complex legacy code. The combination of scale and distribution challenges and the difficulty in acquiring knowledge about massive amounts of complex legacy code may make the onboarding of new developers/teams problematic. These problems may lead to extended periods of low performance. The primary objective of this paper is to investigate the performance evolution of offshore newcomers onboarded in a large-scale globally distributed project and how it relates to the employed onboarding strategy. To achieve our objective, we conducted a case study at Ericsson. We identified that the following aspects in the onboarding strategy employed in the investigated case seem to be related to the unexpectedly low performance evolution: i) the distance to mentors; ii) the formal training approach used, which did not fit the sociocultural background of the newcomers; iii) allocation of large and distributed tasks in the early stages of the onboarding process; and iv) team instability. We conclude that the onboarding of newcomers in globally distributed projects must be planned well ahead and should consider avoiding the aspects mentioned above. © 2019 IEEE.
Context: Large-scale distributed software projects with long life cycles often involve a considerable amount of complex legacy code. The combination of scale and distribution challenges, and the difficulty of acquiring knowledge about large amounts of complex legacy code, may make the onboarding of new developers/teams problematic. This may lead to extended periods of low performance. Objective: The main objective of this paper is to analyze the learning processes and performance evolutions (team productivity and team autonomy) of remote software development teams added late to a large-scale legacy software product development, and to propose recommendations to support the learning of remote teams. Method: We conducted a case study at Ericsson, collecting data through archival research, semi-structured interviews, and workshops. We analyzed the collected data using descriptive, inferential and graphical statistics and soft qualitative analysis. Results: The results show that the productivity and autonomy of immature remote teams are on average 3.67 and 2.27 times lower than those of mature teams, respectively. Furthermore, their performance increased steadily during almost the entire first year and then dropped (productivity) or stagnated (autonomy) for a great part of the second year. In addition to these results, we also identified four challenges that affected the learning process and performance evolution of immature remote teams: complexity of the product and technology stack, distance to the main source of product knowledge, lack of team stability, and training expectation misalignment. Conclusion: The results indicate that scale, distribution and complex legacy code may make learning more difficult and demand a long period to achieve high performance. To support the learning of remote teams, we put forward five recommendations. We believe that our quantitative analysis, as well as the identified factors and recommendations, can help other companies to onboard new remote teams in large-scale legacy product development projects.
This report summarizes the results of the tenth workshop on pedagogies and tools for the teaching and learning of object-oriented concepts. The focus of this year’s workshop was on examples, modelling and abstraction. Participants agreed that carefully developed scaffolded examples are a key element for learning to program. In the teaching of modelling and abstraction, however, this area seems badly neglected. The workshop gathered 12 participants, all from academia, from 10 different countries.
CRC-cards are a common lightweight approach to collaborative object-oriented analysis and design. They have been adopted by many educators and trainers to teach object-oriented modelling. In our experience, we have noticed many subtle problems and issues that have largely gone unnoticed in the literature. Two of the major issues are related to the CRC-card role-play as described in the literature. Although CRC-cards represent classes, they are also utilized as substitutes for the actual objects during the scenario role-play. Furthermore, it is quite difficult to document or trace the scenario role-play. We propose using Role-Play Diagrams (RPDs) to overcome these problems. Our experience so far is quite positive. Novices have fewer problems with role-play activities when using these diagrams. Teaching and learning the new type of diagram adds only a little overhead to the overall CRC-approach. Although our improvements specifically target the teaching and learning of object-oriented modelling, we believe that RPDs can be successfully applied in professional software development.
Context: Double-counting in a literature review occurs when the same data, population, or evidence is erroneously counted multiple times during synthesis. Detecting and mitigating the threat of double-counting is particularly challenging in tertiary studies. Although this topic has received much attention in the health sciences, it seems to have been overlooked in software engineering. Objective: We describe issues with double-counting in tertiary studies, investigate the prevalence of the issue in software engineering, and propose ways to identify and address the issue. Method: We analyze 47 tertiary studies in software engineering to investigate in which ways they address double-counting and whether double-counting might be a threat to validity in them. Results: In 19 of the 47 tertiary studies, double-counting might bias their results. Of those 19 tertiary studies, only 5 consider double-counting a threat to their validity, and 7 suggest strategies to address the issue. Overall, only 9 of the 47 tertiary studies acknowledge double-counting as a potential general threat to validity for tertiary studies. Conclusions: Double-counting is an overlooked issue in tertiary studies in software engineering, and existing design and evaluation guidelines do not address it sufficiently. Therefore, we propose recommendations that may help to identify and mitigate double-counting in tertiary studies. © 2023 The Author(s)
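One simple way to surface potential double-counting is to check which primary studies are included in more than one of the secondary studies that a tertiary study synthesizes. The sketch below illustrates that idea only and is not the procedure proposed by the authors. The function name `find_double_counted` and the study identifiers are hypothetical.

```python
from collections import Counter


def find_double_counted(secondary_to_primary):
    """Map each primary study to the number of secondary studies that include it,
    keeping only those included more than once.

    secondary_to_primary: dict mapping a secondary-study id to an iterable
    of the primary-study ids it covers.
    """
    counts = Counter()
    for primaries in secondary_to_primary.values():
        # Use a set so a primary study listed twice in one review counts once.
        counts.update(set(primaries))
    return {primary: n for primary, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical identifiers for three reviews and their included primary studies.
    reviews = {
        "SLR-A": ["P1", "P2", "P3"],
        "SLR-B": ["P2", "P4"],
        "SLR-C": ["P2", "P3"],
    }
    print(find_double_counted(reviews))
```

Primary studies flagged this way are candidates for closer inspection. Whether the overlap actually biases a synthesis depends on how the evidence is aggregated.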
Background: Software engineering research aims to establish software development practice on a scientific basis. However, the evidence of the efficacy of technology is insufficient to ensure its uptake in industry. In the absence of a theoretical frame of reference, we mainly rely on best practices and expert judgment from industry-academia collaboration and software process improvement research to improve the acceptance of the proposed technology. Objective: To identify acceptance models and theories and discuss their applicability in the research of acceptance behavior related to software development. Method: We analyzed literature reviews within an interdisciplinary team to identify models and theories relevant to software engineering research. We further discuss acceptance behavior from the human information processing perspective of automatic and affect-driven processes (“fast” system 1 thinking) and rational and rule-governed processes (“slow” system 2 thinking). Results: We identified 30 potentially relevant models and theories. Several of them have been used in researching acceptance behavior in contexts related to software development, but few have been validated in such contexts. They use constructs that capture aspects of (automatic) system 1 and (rational) system 2 oriented processes. However, their operationalizations focus on system 2-oriented processes indicating a rational view of behavior, thus overlooking important psychological processes underpinning behavior. Conclusions: Software engineering research may use acceptance behavior models and theories more extensively to understand and predict practice adoption in the industry. Such theoretical foundations will help improve the impact of software engineering research. However, more consideration should be given to their validation, overlap, construct operationalization, and employed data collection mechanisms when using these models and theories.
We have developed courseware for UML/SysML modelling based on the needs of the European embedded/automotive industry. The courseware supports interactive modelling exercises. First evaluations show promising results.
There are many aspects of code quality, some of which are difficult to capture or to measure. Despite the importance of software quality, there is a lack of commonly accepted measures or indicators for code quality that can be linked to quality attributes. We investigate software developers’ perceptions of source code quality and the practices they recommend to achieve these qualities. We analyze data from semi-structured interviews with 34 professional software developers, programming teachers and students from Europe and the U.S. For the interviews, participants were asked to bring code examples to exemplify what they consider good and bad code, respectively. Readability and structure were used most commonly as defining properties for quality code. Together with documentation, they were also suggested as the most common target properties for quality improvement. When discussing actual code, developers focused on structure, comprehensibility and readability as quality properties. When analyzing relationships between properties, the most commonly talked about target property was comprehensibility. Documentation, structure and readability were named most frequently as source properties to achieve good comprehensibility. Some of the most important source code properties contributing to code quality as perceived by developers lack clear definitions and are difficult to capture. More research is therefore necessary to measure the structure, comprehensibility and readability of code in ways that matter for developers and to relate these measures of code structure, comprehensibility and readability to common software quality attributes.
Some solutions to a programming problem are more elegant or more simple than others and thus more understandable for students. We review desirable properties of example programs from a cognitive and a measurement point of view. Certain cognitive aspects of example programs are captured by common software measures, but they are not sufficient to capture a key aspect of understandability: readability. We propose and discuss a simple readability measure for software, SRES, and apply it to object-oriented textbook examples. Our results show that readability measures correlate well with human perceptions of quality. Compared with other readability measures, SRES is less sensitive to commenting and white-space. These results also have implications for software maintainability measures.
We present courseware for UML/SysML modelling that supports collaborative learning at a distance. Learners can solve interactive modelling exercises and discuss their solutions. First evaluations show promising results.
This report summarizes the results of the eleventh workshop on pedagogies and tools for the teaching and learning of object-oriented concepts. The focus of this year's workshop was on desirable properties of examples and the usage of simple tools. The workshop gathered 17 participants, all from academia, from 7 different countries.
Team projects are a way to expose students to conflicting project objectives, and "[t]here should be a strong real-world element … to ensure that the experience is realistic" [ACM/IEEE-CS 2015b]. Team projects provide students an opportunity to put their education into practice and prepare them for their professional careers. The aim of this special issue is to collect and share evidence about the state-of-practice of team projects in computing education and to help educators in designing and running team projects. From a record number of 69 submitted abstracts, 19 were invited to submit a full paper. Finally, nine papers were accepted for publication in this and a subsequent issue. The articles presented in the present issue cover the following topics: real projects for real clients, open source projects, multidisciplinary team projects, student and team assessment, and cognitive and psychological aspects of team projects.
Modeling is a key skill in software development. The ability to develop, manipulate and understand models for software is therefore an important learning objective in many CS/SE courses. In this working group, we investigated how and when (software) modeling is taught to help us better understand the key issues in teaching (software) modeling. Several shortcomings were found in common curricula, both in their understanding of the term "modeling" and in how they address its teaching. This WG report summarizes the findings and formulates recommendations on the inclusion of software modeling courses in future CS/SE curricula.
Example programs play an important role in the teaching and learning of programming. Students as well as teachers rank examples as the most important resources for learning to program. Example programs work as role models and must therefore always be consistent with the principles and rules we are teaching.
However, it is difficult to find or develop examples that are fully faithful to all principles and guidelines of the object-oriented paradigm and also follow general pedagogical principles and practices. Unless students are able to engage with good examples, they will not be able to tell desirable from undesirable properties in their own and others’ programs.
In this paper we report on a study in which experienced educators evaluated the quality of object-oriented example programs for novices from popular Java textbooks. The evaluation was accomplished using an on-line checklist that elicited responses on the technical, object-oriented, and didactic quality of examples.
In total, 25 reviewers contributed 215 reviews to our data set, based on 38 example programs from 13 common introductory programming textbooks. Results show that the evaluation instrument is reliable in terms of inter-rater agreement. Overall, example quality was not as good as one might expect from common textbooks, in particular regarding certain object-oriented properties.
We conclude that educators should be careful when taking examples straight out of a textbook.
Software readability and comprehension are important factors in software maintenance. There is a large body of research on software measurement, but the actual factors that make software easier to read or easier to comprehend are not well understood. In the present study, we investigate the role of method chains and code comments in software readability and comprehension. Our analysis comprises data from 104 students with varying programming experience. Readability and comprehension were measured by perceived readability, reading time and performance on a simple cloze test. Regarding perceived readability, our results show statistically significant differences between comment variants, but not between method chain variants. Regarding comprehension, there are no significant differences between method chain or comment variants. Student groups with low and high experience, respectively, show significant differences in perceived readability and performance on the cloze tests. Our results do not show any significant relationships between perceived readability and the other measures taken in the present study. Perceived readability might therefore be insufficient as the sole measure of software readability or comprehension. We also did not find any statistically significant relationships between size and perceived readability, reading time and comprehension.
Context. Code quality is a key issue in software development. The ability to develop high quality software is therefore a key learning goal of computing programs. However, there are no universally accepted measures to assess the quality of code and current standards are considered weak. Furthermore, there are many facets to code quality. Defining and explaining the concept of code quality is therefore a challenge faced by many educators.
Objectives. In this working group, we investigated code quality as perceived by students, educators, and professional developers, in particular, the differences in their views of code quality and which quality aspects they consider as more or less important. Furthermore, we investigated their sources for information about code quality and its assessment.
Methods. We interviewed 34 students, educators and professional developers regarding their perceptions of code quality. For the interviews they brought along code from their own experience to discuss and exemplify code quality.
Results. There was no common definition of code quality among or within these groups. Quality was mostly described in terms of indicators that could measure an aspect of code quality. Among these indicators, readability was named most frequently by all groups. The groups showed significant differences in the sources they use for learning about code quality with education ranked lowest in all groups.
Conclusions. Code quality should be discussed more thoroughly in educational programs.
Software product line development has emerged as a leading approach for software reuse. This paper describes an approach to manage natural-language requirements specifications in a software product line context. Variability in such product line specifications is modeled and managed using a feature model. The proposed approach has been introduced in the Swedish defense industry. We present a multiple-case study covering two different product lines with a total of eight product instances. These were compared to experiences from previous projects in the organization employing clone-and-own reuse. We conclude that the proposed product line approach performs better than clone-and-own reuse of requirements specifications in this particular industrial context.