Context: Early prediction of software cost and quality is important for better software planning and controlling. In early development phases, design complexity metrics are considered as useful indicators of software testing effort and some quality attributes. Although many studies investigate the relationship between design complexity and cost and quality, it is unclear what we have learned from these studies, because no systematic synthesis exists to date. Aim: The research presented in this thesis is intended to contribute for the body of knowledge about cost and quality prediction. A major part of this thesis presents the systematic review that provides detail discussion about state of the art of research on relationship between software design metric and cost and software quality. Method: This thesis starts with a literature review to identify the important complexity dimensions and potential predictors for predicting external software quality attributes are identified. Second, we aggregated Spearman correlation coefficients and estimated odds ratios from univariate logistic regression models from 59 different data sets from 57 primary studies by a tailored meta-analysis approach. At last, it is an attempt to evaluate and explain for disagreement among selected studies. Result: There are not enough studies for quantitatively summarizing relationship between design complexity and development cost. Fault proneness and maintainability is the main focused characteristics that consume 75% total number of studies. Within fault proneness and maintainability studies, coupling and scale are two complexity dimensions that are most frequently used. Vote counting shows evidence about positive impact of some design metrics on these two quality attributes. Meta analysis shows the aggregated effect size of Line of code (LOC) is stronger than those of WMC, RFC and CBO. The aggregated effect sizes of LCOM, DIT and NOC are at trivial to small level. In subgroup analysis, defect collections phase explains more than 50% of observed variation in five out of seven investigated metrics. Conclusions: Coupling and scale metrics are stronger correlated to fault proneness than cohesion and inheritance metrics. No design metrics are stronger single predictors than LOC. We found that there is a strong disagreement between the individual studies, and that defect collection phase is able to partially explain the differences between studies.