Classifier evaluation has historically been conducted by estimating predictive accuracy via cross-validation or similar methods. More recently, ROC analysis has been shown to be a strong alternative. However, problem domains differ greatly in their characteristics, and some evaluation metrics have been shown to be more appropriate than others for particular problems. We argue that different problems impose different requirements and should therefore be evaluated using metrics that correspond to those requirements. For this purpose, we motivate the need for generic multi-criteria evaluation methods, i.e., methods that dictate how to integrate metrics but not which metrics to integrate. We present such a generic evaluation method and discuss how to select metrics on the basis of the application at hand.
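To make the idea of a generic multi-criteria evaluation concrete, the sketch below combines cross-validated estimates of two common metrics (accuracy and ROC AUC) into a single score. The particular metrics, the weights, and the weighted-sum combination rule are hypothetical illustrations standing in for whatever metrics and integration scheme a given application calls for; they are not the method presented in this paper.

```python
# A minimal sketch of multi-criteria evaluation, assuming scikit-learn is available.
# Metrics, weights, and the weighted-sum rule are illustrative, application-dependent choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Any set of metrics can be plugged in; only the way they are combined is fixed here.
metrics = {"accuracy": "accuracy", "auc": "roc_auc"}
weights = {"accuracy": 0.5, "auc": 0.5}  # hypothetical weights reflecting application requirements

scores = cross_validate(DecisionTreeClassifier(random_state=0), X, y,
                        cv=10, scoring=metrics)

# Integrate the per-metric cross-validation estimates into one overall score.
per_metric = {m: np.mean(scores[f"test_{m}"]) for m in metrics}
combined = sum(weights[m] * per_metric[m] for m in metrics)
print(per_metric, "combined:", combined)
```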