Search-based software testing (SBST) has shown potential to decrease the cost and increase the quality of testing-related software development activities. Research in SBST has so far focused mainly on the search for isolated tests that are optimal according to a fitness function that guides the search. In this paper we make the case for fitness functions that measure test fitness in relation to existing or previously found tests; a test is good if it is diverse compared to other tests. We present a model for test variability and propose the use of a theoretically optimal diversity metric at variation points in the model. We then describe how to apply a practically useful approximation to the theoretically optimal metric. The metric is simple and powerful and can be adapted to a multitude of different test diversity measurement scenarios. We present initial results from an experiment that compares how closely the metric's clustering of a set of test cases matches the clustering produced by human subjects. To carry out the experiment we have extended an existing framework for test automation in an object-oriented, dynamic programming language.