The heuristic evaluation (HE) method is one of the most widely used tools for usability evaluation because it is fast, inexpensive, and resource-efficient relative to the number of usability issues it uncovers. The method emphasizes completely independent initial expert evaluations; inter-rater reliability and agreement coefficients are not calculated. The variability across evaluators, even dual-domain experts, can be significant, as the case study presented here shows. The implications of this wide variability are that results are unique to each HE, results are not readily reproducible, and HE research on usability is not yet building a uniform body of knowledge. We offer recommendations to improve the science by incorporating selected techniques from qualitative research: calculating inter-rater reliability and agreement scores, creating a codebook to define concepts/categories, and reporting crucial information about raters' backgrounds, agreement techniques, and the evaluation setting.
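
As an illustration of the kind of agreement scoring we recommend, the sketch below computes Cohen's kappa for two raters in Python. It is a minimal example, not part of the case study: the data, variable names, and two-rater setup are all hypothetical, and other coefficients (e.g., Krippendorff's alpha for more than two raters) may suit a given HE better.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items.

    rater_a, rater_b: equal-length sequences of category labels,
    e.g., per-element judgments of 'violation' (1) or 'no violation' (0).
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items on which the raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected matches given each rater's marginal counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    # Kappa corrects observed agreement for agreement expected by chance.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two evaluators judging ten interface elements.
a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(f"Cohen's kappa: {cohen_kappa(a, b):.2f}")  # prints 0.40
```

Reporting such a coefficient alongside HE results, together with the codebook and rater background information noted above, would let readers judge how much of a study's findings reflect the interface rather than the particular evaluators.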