The problem of robust extraction of ego-motion from a sequence of images for an eye-in-hand camera configuration is addressed. A novel approach toward solving planar template based tracking is proposed which performs a non-linear image alignment and a planar similarity optimization to recover camera transformations from planar regions of a scene. The planar region tracking problem as a motion optimization problem is solved by maximizing the similarity among the planar regions of a scene. The optimization process employs an evolutionary metaheuristic approach in order to address the problem within a large non-linear search space. The proposed method is validated on image sequences with real as well as synthetic image datasets and found to be successful in recovering the ego-motion. A comparative analysis of the proposed method with various other state-of-art methods reveals that the algorithm succeeds in tracking the planar regions robustly and is comparable to the state-of-the art methods. Such an application of evolutionary metaheuristic in solving complex visual navigation problems can provide different perspective and could help in improving already available methods.