H.264/AVC is the state-of-the-art video coding standard, which promises the same video quality at about half the bit rate of previous standards (H.263, MPEG-2). This gain in compression and perceptual quality comes from the inclusion of various innovative tools. These tools are highly complex and data intensive and, as a result, place a very heavy computational burden on the processor. The de-blocking filter is one of them; it is the most time-consuming part of the H.264/AVC reference decoder. In this thesis, a performance analysis of the de-blocking filter is carried out on an Intel Pentium 4 processor, and various optimization techniques are studied and implemented accordingly. Some techniques are driven by a statistical analysis of video data, while others use SIMD instructions to achieve the optimization. Comparison of the SIMD-optimized techniques with the reference software shows a significant speedup, contributing to a real-time implementation of the de-blocking filter on a general-purpose platform.
The de-blocking filter is the most time-consuming part of the H.264 High Profile decoder. The de-block filtering process specified in the H.264/AVC standard is sequential and therefore not computationally optimal. In this thesis, various optimization algorithms have been studied and implemented. Compared with the JM13.2 boundary strength algorithm, the Static and ICME algorithms are quite primitive; as a result, they achieve no performance gain and in fact decrease performance. This poor performance has several causes, the most prominent being increased memory access, unrolling of the loop to the 4x4 boundary, and early detection of intra blocks. For the edge filtering module, both optimization algorithms (SIMD and the fast algorithm) showed a significant performance improvement over the JM13.2 edge filtering algorithm. This improvement is mainly due to the parallel filtering operations performed in the edge filtering module. Therefore, by using SSE2 instructions, a large speedup can be achieved on general-purpose processors such as those from Intel, while maintaining conformance with the standard.
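To illustrate the kind of parallelism SSE2 offers for edge filtering, the following is a minimal sketch, not the implementation evaluated in this thesis. It evaluates the sample-level filtering condition of the H.264/AVC de-blocking filter (|p0 - q0| < alpha, |p1 - p0| < beta, |q1 - q0| < beta) for 16 pixel positions along an edge at once and compares it with a scalar loop. The function names (filter_mask_sse2, filter_mask_scalar) and the data layout (one 16-byte row per sample position, already gathered contiguously) are assumptions made for this example; the boundary strength check and the actual pixel update are omitted.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdlib.h>     /* abs */

/* Absolute difference of 16 unsigned bytes: |a - b|. */
static __m128i abs_diff_u8(__m128i a, __m128i b)
{
    /* One of the two saturating differences is always zero. */
    return _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
}

/* Per-byte mask: 0xFF where d < t (unsigned compare), 0x00 otherwise. */
static __m128i less_than_u8(__m128i d, __m128i t)
{
    /* The saturating difference t - d is zero exactly when t <= d. */
    __m128i not_less = _mm_cmpeq_epi8(_mm_subs_epu8(t, d), _mm_setzero_si128());
    return _mm_andnot_si128(not_less, _mm_set1_epi8(-1));
}

/* Hypothetical helper: filtering decision for 16 positions in parallel.
 * mask[i] is 0xFF where the three threshold tests pass, 0x00 elsewhere. */
void filter_mask_sse2(const uint8_t *p1, const uint8_t *p0,
                      const uint8_t *q0, const uint8_t *q1,
                      uint8_t alpha, uint8_t beta, uint8_t *mask)
{
    __m128i vp1 = _mm_loadu_si128((const __m128i *)p1);
    __m128i vp0 = _mm_loadu_si128((const __m128i *)p0);
    __m128i vq0 = _mm_loadu_si128((const __m128i *)q0);
    __m128i vq1 = _mm_loadu_si128((const __m128i *)q1);
    __m128i va  = _mm_set1_epi8((char)alpha);
    __m128i vb  = _mm_set1_epi8((char)beta);

    __m128i m = less_than_u8(abs_diff_u8(vp0, vq0), va);
    m = _mm_and_si128(m, less_than_u8(abs_diff_u8(vp1, vp0), vb));
    m = _mm_and_si128(m, less_than_u8(abs_diff_u8(vq1, vq0), vb));
    _mm_storeu_si128((__m128i *)mask, m);
}

/* Scalar reference: the same decision, one position per iteration. */
void filter_mask_scalar(const uint8_t *p1, const uint8_t *p0,
                        const uint8_t *q0, const uint8_t *q1,
                        uint8_t alpha, uint8_t beta, uint8_t *mask)
{
    for (int i = 0; i < 16; ++i) {
        int pass = abs(p0[i] - q0[i]) < alpha &&
                   abs(p1[i] - p0[i]) < beta  &&
                   abs(q1[i] - q0[i]) < beta;
        mask[i] = pass ? 0xFF : 0x00;
    }
}
```

The resulting byte mask can be ANDed with the filtered values so that all 16 positions follow the same instruction stream without per-pixel branching, which is where the parallel gain over a sequential loop would come from.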