Phase diversity method is not only used as an image restoration technique, but also as a wavefront sensor. However, its computations have been perceived as being too burdensome to achieve its real-time applications on a desktop computer platform. In this paper, the implementation of the phase diversity algorithm based on graphic processing unit (GPU) is presented. The redundancy computations for the pupil function, point spread function, and optical transfer function are analyzed. Two kinds of implementation methods based on GPU are compared: one is the general method which is accomplished by GPU library CUFFT without precision loss (method-1) and the other one performed by our own custom FFT with little damage of precision considering the redundant calculations (method-2). The results show the cost and gradient functions can be speeded up by method-2 in contrast with the method-1 and the overhead of global memory access by kernel fusion can be reduced. For the image of 256 x 256 with the sampling factor of 3, the results reveal that method-2 achieves speedup of 1.83x compared with method-1 when the central 128 x 128 pixels of the point spread function are used.