Wednesday 16 September 2015

GPGPU using CUDA Thrust


Long time no see!

I just want to share the presentation that I gave during our weekly meeting in the CGV group in Delft. We usually give two kinds of presentations: the first is related to our current project, while the second is more technical and aims at teaching something useful to the other members of the group.

This presentation is of the second type. It aims at showing how CUDA Thrust can be used to implement GPGPU solutions without too much effort.

I think that CUDA is a great platform and I have done a lot of work with it. However, it is difficult to master and, without a high-level interface, CUDA code is slow to write. I gave a couple of examples in my presentation:

  1. Preparing a CUDA kernel launch is a somewhat annoying task. More than that, the problem lies in handling memory on the device. I like how OpenCV wraps all of that handling in its GpuMat class; however, this is something that must be carefully planned ahead and well designed, with clear specifications.
  2. Memory access influences performance: this is a key point! It is not so easy to implement an efficient algorithm on the GPU. You can find some details about all the challenges related to memory access at this link.

CUDA Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL) and implements many common algorithms that can be used during your daily work.

So if you are already using the STL's algorithms and data structures in your code, and you tackle big problems by decomposing them into small, well-known algorithms, CUDA Thrust can be very useful and easy to pick up.
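
To make the "small and well-known algorithms" idea concrete, here is a minimal sketch of that style of decomposition. Note that it is plain STL running on the CPU, not Thrust, and it is not taken from the presentation; the function name euclidean_norm is only an illustration. The comments point at the Thrust algorithms with the matching role (thrust::transform and thrust::reduce, applied to a thrust::device_vector), which is roughly what a port would look like.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Euclidean norm of a vector, written as two well-known building blocks.
// Plain STL on the CPU; euclidean_norm is a hypothetical name used only here.
double euclidean_norm(const std::vector<double>& values) {
    // Step 1 (map): square every element.
    // Thrust counterpart: thrust::transform over a thrust::device_vector.
    std::vector<double> squares(values.size());
    std::transform(values.begin(), values.end(), squares.begin(),
                   [](double x) { return x * x; });

    // Step 2 (reduce): sum the squares.
    // Thrust counterpart: thrust::reduce.
    const double sum = std::accumulate(squares.begin(), squares.end(), 0.0);

    return std::sqrt(sum);
}
```

Calling euclidean_norm on a vector holding 3 and 4 goes through both steps: the squares 9 and 16 are summed to 25, and the square root of that is returned. The point is that neither step needs a hand-written loop, and that is the habit that carries over to Thrust.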