Branching is one of the most time-consuming operations in CUDA (and on CPUs in general as well). For example, compare the CPU code and GPU code below.
CPU code:
void CPUCode( int* input, int* output, int length )
{
    // Assumes input holds at least length + 1 elements,
    // since the loop reads input[ i + 1 ].
    for ( int i = 0; i < length; ++i )
    {
        output[ i ] = input[ i ] + 2 * input[ i + 1 ];
    }
}
GPU code:
__global__
void GPUCode( int* input, int* output, int length )
{
    // __umul24 is a 24-bit integer multiply intrinsic, a legacy
    // optimization; on modern GPUs a plain multiply is just as fast.
    int idx = __umul24( blockDim.x, blockIdx.x ) + threadIdx.x;
    // Bounds check: threads past the end of the array do nothing.
    if ( idx < length )
    {
        output[ idx ] = input[ idx ] + 2 * input[ idx + 1 ];
    }
}
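For reference, a minimal host-side launch for this kernel might look like the sketch below. The array size N, the block size of 256, and the allocations are illustrative assumptions, not part of the original.
int N = 1024;                                        // hypothetical element count
int *d_input, *d_output;
cudaMalloc( &d_input,  ( N + 1 ) * sizeof( int ) ); // + 1: kernel reads input[ idx + 1 ]
cudaMalloc( &d_output, N * sizeof( int ) );
int blockSize = 256;                                 // hypothetical block size
int gridSize  = ( N + blockSize - 1 ) / blockSize;   // round up so every element is covered
GPUCode<<<gridSize, blockSize>>>( d_input, d_output, N );
cudaDeviceSynchronize();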
In the GPU kernel above, there is an if condition that every thread executes. Threads on a GPU run in groups (warps) that execute the same instruction in lockstep; when the threads of a warp take different paths at a branch, the hardware serializes the paths and performance drops. Execution is fastest when every thread executes the same instruction at the same time, i.e., when the kernel code (the __global__ function body) is straight-line, with no branching inside it.
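As an illustrative sketch (this DivergentCode kernel is not from the original), the code below forces even and odd threads of the same warp down different paths, which the hardware must execute one after the other:
__global__
void DivergentCode( int* output )
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    // Even and odd lanes of a warp take different paths here, so the
    // warp executes both branches serially with some lanes masked off.
    if ( ( idx & 1 ) == 0 )
    {
        output[ idx ] = idx * 2;
    }
    else
    {
        output[ idx ] = idx + 1;
    }
}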
Now look at the modified GPU version:
__global__
void GPUCode( int* input, int* output, int length )
{
    int idx = __umul24( blockDim.x, blockIdx.x ) + threadIdx.x;
    // Clamp the index instead of branching: out-of-range threads
    // redundantly recompute the last element rather than diverging.
    idx = max( idx, 0 );
    idx = min( idx, length - 1 ); // length - 1, so output[ idx ] stays in bounds
    output[ idx ] = input[ idx ] + 2 * input[ idx + 1 ];
}
There is no branching in the modified GPU version above. The kernel code is straight-line: every thread executes the same instruction at a time, and this kind of code executes very fast on the GPU. The same technique is applicable to general CPU code as well, as the sketch below shows.
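As a CPU-side sketch (the ClampValues function and its parameters are illustrative, not from the original), std::min and std::max typically compile to conditional-move instructions, so the loop body contains no data-dependent jump for the branch predictor to mispredict:
#include <algorithm>

// Branch-free clamp loop on the CPU: the bounds check becomes
// arithmetic, with no if/else inside the hot loop.
void ClampValues( const int* input, int* output, int length, int lo, int hi )
{
    for ( int i = 0; i < length; ++i )
    {
        output[ i ] = std::min( std::max( input[ i ], lo ), hi );
    }
}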