CUDA - typical program structure

Global variables declaration

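  • host variables - ordinary C/C++ globals, no qualifier (__host__ and __global__ qualify functions, not variables)

  • __device__ - variable in device global memory

  • __constant__ - variable in device constant memory

  • texture - texture reference (legacy API; newer code uses texture objects)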
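
The file-scope declarations above can be sketched as follows. This is a minimal illustration, not a prescribed layout: the names h_count, d_scale, c_offsets and applyScale are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// All names below are hypothetical, chosen only for illustration.
int h_count = 4;                    // ordinary host global, no qualifier
__device__ float d_scale = 2.0f;    // resides in device global memory
__constant__ float c_offsets[4];    // resides in device constant memory

// Reads both device-side globals and writes one result per thread.
__global__ void applyScale(float *out)
{
    int i = threadIdx.x;
    out[i] = d_scale * c_offsets[i];
}

int main()
{
    float h_offsets[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float h_out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float *d_out = nullptr;

    // __constant__ (and __device__) variables are written from the host
    // by symbol, not through a pointer returned by cudaMalloc.
    cudaMemcpyToSymbol(c_offsets, h_offsets, sizeof(h_offsets));

    cudaMalloc(&d_out, sizeof(h_out));
    applyScale<<<1, h_count>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    for (int i = 0; i < h_count; ++i)
        printf("out[%d] = %g\n", i, h_out[i]);
    return 0;
}
```

Host code cannot read d_scale or c_offsets directly; it goes through cudaMemcpyToSymbol / cudaMemcpyFromSymbol, while kernels use them like ordinary globals.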

Function prototypes

  • __global__ void kernelOne(...)

  • __device__ / __host__ float handyFunction(...)
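
As a sketch of these prototypes, the example below declares a kernel and a helper compiled for both host and device. The parameter lists and the body of handyFunction are assumptions made for illustration; the outline leaves them unspecified.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prototypes. The argument lists are hypothetical.
__global__ void kernelOne(float *data, int n);      // called from host, runs on device
__host__ __device__ float handyFunction(float x);   // callable from host and device code

// Assumed body: any small pure function would do here.
__host__ __device__ float handyFunction(float x)
{
    return x * x + 1.0f;
}

__global__ void kernelOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = handyFunction(data[i]);   // device-side call
}

int main()
{
    float h_value = 3.0f;
    float *d_value = nullptr;

    float hostResult = handyFunction(h_value);   // host-side call

    cudaMalloc(&d_value, sizeof(float));
    cudaMemcpy(d_value, &h_value, sizeof(float), cudaMemcpyHostToDevice);
    kernelOne<<<1, 1>>>(d_value, 1);
    cudaMemcpy(&h_value, d_value, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_value);

    printf("host: %g, device: %g\n", hostResult, h_value);
    return 0;
}
```

A __global__ function must return void and is launched with the <<<...>>> syntax; a __device__ function can only be called from device code unless it is also marked __host__.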

main()

  • allocate memory space on the device - cudaMalloc(&d_GlobalVarPtr, bytes)

  • transfer data from host to device - cudaMemcpy(d_GlobalVarPtr, h_GlobalVa...)

  • execution configuration setup

  • kernel call - kernelOne<<<execution configuration>>>(args ...);

  • transfer results from device to host - cudaMemcpy(h_GlobalVarPtr, d_Global...)

  • free memory space on device - cudaFree(d_GlobalVarPtr);
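
The host-side steps above can be put together roughly as follows. This is a sketch under stated assumptions: the kernel body (doubling each element), the element count, the block size of 256 and the check helper are all hypothetical; only the pointer names and the runtime calls come from the outline.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel body: doubles each element.
__global__ void kernelOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

// Hypothetical helper: abort with a message if a runtime call failed.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main()
{
    const int n = 1000;                       // assumed problem size
    const size_t bytes = n * sizeof(float);

    float *h_GlobalVarPtr = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i)
        h_GlobalVarPtr[i] = (float)i;

    // Allocate memory space on the device.
    float *d_GlobalVarPtr = nullptr;
    check(cudaMalloc(&d_GlobalVarPtr, bytes), "cudaMalloc");

    // Transfer data from host to device.
    check(cudaMemcpy(d_GlobalVarPtr, h_GlobalVarPtr, bytes,
                     cudaMemcpyHostToDevice), "copy to device");

    // Execution configuration: enough 256-thread blocks to cover n elements.
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);

    // Kernel call. Launch errors are reported via cudaGetLastError.
    kernelOne<<<grid, block>>>(d_GlobalVarPtr, n);
    check(cudaGetLastError(), "kernel launch");

    // Transfer results from device to host (waits for the kernel to finish).
    check(cudaMemcpy(h_GlobalVarPtr, d_GlobalVarPtr, bytes,
                     cudaMemcpyDeviceToHost), "copy to host");

    // Free memory space on the device.
    check(cudaFree(d_GlobalVarPtr), "cudaFree");

    printf("h_GlobalVarPtr[10] = %g\n", h_GlobalVarPtr[10]);
    free(h_GlobalVarPtr);
    return 0;
}
```

The grid size is rounded up so that every element gets a thread, which is why the kernel guards its access with i < n.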

Kernel - __global__ void kernelOne(type args,...)

  • variables declaration:

    • __shared__

    • automatic variables transparently assigned to registers or local memory

  • __syncthreads()...
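
A kernel with this shape might look like the sketch below, which reverses one block of integers through shared memory. The task, the BLOCK constant and the argument list are assumptions chosen to show __shared__, automatic variables and __syncthreads() together; error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 64   // hypothetical block size; the launch below must match it

// Reverses BLOCK integers in place. Intended for a single block of BLOCK threads.
__global__ void kernelOne(int *data)
{
    __shared__ int tile[BLOCK];   // one copy per block, visible to all its threads

    int t = threadIdx.x;          // automatic variables: normally kept in registers
    int r = BLOCK - 1 - t;

    tile[t] = data[t];            // each thread stages one element
    __syncthreads();              // wait until every thread in the block has written
    data[t] = tile[r];            // now safe to read another thread's element
}

int main()
{
    int h_data[BLOCK];
    for (int i = 0; i < BLOCK; ++i)
        h_data[i] = i;

    int *d_data = nullptr;
    cudaMalloc(&d_data, sizeof(h_data));
    cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice);

    kernelOne<<<1, BLOCK>>>(d_data);

    cudaMemcpy(h_data, d_data, sizeof(h_data), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    // Verify on the host that element i now holds BLOCK - 1 - i.
    int ok = 1;
    for (int i = 0; i < BLOCK; ++i)
        if (h_data[i] != BLOCK - 1 - i)
            ok = 0;
    printf("%s\n", ok ? "reversed correctly" : "mismatch");
    return 0;
}
```

Without the __syncthreads() barrier a thread could read tile[r] before the thread responsible for that slot has written it. The barrier only synchronizes threads within one block, so it must be reached by all of them.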
