CUDA -- Compute Unified Device Architecture
A C/C++-like language developed by Nvidia for parallel computing on GPUs
Requires an Nvidia GPU:
    $ lspci | grep -i nvidia
function ~ kernel: a kernel is a function that runs on the GPU
Compile with Nvidia's compiler, nvcc:
    $ nvcc hello.cu -o hello_cu
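As a complement to the lspci check, the CUDA runtime itself can report which GPUs it sees. The following is a minimal sketch, assuming the CUDA toolkit and driver are installed; the helper name count_cuda_devices and the file name device_query.cu are illustrative and not part of the original slides.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of CUDA-capable devices,
// or -1 if the runtime reports an error (for example, no driver installed).
int count_cuda_devices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return count;
}

int main() {
    int count = count_cuda_devices();
    if (count <= 0) {
        printf("No usable CUDA device found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
            // prop.name is the marketing name of the GPU.
            printf("Device %d: %s (compute capability %d.%d)\n",
                   i, prop.name, prop.major, prop.minor);
        }
    }
    return 0;
}
```

It can be built the same way as the hello example, e.g. `nvcc device_query.cu -o device_query`.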
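C
    #include <stdio.h>

    void c_hello(){
        printf("Hello World!\n");
    }

    int main() {
        printf("Hello 0\n");
        c_hello();
        return 0;
    }

CUDA
    #include <stdio.h>

    // __global__ marks a kernel: it runs on the GPU and is launched from the CPU
    __global__ void cuda_hello(){
        printf("Hello World from GPU!\n");
    }

    int main() {
        printf("Hello 0\n");
        cuda_hello<<<1,1>>>();  // asynchronous launch; main() does not wait for the kernel
        return 0;
    }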
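In the launch syntax `<<<1,1>>>`, the first number is the number of thread blocks and the second is the number of threads per block. The sketch below varies those numbers to make the parallelism visible; the kernel name cuda_hello_indexed and the choice of 2 blocks of 4 threads are illustrative assumptions, not from the original slides.

```cuda
#include <stdio.h>

// Each thread prints the built-in indices that identify it.
__global__ void cuda_hello_indexed() {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    // <<<2, 4>>>: 2 blocks of 4 threads each, so 8 threads run the kernel.
    cuda_hello_indexed<<<2, 4>>>();
    // Block the CPU until the kernel finishes so its output is flushed.
    cudaDeviceSynchronize();
    return 0;
}
```

The order in which the lines appear is not guaranteed, since the blocks may be scheduled in any order.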
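1. Copy data from main memory to GPU memory
2. CPU initiates the GPU compute kernel
3. GPU's CUDA cores execute the kernel in parallel
4. Copy the resulting data from GPU memory to main memory

Kernel launches are asynchronous: the CPU continues without waiting for the GPU. Add cudaDeviceSynchronize(); after the launch so the CPU waits for the kernel to finish and the kernel's printf output is flushed before the program exits.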
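The four steps can be sketched with a small vector addition. This is one possible illustration rather than a definitive implementation: the names vector_add and add_on_gpu, the array size, and the block size of 256 threads are all assumptions made for the example.

```cuda
#include <stdio.h>

// Kernel: each thread adds one pair of elements.
__global__ void vector_add(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical helper: computes out = a + b on the GPU.
// Returns 0 on success, 1 on any CUDA error.
int add_on_gpu(const float *a, const float *b, float *out, int n) {
    size_t bytes = (size_t)n * sizeof(float);
    float *d_a = NULL, *d_b = NULL, *d_out = NULL;
    cudaError_t err = cudaSuccess;

    // Allocate GPU memory for the inputs and the output.
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_out, bytes);

    // Step 1: copy data from main memory to GPU memory.
    if (err == cudaSuccess) err = cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // Steps 2 and 3: the CPU launches the kernel; GPU threads run it in parallel.
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
        vector_add<<<blocks, threads>>>(d_a, d_b, d_out, n);
        err = cudaGetLastError();  // catches launch errors
    }

    // Wait for the kernel to finish and surface any execution error.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    // Step 4: copy the resulting data from GPU memory to main memory.
    if (err == cudaSuccess) err = cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost);

    // cudaFree(NULL) is a no-op, so this is safe after a partial failure.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main() {
    const int n = 1000;
    float a[n], b[n], out[n];
    for (int i = 0; i < n; ++i) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }
    if (add_on_gpu(a, b, out, n) != 0) {
        return 1;
    }
    printf("out[10] = %.1f\n", out[10]);  // 10 + 20, so 30.0 on a working GPU
    return 0;
}
```

A blocking cudaMemcpy already waits for earlier kernels, so the explicit cudaDeviceSynchronize() here mainly serves error checking. In the hello example there is no copy back, which is why the synchronize call is needed there.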