"Hello World" in CUDA


CUDA -- Compute Unified Device Architecture

A C/C++-like language (an extension of C/C++) developed by NVIDIA
for parallel computing on GPUs

Requires an NVIDIA GPU. To check whether the machine has one:
  $ lspci | grep -i nvidia
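
The lspci check only shows that the hardware is present. As a complement, the sketch below asks the CUDA runtime which devices it can actually use. It assumes the CUDA toolkit and driver are installed; the helper name count_cuda_devices is made up for this example.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of CUDA-capable devices,
// or -1 if the runtime reports an error (for example, no driver installed).
int count_cuda_devices(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return count;
}

int main(void) {
    int count = count_cuda_devices();
    if (count <= 0) {
        printf("No usable CUDA device found\n");
        return 1;
    }
    // Print the name of every device the runtime can see.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
            printf("Device %d: %s\n", i, prop.name);
        }
    }
    return 0;
}
```

Compile it with nvcc like any other .cu file; the file name is up to you.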







C

#include <stdio.h>

/* Ordinary C function: runs on the CPU */
void c_hello(){
    printf("Hello World!\n");
}

int main() {
    printf("Hello 0\n");
    c_hello();
    return 0;
}




 



CUDA

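#include <stdio.h>

// __global__ marks a function that runs on the GPU and is called from the CPU
__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

int main() {
    printf("Hello 0\n");
    cuda_hello<<<1,1>>>();   // launch on the GPU: 1 block of 1 thread
    return 0;
}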





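A function that runs on the GPU is called a kernel.
It is declared __global__ and launched from the CPU as name<<<blocks, threads>>>().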

Compile with nvcc:
  $ nvcc hello.cu -o hello_cu
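
To make the kernel idea more concrete, here is a minimal sketch that launches more than one thread. The kernel name whoami and the launch size <<<2, 4>>> are chosen only for illustration. Every thread runs the same kernel body but sees different values of the built-in variables blockIdx and threadIdx.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Each GPU thread prints its own block index and thread index.
__global__ void whoami(void) {
    printf("block %u, thread %u\n", blockIdx.x, threadIdx.x);
}

int main(void) {
    // <<<2, 4>>> launches 2 blocks of 4 threads each: 8 copies of the kernel.
    whoami<<<2, 4>>>();

    // Wait for the GPU to finish so the output appears before main() returns.
    cudaDeviceSynchronize();
    return 0;
}
```

This should print eight lines, one per thread. The order in which the blocks print is not guaranteed.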


Typical processing flow of a CUDA program:

  1.  Copy the input data from main memory to GPU memory
  2.  The CPU launches the GPU compute kernel
  3.  The GPU's CUDA cores execute the kernel in parallel
  4.  Copy the resulting data from GPU memory back to main memory
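
The four steps above can be sketched with a vector addition. This is one possible illustration, not part of the original hello program: the names vec_add, blocks_for and gpu_vec_add, the array size, and the choice of 256 threads per block are all assumptions made for the example.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   // guard: the last block may be partly idle
}

// Number of blocks needed to cover n elements (rounds up).
int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Computes c = a + b on the GPU; returns 0 on success, 1 on any CUDA error.
int gpu_vec_add(const float *a, const float *b, float *c, int n) {
    size_t bytes = n * sizeof(float);
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;   // pointers into GPU memory
    int ok = cudaMalloc(&d_a, bytes) == cudaSuccess &&
             cudaMalloc(&d_b, bytes) == cudaSuccess &&
             cudaMalloc(&d_c, bytes) == cudaSuccess;

    // Step 1: copy the inputs from main memory to GPU memory.
    ok = ok && cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess
            && cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // Steps 2 and 3: the CPU launches the kernel, the GPU runs it in parallel.
        vec_add<<<blocks_for(n, 256), 256>>>(d_a, d_b, d_c, n);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // Step 4: copy the result back; this cudaMemcpy waits for the kernel to finish.
    ok = ok && cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return ok ? 0 : 1;
}

int main(void) {
    const int n = 1000;
    float a[n], b[n], c[n];
    for (int i = 0; i < n; ++i) {
        a[i] = (float)i;
        b[i] = 2.0f * i;
    }
    if (gpu_vec_add(a, b, c, n) != 0) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    printf("c[10] = %.1f\n", c[10]);   // 10 + 20, so 30.0 on a working GPU
    return 0;
}
```

The hello program skips steps 1 and 4 because its kernel takes no input and returns no data; it only prints.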

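A kernel launch is asynchronous: the CPU continues immediately, so main()
can return before the kernel has run and the GPU's output never appears.
Add
  cudaDeviceSynchronize();
after the launch to make the CPU wait until the GPU has finished. This also
flushes the GPU's printf buffer to the screen.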
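
For reference, a minimal sketch of the complete hello program with the synchronization call in place. The error check around cudaDeviceSynchronize() is an addition for illustration and is not required for the program to print.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void cuda_hello(void) {
    printf("Hello World from GPU!\n");
}

int main(void) {
    printf("Hello 0\n");
    cuda_hello<<<1, 1>>>();   // asynchronous: control returns to the CPU at once

    // Block until the kernel has finished; the GPU's printf output is flushed here.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

On a machine with a working GPU this should print "Hello 0" followed by "Hello World from GPU!".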