A General-purpose Task-parallel Programming System using Modern C++
Sample codes for my CUDA programming book
Thin, unified, C++-flavored wrappers for the CUDA APIs
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Mo...
A simple GPU hash table implemented in CUDA using lock-free techniques
This is an archive of materials produced for an introductory class on CU...
From zero to hero: CUDA for accelerating maths and machine learning on GPUs.
An implementation of HIP that works on CPUs, across OSes.
CUDA kernel author's tools
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-...
CUDA Guide