What Is GPU Programming?

In high-performance computing, GPUs (graphics processing units) are everywhere, and scientists across many fields face a growing challenge: adapting their programs to make the most of these GPU systems. Doing so is essential to unlock the full power of modern GPU architectures.

In this article, we will give a basic understanding of GPU programming for those looking for an answer to the question “What is GPU programming?” You will also explore the different GPU programming approaches and frameworks, as well as CUDA, the most popular GPU programming platform, and its importance in GPU coding.

What Is a GPU?

A graphics processing unit (GPU) is an electronic circuit that accelerates the creation of images for display by rapidly manipulating memory and processing many data elements simultaneously. GPUs are now ubiquitous, found in computing devices such as personal computers, gaming consoles, professional workstations, and even smartphones.

Over time, GPUs have come to surpass CPUs in raw parallel throughput, packing more transistors and excelling at parallel computation. Having evolved into intricate devices with their own memory, buses, and processors, a GPU can be seen as an additional brain or processor for the computer system.

What Is GPU Programming?

GPU programming uses GPU accelerators to run many general-purpose computations in parallel. In the past, GPUs were used mainly for computer graphics; today they also handle workloads such as scientific modeling, machine learning, and other jobs that speed up dramatically when split across many tasks running simultaneously.

What Is the Difference Between a GPU and a CPU?

The main contrast between computing on a GPU and a CPU lies in their architectural differences. Due to its highly parallel nature, a GPU is more effective than a CPU at handling data that can be split up and processed in parallel. GPUs are specifically optimized for intensive computations such as matrix arithmetic and floating-point arithmetic.

The reason for the difference in processing power between CPUs and GPUs is that GPUs are designed to do highly parallel and compute-intensive jobs, which is exactly what graphics rendering requires. 

GPU architecture prioritizes data processing over data caching and flow control. Two notable benefits emerge when tackling a problem in parallel: first, the same operation is applied to every data element, reducing the need for intricate flow control; second, processing a sizable dataset with high arithmetic intensity hides memory-access latency, decreasing the need for large low-latency caches.
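
A minimal sketch of this data-parallel pattern: the same operation is applied independently to every element, with no control flow linking iterations. On a CPU the loop below runs serially; on a GPU, each iteration can map to its own hardware thread.

import numpy as np

# Two input vectors and an output buffer (sizes are illustrative).
a = np.random.rand(100_000)
b = np.random.rand(100_000)
c = np.empty_like(a)

# Every iteration performs the same operation on its own element,
# so all iterations could, in principle, run at the same time.
for i in range(a.size):
    c[i] = a[i] + b[i]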

What Approaches and APIs Are Used for GPU Programming?

Using GPUs for computations can be approached in different ways. In the simplest cases, where GPU-enabled code already exists, adjusting parameters and initial configurations may be sufficient. Alternatively, GPU-accelerated libraries can take over the most intensive parts of the code. For more complex scenarios, however, some GPU programming becomes necessary.

Various GPU programming software environments and APIs are available, categorized into directive-based models, non-portable kernel-based models, and portable kernel-based models, as well as high-level frameworks and libraries.

Standard C++/Fortran

Programs written in standard C++ and Fortran can leverage NVIDIA GPUs without external libraries: the NVIDIA HPC SDK compilers can offload standard parallel constructs, such as C++ parallel algorithms and Fortran do concurrent loops, to the GPU.

Directive-Based Programming

Directive-based approaches involve annotating existing serial code to indicate loops and regions for GPU execution. This method, exemplified by OpenACC and OpenMP, is focused on productivity and ease of use but may sacrifice some performance.

OpenACC: Developed by a consortium formed in 2010, OpenACC aims to provide a standard, portable, and scalable programming model for accelerators, including GPUs. It initially supported only NVIDIA GPUs but has been expanding to more devices and architectures.

OpenMP: Originally a shared-memory parallel programming API for multi-core CPUs, OpenMP now supports GPU offloading, targeting various types of GPUs.

Non-Portable Kernel-Based Models (Native Programming Models)

Direct GPU programming gives developers greater control, using low-level code that communicates directly with the GPU. Examples include CUDA (NVIDIA) and HIP (AMD), with HIP designed for cross-compatibility between NVIDIA and AMD GPUs.

Portable Kernel-Based Models (Cross-Platform Portability Ecosystems)

Cross-platform portability ecosystems, such as Alpaka, Kokkos, OpenCL, RAJA, and SYCL, offer higher-level abstractions for convenient and portable GPU programming. They aim for performance portability with a single-source application.

Kokkos: Developed at Sandia National Laboratories, Kokkos is a C++-based ecosystem for efficient and scalable parallel applications across diverse hardware, including CPUs and GPUs.

OpenCL: An open-standard API for general-purpose parallel computing on various hardware, OpenCL provides a low-level programming interface for GPU programming. It supports CPUs, GPUs, and FPGAs from multiple vendors.

SYCL: A royalty-free, open-standard C++ programming model, SYCL provides a high-level, single-source programming model for heterogeneous systems, including GPUs. It can be implemented on top of OpenCL, CUDA, or HIP.

High-Level Language Support

Python supports GPU programming with several libraries, each serving unique purposes:

CuPy

  • GPU-based data array library.
  • Seamless integration with NumPy/SciPy.
  • Easy transition to GPU computing by replacing 'numpy' and 'scipy' with 'cupy' and 'cupyx.scipy' in Python code, as the sketch below shows.
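
A minimal sketch of that drop-in transition, assuming CuPy is installed and an NVIDIA GPU is available; swapping the import back to NumPy would run the same code on the CPU:

import cupy as xp   # change to "import numpy as xp" and the rest still runs

a = xp.random.rand(4096, 4096)   # array allocated in GPU memory
b = xp.fft.fft2(a)               # FFT executed on the GPU
print(xp.abs(b).mean())          # result synced back to the host for printing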

cuDF (RAPIDS)

  • Part of RAPIDS, implementing CUDA functionalities with Python bindings.
  • Designed for data frame manipulation on the GPU.
  • Offers a pandas-like API, enabling users familiar with pandas to accelerate their work without extensive CUDA knowledge (see the sketch below).
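
A minimal cuDF sketch, assuming the RAPIDS cudf package is installed; the groupby code is identical to its pandas equivalent:

import cudf

df = cudf.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})
print(df.groupby("key")["val"].sum())   # aggregation runs on the GPU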

PyCUDA

  • Python programming environment for CUDA.
  • Enables access to NVIDIA's CUDA API from Python.
  • Limited to NVIDIA GPUs and requires CUDA programming expertise, as the sketch below illustrates.
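
A minimal PyCUDA sketch, assuming pycuda and the CUDA toolkit are installed; note that the kernel itself is written in CUDA C, which is why CUDA expertise is required:

import numpy as np
import pycuda.autoinit                    # creates a CUDA context on import
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Compile a CUDA C kernel that doubles each element in place.
mod = SourceModule("""
__global__ void double_them(float *a)
{
    int i = threadIdx.x;
    a[i] *= 2.0f;
}
""")
double_them = mod.get_function("double_them")

a = np.random.randn(128).astype(np.float32)
double_them(drv.InOut(a), block=(128, 1, 1), grid=(1, 1))  # copy in, run, copy out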

Numba

  • Allows Just-In-Time (JIT) compilation of Python code for GPU execution, as the sketch below shows.
  • Supports NVIDIA GPUs; earlier experimental support for AMD GPUs has been deprecated.
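
A minimal Numba sketch, assuming numba is installed and a CUDA-capable GPU is present; the decorated Python function is JIT-compiled into a GPU kernel:

import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)          # global thread index
    if i < x.size:            # guard: more threads may launch than elements
        out[i] = x[i] + y[i]

n = 100_000
x = np.arange(n, dtype=np.float32)
y = 2 * x
out = np.empty_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
add_kernel[blocks, threads_per_block](x, y, out)  # arrays copied to/from GPU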

These libraries provide diverse options for GPU programming in Python, accommodating different user preferences and requirements. Whether transitioning from NumPy/SciPy with CuPy or utilizing cuDF's pandas-like API, Python offers flexibility for efficient GPU-accelerated computations.

What Is CUDA?

CUDA is a parallel computing platform and API created by NVIDIA, and it marked the beginning of mainstream GPU programming frameworks. It empowers developers to write C++-like code that runs on the GPU.

CUDA not only provides a wealth of libraries and tools for low-level GPU programming, but it also dramatically improves the efficiency of computationally demanding applications. Note, however, that even with such a strong ecosystem, CUDA is limited to NVIDIA hardware.
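
To make "C++-like code" concrete, here is a hedged sketch of a raw CUDA C kernel compiled and launched from Python through CuPy's RawKernel (it assumes CuPy and the CUDA toolkit are installed):

import cupy as cp

# CUDA C source for a SAXPY kernel: out = a * x + y, one thread per element.
saxpy = cp.RawKernel(r'''
extern "C" __global__
void saxpy(const float a, const float* x, const float* y, float* out, int n)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) out[i] = a * x[i] + y[i];
}
''', 'saxpy')

n = 1 << 20
x = cp.random.rand(n, dtype=cp.float32)
y = cp.random.rand(n, dtype=cp.float32)
out = cp.empty_like(x)

# Launch: grid and block sizes first, then the kernel arguments.
saxpy(((n + 255) // 256,), (256,), (cp.float32(2.0), x, y, out, cp.int32(n)))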

How to Install CUDA on Ubuntu 20.04 for GPU Programming

You can install CUDA on Ubuntu 20.04 by following the step-by-step instructions below:

Step 1: Download the CUDA pin configuration file (cuda-ubuntu2004.pin), which is needed to set up the CUDA repository on Ubuntu 20.04, using the ‘wget’ or ‘curl’ command.

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin

Step 2: Move the downloaded pin configuration file into apt's preferences directory (/etc/apt/preferences.d/) so that packages from the CUDA repository are pinned with priority 600.

$ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600

Step 3: Download the CUDA Repository Package:

$ wget https://developer.download.nvidia.com/compute/cuda/11.4.2/local_installers/cuda-repo-ubuntu2004-11-4-local_11.4.2-470.57.02-1_amd64.deb

Step 4: Install the downloaded CUDA repository package using the dpkg package manager.

$ sudo dpkg -i cuda-repo-ubuntu2004-11-4-local_11.4.2-470.57.02-1_amd64.deb

Step 5: Add the public key required for authenticating the CUDA repository.

$ sudo apt-key add /var/cuda-repo-ubuntu2004-11-4-local/7fa2af80.pub

Step 6: Update the local package lists with information from the newly added CUDA repository.

$ sudo apt-get update

Step 7: Now, install the CUDA toolkit on your Ubuntu 20.04 machine using the following command:

$ sudo apt-get -y install cuda
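
Optionally, you can verify the installation by querying the compiler version; the toolkit typically lands under /usr/local/cuda, whose bin directory may need to be added to your PATH first:

$ /usr/local/cuda/bin/nvcc --version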

By following the steps above, you can install CUDA on a machine equipped with a CUDA-capable GPU running Ubuntu 20.04. Now, you can run GPU code and use the CUDA platform on your Linux machine.

Conclusion

Choosing a GPU programming environment depends on factors like targeted hardware platforms, the nature of computation, and developer experience. Compatibility with specific GPUs, alignment with project requirements, and developer preferences for high-level simplicity or low-level control are critical considerations for making an optimal choice.