Course Outline

Introduction

  • What is GPU programming?
  • Why use GPU programming?
  • What are the challenges and trade-offs of GPU programming?
  • What are the frameworks and tools for GPU programming?
  • Choosing the right framework and tool for your application

OpenCL

  • What is OpenCL?
  • What are the advantages and disadvantages of OpenCL?
  • Setting up the development environment for OpenCL
  • Creating a basic OpenCL program that performs vector addition
  • Using the OpenCL API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize work-items
  • Using the OpenCL C language to write kernels that execute on the device and manipulate data
  • Using OpenCL built-in functions, variables, and libraries to perform common tasks and operations
  • Using OpenCL memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses
  • Using the OpenCL execution model to control the work-items, work-groups, and ND-ranges that define the parallelism
  • Debugging and testing OpenCL programs using tools such as CodeXL
  • Optimizing OpenCL programs using techniques such as coalescing, caching, prefetching, and profiling
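
The work-item, work-group, and ND-range bullets above rest on a small piece of index arithmetic. The sketch below shows it in plain C++ that runs on the CPU. It is not OpenCL C and calls no OpenCL API. The function names (global_id, padded_global_size, num_groups) are hypothetical, and a one-dimensional range with a zero global offset is assumed.

```cpp
#include <cstddef>

// Global ID of a work-item in one dimension, assuming a zero global offset.
// Mirrors the OpenCL identity
//   get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)
constexpr std::size_t global_id(std::size_t group_id,
                                std::size_t local_size,
                                std::size_t local_id) {
    return group_id * local_size + local_id;
}

// Smallest multiple of local_size that is >= n. Host code commonly pads the
// global size this way so that it divides evenly into work-groups.
constexpr std::size_t padded_global_size(std::size_t n, std::size_t local_size) {
    return ((n + local_size - 1) / local_size) * local_size;
}

// Number of work-groups for a global size that is a multiple of local_size.
constexpr std::size_t num_groups(std::size_t global_size, std::size_t local_size) {
    return global_size / local_size;
}
```

With 1000 elements and a work-group size of 64, padded_global_size gives 1024, which is 16 work-groups, and work-item 5 of group 2 has global ID 133. Work-items with IDs 1000 to 1023 fall outside the data, so a kernel launched this way would need a bounds check.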

CUDA

  • What is CUDA?
  • What are the advantages and disadvantages of CUDA?
  • Setting up the development environment for CUDA
  • Creating a basic CUDA program that performs vector addition
  • Using the CUDA API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
  • Using the CUDA C/C++ language to write kernels that execute on the device and manipulate data
  • Using CUDA built-in functions, variables, and libraries to perform common tasks and operations
  • Using CUDA memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses
  • Using the CUDA execution model to control the threads, blocks, and grids that define the parallelism
  • Debugging and testing CUDA programs using tools such as CUDA-GDB, Compute Sanitizer (the successor to CUDA-MEMCHECK), and NVIDIA Nsight
  • Optimizing CUDA programs using techniques such as coalescing, caching, prefetching, and profiling
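
The vector-addition and threads/blocks/grids bullets above can be illustrated without a GPU. The sketch below is plain C++17 on the CPU, not CUDA C++, and makes no CUDA runtime call. The nested loops stand in for the blocks and threads that a GPU would run in parallel. The names grid_dim_for and emulate_vector_add are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Number of blocks needed so that grid_dim * block_dim >= n (ceiling division).
constexpr std::size_t grid_dim_for(std::size_t n, std::size_t block_dim) {
    return (n + block_dim - 1) / block_dim;
}

// CPU stand-in for a one-dimensional vector-addition kernel launch.
// `block` plays the role of blockIdx.x and `thread` the role of threadIdx.x.
template <std::size_t N>
constexpr std::array<float, N> emulate_vector_add(const std::array<float, N>& a,
                                                  const std::array<float, N>& b,
                                                  std::size_t block_dim) {
    std::array<float, N> c{};
    const std::size_t grid_dim = grid_dim_for(N, block_dim);
    for (std::size_t block = 0; block < grid_dim; ++block) {
        for (std::size_t thread = 0; thread < block_dim; ++thread) {
            // Global index, as in blockIdx.x * blockDim.x + threadIdx.x.
            const std::size_t i = block * block_dim + thread;
            if (i < N) {  // guard: the last block may extend past the data
                c[i] = a[i] + b[i];
            }
        }
    }
    return c;
}
```

With 5 elements and 2 threads per block, grid_dim_for gives 3 blocks, so 6 index values are generated and the guard skips the last one.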

ROCm

  • What is ROCm?
  • What are the advantages and disadvantages of ROCm?
  • Setting up the development environment for ROCm
  • Creating a basic ROCm program that performs vector addition
  • Using the ROCm API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
  • Using the C/C++-based kernel languages available on ROCm (HIP C++ and OpenCL C) to write kernels that execute on the device and manipulate data
  • Using ROCm built-in functions, variables, and libraries to perform common tasks and operations
  • Using ROCm memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses
  • Using the ROCm execution model to control the threads, blocks, and grids that define the parallelism
  • Debugging and testing ROCm programs using tools such as the ROCm debugger (ROCgdb) and the ROCm profiler (rocprof)
  • Optimizing ROCm programs using techniques such as coalescing, caching, prefetching, and profiling

HIP

  • What is HIP?
  • What are the advantages and disadvantages of HIP?
  • Setting up the development environment for HIP
  • Creating a basic HIP program that performs vector addition
  • Using the HIP language to write kernels that execute on the device and manipulate data
  • Using HIP built-in functions, variables, and libraries to perform common tasks and operations
  • Using HIP memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses
  • Using the HIP execution model to control the threads, blocks, and grids that define the parallelism
  • Debugging and testing HIP programs using tools such as the ROCm debugger (ROCgdb) and the ROCm profiler (rocprof)
  • Optimizing HIP programs using techniques such as coalescing, caching, prefetching, and profiling
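
Coalescing, named in the optimization bullets of each framework section, can be made concrete with a deliberately simplified model. The C++ sketch below counts how many memory segments one group of threads touches when thread t reads element t * stride. The parameters are assumptions, not hardware facts: 4-byte elements, a group of 32 threads (the size of a CUDA warp; AMD wavefronts are 32 or 64 wide depending on the architecture), 128-byte segments, and a segment-aligned base address. The function name segments_touched is hypothetical.

```cpp
#include <cstddef>

// Simplified coalescing model: thread t of a group reads the element at
// index t * stride. Returns how many distinct segment_bytes-sized segments
// the whole group touches, assuming the array starts on a segment boundary.
// Fewer segments means the accesses combine into fewer memory transactions.
constexpr std::size_t segments_touched(std::size_t stride,
                                       std::size_t elem_bytes = 4,
                                       std::size_t group_size = 32,
                                       std::size_t segment_bytes = 128) {
    std::size_t count = 0;
    std::size_t previous = 0;
    for (std::size_t t = 0; t < group_size; ++t) {
        // Byte addresses never decrease with t, so a new segment shows up
        // as a change from the previous thread's segment index.
        const std::size_t segment = (t * stride * elem_bytes) / segment_bytes;
        if (t == 0 || segment != previous) {
            ++count;
            previous = segment;
        }
    }
    return count;
}
```

Under these assumptions a unit stride touches 1 segment, a stride of 2 touches 2, and a stride of 32 touches 32, one per thread. Real devices add caches and other transaction rules, so the numbers indicate a trend rather than measured behavior.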

Comparison

  • Comparing the features, performance, and compatibility of OpenCL, CUDA, ROCm, and HIP
  • Evaluating GPU programs using benchmarks and metrics
  • Learning the best practices and tips for GPU programming
  • Exploring the current and future trends and challenges of GPU programming
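
Two metrics that often appear when evaluating GPU programs are speedup over a baseline and effective memory bandwidth. The C++ sketch below shows how they could be computed from measured times. The function names are hypothetical, the timings are assumed to come from elsewhere, and a gigabyte is taken as 10^9 bytes.

```cpp
// Speedup of a GPU run relative to a baseline run, both measured in seconds.
constexpr double speedup(double baseline_seconds, double gpu_seconds) {
    return baseline_seconds / gpu_seconds;
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes): total bytes the kernel
// reads and writes, divided by its execution time.
constexpr double effective_bandwidth_gbps(double bytes_read,
                                          double bytes_written,
                                          double seconds) {
    return (bytes_read + bytes_written) / 1e9 / seconds;
}
```

For example, a vector addition over 250,000,000 floats reads 2e9 bytes and writes 1e9 bytes, so a kernel time of 0.01 s corresponds to 300 GB/s. Comparing that figure with the device's peak bandwidth indicates how close a memory-bound kernel is to the hardware limit.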

Summary and Next Steps

Requirements

  • An understanding of the C/C++ language and parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors

Audience

  • Developers who wish to learn the basics of GPU programming and the main frameworks and tools for developing GPU applications
  • Developers who wish to write portable and scalable code that can run on different platforms and devices
  • Programmers who wish to explore the benefits and challenges of GPU programming and optimization

Duration

  • 21 hours
