Arm DDT with GPUs
Debugging GPU Codes with Arm DDT
See the Arm Forge User Guide for the full details; this is just a place for some notes and tips.
- Use `-g -O0` to compile host code and `-g -G -cudart shared` to compile device code. `-g -G` enables debugging symbols and `-cudart shared` enables device memory debugging.
- A device pointer's contents can be accessed by prepending `@global` to its type. A permanent version of this can be set up with expressions:
  - `((@global TYPE *)(VARIABLE_NAME))` to get the proper pointer
  - `((@global TYPE *)(VARIABLE_NAME))[IDX]` to get the value at `IDX`
  - `((@global TYPE *)(VARIABLE_NAME))[IDX]@N` to get `N` values starting at `IDX`
- The Expressions panel can also be used for any other debugging or math expression you want.
- Array Viewer: any expression can go in the brackets and be displayed in 2D. The correct indexing scheme for Cholla is `xid + yid*nx + zid*nx*ny + field*n_cells`.
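To make the indexing scheme concrete, here is a minimal C++ sketch of the same arithmetic. The `Grid` struct and the `flat_index` helper are hypothetical names used only for illustration, not Cholla's actual API. It also assumes that `n_cells` equals `nx*ny*nz`, which the notes above do not state explicitly.

```cpp
#include <cstddef>

// Hypothetical grid description for illustration only (not Cholla's real types).
// Assumption: n_cells is the total cell count nx * ny * nz.
struct Grid {
  std::size_t nx, ny, nz;
  constexpr std::size_t n_cells() const { return nx * ny * nz; }
};

// Flat offset of cell (xid, yid, zid) within field number `field`,
// following xid + yid*nx + zid*nx*ny + field*n_cells.
constexpr std::size_t flat_index(const Grid& g, std::size_t xid, std::size_t yid,
                                 std::size_t zid, std::size_t field) {
  return xid + yid * g.nx + zid * g.nx * g.ny + field * g.n_cells();
}

// Compile-time sanity check on a 4 x 3 x 2 grid (24 cells per field):
// 1 + 2*4 + 1*4*3 + 2*24 = 69
static_assert(flat_index(Grid{4, 3, 2}, 1, 2, 1, 2) == 69, "indexing mismatch");
```

Working a small case like this by hand is a quick way to check that the expression typed into the Array Viewer addresses the cell and field you intended.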
Arm MAP
`map --profile --cuda-kernel-analysis --cuda-transfer-analysis mpirun -n 4 EXECUTABLE ARGS`
- `--profile`: Standard flag for profiling
- `--cuda-kernel-analysis`: Enables CUDA kernel profiling
- `--cuda-transfer-analysis`: Enables CUDA memory transfer profiling
- The CUDA args require `-lineinfo` when compiling
- The CUDA args have a HUGE performance impact on host code, so host and device code should be profiled separately. The NVIDIA GPU metrics will be adversely affected by this overhead, particularly the GPU utilization metric
- View with `map --connect MAP_FILE.map` or download and open with Arm MAP
- Can make reports from .map files with `perf-report map-file.map`
Arm Performance Reports
`perf-report mpirun -n 4 EXECUTABLE ARGS` or `perf-report map-file.map`. Generates a nice HTML and text summary of the profiling.
This post is licensed under CC BY 4.0 by the author.