GPU Resident Reduction
I finished the GPU resident reduction and submitted a pull request for it. Last week's issue with slightly different results turned out to be because I had forgotten to set the value in global memory before comparing against it using atomics. As a result it was always using the highest crossing time in the history of the simulation, not the highest in the current time step. Setting the global memory value to -DBL_MAX before each reduction fixed this problem, and I added an optional argument to set it to whatever value the user requires.
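The bug is easier to see in a small model. The following is a minimal Python sketch, not the CUDA code from the pull request: the `GlobalMax` class and `reduce_time_step` function are hypothetical stand-ins for a value in global memory and an atomic-max reduction over it.

```python
import sys

DBL_MAX = sys.float_info.max  # Python equivalent of C's DBL_MAX


class GlobalMax:
    """Hypothetical stand-in for one value in GPU global memory."""

    def __init__(self):
        self.value = -DBL_MAX

    def reset(self, init=-DBL_MAX):
        # The fix: set the value before each reduction. The optional
        # argument mirrors the user-settable initial value.
        self.value = init

    def atomic_max(self, candidate):
        # Models an atomic max: keep the larger of stored and candidate.
        self.value = max(self.value, candidate)


def reduce_time_step(values, target, reset_first, init=-DBL_MAX):
    """Reduce one time step's values into `target` and return the result."""
    if reset_first:
        target.reset(init)
    for v in values:
        target.atomic_max(v)
    return target.value


# Two time steps; the second has a smaller maximum than the first.
steps = [[3.0, 7.0, 5.0], [2.0, 4.0, 1.0]]

buggy = GlobalMax()
buggy_results = [reduce_time_step(s, buggy, reset_first=False) for s in steps]

fixed = GlobalMax()
fixed_results = [reduce_time_step(s, fixed, reset_first=True) for s in steps]

print(buggy_results)  # the stale maximum from step one leaks into step two
print(fixed_results)  # each step reports its own maximum
```

Without the reset, the second step reports the first step's maximum, which is the "highest in the history of the simulation" behaviour described above.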
I adapted my previous MHD plotting code to work with Cholla. Previously this code took an extremely long time (5-10 minutes) to produce the roughly 200 frames required for a ~10 second animation. That is too long for rapid testing, so after modifying the code to work with Cholla I set about optimizing it. Starting from an initial runtime of ~500s, three things made large differences:
- Not running the code in VS Code. Running the code in its own terminal reduced the time to around 200s, a 2.5x speedup.
- Removing the call to matplotlib.pyplot.minorticks_on(). Disabling minor ticks roughly doubled performance, getting me down to ~97s, a 2.06x speedup.
- Enabling blitting. I had to refactor the code significantly to enable blitting; unfortunately, if you set blit=True and it can’t do the blitting, it fails silently rather than raising a warning. Refactoring to enable it did result in much cleaner code, and I learned about named tuples and partial functions while I was at it, which was very helpful. This reduced execution time to ~85s, a 1.14x speedup.
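The structure that blitting needs can be sketched roughly as follows. This is not my actual plotting code: the `PlotState` named tuple, the function names, and the sine-wave data are all made up for illustration, and the Agg backend is selected only so the sketch runs without a display. The key points are that the update function changes existing artists rather than replotting, and that it returns the artists it changed.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, chosen only for this sketch

from collections import namedtuple
from functools import partial

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# Hypothetical container for everything the update function needs.
PlotState = namedtuple("PlotState", ["x", "lines"])


def make_figure(n_rows=3, n_cols=3, n_points=64):
    """Create the subplots once, with fixed limits and no minor ticks."""
    fig, axes = plt.subplots(n_rows, n_cols)
    x = np.linspace(0.0, 1.0, n_points)
    lines = []
    for ax in axes.flat:
        ax.set_xlim(0.0, 1.0)
        ax.set_ylim(-1.1, 1.1)
        (line,) = ax.plot(x, np.zeros_like(x))
        lines.append(line)
    return fig, PlotState(x=x, lines=tuple(lines))


def update(frame, state):
    """Update the existing line artists in place; do not replot."""
    for i, line in enumerate(state.lines):
        phase = state.x * (i + 1) + 0.01 * frame
        line.set_ydata(np.sin(2.0 * np.pi * phase))
    # Blitting needs the iterable of artists that changed this frame.
    return state.lines


def build_animation(n_frames=200):
    fig, state = make_figure()
    # As far as I can tell, blitting is quietly turned off when the canvas
    # does not support it, so it is worth checking this flag explicitly.
    can_blit = fig.canvas.supports_blit
    # partial binds the state so FuncAnimation only has to pass the frame.
    anim = FuncAnimation(
        fig, partial(update, state=state), frames=n_frames, blit=can_blit
    )
    return fig, anim, state
```

Calling `build_animation()` and then `anim.save(...)` with whichever writer is installed would produce the movie. Bundling the data and artists in a named tuple and binding it with `partial` avoids global variables in the update function.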
At this point the animations are generated quickly enough that I’m not going to bother with more optimization, especially because I believe the performance issues are largely within matplotlib itself. Just generating the frames and saving them takes a similar amount of time to making the entire animation, so the slowdown is primarily in generating each frame. While matplotlib is very useful, it appears to have fairly poor performance if I can only get 2.5 frames per second out of it. Reducing the number of subplots does help commensurately; however, a figure with only nine simple plots should run much faster than 2.5 FPS on modern hardware.
Now that I have the tools, I’ve started debugging my MHD code. As of now it’s not updating the grid, and the issue persists whether I run an MHD problem or a pure hydro problem using the MHD solver. Figuring that out is going to be my first task for next week. It also appears that when the y and z dimensions are set to be small, the time step is always computed to be zero. I’ve tried 32x32x32, 512x3x3, and 512x512x512 simulations of the Brio & Wu shock tube; the first two both had zero time steps, whereas the last had a time step of about 1E-4. This may or may not be related to the issue of the grid not updating, so I’m going to focus on that issue first.
- Read Beck et al. 1996
- Read Gent et al. 2021