I finished running the scaling tests this week, then ran another set to see whether the degraded performance at scale was real or just a fluke. The second run confirmed that it is real, but also showed degraded performance at a different number of ranks. In the “slow” runs there is also sometimes a ~17 ms difference between the min and max time step times. We’re not sure exactly what’s going on: it might be network congestion, a slow GPU, an issue in the code, or just that running at scale is sometimes slow.
I wrote the section on the scaling plots and did some general revisions.
I wrote up a test for the _ctSlope function. Since I’m thinking about rewriting that function to use templates, I’m holding off on merging the test until I decide what I actually want to test.