It's not surprising that this happened on an Adreno chip. In my experience doing mobile graphics, Adreno drivers are where I’ve seen the most problems, especially in the shader compiler. I’ve seen it produce weird behaviour on perfectly valid (if slightly unusual) shader code that worked fine on every other GPU I tested.
With GPU compute shader/GPGPU performance issues, memory access should always be suspect number one until proven innocent. In some cases I’ve ended up wrapping every memory access in my compute kernels in conditional compilation, so that I could replace a read with a fixed value (or a value computed from scratch) or skip a write entirely, and then see how the timing changed.
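To make that concrete, here's a rough sketch of the idea in plain C rather than an actual shader language. The names (`STUB_READS`, `STUB_WRITES`, `LOAD`, `STORE`, `scale_kernel`) are all made up for illustration, and the loop just stands in for one thread per element; in a real kernel the same macros would wrap the buffer or image accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switches: compile with e.g. -DSTUB_READS=1 to take
   that class of memory access out of the kernel. Both default to off. */
#ifndef STUB_READS
#define STUB_READS 0
#endif
#ifndef STUB_WRITES
#define STUB_WRITES 0
#endif

#if STUB_READS
/* Replace the load with a fixed value: no memory traffic for reads. */
#define LOAD(buf, i) (1.0f)
#else
#define LOAD(buf, i) ((buf)[(i)])
#endif

#if STUB_WRITES
/* Evaluate the value but never store it. */
#define STORE(buf, i, v) ((void)(v))
#else
#define STORE(buf, i, v) ((buf)[(i)] = (v))
#endif

/* Stand-in for a compute kernel: each iteration plays the role of one thread. */
void scale_kernel(const float *in, float *out, size_t n, float k) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        float x = LOAD(in, i);   /* the read under suspicion */
        STORE(out, i, x * k);    /* the write under suspicion */
    }
}
```

You build the kernel once per combination of switches and compare timings. One caveat: if you stub out every write, the compiler is free to delete the whole computation as dead code, so it's usually safer to keep at least one cheap write alive, otherwise you end up timing an empty kernel.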