Some considerations with Go’s scheduler:
You want medium to long tasks to run on each CPU. The smaller your messages, the more re-balancing tax you pay: smaller messages mean many more of them in a short amount of time, so there are more opportunities for a CPU’s run queue to drop to a small or zero depth, which is when work gets stolen from other CPUs.
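One way to make tasks "medium to long" is to batch small messages before handing them to workers. The sketch below is only an illustration: `batch`, `sumBatched`, and the batch size are hypothetical names and values, not anything from Go's runtime, and the right batch size depends on your workload.

```go
package main

import (
	"fmt"
	"sync"
)

// batch splits items into slices of at most size elements.
// The sub-slices share the backing array of items (no copying).
func batch(items []int, size int) [][]int {
	if size < 1 {
		size = 1
	}
	var out [][]int
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

// sumBatched sends one channel message per batch instead of one per item,
// so each worker does a longer stretch of work per scheduling event.
func sumBatched(items []int, batchSize, workers int) int {
	if workers < 1 {
		workers = 1
	}
	work := make(chan []int)
	partial := make(chan int, workers) // each worker sends exactly once
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := 0
			for b := range work {
				for _, v := range b {
					local += v
				}
			}
			partial <- local
		}()
	}
	for _, b := range batch(items, batchSize) {
		work <- b
	}
	close(work)
	wg.Wait()
	close(partial)
	total := 0
	for p := range partial {
		total += p
	}
	return total
}

func main() {
	items := make([]int, 1000)
	for i := range items {
		items[i] = i + 1
	}
	// 10 channel sends instead of 1000.
	fmt.Println(sumBatched(items, 100, 4))
}
```

With a batch size of 100, the producer makes 10 channel sends instead of 1000, so workers spend proportionally less time parked and woken.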
Operating systems fight to keep things fair across many axes (users, processes, etc.), and work-stealing schedulers in userspace are going to fight with that, as they try to balance your process across all CPUs (or GOMAXPROCS) rather than using multiple processes and letting the kernel pick CPUs for work. Pin your CPUs, and don’t run a full Ubuntu userspace. Please. Stop the madness.
Memory (NUMA) and cache affinity can be hard to manage correctly with Go: work you’ve lined up to run on one CPU can be stolen by another, which then has to fight across CPU caches for writes. This probably isn’t a concern if you’re using Go, but it’s an upper limit on performance you’ll hit if you have small message sizes.
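Go gives you no way to tie a goroutine to a CPU, but you can at least reduce how much goroutines fight over cache lines for writes. Here is a minimal sketch, assuming a 64-byte cache line (this varies by CPU); `paddedCounter` and `countSharded` are hypothetical names for illustration. Each worker writes only to its own padded slot, so wherever the scheduler moves it, it is not invalidating a line another worker is writing to.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheLine is an ASSUMED cache-line size in bytes; check your hardware.
const cacheLine = 64

// paddedCounter occupies a full (assumed) cache line, so the n fields of
// neighbouring slice elements never land on the same line.
type paddedCounter struct {
	n uint64
	_ [cacheLine - 8]byte
}

// countSharded gives each worker its own padded slot to write to and only
// combines the slots after all workers have finished.
func countSharded(workers, perWorker int) uint64 {
	counters := make([]paddedCounter, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				counters[id].n++ // only this goroutine touches slot id
			}
		}(w)
	}
	wg.Wait()
	var total uint64
	for i := range counters {
		total += counters[i].n
	}
	return total
}

func main() {
	fmt.Println(countSharded(4, 1000000))
}
```

This only addresses write contention between workers. It does nothing for NUMA placement, which still needs OS-level tools outside the Go runtime.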