@nwf pointed this out to me this morning. It made me happy for two reasons:
I’d said almost ten years ago that CHERI probably exposes some interesting potential improvements for prefetching, but never had time to properly explore it, so it’s great to see other people working on it. In general, in compilers, operating systems, and hardware, the lower levels in your stack will do a better job if they have more visibility into higher-level behaviour. CHERI lets everything down to the hardware know which words are pointers.
In the distant past, I did some machine learning for prefetching, and the things I did back then would likely improve this.
The authors say that this is early work and I’d love to see it continued. I’d like to see a bit more evidence that the tag bit helps. If you simply treat any 64-bit value that refers to a valid entry in the VM map as a pointer, do you get the same performance? If you look at adjacent values reachable from the capability that caused the fault, do you get better results? If you put the prefetched pages into a separate LRU queue or similar, what happens? If you track, per page, whether prefetching with each prefetcher helps and switch between them, what happens?
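To make the first question concrete, here is a minimal sketch (in Python, purely illustrative; the data, VM ranges, and function names are hypothetical, not from the paper) contrasting the two ways a prefetcher could identify pointers on a faulting page: trusting the CHERI tag bit versus heuristically treating any 64-bit value that lands in a mapped region as a pointer. The heuristic can pick up integers that merely happen to look like mapped addresses:

```python
# (value, tagged) pairs for words on a hypothetical faulting page.
# On CHERI, the tag bit marks genuine capabilities; untagged words
# are plain data even if their bit pattern looks like an address.
words = [
    (0x0000_7F00_0000_1000, True),   # a real pointer (tagged capability)
    (0x0000_7F00_0000_2040, False),  # an integer that happens to look mapped
    (0x0000_0000_DEAD_BEEF, False),  # an integer outside any mapping
]

# Pretend VM map: (start, end) ranges that are currently mapped.
vm_map = [(0x0000_7F00_0000_0000, 0x0000_7F00_0001_0000)]

def is_mapped(value):
    """Heuristic pointer test: does the value fall in a mapped range?"""
    return any(start <= value < end for start, end in vm_map)

# Tag-based identification: only genuine capabilities survive.
tag_based = [v for v, tagged in words if tagged]

# Heuristic identification: anything that looks mapped is a candidate.
heuristic = [v for v, _ in words if is_mapped(v)]

print(len(tag_based))  # 1: just the tagged capability
print(len(heuristic))  # 2: includes one false positive
```

Measuring how often the two candidate sets actually diverge on real workloads would be one way to quantify what the tag bit buys the prefetcher.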
Lots of interesting work to be done; I’m looking forward to the follow-on papers.