I posted this because I’m still trying to wrap my mind around the concept of DPUs and the implications they bring. If anyone has experience or good use cases to share, I’d love to hear about them.
There’s no huge difference between a box full of DPUs and a blade system. They are both physical formats for increasing the density of servers. The blade system will have better support for replacing units without taking down all the neighbors, and should have well-designed cooling and power from the beginning. The DPU box will have a faster backend connection between units – PCIe4 or, soon, 5.
The OCP/OpenRack system is a less extreme but more scalable system. If you’re not familiar with it, it’s a datacenter-scale system where compatible products fit into a specific rack format where 12V DC power is centrally provided in the rack, and there are standards for networking distribution. Essentially, each rack is a chassis into which you fit compute/RAM nodes, storage nodes, and networking nodes. None of them need power supplies of their own.
Unless I’m missing something, it’s just a computer on your SmartNIC. The main reason things like this are attractive in the datacenter is that clock cycles and RAM on them can be a lot cheaper than on the host. If you run a cloud service on conventional hardware, you need to reserve some amount of host RAM and some amount of CPU (possibly one or more cores) to run the host software. This includes any device emulation / paravirtualisation and your control plane. With a modern SmartNIC, you can typically offload a lot of the device emulation / paravirtualisation by having the device support SR-IOV (S-IOV / SF-IOV coming soon) and expose an MMIO space to each guest that looks like either a real device or the PV device. This can even include things like translating SCSI or ATA commands into iSCSI or similar to talk to your back-end storage system. The control plane still needs to run on the host, though. If you put a cheapish Arm core on the device, however, it can have some very cheap (slow) RAM and do things like DMA page-table updates for the second-level address translation, send interrupts to the guests for startup / shutdown, and even DMA initial boot images into host memory. This combination lets you sell 100% of the RAM (minus a tiny bit for page tables, which stay small if you’re able to use 1 GiB superpages) to your customers.
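To put rough numbers on the page-table point, here is a back-of-the-envelope sketch. It assumes x86-64-style tables (8-byte entries, 512 entries per 4 KiB table) and a single contiguous mapping of a hypothetical 512 GiB host; real second-level tables are per guest and will differ in detail.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30

ENTRIES_PER_TABLE = 512   # assumption: x86-64-style radix tables
TABLE_BYTES = 4 * KiB     # 512 entries * 8 bytes each

# Number of table levels that must be walked for each leaf page size.
LEVELS_FOR_LEAF = {4 * KiB: 4, 2 * MiB: 3, 1 * GiB: 2}


def page_table_bytes(mapped_bytes: int, leaf_page_bytes: int) -> int:
    """Approximate memory used by page tables to map `mapped_bytes` contiguously."""
    entries = -(-mapped_bytes // leaf_page_bytes)  # ceiling division
    total = 0
    for _ in range(LEVELS_FOR_LEAF[leaf_page_bytes]):
        tables = -(-entries // ENTRIES_PER_TABLE)  # whole tables at this level
        total += tables * TABLE_BYTES
        entries = tables  # the next level up needs one entry per table
    return total


if __name__ == "__main__":
    host_ram = 512 * GiB  # hypothetical host size
    for leaf in (4 * KiB, 2 * MiB, 1 * GiB):
        print(f"leaf page {leaf:>10} B -> tables {page_table_bytes(host_ram, leaf)} B")
```

Under those assumptions, mapping 512 GiB with 4 KiB pages costs roughly 1 GiB of tables, while 1 GiB superpages need only two 4 KiB tables, which is why the overhead effectively disappears.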
A lot of SmartNICs have a general-purpose core now. They generally have some combination of slightly programmable ASICs for things like line-rate packet filtering, FPGAs for slower / less power-efficient filtering and transforming, and CPU cores for slower control-plane things. Converting ATA commands into iSCSI, for example, is much easier to do on a general-purpose core. The command messages are generally tiny in comparison to the data, so you don’t need the performance of an ASIC, and having a general-purpose core means that you can update the translation to add new features (e.g. move from iSCSI to some other protocol on the back end, or support a different emulated PV interface) trivially by just deploying a firmware update.
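As a rough illustration of why this kind of translation suits a general-purpose core, here is a heavily simplified sketch that maps two ATA commands onto the 16-byte SCSI CDBs that iSCSI would carry. The function name and structure are hypothetical, not taken from any real firmware; it ignores iSCSI PDU framing, error mapping, and every other command a real device would need.

```python
import struct

# ATA command opcodes (48-bit LBA variants).
ATA_READ_DMA_EXT = 0x25
ATA_WRITE_DMA_EXT = 0x35

# SCSI opcodes for the 16-byte read/write CDBs.
SCSI_READ_16 = 0x88
SCSI_WRITE_16 = 0x8A

_ATA_TO_SCSI = {
    ATA_READ_DMA_EXT: SCSI_READ_16,
    ATA_WRITE_DMA_EXT: SCSI_WRITE_16,
}


def ata_to_scsi_cdb(ata_opcode: int, lba: int, sector_count: int) -> bytes:
    """Build a READ(16)/WRITE(16) CDB for a guest ATA read or write (hypothetical helper)."""
    if ata_opcode not in _ATA_TO_SCSI:
        raise ValueError(f"unsupported ATA opcode {ata_opcode:#x}")
    if not 0 <= lba < (1 << 48):
        raise ValueError("LBA does not fit in 48 bits")
    if not 0 <= sector_count <= 0xFFFF:
        raise ValueError("sector count does not fit in 16 bits")
    if sector_count == 0:
        sector_count = 65536  # in the 48-bit ATA commands, 0 means 65536 sectors
    # CDB layout: opcode, flags, 64-bit LBA, 32-bit transfer length,
    # group number, control byte (all big-endian).
    return struct.pack(">BBQIBB", _ATA_TO_SCSI[ata_opcode], 0, lba, sector_count, 0, 0)


if __name__ == "__main__":
    cdb = ata_to_scsi_cdb(ATA_READ_DMA_EXT, lba=0x1000, sector_count=8)
    print(cdb.hex())
```

The whole translation is a few dozen bytes of bookkeeping per request, and supporting a new command is a table entry plus a firmware push, which is the flexibility argument made above.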
A few people (including NVIDIA and AWS) are trying to make security claims from this. Aside from some side-channel resistance, I’m not convinced that running security-critical software on a less-reviewed platform is actually a big win for security.
I have trouble understanding the point of a DPU.
It’s basically a Raspberry Pi 4 attached to the PCI bus. It’s not an optimized chip like a GPU at all. It’s a tiny fraction of the performance of the main host CPU. So why not… use a Raspberry Pi? Or a more expensive industrial-strength equivalent to a Raspberry Pi? Or a small VM running on the main host?
For the cloud providers it does make sense, since they want to sell the entire host as a “metal” instance and they have the scale needed to build custom software to run on the DPU. But the big cloud providers make their own DPUs in-house, so these third-party models are really only targeting smaller cloud providers I think.
DPU CPU resources are relatively cheap compared to the server’s CPU resources, so it makes sense to offload everything that you reasonably can onto the DPU. Having a DPU do things like parity calculation or network encryption leaves more of the main CPU free for tasks that are really performance-critical.
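For anyone unfamiliar with what "parity calculation" involves, here is a minimal sketch of RAID-5-style XOR parity. It is plain Python for illustration only (the function names are made up, and a real DPU would do this with dedicated hardware or vectorised code), but it shows the kind of simple, repetitive work that is cheap to move off the host.

```python
from typing import Sequence


def xor_parity(blocks: Sequence[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same size")
    parity = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_missing(surviving: Sequence[bytes], parity: bytes) -> bytes:
    """Recover one lost block: XOR of the parity with every surviving block."""
    return xor_parity(list(surviving) + [parity])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_parity(data)
    # Pretend the middle block was lost and rebuild it from the rest.
    recovered = rebuild_missing([data[0], data[2]], parity)
    print(recovered == data[1])
```

Every byte written touches this loop, so on a busy storage node it adds up; doing it on the DPU keeps those cycles off the cores you are trying to sell.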