The problem is that almost no one will use IEEE halves, for the reasons in the blog post. F16 is probably only ever going to be used in machine learning, and even there people have in some cases moved on to u2 (xnor.ai, inference-only) or u8-ish (Google's TPU, inference-only). I did some research showing that training models can probably be done effectively with an 8-bit float if you expand to a higher precision during the summation phase of the Kronecker products in backpropagation.
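To make the accumulation point concrete, here's a minimal sketch of the idea (my own illustration, not the original research): products are computed in a low-precision float, but the running sum is widened to float32. NumPy has no 8-bit float type, so float16 stands in for the hypothetical fp8 here.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4096).astype(np.float16)
b = rng.standard_normal(4096).astype(np.float16)

# Naive: multiply and accumulate entirely in the low-precision type.
naive = np.float16(0.0)
for x, y in zip(a, b):
    naive = np.float16(naive + x * y)

# Mixed: same low-precision products, but the sum is kept in float32.
mixed = np.float32(0.0)
for x, y in zip(a, b):
    mixed += np.float32(x * y)

# float64 dot product as a reference for measuring accumulation error.
reference = np.dot(a.astype(np.float64), b.astype(np.float64))
print("low-precision accumulation error:", abs(float(naive) - reference))
print("widened accumulation error:      ", abs(float(mixed) - reference))
```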
For machine learning, as the article says, bfloat16 is the most popular (Google, TF/TPU/GPUs), but NVIDIA also has the confusingly named TensorFloat-32, which despite the name is a 19-bit format (1 sign, 8 exponent, 10 mantissa bits) stored in 32 bits.
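For anyone unfamiliar with how these formats relate, here's a rough sketch (my own illustration, not from the article): bfloat16 is just the top 16 bits of a float32 (1 sign, 8 exponent, 7 mantissa bits), and TF32 keeps the float32 exponent width but only 10 mantissa bits. Truncation is used below for brevity; real hardware rounds.

```python
import struct

def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]

def truncate_to_bfloat16(x: float) -> float:
    # Keep sign + 8 exponent + 7 mantissa bits: drop the low 16 bits of the float32.
    return bits_to_f32(f32_to_bits(x) & 0xFFFF0000)

def truncate_to_tf32(x: float) -> float:
    # Keep sign + 8 exponent + 10 mantissa bits: drop the low 13 bits of the float32.
    return bits_to_f32(f32_to_bits(x) & 0xFFFFE000)

x = 3.14159265
print(x, truncate_to_bfloat16(x), truncate_to_tf32(x))
```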