This is a colloquium-style talk which presents the idea of the “posit”, a possible replacement for IEEE 754 floats. The speaker starts talking about posits at the 15-minute mark.
This is probably going to be one of the videos I suffer through (it takes a couple watches for me to understand all the words). It’s a topic I’ve long been interested in. I’ll report back on it when I do.
EDIT: I watched from about 0:15, where he starts talking about the criteria he’s chosen for this format, how he’ll evaluate how good it is, and its details, up to about 0:50, where he hands the microphone to someone from a hardware team that has been working to implement his representation. The talk includes a lot of discussion of different ways to evaluate the performance of a float-like number representation, and what they mean intuitively, and it would be worth listening to for that alone. I suspect that the section after 0:50 is also quite interesting, since it appears to have a lot of details on how programmers will use the format in practical terms, but I couldn’t deal with anything longer.
Okay, so this is an interesting proposal and I think it’s a good idea. Fixed-size representations (like posits, and like IEEE floating-point, but unlike bignum integers) have some inherent limits, but are also inherently attractive because they’re easy to implement in silicon and easy to write code to use. It doesn’t solve all problems, but it solves a nice class of problems.
The core trick, if I didn’t misunderstand (which is possible… would it be so hard to turn on YouTube’s auto-captions? :( ), is that the “regime bits” are used to designate how many bits are mantissa and how many are exponent. Thus the downstream programmer can convey expectations about their use-case, and they can receive explicit feedback on how much precision was lost.
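To make that concrete, here’s roughly how I understand the decoding to work, as a Python sketch: a sign bit, then a run of identical “regime” bits whose length sets the coarse scale, then up to es exponent bits, then whatever is left over as the fraction. The bit width and the es value below are illustrative choices on my part, not necessarily what the talk standardizes.

```python
def decode_posit(bits, n=8, es=1):
    """Decode an n-bit posit bit pattern (given as an unsigned int) to a float.
    Sketch only: sign / regime run / es exponent bits / fraction."""
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):             # the single 1000...0 pattern
        return float("inf")              # the one special (infinity/error) value
    negative = bool(bits >> (n - 1))
    if negative:
        bits = (-bits) & ((1 << n) - 1)  # two's complement, then decode as positive
    rest = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]
    r0, run = rest[0], 1
    while run < len(rest) and rest[run] == r0:
        run += 1                         # length of the regime run
    k = run - 1 if r0 == 1 else -run     # regime value k; contributes 2**(k * 2**es)
    body = rest[run + 1:]                # skip the bit that terminates the run
    exp_bits = (body[:es] + [0] * es)[:es]   # missing exponent bits count as 0
    frac_bits = body[es:]                # everything left over is fraction
    e = int("".join(map(str, exp_bits)) or "0", 2)
    f = sum(b / 2 ** (i + 1) for i, b in enumerate(frac_bits))
    value = (1 + f) * 2.0 ** (k * 2 ** es + e)
    return -value if negative else value

print(decode_posit(0b01000000))  # 1.0
print(decode_posit(0b01110011))  # 28.0 (longer regime run -> fewer fraction bits)
print(decode_posit(0b00000001))  # 2**-12, the smallest positive value at this size
```

The longer the regime run, the bigger (or tinier) the scale and the fewer bits are left for the fraction, which is exactly the trade-off being described.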
As he points out, NaNs and infinities are not remotely useful enough to justify the amount of representation-space IEEE allocates to them. As such, posits do away with the idea. :) I think that’s an excellent decision, but it’s a little difficult from this talk to determine to what extent his gains are because he reclaimed representation-space in that way (which other formats could do just as well), vs. to what extent they’re from the regime bits.
It’s particularly exciting that, unlike many formats like this, it will eventually be possible to try it out in hardware.
For what it’s worth, the slides are available on slideshare:
http://www.slideshare.net/insideHPC/beyond-floating-point-next-generation-computer-arithmetic
I don’t think he’s published a paper on this yet.
I think the main concrete difference between posits and floats is the use of the projective mapping from bit patterns to numbers. If you’re familiar with it, this is reminiscent of the stereographic projection which maps the complex plane (including infinity) to the surface of a sphere. His trick is to map the real number line (including + and - infinity) to a circle, subdividing that circle evenly to assign bit patterns to real numbers. The other major difference is that instead of uselessly overflowing to infinity, posits clip to the highest representable number.
Sure, but I actually consider that a weakness of posits to some extent. I’ll note that in complex analysis, topologically mapping all directions of “infinity” to a single point (the north pole of the Riemann sphere) makes a ton of sense: a lot of theorems become cleaner and proofs become nicer if you do this. You can even define functions with poles as being continuous with respect to the new topology. Also, the complex numbers aren’t ordered so you don’t lose anything by gluing all infinities (e.g. -∞, ∞ + 7i, ∞ * exp(i)) together.
On the real line, it’s useful to have positive and negative infinity if we are invested in using the ordering. If you write max as a fold, then you often find yourself using -∞ as the initial value. If you’re going to identify +∞ and -∞, then you have to make it an error to use ∞ in a comparison.
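For example, a tiny Python sketch of what I mean (just ordinary IEEE doubles here):

```python
from functools import reduce

# -inf is the identity element for max, which is what makes it the natural
# seed value when max is written as a fold/reduce.
print(reduce(max, [3.0, -7.5, 2.25], float("-inf")))  # 3.0
print(reduce(max, [], float("-inf")))                 # -inf: a sensible "max of nothing"
```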
The other major difference is that instead of uselessly overflowing to infinity, posits clip to the highest representable number.

On the other hand, you might want infinities in overflow, as representative of an error condition, like a machine learning algorithm that diverged. I’m not convinced that mapping to some huge number is less useless than overflowing to infinity. Getting a “bad” result like Inf is useful information. Getting something like 1.8e+308 is less useful.

That said, the strongest argument for posits (as far as I can tell) is that you almost never use the [1e40 ... 1e308] range in the first place. Posits give you more precision when you’re close to 1 and less precision when you’re far away, so you get more mileage out of your 64 (or 32 or 16 or 8) bits when you’re close to 1.

It’s the same principle as Huffman coding. If it’s sunny 60% of the time, cloudy 25%, rainy 10% and snowy 5%, you use fewer bits with the encoding {0, 10, 110, 111} than with the flat {00, 01, 10, 11}: 1.55 as opposed to 2.00 bits. It’s not magic, but you get more mileage out of your encoding as long as certain statistical properties hold. With posits, the assumption is that middling values are more common and deserve more precision.
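To spell out that arithmetic, a quick sketch of the weather example above:

```python
# Expected bits per symbol under the two encodings above.
probs   = {"sunny": 0.60, "cloudy": 0.25, "rainy": 0.10, "snowy": 0.05}
huffman = {"sunny": "0", "cloudy": "10", "rainy": "110", "snowy": "111"}
flat    = {"sunny": "00", "cloudy": "01", "rainy": "10", "snowy": "11"}

def expected_length(code):
    return sum(probs[s] * len(code[s]) for s in probs)

print(expected_length(huffman))  # 1.55
print(expected_length(flat))     # 2.0
# Posits make the analogous bet: values near 1 are the common case,
# so they get the short "codes" (i.e. more bits left over for the fraction).
```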
Which is a pretty valid assumption. He even brings up some experimental results based on calculations run on clusters at the Los Alamos labs.
The core trick, if I didn’t misunderstand (which is possible… would it be so hard to turn on YouTube’s auto-captions? :( ), is that the “regime bits” are used to designate how many bits are mantissa and how many are exponent.

It’s that, but also that there’s redundancy in the regime and exponent bits, so you can represent middling numbers (i.e. numbers close to 1) with more accuracy but at a cost around the extremes. The downside is that you don’t have reliable precision at each order of magnitude: it gets worse the farther you are from 1. Floating point is consistent in this regard until you get to denormalized numbers (around 10^-308), where precision falls off a cliff; posits lose precision gradually (and approximately symmetrically, unlike floats, where the cliff is only at the small end) as you move away from 1. The upside is that you get more accuracy in the mid-range of numbers that we use most of the time. Instead of having 2^52 numbers (out of your total 2^64) between 1 and 2 and 2^52 numbers between 2^178 and 2^179, you’re trading off having more in [1, 2) and fewer in [2^178, 2^179).
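For concreteness, the spacing of ordinary doubles at different magnitudes is easy to check (math.ulp has been in the Python standard library since 3.9):

```python
import math

# Gap between adjacent representable doubles near 1 vs. near 2**178.
print(math.ulp(1.0))       # 2**-52, about 2.2e-16
print(math.ulp(2.0**178))  # 2**126, about 8.5e37
# Either way there are 2**52 doubles in [2**n, 2**(n+1)); posits would instead
# spend more of that bit budget near 1 and less out at the extremes.
```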
I don’t see how it avoids the symbolic logical problems that floating point has (e.g. non-associative addition). Ultimately, coercing the real number line to a set of 2^64 dyadic rational numbers is going to have its problems. Rationalization is something that we do because we have to, but not because we like it.
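The standard IEEE-double example (posits would round these particular values differently, but some rounding, and hence some non-associativity, is unavoidable in any fixed-size format):

```python
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False
print((a + b) + c, a + (b + c))    # 0.6000000000000001 0.6
```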
As he points out, NaNs and infinities are not remotely useful enough to justify the amount of representation-space IEEE allocates to them.

For 64-bit floating point, it reduces the size of your space from 2^64 to 2^64 - 2^53. That doesn’t seem like a large loss to me: it’s a drop from 64 bits to about 63.9993 bits. In that particular case, it’s still a huge space.
For very small floating-point formats (8- and 16-bit), I would agree vehemently. You slice away a significant fraction of an already tight space. I’d imagine that the gains are much bigger with tiny floats than with standard 64-bit floats, because, from what I understand, if 64 bits isn’t enough, that usually (of course, this field is full of exceptions) means that the problem is the algorithm rather than the data width… and that the places where you really would want 128+ bit floating-point numbers (e.g. advanced mathematics) are places where you’d tend to go for rationals or even symbolic computation.
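Both counts are easy to check; a quick sketch (binary64 has 52 fraction bits, binary16 has 5 exponent and 10 fraction bits):

```python
import math

# binary64: the all-ones exponent field encodes the NaNs plus the two infinities.
wasted64 = 2 * (2**52 - 1) + 2      # = 2**53 bit patterns
print(math.log2(2**64 - wasted64))  # ~63.9993 "effective" bits remain

# binary16: the same all-ones exponent block is a much bigger slice of the space.
wasted16 = 2 * 2**10                # NaNs + infinities, both signs
print(wasted16 / 2**16)             # 0.03125, i.e. about 3% of all patterns
```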
Interesting. I appreciate the details. I have nothing insightful to add. :)
I don’t see how it avoids the symbolic logical problems that floating point has (e.g. non-associative addition).

Yes, I agree that it can’t avoid those issues. I don’t think that was a goal. But any fixed-size representation is going to have similar problems.
if 64 bits isn’t enough, that usually (of course, this field is full of exceptions) means that the problem is the algorithm rather than the data width

The exceptions I’m aware of are in astronomy (not all astronomical computations, only some), and they’re really not a case that hardware manufacturers are likely to optimize for - it’s so specialized.
Looks very interesting. Thanks!
(Not much more to say yet, but don’t want it to get buried for lack of discussion just because it’s a 90-minute video.)
The speaker (John L. Gustafson) published a paper last year, “A Radical Approach to Computation with Real Numbers”. It doesn’t use the term “posit”, but I assume it must be a version of the same proposal?
Edit: Hadn’t listened far enough. He does call them posits. I’m still not done though.
If it is the same proposal, that’s not obvious to me so far. In particular, I don’t see the regime bits in the paper. Also, the talk is better than the paper at conceptual overview and motivation.
From skimming the slides ngoldbaum posted (I’m not motivated enough to watch the video, but thanks for your summary of it!), it looks like this is indeed a new proposal, though building on his previous work in the area. Slide 12 in particular briefly situates it w.r.t. his previous proposals.