1. 32
  1.  

  2. 8

    There seems to be some lede-burying going on. Not having heard of this before, even after reading the first few pages of the spec I’m still wondering what the justification is for reinventing floating-point arithmetic. Is this format more accurate? Faster? Easier to implement?

    It’s been kind of nice having a single standard for floats. Aside from endianness, you don’t have to worry about which format some library uses, or having to tag the type of a value, or choosing which format to use in new code. Unlike, say, character encodings, which used to be a horrible mess before UTF-8 took over.

    1. 10

      The main sales pitch for Posits is that for a given number of bits, typical numerical computations retain more accuracy. Or conversely, you can use fewer bits to achieve a given level of accuracy.

      Posits have better semantics. As a language implementor, I have beefs with the semantics of IEEE floats, which do not map properly onto real numbers, and have shittier mathematical properties than is necessary. The worst problem is NaN, and the rule that NaN != NaN. My language supports equational reasoning, and has an equality operator that is an equivalence relation: a==a, a==b implies b==a, a==b and b==c implies a==c. The semantics of negative 0 is also a big problem. The infinities are easier to deal with. These problems are fixed by Posits.
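
      To make the equality problem concrete, here's a minimal C sketch (nothing posit-specific, just standard IEEE behavior as exposed through C):

      ```c
      #include <math.h>
      #include <stdio.h>

      int main(void) {
          double a = NAN;                    /* any quiet NaN */

          /* Reflexivity fails: a value that is not equal to itself. */
          printf("a == a : %d\n", a == a);   /* prints 0 */

          /* The practical fallout: anything keyed on == (sets, maps,
             memoization tables) can "lose" a NaN it is already holding. */
          double stored = a;
          printf("stored == a : %d\n", stored == a);  /* still 0 */
          return 0;
      }
      ```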

      1. 6

        Not all mathematical entities obey transitive equality, though, e.g. infinities. The behavior of NaNs is useful because the end result of a computation can reflect that something within it overflowed or produced an illegal result; you don’t have to test every individual operation.

        If Posits support neither infinities nor NaNs, then operations on them need different error handling: division by zero has to return some kind of out-of-band error code or throw an exception, and then the calling code has to handle it. That would be an issue for languages like JavaScript, where division by zero or sqrt(-1) doesn’t throw an exception but instead returns an infinity or NaN.
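
        As a small illustration of the “test once at the end” style that NaN propagation enables (a hedged sketch; the numbers are arbitrary):

        ```c
        #include <math.h>
        #include <stdio.h>

        int main(void) {
            double xs[] = { 4.0, -1.0, 9.0 };  /* -1.0 will poison the sum */
            double sum = 0.0;

            for (int i = 0; i < 3; i++)
                sum += sqrt(xs[i]);            /* sqrt(-1.0) quietly yields NaN */

            /* One check at the end instead of one per operation. */
            if (isnan(sum))
                printf("something in the pipeline produced an illegal result\n");
            else
                printf("sum = %g\n", sum);
            return 0;
        }
        ```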

        1. 4

          In IEEE floats, there are a bevy of non-real values: -0, +inf, -inf, and the NaNs. Posit has a single unique error value called NaR: Not a Real. This is returned for division by zero.

          In IEEE float, positive underflow goes to 0 and negative underflow goes to -0. So 0 ends up representing both true zero and underflowed positive reals, while -0 represents underflowed negative reals, in some sense, but it’s messier than that. This design is also not symmetric around zero. -0 is neither an integer nor a real, and in practice every numeric operation needs to make arbitrary choices about how to deal with it; there is no mathematical model to guide these choices, so different programming languages make different choices. What’s sign(-0)? It could be 0, -0 or -1, depending on which mathematical identities you want to preserve, or on an accident of how the library code is written.
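
          A minimal C sketch of how those arbitrary-looking choices surface in practice: -0 compares equal to 0, yet its sign still leaks out through other operations.

          ```c
          #include <math.h>
          #include <stdio.h>

          int main(void) {
              double nz = -0.0;

              printf("nz == 0.0       : %d\n", nz == 0.0);         /* 1: they compare equal  */
              printf("signbit(nz)     : %d\n", signbit(nz) != 0);  /* 1: but the sign exists */
              printf("1.0 / nz        : %g\n", 1.0 / nz);          /* -inf, not +inf         */
              printf("copysign(1, nz) : %g\n", copysign(1.0, nz)); /* -1                     */
              return 0;
          }
          ```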

          In Posit, 0 denotes true zero, which is easy to understand mathematically. Positive numbers underflow to the smallest positive number. Negative numbers underflow to the smallest negative number. This design is simple, symmetric around zero, and doesn’t introduce a non-real number with unclear semantics.

          1. 1

            The difference between 0 and -0 is important for many numeric applications, +/-Infinity is important, and NaN is important as something distinct from Infinity. 1/0 and 0/0 are mathematically distinct, and those are real values; -1/0 is again a mathematically distinct value. Saying they’re “non-real” is nonsense and does not match the most basic of mathematics.

            What is important for the normal day-to-day needs of a person does not match what is important when you’re actually performing the kind of numerical analysis that is needed in scientific computation. So please don’t say that they are non-real, and please don’t claim that they aren’t necessary just because you personally don’t use them.

            1. 5

              You raise an important point. The precise semantics of IEEE floats are important to a lot of numeric code, because people code to the standard. Posits are not backward compatible with IEEE floats, and this is a serious issue that will hinder adoption. Posits break some of my code as well.

              But there’s nothing sacred about IEEE floats. They aren’t the best possible design for floating point numbers. The Posit proposal comes out of a community that has discussed a variety of alternative designs: they write papers and hold conferences. These people work in high performance computing and are numerical analysts. There are papers on the new idioms that must be used to write numeric code using Posits, explaining the benefits.

              Please do not claim that 1/0 and 0/0 are real numbers. This is not mathematically correct. These entities are not members of the set ℝ of real numbers. In mathematics, there are a variety of extensions to the reals that add additional values (such as infinities), but these additional values are not real numbers. For example, the affinely extended reals add +∞ and −∞, and the projectively extended reals add a single point at infinity, yet in both cases the added points are not themselves real numbers.

              1. 2

                I think that some of the things you describe as “coding to the standard” are really an after-the-fact view of the IEEE standard having been specifically designed to handle cases that were difficult to handle in other schemes. (Please note that I don’t mean to claim anything bad about Posits by this – I haven’t studied the standard in enough detail – I just want to point out that some of the things we are occasionally annoyed by in IEEE 754 really do have practical use.)

                IIRC signed zero, for example, didn’t arise because the representation is nasty, nor as an unpleasant compromise to make other, more important things possible at the expense of an ambiguous representation. It was in fact a deliberate choice, which ensured that, for complex analytic functions, expressions that represent the same function inside their domain will usually have the same value on the boundary as well. This is a pretty useful property, as lots of engineering problems derive from, or are defined in terms of, boundary conditions. Many systems that don’t have signed zero require that you e.g. be careful to use either sqrt(z^2 - 1) or sqrt(z+1)*sqrt(z-1) depending on boundary conditions, even though they both mean the same thing.
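
                Here’s a small C sketch of that boundary behavior (assuming a C11 <complex.h> that provides CMPLX – older toolchains may need conj() instead to get a negative-zero imaginary part):

                ```c
                #include <complex.h>
                #include <stdio.h>

                int main(void) {
                    /* The same point on the branch cut of csqrt, approached from
                       above and below: the sign of zero records which side you're on. */
                    double complex above = CMPLX(-4.0, +0.0);  /* -4 + 0i */
                    double complex below = CMPLX(-4.0, -0.0);  /* -4 - 0i */

                    printf("csqrt(-4 + 0i) = %g%+gi\n",
                           creal(csqrt(above)), cimag(csqrt(above)));  /* 0+2i */
                    printf("csqrt(-4 - 0i) = %g%+gi\n",
                           creal(csqrt(below)), cimag(csqrt(below)));  /* 0-2i */

                    /* Without a signed zero both inputs would be identical,
                       and the two limits could not be distinguished. */
                    return 0;
                }
                ```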

                The same thing goes for signed infinities. These aren’t error cases, they’re legit values that propagate through calculations. Pre-IEEE 754 representation systems that didn’t allow infinities usually coped either at the expense of ambiguity in e.g. inverse trigonometric functions, or by quietly introducing special non-propagating extensions to handle these cases. (I don’t recall the specifics of any representation system that used the projective extension. Somehow I doubt that would’ve floated too many engineers’ boats – I’d have a hard time coping with the existential dread induced by a discontinuous exponential function, for example. Even if it turns out to be numerically irrelevant in most cases, I’d have to either be careful about mixing numerical results with analytically-derived conclusions, or carefully rewrite all the math involved in analysing transient systems to account for discontinuities, and I’m really not looking forward to that.)

                Maybe Posits avoids these problems – like I said, I haven’t studied the standard in detail, and I’m not trying to bash on it. Just wanted to point out that lots of things which now look like standard warts were actually deliberate decisions made to handle real-life situations, not compromises introduced to allow for better handling of other things.

                1. 1

                  IIRC signed zero, for example, didn’t arise because the representation is nasty, nor as an unpleasant compromise to make other, more important things possible at the expense of an ambiguous representation. It was in fact a deliberate choice, which ensured that, for complex analytic functions, expressions that represent the same function inside their domain will usually have the same value on the boundary as well.

                  Yes; see Kahan’s “Much Ado About Nothing’s Sign Bit”.

                2. 1

                  I was using “real” in the sense that these are mathematical concepts that exist in the reality of math. Much like 0, they were not acknowledged for most of history, and as such ℝ does not include them. Saying that they are not in ℝ does not mean that they magically cease to exist, any more than 0 does not exist, or irrational numbers do not exist.

                  1/0 is well defined and has a sound mathematical definition. These values may not be in ℝ, but that doesn’t make them cease to exist; it is simply an artifact of the age of ℝ. That there is a group of people doing arithmetic who don’t need a floating point format that reflects the possible non-finite values does not negate that those values exist, nor does it negate their value to other users.

                  Posits do not offer any particularly meaningful improvement in what can be represented, they demonstrably reduce what can be represented, and the circuitry to implement them uses more area and is slower.

                  1. 1

                    Posits are meant to represent approximations of members of ℝ, the Real numbers. Therefore, it doesn’t make sense to include representations for things that aren’t members of ℝ.

                    1. 2

                      In that case posits aren’t a replacement for IEEE floating point, and should stop claiming that they are. The values being disregarded by posits because they aren’t in ℝ are useful; that is why they are there. In the early specification process, every feature was under a lot of pressure for performance, given the technology of the era. Even something we take for granted – gradual underflow – was on the chopping block, until Intel shipped the x87 to show that other manufacturers were wrong to say that what was being specified was “impossible” to implement efficiently (it’s also why fp80 has a decidedly more wonky definition than fp32, fp64, etc.).

                      So the perf gains that posits get from eliding these features were known, and very heavily hashed out, in the 80s, when there was much more pressure against additional logic than there is today – and yet even in that environment they decided to keep those features.

                      So it isn’t surprising that eliding those values makes posits “simpler”, but you could also make them simpler and faster by having a fixed exponent – it would greatly reduce usefulness, of course, but I give this absurd extreme to demonstrate that everything is trade-offs. Posits dropped values that are useful for real-world purposes because posit folk don’t use them, and that’s fine, but you don’t get to claim you have a replacement when you are fundamentally not solving the same problem.

                      Also, as one final thing: posits dropped support for those values to gain some performance back, yet despite that, hardware implementations are slower and use more area. So to me posits seem to remain a lose/lose proposition.

          2. 1

            Also, posits always use banker’s rounding.

            1. 5

              Yup, but there are real reasons you sometimes want different rounding modes, which is why IEEE 754 specifies them.

              1. 2

                Okay, but the need to control rounding modes is pretty rare, and support is hit and miss. Hardware doesn’t provide a consistent way to control the rounding modes, if they are supported at all, and most programming languages don’t provide much, if any, support. The current Posit standard focuses on just the core stuff that everybody needs, and that’s good. Features like rounding modes that not everybody is going to implement should be optional extensions, not mandatory requirements, and should be added later, if Posits take off.

                I’ve personally not had a use for rounding modes, other than in converting floats to ints. The only rationale I’ve seen for rounding control on arithmetic operations is as a way for numerical analysts to debug precision problems in numeric algorithms by using interval arithmetic. The Posit community has a separate proposal for doing this kind of interval arithmetic using pairs of Posits (“valids”) that is claimed to have better numeric properties than using IEEE rounding modes, but I haven’t read more about that than the summary.

                1. 1

                  Okay, but the need to control rounding modes is pretty rare, and support is hit and miss.

                  The need to care about numerical accuracy for floating point numbers in general is pretty rare. A lot of uses of floating point numbers are very happy with a hand-wave, probably-fine approximation. For example, a lot of graphics applications have a threshold of human perception that is far coarser than any floating point rounding error (though they can have some exciting corner cases where you discover that your geometry abruptly becomes very coarsely quantised when you render an object far from the origin).

                  For applications that do care, support is generally very good. Fortran has had fine-grained control over rounding modes for decades and it is supported by all Fortran compilers that I’m aware of. Most of the code that cares about this kind of thing is written in Fortran.

                  C99 also introduced fine-grained control over rounding modes into C. As far as I know, clang is the only mainstream C compiler that doesn’t properly support them (or, didn’t, 10 years ago - I think the Flang work has added the required support to LLVM and the front-end parts are fairly small in comparison). GCC, Visual Studio, XLC, and ICC all support them.
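
                  As a rough sketch of what that looks like in C99 (whether the mode change actually takes effect depends on the compiler honoring FENV_ACCESS, per the caveat above); directed rounding is also what gives you the interval-style bracketing mentioned upthread:

                  ```c
                  #include <fenv.h>
                  #include <stdio.h>

                  #pragma STDC FENV_ACCESS ON   /* we intend to touch the FP environment */

                  int main(void) {
                      volatile double x = 1.0, y = 3.0;  /* volatile: discourage constant folding */

                      fesetround(FE_DOWNWARD);
                      double lo = x / y;                 /* 1/3 rounded toward -infinity */

                      fesetround(FE_UPWARD);
                      double hi = x / y;                 /* 1/3 rounded toward +infinity */

                      fesetround(FE_TONEAREST);          /* restore the default mode */

                      printf("1/3 lies in [%.17g, %.17g]\n", lo, hi);
                      return 0;
                  }
                  ```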

                  1. 1

                    In that case there is no difference in rounding modes; “banker’s rounding” is what I would call “round to even”, though I think it’s more formally “round to nearest, ties to even” or some such.

            2. 8

              Posits are one of those “obviously better” things that appear from time to time in techie circles, a bit like tau instead of pi.

              I found the following previous submissions:

              Edit: unums seem to be a superset of posits, here’s a submission about them: Unums and the Quest for Reliable Arithmetic. And Unums: A Radical Approach to Computation with Real Numbers (Gustafson’s paper).

              1. 7

                Legit though, tau is better.

                1. 4

                  I await your 1.5 hour YouTube video explaining it ;)

                  1. 7

                    No need for a youtube video! A circle is uniquely defined by its center point and radius, but π is the ratio of the circumference to the diameter. This makes π exactly half the “elegant” value, so a lot of equations add a factor-of-two “correction” that goes away if you use τ instead:

                    • A 1/4 turn of a circle is π/2 radians (instead of τ/4 radians)
                    • sin and cos are periodic with period 2π (instead of τ)
                    • Most double integrations are of the form 1/2 Cx²: displacement is 1/2 at², spring energy is 1/2 kx², kinetic energy is 1/2 mv², etc. The one exception is the area of a circle, which is 1·πr² (instead of 1/2 τr², which would fit the pattern – see the short integration below)

                    It’s not like the end of the world that we use π instead, it’s just inelegant and makes things harder for a beginner to learn.
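
                    For what it’s worth, the 1/2 τr² form in that last bullet is just the usual area-by-circumference integration written with τ:

                    ```latex
                    % integrate the circumference C(r') = \tau r' from 0 out to r
                    \[
                      A \;=\; \int_0^r \tau\, r'\, \mathrm{d}r'
                        \;=\; \tfrac{1}{2}\,\tau r^{2}
                        \;\;(=\, \pi r^{2})
                    \]
                    ```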

                    1. 7

                      I was a member of the cult of τ back in high school and in my first years of engineering school, mostly because I was on really bad terms with my math teacher :-D. So at one point I τ-ized some of the courses I took.

                      I can’t say I recommend it, at least not for EE. It’s not bad, but it’s not better, either. I was really in awe about it before, because it made the basic formulas “more elegant” and “mathematically beautiful”. But once I did enough math to run into practical issues, it just wore off, I found the effect negligible at best, and in some cases it just made some easy things easier at the expense of making hard things a little harder.

                      First off, I found you wind up playing a lot of correction factor whack-a-mole. For example, working with τ instead of π makes it easier to work with sine signals (and Fourier series of periodical signals), because they’re periodic over τ. But it makes it harder to work with rectified sine signals because those are now periodic over τ/2.

                      Most of the time, I found that working in terms of τ just moved the correction factors from pages 1-2 of my notes from each lecture to page 3 and onwards. (Note that I’m also using “rectified” rather loosely here – lots of quantities wind up effectively looking like rectified versions of other quantities, not just voltage fed to a rectifier).

                      Then there were a bunch of cases where the change was basically inconsequential. For example, lots of the integrals that were brought up in various τ-related topics on the nerd forums I frequented were expressions written in terms of 2π, which seemed annoying to work with. Then I ran into the same integrals in various EE classes, except everyone was just writing (and using them) in terms of ω, as in 2πf. Whether you define it as 2πf or τf has pretty much no effect. You derive lots of stuff in terms of ω anyway, but ultimately, you really want to end up with expressions in terms of f, because that’s what you can actually measure IRL.

                      In most of these cases, working in terms of τ just means you end up with an expression that starts with 1/τ instead of 1/2π (or τ instead of 2π), which hardly makes much of a difference. The expressions you end up with are all in the frequency domain, so their physical interpretation is in terms of “how fast is it spinning on the circle?”, not lengths or ratios of lengths, so τ and π work equally well.

                      And then there were a whole lot of cases that you could simplify much more efficiently by applying some slightly cleverer math. For example, working in terms of τ does simplify a bunch of nasty integrals relevant to transient or oscillating regimes, as in, you don’t have to carry an easily-lost constant term in front of the integral. What really simplifies it though is working in s-domain via the Laplace transform, which you can do without caring if it’s τ or π because you’re working in terms of ω anyway, and which allows you to skip the whole nasty integral part entirely.

                      Finally – I didn’t know it then, but I did think about it later – there are various things that work worse in terms of τ, like some of the discrete cosine transforms, which have nice expressions in terms of π, not 2π.

                      Basically I wasted a couple of weeks of a summer vacation 15+ years ago to find out that, overall, it sucks just as much with both, it’s just that the parts that suck with π are different from the ones that suck with τ. I think that’s when I realised I should’ve really become a musician or, like, drop it all and go someplace nice and raise goats or whatever :(.

                      (FWIW a lot of math I learned in uni was basically “how to avoid high school math”. I knew from my Physics textbook that calculus is really important for studying transient regimes, so by the time I finished high school I could do pretty hard integrals in my head. Fast forward to my second year of EE and ten minutes into the introductory lecture my Circuits Theory prof goes like okay, don’t worry, I know the math classes you guys took don’t cover the Laplace transform – I’m going to teach you about it because *gestures at a moderately difficult integral* I haven’t the faintest clue how to solve this, I haven’t done one of these since I was in high school and that was like forty years ago for me).

                      1. 1

                        Eh, I’m not convinced by the special pleading for the tau version of Euler’s identity:

                        https://tauday.com/tau-manifesto#sec-euler_s_identity

                        I prefer the original.

                        (Apparently Euler was the one who popularized the symbol π, and he vacillated between it meaning π or τ.)

                        I like pi because it hearkens back to the primeval discovery that if you have a round object, and measure its diameter with a piece of string (more easily done than its radius), then its circumference, the lengths don’t divide easily. Why is that?

                        1. 4

                          TBH I don’t understand why people find e^iπ + 1 = 0 so elegant. Why +1? You’re sneaking negative numbers in there to make the equation nice.

                          I like pi because it hearkens back to the primeval discovery that if you have a round object, and measure its diameter with a piece of string (more easily done than its radius), then its circumference, the lengths don’t divide easily. Why is that?

                          Even easier than measuring the diameter of a circle with a string is measuring the diagonal of a square, which gives you the even more primeval (and much easier to prove!) discovery that the diagonal doesn’t divide evenly into the sides of the square. It’s a lot easier to prove that sqrt(2) is irrational than that pi is irrational!
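
                          (The standard sketch, for reference – assume √2 = p/q with the fraction in lowest terms:)

                          ```latex
                          \[
                            \sqrt{2} = \tfrac{p}{q}
                              \;\Rightarrow\; p^{2} = 2q^{2}
                              \;\Rightarrow\; p \text{ is even, } p = 2k
                              \;\Rightarrow\; q^{2} = 2k^{2}
                              \;\Rightarrow\; q \text{ is even}
                          \]
                          % both even contradicts ``lowest terms'', so \sqrt{2} is irrational
                          ```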

                          1. 1

                            Since the invention of the potter’s wheel, accessing an object that’s close to perfectly circular has been easier than one that’s perfectly square. According to El Wik, the potter’s wheel is from ~4000 BCE, so a curious kid in ancient Babylon could wonder about the ratio of the circumference to the diameter long before a more privileged one in ancient Greece learned how to construct a square using straight-edge and divider and measured the diagonal (and got murdered by the Pythagoreans for exposing the secret)[1]

                            In day-to-day use, diameters are almost universally used: pipes, firearm calibers, screws… I have a tape measure with a scale that’s multiplied by pi so that you can get the diameter by wrapping the tape around an object.

                            Sure, all of this can be handled by tau too, but outside the classroom, a radius is much more abstract than a diameter.

                            [1] actual murder probably apocryphal

                          2. 2

                            https://tauday.com/tau-manifesto#sec-euler_s_identity

                            I was already convinced that tau is better for constructing radian arguments to trig functions (tau is a full turn). But the Euler identity is so much more elegant using tau. The pi version never made intuitive sense, but the tau version does make intuitive sense to me. Thanks for pointing it out.

                            1. 2

                              I find the Euler identity most elegant in its full form.

                               e^(ix) = cos(x) + i·sin(x)
                      2. 4

                        I prefer the Indiana definition :D

                        1. 9

                          I wrote about the history of that redefinition! It’s wild. https://buttondown.email/hillelwayne/archive/that-time-indiana-almost-made-p-32/

                          1. 1

                            I do love that they didn’t even get a correctly rounded value :D

                  2. 4

                    The standard would benefit from a definition of “posit” early on, such as in the definitions section.

                    1. 3

                      Posits are not a better floating point format. The people who keep pushing them because they think IEEE floats are confusing or have unnecessary features are people who don’t do actual numeric computation, and failing to support those “unnecessary” features simply means posits aren’t useful for a large amount of scientific computing, so posits can never exist as the only format.

                      Posits also can’t be used for normal high school math, because the pushers have decided infinity isn’t real, and to support this erroneous belief they have defined division so that it doesn’t match basic math. This divergence from mathematics again ruins their use in scientific work. Similarly, claiming -0 isn’t real means you get incorrect results when a value that decays to zero is used in further math. The most canonical example would be something brain-dead like n³ as n varies from -1 to 0: the cube will eventually decay to zero before n is actually zero, which means your precision has underflowed, but you still know that the true infinite-precision value is negative, so a subsequent operation like division doesn’t produce an incorrect positive infinity.
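
                      A minimal C sketch of that decay-to-zero example with IEEE doubles (the magnitude is chosen purely to force underflow):

                      ```c
                      #include <stdio.h>

                      int main(void) {
                          double n = -1e-200;            /* small negative value            */
                          double cubed = n * n * n;      /* true value -1e-600: underflows  */

                          printf("cubed   = %g\n", cubed);        /* -0: the sign survives  */
                          printf("1/cubed = %g\n", 1.0 / cubed);  /* -inf, the correct sign */

                          /* A format whose only zero is unsigned would give +inf here. */
                          return 0;
                      }
                      ```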

                      On top of that, as far as I have been able to make out, a hardware implementation of posits is physically larger, more complex, and slower than an IEEE float unit – and that’s even with this standard’s substantial reduction in the ability to vary the exponent bits (which is the selling point of posits).

                      1. 3

                        The Posit technology has improved (simplified) since the last time I looked:

                        NOTE: Contrary to the original paper by Gustafson and Yonemoto, and to early versions of the Standard, the exponent size (eS) is always 2 and does not vary with the precision. This greatly simplifies conversions between precisions and the creation of correctly-rounded math library functions, and in hundreds of experiments with real applications has shown to be a better choice.

                        I wonder how much practical experience there is now with implementing posits in hardware? I would be interested in comparisons with IEEE floating point hardware: transistor budget and speed of computation for various operations. In the early days there were optimistic forecasts about how much smaller, simpler and faster the hardware would be, and it would be nice to know the true costs and benefits based on actual implementations. I expect the simpler semantics to reduce hardware cost and the variable width mantissa to increase hardware cost.

                        In the end, how much graphical (GPU) or machine learning throughput can we get with a given transistor budget, if we ignore standards like Vulkan, shed the burden of legacy backward compatibility, and just create the most optimal hardware design for the problem domain? That’s the context in which Posits are most interesting to me. Apple Silicon has shown how much the x86 architecture is holding us back. Can Posits deliver as big a performance boost?

                        1. 10

                          This 2019 paper is not encouraging. “Evaluating the Hardware Cost of the Posit Number System” https://ieeexplore.ieee.org/abstract/document/8892116

                          The posit number system is proposed as a replacement of IEEE floating-point numbers. It is a floating-point system that trades exponent bits for significand bits, depending on the magnitude of the numbers. Thus, it provides more precision for numbers around 1, at the expense of lower precision for very large or very small numbers. Several works have demonstrated that this trade-off can improve the accuracy of applications. However, the variable-length exponent and significand encoding impacts the hardware cost of posit arithmetic. The objective of the present work is to enable application-level evaluations of the posit system that include performance and resource consumption. To this purpose, this article introduces an open-source hardware implementation of the posit number system, in the form of a C++ templatized library compatible with Vivado HLS. This library currently implements addition, subtraction and multiplication for custom-size posits. In addition, the posit standard also mandates the presence of the “quire”, a large accumulator able to perform exact sums of products. The proposed library includes the first open-source parameterized hardware quire. This library is shown to improve the state-of-the-art of posit implementations in terms of latency and resource consumption. Still, standard 32 bits posit adders and multipliers are found to be much larger and slower than the corresponding floating-point operators. The cost of the posit 32 quire is shown to be comparable to that of a Kulisch accumulator for 32 bits floating-point.

                          1. 7

                            Here’s a July 2022 paper where they fully implement 32 bit posits together with IEEE floats in the same RISC-V core and compare them. These results are better, but still not a complete win for Posits. https://arxiv.org/abs/2111.15286

                            Results show that 32-bit posits can be up to 4 orders of magnitude more accurate than 32-bit floats thanks to the quire register. Furthermore, this improvement does not imply a trade-off in execution time, as they can perform as fast as 32-bit floats, and thus execute faster than 64-bit floats.

                            The quire has a high hardware cost (but also speeds up benchmarks when used).

                            The 32-bit FPU within CVA6 requires an area of 30691 μm² and consumes 27.26 mW of power. On the other hand, the 32-bit PAU with quire requires an area of 76970 μm² and consumes 67.73 mW of power.

                            Even without the quire, the posit arithmetic unit requires about 30% more chip real estate than the floating point unit.

                            1. 1

                              However, the variable-length exponent and significand encoding impacts the hardware cost of posit arithmetic.

                              From the link it seems that they’ve changed this, I’m not sure the conclusions of that paper are relevant any more:

                              NOTE: Contrary to the original paper by Gustafson and Yonemoto, and to early versions of the Standard, the exponent size (eS) is always 2 and does not vary with the precision. This greatly simplifies conversions between precisions and the creation of correctly-rounded math library functions, and in hundreds of experiments with real applications has shown to be a better choice.

                              1. 2

                                The “precision” is the total number of bits used to represent a Posit. For a given precision, the exponent size (eS) has always been a constant, but now it is a constant across different precisions (eg, 32 bit posits vs 64 bit posits). This matters if you are converting between different precisions or writing code that is generic across precisions.

                                The paper is talking about “32 bit posit adders and multipliers”, a fixed precision (32 bits) for which the exponent was always a fixed size.

                                I found a newer hardware implementation of Posits with performance competitive with floats (but still more chip real estate), see my new comment.

                                1. 1

                                  Ah, I see. Thanks for the clarification.

                          2. 2

                            Meanwhile everyone is moving to bf16, which is really really imprecise but really really fast :)

                            1. 2

                              Posits break scale independence by giving numbers near 1.0 more precision than numbers further from 1.0.

                              1. 12

                                All floating point formats break scale independence. Fixed point formats solve this problem, but have serious limitations for general computing, due to limited range and loss of precision. Based on the tradeoffs, floating point is justifiably more popular and more widely used.

                                1. 2

                                  All floating point formats break scale independence. Fixed point formats solve this problem

                                  I don’t think this is true. I take scale independence to mean that if you scale everything by the same amount, true statements remain true. Floating point clearly doesn’t quite have this, since the relative error depends on one’s proximity to the next power of two, but it’s pretty close.

                                  Meanwhile fixed point doesn’t even make an attempt. I love fixed point, and it does have many virtues, but scale independence ain’t one.
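
                                  A quick C sketch of that point: for doubles, the absolute spacing of adjacent values grows with magnitude, while the relative spacing stays near 2^-52.

                                  ```c
                                  #include <math.h>
                                  #include <stdio.h>

                                  int main(void) {
                                      double xs[] = { 1.0, 1e6, 1e12 };
                                      for (int i = 0; i < 3; i++) {
                                          double x = xs[i];
                                          /* gap to the next representable double */
                                          double ulp = nextafter(x, 2 * x) - x;
                                          printf("x=%g  ulp=%g  ulp/x=%g\n",
                                                 x, ulp, ulp / x);
                                      }
                                      return 0;
                                  }
                                  ```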

                                2. 1

                                  Why do you think IEEE 754 had scale independence? Obviously it didn’t.

                                  1. 4

                                    With the exception of denormals, IEEE 754 does have scale independence though – you always have 53, 64, or 24 bits of precision (or some such), regardless of magnitude.

                                3. 1

                                  Is there also going to be a reference implementation?

                                  1. 1

                                    I was able to find this: https://posithub.org/docs/PositTutorial_Part1.html Of course, it would require CPU support for better productivity, but a soft implementation is good enough for testing in specific applications.