1. 39

  2. 21

    I promise I’m not that negative of a person; it’s just that only rant-inducing stories inspire me to comment. That entire issue is a dumpster fire.

    Problem 1: Node has a weird-sized integer type (a double that is only exact up to 2^53), smaller than most commonly used integers, causing data loss when numbers are used to identify inodes.
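    A quick way to see the loss: Python’s `float` is an IEEE 754 double, the same representation as a JavaScript Number, so it runs out of exact integers at the same point. This is a sketch, not Node’s code; `as_js_number` is a hypothetical helper standing in for “hand this 64-bit inode number to JS”.

```python
# Python floats are IEEE 754 doubles, like JavaScript Numbers, so they stop
# representing every integer exactly above 2**53.
MAX_SAFE_INTEGER = 2**53 - 1  # same value as JS Number.MAX_SAFE_INTEGER


def as_js_number(ino: int) -> float:
    """Hypothetical helper: simulate passing a 64-bit inode number to JS as a Number."""
    return float(ino)


# Two distinct inode numbers just above the limit...
ino_a = 2**53
ino_b = 2**53 + 1

# ...round to the same double, so two different files report the same ino.
collides = as_js_number(ino_a) == as_js_number(ino_b)
print(collides)  # True

# At or below MAX_SAFE_INTEGER the round trip is exact.
print(int(as_js_number(MAX_SAFE_INTEGER)) == MAX_SAFE_INTEGER)  # True
```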

    The proposed solutions are all kinds of sad. Someone wants to turn the integer into a string when it gets big enough, because I’m sure having a function randomly change its return type won’t cause more bugs. Nothing like doubling down on type unsafety when presented with a data-loss bug.

    Someone proposes breaking up the 64-bit int into a series of ints representable in node. That makes sense if node is an 8-bit microcontroller; it’s a little disappointing otherwise. This is probably the most sensible backwards-incompatible solution.
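    The split-into-ints idea, sketched in Python under the assumption of an unsigned 64-bit inode number and a (high, low) pair of 32-bit words; `split_u64` and `join_u64` are hypothetical names, not a proposed Node API. Each word fits easily in a double, and the pair reassembles losslessly.

```python
from typing import Tuple

MASK_32 = 0xFFFFFFFF


def split_u64(value: int) -> Tuple[int, int]:
    """Split an unsigned 64-bit value into (high, low) 32-bit words."""
    if not 0 <= value < 2**64:
        raise ValueError("not an unsigned 64-bit value")
    return value >> 32, value & MASK_32


def join_u64(high: int, low: int) -> int:
    """Reassemble the original 64-bit value from its two 32-bit words."""
    return (high << 32) | low


ino = 0xDEADBEEF_00000001
high, low = split_u64(ino)
# Each half is below 2**32, well inside a double's exact-integer range.
print(high < 2**32 and low < 2**32)  # True
print(join_u64(high, low) == ino)    # True
```

    The cost is the one the comment points out: every caller now has to compare two numbers instead of one.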

    There’s lots of handwavy “semantically it’s not an integer”. Whatever. I’m not sure I trust the commenters on that ticket to talk knowledgeably about filesystem semantics.

    The ugliest, but maybe most sensible solution for now is someone who proposes a 64->53 bit hashing op. It seems like that might minimize the practical impact of the bug quite a bit. This would of course break any code that passed the ino value back into C-land.
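    The comment doesn’t say which hash was proposed, so the fold below is an assumed stand-in: XOR the top 11 bits of the 64-bit value into the low 53 bits. Small inode numbers pass through unchanged, but by the pigeonhole principle distinct inputs must collide, and nothing can turn the result back into the original ino.

```python
MASK_53 = (1 << 53) - 1  # 2**53 - 1, i.e. JS Number.MAX_SAFE_INTEGER


def fold_64_to_53(ino: int) -> int:
    """Assumed stand-in for a 64->53 bit hash: XOR the top 11 bits into the low 53."""
    return (ino ^ (ino >> 53)) & MASK_53


# Inode numbers that already fit in 53 bits are unchanged.
print(fold_64_to_53(12345))  # 12345

# But distinct 64-bit values can collide, and nothing maps the result back.
print(fold_64_to_53(2**53) == fold_64_to_53(1))  # True
```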

    Man life would be easier if node had 64bit ints.

    1. 17

      The irony here is that Windows itself returns inode numbers (file IDs) as two 32-bit values, high and low.
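      On Windows those halves are the `nFileIndexHigh` and `nFileIndexLow` fields of `BY_HANDLE_FILE_INFORMATION`. A Python sketch (floats again standing in for JS Numbers; the function names are hypothetical) of what happens when a binding glues them into one Number: the result is guaranteed exact only while the high word is below 2^21, because beyond that the combined value reaches 2^53.

```python
def combine_as_double(index_high: int, index_low: int) -> float:
    """Hypothetical binding: merge two 32-bit halves into one JS-style Number."""
    return float((index_high << 32) | index_low)


def survives(index_high: int, index_low: int) -> bool:
    """True if the merged double still equals the exact 64-bit file ID."""
    exact = (index_high << 32) | index_low
    return int(combine_as_double(index_high, index_low)) == exact


print(survives(2**20, 1))  # True: 2**52 + 1 is below 2**53
print(survives(2**21, 1))  # False: 2**53 + 1 rounds to 2**53
```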


      1. 14

        There’s lots of handwavy “semantically it’s not an integer”. Whatever. I’m not sure I trust the commenters on that ticket to talk knowledgeably about filesystem semantics.

        Man life would be easier if node had 64bit ints.

        The question is more, “why should this be an integer?” - what integer operations do we want to perform on an inode? I can’t think of any - an inode is just an identifier, right?

        1. 4

          That’s a valid question, but I personally think it’s the wrong question. What’s wrong with an integer identifier? We use them in filesystems and on sports jerseys. They have one really important operation predefined: increment. An API user doesn’t need to ++, but something generating unique IDs might want to. If the inode number were a string and node truncated it, nobody would argue it wasn’t a bug. Why should a 64-bit integer be different?
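          The increment point also shows exactly where a double-based ID breaks. In this Python sketch (float standing in for a JS Number; `next_id` is hypothetical), adding 1 at 2^53 rounds back to the same value, so a counter would silently hand out a duplicate.

```python
def next_id(current: float) -> float:
    """Hypothetical ID generator that increments a JS-style Number."""
    return current + 1


below = float(2**53 - 2)
print(next_id(below) == float(2**53 - 1))  # True: still exact

at_limit = float(2**53)
# 2**53 + 1 is not representable, so the sum rounds back to 2**53.
print(next_id(at_limit) == at_limit)  # True: the "new" ID is a duplicate

# An arbitrary-precision (or 64-bit) integer keeps counting.
print(2**53 + 1 == 2**53)  # False
```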

          I answer the question of “why should this be an integer” by looking at the spec which defines them as integer types: http://pubs.opengroup.org/onlinepubs/009696699/basedefs/sys/types.h.html

          It’s a very interesting language design question: how important is it to make sure the APIs of the operating system and internet can be faithfully represented in your language?

          1. 2

            how important is it to make sure…

            Not very, if your abstractions and FFI are good enough. Languages, in order to gain traction, have to be somewhat portable. This can be done in many different ways, but most commonly it seems that API designers define a minimal set of functionality needed, and leave the rest as an exercise to the programmer (likely via FFI).

            1. 5

              In Rust land, we try to expose platform specific functionality via conditional compilation. This means that while you need to be aware of platform specific behavior, you do get standard conveniences once you opt into it. Our first attempt at it was the std::os module, which basically exposes a bunch of platform specific traits that you can bring into scope, which in turn add additional platform specific methods on platform independent types.

              For various reasons outlined in this RFC, we’ve been wanting to head more towards the “define the platform specific methods directly on types” approach. But to do that, we want a better linting system to help prevent conditional compilation errors.

              Just recently, we did a review of the memmap crate which mostly exposes platform independent behavior. But it sounds like it’s headed in a similar direction as std, where platform dependent functionality will also be exposed.

              There’s still a long road to go of course, but it’s a useful counterpoint to “just use FFI” IMO. :-)

            2. 2

              It is not at all obvious that integers are unique. And they have far more structure that you can’t actually use. For an opaque unique ID something like a UUID would be much better (I believe that’s what ZFS uses, at least internally?)

              1. 3

                There’s also the nuance that since inode numbers can be reused, increment alone doesn’t solve the problem of uniqueness. But, a UUID is just a really big integer, too…

                1. 1

                  A UUID is implemented as an integer, but semantically it declares a clear intent not to be used as one; most UUID libraries do not expose addition or multiplication operations on them.

                  1. 2

                    Right. But, of course an ino_t and a UUID are isomorphic. We just don’t have any special representation of ino_t as we do for UUIDs.
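                    Python’s standard `uuid` module makes the isomorphism concrete: `UUID.int` exposes the 128-bit integer and `uuid.UUID(int=...)` rebuilds the same UUID, yet the type simply doesn’t implement `+`. `supports_addition` is a hypothetical helper for the demonstration.

```python
import uuid


def supports_addition(value: object) -> bool:
    """Hypothetical helper: does `value + 1` work at all?"""
    try:
        value + 1  # type: ignore[operator]
    except TypeError:
        return False
    return True


u = uuid.uuid4()
n = u.int                      # the 128-bit integer behind the UUID
print(uuid.UUID(int=n) == u)   # True: the mapping round-trips losslessly
print(supports_addition(u))    # False: arithmetic is deliberately absent
print(supports_addition(n))    # True: the plain integer allows it
```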

            3. 3

              It’s a collection of flags and some very small integer values.

              Lua (up to 5.2; Lua 5.3 has 64-bit integers) has/had this problem, and I solved it by breaking the individual bits out as flags. Yes, not that efficient, but it makes it much easier to use in Lua.

              1. 8

                What? That’s not right at all, an inode is a unique identifier for a file on the filesystem! And since its meaning is opaque other than being unique, puffnfresh is 100% correct.

                Perhaps you are thinking of mode? Regardless, an inode is definitely not a collection of flags.

                1. 3

                  You are right, I got it confused with the mode bits. Sigh.
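                  For the record, the mode bits are where the flags live. Python’s `stat` module decodes them much like the Lua approach described above; `mode_flags` is a hypothetical helper, and `0o100644` is an example value (a regular file with `rw-r--r--` permissions), not anything from the thread.

```python
import stat


def mode_flags(mode: int) -> dict:
    """Hypothetical helper: break st_mode into named booleans."""
    return {
        "is_dir": stat.S_ISDIR(mode),
        "is_file": stat.S_ISREG(mode),
        "owner_read": bool(mode & stat.S_IRUSR),
        "owner_write": bool(mode & stat.S_IWUSR),
        "other_write": bool(mode & stat.S_IWOTH),
    }


mode = 0o100644
print(mode_flags(mode))
print(stat.filemode(mode))  # -rw-r--r--
```

                  The inode number, by contrast, has no internal structure to decode; it is only ever compared for equality.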

            4. 9

              The ugliest, but maybe most sensible solution for now is someone who proposes a 64->53 bit hashing op.

              This is a terrible idea, it just hides bugs without fixing the problem at all. I would say it’s the least sensible solution by a huge margin.

              1. 1

                It is indeed a terrible idea, but even that would be better than sitting around. This issue is still open 2 months later. New thread: https://lobste.rs/s/ycvjzp/node_occasionally_gives_multiple_files

                As far as hiding bugs without fixing the problem, you can’t polish a turd (node.js) :P

              2. 1

                It’s not all that strange considering both Facebook and Twitter encode their IDs as strings in JSON responses. Twitter adds a *_str field next to the ‘legacy’ fields, which is another solution.

                Perhaps it should’ve been foreseen, though. But I think the filesystem interfaces are also some of the oldest?
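                Twitter’s convention is an `id` number with an `id_str` string alongside it. A Python sketch of why: `json.loads(..., parse_int=float)` mimics JavaScript’s `JSON.parse`, where every number becomes a double. The payload is made up for illustration.

```python
import json

# Made-up payload following the id / id_str convention; the id is 2**53 + 1.
payload = '{"id": 9007199254740993, "id_str": "9007199254740993"}'

# parse_int=float parses integers into doubles, as JSON.parse does in JS.
as_js = json.loads(payload, parse_int=float)

print(int(as_js["id"]))   # 9007199254740992: silently off by one
print(as_js["id_str"])    # 9007199254740993: preserved as text
```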

                1. 4

                  It’s not strange because twitter solved it that way? Twitter is a web service that provides an abstraction, via their API, to their machines. At such a high level, this seems perfectly reasonable because the ID is completely opaque to users. It’s an implementation detail.

                  But in a language that is being used to write local systems, where the inode number is actually specified as an unsigned integer type and is required by local APIs and system calls? A bare string of “this is an inode” is meaningless, but a valid string, and therefore a valid candidate for an inode. Make a new type. Support JS numbers or this new type in all APIs and call it a day.
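                  The “new type” suggestion, sketched in Python with a frozen dataclass standing in for a JS class (`Inode`, `from_number`, and `to_c` are hypothetical names, not a Node API). It offers equality and hashing but no arithmetic, accepts a plain Number only when that Number is an exact safe integer, and hands the full value back for C calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inode:
    """Hypothetical opaque inode identifier: equality and hashing only, no arithmetic."""
    _value: int

    def __post_init__(self) -> None:
        if not 0 <= self._value < 2**64:
            raise ValueError("inode numbers are unsigned 64-bit values here")

    def to_c(self) -> int:
        """Hand the exact value back to C-land (e.g. an FFI call expecting ino_t)."""
        return self._value

    @classmethod
    def from_number(cls, n: float) -> "Inode":
        """Accept a plain JS-style Number, but only when it is an exact safe integer."""
        if not float(n).is_integer() or abs(n) > 2**53 - 1:
            raise ValueError("not an exactly representable integer")
        return cls(int(n))


big = Inode(2**63 + 5)
print(big == Inode(2**63 + 5))              # True
print(Inode.from_number(42.0) == Inode(42))  # True
```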

              3. 8

                Of course it should be changed to a string/ByteArray or some sort of opaque type. Any code that is doing “int-like” things to it is already broken.

                1. 3

                  At what point do they say, “javascript itself is broken, can we add native integer types outside number”?

                  1. 2

                    Mistakes happen. I found this exchange encouraging:

                    Define ‘faithfully.’ Is this an array with the upper and lower 32-bits?

                    Something like that. It’s not actually important to me which of your suggestions – they all make sense – but generally something that people could reimplement the current Stats upon in any way they want.

                    It’s nice to see someone explicitly say “any of the ideas you’ve proposed will work” and get on with working to fix the problem rather than bikeshedding or over-analysing precisely which one should be pursued.

                    The whole Github thread makes the Node community look pleasantly friendly, too.