1. 29

Some context from @gecko, as folks are not sure what this is all about:

Crystal is strongly typed, but overwhelmingly uses type inference rather than explicit types. Because Crystal aims to be spiritually (and frequently literally) compatible with Ruby, that’s a problem: to accomplish that, Crystal relies on sometimes-nullable types with implicit structure and implicit unions, such that, frequently, the only way to even begin type inference is to load the entire program’s AST into RAM all at once and then start a massive type inference pass. What you’re seeing in this thread is how a “simple” fix to a YAML parser’s error reporting hit that problem, causing the compiler to use too much RAM and fail with an out-of-memory (OOM) error.

  2. 12

    I like that the next comment is “throw it all away and start over”. Never could have predicted that…

    But otherwise, I have no idea what this is about. Random comments on random GitHub issues don’t convey much information without a lot of context digging.

    1. 26

      Crystal is strongly typed, but overwhelmingly uses type inference rather than explicit types. Because Crystal aims to be spiritually (and frequently literally) compatible with Ruby, that’s a problem: to accomplish that, Crystal relies on sometimes-nullable types with implicit structure and implicit unions, such that, frequently, the only way to even begin type inference is to load the entire program’s AST into RAM all at once and then start a massive type inference pass. What you’re seeing in this thread is how a “simple” fix to a YAML parser’s error reporting hit that problem, causing the compiler to use too much RAM and fail with an out-of-memory (OOM) error.

      I think there’s probably an interesting discussion here about how a language that relies on implicit strict typing really needs a carefully thought-out type system (or language compromises, such as OCaml’s .mli files) to be scalable. But I agree with you that it’s hard to have that discussion based on this GitHub issue thread.

      1. 3

        Thank you for this explanation! Apologies that I posted it without context. May I quote your comment in the post text?

        1. 4

          Sure.

      2. 5

        I agree with you that linking to a GitHub comment makes for a poor post. I had to read through more comments; as far as I could gather, Crystal uses an obscene amount of memory when compiling, so you cannot make anything large with it. But hey, maybe that’s actually a good strategy to keep things smaller.

      3. 12

        I don’t understand why “With Ruby’s syntax” is such a selling point. There are a lot of gotchas in it, it doesn’t seem to be any more ergonomic than other syntactic styles, and it seems a particularly bad fit for a statically typed language.

        1. 10

          /me gets ready to burn some Internet credibility

          I’m a fan of Ruby’s syntax, despite being a big FP nerd. I don’t know why I like it, to be honest. I think it is the preference of English-y words over sigils.

          1. 2

            I like that it reads really nicely with the standard formatting and long/descriptive variable and function names:

            def initialize(client, keys)
              raise ArgumentError, t('client_type', :client => client.inspect) unless client.is_a? Riak::Client
              raise ArgumentError, t('array_type', :array => keys.inspect) unless keys.is_a? Array
            
              self.thread_count = client.multi_threads
              validate_keys keys
              @client = client
              @keys = keys.uniq
              self.result_hash = {}
              @finished = false
            end
            

            It’s pure preference, but it feels more welcoming to me than any other language I’ve learned.

          2. 3

            I guess the syntax itself is OK; what I have a problem with is the whole “implicit” nature of Ruby code. I always get confused when things happen in my code that I didn’t ask for or didn’t know of.

            1. 1

              That’s a problem with using any featureful library without fully understanding it beforehand, though Ruby makes it harder by having nonlocal imports.

          3. 7

            I find this mildly surprising – I do global program type inference in Myrddin (although, I don’t do it across multiple files), and I haven’t found it especially memory-hungry. The main data structure is a giant union-find array, which uses about 8 bytes per type declaration, and 8 bytes more per variable declaration.

            Then again, there are more intensive algorithms, which I’d imagine you want to use if you want to maintain the feel of Ruby. Subtyping is an especially thorny problem to solve. Is there a summary of the algorithm that they’re using somewhere? I’m especially curious about the approach they took to determining valid methods and subtyping.
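
            For readers unfamiliar with the union-find approach mentioned above, here is a minimal sketch in Ruby. It is not Myrddin’s actual implementation: the `TypeTable` class, its method names, and the use of symbols for concrete types are all invented for illustration, and it handles only plain equality constraints (no subtyping, no unions).

            ```ruby
            # Hypothetical sketch of union-find based type unification.
            # Not Myrddin's real code; every name here is made up for illustration.
            class TypeTable
              def initialize
                @parent = []    # @parent[i] links slot i towards its representative
                @concrete = []  # @concrete[root] is a known type name, or nil
              end

              # Allocate a new type slot, optionally already bound to a concrete type.
              def fresh(concrete = nil)
                @parent << @parent.size
                @concrete << concrete
                @parent.size - 1
              end

              # Find the representative of a slot, compressing the path as we go.
              def find(i)
                @parent[i] = find(@parent[i]) unless @parent[i] == i
                @parent[i]
              end

              # Merge two slots; fail if both are bound to different concrete types.
              def unify(a, b)
                ra, rb = find(a), find(b)
                return if ra == rb
                ca, cb = @concrete[ra], @concrete[rb]
                raise TypeError, "cannot unify #{ca} with #{cb}" if ca && cb && ca != cb
                @parent[ra] = rb
                @concrete[rb] = cb || ca
              end

              # The concrete type a slot resolved to, or nil if still unknown.
              def resolve(i)
                @concrete[find(i)]
              end
            end

            table = TypeTable.new
            int = table.fresh(:int)
            x = table.fresh       # an undeclared variable x
            y = table.fresh       # an undeclared variable y
            table.unify(x, y)     # constraint from "x = y"
            table.unify(y, int)   # constraint from "y = 42"
            p table.resolve(x)    # x ends up sharing y's type
            ```

            Each declaration costs one slot in each array here, which is the reason this style of inference stays small; the more intensive algorithms are the ones that cannot reduce every constraint to “these two slots are the same type”.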

            1. 5

              “although, I don’t do it across multiple files”

              I suspect this is the big difference.

              1. 2

                Possibly, but just reading the AST would be the bottleneck there – a gigabyte of RAM would be enough to infer about a hundred million declarations, and the bottleneck would probably be CPU time at that point.
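
                As a rough check on that estimate, the arithmetic can be written out in a few lines of Ruby. The 8-byte figure is the one quoted upthread; reading “8 bytes more per variable declaration” as 16 bytes per variable is an assumption, and the constant and method names are made up.

                ```ruby
                # Back-of-the-envelope estimate only; names are hypothetical.
                GIB = 1024**3            # bytes in one gibibyte
                BYTES_PER_TYPE_DECL = 8  # figure quoted upthread
                BYTES_PER_VAR_DECL = 16  # assumption: the 8 above plus 8 more

                # How many declarations of a given size fit in one gibibyte.
                def decls_per_gib(bytes_each)
                  GIB / bytes_each
                end

                puts decls_per_gib(BYTES_PER_TYPE_DECL)  # 134217728
                puts decls_per_gib(BYTES_PER_VAR_DECL)   # 67108864
                ```

                Either way the result is on the order of a hundred million declarations per gigabyte.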

                1. 4

                  Roughly correct. I don’t know Myrddin’s internals, but just looking at what you’re doing, you’ve made design decisions to make life a lot more pleasant for the compiler:

                  • Your type declarations (i.e., structs, etc.) are fully explicit (e.g., this struct has these members with these types). Crystal’s are inferred; a class has a @foo if it’s referenced at any point, and we’ll need to check everything ever assigned to @foo to figure out if it’s supposed to allow nil and what non-nil types it allows.
                  • Your types have to be fully defined in one place. Crystal can reasonably alter a type deep in a program (e.g. by reopening a class, adding a module, etc.—think about Sequel and Roda’s plugin system for a practical example of why you might sanely do this)
                  • As far as I can tell, functions declare what types they accept. Crystal doesn’t.
                  • If I’m misunderstanding that last bit, then any exported types and functions definitely seem to be typed, preventing you from needing to infer across modules—which due to the above, I’m not actually sure would be a problem (Go and F# do this just fine), but it can’t hurt.

                  So what’s killing the type inference pass isn’t the actual types (I don’t know, but I suspect they’re just a few bytes, just like yours) but rather the intermediate data structures as the type checker tries to figure out what types exist in the first place. I doubt they’re able to use classical Hindley–Milner as such, whereas I’ll bet you’re following that pretty closely, and that in turn is why the author in this comment thread is so down.
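
                  To illustrate the first two bullets above, here is a small sketch in plain Ruby (not Crystal, and not taken from the linked issue). The `Connection` class and the `observed_host_types` helper are hypothetical. The point is only that nothing in the first class body says `@host` can be nil; that fact lives in a second, distant definition that an inferrer has to find before it can decide the type.

                  ```ruby
                  # Plain Ruby, only to show the shape of the problem; the class is hypothetical.
                  class Connection
                    attr_reader :host

                    def initialize(host)
                      @host = host  # first assignment: a String in this sketch
                    end
                  end

                  # Much later, e.g. in a plugin, the same class is reopened.
                  class Connection
                    def reset
                      @host = nil   # second assignment: now @host may also be nil
                    end
                  end

                  # Record the classes actually stored in @host over one object's lifetime.
                  def observed_host_types
                    conn = Connection.new("db.example")
                    seen = [conn.host.class]
                    conn.reset
                    seen << conn.host.class
                    seen
                  end

                  p observed_host_types  # [String, NilClass]
                  ```

                  A compiler that wants a static type for `@host` has to see every such assignment across the whole program before it can settle on something like “String or nil”, which is why the inference cannot easily be done one file at a time.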

                  1. 1

                    That’s a lot of flexibility. I’m really curious how they made it work, in detail. Intuition says that reopening types elsewhere isn’t a killer if you’re already doing global inference, but subtyping might really throw a wrench into the works.

                    There’s also a lot more flexibility in Myrddin than your post implies – types just need to be inferred fully by the time they are exported. The convention is to annotate them, but that’s for readability purposes. The reason for inference at module boundaries is to avoid reading the world’s source, which makes separate compilation easy. And while types can’t have their members reopened, you can add traits at any point, from any module.

            2. 5

              It reminds me of Graydon, who recently wrote that “modules” could need some innovation.