1. 2

    “Why be Like Elm?”, is that a typo? Perhaps meant to be: “Why Do We Like Elm?”

    1. 6

      I read “Why be Like Elm?” roughly as “Why would you, NoRedInk, want to write your Haskell like you write your Elm?”

      1. 5

        It was intended as “Why would I, a programming language, aspire to be like Elm?”.

        A bit farfetched perhaps :).

        1. 3

          I don’t think so, because many of their libraries aim to bring Elm flavours into Haskell.

        1. 9

          Good writeup! One interesting thing is that it was harder to find where database calls were happening in the “abstracted” ruby than in the “inline” ruby. Best practices for functionality conflicting with best practices for optimization.

          1. 11

            For anyone with some Rails experience, this is a well-known problem: ActiveRecord has magic accessors which will make a query on-the-fly when you try to access a property that hasn’t been loaded. It then caches it so when you next access it, it will not perform a query. Django has similar behaviour (although it is slightly simpler). This can result in extremely hard to fathom code. “Is this doing a query? Why/why not?” is what you’ll be asking yourself all the time, and there are often no clues in the code you’re looking at: some other method may have already cached the data for you once you hit the code you’re looking at.

            Sometimes, a method can be a bottleneck because it does a lot of queries, but only on a particular code path, because when coming through another code path, it will be passed a pre-cached object. Doing performance analysis on such code bases can be quite frustrating!
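
            A minimal plain-Ruby sketch of the lazy-accessor behaviour described above. `AuthorStore` and `Post` are hypothetical stand-ins, not ActiveRecord classes; the point is only that the same call may or may not hit the database depending on what ran earlier.

            ```ruby
            # Hypothetical stand-in for a database; counts every round trip.
            class AuthorStore
              attr_reader :query_count

              def initialize(authors)
                @authors = authors
                @query_count = 0
              end

              # Every call here counts as one query.
              def find_author(id)
                @query_count += 1
                @authors.fetch(id)
              end
            end

            # Hypothetical model with a lazily loaded, cached attribute.
            class Post
              def initialize(store, author_id)
                @store = store
                @author_id = author_id
              end

              # Queries on first access, then serves the cached value.
              def author
                @author ||= @store.find_author(@author_id)
              end
            end

            db = AuthorStore.new({ 1 => "Ada" })
            post = Post.new(db, 1)

            post.author # first access: runs a query
            post.author # second access: served from the cache
            puts db.query_count # prints 1
            ```

            Nothing at the second `post.author` call site tells you whether a query runs; that depends entirely on whether some earlier code already touched the accessor.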

            1. 4

              Is there a generally-accepted way to deal with this problem? Is it just “don’t do that”? Asking as someone with little to no Rails experience.

              1. 4

                “It’s slow” is always hard to debug; “It’s slow, but only sometimes” more so.

                As an experienced rails dev, I lean heavily on runtime instrumentation in production.

                Looking for “where is the webserver spending most of its time” rapidly identifies the highest-priority issues; every serious instrumentation platform offers a straightforward way to view all call-sites which generate queries, along with their stack traces. From there it’s pretty easy to identify what the issue is; deciding where to add an explicit preload is the hardest part.

                1. 3

                  I don’t know, but AFAIK you’re supposed to know what your access pattern is going to look like and prefetch/side load when needed. Or discover it when you find out there’s a bottleneck and fix it then.
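
                  To make the prefetch idea concrete, here is a plain-Ruby sketch that counts queries with and without loading everything up front. `CommentStore` and its methods are assumptions made for illustration; in Rails the eager variant would correspond to an explicit preload on the relation.

                  ```ruby
                  # Hypothetical stand-in for a database; counts every round trip.
                  class CommentStore
                    attr_reader :query_count

                    def initialize(comments_by_post)
                      @comments_by_post = comments_by_post
                      @query_count = 0
                    end

                    # One query per post: the shape lazy loading produces in a loop.
                    def comments_for(post_id)
                      @query_count += 1
                      @comments_by_post.fetch(post_id, [])
                    end

                    # One query for all posts: the shape of an explicit preload.
                    def comments_for_all(post_ids)
                      @query_count += 1
                      @comments_by_post.slice(*post_ids)
                    end
                  end

                  data = { 1 => ["a", "b"], 2 => ["c"], 3 => [] }
                  post_ids = data.keys

                  lazy_db = CommentStore.new(data)
                  lazy_counts = post_ids.map { |id| lazy_db.comments_for(id).size }

                  eager_db = CommentStore.new(data)
                  preloaded = eager_db.comments_for_all(post_ids)
                  eager_counts = post_ids.map { |id| preloaded.fetch(id, []).size }

                  puts lazy_db.query_count  # prints 3, one per post
                  puts eager_db.query_count # prints 1
                  ```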

                  1. 3

                    “Generally” and “Rails-specific” are pretty different.

                    For ActiveRecord, there’s bullet, which helps avoid n+1s. But there’s no rails-native way of doing it, as far as I know. Lots of teams re-invent their own wheels.

                2. 6

                  Interesting observation! It reminds me about this post. (My) TL;DR there is that extracting functions only helps for pure code; when there are mutations to global state, it’s easier to see them all in a single place. It seems that OpenGL context and database connection are similar cases.

                  This immediately suggests a rule-of-thumb for rails apps: do not abstract over SQL queries. When handling a single HTTP request, there should be one function, which lexically contains all the SQL queries for this request. It can call into helper (pure) functions, as long as they don’t issue queries themselves.

                  1. 3

                    It can call into helper (pure) functions, as long as they don’t issue queries themselves.

                    As I pointed out in my sibling comment to yours, this is difficult to verify. You could have one method that performs a query and side-loads related objects from the other side of a belongs_to relation, and another that does some processing on those objects. But processing the objects requires following the relation, which will trigger a query when the objects haven’t been side-loaded. So you could have another method that forgets to do the side-loading and voila, your “pure” helper function now issues a query under the hood.
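
                    One possible way to make the rule checkable, sketched in plain Ruby: the request handler does all the querying and hands helpers plain, frozen values rather than lazy records, so a helper has nothing it could follow into the database. All names here (`QuizStore`, `Answer`, `score`, `show_score`) are hypothetical, and this is an illustration of the idea rather than something from the post.

                    ```ruby
                    # Plain value object; it has no link back to the database.
                    Answer = Struct.new(:question_id, :correct)

                    # Hypothetical stand-in for a database; counts every round trip.
                    class QuizStore
                      attr_reader :query_count

                      def initialize(rows)
                        @rows = rows
                        @query_count = 0
                      end

                      # The only place a query happens; returns plain, frozen values.
                      def answers_for(quiz_id)
                        @query_count += 1
                        @rows.fetch(quiz_id).map { |q, c| Answer.new(q, c).freeze }.freeze
                      end
                    end

                    # Helper receives plain data only, so it cannot issue a query.
                    def score(answers)
                      answers.count(&:correct)
                    end

                    # Request handler: every query for the request is lexically in here.
                    def show_score(store, quiz_id)
                      answers = store.answers_for(quiz_id)
                      "#{score(answers)}/#{answers.size}"
                    end

                    store = QuizStore.new({ 7 => [[1, true], [2, false], [3, true]] })
                    puts show_score(store, 7) # prints 2/3
                    puts store.query_count    # prints 1
                    ```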

                  2. 4

                    Yes! Though I think even for functionality the inlining proved helpful. We encountered many cases where we were calling a function that took an array of elements but passed it only a single element. After inlining such a function we could remove a loop. In other cases we were calling a function with a hardcoded argument, and after inlining that we could remove unused conditional branches. It was super interesting to see how much logic evaporated in this process. Code that was useful at some point, but not anymore.
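
                    A small made-up example of the effect described above, in plain Ruby. The names (`titles_for`, `heading_before`, `heading_after`) are hypothetical; the sketch only shows how a loop and a branch can vanish once a general helper is inlined at its single call site.

                    ```ruby
                    # Before: a general helper that loops over many ids and branches on a flag.
                    def titles_for(catalog, ids, upcase: false)
                      ids.map do |id|
                        title = catalog.fetch(id)
                        upcase ? title.upcase : title
                      end
                    end

                    # The only caller passes a one-element array and never sets upcase.
                    def heading_before(id, catalog)
                      titles_for(catalog, [id]).first
                    end

                    # After inlining: the loop and the unused branch both disappear.
                    def heading_after(id, catalog)
                      catalog.fetch(id)
                    end

                    catalog = { 1 => "Fractions" }
                    puts heading_before(1, catalog) # prints Fractions
                    puts heading_after(1, catalog)  # prints Fractions
                    ```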

                    I like what Sandi Metz wrote on this subject in The Wrong Abstraction.

                  1. 5

                    In general, this was an interesting read, but because I think it blames the technology a bit too much, I’d like to point out that:

                    Before extracting Quiz Engine MySQL queries from our Rails service, we first needed to know where those queries were being made. As we discussed above this wasn’t obvious from reading the code.

                    This is probably not what most people would do here as any kind of APM tool would clearly show you which query is executed where (among other things). For deeper investigations, there are things like rack-mini-profiler, etc.

                    To find the MySQL queries themself, we built some tooling: we monkey-patched ActiveRecord to warn whenever an unknown read or write was made against one of the tables containing Quiz Engine data.

                    Even if you don’t use APM, there is no need for any monkey-patching, you can simply subscribe to the feed with instrumentation API https://guides.rubyonrails.org/active_support_instrumentation.html#active-record

                    Quiz Engine data from single-database MySQL to something horizontally scalable

                    You don’t provide the numbers, so I can’t speak to the scale you are dealing with, but in general, I wouldn’t want anyone reading this to think MySQL isn’t very scalable, because it is, horizontally and in other directions as well.

                    1. 7

                      For anyone else wondering, APM is an abbreviation for Application Performance Monitoring (or Management). It’s a generic term, not Ruby-specific.

                      1. 6

                        I think some of the “blame the tooling” comes from how starkly different it feels for us to use Haskell versus Ruby + Rails.

                        With Rails, it’s particularly easy to get off the ground – batteries included! But as the codebase grows in complexity, it becomes harder and harder to feel confident that our changes all fit together correctly. Refactors are something we do very carefully. We need information from tests, APM, etc. Which means legacy code becomes increasingly hard to maintain.

                        With Haskell, it’s particularly hard to get off the ground (we had to make so many decisions! which libraries should we use? How do we fit them together? How do we deploy them? Etc). But as our codebase has grown, it’s remained relatively straightforward and safe to do refactors, even in code where we have less familiarity. We have a high degree of confidence, before we deploy, that the code does the thing we want it to do. As the project grows in complexity, the APIs tend to be easier to massage into the direction we want, rather than avoiding improvements because of some kind of brittleness / fear of regressions / fighting with the test suite.

                        For those who haven’t written a lot of code in statically-typed ML languages like Elm, F#, or Haskell, the experience of “if it compiles it works” feels unreal. My experience with compiled languages before Elm was with C++ and Java, neither of whose compilers felt helpful. It’s been a learning experience adopting & embracing Elm, then Haskell.

                        1. 2

                          This is probably not what most people would do here as any kind of APM tool would clearly show you which query is executed where (among other things). For deeper investigations, there are things like rack-mini-profiler, etc.

                          I agree this information can also be found while monitoring, and we did rely on our APM quite a bit through the work (though this is not mentioned in the blog post), for example to see whether certain code paths were dead.

                          A benefit of the monkey-patch approach, I think, was that it was maybe easier to interact with programmatically. For example: we made our test suite fail if it ran a query against a Quiz Engine table, and sent a warning to our bug tracker (Bugsnag) if such a query ran in staging or production (later we would throw in that case too).

                          Didn’t know about the AR feed. That looks like it would have been a great alternative to the monkey-patch.

                          Regardless, our criticism here isn’t really related to Rails tooling available to dig for information, rather that we would have liked not needing to dig so much to know where queries were happening, i.e. that being clearer from reading code.

                          1. 2

                            What APM tools did you use to give what info/data and what didn’t they provide that you needed to use other tools to fill the gap for?

                            1. 1

                              What APM tools did you use to give what info/data and what didn’t they provide that you needed to use other tools to fill the gap for?

                              We primarily use NewRelic in our Rails codebase and Honeycomb in our haskell codebase.

                              NewRelic is a huge product, and I bet we could have gotten more use from NRQL to find liveness / deadness, but we didn’t.

                              We used NewRelic extensively to find dead code paths by instrumenting code paths we thought were dead and seeing if we saw any usage in production.

                              For finding every last query, we wanted some clear documentation in the code of where queries were and where queries weren’t. NewRelic likely could have provided the “where” but our ruby tooling let us track progress of slicing out queries. The workflow looked like this:

                              • Disable queries in a part of the site in dev (this would usually be at the root of an Action)
                              • Ensure a test hits the part of the site with the disabled queries
                              • Decorate all the allowed/known queries to get the test passing
                              • Deploy, and see if we saw any queries being run in the disabled scope
                                • if we do, write another test to ensure we hit the un-covered code path. Decorate some more.

                              It looked something like this:

                              SideEffects.deny do
                                # queries here are denied
                                data = SideEffects.allow { Quiz.find(id) } # this query is allowed
                              end
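
                              For readers wondering how such a guard might work, here is a rough plain-Ruby sketch. It is not the actual `SideEffects` code, which the thread does not show; `QueryGuard`, `DeniedQuery` and `run_query` are hypothetical names, the database is faked, and the single module-level flag is not thread-safe.

                              ```ruby
                              # Hypothetical deny/allow guard; not the authors' SideEffects module.
                              module QueryGuard
                                class DeniedQuery < StandardError; end

                                @denied = false

                                class << self
                                  # Run the block with queries denied, then restore the old state.
                                  def deny
                                    with_state(true) { yield }
                                  end

                                  # Run the block with queries allowed, even inside a deny block.
                                  def allow
                                    with_state(false) { yield }
                                  end

                                  # Called by the (fake) database layer before each query.
                                  def check!(sql)
                                    raise DeniedQuery, "unexpected query: #{sql}" if @denied
                                  end

                                  private

                                  def with_state(value)
                                    previous = @denied
                                    @denied = value
                                    yield
                                  ensure
                                    @denied = previous
                                  end
                                end
                              end

                              # Fake query runner standing in for the database layer.
                              def run_query(sql)
                                QueryGuard.check!(sql)
                                "rows for #{sql}"
                              end

                              QueryGuard.deny do
                                data = QueryGuard.allow { run_query("SELECT * FROM quizzes") } # allowed
                                puts data
                                begin
                                  run_query("SELECT * FROM answers") # not decorated, so it raises
                                rescue QueryGuard::DeniedQuery => e
                                  puts e.message
                                end
                              end
                              ```

                              In a test suite the raise would fail the test; in staging or production the same check could report to a bug tracker instead of raising.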
                              
                        1. 7

                          It seems to me that the CARE (Code Aspires to Resemble Explanation) idea is fine, but I’m not seeing how DRY has anything to do with what is going on here.

                          Here’s some smells that might indicate code could be more CAREful: … The existence of comments explaining what is happening, suggesting the code doesn’t explain itself well.

                          I wince when I read advice like this these days. I suspect this is because I’ve always seen it come out of examples that are like something you would find in an introductory programming textbook. A lot of the time having a “what” comment is incredibly useful, as long as it’s not an obvious repeat of the code itself. This is, of course, not very practical advice. That’s why I can’t say the author’s advice is wrong (it’s not), but I do think it’s overly simplistic.

                          I guess I’m just reacting to the simplicity of it, and my experience in seeing it misapplied.

                          1. 4

                            but I’m not seeing how DRY has anything to do with what is going on here.

                            Actually I do believe this is relevant. Blindly following DRY instead of thinking about how this helps structure code to support its narrative is more easily avoided if you follow CARE as an overarching principle.

                            Take the example of going to extreme lengths (crossing multiple namespaces or providing a global interface just for this purpose) to link to a library that is otherwise never used in an isolated part of your code, just because you want to re-use a simple method it contains.

                            This might not be such a great idea, and CARE hints at why: it would make the part much harder to explain. Instead of just saying ‘this part takes the time and deterministically outputs the current positions of all moving objects’, you’d have to add ‘AND it refers to the math library from the financial module from the in-app purchasing module because that already contained a sum method’.

                            1. 1

                              It seems to me that the CARE (Code Aspires to Resemble Explanation) idea is fine, but I’m not seeing how DRY has anything to do with what is going on here.

                              This is good feedback, thank you! To expand a bit about how I see the relation with DRY: I think of both DRY and CARE as examples of heuristics I can use to help improve a bit of code. DRY will drive me to trigger on duplication and stamp it out. CARE invites me to consider whether the code looks like the explanation I would give of it. I think code will come to look pretty different depending on which heuristic I apply.

                              A lot of the time having a “what” comment is incredibly useful, as long as it’s not an obvious repeat of the code itself.

                              I do believe there’s situations where “what” comments can be useful. One that’s top of my mind is when performance requirements require me to compromise a bit on code legibility. I don’t believe that performance-optimized code would turn out particularly CAREful, but with good reason.

                              I’m curious, were you also thinking of particular types of situations where you feel “what” comments are particularly useful?

                              1. 5

                                I’m curious, were you also thinking of particular types of situations where you feel “what” comments are particularly useful?

                                Here’s one of my favorite examples:

                                b=( "${a[@]}" ) # copy array
                                

                                That’s the shortest correct way to copy an array in bash. If you use bash a lot you’ll probably memorize it, but if you’re touching bash once every couple months then the comment will save you a lot of pain.

                                1. 4

                                  This is the kind of comment (or sometimes documentation) that I often see added during code review when the reviewer isn’t familiar with the library/language/etc. and requests it. It is not usually some subtlety, and when the reader becomes more familiar they don’t tend to ask for it in the future. It rubs me the wrong way because it is usually addressing a different “audience” than the rest of the comments. Generally the point of writing this program is not to teach the next person who reads it about X, and which program ends up with the comment explaining X is arbitrary, as is the particular X that happens to trip somebody up this time.

                                  This particular instance is kind of compelling, but I wonder if that’s just because I don’t write enough bash, and I get the impression you add this comment consistently which maybe changes things.

                                  1. 5

                                    Most code can and should assume the reader is familiar with the language in use, but in the case of code written in a language that’s rare-within-the-codebase (a role which eg bash & make often fill), I don’t think that’s a good assumption.

                                    1. 2

                                      I agree with your misgivings, but on a team using multiple languages, with developers of varying experience and familiarity with each, you’re better off anticipating likely pitfalls and steering people away from them with over-explanation, or just saving them googling time, than saying “well, an X programmer should know this, and if you don’t, it’s on you” – even if the latter is true.

                                      1. 1

                                        it’s on you

                                        To be clear I’m happy to explain in say the discussion/email etc., it’s just immortalizing that explanation in the source that tends to put me off.

                                        1. 1

                                          Sure, I didn’t mean to suggest you were being dickish. The problem is that on a large code base at a big company, with people working in multiple time zones, you won’t always be around, eventually you’ll leave or move to a new team, etc, etc. Again, I agree there is something ugly about it – a better long term solution imo is to document the feature set of the language everyone is expected to know, and invest in training – but that’s a lot of work, and if it’s not going to happen comments are a decent mitigation.

                                    2. 2

                                      b=( "${a[@]}" ) # copy array

                                      This comment would help me a lot as I’m not too familiar with Bash myself.

                                      In the post I describe a “what comment” as a smell that might indicate code isn’t particularly CAREful, and I do feel that applies to this example. Specifically the word “copy” is essential to an explanation of the code but doesn’t appear in the code itself. I think it might just be harder to write CAREful code in some languages than in others.

                                      That said, I wonder how you’d feel about these alternatives to the comment:

                                      • Extract the copying logic into a bash function called copy_array. If that’s possible. I know Bash has functions but am not familiar enough to know for sure we could use them in this situation.
                                      • Rename variable b to copy_of_a.
                                      1. 5

                                        Option 1 is basically not possible in bash (functions don’t “return” values) unless you use a nameref which will be far worse than the comment and not supported in all bash versions anyway.

                                        copy_of_a might be a good alternative but not always, since sometimes it won’t semantically describe the new variable clearly (ie, it describes how you produced the new variable but not the meaning of that copy in the narrative of your code, thus violating CARE)

                                    3. 3

                                      I’m curious, were you also thinking of particular types of situations where you feel “what” comments are particularly useful?

                                      Any time a name is ambiguous or reading the code would take much longer than reading the comment. Of course, both of these things can (and usually should) be considered code smells, but any time you aren’t able to find a perfect name, or make the code completely crisp, a short comment that will save a future developer reading time is usually a better alternative than blindly following “don’t comment what.” The master rule is: If I were coming to this for the first time, would the comment make my experience better or worse?

                                      1. 2

                                        Don’t forget about the costs of a comment beyond the time spent reading it when someone first encounters that code.

                                        At some point the commented code might change. If the developer then remembers to update the comment too, it will cost them only a little time. But if they don’t update it, the comment will cost future readers a lot of time as they recover from being misled by a comment that doesn’t match the behavior of the code.

                                        1. 1

                                          Agreed. The cost/benefit is a judgment call and prediction based on the likely needs and probable mistakes of future readers. Also a good argument for clarity and brevity when you do make comments.

                                      2. 2

                                        I’m curious, were you also thinking of particular types of situations where you feel “what” comments are particularly useful?

                                        In codebases like GCC. Here’s an example in the integrated register allocator. It’s code I had to work through recently. Stuff like the comment on line 3811 is incredibly useful when you’re reading through. It’s code that you have to read to understand the workings of other parts of the code (and probably aren’t directly related), but it’s something you will rarely have to touch. Having comments that guide you through the steps is a godsend.

                                        The obvious retort here is to refactor so that it’s clearer, but that is very unwise. For one thing, the data structures would have to be redone because they are multi-purpose and meant to be generic (otherwise you’d have so many aliases or copies it would be more confusing). Another is the sheer legacy to be overcome. And then there is the testing. This kind of code is really tricky to get right, and messing with it can be an awful lot of work.