Threads for malisper

  1. 9

    Instead of spinning up one goroutine per task and rate-limiting the speed you create them, a cleaner solution is to spin up N goroutines and have them all read from a shared channel. Something like:

    // Assumes "context" and "golang.org/x/sync/errgroup" are imported.
    func forEachLimit(arr []int, fn func(int) error, parallelism int) error {
      // ctx is cancelled as soon as any worker returns an error.
      g, ctx := errgroup.WithContext(context.Background())
      queue := make(chan int)

      for i := 0; i < parallelism; i++ {
        g.Go(func() error {
          for item := range queue {
            if err := fn(item); err != nil {
              return err
            }
          }
          return nil
        })
      }

    feed:
      for _, item := range arr {
        select {
        case queue <- item:
        case <-ctx.Done():
          // A worker failed; stop feeding the queue so we don't block
          // forever once all the workers have exited.
          break feed
        }
      }

      close(queue)

      return g.Wait()
    }
    

    FWIW, I do agree that concurrency in Go isn’t easy to get right. There was a paper a while ago where the authors found plenty of real-world examples of deadlocks and memory leaks due to the specific behavior of channels. They also found that Go programs tend to use far more concurrency than programs in other languages. I suspect this is because Go makes it so easy to write concurrent code, even when concurrency isn’t necessary.

    1. 2

      I suspect this is because Go makes it so easy to write concurrent code, even when concurrency isn’t necessary.

      In fairness, Rob Pike has warned before that goroutines can be a lot of fun to write and work with, but programmers shouldn’t get carried away with them, because sometimes all you need is a reference counter.

    1. 1

      I think it may have been a Lobster that gave me this insight, many moons ago: since there’s usually an upper limit on how many elements can be stored, in-memory binary trees essentially have constant-time access and update. Ruling them out as “too slow” may be a bit pessimistic.

      1. 1

        If you take things in a very literal sense, sure, but that’s usually not what people mean when they talk about big-O. By that definition, any algorithm that runs in finite memory is constant time.

        The formal definition of big-O is about what happens as the size of a data structure approaches infinity: f(n) = O(g(n)) means there are constants c and n₀ with f(n) ≤ c·g(n) for all n ≥ n₀. In this context, people are usually talking about an idealized computer in which you have infinite memory.

        1. 1

          Sure, I know that, and my comment was out of scope for “give the expected answer to this interview question”, but you’re presumably interviewing to work on a non-ideal computer with finite memory. It’s just as important to be able to figure out when worst-case complexity is rendered irrelevant by context and usage patterns. For instance, in the rate limiter, N is the constant upper bound for n, making everything O(1) anyway - you may even find that N=1 everywhere it’s used.

      1. 2

        I feel like the first example can be solved more elegantly by doing something like

        l = [0] * 10
        i = 0

        def rate_limit():
            global i
            if now() > l[i]:
                # Slot has expired: record a new expiry and advance the cursor.
                l[i] = now() + 1 minute
                i = (i + 1) % len(l)
                return false
            return true
        
        1. 2

          It’s roughly the same as in the post. The main difference is using a circular array to keep track of the times instead of a linked list:

          l = new Queue()

          def rate_limit():
              # First, drop any timestamps that have already expired.
              while not l.empty() and l.peek() < now():
                  l.pop()

              if l.len() > 10:
                  return false
              else:
                  l.push(now() + 10 minutes)
                  return true
          

          In fact, you can replace the while loop with an if statement and it will still work.
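
          To make that concrete, here’s a sketch of the if-statement variant (same pseudocode conventions as above). It works because each call pushes at most one timestamp, so removing at most one expired timestamp per call is enough to keep expired entries from piling up:

          def rate_limit():
              # Lazily drop at most one expired timestamp per call.
              if not l.empty() and l.peek() < now():
                  l.pop()

              if l.len() > 10:
                  return false
              else:
                  l.push(now() + 10 minutes)
                  return true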

          1. 1

            You’re only allowing one call per minute; you need to allow N calls per minute. But the spirit is the same.

            1. 1

              There are n calls per minute.

              1. 1

                Oh I see, n = 10 in this case.

          1. 1

            Sounds okay for pre-screening coding tests, but doesn’t seem to be enough for on-site interviews at high-tier companies. What do you think?

            1. 2

              I would say it still applies for high-tier companies. I’ve interviewed with Google twice and have gotten questions like “merge two sorted arrays” and “write a program to play tic-tac-toe”. Both times I got offers. Of course there were also questions out of left field like “solve a maze with 100 servers”.

            1. 6

              I have a pretty similar story with a backend focus. Throughout high school I taught myself the basic CS fundamentals. I had taken the MIT OpenCourseWare course on algorithms, as well as a number of other courses. To give you an idea, I had completed ~200 problems on Project Euler.

              When I graduated high school in 2015, I had taught myself roughly the first two years of a typical CS curriculum. If I went to college, I would spend the first two years covering stuff I already knew, and the topics I saw after those first two years didn’t appeal to me. I didn’t think it was worth spending four more years in school to learn about those topics and get a degree.

              Due to my algorithms knowledge, I had an easy time interviewing with companies. I eventually found my way to Heap. When I joined Heap, I knew next to nothing about databases. I could explain to you what a join was, but had never performed one myself. Even so, they made me the person responsible for scaling Postgres. Since Heap has a large Postgres cluster with 100s of TB of data, I got to learn a ton about how to optimize and scale Postgres. Initially I was the only person focused on scaling Postgres. After about a year, a database team was formed, and I soon became the leader of that team.

              In April, I left Heap to use my Postgres expertise to start a business. I started Perfalytics, which was recently backed by Y Combinator. Right now we are focused primarily on providing tech support for Postgres. A lot of teams are using Postgres, but aren’t sure how to solve performance issues as they come up. We advise these companies as they scale. Over time we plan to automate ourselves away by building tooling that would give the same advice we would otherwise give.

              1. 1

                We advise these companies as they scale. Over time we plan to automate ourselves away by building tooling that would give the same advice we would otherwise give.

                Love the idea. There are a lot of people in mid-sized to big companies who do that for their own jobs. Gives them more fun time. ;)

              1. 2

                It’s not optimal to run PostgreSQL on zfs on your master. Every disk write causes b-tree updates in the PostgreSQL layer as well as the zfs layer. This basically multiplies your IOPS per write, killing performance. It’s better to run your master on ext4 or xfs. You can use streaming replication to asynchronously keep a zfs-based standby updated, and delegate all your backup operations to that machine.

                1. 9

                  There’s a lot in this thread, but to sum it up…

                  1. I agree with trousers: there are ways to tune PostgreSQL on ZFS, I have applied them, and they work well. ZFS also has much more sophisticated recovery tools than PostgreSQL. PostgreSQL would have to be run on top of RAID, for example, to be resilient to disk failure, but ZFS has this built in, and I can take advantage of many of its other nice features for free if I use it.
                  2. I have a performance budget and I’m willing to spend it. It’s not necessary to squeeze every drop of performance out of the system when Postgres isn’t your bottleneck.
                  1. 1

                    Ignore my advice at your peril. When your master starts hitting IO capacity long before you expected, you’ll come back to this thread. There’s a reason btrfs added the “nocow” option.

                  2. 4

                    I don’t know… People run postgres on zfs all the time, and get good performance. The benefits of zfs (protection from bitrot) are real and shouldn’t be discounted.

                    You do need to tune zfs a bit though.

                    https://www.slideshare.net/SeanChittenden/postgresql-zfs-best-practices

                    1. 4

                      You do need to tune zfs a bit though.

                      This is true. The most important thing is matching the recordsize to the PostgreSQL page size; i.e., 8KB. Any mismatch at all can put you in the read-modify-write (RMW) hole, where writes to cold records can first involve a random read – not a good place to be.

                      https://www.slideshare.net/SeanChittenden/postgresql-zfs-best-practices

                      Please beware of this slide deck. It contains things like messing with logbias and sync which are almost certainly wrong; at best your performance won’t improve, and at worst you may be risking corruption.

                      1. 2

                        Please beware of this slide deck. It contains things like messing with logbias and sync which are almost certainly wrong; at best your performance won’t improve, and at worst you may be risking corruption.

                        You are right – thanks for pointing that out. 👍

                        1. 2

                          It contains things like messing with logbias and sync which are almost certainly wrong

                          hm. About logbias it says “controversial” right in the title of the slide, and I can’t find anything about sync in there right now…

                          matching the recordsize to the PostgreSQL page size; i.e., 8KB

                          If write performance is what you’re after. I’m using 64k on my Matrix server to get that sweet 3.34x compression and pay less money for storage :)

                        2. 0

                          People do, and they could be getting much better performance. No amount of ZFS tuning is going to eliminate the multiplication of IOPS that I mentioned. No one is saying the benefits of ZFS should be discounted, just that a more effective use of ZFS in a PostgreSQL setup would be on an asynchronous slave machine.

                          1. 4

                            You can disable pg full_page_writes on zfs though, can’t you? That saves a lot of overhead that you can’t ditch on non-COW fses.

                            An ssd slog can also help a lot for speeding up synchronous writes on a “spinning rust” pool.

                            1. 0

                              You can save even more by switching to ext4 and keeping full_page_writes than by keeping zfs and turning off full_page_writes. PostgreSQL was already architected to handle crash recovery before the advent of ZFS. Using ZFS to avoid crash recovery with PostgreSQL is redundant.

                              1. 6

                                ZFS was designed to avoid silent disk corruption as well. I guess we’ll just have to disagree on appropriate deployments at this point.

                                1. 2

                                  That’s why I suggest storing your database backups on ZFS, but not your master.

                                  1. 6

                                    This of course doesn’t stop your master db from silently returning corrupt data if you have bad sectors – expensive raid hardware /may/ detect it though. Flipped bits on the master may also result in backups with flipped bits (depending on how you do backups I guess), because you may be backing up garbled data.

                                    It depends on how valuable your data is though, for sure.

                                    1. 1

                                      You rarely find out about disk corruption fast enough to do anything about it, and your backups would be destroyed in a silent corruption with zfs for backups scenario

                                      1. 0

                                        No, I keep hourly backups going back a year. The kind of failure being suggested here is exceedingly rare with today’s storage technology (which automatically remaps bad blocks). If you’re running PostgreSQL in a virtualized environment (e.g. on top of something like AWS EBS) then this failure mode is non-existent.

                                  2. 1

                                      You can save even more by switching to ext4 and keeping full_page_writes than by keeping zfs and turning off full_page_writes.

                                    Can you explain why this is the case? This isn’t obviously true to me.

                                    If your workload consists of updating many different rows each on a different page, disabling full_page_writes can reduce the amount of WAL generated by an order of magnitude. It seems plausible that the reduction in WAL could make up for the increase in IOPS.

                                    1. 1

                                      If your workload consists of updating many different rows each on a different page, disabling full_page_writes can reduce the amount of WAL generated by an order of magnitude. It seems plausible that the reduction in WAL could make up for the increase in IOPS.

                                        Plausible, yes. In practice, no. Each page write on ZFS results in log(n) page writes within ZFS, whereas a full page write is still one page write, not to mention a single disk seek (i.e. one virtual IO op) compared to the log(n) seeks in ZFS. Of course, measure it for yourself. I’ve run PostgreSQL in production on both ZFS and non-ZFS, so I already know the difference.

                                      1. 1

                                        As I understand it, synchronous writes will be written to the ZIL, then asynchronously coalesced and written back to main storage. So, isn’t this incorrect?

                                        Also, do you have experience using an external SLOG device with a PostgreSQL setup? How does that affect things?

                                        1. 1

                                          It’s not incorrect that a write of a single block will translate into a write of log(n) blocks in ZFS. How many IOPS that translates into at run-time depends on how often data is being synchronized. No experience using a SLOG device since I run on virtualized hardware.

                            2. 2

                              It’s not optimal to run PostgreSQL on zfs on your master.

                              As with many pieces of advice, it really depends on your specific workload. If you are optimizing for write performance, sure, maybe ZFS isn’t the right choice.

                              On the other hand, if you have a read-intensive workload, ZFS can dramatically improve performance due to compression. ZFS compresses both data on disk and data in the in-memory cache. This allows you to read a lot more data from disk faster and cache a lot more data in memory than you would be able to otherwise.

                              Heck, I can even imagine scenarios where ZFS improves write performance. If your workload is primarily bulk updates and inserts, ZFS may reduce write volume by compressing the data before it’s written to disk.

                              As always, it all depends.

                              1. 1

                                Every disk write causes b-tree updates

                                What does that mean?

                                1. 1

                                  Exactly what it says it means. ZFS stores all data in COW fashion using b-trees. Every write must update block pointers all the way up the tree.

                                  1. 1

                                    I was asking about “the Postgres layer”. What does that mean in the context of PG?

                                    1. 1

                                      When postgres writes a new row, it has to update every node up the postgres-layer btree. Each of those page writes in consequence causes ZFS to do log(n) writes since it also uses a btree.

                                      1. 1

                                        I don’t know where you got that from. Thinking it might be my ignorance, I asked a few people who know PG internals well, but they didn’t know either.

                                        1. 1

                                          What do you mean where i got that from? Postgres indexes are btrees. That’s how btrees work.

                                          1. 1

                                            I don’t think that that’s how Postgres B-trees work. They aren’t binary trees; they are an improved implementation of Lehman and Yao’s B-trees.

                                            1. 1

                                              Which still essentially have log(n) insertion time. You need to read more. https://en.wikipedia.org/wiki/B-tree

                                              1. 1

                                                Ordinary inserts require a search and then an insertion into the chosen page. Postgres doesn’t have to update every node up the btree. And certainly it’s not true that every disk write causes btree updates (there might not be an index).

                                                1. 1

                                                  I mistyped; I was referring to row updates that modify indexes, which is nonetheless an extremely common case.

                              1. 4

                                While I enjoyed the post, the comparison at the end is unfair. The author compares ZFS with a 475GB NVMe drive as a cache to XFS without an equivalent cache.

                                1. 2

                                  The initial comparison with XFS is somewhat unfair as well, though: does XFS provide the same data integrity features that ZFS does? It’s hard, really, to compare file systems with vastly different design centres and feature sets – which feels like the point they’re trying to make, really.

                                  1. 2

                                    Was the comparison looking at data integrity though? I didn’t see any mention of that anywhere – everything I saw was entirely about performance. If you’re doing a performance comparison of two filesystems, comparing them on (very) different hardware doesn’t seem very meaningful.

                                    The author mentions the possibility of comparing against something like bcache (which would then be a zfs vs. xfs+bcache comparison rather than strictly a filesystem comparison), but then handwaves it away as “exotic” and concludes, essentially, that “zfs plus additional fancy hardware and a bunch of manual tuning outperforms xfs”. Well…big deal.

                                    1. 2

                                      At what point do you need to assume integrity as a baseline though? This is a database blog we’re talking here.

                                      Unrelated observation: it’s tragic that most production databases out there aren’t running on ZFS, and says a lot about the priorities (and less charitably the general ability) of our industry.

                                1. 1

                                  I spent some time thinking about this, as I had a database with a high rate of writes. I never did come up with a solution, unfortunately. Copying the production database is certainly useful, but it comes with caveats.

                                  For example, if the database is sharded then copying a single shard has limited utility, and copying all of them is going to get expensive quickly.

                                  The other issue I had with using production data for testing is that sometimes it shouldn’t be made available to all and sundry due to privacy concerns, so there either needs to be a process for scrubbing it (error-prone) or we’re back to generating synthetic data.

                                  Finally, in a high write rate scenario there are additional load issues for the application server(s) if you’re trying to write both to the production DB and a copy. It’s another layer of complexity. It’s possible to avoid that by restoring a backup and doing some sort of synthetic load simulation rather than reproducing actual real time writes, but then it’s not so easy to make it realistic and keep it that way over time.

                                  1. 2

                                    Author here.

                                    For example, if the database is sharded then copying a single shard has limited utility, and copying all of them is going to get expensive quickly.

                                    Our database is sharded. We have 10s of thousands of logical shards spread across a much smaller number of physical machines. Almost all the queries we care about are run across every physical machine. For testing optimizations, copying a single physical machine is the 80/20. It lets us confirm that the queries we expect to get faster actually do get faster, and that no other queries degrade. For this to miss regressions, there would need to be a large number of shards on other machines that behave differently under the optimization.

                                    The other issue I had with using production data for testing is that sometimes it shouldn’t be made available to all and sundry due to privacy concerns, so there either needs to be a process for scrubbing it (error-prone) or we’re back to generating synthetic data.

                                    Why would data in shadow prod “be made available to all”? We have the same restrictions on accessing the data in shadow prod as we do for production.

                                    Finally, in a high write rate scenario there are additional load issues for the application server(s) if you’re trying to write both to the production DB and a copy.

                                    For us, our writes do not need to show up instantaneously. Ideally they show up within a few seconds, but it’s okay for us if they periodically take longer. All our writes go through Kafka before they go into our database. Our service that reads from Kafka then writes to both databases. If shadow prod slows things down, writes will queue up in Kafka. This way, we don’t drop any data. There will be extra latency before data shows up in production, which, while less than ideal, is acceptable. We’ve discussed having separate Kafka consumers, one that writes to production and one that writes to shadow prod. This would prevent shadow prod outages from causing latency in production. There hasn’t been a need for it, so we haven’t implemented it yet.
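
                                    To sketch what that consumer loop looks like (illustrative only, not our actual code; it assumes the confluent_kafka and psycopg2 libraries, and the topic, table, and connection strings are made up): offsets are committed only after both databases have the write, which is why a shadow prod slowdown shows up as Kafka backlog instead of data loss.

                                    from confluent_kafka import Consumer
                                    import psycopg2

                                    consumer = Consumer({
                                        "bootstrap.servers": "localhost:9092",
                                        "group.id": "dual-writer",
                                        "enable.auto.commit": False,  # commit only after both writes succeed
                                    })
                                    consumer.subscribe(["events"])  # hypothetical topic

                                    prod = psycopg2.connect("dbname=prod")          # hypothetical DSNs
                                    shadow = psycopg2.connect("dbname=shadow_prod")

                                    while True:
                                        msg = consumer.poll(1.0)
                                        if msg is None or msg.error():
                                            continue
                                        for conn in (prod, shadow):
                                            with conn, conn.cursor() as cur:  # each write in its own transaction
                                                cur.execute("INSERT INTO events (payload) VALUES (%s)", (msg.value(),))
                                        # If shadow prod stalls, we simply stop polling and the backlog
                                        # accumulates in Kafka instead of data being dropped.
                                        consumer.commit(message=msg)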

                                    1. 1

                                      Thanks for the extra information, it makes sense.

                                      Why would data in shadow prod “be made available to all”? We have the same restrictions on accessing the data in shadow prod as we do for production.

                                      I think it makes sense for performance testing and general error/no-error tests on migrations. In my case, we occasionally had fairly complex data transformations going on during migrations, and if they went wrong (or even just while writing the migration), we’d need to see what happened to the data. It would have been hugely useful to run this on prod data, but privacy issues became a stumbling block.

                                  1. 9

                                    I’m another person who skipped college and went straight into industry after high school. Based on my experience, I recently wrote a post describing what approaches I found to work when looking for a job. I hope to help people who, like I was, are uncertain about going to school realize they do have other options.

                                    1. 20

                                      My company has been looking for some junior and associate-level engineers and we get a lot of applicants from different bootcamps. It’s heartbreaking interviewing them. Usually they’re coming out convinced that while they’re definitely not fully-qualified as software developers, the bootcamp at least got them to entry-level. More often than not, though, they’re unqualified for even that.

                                      With one particularly awful one, some of us were suckered into mentoring their students. By the end of their time there, the students still couldn’t write a max function. The bootcamp told them they were all ready for full-time employment. Some of the students had moved from other states to attend.

                                      I think the students are the biggest victims of this all. Coding isn’t a skill you can learn in twelve weeks, but many predatory camps, which have cachet as authorities and insiders, push that narrative to outsiders. And the rest of the tech industry tolerates it.

                                      1. 9

                                        I’ve had a similar experience to you with some bootcamps, but they’re not all created equal. The Recurse Center in particular has continually impressed me, for example; I don’t know if their teaching methodology is different or what, but we’ve gotten quite a few candidates who may not be ready for full-time, but who can definitely start at what I might call an advanced intern level, and who have a really good chance of progressing to a normal full-time position at the end of four months.

                                        That said, it’s worth noting that Khan Academy has a specific program for this situation (we call them fellowships), and our creation of that program was a very deliberate move to extend our education focus internally, in part by accommodating the bootcamps. I’m not sure if even some of the more liberal-minded places I’ve worked in the past would’ve been able to justify doing that, and I’m not sure if candidates from even the good bootcamps, like The Recurse Center, would do well otherwise.

                                        1. 13

                                          The Recurse Center (fka the Hacker School) is not a bootcamp at all (the name change was intended to make this more clear). It recruits students who already know how to program and may already work in the industry and takes more of a professional development/continuing education approach, giving students the time and resources to explore advanced topics that may interest them, or to fill gaps in their existing knowledge.

                                          1. 16

                                            (I attended the Recurse Center in the summer of 2015)

                                            To make it clear, the Recurse Center can’t really be compared to a boot camp or a traditional school. The easiest way I’ve found to describe the Recurse Center is that they take 80 people interested in programming, put them into a room together for three months, and have them work on whatever projects they want with little to no oversight.

                                            There are quite a few differences between the Recurse Center and other programs. First of all, they target all sorts of programmers. AFAIK, the only qualities they screen for are a love of programming and self-sufficiency. This leads them to getting programmers all across the experience spectrum. Some people have been programming for only six months, others for 20+ years.

                                            Second, there is no coursework. “Recursers” (someone who attends the Recurse Center) get to work on whatever projects they want with whoever they want. It’s completely self directed. The projects people work on range from building a programming language, to writing an implementation of Paxos, to learning Haskell.

                                            Third, the program is free to attend. The Recurse Center makes money by matching up Recursers with companies and charging the companies a recruiting fee. To make it clear, the recruiting aspect of the Recurse Center is completely optional. If you aren’t looking for a job after the Recurse Center, they won’t push you to look for one.

                                            Overall the Recurse Center is completely different from any kind of boot camp. If you are interested in it, I highly recommend applying. It’s a really awesome experience.

                                            As for the recruiting-from-the-Recurse-Center aspect, they are also great. 6 of the 20 full-time software engineers at my current company, Heap, were sent to Heap by the Recurse Center. They are by far the best recruiting channel we use.

                                            1. 5

                                              The Recurse Center needs to be in more locations than just NYC. With a family, it is extremely expensive and difficult to secure accommodations for 6/12 weeks, and there are also issues of what the family would do during the day.

                                            2. 1

                                              Well that makes sense. In that case, I’m not sure I’ve had a good experience with a boot camp.

                                          2. 4

                                            I think the students are the biggest victims of this all. Coding isn’t a skill you can learn in twelve weeks, but many predatory camps, which have cachet as authorities and insiders, push that narrative to outsiders. And the rest of the tech industry tolerates it.

                                            I’m not sure what kind of coding we’re speaking about. General? No. Specialised? Yes. I know quite a few people who learned the coding needed to support their current task (e.g. a statistical model) very fast. The problem is that a bootcamp rarely facilitates that; they have a one-size-fits-all curriculum.

                                            In many cases, I have the impression a trusting employer and some individual coaching at that location would help much more.

                                            1. 3

                                              Care to name names?

                                            1. 1

                                              I really would like to see the queries and explains for Postgres and Virtuoso, or at least know what algorithm was attempted. For Postgres to be that much slower, there has to be some algorithmic issue.

                                              1. 2

                                                You can see the search path query here. I haven’t analyzed it yet to determine how efficient it is.

                                                1. 1

                                                  I saw that, and wasn’t 100% sure that’s the one they used. For example, does that query work on both Postgres and Virtuoso?

                                                  If that is the query, it’s terribly inefficient. They copied the Postgres docs example that enumerates an entire subtree and modified it to return only the shortest path. So to compute the shortest path between A and Z, it starts at A, lists all of A’s connections, and scans them for Z. It then non-uniquely lists all of A’s second-degree connections. So if A knows D via B and C, it will list A B D and A C D, and then double-search for D in the next stage. This is particularly bad in social networks, where second-degree connections are likely to also be first-degree connections.

                                                  But this isn’t a tree, it’s a graph. Instead, they should only keep the shortest way to get to any point, and work from there. So if we have A C, and we discover A B C, we should throw that out and keep A C only.
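
                                                  For illustration, a minimal BFS sketch that keeps only the first (and therefore shortest) path to each node; the adjacency-dict graph representation here is my own assumption, not something from the benchmark:

                                                  from collections import deque

                                                  def shortest_path(graph, start, goal):
                                                      # graph: dict mapping each node to an iterable of its neighbors
                                                      visited = {start}
                                                      queue = deque([[start]])
                                                      while queue:
                                                          path = queue.popleft()
                                                          node = path[-1]
                                                          if node == goal:
                                                              return path
                                                          for neighbor in graph[node]:
                                                              if neighbor not in visited:
                                                                  visited.add(neighbor)  # first visit = shortest path to it
                                                                  queue.append(path + [neighbor])
                                                      return None  # no path exists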

                                              1. 4

                                                This article seems to be a bit fear-monger-y to me. This is (A) documented behavior in the PostgreSQL manual, and (B) If you’re using a SQL database it is your responsibility to determine the level of isolation that your query should need.

                                                I typically prefer coalescing multiple selects into a more complex query, and only very rarely need to fall back to multiple selects where higher levels of isolation would actually be necessary.

                                                1. 3

                                                  This is (A) documented behavior in the PostgreSQL manual, and (B) If you’re using a SQL database it is your responsibility to determine the level of isolation that your query should need.

                                                  While it is documented, I find most people aren’t familiar with it. I could see half of the examples being completely unexpected to someone who knows Postgres, but not in depth.

                                                  Also, I find the documentation in the Postgres manual around the different anomalies to be unclear. I believe this is due to historical issues such as the SQL standard being designed for lock based databases, as well as the SQL standard itself being fairly unclear. See A Critique of the ANSI SQL Isolation Levels.

                                                1. 10

                                                   The content of this post is about the same as PG’s own documentation; however, the author left out that you can change the isolation level even as far as serializable, which removes all of these issues if you don’t mind paying the cost.

                                                  https://www.postgresql.org/docs/9.1/static/transaction-iso.html

                                                  1. 2

                                                    the author left out that you can change the isolation level even as far as serializable, which removes all of these issues if you don’t mind paying the cost.

                                                     My next two posts are going to cover the main ways of avoiding these issues (row-level locks and different transaction isolation levels), and how each of them affects the different examples in the post.

                                                    Serializable also doesn’t exactly solve all of the problems. For the lost updates, skipped updates, and serialization anomaly examples, serializable will abort one of the transactions, which is less than desirable behavior.
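
                                                     To make “abort” concrete: the application has to catch the failure and retry the whole transaction. A minimal sketch with psycopg2 (the connection string and the work done inside the transaction are hypothetical):

                                                     import psycopg2
                                                     from psycopg2 import errors
                                                     from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE

                                                     conn = psycopg2.connect("dbname=example")  # hypothetical DSN
                                                     conn.set_isolation_level(ISOLATION_LEVEL_SERIALIZABLE)

                                                     def with_retries(fn, attempts=5):
                                                         for _ in range(attempts):
                                                             try:
                                                                 # The connection context manager commits on success
                                                                 # and rolls back if an exception is raised.
                                                                 with conn, conn.cursor() as cur:
                                                                     return fn(cur)
                                                             except errors.SerializationFailure:
                                                                 continue  # the database aborted this transaction; try again
                                                         raise RuntimeError("gave up after repeated serialization failures")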

                                                  1. 3

                                                    So the tl;dr is that an insert heavy workload benefited from a batch write?

                                                    These tools are awesome, and deep diving into the source code to find the specific things that make batching more efficient is very educational (apparently some table related metadata is cached).

                                                    For applications that can tolerate bounded latency on their writes (even if it’s only 1s) batch writes are usually a huge win.

                                                    1. 4

                                                       Yes, but it was unclear that was the case at the beginning. We had (incorrectly) assumed CPU was mostly spent on evaluating the partial index predicates. Based on that assumption, we thought batching wouldn’t have had much of an effect. It wasn’t until we actually examined what the CPU was being used for that we realized our assumption about CPU usage was completely wrong and that batching would actually have a dramatic impact.

                                                    1. 33

                                                      I’m an Ocaml user and, except for a few rare conditions, I’ve found I much prefer a result type to exceptions. My response will be based on Ocaml which may not be the same as F# so if they don’t apply there then ignore it.

                                                      Some points I disagree with the author on:

                                                      AN ISSUE OF RUNTIME

                                                      I didn’t really understand the example here. How is the author accessing an optional value? In Ocaml we have to use an accessor that would throw an exception if the value is not present or pattern match the value out. This doesn’t seem to have anything to do with exceptions or results, just an invalid usage of an option.

                                                      AN AWKWARD RECONCILIATION

                                                       This is the case in Ocaml as well, which is why many libraries try to make exceptions never escape the API boundary. But writing combinators for this is really quite easy. A function like (unit -> 'a) -> ('a, exn) result is available in all the various standard libraries for Ocaml.

                                                      BOILERPLATE

                                                      The author should be using the standard applicative or monadic infix combinators. Maybe F# doesn’t allow that. In Ocaml the example would look like:

                                                      let combine x y z =
                                                          pure (fun x y z -> (x, y, z)) <*> x <*> y <*> z
                                                      
                                                      WHERE’S MY STACKTRACE?

                                                       This is the one I disagree with quite a bit. If I am using exceptions then yes, I want stacktraces, because an exception is a nearly unbounded GOTO. But the value that result types give me is knowing, from the types, exactly what errors a function can produce and being forced to handle them. That makes stacktraces much less valuable; I’d much rather know what errors are possible and be forced to handle them than have stacktraces.

                                                      THE PROBLEM WITH IO

                                                       The problem here doesn’t have anything to do with exceptions; it’s that the return type should be a result whose Error case is a variant of the various ways it can fail. Ocaml makes this much, much easier because it has polymorphic variants.

                                                      STRINGLY-TYPED ERROR HANDLING

                                                      Yeah, use a variant not a string.

                                                      INTEROP ISSUES

                                                      This can indeed be a problem. It’s also a problem with exceptions, though.

                                                      1. 9

                                                        100% agreed. Debugging from a stack trace is far more complicated than having good error handling through compiler enforced types.

                                                        1. 3

                                                           Ditto. This has been the case in any FP language I’ve worked in: it takes more time to work with a stack trace and recover anything valuable from it than to lean on the compiler and type enforcement at compile time.

                                                        2. 2

                                                          WHERE’S MY STACKTRACE?

                                                           This is the one I disagree with quite a bit. If I am using exceptions then yes, I want stacktraces, because an exception is a nearly unbounded GOTO. But the value that result types give me is knowing, from the types, exactly what errors a function can produce and being forced to handle them. That makes stacktraces much less valuable; I’d much rather know what errors are possible and be forced to handle them than have stacktraces.

                                                           This is a case where you can eat your cake and have it too. Java has checked exceptions, which the compiler enforces are handled. When you call a function that can throw a checked exception, the calling function either has to handle the exception in a try block or declare in its signature that it can throw an exception of the specified type.

                                                          You can also do the opposite and add the stack trace to the result type. Most languages provide some way to obtain a stack trace at runtime, so all you need to do is attach the stack trace to the error when it is instantiated.
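
                                                           As a sketch of that second idea (Python just for brevity; the Ok/Err result types are hypothetical, traceback is the standard library module):

                                                           import traceback

                                                           class Ok:
                                                               def __init__(self, value):
                                                                   self.value = value

                                                           class Err:
                                                               def __init__(self, message):
                                                                   self.message = message
                                                                   # Capture the stack at the point the error value is created.
                                                                   self.stack = "".join(traceback.format_stack())

                                                           def parse_port(s):
                                                               # Returns Ok(int) or an Err carrying a message plus a stack trace.
                                                               if not s.isdigit():
                                                                   return Err(f"not a number: {s!r}")
                                                               return Ok(int(s))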

                                                          1. 4

                                                             Checked exceptions in Java are a nice experiment but a rather colossal failure, unfortunately. Since the compiler cannot infer checked exceptions, you have to retype them all out at each level, and it becomes unwieldy. The situation is even worse with lambdas, where one has to turn a checked exception into an unchecked one.

                                                            1. 3

                                                              Is it simply type inference on function declarations that you see as the difference here? I am curious because as a Java programmer by day, I don’t see a ton of difference between “-> Result<FooVal, BarException>” and “FooVal someFunc() throws BarException { … }”.

                                                              Granted the implementation is quite different (unwinding the stack and all that), but is it simply ergonomics that makes the latter a “colossal failure” in your mind?

                                                              1. 3

                                                                No, the difference is that results are just types and values. From that you get all the great stuff that comes with types and values. For example:

                                                                • Type inference. I only specify the types of my functions at API boundary points.
                                                                • Aliasing types. If I have a bunch of functions that return the same error I can just do type err = ..... rather than type all of the errors out each time.
                                                                • They work with lambdas!
                                                                • They work with parametric polymorphism. I can write a function like 'a list -> ('a -> ('b, 'c) result) -> ('b list, 'c) result.
                                                                • And, probably most importantly, it does not add a new concept to the language.

                                                                That checked exceptions do not compose with lambdas in Java basically tells me they are dead. All the Java code I’m seeing these days makes heavy use of lambdas.

                                                                1. 2

                                                                  Gotcha, thanks for the reply. I don’t disagree strongly, but I feel like what you are arguing for is Java, minus checked exceptions, plus more pervasive type inference, plus type aliases, plus several other changes. Which, that’d be pretty cool, but I think at this point we’re discussing sweeping language changes as opposed to the merits of checked exceptions strictly.

                                                                  For example, simply replacing checked exceptions in modern Java with use of a Result would (at least as far as I can imagine) still result in a lot of verbosity. You’d just be typing “Result<Foo, Bar>” a lot as opposed to typing “throws Bar” a lot.

                                                                  Not to be overly argumentative or anything. But “colossal failure” seems a little strong to me! :)

                                                        1. 7

                                                          In Go:

                                                          type X struct {
                                                              y Y
                                                          }
                                                          
                                                          func (x *X) gety() *Y {
                                                              return &x.y
                                                          }
                                                          

                                                           As in C and C++, it’s still possible to return an interior pointer, but unlike in those languages, doing so is guaranteed to be memory safe because of garbage collection. This might mean that your object ends up being punted off to the heap because the *Y outlives the X, but the actual access to the pointer carries no extra cost.

                                                          1. 20

                                                             Fwiw, part of his specification (a “crucial” part) seems to be exactly that the *Y doesn’t outlive the X and that the access compiles to mere pointer arithmetic. Not compelling for all use cases, but definitely Rust’s big rallying cry: “zero cost abstraction!”.

                                                            1. 4

                                                               Well, to be pedantic, it still doesn’t outlive the owning object, but that’s because the owning object is kept alive by the GC for as long as it needs to be, even if the code doesn’t reference it anymore. But yes, I understand what the author is getting at.

                                                          1. 2

                                                             Does anyone know what the current state of the art in profiling tools is? I recently had the opportunity to use JProfiler, a JVM profiler, for my work. It provides the same functionality Spacetime provides and much more. One of the big differences is that JProfiler attaches to a running process, so you can interact with the application while profiling it. Similar to Spacetime, you can inspect what objects are currently filling up memory and the stack traces that generated those objects. JProfiler also lets you poke around the heap and see what objects a given object points to, and what objects point to a given object. One JProfiler feature I found particularly useful was being able to visualize the state of each and every thread over time (at what times each thread is busy/idle/waiting on I/O). This allowed my team to find the bottleneck in our system. We were able to see that all of the threads of a particular thread pool were almost always waiting on I/O, and discovered the thread pool wasn’t sized properly.

                                                            1. 12

                                                              Please don’t pay attention to TIOBE for determining language popularity. It is literally languages ranked by how many links come up when you Google the language.

                                                              1. 5

                                                                 I didn’t know about DEFERRABLE. That’ll be useful in the future. At work, we have a script that moves data between different tables. It has an intermediate state in which some fkey constraints would be violated. Since we didn’t know about DEFERRABLE at the time, the script drops the fkey constraints, does its job, and then re-adds the constraints, all in a single transaction.

                                                                 The script one time caused a pretty serious production outage. One of the members of our support team, who had a basic understanding of transactions, copied and pasted the script into psql. He executed everything in the script except for the final COMMIT, and then double-checked everything to make sure it was okay to run the script. Because modifying a constraint on a table grabs a lock that blocks reads of the table, queries to the table weren’t finishing. This caused our connection pooler to quickly become saturated with queries to that one table. The saturated connection pool then caused all queries to the db to hang waiting for a connection. Since the db appeared to be down, the support person decided not to execute the final COMMIT. If he had executed the COMMIT, the db would have come back up. Since the outage, we now require all administrative commands to be slack commands. While the script does work, the right way to do it is with DEFERRABLE.
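
                                                                 For anyone curious, here’s a sketch of what the DEFERRABLE version looks like (shown via psycopg2; the table, constraint, and connection names are made up). The fkey check is postponed until COMMIT, so the intermediate state never has to satisfy it:

                                                                 import psycopg2

                                                                 conn = psycopg2.connect("dbname=example")  # hypothetical DSN
                                                                 with conn, conn.cursor() as cur:
                                                                     # Requires the constraint to have been declared DEFERRABLE, e.g.:
                                                                     #   ALTER TABLE child ADD CONSTRAINT child_parent_fkey
                                                                     #     FOREIGN KEY (parent_id) REFERENCES parent (id) DEFERRABLE;
                                                                     cur.execute("SET CONSTRAINTS child_parent_fkey DEFERRED")
                                                                     cur.execute("DELETE FROM parent WHERE id = 1")  # temporarily violates the fkey
                                                                     cur.execute("UPDATE child SET parent_id = 2 WHERE parent_id = 1")  # assumes parent 2 exists
                                                                 # The deferred constraint is checked at COMMIT, once the data is consistent again.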

                                                                1. 2

                                                                  Oh, and I hadn’t noticed that DEFERRABLE could be used on foreign key constraints. :) The docs confirm:

                                                                  Currently, only UNIQUE, PRIMARY KEY, REFERENCES (foreign key), and EXCLUDE constraints are affected by this setting.

                                                                  Thanks for pointing that out!