Threads for joelgrus

  1. 9

    Without commenting on the product itself, I’m concerned that this product got $2.2M in funding. Am I really to believe that there’s a huge market for a product like this?

    1. 3

      $2.2M is peanuts. It’s like a side bet for a VC firm.

      1. 9

        That people are willing to throw away $2.2M on a side bet concerns me for different, humanitarian, reasons.

        1. 4

          It’s not like one person put in $2.2M, almost certainly a fair number of people put in much smaller amounts.

      2. 1

        I can’t wait to pay a monthly subscription fee to use my terminal.

      1. 1

        As someone who would rather not spend any time thinking about home networking, I use the router that Comcast gave me and an Eero mesh network, and this setup works really well for my ~2000sqft, 2 floor house. (pre-Eero the wifi was really bad if you were too far from the router).

        1. 6

          I downloaded the demo and never managed to beat the tutorial level. 😞

          I’m really not a bad developer though.

          1. 3

            So we won’t get “Ten essays on Factorio” from you then?

            1. 1

              I’ve played the tutorial at some point: it was hard, even though I was experienced. The analogy with development is only felt during regular open play, over several hours. Me, I’ve learned to reassess my relationship with technical debt: I tend to want the perfect thing, and I can write (or design) that perfect thing. But that takes time, which could be used instead to produce more things.

              Lesson learned: even code that’s not good enough can be good enough for now. It should be replaced when it ceases to be, and no sooner. (Note: that’s my lesson. Tactical tornadoes should instead learn to replace code as soon as it ceases to be good enough. In general, it’s hard to assess when the time is right.)

            1. 5

              I’ve got tons of them; two of my favorites are “Euclid’s solution”:

              def fizz_buzz(n: int) -> str:
                  hi, lo = max(n, 15), min(n, 15)
              
                  while hi % lo > 0:
                      hi, lo = lo, hi % lo
              
                  return {1: str(n), 3: "fizz", 5: "buzz", 15: "fizzbuzz"}[lo]
              

              and the trigonometric solution:

              import math
              
              def fizz_buzz(n: int) -> str:
                  # cos(n * tau / k) is 1 when k divides n; int() truncates every other value to 0
                  fizz = 'fizz' * int(math.cos(n * math.tau / 3))
                  buzz = 'buzz' * int(math.cos(n * math.tau / 5))
                  return (fizz + buzz) or str(n)
              

              (Self-promotion corner: I wrote a book on the topic.)

              1. 9

                I don’t really like Go, and have only done like …. 5 pages of it ever. But I feel like that goroutine example is pretty convincing that Go is easy? The amount of futzing about I have to do in Python to get a similar-working pipeline would lead to maybe 3x as much code for it to be that clean.

                I dunno, Go reintroduces C’s abstraction ceiling problem, meaning it’s hard for someone to show up and offer a nicer wrapper for common patterns. But if you’re operating at the “message passing” and task spinup level, you’re gonna have to do flow control, and that code looks very nice (and local, which is a bit of a godsend in this kind of stuff).

                Though I feel you for the list examples. Rust (which, granted, has more obstacles to doing this) also involves a lot of messing around when trying to do pretty simple things. At least it’s not C++ iterators I guess

                1. 18
                  import asyncio
                  
                  async def do_work(semaphore, id):
                      async with semaphore:
                          # do work
                          await asyncio.sleep(1)
                          print(id)
                  
                  async def run():
                      semaphore = asyncio.Semaphore(3)
                      jobs = []
                      for x in range(20):
                          jobs.append(do_work(semaphore, x))
                      await asyncio.gather(*jobs)
                      print("done")
                  
                  asyncio.run(run())
                  

                  About the same length in lines, but IMO quite a bit easier to write.

                  1. 4

                    That is a good example, but you can’t really compare asyncio and goroutines. The latter are more like “mini threads” and don’t need to inherit all the “ceremony” that asyncio needs to prevent locking up the io loop.

                    1. 12

                      Goroutines are arguably worse IMO. They can run on either the same thread or a different one, which makes you think about the implications of both. But here’s a slightly wordier solution with regular Python threads, which may be more comparable:

                      import threading
                      import time
                      
                      def do_work(semaphore, id):
                          with semaphore:
                              # do work
                              time.sleep(1)
                              print(id)
                      
                      def run():
                          semaphore = threading.Semaphore(3)
                          threads = []
                          for x in range(20):
                              thread = threading.Thread(target=do_work, args=(semaphore, x))
                              thread.start()
                              threads.append(thread)
                          for thread in threads:
                              thread.join()
                          print("done")
                      
                      run()
                      

                      As you can see, not much has changed in the semantics, just the wording changed and I had to manually join the threads due to a lack of a helper in the standard library. I could probably easily modify this to work with the multiprocessing library as well, but I’m not gonna bother.

                      Edit: I did bother. It was way too easy.

                      import multiprocessing as mp
                      import time
                      
                      def do_work(semaphore, id):
                          with semaphore:
                              # do work
                              time.sleep(1)
                              print(id)
                      
                      def run():
                          semaphore = mp.Semaphore(3)
                          processes = []
                          for x in range(20):
                              process = mp.Process(target=do_work, args=(semaphore, x))
                              process.start()
                              processes.append(process)
                          for process in processes:
                              process.join()
                          print("done")
                      
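                      # Guard the entry point: needed when child processes are spawned (e.g. Windows, macOS)
                      if __name__ == "__main__":
                          run()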
                      
                      1. 10

                        goroutines aren’t just good for async I/O. They also work well for parallelism.

                        Python’s multiprocessing module only works well for parallelism in basic cases. I’ve written a lot of Python and a lot of Go. When it comes to writing parallel programs, Go and Python are in different categories.

                        1. 4

                          It’s best to decide which one you actually want. If you try to reap the benefits of both event loops and thread parallelism, you’ll have to deal with the disadvantages of both. Generally, you should be able to reason about this and split the work into separate tasks, each using one kind of concurrency. Python has decent support for that, with asyncio able to run functions in thread pool or process pool executors.
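A minimal sketch of the executor hand-off mentioned above, assuming a hypothetical `blocking_square` function as a stand-in for blocking or CPU-heavy work. It uses a thread pool; for CPU-bound work a `ProcessPoolExecutor` could be swapped in, provided the function is picklable and the entry point is guarded.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def blocking_square(n):
    """Hypothetical stand-in for blocking or CPU-heavy work."""
    return n * n


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=3) as pool:
        # Each call runs in the pool, so the event loop stays free for other coroutines.
        futures = [loop.run_in_executor(pool, blocking_square, n) for n in range(5)]
        # gather() returns results in the order the awaitables were passed in.
        return await asyncio.gather(*futures)


results = asyncio.run(main())
print(results)
```

The coroutine side only ever awaits futures, so the two kinds of concurrency stay in separate functions.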

                          I do agree though that Python isn’t the best at parallelism, because it carries quite a lot of historical baggage. When its threading was being designed in 1998, computers with multiple CPUs were rare, and the first multi-core CPU was still 3 years away[1], with consumer multi-core CPUs arriving 7 years later. The intention was to let multiple tasks appear to run concurrently on a single CPU and to speed up IO operations. At the time, the common opinion was that most machines would continue to have only a single core, so the threading module was designed accordingly, with a GIL that guarantees it will not corrupt memory. Sadly, things didn’t turn out the way people expected, and now we have a GIL problem on our hands that is very difficult to solve. It’s not unlike errno in C, which now requires macro hacks to work correctly between threads. It’s just that the GIL touches things that are a bit harder to hack over.

                          1. 7

                            I’m aware of the history. My point is that the Python code you’ve presented is not a great comparison point because it’s comparing apples and oranges in a substantial way. In the Go program, “do work” might be a CPU-bound task that utilizes shared mutable memory and synchronizes with other goroutines. If you try that in Python, it’s likely you’re going to have a bad time.

                            1. 3

                              The example with multiprocessing module works just fine for CPU tasks. asyncio works great for synchronization and sharing memory. You just mix and match depending on your problem. It is quite easy to defer CPU-heavy or blocking IO tasks to an appropriate executor with asyncio. It forces you to better separate your code. And in this way, you only need to deal with one type of concurrency at a time. Goroutines mash them together, leaving you to deal with thread problems where coroutines would have worked just fine, and coroutine problems where threads would have worked just fine. In Go you only have a flathead screwdriver for everything between nails and crosshead screws. It surely works, sometimes even well. But you have to deal with the warts of trying to do everything with one tool. On the other hand, Python tries to give you a tool for most situations.

                              1. 6

                                The example with multiprocessing module works just fine for CPU tasks.

                                But not when you want to add synchronization on shared mutable memory. That’s my only point. You keep trying to suck me into some larger abstract conversation about flat-head screwdrivers, but that’s not my point. My point is that your example comparison is a bad one.

                                1. 3

                                  Give me an example of a task of that nature that cannot be solved using multiprocessing and asyncio and I’ll show you how to solve it. You shouldn’t try to use single tool for everything - every job has its tools, and you might need more than one to do it well.

                                  1. 4

                                    I did. Parallelism with synchronized shared writable memory is specifically problematic for multiprocessing. If you now also need to combine it with asyncio, then the simplicity of your code goes away. But Go code remains simple.

                                    You shouldn’t try to use single tool for everything

                                    If you think I need to hear this, then I think this conversation is probably over.

                                    1. 2

                                      Parallelism with synchronized shared writable memory

                                      You describe a class of problems. But I cannot solve a class of problems without knowing at least one concrete problem from the class. And I do not.

                                      1. 3

                                        Here’s an example of something I was trying to do yesterday:

                                        I wanted to use multiprocessing to have multiple workers pull (CPU-bound) tasks off a (shared) priority queue, process each task in a way that generates zero or more new tasks (with priorities) and put them back on the queue.

                                        multiprocessing.Manager has a shared Queue class, but not a shared priority queue, and I couldn’t figure out a way to make it work, and eventually I gave up. (I tried using heapq with a shared list from multiprocessing.Manager and that didn’t work.)

                                        If you can tell me how to solve this, I would actually be pretty grateful.

                                        1. 1

                                          I gave it a bit of time today, here’s the result. Works decently well, if you don’t do CPU expensive stuff (like printing big numbers) in the main process and your jobs aren’t very short.
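The linked result did not survive in this thread, so what follows is only a rough sketch of one possible approach and not necessarily what was posted. It assumes the standard-library `queue.PriorityQueue` is registered on a `BaseManager` subclass so that worker processes share it through proxies. The names `QueueManager`, `expand`, `worker`, `run` and `STOP` are all hypothetical.

```python
import multiprocessing as mp
from multiprocessing.managers import BaseManager
from queue import PriorityQueue


class QueueManager(BaseManager):
    """Illustrative manager subclass that serves a shared priority queue."""


# The real PriorityQueue lives in the manager process; workers get proxies to it.
QueueManager.register("PriorityQueue", PriorityQueue)

STOP = (float("inf"), -1)  # sentinel task telling a worker to exit


def expand(priority, n):
    """Hypothetical work step: split a task into zero or more new tasks."""
    if n <= 1:
        return []
    return [(priority + 1, n // 2), (priority + 1, n - n // 2)]


def worker(tasks, results):
    while True:
        task = tasks.get()
        if task == STOP:
            break
        priority, n = task
        children = expand(priority, n)
        if not children:
            results.put(n)  # a leaf task produces a result
        for child in children:
            tasks.put(child)
        # Mark done only after the children are queued, so join() cannot return early.
        tasks.task_done()


def run(total=10, n_workers=3):
    with QueueManager() as manager:
        tasks = manager.PriorityQueue()
        results = manager.PriorityQueue()
        tasks.put((0, total))
        procs = [mp.Process(target=worker, args=(tasks, results)) for _ in range(n_workers)]
        for p in procs:
            p.start()
        tasks.join()  # blocks until every queued task has been marked done
        for _ in procs:
            tasks.put(STOP)
        for p in procs:
            p.join()
        leaves = []
        while not results.empty():
            leaves.append(results.get())
        return leaves


if __name__ == "__main__":
    print(sum(run()))
```

Every `put` and `get` here is a round trip to the manager process, which fits the caveat above that this kind of setup only pays off when the jobs aren’t very short.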

                1. 12

                  Incredible to see such an opinionated (and flawed) article with that many upvotes.

                  This approach doesn’t work every single time and it just gives more ammunition to those annoying people who don’t understand that.

                  1. 9

                    My biggest pet peeve with these articles is that this self-assured (or, rather, pointlessly arrogant) style tends to trickle down to the audience. Five years from now someone is going to take this article very literally and – lacking the required experience – not realize that you can’t really apply it in every case. And their next code review exercise will devolve into something like “conflating multiple logical conditions in one statement is bad style” - “It may be bad style in general but I want to do this explicitly because this way the conditions in the implementation match the ones in the formal spec, so it’s easier to check compliance.” “False. Conflating multiple logical conditions in a single statement is messy and results in code that’s hard to debug.” and so on and so forth.

                    The whole article reads like narrow dogma, even in areas where there’s really no reason for that, such as the history section, which is kind of upside down. As far as I recall, the earliest descriptions of the if-then-else formalism (e.g. McCarthy, Minsky) are both purely functional and make no reference to the idea of a “block”, which arguably precedes structured programming (early FORTRAN versions, for example, had subroutines and DO loops). Or that bit about GOTO which is not even wrong – nobody who is defending the use of GOTO today is doing it in the context Dijkstra was talking about in his famous paper, so no, obviously they don’t “talk like that” (which isn’t just untrue, it’s really just a gratuitous jab).

                    It’s even more ironic when the author has to show up and correct his examples, after spending a few hundred words on preaching why you’re always wrong and he’s always right.

                    1. 3

                      Five years from now someone is going to take this article very literally and – lacking the required experience – not realize that you can’t really apply it in every case.

                      I stopped using Pylint when I upgraded to a new version and my code started failing checks. The reason for it was that they’d added a new rule (that I didn’t agree with). I went and found the PR where someone added that rule, and the justification was simply “here’s a single random blog post that says don’t do X,” which (shockingly) got the rule added.

                      1. 1

                        Care to share? I find that both fascinating and daunting at the same time.

                    2. 4

                      This is what happens with dogmatic statements like the ones in this article. People flock to their favorite ideas, they might sound good (and invite discussion) and might even work in some situations, but in the end it’s down to taste and you can’t rigidly codify that.

                      Right now I’m working on a Clojure codebase that uses multimethods here and there. This is another way of doing conditionals, but it’s very very implicit. I still haven’t made up my mind whether that’s a good or a terrible idea. More likely it is just something you need to apply very carefully, like everything else in programming.
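For readers who haven’t met multimethods, a loose Python analogy is `functools.singledispatch`. This is an assumption for illustration only: Clojure multimethods dispatch on the result of an arbitrary function, while `singledispatch` dispatches only on the type of the first argument. The function name `describe` is hypothetical.

```python
from functools import singledispatch


@singledispatch
def describe(value):
    # Fallback branch, playing the role of a final "else".
    return "something else"


@describe.register
def _(value: int):
    return "an integer"


@describe.register
def _(value: list):
    return "a list"


print(describe(3), describe([1, 2]), describe("x"))
```

The implicitness shows up quickly: no `if` appears anywhere, and `describe(True)` lands in the `int` branch because `bool` is a subclass of `int`.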

                    1. 1

                      fiction: I just finished rereading Dune (and I liked it less than I remembered)

                      non-fiction: Gary Taubes The Case for Keto, it seems good so far, but I already eat keto, so he doesn’t have much to sell me on

                      technical: a Springer anthology Query Understanding for Search Engines, it’s interesting and relevant for my work

                      with my kid: Big Ideas for Curious Minds: An Introduction to Philosophy, we’re reading a chapter together every night, so far it’s kind of a mixed bag, but it does make for some interesting discussions

                      1. 3

                        Company: Capital Group

                        Company site: Investment Group Technologies

                        Position(s): Software Engineer, Search / ML

                        Location: Los Angeles or Seattle (pre-Covid we were onsite, right now we are remote, it’s not clear what things will look like post-Covid)

                        Description: Right now I’m looking to hire a mid-to-senior level engineer with a focus on search relevance and machine learning (mostly NLP stuff at the moment, although who knows what the future holds). Basically I’m looking for a solid software engineer who knows some things about ML and search, or alternatively an experienced data scientist with strong engineering skills. Lots of interesting problems to solve, and you’d get to work with me.

                        Tech stack: Python (+ all the usual data/ML libraries), Elasticsearch, SQL, more AWS services than I care to list out

                        Contact: send me a message here or email me: joel dot grus at capgroup dot com

                        1. 2

                          My solutions are / will be at: https://github.com/joelgrus/advent2020

                          I also stream while I’m trying to solve the problems, those videos end up here: https://www.youtube.com/playlist?list=PLeDtc0GP5ICmVrjHJrIiDZFW_xr__Ifqk

                          1. 4

                            The argument is basically “Business Logic is superior to Machine Learning”, which makes perfect sense. However, I think machine learning is still helpful in cases where you have insufficient prior knowledge of the business logic, and thus need to estimate or approximate it.

                            1. 1

                              When does that happen?

                              1. 2

                                For example, consider the problem of determining whether an array of pixels represents a picture of a dog or of a cat. It would be difficult to write out the “business logic” for that.

                                1. 1

                                  When you have a large dataset and you don’t know what is driving the data to behave a certain way. Coming up with a detailed, knowledge-based model takes a long time. “Throwing” an ML-style model at a problem can be a lot cheaper. Sure if you’re making line-of-business CRUD you’ll rarely run into this, but there are domains where ML can help short circuit a lot of modelling.

                              1. 5
                                1. being able to see my kid throughout the day (although she’d be happier if she was hanging out with friends)

                                2. we got some quarantine pet chickens, and they’re growing up, and I take breaks during the day to go give them treats and talk to them

                                3. since I’m wfh I can take 15 minute breaks throughout the day to do small bits of dinner prep, and I end up cooking more interesting meals as a result

                                4. not commuting

                                5. I have better coffee at home than at work

                                1. 2

                                  In the Game State section, the author has this sentence:

                                  Normally I avoid dataclasses on account of their being mutable, but here that’s what we need.

                                  I wonder if frozen=True in the dataclass definition is not sufficient for their needs here. Both the data class documentation [1] and the similar documentation [2] for the popular attrs library’s parameter of the same name indicate that true immutability is not possible in Python.

                                  If you want true immutability in your code, Python is not a language that can guarantee it.


                                  [1] https://docs.python.org/3/library/dataclasses.html#frozen-instances

                                  [2] https://www.attrs.org/en/stable/examples.html#immutability
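A small sketch of what `frozen=True` does and does not give you, using a hypothetical `GameState` class (not the one from the article). Ordinary assignment is rejected, but `object.__setattr__` still gets through.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class GameState:  # hypothetical example class
    score: int
    moves: tuple = ()


state = GameState(score=1)

try:
    state.score = 2  # normal attribute assignment is blocked
    blocked = False
except FrozenInstanceError:
    blocked = True

# The escape hatch: bypass the dataclass-generated __setattr__ entirely.
object.__setattr__(state, "score", 99)
print(blocked, state.score)
```

So frozen instances guard against accidental mutation rather than deliberate mutation.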

                                  1. 1

                                    It’s less that I want “true” immutability and more that I prefer the sorts of code patterns that you’d associate with immutability. In practice I use NamedTuples instead of dataclasses, but as you point out they’re not truly immutable. Usually they’re close enough though.
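As a sketch of the “close enough” pattern described above, assuming a hypothetical `Player` type: fields of a `typing.NamedTuple` can’t be rebound, updates go through `_replace`, but a mutable field’s contents can still change.

```python
from typing import List, NamedTuple


class Player(NamedTuple):  # hypothetical example type
    name: str
    inventory: List[str]


p = Player("ada", ["sword"])

# "Updating" a field builds a new tuple and leaves the original alone.
renamed = p._replace(name="grace")

try:
    p.name = "eve"  # rebinding a field is not allowed
    rebind_failed = False
except AttributeError:
    rebind_failed = True

# The list inside the tuple is still mutable, and both tuples share it.
p.inventory.append("shield")
print(p, renamed, rebind_failed)
```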

                                  1. 4

                                    This post makes me want to give Haskell another try.

                                    1. 3

                                      That’s right, I wrote an entire book about Fizz Buzz. It’s really good! Feel free to ask me questions about it.

                                      1. 3

                                        Surprised there’s no mention of typing.NamedTuple, that’s like my #1 go-to for “typed” Python.

                                        1. 2

                                          typing.NamedTuple

                                          It’s less general than typed dataclasses, though I expect it uses less memory. I personally dislike the normal NamedTuple quite a lot in Python (always feels like a poor version of a slotted class) so I am not keen on the typed version either. YMMV of course.

                                        1. 2

                                          I recently upgraded from a Pixel 1 (which was on its second battery, which was not doing well) to a Pixel 3a, and so far (~3 months) I absolutely love it.

                                          (I feel pretty gross about how much of my life / data is entrusted to Google, but so far the convenience outweighs that.)

                                          1. 2

                                            Mine is joelgrus.com. It’s a static Pelican site hosted on Netlify. I chose Pelican because I’m a Python person, but the reality is that I barely touch the code, so if I had to do it again I’d choose something with more “modern looking” themes. (I still haven’t found a Pelican theme that looks great.) Netlify is great, though.

                                            I used to have disqus comments, but one day I realized they were injecting so much crapware / ads / tracking into my site that I just disabled them and now I have no comments.

                                            1. 2

                                              I’m sure I’ll end up learning some tech stuff, but the only thing I’m currently planning on learning is jazz piano.

                                              1. 1

                                                Ooh, what’s your plan for doing that?

                                                1. 2

                                                  I signed up for a year of this course when they had a big sale a while ago:

                                                  https://my.artistworks.com/george-whitty

                                                  although 6 months into that year I’ve spent very little time on it so far. ☹️

                                              1. 3
                                                1. Advent of Code
                                                2. going to see the Star Wars movie at 8am on Saturday
                                                3. friends’ annual winter solstice party
                                                4. band practice
                                                5. neighbors are having a holiday party with tamales

                                                (yikes, that’s way too much stuff)

                                                1. 1

                                                  Weird, I’m at a tamales party right now. (Or at least the lull at the end of the party.)

                                                1. 4

                                                  Recently I upgraded pylint and started getting new violations, at which point I discovered that the bar for adding new pylint rules is “some random blog post says not to do X, so let’s make X an error!”

                                                  https://github.com/PyCQA/pylint/issues/2905

                                                  I took this as motivation to replace pylint with flake8 and black, and I haven’t looked back.