1. 2

    Totally agree with the author. I was also using Python for too many things. Nowadays it’s Go for most things, Python for simple scripts, Node for some other stuff (running a test suite against websockets is much easier in Go than most other languages, also scraping is pretty awesome in Node)

    1. 2

      Python is more ergonomic for some CRUD stuff when you can just use Django; other than that I prefer Go.

    1. 2

      happy to implement a little notification feed with Stream if people think it’s useful. pretty easy to aggregate it and plug in realtime WS based updates.

      1. 2

        Also, anyone else here at lobsters using RocksDB?

        1. 6

          TL;DR: Similar to gRPC, but drops the HTTP/2 requirement (which can cause a ton of problems when combined with AWS ELBs). This is pretty cool.

          1. 3

            I am disappointingly used to libraries like this that try to focus on just the simple parts being underpowered, but I’m with you: this is clean and surprisingly powerful, without actually sacrificing much at all. I may use this in anger as early as tomorrow.

            1. 2

              This looks incredibly cool. I’ll be looking for additional language support for sure.

            1. 2

              Not really the point of the article, but the one thing I’ve never found a good tool for is deploying my application. It’s something that I absolutely don’t want to build, but keep reinventing for every project I work on.

              1. 2

                I’d like to solve this problem, but I have very strong opinions about how it should be done.

                1. 2

                  Write down the problem/requirements in a blog post and submit it here. I love to read about unsolved problems. Maybe someone even knows a solution.

                  1. 1

                    Having written these things for a few startups (Instacart, Airbnb) I can say it’s tough to make a clean API for it, and that makes generality, reusability and consistency (all requirements for a “tool” instead of a “solution”) very difficult.

                  1. 3

                    Normally I dislike humble-braggy posts like this, but it is quite impressive that they ship iOS/Android native apps rapidly with only three engineers.

                    1. 2

                      I didn’t spot the humble part. Regardless, it’s an interesting overview. I guess it shows that being disciplined about automation can pay off? I think Heroku adds quite a lot of value - I made a similar choice when I ran a company, and we were able to develop a surprising amount of functionality with 3 engineers too.

                      I do wonder how they keep their services up around the clock with just 3 people. Even with plenty of automation, it doesn’t seem sustainable. In my case, we were lucky that we only had a small number of enterprise users who were mostly active during work hours, and uptime wasn’t critical in the off hours.

                      1. 1

                        Yeah this story would have been impressive with 3 backend engineers. But 3 guys shared across iOS, Android and backend dev is quite the accomplishment.

                        I see that they use Kinesis a lot. I tried to go through those docs, but never quite got the hang of it.

                        1. 1

                          Sadly, pgtune doesn’t seem to be maintained or at least updated to newer Postgres versions. And there’s too much going on from one PG release to another for pgtune to stay relevant for years.

                        1. 14

                          Lacking a bachelor’s degree affects your career in development in at least one significant way: limiting your salary and promotion potential. Outside “competent” tech companies, Big Dumb Corp (i.e. the rest of the Fortune 500) HR will always use the lack of a BS degree (or having only an Associate’s) as a reason to offer less salary up front, give lower raises once you’re on staff, and deny promotion. It’s a checkbox incompetents use because they can’t tell who actually contributes. Some of the best developers I’ve worked with have had no degrees and were self-taught. It’s not right, but it’s what I’ve seen wherever I’ve worked.

                          1. 6

                            Another unfortunate but real side effect is many people may be less than thrilled to “work under” you if they have degrees (i.e. self-taught engineer in charge of multiple PhDs).

                            The only exception is if you are some god authority figure like Linus Torvalds where no one dares to challenge your expertise.

                            1. 4

                              That’s a bias too. There is nothing to say that an engineer without a degree cannot do a good job managing a highly credentialed staff. As long as they have humility, know their limits, and are thinking about how to get the best out of someone, it should be possible. In lots of research-based organisations this doesn’t come up much, because the needs of the job (not the people management) require the PhD, but in the tech industry there are lots of PhDs being managed by less credentialed individuals.

                              1. 1

                                I agree. The thing is it’s common enough that you will not be able to consistently escape it.

                            2. 3

                              True, startups and most tech companies don’t care. Fortune 500, consultancies etc will be harder.

                              1. 1

                                I think that is less of a problem outside of the US (and maybe the UK?). I’m not in those countries and have not been to university, and I’m doing OK as a developer. I think you just need other ways to show your skills, such as a website/blog/GitHub/experience. Once you get your first job (it’s probably not going to be stellar), all the companies after that will mainly be looking at your experience in the workforce.

                              1. 6

                                That young woman is absolutely amazing. She tops most if not all I went to school with just by doing it in a new country. So, lots of potential for starting a business and already coding a lot of web apps? Made an account on Medium just to tell her about Barnacles. We might see advice from her or a case study on there in future if we’re lucky.

                                1. 3

                                  What’s Barnacles?

                                  1. 2

                                    She might like Lobste.rs too :)

                                    1. 3

                                      I thought about mentioning it but didn’t want to seem like URL spam or something. I’ll tell her if she shows up on Barnacles, maybe.

                                  1. 6

                                      It’s a risky move to skip uni. I’ve seen people succeed from all backgrounds though: no degree at all, a master’s in CS, physics, business, and one great programmer I know actually did English literature. You just have to start learning and do it.

                                    1. 5

                                      I’m not surprised at all, I think Dijkstra said that literature & writing best predicts success in programming.

                                    1. 0

                                      I fell asleep while reading this post. How much karma do I need for down voting?

                                      1. 1

                                        Naming things is so hard. Even within the same team people use different names for things. One of the hardest things when you’re programming is writing code with the goal of it being easy to use for the rest of your team.

                                        1. 1

                                          Dramatiq is licensed under the AGPL

                                          Now I have three options:

                                          • Make the codebase at work opensource (lol)
                                          • Violate AGPL on purpose
                                          • Use any of the other Redis-based task queues

                                          I really don’t get what the author is trying to achieve with choosing a license like this.

                                          1. 13

                                            You could also buy a license.

                                            1. 1

                                              Ah, that makes sense. I didn’t see that, shame on me.

                                            2. 4

                                              I think commercial backing of some sort or another is the only way we can sustainably develop open source software long term and dual licensing seemed like the lowest friction way to get started. I’ll have to highlight that fact a little better in the docs! :D

                                              1. 2

                                                You are right of course. Other message frameworks like Sidekiq seem to do alright: https://github.com/mperham/sidekiq

                                                The challenge here is that Celery is in pretty great shape for a free solution. On the other hand, Python’s support for high concurrency is changing rapidly, so who knows, maybe there’s room for a new player in this market.

                                                1. 2

                                                  I’ve never met anyone IRL who’s worked with Celery and didn’t run into problems, so there’s definitely room for improvement in this area.

                                                  1. 2

                                                    It works like a charm with RabbitMQ as a backend. The rest is pretty experimental and breaks, especially at high volume. (I’ve been using Celery for >5 years)

                                                    1. 4

                                                      I’ve been using Celery professionally for about 3 years and dramatiq tries to solve many of the issues I’ve encountered using it. Some stuff that immediately springs to mind:

                                                      • Celery doesn’t support task prioritization. You have to deploy multiple sets of workers in order to prioritize queues.
                                                      • Celery has poor support for delayed tasks. Delayed tasks go on the same queue that normal tasks go on and they’re simply pulled into worker memory until they can be executed. This makes it hard to autoscale workers by queue size.
                                                      • Celery acks tasks as soon as they’re pulled by a worker by default. This is easy to change, but a bad default. Dramatiq doesn’t let you change this: tasks are only ever acked when they’re done processing.
                                                      • Celery tasks are not retried on error by default.
                                                      • Celery’s not well suited for integration testing. You’re expected to unit test tasks and to turn eager evaluation on for integration tests, but even then task exceptions will be swallowed by default. Dramatiq provides an in-memory stub broker specifically for this use case.
                                                      • The source code is spread across 3 different projects (celery, billiard and kombu) and it’s impenetrable. Its usage of runtime stack frame manipulation leads to heisenbugs.
                                                      • It’s easy for some of its more advanced “canvas” features to drop tasks.

                                                      All of the above are things that are first-class in dramatiq and there are definitely other things I’m not thinking of right now. That’s not to say that celery is bad, but I think we can do better and that’s why I made dramatiq. :D
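
The ack-late behaviour mentioned above (tasks are only ever acked when they’re done processing) can be sketched with a toy in-memory model using only the standard library. This is an illustration of the semantics, not dramatiq’s actual implementation:

```python
import queue

# Toy model of "ack on completion": a task only counts as done once
# processing finishes, so a task whose worker fails mid-way is
# re-enqueued for redelivery instead of being silently lost.

def run_worker(q, results, fail_on=None):
    while True:
        try:
            task = q.get_nowait()
        except queue.Empty:
            break
        if task == fail_on:
            fail_on = None   # simulate a one-off crash mid-task
            q.put(task)      # no ack: the "broker" redelivers it later
            continue
        results.append(task * 2)
        q.task_done()        # ack only after successful processing

q = queue.Queue()
for t in (1, 2, 3):
    q.put(t)

results = []
run_worker(q, results, fail_on=2)  # task 2 fails once, then is redelivered
print(sorted(results))  # [2, 4, 6]
```

With ack-early semantics, the failed task would already have been removed from the queue and its result would simply be missing.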

                                                      1. 1

                                                        Considering your experience, I was wondering what’s your take on rq? (others who used it, are obviously welcomed to chime in too)

                                                        1. 1

                                                          I don’t have much experience with RQ since it is Redis-only and I’ve generally preferred to use RabbitMQ as a message broker. However, a few things that seem like disadvantages to me with RQ are:

                                                          • Messages are pickled so it’s strictly limited to Python and pickled messages are potentially exploitable. This also means you may sometimes send bigger messages than you intended over the network purely by accident.
                                                          • Queue prioritisation is handled like it is in Celery: you have to spawn different sets of workers.
                                                          • It forks for every job, so it’s slightly slower and forks that are killed b/c they’ve surpassed their time limits can leak DB connections if you’re not careful. I understand this may be swappable behaviour, however.
                                                          • Similar to Celery, there isn’t a good integration testing story for RQ.

                                                          Because I’ve criticised both Celery and RQ at this point, I feel it’s important that I mention a couple areas where they’re both currently better than dramatiq:

                                                          • the obvious one: it’s newer than either of those and is less likely to be familiar to users. The extension ecosystem for dramatiq is nonexistent (though I will be releasing integration packages for Django and Flask soon!)
                                                          • dramatiq doesn’t store task results and doesn’t offer a way to retrieve them. Adding that sort of functionality is trivial using middleware, but it’s not there ootb so if you absolutely need something like that and you don’t care about the things I have mentioned so far then you should look at Celery or RQ instead.
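
The point above about pickled messages being exploitable follows from the fact that unpickling can invoke an arbitrary callable via `__reduce__`. This stdlib-only sketch (not RQ’s actual wire format; the task names are made up) shows why JSON payloads are the safer default for messages crossing a network:

```python
import json
import pickle

# Unpickling calls __reduce__, so a crafted message can run any
# callable the sender chooses when the worker deserialises it.
class Evil:
    def __reduce__(self):
        # str.upper is harmless here; an attacker would use os.system.
        return (str.upper, ("pwned",))

payload = pickle.dumps(Evil())
result = pickle.loads(payload)  # the callable runs during deserialisation
print(result)  # PWNED

# A JSON message, by contrast, can only ever decode to plain data.
safe = json.loads(json.dumps({"task": "send_email", "args": [42]}))
print(safe)  # {'task': 'send_email', 'args': [42]}
```
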
                                                          1. 1

                                                            Thank you for taking the time to post this!

                                                            There are two other areas that bother me personally:

                                                            • Python 3 only. While I would love to switch to Python 3, still need to maintain a large project in Python 2.
                                                            • The AGPL license. The above project is open source too, but I want to keep it BSD licensed to stay “friendly” towards potential users. Ironically, for a commercial project I would worry less about your license of choice, as I wouldn’t mind buying the commercial license when needed.

                                                            I share @jscn’s sentiment about Celery. I was wondering if RQ, despite the above disadvantages, might be more stable. At least their codebase should be easier to grok (single repo)…

                                                            1. 1

                                                              Python 3 only. While I would love to switch to Python 3, still need to maintain a large project in Python 2.

                                                              I’m considering adding Python 2 support, but it’s a hard thing to balance what with 2.x getting EOL’d in a little less than 2 and a half years.

                                                              The AGPL license. The above project is open source too, but I want to keep it BSD licensed to stay “friendly” towards potential users. Ironically, for a commercial project I would worry less about your license of choice, as I wouldn’t mind buying the commercial license when needed.

                                                              Understandable.

                                                        2. 1

                                                          Sure, that’s true. Did you ever look at https://github.com/RichardKnop/machinery that project is still really early. Probably much easier to compete with.

                                                  2. 1

                                                    beanstalkd, NSQ, resque, celery, huey, … — pretty much everything in this space is non-GPL. So “use any other queue thing” will definitely be a very popular option :)

                                                    1. 5

                                                      So “use any other queue thing” will definitely be a very popular option :)

                                                      That’s perfectly fine! I just want those people that get value out of my work to contribute back in some way. If someone makes a cost-benefit analysis and decides that they’d rather use celery over dramatiq because they prefer the cheaper option (although it’s worth mentioning that I give out free comm. licenses for one year per company) then that’s their prerogative. I’ll still be around a year later when they realise their mistake ;).

                                                  3. 2

                                                    Trying to achieve you not using this at work? That’s usually what I’m going for when I choose AGPL.

                                                  1. 5

                                                    Pair programming. Where you take the productivity of 2 devs and divide it by 4.

                                                    1. 3

                                                      While I’ve always thought the same way, I’ve never seen that said by people who are actually effective at pairing. In other words, most of the experiences I’ve heard about may start out less efficient, but the gains in shared understanding, shared problem solving, etc. apparently pay dividends later on.

                                                      1. 3

                                                        Here are two counterpoints I found while looking for one whose bookmark I lost or can’t remember:

                                                        https://www.quora.com/Why-do-many-programmers-oppose-pair-programming

                                                        http://www.bennorthrop.com/Essays/2013/pair-programming-my-personal-nightmare.php

                                                        What are your thoughts on those gripes, given the perspective you shared? For me, I’m an introvert who does much better alone but enjoys collaborating with team members when exploring or debugging, as in the Quora answer. I’d be mentally drained by prolonged pair programming. I still might try small sessions of it sometime when learning something new. I try to stay open-minded about it, as I know it has benefited some people. But in production coding, is it really necessary and helpful versus solo folks in the zone occasionally getting together in code reviews? I lean toward doubting that. Maybe for extroverts, though, as they get in the zone when interacting with people.

                                                        1. 4

                                                          My perspective is pretty similar to yours. I work best alone, but collaborative debugging and exploration work really well for short periods of time.

                                                          I’ve tried pairing all day for a couple of weeks, but that experience was remote. I’m not sure how that affects the effectiveness of pairing as perceived by the “pros.” I did find all-day pairing to be more draining, and while I appreciated the shared context, I didn’t find the person-to-person communication better than just writing clear documentation (in the form of actual documentation, and detailed but concise commit messages explaining the reasoning for changes), which lasts forever.

                                                          1. 1

                                                            Makes sense. It’s about what I expected.

                                                          2. 3

                                                            Yesterday I did a driver-gunner style pairing and I’m curious if that would work better for you than “stereotypical” pair programming styles. Caveat is both my pair and I are fairly extroverted people, and I don’t know if this would be easier for an introvert than regular pairing.

                                                            Context: Three of us are dealing with a complicated production fire that’s harried us for three weeks running. We think we have a patch that would fix the last part of the problem, but it interacts with workflow code creaking under five years of hacks, tech debt, and “move fast”. It’s pretty much rotting turtles all the way down. We paired to write tests to check that the patch doesn’t break any of the legacy behavior.

                                                            My partner was driver, I was gunner. We spent one hour writing on a whiteboard all of the possible actions the user could take and all of the consequences of each action. When we were uncertain about the expected consequences she was responsible for reading and testing the code and I was responsible for asking other people in the company. Once we had a clear picture, she started writing the setup code to generate exhaustive tests while I continued whiteboarding to figure out the invariants we would eventually test for. Once I finished and she confirmed they’d be useful, I switched to reading rspec docs and answering her questions so she could continue to focus on code. Then we had a meeting and that killed work for the day.

                                                            I found this really useful; it helped us make a lot more progress than either of us would have individually. The notable differences from standard pairing were:

                                                            • I wasn’t on a computer until the very end. I never wrote any of the production code; it was all one person.
                                                            • We never ‘took over’ each other’s computers, so we didn’t stumble over different configs.
                                                            • While I sometimes watched her code, that was never my primary responsibility, so when either of us got edgy I could just back off.
                                                            • She only had to explain what she was doing at a high level, which is less tiring. She only needed to explain low level stuff if she got stuck or needed me to research something.
                                                            • Since we had a dedicated coder and a dedicated researcher, there was less context switching, and both of us could focus more.

                                                            I don’t know why I call it “driver-gunner”. It’s just division of labour. “Driver-gunner” was just the first phrase that came to mind.

                                                          3. 2

                                                            This is not the case in my experience.

                                                            1. 2

                                                              Very interesting! I assume your experience mimics the OP? Or is there something unique?

                                                              1. 2

                                                                Mostly it’s just extremely draining and not very productive. I wrote about it at more length in the last pairing thread, here.

                                                        1. 0

                                                          Another great aspect of concurrency in Go is the race detector. This makes it easy [emphasis mine] to figure out if there are any race conditions within your asynchronous code.

                                                          Am I reading correctly? How exactly does this magical decision procedure establish the presence or absence of data races in your concurrent code?

                                                          1. 4

                                                            Yeah–it is easy, but isn’t comprehensive. If racy accesses don’t actually occur in whatever you run under the detector nothing is reported, and racy accesses that have some totally unrelated sync operation happen between them (like there happened to be some lock taken but it doesn’t guard the racily-accessed data) aren’t found.

                                                            A lot of people seem to say that an emphasis on CSP-style concurrency combined with the race detector catching some stuff gets them far. For what it’s worth, one dissenting opinion on that comes from Dropbox, who say clever engineers looking for races is the best they’ve got: https://about.sourcegraph.com/go/go-reliability-and-durability-at-dropbox-tammy-butow/

                                                            1. 1

                                                              Yeah–it is easy, but isn’t comprehensive.

                                                              Then at best it is easy to find (some) data races, not to establish their presence or absence. The former might be good enough for your purposes, but the latter is how I interpret “figure out if there are any race conditions”. But, then, I’m not a native English speaker.

                                                              If racy accesses don’t actually occur in whatever you run under the detector nothing is reported,

                                                              Which is no different from languages in which concurrency is supposedly harder than in Go.

                                                              and racy accesses that have some totally unrelated sync operation happen between them (like there happened to be some lock taken but it doesn’t guard the racily-accessed data) aren’t found.

                                                              In a heavily concurrent program, this could be pretty much any racy access.

                                                              A lot of people seem to say that an emphasis on CSP-style concurrency combined with the race detector catching some stuff gets them far.

                                                              Yeah, I totally understand the feeling of empowerment when you learn something that seemed out of reach until not too long ago (in this case, writing concurrent programs), but IMO it’s not really justified unless you can reliably do it right. “Not too wrong” isn’t good enough.

                                                              1. 4

                                                                I think the OP is just being imprecise with their wording. They were technically wrong as soon as they started talking about race conditions instead of data races in the context of the race detector. (And many, many, many people get this wrong. It happens.)

                                                                1. 1

                                                                  I think I’d get it wrong too?

                                                                  I guess my understanding is that a race condition is a general term (something where there are two actors and the sequencing of their operations matters), and that a data race (where the actors both happen to be modifying some data) is a specific instance of a race condition.

                                                                  Is that about right, or am I missing some nuance in the terms?

                                                                  1. 6

                                                                    They are actually completely orthogonal concepts. :-) You can have a race condition without a data race.

                                                                    John Regehr explains it far better than I could, with examples: https://blog.regehr.org/archives/490

                                                                    Also, you’re in good company. The blog post introducing the race detector even uses the term “race condition”: https://blog.golang.org/race-detector The official documentation of the tool, however, does not: https://golang.org/doc/articles/race_detector.html

                                                                    (And btw I completely agree with your “perfect is the enemy of the good” remarks.)

                                                                    (Please take a look at John’s blog post. It even shows an example of a data race that isn’t a race condition!)
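
The distinction can be made concrete in a few lines. In this hypothetical sketch, every access to the shared balance is lock-protected, so there is no data race, yet the check-then-act sequence is still a race condition: another thread can act between the check and the withdrawal. Events force the losing interleaving so the bug shows up deterministically:

```python
import threading

balance = 100
lock = threading.Lock()

def read_balance():
    with lock:              # every access is locked: no data race
        return balance

def withdraw(amount):
    global balance
    with lock:              # also locked: still no data race
        balance -= amount

checked = threading.Event()
acted = threading.Event()

def careful_customer():
    # Classic check-then-act: each step is safe, the combination isn't.
    if read_balance() >= 100:
        checked.set()
        acted.wait()        # force the losing interleaving for the demo
        withdraw(100)

def rival_customer():
    checked.wait()          # act between the other thread's check and act
    if read_balance() >= 100:
        withdraw(100)
    acted.set()

t1 = threading.Thread(target=careful_customer)
t2 = threading.Thread(target=rival_customer)
t1.start(); t2.start()
t1.join(); t2.join()

# Both withdrawals went through even though only one should have:
print(balance)  # -100
```

Fixing it requires holding the lock across the whole check-then-act sequence, i.e. making the compound operation atomic, not just each access.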

                                                                    1. 3

                                                                      Thanks for that.

                                                                      Edit: I briefly posted saying I still thought data races were a subset of race conditions - edited after digesting blog.

                                                                      The blog post makes the distinction that (their definition of) race conditions is about correctness, and not all data races violate correctness, so not all data races are race conditions.

                                                                      That’s a subtle distinction and I’m not entirely sure I agree, but I understand better - so thanks :-)

                                                                      1. 6

                                                                        Dmitry Vyukov, who works on the Go race detector, is a big advocate for presuming data races are incorrect rather than trying to sort out if they’re safe, because things that look benign in the code can bite you. Sometimes the problems only arise with the help of compiler optimizations that assume there are no racy accesses and therefore, say, compute on copy of some value in a register rather than the original on the heap for some duration. He writes some about this at https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong

                                                                        The io.Discard example partway down https://blog.golang.org/race-detector is good too: some memory that everyone thought of as a write-only ‘black hole’ turned out not to be in one specific situation, and a race that some experienced coders thought was safe caused a bug.

                                                                        Nothing there is inconsistent with what Regehr says, I don’t think: Vyukov isn’t saying a benign data race is an invalid concept, just saying it’s dangerous to try to write code with them. I mention it because examples like these made me much less inclined to try to figure out when data races were or weren’t benign.

                                                                        1. 1

                                                                          Hmm, interesting :) I’m going to read up and correct this.

                                                                        2. 1

I don’t find that example of a data race that isn’t a race condition terribly convincing. The racy program doesn’t have a well-defined meaning in terms of the language it’s written in. Its behavior depends on the whims of the language implementor, i.e., they have ways to break your program without neglecting their duty to conform to the language specification.

                                                                          1. 1

                                                                            I’m not convinced by your rebuttal. That doesn’t imply it is a race condition. It just implies that it is UB. UB could cause a race condition, but not necessarily so.

                                                                            1. 1

The language specification is a contract between language implementors and users. To prove a language implementation incorrect, it suffices to show one conforming program whose behavior under the implementation doesn’t agree with the language specification. Conversely, to prove a program incorrect, it suffices to exhibit one conforming language implementation in which the program deviates from its intended behavior.

                                                                              In other words: Who cares that, by fortunate accident, there are no race conditions under one specific language implementation?

                                                                              1. 1

                                                                                In other words: Who cares that, by fortunate accident, there are no race conditions under one specific language implementation?

                                                                                Everyone that uses it?

                                                                                This is my last reply in this thread. I don’t find your objection worth discussing. Moreover, you didn’t actually refute my previous comment. You just lectured me about what UB means.

                                                                                1. 2

                                                                                  The essence of my objection is that “there are no race conditions under this specific implementation” isn’t a very interesting property when implementation details are outside of your control. But maybe that’s not true. Maybe people genuinely don’t mind when changes to a language implementation suddenly break their programs. What do I know.

                                                                    2. 3

                                                                      I think it sounded like I was disagreeing when I was trying to agree: the Go race detector is basically LLVM’s ThreadSanitizer (Dmitry Vyukov did work on both), and as such can help but can’t prove the absence of races. Agree it’s nothing like Rust’s static checking or the isolated heaps some languages use.

                                                                      1. 2

                                                                        “Not too wrong” isn’t good enough.

That’s a pretty good opposite to “the perfect is the enemy of the good”.

                                                                        In most software domains, the impact of a race is related to how often you hit it. e.g. a race which only shows up under your typical load once a year is a low impact problem - you have other, more relevant bugs.

                                                                        (I agree that in some scenarios (medical, aviation, etc) formal proofs of correctness and other high assurance techniques may be required to establish that even low-likelihood bugs (of all kinds, not just races) can’t occur.)

                                                                        The (imperfect) race detector allows you most easily find the most frequent races (and all of the ones you can provoke in your auto tests).

                                                                        No, it’s not perfect, but it is a very useful tool.

                                                                        1. 1

                                                                          This comment made me think; I think another thing that lets people get by is that a lot of practical concurrency is the simpler stuff: a worker pool splitting up a large chunk of work, or a thread-per-request application server where sharing in-memory data isn’t part of the design and most of the interesting sync bits occur off in database code somewhere.

                                                                          Not that you’re guaranteed to get those right, or that advanced tools can’t help. (Globals in a simple net/http server can be dangerous!) Just, most of us aren’t writing databases or boundary-pushing high-perf apps where you have to be extra clever to minimize locking; that’s why what you might expect to become a complete, constant tire fire often works in practice.

                                                                  1. 1

                                                                    The error handling section was a little confusing to me: Are there any example that are meant by proper error handling?

                                                                    I think when you have multiple return values, you get many of the advantages of exceptions, but I don’t really know what the author of this blog is arguing for as compared to python besides exceptions.

                                                                    1. 1

                                                                      Well you’d at least expect this to be baked into the language: https://github.com/pkg/errors

                                                                      1. 3

                                                                        But by not being baked in, you have alternatives: https://pocketgophers.com/error-handling-packages/ , all of which conform to the error interface

                                                                        1. 1

                                                                          Cool site!

                                                                      2. 1

                                                                        I think he means the kind of stuff he now gets from third-party tools: showing where errors occurred, and making it hard to accidentally discard errors.

                                                                        (Also: big plug for https://github.com/gordonklaus/ineffassign. It finds assignments in your code that make no difference to how it runs, a generalization of the ‘no unused variables’ rule that turns up some bugs and, when it doesn’t, often points to somewhere you could make your code tighter. Better yet, you can use something like https://github.com/alecthomas/gometalinter with a subset of checks enabled.)

                                                                      1. 2

I’d be curious to hear whether things like Cython, or Python’s great C API, were tried where speed mattered. It seems to me that the natural evolution is to replace the bottlenecks first, and if that fails, replace everything.

                                                                        But, I might be old school in my thinking…

                                                                        1. 4

Disclaimer: CTO of Stream here. We experimented with writing Cython code to remove bottlenecks; it worked for some of them (e.g. making UUID generation and parsing faster), and I think that’s indeed good advice to try before moving to a different language. We still decided to drop Python and use Go for some parts of our infrastructure, mainly for these three reasons:

1- Writing Cython is challenging; in our case several parts of our code bases needed to be rewritten
2- In some cases using our fast C code required patching a lot of code (e.g. the Python Cassandra driver)
3- Python+Cython was still much slower compared to Go

                                                                          1. 2

                                                                            1- Writing Cython is challenging, in our case several parts of our code bases needed to be rewritten

More challenging than spinning up an entire engineering team on Go, and rewriting everything?

                                                                            2- In some cases using our fast C code required patching lot of code (eg. Python Cassandra Driver)

                                                                            Yeah, this seems like it’s probably a hassle to manage. On the other hand, maybe once it’s done, it’s done? Not sure how often the libraries you rely on are updated.

                                                                            3- Python+Cython was still much slower compared to Go

                                                                            Fair!

Thanks for the clarifications and additional insight on this! It’s obviously a lot better to get the story from the horse’s mouth than to make half-baked assumptions about what you may or may not have done.

                                                                            1. 3

                                                                              More challenging that spinning up an entire engineering team on Go, and rewriting everything?

                                                                              I can’t speak for the Stream team, but we cross-train to Go from python pretty quickly. People are productive on an existing codebase within a week or two. It’s not a big language. There are some idioms and some tooling/conventions (packaging etc) but it’s pretty quick.

                                                                              The biggest part (as with most languages) is being aware of stuff in the std library. But 80/20 helps you there, reading an existing codebase exposes you to the 20% of the stdlib which is useful 80% of the time.

                                                                              Edit: we use both python and Go - but it’s useful (and fun) for people to be able to learn stuff and move between projects.

                                                                              1. 2

                                                                                I agree with your first point. Rewriting all the hot code in Cython was going to take much less time than rewrite to Go (btw we still use Python for many things). But what were we going to have as final result?

1- A codebase much harder to maintain and change, because it is written in a dialect of Python most are not familiar with
                                                                                2- A few more extra forks to maintain
                                                                                3- Something faster but not as fast as we wanted

                                                                                EDIT: markdown fix

                                                                            2. 3

                                                                              Literally the second thing in the post.

                                                                              1. 2

                                                                                Pretty sure it’s not at all addressed. The performance of serialization and ranking in (I am assuming) pure Python is discussed as bottlenecks. I’m suggesting that those things might have been better optimized independently, with Cython, or the C API, than to draw the conclusion immediately to leave Python and adopt Go.

                                                                                I’m not at all suggesting that adopting Go was a bad idea—I’m merely asking if consideration was taken to address the actual bottlenecks, independently, first.

                                                                                Did they try to do the 10ms serialization stuff in C? They spent a bunch of time “optimizing Cassandra, redis, etc” — I assume the author means they optimized their usage patterns, and indexes, and such based on query patterns, not “rewrite bottlenecked portions of the database drivers in C, or Rust.”

                                                                                If I was supposed to take something else from that, I’m sorry that I misinterpreted the ambiguity, and it offended you so much.

                                                                                1. 10

                                                                                  Did they try to do the 10ms serialization stuff in C?

                                                                                  Python [de]serialization IS in C. There are modules for JSON and whatever other common format you want written in C, but the C code still has to create Python objects for Python to use. Pretty much no one uses pure Python JSON libraries, because they’re so ludicrously slow. Parsing any substantial amount of data would be much much much slower than 10ms. I’m not even sure where you’d find one, you’d have to go out of your way to do so, considering even the standard library json module is written in C.

                                                                                  I assume the author means they optimized their usage patterns, and indexes, and such based on query patterns, not “rewrite bottlenecked portions of the database drivers in C, or Rust.”

                                                                                  All of those drivers are already written in C, the Cassandra one is written in Cython as you suggest.

                                                                                  When I read this section I read “we have already optimized everything that we can except what is latent to the language itself.” Like Python objects being… Python objects. Are they supposed to deserialize data into something else in Python? If you can’t deserialize to Python objects then why would you use Python?

                                                                                  it offended you so much.

                                                                                  I wasn’t offended. My comment means exactly what it says, your concerns are quite literally addressed in the second section of their post, with the extremely appropriate heading “Language Performance Matters.” Though now it sounds like you don’t know that all of these things like serialization and drivers have already been optimized in C or Cython, particularly because you said “performance of serialization and ranking in (I am assuming) pure Python”.

                                                                                  It’s true, they didn’t address those things, I expect because they assumed the reader would already know that serialization formats aren’t handled in pure Python. If you didn’t know that, well now you do. Otherwise, I don’t understand your comments.

                                                                                  1. 3

                                                                                    Python [de]serialization IS in C. There are modules for JSON and whatever other common format you want written in C, but the C code still has to create Python objects for Python to use. Pretty much no one uses pure Python JSON libraries, because they’re so ludicrously slow. Parsing any substantial amount of data would be much much much slower than 10ms. I’m not even sure where you’d find one, you’d have to go out of your way to do so, considering even the standard library json module is written in C.

                                                                                    I don’t see JSON mentioned at all in the post as a bottleneck… unless it’s related to the 10ms Cassandra deserialization times…. And, granted, it’s totally possible given the space they are in. They likely get their feed items as JSON. It’s unlikely they actually need to keep it as JSON. Msgpack, or CBOR, or something would (maybe/likely) be faster to deal with than JSON. But, I digress. No idea if they tried.

                                                                                    All of those drivers are already written in C, the Cassandra one is written in Cython as you suggest.

                                                                                    Ah! Ok. I did not know this, thanks for the new context. It seems, however, that these optimizations could be turned off. Not likely in this case, I’m sure.

                                                                                    Like Python objects being… Python objects. Are they supposed to deserialize data into something else in Python? If you can’t deserialize to Python objects then why would you use Python?

                                                                                    Not a fan of your condescending tone. I, honestly, have no idea what’s going on in their application. I have no idea if they are making use of classes, or storing everything in tuples, or lists, or namedtuples, or some dicts, or some other concoction that exists in Python that I don’t know about (Pandas Data Frames?).

                                                                                    The bottleneck in the Cassandra deserialization could be due to the use of the Object Mapper. Did they try without it? Were they invoking some other objects whose init method happened to… I don’t know, accidentally hit disk to load a timezone file that wasn’t always cached?

                                                                                    You’re making a lot of assumptions, and I’m trying not to.

                                                                                    It’s true, they didn’t address those things, I expect because they assumed the reader would already know that serialization formats aren’t handled in pure Python. If you didn’t know that, well now you do.

                                                                                    Never assume the reader is as smart as you.

                                                                                    Let’s discuss more about the ranking / aggregation.

                                                                                    They build a twitter / facebook style feed as a service. The ranking seems to be an ordering thing (not shocking) and the aggregation a grouping thing (as opposed to syndication).

                                                                                    They spent 3 days + 2 weeks to optimize the ranking Python code. That’s not really a lot of time. They even dropped down into the AST module, which means they compiled into Python bytecode. The user can basically create whatever function they want based on a set of predefined primitives. Impressive ideas there, and probably the best you can get given the pure Python circumstances. What if it was written as a C (or Rust) extension? Could they have gotten 20x speed up in a few days? Did they try it? No idea. How long do they cache the ranking bytecode for? What’s the cost of their compilation? Does the Go version just walk a parse tree? Or does it do something even more fancy than the AST -> Bytecode compiler? No idea.

                                                                                    The aggregation related stuff is likely similar–based on some specification that the user provides (a Jinja2 style template), they do some analysis of the template and figure out how to do the aggregation based on the fields. Woah! Also an impressive thing. They even support conditionals. So, they might be using Jinja2 for the actual parsing and rebuilding. From my Python days, I know Jinja2’s parser doesn’t have to be that optimized – you “compile” a template once. The generation part is the part that needs to be fast, and I’m guessing they don’t suffer from slow text generation, but rather the actual filtering of objects, and finding and traversing the things that it all relies on. Is that something that they do every time? Do they cache the result of their compiler (would like to believe so!)? What are the real costs of it all? What differences exist in the Go code?

                                                                                    Don’t have much in the way of ideas here, but presumably the same thing would hold. They could spend some time optimizing just that part by dropping down into C, or Rust. That’s the likely bottleneck, not iterating over a bunch of objects.

                                                                                    So maybe they’ve tried all these things, and it ultimately wasn’t worth the maintenance costs, and the frustration, and things. It doesn’t appear to be the case based on the mention of the AST module that the ranking stuff was ever a C, or Rust thing. It’s likely that they really really tried hard and just kept coming up short. And, that’s fine. It wasn’t my call to make, or my happiness to deal with.

                                                                                    So, yes. Performance matters. And language performance matters… sometimes. And, developer productivity and working as a team together matter, so doing “clever” things is probably not desirable. There’s nothing surprising about their decisions, or reasoning… I’m just infinitely more interested in the stuff they didn’t say. And, I’m really curious about the trend we’re seeing where people suddenly care about efficiency and optimizing resources. The era of “scripting languages” as work horses (that started in the late 90s), is apparently dying, but I don’t know who wrote the blog post “Scripting languages considered harmful (and slow).” (BTW, I welcome this trend, but posit that scripting languages are probably good enough for most things, too)

                                                                                    1. 4

                                                                                      I don’t see JSON mentioned at all in the post as a bottleneck

                                                                                      It’s a typical serialization format. I also said “and whatever other common format” because it doesn’t really make a difference. All those formats you listed perform approximately as well as JSON.

                                                                                      Ah! Ok. I did not know this, thanks for the new context.

                                                                                      Glad to have taught you something!

                                                                                      You’re making a lot of assumptions, and I’m trying not to.

                                                                                      I think I just have a better intuition for performance than you. Which is fine, I’m a performance engineer, I’m supposed to.

                                                                                      Never assume the reader is as smart as you.

                                                                                      Let’s discuss more about the ranking / aggregation.

                                                                                      Indeed. But is that the right thing to look at? I think it’s safe to assume that they cache all their compiled rankings and aggregations, so lets ignore that bit.

                                                                                      How fast do you think ranking is vs deserialization / serialization?

                                                                                      I have a benchmark I pulled out of my ass. It takes a 5.5 MB JSON file (phat.json) that contains 100,000 objects, and performs these operations:

    import ujson as json  # the benchmark used ujson (noted below); stdlib json also works

    with open('phat.json') as f:            # read
        raw_data = f.read()
    data = json.loads(raw_data)             # parse
    data.sort(key=lambda o: o['d'])         # sort

    total = 0                               # sum
    for obj in data:
        total += obj['d']

    raw_data = json.dumps(data)             # generate
    with open('phat-out.json', 'w') as f:   # write
        f.write(raw_data)

                                                                                      Omitted are timers between each of the 6 operations, read, parse, sort, sum, generate, write. The items are ordered randomly with respect to d (the sort key) so the sort should run in a full n log n time. How long do you think each one will take, as a fraction of total run time?

                                                                                      Actually guess, because I think you’ll be surprised.

                                                                                      read: 1%

                                                                                      parse: 21%

                                                                                      sort: 26%

                                                                                      sum: 15%

                                                                                      generate: 34%

                                                                                      write: 3%

                                                                                      A full 55% of the execution time is fucking with JSON. And this is with ujson, a Python JSON library that sacrifices features to get the most raw speed possible. So most of that time is straight up allocating and serializing Python objects. Generating JSON from Python objects is actually slower than sorting them. WTF right?

                                                                                      So maybe they’ve tried all these things, and it ultimately wasn’t worth the maintenance costs, and the frustration, and things. It doesn’t appear to be the case based on the mention of the AST module that the ranking stuff was ever a C, or Rust thing.

                                                                                      If they made the ranking stuff in C or Rust, they’d still be working on Python objects, and would still be dominated by that deserialization time. At that point they’d be deserializing database results in raw C/Rust to native structures, processing native structures, and serializing native structures to return as results. Where’s the Python? At that point it’s less a question of Go vs Python, as Go vs C/Rust with legacy Python glue.

                                                                                      So, yes. Performance matters. And language performance matters… sometimes.

                                                                                      A lot of the time, especially when you’re working with data. Scripting languages just aren’t meant for that. In 2017 processing large amounts of data isn’t about CPU speed, it’s about RAM speed. If your data fits in 1/4 the RAM, you will process it 4 times faster, hard stop. Because CPU speed is going up way faster than RAM speed. RAM is the new disk.

                                                                                      And, I’m really curious about the trend we’re seeing where people suddenly care about efficiency and optimizing resources.

                                                                                      More people have more users. You wouldn’t expect a newspaper to give a crap about performance, but the New York Times gets over 700 million page views a month. That’s a million views an hour, around 270 per second. The conventional wisdom for a Python / Ruby app server is 1 core per 5-10 requests / second, and for modern apps each page view represents more than one request.

                                                                                      But lets assume they’re totally willing to eat that hosting bill. Why would they care then? Well, wasting 10ms serializing a page is a pretty good reason to care. Study after study has shown UI responsiveness and latency directly correlates to user interaction. And user interaction directly correlates to money.

                                                                                      Open up the network dev tools and load nytimes.com. They’re also loading all sorts of 3rd party analytics tools and telemetry. Those 3rd parties definitely care about performance if nytimes is just one of their customers.

                                                                                      The era of “scripting languages” as work horses (that started in the late 90s), is apparently dying, but I don’t know who wrote the blog post “Scripting languages considered harmful (and slow).”

                                                                                      A couple years back everyone ever was blogging about why they switched to Go, how it lowered their operational costs, reduced latency, improved stability by not saturating resources, and so on and so forth.

                                                                                      Scripting languages were the workhorse of the web when you needed a PC to browse. Now, 30% of living humans carry a wireless browser in their pocket.

                                                                                      (BTW, I welcome this trend, but posit that scripting languages are probably good enough for most things, too)

                                                                                      Totally, they are good enough for small things, and most things are small things. But if you’re even touching a distributed database like Cassandra, it’s pretty silly to use a scripting language as your main workhorse.

                                                                                      1. 3

                                                                                        Actually guess, because I think you’ll be surprised. read: 1%, parse: 21%, sort: 26%, sum: 15%, generate: 34%, write: 3%

                                                                                        For shits and giggles, I did a little load/dump benchmark in Python and in Go. It’s here. You may be surprised that the builtin json on Python 2.7.10 is 25% faster than Go 1.9! This is also on a much larger file than you presented – I’ve shown how I generated it. It’s not incredibly complicated, mind you.

                                                                                        In terms of setup, this is a 2015 Macbook Air, with, of course, an SSD. I’ve tried (this took 10 minutes total of my time, so YMMV) to control for disk cache by running it a couple of times beforehand, and I’ve done this about 10 times now; give or take a few hundredths here and there, Python always comes out ahead of Go on deserialization. Go consistently comes out ahead on serialization, but most of your argument seems to stem from deserialization, as it’s loading and creating objects from Cassandra, etc.

                                                                                        It’s certainly possible that there’s speed to be gained in the Go version, but the performance was worse when I declared i as map[string]map[string]string so… ¯\_(ツ)_/¯

                                                                                        Also, just noticed that I’m timing the file open in both cases in Python, too. So, a bit sloppy, but whatever.
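
                                                                                        For reference, the Go side of a decode benchmark like this boils down to something like the sketch below (stdlib only; the inline payload is a tiny stand-in for the large generated file, and the helper name is mine):

                                                                                        ```go
                                                                                        package main

                                                                                        import (
                                                                                        	"encoding/json"
                                                                                        	"fmt"
                                                                                        	"time"
                                                                                        )

                                                                                        // decodeGeneric unmarshals into the schemaless map type used in
                                                                                        // the benchmark and reports how long the decode took.
                                                                                        func decodeGeneric(doc []byte) (map[string]interface{}, time.Duration) {
                                                                                        	start := time.Now()
                                                                                        	var i map[string]interface{}
                                                                                        	if err := json.Unmarshal(doc, &i); err != nil {
                                                                                        		panic(err)
                                                                                        	}
                                                                                        	return i, time.Since(start)
                                                                                        }

                                                                                        func main() {
                                                                                        	// Tiny inline stand-in; the real benchmark read a large
                                                                                        	// file, which (as noted) also times the file open.
                                                                                        	doc := []byte(`{"a":{"x":"1","y":"2"},"b":{"x":"3","y":"4"}}`)
                                                                                        	i, elapsed := decodeGeneric(doc)
                                                                                        	fmt.Println(len(i), elapsed >= 0)
                                                                                        }
                                                                                        ```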

                                                                                        1. 2

                                                                                          Try deserializing into structs in Go. There is no need to create all those hash tables. Using structs is how the vast majority of Go programs are written. It’s not an apples to apples comparison, but that’s exactly the point.
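
                                                                                          What that looks like in practice, as a hedged sketch with stdlib encoding/json (the Activity shape here is made up purely for illustration):

                                                                                          ```go
                                                                                          package main

                                                                                          import (
                                                                                          	"encoding/json"
                                                                                          	"fmt"
                                                                                          )

                                                                                          // Activity is a hypothetical fixed-schema record.
                                                                                          type Activity struct {
                                                                                          	Actor  string `json:"actor"`
                                                                                          	Verb   string `json:"verb"`
                                                                                          	Object string `json:"object"`
                                                                                          }

                                                                                          // decodeStruct targets a typed struct: one allocation,
                                                                                          // typed fields, no per-key hash-table entries.
                                                                                          func decodeStruct(data []byte) Activity {
                                                                                          	var a Activity
                                                                                          	if err := json.Unmarshal(data, &a); err != nil {
                                                                                          		panic(err)
                                                                                          	}
                                                                                          	return a
                                                                                          }

                                                                                          // decodeMap targets a map: every key and value is
                                                                                          // allocated and boxed individually.
                                                                                          func decodeMap(data []byte) map[string]interface{} {
                                                                                          	var m map[string]interface{}
                                                                                          	if err := json.Unmarshal(data, &m); err != nil {
                                                                                          		panic(err)
                                                                                          	}
                                                                                          	return m
                                                                                          }

                                                                                          func main() {
                                                                                          	data := []byte(`{"actor":"user:1","verb":"like","object":"post:42"}`)
                                                                                          	fmt.Println(decodeStruct(data).Verb, decodeMap(data)["verb"])
                                                                                          }
                                                                                          ```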

                                                                                          1. 3

                                                                                            Try deserializing into structs in Go.

                                                                                            Sure. I do this all the time. It works quite well if you know ahead of time the structure of the data you’re deserializing. This seems to be only half true for their use case. They have activities that have a fixed set of fields, but then allow an arbitrary set of custom fields as well. A natural constraint for a company that provides, essentially, a data store for its customers with custom query capabilities…

                                                                                            I don’t know why I’m spending my time on this – I guess it’s fun to prove someone who self-proclaims to be “intuitive” at performance engineering wrong with simple benchmarks, when the first rule of performance engineering is “don’t trust your gut, benchmark!” – but you’re off base again. In fact, in my new example I’ve shown that JSON-tagged structs are slower targets for deserialization than Python dicts, and Go maps.

                                                                                            You can claim it’s not real world, of course – 500 fields, 2000 entries in a list {"foos": [{...}, {...}]}. Go maps are more than 2x faster to deserialize than structs. Serializing structs, however, is 2x faster than serializing a map!

                                                                                            Python still beats Go though, even with “all those hash tables.”

                                                                                            1. 4

                                                                                              I considered adding that to the article. While Go is generally fast, at least 2 of the builtin libraries are sluggish: JSON parsing and regex, so far. I didn’t try other JSON libraries just yet (we use protocol buffers for most things). I don’t think JSON needs to be slow; it’s just that the builtin library isn’t great.

                                                                                              1. 2

                                                                                                When I need fast JSON parsing in Go, I’ve turned to easyjson to do code generation for specific types.

                                                                                                1. 1

                                                                                                  I’m surprised to hear that regexp in Go is slow! I thought it was based on RE2 – though maybe that buys “correct, and won’t blow up on malicious input” rather than “insanely fast.”

                                                                                                  1. 4

                                                                                                    The piece of the engine that it’s missing is a DFA. In my experience maintaining Rust’s regex library (also based on RE2 and has its DFA), the difference between the DFA and the Pike VM (a simulation of the NFA using a virtual machine) is about an order of magnitude. Progress on that seems to be tracked here.

                                                                                                    Note that Go’s regexp engine has various other engines from RE2 (like the bitstate backtracker and the one-pass NFA matcher), but they only work in specific circumstances.

                                                                                                    1. 1

                                                                                                      Go uses the same syntax as RE2 but doesn’t have a full port of the engine. It uses the same basic strategy that prevents exponential backtracking, which is inherently slower on the happy path without extensive optimization.
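
                                                                                                      The practical upshot is easy to demonstrate: a pattern that sends a backtracking engine exponential stays linear in Go. A small sketch (pattern and input size chosen purely for illustration):

                                                                                                      ```go
                                                                                                      package main

                                                                                                      import (
                                                                                                      	"fmt"
                                                                                                      	"regexp"
                                                                                                      	"strings"
                                                                                                      )

                                                                                                      // ^(a+)+$ is a classic pathological pattern for backtracking
                                                                                                      // engines: on an almost-matching input they go exponential.
                                                                                                      // Go's RE2-style engine stays linear, just with a higher
                                                                                                      // constant factor than a true DFA would have.
                                                                                                      var nestedRe = regexp.MustCompile(`^(a+)+$`)

                                                                                                      // pathologicalInput builds n 'a's followed by a 'b': the
                                                                                                      // almost-but-not-quite match that triggers the blowup.
                                                                                                      func pathologicalInput(n int) string {
                                                                                                      	return strings.Repeat("a", n) + "b"
                                                                                                      }

                                                                                                      func matchesNested(s string) bool {
                                                                                                      	return nestedRe.MatchString(s)
                                                                                                      }

                                                                                                      func main() {
                                                                                                      	// Completes quickly in Go; a naive backtracker would
                                                                                                      	// effectively never finish on input this long.
                                                                                                      	fmt.Println(matchesNested(pathologicalInput(100000))) // prints false
                                                                                                      }
                                                                                                      ```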

                                                                                                  2. 3

                                                                                                    In Python, you’re doing about as well as you can. In Go, though, it’s relatively easy to use a library like easyjson to speed up JSON serialization/deserialization dramatically. (I have no dog in this fight, but I think this piece of information is incredibly valuable for evaluating this particular trade-off.)

                                                                                                    1. 1

                                                                                                      They have activities that have a fixed set of fields, but then allow an arbitrary set of custom fields as well.

                                                                                                      True, good point.

                                                                                                      I guess it’s fun to prove someone who self proclaims as “intuitive” in performance engineering wrong with simple benchmarks

                                                                                                      Before being an asshole, try being right. It’s not an essential prerequisite, but it helps.

                                                                                                      In fact, in my new example I’ve shown that JSON tagged structs are slower targets for deserialization than Python dicts, and Go maps.

                                                                                                      You have already concluded with such certainty that Python is faster than Go! With such certainty, and a proven skill—nay perhaps a calling—in running simple benchmarks, you must really know your stuff. But my gut says something is up.

                                                                                                      Ah, intentionally or not, you’ve chosen the worst case for struct deserialization, a large number of fields that only have string values. An allocation per value, same as a map, and the large number of field keys will slow down the reflector.

                                                                                                      Performing a similar test on my JSON I used before, which has a smaller number of keys and mixed value types (string, int, etc), I see that Go encoding/json is ~1.5x faster than Python json. I also see that decoding to structs and maps is about the same. Interesting, my intuition tells me Go’s encoding/json package must not be particularly fast. It’s also ~1.75x slower than the hyper-optimized ujson Python package. Something’s definitely up.

                                                                                                      It’s ironic that you were so condescending about investigating higher-performance alternatives when you completely failed to do so. After spending 2 seconds on Google indiscriminately opening the first couple of GitHub links, I found jsonparser and ffjson. I’ll go with jsonparser, since we want something that handles arbitrary data, as you pointed out. This library goes where ujson can’t: it’s a zero-allocation parser. Just by using it to parse my data into a slice of structs, it’s ~1.6x faster than ujson, ~5x faster than Python builtin json, and ~2.8x faster than Go builtin encoding/json. For this data shape, of course; obviously it will differ depending on the data.

                                                                                                      you should know that the first rule in performance engineering is “don’t trust your gut, benchmark!”

                                                                                                      No, I know the first rule of performance engineering is “don’t trust your gut when you don’t actually understand what you’re looking at.” The second bit is typically left out because the intended audience tends not to realize they are the intended audience. It’s really not that difficult to have an intuition about performance, you just have to actually know what you’re looking at, and what different types of operations tend to cost. The rule you parroted mostly exists because of a dozen or so counter-intuitive costs in computing. Though obviously measurement is still essential. Just not necessary for basic conclusions like “an optimal Go JSON parser will be faster than an optimal Python JSON parser.”

                                                                                                      1. 2

                                                                                                        Before being an asshole, try being right. It’s not an essential prerequisite, but it helps.

                                                                                                        Pot, meet kettle.

                                                                                                        The very first thing you responded with assumed I was stupid and couldn’t read. You continued to be condescending, suggesting that I have no intuition about performance, etc, etc. Basically, you’ve been a jerk this entire time. But, I forgive you. And, I’m sorry for the way I’ve acted in response.

                                                                                                        You have already concluded with such certainty that Python is faster than Go!

                                                                                                        No. I haven’t concluded anything. I’ve merely suggested that Python can be fast enough (in some situations) and running to Go isn’t always necessary.

                                                                                                        The original author, and the CTO of the company in question were very kind and directly answered what I was asking. You, however, decided that I must be an idiot and showed off your muscles.

                                                                                                        I bet in meat space we could be friends, and have a lot to talk about. Should that day come, I’ll buy you a drink.

                                                                                                        1. 3

                                                                                                          I would appreciate it if both @apg and @peter would leave this thread without further replies, at least until I have coded more moderation actions than “delete comment” and “ban user”.

                                                                                                          1. 1

                                                                                                            I’m interested in what such actions might be, because I agree that the majority of this thread is a waste of time. Much of the responsibility lies on me, as this is the second fight I’ve picked over performance in two weeks. In both cases, my initial throw down received nontrivial positive reception (1, 2), but again in both cases there was little value in continuing after that.

                                                                                                            I’ll endeavor to use less colorful metaphors, and trim discussions like this one rather than continue to engage in a waste of screen space that might otherwise be filled with insightful comments.

                                                                                                            1. 3

                                                                                                              Thanks for taking a minute to consider a pattern and how it can be improved.

                                                                                                              I was writing as I was a minute away from going to bed so I didn’t want to write something long or do something rash. This discussion had some great technical debate but was also sliding towards personal attacks and general unpleasantness.

                                                                                                              Basically I’m looking for the smallest possible early intervention with the best chance to nudge a thread away from escalating toxicity. And most often that’s just going to be a moderator leaving a comment reminding people to be kind. I’m pondering what features would be appropriate and will probably post a meta thread before I actually implement anything besides a mod dashboard that finds hotspots like a chronological list of comments getting more than 3 downvotes, etc. Human judgment is the most valuable thing, tools just exist to target it efficiently.

                                                                                                            2. -1

                                                                                                              Why would this thread need any kind of moderation anyway?

                                                                                                              1. 4

                                                                                                                Cause we are being dicks to each other and that’s not what this community is about.

                                                                                            2. 2

                                                                                              It’s worth pointing out that “getting rid of the bottleneck” when your entire system is in Python isn’t obviously possible. For example, Python’s csv library is written in C, and while its core parser is quite fast, did you know that reading a CSV file in Python is still dog slow? A big part of it is because every record needs to get thrown into Python objects, which has overhead. So OK, maybe you write your CSV loop in something other than Python and go through all the hoopla of designing C bindings for that, but at a certain point, this could easily become a microcosm of your entire system. There’s a lot of work involved in having to push things down into lower level languages, and if you need to do it a lot, you might be better off just switching.

                                                                                              With that said, I don’t disagree with your overall point! Just want to say that your advice can be quite hard to follow in practice.

                                                                                              (I just read the comments down thread and see the others are basically saying the same thing I am: you pay dearly for having to put everything into Python objects. The irony of my comment in this specific example is that Go’s CSV library is not known for its speed… But my CSV example was just that, an example.)

                                                                                              1. 1

                                                                                                It’s worth pointing out that “getting rid of the bottleneck” when your entire system is in Python isn’t obviously possible.

                                                                                                Of course! The whole reason this thread got out of hand is because it’s hard to understand exactly how Python can’t work here. We don’t know how much of the workload is I/O bound, and what part is compute. We don’t know what SLAs they target, and how far from them they are. We don’t know if the problems they face are only at peak times, or if this is constant.

                                                                                                I don’t mean to call out this post specifically, but this is exactly the type of post that leads to cargo-culting in tech. Posts like this are often well-intentioned (as this one is), but they lack enough information to be taken as anything more than anecdote; some will nonetheless treat them as justification to take a completely unrelated workload and say “Go is faster and we need to move to Go,” when Ruby, or PHP, or whatever else would continue serving them just fine and continue to provide some of the advantages they adopted the language for in the first place.

                                                                                        1. 3

                                                                                          I don’t think deciding on Go is a good idea if you dislike its way of error handling or the absence of frameworks. The first is an explicit design decision, and the second – depending on your definition, of course – is closely tied to the philosophy and community around the language (see the Gorilla “framework” and the creator of Martini). The latter is of course the more subjective impression.

                                                                                          Also, reading the article, the main reason for switching to Go seems to be performance, which is of course a nice thing when coming from an interpreted language. But if that’s really the reason for switching from Python, I’d be curious to learn why other options like PyPy weren’t chosen. At least to me that seems like a similar or even better fit, one that doesn’t involve switching to a new language – usually a harder process than anticipated.

                                                                                          I really enjoyed the article though. I really like the “Not Getting Too Creative” reason. That’s something I really appreciate. Thanks for the insights! :)

                                                                                          PS: One more thing, about frameworks. I think Go might simply be going in a different direction. While I know there are things like Beego, Revel, etc. that seem to be trying to replicate what probably suits other languages better, I think something in the direction of Ponzu might (or might not) work better with what the language provides. I think Go is still too young, though, for many people with deep experience in the language and its surrounding idioms to have played with ideas beyond bringing in concepts from elsewhere. After all, code is also codified knowledge and experience.

                                                                                          1. 2

                                                                                            Thanks! I actually tried PyPy for our import flow (i.e. reading a huge JSON dump and inserting it into the various databases). To its credit, PyPy was able to speed up the process by roughly 2x. I do think that writing Python optimized for PyPy is quite a bit of work – perhaps more than just using a language designed with performance in mind. If the scope of the performance issues had been limited to one component, we would probably have gone that route, but in this case we had Python-related performance issues in many components of our API. Tommaso on our team also experimented with Cython code; I personally didn’t try that.

                                                                                            I personally don’t mind the lack of frameworks. There is just a category of use cases for which it’s an issue though. Say that you’re building a simple app for a client, or a social app, or something B2B and you don’t expect a lot of traffic. In all those scenarios the overhead of using Go instead of Python/Django/DRF or Ruby/Rails is quite large. I think it’s a missed opportunity.

                                                                                          1. 10

                                                                                            We use LaunchDarkly for feature flagging so we can do contained rollouts and testing of new beta features

                                                                                            We use Optimizely for A/B testing

                                                                                            Surely there are libraries for many languages and web frameworks for doing that? For example

                                                                                            I can understand using Pusher (even though there’s a lot of open source self-hosted solutions for that as well), but A/B testing and feature rollout? Why are these things even offered as-a-Service?

                                                                                            I don’t understand this “use 3rd party services for everything” mentality. Downloading a library is easier than creating another goddamn account.

                                                                                            1. 4

                                                                                              They are offered as a service because you have the library wrapping your feature, but you can inevitably end up with lots of supporting infrastructure. By supporting infrastructure, I mean things like feature-group management, automating rollout of the feature to a larger cohort, etc. If your support team needs to replicate a customer issue, they might need to report on which users have a feature flag, and ensure they see the exact same feature set too. In many cases you don’t need all this, but some people do.

                                                                                              For others, though, having it as a service can be an easy way to adopt feature flags, although in practice they could probably have achieved the same result with your approach. The founders of LaunchDarkly and CircleCI produce a podcast (https://www.heavybit.com/library/podcasts/to-be-continuous/), so it’s unsurprising that they use each other’s products.

                                                                                              1. 2

                                                                                                I can understand using Pusher (even though there’s a lot of open source self-hosted solutions for that as well), but A/B testing and feature rollout? Why are these things even offered as-a-Service?

                                                                                                Different companies have differing amounts of engineering resources, different patterns to their revenue (e.g. to hire more engineers… or not), and different levels of legacy cruft in their products. Stemming from these differences, and especially for anything infrastructure- or process-related, the “build vs buy” conversation will also differ greatly between companies.

                                                                                                I have had this same conversation with people regarding Heroku and SendGrid. That is, I know people who cannot fathom why anyone would need (or want) to pay for that category of PaaS in that way. Meanwhile, I shudder to imagine how much more difficult my company would have had it without them.

                                                                                                1. 1

                                                                                                  It’s more “download vs buy” here. For trivial stuff like A/B testing and feature flags, integrating a 3rd party service isn’t significantly easier than adding a library.

                                                                                                  Heroku and SendGrid

                                                                                                  That’s why I said “I can understand using Pusher”. That kind of stuff is actual infrastructure than needs maintenance, yeah.

                                                                                                  1. 2

                                                                                                    Curious: What library are you referring to when you mention of A/B testing, and what does it provide?

                                                                                                    Followup: Have you used Optimizely? I am not currently a customer, but when I was, I found it impressive. I would not be able to implement the same level of tooling, WYSIWYG DOM editing, analytics integrations and reporting for less money (the cost of my time) than their subscription costs. Not that simpler needs could not be met with a simpler solution, but if you need what they offer, Optimizely is not a service without its value.

                                                                                                    1. 1

                                                                                                      I linked to https://github.com/ankane/field_test in the original comment, of course there are lots more for different web frameworks and stuff.

                                                                                                      WYSIWYG DOM editing

                                                                                                      That sounds horrifying.

                                                                                                      1. 2

                                                                                                        Quick edit: First off, thanks for pointing out the link!

                                                                                                        That sounds horrifying.

                                                                                                        I thought so too, at first, but it’s not.
                                                                                                        Not entirely, anyway. I of course then thought “but what if you’re using some SPA framework?” and, to my surprise, there was an answer for that, and it wasn’t a bad one. I’m speaking beyond my minor experience, but I suspect it messes with load times a bit, and might not be something to layer on top of, say, a Rails app that already struggles with bad load times. But if you’ve already got a snappy site, Optimizely probably isn’t going to hurt, and might make a marketing team feel like they have freakin’ super powers.

                                                                                                        I have seen things like Optimizely and Infusionsoft give marketing teams amazing productivity boosts that the company’s product engineering team would be hard-pressed to match, and arguably shouldn’t try to match – especially if it would distract them from their central product and from serving their primary/external users’ needs better.

                                                                                                        Through quirks of the current labor market, software engineers command higher salaries than rank-and-file-but-sophisticated digital marketers (though at the top-of-the-top they even out). This leads software engineers, myself included, to assume that our higher pay means we are more important to a company’s goals in some absolute sense. This is not true, and if a company happens to have an engineering team of 5, and a marketing team of 20, distracting those 5 engineers to have them build and maintain a tertiary A/B-testing framework that can match Optimizely, versus enabling those 20 marketers to do more in less time, can make the latter look very appealing.

                                                                                                2. 1

                                                                                                  Feature flagging seems a bit extreme to me.

                                                                                                  1. 1

                                                                                                    The Weinberg Traction book is my go-to recommendation for getting devs-turned-bootstrappers started with marketing effectively. I hope it’s useful to you.