1. 4

    TL;DR

    Magic is different. It feels different.

    1. 1

      re: conway’s law

      i think you can work around this to some extent:

      • have more components than people (an order of magnitude or so?)
      • avoid “utils” and other catch-all dumping grounds
      • have your components be searchable

      you’re still going to get libraries/tools/etc. that follow org structure, but you can get a lot outside of that structure too, and that’ll be more reusable

      1. 2

        Why work around it? Isn’t the purpose of Conway’s Law to accept the inevitable rather than fight it?

        FWIW I’ve worked in an environment that follows your suggestions and it still followed Conway’s Law.

        1. 1

          Yes. This goes beyond software, too. The way to exploit Conway’s law is to shape the organisation after the system you desire to build. This implies things like smaller, cross-functional teams* with greater responsibility (in order to get less coupling between system components). That way you maximise communication efficiency.

          * Lean people would advocate that the team should be co-located too. The idealist in me still clings to the Microsoft research that showed org chart distance mattered orders of magnitude more than physical distance.

      1. 3

        Why support both multi-line and single-line comments? Seems like unnecessary complexity for a standard that’s explicitly avoiding complexity. If forced to choose, they should retain multi-line comments because they play better with compact JSON.
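
        To see why, here is a hypothetical JSON-with-comments snippet compacted onto one line: the block comment stays delimited, while a line comment swallows everything after it.

        ```jsonc
        {"a": 1, /* block comments stay delimited */ "b": 2}
        {"a": 1, // a line comment swallows the rest: "b": 2}
        ```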

        1. 3

          git doesn’t have actual command to un-stage a file(s), though. To get around this limitation…

          Limitation, or poor UI decision? I’m guessing the latter.

          1. 10

            newer versions of git have git restore so I think that counts

            1. 5

              git reset -- filename or git reset HEAD filename do the same, tho, right? And that’s been in git for ages.

              1. 5

                I know, just wanted to say there is now an actual command. The article claimed there wasn’t one.

                1. 1

                  Sometimes. If the file is already in HEAD then this works, but if it’s a newly created file I don’t think this works.

                  1. 2

                    It definitely works with newly created files.

              2. 4

                The naming problem. There is, and always has been, a git reset that does what the OP wanted; however, the “feeling” that this one command does “a lot of different things” (reset the staging area or reset the working tree, depending on the flags) is what makes people say git doesn’t have such a command.
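
                A quick sketch of both spellings (assumes git >= 2.23 for git restore; throwaway repo in a temp dir):

                ```shell
                set -e
                repo=$(mktemp -d); cd "$repo"; git init -q
                git -c user.email=a@b -c user.name=t commit -q --allow-empty -m init

                echo hi > new.txt
                git add new.txt               # stage a brand-new file
                git restore --staged new.txt  # the dedicated un-stage command
                git status --porcelain        # back to untracked: "?? new.txt"

                git add new.txt
                git reset -q -- new.txt       # the older spelling; same effect here
                ```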

                1. 3

                  I use tig which makes unstaging single files easy and natural, among other things

                1. 2

                  When is data committed to S3? I assume it happens asynchronously since S3 has unreliable latency at p99 and Ben is claiming very low latency. If so, you’ll have data loss if your server ever does go down, even for maintenance.

                  Edit: Yes, it’s susceptible, but not as badly as I was thinking. (Docs). The data gets replicated to S3 as part of the write-ahead log checkpointing process, so yes, that’s async. You’re still expected to open the SQLite DB against a local file. So as long as the host or disk aren’t completely trashed you shouldn’t lose any data. But if you did, you’d lose all transactions that haven’t been checkpointed.

                  This is probably reasonable given how the database is presented. Still, users should be aware of the potential for data loss.

                  One big caveat: during replication it places a global lock on the database while it copies data to S3. So if S3 is experiencing latency degradation, you’ll see significant latency spikes on write requests. In bad cases, I could see enough requests backing up to cause an outage in your service.

                  1. 4

                    I see commenters here aren’t reading the article. The answer is no…

                    Still, I absolutely love the process they used to come to that conclusion. I can think of a lot of problems with fix rate, but it’s still so much better than any other data driven approach to this problem that I’ve seen. I’ll see if I can implement it for other non-security related things.

                    1. 8

                      Something I love about Python is how the language managed to consistently evolve. I started with Python 1.6, and just loved how each version pushed me to try and adopt new practices. I remember decorators, str/bytes, format strings, async and typing as key improvements that changed the way I code today.

                      1. 17

                        And for the same reasons, I am more and more afraid of how all the duct tape, and the layer-over-layer cake that Python is becoming, is going to taste in a few years. The language changed so much even within the Python 3 era that the motto “There is only one way to do it” is meaningless. Already, when working with numpy & pandas a lot, there is more than one sane way to do things, and that will propagate to the whole Python ecosystem. The other side of the coin is that the language keeps growing and adapting. Let’s see where Python 4 lands in the future.

                        1. 9

                          I have the same feelings. Python is getting more complicated than it’s ever been. There are many ways to do a simple thing, and that’s the opposite of the Zen of Python.

                          1. 2

                            Just because a new feature is available doesn’t mean you have to use it.

                            1. 14

                              Of course, but someone, some day, will use it in a project you collaborate on, and you will have to deal with it.

                              1. 1

                                I’ve dug into this a bit more and I’m probably inclined to agree (I don’t use Python myself though).

                                However, the change does seem to be quite divisive, so it’s very possible this will just be used sparingly in most code bases.

                                1. 3

                                  It’s hard to limit this given the massive size of the Python ecosystem. One observation: the new PEG parser is allowing language changes to be proposed and landed much more quickly than before. There previously seemed to be a barrier to entry which much of the community was accustomed to.

                                  I’m personally neutral on this feature, but I’d prefer that the whole ecosystem was considered during prioritisation rather than considering language features separately. I think they can be implemented independently, but from a product perspective it doesn’t feel like the most important thing.

                              2. 6

                                I don’t know about you, but I work in a team environment

                                1. 2

                                  My experience is more about being dropped into various projects where the lead and the team have already settled on what they will or won’t accept; once done, I’d move on to the next one. So I tend to have less say (if it is not critical).

                                  I agree that in a team env on a project, it is easier to settle on what you accept or not as practice. I am in the minority here.

                                  1. 1

                                    From what I’ve read, the decision to add pattern matching seems to have been a fraught one. I’d imagine you would have plenty of allies in your team to argue against adopting the paradigm in your team’s code.
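
                                    For readers who haven’t seen it, the paradigm in question is structural pattern matching (PEP 634, Python 3.10+); a tiny illustration:

                                    ```python
                                    # Structural pattern matching: match a value against shapes,
                                    # binding variables as it goes (Python 3.10+).
                                    def describe(point):
                                        match point:
                                            case (0, 0):
                                                return "origin"
                                            case (x, 0):
                                                return f"on the x-axis at {x}"
                                            case (x, y):
                                                return f"at ({x}, {y})"
                                            case _:
                                                return "not a point"

                                    describe((0, 0))   # "origin"
                                    describe((3, 0))   # "on the x-axis at 3"
                                    ```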

                              3. 4

                                To be fair, I think Python has really embraced its position as the duct-tape language. It’s the language that lets you do anything from crafting custom packets to modelling neural nets. My experiences with it, in both corporate settings and ML usage, have been pretty subpar, but for folks who want to learn the language once and just learn domain specific duct tape to enter new subfields, it’s great.

                                1. 1

                                  TBH, in the long term I feel that Python will act more as a GraphQL-like API for reaching those numerous data and ML libraries so tightly linked to it from other languages. It is almost a gimmick to be able to call Python from your language of choice (Julia, Common Lisp, Go, etc.) just to access the ML and data libraries. IMHO, Python is not a good language to be treated as an API.

                                  1. 1

                                    IMHO, Python is not a good language to be treated as an API.

                                    It’s not, in my opinion, but there really aren’t any other serious competitors for the sheer breadth of applications that Python provides while also being a nice, generatable API. Julia is a much more fun language for data science, but its libraries are still a lot more raw than Python’s, and folks still use PyCall for stuff (e.g. there’s no good equivalent to something like spaCy). Nobody else has really “put their money where their mouth is”.

                            1. 20

                              SQL is a narrow waist. Almost all databases support SQL. Almost all ORMs and data analysis tools emit SQL. If you make a new database that doesn’t support SQL it’s very hard to get adoption because users would have to abandon all their existing tools. If you make a new data analysis tool that doesn’t emit SQL it’s very hard to get adoption because users won’t be able to use their existing databases.

                              an imperative DSL for writing queries would help improve the situation

                              This is essentially what Spark and Flink are. Both of them have also recently added a SQL layer on top.

                              The upsides of having a declarative layer are that the database does a lot of the optimization work for you, in many cases can do a better job than a human while still letting you write readable code, and when the data changes it can reoptimize without you having to rewrite all your code.

                              This is pretty similar to the situation with assembly and structured languages. They will often generate much worse assembly than you would write by hand, but they allow much more productivity and make it easy to port software between different ISAs. Occasionally you might want to spot-check a hot loop or maybe even use a little inline assembly.

                              SQL has a lot of deficiencies as a language, but whatever replaces it will probably have the same separation between query and plan. The only thing I can see changing is better hinting and debugging tools.

                              (It would be nice to be able to submit plans directly, but that would also constrain the database to never changing the plan interface and that has been frequently necessary over the last few decades to take advantage of changes in hardware. There is some research work on the subject but it doesn’t feel like a solved problem yet.)
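
                              The query/plan separation is easy to poke at in any engine; a sketch with Python’s bundled sqlite3:

                              ```python
                              import sqlite3

                              con = sqlite3.connect(":memory:")
                              con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

                              # The SQL states *what* to fetch; EXPLAIN QUERY PLAN shows
                              # the *how* that the engine chose for it.
                              plan = con.execute(
                                  "EXPLAIN QUERY PLAN SELECT v FROM t WHERE id = 1"
                              ).fetchall()
                              print(plan[0][-1])  # a rowid SEARCH, not a full-table SCAN
                              ```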

                              1. 3

                                Adding to this, most traditional RDBMSes had a more imperative API via cursors. I don’t have a link ready to back up my position, but every opinion I’ve heard throughout my 14-year career is to avoid cursors whenever possible. I’ve replaced a lot of cursor usage with set operations (e.g. INSERT...SELECT) for tremendous speed gains (and readability, imo).

                                One reason for the performance of standard set operations is that a lot of thought has gone into them. The engine makes decisions with a lot of knowledge about data structures, storage hardware, and profile of current data (e.g. cardinality of a join match). OTOH in Spark, it isn’t uncommon for me to reach for RDD operations because the SQL query engine isn’t as developed (this is changing, of course).
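
                                The set-based rewrite described above, sketched with sqlite3 (table names hypothetical):

                                ```python
                                import sqlite3

                                con = sqlite3.connect(":memory:")
                                con.execute("CREATE TABLE src (v INTEGER)")
                                con.execute("CREATE TABLE dst (v INTEGER)")
                                con.executemany("INSERT INTO src VALUES (?)",
                                                [(i,) for i in range(1000)])

                                # Cursor style would round-trip one row at a time:
                                #   for each row of SELECT v FROM src:
                                #       INSERT INTO dst VALUES (row.v * 2)
                                # Set-based: one statement the engine plans as a whole.
                                con.execute("INSERT INTO dst SELECT v * 2 FROM src")
                                count = con.execute("SELECT COUNT(*) FROM dst").fetchone()[0]
                                ```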

                              1. 20

                                Python package maintainers rarely use semantic versioning and often break backwards compatibility in minor releases. One of several reasons that dependency management is a nightmare in Python world.

                                1. 18

                                  I generally consider semantic versioning to be a well-intentioned falsehood. I don’t think that package vendors can have effective insight into which of their changes break compatibility when they can’t have a full bottom-up consumer graph for everyone who uses it.

                                  I don’t think that Python gets this any worse than any other language.

                                  1. 20

                                    I’ve heard this opinion expressed before… I find it to be either dangerously naive or outright dishonest. There’s a world of difference between a) the rare bug fix release or nominally-orthogonal-feature-add release that unintentionally breaks downstream code and b) intentionally changing and deprecating API’s in “minor” releases.

                                    In my view, adopting SemVer is a statement of values and intention. It communicates that you value backwards compatibility and intend to maintain it as much as is reasonably possible, and that you will only knowingly break backwards compatibility on major release increments.

                                    1. 18

                                      In my view, adopting SemVer is a statement of values and intention. It communicates that you value backwards compatibility and intend to maintain it as much as is reasonably possible, and that you will only knowingly break backwards compatibility on major release increments.

                                      A “statement of values and intention” carries no binding commitment. And the fact that you have to hedge with “as much as is reasonably possible” and “only knowingly break” kind of gives away what the real problem is: every change potentially alters the observable behavior of the software in a way that will break someone’s reliance on the previous behavior, and therefore the only way to truly follow SemVer is to increment major on every commit. Which is the same as declaring the version number to be meaningless, since if every change is a compatibility break, there’s no useful information to be gleaned from seeing the version number increment.

                                      And that’s without getting into some of my own direct experience. For example, I’ve been on the Django security team for many years, and from time to time someone has found a security issue in Django that cannot be fixed in a backwards-compatible way. Thankfully fewer of those in recent years since many of them related to weird old functionality dating to Django’s days as a newspaper CMS, but they do happen. Anyway, SemVer’s answer to this is “then either don’t fix it, or do but no matter how you fix it you’ve broken SemVer and people on the internet will scream at you and tell you that you ought to be following SemVer”. Not being a fan of no-win situations, I am content that Django has never and likely never will commit to following SemVer.

                                      1. 31

                                        A “statement of values and intention” carries no binding commitment.

                                        A label on a jar carries no binding commitment to the contents of the jar. I still appreciate that my salt and sugar are labelled differently.

                                        1. 2

                                          Selling the jar with that label on it in many countries is a binding commitment and puts you under the coverage of food safety laws, though.

                                        2. 6

                                          Anyway, SemVer’s answer to this is “then either don’t fix it, or do but no matter how you fix it you’ve broken SemVer and people on the internet will scream at you and tell you that you ought to be following SemVer”.

                                          What do you mean? SemVer’s answer to “this bug can’t be fixed in a backwards-compatible way” is to increment the major version to indicate a breaking change. You probably also want to get the message across to your users by pushing a new release of the old major version which prints some noisy “this version of blah is deprecated and has security issues” messages to the logs.

                                          It’s not perfect, I’m not saying SemVer is a silver bullet. I’m especially worried about the effects of basing automated tooling on the assumption that no package would ever push a minor or patch release with a breaking change; it seems to cause ecosystems like the NPM to be highly fragile. But when taken as a statement of intent rather than a guarantee, I think SemVer has value, and I don’t understand why you think your security issue anecdote requires breaking SemVer.

                                          1. 7

                                            What do you mean? SemVer’s answer to “this bug can’t be fixed in a backwards-compatible way” is to increment the major version to indicate a breaking change.

                                            So, let’s consider Django, because I know that well (as mentioned above). Typically Django does a feature release (minor version bump) every 8 months or so, and every third one bumps the major version and completes a deprecation cycle. So right now Django 3.1 is the latest release; next will be 3.2 (every X.2 is an LTS), then 4.0.

                                            And the support matrix consists of the most recent feature release (full bugfix and security support), the one before that (security support only), and usually one LTS (but there’s a period at the end of each where two of them overlap). The policy is that if you run on a given LTS with no deprecation warnings issued from your code, you’re good to upgrade to the next (which will be a major version bump; for example, if you’re on 2.2 LTS right now, your next LTS will be 3.2).

                                            But… what happens when a bug is found in an LTS that can’t be fixed in a backwards-compatible way? Especially a security issue? “Support for that LTS is cut off effective immediately, everybody upgrade across a major version right now” is a non-starter, but is what you propose as the correct answer. The only option is to break SemVer and do the backwards-incompatible change as a bugfix release of the LTS. Which then leads to “why don’t you follow SemVer” complaints. Well, because following SemVer would actually be worse for users than this option is.

                                            1. 3

                                              But… what happens when a bug is found in an LTS that can’t be fixed in a backwards-compatible way?

                                              Why do people run an LTS version, if not to avoid worrying about it as a dependency? If you’re making incompatible changes: forget about semver, you’re breaking the LTS contract, and you may as well drop the LTS tag and tell people to run the latest.

                                              1. 1

                                                you may as well drop the LTS tag and tell people to run the latest

                                                I can think of only a couple instances in the history of Django where it happened that a security issue couldn’t be fixed in a completely backwards-compatible way. Minimizing the breakage for people – by shipping the fix into supported releases – was the best available option. It’s also completely incompatible with SemVer, and is a great example of why SemVer is at best a “nice in theory, fails in practice” idea.

                                                1. 3

                                                  Why not just tell them to upgrade? After all, your argument is essentially that stable APIs are impossible, so why bother with LTS? Every argument against semver also applies against LTS releases.

                                                  1. 3

                                                    After all, your argument is essentially that stable APIs are impossible

                                                    My argument is that absolute perfect 100% binding commitment to never causing a change to observable behavior ever under any circumstance, unless also incrementing the major version at the same time and immediately dropping support for all users of previous versions, is not practicable in the real world, but is what SemVer requires. Not committing to SemVer gives flexibility to do things like long-term support releases, and generally people have been quite happy with them and also accepting of the single-digit number of times something had to change to fix a security issue.

                                              2. 2

                                                “Support for that LTS is cut off effective immediately, everybody upgrade across a major version right now” is a non-starter

                                                If it’s a non-starter then nobody should be getting the critical security patch. You’re upgrading from 2.2 to 3.0 and calling it 2.2.1 instead. That doesn’t change the fact that a breaking change happened and you didn’t bump the major version number.

                                                You can’t issue promises like “2.2.X will have long term support” because that’s akin to knowing the future. Use a codename or something.

                                                1. 7

                                                  It’s pretty clear you’re committed to perfect technical adherence to a rule, without really giving consideration to why the rule exists. Especially if you’re at the point of “don’t commit to supporting things, because supporting things leads to breaking SemVer”.

                                                  1. 4

                                                    They should probably use something like SemVer but with four parts, e.g. Feature.Major.Minor.Patch

                                                    • Feature version changes -> We’ve made significant changes / a new release (considered breaking)
                                                    • Major version change -> We’ve made breaking changes
                                                    • Minor version change -> Non breaking new features
                                                    • Patch version change -> Other non-breaking changes

                                                    That way 2.*.*.* could be an LTS release, which would only get bug fixes, but if there was an unavoidable breaking change to fix a bug, you’d signal this in the version by e.g. going from 2.0.5.12 to 2.1.0.0. Users will have to deal with the breaking changes required to fix the bug, but they don’t have to deal with all the other major changes which have gone into the next ‘Feature’ release, 3.*.*.*. The promise that 2.*.*.*, as an LTS, will get bug fixes is honored. The promise that the major version must change on a breaking change is also honored.

                                                    SemVer doesn’t work if you try to imbue the numbers with additional meanings that can contradict the SemVer meanings.
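
                                                    A sketch of how tooling could consume such a four-part scheme (all names hypothetical):

                                                    ```python
                                                    from typing import NamedTuple

                                                    class V(NamedTuple):
                                                        feature: int
                                                        major: int
                                                        minor: int
                                                        patch: int

                                                    def parse(s: str) -> V:
                                                        return V(*map(int, s.split(".")))

                                                    def safe_upgrade(cur: V, new: V) -> bool:
                                                        # Stay on the same LTS line, with no breaking (major) bump.
                                                        return (new.feature, new.major) == (cur.feature, cur.major)

                                                    safe_upgrade(parse("2.0.5.12"), parse("2.0.6.0"))  # True: ordinary fix
                                                    safe_upgrade(parse("2.0.5.12"), parse("2.1.0.0"))  # False: breaking fix
                                                    ```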

                                                    1. 3

                                                      This scheme is very similar to Haskell’s Package Versioning Policy (PVP).

                                                    2. 1

                                                      I’m saying supporting things and adhering to SemVer should be orthogonal.

                                              3. 5

                                                every change potentially alters the observable behavior of the software

                                                This is trivially false. Adding a new helper function to a module, for example, will never break backwards compatibility.

                                                In contrast, changing a function’s input or output type is always a breaking change.

                                                By failing to even attempt to distinguish between non-breaking and breaking changes, you’re offloading work onto the package’s users.

                                                Optimize for what should be the common case: non-breaking changes.

                                                Edit: to expand on this, examples abound in the Python ecosystem of unnecessary and intentional breaking changes in “minor” releases. Take a look at the numpy release notes for plenty of examples.

                                                1. 7

                                                  Python’s dynamic nature makes “adding a helper function” a potentially breaking change. What if someone was querying, say, all definitions of a module and relying on the length somehow? I know this is a bit of a stretch, but it is possible that such a change would break code. I still value semver though.
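
                                                  A contrived demonstration of that point (module simulated with types.ModuleType; names hypothetical):

                                                  ```python
                                                  import types

                                                  mod = types.ModuleType("mod")   # stand-in for v1 of some library
                                                  mod.greet = lambda: "hi"

                                                  # Downstream code relying on the module's entire surface
                                                  # (contrived, as conceded above, but perfectly legal Python):
                                                  public = [n for n in vars(mod) if not n.startswith("_")]
                                                  assert public == ["greet"]

                                                  mod.helper = lambda: "new"      # v2's "harmless" new helper

                                                  # The same downstream check now breaks:
                                                  assert [n for n in vars(mod) if not n.startswith("_")] != ["greet"]
                                                  ```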

                                                  1. 3

                                                    The number of definitions in a module is not a public API. SemVer only applies to public APIs.

                                                    1. 4

                                                      If you can access it at run-time, then someone will depend on it, and it’s a bit late to call it “not public”. Blame Python for exposing stuff like the call stack to introspection.

                                                      1. 2

                                                        Eh, no? SemVer is very clear about this: the public API is whatever the software declares it to be. Undeclared things can’t be public API, by definition.

                                                        1. 7

                                                          Python has no concept of public vs private. It’s all there all the time. As they say in python land, “We’re all consenting adults here”.

                                                          I’m sure, by the way, when Hettinger coined that phrase he didn’t purposely leave out those under the age of 18. Language is hard. :P

                                                  2. 1

                                                    Adding a new helper function to a module, for example, will never break backwards compatibility.

                                                    Does this comic describe a violation of SemVer?

                                                    You seriously never know what kinds of things people might be relying on, and a mere definition of compatibility in terms of input and output types is woefully insufficient to capture the things people will expect in terms of backwards compatibility.

                                                    1. 6

                                                      No, it does not describe a violation of SemVer, because spacebar heating is not a public API. SemVer is very clear about this. You are right that people will still complain about backward compatibility even if you are keeping 100% correct SemVer.

                                                2. 6

                                                  I would agree if violations were rare. Every time I’ve tried to solve dependency issues on Python, about 75% of the packages I look into have broken semver on some level. Granted, I probably have a biased sampling technique, but I find it extremely hard to believe that it’s a rare issue.

                                                  Backwards compatibility is hard to reason about, and the skill is by no means pervasive. Even having a lot of experience looking for compatibility breaks, I still let things slip, because it can be hard to detect. One of my gripes with semver is that it doesn’t scale. It assumes that tens of thousands of open source devs with no common training program or management structure all understand what a backwards breaking change is, and how to fix it.

                                                  Testing for compatibility breaks is rare. I can’t think of any Python frameworks that help here. Nor can I think of any other languages that address this (Erlang might, but I haven’t worked with it first-hand). The most likely projects to test for compatibility between releases are those that manage data on disk or network packets. Even among those, many rely on code & design review to spot issues.

                                                  It communicates that you value backwards compatibility and intend to maintain it as much as is reasonably possible, and that you will only knowingly break backwards compatibility on major release increments.

                                                  It’s more likely that current package managers force you into semver regardless of whether you understand how it’s supposed to be used. The “statement of values” angle is appealing, but without much evidence. Semver is merely popular.

                                                  1. 7

                                                    I guess this depends on a specific ecosystem? Rust projects use a lot of dependencies, all those deps use semver, and, in practice, issues rarely arise. This I think is a combination of:

                                                    • the fact that semver is the only option in Rust
                                                    • the combination of guideline to not commit Cargo.lock for libraries + cargo picking maximal versions by default. This way, accidental incompatibilities are quickly discovered & packages are yanked.
                                                    • the guideline to commit Cargo.lock for binaries and otherwise final artifacts: that way folks who use Rust and who have the most of deps are shielded from incompatible updates.
                                                    • the fact that “library” is a first-class language construct (crate) and not merely a package manager convention + associated visibility rules makes it easier to distinguish between public & private API.
                                                    • Built-in support for writing test from the outside, as-if you are consumer of the library, which also catches semver-incompatible changes.

                                                    This is not to say that semver issues do not happen, just that they are rare enough. I’ve worked with Rust projects with 200-500 different deps, and didn’t perceive semver breakage as being a problem.
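
                                                    For context, Cargo bakes the semver assumption into resolution: a bare version requirement is read as a caret (semver-compatible) range.

                                                    ```toml
                                                    [dependencies]
                                                    # "1.2" is shorthand for ^1.2: any >=1.2.0, <2.0.0
                                                    serde = "1.2"
                                                    ```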

                                                    1. 5

                                                      I would add that the Rust type system is expressive enough that many backwards incompatible changes require type signature changes which are much more obvious than violations of some implicit contract.

                                                  2. 6

                                                    I don’t think I have a naïve view of versioning; putting on my professional hat here, I have a decade of experience dealing with a dependency modeling system that handles the versions of hundreds of thousands of interrelated software artifacts that are versioned more or less independently of each other, across dozens of programming languages and runtimes. So… some experience here.

                                                    In all of this time, I’ve seen every single kind of breaking change I could imagine beforehand, and many I could not. They occurred independently of how the vendor of the code thought of them; a vendor of a versioned library might think that their change is minor, or even just a non-impacting patch, but outside of pure README changes, it turns out that they can definitely be wrong. They certainly had good intentions to communicate the nature of the change, but that intention can run hard into reality. In the end, the only way to be sure is to pin your dependencies, all the way down, and to test assiduously. And then upgrade them frequently, intentionally, and on a cadence that you can manage.
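
                                                    To make “pin and test” concrete, here is the semver arithmetic that pinning sidesteps, as a minimal sketch (a hypothetical helper with Cargo-style caret semantics, not any particular package manager’s API):

```python
# Illustrative semver compatibility check (hypothetical helper).
# Two versions are treated as compatible when they share a major
# version; 0.x releases only match on minor, mirroring Cargo's
# caret semantics. Real resolvers layer pre-releases, build
# metadata, and range operators on top of this.
def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def compatible(installed: str, required: str) -> bool:
    imaj, imin, _ = parse(installed)
    rmaj, rmin, _ = parse(required)
    if rmaj == 0:
        # Pre-1.0: any minor bump is allowed to break you.
        return imaj == 0 and imin == rmin
    return imaj == rmaj
```

                                                    The check itself is trivial; what it cannot tell you is whether the vendor classified their change correctly, which is exactly why pinning plus testing wins.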

                                                    1. 1

                                                      I don’t think I have a naïve view of versioning; putting on my professional hat here, I have a decade of experience dealing with …

                                                      Hear, hear. My experience isn’t exactly like @offby1’s but I can vouch for the rest.

                                                    2. 4

                                                      to be either dangerously naive or outright dishonest

                                                      This phrase gets bandied around the internet so much I’m surprised it’s not a meme.

                                                      SemVer is … okay, but you make it sound like lives depend on it. There’s a lot of software running mission-critical systems without using SemVer, and people aren’t dying every day because of it. I think we can calm down.

                                                  3. 3

                                                    That’s the problem of the package management system being so old. Back then semantic versioning wasn’t that common, and it never really caught on. In my opinion the PyPA should make a push for more packages to use semantic versioning. I’m seeing this trend already, but it’s too slow…

                                                  1. 2

                                                    I like these kinds of blog posts. As an engineer (and a new one in the backend world), it can be difficult to weigh one solution to a problem up against another. This is another, very concrete, tool to do that evaluation that I will add to my toolbox. Thank you.

                                                    Also, welcome back to blogging Mr Kellogh :)

                                                    1. 2

                                                      Thanks! Yeah, crazy how all the posts stopped flowing when my first kid was born. I used to write any idea that popped into my head, now I have to stay interested for an entire week or it won’t get completed.

                                                    1. 11

                                                      The YouTube 32-bit overflow wasn’t a cold path. There was no separate code path to handle numbers that didn’t fit in 32 bits that was triggered for the first time by the Gangnam Style video. There were no edge cases or fallbacks that could have been avoided, and there was no 3rd party to which ‘test capacity’ could have been offloaded.

                                                      The same goes for the TLS example: no untested code path was involved. At most a 3rd party could be used to automatically extend/replace the certificate when the time comes (letsencrypt).

                                                      1. 4

                                                        Good points, I’ll remove them. I had other plans for the direction of the post, but I had to pare down to keep the focus and I missed this. There’s plenty of other examples to choose from.

                                                      1. 3

                                                        There’s a lot of value in being “T” shaped

                                                        1. 4

                                                          My impression of the “T” was that you were supposed to be well above average for a wide range of skills, and exceptionally good at your speciality. In games, we would call such characters “overpowered”.

                                                          1. 4

                                                            I’d always seen it stated as “familiar” with a wide range of skills, but I’ve little doubt there are organizations looking for that shape of person.

                                                        1. 26

                                                          After performing over 100 interviews: interviewing is thoroughly broken. I also have no idea how to actually make it better.

                                                          yep

                                                          1. 2

                                                            Maybe Amazon’s interview is broken. This data-structure bullshit doesn’t help at all if the applicant doesn’t know shit about real work, system designs, soft skills, security, team work, etc

                                                            1. 4

                                                              As much as I dislike FAANG interviews, every attempt I’ve seen to fix them is also fraught with problems

                                                              1. 6

                                                                 I’d love to work for one of those big FAANG companies but I don’t know off the top of my head how to do a BFS on a tree. So fuck it, my 20 years of development is garbage for them.

                                                                1. 4

                                                                   There are many books and courses to prep candidates for FAANG interviews. For senior engineers it might be daunting, but to join a FAANG some drills are to be expected.
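
                                                                   To put the drill in perspective, the canonical example is small. A breadth-first traversal over a dict-based tree (an illustrative sketch, the kind of exercise the prep books cover) fits in a few lines of Python:

```python
from collections import deque

def bfs(tree, root):
    """Visit nodes level by level; `tree` maps a node to its children."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order

# bfs({"a": ["b", "c"], "b": ["d"]}, "a") → ["a", "b", "c", "d"]
```

                                                                   The drilling is mostly about reproducing this fluently under time pressure, not about the algorithm itself.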

                                                                   Any company with a big pool of candidates will end up in a similar situation: assessing things that are largely irrelevant to the day-to-day job.

                                                                   The real drama is that very smart people who could use their brainpower to improve society at large instead spend years on low-utility projects.

                                                                  1. 4

                                                                    You’re doing yourself a disservice by having this mindset

                                                                    1. 1

                                                                      Why?

                                                                      1. 1

                                                                        Because you don’t get to work at FAANG

                                                                    2. 3

                                                                      I seriously believe that if you are a good programmer, went to uni or similar, spend two weekends with “Cracking the Coding Interview”, and do 1 or 2 mock interviews to train the communication style, you have a good chance, provided anxiety or other such problems don’t get in the way during the interview.

                                                                      Without preparation most would be lost.

                                                                      You can, of course, still think this is fucked but it’s not unpassable for good programmers without anxiety problems, reasonably good communication skills and time to prepare.

                                                                      If you are interested, I can do a mock interview with you.

                                                                      1. 15

                                                                        I don’t think the problem for most people is the details of playing the game. The game is learnable, and if one has gotten anywhere in this field it’s because one can learn things. The problem people have is that they question why the game, which everyone knows has no bearing on the ability to do the job at hand, needs to be played at all.

                                                                        If we put our cynicism hat on (mine is pretty worn-out by now), we can answer that question by saying that what the game is about is testing people’s willingness to jump through arbitrary hoops. In that sense, it may actually accurately test their ability to function within the organization at hand, and thus may in fact be very good at its job of filtering out candidates who would not work out.

                                                                        1. 5

                                                                          but it’s not unpassable for good programmers without anxiety problems, reasonably good communication skills and time to prepare.

                                                                          It’s not, but good programmers with 20 years of experience can always get a job someplace where they don’t have to jump through these silly hoops.

                                                                          It works surprisingly well for both parties. It’s not like recruitment heads in Big Corp don’t already know this puts off experienced programmers, everyone’s been aware of that for a long time now. They just don’t want that many experienced programmers. If you’re recruiting for senior and lead positions, it’s much more efficient to go through recommendations (or promote from within) in which case the interview is… somewhat more relaxed, so to speak. The interviews are designed for something else.

                                                                          (Edit: I’m with @gthm on what they’re designed for. The main aim is to select young graduates and mid-career developers who will put up with arbitrary requirements and don’t mind spending some of their free time on it every once in a while.)

                                                                          1. 2

                                                                            Having been through the Google interview gauntlet a few years ago, there’s quite a bit more than just whiteboarding algorithms.

                                                                            I was completely unprepared for the ‘scale this data query service’ chunk, which I didn’t even know was going to be part of the interview (which is a failure of the Google recruiter frankly) but I now know is pretty standard amongst FAANG company interviews for SRE type roles. Didn’t help that the interviewer was a jerk who refused to answer any of my questions, but that’s hardly unusual!

                                                                            1. 2

                                                                              That part is also covered in “Cracking the Coding Interview”

                                                                              Not to invalidate your experience but the vast majority of my interviewing experience was pleasant. Maybe you have had bad luck or me good luck or your standards are different.

                                                                              1. 3

                                                                                1 grumpy jerk who clearly didn’t want to be there, 2 decent guys & a third who was OK but stonewalled me when I asked questions about the problem he posed. Which was a little weird, but there it is.

                                                                                (CtCI has 5 pages on system design & about 100 pages on data structures, algorithms & all the rest. When a quarter of the interview is system design, that’s not going to help you much. There are some good online resources around these days though.)

                                                                  1. 24

                                                                    Data tech is a massive and intertwined ecosystem with a lot of money riding on it. It’s not just about compute or APIs, that’s a fairly small part.

                                                                    • What file formats does it support?
                                                                    • Does it run against S3/Azure/etc.?
                                                                    • How do I onboard my existing data lake?
                                                                    • How does it handle real-time vs batch?
                                                                    • Does it have some form of transactions?
                                                                    • Do I have to operate it myself or is there a Databricks-like option?
                                                                    • How do I integrate with data visualization systems like Tableau? (SQL via ODBC is the normal answer to this, which is why it’s so critical)
                                                                    • What statistical tools are at my disposal? (Give me an R or Python interface)
                                                                    • Can I do image processing? Video? Audio? Tensors?
                                                                    • What about machine learning? Does the compute system aid me in distributed model training?

                                                                    I could keep going. Giving it a JavaScript interface isn’t even leaning in to the right community. It’s a neat idea, for sure, but there are mountains of other things a data tech needs to provide just to be even remotely viable.

                                                                    1. 6

                                                                      Yeah this is kinda what I was going to write… I worked with “big data” from ~2009 to 2016. The storage systems, storage formats, computation frameworks, and the cluster manager / cloud itself are all tightly coupled.

                                                                      You can’t buy into a new computation technology without it affecting a whole lot of things elsewhere in the stack.

                                                                      It is probably important to mention my experience was at Google, which is a somewhat unique environment, but I think the “lock in” / ecosystem / framework problems are similar elsewhere. Also, I would bet that even at medium or small companies, an individual engineer can’t just “start using” something like differential dataflow. It’s a decision that would seem to involve an entire team.

                                                                      Ironically that is part of the reason I am working on https://www.oilshell.org/ – often the least common denominator between incompatible job schedulers or data formats is a shell script!

                                                                      Similarly, I suspect Rust would be a barrier in some places. Google uses C++ and the JVM for big data, and it seems like most companies use the JVM ecosystem (Spark and Hadoop).

                                                                      Data tech also can’t be done without operators / SREs, and they (rightly) tend to be more conservative about new tech than engineers. It’s not like downloading something and trying it out on your laptop.

                                                                      Another problem is probably a lack of understanding of how inefficient big data systems can be. I frequently refer to McSherry’s COST paper, but I don’t think most people/organizations care… Somehow they don’t get the difference between 4 hours and 4 minutes, or 100 machines and 10 machines. If people are imagining that real data systems are “optimized” in any sense, they’re in for a rude awakening :)

                                                                      1. 3

                                                                        I believe andy is referring to this paper, if anyone else is curious.

                                                                        (And if you weren’t let me know and I’ll read that one instead. :] )

                                                                        1. 3

                                                                          Yup that’s it. The key phrases are “parallelizing your overhead”, and the quote “You can have a second computer once you’ve shown you know how to use the first one.” :)

                                                                          https://www.usenix.org/system/files/conference/hotos15/hotos15-paper-mcsherry.pdf

                                                                          The details of the paper are about graph processing frameworks, which most people probably won’t relate to. But it applies to big data in general. It’s similar to experiences like this:

                                                                          https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html

                                                                          I’ve had similar experiences… 32 or 64 cores is a lot, and one good way to use them all is with a shell script. You run into fewer “parallelizing your overhead” problems. The usual suspects are (1) copying code to many machines (containers or huge statically linked binaries), (2) scheduler delay, and (3) getting data to many machines. You can do A LOT of work on one machine in the time it takes a typical cluster to say “hello” on 1000 machines…
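
                                                                          For illustration, the same idea in Python rather than shell (a sketch: one machine, all local cores, no scheduler delay, no code shipping; the chunking and worker are hypothetical stand-ins for real processing):

```python
from concurrent.futures import ProcessPoolExecutor

def work(chunk):
    # Stand-in for real per-chunk processing (parsing, filtering,
    # aggregation, ...).
    return sum(chunk)

def run_local(data, chunk_size=1000):
    """Fan chunks out across local cores and combine the results."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ProcessPoolExecutor() as pool:
        return sum(pool.map(work, chunks))

if __name__ == "__main__":
    print(run_local(list(range(100_000))))  # → 4999950000
```

                                                                          Nothing here is optimized; the win is simply never paying the cluster’s startup, copy, and scheduling costs in the first place.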

                                                                        2. 1

                                                                          That’s a compelling explanation. If differential dataflow is an improvement on only one component, perhaps that means that we’ll see those ideas in production once the next generation of big systems replaces the old?

                                                                          1. 2

                                                                            I think if the ideas are good, we’ll see them in production at some point or another… But sometimes it takes a few decades, like algebraic data types or garbage collection… I do think this kind of big data framework (a computation model) is a little bit more like a programming language than it is a “product” like AWS S3 or Lambda.

                                                                            That is, it’s hard to sell programming languages, and it’s hard to teach people how to use them!

                                                                            I feel like the post is missing a bunch of information: like what kinds of companies or people would you expect to use differential dataflow but are not? I am interested in new computation models, and I’ve heard of it, but I filed it in the category of “things I don’t need because I don’t work on big data anymore” or “things I can’t use unless the company I work for uses it” …

                                                                        3. 2

                                                                          The above is a great response, so to elaborate on one bit:

                                                                          What statistical tools are at my disposal? (Give me an R or Python interface)

                                                                          It’s important for engineers to be aware of how many non-engineers produce important constituent parts of the data ecosystem. When a new paper comes out with code, that code is likely to be in Python or R (and occasionally Julia, or so I’m hearing).

                                                                          One of the challenges behind using other great data science languages (e.g. Scala) is that there may be an ongoing and semi-permanent translation overhead for those things.

                                                                          1. 1

                                                                            all of the above + does it support tight security and data governance?

                                                                          1. 1

                                                                            Sadly, it’s not included on areweguiyet

                                                                            1. 2

                                                                              Listing computational efficiency as one of Python’s strengths seems counterintuitive, but it’s true. In common usage, Python is really just a high-level scriptable linker for optimized C/C++ packages and Java/Scala applications (e.g. Spark). Python will take you a very long way in terms of scaling.

                                                                              1. 1

                                                                                For some things that’s true, but I’ve found it fairly brittle. You really have to find a built-in library function that does exactly what you want. Performance tanks as soon as you have to do anything on the Python side. NumPy in particular is full of gotchas if you don’t think carefully about the execution model. Some numpy functions take a Python function as a parameter (key functions, custom filter functions, etc.), and there are conveniences like numpy.vectorize, but any of these kill your performance, because they’ll end up doing Python function calls in an inner loop, and Python function calls are very slow.
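
                                                                                To make the gotcha concrete, a small sketch (assuming NumPy is installed): both results below are identical, but only the first expression keeps the loop in C, while np.vectorize makes a Python function call per element, which is where the performance goes.

```python
import numpy as np

x = np.arange(100_000, dtype=np.float64)

def f(v):
    return v * 2.0 + 1.0

# Fast path: one vectorized expression, the loop runs in C.
fast = x * 2.0 + 1.0

# Slow path: same math, but np.vectorize calls f() once per element
# from the interpreter; the docs describe it as a convenience, not
# a performance tool.
slow = np.vectorize(f)(x)

assert np.array_equal(fast, slow)
```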

                                                                                1. 1

                                                                                  The numpy.vectorize example is a straw man. The docs are very clear that it’s not performant.

                                                                              1. 22

                                                                                What was 2020 the year of?

                                                                                (Not rhetorical, asking as calibration.)

                                                                                1. 38

                                                                                  The year of remote working?

                                                                                  1. 3

                                                                                    Seconded. I think it opened a lot of folks up to hiring remote, especially hiring managers / higher ups who may have been skeptical about remote work.

                                                                                    Hope remote hiring continues to flourish in 2021.

                                                                                  2. 6

                                                                                    On the front-end side I think it was another year of “no new flavour of the week”. From where I sit it looks like React is still the dominant UI framework, with Angular trucking along beside it. Those ships have been sailing pretty steadily for a while now. Soon people are going to have to come up with a new trope to make fun of those who work with Javascript ;)

                                                                                    1. 2

                                                                                      The fountain of flux has moved to grace the machine learning community with a new Fancy Framework™️ every week.

                                                                                    2. 3

                                                                                      Flutter.

                                                                                      1. 4

                                                                                        My prediction for Flutter in 2021 is that popularity will wane somewhat as it becomes apparent that Flutter for web and desktop aren’t going to live up to the hype.

                                                                                        1. 4

                                                                                          Anything that promises to be great everywhere is going to be average at everything and will have different drawbacks on each platform, making everything much more complicated.

                                                                                          Flutter isn’t the right approach to cross-platform development, in my opinion. The web is the best we’ve got, otherwise better to just target the platforms in a modular way.

                                                                                      1. 8

                                                                                        I really really want 2021 to be the year of Nix.

                                                                                        1. 8

                                                                                          See also: “backlash against Kubernetes”

                                                                                          1. 5

                                                                                            I struggle to see how building infrastructure on top of a library with howmanythousandslinesofcode of some Standard ML derivative and bash constitutes simplicity in orchestration.

                                                                                            NixOS is neat, but it’s not simple.

                                                                                            1. 10

                                                                                              Hot take: I’d prefer building systemd units that automatically get managed over orchestrating k8s pods any day. Nothing is simple, but Nix provides another approach to managing complexity well.

                                                                                              1. 4

                                                                                                I had an idea to rebuild what was once CoreOS’s fleetctl. Orchestration built on top of systemd, without anything more fancy on top.

                                                                                                1. 2

                                                                                                  I think people generally complain about systemd, among many reasons, because distros started with /etc/init.d, picked up systemd, and started using it in the same way. So “why do we need all this crap” makes sense when all the distro uses is the init system features. But systemd, for better or worse, is really a daemon manager, and the argument for systemd vs. sysvinit with NixOS or a tool like that is much stronger.

                                                                                              2. 1

                                                                                Not sure how something can be too simple. A tool either solves the problem or it doesn’t.

                                                                                              3. 4

                                                                                                And a blow against Ansible, Terraform, Salt etc. for free!

                                                                                                1. 1

                                                                                                  I’m a Nix n00b, but I don’t understand the comparison between Nix/NixOS and Kubernetes. To me, k8s = “distributed” vs Nix = “single host”. Did I miss something?

                                                                                                  1. 1

                                                                                                    NixOps and similar let you do a form of distributed Terraform-style configuration. It’s not automatic scheduling like Kubes, but gets you part of the way there at least. Scheduling would be cool, though. If NixOps could do it, it would beat Kubernetes at its own game.

                                                                                                    And that’s why Nix is a technology to watch. I think its current adoption is partially a backlash against Docker being inefficient - you can generate Docker images with docker-tools in nixpkgs now without ever installing Docker. The principles behind the tech are solid - per-application install prefixes are widely used on clusters, Nix takes it to the next level and uses them as part of the dataflow of building new derivations. In 2021, I bet we’ll see the tools and documentation start to mature in a way that will ultimately set the stage for it being able to do more of what Kubernetes does, but without YAML. (Seriously. I’d rather write Nix than YAML.)

                                                                                                    And Nix has done all this without Docker-style growth hacking or Kubernetes-style aggressive advertising, but nixpkgs is now creeping up on FreeBSD Ports in number of maintainers. The ecosystem has a pretty bright future, IMO.

                                                                                                2. 1

                                                                                                  Flakes and the cli update will finally make it make sense.

                                                                                                1. 4

                                                                                                  Go’s learning curve is amazing. I was writing real, useful code in Go in just a few hours and had successfully written a real, “large” application in a few days. It’s great.

                                                                                                  It’s a shame that it seems like all the momentum is with Rust now. It almost feels like there’s no point in learning anything other than Rust given the trajectory of things…(I’m sad because I prefer Go to Rust).

                                                                                                  1. 2

                                                                                    What momentum? I must be reading all the wrong sources; I don’t see Rust as on an upward trajectory myself (maybe I’m just biased). I don’t even see it on the TIOBE index, where Go is at #14. :D

                                                                                                    1. 2

                                                                                      Rust is at 26 right now. It has jumped into the top 20 within the past year and has bounced up and down since.

                                                                                                      The momentum I’m talking about is that Rust has been picked up by Microsoft as an acceptable language, Linus has intimated that he might be willing to allow Rust in the kernel, I’ve seen a lot of “why we switched Product X to Rust”, and so on. Go has been “stable” for over a decade now, so it’s got an enormous head start, but I feel like Rust is going to rapidly outpace it.

                                                                                                      Again, they’re both fine languages, I just feel like I’m gonna need to learn Rust for the sake of my future career.

                                                                                                      1. 3

                                                                                                        Rust will never have the “market adoption” of Go. Not to say it’s not worth learning! Or that it’s not the right tool for many jobs. It’s a great language.

                                                                                                    2. 2

                                                                                                      I think it depends on what you want to write and where you want to write it. A lot of big name companies are doing things in Go these days, it’s hardly fallen out of favor in that sense.

                                                                                                      1. 2

                                                                                                        It’s a mistake to conflate online “buzz” with real momentum. I’d argue that Go has less buzz precisely because it has lots of momentum. Rust doesn’t have much momentum right now, it has acceleration.

                                                                                                        1. 1

                                                                                                          Rust doesn’t have much momentum right now, it has acceleration.

                                                                                                          That’s a really elegant way to put it.

                                                                                                          I do feel like every job posting/offer I come across has C/C++/Python/Rust listed, and not Go. I realize that there’s very likely selection bias at play given the kind of jobs I look at.

                                                                                                          EDIT: Okay, so yeah, perception vs reality. I went to Stack Overflow and did a job search. 109 for Golang, 18 for Rust (and I’m sure there’s overlap there).

                                                                                                        2. 1

                                                                                                          Go’s learning curve is smooth, but it really feels like … work. Even though I only used it for a few months two years ago, that was the lasting impression Go made on me: it feels like a tool for getting stuff done. The connotation isn’t negative, but I don’t see myself using it for fun side projects. I’ve wanted to spend more time learning Go since then, but for now I’m waiting for Go 2 to drop, because my knowledge is dated (I was never exposed to Go modules back then, for example). It feels like a nice tool to have, but I won’t have fun creating with it.

                                                                                                          On the momentum side, Go still has a lot, and Rust has gained a fair share of its own, but they’re in really different fields, for different reasons, and for teams/companies with different needs and approaches, imho. Nothing to be sad about; your hammer doesn’t have to fit every nail.

                                                                                                        1. 1

                                                                                                          Alright, let’s do it for “micro services”: go!

                                                                                                          1. 1

                                                                                                            I’m eager to do it for such buzzwords as “email” and “RAM” and “keyboard” which would be more in line with the author’s intent with this post.