Threads for mbenbernard

  1. 1

    Thanks for this post. I always find it valuable to hear about someone else’s learning process, successes and failures, and what they found rewarding.

    1. 1

      My pleasure! I had as much fun writing it as you had reading it.

    1. 3

As a programmer, I’m not naturally business-oriented. So when I need to deviate substantially from coding for a long time, I feel that something is missing. For example, when I was on a roll of speaking with several potential customers a day, it required careful preparation, note reviews, and follow-ups.

As a programmer, speaking with customers is one of my favorite things. At one point in my career I was the go-to guy for dealing with difficult customer issues: anything from fixing highly complex problems for high-value customers to answering questions about the product. I was essentially the “last resort” of our support team, despite officially holding the title of software engineer. Many of our high-profile customers had my personal cell phone number.

      I don’t think I’ve enjoyed any part of my career nearly as much as that, before or since. I found the work so engaging that I never needed to take notes. And I’m by no means someone with a fantastic memory who never needs notes. All the customers and their problems were always on my mind. I just loved the role.

      Perhaps my situation is different from what @mbenbernard is talking about. My work involved a fair bit of debugging, which I always enjoy when programming. But my role was as much business-to-business “politics” as it was debugging and support, and I wrote hardly any code myself.

      Regardless, I don’t think it’s fair to generalize programmers as business-dumb. And I certainly don’t think I’m some bizarre exception. A dear friend of mine is endlessly fascinated with business and incredibly passionate about his role in management, despite starting as—and continuing to be—an extremely talented programmer.

      1. 4

        I don’t think it’s fair to generalize programmers as business-dumb

        and it perpetuates a stereotype that has the risk of becoming aspirational. Not interacting with customers is not a badge of honor.

        1. 2

          I agree with you. Have a look at my comment above - I’ve never said that programmers are business-dumb.

        2. 2

          I agree with the vast majority of what you’ve said. At my previous job, I was also pretty good at solving complex and difficult bugs. Beyond the actual debugging, this involved communicating quite a bit with customers, either via email or by phone. I even visited a couple of customers on-site, which was a lot of fun! I don’t have any problem with that - this is part of a software engineer’s job. Just like you, I didn’t need to take a lot of notes in this context, except maybe for follow-ups with customer support agents and my manager/boss.

          Now, put that into the context of a startup, especially at the beginning when you still don’t have any customers and you’re trying to figure everything out. This is pretty much what Lean Startup is about - developing a new product in a world of uncertainty. You don’t know what you need to build, and you don’t know who your customer is. You have a rough idea of what problem(s) a particular category of people may have, but you still need to speak with actual people to validate your hypotheses. So you contact maybe 100 people, among which only about 30% reply to you. You ask them if they’re available for a call, and about 50% mysteriously vanish (they have their own reasons). You ultimately set up maybe 10-20 phone calls of 20-30 mins each to discuss. In this context, it’s useful to take notes, because memorizing everything, and all subtleties, is next to impossible. After those calls, you follow up with people and thank them for their time.
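          To put rough numbers on that funnel, here’s the arithmetic spelled out (a quick Python sketch; the percentages are just my ballpark figures from above):

          contacted = 100
          replied = contacted * 0.30   # only ~30% ever reply
          calls = replied * 0.50       # ~50% of those mysteriously vanish
          print(calls)                 # ~15, i.e. the 10-20 calls mentioned above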

          In my mind, these are purely business-oriented tasks, which I happen to enjoy doing. Otherwise, I wouldn’t have done them. However, my first love is programming. I love designing and creating new things. I’m fundamentally a problem solver. So if you give me a choice between business-oriented tasks and programming, then I’ll choose the latter anytime. If it weren’t the case, then I would be in the wrong field.

          In my post, I admit that my wording sounded like a generalization, but I was speaking mainly from my own perspective. Besides, I happen to know a few programmers, and the vast majority prefer writing code to doing anything else. They may not be naturally business-oriented, but they’re certainly not “business-dumb” (whatever that means). It’s just a matter of personal traits and preferences.

        1. 2

          OP, have you taken a look at pytype? we are solving a similar problem, of inferring types without annotations, and would love to hear from you if you’re interested in contributing, or even just to compare notes on our various approaches.

          there’s also a mailing list for people working on various python type checkers that you might be interested in.

          1. 3

            Thanks for your feedback and the links, Martin. This is much appreciated.

            I definitely know pytype - I admire your work a lot!

            For the moment, my project is experimental and closed source. I still don’t know whether it will be open source (and under which license), or whether it will stay closed source and be shipped as part of a commercial product. My tool isn’t so much for type checking - although it can do that - but is instead made to infer various things that current type checkers aren’t able to infer. It will be part of a documentation tool that I’ve been working on in parallel.

            I sincerely appreciate your offer to compare notes, but since I may take a commercial route with my tool, I don’t feel comfortable sharing how it works at the moment.

            Anyway, many thanks for your offer. I’ll keep you posted when I release something.

            1. 2

              okay, no problem, i can understand not wanting to release details early if the product is intended to be commercial!

          1. 18

            Be careful! Side projects can get you into legal trouble, if you’re a salaried employee of a tech company. Most companies (at least in the US) will have you sign an “Intellectual Property Agreement” or “Proprietary Rights Agreement” when you accept the position. This usually signs over to them the rights to any inventions you come up with, no matter when or where, during the term of your employment, as long as those inventions are related to the company’s field of business.

            • It doesn’t matter if you did the work on your own computer.
            • It doesn’t matter if you did the work on evenings / weekends.
            • It doesn’t matter if it’s open source.
            • It doesn’t matter if you think your project isn’t related to your company’s business. It matters what your employer thinks, and in the worst case what a court thinks.

            People can and have gotten in trouble this way. I got hit by it once, not with legal trouble fortunately, but losing a verbally-promised payment for my app. (And looking at it the opposite way, some companies have been hurt by an employee devoting most of their energies to a side project, and then quitting to turn the side project into a startup, or taking it to a competitor.)

            The workaround is to talk with your employer before you start the project and get them to make an exemption for it; I discuss this in more detail in the blog post I linked above. (But of course I am not providing actual legal advice…)

            1. 24

              Fortunately, in most western jurisdictions outside the US, such oppressive employment contract provisions (if attempted) would be unenforceable.

              1. 2

                Thanks for this wonderful piece of advice. You’re absolutely right - employment contracts should be read very carefully to avoid legal trouble. I gave up on some of my earlier projects partly for this reason.

              1. 4

                I found that the more I dig into complex, obscure, and low-level topics, the easier it gets to tackle new difficult ones. For example, after having explored CPython’s codebase extensively, I feel much more at ease exploring codebases of equal or greater complexity.

                I don’t know. I’ve actually never seen an open source project (with more than 2-3 regular committers and a sizable userbase, say more than 10 users) that exhibited the atrocities of closed-source commercial code that one guy wrote years ago before vanishing. Working with that kind of code is usually like digging out the oldest snapshot of $language from 15 years ago and spending a few hours just trying to get it to build; at least the open source project has an INSTALL file, so you can poke at it while it’s running. So yeah, that is one side of “complexity and obscurity”, but that’s my experience from most past employers and customers.

                1. 4

                  I agree with you that proprietary source code is sometimes (often?) harder to understand than open source. IMO, the lack of documentation/comments, the lack of proper organization, and overengineering contribute significantly to the problem.

                  Perhaps my example could have illustrated my point a little better. For instance, I’ve been working on a compiler/interpreter and a static type checker for the past couple of years. I started from nothing, without any knowledge of those things. At first, it was scary and I found the concepts a bit overwhelming. But the more I dug into it, the easier it was to understand. Incidentally, I found that the “fear” aspect has almost completely vanished whenever I dive into complex and obscure topics. This is what I meant.

                  1. 2

                    I got your point, but given a modicum of competence I think a well-documented “hard topic” codebase like a compiler is still a lot easier to navigate and understand than a shitshow of boring enterprise code :)

                1. 1

                  It makes me think of my Lotus Notes days… There used to be a website reporting all the cryptic errors returned by Notes.

                  1. 2

                    To write good docs, you need to walk a fine line between conciseness, completeness, and clarity. Often, only one of these factors prevails, at the expense of the others.

                    • Docs that are too long (or include too many details) = bad for conciseness and clarity.
                    • Docs that are too short (or include not enough details) = bad for completeness and clarity.
                    • Docs that are written poorly = bad for clarity.
                    1. 3

                      Overall, I find that the premise of the post is true.

                      Programs can be unreadable for several reasons (lousy formatting, bad or no naming conventions, etc.). This makes introducing bugs very easy.

                      Over-engineered code is also a problem. The code might seem well designed at first sight, but you quickly realize that it’s a huge mess and that it’s super hard to understand. Again, introducing a bug in this case is easy.

                      1. 1

                        Great article. This is something I think way too few people know. I used to be in that business myself, even though it was very local, so only nationwide law applied (I don’t live in the US). It was really insightful.

                        Funny side note: robots.txt was not binding, but calls or emails from people telling you to stop were. That’s how weird laws can be. Again, not the US. ;)

                        Of course, one should still respect it to avoid getting a call. And another hint to avoid trouble: sell the fact that you are crawling them! Backlinks, free visitors, etc.

                        And one more thing: crawling, scraping, filling out forms, sending POST vs. GET requests, etc. can all be viewed differently by the law, and a lot of common terms can mean different things there. So ignore their technical meanings when dealing with the law, and make sure to learn what those things mean to lawyers. Their legal definitions can be funny, covering things you never even considered. That’s, by the way, something one should do in general: question each and every technical term!

                        Here is the only thing I am not so sure about:

                        “It’s the same as what my browser already does! Scraping a site is not technically different from using a web browser. I could gather data manually, anyway!”

                        False. Terms of Service (ToS) often contain clauses that prohibit crawling/scraping/harvesting and automated uses of their associated services. You’re legally bound by those terms; it doesn’t matter that you could get that data manually.

                        AFAIK ToS only apply if you register/explicitly agree. Is that true?

                        Otherwise it would be really weird. One would essentially be creating a law, rather than terms. Or in other words, you could publish a link to a page whose ToS says you are not allowed to visit it, or that you now have to pay.

                        1. 1

                          Hey, thanks for the great feedback! :)

                          Sell the fact that you are crawling them! Backlinks, free visitors, etc.

                          You’re absolutely right, and I should have mentioned it in my post!

                          AFAIK ToS only apply if you register/explicitly agree. Is that true?

                          In some cases, courts ruled that since the defendants were logically aware of the ToS (even though they hadn’t explicitly agreed to them), they were enforceable. Take a look at bullets #7 and #8 of the section “The typical counterarguments brought by people” in my post. Whether or not ToS are enforceable seems to depend on the context.

                          Or in other words you could make a link and there having a ToS saying you are not allowed to visit or that you will have to pay now.

                          True, and some people successfully did it. Take a look at Internet Archive v. Suzanne Shell.

                        1. 1

                          Great write-up! I, for one, can’t wait (sense my sarcasm here) until we have cookie/evercookie/IP-based TOS CAPTCHAs to prove the TOS was agreed to before proceeding. And not one of these JavaScript-based things… No! That’d be too easy to ignore or work around. I’m talking about a kill-bots-style intrusion that happens way down low in the stack, can’t be circumvented, and must be completed to unlock the content. At the first sign of bot-like behavior from a given IP, it’s another TOS CAPTCHA.

                          1. 1

                            Thanks for your kind remarks.

                            And yep, what you suggest would be super effective! hehe :)

                          1. 4

                            I’m curious if anyone is knowledgeable about how ad blockers and software like Brave browser fit in with the terms of use situation. Seems like if you are crawling and scraping for your own personal use, and not re-publishing, you might be able to craft your crawler/scraper to adhere as closely to TOU as ad blocking does.

                            1. 3

                              Brave is in a very precarious spot, I think, because they’re taking the content, remixing it, and showing it. That’s close to what Aereo was doing; actually, it’s probably more infringing than Aereo was. Maybe you can do it for yourself, personally, but it’s treacherous ground for a business model.

                              1. 2

                                I’m curious if anyone is knowledgeable about how ad blockers and software like Brave browser fit in with the terms of use situation.

                                Personally, I don’t know. It’s a different topic.

                                But if you consider that there are still a lot of grey areas in law about scraping/crawling, there are probably also a lot of grey areas about Ad Blockers. I’ve just googled it and I found that some German publishers sued Adblock Plus in the past. Not sure what happened to the other ad blockers.

                                Seems like if you are crawling and scraping for your own personal use, and not re-publishing, you might be able to craft your crawler/scraper to adhere as closely to TOU as ad blocking does.

                                I don’t think so, because ToS/ToU often prohibit automated data collection.

                                1. 2

                                  But if you consider that there are still a lot of grey areas in law about scraping/crawling, there are probably also a lot of grey areas about Ad Blockers. I’ve just googled it and I found that some German publishers sued Adblock Plus in the past. Not sure what happened to the other ad blockers.

                                  AdBlock Plus has an “acceptable ads” product, which charges larger publishers a fee to be included on that list.

                                  https://adblockplus.org/acceptable-ads#revenue

                                  Springer sued AdBlock Plus; ad blocking itself was deemed legal, but “acceptable ads” was not.

                                2. 1

                                  There have been plenty of attempts by publishers to sue ad blockers with arguments along those lines. As far as I’m aware, they always lost.

                                  1. 1

                                    Apparently, Google and other big names attempted to sue Adblock Plus. But I don’t know how it turned out either.

                                    It would be interesting to do a bit more research on this topic. What we’d find out would probably be super interesting :)

                                1. 1

                                  Python might not be the fastest, I recognize. But at the time, I learned this language for very specific reasons:

                                  • It has a very simple syntax. Everything is stripped down, especially when compared to Java or C# (e.g. no need for braces, interfaces, etc.).
                                  • It’s elegant (e.g. comprehensions; see the sketch after this list).
                                  • It has a lot of libraries/packages, so you can do pretty much everything you want.
                                  • It’s truly cross-platform.
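                                  For instance, here’s a trivial sketch of the kind of elegance I mean (my own example):

                                  words = ["Python", "Go", "Java", "JavaScript"]

                                  # The explicit loop takes four lines...
                                  short = []
                                  for w in words:
                                      if len(w) <= 4:
                                          short.append(w.lower())

                                  # ...while a comprehension says the same thing in one readable line.
                                  short = [w.lower() for w in words if len(w) <= 4]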

                                  So for me, the “speed” factor was not important at all in my decision.

                                  And I tend to agree with the point of the author; most projects are not performance critical, so arguing that Python is slow is completely irrelevant in those situations.

                                  1. 0

                                    The author says that one way to convince people to try Erlang is to hide its weaknesses (e.g. its syntax) and focus on its strengths (e.g. its performance). He gives MongoDB as an example, essentially saying that the MongoDB folks originally went to meetups and showed people benchmarks, instead of dwelling on its serious instability.

                                    I have mixed feelings about this. The problem is that, despite its weaknesses, MongoDB presented far more concrete advantages to people, right from the start. Those advantages were obvious. But the same may not apply to Erlang.

                                    1. 1

                                      I’m not an astrophysicist, but I would imagine any civilization capable of generating the amount of energy needed for the observed power of FRBs would also have come up with something better than solar sails… unless there really isn’t anything better than solar sails, which, while they’re awesome, would make me a little sad.

                                      1. 2

                                        Alternatively, they prefer to keep control of their probes centralised rather than allowing them to be independent; that is, it could be a political rather than technical design choice.

                                        1. 2

                                          I agree. I’d also expect advanced alien civilizations to use… advanced technology.

                                          But what if other alien civilizations aren’t much more advanced than us, or are just a little bit more advanced than us? Maybe, just like us, they’re trying things out, and they don’t really know what they’re doing.

                                          1. 3

                                            Maybe they’ve got much more advanced technology, but this is their equivalent of yacht racing.

                                        1. 4

                                          I’m not a fan of “open floor spaces” either. I agree that they’re a productivity killer, most of the time.

                                          I remember reading an article many years ago about the private offices at Fog Creek, and in my mind, it made complete sense. But not everybody (read: not every company) sees it that way, unfortunately.

                                          1. 2

                                            I’m mainly a Python programmer these days. I know that there are a few irritating things about it, and this might be why some people decided to switch.

                                            I’ve never worked seriously with Go. I’ve just created a few simple programs here and there. So I can’t really speak about it.

                                             I’m curious: for those of you who code in Go daily, and who have coded in Python for several years, what are the real advantages of Go that justify a switch?

                                            1. 2

                                              off the top of my head; type safety, real concurrency, some machine density / performance gains.

                                              more than anything i just really like the choices the language designers have made, it leads to clearer programs that are easier to maintain.

                                              it did not replace python as my “batteries included” scripting language. go is great for medium to large projects but struggles a bit for very quick tasks still.

                                            1. 2

                                              The article talks about Python crashing with an out-of-memory error while crawling a web page. The author presents various fixes to his/her Python code.

                                              I think really, though, that those aren’t fixes. Those are workarounds to the fundamentally broken nature of memory allocation on Linux. The OOM killer idea is just…I mean, I know that I love having random processes killed at random intervals because some other process was told there was more memory available than there was.

                                              (Okay, so yeah, the author of TFA shouldn’t have relied on an out-of-memory condition to signal when to stop crawling, but saying that wouldn’t have given me an opportunity to bitch about Linux’s allocator…)

                                              1. 2

                                                Thanks for your feedback :)

                                                 I think really, though, that those aren’t fixes.

                                                An out-of-memory error generally means that there’s either:

                                                1. Something broken in your code (or in something that your code depends on).
                                                2. A lack of resources on the system.

                                                In either case, as the programmer, you’re at fault. And fixes or workarounds will be needed.
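                                                 For illustration, here’s roughly what “dealing with it” can look like in Python (my own sketch; note that with Linux’s overcommit enabled, the exception may never fire and the process may simply be OOM-killed instead):

                                                 try:
                                                     buf = bytearray(10 ** 12)   # deliberately oversized allocation
                                                 except MemoryError:
                                                     buf = bytearray(10 ** 6)    # degrade gracefully: smaller buffer, spill to disk, etc.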

                                                I’m not sure that I agree with you on the premise that memory allocation is broken on Linux. No matter which OS you use, your system resources are limited in some way. Aren’t they?

                                                But you know what? I’m fairly new to Linux, so I’m really open to different ideas and solutions. What do you think would be a better solution than OOM killer?

                                                Finally, I agree with you that crashes are not 100% fun to deal with.

                                                1. 1

                                                  An out-of-memory error generally means that there’s either:

                                                  1. Something broken in your code (or in something that your code depends on).
                                                  2. A lack of resources on the system.

                                                  In either case, as the programmer, you’re at fault. And fixes or workarounds will be needed.

                                                  Also, as an additional reply: the way Linux does it, sometimes it’s not your fault. Linux’s default allocation strategy lies to you: it tells you that the resources you reserved were in fact successfully reserved when they’re not.

                                                  1. 1

                                                    I’ve read your reply below, and given that the OOM killer can sometimes kill the wrong process, you’re right in a way.

                                                    However, could we say that it’s the programmer’s fault for not planning enough resources on the system? I tend to think so.

                                                  2. 1

                                                    http://www.win.tue.nl/~aeb/linux/lk/lk-9.html#ss9.6

                                                    That sheds some light on how Linux caters to memory-greedy processes.

                                                    @lorddimwit, is this what you called a hassle? It can certainly be, but it’s not impossibly difficult to enforce a hard limit.

                                                    In practice there seem to be enough crappy apps out there that the overcommit system was developed for good reason, but YMMV as always. I have had to tune this for servers, but not often, and never for a desktop.
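                                                    For what it’s worth, here’s one way to enforce such a hard limit from Python (a sketch, Unix-only; the 1 GiB figure is arbitrary):

                                                    import resource

                                                    # Cap this process's address space at 1 GiB. Allocations beyond the
                                                    # cap fail immediately (Python raises MemoryError) instead of
                                                    # inviting the OOM killer much later.
                                                    one_gib = 1024 ** 3
                                                    resource.setrlimit(resource.RLIMIT_AS, (one_gib, one_gib))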

                                                    1. 1

                                                      I know, that’s why I parenthetically clarified that really I was bitching about the OOM killer, and not something else. :)

                                                      Linux should, IMHO, simply fail at allocation time if memory is exhausted, returning NULL from malloc. Right now, what happens is that memory allocation essentially never fails and then, at some random point in the future if resources are actually exhausted, a random program is killed.

                                                      (Okay, so it’s not random, it’s the one with the highest OOM score, but still.)

                                                      The problem is that the process that’s killed is decoupled from the most recent allocation. This means that long-running processes with no bugs can just be killed at random times because some other program allocated too much memory. You can fiddle with OOM score weights and stuff, but at the end of the day, the consequences are the same: a random process is going to get killed on memory exhaustion, rather than just have the allocation fail.

                                                      The most logical solution, to me, is to simply return NULL on allocation failure and let the program deal with it in a way that makes sense (try again with a smaller allocation, report to the user that memory’s exhausted, whatever). Instead, it’s impossible to detect when a memory allocation from malloc isn’t really going to be available.

                                                      It’s possible to disable the OOM killer (or at least it used to be), but it’s a hassle.

                                                      1. 2

                                                        Okay, I see what you mean.

                                                        I don’t claim to know all the details of the OOM killer. But from reading its source code, I understand that it will not randomly kill processes.

                                                        Instead, it seems to calculate an “OOM badness score”, mainly based on the total amount of memory used by each process. So the process with the highest score (i.e. the one using the most memory) is likely to be killed first, but not necessarily; it depends on other factors as well.
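                                                        On Linux, you can actually peek at the kernel’s current score for a process (a quick, Linux-specific sketch):

                                                        # Higher score = more likely to be chosen by the OOM killer.
                                                        with open("/proc/self/oom_score") as f:
                                                            print(f.read().strip())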

                                                        In my specific scenario, it killed the right process. But you may be right; there are probably other situations where the wrong process will be killed.

                                                        Have you ever experienced it?

                                                        1. 4

                                                          Oh yeah, all the time. There was a period of time where the OOM killer was sarcastically called “the Postgres killer”. Because of the way PostgreSQL managed buffers, it would almost always have the highest OOM score. They fixed it by allocating buffers differently, but it sucks when your production DB is randomly killed out from underneath you when it’s doing nothing wrong.

                                                          Again, you can adjust weights and such so that different processes would be more likely to be selected based on their weighted OOM score, but it’s an imperfect solution.

                                                          1. 4

                                                            Or the X killer. If you had 100 windows running, the X server was probably using the most memory. Kernel kills that, and suddenly lots of memory is free…

                                                            1. 3

                                                              There was a certain era of the ‘90s where people ran dual-purpose Unix server/workstations, where that might not even have been the wrong choice. If you’ve got an X session running on the same machine that runs the ecommerce database and website, better to take down the X session…

                                                              1. 3

                                                                Says the guy who was never running his company’s vital app in an xterm. :)

                                                        2. 1

                                                          The reason Linux does that is because of an idiom of Unix programming—the program will allocate a huge block of memory but only use a portion of it. Because Unix in general has used paging systems for … oh … 30 years or so, a large block will take up address space but no actual RAM until it’s actually used. Try running this:

                                                          #include <stdlib.h>
                                                          int main(void) {
                                                            for (size_t i = 0; i < 1000000; i++) {
                                                              void *p = malloc(1024 * 1024);   /* address space reserved, never touched */
                                                            }
                                                          }

                                                          with

                                                          #include <stdlib.h>
                                                          #include <string.h>
                                                          int main(void) {
                                                            for (size_t i = 0; i < 1000000; i++) {
                                                              void *p = malloc(1024 * 1024);
                                                              if (p) memset(p, 127, 1024 * 1024);   /* touching every page uses real RAM */
                                                            }
                                                          }

                                                          One will run rather quickly, and one will probably bring your system to a crawl (especially if it’s a 32-bit system).

                                                          You can change this behavior of Linux (the term you want is “overcommit”), but you might be surprised at what fails when it’s disabled.

                                                          1. 3

                                                            I understand the reasoning behind it, and I still think it’s problematic. It’s a distinct issue from leaving pages unmapped until they’re used. I prefer determinism to convenience. :)

                                                            Many Unices, both old (e.g. Solaris) and new (e.g. FreeBSD), keep a count of the number of pages of swap that would be needed if everyone suddenly called in their loans, and return a failed allocation if the requested pages would push the total outstanding page debt past that number. That’s the way I’d prefer it, and it still works well with virtual memory and is performant and all that good stuff. Memory is still left unmapped until it’s touched, just as before. All that’s different is that a counter is incremented.

                                                            The problem of course is that if every last page of memory is used, it wouldn’t be possible to start up a shell and fix things, in theory. Linux “solved” this by killing a random process. Some of the Unices solved it the right way, by keeping a small amount of memory free and exclusively for use by the superuser, so that root could log in and fix things.

                                                            (Of course that fails if the runaway process is running as root, but that’s a failure of system administration, not memory allocation. ;) )

                                                            I know that Solaris would continue running with 100% memory utilization and things would fail the right way (that is, by returning an error code/NULL) when called, rather than killing off some random, possibly important, process.

                                                            EDIT: FreeBSD does support memory overcommit now, too, optionally, enabled via sysctl.

                                                            1. 3

                                                              I’m always amazed by the level of expertise and knowledge that people have online.

                                                              Thanks for sharing your input, lorddimwit! :)

                                                    1. 2

                                                      I appreciate mbenbernard posting here. This is a small example of a really broad problem: if you do networking with untrusted servers (or even trusted servers), you have to ensure your memory usage is bounded and assume the thing on the other side can misbehave.

                                                      This reminds me of someone seeing an OOM error because random HTTP data was interpreted as a 32-bit number indicating an allocation size: https://rachelbythebay.com/w/2016/02/21/malloc/

                                                      There are multiple approaches to getting a good level of reliability. The dead simple one is using processes and relying on Linux. It would have been trivial to write an app that does this:

                                                      timeout 30s curl foo.com | readlimit 200kb | ....
                                                      

                                                      While that sort of setup is pretty fun, it’s high overhead. In practice I take the careful coding approach. In Go I generally use a combination of the streaming approach you mentioned (the default in Go), a hard limit on I/O (https://golang.org/pkg/io/#LimitReader), and deadlines on read and write operations (https://golang.org/pkg/net/#TCPConn.SetDeadline).
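                                                      For comparison, here’s a rough Python analogue of that combination (my sketch; the byte cap and URL are arbitrary, and the requests calls are the library’s standard streaming API):

                                                      import requests

                                                      MAX_BYTES = 200 * 1024

                                                      # Stream the body instead of buffering it all at once, cap the
                                                      # total bytes read, and bound each network operation with a timeout.
                                                      resp = requests.get("http://foo.com", stream=True, timeout=30)
                                                      body = b""
                                                      for chunk in resp.iter_content(chunk_size=65536):
                                                          body += chunk
                                                          if len(body) >= MAX_BYTES:
                                                              break
                                                      resp.close()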

                                                      [edit]

                                                      I made up the readlimit program. Writing a program which copies up to N bytes from stdin and writes to stdout would be pretty straightforward.
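                                                      A minimal sketch of what that hypothetical readlimit could look like in Python (readlimit doesn’t actually exist; the 200 KB cap mirrors the pipeline above):

                                                      import sys

                                                      LIMIT = 200 * 1024   # 200 KB
                                                      remaining = LIMIT
                                                      # Copy at most LIMIT bytes from stdin to stdout, then stop.
                                                      while remaining > 0:
                                                          chunk = sys.stdin.buffer.read(min(65536, remaining))
                                                          if not chunk:
                                                              break
                                                          sys.stdout.buffer.write(chunk)
                                                          remaining -= len(chunk)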

                                                      1. 2

                                                        Hey thanks for your comments, Shane!

                                                        One must always assume that APIs might break; I discovered that the hard way with requests.

                                                        I’m curious: when you say that the streaming approach is “the default in Go”, do you mean that Go’s standard HTTP library handles streaming automatically (transparently)?