1. 36

    I would take a significant pay cut before I went back to working in an office.

    I wake up now, walk my son to school, come back home and play with my younger son and have coffee with my wife. I go upstairs to my office when the time comes and get to work. When my son gets home from school, I’m there to greet him and give him a hug. When my wife needs help (un)loading the car, I can run down and help her. On my breaks I can go take a shower or eat lunch at home (much cheaper than buying lunch), etc, etc.

    I combat the lack of socialization by going out anywhere between one and four times a month with a close friend of mine who also works from home; we set up shop in a coffee shop/bar/restaurant and work the day together (though to be honest, we do a lot of socializing, since it was pent up).

    This has worked for me for about a decade now.

    1. 3

      It’s like you’ve been watching me…

      But seriously. The extra time I get to have with my kids working from home is just the best thing ever.

      1. 1

        Love that idea with your friend. I’m gonna try it too :)

        1. 18

          Well okay, but you’ll have to get your own friend.

          1. 1

            😆😂

        2. 1

          Ditto for me since I started remoting about a year back.

          For socialising, I make sure I get two or three slots of non-contact sport a week.

          Out of the things I wish I had done sooner, this is pretty much at the top.

        1. 3

          Anyone with a 4-digit PIN may have noticed that banking is in the 90s. The 1890s.

          1. 1

            What’s the problem with a 4-digit PIN?

            1. 2

              I don’t have a link handy, but the major (current?) implementation is ass. The gist of it is that with knowledge of the initial PIN one can recover the current PIN, so changing your PIN for security on those systems is kinda moot.

              I just hope most banks have moved away from that.

          1. 28

            Other facts include:

            I don’t care about how fast your app runs in Electron. The Space Shuttle goes fast too, but the amount of energy that goes into launching it isn’t trivial.

            1. 3

              IMHO the article is a bit meagre on the facts, so thanks for pointing them out. I also really like the part about the Space Shuttle; I will use it in future discussions about Electron :)

              1. 2

                The native analogy to this isn’t recompiling a program that uses the native GUI; it’s recompiling the native GUI libraries.

                My understanding is that Qt/Gnome are both pretty tough to get up and running.

                That said, it would be extra awesome to make this stuff easier. Namely, not having to ship a copy of Electron with every app would be great. Where’s the JVM for Electron?

                1. 8

                  My understanding is that Qt/Gnome are both pretty tough to get up and running.

                  Nope. Gnome is a mess, but Qt is beautifully organized and very easy to clone and build.

                  1. 0

                    My limited experience trying to release a new version of a Qt code base says otherwise. Moving an app from one version of Qt to another is non-trivial IME.

                    1. 2

                      I have done it. Coincidentally, it is also a gooey client for postgres (https://github.com/pgXplorer/pgXplorer). Moved it from 4.x to 5.x. It was relatively ‘easy’. From 5.x to 5.y involved little to no change in certain versions.

                      My opinion is that the Qt documentation is excellent overall, but the internals can send you on a wild goose chase. For example, a text field or a dropdown in a table header (lel).

                      1. 1

                        You’re comparing apples to oranges, though.

                        Initial setup and day-to-day use is fundamentally different than upgrading a large code base to an incompatible new major version.

                        Electron hasn’t been around long enough to even make that comparison. Even with Qt, there have only been two major non-backwards-compatible releases in the past 16 years.

                        1. 1

                          I don’t doubt it. I’m hardly an expert, my one set of experiences around this was trying to recompile the Zeal open source doc set viewer (like Dash but for Linux) for a newer Qt version, because if you built it from source on modern Linux versions it wouldn’t display properly.

                          It was un-fun :)

                1. 26

                  The thing I don’t see people say enough (or at all) when discussing this Juicero fail:

                  Routinely drinking fruit juices is not, in fact, healthy!

                  Doesn’t matter if squeezed or pasteurized, there’s just too much sugar! And no amount of vitamins is going to offset the damage.

                  1. 15

                    My favorite juicer not only provides the freshest, most nutritious product, but is also the cheapest and requires less cleaning than the Juicero. The one drawback is that it requires owning at least a partial set, but at least each part is individually small and unobtrusive. I won’t tell you how to obtain it, but more than likely you’re already carrying a set in your mouth.

                    1. 8

                      I own a set of these, too, and, though the cleaning regimen is straightforward, the maintenance costs are large enough that there’s an entire arbitrage industry around it. And, yes, the initial product offering is free (and the first part refresh, though that happens pretty quickly given the total equipment lifetime), but replacing parts eventually becomes quite expensive, and requires significant downtime.

                      Don’t get me wrong – the convenience factor is very high with this product. I’m just saying that we shouldn’t downplay the (potentially significant) costs.

                    2. 7

                      This is a tad incorrect. Whole fruit, pulp and all, is actually quite healthy. I can’t quite find the link, but there was research done with test groups consuming water, sugared water, freshly squeezed fruit juice, and sliced raw fruit. The healthiest outcomes were water (lol) and sliced raw fruit.

                      1. 16

                        Juices are considered unhealthy in comparison to the actual fruit simply because of the sheer amount of it: a glass of orange juice contains juice from about 4 oranges, which translates to about a full daily dose of sugar. And you’re not going to chew through 4 oranges each time you’re feeling thirsty :-)
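
                        Rough numbers, for scale (my ballpark figures, not the parent’s): a medium orange has on the order of 9 g of sugar, so 4 oranges ≈ 36 g of sugar per glass, which is already past the ~25 g/day of free sugars the WHO suggests staying under.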

                        1. 4

                          you’re not going to chew through 4 oranges each time you’re feeling thirsty

                          Been there, done that.

                        2. 4

                          What @isagalaev said, and adding to it: the “health benefit” is that fibers slow down the absorption of carbohydrates.

                          1. 1

                            I thought I’d read that the mastication and digestion of intact whole fruit requires more energy and delivers a greater health benefit than the whole fruit’s ingredients alone.

                        1. 1

                          This is an amazing engineering effort. Congratulations on the release!

                          1. 7

                            Hands down one of the funniest things I have read in months!

                            1. 13

                              A small niggle: there is a pretty big difference between the full-text search found in PostgreSQL and the full-text search found in a real IR engine like Lucene (which powers Elasticsearch and Solr). In particular, Lucene will rank results using corpus frequencies, via a weighting called tf-idf. PostgreSQL won’t do that for you (last time I checked). Taking corpus frequencies into account is important because it lets you build your ranking function based on how important a term is in the entire corpus.

                              To be fair, I don’t think I’ve ever seen anyone directly compare PostgreSQL’s ranking with Lucene’s (or any other search engine), so I don’t know where the “it works well enough for X% of use cases” line falls.

                              This is otherwise a nice resource. Thanks!
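
                              For concreteness, here is roughly what the PostgreSQL side looks like (a minimal sketch using the pg gem; the table and column names are made up). ts_rank scores on statistics local to each document, with no corpus-wide weighting:

                              require 'pg' # gem install pg

                              conn = PG.connect(dbname: 'mydb') # hypothetical database

                              # Rank matches with ts_rank: per-document term frequencies only.
                              rows = conn.exec_params(<<~SQL, ['full text search'])
                                SELECT title,
                                       ts_rank(to_tsvector('english', body),
                                               plainto_tsquery('english', $1)) AS rank
                                FROM docs
                                WHERE to_tsvector('english', body) @@ plainto_tsquery('english', $1)
                                ORDER BY rank DESC
                                LIMIT 10
                              SQL
                              rows.each { |r| puts "#{r['rank']}  #{r['title']}" }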

                              1. 3

                                Appreciate you calling this out. I maintain an Elasticsearch client, and while I’m an avid PostgreSQL user, I’ve run into some people very stubborn about using PG’s FULLTEXT instead of a real search engine.

                                It’s too bad ES plays hot potato with your data.

                                1. 1

                                  I suspect this can be treated like a scaling constraint? Like pg’s full text search will often suffice while your corpus isn’t large, so most searches return few results, so you aren’t overly bothered by the fact that the ordering isn’t as good as you’d like.

                                  It is definitely not as good as ES/Lucene, but pgsql full-text search is IME a huge step up if you move to it from completely naïve byte-by-byte substring testing - which I’ve also seen in production plenty of times!

                                  1. 1

                                    I suspect this can be treated like a scaling constraint? Like pg’s full text search will often suffice while your corpus isn’t large, so most searches return few results, so you aren’t overly bothered by the fact that the ordering isn’t as good as you’d like.

                                    People do not expect boolean search anymore. So, if the query contains some non-discriminative term, you will retrieve a large number of non-relevant documents. Aggressive stopword filtering will help you somewhat, but not much.

                                    Note that TF-IDF is fairly trivial to compute, assuming that you have a term -> (document, freq) mapping (a typical postings list). Though, depending on how your data is structured, computing the denominator in query-document cosine similarity can be more difficult (more specifically, you need to compute the L2 norm of the complete document vector).

                                    If documents have virtually the same length and there are no odd repetitions, you might get away with the dot product of the query and document vector as a similarity measure. If each word occurs once in the query, then you simply sum the document’s TF-IDFs for the query terms.
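
                                    To make that concrete, here is a toy version in Ruby (in-memory postings over made-up documents, using the dot-product shortcut above):

                                    # Postings list: term => { doc_id => term frequency }.
                                    docs = {
                                      1 => %w[postgres full text search],
                                      2 => %w[lucene search engine],
                                      3 => %w[postgres replication]
                                    }
                                    postings = Hash.new { |h, k| h[k] = Hash.new(0) }
                                    docs.each { |id, words| words.each { |w| postings[w][id] += 1 } }

                                    n_docs = docs.size.to_f
                                    idf = ->(term) { Math.log(n_docs / postings[term].size) }

                                    # Score = sum of tf-idf over the query terms (each term appearing once).
                                    scores = Hash.new(0.0)
                                    %w[postgres search].each do |term|
                                      next unless postings.key?(term)
                                      postings[term].each { |doc, tf| scores[doc] += tf * idf.call(term) }
                                    end
                                    p scores.sort_by { |_, s| -s } # highest-scoring document first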

                                    1. 2

                                      People are used to shit search, though. They’ll try variations on their queries and take words out if the entire corpus comes back. Again, this is replacing substring matching, which has exactly the same problem, not Lucene!

                                2. 2

                                  The RUM index extension plans to address TF/IDF in the near future.

                                1. 1

                                  Rigorous proof for

                                  1 + 1 = 2

                                  in Principia Mathematica (and covers the title).

                                  1. 1

                                    Eh… I went in looking for a signal-processing reference and it turned out to be something entirely different. Is this really called ‘aliasing’? Both Wikipedia and the dictionary have no such reference. I wish folks didn’t misappropriate established terms according to their whims and fancies.

                                    1. 9

                                      Yes, it’s an established term; less ambiguously, it would be called “pointer aliasing”. It has a long history of use in connection with the study of its implications for optimizing compilers.

                                      1. 3

                                        Huh? The very first thing the Wikipedia article for aliasing says is:

                                        This article is about aliasing in signal processing, including computer graphics. For aliasing in computer programming, see aliasing (computing).

                                        With a link to the article about pointer aliasing.

                                        1. 1

                                          Python explicitly calls this aliasing.

                                          Objects have individuality, and multiple names (in multiple scopes) can be bound to the same object. This is known as aliasing in other languages.
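
                                          The same idea in Ruby, to stay with this thread’s other examples (two names bound to one mutable object):

                                          a = "hello"
                                          b = a             # no copy; b is another name for the same object
                                          b << " world"     # mutate through b...
                                          puts a            # => "hello world"  ...and a sees it
                                          puts a.equal?(b)  # => true: same object identity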

                                        1. 23

                                          My favorite tactic for “killing” these is (to use the example from the post):

                                          # e.g. "hello everyone" => "Hello Everyone"
                                          def upcase_words(sentence)
                                            sentence.split(' ').map!{|x| x = x[0..0].upcase << x[1..-1]}.join(' ')
                                          end
                                          

                                          In an ideal world the name is clear enough that someone reading the code at the call site understands what’s happening, and if they don’t the example alongside the definition hopefully gets them there.

                                          1. 6

                                            You mean

                                            # e.g. "col1\tcol2\n    ^ woah" => "Col1 Col2 ^ Woah"
                                            

                                            Naming it hurts in this case, because the function does not do what you named it (e.g. in a string of tab-separated values, or a string where multiple spaces are used for formatting). If you had to name it, it would be better named as split_on_whitespace_then_upcase_first_letter_and_join or leave it unnamed and hope that everyone on your team knows that split in Ruby doesn’t work as expected.

                                            The best solution is one that embodies exactly what you intend for it to do, i.e. substitute the first letter of each word with the upper case version of itself. In Ruby, that would be:

                                            sentence.gsub(/(\b.)/) { |x| x.upcase }
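
                                            To make the disagreement concrete, here is how the two behave on that tab example (runnable in irb, with upcase_words as defined upthread):

                                            "col1\tcol2\n    ^ woah".split(' ').map { |x| x[0..0].upcase << x[1..-1] }.join(' ')
                                            # => "Col1 Col2 ^ Woah"       (whitespace structure destroyed)
                                            "col1\tcol2\n    ^ woah".gsub(/(\b.)/) { |x| x.upcase }
                                            # => "Col1\tCol2\n    ^ Woah" (whitespace preserved)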
                                            
                                            1. 6

                                              If you had to name it, it would be better named as split_on_whitespace_then_upcase_first_letter_and_join or leave it unnamed and hope that everyone on your team knows that split in Ruby doesn’t work as expected.

                                              I disagree. You should name functions and methods based on what they’re supposed to do. If it does something else, then everyone can see it is a bug.

                                              1. 1

                                                I don’t agree with your naming system. I think the name of your function should describe what it does instead of how it does it. If your function name describes how it’s implemented, you have a leaky abstraction.

                                              2. 6

                                                Among other benefits, giving it a name means we can explode the code without worrying about a few extra lines in the middle of the caller.

                                                words = sentence.split ' '
                                                words.each { |w| w[0] = w[0].upcase }
                                                sentence = words.join ' '
                                                

                                                Introducing a variable called ‘words’ is a solid hint about the unit we’re working with. We may not want to pollute the caller with a new variable, but in a subroutine that’s not a problem.

                                                1. 3

                                                  Naming it does help in this case, but mostly because the reader no longer has to scrutinize what it’s actually doing. Isn’t this sort of like polishing a turd?

                                                  1. 1

                                                    That only masks the issue.

                                                    Any maintenance on that line will still have the same problems, whereas refactoring it to split it up into smaller segments AND giving it a name avoids that issue.

                                                    1. 3

                                                    It gives the reader a good frame of reference for what the function’s doing. Context helps a lot when trying to read code, and although this isn’t as readable as it could be yet, it’s definitely a lot more readable than it would be minus the function signature.

                                                    2. 1

                                                    A kind of off-topic question based on this comment.

                                                      Would I use Coq to prove this function?

                                                    1. 1

                                                  Do we get spoiler text here? I am tempted to take a crack at this (which I think is 90% correct) but I don’t want to spoil the puzzle for others.

                                                      1. 3

                                                    I would recommend libharu (C) or itextpdf (Java/C#). Both have excellent CJK support.

                                                        I have some example itextpdf restful code here: https://github.com/konomiya/invoice_generator

                                                        1. 7

                                                          I’ve heard lots of anecdotes and admonishments around UUID primary keys but little evidence.

                                                          Pros:

                                                          • you won’t run out of them any time soon
                                                          • merging rows is easy
                                                          • obscurity - n+1 is pretty easy to guess, after all

                                                          Cons:

                                                          • B+ trees and random keys go together like chalk and cheese
                                                          • huuuuuge
                                                          • you probably have a natural key you can use instead
                                                          • sorting? nope (unless you use v1 UUIDs or something)

                                                          Does anyone know of actual best practices here?
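
                                                      On the sorting point, one common workaround looks like the sketch below (the ULID-style idea, hand-rolled here for illustration, not any standard library call): put a timestamp in the high bits so lexicographic order roughly follows creation order.

                                                      require 'securerandom'

                                                      # 48-bit millisecond timestamp + 80 random bits = 128 bits, like a UUID.
                                                      # Hex output sorts in creation order (modulo clock skew and
                                                      # same-millisecond ties) while keeping enough entropy that
                                                      # independent generators never need to coordinate.
                                                      def time_ordered_id
                                                        ms = (Time.now.to_f * 1000).to_i
                                                        format('%012x', ms) + SecureRandom.hex(10)
                                                      end

                                                      ids = Array.new(3) { time_ordered_id }
                                                      p ids.sort == ids # => true (barring same-millisecond ties)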

                                                          1. 6

                                                            With respect to sorting, I always stored a UUID with a timestamp. This has the added benefit of being much more informative (and good for auditing purposes).

                                                            UUIDs are far superior to integers because the key no longer has to be generated by the database. This makes it much easier for external agents to generate data, send it to the database, and essentially keep a pointer to it without negotiating in some crazy way with the DB. And in many cases, a UUID is a “natural” key.

                                                            And natural keys are great, if you can get ‘em. A lot of the time they’re messy.

                                                            Edit: Just want to stress that in my experience, even for moderate sized databases with only one server, UUIDs have been demonstrably better to work with than integer keys.

                                                            1. 2

                                                              With a BRIN index on that timestamp, this method sounds brilliant.
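
                                                          For anyone unfamiliar, that index is a one-liner (made-up table and column names; a sketch via the pg gem):

                                                          require 'pg'
                                                          conn = PG.connect(dbname: 'mydb') # hypothetical database
                                                          # BRIN stores only min/max per block range, so it stays tiny and
                                                          # works well on append-mostly timestamp columns like this one.
                                                          conn.exec('CREATE INDEX events_created_at_brin ON events USING brin (created_at)')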

                                                            2. 3

                                                              I agree with all of your pros and cons and ultimately the answer has to be with regard to the performance needs and characteristics of your database, which are more complicated than just the characteristics of the underlying engine.

                                                              I’d nitpick that “you probably have a natural key you can use instead” applies equally to all synthetic keys. With regard to choosing natural vs synthetic, in every situation I’ve encountered, it was a very clear choice - the wrong one would have been dramatically inappropriate. I haven’t particularly specialized in databases, though, and would defer to anyone who feels they’re an expert.

                                                              1. 2

                                                          Your trade-offs are correct. UUIDs are bad for performance, but many people don’t actually need to care about a performance impact at this level. They let you avoid managing an ID generator across databases (autoincrement modulo number of shards or something), but they will make your database slower, and you give up the advantages of having a lexicographically ordered database (no more efficient scans over causally ordered data, though you CAN have a timestamp be part of your UUID [with UUID v1, as MySQL uses by default], so you are basically relying on NTP + a wrapping counter + a bit of entropy, which may be enough to give you usable scan semantics depending on your workload).

                                                          (skip this chunk if you don’t care how databases work) Databases like MySQL split their storage files into chunks that contain a subset of rows (called pages). When mutations occur, the update is stored in a persistent log, which is written to sequentially and fast. But you don’t use arrays as a constant-lookup-time dictionary, so the crash-recovery log needs to be supplemented by a persistent fast lookup structure if you want a dataset that is bigger than memory (a B+ tree in many classic cases, still awesome for read-heavy stuff; LSM/fractal trees in newer stuff, depending on workload characteristics). When you update a row, the page for that section of the tree needs to be updated. Many databases buffer writes to pages so that fewer total writes occur (like Nagle’s algorithm in TCP: a bus doesn’t slam on the gas when the first passenger jumps on, but waits for a few more to bump up the utility:work ratio). The persistent trees are laid out such that you can efficiently scan through rows based on their primary key, lexicographically.

                                                          • the longer the primary key => the more bytes need to be compared to find matching rows + the bigger the database + the less of it fits in memory + the more cache thrashing occurs. UUIDs are big, and can dramatically increase the number of bytes that need to be compared in a key before matching can be determined.
                                                          • as you mentioned, random keys can be bad for persistent tree implementations: random keys => more pages being written to because locality has been thrown away => more pages end up in the dirty-page buffer => fewer utility:work buffering optimizations can occur => total write throughput goes down (this doesn’t matter quite as much with UUID v1, depending on your workload; see the toy model below).

                                                          here’s a Percona blog post on the subject
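
                                                          To put a toy number on that locality point (a simulation sketch, not a benchmark of any real engine):

                                                          # Pretend rows live in fixed-size pages ordered by primary key,
                                                          # and count how many distinct pages N inserts dirty.
                                                          ROWS_PER_PAGE = 100
                                                          N = 10_000

                                                          def pages_touched(keys)
                                                            keys.map { |k| k / ROWS_PER_PAGE }.uniq.size
                                                          end

                                                          sequential = (0...N).to_a                   # autoincrement-style keys
                                                          random     = Array.new(N) { rand(100 * N) } # uniform "UUID-like" keys

                                                          puts pages_touched(sequential) # 100   -> write buffering can coalesce a lot
                                                          puts pages_touched(random)     # ~6300 -> nearly every insert dirties a new page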

                                                              1. 2

                                                          FWIW, here is a Taco Bell solution:

                                                          $ uname -a
                                                          Linux o-test-server 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

                                                          $ sudo apt install tinycdb

                                                          # 100M random 8-char alphanumeric keys, one per line
                                                          $ tr -cd '[:alnum:]' < /dev/urandom | fold -w8 | head -n100000000 > test.txt

                                                          # value = first 4 chars of each key
                                                          $ cut -c1-4 test.txt > test2.txt

                                                          # join into "key<TAB>value" lines
                                                          $ paste test.txt test2.txt > test3.txt

                                                          # build the constant database
                                                          $ cdb -c -m test3.cdb test3.txt

                                                          $ cdb -q test3.cdb 2Jaj8gGM
                                                          2Jaj

                                                          $ ls -lh test3.*
                                                          -rw-rw-r-- 1 od user 3.4G Jun  3 11:14 test3.cdb
                                                          -rw-rw-r-- 1 od user 1.4G Jun  3 11:08 test3.txt
                                                                
                                                                1. 3

                                                            Does law 8 (also 7) really need to be explicitly stated? Can’t it be derived from 1 & 2? Also, why isn’t a . b = b . a defined?

                                                                  1. 5

                                                                    What a fascinating post! I wonder what kind of effect it would have when presented as a solution in an interview. ;-)

                                                              Also, by noting that no one has pointed to a typo in the generating function’s derivation, one might conclude that the majority just flew over that part. lol

                                                                    1. 6

                                                                One suggestion I have for a release down the road is to work with text bundles of some sort that make language translations easier. It looks like all the strings (en_US) are hard-coded right now. Had I seen a bundle, I could’ve quickly sent in a pull request for ja_JP.
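
                                                                A minimal sketch of the bundle idea (hypothetical keys; a real app would load these tables from per-locale files and pick the locale once at startup):

                                                                # en_US / ja_JP tables; translators only ever touch these.
                                                                BUNDLES = {
                                                                  'en_US' => { greeting: 'Hello', quit: 'Quit' },
                                                                  'ja_JP' => { greeting: 'こんにちは', quit: '終了' }
                                                                }

                                                                def t(key, locale: 'en_US')
                                                                  BUNDLES.fetch(locale).fetch(key)
                                                                end

                                                                puts t(:greeting)                  # => Hello
                                                                puts t(:greeting, locale: 'ja_JP') # => こんにちは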

                                                                      1. 5

                                                                  This article is light on details, and it’s hard to tell if it’s even true, but I was surprised that GitHub has 500+ employees. Does anyone know why?

                                                                  I’d expect a service like GitHub to run on 30-40 people in engineering, a dozen or two in sales, and a half dozen in HR. That’s less than 100. Add in Atom and you’ve got 120 or so? I’m just not sure what GitHub does that requires so many people and so much money. I don’t even really see good features coming out of GitHub. PRs are still pretty trashy, and wiki and issues look about the same to me as they did 4 years ago. The URL to clone has moved a few times, though it’s back to where it was 5 years ago.

                                                                        1. 5

                                                                    Obviously it is a case that needs more facts, but my guess is that the increase in head count is seen as growth in some metrics. GitHub took money, and an easy hit for showing growth to investors is: “hey look… we hired all these people… we are growing to meet the milestones we had laid out in the roadmap”.

                                                                          1. 2

                                                                            That sounds pretty naive. Five hundred sounds just fine to me.

                                                                      First off, 30-40 in engineering seems pretty low. Besides the website and its back end, they have the enterprise version and a lot of standalone tools: GitHub Desktop for Windows, the app for OS X, a mobile app, and the Visual Studio plugin. As you pointed out, they have a lot to improve, so I’m sure they’ve spent some of that investor money on developers.

                                                                            And unless upper management is incompetent they have a lot more than two dozen sales people. Just in the United States there are thousands of potential GitHub Enterprise customers, and before signing up they’ll all want to spend a bunch of time talking about features, compare it to competitors, demo it, talk about pricing, etc. It’s a lot of work selling to enterprises.

                                                                            Besides that, you’re missing a bunch of important groups, like finance, legal, and support (probably pretty big with millions of users).

                                                                            1. 2

                                                                              Does that explain another 300+ people? Maybe. I’m not sure. Feels large and fat to me, especially with a “flat” org. But who knows, I’m generally biased towards thinking companies should be leaner.

                                                                            2. 1

                                                                              I kinda hope this article just sloughs away, but in the event that it doesn’t…

                                                                      The basic question is not “What would it take to run GitHub?” The question is, “Well, having taken in all this funding, how can we scale the business to get a good exit?” By moving into the enterprise space, and hiring to push those sales, GitHub will follow a more traditional trajectory, with more predictable ROI.

                                                                              One wonders if, for example, the whole thing could’ve been bootstrapped or done with minimal outside investment; then again, it’s hard to say no to millions in funding.

                                                                            1. 1

                                                                  Incidentally, I stumbled upon the MetaCLF project, which addresses “Formal Reasoning about Languages for Distributed Computation.” Unfortunately, the last publication is from 2014 and the source code has been in limbo since 2013.