1. 2

    This is the first time Heroku has ever been able to detect configuration options and block a deploy for a vulnerability like this.

    1. 1

      Is there a particular reason for it being a first? Also thanks for the write up and the fixes!

      1. 2

        Is there a particular reason for it being a first? Also thanks for the write up and the fixes!

        We’ve never had the capability before. I only recently added the code to detect configuration via rails runner: https://github.com/heroku/heroku-buildpack-ruby/pull/758.

    1. 1

      At least it was fun to write.

      1. 4

        I must say that I think there is a much better solution available here. Instead of dumping the full image in each time, use an SVG <use> tag. You can even define the SVG externally (for most browsers, and there’s even a polyfill for IE). You get to keep all your styling (assuming you’ve decorated your svg’s components usefully), and you dramatically reduce over-the-wire bits. You may still be sending more than with this hack, but it’s much cleaner and more flexible.

        And, not to nitpick, but if your SVG is just a circle-enclosed exclamation mark, it can be much smaller than that SVG is if you use drawing elements other than SVG’s <path>. Some shapes are much more compact when defined with <path>, but not all, and not in this case.
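To make the over-the-wire argument concrete, here is a small Python sketch (an illustration, not code from the site being discussed) that builds markup both ways: the icon repeated inline N times, versus defined once as a <symbol> and referenced with <use>. The shapes are the reduced circle-and-rects version of the icon; the `issue-icon` id and the helper names are assumptions. Modern browsers accept a plain `href` on <use>; older ones may need `xlink:href`.

```python
# Hypothetical comparison: N inline copies of an icon vs. one <symbol> plus N <use> references.
ICON_SHAPES = (
    '<circle r="7" cx="8" cy="8" stroke="currentColor" stroke-width="2" fill="none"/>'
    '<rect width="2" height="7" x="7" y="3" rx="0.5"/>'
    '<rect width="2" height="2" x="7" y="11" rx="0.5"/>'
)


def inline_page(n: int) -> str:
    """Repeat the full icon markup at every place it appears."""
    icon = f'<svg class="issue-icon" viewBox="0 0 16 16" fill="currentColor">{ICON_SHAPES}</svg>'
    return icon * n


def symbol_page(n: int) -> str:
    """Define the icon once as a hidden <symbol>, then point at it with <use>."""
    sprite = (
        '<svg style="display:none">'
        f'<symbol id="issue-icon" viewBox="0 0 16 16">{ICON_SHAPES}</symbol>'
        "</svg>"
    )
    ref = '<svg class="issue-icon" fill="currentColor"><use href="#issue-icon"/></svg>'
    return sprite + ref * n


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, len(inline_page(n)), len(symbol_page(n)))
```

For one icon the sprite approach costs slightly more, but each additional <use> only adds a short reference, while every inline copy repeats the whole drawing.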

        1. 2

          Cool, I didn’t know you could use an external source for SVGs with <use>. How would I go about reducing it? I’m not the original SVG creator. Is there some kind of optimizer I could use?

          1. 4

            Sure. SVG has a large number of drawing directives. Programs that export to SVG tend to prefer <path> because it can produce any shape that the other drawing directives can, so it is easier to output programmatically. That said, for something like this icon, it is almost certainly simpler to use some of the other directives.

            For example, here is the original you started with:

            <?xml version="1.0" encoding="UTF-8"?>
            <svg fill="#fff" class="issue-icon" version="1.1" viewBox="0 0 16 16" xmlns="http://www.w3.org/2000/svg">
              <path d="m8 0c-4.418 0-8 3.582-8 8s3.582 8 8 8 8-3.582 8-8-3.582-8-8-8zm0 14c-3.309 0-6-2.692-6-6s2.691-6 6-6c3.307 0 6 2.692 6 6s-2.693 6-6 6z" clip-rule="evenodd" fill-rule="evenodd"/>
              <path d="M8.5,3h-1C7.224,3,7,3.224,7,3.5v6C7,9.776,7.224,10,7.5,10h1 C8.776,10,9,9.776,9,9.5v-6C9,3.224,8.776,3,8.5,3z" clip-rule="evenodd" fill-rule="evenodd"/>
              <path d="M8.5,11h-1C7.224,11,7,11.224,7,11.5v1C7,12.776,7.224,13,7.5,13h1 C8.776,13,9,12.776,9,12.5v-1C9,11.224,8.776,11,8.5,11z" clip-rule="evenodd" fill-rule="evenodd"/>
            </svg>
            

            Here is a reduced version:

            <?xml version="1.0" encoding="UTF-8"?>
            <svg fill="currentColor" class="issue-icon" version="1.1" viewBox="0 0 16 16" xmlns="http://www.w3.org/2000/svg">
              <circle r="7" cx="8" cy="8" stroke="currentColor" stroke-width="2" fill="none" />
              <rect width="2" height="2" x="7" y="11" rx="0.5" />
              <rect width="2" height="7" x="7" y="3" rx="0.5" />
            </svg>
            

            The above is virtually identical to the one you are currently using (and can have its full color controlled with the following style):

            .issue-icon {
                color: #000;
            }
            

            This is a reduction in size from 678 bytes to 350 bytes. That change alone would have cut down on your page size pretty dramatically.

            I am not aware of a minifier (as I said, most programs use <path> because it’s so much easier to generate programmatically), but it was not difficult for me (not a web developer) to work out this translation by hand.

            1. 2

              Thanks!

        1. 12

          I think there are several questionable statements there and the problem domain is not described in sufficient detail even for introductory material.

          The reader is left with the impression that, to fill I/O wait time with work, you either add more processes or add threads, and that threads are the faster (but harder to implement) choice. If the goal of the article was simply to describe what threads can be used for, this is a topic that should not have been touched, imo.

          The CPU- vs I/O-bound description is fine at this level. Indeed, adding more processes or threads may be a perfectly valid way of dealing with it. One way that was completely omitted is programming with event loops: you set your fds to non-blocking, register for events, and react to completions. This is how e.g. nginx works.
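A minimal sketch of that event-loop style, using Python's standard `selectors` module: both ends of a `socket.socketpair()` are non-blocking, each is registered for read events with a callback, and a single thread waits for readiness and dispatches. The echo behaviour and function name are illustrative only; nginx's real loop is C code built on epoll/kqueue.

```python
import selectors
import socket


def run_echo_loop(payloads: list[bytes]) -> bytes:
    """Echo payloads through a non-blocking socket pair with a single-threaded event loop."""
    sel = selectors.DefaultSelector()
    server, client = socket.socketpair()
    for sock in (server, client):
        sock.setblocking(False)  # reads/writes return immediately instead of waiting

    received = bytearray()

    def on_server_readable(sock: socket.socket) -> None:
        data = sock.recv(4096)
        if data:
            # Echo back. Small payloads fit in the socket buffer; a real loop would also
            # watch EVENT_WRITE and buffer partial sends.
            sock.send(data)

    def on_client_readable(sock: socket.socket) -> None:
        received.extend(sock.recv(4096))

    # Register interest in "readable" events; the callback rides along as `data`.
    sel.register(server, selectors.EVENT_READ, on_server_readable)
    sel.register(client, selectors.EVENT_READ, on_client_readable)

    for payload in payloads:
        client.send(payload)

    expected = sum(len(p) for p in payloads)
    try:
        # The loop itself: wait for readiness, then dispatch to the registered callback.
        while len(received) < expected:
            events = sel.select(timeout=1.0)
            if not events:
                raise TimeoutError("no socket became ready within 1 second")
            for key, _mask in events:
                key.data(key.fileobj)
    finally:
        sel.close()
        server.close()
        client.close()
    return bytes(received)
```

No thread ever blocks on a single socket; the only waiting happens in `sel.select`, which covers every registered descriptor at once.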

          There is a repeated statement about the benefit of sharing code. But this is precisely what normally happens with processes: virtual -> physical mappings for all files (the binary itself, libc, whatever) lead to the same memory pages. Two processes will have a slightly higher memory footprint than two threads, but the difference should be negligible.

          The article itself notes that you can share data between processes if you want, e.g. with mmap.
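For the mmap route, a Unix-only Python sketch: an anonymous mapping created before `fork()` stays shared between parent and child, so a write by the child is visible to the parent without pipes or copying. The function name and the 64-byte size are arbitrary choices for illustration.

```python
import mmap
import os


def share_via_mmap(message: bytes, size: int = 64) -> bytes:
    """Child writes into an anonymous shared mapping; parent reads it back (Unix only)."""
    if len(message) > size:
        raise ValueError("message does not fit in the mapping")
    # fileno=-1 maps anonymous memory; on Unix the default flags are MAP_SHARED,
    # so the pages stay shared (not copy-on-write) across fork().
    shared = mmap.mmap(-1, size)
    try:
        pid = os.fork()
        if pid == 0:  # child process
            try:
                shared[: len(message)] = message
            finally:
                os._exit(0)  # leave immediately, skipping the parent's cleanup code
        os.waitpid(pid, 0)  # parent: wait until the child has written
        return shared[: len(message)]
    finally:
        shared.close()
```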

          There is a benefit of a cheaper thread<->thread switch, but it seems overplayed and out of place for this piece - it’s too low level. In the same spirit I can point out how threads get slower:

          • Plenty of frequently called syscalls take a file descriptor as an argument, so translating fd -> actual file happens all the time. Linux has a hack: if the process is single-threaded, it avoids taking a reference on the file it found (and dropping it later), which saves 2 atomic ops.
          • Several kernel structures also get shared; in particular, you start getting lock contention on mm (the address space) and the file table.

          On the other hand, a very real cost, and not a low-level one, that was not even mentioned comes from providing safe access to shared memory. The article says this is harder, but not that, naively applied, it can have a prohibitive performance impact. “Fun” fact: even threads that never talk to each other can suffer cacheline bounces if no care was taken to place global data properly. Most real-world programs not written with concurrency in mind from day 1 suffer from false sharing and bad locking to this very day.
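A small Python illustration of the shared-state cost, with hypothetical function names. Both versions give the right answer, but the first takes a mutex for every increment of shared state, while the second keeps the count thread-local and writes to shared memory once per thread. CPython's GIL means this does not reproduce cacheline bouncing or false sharing (those show up in C, C++, or Rust), so read it only as a picture of minimizing writes to shared state.

```python
import threading


def count_with_lock(n_threads: int, increments: int) -> int:
    """Every increment goes through one shared, mutex-protected counter."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            with lock:  # the read-modify-write of `counter` must not interleave
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def count_with_local_sums(n_threads: int, increments: int) -> int:
    """Each thread counts privately and publishes its total once, so there is no lock traffic."""
    totals = [0] * n_threads

    def worker(index: int) -> None:
        local = 0
        for _ in range(increments):
            local += 1
        totals[index] = local  # a single write to shared state per thread

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals)
```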

          So, I think the article misrepresents what threads are good for and does not provide a fair/balanced cost:benefit ratio (neither does this post, but I think it does enough to point out problems with the article).

          Today, especially with the advent of NUMA, you want to avoid as much writeable shared state as possible. And if your shared state is read-only, you very likely can just use processes (which may or may not be a better choice than threads, point is - memory usage will be virtually the same).

          1. 5

            This is good feedback. I didn’t intend to “sell” threads as much as I was trying to explain their existence. In Ruby and other languages that have a GVL there is a bit of thread-phobia, and my IO comments are primarily aimed at those groups. Basically, the GVL is released during IO, which also happens to be an ideal candidate for using threads (even in languages without a GVL). This short video was extracted from a talk I gave to Ruby programmers.
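A quick way to see the "lock released during IO" point in Python, whose GIL plays the same role as Ruby's GVL: `time.sleep` stands in for a blocking IO call, and CPython releases the GIL while sleeping, so a thread pool can overlap the waits. The helper names and delays are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay: float) -> float:
    """Stand-in for a blocking IO call; CPython releases the GIL while sleeping."""
    time.sleep(delay)
    return delay


def run_sequential(delays: list[float]) -> list[float]:
    return [fake_io(d) for d in delays]


def run_threaded(delays: list[float], workers: int = 8) -> list[float]:
    # Threads overlap the waits because none of them holds the GIL while blocked.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_io, delays))


def timed(fn, *args):
    """Return (result, elapsed seconds) for fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2] * 5
    print("sequential", timed(run_sequential, delays)[1])
    print("threaded  ", timed(run_threaded, delays)[1])
```

With five 0.2 s waits the sequential run should take about a second, while the threaded run should finish in roughly the time of one wait. A CPU-bound function in place of `fake_io` would see no such speedup under the GIL.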

            I find many programmers are literally afraid of threads. Either they’ve been burned badly by them, or they’ve just heard so many horror stories. Many I’ve talked to wish for some kind of magical drop-in construct that will have all the benefits of threads and processes with none of the downsides. I think when you understand what exactly a thread is, it becomes clearer that such a mythical thing won’t come (or at least not anytime soon, or not without its own caveats). There are concurrency alternatives, but there are no concurrency magic tools.

            The course I recommend at the bottom of the page goes into quite a bit more detail, specifically the problems with different cache invalidation strategies and how different parallel access patterns can mess them up. It also covers different kinds of access control, such as when it’s a good idea to use a spin lock versus a mutex. We also went over the “flash” paper, which compares many strategies including evented programming. It’s some pretty interesting stuff.

            While the cost of context switching a thread is likely not substantial to most people or programs, I think the benefits from sharing a hot cache can be very substantial. However, they’re harder to explain. I mentioned that, but didn’t dig into it.

            Way which was completely omitted was programming with event loops

            I have a note at the bottom of the article that mentions evented programming. It didn’t make it into the video.

          1. [Comment removed by author]

            1. 6

              Except you would fail the assignment because you’re required to do it in python.

              Right tool for the job…that meets the project specifications.

              1. 2

                “Write it in language X” is rarely a real requirement, but there’s always the option of creating a shared library written in something else and calling it from the target language. It’s a popular choice in Python because Python isn’t particularly fast but has an easy-to-use CFFI.
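As a sketch of calling into a C shared library from Python: the comment mentions CFFI, which is the third-party `cffi` package; this example uses the standard library's `ctypes` instead so it runs without installing anything, and calls the existing `strlen` from the C library. It assumes a Unix-like system where libc can be located.

```python
import ctypes
import ctypes.util

# Locate the C standard library; if find_library can't, fall back to the symbols
# already loaded into the current process (works on Linux and macOS, not Windows).
_libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(_libc_path) if _libc_path else ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Call C's strlen on the UTF-8 encoding of `text` (counts bytes, not characters)."""
    return libc.strlen(text.encode("utf-8"))
```

`c_strlen("hello")` returns 5, while `c_strlen("héllo")` returns 6 because the C side sees UTF-8 bytes, which is the kind of boundary detail a shared-library approach has to get right.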

                1. 5

                  “Write it in language X” is rarely a real requirement

                  True, but it’s usually a functional requirement. I mentioned I do a bunch of performance work. I do it in Ruby. I work directly in libraries like Rails to speed them up. Could I make Rails faster by rewriting the whole thing in C? Sure, but then it would likely not be Rails when I’m done, and it would also hurt the productivity of anyone else who wanted to hack on the library (likely Rubyists) if they did not also know C.

                  In my case I find hotspots and speed things up 5-20% at a time. That might sound trivial when rewriting it in C could make it 40,000% faster. But sometimes 20% is enough, and 40,000% isn’t worth the overhead to get there.

                  Sometimes “using the right tool for the job” is about knowing when the tradeoffs aren’t worth it. There is a need for, and a benefit to, knowing how to optimize your code’s performance even if you’re not on the bleeding edge. Even in a “fast” language, if you don’t know how to write fast code you can still get into trouble pretty easily.

                  In some cases a bottleneck is significant enough that people choose to drop down to C and write a native extension. For example, the fast_blank gem gives a huge speedup over Active Support’s blank? method; it is not among the Rails dependencies. Other native gems, such as nokogiri for XML parsing, are in the Rails dependencies.

                  This is what you’re suggesting with “shared libraries” and in general it works pretty well. I say: make it, make it correct, then make it fast. Once you’ve identified your bottleneck then you can optimize for speed where you need it.

                  Edit: spellning
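As a Python analog of the fast_blank story (not the gem itself): before swapping one implementation for another, check that both give the same answers, then measure. The two `blank` functions and the sample inputs are illustrative; which one wins depends on the input and the interpreter version, so the timings are something to run rather than assume.

```python
import timeit


def blank_strip(s: str) -> bool:
    """Straightforward version: build a stripped copy, then compare."""
    return s.strip() == ""


def blank_isspace(s: str) -> bool:
    """Avoids allocating a copy: empty, or whitespace only."""
    return not s or s.isspace()


def compare(sample: str, number: int = 100_000) -> dict[str, float]:
    """Time both versions on the same input; values are total seconds for `number` calls."""
    return {
        fn.__name__: timeit.timeit(lambda: fn(sample), number=number)
        for fn in (blank_strip, blank_isspace)
    }


if __name__ == "__main__":
    # Correctness first: both versions must agree before speed matters.
    samples = ["", "   ", " \t\n", "a", "  a  "]
    assert all(blank_strip(s) == blank_isspace(s) for s in samples)
    for s in samples:
        print(repr(s), compare(s))
```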

            1. 6

              I get that moves between languages can be uncomfortable, and are harder for some than others. The real danger is tying yourself down. Saying “I will only code in X” is a great way to end up obsolete. Saying “we only hire experts in X” is a great way to have no candidates.

              1. 3

                I somewhat disagree. There are still people who write COBOL for a living and they get paid bank. If you love a technology then feel free to pursue using it to your heart’s content.

                I think there is value in learning other languages. I’ve dabbled with BASIC, C, Python, Matlab, Octave, LabVIEW, JavaScript, and obviously Ruby. I’m not saying that you should be a language monogamist and never expand your horizons. Every new language I learn teaches me something regardless of how “similar” they are to a lang I already know. I’ve still not come across another language that feels better than Ruby does to me.

                Would I ever work in another language? Sure. But while I’ve got options that include my preferred lang, the reasons need to be more compelling than “it’s a job”.

                I do think companies need to write better requirements for jobs. Most of the “we need an expert” really means they would like an expert but will take someone with a pulse who has completed a “hello world”. Sometimes you really do need someone with a lot of experience for senior level positions. Most of the time the company has no clue.

                1. 5

                  I would say that you’re actually misinterpreting the lesson of the COBOL programmers. The COBOL programmers you see today are making bank, but you have to keep in mind that they’re the survivors of multiple rounds of layoffs that resulted in mass unemployment at the beginning of the ’90s. In essence you’re looking at the top 1% or 0.5% of COBOL programmers, those that are good enough to get called out of retirement at multiples of their original salary to repair legacy systems. You’re not seeing the ones who went on to different kinds of programming or left the industry altogether.

                  In addition, the problem with sticking to a single language for work is that if there’s another big secular shift in the industry (like the one away from COBOL), it can leave you unhireable. I’ve seen this initially with COBOL, and later with Perl. When Perl faded as a language for web development and was replaced with PHP, Python and Ruby, there were quite a few Perl hackers who had a really hard time finding work, simply because the only thing they had on their resume (in a professional capacity) was Perl. Meanwhile those who had Perl + Java, or Perl + Python, or Perl + C# had a much easier time getting work because they were able to lean on the non-Perl parts of their work experience to make a case that they would be productive in a non-Perl environment.

                  That said, I wholly agree that companies are terrible at writing requirements. The vast majority of developer positions do not require deep knowledge of the internals of frameworks or language runtimes. Usually, basic knowledge of algorithms, data structures, combined with surface level familiarity with language syntax is more than sufficient to ensure that a developer can become productive in a reasonable period of time.

                  However, the world we live in is a world in which employers filter resumes by keyword. In such a world, it makes sense to have professional accomplishments in as diverse a set of programming environments as possible, to ensure that you’re not caught flat-footed by secular shifts in the industry.

              1. 49

                I’m hiring for a weird stack. The people I’ve hired so far aren’t coming in with experience in that stack. They’re coming in with solid design and architecture foundations and enough experience in similar languages that I’m confident that I can develop in them the skills that I need. If I stuck to people with exact experience, as an uninformed recruiter would, I’d be looking for years. The person I hired with the least professional experience has the most experience in one element of the stack: she’ll hit the ground running while the more experienced people spin up on that and other elements.

                I can teach languages and nuances of frameworks. I don’t want to teach good communication, collaboration, and teamwork skills.

                1. 9

                  This is a good take. It sounds like you’re emphasizing growth and learning. It can be fun to push your boundaries and grow as you learn a new language but if you aren’t supported by a company in that process you’re not going to be happy.

                  Most companies try to claim they’re pro learning and pro growth but then throw their devs in with the sharks on their first week. Taking steps to show candidates you’re serious about their concerns and actively taking steps to mitigate them will set you apart.

                  1. 2

                    But they don’t throw the devs in with the sharks. New devs are thrown unceremoniously into the water, but usually with some nice Code Review brand floaties. If the company fired people for getting too many code review comments their first week, that would really be throwing them in with the sharks.

                    And if the company doesn’t do code review, what the hell are you so worried about? The company obviously doesn’t care about code quality anyway, so fire away. And maybe float your resume to some friends.

                  2. 7

                    Very much this. When we were hiring for (failing giant social network), our lowest-level code was in Scala. At the time, nobody knew Scala. But our best hires were the ones who knew a couple of other languages and were willing to learn something new. Your engineering chops are transferable. Your syntax and library knowledge is not, but after you get 2 or 3 languages under your belt, you get to be pretty good at learning the local idioms. You’ll be fine. It’s your critical thinking and teamwork I want.

                    1. 4

                      What’s the weird stack?

                      1. 9

                        Almost completely greenfield development in Scala and Rust. Hints of Groovy, Java, and standard web stuff. Mix of web development and on-premises server stuff. It might not be that weird, but I’m finding it difficult to hire precisely for.

                        1. 5

                          Ah, nifty.

                          What benefit do you get from using Scala and Rust? What’s the problem domain?

                          1. 5

                            Will be running Scala stuff everywhere we control the environment, which is in the cloud and certain portions of the on-premise installation.

                            For the Rust part, we concluded that C or C++ were the options given our “we know nothing about the environment except its kernel” requirements, but then decided that any new stuff in either of those should probably be done in Rust. A part of the role of the Rust app is to provision a JVM for a Scala app and monitor it and the environment in which it’s all running.

                            It’s pretty exciting and if it all works out, we could be using Scala and Rust fairly interchangeably across the JVM, JavaScript, WebAssembly, and native.
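The supervising role described for the Rust app (start a JVM, watch it, react when it dies) can be sketched in a few lines. This is Python rather than Rust to match the other examples here, and the restart policy, backoff, and names are assumptions; a real launcher would run something like `java -jar app.jar` instead of the stand-in command.

```python
import subprocess
import sys
import time


def supervise(cmd: list[str], max_restarts: int = 3, backoff: float = 0.1) -> list[int]:
    """Run `cmd`, restarting it after a non-zero exit; return every exit code seen."""
    exit_codes: list[int] = []
    for attempt in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        code = proc.wait()  # block until the child exits
        exit_codes.append(code)
        if code == 0:
            break  # clean shutdown: nothing to restart
        if attempt < max_restarts:
            time.sleep(backoff * 2**attempt)  # exponential backoff before the next start
    return exit_codes


if __name__ == "__main__":
    # Hypothetical stand-in for a crashing JVM process.
    crashing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(supervise(crashing, max_restarts=2, backoff=0.05))
```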

                            1. 4

                              Cool, but again, what do those technologies get you? Like, what’s the business value over, say, just gobs of NodeJS or something?

                              1. 6

                                Ah, my kind of architectural question.

                                I chose Scala because I wanted:

                                • a statically typed functional programming language with a wide variety of libraries available, especially for accessing databases of all kinds
                                  • statically typed for intent/contract/etc
                                  • functional for testing and ergonomic benefits
                                  • JDBC is great because a ton of DB vendors support it; I don’t get to choose what DBMS my customers are using, so I need something that ~everything supports
                                • a mature actor system for easier concurrent programming
                                  • easier to implement
                                  • easier to test
                                • shareable components
                                  • Scala can deploy to JVM and JavaScript, so we can share components
                                  • Scala Native will change things a lot but it doesn’t get us what Rust gets us
                                • I know it better than any other language and I can teach it better than any other language.
                                  • I know its rough edges, too: compilation speed, IDE wonkiness, build system areas for improvement

                                I chose Rust because I wanted:

                                • a native binary that can be small and not require any runtimes
                                • code supporting a reliable, long-running service
                                • concurrent tasks without worrying about common concurrency problems
                                • a statically typed functional-ish programming language
                                  • ecosystem is still maturing but it has everything I need right now and I’ve got time to sand the edges
                                • a community that was exceptionally welcoming to new developers, because I knew that my team would not likely know it better than I do, which isn’t that much – I was very fortunate to find someone who has done a lot of Rust work in an internship!
                                • shareable components
                                  • Rust has a WebAssembly target so that we could eventually take advantage of that

                                Rust beat out C, C++, C#, Go, Nim, and Pony. Scala was my first choice with a fallback to Ruby via JRuby or Elixir. It would have been easier to hire for Ruby in my area but I want to help build the Scala community beyond my former employer and a couple of others.

                                If Scala Native were 1.0, I might have chosen it over Rust mostly for personal preference of language ergonomics. However, I see a lot of promise for Rust and intend to use Rust where it is appropriate.

                                1. 2

                                  Ah, thank you for elaborating. The part about you already being really familiar with Scala for teaching purposes makes a good deal of sense to me. :)

                                2. 1

                                  At the risk of being facile, my experience is that you can program better in those languages. The business value is whatever business value you were getting from programming, but more so; like, pick a point on the project management triangle, and you can get an improvement in those aspects.

                            2. 1

                              Sounds aligned with my experience and interests, on the off chance that you’re hiring Londoners and competitive with finance-industry rates.

                              1. 2

                                Alas, negative: local-remotes in the Pittsburgh metro area only.

                        1. [Comment removed by author]

                          1. 10

                            The article would be half the length if I had skipped explaining the problem. I don’t see a good way around it. I did include a disclaimer early though. It kinda doesn’t help to explain the faster code if you don’t understand what the slower code did.

                            no good advice

                            Maybe you can tell me how to make the code faster? What are some general performance tips you commonly use that aren’t listed here?

                            This code has to be pure python and isn’t allowed any imports. Let me know if there’s something huge I missed.

                            You’re the second person to mention clickbait title. I’m curious what you were expecting when you clicked on the link. It’s about me making a function 5x faster in python when my main language is Ruby. I wrote it to be fairly literal.

                            1. 14

                              I don’t like the title because it’s phrased in a way that implies you’ve optimized Python code in general, when you’ve really just optimized a specific script. It also implies that it was your Ruby experience specifically that helped you optimize, but most everything you covered would apply in general.

                              I’m also always skeptical of blog posts like this that don’t make their full code available. Especially when the author is comparing between two languages, which you’ve implicitly done by constantly mentioning Ruby.

                              1. 6

                                Interesting. I had never even considered the title would make people think that. Did you think by “some Python code” I meant a specific subset?

                                The point of the article is that optimizing code is a general skill that can be applied to other languages. It’s supposed to be advice that applies to both languages.

                                Or at least that was what was in my head when I wrote it. I might have not mentioned those words.

                                The code I provided in the article is fully executable.

                              2. 7

                                I’m sure you know this, but for folks who don’t: in your code sample you have, for testing for ‘None’

                                if not value:  # <---------- aaaaaaaaaaaaaaaa noooooooooooo!
                                  return
                                my_list = [value]
                                

                                it should be

                                if value is None:  # <---- this is what you mean
                                  return
                                my_list = [value]
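A short demonstration of why the distinction matters, with hypothetical function names: the truthiness check bails out for every falsy value (`0`, `""`, `[]`, `{}`, `False`), while the `is None` check only bails out for `None`.

```python
def wrap_truthy(value):
    """Buggy version: returns None for *any* falsy value, not just None."""
    if not value:
        return None
    return [value]


def wrap_if_not_none(value):
    """Returns None only when value really is None."""
    if value is None:
        return None
    return [value]


if __name__ == "__main__":
    for v in (None, 0, "", [], {}, False, 5):
        print(repr(v), wrap_truthy(v), wrap_if_not_none(v))
```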
                                
                                1. 3

                                  Good catch, you want to be as specific as possible. The way I used would have caught False in addition to None.

                                  In Ruby it would be the difference between

                                  if value
                                  

                                  and

                                  if value.nil?
                                  
                                  1. 1

                                    Mostly. Except that non-nil values can define nil? as true in Ruby ;)

                                    1. 2

                                      Though in practice people don’t actually do that. Or rather I’ve never come across it in the wild in the last 10+ years of Ruby.

                                    2. 1

                                      It also catches 0, [], {}, etc., none of which are None. Importantly, different objects can have different rules for when they evaluate to False. I was bitten by this at work.

                                      1. 2

                                        There’s also this related bug, which is a fascinating look into the perils of both implementing rules of thumb without thinking about them, and not thinking sufficiently about how time works.

                                  2. [Comment removed by author]

                                    1. 3

                                      if I knew you wrote that article I wouldn’t have made that comment.

                                      You’re not the only person to make this comment, so I appreciate the response. This is like the rabbit/duck image. I didn’t even realize there was another way of reading the title.

                                      I used to work phone support at my first job, and we were always trained to “under promise, over deliver”. It doesn’t do me any good to make you think I did something AMAZING when, once you click the post, it was really just okay. So it’s good feedback that the title is ambiguous.

                                      I tried to workshop the title a bit. The best I came up with is “Lifelong Rubyist makes a Python script 5x Faster”, which still seems equally problematic, though.

                                      I added a note near the beginning to hopefully clarify things.

                                      Several people have mentioned they thought the post was about me making the Python interpreter faster. That would have been pretty dang cool, but is not the case. This article is about how to write faster code using an interpreted language.

                                      1. 2

                                        I think the title is fine. It says “some Python code”, not “all Python code”.

                                1. 3

                                  Love this story. I think it will go down in history right up there with the “500 mile email”

                                  1. 1

                                    I had figured Heroku users are in a better position than most as far as this goes.

                                    Meta: You can mark yourself as the author of a submission. Unsure if you just missed that here (no big deal)

                                    1. 1

                                      You can mark yourself as the author of a submission

                                      Sorry, thought I did. Thanks for the reminder.

                                    1. 1

                                      Funny. I never saw the n in your name before.

                                      1. 1

                                        Totally, it’s hidden in a sea of consonants.

                                        1. 1

                                          Probably a good idea adding a pronunciation even though yours was easy to do phonetically. I think the people with Indian names operating in America could use this. I’ve seen others in Nordic or European countries that I’d need help with but we encounter Indians more often in tech.

                                          “Snowman.” Is that hardcore like The Iceman, a person built of snow, or a family coming from a cold climate? Or something else entirely?

                                      1. 8

                                        I think the takeaways here are a) don’t confuse all kinds of errors from an HTTP request with invalid tokens (I’m not familiar with the GitHub API, but I suppose it correctly returns 401 Unauthorized) and b) don’t delete important data; flag it somehow instead.

                                        1. 5

                                          It returns a 404 which is a bit annoying since if you fat finger your URL you’ll get the same response as if a token doesn’t exist.

                                          https://developer.github.com/v3/oauth_authorizations/#check-an-authorization

                                          Invalid tokens will return 404 NOT FOUND

                                          I’ve since moved to a pattern of wrapping all external requests in objects whose state we can check explicitly, instead of relying on native exceptions raised by the underlying HTTP libraries. It makes things like checking for an explicit status code on a non-200 response easier.

                                          I might write on that pattern in the future. Here’s the initial issue with some more links https://github.com/codetriage/codetriage/issues/578
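A minimal Python sketch of the wrap-every-external-request pattern, using only the standard library's `urllib`. The `Response` fields and the `fetch` name are assumptions (the project in question is Ruby, and its actual classes may look nothing like this): HTTP errors and network failures are captured as state instead of escaping as exceptions, so the caller can tell "the server said 404" apart from "we never got an answer".

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """Outcome of an external request; callers inspect it instead of catching exceptions."""

    status: Optional[int]  # None means no HTTP response arrived at all
    body: bytes = b""
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.status is not None and 200 <= self.status < 300

    @property
    def network_error(self) -> bool:
        return self.status is None


def fetch(url: str, timeout: float = 5.0) -> Response:
    """GET `url`, turning HTTP errors and network failures into Response state."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return Response(status=resp.status, body=resp.read())
    except urllib.error.HTTPError as e:  # the server answered, with a 4xx or 5xx
        return Response(status=e.code, body=e.read(), error=str(e))
    except OSError as e:  # DNS failure, refused connection, timeout (URLError is an OSError)
        return Response(status=None, error=str(e))
```

A caller can then branch explicitly on `resp.ok`, `resp.status`, or `resp.network_error`.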

                                          1. 3

                                            Why not try to get issues, and if it fails with a 401, you know the token is bad? You can double check with the auth_is_valid method you’re using now…

                                            1. 2

                                              That’s a valid strategy.

                                              Edit: I like it, I think this is the most technically correct way to move forwards.

                                            2. 1

                                              Did the Github API return a 404 Not Found instead of a 5xx during the outage?

                                              1. 1

                                                No clue.

                                                1. 1

                                                  Then there’s your problem. Your request class throws RequestError on every non-2xx response, and auth_is_valid? thinks any RequestError means the token is invalid. In reality you should only take 4xx responses to mean the token is invalid – not 5xx responses, network layer errors, etc.
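A sketch of the classification described above, assuming a hypothetical `tokenStateFromStatus` helper. Returning three states rather than a boolean is what keeps a 5xx or a timeout from being read as "invalid". Treating 429 (rate limiting) as unknown is an extra assumption on top of the comment's "4xx means invalid" rule.

```javascript
// A tri-state result: a boolean would force "couldn't tell" (outage, timeout)
// to be reported as either valid or invalid.
const TokenState = Object.freeze({ VALID: "valid", INVALID: "invalid", UNKNOWN: "unknown" });

// `status` is the HTTP status of the token check, or null when the request
// failed below HTTP (timeout, DNS failure, connection reset).
function tokenStateFromStatus(status) {
  if (status === null) return TokenState.UNKNOWN;
  if (status >= 200 && status < 300) return TokenState.VALID;
  if (status === 429) return TokenState.UNKNOWN; // rate limited: says nothing about the token
  if (status >= 400 && status < 500) return TokenState.INVALID; // e.g. the 404 GitHub documents for invalid tokens
  return TokenState.UNKNOWN; // 5xx and anything else: check again later
}
```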

                                                  1. 1

                                                    Yep, that’s what OP in the thread said. I mention it in the post as well.

                                            3. 2

                                              I think the takeaway is that programmers are stupid.

                                              Programs shouldn’t delete/update anything, only insert. Views/triggers can update reconciled views so that if there’s a problem in the program (2) you can simply fix it and re-run the procedure.

                                              If you do it this way, you can also get an audit trail for free.

                                              If you do it this way, you can also scale horizontally for free if you can survive a certain amount of split-brain.

                                              If you do it this way, you can also scale vertically cheaply, because inserts can be sharded/distributed.

                                              If you don’t do it this way – this way which is obviously less work, faster and simpler and better engineered in every way, then you should know it’s because you don’t know how to solve this basic CRUD problem.

                                              Of course, the stupid programmer responds with some kind of made up justification, like saving disk space in an era where disk is basically free, or enterprise, or maybe this is something to do with unit tests or some other garbage. I’ve even heard a stupid programmer defend this crap because the unit tests need to be idempotent, and all I can think is this fucking nerd ate a dictionary and is taking it out on me.

                                              I mean, look: I get it, everyone is stupid about something, but to believe that this is a specific, critical problem like having to do with 503 errors instead of a systemic chronic problem that boils down to a failure to actually think really makes it hard to discuss the kinds of solutions that might actually help.

                                              With a 503 error, the solution is “try harder” or “create extra update columns” or whatever. But we can’t try harder all the time, so there’ll always be mistakes. Is this inevitable? Can business truly not figure out when software is going to be done?

                                              On the other hand, if we’re just too fucking stupid to program, maybe we can work on trying to protect ourselves from ourselves. Write-only data is a massive part of my mantra, and I’m not so arrogant as to pretend it’s always been that way, but I know the only reason I do it is because I deleted a shit-tonne of customer data by accident and had the insight that I’m a fucking idiot.
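A minimal sketch of the insert-only idea, assuming an in-memory array standing in for the events table and a JavaScript function standing in for the database views/triggers the comment mentions; the event names are hypothetical. The log doubles as the audit trail, and the reconciled view can simply be rebuilt after fixing a bug in it.

```javascript
// Append-only event log: rows are only ever pushed, never updated or deleted.
const events = [];

function record(type, userId, data = {}) {
  events.push({ seq: events.length + 1, type, userId, data, at: Date.now() });
}

// The reconciled view is derived by replaying the log. If this logic has a
// bug, fix it and rebuild; the underlying facts are untouched.
function currentUsers(log) {
  const view = new Map();
  for (const e of log) {
    if (e.type === "user_created" || e.type === "user_updated") {
      view.set(e.userId, { ...view.get(e.userId), ...e.data });
    } else if (e.type === "user_deactivated") {
      view.delete(e.userId); // removed from the view only; the events remain
    }
  }
  return view;
}
```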

                                              1. 4

                                                I agree with the general sentiment. It took me about three read-throughs to parse past all the “fucks” and “stupids”. I think there’s perhaps a more positive and less hyperbolic way to frame this.

                                                Append-only data is a good option, and basically what I ended up doing in this case. It pays to know what data is critical and what isn’t. I referenced acts_as_paranoid, and it pretty much does what you’re talking about: it makes a table append-only, and when you modify a record it saves an older copy of that record. Tables can get HUGE, like really huge, as in the largest tables I’ve ever heard of.

                                                /u/kyrias pointed out that large tables have a number of downsides, such as making maintenance and backups harder.
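The "save an older copy on every change" behaviour described above can be sketched as a generic versioned store. This is a hypothetical in-memory illustration of the technique, not the API of acts_as_paranoid or any other gem. It also shows why such tables grow quickly: every update adds a row.

```javascript
// Hypothetical versioned store: updates append a new snapshot instead of
// overwriting, so every earlier state of a record remains readable.
class VersionedStore {
  constructor() {
    this.versions = new Map(); // id -> array of frozen snapshots, oldest first
  }
  save(id, attrs) {
    const history = this.versions.get(id) ?? [];
    const previous = history.length ? history[history.length - 1] : {};
    history.push(Object.freeze({ ...previous, ...attrs, version: history.length + 1 }));
    this.versions.set(id, history);
  }
  current(id) {
    const history = this.versions.get(id);
    return history ? history[history.length - 1] : undefined;
  }
  history(id) {
    return [...(this.versions.get(id) ?? [])];
  }
}
```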

                                                1. 2

                                                  You can do periodic data warehousing, though, to keep the tables as arbitrarily small as you’d like, but that introduces the possibility of programmer error when doing the warehousing. It’s an easier problem to solve than making sure every destructive write is correct in every scenario, though.
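A sketch of one such archiving pass, assuming rows are plain objects with a unique `id` and a numeric `at` timestamp (hypothetical names). To limit the damage a programmer error can do, the copy is checked before anything leaves the hot table, and the hot table is replaced rather than edited in place.

```javascript
// Hypothetical archiving pass over in-memory "tables" (arrays of rows).
function archiveOlderThan(hot, archive, cutoff) {
  const toMove = hot.filter((row) => row.at < cutoff);
  for (const row of toMove) archive.push(row);

  // Check the copy before anything leaves the hot table. Trivial in memory,
  // but this is the step that catches mistakes when the archive is a
  // separate database or file.
  const archivedIds = new Set(archive.map((row) => row.id));
  if (!toMove.every((row) => archivedIds.has(row.id))) {
    throw new Error("archive copy incomplete; hot table left untouched");
  }

  // `hot` itself is never mutated; the caller swaps in this smaller array.
  return hot.filter((row) => row.at >= cutoff);
}
```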

                                                  1. 1

                                                    Tables can get HUGE, like really huge, as in the largest tables I’ve ever heard of

                                                    I have tables with trillions of rows in them, and while I don’t use MySQL most of the time, even MySQL can cope with that.

                                                    Some people try to do indexes, or they read a blog that told them to 1NF everything, and this gets them nowhere fast, so they’ll think it’s impossible to have multi-trillion-row tables, but if we instead invert our thinking and assume we have the wrong architecture, maybe we can find a better one.

                                                    /u/kyrias pointed out that large tables have a number of downsides, such as making maintenance and backups harder.

                                                    And as I responded: /u/kyrias probably has the wrong architecture.

                                                  2. 2

                                                    Of course, the stupid programmer responds with some kind of made up justification, like saving disk space in an era where disk is basically free

                                                    It’s not just about storage costs, though. For instance, at $WORK we have backups for all our databases, but if for some reason we needed to restore the biggest one from a backup, it would take days during which all our user-facing systems would be down, which would be catastrophic for the company.

                                                    1. 1

                                                      You must have the wrong architecture:

                                                      I fill about 3.5 TB of data every day, and it absolutely would not take days to recover my backups (I have to test this periodically due to audit).

                                                      Without knowing what you’re doing I can’t say, but something I might do differently: Insert-only data means it’s trivial to replicate my data into multiple (even geographically disparate) hot-hot systems.

                                                      If you do insert-only data from multiple split brains, it’s usually possible to get hot/cold easily, with the risk of losing (perhaps only temporarily) a few minutes of data in the event of catastrophe.
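Why insert-only data makes replication easy can be sketched as a merge of per-replica logs, assuming each event carries a globally unique string `id` (for example a UUID) and a numeric `at` timestamp; the function name is hypothetical. Since no event is ever modified, replicas can only disagree about which events they hold, never about an event's contents.

```javascript
// Merge append-only logs from several replicas. Because events are never
// updated in place and ids are globally unique (an assumption), merging is
// a set union: no conflict resolution is needed.
function mergeLogs(...logs) {
  const byId = new Map();
  for (const log of logs) {
    for (const event of log) {
      if (!byId.has(event.id)) byId.set(event.id, event);
    }
  }
  // Sort by timestamp, breaking ties by id, so every node that merges the
  // same inputs produces the same order regardless of argument order.
  return [...byId.values()].sort((a, b) => {
    if (a.at !== b.at) return a.at - b.at;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
}
```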

                                                    2. 0

                                                      Unfortunately, if you hold any EU user data and want to stay compliant, you have to perform an actual delete when an EU user asks you to delete their data. I like the idea of the persistence layer being an event log from which you construct views as necessary. I’ve heard that it’s possible to use this for almost everything: store an association from a random id to a person, and then just delete that association when asked, in order to be compliant. I haven’t actually looked into that carefully myself, though.
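The random-id idea might look like the sketch below. `PseudonymousLog` and its methods are hypothetical, and whether this satisfies GDPR is exactly what the comment leaves open; it only shows the mechanism: events carry a pseudonym, and forgetting a person deletes the mapping rather than the events.

```javascript
// Math.random is not cryptographically strong; a real system would use
// something like crypto.randomUUID() where available.
const newPseudonym = () => Math.random().toString(36).slice(2) + Date.now().toString(36);

// Hypothetical pseudonymisation layer: events reference a random id, and the
// only link from that id to a real person lives in the two deletable maps.
class PseudonymousLog {
  constructor() {
    this.identities = new Map(); // pseudonym -> personal details (deletable)
    this.pseudonymOf = new Map(); // email -> pseudonym (deletable)
    this.events = []; // append-only; contains pseudonyms only
  }
  register(email, details) {
    const id = newPseudonym();
    this.identities.set(id, { email, ...details });
    this.pseudonymOf.set(email, id);
    return id;
  }
  record(email, type) {
    this.events.push({ subject: this.pseudonymOf.get(email), type });
  }
  // Delete only the association; events stay intact but can no longer be
  // tied back to the person through this mapping.
  forget(email) {
    const id = this.pseudonymOf.get(email);
    this.pseudonymOf.delete(email);
    if (id !== undefined) this.identities.delete(id);
  }
  whoIs(pseudonym) {
    return this.identities.get(pseudonym);
  }
}
```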

                                                      1. 2

                                                        That’s not true. The ICO recognises there are technological reasons why “actual deletion” might not be performed (see page 4). Having a flag that blinds the business from using the data is sufficient.

                                                        1. 1

                                                          Very cool. Thank you for sharing that. I was under the misconception that having someone in the company being capable of obtaining the data was sufficient to be a violation. It looks like the condition to be compliant is weaker than that.

                                                          1. 2

                                                            No problem. A big part of my day is GDPR-related at the moment, so I’m unexpectedly versed with this stuff.

                                                      2. 0

                                                        There’s actually a database out there that enforces the never-delete approach (together with some other very nice paradigms/features). Sadly it isn’t open source:

                                                        http://www.datomic.com/

                                                    1. 0

                                                      People who use lobsters are entrepreneurs. The intersection of marketing and tech has always interested me. If you’re just getting started and need traffic to your new tech, you might be tempted to try Twitter marketing. This post gives you an idea of what your numbers might look like.

                                                      1. 6

                                                        People who use lobsters are entrepreneurs.

                                                        You’re thinking of barnacl.es.

                                                        1. 3

                                                          Hey, thanks! I honestly didn’t even know that existed. I really appreciate it when people take the time to explain to me why they’re downvoting. Appreciate your time and comment ❤️

                                                          Update: It’s been posted here https://barnacl.es/s/ywvodu/i_spent_50_on_twitter_ads_so_you_dont_have

                                                          1. 4

                                                            No problem. It’s important to give good feedback on flags. :)

                                                            1. 3

                                                              That’s one of the things I really value about lobsters, how civil people are about this kind of thing.

                                                      1. 1

                                                        I tend to always use subqueries. I like them because I can write them incrementally, i.e. write my subquery, make sure it works, then drop it inside another query, make sure that works… repeat until done.

                                                        I certainly understand joins and use them, but I’m not as fast at writing them. I’m curious about the opposite problem: when is it much slower to use a subquery than a join?

                                                        1. 1

                                                          Don’t quote me on this but my very vague memory is that MySQL’s query planner used to be(*) famously not quite as good as you’d hope at spotting the equivalence between SELECT x FROM t1 WHERE y IN (SELECT z FROM t2 WHERE…) and SELECT x FROM t1 JOIN t2 ON y = z.

                                                          (* I have no idea if it does better now, no idea about what the state of any of the forks like Maria or Percona is).

                                                        1. 1

                                                          Okay, so the video is cheesy but the tech is good. Why downvoted?

                                                          1. 1

                                                            Not from me, but I suspect it’s due to a Kickstarter with an ‘open source if we reach our goals’ knife to the throat.

                                                            A non-Kickstarter option is to take some Adafruit NeoPixels, a recycled USB charger or two, a Wi-Fi module, a pull-up resistor and a capacitor, hook them up to one of the many Arduinos lying around, and reuse the code from one of the many hackaday.com projects or something like playing with LEDs.

                                                          1. 1

                                                            How much data is being sent per issue? If I divide “message size a client should handle” by 473, I get a really big number.

                                                            1. 2

                                                              It’s actually double that amount, because I’m sending both HTML and plain-text versions of each email, so the data is duplicated.

                                                              A message with 22 issues is about 36 KB on disk.

                                                              So 36 KB / 22 ≈ 1.6 KB per issue (not exact, but a fair approximation).

                                                              1.6 KB × 473 ≈ 757 KB, which actually doesn’t sound like a system-crashing email. Maybe it’s not the size but rather the layout engine trying to render all the text, or something. ¯\_(ツ)_/¯

                                                            1. 4

                                                              I like this article but think it would be better named “A history of RoR” or something.

                                                              1. 2

                                                                This is cool. I almost think you would need a really high level component that is constantly evaluating a large scale strategy and then executing on it.

                                                                On the micro level I think it shouldn’t be too hard to beat a human player, as a computer can multitask way better and control each individual unit. But on macro, humans can do really surprising things in this game that you can’t really scout for.

                                                                I would love to hear more about how some of the built-in AIs are programmed. I remember one called “Green Tea” that was really challenging.

                                                                1. 2

                                                                  You’d be surprised what AIs do on micro when people throw them curveballs. In one competition, an AI based its decision to attack or retreat on a weighting of how threatening the attackers were, individually or as a group, against its own units, individually or as a group. It computed this very locally, per unit or maybe per physical area. The human pro recognized this and exploited it: the player repeatedly threw a group against individual enemy units in a way that triggered a retreat, even though the attackers were weak and the defenders had a significant advantage. The result was that the defenders’ micro-strategy of fleeing led them to defeat by the classic Sun Tzu strategy of fragmenting the stronger enemy into weaker pieces that one’s own weak force can destroy fragment by fragment.

                                                                  That’s just one weakness of the mathematical approach to micro that supposedly gives AIs an advantage over humans. I’ve seen AIs pull off amazing feats that humans probably can’t do in general. There’s just no guarantee it will work when humans are probing their “thinking.”
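The weakness can be reproduced with a toy version of such a rule, assuming 1-D positions, a single `strength` number per unit, and a fixed radius (all hypothetical simplifications, not the competition AI's actual logic). Each unit compares local enemy strength with local friendly strength, so a weak but concentrated attacker can make a stronger, spread-out defender retreat.

```javascript
// Sum the strength of all units within `radius` of position x.
function localStrength(units, x, radius) {
  return units
    .filter((u) => Math.abs(u.x - x) <= radius)
    .reduce((sum, u) => sum + u.strength, 0);
}

// Per-unit rule: retreat if nearby enemies outweigh nearby allies.
// `allies` includes the unit itself. The global balance is never consulted.
function decideLocally(unit, allies, enemies, radius = 5) {
  const threat = localStrength(enemies, unit.x, radius);
  const support = localStrength(allies, unit.x, radius);
  return threat > support ? "retreat" : "hold";
}
```

For example, with three defenders of strength 3 at x = 0, 20 and 40 and two attackers of strength 2 at x = 1 and 2, the defenders total 9 against 4, yet the defender at x = 0 sees 4 against its own 3 and retreats.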

                                                                  1. 2

                                                                    I see that as a “macro” failure. I.e. if you don’t realize your army is bigger or your unit composition is bad that’s a macro problem. Which is kinda what I was saying before. Attack/retreat should be a macro decision, but when exactly to pull your marine back so their health is exactly 1hp or juggling marines with medivac etc. could be a huge advantage. They just need to get the macro play there (which is also the harder part for a computer).

                                                                    1. 1

                                                                      Well, yeah, it is a macro failure, but not to that AI. A lot of the AIs, due to traditional thinking, have multiple layers, each looking at different things. The micro engines are champions at micro battles, so the thinking is to let them decide when to attack or retreat at the unit level unless the strategy layer gives them different orders. That usually works better than it did in this case. Where to draw the lines, and how each layer should override the other, is an open topic with many combinations to consider.
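One way to arrange the layers described above, as a sketch: a hypothetical `resolveAction` lets a strategy-layer order override the per-unit micro decision and falls back to the micro rule when there is no order. Where to draw that line is the open question the comment raises; this shows only the simplest "strategy always wins" policy.

```javascript
// Micro layer: a purely local rule, as in the example from the competition.
function microDecision(threat, support) {
  return threat > support ? "retreat" : "attack";
}

// `strategyOrders` maps unit id -> order issued by the strategy layer,
// e.g. "attack" when the army as a whole is known to be stronger.
function resolveAction(unit, threat, support, strategyOrders) {
  const order = strategyOrders.get(unit.id);
  return order ?? microDecision(threat, support); // an order overrides micro
}
```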

                                                                1. 2

                                                                  Does “frontend” now universally mean web (or at least web technologies)? I’m still somewhat surprised by that assertion.

                                                                  1. 1

                                                                    I don’t think it’s exclusive. I’m a web programmer so “frontend” for me generally means web.

                                                                  1. 2

                                                                    This is a great article. I picked up a few tips and I’ll probably refer back to it when diving back into frontend work.

                                                                    When I got to your forEach code, I vaguely remembered something about having to turn lists of nodes into an array with Array.prototype.slice, so I tried running the example code you wrote. Turns out you’re missing a closing bracket on the .forEach call, which I discovered by copying and pasting the example without inspecting it.

                                                                    As a side note, it appears to be getElementsByClassName that returns a collection which does not respond to array methods (an HTMLCollection rather than a NodeList):

                                                                    document.getElementsByClassName('foo').forEach((el) => {console.log(el)})
                                                                    // VM284:1 Uncaught TypeError: document.getElementsByClassName(...).forEach is not a function
                                                                        at <anonymous>:1:46
                                                                    
                                                                    Array.prototype.slice.call(
                                                                      document.getElementsByClassName('foo')
                                                                    ).forEach((el) => {console.log(el)})
                                                                    // ...list of elements
                                                                    

                                                                    EDIT: apparently you can also use [...nodeList] and Array.from(nodeList) in ES6, according to these answers: https://stackoverflow.com/questions/3199588/fastest-way-to-convert-javascript-nodelist-to-array
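Since getElementsByClassName returns an HTMLCollection (no forEach) while querySelectorAll returns a NodeList, a small helper that converts first works on either. A sketch, with `mapElements` as a hypothetical name:

```javascript
// Array.from accepts any iterable or array-like (indexed entries + length),
// so it turns a NodeList or an HTMLCollection into an ordinary Array.
function mapElements(collection, fn) {
  return Array.from(collection).map(fn);
}

// In a browser, these all produce an ordinary Array:
//   Array.from(document.getElementsByClassName("foo"));
//   [...document.getElementsByClassName("foo")]; // spread needs an iterable
//   Array.prototype.slice.call(document.getElementsByClassName("foo"));
```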

                                                                    1. 2

                                                                      Thanks! I updated the code to add the parens I was missing:

                                                                      var divList = document.querySelectorAll("div");
                                                                      divList.forEach(function(elem) {
                                                                        console.log(elem);
                                                                      });
                                                                      
                                                                      1. 2

                                                                        Huh, I thought querySelectorAll also returned NodeLists instead of arrays!! o_0

                                                                        Anyway, there’s the for…of syntax in ES6 now and it works with NodeLists just fine.
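A sketch of the for…of approach, with `collectText` as a hypothetical helper. NodeList and HTMLCollection are both iterable in current browsers, so the loop needs no conversion; outside a browser, any iterable of objects with a `textContent` property works, which is how it can be tried without a DOM.

```javascript
// for...of works on any iterable, including NodeList and HTMLCollection.
function collectText(nodes) {
  const texts = [];
  for (const node of nodes) {
    texts.push(node.textContent);
  }
  return texts;
}

// Browser usage:
//   collectText(document.querySelectorAll("div"));
//   collectText(document.getElementsByClassName("foo"));
```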

                                                                        1. 1

                                                                          JavaScript is nothing if not consistent, right?