1. 3

    About six years ago I worked in a solar physics lab. Two of the scientists worked with CUDA, but the majority of scientists were actively developing Fortran 95 codebases, mostly for image processing. Most of what locked people into this were the libraries that had been authored in the 90’s that were still being shared (via email or HTTP links) around the solar physics community. These were algorithms, image reading/writing, etc.

    1. 1

      Hm, I used to convert web logs to JSON records – one per line – and then use grep as a pre-filter! It can discard 90% or 99% of the lines up front, and then you parse the JSON on what remains to apply the exact filter.

      grep is amazingly fast! This seems like the same idea taken a little further. I’ll have to look at how they do it in more detail.
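      The idea can be sketched in a few lines of Python (the log lines and field names here are made up for illustration): a cheap substring check – essentially what grep does – throws away most lines before you pay for a full JSON parse.

```python
import json

# Hypothetical newline-delimited JSON log records.
lines = [
    '{"path": "/index.html", "status": 200}',
    '{"path": "/robots.txt", "status": 404}',
    '{"path": "/about.html", "status": 200}',
]

# Cheap pre-filter: a raw substring test discards most lines without parsing.
candidates = [line for line in lines if '"status": 404' in line]

# Exact filter: parse JSON only for the survivors.
errors = [rec for rec in (json.loads(l) for l in candidates)
          if rec["status"] == 404]
```

      The pre-filter can over-approximate (a substring match in the wrong field would slip through), but it can never miss, since every candidate is re-checked by the exact parse.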

      1. 2

        Section 7.2 of the paper actually uses grep/ripgrep as a basis of comparison. It seems the two have the same or better performance in general, though Sparser still wins out by a small margin for the most selective queries.

        1. 2

          Yes. Always use grep first, even if one awk would do. This one-purpose-only tool really cuts the time down, especially when you want to work on fewer than 100 million lines out of a billion.

        1. 3

          Very cool - I really like lower-power equipment like this. However, I think it’s a terrible idea for security, since it looks like it’s an unencrypted video stream, which would make eavesdropping trivial.

          1. 6

            I’d hesitate to call it eavesdropping if you have to get within 8 ft of the camera to do it

            1. 3

              It’s sort of the digital equivalent of corner mirrors in stores and on streets.

              1. 0

                Well that just about kills most of the uses for this product :/

                1. 3

                  No? It’s perfect for home security cameras or on-body cameras.

                  Sign me up.

              1. 5

                I’m really amused they generate a C file and call out to gcc / clang. I wonder if they plan to move away from that strategy.

                1. 3

                  This is a draft implementation of the concept, so probably yes.

                  1. 2

                    What would be gained by moving away from that?

                    1. 6

                      For one, no runtime dependency on a C compiler.

                      C compilers are also fairly expensive to run, compared to other more targeted JIT strategies. And it’s more difficult to make the JIT code work nicely with the regular uncompiled VM code.

                      Take LuaJIT. It starts by compiling the Lua code to VM bytecode. Then instead of interpreting the bytecode, it “compiles” the bytecode into native machine code that calls the interpreter functions that would be called by a loop { switch (opcode) { ... } }. That way when the JIT compiles a hot path, it directly encodes all entry points as jumps directly into the optimized code, and all exit conditions as jumps directly back to the interpreter code.
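                      That loop { switch (opcode) { ... } } dispatch shape can be made concrete with a toy interpreter (Python purely for illustration; the opcodes and program are invented, and real LuaJIT is vastly more sophisticated):

```python
# A minimal bytecode interpreter using the classic dispatch loop:
# fetch an opcode, switch on it, run the handler, repeat.
PUSH, ADD, HALT = 0, 1, 2

def run(bytecode):
    stack, pc = [], 0
    while True:                         # loop {
        op = bytecode[pc]
        if op == PUSH:                  #   switch (opcode) { ...
            stack.append(bytecode[pc + 1])
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == HALT:
            return stack.pop()

result = run([PUSH, 2, PUSH, 3, ADD, HALT])  # computes 2 + 3
```

                      A JIT in the style described above replaces hot stretches of this loop with native code whose exits jump straight back to the dispatch point, rather than returning wholesale.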

                      Compare this to an external compiled object, which can only exit wholesale, leaving the VM to clean up and figure out the next step. A fully external object—C compiled or not—can’t thread into the rest of the execution, so its scope is limited to pretty isolated functions that only call similarly isolated functions, or functions with very consistent output.

                      1. 2

                        Compare this to an external compiled object, which can only exit wholesale, leaving the VM to clean up and figure out the next step. A fully external object—C compiled or not—can’t thread into the rest of the execution, so its scope is limited to pretty isolated functions that only call similarly isolated functions, or functions with very consistent output.

                        This doesn’t seem to be related to the approach Ruby is taking, though? They’re calling out to the compiler to build a shared library, and then dynamically linking it in. There shouldn’t be anything stopping the code in the shared object from calling back into the rest of the Ruby runtime.

                        1. 2

                          Right, it can use the Ruby runtime, but it can’t jump directly to a specific location in the VM bytecode. It has to call a function that can execute for it, and will return back into the compiled code when that execution finishes. It’s very limited, compared to all types of code being able to jump between each other at any time.

                      2. 4

                        exec’ing a new process each time probably gets expensive.

                        1. 3

                          Typically any kind of JIT startup cost is quite expensive, but as long as you JIT the right code, the cost of exec’ing clang over the life of a long running process should amortize out to basically nothing.

                          I’d expect that the bare exec cost would only become a significant factor if you were stuck flip-flopping between JITing a code section and deoptimizing it, and at that point you’d gain more from improving your JIT candidate heuristics rather than folding the machine code generator in process and continuing to let it flip-flop.

                          There are other reasons they may want to move away from this C-intermediary approach, but exec cost doesn’t strike me as one of them.

                    1. 1

                      A bit orthogonal but still related to the overall theme: are there enough performance/portability benefits of JITs to continue running server applications targeting bytecode platforms? JVM was originally built for safely running web applications, and the CLR seems to exist to allow for portability across the wide range of hardware running Windows.

                      Are fancy VMs and tiered JITs necessary in a world where we can cross-compile to most platforms we’d want to run on? Languages/runtimes like Go and Haskell have backends that target a wide range of architectures, and there’s no need to get intimately familiar with things like the JVM instruction set and how to write “JIT friendly code”.

                      1. 2

                        IBM i (née OS/400) has an interesting solution where software is delivered compiled for a virtual machine, then compiled from virtual-machine bytecode to native code on installation. I would like to see that model expand to other platforms as well.

                        1. 2

                          OS/400 is just…so different in so many ways. So many really interesting ideas, but it’s very different from just about every other operating system out there.

                          I wish there were a way I could run it at home, just as a hobbyist.

                      1. 1

                        Ok, but does this matter? Even a 50% speedup won’t matter much unless a program spends a significant portion of its time doing hash calculations on arrays, which seems unlikely (at least, unusual). I’m not too surprised the Java devs haven’t been too bothered about this.

                        1. 3

                          Performance is death by a thousand cuts. Good performance, performance so good you don’t have to bother optimizing your app, is often the result of thousands of people who sweat the details and agonize over nanoseconds and stalls caused by data dependence. People like the author.

                          1. 2

                            I would say that it’s not a problem - until it is. It’s good to have this kind of information available, but even the tone of the article is that one should avoid prematurely optimizing, since changes in the JVM are likely to render those optimizations moot at some point.

                            1. 2

                              I didn’t take that away. There are a few tidbits that are fundamental to Hotspot and probably aren’t going to change anytime soon. One is that many small methods are better than a few large ones, as the JIT works best on smaller blocks of code and does aggressive inlining, so hot methods have no call overhead anyway. Also, final variables help with things like constant propagation. Things related to GC performance are also worth over-fitting for, as it takes something like a decade for new GCs to get into the JDK.

                              If anything, I took away from the article that there’s enough time between JDK releases that micro-optimizations you implement now have a shelf-life of years, not months.

                          1. 3

                            Man, it’s a shame that computers hosted in dorm rooms are the only places code can be deployed.

                            1. 2

                              Only if you’re a student and wanna do it for free.

                              1. 1

                                TIL Google, Facebook and AWS are all run out of dorm rooms.

                                1. 1

                                  Yeah yeah, the possibility of running services from your dorm room was a lot more compelling in the days before $5/mo VMs. Also I am guessing that today’s MIT students have far fewer desktop computers than they used to.

                                  Still, it’s one more way that the MIT spirit of innovation is nerfed for the current generation.

                                  (Disclaimer: I ran services on MITnet for about 8 years, of which I was a student for 2.5)

                                  1. 1

                                    First of all, by doing the whole thing yourself, you learn much more than by simply pushing a couple of buttons.

                                    We, the die-hard UNIX community, used to laugh at the Windows folks for pushing buttons with the mouse. Look what we’ve now become — no one really knows how shit actually works anymore without clicking those damn buttons.

                                    1. 1

                                      I’d argue that it’s now generally a bad idea to deploy your code as we did in the old days, by custom typing commands into a single machine’s terminal window. While it’s a good idea to know what each individual machine can do, the (good part of the) industry doesn’t really work on that model any more for externally-facing internet services.

                                  1. 3

                                    We already have the java tag, for the moment.

                                    Also, when requesting a new tag, it’s usually best to list relevant submissions in support of an assertion that it’d help.

                                    1. 15

                                      We already have the java tag, for the moment.

                                      Right, but Java is not Kotlin. It’s a different language.

                                      Also, when requesting a new tag, it’s usually best to list relevant submissions in support of an assertion that it’d help.

                                      I’ve updated my original post with some relevant submissions.

                                      1. 5

                                        There’s a scala tag, just saying. Though usually Scala posts are different enough from normal Java-ecosystem posts that they do merit their own tag, not sure if the same can be said for Kotlin.

                                        1. 3

                                          This is a great qualifier for passing the tag, I think. Either that or remove Scala and Java and just use JVM to encapsulate them all.

                                          1. 6

                                            I like the JVM tag idea since dotnet languages don’t get their own tags, it would be more consistent. Either that or add C# and F# tags.

                                      1. 5

                                        While the Internet is running out of addresses overall, MIT actually has a large surplus. We hold a block of 16 million IPv4 addresses. Fourteen million of these IPv4 addresses have not been used, and we have concluded that at least eight million are excess and can be sold without impacting our current or future needs.

                                        MIT was pretty much the only institution involved in the early Internet to hold onto their entire /8 block even in the face of IPv4 exhaustion. Stanford gave theirs back 18 years ago. UC Berkeley never even held one.

                                        Would it not be acceptable for them to gift back their block to ARIN in exchange for another (smaller) set of allocations, like what other organizations did when exhaustion started to hit? Right now it seems to me like they were just being greedy knowing they could exploit their outsized allocation for profit later.

                                        1. 5

                                          But selling it will, in a way, help with the move to IPv6. As long as IPv4 addresses can be had for no additional cost people are likely to delay switching. Once the price starts to rise, (more) people will start to switch. Gifting their allocation back to ARIN would have just temporarily held down the price. Selling it (particularly if they drive a hard bargain) actually helps to raise the price and push people off of IPv4.

                                          At the same time, there’s no reason it can’t be a dick move AND weirdly helpful… :-)

                                          1. 2

                                            They could have held onto them, removing the addresses from the pool so that we get to IPv4 exhaustion quicker.

                                            1. 1

                                              Yeah, that might have been even better.

                                          2. 1

                                            I believe they’ve sold a large portion of them (if not all of them) to Amazon. I guess it took Amazon so long to support IPv6 on AWS that they had to do something to get more addresses…

                                          1. 1

                                            I think some of Ullman’s points are fair: purely incremental papers (like what we often see from the AI community) can get a bit exhausting. However when it comes to systems research I tend to disagree: the field of computer systems is where practice differs the most from theory.

                                             While balanced BSTs and BTrees have the same asymptotic lookup cost, in practice BTrees often provide better spatial locality, and their large node allocations are less prone to allocator fragmentation than the many small per-node ones a BST needs, which led to Rust’s BTreeMap. In some cases, certain techniques are found to work better in practice with the theory following only later, such as the work around Hogwild! giving theoretical backing to why asynchronous SGD works well for machine learning tasks. These are the types of things you wouldn’t be able to see unless you tested your implementation on a real, functioning computer.

                                            1. 2

                                              The object type is a new type in 2.2 that matches any types except for primitive types. In other words, you can assign anything to the object type except for boolean, number, string, null, undefined, and symbol.

                                              Did TypeScript have non-nullable types by default before?

                                              1. 4

                                                The article has been revised since you asked your question. It now reads:

                                                In other words, you can assign anything to the object type except for string, boolean, number, symbol, and, when using strictNullChecks, null and undefined.

                                                  1. 2

                                                    “by default” :-)

                                                    1. 2

                                                      Sorry, I didn’t actually realize it wasn’t a default – I’ve always used it since it was in beta, so I guess I forgot.

                                                1. 2

                                                  Some of these complaints are very valid; while I want to love Scala as a language, some of the performance penalties you get are not worth it over trying to use a functional-style library in pure Java, cf. javaslang. Java 8 interfaces are, to my understanding, almost identical to traits, and the only thing really missing from Java is type inference (though there is a JEP for this at the local-variable level).

                                                  1. 2

                                                    In 2010 I chose Scala for a low-latency web service, and quickly became quite worried about the Scala collections library vs. Java’s util package. The API was impressive, but definitely complex, and at the end of the day I started to doubt the performance that was claimed (despite the Great Computer Language Shootout benchmarks). I didn’t use Scala after that, but the takeaway is that the niceties of a rich collections API are only worth it with zero-cost abstractions.

                                                  1. 2

                                                    Studying for a trio of midterm examinations at uni: one is a deep-learning for NLP course[1], the other is one on static analysis for compiler optimizations[2], and the third is a databases course[3].

                                                    1. 6

                                                      I’ve been using Neovim for about a year now and haven’t looked back. The development cycle is a lot quicker, all my favorite old Vim plugins still work, and there’s been a huge ecosystem of new plugins that take advantage of Neovim’s remote plugin architecture (check out Deoplete and Denite, or any of Shougo’s plugins).

                                                      I guess with the addition of packpath, you can no longer directly port your configs from vim -> neovim with zero changes. I’m curious if that was a conscious choice on Vim’s part to try and prevent people from leaving for greener pastures.

                                                      1. 3

                                                        I find it interesting that a lot of reasons have to do with type-level ambiguities. I know that when I first looked at Throwables.propagate, I also assumed it returned a (possibly) wrapped version of the same exception, and actually would often have blocks of the form

                                                        try {
                                                            // ...
                                                        } catch (IOException e) {
                                                            throw Throwables.propagate(e);
                                                        }
                                                        

                                                        In fact, this was standard style on some of the projects at my job this summer.

                                                        The more I’ve been playing with Rust, the more I’m starting to think that they get it the most correct of any language I’ve worked with. I see Result<T, E> as more or less the typed version of C’s -1 return values, with the added bonus of being pleasantly chainable in a way that propagates the error up without having to do so explicitly (i.e. map).
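                                                        That chaining style can be transliterated into a rough Python sketch (a toy Result class for illustration, not any real library’s API):

```python
# Toy version of Rust's Result: map applies a function on success,
# while an error value passes through every step untouched.
class Result:
    def __init__(self, ok=None, err=None):
        self.ok, self.err = ok, err

    def map(self, f):
        # Only transform the success value; errors propagate as-is.
        return Result(ok=f(self.ok)) if self.err is None else self

def parse_int(s):
    try:
        return Result(ok=int(s))
    except ValueError:
        return Result(err=f"not a number: {s!r}")

doubled = parse_int("21").map(lambda n: n * 2)   # success path
failed = parse_int("oops").map(lambda n: n * 2)  # error skips the map
```

                                                        The point is that the happy path reads as a straight pipeline, while the error travels up without an explicit check at each step.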

                                                        1. 8

                                                          Welcome to where the ML world has been for 40 years.

                                                          Checked exceptions made sense, sort of, when Java didn’t have generics. Errors as values are cumbersome when your language is bad at dealing with values, and the instinct to remove boilerplate and nudge people towards effective approaches is a good one. But in Java they’ve ended up as a second, parallel type system, and worse, they’ve broken the ability to express the result of function evaluation as a value, even in “expected/normal” code paths (because any code path where it makes sense to use a checked rather than unchecked exception is probably in some sense expected).

                                                        1. 9

                                                          Favorite Passage:

                                                          Chubby [Bur06] is Google’s lock service for loosely coupled distributed systems. In the global case, we distribute Chubby instances such that each replica is in a different geographical region. Over time, we found that the failures of the global instance of Chubby consistently generated service outages, many of which were visible to end users. As it turns out, true global Chubby outages are so infrequent that service owners began to add dependencies to Chubby assuming that it would never go down. Its high reliability provided a false sense of security because the services could not function appropriately when Chubby was unavailable, however rarely that occurred. The solution to this Chubby scenario is interesting: SRE makes sure that global Chubby meets, but does not significantly exceed, its service level objective. In any given quarter, if a true failure has not dropped availability below the target, a controlled outage will be synthesized by intentionally taking down the system. In this way, we are able to flush out unreasonable dependencies on Chubby shortly after they are added. Doing so forces service owners to reckon with the reality of distributed systems sooner rather than later.

                                                          1. 9

                                                            Link to provisional docs.

                                                            I’m really excited about this. The interface reminds me of Glide somewhat, which I’ve toyed with in the past and seems like the most mature of the unofficial dependency management options. This seems like a better option because (AFAICT) Glide has no support for pegging to a SemVer release (I could be wrong here).

                                                            With better support for dependencies on a per-project level, I’m curious if there will eventually be elimination of GOPATH. I can’t really think of a good reason for it still to exist, other than maybe for simple projects where you don’t want to have to bother with project-level dependencies.

                                                            1. 3

                                                              Here’s hoping that GOPATH dies a fiery death. The requirement, by default, to keep my code 4 layers beneath my home directory always seemed completely obnoxious to me. Obviously there are ways around the various inconveniences this causes, but it’s still irritating, particularly because it’s so incredibly unnecessary.

                                                              1. 5

                                                                My pet peeve with GOPATH is that I sometimes write libraries which support multiple languages. I want to use a single git repo with all the code in it but GOPATH forces me to either split up my archive or put GOPATH tendrils everywhere. It’s ugly. I just want to compile it in place like I can with any other language.

                                                                1. 2

                                                                  Not sure why it’s:

                                                                  • Specifically your home directory (surely you can put it anywhere)
                                                                  • Specifically four levels deep (I assume you refer to src/<domain>/<user>/<repo>)
                                                                  • Unnecessary (there are trade-offs, to be sure, but it’s not like there are no nice features it enables)

                                                                  The idea that something you dislike could not exist for a good reason is unhelpful in the extreme.

                                                                  1. 1

                                                                    Specifically your home directory (surely you can put it anywhere)

                                                                    Because I normally keep my code in my home directory. It makes no difference where it is, the point is that it’s four layers of directories I don’t need or want.

                                                                    Specifically four levels deep (I assume you refer to src/<domain>/<user>/<repo>)

                                                                    Yep, you assumed correctly. For me, this means ~/Go/src/<domain>/<user>/<repo> instead of ~/repo. I find this endlessly annoying.

                                                                    Unnecessary (there are trade-offs, to be sure, but it’s not like there are no nice features it enables)

                                                                    Like what? What does it provide that would be infeasible with a different, less restrictive directory layout? I’m genuinely curious because I’ve literally never been in a situation where I said to myself “Gosh, I’m sure glad Go imposes this weird directory structure on me.”

                                                                    The idea that something you dislike could not exist for a good reason…

                                                                    I never said that. In fact, I only criticized the directory structure as the default. I feel that the current default is no longer the most reasonable default. Perhaps at one time it was the best default. That has no bearing on my opinion or argument.

                                                                    I’m not saying we should stone Rob Pike because he chose the default stupidly, if that were the case then I would need to demonstrate that there were no good reasons for the current default (I would also need to make a strong case for stoning as the most appropriate reaction, but that’s another matter). But I’m saying the default should be changed, in the present.

                                                                    1. 1

                                                                      For context, I use a GOPATH per project, so my app code is in eg src/server/main.go. I also use an editor with a good fuzzy file search implementation, so I can type usrpkm instead of github.com/user/package/main.go to find a file. These both substantially improve the situation.

                                                                      Like what? What does it provide that would be infeasible with a different, less restrictive directory layout?

                                                                      Making the name of the package map 1:1 to its location on disk makes navigation (‘where is this code’) easy. In eg ruby and node this is a runtime concern (and can be hooked by arbitrary code), which makes static analysis impossible.

                                                                      An alternative directory layout would need to preserve the property of being easy for tools to navigate.

                                                                      I’m saying the default should be changed, in the present.

                                                                      One of the key reasons to use go is that they carefully avoid breaking changes.

                                                                  2. 1

                                                                    Totally agree that GOPATH needs to go away, although not just because of the deep directory structure. We now use the vendor directory for managing our projects’ dependencies, which works well until a dependency is forgotten and is pulled out of GOPATH instead. This breaks builds (because we do not commit our deps, preferring to use glide to restore them to the correct versions) and makes deterministic builds more difficult. Another issue I’ve hit up against is that it becomes impossible to have two different working copies of a repo without setting up a second GOPATH, at which point, why not just use a project-based build tool to begin with?

                                                                  3. 1

                                                                    Unsure what you mean by “pegging to a semver release” but with glide you can either specify the version as “1.2.3” which will use that exact release version and no others, or “^1.2.3” which means “semver compatible with 1.2.3” (it is shorthand for >=1.2.3, <2.0.0).
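                                                                    That caret shorthand can be sketched in a few lines (simplified: this assumes plain MAJOR.MINOR.PATCH versions; real semver tooling treats ^0.x constraints specially and also handles pre-release tags):

```python
# "^1.2.3" = semver-compatible with 1.2.3, i.e. >=1.2.3 and <2.0.0.
def caret_matches(constraint, version):
    base = tuple(int(x) for x in constraint.lstrip("^").split("."))
    v = tuple(int(x) for x in version.split("."))
    next_major = (base[0] + 1, 0, 0)    # upper bound: next major release
    return base <= v < next_major       # tuple comparison is lexicographic

caret_matches("^1.2.3", "1.4.0")   # in range
caret_matches("^1.2.3", "2.0.0")   # next major: excluded
```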

                                                                  1. 1

                                                                    Is there a browser plugin that will enable PASTE in these situations?

                                                                    1. 2

                                                                      I’d use Tampermonkey for something like that – just set onpaste to undefined for all input elements on the page.

                                                                      I recently switched to FF (nightly) and tried to use Greasemonkey, but it seemed unusable; then I found out Tampermonkey works on both FF and Chrome and was able to port my old scripts over :)

                                                                      1. 1

                                                                        That’d be tough to write because you’d have to constantly update it against various hacked-together ways of disabling paste. Better to open devtools and set the field value to something in the JS console.

                                                                      1. 2

                                                                        Some people think that these system calls are a good way to improve the performance

                                                                        Who does? There seems to be no source given to back up this claim.

                                                                        I was missing this and other references while reading through.

                                                                        1. 3

                                                                          It’s often used in data systems, as an example memcached allows you to pin all process memory. DBMS often pin pages in their buffer manager to main memory as well to prevent pessimal behaviors due to interaction with the OS buffer cache. It’s a fairly common syscall and often necessary as in the examples above, but the article points out some common pitfalls that are useful to keep in mind.