Threads for joe_the_user

  1. 1

    How can a “first-principles understanding of deep neural networks” make sense?

    I mean, look at the general linear model. It is considered a fairly well worked out statistical model. You could say that there is a fairly good theoretical understanding of how the model works (I remember it as: if errors are normal, least-squares regression yields a maximum likelihood estimator. I’m probably butchering the formulation but it doesn’t matter for this). But if you apply that model to, say, a physical system, you need some understanding of the system or your application of this model doesn’t really have a theoretical basis.
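
For reference, the half-remembered result can be written out as follows. This is a sketch under the usual textbook assumptions (independent Gaussian errors with a common variance); the symbols y_i, x_i, β and σ² are notation introduced here, not taken from the comment.

```latex
\begin{align}
  y_i &= x_i^\top \beta + \varepsilon_i,
      \qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2),
      \quad i = 1, \dots, n \\
  \ell(\beta, \sigma^2)
      &= -\frac{n}{2} \log\!\left(2\pi\sigma^2\right)
         - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 \\
  \hat\beta_{\mathrm{ML}}
      &= \arg\max_{\beta}\, \ell(\beta, \sigma^2)
       = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2
       = \hat\beta_{\mathrm{LS}}
\end{align}
```

For any fixed σ², only the sum of squares depends on β, so maximizing the likelihood and minimizing squared error pick out the same coefficients.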

    Now, jump to neural networks. These are in practice just curve fitting. Even more, their intended use case is “stuff we have no model of”. Moreover, they aim to approximate stuff with a variety of concrete internal structures that the approximation process doesn’t directly take into account. Which is to say they model on the level of “it’s probably X” but not on the level of “here’s the estimated distribution of the thing’s error around X” and I don’t see how they could get something like that.

    1. 13

      This might sound simplistic but “be Latvian” would be first on the list.

      The USSR created an education system that produced more skilled and educated people than the country could use, and that situation seems to have continued post-Soviet. So local people have skills, don’t have opportunities in the West where programming is well paid, and are more trustworthy, or at least easier to understand, to the serious cyber criminals who recruit them (who are from the ex-Eastern Bloc for the same reasons).

      It also seems plausible that those recruited to the gang aren’t actually taught anything about security. The simplest way to run a gang would be a multitier system where inner tiers just have to recruit the outer tiers and so just have to protect themselves from the outer tiers. That the outer tiers get caught regularly could be an advantage - it prevents them from threatening the inner tiers. This sort of thing could also be going on with the recent FBI/AFP take-downs. The individuals arrested there would logically be “mid-tier thugs”, have no coherent understanding of security and can never learn it ‘cause by the time they understand, they’re in prison for a long time.

      And sadly this also shows how the variety of enforcement actions one sees are unlikely to change basic organized crime phenomena.

      Longer considerations of mafias and the modern world here:

      1. 1

        Are they ethnic Latvians or Russian colonists from the Soviet era that stayed in Latvia after the collapse? Mix of both?

        1. 1

          I know nothing of the details involved, just the overall history that led up to this situation.

          1. 1

            Ah ok. You just made a very strong statement about Latvia. I guess you were just talking about Soviet bloc in general.

            1. 1

              I said Latvia just because I assumed that any gangsterism is very local. The Latvians would trust other Latvians, the Russians would trust other Russians and so forth. Organized crime the world over tends to involve common ethnicities because that allows trust and communication, not because any given ethnic group has a tendency towards crime.

              1. 1

                According to the documents she’s a Russian national, and another site mentioned she studied in Latvia. Don’t know which one is correct, and things like “are you Russian or Latvian?” were murky anyway in the aftermath of the collapse of the USSR (actually, still are today; see: Ukraine). Most others involved were Russians though (and one Ukrainian).

      1. 2

        From the linked reference: Defunctionalization is thus a whole-program transformation where function types are replaced by an enumeration of the function abstractions in this program [1]

        I’m not sure what they mean by enumerate. Suppose you have a program that begins with a few functions and creates more using function-transformations. So the total count of functions is not finite and it seems these couldn’t be finitely enumerated.

        But maybe I’m not quite getting the idea.


        1. 1

          Perhaps there is a finite basis. If so, then we could build all relevant functions from a small set of combinators. It happens to be the case that the essential structure of Cartesian closed categories is captured by such a finite basis. Indeed, the And constructor from the article would be part of such a basis.

          To explicitly construct the enumeration, sort all of the basis’ constructors, and assign an index to each. When the index grows greater than the number of constructors, wrap around with a modulus operation and continue on, à la ASCII. Discard any ill-typed expressions.

          1. 1

            Hi, OP here.

            A “function abstraction” literally refers to a lambda expression (/ function decl). The term comes from the lambda calculus. So these are indeed finite.
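
A minimal sketch of that point in Python. The program text contains three lambda expressions, so the enumeration has three constructors, even though `compose` can build unboundedly many function values at run time. All names here (`AddN`, `Double`, `Compose`, `apply`) are made up for illustration and are not from the linked article.

```python
from dataclasses import dataclass
from typing import List, Union


def compose(f, g):
    """Higher-order helper: returns a new function value on every call."""
    return lambda x: f(g(x))      # abstraction 3, captures f and g


def original(xs: List[int], n: int) -> List[int]:
    """Higher-order version: functions are passed around as values."""
    add_n = lambda x: x + n       # abstraction 1, captures n
    double = lambda x: x * 2      # abstraction 2, captures nothing
    return [compose(add_n, double)(x) for x in xs]


# Defunctionalized version: one data constructor per lambda in the source
# text, each carrying the variables that lambda captured.
@dataclass(frozen=True)
class AddN:
    n: int


@dataclass(frozen=True)
class Double:
    pass


@dataclass(frozen=True)
class Compose:
    f: "Fn"
    g: "Fn"


Fn = Union[AddN, Double, Compose]


def apply(fn: Fn, x: int) -> int:
    """Single first-order dispatcher replacing every function call."""
    if isinstance(fn, AddN):
        return x + fn.n
    if isinstance(fn, Double):
        return x * 2
    if isinstance(fn, Compose):
        return apply(fn.f, apply(fn.g, x))
    raise TypeError(f"not a defunctionalized function: {fn!r}")


def defunctionalized(xs: List[int], n: int) -> List[int]:
    h = Compose(AddN(n), Double())
    return [apply(h, x) for x in xs]
```

The set of possible values of type `Fn` is infinite (a `Compose` can nest arbitrarily deep), but the set of constructors is fixed by the program text, which is what gets enumerated.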

          1. 1

            While reducing carbon emissions is a good thing, that rhetoric overlooks a tiny fact… you can only emit 1.8 tons of carbon dioxide if your institution is willing to pay your travel expenses, or you can afford it. Oh, and if the country of destination grants you a visa. For a lot of people in the world, neither of these is true.

            1. 3

              I think this refers to the fact that many conferences require a presenter in order for a paper to be published. Maybe if you get a substitute presenter, then two papers are sharing that 1.8, but in general someone has to travel for the paper to be admitted. So I’d say it’s more than rhetoric.

              Visa issues in my experience lead to a lot of last-minute substitute presenters to ensure the papers are not removed from publication. This means truly awful presentation quality and no questions at worst or just no opportunity for questions at best.

              1. 1

                Right. My point is when many people are simply barred from that publishing process by travel requirements, carbon emissions from those who can participate look like a rather minor issue.

                1. 3

                  Your original use of “overlook” gives the impression that the limits on who can travel are a counter-argument to the carbon-production point of the OP. But they aren’t. The carbon is definitely sent into the atmosphere based on the number of attendees. Moreover, the argument that only some people can travel is a good point, but one that basically further reinforces the point of the author: that there are problems with conferences and that not requiring physical travel might be a good thing.

            1. 1

              I think others have pointed out that talking about the probabilities here involves something like sampling over a space of hypothetical possibilities larger than Ackermann’s billionth number. It’s debatable whether a discussion here is meaningful but let’s suppose these possibilities all “exist” and we can talk about their likelihood and so talk about the probability that we’re in one of the “simulated world” possibilities rather than one of the “non-simulated world” possibilities.

              The thing here is that with the existence of these zillions of possible simulations we have to assume that there are many such simulations arbitrarily close to “our world”. So suppose that on one particular simulation, “our simulation”, one God-like entity suddenly decides to cheat and start playing favorites or committing whatever rules violations they choose. Well, one can assume that on many of the other hypothetical simulations, no cheating happens and the world continues to have a consistent logic. Between the various very-close worlds, which is really us? One can’t meaningfully answer directly, but one can note that an internally consistent trajectory continuing forward seems likely. We can call that us - it seems like what we’ve experienced so far.

              To put it in Bostrom’s language - if a substrate is low-level and consistent enough, an intelligent entity doesn’t need to know, and can’t know, what the substrate “exists in”. Indeed, whether the substrate exists in a single place is debatable. This seems especially true if we’re taking an “all the hypotheticals exist” approach in our argument logic.

              1. 11

                Oooh, here are some of mine that I think most languages would benefit from:

                • First-class contracts, a la Eiffel or Ada12
                • The ability to slap metadata on things, like Clojure has
                • hyphens-in-names
                • Matrices that are not just arrays-of-arrays, kinda like J has

                And some that I like a lot but are probably a bad idea in most cases:

                • Support for quantifiers over finite sets, like \forall and \exists
                • All operators have the same operator precedence, end of discussion, so 2 + 2 * 2 is 8 instead of 6
                • Being able to write func-from(x)-to(y) instead of func_from_to(x, y)
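
To make the flat-precedence bullet above concrete, here is a small sketch of an evaluator that applies binary operators strictly left to right. The function name `eval_flat` and the tiny grammar (non-negative numbers and `+ - * /` only, no parentheses) are assumptions made for the example.

```python
import operator
import re

# Every operator has the same precedence; evaluation is strictly left to right.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_flat(expr: str) -> float:
    """Evaluate `number (op number)*` with no operator precedence."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/]", expr)
    if not tokens or len(tokens) % 2 == 0:
        raise ValueError(f"malformed expression: {expr!r}")
    result = float(tokens[0])
    # Pair each operator with the operand that follows it.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, float(operand))
    return result


flat = eval_flat("2 + 2 * 2")      # (2 + 2) * 2 = 8.0
conventional = eval("2 + 2 * 2")   # Python's own precedence gives 6
```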
                1. 3

                  Support for quantifiers over finite sets, like \forall and \exists

                  Why limit yourself to “finite sets”? These quantifiers are perfectly valid/computable for (countably-) infinite objects.

                  1. 1

                    Because predicate logic over infinite sets is undecidable.

                    1. 4

                      Undecidability by itself shouldn’t be a problem. I would like to be able to do a variety of searches which might never end, with an extra provision to stop after a certain time period.
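
A rough sketch of that kind of search: an existential quantifier over a possibly infinite stream of candidates, with a wall-clock budget. The name `exists_within` and the three-way result are hypothetical choices for this example, not an existing library API.

```python
import itertools
import time
from typing import Callable, Iterable, Optional, Tuple


def exists_within(
    pred: Callable[[int], bool],
    candidates: Iterable[int],
    seconds: float,
) -> Tuple[str, Optional[int]]:
    """Search for a witness of `exists x: pred(x)`, giving up after `seconds`.

    Returns ("found", witness), ("exhausted", None) if a finite candidate
    stream ran out (the statement is false over that stream), or
    ("timeout", None) if the budget ran out (the answer is unknown).
    """
    deadline = time.monotonic() + seconds
    for x in candidates:
        if pred(x):
            return ("found", x)
        if time.monotonic() > deadline:
            return ("timeout", None)
    return ("exhausted", None)


# A witness exists and is found quickly: 32 * 32 = 1024 > 1000.
hit = exists_within(lambda n: n * n > 1000, itertools.count(), 1.0)

# No natural number is negative, so this search would never end on its own.
miss = exists_within(lambda n: n < 0, itertools.count(), 0.05)
```

The "timeout" result is deliberately distinct from "exhausted": only the latter licenses concluding that no witness exists.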

                      1. 1

                        That’s basically how the SPARK Ada people do solvers. They set a 2 min limit as the default. It either proves it easily or more work must be done.

                      2. 1

                        Sure, but you can get almost all of the useful functionality by quantifying over (possibly non-finite) types instead.

                        1. 1

                          Quantifying over infinite types is also undecidable.

                          1. 2

                            How does that matter? If we want to know whether an existential type is inhabited, it is up to the programmer to provide the inhabitant (or the proof of noninhabitance).

                            1. 2

                              I think we’re looking for different use cases. I want quantifiers to be able to say things like

                              if ∀x ∈ S: pred(x) {

                              (not a great example, just showing the kinds of stuff I regularly write for my work). Asking the programmer to provide the proof takes away from the benefits, which is having a computer do predicate logic for you.
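
For comparison, the object-level quantifier above can be approximated today in Python with `all` and `any` over a finite collection. This is only a sketch; `forall`, `exists`, `S` and the predicate are illustrative names, and nothing here is decided symbolically, it is plain enumeration.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def forall(items: Iterable[T], pred: Callable[[T], bool]) -> bool:
    """True if pred holds for every item (vacuously true when empty)."""
    return all(pred(x) for x in items)


def exists(items: Iterable[T], pred: Callable[[T], bool]) -> bool:
    """True if pred holds for at least one item."""
    return any(pred(x) for x in items)


S = {2, 4, 6, 8}

# Rough equivalent of: if ∀x ∈ S: pred(x) { ... }
if forall(S, lambda x: x % 2 == 0):
    verdict = "every element of S is even"
else:
    verdict = "some element of S is odd"
```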

                              1. 1

                                Ah, that’s completely different than what I was thinking of. I was thinking of quantifiers on types on the type level, not the object level:

                                (f : (∀x : A).P(x), y : A) ⊢ f y : P(y)

                                (x, p) : (∃x : A).P(x) ⊢ p : P(x)

                                In this system, your example would be something like

                                (∀S : U₀).(∀x : S).P(x) ⇒ ⊥

                                Where U₀ is the type of types (universe of types). The inhabitant of this type would be a program that takes a type S, an element x of S, and a proof of P(x) and produces a contradiction.
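
The first two typing rules above can be checked in Lean 4 roughly as follows. This is a sketch that reads the existential as a dependent pair (Lean's subtype `{ x : A // P x }`), which is an encoding chosen for the example rather than something the comment specifies.

```lean
-- ∀-elimination: applying f : ∀ x : A, P x to y : A yields a proof of P y.
example (A : Type) (P : A → Prop) (f : ∀ x : A, P x) (y : A) : P y :=
  f y

-- ∃ as a dependent pair: the second component proves P of the first.
example (A : Type) (P : A → Prop) (s : { x : A // P x }) : P s.val :=
  s.property
```

In both cases the programmer supplies the term (the proof); the checker only verifies it, which is the contrast with having a computer search for the answer.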

                    2. 1

                      All operators have the same operator precedence, end of discussion, so 2 + 2 * 2 is 8 instead of 6

                      Nah, I think we should [where possible] standardize on the one from math that we were taught in school: Please, Excuse My Dear Aunt Sally. I mean, the operations with those initials. :)

                    1. 2

                      The idea of a language specifically targeting GPUs is interesting. One thing I’d mention here is that such a language actually would not have to be only vector-based.

                      A project I’ve been interested in for a bit is Hank Dietz’ MOG; this translates general purpose parallel code (MIMD, multiple instruction, multiple data) to the GPU with at most a modest slowdown (about 1/6 speed, while running vectorized instructions at nearly full speed).


                      1. 3

                        GPUs are quite a bit more flexible in their control flow than traditional SIMD machines (NVIDIA calls this SIMT), so I think it’s quite clear that you could have each thread do quite different work. The problem is that this is going to be very inefficient, and I don’t think a x6 slowdown is the worst it can get. Worst case warp/wavefront divergence is a x32 slowdown on NVIDIA and a x64 slowdown on AMD (or maybe x16; I find the ISA documentation unclear). Further, GPUs depend crucially on certain memory access patterns (basically, to exploit the full memory bandwidth, neighbouring threads must access neighbouring memory addresses in the same clock cycle). If you get this wrong, you’ll typically face a x8 slowdown.

                        Then there’s a number of auxiliary issues: GPUs have very little memory, and if you have 60k threads going, that’s not a lot of memory for each (60k threads is a decent rule of thumb to ensure that latency can be hidden, and if the MOG techniques are used it looks like there’ll be a lot of latency to hide). With MIMD simulation, you probably can’t estimate in advance how much memory each thread will require, so you need to do dynamic memory management, likely via atomics, which seems guaranteed to be a sequentialising factor (but I don’t think anyone has even bothered trying to do fine-grained dynamic allocation on a GPU).

                        Ultimately, you can definitely make it work, but I don’t think there will be much point to using a GPU anymore. I also don’t think the issue is working with vectors, or data-parallel programming in general. As long as the semantics are sequential, that seems to be what humans need. Lots of code exists that is essentially data-parallel in a way roughly suitable for GPU execution - just look at Matlab, R, Julia, or Numpy. (Of course, these have lots of other issues that make general GPU execution impractical, but the core programming model is suitable.)

                        1. 1

                          Thank you for the reply.

                          I assume you have more experience in making things work on a GPU than I. Still, I’d mention that the MOG project tries to eliminate warp divergence by using byte-code and a byte-code interpreter. The code is a tight loop of conditional actions. Of course, considerations of memory access remain. I think the main method involves devoting a bit of main memory to each thread.
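
A toy illustration of the "tight loop of conditional actions" idea, based only on the description in this comment and not on MOG's actual design. Each simulated thread runs its own byte-code program, but the interpreter advances all of them in lockstep and handles one opcode at a time across every thread currently at that opcode. The opcode set (`ADD`, `MUL`, `HALT`) and the name `run_lockstep` are invented for the example.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, int]  # (opcode, operand); the opcode set is made up


def run_lockstep(programs: List[List[Instr]]) -> Tuple[List[int], int]:
    """Interpret one program per thread in lockstep.

    Returns the final accumulator of each thread and the number of masked
    passes issued. On real SIMD hardware each pass would be one vector
    operation applied under a mask of the threads at that opcode.
    """
    n = len(programs)
    pc = [0] * n            # per-thread program counter
    acc = [0] * n           # per-thread accumulator
    halted = [False] * n
    passes = 0

    while not all(halted):
        # Fetch: group the live threads by the opcode they are about to run.
        pending: Dict[str, List[Tuple[int, int]]] = {}
        for t in range(n):
            if halted[t]:
                continue
            if pc[t] >= len(programs[t]):
                halted[t] = True     # running off the end halts the thread
                continue
            op, arg = programs[t][pc[t]]
            pending.setdefault(op, []).append((t, arg))

        # Execute: a fixed chain of conditional actions, one per opcode.
        for op in ("ADD", "MUL", "HALT"):
            lanes = pending.pop(op, [])
            if not lanes:
                continue
            passes += 1
            for t, arg in lanes:
                if op == "ADD":
                    acc[t] += arg
                    pc[t] += 1
                elif op == "MUL":
                    acc[t] *= arg
                    pc[t] += 1
                else:  # HALT
                    halted[t] = True
        if pending:
            raise ValueError(f"unknown opcode(s): {sorted(pending)}")
    return acc, passes


programs = [
    [("ADD", 3), ("MUL", 2), ("HALT", 0)],
    [("MUL", 5), ("ADD", 1), ("HALT", 0)],
    [("ADD", 1), ("ADD", 1), ("ADD", 1)],
]
final_acc, total_passes = run_lockstep(programs)
```

In this toy the cost of one interpreter step is bounded by the number of distinct opcodes, not by the number of distinct programs, which is one way to read the claim about avoiding divergence. It says nothing about the memory-access concerns raised in the parent comment.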

                          I believe Dietz is aiming to allow traditional supercomputer applications like weather and multibody gravity simulations to run on a GPU. One stumbling block is people who buy supercomputers are working for a large institution and aren’t necessarily that interested in saving their last dime.

                      1. 12

                        sharing of high-level abstractions of data between documents or applications

                        COM? :D

                        Seriously though, this is a fundamentally hard problem. Every app represents high level concepts in its own way, and the higher-level the concept, the more differences there are.

                        1. 2

                          The thing about this suggestion is it kind of shows how hard a design problem an OS is. The particular facilities an OS provides make a kind of sense but there are lots of other things that I think it’s natural to want and which quite a few have attempted to add to the basic OS functionality.

                          It’s interesting that memory allocation is split between the OS and whatever language one uses. Exactly how and why is an interesting problem.

                          1. 1

                            Yeah, when I read “Text is just one example. Pictures are another. You can probably think of more. Our operating systems do not support sharing of high-level abstractions of data between documents or applications.” I was like “umm copy/paste?”

                            There are ways to share that stuff. Copy/paste (and drag and drop and related things) actually kinda do solve it. And the formats you use there can be written to disks too - .bmp, .rtf, and .wav files directly represent the Windows formats.

                            Like I agree there are weaknesses and a lot of reimplementations, but it is plainly false to say there is no support.

                          1. 1


                            1. In terms of engineering, “software” is not a thing comparable to “planes” or “elevators”. Software “can be anything” and so a random thing “done with software” is a random thing that might have no particular engineering practices involved in its construction. Planes now use software, for example, but the pre-software engineering practices in place seem to have kept things as safe as previously.

                            2. Software in the generic does not necessarily follow the logic of a thing bounded by time and space. So our natural human intuitions about vulnerabilities don’t necessarily apply. Finding and plugging a vulnerability is like solving a complicated puzzle or math theorem. No one really can predict when that will occur.

                            3. The intuition of managers and decision makers about software is even worse than the intuition of actual software engineers. All the real engineering fields that exist don’t rely on just the hope that upper management will make allowances for the necessary means to create reliable software - they include field-specific regulation. But back to the point of “software” not being one thing (relative to engineering) but a million things. You can apply software to areas where bugs and failure have little impact, you can apply software to areas where it has immediate impact, and you can apply it to areas where the impact is felt by others or only appears over time. The people controlling the purse strings wind up concerned, at best, with immediate-impact situations. Consider: engineers certainly “reinvent the wheel” on a regular basis - engineers need to engineer the wheels used on a given plane for the particular tight constraints involved in flight. Sure, you could stick heavy truck tires on a 747 and it might land safely. The problem is that the purse-string pullers may well be happy with the trade-offs of a thing that mostly works for a cheaper price. If people could build skyscrapers with no immediate consequences when they collapsed, would they do so? Of course they would - we occasionally see things like that in less regulated areas (China in the 2000s, say).

                            1. 2

                              Seeing this on Hackernews, it reinforced my feeling that ordinary philosophy is kind of weird.

                              I would look at a model-theoretic equivalent. Suppose you have a model and an axiom system. You might incorrectly prove a given statement that is actually independent. But it might actually be the case that you are working with a model within which that statement happens to be true. This seems like the analogue of the “justified belief that isn’t knowledge” examples. The thing here is that you can easily fill in the “hole” by specifying that one’s justifications have to be correct.

                              And altogether, it is surprising that philosophy didn’t put the constraint on “justified” to mean “a fact that makes another statement true” rather than it being a fact that merely convinces some people.

                              I suppose the problem is philosophy is dealing with less certain situations and the philosopher is trying to come up with a compact piece of rhetoric that would convince another human rather than engaging an automatable process.

                              1. 2

                                That raises questions about what a “justification” is, and what it means for a justification to be “correct”. Epistemology deals with messy questions of understanding and knowledge. I know I’m going to drink water tomorrow. My justification is that every day for my entire life, I drank some water. But inductions make terrible justifications: I could die in my sleep. So the justification is not correct, and yet I still know I’m going to drink water tomorrow.

                                1. 1

                                  It’s a very difficult book, but John Hawthorne’s Knowledge and Lotteries discusses this kind of case. There’s some kind of equivalence between “I will drink water tomorrow” (which we tend to think you can know, despite the small probability that you’ll die), and “I know this lottery ticket will lose” (which we tend to think you can’t know, because of the small probability that it will win).

                                  I’d disagree that induction doesn’t provide justification in this case. It just provides fallibilist justification–justification that cannot rule out the possibility that the proposition justified is false, though it makes it very unlikely.

                              1. 1

                                Well, this is interesting and obscure. DPLL and friends happen to be something I’m interested in.

                                I’d wonder where Answer-set fits with “satisfaction modulo theories”.

                                1. 1

                                  I have no idea since I know neither specialty hands-on. I will give you these links which might lead to Lobsters-worthy material on the topic. :)

                                1. 2

                                  What I always read “extreme programming” as was an approach not necessarily based on any new ideas - all the parts of design, implementation, planning, maintenance, etc were there. Rather it was an attitude of using a few approaches to cut through complexity.

                                  The ideal is not so much turning one’s back on good planning principles but having a carefully curated and limited subset of all the reasonable ideas, one calculated to always bring things forward. That sounds reasonable but could just be the come-on for the latest snake-oil.

                                  1. 2

                                    that’s right, Extreme Programming didn’t introduce any new practices (though test first had been lost and needed rediscovering). Kent Beck described discovering the practices as “seeing what hurts and doing more of it”.

                                  1. 1

                                    I began programming in the 90s, when OO hype was at its highest, and so I definitely feel the apparent failure of reuse strongly even now.

                                    Of course, one part of reuse is the “glass half-empty/glass half-full” effect. Of course code reuse happens but of course the failure of reuse also happens. The key problem is describing the ways that this failure happens, I think.

                                    I’d divide our apparent failures into two parts.

                                    A. The failure of “effortless” reuse. OO originally talked about objects being created during the ordinary process of coding, as if the problems of engineering libraries could be ignored. This fantasy thankfully is mostly done. However, this is also the less fundamental part of the failure of the idea of reuse, since there’s a solution - just make or use object-libraries or just libraries (whether OO, DSL or procedural approaches work better here is a secondary question imo).

                                    B. The less-than-complete-success of any encapsulation effort. OO, procedural, or other libraries fail to fully hide their implementation details when they are used frequently in a large-ish application. This isn’t saying libraries, operating systems, graphics drivers and high level languages are useless. The problem is all the abstractions wind up “leaky” on some level and so when the programmer is programming and using, say, 10 abstraction layers, the programmer is going to be forced to consider all ten layers at one point or another (though naturally some will be worse than others, some only in terms of performance but that’s still a consideration imo). The left-pad event that broke the web a bit ago is one extreme example of this sort of problem.

                                    So “B” is the bigger problem. It seems to limit how many layers of abstraction can be stacked together. I don’t know of any studies that directly tackle the question on these terms and I would be interested if anyone has links here.

                                    1. 1

                                      I sometimes think we are just really bad at memorizing advice and passing it on accurately. While OOP was advocated for with the “reuse” argument, there was also a “use before reuse” principle. But these subtleties just seem to get lost when people start writing up syllabuses and introductory material.

                                      I learned programming mostly with tutorials and books that were written in the 90s when the OOP craze was in full bloom (and that material spent a lot of time explaining “OOP principles”, much more I think than a modern book on Java/C++/Python does). Anyway, I often did not find OOP very helpful for structuring my code; finding good OO models was hard and hardly seemed worth it.

                                      Fast forward to this month. I borrowed a book on DDD patterns and started tinkering with the patterns outlined there and I must say for the first time in my life I have the feeling that I have a reasonable strategy in front of me for mapping business logic to classes/objects. And it differs a lot from the naïve examples that I just recently saw in a mandatory corporate training.

                                      Who knows when functional programming will reach this point where the original motivations are already buried so deep that they cannot be seen anymore.

                                      1. 3

                                        Having invested a lot of time into DDD over the last eight years, I personally think a lot of it isn’t as valuable as it initially seems. The various materials ultimately describe new names for things you probably already have names for, but this author wants you to call something else.

                                        While neat to read, I’d caution against trying to go through your codebases renaming everything, which anecdotally has been what new readers first do. That often is a big time sink that doesn’t have any payoff at all other than just new names for old concepts.

                                        However, reading about CQRS (an often referenced outgrowth of DDD) is a big deal, and would cause you to actually structure everything in a different way that can add potential benefit (immutability, fast read operations, etc). I highly recommend at least reading up on that.

                                    1. 4


                                      One potential problem is that any approach to coding that allows pattern-based auto-fixes seems like it would be an approach to coding that could be done away with by coding at a higher level. The DRY principle involves replacing boiler plate code with functions or macros.

                                      Of course, there are limits to how much functionalizing one wants to do at any one point. But there is no limit to the theoretical degree to which you can remove repeated code.
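
A small, made-up illustration of that trade: code repetitive enough for a pattern-based tool to rewrite is usually also repetitive enough to fold into one helper. The field names and the `clean` helper are hypothetical.

```python
from typing import Dict, Optional


def normalize_user_boilerplate(raw: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Same four-line pattern pasted three times; a pattern matcher's dream."""
    name = raw.get("name")
    if name is None or name.strip() == "":
        name = "unknown"
    else:
        name = name.strip()

    email = raw.get("email")
    if email is None or email.strip() == "":
        email = "unknown"
    else:
        email = email.strip()

    city = raw.get("city")
    if city is None or city.strip() == "":
        city = "unknown"
    else:
        city = city.strip()

    return {"name": name, "email": email, "city": city}


def clean(raw: Dict[str, Optional[str]], key: str, default: str = "unknown") -> str:
    """The repeated pattern, written once."""
    value = raw.get(key)
    if value is None or value.strip() == "":
        return default
    return value.strip()


def normalize_user(raw: Dict[str, Optional[str]]) -> Dict[str, str]:
    """DRY version: a fix to the pattern now happens in exactly one place."""
    return {key: clean(raw, key) for key in ("name", "email", "city")}


sample = {"name": "  Ada ", "email": "", "city": None}
```

Once the helper exists there is a single site left to fix, so there is little for an auto-fixer keyed on the pattern to find.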

                                      Which brings another potential problem. Applying pattern-recognition to bug finding may wind up being dependent on the coding-style of the code that’s being looked at. This approach might wind up being great at finding the bugs that appear in Facebook’s code-set, which I assume is huge and has a characteristic style, but the approach may wind up problematic for code outside Facebook’s context.

                                      All that said, this is certainly an interesting approach.

                                      1. 2

                                        I agree computer vision is slowing down, but natural language processing is progressing. See NLP’s ImageNet moment has arrived.

                                        1. 2

                                          I would say language processing already had its ImageNet moment. That was the moment that Watson won at Jeopardy. I’d actually almost forgotten.

                                          But I’d also say the way we forgot this stuff correlates with the way it gradually stops mattering. Winning at Jeopardy or classifying a ginormous but well-defined image set or winning at the game of Go suggests computers are catching up to humans and then a look at the wide range of human capacities suggests otherwise.

                                          In a sense, with deep learning, there hasn’t been any progress on vision as such. People have simply made progress on adding firepower to a very powerful but narrow pattern recognition system and turned this cannon on various particular problems.

                                          Of course, Watson’s victory didn’t involve any deep learning. It simply leveraged a few universities’ NLP libraries and chose a situation where clever association was most of what mattered.

                                          1. 2

                                            My reading of that article is that its authors are eagerly anticipating progress in NLP similar to that seen in CV six years ago, and for similar reasons (we’re figuring out how to use pre-trained hierarchical representations in this domain too). So, it’s not a done deal – and even if the new approach does work out, it’s not a fundamentally novel technique, and there’s no reason to think that applications using it won’t encounter the same difficulties as deep-net CV approaches are now.

                                            1. 1

                                              I agree about everything you wrote, but if NLP is to progress as much as CV did and encounters current CV difficulties, almost everybody will consider that a great success, even if it’s not fundamentally novel, etc. And such success seems likely.

                                              By the way, since that article was written, Google BERT, using the same approach, broke all SOTA records set by OpenAI GPT.

                                              1. 2

                                                I agree about everything you wrote, but if NLP is to progress as much as CV

                                                Well, I couldn’t say it strongly since I’m not that much of an expert. But I could suggest, or say my hunch is, that NLP may have already “progressed as much as CV” has. In both NLP and CV, you have impressive seeming applications that get you associations between things (puzzles and potential-solutions, texts and sort-of-translations in the case of NLP). In neither CV nor NLP do you have “meaning-related” functionalities working very well.

                                                The main thing may be that NLP naturally requires many more meaning-related tasks than vision.

                                                1. 1

                                                  Cool! I look forward to seeing adversarial examples in this domain.

                                            1. 3

                                              One scenario is that the momentum and achievements behind current deep learning are strong enough that we won’t have the same sort of winter as the original AI winter. Rather, we’ll have a retrenchment/hunkering-down. An effort will be made to separate economical use-cases from everything else and shed the pure snake-oil parts.

                                              Until you have real intelligence or something, there are verifiably going to be a few cases where deep learning can beat every other solution. Old expert systems never went away, but expert systems were much more like extensions of ordinary math. Deep learning systems are a new or a different way of doing things. Once their limits become visible, they seem like a kind of icky way to do things, but still. Building a ginormous black box and using sophisticated but ad-hoc training methods to get it predicting things is only cool when it seems like it will open up new vistas. When/if it becomes clear it’s more like a cul-de-sac, that it works in a particular though impressive use case and then gradually hits diminishing returns, it’s kind of ugly (imo). But it’s not going away.

                                              1. 1

                                                Well, and let’s be clear, these strategies actually do solve some problems meaningfully that we couldn’t solve before. The first AI golden age gave us chat bots.

                                                1. 4

                                                  Remember like three or four years ago when chat bots were going to be the next big thing again? That sure didn’t last long.

                                              1. 3

                                                So to what extent is “probabilistic programming” a new/different programming paradigm and to what extent is it something like a DSL for setting up various probabilistic algorithms?

                                                Not to imply I’d dismiss it if it was the latter.

                                                1. 2

                                                  People have approached it from both sides, so some systems have a flavor more like one or the other.

                                                  A very simplified story which you could poke holes in, but which I think covers some part of the history is: The earliest systems (from where the name came) thought of themselves, I believe, as adding a few probabilistic operators to a “regular” programming language. So you mostly programmed “normally”, but wherever you lacked a principled way to make a decision, you could use these new probabilistic operators to leave some decisions to a system that would automagically fill them in. If you want to analyze what the resulting behavior is though, it’s somewhat natural to view the entire program as a complicated model class, and the whole thing therefore as a way of specifying probabilistic models. In which case, if you want the system to have well-understood behavior, and especially if you want efficient inference, there’s a tendency towards wanting to constrain the outer language more, ending up with something that really looks more like a DSL for specifying models. Lots of possible places to pick in that design space…

                                                  1. 2

                                                    I think it’s useful to think about it as something like logic programming. Logic programming is useful when the answer you want can be posed as the solution to a series of logical constraints. Probabilistic programming shines when your problem can be posed as the expectation of some conditional probability distribution. Problems that benefit from that framing are particularly well-suited for probabilistic programming.

                                                    I think its use in practice will, like SQL or Datalog, resemble using a DSL, since you don’t need probabilistic inference everywhere in your software. But in principle, as it is just an extension of deterministic programming, it does not need to be restricted in this way.
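
To make the "expectation of some conditional probability distribution" framing concrete, here is a minimal sketch in plain Python rather than in any real probabilistic programming system. The rain/sprinkler model, its probabilities, and the names `model` and `expectation` are all made up for illustration, and rejection sampling is just the simplest inference method to show, not what a practical system would necessarily use.

```python
import random

def model(rng):
    """An ordinary program with two decisions left to chance (hypothetical numbers)."""
    rain = rng.random() < 0.2                            # assumed prior: rain on 20% of days
    sprinkler = rng.random() < (0.01 if rain else 0.4)   # sprinkler rarely runs when it rains
    wet = rain or sprinkler                              # deterministic consequence
    return {"rain": rain, "sprinkler": sprinkler, "wet": wet}

def expectation(model, query, condition, n=100_000, seed=0):
    """Estimate E[query | condition] by rejection sampling: run the program
    many times, keep only the runs where the condition holds, then average."""
    rng = random.Random(seed)
    kept = [query(trace) for trace in (model(rng) for _ in range(n)) if condition(trace)]
    if not kept:
        raise ValueError("condition was never satisfied")
    return sum(kept) / len(kept)

# "How likely is it that it rained, given that the grass is wet?"
p = expectation(model, lambda t: t["rain"], lambda t: t["wet"])
```

For this toy model the exact answer is 0.2 / 0.52 (about 0.385), so the estimate can be checked by hand. The point of the sketch is only the split: the model is written as a normal program, and the question is posed separately as a query plus a condition.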

                                                  1. 3

                                                    I have many times, mainly for purposes of program synthesis.

                                                    I think it’s quite useful, esp. if you’re iterating on a problem, and looking for some way to describe that problem naturally, without having to think about implementation details.

                                                    1. 2

                                                      Sounds interesting, can you share any more details about this work?

                                                      1. 3

                                                        absolutely! I work as “red team” (previously in adversary simulation, currently in more technical correctness types of situations), so very often I’m presented with:

                                                        1. some set of “things” I need to “do” (API calls native or web, some format I need to construct, some code I need to generate many copies of with minor variance, what-have-you)
                                                        2. a system that I’m not supposed to be on with limited tooling (“living off the land”)
                                                        3. with a large amount of repetition

                                                        so often the easiest way is to simply write something in a simpler format that generates the steps above so that attack chains can be more easily constructed.

                                                        A simple example was that I had Remote Code Execution on a host via two languages (one was the platform’s scripting language, the other was native Unix/Windows shell), but only 100 characters at a time (as they were packed in images, with no bindings to Python). So, rather than attempt to write a Python binding or fight with making a generic system using the libraries provided, I:

                                                        1. wrote a simple description format (creds, commands to be run, host, &c)
                                                        2. wrote a compiler to the long horrible chain of things I just described that produced “ok” C
                                                        3. delivered that to team + client for proof of concept

                                                        it’s a weird example of basically partial evaluation, but it works for me, and is usually easier for me to digest than attempting to get all the moving pieces in one go.
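
The general shape of "simple description format in, generated C out" can be sketched roughly as below. This is not the tool described above: the `Job` format, the `send_chunk` placeholder (which only prints), and the function names are all hypothetical, and the only detail taken from the comment is the 100-character limit per step.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    """Hypothetical description format: a target name and the steps to run."""
    host: str
    commands: list[str] = field(default_factory=list)

def chunks(text, limit):
    """Split text into pieces of at most `limit` characters."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]

def c_string(s):
    """Render a Python string as a C string literal (basic escapes only)."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'

def compile_to_c(job, limit=100):
    """Emit C source that passes each chunk to a placeholder send_chunk()."""
    lines = [
        "#include <stdio.h>",
        "",
        "/* placeholder transport: it only prints what it is given */",
        "static void send_chunk(const char *host, const char *part) {",
        '    printf("%s <- %s\\n", host, part);',
        "}",
        "",
        "int main(void) {",
    ]
    for command in job.commands:
        for part in chunks(command, limit):
            lines.append(f"    send_chunk({c_string(job.host)}, {c_string(part)});")
    lines += ["    return 0;", "}"]
    return "\n".join(lines)

source = compile_to_c(Job("example.test", ["echo hello", "x" * 250]))
```

Because everything in the description is known at generation time, the emitted C contains no parsing or looping logic of its own, which is the partial-evaluation flavor mentioned above.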

                                                    1. 6

                                                      This is partly what a recent paper & submission to, “Collapsing Towers of Interpreters”, Amin, Rompf, is about. Of course this is recent research so applying it will probably be an exercise.

                                                      This is overkill for your use case as you’re not talking about many languages, just your one. Can you compile your program to C/C++/Go or whatever you’re most comfortable with rather than interpreting it?

                                                      1. 2

                                                        That paper looks like it would be very useful whether I implement it directly or not.

                                                        My ultimate target now is simple bytecode for a unique platform. Compiling to C would mean creating a C-to-bytecode compiler too.

                                                        1. 3

                                                          You could perhaps implement a backend for LLVM or some other C compiler emitting to your custom platform. Then you’ll get the LLVM optimizations. I don’t know what your time/money budget is :)

                                                          1. 4

                                                            Yeah no,

                                                            The paper you originally linked to actually is what I need and not overkill. It happens I am dealing with the “interpreter tower” that the paper references.

                                                        2. 1

                                                          That feel when people don’t have a ton of background on the project, but want to tell you that things don’t fit your use case anyway 🙄

                                                          It’s like “rtfm”, but surprisingly even less valuable.

                                                        1. 18

                                                          I maintained the Wasabi compiler for Fog Creek Software, which was only used to compile FogBugz (and FogBugz-adjacent projects). The purpose was not the same as yours, though.

                                                          1. 12

                                                            Similarly, Facebook developed HipHop just to compile their “one” PHP application.

                                                            1. 4

                                                              What did you guys inside Fog Creek think of all the vitriol Wasabi got online? I recall reading a lot of threads about it on HN and elsewhere that decried the entire exercise as being misguided at best, for example.

                                                              1. 10

                                                                Like many things, I think most of the outrage came from people who don’t read good, and the rest from people who think because they are reading about a decision today that means it was decided today. Contrary to popular belief, Fogcreek didn’t decide one day to write a bug tracker, and then put “write custom compiler” at the top of the todo.

                                                                I think the takeaway was never talk about solving a problem other people may not have.

                                                                1. 4

                                                                  I like Joel’s response best:

                                                                  What’s more interesting about the reaction is that so many average programmers seem to think that writing a compiler is intrinsically “hard,” or, as Jeff wrote in 2006, “Writing your own language is absolutely beyond the pale.” To me this sounds like the typical cook at Olive Garden who cannot possibly believe that any chef in a restaurant could ever do something other than opening a bag from the freezer and dumping the contents into the deep fryer. Cooking? With ingredients? From scratch! Absolutely beyond the pale! Compilers aren’t some strange magic black art that only godlike programmers are allowed to practice. They’re something you learn how to create in one semester in college and you can knock out a simple one in a few weeks. Whatever you think about whether it’s the right thing or not, if you think that a compiler is too hard to do as a part of your day job, you may want to invest some time in professional development and learning more about your craft.

                                                                  As someone who took one semester of compilers in college, and ended up maintaining this compiler for several years, I agree. People create new web frameworks all the time. People create their own DSLs and ORMs. There’s nothing harder or weirder about compilers than making tools at these other layers of the stack, but for some reason “compiler” shuts off the part of some people’s brains that lets them think “oh, this is just a program, it takes input and creates output and is 100% understandable.”

                                                                  (I have this same belief-bug, but mine’s around encryption and security.)
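
As a rough illustration of "it takes input and creates output", here is a minimal sketch of a compiler for arithmetic expressions. It has nothing to do with Wasabi's actual design: it borrows Python's own `ast` module as the parser, and the instruction names (`PUSH`, `ADD`, ...) and the tiny stack machine are invented for the example. It does no optimization at all.

```python
import ast
import operator

# Invented instruction set for a toy stack machine.
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}
RUN = {"ADD": operator.add, "SUB": operator.sub,
       "MUL": operator.mul, "DIV": operator.truediv}

def compile_expr(source):
    """Compile an arithmetic expression into a list of stack instructions."""
    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return [("PUSH", node.value)]
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            # Post-order walk: operands first, then the operator.
            return emit(node.left) + emit(node.right) + [(OPS[type(node.op)], None)]
        raise SyntaxError(f"unsupported construct: {ast.dump(node)}")
    return emit(ast.parse(source, mode="eval").body)

def run(code):
    """Execute the instruction list on a simple stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(RUN[op](left, right))
    return stack.pop()

program = compile_expr("2 + 3 * (4 - 1)")
result = run(program)
```

Here `program` is a flat list of seven instructions and `result` is 11. A real language needs a parser, name resolution, and error reporting on top of this, but the core stays the same kind of tree walk.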

                                                                  1. 2

                                                                    I think a lot of people assume you have to have all the optimizations, too. Compilers and the work that goes into their optimizations are often mentioned together. In many applications, one doesn’t need all those optimizations. They’re worried about a problem they wouldn’t have.

                                                                    1. 2

                                                                      Yep! Wasabi targeted .NET, where even for C#, most of the optimization’s actually in the CLR JITter, rather than in the ahead-of-time compilation phase. We chose to write a transpiler for Wasabi 3 rather than generating bytecode directly, but even if we had done the latter, we would still certainly have done almost no optimizations ourselves. (It also helped that our previous target runtime, ASP VBScript, is notoriously slow, so switching to .NET and doing zero other optimizations was still an enormous performance win.)

                                                                2. 3

                                                                  Googling gives a dead link to a blog about this. Do these blog entries live anywhere now?

                                                                  Is it this?

                                                                  1. 1

                                                                    I have mirrored my two Wasabi posts onto my personal site:



                                                                    Also, my Hanselminutes episode is at, if you want to listen to me blather about it for 30 minutes.

                                                                    1. 1

                                                             is currently undergoing some sort of maintenance due to the recent acquisition of Manuscript/FogBugz/Kiln. I’ll see if I can repost the Wasabi articles on

                                                                      “Wasabi: The ??? Parts” is, basically, the documentation for Wasabi. It was not written by me, but I re-hosted it publicly for the people on HN who asked for it.