1. 35
  1. 9

    The next step, being code negative in other people’s software repositories via sharing knowledge alone :)

    1. 18

      The bullet point on my resume that gets the most comments from interviewers says:

      Reduced codebase by 110 KLOC (43%), largely by rewriting the Java subsystem in Python. Increased reliability from 93% to over 99.5%.

      1. 3

        That’s the very crux of the problem. How would you share knowledge without writing code? Well, there’s still an option to write academic papers, but given the rift between compsci academia and practitioners of programming I would expect it not to be very efficient.

        1. 2

          Well you can talk to people.

          1. 5

            Think about it in memetic terms.

            The idea is a meme. The code is its reproductive organ. The code is ‘useful’ so that it can lure its prey (a living human brain). Once the code is used the idea is repeatedly injected into the brain.

            Compare that to talking to people where the idea is basically left floating in the space to be voluntarily accepted or not.

            The former approach is much more efficient.

            1. 1

              Ideas spread fine on their own. For example I’m about to convince you of this without a single line of code. There’s no need to push things into formal language when they make sense in nonformal language. I don’t need to tell you the steps of how to build a boat for you to realize that some method of traveling over water is good. In fact I’d argue that if I told everyone the exact steps to build a boat most would miss the point about what the boat is for. They’d get caught up in the details and fail to capture the bigger picture.

          2. 1

            English descriptions with formal specifications and/or pseudocode accompanying them in a formalism that’s simple. That was the standard for high-assurance security. It worked consistently so long as the formalism could handle what they were describing. The caveat was that the teams needed at least one specialist to train them on and help them with the formalism. If we’re talking CompSci, that could just become another 101 course where people are exposed to a few easy ones.

        2. 7

          @sustrik, if you haven’t, you need to look at the STEPS project that Alan Kay et al. did at VPRI. They’ve been taking the concept of expressing a lot with a little to the point of building a whole OS that way. I’m linking to the first paper and to the publications page; to see the later reports, just use Find for “Steps” and work from bottom to top, since they did the reports annually.



          Highlights to illustrate the concept:

          “JavaScript is not an Ultra High Level Language (it is a VHLL, a bit like Lisp with prototypes) but it is well and widely understood enough to make a useful vehicle for comparisons, and for various reasons we have used it as a kind of pivot point for a number of our activities this year. About 170 lines of meta-description in a language that looks like “BNF with transformations” (OMeta) is sufficient to make a JavaScript that runs fast compared to most of the versions in browsers (because IS actually generates speedy machine code rather than an interpreter).

          The OMeta translator that is used to make human readable and writable languages can describe itself in about 100 lines of code (it is one of these languages).

          IS can make itself from about 1000 lines of code (of itself described in itself).

          One of the many targets we were interested in this year was to do a very compact workable version of TCP/IP that could take advantage for a rather different architecture expressed in a special language for non-deterministic processing using add-on heuristics. Our version of TCP this year was doable in these tools in a few tens of lines of code, and the entire apparatus of TCP/IP was less than 200 lines of code. We had aimed at a solution of this size and elegance because many TCP/IP packages run to 10,000 or 20,000 lines of code in C (and this would use up all of our code budget on just one little subsystem).

          Modern anti-aliased text and graphics is another target that can use up lines of code very quickly. For example, the open source Cairo system (a comprehensibly done version of PostScript that is fast enough to be used for real-time interfaces) is about 44,000 lines of C code, most of which are various kinds of special case optimizations to achieve the desired speed. However, underlying Cairo (and most good graphics in the world) is mathematical model of sampling and compositing that should be amenable to our approach. A very satisfying result this year was to be able to make an “active math” system to carry out a hefty and speedy subset of Cairo in less than 500 LOC.”

          @akkartik, you should be taking note too for bootstrapping. They’re the champions at this point. “How can we do better?” is the question to ask. I’m also still trying to tie my concepts with formal methods a bit where any construct I use already has a formal verification done that can be retroactively applied to my work. Compiled with stuff like CompCert or CakeML or extracted to stuff such as Frama-C or SPARK for automated checks. Stuff like that.

          1. 5

            Yes, I was watching VPRI and STEPS for many years. My initial enthusiasm has been dampened for two reasons:

            a) They seem to think the “steps toward the future of programming” all involve language hacking, that you can improve programming just by coming up with the perfect DSL for each element of the system stack. But their results never persuaded me this was the case. For a few years now I’ve been chasing the opposite direction: since languages are fairly thick abstraction layers (it’s a big context switch for all programmers to jump from a program in one language to its translation in some other language, even if they know both languages well, regardless of how many lines it takes to implement the translator), a comprehensible system should have as few languages as possible. Minimizing languages also helps with security. Proliferating languages make security vulnerabilities hard to spot, and they make it easy to smuggle malicious payloads into seemingly innocuous software.

            I care primarily about coming up with a comprehensible software stack; what I’ve started calling linear (as opposed to metacircular) bootstrapping (implementing a compiler without needing some other compiler, e.g. bcompiler, StoneKnifeForth, Amber) is just a tool toward that end. My current vision for a comprehensible stack is: a) minimize the number of languages intended for userland, and b) design a much larger number of internal languages used to gradually accrete these external languages, so that the thick abstraction of “compiler” is deconstructed into a more comprehensible set of steps.

            b) The projects that STEPS consists of always seemed pretty incoherent and slapped together. They never did get tied together into a coherent system as a lot of us had hoped. This is a common failure mode for research projects: the authors scatter when the funding runs out, and any good ideas they may have had are much less persuasive than they might have been with a little more persistence.

            Edit: Then again, perhaps I can contribute some persistence. I’ve been growing disenchanted lately with my plan for linear bootstrapping. What’s the point of growing a software stack all the way up from machine code, if your hardware interfaces are baked into proprietary ROMs? Seeking a concise set of primitives seems quixotic. Perhaps I’ll take a step back and try to build OMeta instead. It’s been on my radar for too long, and reading about it does me no good. Building it may help me appreciate it better.

            1. 2

              I do see your gripes, though, despite the edit. I agree they’ve been inconsistent and too language focused. One thing my time doing strong INFOSEC taught me is we need to start from hardware up. At the least, safe pointers, arithmetic, and control flow that supports clean language. Maybe optional, built-in, concurrent GC as well.

              Far as the languages, I’m not worried about several as long as their datatypes and calling conventions are easy to integrate, preferably shared. It’s been beneficial so far for OpenVMS and then .NET CLR. Far as DSL’s themselves, many make the argument that libraries are basically as hard to learn as DSL’s so why not improve the syntax. I still see value in making them share as much syntax or semantics as possible, though. Plus building up layer by layer from simple to complex features.

              1. 3

                I know we chatted about Mu back in March, but just as a reminder, it already supports bounds-checking and safe pointers (using refcounting). It provides structured control flow in an Assembly-like syntax: functions are just sequences of instructions without any recursive expressions like a+b*c. It does all this in an extremely naive manner which has runtime costs. (I’m a disciple of DJB’s approach of building in safety before performance.) But as a result the implementation is very simple and easy to audit.
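
                To make the “no recursive expressions” point concrete, here is a minimal Python sketch of lowering a nested expression such as a+b*c into a flat sequence of instructions. It is not Mu’s implementation: the `flatten` helper, the `t1`, `t2` temporaries and the `target <- op x, y` notation are all made up for illustration.

                ```python
                import ast
                import itertools

                # Hypothetical instruction names, chosen only for this sketch.
                OPS = {ast.Add: "add", ast.Sub: "subtract",
                       ast.Mult: "multiply", ast.Div: "divide"}

                def flatten(expr):
                    """Lower a nested arithmetic expression to a flat instruction list."""
                    counter = itertools.count(1)
                    code = []

                    def lower(node):
                        # Variables and literals need no instruction of their own.
                        if isinstance(node, ast.Name):
                            return node.id
                        if isinstance(node, ast.Constant):
                            return str(node.value)
                        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                            left = lower(node.left)
                            right = lower(node.right)
                            target = f"t{next(counter)}"
                            code.append(f"{target} <- {OPS[type(node.op)]} {left}, {right}")
                            return target
                        raise ValueError("unsupported expression")

                    lower(ast.parse(expr, mode="eval").body)
                    return code

                for line in flatten("a+b*c"):
                    print(line)
                # t1 <- multiply b, c
                # t2 <- add a, t1
                ```

                Each output line does exactly one operation on named operands, which is roughly what makes such code easy to audit line by line.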

                I think my next step may be to build a really simple but unoptimized compiler for Mu, generating an ELF binary directly in the spirit of StoneKnifeForth. It’ll generate utterly unoptimized code, accessing memory for each instruction. Later I’ll think about how to let the programmer control register allocation using Mu’s metadata facility, e.g. saying x:num/R1 to get the number variable x to be allocated to register R1. This seems in the spirit of DJB’s qhasm.
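
                For a sense of how little “generating an ELF binary directly” can involve, here is a rough Python sketch that emits a tiny static executable by hand. It assumes x86-64 Linux; the load address, the `build_elf` helper and the hard-coded exit(42) program are illustrative choices, not anything taken from Mu or StoneKnifeForth.

                ```python
                import struct

                BASE = 0x400000                  # assumed load address
                EHDR_SIZE, PHDR_SIZE = 64, 56    # ELF64 header and program header sizes

                # Machine code for exit(42) on x86-64 Linux.
                CODE = bytes([
                    0xBF, 0x2A, 0x00, 0x00, 0x00,   # mov edi, 42
                    0xB8, 0x3C, 0x00, 0x00, 0x00,   # mov eax, 60 (sys_exit)
                    0x0F, 0x05,                     # syscall
                ])

                def build_elf(code):
                    """Return the bytes of a minimal ELF64 executable wrapping `code`."""
                    entry = BASE + EHDR_SIZE + PHDR_SIZE   # code sits right after the headers
                    total = EHDR_SIZE + PHDR_SIZE + len(code)
                    # e_ident: magic, 64-bit, little-endian, version 1, System V ABI, padding
                    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
                    ehdr = ident + struct.pack(
                        "<HHIQQQIHHHHHH",
                        2,            # e_type: ET_EXEC
                        62,           # e_machine: EM_X86_64
                        1,            # e_version
                        entry,        # e_entry
                        EHDR_SIZE,    # e_phoff: program header follows the ELF header
                        0,            # e_shoff: no section headers
                        0,            # e_flags
                        EHDR_SIZE,    # e_ehsize
                        PHDR_SIZE,    # e_phentsize
                        1,            # e_phnum
                        0, 0, 0,      # e_shentsize, e_shnum, e_shstrndx
                    )
                    phdr = struct.pack(
                        "<IIQQQQQQ",
                        1,            # p_type: PT_LOAD
                        5,            # p_flags: read + execute
                        0,            # p_offset: map the file from its start
                        BASE, BASE,   # p_vaddr, p_paddr
                        total, total, # p_filesz, p_memsz
                        0x1000,       # p_align
                    )
                    return ehdr + phdr + code

                image = build_elf(CODE)
                ```

                Writing `image` to a file and marking it executable should give a program that exits with status 42, with no assembler or linker involved.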

                1. 2

                  I thought you were going to redo OpenBSD’s compiler or something like that. Sounded really ambitious. I’m glad you’re setting your sights to a realistic, interim goal. Yeah, the trace-based method reminded me of the successful use of that same method in formal verification with UntrustedProducer/TrustedChecker pattern. A lot of precedent there in one type of verification justifying you might get something out of another (i.e. testing). I also agree with Get it Right Then Fast.
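
                  Roughly what I mean by the trace-based method, as a Python sketch under my own assumptions (the `Trace` class, `run_add` and the labels are hypothetical, not Mu’s actual API): the program under test emits facts into a trace, and a small checker only verifies that the expected facts show up in order.

                  ```python
                  class Trace:
                      """Append-only log of (label, message) facts emitted by the program."""

                      def __init__(self):
                          self.lines = []

                      def emit(self, label, message):
                          self.lines.append((label, message))

                      def contains_in_order(self, expected):
                          # Subsequence check: each expected fact must appear after the
                          # previous one, but unrelated facts may sit in between.
                          remaining = iter(self.lines)
                          return all(fact in remaining for fact in expected)


                  def run_add(trace, a, b):
                      """Toy 'program under test' that records what it does."""
                      trace.emit("parse", f"instruction: add {a}, {b}")
                      result = a + b
                      trace.emit("run", f"result: {result}")
                      return result


                  trace = Trace()
                  run_add(trace, 2, 3)
                  ok = trace.contains_in_order([
                      ("parse", "instruction: add 2, 3"),
                      ("run", "result: 5"),
                  ])
                  ```

                  The checker stays tiny and knows nothing about how the producer is implemented, which is the part that resembles the producer/checker split.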

                  It looked like a lot of source files in C++. Not ideal for bootstrapping even if the language and its features were interesting. I might revisit it anyway in the future to play around with it esp with the description you just gave me. Definitely some nice attributes with an interesting philosophy (esp syscall-level mockups for testing). You might want to start with a rewrite of your own tool, though, in a linear bootstrapping sort of way that stays clear of anything C++-like. I mean, you love this concept so much but haven’t applied it to your own project to see a cleaner, easier-to-compile implementation? (elbow to side) (twice)

                  I apologize if I overlooked that you had already done that with my quick [re-]skim of it. Just seems like a nice test case with just 60-80 estimated functions in main part in C++ plus whatever support functions that are used. I’d say you can ignore the external stuff as you might port it to Mu after core is bootstrapped or move on to more interesting stuff. The ELF binary sounds like one good project. You need to be familiar with that anyway for bootstrapping purposes.

                  Edit to add: Oh yeah, qhasm was neat. I found it and Linoleum looking for true, cross-platform assemblers. No, C isn’t one. ;) Those two were neat, though.

                  1. 2

                    “I mean, you love this concept so much but haven’t applied it to your own project to see a cleaner, easier-to-compile implementation? (elbow to side) (twice)”

                    Very funny. Not sure what concept you mean. Mu uses both trace tests and layers in the C++ level. But yes, I haven’t built a compiler for it yet.

                    If and when you decide to try it out, running Mu on Linux or Mac should take just 3 commands at the commandline:

                    $ git clone https://github.com/akkartik/mu
                    $ cd mu
                    $ ./mu edit

                    That’ll put you in an environment for writing programs described in my blog post. Since Mu requires zero dependencies beyond a vanilla unix-like install (Xcode, port/brew and git on Mac OS), this really should be bulletproof. Let me know if you run into issues, or if you’d like some ideas for programs to try out. The repo has some example programs, which are described at the top of http://akkartik.github.io/mu.

                    1. 1

                      “Not sure what concept you mean.”

                      The concept of building it layer by layer in easy-to-understand, easy-to-compile language(s) possibly in style of linear bootstrapping. Instead, it’s C++. That language has the least tooling for verification among the popular incumbents, and it’s really hard to write compilers for. I usually have to recommend commercial stuff for C++ coders whereas there are lots of FOSS or just free options for C and Java. I was surprised you haven’t applied your favorite idea about building languages/compilers to your favorite, homebrew language slash platform. Not a bad thing or nothing but surprised as it’s a common move.

                      “If and when you decide to try it out…”

                      Appreciate the tips. Good you made it so easy to try out, too.

                      “# example program: add two numbers”

                      That was more interesting than it usually is. ;)

                      1. 2

                        Remember that I don’t care about formal verification :) C++ is crap, but at least it’s ubiquitous. I stick with the default dialect of C++ (C++98, almost 20 years old), and use really only two features beyond C, both to help avoid buffer overflows and heap corruption. So my programs run on most servers.

                        Mu was initially prototyped in (Arc) Lisp. This C++ version was intended to be another short milestone on the way down to still lower levels of abstraction. Sadly the next step has taken longer than I expected. I have tried to build it layer by layer in easy-to-understand C++. Try skimming a few layers: 000; 010; 011; 012; 020. And so on..

          2. 4

            First thing I thought of (although someone in the comments beat me to it): https://web.archive.org/web/20100105021419/peetm.com/blog?p=55