1. 6

Abstract: “The purpose of this dissertation is to provide constructive proof that the logic programming language Prolog can be implemented an order of magnitude more efficiently than the best previous systems, so that its speed approaches imperative languages such as C for a significant class of problems. The driving force in the design is to encode each occurrence of a general feature of Prolog as simply as possible. The resulting system, Aquarius Prolog, is about five times faster than Quintus Prolog, a high-performance commercial system, on a set of representative programs. The design is based on the following ideas:

  1. Reduce instruction granularity. Use an execution model, the Berkeley Abstract Machine (BAM), that retains the good features of the Warren Abstract Machine (WAM), a standard execution model for Prolog, but is more easily optimized and closer to a real machine.

  2. Exploit determinism. Compile deterministic programs with efficient conditional branches. Most predicates written by human programmers are deterministic, yet previous systems often compile them in an inefficient manner by simulating conditional branching with backtracking.

  3. Specialize unification. Compile unification to the simplest possible code. Unification is a general, pattern-matching operation that can do many things in the implementation: pass parameters, assign values to variables, allocate memory, and do conditional branching.

  4. Dataflow analysis. Derive type information by global dataflow analysis to support these ideas. Because of the limitations of the dataflow analysis, the system is not yet competitive with the C language for all programs. I outline the work that is needed to close the gap.”
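
To make ideas 2 and 3 concrete, here is a small illustration of my own (not taken from the dissertation): a two-clause predicate whose clauses are guarded by mutually exclusive arithmetic tests is deterministic once its arguments are known to be numbers, and unification itself does different jobs depending on the modes of its arguments.

    % Deterministic once X and Y are known to be numbers: the clauses are
    % guarded by mutually exclusive tests, so a compiler that exploits
    % determinism can emit a compare-and-branch here instead of a choice
    % point that is later undone by backtracking.
    max(X, Y, X) :- X >= Y.
    max(X, Y, Y) :- X < Y.

    % Unification is one general operation, but what it should compile to
    % depends on the argument modes:
    %   ?- P = point(1, 2).            % P unbound: builds a structure (allocation)
    %   ?- point(A, B) = point(1, 2).  % decomposition: binds A = 1, B = 2
    %   ?- point(1, _) = point(2, _).  % mismatch: acts as a test and fails

The type information from idea 4 is what would let a compiler choose the cheapest of these behaviours at each call site instead of falling back to the general unification routine.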

  2. 2

    Early on, in its description of Prolog, this paper mentions that other research has been done on parallelism (and I specifically recall 1983’s Artificial Intelligence in Prolog describing implicit parallel execution as a possible form of optimization, with resolution done sort of like speculative execution and cut branches being discarded). However, determining whether a branch can be executed in parallel doesn’t appear in the overviews of the dataflow analysis. Does Aquarius perform any kind of parallel or speculative execution? (Is it limited to their VLSI-BAM architecture?) I haven’t finished the paper yet, so maybe it’s buried in the later pages.

    1. 1

      It doesn’t mention parallelism in the performance or future-work sections, or on the website. I don’t think it’s designed for that. This makes sense given that he was doing 200+ pages of work on mapping logic to high-performance primitives and a VM. That’s the kind of thing you figure out before trying to do those things in parallel. The future work section also reads like a list of things students could tackle as Master’s or PhD projects.

      I kept this as his techniques might still be valuable in later projects for high-performance Prolog, especially a Prolog or non-Prolog way to do verified, executable specs that perform well. For high-performance logic computing, Mercury is the most state-of-the-art system that’s actively developed. It also has commercial use.

      Edit: Here’s you a survey paper on parallel execution of logic programs, too.

      1. 1

        Yeah, it seems to be all compiler-optimization-type stuff. Today, parallelism is a pretty cheap way to get better performance (and it was certainly on the map in 1990: the authorship of the Aquarius paper overlaps with the Fifth Generation Computer project in Japan, which bet heavily on both parallelization and custom hardware, and outside of logic you had stuff like the Connection Machine), but I guess in those days it wasn’t considered as cheap as even fabricating your own chip.

        Thanks for the survey paper. The discussions I’ve seen of parallelism in logic programming have largely been non-academic and not terribly detailed :)

        1. 1

          The Fifth Generation project was one of the biggest failures in the history of computing. It failed about as hard as AI itself did before the AI winter. Cool as it was as a concept, I think the author was wise in just trying to make better optimizing compilers for commodity CPUs. Glad you found the survey paper more informative than what had to be the same weak discussions I was seeing online about it. It was mostly people speculating instead of doing real research or builds.

          One thing I don’t mention a lot in my Brute-Force Assurance descriptions is that it was originally for performance as well as assurance. Changing the form of a program can create optimization opportunities. One branch of the idea was having a few high-level ways of describing an algorithm that got converted into the various languages or libraries best suited to parallelism on SIMD, multi-core, NUMA, DSM, and message-passing clusters. You describe it once, with efficient implementations synthesized in all those forms, so you can evaluate their performance and see what makes the most sense. There are also languages like Cray’s Chapel that are flexible enough to cover a lot of that by themselves. Although synthesis has inefficiencies, I think the synthesized results would still show whether the algorithm was intrinsically easy or difficult to speed up, where failures might indicate one should use a different algorithm or structuring.

          The reason I’m bringing it up is that one could probably run through a lot of experiments on parallelizing Prolog by just describing each idea (e.g. from the survey paper) in such a language or tool and measuring the results on lots of example programs. The primitives or strategies that worked best would then be used in the custom implementation of parallel Prolog. I’ve read a lot of papers where they go straight to trying to build custom stuff, which uses up lots of effort. I think prototyping in something like Chapel, stringing together primitives through an FFI or the like, would make a better default in such explorations. Just a hypothesis, though, as I haven’t tested it.

          1. 2

            “The Fifth Generation project was one of the biggest failures in the history of computing.”

            My understanding was that the failure was mostly down to a focus on specialized hardware. (In other words, parallelism on clusters of commodity CPUs was not on their plate, on the grounds that the idea wasn’t feasible when the project was announced in the early 80s, even though by the late 80s it would have been.) However, information about the project in English seems pretty scarce; I’ve never really been able to figure out what all they were doing, other than homebrewing new architectures with late-70s tech and breaking compatibility with Prolog!

            “what had to be the same weak discussions I was seeing online about it”

            I haven’t even been seeing these things online, to be honest. I own a couple of books about expert systems written in the early 80s, and they all mention implicitly parallelizing Prolog via parallel speculative execution on multi-CPU setups as a promising future direction.

            “one could probably run through a lot of experiments on parallelizing Prolog by just describing each idea (e.g. from the survey paper) in such a language or tool and measuring the results on lots of example programs. The primitives or strategies that worked best would then be used in the custom implementation of parallel Prolog.”

            I worry that keeping compatibility with existing example Prolog programs might make certain otherwise promising strategies too complex to implement. For instance, while Aquarius uses static analysis to figure out whether or not a predicate could be determinate, that’s a lot more complicated than doing what Church does and requiring determinacy markers in ambiguous situations (and I might even argue in favor of just relying on determinacy tagging for non-anonymous preds in early versions).

            Likewise, speculative execution is complicated, but that kind of behavior is only necessary for simulating sequential evaluation of disjunct definitions. Existing Prolog programs lean heavily on execution order and optimize based on an understanding of it, whereas languages like Miranda require guards on all disjunct definitions, so all paths can be evaluated simultaneously in most cases, with incentives to ‘fail early’ baked into the guard format (vs. Prolog, where preds can backtrack due to the failure of the last term).

            If I were going to try a bunch of policies, I would break assumptions like sequential clause order in predicates that don’t contain a cut in order to make the implementation easier, then determine whether or not it’s worthwhile to retrofit compatibility with the original behavior. The core Prolog community is small enough that a really interesting Prolog derivative that breaks compatibility could probably overtake it in size (and that’s already happened with Erlang, though the domain is different).
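
            To make the contrast concrete, here is a toy sketch of my own (not from Aquarius, Church, or Miranda) of order-dependent clauses versus guard-on-every-disjunct clauses:

              % Order-dependent Prolog style: the second clause is only correct
              % because the cut commits once the first test succeeds, so the
              % clauses have to be tried in this sequence.
              classify(X, small) :- X < 10, !.
              classify(_, large).

              % Guard-on-every-disjunct style: the clauses are mutually exclusive
              % on their own, so in principle they can be tried in any order, or
              % in parallel, with the failing branch simply discarded.
              classify2(X, small) :- X < 10.
              classify2(X, large) :- X >= 10.

            The second form is what a guard-style language effectively forces you to write, and it is the form that parallel or speculative evaluation can handle without having to simulate sequential clause selection.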

            1. 1

              That approach makes sense. Interesting, too.