1. 21
    1. 9

      One other reason why we don’t use SAT solvers more - they are pretty opaque. If they can’t find a solution, you don’t really know why, and you don’t know if it would find a solution if you were to change how you expressed your clauses. You only get “solved” or “counterexample.”

      I am not a SAT expert, so if some feature like this does exist, someone please let me know. Because that would make me a lot more open to using them more.

      1. 4

        Yeah I think this is the main reason … Similar to why we almost always write parsers and type checkers by hand (ever look at TypeScript’s codebase :-P ?)

        In theory they can also be described with declarations, but in practice you need to add error messages, optimize specific cases, integrate with other pieces of software, etc.

        I would like to have seen more details about performance. Even if 99% cases are fast, it’s not good if 1% of cases blow up in performance, even if you get the right answer eventually.

        The linked paper give some color on that:


        Despite their dramatic impact, SAT solvers remain poorly understood. There are significant gaps in our theoretical understanding of why these solvers work as well as they do. This question of “why CDCL SAT solvers are efficient for many classes of large real-world instances while at the same time perform poorly on relatively small randomly-generated or cryptographic instances” has stumped theorists and practitioners alike for over two decades.

        This alone is a good enough reason not to use them for “real engineering”.

        It’s sort of like deep learning – it has bad engineering properties, and you can’t rely on it for things like self-driving cars. People don’t know why it works – only that it does sometimes – and they can’t control it. (See recent funny news stories about chat bots.)

        Another reason is that SAT solvers apparently only got fast somewhat recently, i.e. there was significant progress in the last 10 years. Ideas like algebraic data types and garbage collection took decades to become widely adopted too.

      2. 2

        SMT solvers are still used for a lot of things but the problem is a bit worse than you suggest. The outputs of solved or counterexample are fine. The unhelpful output is ‘solver timeout’, where you don’t know if it would be solved with a bit ore time, or what you need to do to prod the solver to make it easier to either prove or find a counterexample.

    2. 6

      I’ve been playing around with SAT encoding for generating and solving puzzles, it’s been interesting to improve it by many orders of magnitude. It would probably be faster with a custom solver, but it’s nice to have the flexibility to arbitrarily combine puzzle rules.

      It has taken some days, though, to come up with insights, and some tricks I gleaned from Knuth’s volume 4B. An example is the order encoding of integers, take n bits for an integer variable v and each bit i is 1 if v < i, this allows for fairly simple comparison-based constraints which propagate efficiently.

    3. 4

      I don’t know of knowledgeable people who say Python dependency management is “hard” – meaning “tools are slow” – due to NP-completeness. They do mention that the difficulty of solving for the requested dependencies can be a factor, but pretty consistently the biggest source of slowness in dependency-resolving tools is the fact that you have to, currently, download and unpack and inspect a full copy of every package involved, because there is no other way to obtain the dependencies.

      That is, if I request foolib==1.0, then the tool must download a copy of foolib 1.0 from PyPI, unpack it, read its dependency list, then for each item in the dependency list go download a copy of that package, unpack it, read its dependency list, and so on until it has the full dependency graph and can start selecting which versions it will actually install (which may not be the ones previously downloaded – suppose something depends on barlib>=2.1, and so the tool conservatively downloads barlib 2.1, only to later encounter a different dependency on barlib==2.2, etc.).

      Some tools have been working on ways to figure this out without needing to do full downloads (i.e., requesting only a range of bytes of a .whl package and hoping to get the metadata file), and there has been work on exposing dependency metadata via the PyPI API, so tooling won’t have to keep doing this laborious and slow dance of downloading packages just to obtain their metadata.

    4. 3

      Regarding the problem of converting the problem to SAT, it’s interesting that Picat does this very well! It’s a language derived from Prolog which ships with a SAT solver (PicatSAT) but also, the way it introduces the SAT solver in the language it’s so easy, that you still think you’re programming in a mix of imperative/functional programming.

      1. 2

        Picat is one of those languages I’ve always wanted to learn but haven’t had a motivating problem to actually learn it with. I usually learn languages by making small QoL tools and have yet to find a QoL constraint problem!

    5. 2

      Aside: I really like your use and styling of the <dfn> element.

      For others who, like me, are new to the <dfn> element: this how MDN describes <dfn>.

      The <dfn> HTML element is used to indicate the term being defined within the context of a definition phrase or sentence. The ancestor <p> element, the <dt>/<dd> pairing, or the nearest <section> ancestor of the <dfn> element, is considered to be the definition of the term.

    6. 2

      Another reason is that in practice you often don’t need the solution, you just need a sufficiently good solution.And often an approximate algorithm that can solve any practical problem in a reasonable amount of time exists.

    7. 1


      Is missing. It can work like this:

      First, let’s consider only decision problems (“is there a solution to this formula?”, rather “what is the solution for this formula?”). The two definitions we want to prove equivalent are:

      • Non-determinism: we allow Turing machine to for and try different alternatives in parallel. The answer is “yes”, if it is “yes” along at least a single branch (eg, for SAT, we try to set each variable to true and false, in parallel)
      • Verification: there exist a Turing machine which can confirm a “yes” given a suitable hint (e.g for SAT the hint is assignment of values to variables, and verification is evaluation of the formula).

      If we have a non-deterministic state machine, we can build a verifier. The hint would be the sequence of choices the NDTM does along some “winning” branch. The verifier than checks that this sequence of choices indeed leads to a win.

      If we have a verifier, we can build a non-deterministic state machine. It would non-determinnistically choose a hint (ie, fork for every bit of the hint), and then run a verifier, effectively, “for each hint”.

      Small print: we require both the depth of NDTM and the length of the hint to be polynomial relative to the size of the problem.

      1. 2

        Markdown is basically the worst. Pushed a new version that’s now working its way through cloudfront.

        1. 2

          Wow, congratulations on combining the two hardest things in computer science: caching, and nested lists in Markdown

      2. 1

        Nice thing about this construct is that it gives intuition for why “the hardest” NP problems are equally hard (why solving just one NP problem in polynomial time solves all NP problems in polynomial time). This is because we can build a universal verifier. Consider this problem:

        Given a Turing machine, is there a (polynomials-sized) input on which the machine returns “yes” in polynomial time?

        Note that this is not a halting problem: we explicitly bound the execution time of the Turing machine, so if it loops long enough, it’s a “no” rather than “we don’t know”.

        This problem is NP: indeed, the input would be a hint for the verifier. Alternatively, a non-deterministic Turing machine can just try all polynomially-sized inputs.

        It’s also easy to see that solving this problem, we can solve any NP problem. Indeed, for each specific NP problem (like a particular SAT formula) we can use a verifier for that problem with hard-coded input as an input to our universal verifier.

        So, yeah, we have one “hardest” NP problem. Its still a bit surprising that all kinds of random combinatorial stuff also turns out to be NP-complete. Here’s a sketch of how to get from this weird “is there an input which makes this Turing machine output ‘yes’?” to something more reasonable like a SAT.

        • we can encode one step of a Turing machine as a Boolean schema (value of each cell on a tape after the step can be describe by a formula which depends on the position of a head, and the current value of a cell).
        • by stacking N such schemas, we encode N steps. In particular, by stacking polynomial amount of such schemas, we get a polynomialy-sized schemas for the end state of the machine.
        • And that means “is there a solution for this Boolean schema” is also NP complete, and that one is almost a non-weird problem
        • the difference between Boolean schema and Boolean formula is that schemas allow re-using outputs several times. That’s a bit problematic, because naively converting schema to a formula would lead to an exponential blow up! But there’s a smarter way to do it, by introducing extra variables for intermediate outputs. After that, we get SAT, which is a totally non-weird problem.
        • and why SAT can be reduced to all kind of things is not really surprising

        To unwind this a bit, solving SAT turns out to be so powerful because, with SAT, you can encode arbitrary Turing machines. And “solving” such SAT means finding an input accepted by Turing machine. And that means that, if you can write a trivial algorithm for you problem X which just tries every solution one-by-one, you can encode the solution-checker inside the loop as a SAT-formula, and then ask the SAT solver to just point out a solution!

    8. 1

      @hwayne, I have heard the “package resolution is NP complete” before. I don’t doubt that it’s true, but curious if have any references / further reading which explains what this means. I’m interested in it.

      1. 4

        So, for what this means in general, I think links in the end of the post should be a good overview. But the TL;DR version is

        “dependency resolution is NP complete” means that, if you have an efficient (polynomial, something like O(n^20), rather than O(2^n)) algorithm for resolving dependencies, you can construct an efficient algorithm for solving boolean satisfiability (SAT), and efficiently solving that allows you to solve all kinds of hard problems in polynomial time.

        We will prove that dependency resolution is NP complete constructively. Namely, we will show how to, given an arbitrary formula, construct a set of constraints for dependencies, which has a solution only if the given formula has a solution.

        The precise formulation of SAT is given in the post. You have a logical formula like

        (A or !B or C) and (D or E or F) and (!A or !E or !C)

        Each variable can be either true or false, and you are wondering if there’s a variable assignment which makes the whole formula true.

        “dependency resolution” is a bit more vague, and actually depends on a specific package manager we are talking about. Indeed, some formulations of these problem can be solved efficiently. But most cases can’t.

        In particular, Cargo’s version is definitely NP complete. Cargo has a rather rich language for specifying dependencies(https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html), but we’ll use a tiny subset here. Namely, we only allow exact version requirements, and we always will use 1.x.0 version (only minor version varies).

        First, we have versions of packages, like A 1.0.0, A 1.1.0, B 1.2.0. Each version of the package specifies dependencies as version requirements. For example, A 1.0.0 can specify

        B = "=1.0.0"
        C = "=1.0.0, =1.1.0"

        This requirement means that, if we select A 1.0.0, wee must also select B 1.0.0 and either C 1.0.0 or C 1.1.0. That is, each package version can require a bunch of packages, and for each required package, it can require one of the specific versions.

        Not that different versions of the same package can have different dependencies.

        Now, here’s the actual dependency resolution problem: given the universe with different packages and versions, and a single root package version, is it possible to select a subset of package versions such that:

        • root package version is selected
        • dependencies of every selected package version are satisfied
        • for each package, there’s at most one version selected (Cargo allows selecting package multiple times with different major versions, but our major versions are always fixed at 1)


        I claim that this problem is not easier than SAT. I will reduce an arbitrary SAT formula to our dependencies problem.

        Let’s use (A or !B or C) and (!A or B) as our example formula.

        First, for each variable (V), I make two package versions, V 1.0.0 and V 1.1.0:

        A 1.0.0, A 1.1.0
        B 1.0.0, B 1.1.0
        C 1.0.0, C 1.1.0

        The semantics I am thinking about here is that A=true corresponds to selecting A 1.0.0, and A=false (!A) corresponds to A 1.1.0.

        To the root package, I add a requirement for each variable: V = "=1.0.0, =1.1.0". This way, I model the requirement to assign true or false (but not both) to each variable.

        Next, I create a package for each clause of the sat formula (clause is a bunch of (possibly negated) variables ored together. The whole formula is an and of a bunch of clauses. You can write arbitrary logic formula that way). If the clause consists of n variables, the corresponding package would have n versions. ith version would correspond to the case where the clause is true because its ith variable is true.

        So for clause A or !B or C I create a package c1 with the following versions and dependencies:

        c1 1.0.0:
          A = "=1.0.0"
        c1 1.1.0:
          B = "=1.1.0"
        c1 1.2.0
          C = "=1.0.0"

        I can now say that, for each clause, my root package requires one of its versions.

        All together, for (A or !B or C) and (!A or B) formula, I get the following set of packages:

        // _the_ root package
        root 1.0.0:
          // variables
          A = "=1.0.0, =1.1.0"
          B = "=1.0.0, =1.1.0"
          C = "=1.0.0, =1.1.0"
          // clauses
          c1 = "=1.0.0, =1.1.0, =1.2.0"
          c2 = "=1.0.0, =1.1.0"
        A 1.0.0: // empty set of deps
        A 1.1.0: 
        B 1.0.0: 
        B 1.1.0: 
        C 1.0.0: 
        C 1.1.0: 
        // first clause
        c1 1.0.0:
          A = "=1.0.0"
        c1 1.1.0:
          B = "=1.1.0"
        c1 1.2.0
          C = "=1.0.0"
        // second clause
        c2 1.0.0:
          A = "=1.1.0"
        c2 1.1.0:
          B = "=1.0.0"

        Now, if I find a solution to this dependency problem (a selection of package versions), I can extract a solution to the formula from it. For example, (A=false, B=true, C=true) corresponds to selecting root 1.0.0, A 1.1.0, B 1.0.0, C 1.0.0, c1 1.2.0, and c2 1.1.0. Or, generalizing, if I have a polynomial time algorithm which solves arbitrary dependency problem, I can use that to solve, in polynomial time, arbitrary SAT problem.

        An important thing to note here is that the dependency setup here is super-contrived, and looks nothing like real deps in real projects. This makes sense – dependency specifications that people actually write are usually easily solved.

        For example, most of the time people use semver open constraints ^1.2.3. If all constraints are semver open, then a gready algorithm which always picks the maximal versions always finds a solution!

        1. 2

          Turns out, I don’t know how Cargo’s requirements work :-) Comma separated requirements in Cargo mean and rather than or. So while the above is correct (I hope) with respect to reduction of two stated problem, it’s incorrect that dependency problem models how Cargo works. It’s easy to fix though: rather than writing "=1.0.0, =1.1.0" one can just write "*", which is even simpler. In this cases, we always want any of the existing versions.

        2. 1

          Great answer, thanks!

      2. 4

        I first heard this ~10 years ago … I wondered why I was having so much trouble installing Ubuntu packages – i.e. shit would randomly break – and why there was seemingly no notion of correctness at all.

        Also I remember trying to reverse engineer Debian’s packaging then, and realizing there are actually some base packages hard-coded in the code! i.e. there is no clear separation between Debian’s package lists and the package manager!

        I stumbled across the EDOS project, which studied Linux distros in a formal way:


        I think this may have been the paper I found:

        Managing the Complexity of Large Free and Open Source Package-Based Software Distributions


        Section 3.1 is “Encoding the Installability Problem as a SAT Problem”, which proves its NP-complete.

        There are some more references at the end here, EDOS project and otherwise: