There are a lot of definition errors in this article, which may also be factual errors. I feel this could lead to a lot of confusion (which leads to more articles including that confusion, which leads to more confused readers, …).
tl;dr We’ve typically considered it a deal-breaker to discover that an algorithm we care about is NP-hard.
Algorithms are not NP-hard, problems are. You can have more than one algorithm (quicksort, mergesort, bogosort) for the same problem (sorting).
Algorithms have inherent complexity (worst-case running time, among others). Problems have inherent difficulty. Algorithms do not have any associated notion of difficulty.
If you make the substitution “algorithm is/isn’t NP-hard” with “problem is/isn’t NP-hard”, a lot of the sentences make much more sense.
A problem in NP is hard to solve
No, not all problems in NP are hard. In fact, P is contained in NP (and we just said all of those problems are easy).
NP-complete (which is not the same thing as NP) problems are (presumably) hard to solve. NP-hard (which is also not the same thing as NP and not the same thing as NP-complete) problems are also (presumably) hard to solve, possibly harder than the NP-complete ones.
NP problems are those which can be solved in polynomial time on a non-deterministic machine. In fact, it’s in the name: “Non-deterministic Polynomial”. A non-deterministic machine is a regular machine with an extra instruction: a special superfork() that makes a copy of the machine (and puts a copy of the process in each forked machine, like a regular fork()).
NP-complete problems are problems in NP that are also “as hard as” any problem in NP.
NP-hard problems are those that are at least “as hard as” any problem in NP but not (necessarily) known to be themselves in NP. (Or rather, being NP-hard makes no statement of whether the problem is in NP or not.)
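For a concrete picture of those definitions, here’s a toy sketch of my own (in Go), using subset sum, a classic NP-complete problem: checking a proposed answer, the “certificate” that one lucky superfork() branch would hand you, takes polynomial time, while a deterministic machine that has to simulate every branch ends up trying all 2^n subsets.

```go
package main

import "fmt"

// verify checks a proposed certificate in polynomial time:
// does the chosen subset really sum to the target?
func verify(xs []int, chosen []bool, target int) bool {
	sum := 0
	for i, pick := range chosen {
		if pick {
			sum += xs[i]
		}
	}
	return sum == target
}

// solve simulates what a deterministic machine has to do without
// superfork(): try every branch, i.e. all 2^n subsets.
func solve(xs []int, target int) bool {
	n := len(xs)
	for mask := 0; mask < 1<<n; mask++ {
		chosen := make([]bool, n)
		for i := 0; i < n; i++ {
			chosen[i] = mask&(1<<i) != 0
		}
		if verify(xs, chosen, target) {
			return true
		}
	}
	return false
}

func main() {
	xs := []int{3, 34, 4, 12, 5, 2}
	fmt.Println(solve(xs, 9)) // true: 4 + 5 = 9
}
```

Verification stays cheap no matter how big the instance gets; it’s only the search over branches that blows up.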
I would argue that there’s a third possibilty, which is to just…shrug. NP-hard is a worst-case bound on your algorithm. It might only apply to “pathological” inputs.
This is a really bad conclusion.
By analogy, you could also say “the library’s tests don’t pass, but people might not be using it in the manner of those pathological test cases”. Or “these asserts fail, but maybe the program state fixes itself in all but the pathological cases”. The level of (un)certainty is comparable.
Instead what you are supposed to do is exactly what’s discussed for the Go ecosystem.
While researching this problem for the Go ecosystem, Russ Cox pointed out that you can simplify the problem by prohibiting certain kinds of negative dependencies
You should describe the actual problem more accurately. Maybe this particular refinement of the problem statement now undershoots your needs (the original one probably overshot by a lot). But now the task is to refine it further (or refine from an earlier version).
For example, at most 100 negative dependencies are allowed.
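To illustrate why dropping the negative dependencies helps so much: if every requirement is just a lower bound (“I need at least version X”), which is roughly the restriction behind Go’s minimal version selection, there is nothing left to search; you take the largest minimum anyone asked for. A toy sketch of that easy case (my own illustration, not the actual Go implementation):

```go
package main

import "fmt"

// With only positive ">= version" requirements (no "not version Y",
// no "< version Z"), picking a consistent set of versions is trivial:
// for each module, take the largest minimum anyone asked for.
// Versions are plain ints here to keep the sketch small.
func resolve(requirements map[string][]int) map[string]int {
	picked := make(map[string]int)
	for module, minimums := range requirements {
		for _, v := range minimums {
			if v > picked[module] {
				picked[module] = v
			}
		}
	}
	return picked
}

func main() {
	reqs := map[string][]int{
		"libA": {1, 3, 2}, // three dependents ask for >=1, >=3, >=2
		"libB": {5},
	}
	fmt.Println(resolve(reqs)) // map[libA:3 libB:5]
}
```

Put “but not version Y” constraints back in and you are choosing between branches again, which is exactly where the hardness comes from.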
There are actually many other things to try in the face of NP-hardness. And in my experience, when those all fail, you’re never so “lucky” that the input you care about isn’t pathological. When you do get lucky, that luck can usually be explained, so that you are once again certain instead of running your program and hoping for the best.
No, not all problems in NP are hard. In fact, P is contained in NP (and we just said all of those problems are easy).
Even this interpretation of P = easy is dubious. The class P is huge, and it contains very difficult problems that cannot be solved by any algorithm running faster than O(n^googolplex). The fact that a problem is in P does not mean that it can be solved in practice; and for the same reason, a problem being in NP does not mean that a candidate solution can be verified easily. It is true that many of the well-known polynomial algorithms run in linear or (at worst) quadratic time, but this is a limitation of our current knowledge, not of the class P.
Before reading the article, I’ll quickly note that almost everything in hardware development/synthesis is NP-hard. That hasn’t stopped them from cranking out useful, profitable stuff. Their “compilers” just run a lot slower, with the result being more time between iterations of specific components and/or more money spent on compute clusters to reduce that time. For some domains, it theoretically could be a deal-breaker to use NP-hard algorithms on anything iteration speed depends on. I suspect in many cases one could use a quick-and-dirty solution for the app development, with the NP-hard version dropped in and tested after it’s synthesized. Even that part might be semi-automated, where a command or annotation tells whatever does the NP-hard part where to drop the result. Doing this for certified vs regular vs superoptimized compilation was part of my Brute-Force Assurance concept.
(reads article)
One other thing about worst-case performance people often neglect: there’s nothing stopping you from using two algorithms. It seems obvious, but programmers seem to obsess about all-eggs-in-one-basket solutions. I learned about this trick with QuickSort vs HeapSort. QuickSort was faster on average with terrible worst-case performance; HeapSort was slower on average but with no bad worst case. The proposed solution was to use QuickSort, observe the time it takes, notice if it’s dragging out way past the average, cancel it if it is, and switch to HeapSort. Now you go from the worst case to roughly the two averages combined, or something like that.
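A toy sketch of that cut-over (my own illustration; production versions of this idea, like introsort, usually trigger on recursion depth rather than wall-clock time):

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// quicksortWithDeadline sorts xs in place; it returns false if it gave up
// because the deadline passed. The slice is then only partially sorted,
// but still a permutation of the input, so it can be re-sorted safely.
func quicksortWithDeadline(xs []int, deadline time.Time) bool {
	if len(xs) < 2 {
		return true
	}
	if time.Now().After(deadline) {
		return false
	}
	pivot := xs[len(xs)/2]
	left, right := 0, len(xs)-1
	for left <= right {
		for xs[left] < pivot {
			left++
		}
		for xs[right] > pivot {
			right--
		}
		if left <= right {
			xs[left], xs[right] = xs[right], xs[left]
			left++
			right--
		}
	}
	if !quicksortWithDeadline(xs[:right+1], deadline) {
		return false
	}
	return quicksortWithDeadline(xs[left:], deadline)
}

func robustSort(xs []int, budget time.Duration) {
	if !quicksortWithDeadline(xs, time.Now().Add(budget)) {
		// Fallback: the standard library's sort, which has an
		// O(n log n) worst case, finishes the job.
		sort.Ints(xs)
	}
}

func main() {
	xs := []int{5, 3, 8, 1, 9, 2}
	robustSort(xs, 10*time.Millisecond)
	fmt.Println(xs) // [1 2 3 5 8 9]
}
```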
You can make this idea more general by defining a way to monitor for worst-case or just unacceptable performance on any NP-hard algorithm, switching upon failure to something suboptimal that at least works. Alternatively, go “human-in-the-loop” to let a programmer/admin decide how to proceed once failure is detected. I’m not sure you can always detect failures with simple, efficient monitors. You can with many, though.
On the timing stuff, this is one reason I love watchdog timers in embedded chips. All CPUs should have them, both for the kernel and individual apps. That way, a check for something like QuickSort vs HeapSort doesn’t waste any CPU budget: dedicated, low-resource hardware is already watching. Hardware bounds checking can do something similar to detect size blowup, on top of benefits like stopping buffer overflows or out-of-bounds access for pointers.
There is a lot of interesting research in portfolio algorithms, especially in areas related to combinatorial problems such as SAT solving and constraint programming.
Solvers are either run in parallel or some heuristic is used to switch between them. Sometimes solvers are used independently, and sometimes information from one solver can be sent to another solver. Sometimes algorithms and heuristics are used to analyse the problem instance to make an informed choice of the right solver to use.
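A toy version of the “run them in parallel and take whoever finishes first” scheme (my own sketch, not taken from any particular solver):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// race runs several "solvers" on the same input concurrently and returns
// whichever answer arrives first, cancelling the others. The two solvers
// below are stand-ins; a real portfolio would wrap different SAT solvers,
// or the same solver with different heuristics or random seeds.
func race(ctx context.Context, input []int, solvers ...func(context.Context, []int) int) int {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	results := make(chan int, len(solvers)) // buffered so losers don't block
	for _, solve := range solvers {
		solve := solve // capture loop variable
		go func() { results <- solve(ctx, input) }()
	}
	return <-results
}

func slowSolver(ctx context.Context, xs []int) int {
	select {
	case <-time.After(100 * time.Millisecond): // pretend this input is hard for us
	case <-ctx.Done(): // give up early once another solver has won
	}
	return len(xs)
}

func fastSolver(ctx context.Context, xs []int) int {
	return len(xs) // pretend this input happens to be easy for us
}

func main() {
	fmt.Println(race(context.Background(), []int{1, 2, 3}, slowSolver, fastSolver))
}
```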
Please do not do this. You are confusing too many things here. You probably want to say exponential behavior is a worst-case bound. NP-hard is a complexity class to which a problem belongs, not a bound on an algorithm.
What fascinates me about complexity theory is how easily we can go from “yeah sure no problem” to “the universe will be dead by then”.
Shortest path from A to B, “yeah sure no problem”.
Optimal city round trip, “better make some tea, this is going to take a while” (if you discard the “good enough” solutions).
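To put rough numbers on that jump (a toy calculation of my own, assuming you brute-force every distinct round trip):

```go
package main

import (
	"fmt"
	"math/big"
)

// Number of distinct round trips through n cities (fixing the start city,
// counting each direction once): (n-1)! / 2.
func tours(n int64) *big.Int {
	f := new(big.Int).MulRange(1, n-1) // (n-1)!
	return f.Div(f, big.NewInt(2))
}

func main() {
	for _, n := range []int64{5, 10, 20, 30} {
		fmt.Printf("%2d cities: %v possible tours\n", n, tours(n))
	}
}
```

At 30 cities you are already past 10^30 tours, while a shortest path query on the same graph is at most a few hundred edge relaxations.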
Same goes for computability theory. Learning that “there is no way on earth that you will get a result in finite time for all possible inputs” was such a mind-opening experience; I wish I understood more of it.
I’ll go ahead with this submission since you brought that up. :)
Good enough solutions are often good enough.
worse is better!