NP-Completeness

We’ve seen many problems that can be solved quickly (in close to linear time , or at least a time that is some small polynomial function of the input size ~ O(n^{k})).

NP-completeness is states something differently. It gives evidence that many important problems can’t be solved quickly.

## Why should we care?

These NP-complete problems really come up all the time. Knowing they’re hard lets you to opt for alternatives instead of searching an overly optimized solution:

1.Use a **heuristic.** {solving fractions of problem in good O(n)}

2.Solve the **problem approximately instead of exactly**

3.Use an **exponential time solution** anyway.

4.Choose a **better abstraction**.- Add important details.

## Classification of problems

The subject of (computational complexity theory) is dedicated to classifying problems by how hard they are. There are many different classifications; some of the most common and useful are the following.

1.**P.** Problems that can be solved in **polynomial time**. (“P” stands for polynomial.)

2.**NP. **This stands for “**nondeterministic polynomial time**” where nondeterministic just means guessing a solution. A problem is in NP if you can quickly (in polynomial time) test whether a solution is correct (without worrying about how hard it might be to find the solution). Problems in NP are still relatively easy: if only we could guess the right solution, we could then quickly test it.

NP does not stand for “non-polynomial”. There are many complexity classes that are much harder than NP.

1.**PSPACE.** Problems that can be solved using a reasonable amount of memory (again defined formally as a polynomial in the input size) without regard to how much time the solution takes.

2.**EXPTIME.** Problems that can be solved in exponential time. This class contains most problems you are likely to run into, including everything in the previous three classes.

3.**Undecidable**. For some problems, we can prove that there is no algorithm that always solves them, no matter how much time or space is allowed. One very uninformative proof of this is based on the fact that there are as many problems as there real numbers, and only as many programs as there are integers, so there are not enough programs to solve all the problems. But we can also define explicit and useful problems which can’t be solved.

NP-completeness theory is concerned with the distinction between the first two classes, P and NP.

## Examples of problems in different classes

**Example 1: Long simple paths**.

A ** simple path **in a graph is just one without any repeated edges or vertices. To describe the problem of finding long paths in terms of complexity theory, we need to formalize it as a yes-or-no question: given a graph

**G**, vertices

**s**and

**t,**and a number k, does there exist a

**simple path from s to t**with at least

**k edges**? A solution to this problem would then consist of such a path.

Why is this in NP? If you’re given a path, you can quickly look at it and add up the length, double-checking that it really is a path with length at least k. This can all be done in linear time, so certainly it can be done in polynomial time.

However we don’t know whether this problem is in P; I haven’t told you a good way for finding such a path (with time polynomial in m,n, and K).

There are algorithms that solve the problem; for instance, list all 2^m subsets of edges and check whether any of them solves the problem. But there is no algorithm that runs in polynomial time.

**Example 2: Cryptography**.

Suppose we have an encryption function e.g.

code=RSA(key,text)

The “RSA” encryption works by performing some simple integer arithmetic on the code and the key, which consists of a pair (p,q) of large prime numbers. One can perform the encryption only knowing the product pq; but to decrypt the code you instead need to know a different product, (p-1)(q-1).

A standard assumption in cryptography is the “known plaintext attack”: we have the code for some message, and we know (or can guess) the text of that message. We want to use that information to discover the key, so we can decrypt other messages sent using the same key.

Formalized as an NP problem, we simply want to find a key for which code=RSA(key,text). If you’re given a key, you can test it by doing the encryption yourself, so this is in NP.

The hard question is, how do you find the key? For the code to be strong we hope it isn’t possible to do much better than a brute force search.

**Example 3: Chess**

**Example 4: Halting problem**.

Suppose you have written your program, and start to run it. After five minutes, it is still going. Does this mean it’s in an infinite loop, or is it just slow?

It would be convenient if your compiler could tell you that your program has an infinite loop. However this is an undecidable problem: there is no program that will always correctly detect infinite loops.

Some people have used this idea as evidence that people are inherently smarter than computers, since it shows that there are problems computers can’t solve. Here’s an example:

main()

{

int x = 3;

for (;;) {

for (int a = 1; a <= x; a++)

for (int b = 1; b <= x; b++)

for (int c = 1; c <= x; c++)

for (int i = 3; i <= x; i++)

if(pow(a,i) + pow(b,i) == pow(c,i))

exit;

x++;

}

}

This program searches for solutions to Fermat’s last theorem.

## Problems of complexity theory

The most famous open problem in theoretical science is whether P = NP.

**“If it’s always easy to check a solution, should it also be easy to find the solution?”**

So how does this theory tell us how hard any particular problem is to solve?

## NP-completeness

The theory of NP-completeness is a solution to the practical problem of applying complexity theory to individual problems. NP-complete problems are defined in a precise sense as the hardest problems in P. Even though we don’t know whether there is any problem in NP that is not in P, we can point to an NP-complete problem and say that if there are any hard problems in NP, that problems is one of the hard ones.

So if we believe that P and NP are unequal, and we prove that some problem is NP-complete, we should believe that it doesn’t have a fast algorithm.

Most problems we’ve looked at in NP turn out either to be in P or NP-complete. So the theory of NP-completeness turns out to be a good way of showing that a problem is likely to be hard, because it applies to a lot of problems. But there are problems that are in NP, not known to be in P, and not likely to be NP-complete; for instance the code-breaking example .

## Reduction

Formally, NP-completeness is defined in terms of “reduction” which is just a complicated way of saying one problem is easier than another.

We say that A is easier than B, and write A < B, if we can write down an algorithm for solving A that uses a small number of calls to a subroutine for B.

Then **if A < B, and B is in P, so is A**: we can write down a polynomial algorithm for A by expanding the subroutine calls to use the fast algorithm for B.

So “easier” in this context means that if one problem can be solved in polynomial time, so can the other. It is possible for the algorithms for A to be slower than those for B, even though A < B.

As an example, consider the Hamiltonian cycle problem. Does a given graph have a cycle visiting each vertex exactly once? Here’s a solution, using **longest path** as a subroutine:

for each edge (u,v) of G

if there is a simple path of length n-1 from u to v

return yes // path + edge form a cycle

return no

This algorithm makes m calls to a longest path subroutine, and does O(m) work outside those subroutine calls, so it shows that** Hamiltonian cycle < longest path. **

## Cook’s Theorem

We are now ready to formally define NP-completeness. We say that a problem A in NP is NP-complete when, for every other problem B in NP, B < A.

This seems like a very strong definition. After all, the notion of reduction we’ve defined above seems to imply that if B < A, then the two problems are very closely related; for instance Hamiltonian cycle and longest path are both about finding very similar structures in graphs. Why should there be a problem that closely related to all the different problems in NP?

**Theorem: **An NP-complete problem exists.

We prove this by example. One NP-complete problem can be found by modifying the halting problem .

**Bounded halting**. This problem takes as input a program X and a number K. The problem is to find data which, when given as input to X, causes it to stop in at most K steps.

To be precise, this needs some more careful definition: what language is X written in? What constitutes a single step? Also for technical reasons K should be specified in *unary *notation, so that the length of that part of the input is K itself rather than O(log K).

For reasonable ways of filling in the details, this is in NP: to test if data is a correct solution, just simulate the program for K steps. This takes time polynomial in K and in the length of program.

To finish the proof that this is NP-complete, we need to show that it’s harder than anything else in NP. Suppose we have a problem A in NP. This means that we can write a program PA that tests solutions to A, and halts within polynomial time p(n) with a yes or no answer depending on whether the given solution is really a solution to the given problem. We can then easily form a modified program PA’ to enter an infinite loop whenever it would halt with a no answer. If we could solve bounded halting, we could solve A by passing PA’ and p(n) as arguments to a subroutine for bounded halting. So A < bounded halting. But this argument works for every problem in NP, so bounded halting is NP-complete.

## How to prove NP-completeness in practice

The proof above of NP-completeness for bounded halting is great for the theory of NP-completeness, but doesn’t help us understand other more abstract problems such as the Hamiltonian cycle problem.

Most proofs of NP-completeness don’t look like the one above; it would be too difficult to prove anything else that way. Instead, they are based on the observation that if A < B and B < C, then A < C.

As a consequence of this observation, if A is NP-complete, B is in NP, and A < B, B is NP-complete. In practice that’s how we prove NP-completeness: We start with one specific problem that we prove NP-complete, and we then prove that it’s easier than lots of others which must therefore also be NP-complete.

So e.g. since Hamiltonian cycle is known to be NP-complete, and Hamiltonian cycle < longest path, we can deduce that longest path is also NP-complete.

*What steps do we have to take to prove a **problem P is NP-Complete?*

1. Prove **P Ε**** NP **

2. Pick a known NP-Complete problem Q

3.1Reduce Q to P

3.2 Describe a transformation that maps instances of Q to instances of P, => “yes” for P = “yes” for Q

3.3 Prove the transformation works

4. Prove it runs in polynomial time

{This is a modified version of a lecture document, which was previously published here.}