tl;dr: the computation they are doing is finding every value of A (Alice’s computed key, sent in the clear) for every value of a (Alice’s 1024-bit secret key), given a prime p and every g that works with p, by plugging them into this equation: A = (g^a) mod p. Then when they see an A in the wild that’s in their db, they can look up its a, compute the shared key (they now know everything Alice knows), and listen in. It’s slow because you still have to compute (2^1024)*(number of possible g values) values of A, and set up a db that can hold those values and answer queries quickly. It’s a lot of everything: compute, storage, time and power.

How DH works

Given two private keys a and b for Alice and Bob, each person sends their computed key. For Alice: A = (g^a) mod p, where p is the important prime and g is a generator that ‘fits’ with p (g need not itself be prime). In Bob’s case, he receives A from Alice and computes K as (A^b) mod p; Alice does the same with B. Then they have a secret key they can use for symmetric encryption. See: http://mathworld.wolfram.com/Diffie-HellmanProtocol.html
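
The exchange above can be sketched in a few lines of Python. The tiny numbers (p = 23, g = 5, and the two secrets) are made up purely for illustration; real deployments use primes of 1024 bits or more:

```python
# Toy Diffie-Hellman with tiny, insecure numbers, purely to show the algebra.
p = 23             # public prime modulus
g = 5              # public generator that 'fits' with p

a = 6              # Alice's secret exponent (normally a large random number)
b = 15             # Bob's secret exponent

A = pow(g, a, p)   # Alice sends A = g^a mod p in the clear
B = pow(g, b, p)   # Bob sends B = g^b mod p in the clear

# Each side raises the other's public value to its own secret exponent:
K_alice = pow(B, a, p)    # (g^b)^a mod p
K_bob = pow(A, b, p)      # (g^a)^b mod p
assert K_alice == K_bob   # both sides derive the same shared key
```

Both sides end up with K = g^(ab) mod p without a or b ever crossing the wire.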

What the NSA might do

If you are eavesdropping, you get g, p, A, and B (but not K), which you can then use to solve for a or b by solving A = (g^a) mod p. This requires taking a discrete log (which is hard like factoring, and has a similar best-known attack), which makes it infeasible to solve for a before the sun explodes. So the paper shows that you can precompute (over the 2^1024 possible values of a) a list of A values for a chosen common p, with all g values already calculated (a limited, relatively small number). Then you watch traffic, and when you see keys A and B passed with the right g and p pair, you do a lookup and you have a. Using that you compute the shared key (as if you were Alice), and then you read their conversation.
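
The lookup idea can be shown at toy scale. With a real 1024-bit a the table below would need on the order of 2^1024 entries, which is exactly where all the cost goes; the numbers here are illustrative only:

```python
# Toy version of the precomputation attack: for one common (g, p) pair,
# tabulate A = g^a mod p for every possible secret a, then break each
# session with a simple lookup.
p, g = 23, 5

# Offline, once per (g, p) pair: build the A -> a table.
table = {pow(g, a, p): a for a in range(1, p - 1)}

# Online, per session: observe A and B on the wire...
A, B = pow(g, 6, p), pow(g, 15, p)   # Alice's and Bob's public values
a = table[A]                          # ...recover Alice's secret by lookup...
K = pow(B, a, p)                      # ...and derive the shared key as Alice would
assert a == 6
```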

Why they could do it

First, computing 11^291 is far faster than recovering the discrete log k from 11^k = 1.109868e+303, so the precomputation is feasible (as in, faster than the lifetime of the universe), but it still takes about a year on the theoretical hardware. Second, p is restricted because certain primes are ‘weak’, meaning that solving the discrete log modulo those primes is ‘easier’; to be ‘safe’, a prime p must have the property that (p-1)/2 is also prime. This makes computation a little more annoying than just finding primes, which is more annoying than using a list of primes. In fact there is an RFC with ‘good’ primes to use for certain key lengths: http://tools.ietf.org/html/rfc3526 . These common primes are well known, and even if they weren’t publicized, the NSA could watch the start of all the DH exchanges they see on the internet, build up a list of the most common primes by number of users, and prioritize breaking the ones that would impact the most users.
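
The ‘safe’ property above ((p-1)/2 also prime) can be checked with a probabilistic primality test. This is a minimal sketch, not production code; the 40-round count and the small-prime shortcut list are arbitrary illustrative choices:

```python
from random import randrange

def is_probable_prime(n, rounds=40):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):   # quick shortcut for tiny factors
        if n % small == 0:
            return n == small
    d, s = n - 1, 0
    while d % 2 == 0:                    # write n - 1 as d * 2^s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # found a witness: definitely composite
    return True                          # very probably prime

def is_safe_prime(p):
    """'Safe' in the sense above: both p and (p - 1)/2 are prime."""
    return is_probable_prime(p) and is_probable_prime((p - 1) // 2)
```

For example, is_safe_prime(23) is True (11 is prime) while is_safe_prime(29) is False (14 is composite); I believe the RFC 3526 groups all use safe primes of this shape.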

What can you do

Ask nicely that people make sure they aren’t using 1024-bit DH and just bump it up to 2048 bits; this increases the required precomputation to an unreasonable amount (each extra bit doubles it, so the jump to 2048 bits multiplies the number of calcs by 2^1024, which is redonk). It also doesn’t require wasting more time and compute generating a bunch of new primes (which would eventually become common themselves).

The process begins by having the two parties, Alice and Bob, agree on an arbitrary starting color that does not need to be kept secret; in this example the color is yellow. Each of them selects a secret color—red and aqua respectively—that they keep to themselves. The crucial part of the process is that Alice and Bob now mix their secret color together with their mutually shared color, resulting in orange and blue mixtures respectively, then publicly exchange the two mixed colors. Finally, each of the two mix together the color they received from the partner with their own private color. The result is a final color mixture (brown) that is identical with the partner’s color mixture.

If another party (the NSA) had been listening in on the exchange, it would be computationally difficult for that party to determine the common secret color; in fact, when using large numbers rather than colors, this is impossible for modern supercomputers to do in a reasonable amount of time.

Now imagine the colors as being prime numbers. (Source)

It turns out that verifying whether a given prime is appropriate for Diffie-Hellman is rather slow. This means that, if you accept a prime supplied by someone else (e.g. whoever you’re communicating with), and you don’t carefully verify the prime, then you’re at risk of ending up with a maliciously chosen weak group and/or generator, which results in weak encryption.

The paper doesn’t suggest that primes are being maliciously chosen. It’s just that a small number of fixed primes are used for a large percentage of internet communications, and the authors believe that breaking about one DH-1024 prime a year is now within the capabilities of an adversary with state-level resources. Therefore it’s plausible that the NSA has broken, say, 2-3 of the most widely used primes, if one assumes (as some circumstantial evidence suggests) that they may have spent $millions building specialized hardware to do so, sometime in the past few years.

The easiest solution is to switch everyone over to still using pre-chosen primes, but 2048-bit ones, which are not remotely feasible to break (unless DH itself is totally broken).

The issue (and this is not explained in the post) is why such different technologies share a small number of primes.

Basically, it’s difficult to find a big enough prime, so it’s easier to reuse existing ones. When I was into crypto a long time ago this was known, but I never realized the pool of primes in use was so small.

Obviously having more primes to select from will help (as well as moving to bigger ones), but I fear that if the number of primes is so small because of cost issues, we’ll repeat the same problem when moving to 2048-bit ones.

It’s not, though! It’s not remotely as computationally trivial as looking one up, but we have primality tests that run in O(log(n)^c) for some reasonable c, and by the prime number theorem approximately one in every log(n) numbers is prime, so finding a large prime is entirely tractable, even on underpowered hardware like phones or embedded systems. Obviously for a lot of hardware it’s too slow to perform on-demand for interactive sessions, but surely you could pregenerate a handful of unique ones and keep them around for when you need them?
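
The density argument above is quick arithmetic: by the prime number theorem, roughly one in every ln(n) integers near n is prime, so a random-search generator for a 2048-bit prime only expects to test on the order of a thousand candidates, not astronomically many:

```python
import math

# Prime number theorem: near n, about one in every ln(n) integers is prime,
# so a random search expects ~ln(n) trials before hitting a prime.
bits = 2048
expected_candidates = bits * math.log(2)   # ln(2^2048) = 2048 * ln(2)
print(round(expected_candidates))          # about 1420 candidates
print(round(expected_candidates / 2))      # about 710 if evens are skipped
```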

I stand corrected… I had no idea we’d had advances in fast non-probabilistic tests since 2000. Anyway, is there a paper discussing how easy or difficult it is to implement?

I don’t know specific papers, no. Also, to the best of my knowledge, most implementations actually use the Fermat primality test, which is probabilistic, but with enough rounds of checking to push the probability of accidentally selecting a composite number below the probability of e.g. a cosmic ray corrupting the generated prime before it’s used.
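
A minimal sketch of the Fermat test just described; the 64-round count is an arbitrary illustration, and this simplified form can be fooled by Carmichael numbers, which is one reason real implementations add further checks:

```python
from random import randrange

def fermat_test(n, rounds=64):
    """Probabilistic Fermat primality test (assumes odd n > 3).

    Each round catches a typical composite with probability >= 1/2, so 64
    rounds push the false-positive rate below ~2^-64. Carmichael numbers
    can fool this test, unlike Miller-Rabin.
    """
    for _ in range(rounds):
        a = randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:   # violates Fermat's little theorem
            return False            # definitely composite
    return True                     # probably prime
```

fermat_test(97) always returns True, while fermat_test(15) returns False with overwhelming probability.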

The relevant algorithms are not difficult to implement, in any case. I wrote a toy RSA implementation for an undergraduate number theory class.

Unfortunately AKS is not something we can use today. From 6.2 Conclusions, page 32:

In its original version, the algorithm of Agarwal, Kayal, and Saxena is too slow to be of practical use. The hidden constant factors and the actual asymptotic bounds conspire to make the algorithm intractable for even modest inputs. The terms “polynomial time” or “exponential time” should be used carefully when assessing the actual performance of an algorithm. As we have shown, in the realm of asymptotic analysis and Complexity Theory, AKS is superior to older algorithms such as trial division or Rabin-Miller, since they are either not polynomial time or not deterministic. However, in using these algorithms for actual primality-proving, it is clear that AKS is not superior.

For the inputs tested here, AKS was several orders of magnitude slower than other algorithms. The difference is so great that, for n large enough that the asymptotics begin to factor, AKS would likely use a large amount of memory. At these n, it may be necessary to re-architect the implementation, which would render it even slower.

The precomputation attack with 2048-bit primes just isn’t anywhere in the realm of feasibility, even looking into the future, unless there’s either a break in the algorithm, or a big breakthrough in hardware for this kind of thing. Each bit doubles the cost of the attack, so going to 2048 bits multiplies the difficulty by 2^1024. Even with a Moore’s-law type assumption that computational resources double every 18 months, that’s about 1500 years before the resources are available.
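
The ~1500-year figure follows from simple arithmetic: 1024 extra bits means 1024 extra doublings of attack cost, and the Moore’s-law assumption supplies one doubling of resources every 18 months:

```python
# Back-of-envelope for the ~1500-year figure above: moving from 1024-bit to
# 2048-bit primes adds ~1024 doublings of attack cost, and the Moore's-law
# assumption supplies one doubling of resources every 1.5 years.
extra_doublings = 2048 - 1024
years_until_feasible = extra_doublings * 1.5
print(years_until_feasible)   # 1536.0, i.e. roughly 1500 years
```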

Can you explain how this prime works? What does it mean to brute force the prime in this case? Do SSL implementations just have an array of primes and pick a random one? I think what I’m not understanding is if software is using only a few primes, what needs to be brute forced? If knowing the prime is important, can’t I just find all the primes in the source code out there and use them? I’m missing something…

The primes themselves are known, but you can effectively break all the communications that use one specific prime by doing a huge amount of precomputation. Once you have this huge amount of precomputation done, then you can break individual messages encrypted using DH and that specific prime, using more normal levels of computing resources. The authors estimate that with custom hardware and NSA-level resources, you can probably do the necessary precomputation for about one 1024-bit prime per year, so it’s just barely in the realm of computational feasibility.

What is that “huge amount of precomputation” computing? All the possible “secret colors” (as described in the link) that users have?

Is it correct that there are 3 “keys” in this process? The prime is public, and the other two are secrets each side has. Is the problem that the public prime is so small that, with a large (but not infinite) amount of computation, an attacker can determine the other two keys?

Given two private keys a and b for Alice and Bob, each person sends their computed key. For Alice: A = (g^a) mod p, where p is the important prime and g is a generator that ‘fits’ with p. In Bob’s case, he receives A from Alice and computes K as (A^b) mod p. Then they have a secret key they can use.

If you are eavesdropping, you get g, p, A, and B, which you can then use to solve for a or b by solving A = (g^a) mod p. So the paper shows that you can precompute (over the 2^1024 possible values of a) a list of A values for a chosen common p, with all g values already calculated (a limited, relatively small number). Then you watch traffic, and when you see keys A and B passed with the right g and p pair, you do a lookup and you have a. Using that you compute the shared key, and then you read their conversation.

Would be great if someone with some crypto knowledge could weigh in here on whether the idea in this post sounds about right.

OpenSSL and GMP use Miller-Rabin… I never understood the love for Fermat, since it isn’t any faster than Miller-Rabin.

Anyway, I just discovered an implementation of AKS in GMP (pdf alert); now I need some time to check how fast it is.
