Friday, November 28, 2014

Secrets & spies: or, I try to understand RSA

Thing practiced: ?

Tools used: IntelliJ IDEA community edition, Applied Cryptography

Scheming humans have always faced a basic problem: how can we communicate securely in the presence of adversaries? Since ancient times, the art of secret communication, or cryptography, has been crucial for governments and the military. But today cryptography affects us all. As messages are increasingly transmitted through public channels, cryptography prevents malicious interceptors from using our information against us.

The evolution of cryptography can be split into two eras by the invention of asymmetric ciphers in the 1970s. Historically, encrypting and decrypting a message had been symmetric processes — that is, unscrambling the message required knowing the key that had been used to scramble it. This begat the problem of key distribution: before sending a secret message, the sender would have to take great precautions to transport her encryption key safely to the recipient. In asymmetric cryptography, the keys used to encrypt and to decrypt a message are different. This means that the recipient can make his encryption key public, as long as he keeps his decryption key private.

What we need for an asymmetric-cryptography protocol to work is a trapdoor one-way function. This is the mathematical equivalent of a physical lock: easy to process in one direction (snapping the lock closed), but hard to undo (opening the lock), unless we have a special secret (the key). In RSA – the first public-key cryptosystem, and still the most popular – the trapdoor function exploits mathematical features of modular exponentiation, prime numbers, and integer factorization. Let’s throw together a toy implementation in Scala.

Helpful math functions

A number is prime if it has exactly two divisors: 1 and itself. Sometimes it’s useful to have a list of small primes, so we need a Sieve of Eratosthenes:

// Sieves a stream of Ints
def sieve(s: Stream[Int]): Stream[Int] =
  s.head #:: sieve(s.tail.filter(_ % s.head != 0))

// All primes as a lazy sequence
val primes = sieve(Stream.from(2))

What we really want, though, is large primes – ones higher than 2⁵⁰⁰. To get a prime that large, we can’t sieve up from 1; instead, we find a prime by taking a random number, checking if it’s probably prime, and repeating if necessary. The primality test of choice in real systems is Miller-Rabin, but we’ll use Solovay-Strassen to keep things simple:

// Ints won't fit these
import math.BigInt

// Euclid's GCD algorithm
def gcd(a: BigInt, b: BigInt): BigInt =
  if (b == 0) a else gcd(b, a % b)

// Computes the Jacobi symbol (needed for our test)
def jacobi(a: BigInt, n: BigInt): BigInt = {
  if (a == 1) 1
  else if (n == 1) 1
  else if (gcd(a, n) != 1) 0
  else if (a == 2 && (n % 8 == 1 || n % 8 == 7)) 1
  else if (a == 2 && (n % 8 == 3 || n % 8 == 5)) -1
  else if (a > n) jacobi(a % n, n)
  else if (a % 2 == 0) jacobi(a / 2, n) * jacobi(2, n)
  else if (n % 2 == 0) jacobi(a, n / 2) * jacobi(a, 2)
  else if ((((a - 1) * (n - 1)) / 4) % 2 == 0) jacobi(n, a)
  else jacobi(n, a) * -1
}

// Runs the Solovay-Strassen test for i iterations
def isPrime(n: BigInt, i: Int): Boolean = {
  if (i <= 0) true
  else if (n % 2 == 0 && n != 2) false
  else {
    val a = (random(n.bitLength) % (n - 1)) + 1
    val j = jacobi(a, n)
    val exp = a.modPow((n - 1) / 2, n)
    if (j == 0 || j % n != exp) false
    else isPrime(n, i - 1)
  }
}

Finally, the main reason we’re interested in primes is so we can do calculations modulo our prime. To do this, we need the Extended Euclidean algorithm:

// Returns (x, y) such that a*x + b*y = gcd(a, b)
def extendedGcd(a: BigInt, b: BigInt): (BigInt, BigInt) = {
  if (a % b == 0) (0, 1)
  else {
    val (x, y) = extendedGcd(b, a % b)
    (y, x - y * (a / b))
  }
}

The RSA system

An asymmetric encryption system has two parts: a public key and a private key. In theory, encrypting a message with the public key can only be reversed by decrypting with the private key.

In RSA, we encrypt a message by computing msg^e mod n, using the publicly-known information n and e. This is the crux of RSA’s security: modular exponentiation is easy to do, but exceedingly hard to undo. To make it possible to retrieve the message from the ciphertext, we build a trapdoor into our encryption routine by making n the product of two large primes p and q (which we keep private). We can then reconstruct the message using a calculated decryption exponent d.

To choose the right decryption exponent, we first need a value φ based on n’s factorization such that x^{φ(p, q)} = 1. Luckily, Euler’s totient function gives us just this:

// Euler's totient phi, where p and q are n's prime factors
def phi(p: BigInt, q: BigInt): BigInt = (p - 1) * (q - 1)

As long as we choose our public exponent e so that it doesn’t share a common factor with the totient φ, we can decrypt using the inverse of e mod φ:

// A legal (public) exponent has to
// (1) be between 1 and the totient
// (2) have no non-1 factors in common with the totient
def isLegalExponent(e: BigInt, p: BigInt, q: BigInt): Boolean =
  e > 1 && e < phi(p, q) && gcd(e, phi(p, q)) == 1

// Returns b such that a*b = 1 mod n
def modularInverse(a: BigInt, n: BigInt): BigInt = {
  val (x, _) = extendedGcd(a, n)
  x % n
}

// Our decryption exponent: the inverse of e mod phi
def d(e: BigInt, p: BigInt, q: BigInt) = modularInverse(e, phi(p, q))

Encryption and decryption are now trivial:

// Takes a message m and public information e and n
def encrypt(m: BigInt, e: BigInt, n: BigInt): BigInt =
  m.modPow(e, n)

// Takes a ciphertext c, private exponent d, and public information n
def decrypt(c: BigInt, d: BigInt, n: BigInt): BigInt =
  c.modPow(d, n)

(In reality, RSA is never used to encrypt messages — for n to be large enough to encode any reasonable length of message, the computing resources required for even the “easy” process of encryption are prohibitively high. Instead, RSA is used to safely deliver symmetric keys, which can then be used with block or stream ciphers to encrypt and decrypt large messages.)