It makes online shopping, banking, and secure communications possible.
Encryption algorithms are the mathematical formulae for performing these transformations. You provide an encryption algorithm with a key and the data you want to protect (the plaintext), and it produces an encrypted output (the ciphertext). To read the output, you need to feed the key and the ciphertext into a decryption algorithm (sometimes these are identical to encryption algorithms; other times they are closely related but different).
Encryption algorithms are designed so that performing the decryption process is unfeasibly hard without knowing the key.
The algorithms can be categorized in many different ways, but perhaps the most fundamental is the distinction between symmetric and asymmetric encryption.
Before 1973, every known encryption algorithm was symmetric. What this means is that the key used to encrypt the plaintext is the same key used to decrypt the ciphertext; if you know the key, you can decrypt any data encrypted with it. This means that the key must be kept secret—only people authorized to read the messages must know it, and those people who do know it can read every single message that uses it.
This in turn makes symmetric algorithms tricky to use in practice, because those keys must be securely transported somehow. Key transport wasn't so difficult in the 6th century BC, when the first encryption algorithm, called the Caesar cipher, was invented. If you wanted to share a key with someone, you could just tattoo the key onto the shaved head of a slave, wait for his hair to grow back, and then send the recipient of your message the slave.
Unfortunately, that approach doesn't scale very well. If you want to buy something from Amazon or do some online banking, you probably don't have the time to wait for your slave's hair to grow back, and given the multitude of e-commerce sites out there, you may not even have enough slaves to go around.
It took 2,500 years of on-and-off cryptography invention and research for a solution to be found to this problem. That solution is asymmetric encryption. With asymmetric encryption there is not one key, but two related keys. Messages encrypted with one of the keys can't subsequently be decrypted with that same key. Instead, you have to use the other key for decryption. Typically, one key is designated as the public key and is published widely. The other is designated the private key and is kept secret.
The first public key encryption algorithm: RSA
The first algorithms using asymmetric keys were devised in secret by the British government's SIGINT agency, GCHQ, in 1973. That work was not made public, however, until 1997, when the British government declassified it.

The first published, commercially available algorithm (which happened to work in much the same way as GCHQ's worked, albeit in a more generalized form) was invented in 1977, and it was called RSA after the names of its three inventors (Ron Rivest, Adi Shamir and Leonard Adleman).
Symmetric algorithms essentially depend on jumbling up the plaintext in complex ways and doing so lots of times; the key is what specifies the exact jumbling up pattern that was used. Asymmetric algorithms take a different approach. They depend on the existence of hard mathematical problems. The keys here are solutions to these mathematical problems.
The problem that RSA is built around is that of integer factorization. Every integer greater than 1 has a unique prime factorization; that is, it can be written as a multiplicative product of prime numbers. For example, 50 is 2 × 5 × 5 (that is, 2 × 5²). While factorization is something that we all learn at school, it's actually a hard problem, especially when dealing with large numbers.
The naive algorithm you learn in school is called "trial division"; you divide the number you are trying to factorize by each prime number in turn, starting at 2 and working your way up, and check whether it divides wholly. If it does, you then factorize the number that's left over, starting at the same prime you just divided by. If it doesn't, you try the next prime number. You only stop when you've tried every prime number less than or equal to the square root of the number you're trying to factorize.
This is easy enough for small numbers, but for big numbers it's very time-consuming. There are various algorithms that are a bit quicker than trial division, but only incrementally so.
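To make the procedure concrete, here's a minimal sketch of trial division in Python; the function name and structure are illustrative, not taken from any particular library:

```python
def trial_division(n):
    """Return the prime factors of n, smallest first, by trial division."""
    factors = []
    divisor = 2
    # Candidates only need to go up to the square root of what remains;
    # trying composite divisors is harmless because their prime factors
    # have already been divided out by the time we reach them.
    while divisor * divisor <= n:
        while n % divisor == 0:   # wholly divisible: record the factor...
            factors.append(divisor)
            n //= divisor         # ...and keep factorizing what's left over
        divisor += 1              # not divisible: try the next candidate
    if n > 1:                     # whatever remains is itself prime
        factors.append(n)
    return factors

print(trial_division(50))    # [2, 5, 5]
print(trial_division(3233))  # [53, 61]
```

The running time grows with the square root of n, so every additional digit of n makes the search noticeably longer.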
RSA key pairs (that is, the pair of private and public keys) are generated in the following way:
- Choose two large, distinct prime numbers, called p and q.
- Compute p × q, and call this n. The size of this number in bits is the key length, and it gives an indication of how strong the key is: the longer, the better.
- Compute (p - 1) × (q - 1), and call this φ(n).
- Pick an integer e that is coprime with φ(n). That is to say, the prime factorization of e should share no factors with the prime factorization of φ(n).
- Calculate d such that d × e = 1 (mod φ(n)). This multiplication uses modular arithmetic (also known as clock arithmetic). Modular arithmetic wraps around: it counts normally from 0 up to φ(n) - 1, then wraps back around to 0 again, much the same way that on a clock the number after 12 isn't 13; it's 1. (A toy worked example of these steps follows this list.)
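Here's a toy run of those steps in Python, using the small primes 61 and 53 purely for illustration; real keys use randomly chosen primes hundreds of digits long, and production code uses a vetted cryptography library rather than arithmetic like this:

```python
from math import gcd

p, q = 61, 53                  # toy primes; real RSA primes are hundreds of digits long
n = p * q                      # 3233 -- its size in bits is the key length
phi = (p - 1) * (q - 1)        # 3120

e = 17                         # must be coprime with phi; 65,537 is the usual choice in practice
assert gcd(e, phi) == 1

d = pow(e, -1, phi)            # modular inverse, so that d * e = 1 (mod phi); Python 3.8+
assert (d * e) % phi == 1      # d comes out as 2753 here

print("public key:", (n, e))   # published widely
print("private key:", (n, d))  # kept secret
```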
Encryption and decryption are both quite simple. A ciphertext c is generated from a plaintext m as follows:
c = m^e (mod n)
Decryption is similar:
m = c^d (mod n)
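Continuing the toy key from above (n = 3233, e = 17, d = 2753), both operations are a single modular exponentiation, which Python's built-in pow performs directly:

```python
n, e, d = 3233, 17, 2753       # the toy key pair generated above

m = 65                         # the plaintext, encoded as an integer smaller than n
c = pow(m, e, n)               # c = m^e (mod n)
recovered = pow(c, d, n)       # m = c^d (mod n)

print(c, recovered)            # 2790 65
```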
Here's where the difficulty of integer factorization is important: if n could easily be factorized then anyone could determine p and q, hence φ(n), and hence d. If d were known to everyone then they could decrypt the messages encrypted with e with ease.
Conventionally, e, the public exponent, is chosen to be a much smaller number than d; typically it is 65,537.
Since RSA's invention, other asymmetric encryption algorithms have been devised; like RSA, they're all built on the assumption that a particular mathematical problem is hard to solve, usually either integer factorization or the discrete logarithm. Surprisingly, the assumption of difficulty is not actually proven in the case of either integer factorization or the discrete logarithm, and there is a possibility that a "fast" algorithm will be devised, which could render some kinds of public key cryptography insecure.
Public key cryptography in practice
Given a public key algorithm, what's it actually good for?

Perhaps surprisingly, one of the things it's not good for is general-purpose encryption. Those exponentiation operations used in RSA's encryption and decryption (or similar operations in other algorithms) are slow and expensive to compute; as a result, RSA isn't well-suited to encrypting large chunks of data. What it is good for, however, is encrypting small pieces of data—such as the encryption keys for symmetric algorithms, which are good for encrypting large amounts of data.
An example of this is the PGP program, first released in 1991. PGP, standing for "Pretty Good Privacy," is a program for sending secure messages using a combination of symmetric and asymmetric encryption algorithms. Everyone who wants to receive PGP-secured messages publishes their PGP public key. Traditionally, this is an RSA public key, though nowadays other algorithms are also supported.
To send a secure message, you first generate a random key for use with a symmetric algorithm. You use that key and the symmetric algorithm to encrypt your message. The original PGP implementation used an algorithm called IDEA for this purpose, though modern versions offer a variety of options for this, too. You then encrypt that random key with the RSA public key and bundle the two things together: the symmetrically encrypted message and the asymmetrically encrypted random key.
To decrypt the message, the recipient uses his private key to decrypt the random key, and then uses the random key to decrypt the message itself.
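That hybrid pattern can be sketched with the third-party Python cryptography package. This is only an illustration of the idea, not PGP's actual message format; Fernet stands in for the symmetric cipher and RSA-OAEP for the key encryption:

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# The recipient's key pair; in practice the sender would only know the public half.
recipient_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
recipient_public = recipient_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Sender: a random symmetric key encrypts the (possibly large) message...
session_key = Fernet.generate_key()
encrypted_message = Fernet(session_key).encrypt(b"Meet me at the usual place.")

# ...and the small symmetric key is encrypted with the recipient's RSA public key.
encrypted_key = recipient_public.encrypt(session_key, oaep)

# Recipient: the RSA private key recovers the session key, which decrypts the message.
session_key = recipient_key.decrypt(encrypted_key, oaep)
print(Fernet(session_key).decrypt(encrypted_message))
```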
Similar schemes are used for the S/MIME secure e-mail standard: public key cryptography is used to secure keys, and symmetric cryptography is used for the bulk encryption.
Disk encryption systems like Microsoft's BitLocker also use a similar approach on systems equipped with TPM (Trusted Platform Module) chips. These chips securely store encryption keys and in principle only allow authorized software to access them. The bulk encryption of the data on the disk uses the AES symmetric algorithm; that key is then encrypted using RSA.
The ssh network protocol widely used for remote connections to Unix-like operating systems can also use public key cryptography for logging in. Users connecting to systems by ssh generate their own key pairs and associate their public keys with their user accounts on every server they want to connect to. Each time a user connects to the server, the server verifies that they're in possession of the corresponding private key by exploiting the fact that only the private key can decrypt messages encrypted with the public key.
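Real ssh implementations actually prove possession of the key with a digital signature over the session (more on signatures below), but the decrypt-the-challenge idea described here can be sketched like this, again with the Python cryptography package and illustrative names:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

user_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
user_public = user_key.public_key()   # the part stored in the server's list of authorized keys

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Server: encrypt a random challenge with the stored public key.
challenge = os.urandom(32)
encrypted_challenge = user_public.encrypt(challenge, oaep)

# Client: only the holder of the private key can recover the challenge.
response = user_key.decrypt(encrypted_challenge, oaep)

# Server: a matching response demonstrates possession of the private key.
print(response == challenge)          # True
```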
Signed, sealed, and delivered
Both RSA keys can be used to encrypt data. For secure communications, the public key is used to encrypt and the private key to decrypt, but the reverse is also useful: using the private key to encrypt and the public key to decrypt. This is useful because it proves authenticity. If a message can be decrypted using the public key, that proves it was encrypted with the private key.

To create a digital signature, first the message is fed into a hash algorithm. Hash algorithms convert variable-length data into fixed-length numbers. Simple hash algorithms are widely used to provide probabilistic detection of changes to data due to things like transmission errors or disk corruption. Hash algorithms used in cryptographic applications are generally a bit more complex, because they're designed to ensure that it's all but impossible for someone to construct a file that produces a specific hash value.
The output of this hash function, called the hash, is then encrypted using the private key. The encrypted hash and the message are then distributed.
To verify the signature, the encrypted hash is decrypted. The message is hashed using the same hash algorithm, and this hash is compared to the decrypted value. If they match, that proves two things: first, that the message was signed by the owner of the private key, and second, that the message has not been subsequently modified.
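In practice, libraries bundle the hash-then-transform-with-the-private-key steps into a single padded signing operation rather than literally encrypting the hash, but the effect is the one described above. A sketch with the Python cryptography package, using RSA-PSS with SHA-256 and an example message:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

signer_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
message = b"I agree to pay Alice fifty dollars."

pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

# Sign: the message is hashed and the digest is transformed with the private key.
signature = signer_key.sign(message, pss, hashes.SHA256())

# Verify: anyone holding the public key can check the signature against the message.
public_key = signer_key.public_key()
try:
    public_key.verify(signature, message, pss, hashes.SHA256())
    print("signature valid")
except InvalidSignature:
    print("signature invalid or message modified")
```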
Public Key Infrastructure
The biggest single use of public key cryptography, however, is building public key infrastructure (PKI). PKI uses public key cryptography to create digital certificates that allow for secure attestation of identity.

Digital certificates are little chunks of data that contain some information about an identity (so they could represent a company, a subdivision of a company, or an individual person, say), some usage information about the certificate itself (a certificate might be legitimate only for signing e-mail messages, or for signing software, or for encrypting files), and a public key.
These chunks of data are then all signed by a certificate authority (CA), an organization that's in some sense trusted to have verified the identity of the person or organization the certificate belongs to. Each certificate also has a corresponding private key that is given to the certificate's owner.
Like normal public keys, certificates are freely published; the private keys, of course, are not.
PKI is something that most of us unwittingly use every single day, because these certificates are used for Web serving with secure HTTP (HTTPS). HTTPS is used for both encryption, to prevent people from eavesdropping on your connection to your bank, and also authentication, so that you can prove that it's really your bank you're talking to and not some hacker.
The authentication aspect is provided by the use of digital certificates: whenever a client connects to a server, the server sends its certificate for the client to inspect. For its encryption, HTTPS follows the same pattern that we've seen before: bulk encryption of the data using a symmetric algorithm and a random key, with the key transported using asymmetric encryption. The specifications underpinning HTTPS (namely, Secure Sockets Layer [SSL] and its successor, Transport Layer Security [TLS]) allow a variety of symmetric and asymmetric algorithms, but RSA is one of the more common. When RSA is used, the server certificate's public key itself is used to encrypt the secret key.
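You can watch this happen with Python's standard ssl module, which validates the server's certificate chain against the system's trusted certificate authorities during the handshake; the hostname below is just an example:

```python
import socket
import ssl

hostname = "www.example.com"             # any HTTPS site will do
context = ssl.create_default_context()   # trusts the system's installed CA certificates

with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as tls:
        cert = tls.getpeercert()         # the certificate the server presented
        print(cert["subject"])           # the identity the CA vouched for
        print(cert["issuer"])            # the certificate authority that signed it
        print(tls.version(), tls.cipher())  # negotiated protocol version and cipher suite
```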
Private parts
Steal someone's private key and you can masquerade as them to any system depending on public key cryptography. Take someone's PGP key and you can read their encrypted mail. Break into a bank's Web server and take its HTTPS certificate's private key and you can stick your own server on the Web and run the perfect phishing site, tricking the unsuspecting customers of the bank into handing over their account details.

As such, it's important to keep private keys private. The best approach for doing this depends on what the key is used for. Private keys can generally be password protected, which provides some protection should the key file be lost or stolen. However, this can be awkward for many usage scenarios; you don't want a rebooted Web server to be unable to use HTTPS until someone has typed in a password, after all.
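As an illustration of password protection, the Python cryptography package can serialize a private key to disk encrypted under a passphrase, so possession of the file alone isn't enough to use the key; the file name and passphrase here are placeholders:

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Write the private key out encrypted under a passphrase.
pem = key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.BestAvailableEncryption(b"correct horse battery staple"),
)
with open("server_key.pem", "wb") as f:
    f.write(pem)

# Loading the key later requires the same passphrase.
with open("server_key.pem", "rb") as f:
    reloaded = serialization.load_pem_private_key(f.read(), password=b"correct horse battery staple")
```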
If passwords aren't suitable, the next best bet is simply to strive to ensure the integrity of all machines used to store private keys. This means ensuring that systems are patched and malware free, and that access to systems holding important private keys is as restricted as possible. To minimize exposure in the case of a break-in, it's also common for websites to separate the machines that perform HTTPS from the ones that run Web applications; this way, a flaw in the application that exposes the server to hackers doesn't automatically mean that the private key is exposed.
In extreme cases, for particularly sensitive keys, private keys are stored offline and used only when absolutely necessary.
Should they get stolen, the results can be pretty disastrous.
Two chip companies, Realtek and JMicron, had their private keys stolen, probably in 2009 or 2010. These private keys corresponded to certificates that were authorized for code signing. 64-bit Windows requires device drivers to carry a digital signature made with a code-signing certificate, and hardware companies buy code-signing certificates for the drivers they produce.
In the case of Realtek and JMicron, the stolen private keys were used to sign drivers that were used by the infamous Stuxnet malware. Armed with these private keys, Stuxnet's authors could produce drivers that Windows trusted and that appeared to come from harmless hardware companies from East Asia.
PKI provides a partial solution to this kind of compromise. Certificate authorities don't just sign certificates; they also have the power to revoke them. If a certificate authority is notified that a private key has been compromised for some reason, the CA can add the certificate to its list of revoked certificates. In theory, any system depending on PKI should check these revocation lists and refuse to use certificates found on them. Unfortunately, practice doesn't always match up with theory.
For private keys used outside of PKI, there's no easy solution, as there's no formal revocation procedure, and the impact will vary depending on how the keys were used. Accidental, unexploited disclosure of ssh keys may require nothing more serious than generation of a new key pair, removal of the old public key from all systems it's stored on, and the uploading of the new key.
This is a problem recently faced by GitHub, the project hosting and source control service. GitHub uses ssh for remote access to its source repositories and uses RSA keys to authenticate that access. It was recently noticed that many GitHub users had uploaded their private keys to public GitHub repositories, giving anyone the ability to modify those repositories. GitHub has responded by removing all the corresponding public keys and hence preventing those private keys from being used for further access.
For keys that were only used for GitHub that's probably sufficient, but anyone who used their keys for other purposes will probably have some work to do.
Our quantum future?
Keeping the private key private only matters if integer factorization (or discrete logarithms, for the other major family of asymmetric encryption algorithms) remains a hard problem. It's still not proven that this is the case, leaving the door open to a fast algorithm at some point in the future. For the moment, conventional computers struggle at these problems.

There is, however, a class of computer that can efficiently solve both such problems. Quantum computers, which encode data using quantum features such as superposition and entanglement, do have an efficient algorithm (Shor's algorithm) for both integer factorization and discrete logarithms. A big enough quantum computer could relatively rapidly break RSA and other mainstream public key algorithms.
Fortunately, it's proving rather difficult to make big quantum computers. The number n, calculated in RSA by multiplying the two large prime numbers p and q, is typically 1,024 or 2,048 bits long. To factorize that number with a quantum computer would require at a minimum as many quantum bits (qubits) as there are bits in n. In practice, to account for error correction and other concerns, it could require the square of the number of bits, so about a million qubits to factorize a 1,024-bit n.
The largest number factorized by a quantum computer, however, is perhaps 143, on a system with just 4 qubits. One company claims, contentiously, to have 128 qubit processors but even this is a long way short of the million qubits needed to crack real RSA keys.
Even if realistic quantum computers should be developed, there will be new asymmetric encryption algorithms devised that will be deliberately resistant to quantum attacks. Public key cryptography is a hidden but fundamental part of modern life: even quantum leaps in computing technology aren't going to make it go away.