Saturday, May 4, 2013

A Little About Encryption

My people have no tradition of proofreading.  —Ken White

This is the third of three posts on encrypting email.  The others are It's Time to Encrypt Your Email and Using Encrypted Email.

About Encryption

Encryption mathematically scrambles the bits of your email messages or other documents so that the content is practically impossible to read without reversing the encryption process.  The encrypting process combines a long and random collection of bits called the key with the message to produce an encrypted message, called the cipher text.  This is analogous to putting the message in an envelope, except that the envelope cannot be opened without the key.  The cipher text can be safely sent to the recipient electronically; even if the message is intercepted, the adversary will not be able to read it.  Decrypting the message involves reversing the process using the same key, as shown in the diagram.  A system of encryption that uses the same key for both encrypting and decrypting is called secret key or shared key or symmetric key encryption.
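Here is a minimal sketch of the same-key idea in Python.  It is a toy and not what real email encryption uses: it assumes a random key as long as the message and combines the two with XOR, and the names xor_bytes, message, and key are made up for this illustration.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Combine each data byte with the matching key byte using XOR."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"Meet me at noon."
key = secrets.token_bytes(len(message))   # random shared key (toy assumption)

cipher_text = xor_bytes(message, key)     # encrypt
recovered = xor_bytes(cipher_text, key)   # decrypt with the very same key

print(cipher_text.hex())
print(recovered)
```

Running the same function twice with the same key gets the original message back, which is exactly what makes the scheme symmetric.  A key used this way must never be reused for a second message.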


You've spotted the problem!  Sender and recipient must each have a copy of the key, so you have to figure out a way to get the key to your recipients securely and hope they keep it secure.  You also have to have a separate key for each person with whom you want to correspond; otherwise, all will be able to decrypt everyone else's messages.  Symmetric key encryption has important uses as we see below, but for correspondence, it doesn't scale well when used by itself.

Public Key Cryptography

In the 1970s, three groups of researchers independently invented a mechanism that uses two different keys with the same message, one to encrypt and one to decrypt.  The key usually used for encrypting is called the public key, and the key usually used for decrypting is called the private key.  Here's what's important:  a message encrypted with one's public key can only be decrypted using the corresponding private key.  You can give the public key to anyone, and they will not be able to decrypt messages that others may have encrypted with the same public key.

You can give your public key to everyone with whom you correspond.  Only you will be able to decrypt the messages they encrypt for you.  In fact, there are public key servers that will allow you to post your own public key and to look up the public keys of others.  One such key server is pgp.mit.edu.  You will probably want to upload your public key to that one or another.  You could try looking up the public keys of others right now by clicking on the link.
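The two-key idea described above can be tried out with a toy version of RSA, one of the public key systems from the 1970s.  This is only a sketch under loud assumptions: the primes 61 and 53 are absurdly small textbook values, the "message" is a single small number, and the helper name apply_key is invented for this example.  Real software uses enormous primes plus padding.

```python
# Toy RSA with tiny textbook primes; real keys use primes hundreds of digits long.
p, q = 61, 53
n = p * q                   # 3233, the modulus shared by both keys
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent
d = pow(e, -1, phi)         # private exponent, the modular inverse of e (Python 3.8+)

public_key = (e, n)         # safe to give to anyone
private_key = (d, n)        # never leaves your computer

def apply_key(value: int, key) -> int:
    """Raise value to the key's exponent, modulo the key's modulus."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

message = 65                                     # must be smaller than n
cipher_text = apply_key(message, public_key)     # anyone can do this
recovered = apply_key(cipher_text, private_key)  # only the key owner can do this

print(cipher_text)   # 2790
print(recovered)     # 65
```

Knowing the public key (17, 3233) does not directly reveal the private exponent; an attacker would first have to factor 3233, which is trivial here but believed to be infeasible at real key sizes.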

Hybrid Cryptography

There's a problem with public key cryptography: it's a lot of computational work.  Encryption using public key cryptography can take as much as 10,000 times longer than encrypting the same message using symmetric key cryptography.  Encrypting a message could take from several minutes to hours.  We probably don't want to wait even a few minutes for the encryption process.

Symmetric (shared) key encryption is much faster, with times in seconds for even very long messages.  But, as we saw earlier, the problem is how to get a copy of that shared key securely to the recipient.  The solution is to use public key cryptography to solve the key exchange problem.  The sender generates a random number that's used to create a key for symmetric key encryption, using something like the Advanced Encryption Standard (AES).  So, encryption can be completed very quickly.  The symmetric key is called a session key when used this way because each key is used for only one message or communication session.  The session key itself is encrypted using public key cryptography and the recipient's public key.  Because the session key is short – perhaps 256 bits (32 bytes) – the time to encrypt it is minimal.  The encrypted message and encrypted session key are packaged together and transmitted to the recipient.  This two-step process is called hybrid cryptography, and is almost always the way public key cryptography is employed to secure messages from eavesdropping.

The recipient reverses the process.  The session key is first decrypted using the recipient's private key.  Then, the session key is used to decrypt the message.  Besides speeding up the process, use of session keys and hybrid cryptography actually improves the security of messages because it deprives an adversary of the chance to collect many messages encrypted with the same key.  That's important for two reasons: having a large collection of text encrypted with a single key may make the cryptanalyst's job easier, and, if the key for one message is cracked, all the messages are revealed.  (Of course, if the cryptanalyst can recover the recipient's private key, the public key no longer offers security.  But, if the cryptanalyst cracks one session key, the other messages remain secure because they were encrypted with different session keys.)
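The round trip just described can be sketched in a few lines of Python.  Everything here is an assumption made for illustration: the recipient's key pair is toy RSA built from two known Mersenne primes, with no padding, and a keystream made by hashing the session key stands in for AES, which the Python standard library does not include.  The function names are hypothetical.

```python
import hashlib
import secrets

# Recipient's toy RSA key pair (illustration only, never use keys like this).
P, Q = 2**521 - 1, 2**607 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)

def keystream(session_key: bytes, length: int) -> bytes:
    """Stretch the session key into `length` bytes by hashing key plus a counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(session_key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def symmetric(data: bytes, session_key: bytes) -> bytes:
    """Stand-in for AES: XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(session_key, len(data))))

def hybrid_encrypt(message: bytes):
    session_key = secrets.token_bytes(32)                # fresh 256-bit session key
    encrypted_message = symmetric(message, session_key)  # fast step, whole message
    encrypted_key = pow(int.from_bytes(session_key, "big"), E, N)  # slow step, 32 bytes
    return encrypted_key, encrypted_message

def hybrid_decrypt(encrypted_key: int, encrypted_message: bytes) -> bytes:
    session_key = pow(encrypted_key, D, N).to_bytes(32, "big")  # needs the private key
    return symmetric(encrypted_message, session_key)

package = hybrid_encrypt(b"The meeting is moved to Tuesday.")
print(hybrid_decrypt(*package))
```

Only the 32-byte session key ever passes through the slow public key step, no matter how long the message is.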

Idea: A Digital Signature

The two keys of a public key crypto key pair are cryptographic inverses of one another.  A message encrypted with one key of a pair can only be decrypted with the other key of the same pair.  In the normal course of things, Alice would encrypt a message using Bob's public key, and Bob would decrypt it with his private key.  That would keep the message confidential while it travels over an insecure channel.

If Alice encrypts a message with her private key, Bob or anyone else could decrypt it using Alice's public key.  There's no confidentiality because the message can be decrypted with Alice's public key, and that's, well, public.  However, if we believe that Alice has guarded her private key carefully, only she could have encrypted the message. So, we can say that Alice has digitally signed a message that was encrypted with her private key.

Encrypting a message with a private key (to sign it) has the same problem as encrypting with a public key to secure it, namely that it would take a very long time.  We need a way to characterize a specific message that is shorter than the message itself.

Cryptographic Hashes

In the early days of computing, it was common to add up a series of numbers before entering data into a computer, then add them up again with the computer.  If one got the same total, that was a good indication that the data entry process was free of errors.  Sometimes the numbers were things like birth dates, where a total did not have any meaning other than as a check for consistency.  Such totals were called "hash totals."

The characters that comprise a computer message are just numbers.  In theory, we could add them up to get a hash total that could serve as a consistency check on the message. In practice, it's a little more complicated. 

A cryptographic hash function has three special properties.  First, even a tiny change in a message, like adding a zero to make $100 into $1,000, must change the computed hash code.  Second, it should be computationally infeasible, given a message, to create another message that produces the same hash code.  There exist hash functions that are thought to meet both criteria.  We can think of the computed hash code as a kind of fingerprint for a message.  For all practical purposes, different messages have different fingerprints, even if the difference in the messages is very small.  (Because a hash code has a fixed length, two messages with the same hash code must exist, but nobody should be able to find them.)  A hash code computed over a message is called a message digest.

The third property is that one cannot reconstruct the message given only the message digest.  That's important because the digital signature exposes the message digest to an adversary.
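To see the fingerprint idea in action, here is a short sketch using SHA-256 from Python's standard hashlib module.  SHA-256 is my choice for the example, not necessarily the hash your email software uses, and the sample messages are made up.

```python
import hashlib

def digest(message: str) -> str:
    """Return the SHA-256 message digest as 64 hexadecimal characters (256 bits)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

original = digest("Pay Bob $100")
altered = digest("Pay Bob $1,000")

print(original)
print(altered)

# Count how many of the 256 bits differ between the two digests.
differing_bits = bin(int(original, 16) ^ int(altered, 16)).count("1")
print(differing_bits)
```

The two messages differ by only a couple of characters, yet the digests look unrelated; typically about half of the bits change.  The digest is always the same length no matter how long the message is.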

The Digital Signature Improved

Given that we can compute a cryptographic hash, we can improve upon our idea of a digital signature.  Instead of encrypting the entire message with her private key to sign it, Alice can compute a cryptographic hash over the message and encrypt the hash code only using her private key.  Since the hash code is short – perhaps 256 bits – it, like a session key, can be encrypted quickly even with public key cryptography.  The encrypted hash code is sent along with the message to serve as the digital signature.

Such a digital signature not only authenticates the sender, it protects the message from tampering while in transit.  Here's why: Anyone can decrypt the digital signature using Alice's public key, but only Alice could have encrypted it.  Bob can verify the message by computing the hash code anew, then comparing it with the decrypted hash code sent with the message.  If they're equal, we can be sure the message actually came from Alice.  We can also be sure the message hasn't been altered.  If Evil Eve had altered the message, the hash code Bob computed would  be different from the one Alice computed.  The computed hash code wouldn't match the decrypted hash code, and the digital signature validation would fail.  (Eve can't replace the hash code after altering the message because doing so requires Alice's private key, which only Alice has.)
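The sign-and-verify steps above can be sketched as follows.  This is a toy under stated assumptions: Alice's key pair is textbook RSA built from two known Mersenne primes, SHA-256 is the hash, and no padding is applied, whereas real signature schemes add carefully designed padding.  The function names are invented for the example.

```python
import hashlib

# Alice's toy RSA key pair (illustration only).
P, Q = 2**521 - 1, 2**607 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, known only to Alice (Python 3.8+)

def message_digest(message: bytes) -> int:
    """Compute the SHA-256 hash code of the message as an integer."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes) -> int:
    """Alice encrypts the hash code with her private key."""
    return pow(message_digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Bob decrypts the signature with Alice's public key and compares hash codes."""
    return pow(signature, E, N) == message_digest(message)

message = b"Pay Bob $100"
signature = sign(message)

print(verify(message, signature))             # True
print(verify(b"Pay Bob $1,000", signature))   # False: the message was altered
```

Notice that verify uses only N and E, the public half of the key pair, so anyone can check the signature but only the holder of D can create it.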

Putting the Pieces Together

We can encrypt a message using hybrid cryptography, and we can authenticate it using a digital signature.  If we put those pieces together, we can package a message that only the intended recipient can decrypt, and for which the authenticity of the sender is assured.

The diagram below shows what happens when Alice's encryption software prepares a digitally-signed and encrypted message for transmission to Bill.  Plain text (unencrypted) information is represented by green boxes, encrypted information by red boxes, and keys by orange boxes.  The package of information that is transmitted to Bill is surrounded by a blue box.  The digital signature is "sort of green" because, although it is encrypted, anyone can decrypt it using Alice's public key.  The circle-plus symbol indicates encryption.


Alice's encryption program uses a cryptographic hash algorithm to compute a message digest of the plaintext message, then encrypts the message digest using Alice's private key to form the digital signature.  A random number generator is used to produce a session key.  The session key will be used only once, for this message.

The plaintext message is encrypted using the session key and a symmetric key encryption algorithm such as AES.  The session key itself is encrypted with Bill's public key.

There are three encryption operations.  Symmetric key encryption is used for the "main" message because it is comparatively fast.  Public key encryption is used to produce the encrypted session key and the digital signature.  This works because, although public key encryption is slow, both the message digest and the session key are small, perhaps 32 bytes each.  The "package" sent to Bill includes the three main components shown in the diagram and some additional information.  It identifies Alice as the sender, Bill as the recipient, and names the encryption algorithms used.  It is possible to include more than one encrypted session key.  If Alice wanted to send the same message to both Bill and Charlie, the package would include a copy of the session key encrypted with Bill's public key and another copy encrypted with Charlie's public key.

Bill Receives the Message

Bill gets the packaged message over the Internet by email, or perhaps in some other way. The session key was encrypted by Alice using Bill's public key.  Using Bill's private key, Bill's crypto program decrypts the session key.  The session key is used to decrypt the actual contents of the message.  Since only Bill has a copy of his private key, only Bill can decrypt the session key, and hence decrypt the message.  The message is now available in plaintext.  The next step is to check that it really came from Alice and is not a forgery.

Alice digitally signed the message by encrypting a message digest using her private key.  We know two things about that.  Since the digest was encrypted with Alice's private key, only Alice's public key can decrypt it.  Since only Alice has a copy of her private key, only Alice could have encrypted it.

Bill uses Alice's public key to decrypt the digest that is the digital signature.  Bill also computes a new message digest from the plaintext message.  The decrypted digest and the computed digest are compared.  If they are equal, Bill has confidence that the message actually came from Alice, and also that it hasn't been tampered with.  If they're not equal, then something is wrong and Bill must mistrust the message.

The amount of confidence Bill has in the authenticity of the message depends on the amount of confidence Bill has that Alice has kept her private key secure.  If Bill is sure that Alice has kept her private key secure, then Bill can be sure the message came from Alice.  If malicious Mallory has gotten a copy of Alice's private key, then Mallory could have forged the message.

Notice that if Eve the eavesdropper can intercept a copy of the message, Eve can decrypt the digital signature with Alice's public key because it is, well, public.  However, it doesn't do Eve any good because one of the characteristics of that cryptographic hash is that the message cannot be reconstructed from the message digest.

Digital Certificates

In a previous post, I wrote about signing others' public keys as a way to improve our confidence that the key actually belongs to the person it claims to belong to.  We saw earlier how a message could be both authenticated and protected from tampering through the use of a digital signature.  A public key can be protected in the same way.  When a public key is digitally signed, the result is called a digital certificate.

The purpose of a digital certificate is to bind an identity to a public key.  When you created your public/private key pair, you put in your own email address, but you could have claimed to be alice@example.com or even president@whitehouse.gov.  If you could get someone to use such a public key, thinking they were corresponding with the real Alice (or President), you'd be able to decrypt the messages because you have the corresponding private key.  A digital certificate helps us increase our confidence that a particular public key actually belongs to the claimed party.  It binds an identity to a public key.

The diagram at the left is a simplified representation of a digital certificate.  The part in green is all plain text.  It identifies alice@example.com as being the owner of the given public key.  This certificate has two digital signatures, one by Bob and one by Charlie.  By signing this certificate, Bob and Charlie are certifying that they have checked that the given public key actually belongs to alice@example.com.

Here's how it works.  The certificate is just a message consisting of an identifier, a public key, and other information in plain text, the part shown as green in the diagram.  Each signature has two parts: identification of the signer, shown in yellow-green and stored as plain text, and an encrypted hash code, shown in red and encrypted with the signer's private key.  Someone who knows and trusts Bob can compute the hash code directly, using the part of the certificate shown in green, then use Bob's public key to decrypt the part shown in red under Bob's identifier.  If the computed hash code and the decrypted hash code match, we can have confidence that the part of the certificate in green hasn't been tampered with, and that the public key given really does belong to Alice.  How much confidence we have depends upon how much we trust Bob.  If we don't know or trust Bob, perhaps we know Charlie or someone else who has signed Alice's key.  The subject of trust was discussed in Using Encrypted Email.
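A certificate check along the lines just described might look like the sketch below.  It is not the OpenPGP certificate format.  The signer addresses bob@example.com and charlie@example.com, the placeholder key text, and all function names are hypothetical, and the signers' keys are toy RSA pairs built from known Mersenne primes with no padding.

```python
import hashlib

E = 65537  # public exponent shared by both toy key pairs

def make_key_pair(p: int, q: int):
    """Return (modulus, private exponent) for a toy RSA key pair (Python 3.8+)."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))

# Hypothetical signers with toy keys (illustration only).
SIGNERS = {
    "bob@example.com": make_key_pair(2**521 - 1, 2**607 - 1),
    "charlie@example.com": make_key_pair(2**1279 - 1, 2**2203 - 1),
}

def body_hash(identity: str, public_key: str) -> int:
    """Hash the plain text part of the certificate: identity plus public key."""
    data = (identity + "\n" + public_key).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def sign_certificate(identity: str, public_key: str, signer: str) -> int:
    """The signer encrypts the hash code with his own private key."""
    n, d = SIGNERS[signer]
    return pow(body_hash(identity, public_key), d, n)

def check_signature(identity: str, public_key: str, signer: str, signature: int) -> bool:
    """Anyone can check a signature using only the signer's public key (n, E)."""
    n, _ = SIGNERS[signer]
    return pow(signature, E, n) == body_hash(identity, public_key)

alice_key = "placeholder text standing in for Alice's public key"
signatures = {name: sign_certificate("alice@example.com", alice_key, name)
              for name in SIGNERS}

for name, sig in signatures.items():
    print(name, check_signature("alice@example.com", alice_key, name, sig))

# Swapping in a different public key makes Bob's signature fail.
print(check_signature("alice@example.com", "an impostor's key",
                      "bob@example.com", signatures["bob@example.com"]))
```

Each signature stands on its own, so a reader who trusts only Charlie can check Charlie's signature and ignore Bob's.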

A digital certificate can be signed by a number of individuals, creating a web of trust, or by a single trusted organization, called a certificate authority.  OpenPGP uses the web of trust model.

A Note on the Strength of Cryptography

Modern cryptosystems are very difficult to break, but probably not impossible to break.  If you used a 4,096-bit key size when you generated your key pair, cracking your encryption by computation alone is believed to be far out of reach of the best publicly known algorithms and fastest computers available.  Of course, Moore's law means that computers are getting faster, and in a dozen years, 4,096 bits may not be enough.  For now, even with a 2,048-bit key size, you are probably safe from any person or agency except possibly the NSA.

The weak points are the private key, stored on your computer, and the passphrase, stored in your head.  If an adversary can get both of those, your privacy is toast.  Without them, your privacy is protected against casual snoopers, and even persistent snoops like reporters.  You are in more danger from something like this than you are from a computational attack.  Your private key backup is another weak point.  Guard it carefully.


While we're talking about the strength of cryptography, it's time for a warning: don't roll your own.  While it might seem very cool to design and implement your own cryptosystem, it turns out to be surprisingly hard to do right.  Unless you have the equivalent of a Ph.D. in mathematics with an emphasis on cryptology, it's an extremely bad idea to trust important information to home-brewed crypto.  Experiment all you want, but when you're serious about protecting information, use cryptosystems that were developed by experts, examined by other experts, and have withstood the test of time.

Cryptography and the Law

There's a different question when it comes to law enforcement: can you be compelled to decrypt documents?  Even if law enforcement agencies cannot crack your encryption, they may be able to get a court to order you to decrypt documents yourself or to surrender your passphrase.  If you refuse, you might be held in jail until you change your mind.  Whether the Fifth Amendment protection against self-incrimination means you do not have to reveal your passphrase or the contents of encrypted documents has not been settled as of summer 2013.  Different courts have ruled differently.  There is also a distinction at law between being forced to divulge your passphrase and being forced to produce documents in plain text (decrypted) form.

I'm not a lawyer, and cannot give legal advice.  Good, common sense advice is not to do illegal things nor possess illegal materials.  Encryption may delay an investigation, but it is unlikely to save you from the consequences of illegal actions.

With that said, I cannot emphasize too strongly that, in the United States, there is nothing illegal about using encryption. 


Previous article: Using Encrypted Email

Copyright © 2013 by Bob Brown

A Little About Encryption by Bob Brown is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Some of the symbols used in some of the illustrations were borrowed from a document on the subject of public key cryptography by Microsoft. Thanks, Microsoft!  The XKCD cartoons are used under the terms of the Creative Commons Attribution-NonCommercial 2.5 License.  The quotation by Ken White is used by permission.