In an era when security breaches regularly make the news, encryption is an important topic to understand. It helps protect your data, your interactions, and your access even when attackers make end-runs around software defenses. It’s critical to use it properly because, on a public network, data still has opportunities to leak out even when your software is standing guard effectively.
But it’s not necessarily an easy topic; there is no magic wand you can wave to encrypt your data and effortlessly realize gains. Some recent high-profile stories have shown that even software developers don’t necessarily understand this topic well enough to make good decisions on how to best use encryption to protect their users.
In this post, I’ll walk you through the three most important kinds of cryptography that protect users today, tell you why we need each one, and show you how each one addresses specific aspects of keeping systems secure.
Symmetric encryption
Symmetric encryption’s job is to take readable data (“plaintext” in crypto parlance), scramble it to make it unreadable (protecting it from prying eyes while it’s stored on a disk or transmitted over a network), then unscramble it again when it’s needed. It’s generally fast, and there are lots of good, well-vetted algorithms to choose from, AES being the most widely used. The most important thing to remember about symmetric encryption is that both sides, the encrypter and the decrypter, need access to the same key.
A key, for symmetric encryption purposes, is a string of data that is fed to the cipher along with the plaintext to scramble it, and fed again along with the encrypted data to unscramble it. It’s best if this key is completely random, but there are also ways to derive keys from (hopefully really good) passwords. The tricky part about using symmetric encryption is storing the key and making it available only to the software that needs it.
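To make the shared-key idea concrete, here is a minimal sketch in Python. Python’s standard library ships no general-purpose cipher such as AES, so this uses the one-time pad, the simplest symmetric cipher: XOR the data with a random key at least as long as the message, then XOR again with the same key to decrypt. Treat it as a teaching toy only (a pad must never be reused); real systems use a vetted AES-GCM implementation, for example from the third-party cryptography package. The function names are hypothetical, chosen for illustration.

```python
import secrets

def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """XOR each byte of the plaintext with the matching byte of the key."""
    if len(key) < len(plaintext):
        raise ValueError("one-time-pad key must be at least as long as the data")
    return bytes(p ^ k for p, k in zip(plaintext, key))

def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    # XOR is its own inverse: decrypting is the same operation with the SAME key.
    return otp_encrypt(ciphertext, key)

message = b"meet me at noon"
key = secrets.token_bytes(len(message))   # completely random; must be used only once
ciphertext = otp_encrypt(message, key)    # safe to store or transmit

recovered = otp_decrypt(ciphertext, key)  # the decrypter needs the very same key
assert recovered == message
```

Decrypting with any other random key yields gibberish, which is the whole point: whoever holds the key can read the data, and nobody else can.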
Drawbacks of symmetric encryption
The key has to come from somewhere, and each option has a cost. If the key comes from a password, then someone needs to type that password every time the software starts up; this is the basis of how disk encryption on personal computers, like Mac OS X’s FileVault 2, works.
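Here is a sketch of deriving a key from a password, assuming scrypt as the key-derivation function (Python exposes it as hashlib.scrypt when built against OpenSSL 1.1 or later). This is not how FileVault 2 is implemented; the cost parameters and sample passwords are illustrative assumptions. Because the same password and salt always produce the same key, the key can be re-derived at each startup instead of being stored.

```python
import hashlib
import secrets

# Assumed cost parameters; tune them so derivation takes noticeable time on your hardware.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1

def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 256-bit symmetric key from a password using scrypt."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        dklen=32,
    )

# The salt is random but not secret: it is stored alongside the encrypted data.
salt = secrets.token_bytes(16)

# At every startup the user types the password and the same key is re-derived,
# so the key itself never has to be written to disk.
key_at_setup = derive_key("correct horse battery staple", salt)
key_at_startup = derive_key("correct horse battery staple", salt)
assert key_at_setup == key_at_startup
assert derive_key("wrong password", salt) != key_at_setup
```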
If you have to store the key on a disk or a device (e.g. in an app), or if you transmit it unprotected over a network, then once an attacker gains access to that key, your encryption is useless.
It’s also important to remember that even if your data is encrypted, software needs access to the unencrypted data to do its job. This means that if the software or platform itself is compromised, the encryption once again becomes useless. The only effective protection against this is to design your services so that data is encrypted before it leaves the user’s computer, leaving the key exclusively in the user’s possession and storing only unreadable encrypted data. But that, of course, reduces the usefulness of many systems that need to read the unencrypted data to function.
Uses of symmetric encryption
Symmetric encryption is best used:
- In services that store encrypted data on behalf of a user (like cloud backup services) when those services leave the decryption key in the hands of the user
- To encrypt computer or device storage (One particularly neat property of a well-encrypted device is that it can be really quickly erased: just make sure the key is destroyed. The resulting encrypted data still stored on the device is then useless to anyone.)
- To create a secure channel between two network endpoints, provided there’s a separate scheme for securely exchanging the key.
Properly used, it’s very valuable, but the key needs to be protected even while it’s being shared among the parties that legitimately need it.
Asymmetric encryption
Asymmetric encryption also takes readable data, scrambles it, and unscrambles it again at the other end, but there’s a twist: a different key is used for each end. Encrypters use a public key to scramble the data, and decrypters use the matching private (secret) key on the other end to unscramble it again.
The public key is just that, public; it can and should be published. (This is why asymmetric encryption is also often referred to as public-key cryptography.) But the private key must be kept private, protected much like the key for symmetric encryption. The good news is that this is easier, since only one party ever needs access to it: the party that needs to decrypt the messages.
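The public/private split can be seen in textbook RSA. This sketch uses the classic tiny example primes 61 and 53 so the arithmetic is easy to follow; it is deliberately insecure (real keys are thousands of bits, and real RSA encryption adds padding such as OAEP), so treat it purely as an illustration. The helper names are hypothetical.

```python
# Textbook RSA with tiny, hard-coded primes: an illustration only, never for real use.
p, q = 61, 53
n = p * q                 # public modulus: 3233
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent, chosen coprime with phi
d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)

public_key = (n, e)       # can be published
private_key = (n, d)      # must be kept secret

def rsa_encrypt(m: int, key) -> int:
    """Encrypt an integer message 0 <= m < n with the PUBLIC key."""
    modulus, exponent = key
    if not 0 <= m < modulus:
        raise ValueError("message must be an integer in [0, n)")
    return pow(m, exponent, modulus)

def rsa_decrypt(c: int, key) -> int:
    """Decrypt with the PRIVATE key."""
    modulus, exponent = key
    return pow(c, exponent, modulus)

ciphertext = rsa_encrypt(65, public_key)            # anyone can do this
assert rsa_decrypt(ciphertext, private_key) == 65   # undoing it requires the private exponent
```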
Some (but not all!) asymmetric encryption systems have one additional important capability: the ability to cryptographically sign data. In such systems, the private key is used to make the signature, and the public key is used to verify it. So if you have data, its signature, and the matching public key, you can prove that the data was signed with the corresponding private key and hasn’t been altered since.
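Signing runs the key pair the other way around. The sketch below reuses the same kind of toy textbook-RSA numbers and signs a SHA-256 digest of the message reduced modulo n; the helper names are hypothetical, and real signature schemes (RSA-PSS, Ed25519 and others) use proper padding and far larger keys.

```python
import hashlib

# Toy textbook-RSA key pair (tiny primes): illustration only.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent

def digest_mod_n(data: bytes) -> int:
    """Hash the data with SHA-256 and reduce it into [0, n) so the toy key can handle it.

    With n this small, two messages share a value about once in 3,233 tries, so
    forgeries are trivial here; real keys avoid this by being enormous.
    """
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes) -> int:
    # Only the holder of the PRIVATE exponent d can produce this value.
    return pow(digest_mod_n(data), d, n)

def verify(data: bytes, signature: int) -> bool:
    # Anyone with the PUBLIC exponent e can check it.
    return pow(signature, e, n) == digest_mod_n(data)

update = b"hypothetical firmware v2.1"
signature = sign(update)
assert verify(update, signature)
```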
Problems with asymmetric encryption
This all sounds too good to be true, right? Well, there are, of course, caveats. The biggest issue with public-key cryptography is making sure you can trust the public key you have. A man-in-the-middle attack is a common way to compromise asymmetric encryption: you are given a public key to use to securely communicate with someone or some service, and dutifully use it, thinking you’re protected. In fact, through network trickery, you’re communicating with another party entirely, who is sitting between you and the other end.
This party gives you their own public key, and gives the other end another public key, pretending it is yours. They can thus decrypt the data you meant for the other end and re-encrypt it with the real public key before sending it on, then do the same in the other direction, gaining full access to the unencrypted data. The defense is making sure we always have the right public keys, either by distributing them in trusted software or by having entities we already trust cryptographically sign the new keys we need to use. This is why you need a certificate for your HTTPS site from a certificate authority: web browsers trust these authorities to sign keys, so a website can send its signed public key to a browser, and the browser can check that signature before trusting the key to secure the connection.
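One way to make sure you have the right public key is to ship a fingerprint (a hash) of the genuine key inside trusted software and reject anything else, an approach often called key pinning. The sketch below assumes keys are plain byte strings; the key contents are hypothetical placeholders, and in practice the pinned fingerprint would be hard-coded rather than computed at runtime.

```python
import hashlib
import hmac

def fingerprint(public_key: bytes) -> str:
    """SHA-256 fingerprint of a public key's bytes."""
    return hashlib.sha256(public_key).hexdigest()

# Hypothetical key bytes. Real software would ship PINNED_FINGERPRINT as a constant.
GENUINE_SERVER_KEY = b"hypothetical genuine server public key"
PINNED_FINGERPRINT = fingerprint(GENUINE_SERVER_KEY)

def key_is_trusted(received_key: bytes) -> bool:
    """Accept a received public key only if it matches the pinned fingerprint."""
    return hmac.compare_digest(fingerprint(received_key), PINNED_FINGERPRINT)

mitm_key = b"hypothetical attacker public key"
assert key_is_trusted(GENUINE_SERVER_KEY)
assert not key_is_trusted(mitm_key)   # the man in the middle's key is rejected
```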
Uses of asymmetric encryption
Asymmetric encryption is pervasive on the Internet; in fact, it’s not a stretch to say the Internet wouldn’t work securely without it. For example:
- It’s used with TLS (née SSL) to secure connections between browser and website as well as other network services.
- It’s used with SSH to secure login sessions to remote servers, and to authenticate users (through signing) without passwords.
- It’s used to sign software updates so that computers and devices can know that they’re getting code that originated from a trusted party.
- It’s also possible to use asymmetric encryption for email with systems like OpenPGP or S/MIME, but regrettably, this happens very rarely because the software is often difficult to use.
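For TLS, most platforms perform the certificate-authority check for you. As a sketch using Python’s standard ssl module, ssl.create_default_context() loads the system’s trusted authorities and requires a valid certificate whose name matches the host; the helper name open_tls_connection and the 10-second timeout are assumptions for illustration.

```python
import socket
import ssl

# The default context trusts the platform's certificate authorities and enables
# both certificate verification and hostname checking.
context = ssl.create_default_context()

def open_tls_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Connect to host over TLS; the handshake fails unless the server presents
    a certificate for `host` that chains to a trusted authority."""
    raw_sock = socket.create_connection((host, port), timeout=10)
    try:
        return context.wrap_socket(raw_sock, server_hostname=host)
    except Exception:
        raw_sock.close()  # don't leak the plain TCP socket if the handshake fails
        raise
```

Calling open_tls_connection("example.com") (which needs network access) returns a connected socket whose version() reports the negotiated protocol; a man in the middle presenting a certificate no trusted authority signed would instead make the handshake raise ssl.SSLCertVerificationError.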
Hashing
Hashing is what is actually happening when you hear about passwords being “encrypted”. Strictly speaking, hashing is not a form of encryption, though it does use cryptography. Hashing takes data and creates a hash out of it, a string of data with three important properties:
- the same data will always produce the same hash
- it’s impossible to reverse it back to the original data
- given knowledge of only the hash, it’s infeasible to create another string of data that will create the same hash (called a “collision” in crypto parlance)
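A quick way to see the first property, along with the fixed output size, is Python’s hashlib with SHA-256. This only demonstrates behavior; the other two properties are about what is computationally infeasible, which no short snippet can prove.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

a = sha256_hex(b"hello world")
b = sha256_hex(b"hello world")
c = sha256_hex(b"hello world!")

assert a == b                  # same data, same hash, every time
assert a != c                  # a one-character change gives a different hash
assert len(a) == len(c) == 64  # always 256 bits (64 hex digits), regardless of input size
```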
Uses of hashing
The most important use of hashing is, of course, protecting passwords. If a system stores a password hash instead of a password, it can check an incoming password by hashing it and seeing whether the hashes match. A stolen hash can’t simply be submitted as a password, because the system would hash it again and get a different result. The system also increases its security by knowing the plaintext password only in the brief moments when it is being set or verified.
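A minimal sketch of that store-and-check flow, assuming scrypt (via Python’s hashlib.scrypt) as the password-hashing function, with a random per-user salt stored alongside the hash. The cost parameters and function names are illustrative assumptions; hmac.compare_digest is used so the comparison time doesn’t reveal how many bytes matched.

```python
import hashlib
import hmac
import secrets

# Assumed cost parameters; raise them as far as your login latency budget allows.
N, R, P = 2**14, 8, 1

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash) to store instead of the password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=32)
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Hash the submitted password the same way and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=32)
    return hmac.compare_digest(candidate, stored)

salt, stored = hash_password("hunter2")   # only salt and stored go in the database
assert verify_password("hunter2", salt, stored)
assert not verify_password("hunter3", salt, stored)
```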
Another common use of a hash is to authenticate data that is otherwise transmitted in the clear, using a shared secret (effectively, a key). The hash is generated from both the data and this secret, and only the data and the hash are sent; because the shared secret is never transmitted, it becomes infeasible for anyone without it to modify either the data or the hash without the modification being detected.
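The standard construction for this is HMAC, which Python’s hmac module implements. The sketch below assumes SHA-256 as the underlying hash; the message and helper names are hypothetical.

```python
import hashlib
import hmac
import secrets

shared_secret = secrets.token_bytes(32)  # known to sender and receiver, never transmitted

def tag(message: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def is_authentic(message: bytes, received_tag: bytes, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(tag(message, key), received_tag)

message = b"transfer 100 to alice"
t = tag(message, shared_secret)          # sent in the clear alongside the message

assert is_authentic(message, t, shared_secret)
assert not is_authentic(b"transfer 900 to mallory", t, shared_secret)  # tampering detected
```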
Weaknesses of hashing
I said earlier that it was impossible to reverse a hash, and that’s true. But it is possible, with access to the hashes and lots of resources (fewer if there’s an implementation weakness in the way hashing is used), to find data that hashes the same as the password—a collision—and this may even be the password itself.
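This is what “lots of resources” means in practice: the attacker doesn’t reverse anything, they just hash guesses until one matches. The sketch uses a hypothetical leaked, unsalted SHA-256 hash and a tiny stand-in wordlist; because SHA-256 is fast, real attackers can test enormous numbers of guesses per second this way.

```python
import hashlib
from typing import Iterable, Optional

# Hypothetical leaked value: an unsalted SHA-256 hash of a user's password.
leaked_hash = hashlib.sha256(b"sunshine").hexdigest()

# Hypothetical stand-in for a real cracking wordlist (those hold billions of entries).
wordlist = [b"123456", b"password", b"qwerty", b"sunshine", b"letmein"]

def crack(target_hex: str, candidates: Iterable[bytes]) -> Optional[bytes]:
    """Hash each guess and return the first one whose hash matches the target."""
    for guess in candidates:
        if hashlib.sha256(guess).hexdigest() == target_hex:
            return guess
    return None

print(crack(leaked_hash, wordlist))  # b'sunshine'
```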
This is why it’s critically important to select a good password-hashing algorithm, one that is deliberately expensive to compute so that every guess costs an attacker real computing effort. Raising the cost of this brute-forcing makes your hashes more resistant, buying you time after a breach or even dissuading the attacker altogether.
Don’t try to develop your own scheme, even on top of a well-known hash algorithm. I’ve seen systems that just passed passwords through one of the SHA functions: SHA is a good hashing algorithm, but that is a fatally flawed way to use it, because the way you use an algorithm can break your security just as effectively as a poor choice of algorithm. Always research and use a proven, well-audited password-hashing function; good choices are bcrypt and scrypt.
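To illustrate how usage matters, this sketch contrasts bare SHA-256 with salted scrypt for two hypothetical users who picked the same password. The scrypt parameters and helper name are assumptions; bcrypt would show the same point but is not in Python’s standard library.

```python
import hashlib
import secrets

password = b"hunter2"

# Flawed: bare SHA-256 gives every user with this password the same stored value,
# so cracking one hash (or looking it up in a precomputed table) exposes them all.
alice_sha = hashlib.sha256(password).hexdigest()
bob_sha = hashlib.sha256(password).hexdigest()
assert alice_sha == bob_sha

def scrypt_hash(pw: bytes, salt: bytes) -> bytes:
    """Deliberately slow, memory-hungry hash; each guess costs the attacker too."""
    return hashlib.scrypt(pw, salt=salt, n=2**14, r=8, p=1, dklen=32)

# Better: a per-user random salt makes identical passwords produce unrelated hashes.
alice_salt, bob_salt = secrets.token_bytes(16), secrets.token_bytes(16)
assert scrypt_hash(password, alice_salt) != scrypt_hash(password, bob_salt)
```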
The Weakest Link
The most important thing to remember in security design is that you can’t just sprinkle cryptography or security on one part of a system and make it secure. Any system is only as secure as the weakest link in the chain. Make sure you understand how security and cryptography protect your system and your users end-to-end, and you’ll be able to use them effectively.