In cryptography, encryption is the process of transforming information (the plaintext) using an algorithm (the cipher) and a secret (the key) into something unreadable to anyone except those possessing the key (the ciphertext).
What it comes down to is that information is scrambled and that another person can only unscramble it if they know the key used to scramble it. Cracking the encryption is figuring out how the information was scrambled. Encryption can be done both by hand and with a computer. Note that doing encryption by hand has some limitations; for instance, it is impractical to encrypt large amounts of information by hand due to the time it would take.
Symmetric vs. asymmetric encryption
There are two main forms of encryption today: symmetric and asymmetric.
Symmetric encryption is the classic form of encryption. The plaintext is encoded into ciphertext using a secret key; the recipient, to decode the message, must know the secret key that was used to encode it. The name refers to the fact that encryption and decryption are inverse functions and both use the same key to work (thus symmetrical). This was the first form devised: primitive versions (simple substitutions of one letter for another, or shift-by-x codes like the Caesar Cipher) go back to ancient times, with increasingly sophisticated variations being developed as the arts of codemaking and codebreaking advanced. Eventually, the algorithms became so complex that machines (such as the Enigma device used in World War II) were required to encrypt and decrypt messages with reasonable speed and accuracy.
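The shift-by-x idea mentioned above can be sketched in a few lines. This is only an illustration of a symmetric cipher (the same key both encrypts and decrypts); the function name `caesar` and the sample message are made up for the example, and a Caesar cipher offers no real security.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(text: str, shift: int) -> str:
    """Shift each letter A-Z by `shift` places, wrapping around.
    Anything that is not a letter (spaces, punctuation) passes through unchanged."""
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)

key = 3                                      # the shared secret
ciphertext = caesar("ATTACK AT DAWN", key)   # "DWWDFN DW GDZQ"
plaintext = caesar(ciphertext, -key)         # same key, applied in reverse
```

Note that there are only 25 useful keys, so anyone can try them all by hand.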
Asymmetric encryption is a newer form of encryption, devised in the 1970s; in this form, the key used to encrypt the message and the one used to decrypt it are not the same. In an asymmetric cipher, each party has a pair of keys: a public key and a private key. If Alice wants to send Bob a message, she uses Bob's public key to encrypt the plaintext, and Bob uses his private key to decrypt it. Public keys, as the name indicates, are not required to be secret; private keys are. In short, encryption and decryption use different keys in asymmetric encryption schemes, hence the name.
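To make the public/private split concrete, here is a toy sketch using textbook RSA, one well-known asymmetric scheme (the text above does not name a specific one). The primes are deliberately tiny and there is no padding, so treat this as an illustration only; the names `encrypt` and `decrypt` are just for this example.

```python
# Bob's key generation. Real keys use primes hundreds of digits long.
p, q = 61, 53
n = p * q                      # 3233, part of both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent, 2753 (needs Python 3.8+)

public_key = (e, n)            # Bob can publish this
private_key = (d, n)           # Bob keeps this secret

def encrypt(message: int, key) -> int:
    exponent, modulus = key
    return pow(message, exponent, modulus)

def decrypt(ciphertext: int, key) -> int:
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)

# Alice only needs Bob's public key to send him a message (here, a number below n).
ciphertext = encrypt(65, public_key)
recovered = decrypt(ciphertext, private_key)   # back to 65
```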
The advantage of asymmetric encryption is that there is no need for the sender and recipient to know a shared secret key. Suppose you wanted to send an encrypted message to somebody, and you tried to do so using a symmetric cipher. How would you send them the secret key if you're concerned that somebody might eavesdrop? To send them the key, you need to use a special, secure channel that is resistant to eavesdropping (for example, an in-person meeting).
Another advantage of asymmetric encryption is that it can be used in reverse, encrypting a message with your private key to create a ciphertext that anyone can decrypt using your public key. This gives the message a digital signature that proves that only the private key owner could have written it, because it was encrypted with a key nobody else knows. The two processes can be combined, so that Alice can send Bob a message that nobody else could have written and nobody else can read.
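The "use it in reverse" idea can be sketched with the same kind of toy RSA arithmetic. As an assumption for this example, the message is first hashed with SHA-256 and the hash is what gets signed (a common practice, though the text above describes signing the message itself). The key is built from two small Mersenne primes and is nowhere near a safe size.

```python
import hashlib

# Toy key pair for the signer (assumption: far too small for real use).
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent, Python 3.8+

def digest_as_int(message: bytes) -> int:
    """Reduce the SHA-256 digest to a number below n so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    # Only the private key holder can compute this.
    return pow(digest_as_int(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    # Anyone with the public key (e, n) can check it.
    return pow(signature, e, n) == digest_as_int(message)

message = b"Meet me at the usual place. -- Alice"
signature = sign(message)
genuine = verify(message, signature)                  # True
forged = verify(b"Send money to Eve.", signature)     # False: signature doesn't match
```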
The biggest practical disadvantage of asymmetric encryption is that you need to "trust" that what you think is the recipient's public key really is theirs, and that their private key has not been disclosed. We can pretend to be the President of the United States and send you a public key, and if you mistakenly believe us, you might unwittingly send your top secret messages to us instead of the President and accept our digitally signed messages as if they came from the President. The normal method to verify this is for a third-party public key repository to digitally sign and store usernames and public keys. This does require a third party trusted to be impartial and an accurate record-keeper. An alternative is a "web of trust", in which people sign each other's keys, so that (for example) Alice can verify that Carol and Dave have signed Bob's key, vouching for the fact that it actually belongs to Bob. Alice then decides whether or not to take their word for it (much as she would if Carol and Dave were vouching for Bob in person).
Another disadvantage of asymmetric encryption is that it is more computationally expensive than symmetric systems. Most secure encrypted channel schemes use a hybrid of asymmetric and symmetric encryption to get around this problem: the asymmetric encryption is used solely to transmit a randomly generated one-time symmetric encryption key, which is then used in symmetric encryption for the bulk of the transaction proper.
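The hybrid approach can be sketched as follows. Everything here is a stand-in chosen so the example runs with only the standard library: the asymmetric part is toy RSA with small Mersenne primes, and the symmetric part is a homemade XOR keystream derived from SHA-256. A real system would use vetted algorithms from an established library instead.

```python
import hashlib
import secrets

# Receiver's toy key pair (assumption: illustrative size only).
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent, Python 3.8+

def stream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a keystream derived from the key.
    Applying it twice with the same key restores the original data."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], pad))
    return bytes(out)

def send(message: bytes):
    session_key = secrets.token_bytes(16)                         # random one-time key
    wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)   # slow asymmetric step, small input
    return wrapped_key, stream_xor(session_key, message)          # fast symmetric step, bulk data

def receive(wrapped_key: int, body: bytes) -> bytes:
    session_key = pow(wrapped_key, d, n).to_bytes(16, "big")      # unwrap with the private key
    return stream_xor(session_key, body)

wrapped, body = send(b"The bulk of the conversation goes here. " * 4)
recovered = receive(wrapped, body)
```

The expensive asymmetric operation runs once, on 16 bytes, no matter how long the message is.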
Security certificates (the bits of data that tell us that individuals online are in fact who they say they are) use the digital-signature technique described above to generate a "seal of approval" that can be read by everyone, but only manufactured by the issuing authority.
One-time pad
The one-time pad is a special kind of cipher that is completely unbreakable if used correctly, BUT is very weak if used incorrectly, and also very impractical. The trick is that the key must be at least as long as the plaintext, must be completely random, and must never, ever be reused.
The reason one-time pads are unbreakable is that for any conceivable plaintext, there exists a possible key that would produce that plaintext from the encrypted message. This means that if you try to guess what the key is, there are exponentially many more false positives than the real message, and no way to tell a false positive from a true positive.
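A small sketch of why guessing the key is hopeless, assuming the pad is combined with the message by XOR (a common way to implement a one-time pad on a computer; the text above does not specify one). The messages and the helper name `xor_bytes` are invented for the example.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

message = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(message))      # random, as long as the message, used once
ciphertext = xor_bytes(message, key)
decrypted = xor_bytes(ciphertext, key)       # the real key gives back the real message

# For ANY other plaintext of the same length, some key "decrypts" to it.
decoy = b"RETREAT AT TEN"
fake_key = xor_bytes(ciphertext, decoy)
false_positive = xor_bytes(ciphertext, fake_key)   # equals the decoy
```

An attacker trying keys will find `fake_key` just as plausible as `key`, and nothing in the ciphertext says which is right.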
But if the users of a one-time pad get sloppy and reuse a key for more than one message, it becomes trivial to break. If the keys are not truly randomly generated, it can be broken, too. A number of historical codebreaking successes resulted because somebody tried to use one-time pads but either reused the keys or generated them in a non-random fashion.
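Here is a sketch of what goes wrong when a pad is reused, again assuming an XOR-based pad. The two messages are made up; the point is that the key cancels out as soon as two ciphertexts made with it are combined.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

key = secrets.token_bytes(12)
first = b"MEET AT NOON"
second = b"BRING MONEY!"

c1 = xor_bytes(first, key)
c2 = xor_bytes(second, key)          # the mistake: same key, second message

# Eve never sees the key, but XORing the ciphertexts removes it entirely.
leak = xor_bytes(c1, c2)             # equals first XOR second

# If Eve knows or guesses one message, the other falls out immediately.
recovered_second = xor_bytes(leak, first)
```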
Then there is also the problem of communicating the keys, which is even harder than in the normal case because: (a) you need as many keys as you have messages, and (b) the keys are at least as long as the messages.
While using one-time pads to protect a whole conversation is cumbersome, a related idea of throw-away cipher input is the cryptographic nonce (number used once). This is often used in challenge/response-type authentication. In this method, Alice gives Bob a nonce and asks for a password back. Bob runs a hashing function (a type of one-way encryption function) on his password and the nonce and sends the result to Alice. Alice applies the same hashing function to her stored copy of the password and the nonce she generated; if the result matches Bob's response, she knows it's Bob. Every time Alice asks for a password, she gives Bob a new nonce. This prevents someone from reusing Bob's old responses (known as a "replay attack"). Throwing the nonce away each time is therefore the whole point; and because the password is short, the nonce doesn't have to be very long. The nonce doesn't even have to be kept secret.
However, there's a flaw to this. Eve, an eavesdropper, can pretend to be Alice to ask Bob, Charlie, and others for their passwords using the same nonce. From their responses, Eve can try to figure out their passwords by brute forcing or using a dictionary of known passwords. To combat this, Bob can also generate a nonce, then run the hashing function with his nonce, Alice's nonce, and the password. Bob then sends his nonce along with the hash function output, and Alice uses Bob's nonce along with hers to compute the expected value. Since Alice and Bob are both using nonces, Eve can no longer reuse one set of hashed guesses against everyone, and will have a harder time figuring out the password.
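The two-nonce exchange can be sketched like this. As an assumption for the example, the "hashing function" is SHA-256 over the two nonces followed by the password; the function names and the sample password are invented, and real protocols tend to use more careful constructions.

```python
import hashlib
import hmac
import secrets

def make_response(password: bytes, alice_nonce: bytes, bob_nonce: bytes) -> bytes:
    """What Bob sends back: a hash over both nonces and the password."""
    return hashlib.sha256(alice_nonce + bob_nonce + password).digest()

def alice_checks(stored_password: bytes, alice_nonce: bytes,
                 bob_nonce: bytes, response: bytes) -> bool:
    """Alice recomputes the hash from her own records and compares."""
    expected = make_response(stored_password, alice_nonce, bob_nonce)
    return hmac.compare_digest(expected, response)

password = b"correct horse"

# One login attempt: each side contributes a fresh nonce.
alice_nonce = secrets.token_bytes(16)
bob_nonce = secrets.token_bytes(16)
response = make_response(password, alice_nonce, bob_nonce)
accepted = alice_checks(password, alice_nonce, bob_nonce, response)        # True

# Replay attack: Eve resends Bob's old response, but Alice has moved on to a new nonce.
next_alice_nonce = secrets.token_bytes(16)
replay_accepted = alice_checks(password, next_alice_nonce, bob_nonce, response)  # False
```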
Another use of the one-time idea is in two-factor authentication systems. This can take the form of a generated throw-away key similar to a nonce, such as when a login system sends you a code to type in, or of a continuously changing key, such as the one shown by token fobs or smartphone authentication apps.
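The text above does not say how the "continuously changing key" is produced. One widely used approach, shown here as an example rather than as the only method, is HOTP (RFC 4226) and its time-based variant TOTP (RFC 6238): the fob or app and the server share a secret, and both derive a short code from it and a counter or the current time. The secret below is the test value from RFC 4226.

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time code (RFC 4226)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                                   # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)

def totp(secret: bytes, now=None, step: int = 30) -> str:
    """Time-based one-time code (RFC 6238): the counter is the number of
    30-second steps since the Unix epoch, so the code changes every 30 seconds."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step))

shared_secret = b"12345678901234567890"   # RFC 4226 test secret, not for real use
first_code = hotp(shared_secret, 0)       # "755224" per the RFC test vectors
current_code = totp(shared_secret)        # depends on the clock
```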
Cryptographic hash functions (One-way Encryption)
Suppose you want to store sensitive information for a challenge/response, like a password to an account. Obviously, storing passwords in a database in plaintext is highly insecure. You could encrypt the password and store the encrypted version, but this presents several issues:
- You need a key to encrypt something, and that key must be stored in a place where the database can access it.
- If someone dumps the database of the encrypted passwords, you can easily see when multiple people have used the same password.
- Most encryption methods produce output whose size varies with the input. AES-128, for example, outputs 16 bytes for every 16 bytes of input. An attacker could use this to estimate how many characters the plaintext has.
This is where the cryptographic hash function comes into play. A hash function takes the bytes that make up the data, mixes them together in a convoluted manner, and spits out a number of fixed size called a digest. That is, for any input, the function will spit out a value that has the same number of bytes. Hashing, as it's called, is one-way, hence "one-way encryption". That is, theoretically, you cannot map a digest back to its original content because a digest has an infinite number of things that could map to it. The flip side is that infinitely many inputs map to the same digest; two inputs that do so are called a collision. So for a hash function to be suitable for cryptography, it must have a digest large enough that finding a collision by brute force is prohibitively unlikely. Another requirement is that a small change in the input must produce an output with no obvious relationship to the original output.
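Both properties (fixed-size digest, and unrelated outputs for nearly identical inputs) are easy to observe. This sketch uses SHA-256 from Python's standard library as the example hash function; the helper `differing_bits` is just for the demonstration.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many bit positions differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Fixed size: one byte in or a million bytes in, the digest is always 32 bytes.
tiny = hashlib.sha256(b"a").digest()
huge = hashlib.sha256(b"a" * 1_000_000).digest()

# Small change in, unrelated digest out: these inputs differ in a single letter.
d1 = hashlib.sha256(b"hello").digest()
d2 = hashlib.sha256(b"hellp").digest()
changed = differing_bits(d1, d2)   # typically close to half of the 256 bits
```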
However, the hash function itself is not enough. If you input the same thing in a hash function, it'll spit out the same output. To combat this, salt is added. Salt is a random value that's included with the input to a hash function and is also stored with the user account. This way, in the password storing database, even if two or more people use the same password, their hashes will be different due to the salt.
Running the password with salt through a cryptographic hash function solves the three issues noted before with encryption:
- No key is necessary.
- If someone dumps the database of hashed passwords, you can't tell if people used the same password.
- The outputs are of the same size, so you can't tell if someone used a 6-letter password or a 100-letter one.
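The salted scheme can be sketched as below. One swap worth naming: instead of a single pass of a plain hash, this example uses PBKDF2 from the standard library, which applies the hash many times to slow down guessing. The iteration count and the function names are assumptions for the example, not recommendations from the text.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000   # assumed work factor, purely illustrative

def make_record(password: str):
    """Return (salt, digest) to store alongside the user account."""
    salt = secrets.token_bytes(16)                      # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare."""
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(attempt, digest)

# Two users pick the same password; their stored digests still differ.
salt_a, digest_a = make_record("hunter2")
salt_b, digest_b = make_record("hunter2")
same_stored_value = digest_a == digest_b                # False, thanks to the salt

# A 6-letter and a 100-letter password produce digests of the same length.
_, short_digest = make_record("abcdef")
_, long_digest = make_record("x" * 100)
```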
Aside from storing passwords securely, cryptographic hash functions are used more generally to verify the integrity of any given piece of data. For example, when downloading files off the internet, you want to make sure that the files weren't tampered with. The provider of the file can publish what the hash digest is supposed to be, and you can then verify it by running the downloaded file through the same hash function. It would be extremely difficult for an attacker to change the file in such a way that the hash function spits out the same hash value. Hashing is also used in cryptocurrency to maintain the integrity of the blockchains.
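Checking a download against a published digest might look like the following sketch. SHA-256 is assumed as the hash function, and the temporary file stands in for a real download; the function names are invented for the example.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published_digest(path: str, published_hex: str) -> bool:
    return sha256_of_file(path) == published_hex.strip().lower()

# Stand-in for a downloaded file and the digest its provider published.
contents = b"example download contents"
published = hashlib.sha256(contents).hexdigest()
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(contents)
    demo_path = f.name

intact = matches_published_digest(demo_path, published)        # True

with open(demo_path, "ab") as f:                               # simulate tampering
    f.write(b"!")
after_tampering = matches_published_digest(demo_path, published)   # False

os.remove(demo_path)
```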
Homomorphic encryption
A problem with encryption is that the encrypted data normally cannot be changed; otherwise decrypting it will result in garbage. But what if you could make modifications to the encrypted data and have those changes carry over when you decrypt? Say Alice wants to know the answer to 2 + 5 but doesn't want Bob to know she's using 2. She encrypts 2 as 5, Bob adds 5 to the encrypted value to get 10, and she decrypts that to get 7. This is the idea behind homomorphic encryption: other people can operate on the encrypted data without knowing what the data actually is, and the results of those operations remain valid after decryption.
Like encryption in general, homomorphic encryption starts with a key to encrypt and decrypt the data. However, instead of hiding the information by swapping bits around, it encodes the data by using a math function. For example, let's say Alice has an array of numbers: 9, 0, 2, 6, 7, 2, 8, 1, 6, 8. She can apply a simple "add 2" cipher to it to get 11, 2, 4, 8, 9, 4, 10, 3, 8, 10. Then she asks Bob to add these numbers together. Bob does so and gives Alice the answer "69". As far as Bob knows, Alice just asked him to add a bunch of numbers but he doesn't know what Alice actually wanted. Alice can undo the "add 2" cipher (subtracting 2 once for each of the ten numbers, 20 in total) to get the real answer of 49. Of course, this is a highly simplified example; more complicated functions are used in practice. Also, for certain schemes not every math operation can be used on the encrypted data. If that's the case, the encryption scheme is known as Partially Homomorphic Encryption (PHE). If any math operation can be done, it's known as Fully Homomorphic Encryption (FHE).
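The "add 2" example works out as follows. This is the same toy cipher from the text, not a real homomorphic scheme; the variable names are just for illustration.

```python
KEY = 2                                         # Alice's secret "add 2" key
values = [9, 0, 2, 6, 7, 2, 8, 1, 6, 8]         # Alice's real data

# Alice encrypts each number before handing the list to Bob.
encrypted = [v + KEY for v in values]           # [11, 2, 4, 8, 9, 4, 10, 3, 8, 10]

# Bob adds up what he was given without ever seeing the real numbers.
bob_total = sum(encrypted)                      # 69

# Alice decrypts: the key was added once per number, so remove it once per number.
real_total = bob_total - KEY * len(encrypted)   # 49
```

The decryption step has to account for how the operation interacted with the key, which is why only some operations work on a given scheme.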
Cryptanalysis is the act of analyzing the cipher and the ciphertext in order to retrieve the original plaintext. It is not true that any ciphertext can be cracked. Using a wrong key can sometimes result in a valid-looking plaintext that is in fact not the correct plaintext (one-time pads are all about this).
To recover a plaintext from a ciphertext, the key and the algorithm used are required. Having only the ciphertext is the hardest problem: the cryptanalyst must guess both the algorithm and the key. This is called a "ciphertext-only attack" and it requires the experience and the intuition of the analyst, knowledge of the circumstances, the sender, the receiver, current events, etc. While statistical analysis of the ciphertexts could provide information about the algorithm, it requires plenty of ciphertexts; otherwise it doesn't give any meaningful information. With modern encryption algorithms, ciphertext-only cryptanalysis is basically impossible no matter how much data you have.
If the algorithm is known, the recovery can be easier: only the key (usually a password, though other things can be considered as keys) is required. When evaluating the security of an encryption system, it is prudent to assume that the attacker knows the algorithm (a dictum known as Kerckhoffs's principle, named after cryptographer Auguste Kerckhoffs).
The simplest method of cracking a password is known as "brute force": trying every possible password. The problem with this is that it can take a very long time to find the right password. The number of possibilities for a password increases with every character added to the length of the password and every character added to the range of options. For example, if you wanted to find a password that was six (uppercase only) letters long, you might have to try 26^6 = 308,915,776 possible passwords. At the rate of a thousand guesses per second, it would take three and a half days to run through the list. Trying every seven-letter password at the same rate would take three months. If, instead of uppercase letters only, the passwords use lowercase letters, uppercase letters, and digits (26 + 26 + 10 = 62 options for each character), a six-character password requires 1.8 years to exhaustively search at this rate, and a seven-character one requires roughly 112 years.
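The arithmetic above is easy to reproduce. This sketch assumes the same rate of a thousand guesses per second and counts the time to try every possibility (on average the right one turns up about halfway through); the function names are invented for the example.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25

def search_days(alphabet_size: int, length: int, guesses_per_second: int = 1000) -> float:
    """Days needed to try every password of the given length."""
    return alphabet_size ** length / guesses_per_second / SECONDS_PER_DAY

def search_years(alphabet_size: int, length: int, guesses_per_second: int = 1000) -> float:
    return search_days(alphabet_size, length, guesses_per_second) / DAYS_PER_YEAR

six_upper = 26 ** 6                    # 308,915,776 candidates
days_six_upper = search_days(26, 6)    # about 3.6 days
days_seven_upper = search_days(26, 7)  # about 93 days, roughly three months
years_six_mixed = search_years(62, 6)  # about 1.8 years
years_seven_mixed = search_years(62, 7)  # a bit under 112 years
```

Each extra character multiplies the work by the size of the alphabet, which is why length matters so much.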
The problem for the user is that memorizing a truly random string of characters is very difficult. It's easier to use actual words as passwords. However, this is more vulnerable to brute-force attack: the number of words in the dictionary is much smaller than the number of random combinations of characters. Using odd spelling (such as "leetspeak" substitutions of other characters for letters) and using unusual words makes a dictionary attack more difficult; however, sophisticated attackers will use an exhaustive vocabulary and try a range of variations for each word.
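A dictionary attack with "leetspeak" variations can be sketched as below. The substitution table, the tiny word list, and the idea that the attacker holds an unsalted SHA-256 hash of the password are all assumptions made for the example.

```python
import hashlib
from itertools import product

# Hypothetical substitution table: each letter maps to the characters to try for it.
LEET = {"a": "a4@", "e": "e3", "i": "i1!", "o": "o0", "s": "s5$"}

def variations(word: str):
    """Yield every spelling of `word` under the substitution table."""
    choices = [LEET.get(ch, ch) for ch in word.lower()]
    for combo in product(*choices):
        yield "".join(combo)

def dictionary_attack(target_hex: str, wordlist):
    """Return the password whose SHA-256 hash matches, or None if no word fits."""
    for word in wordlist:
        for candidate in variations(word):
            if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
                return candidate
    return None

stolen_hash = hashlib.sha256(b"p4$$w0rd").hexdigest()
found = dictionary_attack(stolen_hash, ["dragon", "letmein", "password"])   # "p4$$w0rd"
```

"password" has only 54 spellings under this table, so the odd characters add almost nothing compared with a random eight-character string.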
It is possible to combine randomness and easy memorization using tricks such as remembering a phrase and using the first letter of each word (e.g. "This website will ruin your life" becomes "Twwryl"). Another option is to use a password manager program to store an encrypted database of passwords; the user then only needs to remember one master password to access all the others.
Of course, if the encryption algorithm itself is weak, even an unguessable password won't help you. Cryptographers consider an algorithm broken if there is a way to figure out the key faster than brute forcing it. Sometimes, this is only of theoretical interest (like if, say, it would still take longer than the age of the universe, with or without the faster speed). Other times, the algorithm is so broken and/or outdated that the key can be recovered quickly and easily (as was the case with the DES cipher, which was designed in the 1970s and proved unable to keep up with the rise in computing power by the late '90s, which forced people to resort to the more expensive Triple DES while also spurring calls for a replacement that eventually gave us the current AES standard). There is a large variety of attack techniques using advanced math, and new cryptosystems are expected to show evidence of resistance to them. If, after years of analysis by expert cryptographers, there aren't any practical attacks discovered, then it's considered probably secure. That little code you created yourself, however, doesn't stand a chance.
As mentioned above, the key doesn't have to be a password. For example, in Cryptonomicon, two people communicate using the "Solitaire cypher". The cypher uses a deck of cards; their initial arrangement is the key, leaving 54! (54 factorial, 54×53×52×...×2×1 = about 2.3 × 10^71) possible keys and no dictionary to use.
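The size of that key space can be checked directly. The conversion to bits is added here only to make the number comparable with the key sizes usually quoted for computer ciphers.

```python
import math

deck_orderings = math.factorial(54)          # every possible arrangement of 54 cards
as_scientific = f"{deck_orderings:.1e}"      # "2.3e+71"
equivalent_bits = math.log2(deck_orderings)  # about 237 bits of key
```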
The knowledge of the plaintext or parts of the plaintext (so-called "cribs") can make a cryptanalysis problem exponentially easier. The plaintext - or parts of it - could be acquired by old-fashioned spying or, more inventively, by feeding the mole. This is called a "known plaintext attack".
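A crib can be shown at work against a deliberately weak cipher. This sketch assumes a repeating-key XOR cipher and assumes the analyst already knows the key length; both are choices made for the example, as are the message and key.

```python
from itertools import cycle

def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR data with the key repeated as often as needed (encrypts and decrypts)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# What the sender does. The analyst never sees secret_key or plaintext.
secret_key = b"KEY"
plaintext = b"WEATHER REPORT: CLEAR SKIES"
ciphertext = xor_repeating(plaintext, secret_key)

# What the analyst knows: the ciphertext, and that messages start with "WEATHER".
crib = b"WEATHER"
key_length = 3                                            # assumed known
leaked = bytes(c ^ p for c, p in zip(ciphertext, crib))   # b"KEYKEYK"
recovered_key = leaked[:key_length]
recovered_plaintext = xor_repeating(ciphertext, recovered_key)
```

Seven known characters were enough to read the whole message, with no guessing of keys at all.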
And then (as the xkcd comic at the top of the page illustrates) there's the age-old standby of rubber-hose cryptanalysis: beating or torturing the key out of its holder. (The name comes from the rather vivid image of the keyholder being beaten across their bare feet with a rubber hose). This does not have a direct counter, but many applications (such as VeraCrypt) allow a defense based on plausible deniability for an encrypted volume to decrypt to a 'decoy', which hides a second encrypted volume with a different key. Thus, someone coerced into giving up a key can reveal one secret while hiding a bigger one. The interrogator may suspect the presence of a hidden inner volume, but its existence cannot be proved or disproved.
Of course, no encryption can protect you from stupidity. If you ever find yourself in a situation where the Secret Service is digging through your trash and anything you say might spell your doom if it ever gets in the wrong hands (because, be honest, who doesn't get into situations like this?), remember the following:
- Use good passwords. Single words that can be easily guessed will easily fold under a dictionary attack, and short passwords are relatively easy to brute-force. There are lots of resources regarding strong password generation on the web.
- Keep the keys secret! This is pretty obvious; if someone knows the key, your encryption is fucked.
- Choose the algorithm carefully! Don't use any algorithm that has been cracked (such as the Enigma)!
- On the developer side, NIST and OWASP regularly publish lists of algorithms recommended for application usage. Algorithms like DES, MD5 and SHA-1 are already considered insecure because practical attacks on them exist (brute force for DES, collisions for MD5 and SHA-1), so don't use them unless you have to provide compatibility with legacy systems.
- On the user side, if you need to protect your documents, use properly implemented, well-backed products like LUKS, VeraCrypt, BitLocker, and any decent OpenPGP implementation.
- If you're a developer, whatever you do, NEVER make up your own encryption. For that matter, try to avoid writing your own code to implement existing cryptosystems, too, and use existing protocols and libraries as much as possible. Encryption is notoriously difficult to get right, and you almost certainly won't.
- Be wary of tells, habits, and other repeated phrases you use. What allowed code breakers to defeat Enigma (among other things) was that the German military always sent the same type of message at specific times and ended each message the same way.
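On the "use good passwords" point in the list above, one way to get a password that is not a dictionary word is to let the computer pick it. This sketch uses Python's `secrets` module, which is meant for security-sensitive randomness; the lengths, the alphabet, and the tiny word list are arbitrary choices for the example.

```python
import secrets
import string

def random_password(length: int = 16) -> str:
    """Pick each character independently from letters and digits (62 options)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

def random_passphrase(words, count: int = 5) -> str:
    """Pick whole words at random; easier to remember, but needs a large word list."""
    return " ".join(secrets.choice(words) for _ in range(count))

password = random_password(20)
# Hypothetical mini word list; a real one would have thousands of entries.
passphrase = random_passphrase(["anchor", "velvet", "tundra", "pickle", "orbit", "lantern"])
```

A password like this is best kept in a password manager rather than memorized.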
Wait, they are after me... IBUDHRYKPSSRCGCSXDHGRECTRHNZMFZUMLPOAPUNPBXHJFIIMKQMQDLPRVEXYUXKOKJJATCNHTTJOLPBXCEYNYITDZWFHXHJ