Useful Notes / Encryption
In cryptography, encryption is the process of transforming information (the plaintext) using an algorithm (the cipher) and a secret (the key) into something unreadable to anyone except those possessing the key (the ciphertext).
What it comes down to is that information is scrambled, and someone else can only unscramble it if they know the key that was used to scramble it. Cracking the encryption means recovering the information without being given the key, usually by figuring out how it was scrambled. Encryption can be done both by hand and with a computer. Doing it by hand has limitations, though: encrypting large amounts of information takes impractically long, and some ciphers are out of reach because their calculations are too difficult to do by hand.
Symmetric vs. asymmetric encryption
There are two main forms of encryption today: symmetric and asymmetric.
Symmetric encryption is the classic form of encryption. The plaintext is encoded into ciphertext using a secret key; to decode the message, the recipient must know the same secret key that was used to encode it. The name refers to the fact that encryption and decryption are inverse functions that both use the same key (hence symmetrical). This was the first form devised: primitive versions (simple substitutions of one letter for another, or shift-by-x codes like the Caesar Cipher) go back to ancient times, with increasingly sophisticated variations being developed as the arts of codemaking and codebreaking advanced. Eventually, the algorithms became so complex that machines (such as the Enigma device used in World War II) were required to encrypt and decrypt messages with reasonable speed and accuracy.
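Below is a minimal Python sketch of the shift cipher mentioned above. It assumes an uppercase A-Z alphabet and leaves spaces and punctuation unchanged; the function names are illustrative. It shows the defining symmetric property: decryption is just encryption run backwards with the same key.

```python
import string

ALPHABET = string.ascii_uppercase  # A-Z only; other characters pass through unchanged

def caesar_encrypt(plaintext: str, key: int) -> str:
    """Shift each letter forward by `key` positions (the shared secret)."""
    out = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)
    return "".join(out)

def caesar_decrypt(ciphertext: str, key: int) -> str:
    """The inverse function: shift back by the same key."""
    return caesar_encrypt(ciphertext, -key)

secret = caesar_encrypt("ATTACK AT DAWN", 3)
print(secret)                     # DWWDFN DW GDZQ
print(caesar_decrypt(secret, 3))  # ATTACK AT DAWN
```

A Caesar cipher has only 25 useful keys, so brute force breaks it instantly. It appears here only to show the symmetric structure.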
Asymmetric encryption is a newer form of encryption, devised in the 1970s. In this form, the key used to encrypt the message and the one used to decrypt it are not the same. In an asymmetric cipher, each party has a pair of keys: a public key and a private key. If Alice wants to send Bob a message, she uses Bob's public key to encrypt the plaintext, and Bob uses his private key to decrypt it. Public keys, as the name indicates, are not required to be secret; private keys are. In short, the two directions use different keys, hence the name.
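To make the public/private split concrete, here is a toy sketch of "textbook" RSA, one well-known asymmetric cipher. It uses the tiny primes 61 and 53 from common classroom examples. These numbers and the helper names are illustrative assumptions. Real RSA uses enormous primes plus a padding scheme, and this code must not be used for anything real.

```python
# Toy "textbook RSA": tiny primes, no padding. Insecure; for illustration only.
p, q = 61, 53               # illustrative primes; real keys use primes hundreds of digits long
n = p * q                   # 3233, part of both keys
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent, chosen coprime with phi
d = pow(e, -1, phi)         # private exponent: modular inverse of e (Python 3.8+), here 2753

bob_public = (e, n)         # Bob can publish this to anyone
bob_private = (d, n)        # Bob keeps this secret

def encrypt(m: int, public_key: tuple[int, int]) -> int:
    exponent, modulus = public_key
    return pow(m, exponent, modulus)

def decrypt(c: int, private_key: tuple[int, int]) -> int:
    exponent, modulus = private_key
    return pow(c, exponent, modulus)

message = 65                               # plaintext encoded as a number smaller than n
ciphertext = encrypt(message, bob_public)  # Alice encrypts with Bob's PUBLIC key
print(ciphertext)                          # 2790
print(decrypt(ciphertext, bob_private))    # 65: only Bob's PRIVATE key undoes it
```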
The main advantage of asymmetric encryption is that the sender and recipient don't need to share a secret key. Suppose you wanted to send an encrypted message to somebody using a symmetric cipher. How would you send them the secret key if you're concerned that somebody might eavesdrop? You would need a special, secure channel that is resistant to eavesdropping, for example an in-person meeting. With asymmetric encryption, only the public key needs to be sent, and it doesn't matter who else sees it.
Another advantage is that some asymmetric schemes (RSA being the classic example) can be used in reverse. Applying your private key to a message (in practice, to a hash of it) produces a value that anyone can check using your public key. This acts as a digital signature proving that the private key's owner produced the message, because nobody else knows that key. The two processes can be combined, so that Alice can send Bob a message that nobody else could have written and nobody else can read.
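Here is a toy sketch of signing, again using textbook RSA with the tiny illustrative primes 61 and 53. As in real signature schemes, the message is hashed first (with SHA-256 here) and the private exponent is applied to the digest. Reducing the digest modulo the tiny modulus is a shortcut that only makes sense for this toy. At this key size, forgeries would be trivial.

```python
import hashlib

# Toy textbook-RSA key pair (illustrative tiny primes; insecure).
p, q = 61, 53
n = p * q
e = 17                                  # public exponent
d = pow(e, -1, (p - 1) * (q - 1))       # private exponent

def digest_as_int(message: bytes) -> int:
    # Hash first, as real schemes do; "% n" is a toy shortcut to fit the tiny modulus.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """Only the private-key holder can compute this value."""
    return pow(digest_as_int(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key (e, n) can check it."""
    return pow(signature, e, n) == digest_as_int(message)

note = b"I, Alice, owe Bob 5 dollars"
sig = sign(note)
print(verify(note, sig))                               # True
print(verify(b"I, Alice, owe Bob 500 dollars", sig))   # False, barring a ~1-in-3233 toy-digest collision
```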
The biggest practical disadvantage of asymmetric encryption is that you need to "trust" two things: that what you think is the recipient's public key really is theirs, and that their private key has not been disclosed. I can pretend to be the President of the United States and send you a public key. If you mistakenly believe me, you might unwittingly send your top secret messages to me instead of the President, and accept my digitally signed messages as if they came from the President. The normal method to verify this is for a third-party public key repository to digitally sign and store usernames and public keys. This does require a third party trusted to be impartial and an accurate record-keeper. Another alternative is a "web of trust", in which people sign each other's keys. For example, Alice can verify that Carol and Dave have signed Bob's key, vouching for the fact that it actually belongs to Bob. Alice then decides whether or not to take their word for it (much as she would if Carol and Dave were vouching for Bob in person).
Another disadvantage of asymmetric encryption is that it is more computationally expensive than symmetric systems. Most secure encrypted channel schemes get around this problem by using the asymmetric encryption solely to transmit a randomly generated one-time symmetric encryption key, then switch to symmetric encryption for the bulk of the transaction.
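The hybrid approach can be sketched as follows. This is a toy under loudly stated assumptions. The "asymmetric" half is textbook RSA with a deliberately small modulus built from the Mersenne primes 2^61 - 1 and 2^89 - 1. The "symmetric" half is a hypothetical XOR stream derived from SHAKE-256, standing in for a vetted cipher such as AES. The point is the structure: one slow public-key operation on a short random key, then fast symmetric encryption of the bulk data.

```python
import hashlib
import secrets

# Toy RSA key pair from two known Mersenne primes, 2**61 - 1 and 2**89 - 1.
# A ~150-bit modulus is hopelessly weak; it only needs to exceed the 128-bit session key.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in for a real symmetric cipher such as AES:
    XOR the data with a SHAKE-256 keystream derived from the key."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def hybrid_encrypt(message: bytes) -> tuple[int, bytes]:
    session_key = secrets.token_bytes(16)                         # random one-time symmetric key
    wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)   # slow public-key step, short input
    body = keystream_xor(session_key, message)                    # fast symmetric step, bulk data
    return wrapped_key, body

def hybrid_decrypt(wrapped_key: int, body: bytes) -> bytes:
    session_key = pow(wrapped_key, D, N).to_bytes(16, "big")      # private key unwraps the session key
    return keystream_xor(session_key, body)

wrapped, body = hybrid_encrypt(b"the bulk of the transaction goes here")
print(hybrid_decrypt(wrapped, body))   # b'the bulk of the transaction goes here'
```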
Security certificates (the bits of data that tell us that websites and individuals online are who they say they are) use the digital-signature technique described above. They generate a "seal of approval" that can be read by everyone but only produced by the issuing authority.
The one-time pad is a special kind of cipher that is completely unbreakable if used correctly. It is also very weak if used incorrectly, and very impractical. The trick is that the secret key must be as long as the plaintext, must be completely random, and must never ever be reused.
One-time pads are unbreakable because, for any conceivable plaintext of the right length, there exists a possible key that would produce that plaintext from the encrypted message. So if you try to guess the key, every meaningful message of that length shows up as a candidate. There is no way to tell the true plaintext from the false positives.
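Here is a minimal sketch of a one-time pad on bytes. It assumes XOR as the combining operation (the usual choice for computer-based pads). Randomness comes from Python's `secrets` module, a cryptographically secure generator that serves as a practical stand-in for the true randomness a real pad requires. The second half illustrates the argument above: whatever plaintext of the right length you propose, some pad turns the intercepted ciphertext into it.

```python
import secrets

def otp_xor(data: bytes, pad: bytes) -> bytes:
    """Encryption and decryption are the same XOR with the pad."""
    if len(pad) < len(data):
        raise ValueError("the pad must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, pad))

plaintext = b"MEET AT NOON"
pad = secrets.token_bytes(len(plaintext))   # random, same length, to be used once
ciphertext = otp_xor(plaintext, pad)
print(otp_xor(ciphertext, pad))             # b'MEET AT NOON'

# For ANY candidate plaintext of the same length, some pad maps the ciphertext to it,
# so the ciphertext alone gives no reason to prefer one candidate over another.
decoy = b"RUN AWAY NOW"
decoy_pad = otp_xor(ciphertext, decoy)      # pad = ciphertext XOR candidate
print(otp_xor(ciphertext, decoy_pad))       # b'RUN AWAY NOW'
```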
But if the users of a one-time pad get sloppy and reuse a key for more than one message, it becomes trivial to break. If the keys are not truly randomly generated, it can be broken too. A number of historical codebreaking successes resulted because somebody tried to use one-time pads but either reused the keys or generated them in a non-random fashion.
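The following sketch shows why key reuse is fatal. Two hypothetical 16-byte messages are XORed with the same pad. XORing the two ciphertexts together cancels the pad out. A guessed fragment of one message then reveals the matching fragment of the other.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, stopping at the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

pad = secrets.token_bytes(16)
msg1 = b"ATTACK AT DAWN!!"          # hypothetical messages, 16 bytes each
msg2 = b"RETREAT AT DUSK!"
c1 = xor_bytes(msg1, pad)
c2 = xor_bytes(msg2, pad)           # the mistake: the same pad used twice

# The pad cancels: c1 XOR c2 == msg1 XOR msg2, with no key involved.
combined = xor_bytes(c1, c2)

# Guessing part of one message (a "crib") exposes the same stretch of the other.
crib = b"ATTACK"
print(xor_bytes(combined, crib))    # b'RETREA', the start of msg2
```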
Then there is the problem of distributing the keys, which is even harder than usual because (a) you need as many keys as messages, and (b) each key is as long as its message.
Using one-time pads to protect a whole conversation is cumbersome, but a related idea of throw-away cipher input is the cryptographic nonce (short for "number used once"). This is often used in challenge-response authentication. Alice gives Bob a nonce and asks him for his password. Instead of sending the password itself, Bob combines it with the nonce (for example, by encrypting or hashing the two together) and sends the result to Alice. Alice, who also knows the password, does the same computation with the nonce she generated. If her result matches Bob's response, she knows it's Bob. Every time Alice asks for a password, she gives Bob a new nonce.
This prevents someone from reusing Bob's old responses (a "replay attack"), so throwing the nonce away each time is the whole point. The nonce doesn't even have to be kept secret; it only needs to be unpredictable and never repeated.
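The exchange can be sketched as follows. The text does not name a specific way for Bob to "combine" the password with the nonce, so this sketch assumes HMAC-SHA256, one common choice available in Python's standard library. The password value and function name are hypothetical.

```python
import hashlib
import hmac
import secrets

PASSWORD = b"correct horse"   # hypothetical password that both Alice and Bob know

def response(password: bytes, nonce: bytes) -> bytes:
    """Bob's answer: an HMAC of the nonce keyed with the password (never the password itself)."""
    return hmac.new(password, nonce, hashlib.sha256).digest()

# Alice: issue a fresh, unpredictable nonce
nonce = secrets.token_bytes(16)

# Bob: compute the response and send it back
bob_reply = response(PASSWORD, nonce)

# Alice: redo the computation and compare in constant time
print(hmac.compare_digest(bob_reply, response(PASSWORD, nonce)))       # True

# Replaying that old reply against the next challenge fails
next_nonce = secrets.token_bytes(16)
print(hmac.compare_digest(bob_reply, response(PASSWORD, next_nonce)))  # False
```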
However, there's a catch. Eve, an eavesdropper, can pretend to be Alice and send the same nonce to Bob, Charlie, and others. Because every response is computed from that one nonce, she can try to figure out their passwords by brute force or with a dictionary of common passwords, testing each guess against all the collected responses at once. To combat this, Bob can add a nonce of his own and compute his response from both nonces. He sends his nonce along with the response so that Alice can repeat the calculation. Every response now depends on a nonce Eve doesn't control, so she has a much harder time figuring out the passwords.
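The two-nonce version can be sketched as follows. It again assumes HMAC-SHA256 as the combining step and uses a hypothetical password. Bob's response covers both Alice's nonce and a fresh one of his own. Eve therefore can no longer build one table of guesses for her fixed nonce and match it against everyone's replies.

```python
import hashlib
import hmac
import secrets

def response(password: bytes, server_nonce: bytes, client_nonce: bytes) -> bytes:
    """Bob mixes in a nonce of his own, so a fixed challenge no longer fixes the answer."""
    # Both nonces are 16 bytes, so simple concatenation is unambiguous.
    return hmac.new(password, server_nonce + client_nonce, hashlib.sha256).digest()

server_nonce = secrets.token_bytes(16)   # chosen by Alice (or by Eve posing as Alice)
client_nonce = secrets.token_bytes(16)   # chosen fresh by Bob on every login
reply = response(b"correct horse", server_nonce, client_nonce)

# Bob sends (client_nonce, reply); the real Alice, who knows the password, checks it:
ok = hmac.compare_digest(reply, response(b"correct horse", server_nonce, client_nonce))
print(ok)   # True
```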
Cryptographic hash functions
Suppose you want to store sensitive information for a challenge-response, like a password to an account. Obviously storing passwords in a database in plaintext is highly insecure. You could encrypt the password and store the encrypted version, but this presents several issues:
- You need a key to encrypt something, and that key must be stored in a place where the database can access it.
- If someone dumps the database of the encrypted passwords, you can easily see multiple people have used the same password.
- Most encryption methods produce output whose size depends on the input. AES-128, for example, outputs 16 bytes for every block of up to 16 bytes of input, so an attacker could deduce roughly how many characters the plaintext has.
This is where the cryptographic hash function comes into play. A hash function takes the bytes that make up the data, mixes them together in a convoluted manner, and spits out a fixed-size value called a digest. That is, for any input, the function will spit out a value with the same number of bytes. For a hash function to be suitable for cryptography, it must have a reasonably large digest size. A larger digest decreases the chances of two inputs producing the same output, called a collision. It must also be infeasible to work backwards from a digest to an input that produces it. Finally, a small change in the input must produce an output with no obvious relationship to the original one.
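Here is a quick illustration of the two properties above, using SHA-256 from Python's `hashlib` as an example cryptographic hash. The digest length never changes. A one-character change in the input flips roughly half of the output bits.

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def differing_bits(a_hex: str, b_hex: str) -> int:
    """Count how many of the 256 output bits differ between two digests."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")

short = sha256_hex("a")
long_ = sha256_hex("a" * 10_000)
print(len(short), len(long_))   # 64 64: always 32 bytes (64 hex digits), whatever the input

h1, h2 = sha256_hex("password1"), sha256_hex("password2")
print(differing_bits(h1, h2))   # typically somewhere around 128 of the 256 bits
```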
So to store the passwords of user accounts, each password is sent through a cryptographic hash function and the result is what's stored in the database. Every time the user logs in, the password they enter is run through the same hash function and the output is compared to what's in the database. This way, even if an attacker steals the database, all they get for passwords are random-looking values that can't simply be reversed to reveal the actual passwords.
However, the hash function by itself is not enough. If you input the same thing into a cryptographic hash function, it'll spit out the same output. To combat this, salt is added. Salt is a random value, generated separately for each password, that's included with the input to the hash function. It is stored alongside the resulting hash so the same calculation can be repeated at login. This way, even if two or more people in the database use the same password, their hashes will be different due to the salt. Running the password through a cryptographic hash function plus salt solves the three issues noted before with encryption:
- No key is necessary.
- If someone dumps the database of hashed passwords, you can't tell if people used the same password.
- The outputs are all the same size, so you can't tell whether someone used a 6-letter password or a 100-letter one.
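Here is a minimal sketch of salted password storage with one deliberate substitution. Instead of a single pass of a plain hash, it uses PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`). This is a salted hash repeated many times to slow down guessing, the form usually recommended for passwords. The iteration count and example passwords are illustrative assumptions.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000   # illustrative work factor; tune for your hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). Both are stored; the password itself never is."""
    salt = secrets.token_bytes(16)   # fresh random salt for each password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare in constant time."""
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(attempt, stored)

# Two users with the same password get unrelated database entries.
alice = hash_password("hunter2")
bob = hash_password("hunter2")
print(alice[1] != bob[1])                 # True: different salts, different hashes
print(check_password("hunter2", *alice))  # True
print(check_password("hunter3", *alice))  # False
```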
Aside from storing passwords securely, cryptographic hash functions are used more generally to verify the integrity of any given piece of data. For example, when downloading files off the internet, you want to make sure that the files weren't tampered with. The provider of the file can publish its hash. You can then verify the download by running it through the same hash function and comparing the results. It would be extremely difficult for an attacker to change the file in such a way that the hash function still spits out the same hash value. Hashing is also used in cryptocurrency to maintain the integrity of blockchains.
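Checking a download against a published digest might look like the sketch below. It assumes the provider publishes a SHA-256 hex string. The helper names are hypothetical. The demo uses a temporary file containing `abc`, whose SHA-256 digest is a standard published test value.

```python
import hashlib
import hmac
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare against the provider's published digest (hex, any letter case)."""
    return hmac.compare_digest(file_sha256(path), published_hex.strip().lower())

# Demo with a throwaway file containing b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
published = "BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD"
ok = matches_published(tmp.name, published)
wrong_digest = matches_published(tmp.name, published.replace("BA", "BB", 1))
os.remove(tmp.name)
print(ok, wrong_digest)   # True False
```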
Cryptanalysis is the act of analyzing the cipher and the ciphertext in order to retrieve the original plaintext. Not every ciphertext can be cracked. Using a wrong key can sometimes result in a valid-looking plaintext that is not in fact the correct plaintext (one-time pads work this way).
To recover a plaintext from a ciphertext, both the key and the algorithm are required. Having only the ciphertext is the hardest case: the cryptanalyst must guess both the algorithm and the key. This is called a ciphertext-only attack. It relies on the experience and intuition of the analyst and on knowledge of the circumstances, the sender, the receiver, current events, etc. Statistical analysis of the ciphertexts can reveal information about the algorithm, but only if plenty of ciphertext is available; with too little, it gives no meaningful information. With modern encryption algorithms, ciphertext-only cryptanalysis is basically impossible no matter how much data you have.
If the algorithm is known, the recovery can be easier: only the key (usually a password, though other things can serve as keys) is required. When evaluating the security of an encryption system, it is prudent to assume that the attacker knows the algorithm (a dictum known as Kerckhoffs's principle, named after cryptographer Auguste Kerckhoffs).
The simplest method of cracking a password is known as brute force: trying every possible password. The problem with this is that it can take a very long time to find the right one. The number of possible passwords is multiplied with every character added to the length and grows with every symbol added to the set of allowed characters. For example, to find a password that is six letters long (uppercase only), you might have to try 26^6 = 308,915,776 possible passwords. At the rate of a thousand guesses per second, it would take three and a half days to run through the list. Trying every seven-letter password at the same rate would take three months. Suppose the passwords instead use lowercase letters, uppercase letters, and digits (26 + 26 + 10 = 62 options for each character). Then a six-character password requires 1.8 years to exhaustively search at this rate, and a seven-character one requires about 112 years.
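The arithmetic above is easy to reproduce. This sketch assumes, as the text does, a rate of 1,000 guesses per second and a worst case in which every candidate must be tried. It uses 365.25-day years.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY
GUESSES_PER_SECOND = 1_000   # the (very slow) rate assumed in the text

def exhaustive_search_seconds(alphabet_size: int, length: int) -> float:
    """Worst case: try every one of alphabet_size ** length candidates."""
    return alphabet_size ** length / GUESSES_PER_SECOND

for alphabet, length in [(26, 6), (26, 7), (62, 6), (62, 7)]:
    secs = exhaustive_search_seconds(alphabet, length)
    print(f"{alphabet}^{length} = {alphabet ** length:,} guesses: "
          f"{secs / SECONDS_PER_DAY:,.1f} days, {secs / SECONDS_PER_YEAR:,.2f} years")
```

Real attackers on fast hardware can guess far more than a thousand times per second, so these figures are best read as relative comparisons.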
The problem for the user is that memorizing a truly random string of characters is very difficult, and it's easier to use actual words as passwords. However, these are more vulnerable to a dictionary attack, a brute-force attack that only tries real words. The number of words in the dictionary is much smaller than the number of random combinations of characters. Odd spellings (such as "leetspeak" substitutions of other characters for letters) and unusual words make a dictionary attack more difficult. However, sophisticated attackers will use an exhaustive vocabulary and try a range of variations for each word.
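Below is a sketch of a dictionary attack with leetspeak variations against an unsalted SHA-256 hash. The wordlist, substitution table, and target are hypothetical and tiny. Real attackers use lists of millions of words and far more variation rules.

```python
import hashlib
from itertools import product

# Hypothetical leaked, unsalted SHA-256 hash and a tiny stand-in wordlist.
target = hashlib.sha256(b"p4ssw0rd").hexdigest()
WORDLIST = ["letmein", "dragon", "password", "monkey"]
LEET = {"a": "a4@", "e": "e3", "i": "i1!", "o": "o0", "s": "s5$"}

def variants(word: str):
    """Yield every leetspeak spelling of a word (each letter substituted or not)."""
    options = [LEET.get(ch, ch) for ch in word]
    for combo in product(*options):
        yield "".join(combo)

def dictionary_attack(target_hex: str):
    """Return the first wordlist variant whose hash matches, or None."""
    for word in WORDLIST:
        for guess in variants(word):
            if hashlib.sha256(guess.encode()).hexdigest() == target_hex:
                return guess
    return None

print(dictionary_attack(target))   # p4ssw0rd
```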
It is possible to combine randomness and easy memorization using tricks such as remembering a phrase and using the first letter of each word (e.g. "This website will ruin your life" becomes "Twwryl"). Another option is to use a password manager program to store an encrypted database of passwords; the user then only needs to remember one master password to access all the others.
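The first-letter trick is simple enough to show in a couple of lines (the function name is illustrative). By the brute-force arithmetic above, a six-letter result is still short, so a longer phrase gives a stronger password.

```python
def first_letters(phrase: str) -> str:
    """Build a password from the first letter of each word, keeping the original case."""
    return "".join(word[0] for word in phrase.split())

print(first_letters("This website will ruin your life"))   # Twwryl
```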
Of course, if the encryption algorithm itself is weak, even an unguessable password won't help you. Cryptographers consider an algorithm broken if there is a way to figure out the key faster than brute-forcing it. Sometimes this is only of theoretical interest (for example, when even with the speedup it would still take longer than the age of the universe). Other times the algorithm is so broken that the key can be recovered quickly and easily. There is a large variety of attack techniques using advanced math, and new cryptosystems are expected to show evidence of resistance to them. If years of analysis by expert cryptographers turn up no practical attacks, the algorithm is considered probably secure. That little code you created yourself, however, doesn't stand a chance.
As mentioned above, the key doesn't have to be a password. For example, in Cryptonomicon, two people communicate using the "Solitaire" cipher, which uses a deck of cards (52 cards plus two jokers). The initial order of the deck is the key. That leaves 54! (54 factorial, 54×53×52×...×2×1, about 2.3 × 10^71) possible keys and no dictionary to attack.
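The size of that keyspace is easy to check. This sketch assumes a 54-card deck (52 playing cards plus two jokers) and expresses the count in bits for comparison with other keys.

```python
import math

# Every possible ordering of a 54-card deck is a possible key.
keyspace = math.factorial(54)
print(f"{keyspace:.2e}")                   # 2.31e+71
print(f"{math.log2(keyspace):.0f} bits")   # 237 bits
```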
Knowledge of the plaintext or parts of it (so-called "cribs") can make a cryptanalysis problem dramatically easier. The plaintext, or parts of it, could be acquired by old-fashioned spying. A more inventive route is feeding the mole: deliberately passing information to the other side so that it shows up in their encrypted messages. This is called a known-plaintext attack.
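Below is a sketch of a known-plaintext attack on a deliberately weak cipher, in which a hypothetical 4-byte key is repeated across the message and XORed with it. Suppose the attacker knows how messages begin, here an assumed standard "WEATHER REPORT" header. XORing that crib with the ciphertext then exposes the key directly. The sketch assumes the key length has already been worked out.

```python
from itertools import cycle

def repeating_xor(data: bytes, key: bytes) -> bytes:
    """A weak toy cipher: XOR the message with a short key repeated over and over."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

secret_key = b"K3Y!"   # hypothetical key, unknown to the attacker
ciphertext = repeating_xor(b"WEATHER REPORT: clear skies over the channel", secret_key)

# Crib: the attacker knows (or guesses) that messages start with "WEATHER REPORT".
crib = b"WEATHER REPORT"
keystream = repeating_xor(ciphertext[:len(crib)], crib)   # the key repeated: b'K3Y!K3Y!K3Y!K3'
recovered_key = keystream[:4]                             # assumes key length 4 is known
print(repeating_xor(ciphertext, recovered_key))           # the full plaintext
```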
And then (as the xkcd comic at the top of the page illustrates) there's the age-old standby of rubber hose cryptanalysis: forcing the key out of its holder. (The name comes from the rather vivid image of the keyholder being beaten across their bare feet with a rubber hose.) This has no direct counter. However, some applications (such as VeraCrypt) allow a defense based on plausible deniability. An encrypted volume can decrypt to a 'decoy' that hides a second encrypted volume with a different key. Thus, someone coerced into giving up a key can reveal one secret while hiding a bigger one. The interrogator may suspect the presence of a hidden inner volume, but its existence cannot be proved or disproved.
Of course, no encryption can protect you from stupidity. Suppose you ever find yourself in a situation where the secret service is digging through your trash and anything you say might spell your doom if it ever gets into the wrong hands (because, be honest, who doesn't get into situations like this?). In that case, remember the following:
- Use good passwords. Single words that can be easily guessed will quickly fold under a dictionary attack, and short passwords are relatively easy to brute-force. There are lots of resources on strong password generation on the web.
- Keep the keys secret! This is pretty obvious: if someone knows the key, your encryption is fucked.
- Choose the algorithm carefully! Don't use any algorithm that has been cracked (such as Enigma)! And whatever you do, NEVER make up your own encryption. For that matter, try to avoid writing your own code to implement existing cryptosystems too, and use existing protocols and libraries as much as possible. Encryption is notoriously difficult to get right, and you almost certainly won't.
- Be wary of tells, habits, and other repeated phrases. One of the things that allowed codebreakers to defeat Enigma was that the German military always sent the same types of message at specific times and ended each message the same way.
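For the first tip, here is a minimal sketch of generating passwords with Python's `secrets` module. Unlike `random`, it is designed for security-sensitive randomness. The passphrase helper follows a Diceware-style approach. Its tiny word list is a hypothetical placeholder; a real list should contain thousands of words.

```python
import secrets
import string

def random_password(length: int = 16) -> str:
    """Pick each character with a cryptographically secure RNG (not the `random` module)."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

def random_passphrase(words: list[str], count: int = 5) -> str:
    """Diceware-style passphrase: several words chosen at random from a large list."""
    return " ".join(secrets.choice(words) for _ in range(count))

print(random_password())
print(random_passphrase(["correct", "horse", "battery", "staple", "orbit", "lantern"]))  # toy list
```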
See Hollywood Encryption for the usual treatment of cryptography in fiction (which generally involves a lot less detailed analysis and a lot more technobabble).
Wait, they are after me... IBUDHRYKPSSRCGCSXDHGRECTRHNZMFZUMLPOAPUNPBXHJFIIMKQMQDLPRVEXYUXKOKJJATCNHTTJOLPBXCEYNYITDZWFHXHJ