CS 161 Notes Lec 9-10
-
Course website: https://su21.cs161.org/
-
Course textbook: https://textbook.cs161.org/
Overview
Lecture 9: Public Key Encryption and Digital Signatures
Lecture 10: Certificates, Passwords, Case Studies
Public-key cryptography
- In public-key schemes, each person has two keys.
-
Public key: known to everybody.
-
Private key: Only known to the person.
-
Public and Private keys always appear in pairs.
-
- The public key schemes are built on number theory.
- This implies that
-
message encryptions should be done in terms of decimal numbers instead of bit strings.
-
Encryption/Decryption will be slower than those symmetric-key encryption schemes that operate directly on bit strings(like XOR, shifting…)
-
- This implies that
- Benefit brought by public-key scheme:
-
No need to assume that Alice and Bob already share a secret, since public-key cryptography can communicate without a shared secret key.
-
👆The above is not saying that MITM(man in the middle) adversary is solved by public-key cryptography! It’s still an issue that we need to settle later.
-
Public-key encryption
The main idea of public-key encryption is that everybody can encrypt message with the public key, while only the man with the private key can decrypt it.
Definition
- Three parts:
-
KeyGen() → PK, SK: Generate a public/private keypair, where PK is the public key, and SK is the private (secret) key.
-
Enc(PK, M) → C: Encrypt a plaintext M using public key PK to produce ciphertext C.
-
Dec(SK, C) → M: Decrypt a ciphertext C using secret key SK.
-
- Properties
- Correctness: Decrypting a ciphertext should result in the message that was originally encrypted.
- Dec(SK, Enc(PK, M)) = M for all PK, SK ← KeyGen() and M.
-
Efficiency: Encryption/decryption should be fast.
- Security: Similar to IND-CPA, but Alice (the challenger) just gives Eve (the adversary) the public key, and Eve doesn’t request encryptions, except for the pair _M_0, _M_1.
-
You don’t need to worry about this game (it’s called “semantic security”).
-
👆The above descriptions about security(IND-CPA and semantic security stuff) are copied from the slides, and I’m confused by them honestly :) Below are my understandings of IND-CPA and semantic security:
-
semantic security: An adversary can know all the details of an encryption process, together with the public key, but he/she will never be able to figure out which one between M0 and M1 is the plaintext(so his/her choice will always be a 1/2 random guessing).
-
IND-CPA in asymmetric-key encryption:
- According to https://en.wikipedia.org/wiki/Ciphertext_indistinguishability#IND-CPA
-
The challenger generates a key pair PK, SK based on some security parameter k, and publishes PK to the adversary. The challenger retains SK.
-
The adversary may perform a polynomially bounded number of encryptions or other operations.
-
Eventually, the adversary submits two distinct chosen plaintexts M0 and M1 to the challenger.
-
The challenger selects a bit b {0, 1} uniformly at random, and sends the challenge ciphertext C = E(PK, Mb) back to the adversary.
-
The adversary is free to perform any number of additional computations or encryptions.
-
Finally, the adversary outputs a guess for the value of b.
-
If the asymmetric-key encryption is IND-CPA secure, the probability that b is correct is 1/2 + a negligible function.
-
- In the lecture, the instructor said that “semantic security” is not the same as “IND-CPA”, but in this course we can understand them as the same.
-
-
- Correctness: Decrypting a ciphertext should result in the message that was originally encrypted.
EIGamal
Remember the cryptography roadmap?
Symmetric-key | Asymmetric-key | |
Confidentiality | 1. One-time pads 2. Block ciphers with chaining modes (e.g. AES-CBC) | 1. RSA encryption 2. ElGamal encryption |
Integrity, Authentication | 1. MACs (e.g. HMAC) | 1. Digital signatures (e.g. RSA signatures) |
EIGamal encryption is a way to do asymmetric-key encryption, which will satisfy confidentiality.
It’s actually quite similar with Diffie-Hellman key exchange.
How does EIGamal work?
-
The motivation is that Diffie-Hellman key exchange is great, but it can’t send message.
- Here’s the design of EIGamal:
- KeyGen():
- Bob generates private key b and public key B = gb mod p.
- Enc(gb, M):
-
Alice generates a random r and computes gr mod p.
-
Alice computes M × (gb)r mod p.
-
Alice sends C_1 = _gr mod p, C_2 = _M × (gb)r mod p.
-
- Dec(b, gr, M × gbr):
- Bob computes M × gbr × (gr)-b mod p to get M.
- KeyGen():
- Is EIGamal IND-CPA secure?
-
NO!! Cause there’s no randomness! The algorithm is deterministic.
-
How to fix? Add some random paddings to the message each time! And some other modification on EIGamal will also be helpful.
-
- Malleability: The adversary can very easily tamper with the message
- The adversary can manipulate C_1’ = _C_1, _C_2’ = 2 × _C_2 = 2 × _M × (gb)r to make it look like 2 × M was encrypted
RSA
It’s a popular way of asymmetric-key encryption. RSA stands for Rivest–Shamir–Adleman, these are the names of its inventors.
How does it work?
- KeyGen():
-
Randomly pick two big primes, p and q.
- Compute N = pq
- N is usually 2048 bits to 4096 bits long :)
- Choose e.
-
e must be relatively prime to (p-1)(q-1)
-
2 < e < (p-1)(q-1)
-
- Compute d = e-1 mod (p - 1)(q - 1).
- Use Extended Euclid’s algorithm
-
Public keys: N, e
- Private key: d
-
- Enc(e, N, M):
- Output Me mod N
- Dec(d, C):
- Output Cd mod n = (Me)d mod N
- The correctness of this algorithm is kind of trivial, so I’m not gonna include the proof here.
Is RSA IND-CPA secure?
-
Again, NO!! Since there’s no randomness!
-
An extreme case is to provide M0 = 0, it will definitely give 0 as ciphertext ;)
Also note that there’re some potential vulnerabilities:
- Sending the same message encrypted with different public keys also leaks information
-
m__e__a mod N__a, m__e__b mod n__b, m__e__c mod N__c
-
Small m and e leaks information
- e is usually small (~16 bits) and often constant (3, 17, 65537)
-
- Side channel: A poor implementation leaks information
-
The time it takes to decrypt a message depends on the message and the private key
-
This attack has been successfully used to break RSA encryption in OpenSSL
-
We would definitely like to add randomness to RSA encryption, since by the case of M = 0 we it’s clear that we don’t want to apply RSA encryption directly on the given plaintext. Also, randomness provides IND-CPA security and a fixed length message to encrypt will solve some side channel issues mentioned above.
So how to add randomness to RSA?
-
Padding!
-
In fact, there’s a scheme called OAEP(optimal asymmetric encryption padding).
- The main idea is to use hashing to make the text fixed length and random before performing encryption with RSA. Also, we should make sure such “padding + hashing” can is revertible for Alice and Bob given the keys for hashing functions(that sounds weird, how can you revert hashing? ;) )
- k_0 and _k_1 are constants defined in the standard, and _G and H are hash functions
- M can only be n - _k_0 - _k_1 bits long
-
G produces a (n - k_0)-bit hash, and _H produces a _k_0-bit hash
- Pad M with _k_0 0’s
- Idea: We should see 0’s here when unpadding, or else someone tampered with the message
-
Generate a random, k_1-bit string _r
-
Compute X = M 00…0 ⊕ G(r) -
Compute Y = r ⊕ H(X)
-
Result: X Y
- k_0 and _k_1 are constants defined in the standard, and _G and H are hash functions
- For unpadding:
-
Compute r = Y ⊕ H(X)
-
Compute M 00…0 = X ⊕ G(r) -
Verify that M 00…0 actually ends in _k_1 0’s - Error if not
-
Here’s an illustration of the OAEP algorithm above
Such structure is also called Feistel network. In practice, such structures can be concatenated for multiple times to make it harder to be attacked.
RSA with OAEP is called RSA-OAEP.
Hybrid encryption
- Issues with public-key encryption
-
Notice: We can only encrypt small messages because of the modulo operator(if your message is larger than the prime you use, it wraps!)
-
Notice: There is a lot of math, and computers are slow at math :)
-
Result: Asymmetric doesn’t work for large messages
-
- Hybrid encryption: Encrypt data under a randomly generated key K using symmetric encryption, and encrypt K using asymmetric encryption
- Benefit: Now we can encrypt large amounts of data quickly using symmetric encryption, and we still have the security of asymmetric encryption
Digital signatures
Remember the cryptography roadmap, we’re almost done!!
Symmetric-key | Asymmetric-key | |
Confidentiality | 1. One-time pads 2. Block ciphers with chaining modes (e.g. AES-CBC) | 1. RSA encryption 2. ElGamal encryption |
Integrity, Authentication | 1. MACs (e.g. HMAC) | 1. Digital signatures (e.g. RSA signatures) |
Digital signatures satisfy authentication, it’s basically letting everyone else with the public key to verify that some message is verified with signature(SK) of someone.
Algorithm and properties of digital signature:
- Three parts:
-
KeyGen() → PK, SK: Generate a public/private keypair, where PK is the verify (public) key, and SK is the signing (secret) key
-
Sign(SK, M) → sig: Sign the message M using the signing key SK to produce the signature sig
-
Verify(PK, M, sig) → {0, 1}: Verify the signature sig on message M using the verify key PK and output 1 if valid and 0 if invalid
-
- Properties
- Correctness: Verification should be successful for a signature generated over any message
- Verify(PK, M, Sign(SK, M)) = 1 for all PK, SK ← KeyGen() and M
-
Efficiency: Signing/verifying should be fast
- Security: EU-CPA, same as for MACs
- Correctness: Verification should be successful for a signature generated over any message
Digital signature with RSA
I’ll just copy the definition here.
- KeyGen():
- Same as RSA encryption:
-
Public key: N and e
-
Private key: d
-
- Same as RSA encryption:
- Sign(d, M):
- Compute H(M)d mod N
- Verify(e, N, M, sig):
- Verify that H(M) ≡ sige mod N
The H(M) above means hashing message M.
Even with the digital signatures introduced in lecture 9, we’re still not able to deal with the MITM attack yet! Since it’s impossible to verify the public key with the public key when you’re receiving it for the first time, that’s saying, there’s no way to give the public key a signature if you’re sending it out for the first time!
Also, note that public-key encryption has one advantage compared with Diffie-Hellman key exchange, that is public-key encryption doesn’t need Alice and Bob to be online at the same time, while Diffie-Hellman does.
So the next lecture will try to handle MITM attack, surely it’s not done by requiring Alice and Bob to be online and communicate quickly enough to avoid Mallory changing messages in the middle ;) (I came up with this crazy idea when seeing MITM for the first time).
Certificates
Recall how MITM(man in the middle) attack can be performed by Mallory.
Trust-on-first-use
We know that the key problem of MITM is the fact that when Bob’s public-key is sent to Alice for the first time, there’s no way for Alice to verify it. Suppose we can make sure that Bob’s public-key is sent securely to Alice, all the rest of MITM problem can be solved with digital signatures mentioned in lecture 9.
So trust-on-first-use is the way that try to settle this key problem with a simple idea:
- The first time Alice and Bob communicate, trust the public key that is used and warn the user if it changes in the future.
This is used in SSH and a couple of other protocols(If you have used SSH, you’ve definitely seen those warning messages when connecting to servers for the first time).
Trust-on-first-use is based on the threat model that attacks aren’t frequent, so assume that you aren’t being attacked the first time communicate :)
Certificates
If you don’t want to risk with trust-on-first-use, using certificates might be a good choice.
- Certificate: A signed endorsement of someone’s public key
- A certificate contains at least two things: The identity of the person, and the key
-
So the main idea is to ask the help from some authority who we can trust(the public key of such authority still needs to be communicated, but the trick is we hardcode those in hardware production time so that we’re sure about its security!)
- Abbreviated notation
-
Encryption under a public key PK: {“Message”}PK
-
Signing with a private key SK: {“Message”}SK-1
- Recall: A signed message must contain the message along with the signature; you can’t send the signature by itself!
-
- Scenario: Alice wants Bob’s public key. Alice trusts Charlie (PK__C, SK__C)
-
Charlie is our trust anchor
-
If we trust PKC, a certificate we would trust is {“Bob’s public key is PKB”}SKC-1
-
- Trust anchor:
-
Someone we implicitly trust.
-
The trust can come from the fact that public keys of them are hardcoded in our hardware by manufacturers.
-
The trust can also come from the fact that we have exchanged public keys with our friends offline with some absolutely secure methods.
-
The trusted directory(a way of assigning certificates)
- Idea: Make a central, trusted directory (TD) from whom you can fetch anybody’s public key.
-
The TD has a public/private keypair _PK_TD, _SK_TD.
-
The directory publishes _PK_TD so that everyone knows it (baked into computers, phones, OS, etc.)
- When you request Bob’s public key, the directory sends a certificate for Bob’s public key.
- {“Bob’s public key is PK__B”}_SK_TD-1
- If you trust the directory, then now you trust every public key from the directory.
-
- What do we have to trust?
-
We have received TD’s key correctly.
-
TD won’t sign a key without verifying the identity of the owner.
-
- Problems:
-
We have to keep this TD active all the time to handle requests from everyone.
-
If the TD fails, the whole encryption system fails(single point failure)
-
If the TD is no longer secure, we can hardly recover it, and everything based on that is no longer secure.
-
A single authority can’t handle the requests from so many people.
-
- Solutions:
- Hierarchical trust
- The roots of trust may delegate trust and signing power to other authorities
-
{“Carol Christ’s public key is _PK_CC, and I trust her to sign for UCB”}_SK_MD-1
-
{“John Canny’s public key is _PK_JC, and I trust him to sign for the CS department”}_SK_CC-1
-
{“Nick Weaver’s public key is _PK_NW (but I don’t trust him to sign for anyone else)”}_SK_JC-1
-
-
MD is still the root of trust (root certificate authority, or root CA)
-
CC and JC receive delegated trust (intermediate CAs)
- NW’s identity can be trusted
- The roots of trust may delegate trust and signing power to other authorities
- Multiple trust anchors
-
There are ~150 root CAs who are implicitly trusted by most devices
-
Public keys are hard-coded into operating systems and devices
-
- Hierarchical trust
- Some other concerns(IMPORTANT):
-
What happens if a certificate authority messes up and issues a bad certificate?
-
Here’re two possible ways:
- Expiration dates
-
Each certificate will have its expiration date, after that, clients have to ask the trust anchor for new certificates.
-
In this way, bad certificates will last for only a certain a mount of time.
-
That’s also saying, the user that’s releasing its public key to trust anchor has to update its public key once a while!
-
So there’re clearly tradeoffs between the frequency of public key renewal and the security it provides.
-
- Announcing revoked certificates
-
Periodically release a list of invalidated certificates.
-
Users must periodically down load such lists(called CRL, Certification Revocation List).
-
One of the drawback is that such lists can be huge, thus hard to send efficiently.
-
Another drawback is that if the certificate authority is unavailable, the mistake will never got recovered, unlike expiration dates which gives a relatively better(in some security sense) result when the authority fails.
-
- Expiration dates
-
A side note:
-
There’s something called “web of trust”
-
Modern public-key infrastructures are structured like trees.
- Originally, public-key infrastructures looked like graphs instead.
-
Everybody can issue certificates for anyone else.
-
Example: Alice signs Bob’s key. Bob signs Carol’s key. If Dave trusts Alice, he trusts Bob and Carol.
-
Benefit: You know the trust anchor personally (e.g. because you met them in-person, or because you signed their key).
-
Problem: Graphs get far more complex than trees!
-
- OpenPGP (Pretty Good Privacy) originally used the web of trust model.
-
Key-signing parties: meeting in-person to sign each other’s public keys.
-
It quickly proved to be a disaster.
-
- Trust anchors make public-key infrastructures much simpler!
Passwords
How should we store passwords?
-
Bad idea#1: Store them in plaintext :)
- Bad idea#2: Encrypt every user’s password before storing it
- Ok, so what if the attacker finds the way to decrypt? You’d then lose everything!
-
Main idea: We need a way to verify passwords without storing any information that would allow someone to recover the original password.
- We’ve met something that fits the job: Hashing!
-
It’s just so straightforward, we only need to hash users’ passwords and store the hash codes.
-
When verifying, we just hash the input and check if it matches the stored hash code.
-
What are the attacks for password hashing?
-
The main idea is brute force.
-
The attacker can produce a large amount of hash codes with common passwords, and see if there’re any matches in the target’s password hashing storage.
-
It’s also called dictionary attack.
-
A technique called rainbow table can help the attacker to accelerate the process of brute force.
What are the methods to defend dictionary attacks?
- Salted hashes
-
Add some unique “salt” for each user’s password before hashing them, so these salts can be stored in plaintext with the final hashing results, but make sure those salts should be randomly generated for each user.
-
In this way, the attacker has to generate rainbow tables according to each user’s salt, it brings the complexity of attack from O(M+N) to O(MN) :) here M is the number of common passwords, N is the number of users.
-
It’s basically manually enhancing users’ passwords.
-
- Slow hashes
-
Hash functions are designed to be fast, but what if we deliberately slow them down?
-
In this way, the brute force process will be slower for a constant factor of time, which might not that noticeable for common users, but significant for attackers.
-
Password-based key derivation function 2 (PBKDF2): A slow hash function
-
Setting: An underlying function that outputs random-looking bits (e.g. HMAC-SHA256).
-
Setting: The desired length of the output (n).
-
Setting: Iteration count (higher = hash is slower, lower = hash is faster).
-
Input: A password.
-
Input: A salt.
-
Output: A long, random-looking n-bit string derived from the password and salt.
-
Implementation: Basically computing HMAC 10,000 times ;)
-
-
- For online attacks, there can be rejection to frequently trying passwords, and there can also be limitation on how many times a user can input his/her password.
Case studies
Snake oil cryptography
“Snake oil” is basically something that doesn’t help much but being exaggerated by scamming advertisements.
What are some signs of snake oil in cryptography?
- Amazingly long key lengths
-
Once brute-forcing a key becomes astronomically hard, making it longer probably doesn’t provide extra security
-
The NSA is super paranoid, and even they don’t use >256-bit symmetric keys or >4,096-bit public keys
-
- New algorithms and crazy protocols
-
There is no reason to use a brand-new block cipher, hash algorithm, public-key algorithm, etc.
-
Existing protocols have been vetted by security experts for years: They’re widespread for a good reason!
-
New protocols probably means someone is trying to write their own crypto (and asking for trouble!)
-
- Fancy-sounding technical buzzwords
- Claims of inventing “new math” :)
- “One time pads”
-
Recall: One-time pads are secure if you never reuse the key
-
Recall: Secure one-time pads are highly impractical
-
Almost all schemes advertised as “one-time pads” probably aren’t true one-time pads
-
Wacky stream ciphers (often self-designed) are often advertised as “one-time pads”
-
- Rigged “cracking contests”
-
Advertising a secure scheme by challenging the public to break the scheme
-
The challenge is often “decrypt this message” with no context or structure
-
Example: Telegram offered a $300,000 prize in a contest to break their encryption, that’s probably because they’re not confident with their encryption system and they hope someone to fix it for them.
-
Here’re some links as examples listed in the slides:
-
https://arstechnica.com/information-technology/2019/08/company-accused-of-crypto-snake-oil-sues-black-hat-anonymous-detractors/
-
https://arstechnica.com/information-technology/2019/09/medicine-show-crown-sterling-demos-256-bit-rsa-key-cracking-at-private-event/
Nothing-up-my-sleeve-numbers
This is an interesting one.
- As we all know, cryptography uses a lot of constants, examples are:
-
Initial state for SHA.
-
Prime p and generator g in Diffie-Hellman.
-
Parameters for elliptic-curve cryptography (curve equations, points on the curve like P and Q).
-
ipad and opad in HMAC.
-
- Usually, any value could work, but the designer needed to choose some constant for the algorithm
- Example: In Diffie-Hellman, any large prime p and generator g works, but there’s a default p and g that everyone uses.
- So where do these default parameter values come from?
-
A proper way of practice is that rule makers should explicitly tell the public how and why those constants are chosen, so that everyone knows why it’s secure.
-
The main idea is to choose values with obvious human significance: The developer probably doesn’t have millions of values of obvious significance to brute-force and pick a value with a backdoor.
-
Seed a PRNG with your name.
-
The first few digits of π.
-
0x67452301, 0xefcdab89, … (SHA-1).
-
Fractional parts of the square/cube roots of prime numbers (SHA-2).
-
-
A case of improper practice: Dual_EC_DRBG
-
Dual_EC_DRBG is a PRNG published by the NSA behind the scenes.
- It relies on two public hard-coded parameters P and Q
-
P and Q are points on an elliptic curve.
-
If P and Q are related by Q = eP (analogous to Q = Pe mod n in discrete log), you can learn the internal state!
-
-
However, it’s horribly slow.
-
And it had subtle biases that shouldn’t exist in a secure PRNG: You could distinguish the upper bits from random.
-
Cryptographers spotted these flaws early on.
- Why would anyone use such a horrible PRNG 😳?
-
There’s a company called RSA.
-
NSA has donated $10 million to RSA.
-
In exchange, RSA Data Security implements Dual_EC in their RSA BSAFE library, and silently makes it the default PRNG.
-
With RSA Data Security’s support, Dual_EC became a NIST standard.
- Used in other products.
-
- 2013: Whistleblower Edward Snowden reveals classified NSA information
-
The New York Times vaguely mentions a crypto talk given by Microsoft people.
-
Everybody quickly realized this referred to a backdoor the NSA inserted into Dual_EC.
-
Backdoor: An attack inserted by an organization (e.g. NSA) so that they, but nobody else, can attack the system.
-
- With the backdoor, it’s possible to predict both future and past PRNG outputs.
- No rollback resistance, unlike HMAC-DRBG.
-
Juniper Networks used Dual_EC in their virtual private networks (VPNs).
- Juniper claims their version of Dual_EC was safe from the NSA backdoor.
- Juniper selected a different public parameter (Q) than the NSA’s public parameter.
- Later: Juniper is hacked.
- The hacker changed Dual_EC’s public parameter (Q) to give themselves a backdoor!