CS 161 Notes Lec 19-20

Course website: https://su21.cs161.org/
Course textbook: https://textbook.cs161.org/

Overview

Lecture 19: TLS

Lecture 20: DNS

Domain Name System (DNS)
1. DNS name servers
2. Steps of a DNS lookup
3. Stub resolvers and recursive resolvers
4. DNS message format
5. DNS records (included in the message format section)
6. DNS lookup walkthrough (Demo!!!)
DNS Security
1. Cache poisoning attacks
2. Kaminsky attack

TLS

TLS stands for Transport Layer Security. TLS is a protocol for creating a secure communication channel over the Internet.

TLS replaces SSL (Secure Sockets Layer), which is an older version of the protocol.

TLS is built on top of TCP, so it is often described as sitting at layer 4.5, between the transport layer and the application layer.

Recall that TCP is not secure against MITM attackers and has several other vulnerabilities, which we said would be handled by higher-layer protocols. TLS is finally that “higher-layer protocol”!

What are goals of TLS?

Confidentiality: Ensure that attackers cannot read your traffic.
Integrity: Ensure that attackers cannot tamper with your traffic.
- Prevent replay attacks.
  - The attacker records encrypted traffic and then replays it to the server.
  - Example: Replaying a packet that sends “Pay $10 to Mallory”.
Authenticity: Make sure you’re talking to the legitimate server.
- Defend against an attacker impersonating the server.

With the goals in mind, let’s see how a TLS connection is established. This also involves a handshake process.

TLS handshake

Step 0: Assume an underlying TCP connection has already been established (recall that TLS runs on top of TCP).
Step 1.1: The client sends ClientHello with
- A 256-bit random number RB (“client random”).
- A list of supported cryptographic algorithms.
Step 1.2: The server sends ServerHello with
- A 256-bit random number RS (“server random”).
- The algorithms to use (chosen from the client’s list).
RB and RS prevent replay attacks
- RB and RS are randomly chosen for every handshake.
- This makes it overwhelmingly unlikely that two handshakes will ever be exactly identical.
Step 3: The server sends its certificate.
- Recall certificates: The server’s identity and public key, signed by a trusted certificate authority.
- If you’ve forgotten how certificates work, review the earlier notes on certificates ;) .
The client validates the certificate.
- Verify the signature in the certificate.
The client now knows the server’s public key.
- The client is not yet sure that they are talking to the legitimate server (not an impersonator).
- Recall: Certificates are public. Anyone can provide a certificate for anybody.
- Thus there should be a way to verify that the server actually holds the private key corresponding to its public key.

Step 4: To verify that the server owns the private key, and to give the client and server a shared secret, the two sides establish a premaster secret (PS). This can be done in two ways.
- with RSA:
  1. The client randomly generates a premaster secret (PS).
  2. The client encrypts PS with the server’s public key and sends it to the server.
    - The client knows the server’s public key from the certificate.
  3. The server decrypts the premaster secret.
  4. The client and server now share a secret.
    - Recall RSA encryption: Nobody except the legitimate server can decrypt the premaster secret.
  5. Proves that the server owns the private key (otherwise, it could not decrypt PS).
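The RSA variant above can be sketched with textbook RSA on tiny numbers. This is only a toy under stated assumptions: the primes, the exponent, and the function names are made up for illustration, and real TLS uses padded RSA with keys of 2048 bits or more.

```python
import secrets

# Toy server key pair (hypothetical tiny primes, for illustration only).
p, q = 61, 53
n = p * q                            # public modulus
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (needs Python 3.8+)

def client_encrypt_ps(ps: int) -> int:
    """Client encrypts the premaster secret with the server's public key (n, e)."""
    return pow(ps, e, n)

def server_decrypt_ps(ciphertext: int) -> int:
    """Only the holder of the private exponent d can recover PS."""
    return pow(ciphertext, d, n)

ps = secrets.randbelow(n - 2) + 2    # client picks a random PS in [2, n - 1]
ciphertext = client_encrypt_ps(ps)   # this is what travels over the network
recovered = server_decrypt_ps(ciphertext)
print(recovered == ps)
```

An impersonator who only has the certificate knows n and e but not d, so it cannot compute `recovered` and will fail later in the handshake.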

Step 4 (the DHE way):
1. The server generates a secret a and computes g^a mod p.
2. The server signs g^a mod p with its private key and sends the value and the signature.
3. The client verifies the signature.
  - Proves that the server owns the private key.
4. The client generates a secret b, computes g^b mod p, and sends it to the server.
5. The client and server now share a premaster secret: g^(ab) mod p.
6. Recall Diffie-Hellman: an attacker who sees only g^a mod p and g^b mod p cannot compute g^(ab) mod p.
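Here is a minimal sketch of the Diffie-Hellman part of the DHE exchange. The modulus and base are assumptions picked for illustration (a Mersenne prime and a small base, not a standardized TLS group), and the server’s signature on g^a mod p is left out.

```python
import secrets

# Toy public parameters (assumed for illustration, not a real TLS group).
p = 2**127 - 1      # a known Mersenne prime
g = 3

a = secrets.randbelow(p - 3) + 2   # server's secret, never sent
b = secrets.randbelow(p - 3) + 2   # client's secret, never sent

A = pow(g, a, p)    # server sends g^a mod p (signed, in real TLS)
B = pow(g, b, p)    # client sends g^b mod p

server_ps = pow(B, a, p)   # (g^b)^a mod p
client_ps = pow(A, b, p)   # (g^a)^b mod p
print(server_ps == client_ps)
```

Both sides end up with g^(ab) mod p, while an eavesdropper only ever sees `A` and `B`.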

Step 5: The server and client each derive the symmetric keys. The keys themselves are never sent over the network.
- The server and client each derive symmetric keys from RB, RS, and PS.
  - Usually derived by seeding a PRNG with the three values.
  - Changing any of the values results in different symmetric keys.
- Four symmetric keys are derived
  - CB: For encrypting client-to-server messages.
  - CS: For encrypting server-to-client messages.
  - IB: For MACing client-to-server messages.
  - IS: For MACing server-to-client messages.
  - Note: Both client and server know all four keys.
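The key derivation can be sketched as follows. This is not the actual TLS PRF: it is an assumed stand-in that uses HMAC-SHA256 keyed with PS, with a different label per key, just to show that the same three inputs always yield the same four keys. The function name `derive_keys` is hypothetical.

```python
import hashlib
import hmac

def derive_keys(rb: bytes, rs: bytes, ps: bytes) -> dict:
    """Derive the four symmetric keys from RB, RS, and PS.

    Illustrative only: one HMAC-SHA256 call per key, keyed with PS.
    """
    seed = rb + rs
    return {
        label: hmac.new(ps, seed + label.encode(), hashlib.sha256).digest()
        for label in ("CB", "CS", "IB", "IS")
    }

# Client and server run the same function on the same inputs.
rb, rs, ps = b"\x01" * 32, b"\x02" * 32, b"\x03" * 48
client_keys = derive_keys(rb, rs, ps)
server_keys = derive_keys(rb, rs, ps)
print(client_keys == server_keys)
```

Changing any one of the three inputs changes all four keys, which is what later ties the keys to a single handshake.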
Step 6: Exchange MACs
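- You may wonder what happens if an earlier handshake message was tampered with.
- You may wonder what happens if the server doesn’t really know the private key.
- Exchanging MACs handles both cases!
- The server and client exchange MACs over all the handshake messages sent so far. If any message was modified, or if the server could not derive the right keys, the MACs will not verify.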
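A rough sketch of the MAC exchange, assuming HMAC-SHA256 as the MAC and a simple length-prefixed encoding of the transcript (both are assumptions, not the real TLS Finished computation):

```python
import hashlib
import hmac
from typing import Sequence

def transcript_mac(key: bytes, handshake_messages: Sequence[bytes]) -> bytes:
    """MAC every handshake message seen so far, in order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for msg in handshake_messages:
        # Length-prefix each message so message boundaries are unambiguous.
        mac.update(len(msg).to_bytes(4, "big"))
        mac.update(msg)
    return mac.digest()

key = b"k" * 32   # stands in for one of the derived integrity keys
seen_by_client = [b"ClientHello", b"ServerHello", b"Certificate"]
seen_by_server = [b"ClientHello(weak ciphers only)", b"ServerHello", b"Certificate"]

client_tag = transcript_mac(key, seen_by_client)
server_tag = transcript_mac(key, seen_by_server)
print(hmac.compare_digest(client_tag, server_tag))
```

If a MITM rewrote the ClientHello in transit, the two transcripts differ, the tags do not match, and the handshake is aborted.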

Step 7: Now the client can communicate securely with the server.
- Encrypted and MAC’d.
- Note: For legacy reasons, TLS uses MAC-then-encrypt, even though encrypt-then-MAC is generally considered better (see the Lucky 13 attack, which earlier lecture posts also mentioned).

(Figure: overview of the complete TLS handshake.)

Some TLS issues to consider:

How can we be sure that network attackers can’t read or tamper with our messages?
- The attacker doesn’t know PS.
  - RSA: PS was encrypted with the server’s public key.
  - DHE: An attacker cannot learn the Diffie-Hellman secret.
- The symmetric keys are derived from PS.
  - The attacker doesn’t know the symmetric keys used to encrypt and MAC messages.
- Encryption and MACs provide confidentiality and integrity.
How can we be sure that the attacker hasn’t replayed old messages from a past TLS connection?
- Every handshake uses a different RB and RS.
- The symmetric keys are derived from RB and RS.
  - The symmetric keys are different for every connection.
How can we be sure that the attacker hasn’t replayed old messages from the current TLS connection?
- Add record numbers in the encrypted TLS message.
  - Every message uses a unique record number.
  - If the attacker replays a message, the record number will be repeated.
- TLS record numbers are not TCP sequence numbers.
  - Record numbers are encrypted and used for security.
  - Sequence numbers are unencrypted and used for correctness, in the layer below.
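The record-number idea can be sketched like this. Encryption is omitted to keep the example short, the MAC is assumed to be HMAC-SHA256, and the class and function names are hypothetical.

```python
import hashlib
import hmac

def make_record(mac_key: bytes, record_number: int, payload: bytes):
    """Sender side: bind the record number to the payload with a MAC."""
    data = record_number.to_bytes(8, "big") + payload
    tag = hmac.new(mac_key, data, hashlib.sha256).digest()
    return record_number, payload, tag

class RecordReceiver:
    """Receiver side: accept each record number exactly once, in order."""

    def __init__(self, mac_key: bytes):
        self.mac_key = mac_key
        self.next_expected = 0

    def accept(self, record_number: int, payload: bytes, tag: bytes) -> bool:
        data = record_number.to_bytes(8, "big") + payload
        expected_tag = hmac.new(self.mac_key, data, hashlib.sha256).digest()
        if not hmac.compare_digest(expected_tag, tag):
            return False  # payload or record number was tampered with
        if record_number != self.next_expected:
            return False  # replayed, reordered, or dropped record
        self.next_expected += 1
        return True

key = b"i" * 32
receiver = RecordReceiver(key)
record = make_record(key, 0, b"Pay $10 to Mallory")
print(receiver.accept(*record))   # first delivery
print(receiver.accept(*record))   # attacker replays the same bytes
```

The replayed record still carries a valid MAC, but its record number has already been used, so the receiver rejects it.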
How do we ensure forward secrecy?
- Recall that forward secrecy means that an attacker who compromises a long-term key in the future cannot use it to decrypt messages they recorded in the past.
- For RSA TLS, sadly, there’s no forward secrecy.
  - The adversary can record RB, RS, and the encrypted PS.
  - If the adversary later compromises the server’s private key, they can decrypt PS and derive the keys!
- For DHE TLS, forward secrecy is guaranteed.
  - The Diffie-Hellman secrets a and b and the PS are deleted after the TLS session is over, so the adversary can’t learn the keys, even if they later compromise the server’s private key.
  - Note: Because the server’s Diffie-Hellman component is signed, the adversary can’t MITM the Diffie-Hellman exchange without the server’s private key.

As we saw, DHE TLS is generally better than RSA TLS. In addition, MAC-then-encrypt exposes the record layer to attacks such as Lucky 13.

These problems motivated TLS 1.3, which came out in 2018 with the following features:

RSA no longer supported (only DHE).
- Guarantees forward secrecy.
Performance optimization: The client sends g^b mod p in ClientHello.
- If the server agrees to use DHE, the server sends g^a mod p (with signature) in ServerHello.
- Potentially saves two messages later in the handshake.
Only supports AEAD mode encryption.
- Recall AEAD (authenticated encryption with associated data): a block cipher mode that guarantees confidentiality and integrity at the same time.
- Eliminates attacks associated with the insecure MAC-then-encrypt pattern.

TLS in practice

Firstly, let’s consider the efficiency of TLS.

Public-key cryptography: Minor costs.
- Client and server must perform Diffie-Hellman key exchange or RSA encryption/decryption.
Symmetric-key cryptography: Effectively free.
- Modern hardware has dedicated support for symmetric-key cryptography.
- Performance impact is negligible.
Latency: Extra waiting time before the first message.
- Must perform the entire TLS handshake before sending the first message.

TLS provides end-to-end security: even if every network hop between the client and server is untrusted, TLS still protects the confidentiality, integrity, and authenticity of the traffic.

However, there are two things that TLS can’t provide:

Anonymity: Anonymity means communicating without others learning the identities of the client and server. An attacker can always figure out who is communicating over TLS (the IP addresses are visible, and the hello messages in the handshake are sent in plaintext).
Availability: An attacker can stop a TLS connection.
- A MITM attacker can drop encrypted TLS packets.
- An on-path attacker can still perform RST injection (forging the TCP header) to abort the underlying TCP connection.

TLS provides services to higher layers (the application layer).

Remember HTTPS? It’s just HTTP run over TLS (plain HTTP runs directly over TCP).

Also note that there are other application-layer protocols besides HTTP. For example, email protocols can use the STARTTLS command to secure their communications with TLS.

Speaking of HTTPS, let’s look at some attacks related to it ;) .

SSL stripping attacks

By default, browsers usually use HTTP. So when you type google.com as a URL, the browser automatically expands it to http://www.google.com. That’s awkward, since only HTTPS provides encrypted communication. To compensate, websites usually redirect HTTP requests to HTTPS.

However, even though the communication is encrypted after the redirect to HTTPS, the very first HTTP request is still unprotected, and the redirect itself can’t fix that!

SSL stripping is an attack that exploits this gap.

The attacker will force the user to use HTTP instead of HTTPS.
A MITM attacker intercepts the first HTTP request and creates their own HTTPS connection to the server.
The user never receives a redirect to HTTPS, so the browser assumes the site wants plain HTTP.

To prevent SSL stripping, a server can send the HTTP Strict-Transport-Security (HSTS) header, which tells browsers to access that server only over HTTPS.
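A simplified sketch of what a browser does with HSTS, assuming it has already seen a header such as `Strict-Transport-Security: max-age=31536000; includeSubDomains` over HTTPS once. The store and the function names are hypothetical, and only the `max-age` directive is handled.

```python
from urllib.parse import urlsplit, urlunsplit

hsts_hosts = {}   # host -> time (in seconds) at which the HSTS entry expires

def record_hsts(host: str, header_value: str, now: int) -> None:
    """Remember that `host` asked to be reached over HTTPS only."""
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            hsts_hosts[host] = now + int(value)

def upgrade(url: str, now: int) -> str:
    """Rewrite http:// to https:// for hosts with an unexpired HSTS entry."""
    parts = urlsplit(url)
    if parts.scheme == "http" and hsts_hosts.get(parts.hostname, 0) > now:
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)

record_hsts("example.com", "max-age=31536000; includeSubDomains", now=1000)
print(upgrade("http://example.com/login", now=2000))
```

Because the upgrade happens inside the browser before any packet is sent, there is no plaintext request left for the MITM to intercept. The very first visit to a site is still unprotected, since the browser has not seen the header yet.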

PRNG Sabotage

Here’s an attack that targets the TLS handshake.

Consider TLS with Diffie-Hellman,
- An attacker who learns the DHE secret a can derive the PS g^(ab) mod p (recall g^b mod p is sent over the channel).
- An attacker who knows the PS can derive the symmetric keys (recall RB and RS are sent over the channel).
Consider using a PRNG to generate all random values,
- This includes the server DHE secret a and the client DHE secret b.
What if the PRNG is sabotaged and doesn’t have rollback resistance?
- Example of sabotage: Dual_EC DRBG (remember this from the lecture 10 case studies?) with knowledge of the secret used to create the generator.
- Example of sabotage: ANSI X9.31: An AES-based PRNG with a secret key.
Once the PRNG is broken, an attacker who can predict its outputs can recover the DHE secrets, and TLS is completely broken.

Trust issues

Recall that we need a certificate authority (CA) to certify the public key that the server sends at the beginning of the TLS handshake.

The certificate is necessary. Without it, an attacker could pretend to be the server (spoofing) and send their own public key to the user to set up the TLS connection (the attacker certainly knows the matching private key in this case).

So the certificate authority is the trust anchor of the web. Once the server sends the browser a certificate containing the server’s public key and domain name, signed by the CA:

The browser checks the domain name in the URL matches the domain name in the certificate.
The certificate authority’s public key is hardwired into the browser (trust anchor).
- Modern browsers implicitly trust 100–200 root certificate authorities.
The browser uses the CA’s public key to verify the signature.

What if the CA is not recognized by the browser? The browser shows the user a warning, and depending on the user’s choice, the TLS connection is either established or abandoned.

Also note that some revocation methods should be available for CAs.

What if an attacker steals a server’s private key?
- The certificate with the corresponding public key should no longer be trusted.
- TLS certificates have an expiration date, but they often don’t expire for years 😂 .
Solution: Certificate revocation lists.
- The CA occasionally sends out lists of certificates that are no longer valid.
- The browser occasionally downloads the lists.
Solution: Online Certificate Status Protocol (OCSP).
- The browser queries the CA whether a given certificate is still valid.
- The CA responds either “good” or “revoked,” signed with the CA’s private key.
A CA might issue a malicious certificate (e.g. stating that attacker’s public key belongs to Google) because:
- The CA is hacked.
- An attacker pays the CA to issue a malicious certificate.

There’s a famous CA called Let’s Encrypt, check it out if you’re interested :) .

Domain Name System (DNS)

Recall that in lecture 12 we introduced the domain part of a URL. We also know that the Internet uses IP addresses to communicate. So how do we map domains to IP addresses?

We use something called DNS, which stands for Domain Name System.

DNS is an Internet protocol for translating human-readable domain names to IP addresses.
When you want to send a packet to a certain domain (say, google.com), your computer performs a DNS lookup to translate the domain name to an IP address, then sends the packet to that IP address.

DNS name servers

Name servers are servers on the Internet that are responsible for answering DNS requests.
- They have their own domains and IP addresses, too (we’ll see later how to find these).
When performing a DNS lookup, the computer sends a DNS query to a DNS name server. Hopefully, the name server answers with a DNS response containing the IP address of the desired domain name.
A single server can’t possibly handle this task for the whole Internet, so there are many DNS name servers.

These DNS name servers are organized in a hierarchy, with the root name server at the top.

Steps of a DNS lookup

The main idea:

No single DNS name server stores every mapping.
A DNS lookup always starts at the root DNS name server.
Whenever a DNS name server doesn’t know the answer to a DNS query, it responds with the information (domain name, IP address) of one or more of its children.
So every response is either the answer (an IP address) or a referral to name servers lower in the hierarchy that might know the mapping.
When the user receives the DNS response, they either get the IP address or learn who to ask next, so eventually the IP address is obtained.
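The lookup loop described above can be simulated with a toy hierarchy. The IP addresses are the ones used in the dig walkthrough later in these notes. The zone names attached to each server and the table layout are assumptions made for this sketch, and no real network traffic is sent.

```python
# Toy name server data: each server either answers or refers to a child.
SERVERS = {
    "198.41.0.4":   {"answers": {}, "referrals": {"edu": "192.33.14.30"}},
    "192.33.14.30": {"answers": {}, "referrals": {"berkeley.edu": "128.32.136.3"}},
    "128.32.136.3": {"answers": {"eecs.berkeley.edu": "23.185.0.1"}, "referrals": {}},
}

def query(server_ip: str, name: str):
    """Ask one name server: returns an answer, a referral, or nothing."""
    server = SERVERS[server_ip]
    if name in server["answers"]:
        return "answer", server["answers"][name]
    for zone, child_ip in server["referrals"].items():
        if name == zone or name.endswith("." + zone):
            return "referral", child_ip
    return "unknown", None

def lookup(name: str, root_ip: str = "198.41.0.4"):
    """Start at the root and follow referrals until an answer arrives."""
    server_ip, asked = root_ip, [root_ip]
    while True:
        kind, value = query(server_ip, name)
        if kind == "answer":
            return value, asked
        if kind != "referral":
            return None, asked
        server_ip = value
        asked.append(server_ip)

print(lookup("eecs.berkeley.edu"))
```

Each step either ends the loop with an IP address or moves one level down the hierarchy, which is why the lookup terminates.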

(Slides: a brief example of the DNS lookup process.)

Stub resolvers and recursive resolvers

In practice, your computer doesn’t perform the full DNS lookup itself. Instead, a stub resolver on your computer forwards each query to a recursive resolver, which is usually run by your Internet service provider.

There is usually one recursive resolver per local network. It receives queries from the stub resolvers on each computer in the network and performs the DNS lookups on their behalf.

(Figure: a DNS request passing from the stub resolver through the recursive resolver to the name servers.)

One advantage is that the recursive resolver can cache common domain name to IP address mappings for its network, so that some requests no longer need to be made.
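A minimal sketch of such a cache, assuming each record comes with a time to live in seconds (the TTL metadata covered in the message format section). The class name is hypothetical, and time is passed in explicitly so that the behavior is easy to follow.

```python
class ResolverCache:
    """Toy cache for a recursive resolver: name -> (ip, expiry time)."""

    def __init__(self):
        self._records = {}

    def put(self, name: str, ip: str, ttl: int, now: int) -> None:
        self._records[name] = (ip, now + ttl)

    def get(self, name: str, now: int):
        entry = self._records.get(name)
        if entry is None:
            return None              # never cached: a full lookup is needed
        ip, expires_at = entry
        if now >= expires_at:
            del self._records[name]  # expired: forget it and look up again
            return None
        return ip

cache = ResolverCache()
cache.put("eecs.berkeley.edu", "23.185.0.1", ttl=300, now=0)
print(cache.get("eecs.berkeley.edu", now=100))
print(cache.get("eecs.berkeley.edu", now=400))
```

This is also why cache poisoning is so damaging: a bad record that gets into the cache is served to every computer behind the resolver until it expires.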

DNS message format

Before introducing the format of DNS messages, let’s consider which transport protocol DNS uses.

Between TCP and UDP, some might expect TCP for a task that matters so much for security. However, DNS actually uses UDP.

The reason is that DNS is designed to be lightweight and fast, since domain names must be translated to IP addresses constantly, for example for the HTTP requests we make on the web.

Now let’s introduce the DNS message format.

Source port (16 bits): Chosen by the client.
- Can be randomized for security, as we’ll see later.
Destination port (16 bits): Usually 53.
- DNS name servers answer requests on Port 53.
Checksum: Code to check the UDP payload was not corrupted in transit.
Length: Length of the UDP payload.
ID number (16 bits): Used to associate queries with responses.
- Client picks an ID number in the query.
- Name server uses the same ID number in the response.
- Should be random for security, as we’ll see later.
Counts: The number of records of each type in the DNS payload.
Resource records(RR):
- Each RR is a name-value pair with a type.
  - A (address) type record: Maps a domain name to an IPv4 address.
  - AAAA type record: Maps a domain name to an IPv6 address :)
  - CNAME type record: Maps one domain name to another domain name. Used for aliases.
  - NS (name server) type records: Designates another DNS server to handle a domain(recall that when a DNS name server can’t answer your question, it will tell you the servers that might know the answer).
  - MX type record: Used for mail servers.
  - There’re also some other types of RR.
- Storing information in the form of RR helps the recursive resolver to cache them easily.
- RRs are sorted into four sections:
  - Question section: What is being asked.
    - Included in both requests and responses.
    - Usually an A type record with the domain being looked up.
  - Answer section: A direct response to the question.
    - Empty in requests.
    - Used if the name server responds with the answer.
    - Usually an A type record with the IP address of the domain being looked up.
  - Authority section: A delegation of authority for the question.
    - Empty in requests.
    - Used to direct the resolver to the next name server.
    - Usually an NS type record with the zone and domain of the child name server.
  - Additional section: Additional information to help with the response, sometimes called glue records.
    - Empty in requests.
    - Provides helpful, non-authoritative records for domains.
    - Usually an A type record with the domain and IP address of the child name server (since the NS record provides the child name server as a domain).
- Each RR also contains some metadata (e.g. the TTL, or time to live, which specifies how long the record may be cached).
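To make the layout concrete, here is a sketch that builds the DNS payload of a query (the part that goes inside the UDP packet) with Python’s `struct` module. The function names are hypothetical. The field order (ID, flags, then the four counts) and the length-prefixed name encoding follow the standard DNS wire format.

```python
import struct

def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        out += bytes([len(encoded)]) + encoded
    return out + b"\x00"

def build_query(name: str, query_id: int, recursion_desired: bool = False) -> bytes:
    """Build a DNS query asking for the A record of `name`."""
    flags = 0x0100 if recursion_desired else 0x0000   # RD bit only
    # ID, flags, then counts: 1 question, 0 answer, 0 authority, 0 additional.
    header = struct.pack(">HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack(">HH", 1, 1)  # type A, class IN
    return header + question

packet = build_query("eecs.berkeley.edu", query_id=0x1234)
print(len(packet), packet[:2].hex())
```

The 16-bit ID sits in the first two bytes, and it is exactly this field that an off-path attacker has to guess when spoofing a response.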

DNS lookup walkthrough (Demo!!!)

Here’s an exciting demo of how a DNS lookup is performed. We use a command called dig to walk through the process (on a MacBook).

The command is something like this

dig +norecurse eecs.berkeley.edu @198.41.0.4

Strictly speaking, the +norecurse flag is not necessary here, but it was included in the lecture demo, so I’ll keep it.

1). Start with the IP address 198.41.0.4, which we looked up online and know belongs to a root DNS name server. Our goal is to get the IP address of eecs.berkeley.edu.

After executing the command, we get a response with several sections.

In the field names of this response, we can see many things that we have learned about, including “id”, “QUESTION SECTION”, “AUTHORITY SECTION” and so on.

The response says that no answer is available, but the AUTHORITY SECTION contains the names of the servers we can ask next, and their IP addresses are in the ADDITIONAL SECTION.

2). Ask 192.33.14.30, the first IP address listed in the ADDITIONAL SECTION.

This response is similar to the previous one, so let’s keep asking.

3). Try to ask 128.32.136.3.

🥳 We got an answer this time! So we know that the IP address of eecs.berkeley.edu is 23.185.0.1.

Try this out by yourself!

DNS security

Most attacks targeting DNS are related to caching (recall that recursive resolvers cache the resource records they receive).

The main idea is to trick the resolver into storing malicious domain name to IP address mappings, so that users are sent to malicious sites that they believe are legitimate.

Cache poisoning attacks

The idea is straightforward: the attacker tries to return malicious records to the victim. The possible approaches are listed below.

Malicious name server
- If a name server is malicious, it can respond with whatever malicious IP address it likes.
- Since DNS messages have an additional section, the malicious name server can lie not only about the domain names victims ask about, but also about domain names they didn’t ask about!
- For example, when a query for eecs.berkeley.edu is sent to a malicious name server, it can respond with a malicious IP address and include an additional record claiming that www.google.com maps to some malicious IP address. The victim will then believe that Google has that malicious IP address, which is really dangerous!
- The defense against this is called bailiwick checking: resolvers only accept records that are in the name server’s zone.
  - Example: The berkeley.edu name server can provide a record for eecs.berkeley.edu, but not mit.edu.
  - Example: The .edu name server can provide a record for mit.edu and berkeley.edu, but not google.com.
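Bailiwick checking boils down to a suffix test on domain labels. A minimal sketch, with a hypothetical function name and the assumption that an empty zone string stands for the root:

```python
def in_bailiwick(record_name: str, server_zone: str) -> bool:
    """Return True if `record_name` falls inside the name server's zone."""
    record_name = record_name.rstrip(".").lower()
    server_zone = server_zone.rstrip(".").lower()
    if server_zone == "":
        return True   # the root zone covers every name
    # Compare whole labels, so "notberkeley.edu" is not inside "berkeley.edu".
    return record_name == server_zone or record_name.endswith("." + server_zone)

print(in_bailiwick("eecs.berkeley.edu", "berkeley.edu"))   # accepted
print(in_bailiwick("www.google.com", "berkeley.edu"))      # rejected
```

A resolver would run this check on every record in a response and silently drop the ones that fail, so the extra www.google.com record from the example above never reaches the cache.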
Instead of controlling name server, the attacker can also focus on the DNS messages.
- Man-in-the-middle (MITM) attackers
  - DNS is not secure against MITM attackers(since no encryption has been involved).
  - MITM attackers can poison the cache by adding, removing, or changing any record in the DNS response.
- On-path attackers
  - DNS is not secure against on-path attackers.
  - On-path attackers can poison the cache by sending a spoofed response.
    - If the spoofed response arrives before the legitimate response, the victim will cache the attacker’s malicious records.
    - The on-path attacker can see every field in the unencrypted DNS request. Nothing to guess!
- Off-path attackers
  - The off-path attacker needs to guess the ID field to spoof a response.
    - Recall that if the ID in the response doesn’t match the ID in the request, the resolver won’t accept the response.
  - If the ID number is randomly generated:
    - Probability of guessing correctly = 1/2^16
    - Recall: The ID number is 16 bits long.
    - Requires approximately 65,000 tries (2^16 = 65,536) to successfully send a spoofed packet.
    - This space is too small, so it looks easy for attackers to guess the ID number.
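To put numbers on this, here is a small calculation under one simplifying assumption: every spoofed response is an independent guess at a fresh, uniformly random 16-bit ID.

```python
ID_BITS = 16

def success_probability(attempts: int, id_bits: int = ID_BITS) -> float:
    """Chance that at least one of `attempts` independent guesses hits the ID."""
    miss = 1 - 1 / 2 ** id_bits
    return 1 - miss ** attempts

for attempts in (1, 1_000, 65_536, 300_000):
    print(attempts, round(success_probability(attempts), 4))
```

A single guess almost never works, but the odds climb quickly: after 2^16 attempts the attacker succeeds with probability of roughly 63%, and after a few hundred thousand attempts success is close to certain.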

Kaminsky attack

We just said that brute-forcing the ID number is easy, but it’s actually not that simple.

Imagine you’re an off-path attacker. Your plan is to trick the victim into making a request for google.com, and before the legitimate DNS name server responds with the true IP address, you send the victim a malicious IP address with an ID field that happens to be correct. Since the ID field only takes about 65,000 tries to brute-force, repeating this process should eventually make your attack succeed.

However, the truth is that you only get one try! If your malicious DNS response fails to arrive before the real one, or its ID field is wrong, the true DNS response is accepted and cached by the resolver. After that, you have no more chances until the RR expires.

Sounds great: now we know that brute-forcing the ID field is not that much of a threat :)

However, that’s not the whole truth ;) Dan Kaminsky (he passed away on April 23, 2021, RIP), a security researcher, noticed that DNS clients would cache additional glue records as if they were authoritative answers, even though they aren’t! So the attacker can trick victims into making requests for nonexistent subdomains. Even when a guess fails, the right answer for www.google.com is not cached, so there is no limit on the number of brute-force tries!

Now the attacker can make many tries at once:
- The attacker includes many resources in a website that point to nonexistent subdomains (e.g. fake1.google.com, fake2.google.com, and so on).
- For each one, the client makes a request for the domain name.
- The attacker’s spoofed response contains:

Authority: fake1.google.com.  172800  IN  NS  www.google.com.

Additional: www.google.com.    172800  IN  A   6.6.6.6

The client now caches the record for www.google.com, and the cache is poisoned!

Isn’t that clever? ;)

There are two defenses:

Randomize the source port of the DNS query.
- The attacker must guess the destination port of the response (the source port of the query) in addition to the query ID.
- This adds 16 bits to guess, for a total of 32 bits (2^32 possibilities).
Don’t cache glue records as part of DNS lookups.
- Glue records are still necessary for the lookup itself, since NS records are given in terms of domain names, not IP addresses.
- If you want to cache them, you can perform a separate recursive DNS lookup to validate the glue record authoritatively.
- One issue with this defense is that not all DNS software implements it.
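The effect of source port randomization on the attacker’s search space is simple arithmetic, using the 16-bit widths of the ID and source port fields from the message format section. The helper name is hypothetical, and the sketch assumes every port value is usable, which slightly overstates the real space.

```python
ID_BITS = 16     # DNS query ID
PORT_BITS = 16   # UDP source port

def search_space(*field_bits: int) -> int:
    """Number of combinations an off-path attacker must consider."""
    return 2 ** sum(field_bits)

id_only = search_space(ID_BITS)
id_and_port = search_space(ID_BITS, PORT_BITS)
print(id_only, id_and_port, id_and_port // id_only)
```

Guessing both fields makes the attack about 65,536 times harder than guessing the ID alone. That slows the Kaminsky attack down considerably, although it does not make it impossible.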

Written on May 10, 2023
