CS 161 Notes Lec 25-26
-
Course website: https://su21.cs161.org/
-
Course textbook: https://textbook.cs161.org/
Overview
Lecture 25: Anonymity and Tor
Lecture 26: Signal Protocol
Anonymity
So far, we’ve introduced security involving integrity and confidentiality. However, as most of the network protocols are based on things like IP, anonymity of communication is seldom provided in those secure protocols, since it’s just too difficult.
On the other hand, attackers are more usual to remain anonymous. Based on how some of that works(like DDoS), we know that a main strategy for anonymity can be letting others to send messages for you.
Such idea naturally brings us to proxy.
Proxies and VPNs
In computer networking, a proxy server is a server application that acts as an intermediary between a client requesting a resource and the server providing that resource.
- Alice sends the message and the recipient (Bob) to the proxy, and the proxy forwards the message to Bob.
- The recipient’s name (and optionally the message) is encrypted, so an eavesdropper does not see a packet with both Alice and Bob’s identities in plaintext.
- Bob receives the message from the proxy, with no indication it came from Alice.
Proxy is a little similar with VPN. Recall in lecture 24, we briefly introduced VPN(Virtual Private Network).
VPN can be seen as an encrypted virtual connection to an internal network.
-
Allows access to an internal network through an encrypted tunnel.
-
Creates an alternative use case: Appear as though you are coming from the virtually connected network instead of your real network!
-
Similar concept to proxies, but Alice directly sends packets as though coming from the VPN, wrapped in the VPN’s layer of encryption.
-
Proxies operate at the application layer, while VPNs operate at the network layer.
-
What are some issues about proxies and VPNs?
- Performance
- Sending a packet requires additional hops across the network(well, this is in fact unavoidable for anonymity).
- Cost
-
VPNs can cost $80 to $200 per year.
-
Yeah, you may claim that your VPN charges you way less than this, that’s because those VPNs can steal your data as an implicit profit.
-
- Trusting the proxy
-
The proxy can see the sender and recipient’s identities.
-
Attackers might convince the proxy to tell them about your identity.
-
Tor
Tor stands for The Onion Router, which is a network that uses multiple proxies (relays) to enable anonymous communications.
Recall that a concern of proxy is that proxy server may leak your information to attackers, but with multiple proxies, the your information got leak only if the attacker controls all the proxies.
Components of Tor:
-
Tor network: A network of many Tor relays (proxies) for forwarding packets.
-
Directory server: Lists all Tor relays and their public keys.
-
Tor Browser: A web browser configured to connect to the Tor network (based on Firefox).
-
Tor onion services: Servers that can only be reached through the Tor network.
-
Tor bridges: Tor relays that try to hide the fact that a user is connecting to the Tor network.
A basic Tor communication looks like this, note there’re three relays on the path,
Before introducing how Tor works, let’s first take a look at the threat model of it,
- Security: Client anonymity and censorship resistance.
-
Client anonymity means that the communication servers shouldn’t have any idea about the client’s identity.
-
Censorship resistance means that even if someone want’s to block the connection, the connection will still be able to established.
-
Optional: Server anonymity with onion services.
-
-
Performance: Low latency (communication should be fast).
- Tor preserves anonymity against local adversaries.
- Example: An on-path attacker sees Alice send a message to a Tor relay, but not the final destination of the message.
Tor circuits
Now, let’s see how a basic Tor connection is established. To communicate anonymously with a server, the Tor client forms a circuit consisting of 3 relays (by default).
-
Step 1: Query the directory server for a list of relays.
-
Step 2: Choose 3 relays to form a Tor circuit.
-
Step 3: Connect to the first relay, forming an end-to-end TLS connection.
-
Step 4: Connect to the second relay through the first relay, forming an end-to-end TLS connection.
-
Step 5: Connect to the third relay through the second relay, forming an end-to-end TLS connection.
-
Step 6: Connect to the web server
- If the web server is using HTTPS, then an end-to-end TLS connection will be formed through the third relay.
Below is a graph illustrating Tor circuit,
So what does the relay do?
-
Perform TLS handshakes when requested.
-
When receiving a packet, decrypt using the key obtained through TLS.
-
If the destination of the packet is another relay, forward the packet to the next relay.
-
If the destination of the packet is an external server, forward the packet to that server.
The idea is that no matter where the relay stands in the path, all it can see is the packet header for its layer, and the destination that can be another relay or the real destination.
Note that Tor exit nodes are actually playing a slightly different role among the relays. In the graph above, K3 is the relay that actually communicates with Bob, so Bob only knows the existence of K3.
-
First of all, notice how the blue arrow is different in the graph above from the other channels, this means that the former channels are always TLS, but the blue arrow is determined by the communication protocol Alice chose to use to communicate with Bob.
-
If they are communicating in HTTP, then K3 will be the one who actually communicates with Bob in HTTP, that also means K3 can perform a MITM attack.
-
Another special thing about the Tor exit node is that if Alice is doing something illegal, what Bob can find is only K3, so K3 will be the one to be blamed. So administrators of Tor exit nodes often receive abuse complaints.
-
As a result, most Tor relays choose to only be entry or intermediate nodes, not exit nodes.
- Exit node bandwidth is the bottleneck in Tor, not internal bandwidth.
Timing attacks
Although we use Tor to ensure the anonymity of communication, it might potentially be broken with timing attacks.
- A network attacker who has a full (global) view of the network can learn that Alice and Bob are talking.
- Exploit a timing attack: Observe when Alice sends a message, when Bob receives a message, and link the two together.
- Global adversaries are outside of Tor’s threat model and are not defended against.
Collusion
The Tor circuit is not safe if all the relays you use are leaking information about the communication, that’s called collusion.
-
Collusion is adversarial (dishonest) behavior.
-
Honest nodes should never share information with other proxies.
-
If all nodes in the circuit collude, anonymity is broken.
-
If at least one node in the circuit is honest, anonymity is preserved.
With collusion in mind, it’s easy to notice that the more relays we use for a Tor circuit, the less likely a collusion will break the anonymity. On the other hand, the more relays, the more delay, which violates the efficiency design goal. So, in practice, 3 relays is the default choice.
To defend against collusion, we use something called guard nodes.
-
Guard nodes must have a high reputation and must have existed for a long time.
-
Clients will always use a guard node as the entry node (by default) and the same guard node is used for a long period of time.
-
Attackers’ nodes are unlikely to become guard nodes.
-
Because clients use the same guard nodes for a long period of time, there is only a low chance that the client will switch to an attacker’s guard node.
-
Distinguishable traffic
Recall that we want Tor to provide anonymity even for the traffic between client and the first relay. However, a local adversary can see that you are sending packets to a Tor relay, which indicates that you’re using a Tor.
This also brings up an important idea: Tor is only safe if there’re many people using it. Crowd is the place you gonna hide and stay anonymous!
To defend the local adversary, something called Tor bridge is used.
- Notice: Attackers can tell you are using Tor because they can see you are connecting to an entry node.
- Lists of entry nodes are publicly available.
-
Tor bridges are entry nodes that are not available on any public list
-
Users request bridges from a separate directory, which will only give a few bridges to the user.
-
There is no publicly available list of all bridges!
-
To defend this,
-
- Censors can no longer block Tor based on IP addresses, but they can still distinguish traffic that looks like Tor traffic from normal traffic.
- To defend this, we use pluggable transports, which change the appearance of the client’s traffic to the entry node (only for bridges), that’s basically obfuscating the encrypted traffic to make it “look” more like normal web traffic.
Tor onion services
Sometimes, the server also wants to be anonymous, so no one knows where the server is located.
So we have Tor onion services, meaning websites that are only accessible through the Tor network. Also known as dark web.
To make the server anonymous, we can’t use their domain name in URL anymore, instead, we hash their public key and use part of it as URL, e.g. http://pwoah7foa6au2pul.onion.
Here’re steps to establish a Tor onion service:
- The server needs to publish how to contact the server:
-
The server chooses a set of nodes to be introduction points and forms a Tor circuit to each of them.
-
The server publishes its public key and its introduction points to a publicly available directory.
-
- Now, the client connects to the server:
-
Step 1: The client queries the directory using the hash of the public key to get the server’s full key (not just its hash) and the introduction points.
-
Step 2: The client chooses an introduction point and forms a Tor circuit to it.
-
Step 3: The client chooses a rendezvous point and secret used to communicate to the server, encrypts them with the server’s public key, and sends them to the introduction point, which relays them to the server.
-
Step 4: The client and server both form Tor circuits to the rendezvous point and perform an end-to-end TLS handshake, and the server sends the decrypted secret to the client to authenticate itself.
-
step 2
step 4
In such way, the client and the server will communicate through 6 relays, and it provides truly hidden onion services. However, if the server is okay to let the rendezvous point know its identity, it can also use non-hidden onion services, which looks like this,
Tor in practice
- Benefit: Free to use.
-
Tor is mostly funded by the US government.
-
Users “pay” by providing traffic for other users to hide in (recall: you don’t want to be the only user on the network using Tor).
-
Drawbacks:
-
Exit nodes are a man-in-the-middle attacker.
-
Performance: Latency is significantly worse.
-
Full anonymity requires usability tradeoffs.
-
All Tor browsers need the exact same configuration, so they don’t save your history.
-
They even recommend keeping the browser window size constant(otherwise some particular size might identify it’s you), which can be annoying!
-
For censorship resistance:
-
Because Tor hides the sites a user is connecting to, it is useful for bypassing censorship.
-
Some arm races surrounding this topic:
- Censors can easily block access to all public Tor entry points.
- Bridge services provide a set of entry points that aren’t listed publicly anywhere, so they can’t be blocked by IP.
- Censors can block traffic that looks like Tor traffic.
- Pluggable transports make traffic look more like normal web traffic.
- Censors can pretend to be a Tor client to see if an endpoint is a Tor node.
-
More recent pluggable transports distribute a shared secret, not known to active probers.
-
Some pluggable transports deliberately rely on cloud services, so censors have to block important web services (like Google Cloud Platform, Amazon Web Services, etc.) to block Tor.
-
- Censors can easily block access to all public Tor entry points.
Lecture 26 is the last lecture in CS 161. It’s an optional one, the topic is about Signal, an app that provides users with secure communication. The lecture focuses on the Signal Protocol that makes such communication secure.
Signal protocol
Firstly, what is signal? It’s an open source app that provides secure communication to its users, their code can be found here.
Signal uses a carefully designed protocol to ensure that the Signal server themselves can’t read users’ messages, such property is not provided by most of the common communication apps.
You may wonder why is it a difficult thing to stop app servers from reading users’ messages, given that we have secure protocols like TLS. However, in the case of web server, we can always assume a web server is online to answer clients’ requests, but in the case of communication, we have to make sure Alice can send Bob a message even if Bob is offline(maybe on an airplane).
-
The above scenario forces us to use an asynchronous protocol.
-
Also, we want the protocol to have forward secrecy(leaking of secrets in the future won’t help the attacker to decrypt messages they recorded previously).
-
Besides that, we want the protocol to provide deniability: In cryptography, deniable authentication refers to message authentication between a set of participants where the participants themselves can be confident in the authenticity of the messages, but it cannot be proved to a third party after the event.
-
Lastly, we also want backward secrecy, which means a leak of secret now can’t help the attacker to decrypt messages in the future after a certain period of time.
So, why is TLS not applicable for the scenario above? Firstly, asynchronous means that it might not be possible for both sides of the communication to stay online to finish the handshake. Even if they shared a secret key, they shouldn’t use that for future messages forever, because that will not give forward secrecy. Thus, the protocol we use must change the secret key frequently.
- To frequently generate secret keys, we’ll use HKDF(hash-based key derivation function), a simple function to generate keys which works like HMAC.
- Note that doing HKDF for each message, with a constant added into the algorithm, we’ll have forward secrecy. The recipient should delete the secret key once it has been used. But the HKDF doesn’t provide backward secrecy, since any key leaking will enable to attacker to calculate all the following keys.
-
Such way of using HKDF is like a ratchet, the keys can be deduced in one way, but not the other way.
-
To provide backward secrecy together with forward secrecy, we’ll need a double ratchet algorithm.
-
So we bring up a Diffie-Hellman ratchet. The idea is that since Diffie-Hellman requires Alice and Bob to give ga and gb to generate key gab, it’s in fact possible to just change one of a and b to get a whole new key. So, even if Alice is offline after one communication, Bob knows that Alice remembers a herself, so he can generate a new b’, and encrypt message with gab’, when Alice is online again, she will still be able to decrypt this message, even though the key is already changed by Bob, which provides backward secrecy.
- Now let’s combine HKDF ratchet with the Diffie-Hellman ratchet to get a double ratchet.
Double ratchet algorithm
Firstly, we write Diffie-Hellman(DH) in the following structure.
The DH ratchet introduced above will then be something look like this,
The main idea is that even if Alice and Bob is communicating asynchronously, as long as one of the side has its public key updated, the DH public key will be updated safely.
Now back to double ratchet, remember in KDF chain(HKDF ratchet) we pass a constant 0x01 as a parameter to KDF each time. The main idea of double ratchet is to change this constant each time, so that KDF will also be determined by another parameter! As you may guessed, this parameter will be generated by DH ratchet.
The graph of double ratchet is shown below,
- In the graph above,
-
circle: DH
-
square: KDF
-
RK, CK, A1 things: Just some names, not super important :)
-
-
At the beginning, Alice and Bob will know each other’s public key for DH, together with an initial RK for root chain(how they do this will be briefly mentioned later).
-
Each time when Alice wants to send messages, she updates her public key, with Bob’s public key she used last time, perform a DH, and combine the result with RK to perform a KDF, to update RK.
-
At the same time, slightly modify RK to get CK, it can be just adding some constant to RK. Then pass CK to KDF to update CK. For each chunk of message she wants to send, Alice should update a new CK, and each CK will not be directly used, but slightly modified instead.
-
See the A1, A2 … above generated by CKs, those are keys used to do encryption for each chunk of message!
It’s easy to verify that the protocol above will provide both forward and backward secrecy!
If the attacker want’s to fully decrypt each communication, he/she has to attack the server to get the initial RK, then sit as MITM for each DH exchange!
The last question to consider is how do Alice and Bob share the very first public keys safely, together with the initial RK.
- Signal protocol solves this with something called X3DH, here’s a document made by Signal explaining how this works.
That’s all about the brief introduction of signal protocol. To learn this exciting protocol in detail, check this link to see their official documents.