CS 161 Notes Lec 23-24

Course website: https://su21.cs161.org/
Course textbook: https://textbook.cs161.org/

Overview

Lecture 23: Intrusion Detection

Path traversal attacks
Types of detectors
Detection accuracy
Styles of detection
Other intrusion detection strategies

Lecture 24: Malware

Malware
Viruses
Worms
1. Propagation strategies
2. Modeling worm propagation
3. History of worms
Infection cleanup and rootkits

Path traversal attacks

Path traversal attack is a type of web attack. The reason why it’s introduced here is because the topic of this lecture is about detecting attacks that are happening/happened, and we need a type of attack to detect :)

Path traversal attack is quite similar to things like SQL injection, the main idea is that when the website asks users to input the name of the file they want,

and want to simply send them the file base on this input file name with a preset prefix,

It’s possible that the real filesystem is look like this in the web server,

An attacker may use an input like below to get passwords.txt in the private folder.

The defense for that is also easy, one possible way is to parse the input file name before executing right away, just like the defense for SQL injection.

Types of detectors

Three types of detectors
- Network Intrusion Detection System (NIDS)
- Host-based Instruction Detection System (HIDS)
- Logging
The main difference is where the detector is deployed.

Network intrusion detection system (NIDS)

NIDS is a detector installed on the network, between the local network and the rest of the Internet.

Operations NIDS can do:
- NIDS has a table of all active connections and maintains state for each connection.
- If the NIDS sees a packet not associated with any known connection, create a new entry in the table.
  - Example: A connection that started before the NIDS started running.
- NIDS can be used for more sophisticated network monitoring: not only detect attacks, but analyze and understand all the network traffic.
Benefits of NIDS:
- Cheap: A single detector can cover a lot of systems.
- Easy to scale: As the network gets larger, add computing power to the NIDS.
  - Linear scaling: Investing twice as much money gives twice as much bandwidth.
- Simple management: Easy to install and manage a single detector.
- End systems are unaffected.
  - Doesn’t consume any resources on end systems.
  - Useful for adding security on an existing system.
- Smaller trusted computing base (TCB).
  - Only the detector needs to be trusted.
Drawbacks and the corresponding defenses of NIDS:
1. Inconsistent or ambiguous interpretation between the detector and the end host.
  - Here’re some examples. Suppose the NIDS see a packet containing “../etc/passwd” flying through, which looks like a path traversal attack, should it raise an alarm?
  - What if the packet has an insufficient TTL so it will only pass the NIDS and won’t even reach the end host?
  - Another case to consider, what if NIDS see something like “%2e%2e%2f%2e%2e%2f” flying through, which may seems meaningless but will be parsed as “../etc/passwd” by URL percent encoding!
  - Lastly, what if a message like “..///.///..////” is flying through? Without knowing the previous contexts, the NIDS has no way to tell whether this message is malicious.
  - The examples above basically shows that imperfect observability and incomplete analysis of NIDS(don’t worry, the two phrases are just concluding scenarios above) make it hard for NIDS to make a decision.
  - Such inconsistent interpretation leads to a possible attack: evasion attack. Evasion attack exploits inconsistency and ambiguity to provide malicious inputs that are not detected by the NIDS.
  - How to defend evasion attack?
  - You can try to make sure the NIDS understands things like URL, UNIX filesystems, so that it can have the same interpretation as the end host.
  - You can also manually impose a normalized form for all inputs, such as forcing all URLs to be expanded as plaintext.
  - You can also let the NIDS to simply consider all the possibilities, but that will be painfully slow.
  - The NIDS can also use flags to remind the end hosts, showing that the packet might be malicious, but can’t be sure about it.
2. How does NIDS monitor encrypted traffic?
  - Recall that TLS uses encryption, so how can NIDS verify packets sent in that protocol?
  - One possible solution: Give the NIDS access to all the network’s private keys😲
    - Now the NIDS can decrypt messages to inspect them for attacks.
    - Problem: Users have to share their private key with someone else which is dangerous!

Host-based intrusion detection system (HIDS)

Unlike NIDS, HIDS installs detectors for each end system.

Benefits of HIDS:
- Fewer problems with inconsistencies or ambiguities: The HIDS is on the end host, so it will interpret packets exactly the same as the end host!
- Works for encrypted messages :)
- Can protect against non-network threats too (e.g. malicious user inside the network).
- Performance scales better than NIDS: one NIDS is more vulnerable to being overwhelmed than many HIDS.
Drawbacks of HIDS:
- Expensive: Need to install one detector for every end host.
- Evasion attacks are still possible (consider Unix file name parsing).

Logging

Logging is usually done by web servers, they store their network traffic separately at a place, and regularly use an agent to review it to detect attacks.

Benefits of logging:
- Cheap: Modern web servers often already have built-in logging systems.
- Fewer problems with inconsistencies or ambiguities: The logging system works on the end host, so it will interpret packets exactly the same as the end host!
Drawbacks of logging:
- Unlike NIDS and HIDS, there is no real-time detection: attacks are only detected after the attack has happened.
- Some evasion attacks are still possible (again, consider Unix file name parsing).
- The attacker could change the logs to erase evidence of the attack.

Detection accuracy

Now let’s consider the detection errors, there’re two main types of detection errors:

False positive: Detector alerts when there is no attack.
False negative: Detector fails to alert when there is an attack.

Detector accuracy is often assessed in terms of the rates at which these errors occur.

False positive rate (FPR): The probability the detector alerts, given there is no attack.
False negative rate (FNR): The probability the detector does not alert, given there is an attack.

Obviously, there’s no way to build a perfect detector that gives 0 FPR together with 0 FNR. So there must be tradeoffs in the design of a detector.

Here’s an interesting question, can you combine two independent detectors to create a better detector? ;)

Parallel composition:
- Alert if either detector alerts.
- Intuition: The combination generates more alerts.
- Reduces false negative rate.
- Increases false positive rate.
Series composition:
- Alert only if both detectors alert.
- Intuition: The combination generates fewer alerts.
- Reduces false positive rate.
- Increases false negative rate.

So you won’t get both FNR and FPR reduced! Again, tradeoffs :)

Styles of detection

So far we’ve introduced the styles of detectors, now we’ll introduce the styles of detection: how the detector scans data to find attacks. There’re four main styles of detection:

Signature-based detection
Specification-based detection
Anomaly-based detection
Behavioral detection

Signature-based detection

Flag any activity that matches the structure of a known attack.
It’s some sort of blacklisting, the detector keeps a list of patterns that are not allowed, and alert if any activity matches with them.
Some examples:
- We know that “../” might be related with path traversal attacks, so alert if any request contains “../”
- We know that buffer overflow usually triggers shellcode, so keep a list of common shellcodes and alert if any request contains shellcode.
Benefits
- Conceptually simple.
- Very good at detecting known attacks.
- Easy to share signatures and build up shared libraries of attacks.
Drawbacks
- Won’t catch new attacks without a known signature.
- Might not catch variants of known attacks if the variant doesn’t match the signature.
- The attacker can modify their attack to avoid matching a signature.
- Simpler versions only look at raw bytes, without parsing them in context.
  - May miss variants.
  - May generate lots of false positives.

Specification-based detection

On the opposite of signature-based detection, specification-based detection specifies allowed behavior and flag any behavior that isn’t allowed behavior.
So it’s basically whitelisting.
Some examples:
- For path traversal attacks, we specify that only alphanumeric characters are allowed as input!
- For buffer overflows, we do a similar thing, only allow certain patterns of input.
Benefits
- Can detect new attacks we’ve never seen before.
- If we properly specify all allowed behavior, can have low false positive rate.
Drawbacks
- Takes a lot of time and effort to manually specify all allowed behavior.
- May need to update specifications as things change.

Anomaly-based detection

The main idea is that attacks usually look unusual, so we can develop a model that judges whether a request looks suspicious, and alert if it does.
It’s similar to specification-based detection, but now the detector learn a model of normal behavior instead of using whitelist manually specified.
Benefits:
- Can detect attacks we haven’t seen before.
Drawbacks
- Can fail to detect known attacks.
- Can fail to detect new attacks if they don’t look unusual to our model.
- What if our model is trained on bad data (e.g. data with a lot of attacks)?
- The false positive rate might be high (lots of non-attacks look unusual).
- If we try to reduce false positives by only flagging the most unusual inputs, the false negative rate might be high (we miss slightly unusual attacks).

Behavioral detection

Instead of looking for the exploit, we’re looking for the result of the exploit. Behaviors can themselves be analyzed using blacklists (signature-based), whitelists (specification-based), or normal behavior (anomaly-based).
Some examples:
- For path traversal attacks, the detector just see if any unexpected files are being accessed (e.g. the passwords file).
- For buffer overflows, the detector checks if he program calls unexpected functions.
Benefits
- Can detect attacks we haven’t seen before.
- Can have low false positive rates if we’re looking for behavior that rarely occurs in normal programs (e.g. in the exec example, there are probably no false positives!)
- Can be cheap to implement (e.g. existing tools to monitor system calls for a program).
Drawbacks
- Legitimate processes could perform the behavior as well (e.g. accessing a password file).
- Only detects attacks after they’ve already happened.
- Only detects successful attacks (maybe we want to detect failed attacks as well).
- The attacker can modify their attack to avoid triggering some behavior.

Other intrusion detection strategies

Vulnerability scanning

The idea is to launch attacks on your own system first, and add defenses to those attacks that worked.

As the name indicates, vulnerability scanning is to use a tool that probes your own system with a wide range of attacks (and fix any successful attacks).

Benefits
- Accuracy: If your scanning tool is good, it will find real vulnerabilities.
- Proactive: Prevents attacks before they happen.
- Intelligence: If your intrusion detection system alerts on an attack you know you already fixed, you can safely ignore the alert.
Drawbacks
- Can take a lot of work.
- Not helpful for systems you can’t modify.
- Dangerous for disruptive attacks (you might not know which attacks are dangerous before you run the scanning tool).

Honeypots

It’s a sacrificial system with no real purpose ;) Trick the attackers to attack it and see how the defenses should be added.

Benefits
- Can detect attacks we haven’t seen before.
- Can analyze the attacker’s actions.
  - Who is the attacker?
  - What are they doing to the system?
- Can distract the attacker from legitimate targets.
Drawbacks
- Can be difficult to trick the attacker into accessing the honeypot.
- Building a convincing honeypot might take a lot of work.
- These drawbacks matter less if the honeypot is aimed at automated attacks (e.g. the spam detection honeypot).

Forensics

Analyzing what happened after a successful attack. For that, you might need things like logs.

Intrusion prevention systems

Yeah, detecting and blocking attacks are two different things. We’ve already seen some intrusion detection strategies that only detect attacks after they took place!

So, we call those intrusion detection systems that can also block the attacks intrusion prevention systems, or IPS.

The very last thing to notice in this lecture about intrusion detection systems is that these IDS themselves may be attacked. Given that IDS cost both computational power and storage to do detection, DoS attack is a perfect choice to attack on these IDS!

Malware

Lecture 24 is all about malware, which means malicious software, sometimes also called malcode. Malware is usually executed on victim’s machine, with a purpose that might be listed below,

Deletes files.
Sends spam email
Launches a DoS attack(remember DDoS?).
Steals private information.
Records user inputs (key logging, screen capture, webcam capture).
Encrypts files and demands money to decrypt them (ransomware).
Physically damages machines.

The lecture will focus on how does malware propagate, meaning how does malware spread itself.

Self-replicating code describes a piece of code that outputs a copy of itself, which is the exact thing malwares need for propagation.

Depending on how the self-replicating process will be triggered, malwares can be divided into viruses and worms.

Both viruses and worms are automatic self-replicating codes.
Viruses require user’s action to propagate.
Worms don’t require user’s action to propagate.
In fact there’s no clear line between virus and worm.

Viruses

Once again, viruses are malwares that requires user action to propagate.

Propagation strategies

Since the virus can only propagate after user gives a certain action, it might propagate in the following ways:
- Infect code that runs when opening an app.
- Infect code that runs when the system starts up.
- Infect code that runs when the user opens an attachment.
When the malcode runs, it looks for opportunities to infect more systems
- Example: Send emails to other users with the code attached.
- Example: Copy the code to a USB flash drive (so any other users who run the files on the USB drive will be infected too).

Detection strategies

An effective way of defending viruses is to use signature-based detection.

Viruses replicate by using copies of the same code.
Capture a virus on one system and look for bytes corresponding to the virus code on other systems.
Usually, the antivirus softwares will keep a list of common viruses, and scan the software to see if there’re code matching, which will be considered suspicious.

Again, there’s always a arm race between attackers and defenders, with the defenses above in mind, the attackers can look for evasion strategies.

Change the appearance of the virus so that each copy looks different, thus making signature-based detection harder.
While, how to automatically make the virus look different for each propagation? We’ll mention this later.

In such an arm race, attackers have a slight advantage, since they can see what strategies defenders will use, while the defenders have no idea about the new attacks.

Polymorphic code

As we mentioned in the section above, the attacker might implement virus in a way such that it appears in different form each time, such code is called polymorphic code.

The key to achieve this is to use cryptography!

Each time the virus propagates, it inserts an encrypted copy of the code.
The code also includes the key and decryptor.
When the code runs, it uses the key and decryptor to obtain the original malcode.
So each time of propagation, the virus should just generate a random key and use it to encrypt the malicious code, then pass it on to the victim with the key, the encrypted code and the decryptor.
Since encryption is kinda random, the signature-based detection won’t work.
Note that here encryption is used for obfuscation, not confidentiality, the virus can leave the key as plaintext, the goal is just making signature-based detection useless.

So, how to defend against such polymorphic code?

Strategy #1: Add a signature for detecting the decryptor code.
- Issue: Less code to match against → More false positives(since there’re many legitimate codes that involves decryption).
- Issue: The decryptor code could be scattered across different parts of memory.
Strategy #2: Safely check if the code performs decryption.
- Execute the code in a sandbox.
- Analyze the code structure without executing the code.
- Issue: Legitimate programs might perform similar operations too (e.g. decompressing ZIP files).
- Issue: How long do you let the code execute? The decryptor might only execute after a long delay(on purpose).

Metamorphic code

This is another way to tackle signature-based detection for attackers. The main idea is to generate a semantically different version of code for each propagation.

Include a code rewriter with the virus to change the code randomly each time.
- Renumber registers.
- Change order of conditional (if/else) statements.
- Reorder independent operations.
- Replace a low-level algorithm with another (e.g. mergesort and quicksort).
- Add some code that does nothing useful (or is never executed).

To defend metamorphic code, behavioral detection is necessary,

Need to analyze behavior instead of syntax.
Look at the effect of the instructions, not the appearance of the instructions.
Antivirus company analyzes a new virus to find a behavioral signature, and antivirus software analyzes code for the behavioral signature.

However, again, this may be subverted by the virus,

Delay analysis by waiting a long time before executing malcode.
Detect that the code is being analyzed (e.g. running in a debugger or a virtual machine) and choose different behavior :) .
Antivirus can look for these subversion strategies and skip over them.

Again ;) there’re corresponding defenses, one of which is to flag unfamiliar code.

Keep a central repository of previously-seen code.
If some code has never been seen before, treat it as more suspicious.
The central repository can store secure cryptographic hashes of previously-seen code snippets for efficiency (the software hashes code and see if the hash matches a hash in the repository).
Note that flagging unfamiliar code is a powerful defense.
- You have a detector for malicious behavior (e.g. signature detection).
- Now you also have a strategy for people avoiding your first detector.
- Attacker is in trouble either way:
  - If the attacker doesn’t modify the code for each propagation, it will have a detectable signature.
  - If the attacker modifies the code each time, it always appears as new and suspicious.
  - When avoiding one strategy, the attacker will be caught by the other strategy!

Worms

Recall that worms are code that don’t require user action to propagate.

Propagation strategies

How does the worm find new users to infect?

Randomly choose machines: generate a random 32-bit IP address and try connecting to it.
Search worms: Use Google searches to find victims.
Scanning: Look for targets (can be limited by bandwidth).
Target lists.
- Pre-generated lists (hit lists).
- Lists of users stored on infected hosts.
- Query a third-party server that lists other servers.
Passive: Wait for another user to contact you, and reply with the infection.

How does the worm force code to run?

Buffer overflows for code injection.
A web worm might propagate with an XSS vulnerability.

Modeling worm propagation

Worms can spread pretty fast, in fact, the intuition would be exponentially fast, since they parallelize the process of propagating/replicating.

The spread of the worm depends on:

The size of the population.
The proportion of the population vulnerable to infection.
The number of infected hosts.
The contact rate (how often an infected host communicates with other hosts).

In reality, the number of infected hosts grow logistically.

Initial growth is exponential:
More infected hosts = more opportunities to infect.
Later growth slows down: Harder to find new non-infected hosts to infect!

There’s been an experiment done for that, and the result is shown below,

In the lecture, there were some examples of worms starting from the famous Morris Worm. However I’m not gonna cover the stories here.

I’ll just leave the names of the example here for you to lookup by yourself:

Morris Worm
Code Red
Slammer
Witty
Stuxnet
WannaCry
NotPetya

Inflection cleanup and rootkits

Finally, let’s consider what should we do if we find a malware on our system.

To cleanup, we must restore and repair many files, antivirus companies sell software that helps with disinfection.

But, what if the malware executed with administrator privileges?

The entire computer is potentially compromised.
The operating system might be compromised too!
Best solution: Rebuild the system from data backups and a fresh copy of the operating system.

Further, what if the malware also infected the tools you use to rebuild the system ;) ?

There is no good way of cleaning up malware using only tools in the system!

Here we introduce a concept called rootkit, which is the malcode that hides its presence in the operating system.

It can hide the fact that it is stored on the disk.
- When other processes (e.g. antivirus) try to read the malware, the rootkit reports “file doesn’t exist”.
It can hide the fact that it’s currently running a process.
- When other processes (e.g. antivirus) list running processes, the rootkit doesn’t report the running malware.
Where can rookits be hidden? Startup code!
- Examples: BIOS/EFI firmware, disk controller firmware.
- The startup code can inject malcode into the operating system.
- Defense: Only run cryptographically signed startup code.
To defend against rootkits on the disk, one may scan the disk of a trusted machine and scan the disk of a potentially infected machine, then compare the results.

Written on May 10, 2023

Comments

Comment on Github