Review of CS 463 Materials

This is a review of the CS 463 lecture materials. I simply copy-paste some of them here as a review for the final.

Table of Contents
Introduction
Social Networks
De-Identification
Machine Learning in Security
Cryptography
Trusted Computing
Bitcoin
Information Flow
- Noninterference
Health Information Technology
- Genomic Privacy Attack
Mobile OS Security
- PC vs. smart phones

Introduction

Define Computer Security

A collection of properties that hold in a system in the presence of an adversary under a set of constraints.

Definitions

Confidentiality (privacy): Prevent unauthorized parties from accessing certain data/system
Integrity: Prevent unauthorized parties from tampering with certain data/system
Availability: Make sure certain data/system is available to users
Authenticity: Proof of true identity/origin
Anonymity: Cannot be distinguished from others
Accountability: The ability to identify the responsible party

Defenses

Backups
Automatic updates
Two-factor authentication
2FA Bypass (Real Time Phishing)

A phishing site asks the user for the 2FA code while the attacker is logging in to the real site at the same time.

Social Networks

Homophily

The tendency of individuals to associate and bond with similar others.

Choice Homophily: Closeness due to preferences by the individual. Example: Favorite teams
Induced Homophily: Closeness due to other constraints. Examples: Geographic closeness, Age closeness with friends.
Value Homophily: Individuals with similar values, thinking. Example: Religion
Status Homophily: Individuals with similar social status. Example: Aristocracy

Age Inference

Friends' ages should be similar to the user's age, and friends' high-school graduation years should be close to the user's high-school graduation year.

Baseline

Just take the mean / median of the known ages in the whole dataset as the age estimate.

Approach

Train a linear-regression model of birth year (BY) given high-school graduation year (HSY).
For the users with known ages: use them to train the linear-regression model.
For the users with HSY, use the model to get estimated BY.
For the users without HSY, if enough friends with HSY available, estimate the BY with the most frequent HSY of friends.
Iterative approach: for those without enough friends, iteratively estimate the HSY and BY until the whole graph gets covered.
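The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the lecture's implementation: the toy data, the `fit_line` helper, the `infer_birth_years` function and the `min_friends` threshold are all assumptions made for the example.

```python
from collections import Counter

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x

def infer_birth_years(hsy, friends, a, b, min_friends=2):
    """Estimate BY for every user whose HSY is known or can be propagated.

    hsy: dict user -> known high-school graduation year
    friends: dict user -> list of friends
    """
    hsy = dict(hsy)  # copy so the caller's data is not modified
    changed = True
    while changed:  # repeat until no new user can be labeled
        changed = False
        for user, friend_list in friends.items():
            if user in hsy:
                continue
            known = [hsy[f] for f in friend_list if f in hsy]
            if len(known) >= min_friends:
                # use the most frequent HSY among friends
                hsy[user] = Counter(known).most_common(1)[0][0]
                changed = True
    # apply the regression model HSY -> BY
    return {user: a * year + b for user, year in hsy.items()}

# Toy training set: users with both HSY and BY known (made-up numbers).
a, b = fit_line([2000, 2005, 2010], [1982, 1987, 1992])

known_hsy = {"ann": 2008, "bob": 2008, "cat": 2009}
friends = {"dan": ["ann", "bob", "cat"], "eve": ["dan", "ann"]}
estimates = infer_birth_years(known_hsy, friends, a, b)
```

Here `eve` only gets an estimate after `dan` has been labeled, which is the iterative part of the approach.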

What if the user has not made their friend list public?

Use reverse lookup: find the user in the public friend lists of other users.

Discussion Questions

How can social networks be best used by advertisers? (Think like an advertiser or social network vendor)
Are there alternative approaches to social networking that may limit inference of attributes about users? (Consider architecture, business models, regulation, etc.)

De-Identification

Case Studies

GIC incident: Re-identification of the governor with ZIP code + Birth date + Sex
AOL incident: Released search logs were linked back to individual searchers
Netflix incident: Use 8 movie ratings to identify users

k-anonymity

Any combination of quasi-identifier values (e.g., ZIP code, sex, birth date) that appears in the data must appear in at least k records.
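A quick way to check this property on a table is to count how often each quasi-identifier combination occurs. The sketch below assumes records are Python dicts; the function name `is_k_anonymous` and the toy records are made up for illustration.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Toy table with generalized (partly masked) quasi-identifiers.
records = [
    {"zip": "618**", "sex": "F", "birth_year": "198*", "diagnosis": "flu"},
    {"zip": "618**", "sex": "F", "birth_year": "198*", "diagnosis": "asthma"},
    {"zip": "606**", "sex": "M", "birth_year": "197*", "diagnosis": "flu"},
    {"zip": "606**", "sex": "M", "birth_year": "197*", "diagnosis": "flu"},
]

qi = ["zip", "sex", "birth_year"]
two_anonymous = is_k_anonymous(records, qi, 2)
three_anonymous = is_k_anonymous(records, qi, 3)
```

Each combination appears exactly twice here, so the table is 2-anonymous but not 3-anonymous.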

Metrics

l-diversity: Within each quasi-identifier group, there must be at least l distinct values for each sensitive attribute
t-closeness: The distance between the distribution of a sensitive attribute within a quasi-identifier group and its overall distribution should not exceed t
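As a rough illustration of l-diversity, the sketch below groups records by their quasi-identifiers and counts distinct sensitive values per group. The function name `is_l_diverse`, the choice of `diagnosis` as the sensitive attribute and the toy records are assumptions for the example; t-closeness is not covered because it needs a choice of distance measure.

```python
from collections import defaultdict

def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every quasi-identifier group has at least l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())

records = [
    {"zip": "618**", "sex": "F", "diagnosis": "flu"},
    {"zip": "618**", "sex": "F", "diagnosis": "asthma"},
    {"zip": "606**", "sex": "M", "diagnosis": "flu"},
    {"zip": "606**", "sex": "M", "diagnosis": "flu"},
]

diverse = is_l_diverse(records, ["zip", "sex"], "diagnosis", 2)
```

The second group contains only `flu`, so the check fails for l = 2 even though every group has two records: anyone known to be in that group is known to have the flu.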

Differential Privacy

What can be learned from accessing the database is (roughly) the same regardless of whether an individual is in the database.
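The notes do not give a formula, but the usual way to make this precise is ε-differential privacy. The symbols below (a randomized mechanism M, neighboring databases D and D' that differ in one individual's record, and an output set S) are the standard notation, not something taken from the lecture:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]
\quad \text{for all neighboring } D, D' \text{ and all } S \subseteq \mathrm{Range}(M)
```

A smaller ε means the two output distributions are closer, so less can be inferred about any single individual.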

Sensitivity

Sensitivity measures how much an individual record can change the output f(D)
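In the standard formulation (assumed here, since the notes give no formula), the sensitivity of f is the largest change in output over all pairs of neighboring databases D and D' that differ in one record:

```latex
\Delta f \;=\; \max_{D, D' \text{ neighboring}} \lVert f(D) - f(D') \rVert_1
```

For example, a counting query ("how many records satisfy P?") has sensitivity 1, because adding or removing one record changes the count by at most 1.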

Laplacian Mechanism

Add noise drawn from a Laplace distribution to the query answer, with the noise scale proportional to the sensitivity.
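A minimal sketch of the mechanism, assuming the standard calibration where the noise scale is sensitivity divided by ε. The function names and the toy `ages` data are made up; the Laplace sample is generated as the difference of two exponential samples so that only the standard library is needed.

```python
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """Return true_answer plus Laplace noise with scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_answer + laplace_noise(sensitivity / epsilon, rng)

ages = [23, 35, 41, 29, 52]
true_count = sum(1 for age in ages if age >= 30)  # counting query, sensitivity 1
noisy_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.5)
```

A smaller ε gives a larger noise scale, so stronger privacy costs accuracy.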

In Practice

Set a privacy budget. Each query consumes some of the remaining budget; once the budget runs out, stop answering.
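The budget idea can be sketched as a small accountant object. This assumes basic sequential composition, where the ε values of answered queries simply add up; the class name `PrivacyBudget` and its methods are hypothetical.

```python
class PrivacyBudget:
    """Tracks the remaining epsilon under simple sequential composition."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def spend(self, epsilon):
        """Deduct epsilon for one query, or refuse if the budget is too low."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted, query refused")
        self.remaining -= epsilon

budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.5)   # first query answered
budget.spend(0.5)   # second query answered, budget is now used up
```

Any further call to `budget.spend` raises an error, which corresponds to the system no longer answering queries.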

HIPAA (Mechanism in Practice)

Health Insurance Portability and Accountability Act (HIPAA) 1996 – In particular, it addresses security and privacy of health data

HIPAA Privacy Rule

Two options for de-identification

Safe Harbor: redaction of 18 sensitive attributes
Expert Determination: e.g., statistician certifies risk of re-identification is “small”

Discussion Questions

What should we be concerned about in terms of privacy?
What techniques would you use to de-identify a dataset?

Machine Learning in Security

DeepLog: Anomaly Detection through Deep Learning

Anomaly detection for system logs
Challenges
- Large volume of data
- Sequential data
- Unstructured data
- Outlier detection
- High cost of errors
- Semantic gaps: difficult to transfer results into actionable report for the network operator
- Diversity with data and concept drift
- Difficulties with evaluations

Taxonomy

Influence

Causative attacks alter the training process through influence over the training data (poisoning, backdoor attacks)
Exploratory attacks do not alter the training process but use other techniques, such as probing the detector, to discover information about it or its training data (evasion, privacy attacks)

Background Knowledge

White-box attacks
Black-box attacks

Security Violation

Integrity attacks result in intrusion points being classified as normal (i.e., cause false negatives)
Availability attacks cause so many classification errors (e.g., false positives) that the system becomes effectively unusable
Privacy violation: the adversary obtains information from the learner, compromising the secrecy or privacy of the system’s users

Specificity

Targeted attack
Indiscriminate attack

Case Study

Poisoning Attack for Traffic Anomalies Detection

Goal is to launch a DoS on some victim
Add extra traffic, called chaff, to the targeted flow (causative attack)
Chaff selection can be locally informed or globally informed

Boiling Frog Poisoning

Set a parameter theta that controls the intensity of the attack: start with a small value and increase it slowly over time.

Impersonation Attack

Perturbation applied via glasses
Constrained by: smooth transitions among pixels, printability of RGB values
Limitations: low success rate for some targets
Some variations in lighting

Discussion Questions

How can you attack the spam filtering model discussed in the lecture?
Do you think ML will replace human analysts in detecting security threats? Why or why not?
How can we defend against the adversarial machine learning attacks mentioned in the lecture?

Cryptography

Symmetric

AES

Hash

MD5, SHA1, SHA2, SHA3

Asymmetric

RSA (based on the difficulty of factoring the product of two large prime numbers)

RSA vs. AES

AES is roughly 1000x faster than RSA
AES is less complex than RSA
AES keys are roughly 10x shorter than RSA keys (e.g., 192 bits vs. 2048 bits)
RSA requires no shared secrets

Digital Signature

Based on RSA

IND-CPA

Indistinguishability under Chosen Plaintext Attack.

Homomorphic Encryption

FHE

Fully Homomorphic Encryption. Supports both addition and multiplication on ciphertexts. Not efficient.

PHE

Partially Homomorphic Encryption. Supports only one operation on ciphertexts, e.g., only multiplication (textbook RSA is a PHE).
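The multiplicative property of textbook (unpadded) RSA can be checked with toy numbers. The parameters below are a common classroom example and far too small for real use; the helper names `encrypt` and `decrypt` are just for this sketch. Padded RSA as used in practice does not have this property.

```python
# Toy textbook RSA parameters (illustration only, completely insecure).
p, q = 61, 53
n = p * q                  # modulus
phi = (p - 1) * (q - 1)    # Euler's totient of n
e = 17                     # public exponent
d = 2753                   # private exponent, chosen so that e * d = 1 (mod phi)
assert (e * d) % phi == 1

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

m1, m2 = 7, 12
# Multiply the two ciphertexts without ever decrypting them.
c_product = (encrypt(m1) * encrypt(m2)) % n
# Decrypting the product of ciphertexts gives the product of the plaintexts (mod n).
product = decrypt(c_product)
```

This works because (m1^e)(m2^e) = (m1 m2)^e mod n, so the party doing the multiplication never sees m1 or m2.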

Applications:

e-Voting
Digital cash
Private matching

Private Set Intersection

The client has a set C of n items and the server has a set S of m items. They want to compute the intersection of C and S without revealing anything more about C and S.

Use homomorphic encryption.

Searchable Encryption

The client encrypts documents and sends them to the server. Later, the client asks the server to return the documents containing an encrypted keyword.
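One simple way to picture this is an index keyed by keyed hashes of the keywords. The sketch below is an assumption-heavy toy, not the scheme from the lecture: it uses HMAC-SHA256 tokens, the function names are made up, and the encryption of the document bodies themselves (e.g., with AES) is left out.

```python
import hashlib
import hmac
from collections import defaultdict

def keyword_token(key, keyword):
    """Client side: deterministic search token for a keyword."""
    return hmac.new(key, keyword.lower().encode(), hashlib.sha256).hexdigest()

def build_index(key, documents):
    """Client side: map keyword tokens to ids of documents containing the keyword."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index[keyword_token(key, word)].add(doc_id)
    return dict(index)

def server_search(index, token):
    """Server side: look up a token without knowing the underlying keyword."""
    return index.get(token, set())

key = b"\x01" * 32  # placeholder secret key held only by the client
documents = {1: "invoice for march", 2: "meeting notes", 3: "Invoice overdue"}
index = build_index(key, documents)          # uploaded to the server
matches = server_search(index, keyword_token(key, "invoice"))
```

Because the tokens are deterministic, the server still learns which queries repeat and which documents match them, which is the kind of leakage real schemes try to limit.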

Discussion Questions

Why not just trust the cloud provider?
What other problems could be solved using Private Set Intersection?
Are there alternative architectures for searchable encryption?

Trusted Computing

Trusted Computing allows “a piece of data to dictate what Operating System and Application must be used to open it”

TPM

Trusted Platform Module.

Hardware that provides encryption, certification, authenticated boot.

Secure Boot

Hashing of bootloader, OS kernel, kernel module, etc. Concatenating hashes.
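One way to read "concatenating hashes" is the extend operation used by TPM platform configuration registers: the new register value is the hash of the old value concatenated with the hash of the next component. The sketch below assumes SHA-256 and uses made-up component contents and function names.

```python
import hashlib

def extend(register, component):
    """TPM-style extend: new_register = H(old_register || H(component))."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(register + measurement).digest()

def measure_boot_chain(components):
    """Fold the measurements of all boot components into one register value."""
    register = bytes(32)  # the register starts as all zeros
    for component in components:
        register = extend(register, component)
    return register

# Placeholder byte strings standing in for the real binaries.
boot_chain = [b"bootloader image", b"os kernel image", b"kernel module"]
final_value = measure_boot_chain(boot_chain)
```

Changing any component, or the order in which components are loaded, produces a different final value, so a single register summarizes the whole boot sequence.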

Certification Service

Once a configuration is achieved and logged, the TPM can certify configuration to others (attestation).

Encryption Service

Encrypts data so that it can only be decrypted by a machine with a certain configuration.

TPM maintains a master secret key unique to machine.

Criticisms Against TPM

Root of trust
Anti-competitive effect

Secure Enclave and SGX

Motivation: apps not protected from privileged code attacks.

Approach: reduce the attack surface of the app with SGX

SGX

Intel Software Guard Extensions.

Built into Intel CPUs
The built-in CPU instructions allow user-level as well as OS code to define private regions of memory, called enclaves
Contents in enclaves are encrypted and unable to be either read or written by any process outside the enclave (including privileged processes).

SGX enabled processors offer two crucial properties.

Isolation: Each enclave’s environment is isolated from the untrusted software outside the enclave, as well as from other enclaves.
Attestation: A software attestation scheme that allows a remote party to authenticate the software running inside an enclave.

How Secure Enclaves Work

Application is built with trusted and untrusted parts
Trusted and untrusted parts are explicitly separated by app developers
ECALL: Call from untrusted code into a trusted function inside the enclave
OCALL: Call from inside the enclave out to an untrusted function

SGX Limitations

SGX does not defend against software side-channel adversaries!

Access Control Models

Bell-LaPadula (BLP), Biba, Clark-Wilson, Chinese Wall

Discussion Questions

Should we accept Intel as a root of trust?
What are some use cases for Trusted Computing in addition to disk encryption (e.g., Bitlocker)?

Bitcoin

SKIP THIS :P

Information Flow

Noninterference

Private data does not interfere with network communication
Baseline confidentiality policy

Health Information Technology

Genomic Privacy Attack

How much is an individual’s genomic privacy threatened by relatives revealing their genomes?

Human DNA sequence is identical at 99.5% of the positions.

SNP (Single Nucleotide Polymorphism):

Positions where a nucleotide is different between people
Define physical characteristics, indicator of diseases
50 million SNP positions

Mobile OS Security

PC vs. smart phones

Users: root privileges typically not given to user
Persistent personal data, persistent login within apps
Battery performance is an issue (implementing some security features may drain battery)
Network usage can be expensive
Location Data (GPS and Wifi-based tracking)
Premium SMS Messages (expensive)
Placing and recording phone calls
Different authentication mechanisms
Mobile payments
Specific third-party app markets

ALRIGHT I GIVE UP!!!

Written on December 10, 2024
