Review of CS 463 Materials
This is a review of the CS 463 lecture materials. I simply copy-paste some of them here as a review for the final.
Table of Contents
- Table of Contents
- Introduction
- Social Networks
- De-Identification
- Machine Learning in Security
- Cryptography
- Trusted Computing
- Bitcoin
- Information Flow
- Health Information Technology
- Mobile OS Security
Introduction
Define Computer Security
A collection of properties that hold in a system in the presence of an adversary under a set of constraints.
Definitions
- Confidentiality (privacy): Prevent unauthorized parties from accessing certain data/system
- Integrity: Prevent unauthorized parties from tampering with certain data/system
- Availability: Make sure certain data/system is available to users
- Authenticity: Proof of true identity/origin
- Anonymity: Cannot be distinguished from others
- Accountability: The ability to identify the responsible party
Defenses
- Backups
- Automatic updates
- Two-factor authentication
2FA Bypass (Real Time Phishing)
A phishing site asks the user for their 2FA code while the attacker simultaneously uses that code to log in to the real site.
Social Networks
Homophily
The tendency of individuals to associate and bond with similar others.
- Choice Homophily: Closeness due to preferences by the individual. Example: Favorite teams
- Induced Homophily: Closeness due to other constraints. Examples: Geographic closeness, Age closeness with friends.
- Value Homophily: Individuals with similar values or ways of thinking. Example: Religion
- Status Homophily: Individuals with similar social status. Example: Aristocracy
Age Inference
The ages of a user's friends should be similar to the user's own age. Likewise, the friends' high-school graduation years should be close to the user's high-school graduation year.
Baseline
Just take the mean / median of the known ages in the whole dataset as the age estimate.
Approach
- Train a linear-regression model of Birth Year (BY) given the High-school Graduation Year (HSY).
- For the users with known ages: use them to train the linear-regression model.
- For the users with HSY, use the model to get estimated BY.
- For the users without HSY, if enough friends with HSY available, estimate the BY with the most frequent HSY of friends.
- Iterative approach: for those without enough friends, iteratively estimate the HSY and BY until the whole graph gets covered.
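The steps above can be sketched in a few lines. This is only an illustration under assumed data: the training pairs, the `min_friends` cutoff, and the function names `fit_line` and `estimate_birth_year` are made up here, and the iterative pass over the whole graph is left out.

```python
from collections import Counter

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def estimate_birth_year(user, friends_hsy, model, min_friends=3):
    """Return an estimated birth year, or None if there is not enough data.

    user: dict with an optional "hsy" key (high-school graduation year).
    friends_hsy: known graduation years among the user's friends.
    """
    a, b = model
    hsy = user.get("hsy")
    if hsy is None:
        if len(friends_hsy) < min_friends:
            return None  # left for a later iteration, once more friends are labeled
        hsy = Counter(friends_hsy).most_common(1)[0][0]  # most frequent HSY of friends
    return round(a * hsy + b)

# Hypothetical training data: users who disclose both HSY and BY.
train_hsy = [2008, 2010, 2012, 2014]
train_by = [1990, 1992, 1994, 1996]
model = fit_line(train_hsy, train_by)

print(estimate_birth_year({"hsy": 2011}, [], model))           # user with HSY
print(estimate_birth_year({}, [2013, 2013, 2012], model))      # falls back to friends
print(estimate_birth_year({}, [2013], model))                  # too few friends
```

Users that come back as `None` are the ones the iterative step would revisit after their friends have been assigned estimates.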
What if the user has not made their friend list public?
Use reverse lookup: find the users whose public friend lists include this user.
Discussion Questions
- How can social networks be best used by advertisers? (Think like an advertiser or social network vendor)
- Are there alternative approaches to social networking that may limit inference of attributes about users? (Consider architecture, business models, regulation, etc.)
De-Identification
Case Studies
- GIC incident: Re-identification of the governor with ZIP code + Birth date + Sex
- AOL incident: Released search logs were linked back to individual searchers
- Netflix incident: Users were identified from as few as 8 movie ratings
k-anonymity
Every combination of quasi-identifier values (e.g., zip code, sex, birth date) that appears in the data must appear in at least k records.
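A minimal sketch of checking this property, assuming records are plain dictionaries. The sample records, the column names, and the helper `is_k_anonymous` are invented for illustration.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Made-up records with generalized ZIP codes and birth years.
records = [
    {"zip": "618**", "sex": "F", "birth": "199*", "diagnosis": "flu"},
    {"zip": "618**", "sex": "F", "birth": "199*", "diagnosis": "asthma"},
    {"zip": "606**", "sex": "M", "birth": "198*", "diagnosis": "flu"},
    {"zip": "606**", "sex": "M", "birth": "198*", "diagnosis": "flu"},
]
qi = ["zip", "sex", "birth"]

print(is_k_anonymous(records, qi, 2))  # each group has 2 records
print(is_k_anonymous(records, qi, 3))  # no group has 3 records
```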
Metrics
- l-diversity: Within each quasi-identifier group, there must be at least l distinct values for each sensitive attribute
- t-closeness: The distance between the distribution of a sensitive attribute within a quasi-identifier group and its overall distribution should not exceed t
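To make l-diversity concrete, here is a small sketch. The records, the choice of `diagnosis` as the sensitive attribute, and the helper `is_l_diverse` are assumptions for illustration. The example table is 2-anonymous but still leaks the diagnosis of everyone in the second group.

```python
from collections import defaultdict

def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every quasi-identifier group has at least l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())

# Made-up records: two groups of two records each.
records = [
    {"zip": "618**", "sex": "F", "diagnosis": "flu"},
    {"zip": "618**", "sex": "F", "diagnosis": "asthma"},
    {"zip": "606**", "sex": "M", "diagnosis": "flu"},
    {"zip": "606**", "sex": "M", "diagnosis": "flu"},
]

print(is_l_diverse(records, ["zip", "sex"], "diagnosis", 1))
print(is_l_diverse(records, ["zip", "sex"], "diagnosis", 2))  # second group only has "flu"
```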
Differential Privacy
What can be learned from accessing the database is (roughly) the same regardless of whether an individual is in the database.
Sensitivity
Sensitivity measures how much an individual record can change the output f(D)
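In symbols, using notation that is common in the differential privacy literature but does not appear in these notes ($D$ and $D'$ are any two databases that differ in one record, and $\Delta f$ denotes the sensitivity of $f$):

```latex
\Delta f = \max_{D,\, D'} \lVert f(D) - f(D') \rVert_1
```

For a counting query, adding or removing one record changes the count by at most 1, so $\Delta f = 1$.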
Laplacian Mechanism
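Add noise drawn from a Laplace distribution to the query answer. The scale of the noise grows with the sensitivity and shrinks as the privacy parameter epsilon gets larger.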
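A minimal sketch of the mechanism for a counting query, using only the standard library. The function names, the sample ages, and the choice of epsilon are assumptions. The noise scale is sensitivity divided by epsilon, which is the usual calibration.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Return the true answer plus Laplace noise with scale sensitivity / epsilon."""
    return true_answer + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)          # seeded only so the example is repeatable
ages = [23, 35, 41, 29, 52]     # made-up data
true_count = sum(1 for a in ages if a > 30)   # counting query, sensitivity 1

noisy_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.5, rng=rng)
print(true_count, noisy_count)
```

A smaller epsilon means a larger noise scale, so answers are more private but less accurate.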
In Practice
Set a privacy budget. Each query consumes some of the remaining budget. Once the budget runs out, stop answering queries.
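The bookkeeping could look like the sketch below, which assumes each query is charged its own epsilon and that the charges simply add up. The class name `PrivacyBudget` and the numbers are hypothetical.

```python
class PrivacyBudget:
    """Tracks the remaining epsilon and refuses queries once it is used up."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def spend(self, epsilon):
        """Charge one query; return False (and charge nothing) if it cannot be afforded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            return False
        self.remaining -= epsilon
        return True

budget = PrivacyBudget(1.0)
answered = [budget.spend(0.5), budget.spend(0.5), budget.spend(0.5)]
print(answered)           # the third query is refused
print(budget.remaining)
```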
HIPAA (Mechanism in Practice)
The Health Insurance Portability and Accountability Act (HIPAA) of 1996. In particular, it addresses the security and privacy of health data.
HIPAA Privacy Rule
Two options for de-identification
- Safe Harbor: redaction of 18 sensitive attributes
- Expert Determination: e.g., statistician certifies risk of re-identification is “small”
Discussion Questions
- What should we be concerned about in terms of privacy?
- What techniques would you use to de-identify a dataset?
Machine Learning in Security
DeepLog: Anomaly Detection through Deep Learning
- Anomaly detection from system logs
- Challenges
- Large volume of data
- Sequential data
- Unstructured data
- Outlier detection
- High cost of errors
- Semantic gaps: difficult to turn results into actionable reports for the network operator
- Diversity with data and concept drift
- Difficulties with evaluations
Taxonomy
Influence
- Causative attacks alter the training process through influence over the training data (poisoning, backdoor attacks)
- Exploratory attacks do not alter the training process but use other techniques, such as probing the detector, to discover information about it or its training data (evasion, privacy attacks)
Background Knowledge
- White-box attacks
- Black-box attacks
Security Violation
- Integrity attacks result in intrusion points being classified as normal (i.e., cause false negatives)
- Availability attacks cause so many classification errors (e.g., false positives), that the system becomes effectively unusable
- Privacy violation: the adversary obtains information from the learner, compromising the secrecy or privacy of the system’s users
Specificity
- Targeted attack
- Indiscriminate attack
Case Study
Poisoning Attack for Traffic Anomalies Detection
- Goal is to launch a DoS on some victim
- Add additional traffic called chaff over the targeted flow (causative attack)
- Chaff selection can be locally informed or globally informed
Boiling Frog Poisoning
- Set a parameter theta that controls the intensity of the attack: start small and increase it slowly over time.
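The idea can be illustrated with a toy stand-in. The detector below is NOT the traffic anomaly detector from the lecture: it is a made-up threshold on the mean of recently accepted volumes, retrained on whatever it accepts. The class `MeanDetector`, the volumes, and the step size are all assumptions.

```python
from collections import deque

class MeanDetector:
    """Toy detector: flag traffic above 1.5x the mean of recently accepted volumes."""

    def __init__(self, history, window=10, factor=1.5):
        self.window = deque(history, maxlen=window)
        self.factor = factor

    def threshold(self):
        return self.factor * sum(self.window) / len(self.window)

    def observe(self, volume):
        """Return True if flagged. Unflagged volumes are used for retraining."""
        if volume > self.threshold():
            return True
        self.window.append(volume)
        return False

baseline = 100.0
target = 400.0

# Direct attack: jumping straight to the target volume is flagged.
direct = MeanDetector([baseline] * 10)
print(direct.observe(target))

# Boiling frog: theta (the chaff volume) starts small and grows by a small step.
gradual = MeanDetector([baseline] * 10)
theta, step = 0.0, 5.0
flags = []
while baseline + theta < target:
    theta += step
    flags.append(gradual.observe(baseline + theta))

print(any(flags))             # whether any gradual step was flagged
print(gradual.threshold())    # the poisoned threshold now sits above the target
```

Because every accepted point shifts the training window, the slow ramp drags the threshold upward until the target volume looks normal.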
Impersonation Attack
- Perturbation applied via glasses
- Constrained by: smooth transitions among pixels, printability of RGB values
- Limitations: low success rate for some targets
- Some variations in lighting
Discussion Questions
- How can you attack the spam filtering model discussed in the lecture?
- Do you think ML will replace human analysts in detecting security threats? Why or why not?
- How can we defend against the adversarial machine learning attacks mentioned in the lecture?
Cryptography
Symmetric
AES
Hash
MD5, SHA1, SHA2, SHA3
Asymmetric
RSA (security relies on the difficulty of factoring the product of two large prime numbers)
RSA vs. AES
- AES is roughly 1000x faster than RSA
- AES is less complex than RSA
- AES has roughly 10x shorter keys than RSA (e.g., 192 bits vs. 2048 bits)
- RSA requires no shared secrets
Digital Signature
Based on RSA
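A toy sketch of hash-then-sign with textbook RSA: sign with the private exponent, verify with the public one. The primes are tiny and there is no padding, so this is for intuition only. The helper names and the message are made up.

```python
import hashlib

# Toy key: tiny primes, illustration only.
p, q = 61, 53
n = p * q                     # public modulus
e = 17                        # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

def digest_mod_n(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    return pow(digest_mod_n(message), d, n)      # uses the private key

def verify(message: bytes, signature: int) -> bool:
    return pow(signature, e, n) == digest_mod_n(message)   # uses the public key

message = b"pay Bob 10"
signature = sign(message)
print(verify(message, signature))                # genuine signature
print(verify(message, (signature + 1) % n))      # tampered signature
```

Anyone holding the public key (n, e) can verify, but only the holder of d can produce a valid signature.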
IND-CPA
Indistinguishability under Chosen Plaintext Attack.
Homomorphic Encryption
FHE
Fully Homomorphic Encryption. Supports both addition and multiplication on ciphertexts. Not efficient.
PHE
Partially Homomorphic Encryption. Supports only one operation on ciphertexts, either addition or multiplication (textbook RSA is a PHE that supports multiplication).
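The multiplicative property of textbook RSA is easy to check with a toy key. The primes and messages below are arbitrary small values chosen for illustration, and the scheme is unpadded.

```python
# Toy textbook RSA key (tiny primes, no padding).
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))   # Python 3.8+

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

m1, m2 = 7, 12
product_ciphertext = (enc(m1) * enc(m2)) % n   # computed without the private key

print(dec(product_ciphertext))   # 84, i.e. m1 * m2
```

Multiplying the two ciphertexts yields a valid encryption of the product, because (m1^e)(m2^e) = (m1 m2)^e mod n.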
Applications:
- e-Voting
- Digital cash
- Private matching
Private Set Intersection
The client has a set C of n items and the server has a set S of m items. They want to compute the intersection of C and S without revealing anything else about C or S.
Use homomorphic encryption.
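The notes do not say which protocol the lecture used. One well-known construction has the client encode its set as the roots of a polynomial, encrypt the coefficients with an additively homomorphic scheme, and let the server evaluate the polynomial blindly. The sketch below assumes toy Paillier encryption with small, insecure primes and items that are small non-negative integers. All function names are made up.

```python
import math
import random

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key (two Mersenne primes, far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))

def encrypt(n, m, rng):
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2   # uses g = n + 1

def decrypt(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def poly_from_roots(roots, n):
    """Coefficients (lowest degree first) of prod (x - root), reduced mod n."""
    coeffs = [1]
    for root in roots:
        shifted = [0] + coeffs                        # coeffs * x
        scaled = [-root * a for a in coeffs] + [0]    # coeffs * (-root)
        coeffs = [(u + v) % n for u, v in zip(shifted, scaled)]
    return coeffs

def server_respond(n, enc_coeffs, server_set, rng):
    """For each s, return Enc(blind * P(s) + s) without learning P."""
    n2 = n * n
    replies = []
    for s in server_set:
        enc_p = 1                                     # encryption of 0
        for j, c in enumerate(enc_coeffs):
            enc_p = enc_p * pow(c, pow(s, j, n), n2) % n2
        blind = rng.randrange(1, n)
        replies.append(pow(enc_p, blind, n2) * encrypt(n, s, rng) % n2)
    rng.shuffle(replies)
    return replies

rng = random.Random(463)
n, priv = keygen()
client_set = [3, 7, 11, 20]
server_set = [5, 7, 20, 42]

enc_coeffs = [encrypt(n, a, rng) for a in poly_from_roots(client_set, n)]
replies = server_respond(n, enc_coeffs, server_set, rng)
result = {decrypt(n, priv, c) for c in replies} & set(client_set)
print(sorted(result))
```

If s is in C then P(s) = 0 and the reply decrypts to s itself. Otherwise the blinding factor turns it into a value that looks random to the client.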
Searchable Encryption
The client encrypts documents and sends them to the server. Later, the client asks the server to return the documents containing an encrypted keyword.
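One simple way to picture this is an index keyed by keyword tokens that only the client can compute. The sketch below assumes HMAC-SHA256 tokens and skips the actual document encryption. The key, the documents, and the function names are made up, and real schemes are designed to leak less than this one does.

```python
import hashlib
import hmac
from collections import defaultdict

def keyword_token(key: bytes, keyword: str) -> str:
    """Deterministic token for a keyword; only the key holder can compute it."""
    return hmac.new(key, keyword.lower().encode(), hashlib.sha256).hexdigest()

def build_index(key, documents):
    """Client side: map each keyword token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index[keyword_token(key, word)].add(doc_id)
    return dict(index)

def server_search(index, token):
    """Server side: look up a token without ever seeing the keyword."""
    return sorted(index.get(token, set()))

key = b"client-secret-key"   # hypothetical key, kept by the client
documents = {                # plaintexts, which would be encrypted before upload
    1: "tpm attestation report",
    2: "sgx enclave attestation",
    3: "bitcoin mining",
}
index = build_index(key, documents)   # uploaded alongside the encrypted documents

print(server_search(index, keyword_token(key, "attestation")))   # [1, 2]
print(server_search(index, keyword_token(key, "aes")))           # []
```

The server still learns which documents match each token and when a query repeats, which is the kind of leakage to weigh when comparing designs.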
Discussion Questions
- Why not just trust the cloud provider?
- What other problems could be solved using Private Set Intersection?
- Are there alternative architectures for searchable encryption?
Trusted Computing
Trusted Computing allows “a piece of data to dictate what Operating System and Application must be used to open it”
TPM
Trusted Platform Module.
Hardware that provides encryption, certification, authenticated boot.
Secure Boot
Hash the bootloader, OS kernel, kernel modules, etc., and chain the hashes together.
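The chaining can be sketched as a running register that is "extended" with each component's hash, in the style of a TPM platform configuration register. SHA-256, the component contents, and the function names are assumptions for illustration.

```python
import hashlib

def extend(register: bytes, component: bytes) -> bytes:
    """Fold one component's hash into the running measurement."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(register + measurement).digest()

def measure_boot(components):
    """Return the final register value (hex) after measuring each stage in order."""
    register = bytes(32)   # starts as all zeros
    for blob in components:
        register = extend(register, blob)
    return register.hex()

good = [b"bootloader v1", b"kernel v5", b"module net"]
tampered = [b"bootloader v1", b"kernel v5 + rootkit", b"module net"]
reordered = [b"kernel v5", b"bootloader v1", b"module net"]

print(measure_boot(good) == measure_boot(list(good)))   # same chain, same value
print(measure_boot(good) == measure_boot(tampered))     # any change shows up
print(measure_boot(good) == measure_boot(reordered))    # order matters too
```

Since each step hashes the previous value, the final register commits to every component and to the order in which they were loaded.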
Certification Service
Once a configuration is achieved and logged, the TPM can certify that configuration to others (attestation).
Encryption Service
Encrypts data so that it can only be decrypted by a machine with a certain configuration.
The TPM maintains a master secret key unique to the machine.
Criticisms Against TPM
- Root of trust
- Anti-competitive effect
Secure Enclave and SGX
Motivation: apps not protected from privileged code attacks.
Approach: reduce the attack surface of the app with SGX
SGX
Intel Software Guard Extensions.
- Built into Intel CPUs
- The built-in CPU instructions allow user-level as well as OS code to define private regions of memory, called enclaves
- Contents of enclaves are encrypted and cannot be read or written by any process outside the enclave (including privileged processes).
SGX enabled processors offer two crucial properties.
- Isolation: Each enclave’s environment is isolated from the untrusted software outside the enclave, as well as from other enclaves.
- Attestation: A software attestation scheme that allows a remote party to authenticate the software running inside an enclave.
How Secure Enclaves Work
- Application is built with trusted and untrusted parts
- Trusted and untrusted parts are explicitly separated by app developers
- ECALL: A call from the untrusted part into a trusted function inside the enclave
- OCALL: A call from inside the enclave out to an untrusted function
SGX Limitations
SGX does not defend against software side-channel adversary!
Access Control Models
Bell-LaPadula (BLP), Biba, Clark-Wilson, Chinese Wall
Discussion Questions
- Should we accept Intel as a root of trust?
- What are some use cases for Trusted Computing in addition to disk encryption (e.g., Bitlocker)?
Bitcoin
SKIP THIS :P
Information Flow
Noninterference
- Private data does not interfere with network communication
- Baseline confidentiality policy
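One way to read the policy is that varying the private input must never change what is sent over the network. The sketch below tests that on two made-up programs. The function names and inputs are hypothetical, and a finite test like this can only refute noninterference, not prove it.

```python
def leaky(public: int, secret: int) -> int:
    """Network output depends on the secret, so this violates noninterference."""
    return public + (1 if secret > 100 else 0)

def safe(public: int, secret: int) -> int:
    """The secret is used internally but never reaches the network output."""
    _ = secret * 2
    return public * 2

def appears_noninterfering(program, publics, secrets):
    """For each public input, the output must be identical across all secrets."""
    return all(
        len({program(pub, sec) for sec in secrets}) == 1
        for pub in publics
    )

publics = range(5)
secrets = [0, 50, 500]

print(appears_noninterfering(safe, publics, secrets))
print(appears_noninterfering(leaky, publics, secrets))
```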
Health Information Technology
Genomic Privacy Attack
How much is an individual's genomic privacy threatened when their relatives reveal their own genomes?
Human DNA sequence is identical at 99.5% of the positions.
SNP (Single Nucleotide Polymorphism):
- Positions where a nucleotide is different between people
- Define physical characteristics, indicator of diseases
- 50 million SNP positions
Mobile OS Security
PC vs. smart phones
- Users: root privileges typically not given to user
- Persistent personal data, persistent login within apps
- Battery performance is an issue (implementing some security features may drain battery)
- Network usage can be expensive
- Location Data (GPS and Wifi-based tracking)
- Premium SMS Messages (expensive)
- Placing and recording phone calls
- Different authentication mechanisms
- Mobile payments
- Specific third-party app markets
ALRIGHT I GIVE UP!!!