Review of CS 463 Materials

This is a review of the CS 463 lecture materials. I simply copy-paste some of them here as a review for the final.

Table of Contents

Introduction

Define Computer Security

A collection of properties that hold in a system in the presence of an adversary under a set of constraints.

Definitions

  • Confidentiality (privacy): Prevent unauthorized parties from accessing certain data/system
  • Integrity: Prevent unauthorized parties from tampering with certain data/system
  • Availability: Make sure certain data/system is available to users
  • Authenticity: Proof of true identity/origin
  • Anonymity: Cannot be distinguished from others
  • Accountability: The ability to identify the responsible party

Defenses

  • Backups
  • Automatic updates
  • Two-factor authentication

2FA Bypass (Real-Time Phishing)

The phishing site prompts the user for their 2FA code while the attacker, in real time, logs in to the real site with the stolen credentials and code.

Social Networks

Homophily

The tendency of individuals to associate and bond with similar others.

  • Choice Homophily: Closeness due to preferences by the individual. Example: Favorite teams
  • Induced Homophily: Closeness due to other constraints. Examples: Geographic closeness, Age closeness with friends.
  • Value Homophily: Individuals with similar values, thinking. Example: Religion
  • Status Homophily: Individuals with similar social status. Example: Aristocracy

Age Inference

The ages of a user's friends tend to be similar to the user's own age; in particular, friends' high-school graduation years cluster around the user's high-school graduation year.

Baseline

Just take the mean / median of the known ages in the whole dataset as the age estimate.

Approach

  1. Train a linear-regression model that predicts Birth Year (BY) from High-school Graduation Year (HSY).
  2. Use the users with known ages as the training data for this model.
  3. For users with a known HSY, use the model to estimate their BY.
  4. For users without an HSY who have enough friends with known HSYs, take the most frequent HSY among those friends as the user's HSY and estimate BY from it.
  5. Iterative approach: for users without enough such friends, iteratively propagate HSY and BY estimates until the whole graph is covered (see the sketch below).
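
A minimal sketch of this pipeline; the data layout (`train`, `hsy`, `friends`) and the "enough friends" threshold of 2 are all invented for illustration:

```python
# Toy sketch of the iterative age-inference approach.
from collections import Counter

def fit_line(xs, ys):
    # Ordinary least squares for BY = a * HSY + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Steps 1-2: train on users whose BY and HSY are both known.
train = {"u1": (2000, 2018), "u2": (2001, 2019), "u3": (1999, 2017)}
a, b = fit_line([h for _, h in train.values()],
                [by for by, _ in train.values()])

hsy = {u: h for u, (_, h) in train.items()}         # known HSYs
friends = {"u4": ["u1", "u2", "u3"], "u5": ["u4", "u1"]}

# Steps 4-5: propagate the most frequent friend HSY until no user changes.
changed = True
while changed:
    changed = False
    for u, fs in friends.items():
        known = [hsy[f] for f in fs if f in hsy]
        if u not in hsy and len(known) >= 2:        # "enough friends" (assumed)
            hsy[u] = Counter(known).most_common(1)[0][0]
            changed = True

# Step 3: estimated birth years from the regression model.
print({u: round(a * h + b) for u, h in hsy.items()})
```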

What if the user has not made their friend list public?

Use a reverse lookup: find the users whose public friend lists include the target.

Discussion Questions

  1. How can social networks be best used by advertisers? (Think like an advertiser or social network vendor)
  2. Are there alternative approaches to social networking that may limit inference of attributes about users? (Consider architecture, business models, regulation, etc.)

De-Identification

Case Studies

  1. GIC incident: re-identification of the governor using ZIP code + birth date + sex
  2. AOL incident: released search logs were linked back to individual searchers
  3. Netflix incident: as few as 8 movie ratings sufficed to identify users

k-anonymity

Any combination of quasi-identifier values (e.g., ZIP code, sex, birth date) must appear in at least k records.

Metrics

  • l-diversity: within each quasi-identifier group, there must be at least l distinct values of the sensitive attribute
  • t-closeness: the distance between the distribution of the sensitive attribute within a quasi-identifier group and its distribution in the overall table must not exceed t
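
Both properties are straightforward to check mechanically. A minimal sketch on a toy table (the column layout and the choice of quasi-identifiers are assumptions):

```python
# Check k-anonymity and l-diversity of a toy table.
from collections import defaultdict

rows = [
    # (zip, sex, birth_year, disease) -- disease is the sensitive attribute
    ("537**", "M", 1980, "flu"),
    ("537**", "M", 1980, "cancer"),
    ("537**", "F", 1975, "flu"),
    ("537**", "F", 1975, "flu"),
]

def groups(rows, qi=(0, 1, 2)):
    g = defaultdict(list)
    for r in rows:
        g[tuple(r[i] for i in qi)].append(r)
    return g

def is_k_anonymous(rows, k):
    return all(len(rs) >= k for rs in groups(rows).values())

def is_l_diverse(rows, l, sensitive=3):
    return all(len({r[sensitive] for r in rs}) >= l
               for rs in groups(rows).values())

print(is_k_anonymous(rows, 2))  # True: every QI combination appears twice
print(is_l_diverse(rows, 2))    # False: the (F, 1975) group only has "flu"
```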

Differential Privacy

What can be learned from accessing the database is (roughly) the same regardless of whether an individual is in the database.

Sensitivity

Sensitivity measures how much an individual record can change the output f(D)

Laplacian Mechanism

Add noise drawn from a Laplace distribution whose scale is calibrated to the sensitivity (see below).
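
In standard notation (this formulation is not from the slides), for a numeric query f the global sensitivity and the Laplace mechanism are:

```latex
\Delta f = \max_{D, D' \text{ differing in one record}} \lvert f(D) - f(D') \rvert,
\qquad
M(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right)
```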

In Practice

Set a privacy budget; each query consumes some of the remaining budget, and once the budget runs out, stop answering queries.
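
A minimal sketch of budgeted query answering for counting queries, which have sensitivity 1 (the class name, data, and epsilon values are invented; the budget accounting is simple sequential composition):

```python
# Answer counting queries with Laplace noise until the budget runs out.
import random

class PrivateCounter:
    def __init__(self, data, total_budget):
        self.data = data
        self.budget = total_budget

    def count(self, pred, epsilon):
        if epsilon > self.budget:
            return None                  # budget exhausted: stop answering
        self.budget -= epsilon           # sequential composition: epsilons add up
        true_count = sum(1 for x in self.data if pred(x))
        # Difference of two exponentials = Laplace noise with scale 1/epsilon,
        # which matches sensitivity/epsilon since a count has sensitivity 1.
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return true_count + noise

db = PrivateCounter([23, 35, 45, 61, 29], total_budget=1.0)
print(db.count(lambda age: age > 30, epsilon=0.5))  # noisy answer
print(db.count(lambda age: age > 40, epsilon=0.5))  # noisy answer
print(db.count(lambda age: age > 50, epsilon=0.5))  # None: budget spent
```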

HIPAA (Mechanism in Practice)

Health Insurance Portability and Accountability Act (HIPAA), 1996. In particular, it addresses the security and privacy of health data.

HIPAA Privacy Rule

Two options for de-identification

  1. Safe Harbor: redaction of 18 sensitive attributes
  2. Expert Determination: e.g., statistician certifies risk of re-identification is “small”

Discussion Questions

  1. What should we be concerned about in terms of privacy?
  2. What techniques would you use to de-identify a dataset?

Machine Learning in Security

DeepLog: Anomaly Detection through Deep Learning

  • Anomaly detection for system logs
  • Challenges
    • Large volume of data
    • Sequential data
    • Unstructured data
    • Outlier detection
    • High cost of errors
    • Semantic gaps: difficult to transfer results into actionable report for the network operator
    • Diversity of data and concept drift
    • Difficulties with evaluations

Taxonomy

Influence

  1. Causative attacks alter the training process through influence over the training data (poisoning, backdoor attacks)
  2. Exploratory attacks do not alter the training process but use other techniques, such as probing the detector, to discover information about it or its training data (evasion, privacy attacks)

Background Knowledge

  1. White-box attacks
  2. Black-box attacks

Security Violation

  1. Integrity attacks result in intrusion points being classified as normal (i.e., cause false negatives)
  2. Availability attacks cause so many classification errors (e.g., false positives) that the system becomes effectively unusable
  3. Privacy violations: the adversary obtains information from the learner, compromising the secrecy or privacy of the system's users

Specificity

  1. Targeted attacks
  2. Indiscriminate attacks

Case Study

Poisoning Attack for Traffic Anomalies Detection

  1. Goal is to launch a DoS on some victim
  2. Add additional traffic called chaff over the targeted flow (causative attack)
  3. Chaff selection can be locally informed or globally informed

Boiling Frog Poisoning

  1. Set a theta parameter controlling the intensity of the attack; start it small and increase it slowly over time, so that each poisoned point still slips past the retrained detector (toy simulation below).
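
A toy simulation of the idea, with a naive mean-plus-three-sigma detector that is retrained on whatever traffic it did not flag (all numbers invented):

```python
import random

random.seed(463)                     # reproducible toy run
history = [random.gauss(100, 5) for _ in range(50)]   # normal traffic volumes

def threshold(xs):
    mu = sum(xs) / len(xs)
    sd = (sum((x - mu) ** 2 for x in xs) / len(xs)) ** 0.5
    return mu + 3 * sd               # flag anything far above the mean

theta = 0.05                         # attack intensity, deliberately tiny at first
for week in range(12):
    chaff = 100 * theta              # extra traffic injected over the target flow
    observed = random.gauss(100, 5) + chaff
    if observed <= threshold(history):
        history.append(observed)     # undetected chaff poisons the next retraining
        theta *= 1.2                 # boiling frog: escalate only after success
    print(f"week {week:2d}: chaff={chaff:5.1f}  threshold={threshold(history):6.1f}")
```

Each accepted chaff point drags the retrained threshold upward, which is what eventually lets the attacker inject enough traffic for the DoS without ever tripping the alarm.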

Impersonation Attack

  1. Perturbation applied via glasses
  2. Constrained by: smooth transitions among pixels, printability of RGB values
  3. Limitations: low success rate for some targets, and only limited robustness to variations in lighting

Discussion Questions

  1. How can you attack the spam filtering model discussed in the lecture?
  2. Do you think ML will replace human analysts in detecting security threats? Why or why not?
  3. How can we defend against the adversarial machine learning attacks mentioned in the lecture?

Cryptography

Symmetric

AES

Hash

MD5, SHA1, SHA2, SHA3

Asymmetric

RSA (Prime number)

RSA vs. AES

  • AES is 1000x faster than RSA
  • AES is less complex than RSA
  • AES has 10x shorter keys than RSA (e.g., 192 bits vs. 2048 bits)
  • RSA requires no shared secrets
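
These trade-offs are why practical systems combine the two: encrypt the bulk data under a fresh AES key, and use RSA only to transport that small key. A minimal sketch using the third-party `cryptography` package (message contents invented):

```python
# Hybrid encryption: AES-GCM for the data, RSA-OAEP for the AES key.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Receiver's long-term RSA key pair (2048-bit modulus).
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Sender: encrypt the message under a fresh 128-bit AES key...
aes_key = AESGCM.generate_key(bit_length=128)
nonce = os.urandom(12)
ciphertext = AESGCM(aes_key).encrypt(nonce, b"meet at noon", None)

# ...and encrypt only that small key under RSA-OAEP (no shared secret needed).
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = public_key.encrypt(aes_key, oaep)

# Receiver: unwrap the AES key with RSA, then decrypt the bulk data with AES.
recovered = private_key.decrypt(wrapped_key, oaep)
print(AESGCM(recovered).decrypt(nonce, ciphertext, None))  # b'meet at noon'
```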

Digital Signature

Based on RSA: sign with the private key, verify with the public key.
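
A minimal sign/verify sketch with RSA-PSS, again using the `cryptography` package (the message is invented):

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
message = b"I owe Alice $10"

pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)
signature = key.sign(message, pss, hashes.SHA256())

# verify() raises InvalidSignature if the message or signature was altered.
key.public_key().verify(signature, message, pss, hashes.SHA256())
print("signature verified")
```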

IND-CPA

Indistinguishability under Chosen-Plaintext Attack: no efficient adversary can tell which of two plaintexts it chose is hidden in a challenge ciphertext.
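
One way to see the definition in action: any deterministic scheme loses the IND-CPA game, because the adversary can simply re-encrypt its two chosen plaintexts and compare. A toy sketch with textbook RSA (insecure, invented parameters):

```python
import random

p, q, e = 61, 53, 17
n = p * q
enc = lambda m: pow(m, e, n)         # deterministic: same m -> same ciphertext

m0, m1 = 42, 99                      # adversary's chosen plaintexts
b = random.randrange(2)              # challenger's secret bit
challenge = enc((m0, m1)[b])

guess = 0 if challenge == enc(m0) else 1
print(guess == b)                    # always True: the adversary wins every time
```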

Homomorphic Encryption

FHE

Fully Homomorphic Encryption. Supports both addition and multiplication on ciphertexts, but is not efficient.

PHE

Partially Homomorphic Encryption. Supports only one operation; e.g., textbook RSA supports only multiplication, so it is a PHE.
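
A toy demonstration of RSA's multiplicative homomorphism (textbook RSA with no padding and insecure parameters; the missing padding is exactly what made it fail IND-CPA above):

```python
p, q, e = 61, 53, 17
n = p * q                            # 3233
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent

enc = lambda m: pow(m, e, n)
dec = lambda c: pow(c, d, n)

a, b = 7, 12
c = enc(a) * enc(b) % n              # multiply ciphertexts only
print(dec(c), a * b % n)             # 84 84 -- the product carries through
```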

Applications:

  1. e-Voting
  2. Digital cash
  3. Private matching

Private Set Intersection

The client has a set C of n items and the server has a set S of m items; the goal is to compute C ∩ S without revealing anything more about C or S.

Use homomorphic encryption.
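
One standard instantiation (the polynomial-based protocol of Freedman et al.; the lecture may have used a different one) encrypts the coefficients of a polynomial whose roots are the client's items under an additively homomorphic scheme. A toy sketch with Paillier over insecure parameters:

```python
import math, random

p, q = 293, 433                      # toy primes; real Paillier needs ~1024-bit ones
n = p * q
n2, g = n * n, n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m % n, n2) * pow(r, n, n2) % n2

def dec(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

# Client: build P(x) with roots at its items and encrypt the coefficients.
C, S = {3, 5, 9}, {2, 5, 9, 11}
coeffs = [1]                                          # lowest degree first
for root in C:
    coeffs = [(lo - root * hi) % n
              for lo, hi in zip([0] + coeffs, coeffs + [0])]
enc_coeffs = [enc(a) for a in coeffs]

# Server: for each item s, homomorphically compute Enc(r * P(s) + s).
replies = []
for s in S:
    acc = 1                                           # decrypts to 0
    for i, Ei in enumerate(enc_coeffs):
        acc = acc * pow(Ei, pow(s, i, n), n2) % n2    # += coeff_i * s^i
    r = random.randrange(1, n)
    replies.append(pow(acc, r, n2) * enc(s) % n2)

# Client: decrypt; s comes back unchanged iff P(s) = 0, i.e. s is shared.
print(sorted(m for m in map(dec, replies) if m in C))  # [5, 9]
```

When P(s) = 0 the blinding factor vanishes and the client recovers s itself; otherwise it sees a value that is random modulo n, so it learns nothing about the server's other items (up to a collision probability that is negligible at real parameter sizes).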

Searchable Encryption

Client encrypts documents, sends them to the server, and later asks the server to return the documents containing an encrypted keyword.
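
A minimal sketch of one common flavor (a deterministic keyword-token index; not necessarily the lecture's construction), with AES-GCM for the documents and HMAC for the tokens:

```python
import hashlib, hmac, os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

enc_key = AESGCM.generate_key(bit_length=128)         # client-side secrets
mac_key = os.urandom(16)
token = lambda kw: hmac.new(mac_key, kw.encode(), hashlib.sha256).digest()

# Client: encrypt documents and build an index of keyword tokens -> doc ids.
docs = {1: b"budget meeting notes", 2: b"holiday schedule"}
store, index = {}, {}
for doc_id, text in docs.items():
    nonce = os.urandom(12)
    store[doc_id] = (nonce, AESGCM(enc_key).encrypt(nonce, text, None))
    for kw in text.decode().split():
        index.setdefault(token(kw), []).append(doc_id)

# Server: given only an opaque token, return the matching encrypted documents.
def search(tok):
    return [store[i] for i in index.get(tok, [])]

# Client: query for "meeting" and decrypt the results locally.
for nonce, ct in search(token("meeting")):
    print(AESGCM(enc_key).decrypt(nonce, ct, None))   # b'budget meeting notes'
```

Deterministic tokens leak which queries repeat and which documents match each query; that leakage is the usual price of making encrypted search efficient.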

Discussion Questions

  1. Why not just trust the cloud provider?
  2. What other problems could be solved using Private Set Intersection?
  3. Are there alternative architectures for searchable encryption?

Trusted Computing

Trusted Computing allows “a piece of data to dictate what Operating System and Application must be used to open it”

TPM

Trusted Platform Module.

Hardware that provides encryption, certification, authenticated boot.

Secure Boot

Hash the bootloader, OS kernel, kernel modules, etc., and chain the hashes together (see the sketch below).
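
The chaining is what a TPM implements with its Platform Configuration Registers: each component's hash is folded into the register, so the final value commits to the entire ordered boot sequence. A sketch (component names illustrative):

```python
import hashlib

def extend(pcr: bytes, component: bytes) -> bytes:
    # new PCR = H(old PCR || H(component)) -- the TPM extend operation
    return hashlib.sha256(pcr + hashlib.sha256(component).digest()).digest()

pcr = bytes(32)                                       # register starts at zero
for component in (b"bootloader", b"os_kernel", b"kernel_module"):
    pcr = extend(pcr, component)

print(pcr.hex())   # a fingerprint of the exact boot sequence, in order
```

Because extend is one-way and order-sensitive, malicious code cannot set the register back to a "good" value after the fact, which is what makes the later attestation meaningful.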

Certification Service

Once a configuration is achieved and logged, the TPM can certify configuration to others (attestation).

Encryption Service

Encrypts data so that it can only be decrypted by a machine with a certain configuration.

TPM maintains a master secret key unique to machine.

Criticisms Against TPM

  • Root of trust
  • Anti-competitive effect

Secure Enclave and SGX

Motivation: apps not protected from privileged code attacks.

Approach: reduce the attack surface of the app with SGX

SGX

Intel Software Guard Extensions.

  • Built into Intel CPUs
  • The built-in CPU instructions allow user-level as well as OS code to define private regions of memory, called enclaves
  • Enclave contents are encrypted and cannot be read or written by any process outside the enclave (including privileged processes).

SGX enabled processors offer two crucial properties.

  1. Isolation: Each enclave’s environment is isolated from the untrusted software outside the enclave, as well as from other enclaves.
  2. Attestation: A software attestation scheme that allows a remote party to authenticate the software running inside an enclave.

How Secure Enclaves Work

  • Application is built with trusted and untrusted parts
  • Trusted and untrusted parts are explicitly separated by app developers

  • ECALL: a call from untrusted code into a trusted function inside the enclave
  • OCALL: a call from inside the enclave out to untrusted code

SGX Limitations

SGX does not defend against software side-channel adversary!

Access Control Models

Bell-LaPadula (BLP), Biba, Clark-Wilson, Chinese Wall
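
As one example, Bell-LaPadula enforces confidentiality with two rules: "no read up" (simple security property) and "no write down" (*-property). A minimal check with invented labels:

```python
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top_secret": 3}

def blp_allows(subject, op, obj):
    s, o = LEVELS[subject], LEVELS[obj]
    if op == "read":
        return s >= o    # simple security property: no read up
    if op == "write":
        return s <= o    # *-property: no write down
    return False

print(blp_allows("secret", "read", "top_secret"))     # False: read up denied
print(blp_allows("secret", "write", "confidential"))  # False: write down denied
print(blp_allows("secret", "read", "confidential"))   # True
```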

Discussion Questions

  1. Should we accept Intel as a root of trust?
  2. What are some use cases for Trusted Computing in addition to disk encryption (e.g., Bitlocker)?

Bitcoin

SKIP THIS :P

Information Flow

Noninterference

  • Private data does not interfere with network communication
  • Baseline confidentiality policy

Health Information Technology

Genomic Privacy Attack

How much is an individual's genomic privacy threatened by their relatives revealing their genomes?

Human DNA sequence is identical at 99.5% of the positions.

SNP (Single Nucleotide Polymorphism):

  • Positions where a nucleotide is different between people
  • Define physical characteristics, indicator of diseases
  • 50 million SNP positions

Mobile OS Security

PCs vs. smartphones

  • Users: root privileges typically not given to user
  • Persistent personal data, persistent login within apps
  • Battery performance is an issue (implementing some security features may drain battery)
  • Network usage can be expensive
  • Location Data (GPS and Wifi-based tracking)
  • Premium SMS Messages (expensive)
  • Placing and recording phone calls
  • Different authentication mechanisms
  • Mobile payments
  • Specific third-party app markets

ALRIGHT I GIVE UP!!!

Written on December 10, 2024