Adversarial Machine Learning Against Voice Assistant Systems

Project Objective

This project studies the security of voice assistant systems against adversarial machine learning. Adversarial learning algorithms can generate adversarial audio samples that, when fed to a voice assistant system, fool the machine learning models inside it. We focus on white-box attacks in the digital domain: using adversarial machine learning algorithms, we generate adversarial samples to attack a speaker recognition system based on X-Vector. If time allows, we will make the attack more robust by simulating room impulse responses and then conduct over-the-air attacks.

Weekly Plan


  • Week 1
  • Week 2
  • Week 3
  • Week 4
  • Week 5
  • Week 6

Reading Material

Week 1 Activities

  • Get an ORBIT/COSMOS account and become familiar with the testbed procedures.

Week 2 Activities

  • Get familiar with the Python language.
    — Install a Python environment
    — Use Jupyter Notebook to run Python code samples
  • Learn the concepts of deep learning and deep neural networks.
    — Slides: Neural Network Basics of Energy-Efficient Machine Learning System
    — Video tutorial (Optional): Neural Networks and Deep Learning by Andrew Ng (Recommended chapters: Week 2: Logistic Regression as a Neural Network, Week 3: Shallow Neural Network)

Week 3 Activities

  • Set up a TensorFlow and Keras environment using Anaconda
    — Follow the tutorial “Basic classification: Classify Images of Clothing” to get familiar with TensorFlow and Keras
    — Read the tutorial “Simple Neural Networks in Python” (code implementation not required)
    — Read the “TensorFlow Neural Network Tutorial” and run the code implementation (optional)
  • Read the paper “X-Vectors: Robust DNN Embeddings for Speaker Recognition” (IEEE ICASSP 2018).
    — Understand the X-Vector workflow and learn the background: applications of X-Vector, the concept of a phoneme, data augmentation, etc. (If time allows, also study TDNN and MFCC.)

Week 4 Activities

  • Learn the MFCC (mel-frequency cepstral coefficient) feature extraction process and extract MFCC features using TensorFlow, based on the sample code.
  • Understand the speaker recognition system (X-Vector) and time-delay neural network.
    — Understand the concept of Time Delay Neural Network (TDNN).
    — (Optional) Learn the concept of Convolutional Neural Network (CNN) and find the similarities between CNN and TDNN. (Note: the implementation of TDNN will be based on one-dimensional CNN.)
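
The MFCC pipeline named above can be sketched in a few lines. This is a minimal NumPy illustration, not the project's TensorFlow sample code. The frame length, hop size, FFT size, and filter counts are assumed values chosen for 16 kHz audio, and the function names are hypothetical. TensorFlow's tf.signal module offers comparable building blocks (tf.signal.stft, tf.signal.linear_to_mel_weight_matrix, tf.signal.mfccs_from_log_mel_spectrograms).

```python
import numpy as np

def hz_to_mel(hz):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fbank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_mels=40, n_mfcc=13):
    """Return MFCC features of shape (n_frames, n_mfcc). All defaults are assumptions."""
    # 1. Slice the waveform into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Power spectrum of each frame (frames are zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # 3. Log mel filterbank energies.
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sample_rate).T + 1e-10)
    # 4. Unnormalized DCT-II over the mel axis; keep the first n_mfcc coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_mel @ dct.T

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
tone = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
features = mfcc(tone)
print(features.shape)  # (98, 13)
```

Each row of the result describes one 25 ms frame. A sequence of such rows is the kind of input the TDNN layers of X-Vector consume.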

Week 5 Activities

  • Learn the steps for using the X-Vector model for speaker recognition
    — Understand the 1D convolutional layer and use it to implement TDNN
    — Understand the statistics pooling layer
    — Classify speakers using Probabilistic Linear Discriminant Analysis (PLDA) trained on the embeddings from the X-Vector model
  • Study the Python code samples for X-Vector and implement X-Vector using TensorFlow
  • Read the paper “Practical Adversarial Attacks Against Speaker Recognition Systems” (HotMobile’20) and get familiar with the untargeted attack
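
The two X-Vector building blocks listed above, a TDNN layer expressed as a dilated 1D convolution and a statistics pooling layer, can be sketched as follows. This is a NumPy illustration under assumed toy dimensions (24 input features, 64 hidden units) with random, untrained weights. The function names are hypothetical, and it is not the project's TensorFlow code. PLDA scoring is not shown.

```python
import numpy as np

def tdnn_layer(x, weights, bias, context):
    """One TDNN layer written as a 1D convolution over time.

    x:       (T, D_in) frame-level features
    weights: (len(context), D_in, D_out), one weight matrix per context offset
    context: frame offsets, e.g. (-2, 0, 2) is a kernel of size 3 with dilation 2
    Assumes min(context) <= 0 <= max(context). Only frames whose full context
    lies inside the input are produced ("valid" convolution).
    """
    T = x.shape[0]
    start, end = -min(context), T - max(context)
    out = np.zeros((end - start, weights.shape[2]))
    for k, offset in enumerate(context):
        out += x[start + offset:end + offset] @ weights[k]
    return np.maximum(out + bias, 0.0)  # ReLU

def statistics_pooling(h):
    """Collapse a variable-length sequence (T, D) into a fixed vector (2 * D,)
    by concatenating the per-dimension mean and standard deviation over time."""
    return np.concatenate([h.mean(axis=0), h.std(axis=0)])

# Usage with random weights (illustrative only, nothing here is trained).
rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 24))           # 100 frames, 24 features each
w1, b1 = 0.1 * rng.standard_normal((5, 24, 64)), np.zeros(64)
w2, b2 = 0.1 * rng.standard_normal((3, 64, 64)), np.zeros(64)

h1 = tdnn_layer(frames, w1, b1, context=(-2, -1, 0, 1, 2))  # shape (96, 64)
h2 = tdnn_layer(h1, w2, b2, context=(-2, 0, 2))             # shape (92, 64)
pooled = statistics_pooling(h2)                             # shape (128,)
print(h1.shape, h2.shape, pooled.shape)
```

Because pooling removes the time axis, utterances of any length map to a vector of the same size. That fixed size is what lets the later layers produce a single fixed-length embedding per utterance for PLDA.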

Week 6 Activities

  • Develop an untargeted attack that can generate adversarial samples based on the sample code and tutorial.
    — Understand the Fast Gradient Sign Method (FGSM)
    — Understand cross-entropy as the cost function
  • Evaluate the performance of the adversarial samples on the voice assistant system (X-Vector).
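
A minimal sketch of an untargeted FGSM step with a cross-entropy cost is shown below. To keep it self-contained, a small linear softmax classifier with random weights stands in for the X-Vector model; that stand-in, the dimensions, and the epsilon value are all assumptions for illustration. With a real TensorFlow model, the input gradient would typically come from tf.GradientTape rather than the closed-form expression used here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(probs, label):
    return -np.log(probs[label])

def fgsm_untargeted(x, label, W, b, epsilon):
    """One FGSM step: move x by epsilon in the direction that increases
    the cross-entropy loss of the true label.

    For logits = W @ x + b, the gradient of the loss with respect to x
    is W.T @ (softmax(logits) - onehot(label)).
    """
    probs = softmax(W @ x + b)
    onehot = np.zeros_like(probs)
    onehot[label] = 1.0
    grad_x = W.T @ (probs - onehot)
    x_adv = x + epsilon * np.sign(grad_x)
    return np.clip(x_adv, -1.0, 1.0)  # keep samples in the valid audio range

# Usage on a toy 4-speaker classifier with 16 input features (hypothetical sizes).
rng = np.random.default_rng(0)
n_speakers, n_features = 4, 16
W = rng.standard_normal((n_speakers, n_features))
b = np.zeros(n_speakers)
x = rng.uniform(-0.5, 0.5, size=n_features)
label = int(np.argmax(W @ x + b))  # treat the clean prediction as the true speaker

x_adv = fgsm_untargeted(x, label, W, b, epsilon=0.01)
loss_clean = cross_entropy(softmax(W @ x + b), label)
loss_adv = cross_entropy(softmax(W @ x_adv + b), label)
print(loss_clean, loss_adv)  # the adversarial loss is the larger of the two
```

The perturbation is bounded by epsilon in every sample, which is what keeps the adversarial audio close to the original. The attack counts as successful when the model's prediction for x_adv no longer matches the true speaker, and a single small step like this one may not be enough to change it.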

Project Website

Last modified on Jul 9, 2020, 5:27:45 PM