
Real time, robust and reliable (R3) machine learning over wireless networks

Group Members: Akshar Vedantham, Kirthana Ram, Varun Kota

Advisor: Anand Sarwate

Project Overview

As machine learning applications continue to be developed, increasingly computation-intensive tasks will have to be performed on mobile devices such as phones, cars, and drones. Mobile devices often offload data to the cloud to help execute these applications. However, offloading computation to the cloud can introduce delays and increased latency.

To reduce latency when working with the cloud, several methods have been proposed. The two that we will be focusing on are called split computing and early exiting. Our goal will be to construct AI/ML algorithms, implement them on Orbit nodes using split computing and early exiting, and build a documented codebase while evaluating the efficiency of these algorithms.
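The core idea of early exiting is to attach auxiliary classifiers to intermediate layers of a network and stop inference as soon as a prediction is confident enough, saving the cost of the deeper layers (and of any offload to the cloud). A minimal sketch of one common exit rule, a max-softmax confidence threshold; the function names and threshold value are illustrative, not taken from the project's code:

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def should_exit_early(exit_logits, threshold=0.9):
    """Return (exit?, predicted class, confidence) for one early-exit head."""
    probs = softmax(np.asarray(exit_logits, dtype=float))
    conf = float(probs.max())
    return conf >= threshold, int(probs.argmax()), conf
```

A very confident head (e.g. logits `[0.0, 8.0, 0.0]`) clears the threshold and exits; a near-uniform head (e.g. `[1.0, 1.1, 0.9]`) falls through to the later layers.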

Weekly Progress

Week 1 (5/28 - 5/30)

  • Phones, cars, and other devices will want to start using ML/AI applications
  • Leveraging the cloud to help them with this
  • Issues - latency and security

Possible Solution - Early Exiting

Week 2 (6/03 - 6/06)

  • Familiarizing ourselves with Machine Learning concepts, PyTorch, neural network architecture, gradient descent, cost function, weights and biases
  • Met with our advisors, learned about their work, and discussed what projects we wanted to work on
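The gradient-descent mechanics reviewed this week can be shown on a toy one-parameter cost function; the learning rate, step count, and starting point below are illustrative choices, not values from the project:

```python
# Minimize the cost J(w) = (w - 3)^2 with plain gradient descent:
# repeatedly update w <- w - lr * dJ/dw until w approaches the minimum.
def gradient_descent(lr=0.1, steps=100, w=0.0):
    for _ in range(steps):
        grad = 2 * (w - 3)  # derivative of the cost at the current w
        w -= lr * grad
    return w

print(gradient_descent())  # converges toward w = 3
```

The same update rule, applied to every weight and bias via backpropagation, is what PyTorch's optimizers automate.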

Week 3 (6/10 - 6/13)

  • Created an NN using the MNIST dataset
  • Achieved an overall network accuracy of 98.17%
  • Worked on an NN for classifying fashion outfits via image recognition
  • Read several research papers given to us
  • Worked with Orbit to familiarize ourselves with communicating between nodes
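The exact architecture behind the 98.17% MNIST result is not documented here; as a hedged sketch, a small fully connected PyTorch classifier of the kind typically used for MNIST (layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class MNISTNet(nn.Module):
    """Hypothetical MNIST classifier: 28x28 grayscale image -> 10 digit logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),            # (N, 1, 28, 28) -> (N, 784)
            nn.Linear(28 * 28, 128), # hidden layer size is an illustrative choice
            nn.ReLU(),
            nn.Linear(128, 10),      # one logit per digit class
        )

    def forward(self, x):
        return self.net(x)
```

Trained with cross-entropy loss and an optimizer such as Adam or SGD, a model of roughly this shape reaches high-90s accuracy on MNIST.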

Week 4 (6/17 - 6/20)

  • Created an NN using the CIFAR-10 dataset

Week 5 (6/24 - 6/27)

  • Compared the mean accuracy and standard deviation for different thresholds
  • Compared the mean number of early exits and standard deviation for different thresholds
  • Set up NVIDIA CUDA on Orbit nodes
  • Trained and tested AI models on the nodes (an alternative to Google Colab)
  • Encountered hardware issues when measuring latency with PTP (Precision Time Protocol)
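The threshold comparison above can be reproduced in miniature; the confidence values below are simulated with a uniform distribution purely for illustration, whereas the real experiments used the trained network's softmax outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def exit_fraction(confidences, threshold):
    """Fraction of samples whose confidence clears the early-exit threshold."""
    return float(np.mean(confidences >= threshold))

# simulated max-softmax confidences: 5 independent runs of 1000 samples each
runs = rng.uniform(0.3, 1.0, size=(5, 1000))

for threshold in (0.5, 0.7, 0.9):
    fracs = [exit_fraction(run, threshold) for run in runs]
    print(f"threshold={threshold}: mean exit rate={np.mean(fracs):.3f}, "
          f"std={np.std(fracs):.3f}")
```

Raising the threshold trades fewer early exits (less compute saved) for higher accuracy on the samples that do exit, which is the trade-off the mean/std comparison quantifies.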

Week 6 (7/01 - 7/03)

  • Used Cosmos SB1, SB2 and Bed
  • Got PTP working on Bed, but it doesn't have a wireless connection
  • We need both PTP and a wireless connection

Week 7 (7/08 - 7/11)

  • Experimented with new variables (feature variance and entropy)
  • Feature Variance represents the diversity of features detected by CNN layers (e.g., edges, textures, shapes)
  • Entropy measures the amount of disorder or uncertainty in the model's output distribution
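Both exit signals can be computed in a few lines; this is a sketch of the standard definitions (Shannon entropy of the softmax output, and the variance of intermediate activations), not the project's exact formulation:

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def prediction_entropy(logits):
    """Shannon entropy of the softmax distribution (low entropy = confident)."""
    p = softmax(np.asarray(logits, dtype=float))
    return float(-np.sum(p * np.log(p + 1e-12)))  # epsilon guards log(0)

def feature_variance(feature_maps):
    """Variance of intermediate CNN activations, a rough feature-diversity measure."""
    return float(np.var(np.asarray(feature_maps, dtype=float)))
```

A uniform (maximally uncertain) prediction over k classes has entropy ln(k), while a one-hot prediction has entropy near zero, so an exit rule can fire when entropy drops below some threshold.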

Week 8 (7/15 - 7/18)

  • Evaluated the results of our baseline confidence model and our feature variance/entropy (FVE) model
  • Got PTP working on Orbit, not Cosmos
  • Got nodes to synchronize with each other, recorded latency
  • Sent and ran code from one node to another via SSH
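The project measured latency between PTP-synchronized wireless nodes; as a loopback-only sketch of the same idea, a TCP echo round-trip timed with a monotonic clock (ports and buffer sizes are arbitrary illustrative choices):

```python
import socket
import threading
import time

def echo_server(port, ready):
    """Minimal one-shot TCP echo server used as the round-trip target."""
    with socket.socket() as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("127.0.0.1", port))
        s.listen(1)
        ready.set()  # signal that the server is accepting connections
        conn, _ = s.accept()
        with conn:
            conn.sendall(conn.recv(64))  # echo the probe back

def measure_rtt(port):
    """Time one connect/send/receive cycle in seconds."""
    t0 = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(b"ping")
        c.recv(64)
    return time.perf_counter() - t0

ready = threading.Event()
threading.Thread(target=echo_server, args=(55055, ready), daemon=True).start()
ready.wait()
print(f"round-trip latency: {measure_rtt(55055) * 1e3:.3f} ms")
```

One-way latency is what PTP clock synchronization makes measurable; without synchronized clocks, only round-trip times like this are meaningful.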

Week 9 (7/22 - 7/25)

  • Separated the models to run on multiple nodes
  • Trained the models (FVE and confidence) before setting them up on the nodes
  • Resolved overfitting
  • Used SSH to have nodes access each other - very slow and a major security/privacy risk for both client and server
  • Began using a RESTful API interface (sends data over the internet as JSON)
    • Orders of magnitude faster than SSH
    • Models stay running and send/receive data quickly
    • More secure: the client device can't run or access programs on the server
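The client/server exchange can be sketched with Python's standard library alone; the endpoint path, port, JSON schema, and the stand-in `tail_model` below are all hypothetical (the real server ran the remaining layers of the split network and returned its prediction):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

def tail_model(features):
    # Stand-in for the server-side "tail" of a split model; here it just
    # sums the features the client sends.
    return {"prediction": sum(features)}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # read the JSON body, run the model, and reply with JSON
        length = int(self.headers["Content-Length"])
        features = json.loads(self.rfile.read(length))["features"]
        body = json.dumps(tail_model(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging

def serve(port):
    server = HTTPServer(("127.0.0.1", port), InferenceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def query(port, features):
    req = Request(f"http://127.0.0.1:{port}/infer",
                  data=json.dumps({"features": features}).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())

server = serve(8765)
print(query(8765, [1.0, 2.0, 3.0]))
server.shutdown()
```

Because the server only exposes one POST endpoint and the model process stays resident, each request avoids SSH's connection and process-spawn overhead, and the client never gets shell access to the server.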
Last modified on Jul 29, 2024, 6:41:32 PM