https://github.com/amankrsahu/deep-audio-cnn
This repository contains an implementation of a ResNet-style CNN in PyTorch for real-time environmental sound classification.
- Host: GitHub
- URL: https://github.com/amankrsahu/deep-audio-cnn
- Owner: AmanKrSahu
- Created: 2025-07-05T06:39:10.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-08-22T21:02:16.000Z (10 months ago)
- Last Synced: 2025-09-12T01:29:57.878Z (10 months ago)
- Topics: cnn-classification, fastapi, modal, nextjs, python3, pytorch, resnet, tailwindcss, tensorboard, typescript
- Language: TypeScript
- Homepage: https://deep-audio-cnn.vercel.app
- Size: 465 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Audio Classification CNN
Classify short audio clips (e.g., **dog bark**, **bird chirp**, **siren**, **rain**) with a ResNet-style CNN trained on **Mel Spectrograms**. The project includes a full **training pipeline (PyTorch)**, **FastAPI** inference service, **serverless GPU inference with Modal**, and an **interactive Next.js + React dashboard** for uploads, real-time predictions, and feature‑map visualization.

---
## ✨ Features
* 🧠 **Deep Audio CNN** for sound classification
* 🧱 **ResNet-style** architecture with residual blocks
* 🎼 **Mel Spectrogram** audio-to-image conversion
* 🎛️ **Data augmentation**: Mixup + SpecAugment (Time/Freq masking)
* ⚡ **Serverless GPU inference** with **Modal**
* 📊 **Interactive Next.js & React dashboard** (Tailwind + shadcn/ui)
* 📈 **Real-time classification** with confidence scores
* 🌊 **Waveform & Spectrogram** visualization
* 🚀 **FastAPI** inference endpoint (+ Pydantic validation)
* 📈 **TensorBoard** integration for training analysis
* ✅ **Pydantic** validation for robust API requests
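
The Pydantic validation listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual schema: the `InferenceRequest` model, its `audio_data` field, and the `decode_audio` helper are hypothetical names, and the clip is assumed to arrive as a base64 string. In a FastAPI app, declaring such a model as the handler's body parameter makes FastAPI reject invalid payloads with a 422 response before the handler runs.

```python
import base64
import binascii

from pydantic import BaseModel, Field, ValidationError


class InferenceRequest(BaseModel):
    """Hypothetical request body: one audio clip, base64-encoded."""

    # min_length=1 rejects an empty string at validation time.
    audio_data: str = Field(..., min_length=1)


def decode_audio(request: InferenceRequest) -> bytes:
    """Return the raw audio bytes, rejecting malformed base64."""
    try:
        return base64.b64decode(request.audio_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("audio_data is not valid base64") from exc


if __name__ == "__main__":
    # Stand-in bytes; a real client would send the contents of a WAV file.
    encoded = base64.b64encode(b"fake-audio-bytes").decode("ascii")
    print(len(decode_audio(InferenceRequest(audio_data=encoded))))

    try:
        InferenceRequest(audio_data="")
    except ValidationError as err:
        print("rejected:", type(err).__name__)
```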
---
## 🧱 Architecture Overview
* **Why Mel Spectrograms?** They convert audio to a perceptual time–frequency image that CNNs handle well.
* **Why ResNet?** Residual connections ease optimization of deeper models and boost accuracy.
* **Why Mixup/SpecAugment?** Strong regularization for robustness against noise and domain shift.
---
## 🧩 Project Setup
### 1. Python environment
```bash
cd server
conda create -n audio-cnn python=3.11 -y
conda activate audio-cnn
pip install -r requirements.txt
```
### 2. Next.js frontend
```bash
cd client
npm install
npm run dev
```
---
## 🔧 Environment Variables
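Create a `.env` file in the `client` directory. Judging by its name, the variable should hold the URL of your deployed Modal inference endpoint rather than a secret key; note that `NEXT_PUBLIC_` variables are exposed to the browser.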
```
NEXT_PUBLIC_MODAL_API_ENDPOINT="Your_API_Key"
```
---
## 🧰 Troubleshooting
* **Torchaudio backend errors**: make sure `ffmpeg` and `libsndfile` are installed.
* **Noisy predictions**: use longer clips, tune the Mixup `alpha`, or use fewer or narrower SpecAugment masks.
* **Overfitting**: apply stronger Mixup/SpecAugment, add Dropout to the classifier, or use early stopping.
* **Underfitting**: use a deeper ResNet, raise `base_channels`, train longer, or lower the weight decay.
---
## 🚀 Need Help?
Feel free to contact me on [LinkedIn](https://www.linkedin.com/in/amankrsahu), [Instagram](https://www.instagram.com/itz.amansahu/), or [Discord](https://discordapp.com/users/539751578866024479).