https://github.com/amankrsahu/deep-audio-cnn

This repository contains an implementation of a ResNet-style CNN in PyTorch for real-time environmental sound classification.

cnn-classification fastapi modal nextjs python3 pytorch resnet tailwindcss tensorboard typescript

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/amankrsahu/deep-audio-cnn
Owner: AmanKrSahu
Created: 2025-07-05T06:39:10.000Z (12 months ago)
Default Branch: main
Last Pushed: 2025-08-22T21:02:16.000Z (10 months ago)
Last Synced: 2025-09-12T01:29:57.878Z (10 months ago)
Topics: cnn-classification, fastapi, modal, nextjs, python3, pytorch, resnet, tailwindcss, tensorboard, typescript
Language: TypeScript
Homepage: https://deep-audio-cnn.vercel.app
Size: 465 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Audio Classification CNN

Classify short audio clips (e.g., **dog bark**, **bird chirp**, **siren**, **rain**) with a ResNet-style CNN trained on **Mel Spectrograms**. The project includes a full **training pipeline (PyTorch)**, **FastAPI** inference service, **serverless GPU inference with Modal**, and an **interactive Next.js + React dashboard** for uploads, real-time predictions, and feature‑map visualization.

---

## ✨ Features

* 🧠 **Deep Audio CNN** for sound classification
* 🧱 **ResNet-style** architecture with residual blocks
* 🎼 **Mel Spectrogram** audio-to-image conversion
* 🎛️ **Data augmentation**: Mixup + SpecAugment (Time/Freq masking)
* ⚡ **Serverless GPU inference** with **Modal**
* 📊 **Interactive Next.js & React dashboard** (Tailwind + shadcn/ui)
* 📈 **Real-time classification** with confidence scores
* 🌊 **Waveform & Spectrogram** visualization
* 🚀 **FastAPI** inference endpoint with **Pydantic** request validation
* 📈 **TensorBoard** integration for training analysis
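
The Mel Spectrogram feature rests on the mel scale, which spaces frequency bands the way pitch is perceived. The sketch below is only an illustration using the common HTK-style formula and the standard library; the function names and the `n_mels`, `f_min`, and `f_max` values are assumptions, and the README does not say which formula or settings the project uses.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """HTK-style mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, f_min: float, f_max: float) -> list[float]:
    """Edges (in Hz) of n_mels overlapping bands, equally spaced in mels."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


# Hypothetical settings, chosen only for the demo.
edges = mel_band_edges(n_mels=8, f_min=0.0, f_max=8000.0)
widths = [b - a for a, b in zip(edges, edges[1:])]
print([round(e) for e in edges])
print(widths[0] < widths[-1])  # low bands are narrower than high bands
```

Equal steps in mels become wider and wider steps in Hz, so a mel spectrogram spends more of its rows on low frequencies than a linear spectrogram does.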

---

## 🧱 Architecture Overview

* **Why Mel Spectrograms?** They convert audio to a perceptual time–frequency image that CNNs handle well.
* **Why ResNet?** Residual connections ease optimization of deeper models and boost accuracy.
* **Why Mixup/SpecAugment?** Strong regularization for robustness against noise and domain shift.
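
To make the residual-connection point concrete, here is a minimal sketch of a generic ResNet-style block in PyTorch. It is not taken from this repository: the class name, layer sizes, and the 64 x 100 input shape are assumptions chosen for illustration.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions plus a skip connection (generic ResNet-style block)."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # When the block changes the tensor shape, project the input with a
        # 1x1 convolution so it can still be added to the block output.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The skip connection: add the (possibly projected) input back in.
        return torch.relu(out + self.shortcut(x))


# Assumed input: a batch of 2 single-channel spectrograms, 64 mel bins x 100 frames.
x = torch.randn(2, 1, 64, 100)
block = ResidualBlock(1, 16, stride=2)
print(block(x).shape)  # torch.Size([2, 16, 32, 50])
```

Because the input is added back to the output, each block only has to learn a correction to its input, which is what makes deeper stacks easier to optimize.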

---

## 🧩 Project Setup

### 1. Python environment

```bash
cd server
conda create -n audio-cnn python=3.11 -y
conda activate audio-cnn
pip install -r requirements.txt
```

### 2. Next.js frontend

```bash
cd client
npm install
npm run dev
```

---

## 🔧 Environment Variables

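Create a `.env` file in the `client` directory. Going by its name, the variable should hold the URL of your deployed Modal inference endpoint, not a secret key; Next.js exposes every `NEXT_PUBLIC_` variable to the browser.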

```
NEXT_PUBLIC_MODAL_API_ENDPOINT="Your_API_Key"
```
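
For a quick check of the endpoint outside the browser, a small Python script can build the same kind of request. This is a sketch under stated assumptions: the README does not document the request format, so the JSON field name `audio_data`, the base64 encoding, and the helper name `build_inference_request` are all hypothetical, and the fallback URL is a placeholder.

```python
import base64
import json
import os
import urllib.request


def build_inference_request(endpoint_url: str, audio_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying an audio clip."""
    # ASSUMPTION: the endpoint accepts JSON with the clip as a base64 string
    # under "audio_data". Check the server code for the real schema.
    payload = json.dumps({"audio_data": base64.b64encode(audio_bytes).decode("ascii")})
    return urllib.request.Request(
        endpoint_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Reuse the variable from client/.env; the fallback is a placeholder URL.
endpoint = os.environ.get("NEXT_PUBLIC_MODAL_API_ENDPOINT", "https://example.invalid/inference")
request = build_inference_request(endpoint, b"placeholder audio bytes")
print(request.get_method(), request.full_url)

# To actually send it:
#   with urllib.request.urlopen(request, timeout=30) as response:
#       print(json.load(response))
```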

---

## Features and Interfaces

_[Images: cnn-1, cnn-2]_

---

## 🧰 Troubleshooting

* **Torchaudio backend errors**: make sure `ffmpeg` and `libsndfile` are installed.
* **Noisy predictions**: increase the clip length, tune the Mixup `alpha`, or apply fewer or narrower SpecAugment masks.
* **Overfitting**: use stronger Mixup/SpecAugment, add Dropout in the classifier, or stop training early.
* **Underfitting**: use a deeper ResNet, raise `base_channels`, train longer, or lower the weight decay.
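
Several of these tips come down to two knobs: the Mixup `alpha` and the size of the SpecAugment masks. The framework-free sketch below shows what each one controls. It works on plain nested lists rather than tensors, and the names `mixup`, `mask_band`, and `max_width` are hypothetical; the project's own augmentation code is not reproduced here.

```python
import random


def mixup(spec_a, label_a, spec_b, label_b, alpha=0.2, rng=random):
    """Blend two spectrograms and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    mixed_spec = [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(spec_a, spec_b)
    ]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_spec, mixed_label, lam


def mask_band(spec, axis, max_width, rng=random):
    """Zero a random band of rows ("freq") or columns ("time"), SpecAugment-style."""
    n_freq, n_time = len(spec), len(spec[0])
    size = n_freq if axis == "freq" else n_time
    width = rng.randint(0, min(max_width, size))  # band width, possibly 0
    start = rng.randint(0, size - width)
    masked = [row[:] for row in spec]  # copy so the input is left untouched
    for f in range(n_freq):
        for t in range(n_time):
            if start <= (f if axis == "freq" else t) < start + width:
                masked[f][t] = 0.0
    return masked


rng = random.Random(0)
dog = [[1.0] * 6 for _ in range(4)]   # toy 4 x 6 "spectrogram"
rain = [[0.0] * 6 for _ in range(4)]
mixed, soft_label, lam = mixup(dog, [1.0, 0.0], rain, [0.0, 1.0], alpha=0.2, rng=rng)
augmented = mask_band(mask_band(mixed, "freq", 2, rng), "time", 3, rng)
print(round(lam, 3), soft_label)
```

With `alpha` below 1 the Beta distribution is U-shaped, so `lam` usually lands near 0 or 1 and the blend stays close to one of the two clips; raising `alpha` pushes `lam` toward 0.5, which is what "stronger Mixup" means. Likewise, a larger `max_width` or more calls to `mask_band` hide more of the spectrogram.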

---

## 🚀 Need Help?

Feel free to contact me on [LinkedIn](https://www.linkedin.com/in/amankrsahu).

[![Instagram URL](https://img.shields.io/badge/Instagram-E4405F?style=for-the-badge&logo=instagram&logoColor=white)](https://www.instagram.com/itz.amansahu/) [![Discord URL](https://img.shields.io/badge/Discord-7289DA?style=for-the-badge&logo=discord&logoColor=white)](https://discordapp.com/users/539751578866024479)
