{"id":30325826,"url":"https://github.com/amankrsahu/deep-audio-cnn","last_synced_at":"2026-04-15T10:34:54.228Z","repository":{"id":303024190,"uuid":"1014152370","full_name":"AmanKrSahu/deep-audio-cnn","owner":"AmanKrSahu","description":"This repository contains implementation of a ResNet-style CNN in PyTorch for real-time environmental sound classification.","archived":false,"fork":false,"pushed_at":"2025-08-22T21:02:16.000Z","size":476,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-12T01:29:57.878Z","etag":null,"topics":["cnn-classification","fastapi","modal","nextjs","python3","pytorch","resnet","tailwindcss","tensorboard","typescript"],"latest_commit_sha":null,"homepage":"https://deep-audio-cnn.vercel.app","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AmanKrSahu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-05T06:39:10.000Z","updated_at":"2025-08-22T20:59:56.000Z","dependencies_parsed_at":"2025-07-05T08:35:11.977Z","dependency_job_id":"b4b2b277-4604-4dce-892d-44556f6d4ac1","html_url":"https://github.com/AmanKrSahu/deep-audio-cnn","commit_stats":null,"previous_names":["amankrsahu/deep-audio-cnn"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AmanKrSahu/deep-audio-cnn","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmanKrSahu%2Fdeep-audio-cnn","tags_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmanKrSahu%2Fdeep-audio-cnn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmanKrSahu%2Fdeep-audio-cnn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmanKrSahu%2Fdeep-audio-cnn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AmanKrSahu","download_url":"https://codeload.github.com/AmanKrSahu/deep-audio-cnn/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmanKrSahu%2Fdeep-audio-cnn/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31837336,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-15T10:26:52.245Z","status":"ssl_error","status_checked_at":"2026-04-15T10:26:51.649Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cnn-classification","fastapi","modal","nextjs","python3","pytorch","resnet","tailwindcss","tensorboard","typescript"],"created_at":"2025-08-17T23:08:17.313Z","updated_at":"2026-04-15T10:34:54.178Z","avatar_url":"https://github.com/AmanKrSahu.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Audio Classification CNN\n\nClassify short audio clips (e.g., **dog bark**, **bird chirp**, **siren**, **rain**) with a ResNet-style CNN trained on **Mel Spectrograms**. 
The project includes a full **training pipeline (PyTorch)**, **FastAPI** inference service, **serverless GPU inference with Modal**, and an **interactive Next.js + React dashboard** for uploads, real-time predictions, and feature‑map visualization.\n\n\u003cimg src=\"https://img.shields.io/badge/Next.js-000?logo=nextdotjs\u0026logoColor=fff\u0026style=for-the-badge\" /\u003e \u003cimg src=\"https://img.shields.io/badge/TypeScript-007ACC?style=for-the-badge\u0026logo=typescript\u0026logoColor=white\" /\u003e \u003cimg src=\"https://img.shields.io/badge/Python-14354C?style=for-the-badge\u0026logo=python\u0026logoColor=white\" /\u003e \u003cimg src=\"https://img.shields.io/badge/PyTorch-DE3412?style=for-the-badge\u0026logo=pytorch\u0026logoColor=white\" /\u003e \u003cimg src=\"https://img.shields.io/badge/TensorFlow-FF6F00?style=for-the-badge\u0026logo=tensorflow\u0026logoColor=white\" /\u003e \u003cimg src=\"https://img.shields.io/badge/Flask-000000?style=for-the-badge\u0026logo=flask\u0026logoColor=white\" /\u003e \u003cimg src=\"https://img.shields.io/badge/npm-CB3837?style=for-the-badge\u0026logo=npm\u0026logoColor=white\" /\u003e\n\n---\n\n## ✨ Features\n\n* 🧠 **Deep Audio CNN** for sound classification\n* 🧱 **ResNet-style** architecture with residual blocks\n* 🎼 **Mel Spectrogram** audio-to-image conversion\n* 🎛️ **Data augmentation**: Mixup + SpecAugment (Time/Freq masking)\n* ⚡ **Serverless GPU inference** with **Modal**\n* 📊 **Interactive Next.js \u0026 React dashboard** (Tailwind + shadcn/ui)\n* 📈 **Real-time classification** with confidence scores\n* 🌊 **Waveform \u0026 Spectrogram** visualization\n* 🚀 **FastAPI** inference endpoint (+ Pydantic validation)\n* 📈 **TensorBoard** integration for training analysis\n* ✅ **Pydantic** validation for robust API requests\n\n---\n\n## 🧱 Architecture Overview\n\n* **Why Mel Spectrograms?** They convert audio to a perceptual time–frequency image that CNNs handle well.\n* **Why ResNet?** Residual connections ease 
optimization of deeper models and boost accuracy.\n* **Why Mixup/SpecAugment?** Strong regularization for robustness against noise and domain shift.\n\n---\n\n## 🧩 Project Setup\n\n### 1. Python environment\n\n```bash\ncd server\nconda create -n audio-cnn python=3.11 -y\nconda activate audio-cnn\npip install -r requirements.txt\n```\n\n### 2. Next.js frontend\n\n```bash\ncd client\nnpm install\nnpm run dev\n```\n\n---\n\n## 🔧 Environment Variables\n\nCreate a `.env` file in the `client` directory:\n\n```\nNEXT_PUBLIC_MODAL_API_ENDPOINT=\"Your_API_Key\"\n```\n\n---\n\n## Features and Interfaces\n\n\u003cimg width=\"1920\" height=\"1080\" alt=\"cnn-1\" src=\"https://github.com/user-attachments/assets/82bcd6d6-410a-4605-a564-ff0c67c57b1e\" /\u003e\n\n\u003cimg width=\"1920\" height=\"1080\" alt=\"cnn-2\" src=\"https://github.com/user-attachments/assets/f4315322-216d-44a1-a1e1-9384806ad253\" /\u003e\n\n---\n\n## 🧰 Troubleshooting\n\n* **Torchaudio backend errors**: ensure `ffmpeg` and `libsndfile` are installed.\n* **Noisy predictions**: increase clip length, tune the Mixup `alpha`, or reduce SpecAugment masking.\n* **Overfitting**: use stronger Mixup/SpecAugment, add Dropout in the classifier, or apply early stopping.\n* **Underfitting**: use a deeper ResNet, raise `base_channels`, train longer, or lower the weight decay.\n\n---\n\n## 🚀 Need Help?\n\nFeel free to contact me on [LinkedIn](https://www.linkedin.com/in/amankrsahu).\n\n[![Instagram URL](https://img.shields.io/badge/Instagram-E4405F?style=for-the-badge\u0026logo=instagram\u0026logoColor=white)](https://www.instagram.com/itz.amansahu/) \u0026nbsp; [![Discord 
URL](https://img.shields.io/badge/Discord-7289DA?style=for-the-badge\u0026logo=discord\u0026logoColor=white)](https://discordapp.com/users/539751578866024479)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famankrsahu%2Fdeep-audio-cnn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famankrsahu%2Fdeep-audio-cnn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famankrsahu%2Fdeep-audio-cnn/lists"}