{"id":49984253,"url":"https://github.com/qhuyitb/engine-audio-retrieval-system","last_synced_at":"2026-05-18T19:49:49.998Z","repository":{"id":352259620,"uuid":"1208700481","full_name":"qhuyitb/engine-audio-retrieval-system","owner":"qhuyitb","description":"Audio retrieval system for vehicle sound similarity search using MFCC-based feature extraction, FastAPI, and Qdrant vector database.","archived":false,"fork":false,"pushed_at":"2026-05-01T02:51:23.000Z","size":52,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-01T04:17:01.741Z","etag":null,"topics":["audio-retrieval","audio-similarity","fastapi","information-retrieval","librosa","python","qdrant"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qhuyitb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-12T16:27:31.000Z","updated_at":"2026-05-01T03:01:41.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/qhuyitb/engine-audio-retrieval-system","commit_stats":null,"previous_names":["qhuyitb/engine-audio-retrieval-system"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/qhuyitb/engine-audio-retrieval-system","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qhuyitb%2Fengine-audio-retrieval-system","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/qhuyitb%2Fengine-audio-retrieval-system/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qhuyitb%2Fengine-audio-retrieval-system/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qhuyitb%2Fengine-audio-retrieval-system/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qhuyitb","download_url":"https://codeload.github.com/qhuyitb/engine-audio-retrieval-system/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qhuyitb%2Fengine-audio-retrieval-system/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33189279,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-18T09:27:30.708Z","status":"ssl_error","status_checked_at":"2026-05-18T09:27:28.300Z","response_time":71,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-retrieval","audio-similarity","fastapi","information-retrieval","librosa","python","qdrant"],"created_at":"2026-05-18T19:49:49.124Z","updated_at":"2026-05-18T19:49:49.993Z","avatar_url":"https://github.com/qhuyitb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🚗 Engine Audio Retrieval System\n\nA system for searching and comparing the similarity of vehicle sounds 
(airplane, car, train, ...) using hand-crafted audio features and the Qdrant vector database.\n\n---\n\n## 📌 Description\n\nThis project builds a complete **audio retrieval** system:\n- Extracts 72 audio features (MFCC, Spectral, Temporal, Harmonic)\n- Normalizes vectors with StandardScaler followed by L2 normalization\n- Stores and searches vectors in the Qdrant vector database (cosine similarity, HNSW index)\n- REST API built with FastAPI, demo UI built with Next.js\n- System evaluation: MAP@5 = 90.7% on 800 files across 8 classes\n\n---\n\n## 📁 Project Structure\n\n```bash\nengine-audio-retrieval-system/\n│\n├── README.md\n├── .gitignore\n│\n├── backend/\n│   ├── main.py                        # Entry point: uvicorn backend.main:app\n│   ├── requirements.txt\n│   │\n│   ├── api/\n│   │   ├── __init__.py\n│   │   ├── schemas.py\n│   │   └── routes/\n│   │       ├── search.py              # POST /search: upload a file, return top-5\n│   │       ├── audio.py               # CRUD audio records\n│   │       └── stats.py               # Dataset statistics\n│   │\n│   ├── features/\n│   │   ├── extractor.py               # Combined pipeline + StandardScaler + L2 normalize\n│   │   ├── mfcc.py                    # MFCC + Spectral features (52 dims)\n│   │   ├── temporal.py                # ZCR, RMS, Tempo (5 dims)\n│   │   ├── harmonic.py                # Harmonic ratio, Chroma (15 dims)\n│   │   └── scaler_params.npz          # StandardScaler params, generated by compute_scaler.py\n│   │\n│   ├── db/\n│   │   ├── config.py                  # Qdrant constants (URL, collection, dim, HNSW)\n│   │   ├── collection.py              # Init/delete/info collection\n│   │   ├── writer.py                  # Upsert/delete points\n│   │   ├── reader.py                  # Get points, scroll, stats\n│   │   └── search.py                  # Cosine similarity search + build_payload\n│   │\n│   ├── search/\n│   │   └── retrieval.py               # Retrieval engine: .wav file → top-5\n│   │\n│   └── scripts/\n│       ├── 
check_dataset.py           # Check the dataset before processing\n│       ├── preprocess_audio.py        # Normalize audio (mono, 22050Hz, [-1,1])\n│       ├── compute_scaler.py          # Compute StandardScaler params → scaler_params.npz\n│       ├── extract_all_features.py    # Build the Qdrant index from the full dataset\n│       └── evaluate.py               # Evaluate P@K, MAP@K, Confusion Matrix\n│\n├── frontend/                          # Next.js + React\n│   ├── package.json\n│   ├── next.config.js\n│   ├── tsconfig.json\n│   └── src/\n│       ├── app/\n│       │   ├── layout.tsx\n│       │   ├── page.tsx               # Home page\n│       │   ├── search/\n│       │   │   └── page.tsx           # Search Engine: upload a file, top-5 results\n│       │   ├── explorer/\n│       │   │   └── page.tsx           # Dataset Explorer: browse and play audio\n│       │   ├── stats/\n│       │   │   └── page.tsx           # Stats: dataset statistics + Qdrant info\n│       │   └── evaluation/\n│       │       └── page.tsx           # Evaluation: P@K, MAP@K, Confusion Matrix\n│       ├── components/\n│       │   ├── AudioPlayer.tsx\n│       │   ├── SimilarityChart.tsx\n│       │   ├── ResultCard.tsx\n│       │   └── SearchPipeline.tsx\n│       └── lib/\n│           ├── api.ts\n│           └── types.ts\n│\n├── data/\n│   ├── raw/                           # Original audio files (gitignored)\n│   │   ├── Airplane/\n│   │   ├── Bics/\n│   │   ├── Bus/\n│   │   ├── Cars/\n│   │   ├── Helicopter/\n│   │   ├── Motocycles/\n│   │   ├── Train/\n│   │   └── Truck/\n│   ├── processed/                     # Normalized files (gitignored)\n│   │   ├── airplane/\n│   │   ├── bicycle/\n│   │   ├── bus/\n│   │   ├── car/\n│   │   ├── helicopter/\n│   │   ├── motorcycle/\n│   │   ├── train/\n│   │   └── truck/\n│   └── metadata/                      # Evaluation results (gitignored)\n│       ├── eval_results_k1.json\n│       ├── eval_results_k5.json\n│       ├── eval_results_k10.json\n│       └── 
confusion_matrix_k5.json\n│\n└── qdrant_storage/                    # Qdrant data volume (gitignored)\n```\n\n---\n\n## 📦 Dataset\n\nThe dataset is **not stored in the repo**. Download it from:\n\n👉 https://drive.google.com/drive/folders/1bFD9h3TCubiwOWnEUVhi-ZuHEcAMSB76?usp=drive_link\n\n### 🔧 Data setup\n\n1. Download the folder from the link above and extract it\n\n2. After extraction, the structure looks like this:\n```\nengine-audio-dataset/        ← remove this outer wrapper folder\n    Airplane/\n    Bics/\n    Bus/\n    ...\n```\n\n3. Move the subfolders directly into `data/raw/`. The correct result is:\n```\ndata/raw/\n    Airplane/\n    Bics/\n    Bus/\n    Cars/\n    Helicopter/\n    Motocycles/\n    Train/\n    Truck/\n```\n\n---\n\n## ⚙️ Installation\n\n```bash\n# 1. Create and activate a virtual environment\npython -m venv .venv\nsource .venv/bin/activate        # Linux/Mac\n.venv\\Scripts\\activate           # Windows\n\n# 2. Install dependencies\npip install -r backend/requirements.txt\n\n# 3. Start Qdrant with Docker\ndocker run -d -p 6333:6333 \\\n  -v ${PWD}/qdrant_storage:/qdrant/storage \\\n  qdrant/qdrant\n\n# Check that Qdrant is running\ncurl http://localhost:6333/healthz\n```\n\n---\n\n## 🚀 Running the system (first time)\n\nRun the steps in exactly this order:\n\n```bash\n# Step 1: Check the dataset\npython -m backend.scripts.check_dataset\n\n# Step 2: Preprocess audio (mono, 22050Hz, amplitude normalization)\npython -m backend.scripts.preprocess_audio\n\n# Step 3: Compute StandardScaler params over the full dataset (~3.5 minutes)\npython -m backend.scripts.compute_scaler\n\n# Step 4: Build the Qdrant index: extract features + upsert 800 points (~3.5 minutes)\npython -m backend.scripts.extract_all_features\n\n# Step 5: Run the API server\npython -m uvicorn backend.main:app --reload --port 8000\n\n# Step 6: Run the frontend\ncd frontend \u0026\u0026 npm install \u0026\u0026 npm run dev\n```\n\n---\n\n## 🔍 Test retrieval from the terminal\n\n```bash\npython -m backend.search.retrieval data/processed/car/car_001.wav\npython -m backend.search.retrieval 
data/processed/airplane/airplane_001.wav\n```\n\n---\n\n## 📊 System evaluation\n\n```bash\n# Precision, Recall, MAP @ K=1, 5, 10\npython -m backend.scripts.evaluate --k 1\npython -m backend.scripts.evaluate --k 5\npython -m backend.scripts.evaluate --k 10\n\n# Confusion matrix K=5\npython -m backend.scripts.evaluate --confusion --k 5\n```\n\n**Experimental results (800 files, 8 classes):**\n\n| K | Precision@K | MAP@K |\n|---|-------------|-------|\n| 1 | 90.0% | 90.0% |\n| 5 | 85.1% | 90.7% |\n| 10 | 81.9% | 89.2% |\n\n---\n\n## 🌐 API Endpoints\n\n| Method | Endpoint | Description |\n|--------|----------|-------|\n| POST | `/api/search` | Upload a .wav file → top-5 most similar |\n| GET | `/api/audio` | List audio files (filter by class) |\n| GET | `/api/audio/{id}` | Details for a single file |\n| DELETE | `/api/audio/{id}` | Delete a single file from the index |\n| GET | `/api/stats` | Dataset statistics |\n| GET | `/api/stats/collection` | Qdrant collection info |\n\nSwagger UI: http://localhost:8000/docs\n\n---\n\n## 🔊 Processing pipeline\n\n```\n.wav file\n  → load (mono, 22050Hz)\n  → normalize amplitude to [-1, 1]\n  → extract 72 features (MFCC × 39 + Spectral × 13 + Temporal × 5 + Harmonic × 15)\n  → StandardScaler (mean=0, std=1)\n  → L2 normalize (|vector| = 1.0)\n  → Qdrant cosine similarity search (HNSW index)\n  → top-5 results\n```\n\n---\n\n## ⚠️ Notes\n\n- Make sure Docker is running **before** starting the API server\n- `compute_scaler.py` must be run **before** `extract_all_features.py`\n- All team members must use the same dataset and the same `scaler_params.npz` file\n- Do not commit: `data/raw/`, `data/processed/`, `qdrant_storage/`, `data/metadata/`\n- To rebuild the index, run `extract_all_features.py --recreate`\n\n---\n\n## 📄 License\n\nMIT License © 2026 Quang 
Huy","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqhuyitb%2Fengine-audio-retrieval-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqhuyitb%2Fengine-audio-retrieval-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqhuyitb%2Fengine-audio-retrieval-system/lists"}