{"id":30677029,"url":"https://github.com/mudassir-a/vendor-performance-analysis","last_synced_at":"2026-05-18T03:34:05.347Z","repository":{"id":311911234,"uuid":"1039537458","full_name":"Mudassir-A/Vendor-Performance-Analysis","owner":"Mudassir-A","description":"vendor performance data analysis project using sql, python and power bi","archived":false,"fork":false,"pushed_at":"2025-08-27T11:22:41.000Z","size":2155,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-08-27T20:34:44.995Z","etag":null,"topics":["data-analysis","powerbi","python","sql"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mudassir-A.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-17T13:05:09.000Z","updated_at":"2025-08-27T11:22:44.000Z","dependencies_parsed_at":"2025-08-27T20:34:49.230Z","dependency_job_id":"1a0b1ec5-9386-406b-bf26-80351461bfd7","html_url":"https://github.com/Mudassir-A/Vendor-Performance-Analysis","commit_stats":null,"previous_names":["mudassir-a/vendor-performance-analysis"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Mudassir-A/Vendor-Performance-Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mudassir-A%2FVendor-Performance-Analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mudassir-A%2FVendor-Performance-Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/Mudassir-A%2FVendor-Performance-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mudassir-A%2FVendor-Performance-Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Mudassir-A","download_url":"https://codeload.github.com/Mudassir-A/Vendor-Performance-Analysis/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mudassir-A%2FVendor-Performance-Analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33163754,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-17T22:39:12.733Z","status":"online","status_checked_at":"2026-05-18T02:00:06.436Z","response_time":71,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","powerbi","python","sql"],"created_at":"2025-09-01T11:11:31.721Z","updated_at":"2026-05-18T03:34:05.334Z","avatar_url":"https://github.com/Mudassir-A.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SpiritVALS (Vendor Analytics for Liquor Sales)\n\nThis data analytics project analyzes vendor performance in liquor sales using an end-to-end workflow. Raw CSV files are ingested from Google Drive and transformed into a streamlined SQLite data mart (~10k rows from ~15M). 
Exploratory data analysis (EDA) and vendor-level analytics are performed in Python to deliver actionable insights for procurement and sales optimization.\n\n---\n\n## 📌 Project Goals\n\n* Transform raw transactional data into an analysis‑ready SQLite database.\n* Create a compact **summary table** optimized for fast iteration and visual analysis.\n* Perform **EDA** to uncover vendor performance patterns and sales drivers.\n* Lay groundwork for an **interactive Power BI dashboard** and a stakeholder‑friendly report.\n\n---\n\n## 🧱 Repository Structure\n\n```\nnotebooks/\n    ├── EDA.ipynb\n    ├── notebook.ipynb\n    └── vendor_performance_analysis.ipynb\n.gitignore\nget_vendor_summary.py\ningestion_db.py\nLICENSE\nREADME.md\nrequirements.txt\n```\n\n**What’s where**\n\n* `ingestion_db.py` – Extracts raw data and loads it into **SQLite** using SQL DDL/DML.\n* `get_vendor_summary.py` – Builds a **10k‑row vendor summary** (from \\~15M rows) for analytics.\n* `notebooks/` – Jupyter notebooks for EDA, visualizations, and vendor insights.\n* `requirements.txt` – Python dependencies.\n\n---\n\n## 🗂️ Data Source\n\n* **Drive link** (raw data): [Google Drive file](https://drive.google.com/file/d/18s64R0xY4KMSeTqpx9609KCVnvRjwKbs/view?usp=sharing)\n* **Expected content:** transactional liquor sales with fields such as date, vendor, product/brand, quantity, price, outlet/region, etc. (schema inferred during ingestion).\n\n\u003e ⚠️ The raw file is large; the pipeline creates a compact SQLite layer for faster analysis.\n\n---\n\n## 🔄 Data Pipeline (ETL → Data Mart)\n\n1. **Extract \u0026 Load**: Download the file from Drive and run `ingestion_db.py` to create a SQLite DB (`inventory.db`).\n2. **Transform**: Use SQL (CTEs/indexes) to normalize types, handle nulls, and add keys.\n3. **Summarize**: Run `get_vendor_summary.py` to aggregate \\~15M rows into \\~10k rows (vendor‑day/month/brand metrics).\n4. 
**Analyze \u0026 Visualize**: Open notebooks to perform EDA and vendor performance analysis.\n\n---\n\n## ⚙️ Quickstart\n\n```bash\n# 1) Create and activate a virtual environment (recommended)\npython -m venv .venv\nsource .venv/bin/activate   # Windows: .venv\\Scripts\\activate\n\n# 2) Install dependencies\npip install -r requirements.txt\n\n# 3) Place/download the raw file locally (from the Drive link)\n#    e.g., data/\n#        ├── begin_inventory.csv\n#        ├── end_inventory.csv\n#        ├── purchase_prices.csv\n#        ├── purchases.csv\n#        ├── sales.csv\n#        └── vendor_invoice.csv\n\n# 4) Build the SQLite database\npython ingestion_db.py\n\n# 5) Create the vendor summary table for analytics\npython get_vendor_summary.py\n\n# 6) Explore notebooks\n#    notebooks/EDA.ipynb, notebooks/vendor_performance_analysis.ipynb\n```\n\n---\n\n## 🧪 EDA \u0026 Analytics (Highlights)\n\n- **Brands with Low Sales but High Margins:** Identified brands that may benefit from targeted promotions or pricing adjustments.\n- **Top Vendors \u0026 Brands by Sales:** Ranked and visualized the leading vendors and brands based on total sales.\n- **Vendor Contribution Analysis:** Top 10 vendors account for a major share of total procurement, as shown by Pareto and donut charts.\n- **Bulk Purchasing Impact:** Large orders secure the lowest unit prices, confirming significant cost savings.\n- **Unsold Inventory \u0026 Turnover:** Highlighted vendors with excess stock and calculated the capital locked in unsold inventory.\n\n*Notebooks:*\n\n* [EDA.ipynb](./notebooks/EDA.ipynb) – data sanity checks, profiling, core distributions.\n* [vendor_performance_analysis.ipynb](./notebooks/vendor_performance_analysis.ipynb) – KPI build‑out, vendor rankings, trend diagnostics.\n* [notebook.ipynb](./notebooks/notebook.ipynb) – scratchpad/experiments supporting the final analysis.\n\n---\n\n## 📊 Future Scope\n\n* **Power BI dashboard**: interactive pages for Vendor Overview, Brand Mix, Geography, and Seasonality.\n* 
**Stakeholder report**: concise, narrative‐led write‑up for senior management.\n* **Automation**: schedule ingestion \u0026 summary table refresh (e.g., cron/GitHub Actions) and CI data tests (e.g., Great Expectations).\n\n---\n\n## 🛠️ Tech Stack\n\n* **Python**: pandas, numpy, matplotlib, seaborn, sqlalchemy\n* **Database**: SQLite (SQL DDL/DML, indexes for performance)\n* **Environment**: Jupyter notebooks\n\n---\n\n## ✅ Reproducibility \u0026 Performance Notes\n\n* The **\\~10k‑row summary** enables fast iteration vs. scanning \\~15M rows.\n* SQLite indices and typed columns materially improve query time.\n* Random seeds used in any sampling steps (if applicable) are fixed for reproducibility.\n\n---\n\n## 📄 License\n\nThis project is released under the terms of the [**LICENSE**](./LICENSE) file in this repository.\n\n---\n\n## 🙋 Contact\n\n**Author**: Mudassir Ansari  \n**Role**: Final‑year Computer Engineering student • Data/ML enthusiast  \n**Reach**: Open to feedback, internships, and collaboration.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmudassir-a%2Fvendor-performance-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmudassir-a%2Fvendor-performance-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmudassir-a%2Fvendor-performance-analysis/lists"}