{"id":25392601,"url":"https://github.com/radanpro/data-mining","last_synced_at":"2026-02-12T01:06:13.223Z","repository":{"id":274538342,"uuid":"923242623","full_name":"radanpro/data-mining","owner":"radanpro","description":null,"archived":false,"fork":false,"pushed_at":"2025-02-05T14:49:18.000Z","size":3882,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-05T15:45:22.199Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/radanpro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-27T21:57:19.000Z","updated_at":"2025-02-05T14:49:22.000Z","dependencies_parsed_at":"2025-01-27T22:37:43.644Z","dependency_job_id":"3730cb80-7f9a-4224-aed5-945c81481bec","html_url":"https://github.com/radanpro/data-mining","commit_stats":null,"previous_names":["radanpro/data-mining"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radanpro%2Fdata-mining","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radanpro%2Fdata-mining/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radanpro%2Fdata-mining/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radanpro%2Fdata-mining/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/radanpro","download_url":"https://codeload.github.com/radanpro/data-mining/tar.g
z/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239024483,"owners_count":19569525,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-15T16:55:43.087Z","updated_at":"2025-10-30T18:30:42.970Z","avatar_url":"https://github.com/radanpro.png","language":"Python","readme":"# Advanced Data Mining Project\n\n![Data Mining](https://img.shields.io/badge/Data-Mining-blue)\n![Python](https://img.shields.io/badge/Python-3.8%2B-success)\n![Streamlit](https://img.shields.io/badge/UI-Streamlit-important)\n\nA comprehensive data analysis platform implementing four key data mining algorithms with an interactive interface.\n\n## Table of Contents\n\n- [Features](#features)\n- [Requirements](#requirements)\n- [Installation](#installation)\n- [Usage](#usage)\n- [Algorithms](#algorithms)\n- [Project Structure](#project-structure)\n- [License](#license)\n- [Contributing](#contributing)\n- [Contact](#contact)\n\n## Features\n\n- **Four Core Algorithms**:\n  - 🛒 Apriori (Association Rule Mining)\n  - 🧠 Naive Bayes (Classification)\n  - 🌳 ID3 Decision Tree\n  - 📊 K-Means Clustering\n- Interactive Web UI using Streamlit\n- Automatic documentation generation\n- Visualizations for all algorithms\n- Customizable parameters for each algorithm\n\n## Requirements\n\n- Python 3.8+\n- Required Packages:\n  - streamlit==1.26.0\n  - pandas==2.0.3\n  - numpy==1.24.3\n  - scikit-learn==1.3.0\n  - mlxtend==0.22.0\n  - matplotlib==3.7.2\n  - seaborn==0.12.2\n\n## Installation\n\n1. 
Clone the repository:\n\n```bash\ngit clone https://github.com/radanpro/data-mining.git\ncd data-mining\n```\n\n2. Create and activate a virtual environment (recommended):\n\n```bash\npython -m venv venv\n```\n\n#### Windows\n\n```bash\nvenv\\Scripts\\activate\n```\n\n#### Mac/Linux\n\n```bash\nsource venv/bin/activate\n```\n\n3. Install dependencies:\n\n```bash\npip install -r requirements.txt\n```\n\n## Usage\n\n#### Start the application:\n\n```bash\nstreamlit run app.py\n```\n\n#### Workflow:\n\n1.  Upload a CSV dataset\n\n2.  Navigate through the algorithm sections\n\n3.  Adjust parameters using sliders\n\n4.  View interactive results and visualizations\n\n5.  Check the generated documentation in documentation.md\n\n#### Example Dataset Format:\n\n| Age | Income | Gender | Purchased |\n| --- | ------ | ------ | --------- |\n| 25  | 50000  | Male   | No        |\n| 30  | 70000  | Female | Yes       |\n\n## Algorithms\n\n1. Apriori (Association Rule Mining)\n   - Finds relationships between items in transactions\n   - Configurable support and confidence levels\n   - Outputs rules with lift metric\n\n2. Naive Bayes Classifier\n   - Probabilistic classification model\n   - Shows accuracy and confusion matrix\n   - Displays feature importance\n\n3. ID3 Decision Tree\n   - Implements information gain strategy\n   - Visualizes decision tree structure\n   - Displays classification rules\n\n4. 
K-Means Clustering\n   - Unsupervised clustering algorithm\n   - Elbow method visualization\n   - Silhouette score evaluation\n\n## Project Structure\n\n```\ndata-mining-project/\n├── app.py # Main application code\n├── documentation.md # Auto-generated documentation\n├── plots/ # Saved visualizations\n├── dataset.csv\n└── requirements.txt # Dependency list\n```\n\n## Contributing\n\n1. Fork the repository\n2. Create your feature branch (`git checkout -b feature/AmazingFeature`)\n3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the branch (`git push origin feature/AmazingFeature`)\n5. Open a Pull Request\n\n\u003cbr\u003e\n\u003cbr\u003e\n\u003cbr\u003e\n\u003cbr\u003e\n\n---\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n# README - Data Mining Project\n\n## **1. Dataset Contents**\n\nThe dataset contains transactional sales data with the following columns:\n\n1. **Region**: Geographic region where the sale occurred (e.g., Europe, Asia, Sub-Saharan Africa).\n2. **Country**: Specific country in the region.\n3. **Item Type**: Type of product sold (e.g., Beverages, Baby Food, Vegetables).\n4. **Sales Channel**: Sales method used (Online or Offline).\n5. **Order Priority**: Priority level of the order (L: Low, M: Medium, C: Critical, H: High).\n6. **Order Date**: Date when the order was placed.\n7. **Order ID**: Unique identifier for the order.\n8. **Ship Date**: Date when the order was shipped.\n9. **Units Sold**: Number of units sold.\n10. **Unit Price**: Price per unit of the product.\n11. **Unit Cost**: Production cost per unit.\n12. **Total Revenue**: Total revenue from the sale (Units Sold \\* Unit Price).\n13. **Total Cost**: Total cost incurred (Units Sold \\* Unit Cost).\n14. **Total Profit**: Total profit from the sale (Total Revenue - Total Cost).\n\n---\n\n## **2. 
Project Goal**\n\nThe goal of this project is to perform data mining on the provided dataset to:\n\n- **Analyze sales patterns**: Understand the relationship between sales performance and attributes like region, product type, and sales channel.\n- **Discover hidden insights**: Use data mining algorithms to uncover meaningful patterns and relationships within the data.\n- **Build predictive models**: Develop models to classify and cluster the data for better decision-making.\n- **Generate actionable insights**: Provide recommendations to optimize sales strategies and improve profitability.\n\n---\n\n## **3. Expected Results**\n\n1. **Frequent Itemset Analysis**: Use the Apriori algorithm to discover patterns and rules such as \"products frequently sold together.\"\n2. **Classification Results**:\n   - Predict the priority level of orders based on their attributes.\n   - Evaluate model performance using metrics such as Precision, Recall, and F1-Score.\n3. **Cluster Analysis**: Group sales data into clusters for better segmentation (e.g., high-profit vs. low-profit orders).\n4. **Insights and Recommendations**: Actionable insights based on analysis results to optimize sales and improve decision-making.\n\n---\n\n## **4. Why Data Mining?**\n\nData mining is essential for this project to:\n\n- Identify meaningful patterns in complex datasets.\n- Provide actionable insights that support data-driven decision-making.\n- Improve the efficiency and effectiveness of sales strategies.\n- Segment customers or orders for targeted actions.\n\n---\n\n## **5. 
Task Assignments and Dependencies**\n\n| **Task**                                     | **Description**                                                                                      | **Assigned To** | **Dependencies**                                                              |\n| -------------------------------------------- | ---------------------------------------------------------------------------------------------------- | --------------- | ----------------------------------------------------------------------------- |\n| **Task 1: Data Exploration \u0026 Preprocessing** | Load, clean, and preprocess the dataset (handle missing values, normalize data, encode categorical). | Person 1        | Independent task; outputs clean dataset for other tasks.                      |\n| **Task 2: Feature Selection \u0026 Analysis**     | Analyze data relationships, select relevant features, and provide visualizations.                    | Person 2        | Depends on clean dataset from Person 1.                                       |\n| **Task 3: Apriori Algorithm Implementation** | Use the Apriori algorithm to discover association rules and visualize patterns.                      | Person 3        | Requires selected features from Person 2.                                     |\n| **Task 4: Naïve Bayes Classification**       | Build a Naïve Bayes model to classify order priorities and evaluate performance metrics.             | Person 4        | Requires selected features from Person 2 and preprocessed data from Person 1. |\n| **Task 5: ID3 Algorithm Implementation**     | Develop a decision tree for classification and visualize decision-making logic.                      | Person 5        | Requires selected features from Person 2 and preprocessed data from Person 1. |\n| **Task 6: K-Means Clustering**               | Apply K-Means clustering to segment sales data and visualize clusters.                               
| Person 6        | Requires selected features from Person 2 and preprocessed data from Person 1. |\n| **Task 7: Documentation and Reporting**      | Compile a detailed report covering preprocessing, algorithms, results, and insights.                 | Collaborative   | Depends on contributions and results from all team members.                   |\n\n---\n\n## **6. Notes**\n\n- Collaboration tools such as **Trello** or **Jira** can be used to track progress and ensure smooth communication.\n- Code should be documented and modular to ensure reproducibility.\n- Final deliverables include a complete report, codebase, and visualizations of results.\n\n\u003cbr\u003e\n\u003cbr\u003e\n\u003cbr\u003e\n\n---\n\n\u003cbr\u003e\n\u003cbr\u003e\n\u003cbr\u003e\n\n# README - Data Mining Project\n\n## **1. Dataset Contents**\n\nThe data contains sales transaction information with the following columns:\n\n1. **Region:** The geographic region where the sale took place (e.g., Europe, Asia, Sub-Saharan Africa).\n2. **Country:** The name of the country associated with the sale.\n3. **Item Type:** The product category (e.g., Beverages, Baby Food, Vegetables).\n4. **Sales Channel:** The sales method (Online or Offline).\n5. **Order Priority:** The priority level assigned to the order (L: Low, M: Medium, C: Critical, H: High).\n6. **Order Date:** The date the order was placed.\n7. **Order ID:** A unique identifier for each order.\n8. **Ship Date:** The date the order was shipped.\n9. **Units Sold:** The number of units sold.\n10. **Unit Price:** The price of a single unit of the product.\n11. **Unit Cost:** The production cost per unit.\n12. **Total Revenue:** The total revenue generated by the sale (Units Sold × Unit Price).\n13. **Total Cost:** The total cost (Units Sold × Unit Cost).\n14. **Total Profit:** The difference between revenue and cost (Total Revenue - Total Cost).\n\n---\n\n## **2. 
Project Goal**\n\nThe aim of this project is to apply data mining to the provided data in order to:\n\n- **Analyze sales patterns:** Understand the relationship between sales performance and attributes such as region, item type, and sales channel.\n- **Discover hidden insights:** Use data mining algorithms to extract useful patterns and relationships within the data.\n- **Build predictive models:** Develop models that classify and cluster the data to support decision-making.\n- **Provide actionable insights:** Offer recommendations to improve sales strategies and increase profitability.\n\n---\n\n## **3. Expected Results**\n\n1. **Frequent itemset analysis:** Use the Apriori algorithm to discover patterns and rules such as \"products frequently sold together\".\n2. **Classification results:**\n   - Predict the priority level of orders based on their attributes.\n   - Evaluate the model using metrics such as Accuracy, Recall, and F1-Score.\n3. **Cluster analysis:** Divide the sales data into groups for better segmentation (e.g., high-profit vs. low-profit orders).\n4. **Insights and recommendations:** Provide actionable insights based on the analysis results to improve decision-making.\n\n---\n\n## **4. Why Data Mining?**\n\nData mining is essential for this project in order to:\n\n- Identify useful patterns in complex datasets.\n- Provide actionable insights that support data-driven decision-making.\n- Improve the efficiency and effectiveness of sales strategies.\n- Segment customers or orders for targeted actions.\n\n---\n\n## **5. 
Task Assignments and Dependencies**\n\n| **Task** | **Description** | **Assigned To** | **Dependencies** |\n| --- | --- | --- | --- |\n| **Task 1: Data Exploration and Preprocessing** | Load, clean, and prepare the data (handle missing values, normalize data, and encode categories). | Person 1 | Independent task; produces clean data for the other tasks. |\n| **Task 2: Data Analysis and Feature Selection** | Analyze relationships between attributes, select the important features, and provide visualizations. | Person 2 | Depends on the clean data from Person 1. |\n| **Task 3: Apriori Algorithm Implementation** | Use the Apriori algorithm to discover association rules and visualize patterns. | Person 3 | Depends on the selected features from Person 2. |\n| **Task 4: Naïve Bayes Classification** | Build a Naïve Bayes model to classify order priorities and evaluate performance using multiple metrics. | Person 4 | Depends on the selected features from Person 2 and the processed data from Person 1. |\n| **Task 5: ID3 Algorithm Implementation** | Develop a decision tree for classification and visualize the decision-making logic. | Person 5 | Depends on the selected features from Person 2 and the processed data from Person 1. |\n| **Task 6: K-Means Algorithm Implementation** | Apply K-Means to divide the data into groups and visualize the results. 
| Person 6 | Depends on the selected numeric features from Person 2 and the processed data from Person 1. |\n| **Task 7: Documentation and Reporting** | Prepare a comprehensive report covering the data processing stages, algorithms, results, and insights. | Entire team | Depends on the contributions and results of all team members. |\n\n---\n\n## **6. Notes**\n\n- Collaboration tools such as **Trello** or **Jira** can be used to track progress and ensure effective communication.\n- Code should be documented and modular to ensure reproducibility.\n- Final deliverables include a complete report, the source code, and charts of the results.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fradanpro%2Fdata-mining","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fradanpro%2Fdata-mining","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fradanpro%2Fdata-mining/lists"}