{"id":49076397,"url":"https://github.com/tryomar/data-miner","last_synced_at":"2026-04-20T10:03:31.916Z","repository":{"id":290035722,"uuid":"973161151","full_name":"TryOmar/data-miner","owner":"TryOmar","description":"DataMiner is an interactive web application for data mining and machine learning. It helps users upload, clean, transform, and analyze datasets while building predictive models — all through a simple and powerful Streamlit interface.","archived":false,"fork":false,"pushed_at":"2025-04-27T03:51:29.000Z","size":483,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-02T12:39:58.143Z","etag":null,"topics":["data-cleaning","data-mining","data-preprocessing","data-science","data-visualization","interactive-dashboards","pandas","python","scikit-learn","streamlit"],"latest_commit_sha":null,"homepage":"https://data-miner-abbas.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TryOmar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-26T11:51:01.000Z","updated_at":"2025-04-27T03:51:32.000Z","dependencies_parsed_at":"2025-05-02T12:40:00.655Z","dependency_job_id":null,"html_url":"https://github.com/TryOmar/data-miner","commit_stats":null,"previous_names":["omar7001-b/data-miner","tryomar/data-miner"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TryOmar/data-miner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TryOmar%2Fdata-miner","tags_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TryOmar%2Fdata-miner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TryOmar%2Fdata-miner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TryOmar%2Fdata-miner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TryOmar","download_url":"https://codeload.github.com/TryOmar/data-miner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TryOmar%2Fdata-miner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32042294,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-20T00:18:06.643Z","status":"online","status_checked_at":"2026-04-20T02:00:06.527Z","response_time":94,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-cleaning","data-mining","data-preprocessing","data-science","data-visualization","interactive-dashboards","pandas","python","scikit-learn","streamlit"],"created_at":"2026-04-20T10:03:30.733Z","updated_at":"2026-04-20T10:03:31.905Z","avatar_url":"https://github.com/TryOmar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DataMiner\n\n[![License: 
MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)\n[![Python](https://img.shields.io/badge/Python-3.9%2B-blue.svg)](https://www.python.org/)\n[![Streamlit](https://img.shields.io/badge/Streamlit-1.10%2B-red.svg)](https://streamlit.io/)\n\n**DataMiner** is a modern, interactive web application for data mining and machine learning workflows. Built with Streamlit, it empowers users to upload, inspect, and profile datasets, laying the foundation for advanced analytics and model building.\n\n---\n\n## 🚀 Features\n\n- **Multi-page Streamlit app** with sidebar navigation\n- **Single data upload**: Upload once, use everywhere\n- **Paginated data preview**: Easily browse large datasets\n- **Data profiling**: Instantly see shape, columns, and types\n- **Summary statistics**: Numeric and categorical summaries, with clear explanations\n- **Scalable architecture**: Ready for future ML and data transformation features\n\n---\n\n## 📸 Demo\n\n| Page                | Screenshot Preview                                  |\n|---------------------|-----------------------------------------------------|\n| Home                | ![Home](docs/phase1/Home.png)                      |\n| Data Upload         | ![Data Upload](docs/phase1/Data_Upload.png)         |\n| Data Upload (Paged) | ![Data Upload 2](docs/phase1/Data_Upload_2.png)     |\n| Profiling           | ![Profiling](docs/phase1/Profiling.png)             |\n| Profiling (Types)   | ![Profiling 2](docs/phase1/Profiling_2.png)         |\n| Summary Statistics  | ![Summary Statistics](docs/phase1/Summary_Statistics.png) |\n\n---\n\n## 🛠️ How It Works\n\n1. **Upload Data**: Go to the \"Data Upload\" page and upload your CSV, Excel, or JSON file.\n2. **Profile Data**: Navigate to \"Profiling\" to view dataset shape, columns, and data types.\n3. **View Summary**: Check \"Summary Statistics\" for numeric and categorical summaries.\n4. **Navigate Easily**: Use the sidebar to switch between features. 
Your uploaded data is available on all pages.\n\n---\n\n## 📁 Folder Structure\n\n```\n.\n├── Home.py                  # Landing page\n├── requirements.txt         # Python dependencies\n├── pages/\n│   ├── Data_Upload.py       # Upload and preview data\n│   ├── Profiling.py         # Dataset profiling\n│   └── Summary_Statistics.py # Summary statistics\n```\n\n---\n\n## ✅ Progress Checklist\n\n### Phase 1: Data Handling and Basic Preprocessing\n\n- [x] Multi-page Streamlit app structure\n- [x] Upload CSV, Excel, JSON files\n- [x] Paginated data preview\n- [x] Data profiling (shape, columns, types)\n- [x] Summary statistics (numeric \u0026 categorical, with explanations)\n- [x] Single upload shared across all pages\n\n### Next Phases (Planned)\n\n- [ ] Advanced data preprocessing (missing values, outliers, feature engineering)\n- [ ] Data transformation (scaling, normalization, dimensionality reduction)\n- [ ] Machine learning model training and evaluation\n- [ ] Model export and deployment\n- [ ] Results sharing and reporting\n\n---\n\n## 🛠️ Recent Issues Fixed\n\n- [Summary statistics table shows many null values for non-numeric columns (#1)](https://github.com/Omar7001-B/data-miner/issues/1)\n- [Improve dashboard navigation and layout for better usability (#2)](https://github.com/Omar7001-B/data-miner/issues/2)\n\n---\n\n## 📦 Requirements\n\n- Python 3.9+\n- streamlit\n- pandas\n- numpy\n- openpyxl\n- pyarrow\n\n---\n\n## 🤝 Contributing\n\nContributions are welcome!  \n1. Fork the repo  \n2. Create a feature branch  \n3. Commit your changes  \n4. Open a pull request\n\n---\n\n## 📄 License\n\nThis project is licensed under the MIT License.\n\n## 🚦 How to Run\n\n1. Clone the repository:\n   ```bash\n   git clone https://github.com/Omar7001-B/data-miner.git\n   cd data-miner\n   ```\n2. Install dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n3. Start the app:\n   ```bash\n   streamlit run Home.py\n   ```\n4. 
Open your browser at [http://localhost:8501](http://localhost:8501) ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftryomar%2Fdata-miner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftryomar%2Fdata-miner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftryomar%2Fdata-miner/lists"}