{"id":25352058,"url":"https://github.com/code-str8/customer-frauds-detection","last_synced_at":"2026-04-17T05:02:17.772Z","repository":{"id":276190754,"uuid":"928338541","full_name":"Code-str8/customer-frauds-detection","owner":"Code-str8","description":"fraud detection challenge for STEG (Tunisian Company of Electricity and Gas) focused on identifying fraudulent meter manipulation through billing history data","archived":false,"fork":false,"pushed_at":"2025-03-21T13:55:26.000Z","size":80325,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-24T11:43:29.457Z","etag":null,"topics":["api","binaryclassification","data-science","docker","machine-learning","notebook-jupyter","python","streamlit-webapp"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Code-str8.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-02-06T13:26:02.000Z","updated_at":"2025-03-21T13:55:30.000Z","dependencies_parsed_at":"2025-02-06T20:23:02.913Z","dependency_job_id":"afa18394-3aa5-4699-be07-88cc3fc99255","html_url":"https://github.com/Code-str8/customer-frauds-detection","commit_stats":null,"previous_names":["code-str8/customer-frauds-detection"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Code-str8/customer-frauds-detection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/Code-str8%2Fcustomer-frauds-detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Code-str8%2Fcustomer-frauds-detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Code-str8%2Fcustomer-frauds-detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Code-str8%2Fcustomer-frauds-detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Code-str8","download_url":"https://codeload.github.com/Code-str8/customer-frauds-detection/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Code-str8%2Fcustomer-frauds-detection/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31915900,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-16T18:22:33.417Z","status":"online","status_checked_at":"2026-04-17T02:00:06.879Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","binaryclassification","data-science","docker","machine-learning","notebook-jupyter","python","streamlit-webapp"],"created_at":"2025-02-14T18:05:18.997Z","updated_at":"2026-04-17T05:02:17.753Z","avatar_url":"https://github.com/Code-str8.png","language":"Jupyter Notebook","readme":"# Customer Frauds Detection 🔍⚡\n\nThis project addresses a fraud detection challenge for STEG (Tunisian Company of Electricity and Gas), 
focusing on identifying fraudulent meter manipulation through billing history data.\n\n## 📋 Table of Contents\n\n- [Introduction](#introduction)\n- [Dataset](#dataset)\n- [Installation](#installation)\n- [Usage](#usage)\n  - [Streamlit App](#streamlit-app)\n  - [API Usage](#api-usage)\n  - [Model Training](#model-training)\n- [API Documentation](#api-documentation)\n- [Methodology](#methodology)\n- [Results](#results)\n- [Challenges \u0026 Trade-offs](#challenges--trade-offs)\n- [Future Work](#future-work)\n- [Contributing](#contributing)\n- [License](#license)\n\n## 🚀 Introduction\n\nThe goal of this project is to develop a machine learning model that can accurately detect fraudulent activities in electricity and gas consumption. By analyzing historical billing data, the model aims to help STEG reduce losses due to fraud.\n\n## 💾 Dataset\n\nThe dataset consists of historical billing data, including features such as client ID, invoice date, consumption levels, and counter types. The target variable indicates whether a client is fraudulent or not.\n\n## 🛠️ Installation\n\nTo run this project, you need to have Python installed along with the required libraries. You can install the dependencies using the following command:\n\n```bash\npip install -r requirements.txt\n```\n\n## 🚀 Usage\n\n### 💫 Streamlit App\n\nWe've developed an interactive Streamlit application that provides a user-friendly interface for fraud detection:\n\n1. Start the Streamlit app:\n   ```bash\n   streamlit run 1_Welcome.py\n   ```\n\n2. 
Login credentials:\n   - Username: admin\n   - Password: Admin01\n\nThe app includes several features:\n\n#### 🏠 Welcome Page\n![Welcome Page](images/app%201.PNG)\n![Welcome Page](images/app%202.PNG)\n![Welcome Page](images/app%203.PNG)\nA welcoming interface introducing the fraud detection system.\n\n#### 📚 Data Explorer\n![Data Explorer](images/app%204.PNG)\n![Data Explorer](images/app%205.PNG)\n![Data Explorer](images/app%206.PNG)\n![Data Explorer](images/app%207.PNG)\nExplore and analyze the dataset with interactive visualizations.\n\n\n#### 🔮 Prediction Interface\n![Prediction Form](images/app%208.PNG)\n![Prediction Form](images/app%209.PNG)\n![Prediction Form](images/app%2010.PNG)\nEasy-to-use form for making fraud predictions:\n- Input transaction details\n- Choose between models\n- Get instant predictions\n\n![Prediction Results](images/app%2011.PNG)\nDetailed prediction results with confidence scores.\n\n#### ⏳ History Tracking\n\n![Prediction History](images/app%2012.PNG)\nTrack and analyze prediction history:\n- View all past predictions\n- Analyze trends\n- Export results\n\n### 🔄 API Usage\n\n1. Clone the repository:\n   ```bash\n   git clone https://github.com/Code-str8/customer-frauds-detection.git\n   ```\n\n2. Navigate to the project directory:\n   ```bash\n   cd customer-frauds-detection\n   ```\n\n3. Start the API server:\n   ```bash\n   uvicorn api:app\n   ```\n\n4. 
Open your browser and navigate to:\n   ```\n   http://127.0.0.1:8000/docs\n   ```\n\nThe API provides two main endpoints ✨:\n- `/stacked/predict`: Uses the stacked ensemble model\n- `/xgb/predict`: Uses the XGBoost model\n\n![API Documentation UI](images/api%201.PNG)\n![API Documentation UI](images/api%202.PNG)\n\nExample of making predictions using the API:\n\n![API Prediction Example - Stacked Model](images/stacked%20api%203.PNG)\n![API Prediction Example - Stacked Model](images/stacked%20api%204.PNG)\n![API Prediction Example - XGBoost Model](images/xgb%20api%205.PNG)\n![API Prediction Example - XGBoost Model](images/xgb%20api%206.PNG)\n\n### 🤖 Model Training\n\nTo train or experiment with the models:\n\n1. Run the Jupyter Notebook:\n   ```bash\n   jupyter notebook fraud_detection.ipynb\n   ```\n\n## 📝 API Documentation\n\nThe API accepts the following input parameters:\n\n```json\n{\n    \"counter_number\": int,\n    \"account_age_days\": int,\n    \"new_index\": int,\n    \"old_index\": int,\n    \"consumption_level_1\": float,\n    \"counter_coefficient\": float,\n    \"client_catg\": int,\n    \"invoice_year\": int,\n    \"creation_year\": int,\n    \"creation_month\": int\n}\n```\n\nResponse format:\n```json\n{\n    \"prediction\": int,  // 0 or 1\n    \"probability\": string,  // percentage\n    \"prediction_text\": string  // \"Fraudulent\" or \"Non-Fraudulent\"\n}\n```\n\nExample curl request:\n```bash\ncurl -X 'POST' \\\n  'http://127.0.0.1:8000/stacked/predict' \\\n  -H 'accept: application/json' \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n    \"counter_number\": 0,\n    \"account_age_days\": 0,\n    \"new_index\": 0,\n    \"old_index\": 0,\n    \"consumption_level_1\": 0,\n    \"counter_coefficient\": 0,\n    \"client_catg\": 0,\n    \"invoice_year\": 2015,\n    \"creation_year\": 0,\n    \"creation_month\": 6\n  }'\n```\n\n## 🏗️ Methodology\n\nThe project follows these steps:\n1. 
**Data Preprocessing** 📊: Cleaning and transforming the data to make it suitable for modeling.\n2. **Exploratory Data Analysis (EDA)** 📊: Visualizing data distributions and relationships.\n3. **Feature Engineering** ✨: Creating new features and selecting the most relevant ones.\n4. **Modeling** 🤖: Training various machine learning models, including ensemble methods.\n5. **Evaluation** 📈: Assessing model performance using cross-validation and ROC AUC scores.\n\n## 📈 Results\n\nThe stacked model, combining XGBoost, Extra Trees, and Random Forest, achieved the best performance with a ROC AUC of 0.83. This indicates a good ability to distinguish between fraudulent and non-fraudulent clients.\n\n## ⚠️ Challenges \u0026 Trade-offs\n\n- **Hyperparameter Tuning** ⚙️: Tuning was limited by computational resources; a broader search could have improved model robustness.\n- **High Variance** 📊: Models performed well on training data but worse on test data.\n- **Class Imbalance** ⚖️: Addressed through resampling techniques to balance the training data.\n\n  ### 🎉 Deployment \u003ca name=\"deployment\"\u003e\u003c/a\u003e\n \n- Streamlit: [Streamlit app](https://customer-frauds-detection.streamlit.app/)\n\n  ### Article \u003ca name=\"article\"\u003e\u003c/a\u003e\n \n- Medium article: [Article](https://medium.com/@Codestr8/building-a-fraud-detection-system-for-utility-companies-a-complete-guide-969d9cc0a151)\n\n- Power BI: [PowerBI Dashboard](https://app.powerbi.com/view?r=eyJrIjoiODM4OWE0ZTMtN2ZkMC00YTJhLTg1ZTYtZmNjZjdhYWQwNjIwIiwidCI6IjQ0ODdiNTJmLWYxMTgtNDgzMC1iNDlkLTNjMjk4Y2I3MTA3NSJ9)\n\n\n## 🔮 Future Work\n\n- ✅ **API Development**: Implemented a FastAPI-based REST API for real-time fraud detection.\n- 🔄 **Model Updates**: Regular retraining with new data to maintain accuracy.\n- ⚡ **Performance Optimization**: Further API optimization for higher throughput.\n- 📊 **Monitoring**: Add model performance monitoring and drift detection.\n\n## 🤝 Contributing\n\nContributions 
are welcome! Please fork the repository and submit a pull request for any improvements or bug fixes.\n\n## 📜 License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-str8%2Fcustomer-frauds-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcode-str8%2Fcustomer-frauds-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-str8%2Fcustomer-frauds-detection/lists"}