{"id":21349525,"url":"https://github.com/SimranShaikh20/Credit-Card-fraud-Detection","last_synced_at":"2025-07-12T19:30:47.675Z","repository":{"id":246654801,"uuid":"821768247","full_name":"SimranShaikh20/Hackathon_Project","owner":"SimranShaikh20","description":"Fraud detection using machine learning","archived":false,"fork":false,"pushed_at":"2024-10-16T13:23:50.000Z","size":302,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-18T10:15:39.781Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SimranShaikh20.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-29T11:31:31.000Z","updated_at":"2024-10-16T13:23:54.000Z","dependencies_parsed_at":"2024-10-18T10:26:27.606Z","dependency_job_id":null,"html_url":"https://github.com/SimranShaikh20/Hackathon_Project","commit_stats":null,"previous_names":["simranshaikh20/hacakathon_project","simranshaikh20/hackathon_project"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SimranShaikh20%2FHackathon_Project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SimranShaikh20%2FHackathon_Project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SimranShaikh20%2FHackathon_Project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SimranShaikh20%2FHackathon_
Project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SimranShaikh20","download_url":"https://codeload.github.com/SimranShaikh20/Hackathon_Project/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225834459,"owners_count":17531469,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-22T02:46:57.357Z","updated_at":"2025-07-12T19:30:42.424Z","avatar_url":"https://github.com/SimranShaikh20.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hackathon_Project\nFraudulent transaction detection using machine learning.\nThe Kaggle dataset is available here:\nhttps://www.kaggle.com/datasets/mlg-ulb/creditcardfraud\n\n\n\n## Project Aims\n\nThis project aims to convey several key points:\n\n1. **Practical Application of Machine Learning**: \n   Demonstrates a real-world application of machine learning in financial security, specifically in detecting credit card fraud.\n\n2. **Handling Imbalanced Datasets**: \n   Showcases how to deal with imbalanced datasets, a common challenge in fraud detection. The code uses undersampling of the majority class (legitimate transactions) to balance the dataset.\n\n3. **Basic Machine Learning Workflow**: \n   Illustrates the fundamental steps in a machine learning project:\n   - Data loading and preprocessing\n   - Splitting data into features and target\n   - Dividing data into training and testing sets\n   - Model selection, training, and evaluation\n\n4. 
**Use of Popular Data Science Libraries**: \n   Demonstrates the use of common Python libraries for data science and machine learning:\n   - pandas for data manipulation\n   - scikit-learn for machine learning tasks\n   - numpy for numerical operations\n\n5. **Simple Model Implementation**: \n   Uses Logistic Regression, a straightforward and interpretable model, as a starting point for fraud detection.\n\n6. **Model Evaluation**: \n   Shows how to evaluate a model's performance using accuracy scores on both training and test data.\n\n7. **Interactive Web Application**: \n   Integrates with Streamlit to create a simple web interface for the model, allowing users to input data and receive predictions.\n\n8. **Reproducibility**: \n   Includes a link to the dataset and provides the code, emphasizing reproducibility in data science.\n\n9. **Potential for Expansion**: \n   While the current implementation is basic, it provides a foundation that can be built upon with more advanced techniques.\n\n10. **Importance of Fraud Detection**: \n    Highlights the significance of fraud detection in the financial sector, addressing a real-world problem that affects many people and businesses.\n\n\n### Data Preprocessing\n\n1. **Data Loading**:\n   - The dataset is loaded from 'creditcard.csv' using pandas:\n     ```python\n     data = pd.read_csv(\"creditcard.csv\")\n     ```\n\n2. **Class Separation**:\n   - Legitimate and fraudulent transactions are separated:\n     ```python\n     legit = data[data.Class == 0]\n     fraud = data[data['Class'] == 1]\n     ```\n   - This separation allows for analysis of class imbalance.\n\n3. **Feature and Target Separation**:\n   - Features (X) and target variable (y) are split:\n     ```python\n     x = data.drop('Class', axis=1)\n     y = data['Class']\n     ```\n\n4. 
**Handling Class Imbalance**:\n   - Undersampling of the majority class (legitimate transactions) is performed:\n     ```python\n     legit_s = legit.sample(n=len(fraud), random_state=2)\n     data = pd.concat([legit_s, fraud], axis=0)\n     ```\n   - This creates a balanced dataset for training.\n   - Note that the feature/target split shown in step 3 has to be applied to this balanced `data` (that is, run after the undersampling); otherwise `x` and `y` still hold the original imbalanced dataset.\n\n### Model Training\n\n1. **Train-Test Split**:\n   - Data is split into training and testing sets:\n     ```python\n     x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, stratify=y, random_state=2)\n     ```\n   - 20% of the data is reserved for testing.\n   - Stratification ensures that the class distribution is maintained in both sets.\n\n2. **Model Selection**:\n   - Logistic Regression is chosen as the classification algorithm:\n     ```python\n     model = LogisticRegression()\n     ```\n   - This is a good baseline model for binary classification tasks.\n\n3. **Model Training**:\n   - The model is trained on the training data:\n     ```python\n     model.fit(x_train, y_train)\n     ```\n\n### Model Evaluation\n\n1. **Accuracy Calculation**:\n   - The model's performance is evaluated using accuracy scores:\n     ```python\n     # accuracy_score expects (y_true, y_pred)\n     train_acc = accuracy_score(y_train, model.predict(x_train))\n     test_acc = accuracy_score(y_test, model.predict(x_test))\n     ```\n   - Both training and testing accuracies are calculated; a training accuracy well above the testing accuracy would indicate overfitting.\n\n\n### Conclusion\n\nThis implementation provides a solid foundation for credit card fraud detection. The use of undersampling to balance the dataset and Logistic Regression as the classification algorithm offers a good starting point. The separate calculation of training and testing accuracies allows for basic assessment of model generalization. This project serves as an introductory example of applying machine learning to a critical financial problem, demonstrating how relatively simple techniques can be used to approach complex real-world issues. 
It provides a starting point for understanding and implementing fraud detection systems.\n\n\nThank you!","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSimranShaikh20%2FCredit-Card-fraud-Detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSimranShaikh20%2FCredit-Card-fraud-Detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSimranShaikh20%2FCredit-Card-fraud-Detection/lists"}