{"id":28489370,"url":"https://github.com/spk-22/phish-guard","last_synced_at":"2025-06-30T15:31:33.331Z","repository":{"id":295093726,"uuid":"989102069","full_name":"spk-22/Phish-Guard","owner":"spk-22","description":"A comprehensive deep learning framework for phishing detection, utilizing Graph Neural Networks (GraphSAGE) to analyze interconnected web features. Features include temporal graph construction, causal learning for robust time-series analysis, and integrated noise injection testing to evaluate model resilience against data imperfections. ","archived":false,"fork":false,"pushed_at":"2025-05-27T14:56:26.000Z","size":49,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-08T06:42:01.017Z","etag":null,"topics":["casual-sampling","gnn-model","graphsage","ids","phishing-detection","pytorch","temporal-data"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spk-22.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-23T14:58:21.000Z","updated_at":"2025-05-27T14:56:31.000Z","dependencies_parsed_at":"2025-05-23T16:21:14.471Z","dependency_job_id":"d76385e7-ba96-4df5-ab2e-06d69718a317","html_url":"https://github.com/spk-22/Phish-Guard","commit_stats":null,"previous_names":["spk-22/phish-guard"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/spk-22/Phish-Guard","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spk-22%2FPhish-Guard
","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spk-22%2FPhish-Guard/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spk-22%2FPhish-Guard/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spk-22%2FPhish-Guard/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spk-22","download_url":"https://codeload.github.com/spk-22/Phish-Guard/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spk-22%2FPhish-Guard/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262800611,"owners_count":23366392,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["casual-sampling","gnn-model","graphsage","ids","phishing-detection","pytorch","temporal-data"],"created_at":"2025-06-08T06:36:32.759Z","updated_at":"2025-06-30T15:31:33.319Z","avatar_url":"https://github.com/spk-22.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Phishing Detection Using Graph Sage using Casual Sampling (GNNs)\n\nThis repository presents a complete workflow for phishing detection leveraging **GraphSAGE**, a type of Graph Neural Network (GNN), with temporal modeling, causal sampling, and robustness testing.\n\n## 🧠 Overview\n\nPhishing attacks often involve subtle patterns that can be better detected using relational and temporal data. 
This project converts phishing datasets into graphs and applies a GNN model that:\n\n- Respects **causal constraints** in message passing.\n- Incorporates **temporal windowing** for realistic data flow.\n- Tests **robustness** through noise injection.\n\n## 🛠 Tech Stack\n\n- **Programming Language:** Python\n- **Graph Processing:** [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/)\n- **Machine Learning:** PyTorch, Scikit-learn\n- **Data Handling:** pandas, numpy\n- **Visualization:** matplotlib\n\n## 📊 Workflow Summary\n\n### 1. **Data Preprocessing**\n- Load and clean phishing data from `phish.xlsx`\n- One-hot encode categorical features\n- Scale numerical features\n- Combine features for each URL\n\n### 2. **Graph Construction**\n- Create a similarity graph using cosine similarity\n- Connect each node to k=5 nearest neighbors\n- Partition data into time windows of 10 samples\n- Generate PyG `Data` objects for each time window\n\n### 3. **Causal GraphSAGE Model**\n- Custom model using `SAGEConv`, `BatchNorm`, `Dropout`\n- Enforces **causal message passing** (no future info leakage)\n\n### 4. **Noise Injection for Robustness**\n- Add Gaussian noise to node features\n- Randomly flip labels to simulate real-world inconsistencies\n\n### 5. **Training**\n- Trained with Binary Cross-Entropy loss and Adam optimizer\n- Evaluated using AUC-ROC score and ROC curve visualization\n\n## 📈 Evaluation\n\nThe model achieved strong performance on phishing detection:\n\n| Metric     | Value  |\n|------------|--------|\n| Accuracy   | 86.36% |\n| Precision  | 86.32% |\n| Recall     | 86.36% |\n| F1-Score   | 86.14% |\n| AUC-ROC    | 0.9023 (visualized in final plot) |\n\n## Visualizations\n* Training Loss and Accuracy Over Epochs (Causal GraphSAGE): Visualizes the convergence of the model during causal training, showing decreasing loss and increasing accuracy over epochs. 
\n* Confusion Matrix: Provides a detailed breakdown of true positives, true negatives, false positives, and false negatives from the final evaluation, illustrating the model's classification accuracy for each class.\n* ROC Curve: Illustrates the model's trade-off between True Positive Rate and False Positive Rate across various classification thresholds, with the AUC-ROC score quantifying overall performance.\n* Training Loss - Phishing Noise Training: Depicts the loss reduction during the training phase where noise was intentionally injected, demonstrating the model's ability to learn effectively despite data imperfections.\n* Overall Training Loss/Accuracy: Shows the general learning progression of the model, likely from an initial training phase, with loss decreasing and accuracy increasing.\n* Visual Interface: The dashboard visualizes the data fed to the global (fusion classifier) and attack-specific models, showing class probabilities, a graph plot, accuracy metrics, the confidence scores of both models, and the probable reason behind each model's classification.\n\n## Dependencies\nThe project relies on the following key libraries:\n\n- Python 3.x\n- torch (PyTorch)\n- torch-geometric (PyG)\n- torch-scatter\n- pandas\n- numpy\n- scikit-learn\n- matplotlib\n- gradio\n\n```bash\ngit clone https://github.com/spk-22/Phish-Guard\n```\n```bash\npip install -r requirements.txt\n# (Or manually install: torch, torch-geometric, scikit-learn, pandas, numpy, matplotlib)\n# Ensure torch-geometric, torch-scatter, and torch-sparse versions are compatible with your PyTorch version.\n```\n```bash\npython phish.py\n```\n```bash\nstreamlit run web_app.py\n```\n## 🔍 Use Case\n\nThis pipeline is ideal for cybersecurity researchers and engineers looking to detect phishing attempts using relational and temporal patterns within data.\nThe AUC-ROC score of 0.9023 signifies excellent discriminative power, even when trained on noisy data, indicating the model's strong 
ability to differentiate between phishing and legitimate attempts.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspk-22%2Fphish-guard","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspk-22%2Fphish-guard","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspk-22%2Fphish-guard/lists"}