{"id":25804791,"url":"https://github.com/yogsec/web-history-analysis","last_synced_at":"2025-08-12T12:33:24.063Z","repository":{"id":276613220,"uuid":"929784539","full_name":"yogsec/Web-History-Analysis","owner":"yogsec","description":"Web History Analysis is an advanced tool for classifying and categorizing URLs from browser history logs using machine learning techniques.","archived":false,"fork":false,"pushed_at":"2025-03-31T20:24:23.000Z","size":347,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-31T21:27:08.577Z","etag":null,"topics":["bug-bounty-tools","cyber-security","cybersecurity","cybersecurity-tools","deep-learning-cybersecurity","deeplearning","digital-forensics","foremost","forensics","forensics-investigations","forensics-tools","forinsics-investivation","hack-with-ai","hackers","hacking","hacking-tool","hacking-tools","machine-learning","penetration-testing","yogsec"],"latest_commit_sha":null,"homepage":"https://linktr.ee/yogsec","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yogsec.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["yogsec"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"lfx_crowdfunding":null,"polar":null,"buy_me_a_coffee":null,"thanks_dev":null,"custom":null}},"created_at":"2025-02-09T11:55:12.000Z","updated_at":"2025-03-31T20:29:07.000Z","dependencies_parsed_at":"2025-03-31T21:26:00.589Z","dependen
cy_job_id":"7352c070-023a-4423-bf5f-1ea2e8fd9e05","html_url":"https://github.com/yogsec/Web-History-Analysis","commit_stats":null,"previous_names":["yogsec/web-history-analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/yogsec/Web-History-Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yogsec%2FWeb-History-Analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yogsec%2FWeb-History-Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yogsec%2FWeb-History-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yogsec%2FWeb-History-Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yogsec","download_url":"https://codeload.github.com/yogsec/Web-History-Analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yogsec%2FWeb-History-Analysis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270061133,"owners_count":24520250,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-12T02:00:09.011Z","response_time":80,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bug-bounty-tools","cyber-security","cybersecurity","cybersecurity-tools","deep-learning-cybersecurity","deeplearning","digital-forensics","foremost","fore
nsics","forensics-investigations","forensics-tools","forinsics-investivation","hack-with-ai","hackers","hacking","hacking-tool","hacking-tools","machine-learning","penetration-testing","yogsec"],"created_at":"2025-02-27T18:53:50.995Z","updated_at":"2025-08-12T12:33:23.921Z","avatar_url":"https://github.com/yogsec.png","language":"Python","funding_links":["https://github.com/sponsors/yogsec","https://buymeacoffee.com/yogsec"],"categories":[],"sub_categories":[],"readme":"# Web History Analysis\n\n**Web History Analysis** is a tool for classifying URLs from browser history logs into categories using machine learning. It trains a deep learning model, specifically an LSTM (Long Short-Term Memory) network, on labeled URLs and then assigns new URLs to one of the predefined categories. It is intended for security researchers, data analysts, and anyone who needs to analyze and categorize web browsing activity.\n\n![screenshot](https://github.com/yogsec/Web-History-Analysis/blob/main/Hack-with-ai.jpeg)\n\nDesigned by **YogSec**, Web History Analysis can process large sets of URLs and reveal user browsing patterns through website classification.\n\n## Key Features\n\n- **URL Classification**: Automatically classifies URLs from browser history logs into predefined categories.\n- **Machine Learning Integration**: Uses an LSTM network built with TensorFlow and trained on labeled URL data.\n- **Preprocessing Capabilities**: Cleans each URL by stripping the protocol, replacing numbers with a placeholder, and removing special characters.\n- **File Input Support**: Classifies a list of URLs read from a CSV or text file, which makes it practical to work with large datasets.\n- **Model Evaluation**: After training, evaluates the model on a held-out test set and reports its accuracy.\n\n## Installation\n\nBefore using Web History Analysis, make sure to install the necessary 
dependencies. The tool requires Python 3.x and the following Python packages:\n\n- **TensorFlow**: Deep learning library used to build and train the model.\n- **pandas**: For loading and processing CSV data.\n- **numpy**: For numerical operations.\n- **scikit-learn**: For machine learning utilities such as label encoding and train-test splitting.\n\n### Install the required dependencies\n\n```bash\npip install tensorflow pandas numpy scikit-learn\n```\n\n## How to Use\n\nFollow these steps to get started with **Web History Analysis**:\n\n### 1. Prepare Your Labeled Data\n\nPrepare a CSV file (`labeled_data.csv`) with the following columns:\n\n- `url`: A URL from your browser history.\n- `category`: The category the URL belongs to (e.g., Shopping, News, Social Media).\n\nExample (`labeled_data.csv`):\n\n| url                                       | category     |\n|-------------------------------------------|--------------|\n| https://www.example.com                   | Shopping     |\n| https://news.example.com                  | News         |\n| https://www.facebook.com                  | Social Media |\n\n### 2. Preprocessing and Model Training\n\nThe script loads the labeled data and preprocesses each URL by removing the protocol (http, https), replacing numbers with a placeholder, and stripping special characters. It then tokenizes the URLs and pads the token sequences to a fixed length so they can be fed to the deep learning model. Finally, the LSTM model is trained on the preprocessed data.\n\n### Example: loading the labeled data\n\n```python\nimport pandas as pd\n\n# Load the labeled data; the file must contain 'url' and 'category' columns\ndf = pd.read_csv('labeled_data.csv', usecols=['url', 'category'])\n```\n\nOnce the data is loaded, training starts, and the script automatically evaluates the model's accuracy on a held-out test set.\n\n### 3. Classify URLs\n\nOnce the model is trained, you can classify URLs from a file (for example, a CSV or text file). This is done using the `classify_urls_from_file()` function. 
It preprocesses the URLs, runs them through the trained model, and prints the predicted category for each one.\n\nTo classify URLs from a file, run the script and enter the filename when prompted:\n\n```console\n$ python web_history_analysis.py\nEnter the filename containing URLs: urls.txt\n```\n\n### Example output\n\n```\nURL: https://example.com/product/123 → Category: Shopping\nURL: https://news.example.com/article/456 → Category: News\n```\n\n### 4. Evaluate Model Accuracy\n\nAfter training, the script evaluates the model on the test set and prints its accuracy:\n\n```python\n# model, X_test and y_test are created during training (step 2);\n# assumes the model was compiled with metrics=['accuracy']\nloss, accuracy = model.evaluate(X_test, y_test)\nprint(f'Accuracy: {accuracy * 100:.2f}%')\n```\n\n## File Structure\n\nHere’s an example structure for the project:\n\n```\n.\n├── labeled_data.csv        # CSV file with labeled URLs and categories\n├── web_history_analysis.py # The script to train the model and classify URLs\n├── urls.txt                # A text file containing URLs to be classified\n└── README.md               # This README file\n```\n\n## Licensing\n\nThis project is licensed under the **MIT License**. Feel free to fork, modify, and distribute this tool under the terms of that license.\n\n## 🌟 Let's Connect!\n\nHello, Hacker! 👋 We'd love to stay connected with you. 
Reach out to us on any of these platforms and let's build something amazing together:\n\n🌐 **Website:** [https://yogsec.github.io/yogsec/](https://yogsec.github.io/yogsec/)  \n📜 **Linktree:** [https://linktr.ee/yogsec](https://linktr.ee/yogsec)  \n🔗 **GitHub:** [https://github.com/yogsec](https://github.com/yogsec)  \n💼 **LinkedIn (Company):** [https://www.linkedin.com/company/yogsec/](https://www.linkedin.com/company/yogsec/)  \n📷 **Instagram:** [https://www.instagram.com/yogsec.io/](https://www.instagram.com/yogsec.io/)  \n🐦 **Twitter (X):** [https://x.com/yogsec](https://x.com/yogsec)  \n👨‍💼 **Personal LinkedIn:** [https://www.linkedin.com/in/bug-bounty-hunter/](https://www.linkedin.com/in/bug-bounty-hunter/)  \n📧 **Email:** abhinavsingwal@gmail.com\n\n---\n\n## ☕ Buy Me a Coffee\n\nIf you find our work helpful and would like to support us, consider buying us a coffee. Your support keeps us motivated and helps us create more awesome content. ❤️\n\n☕ **Support Us Here:** [https://buymeacoffee.com/yogsec](https://buymeacoffee.com/yogsec)\n\nThank you for your support! 🚀\n\n**Designed by YogSec**, a cybersecurity startup focused on vulnerability assessment and security research.\n\nFor any questions or feedback, contact us via email at [abhinavsingwal@gmail.com](mailto:abhinavsingwal@gmail.com) or visit our [LinkedIn](https://www.linkedin.com/in/bug-bounty-hunter).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyogsec%2Fweb-history-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyogsec%2Fweb-history-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyogsec%2Fweb-history-analysis/lists"}