{"id":18176642,"url":"https://github.com/s-r-e-e-r-a-j/SiteScraper","last_synced_at":"2025-04-01T18:31:14.881Z","repository":{"id":260677147,"uuid":"882038705","full_name":"s-r-e-e-r-a-j/SiteScraper","owner":"s-r-e-e-r-a-j","description":"The Site Scraper Tool is an ethical hacking program developed in Python that enables users to clone websites for educational purposes by copying HTML, CSS, JavaScript, and PHP. ","archived":false,"fork":false,"pushed_at":"2025-03-12T16:32:54.000Z","size":105,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-24T10:21:40.363Z","etag":null,"topics":["ethical-hacking","ethical-hacking-tools","hacking-tool","hacking-tools","hackingtools","kali-linux-hacking","kali-linux-tools","python","python-3","python-script","python3","sitescraper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/s-r-e-e-r-a-j.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-01T18:48:11.000Z","updated_at":"2025-03-12T16:32:57.000Z","dependencies_parsed_at":"2024-11-19T18:26:24.368Z","dependency_job_id":"4fd52e9d-8c08-4d47-aa68-ea4250cc2ff8","html_url":"https://github.com/s-r-e-e-r-a-j/SiteScraper","commit_stats":{"total_commits":44,"total_committers":1,"mean_commits":44.0,"dds":0.0,"last_synced_commit":"7ebe8394bb903fd4a96bc1b7a382a25ad1ed8194"},"previous_names":["s-r-e-e-r-a-j/site-scraper-tool"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/s-r-e-e-r-a-j%2FSiteScraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/s-r-e-e-r-a-j%2FSiteScraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/s-r-e-e-r-a-j%2FSiteScraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/s-r-e-e-r-a-j%2FSiteScraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/s-r-e-e-r-a-j","download_url":"https://codeload.github.com/s-r-e-e-r-a-j/SiteScraper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246691497,"owners_count":20818521,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ethical-hacking","ethical-hacking-tools","hacking-tool","hacking-tools","hackingtools","kali-linux-hacking","kali-linux-tools","python","python-3","python-script","python3","sitescraper"],"created_at":"2024-11-02T17:09:50.034Z","updated_at":"2025-04-01T18:31:14.871Z","avatar_url":"https://github.com/s-r-e-e-r-a-j.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SiteScraper\n\nThe Site Scraper Tool is an ethical hacking program developed in Python that enables users to clone websites for educational purposes by copying HTML, CSS, JavaScript, and PHP.\n\n\n**Note:** Use this tool responsibly and only on sites where you have explicit permission, as unauthorized scraping can lead to legal issues.\n\n## Use Responsibly\n\n\n `Warning`: Use SiteScraper only on websites you own or have explicit 
permission to test and analyze. Unauthorized use of this tool on external sites without permission may violate laws and terms of service.\n\n\n## Installation\n\n### Clone the repository:\n\n```bash\ngit clone https://github.com/s-r-e-e-r-a-j/SiteScraper.git\n```\n\n\n### Navigate to the SiteScraper directory\n\n```bash\ncd SiteScraper\n```\n\n### Install the required libraries:\n\n```bash\npip3 install -r requirements.txt\n```\n\n\n### Navigate to the Site Scraper directory\n```bash\ncd 'Site Scraper'\n```\n### Install the tool:\n```bash\nsudo python3 install.py\n```\nThen enter `y` to install.\n\n\n## Usage\n\n\nRun SiteScraper from the command line with the following options:\n\n```bash\nsitescraper \u003cURL\u003e [options]\n```\n\n\n## Command-Line Options\n\n| Option | Description |\n| --- | --- |\n| `\u003cURL\u003e` | The URL of the website to clone |\n| `-d, --depth` | (Optional) Set the maximum crawl depth (default: 3) |\n| `-o, --output` | (Optional) Set the output directory (default: `website_clone`). You can also specify a path to save to, for example `-o /home/kali/Desktop/result` |\n\n\n### Example\n\nTo clone a website up to a depth of 2 and save it in a directory named `my_clone`, use the following command:\n\n```bash\nsitescraper https://example.com -d 2 -o /home/kali/Desktop/my_clone\n```\nAfter the cloning process is complete, a directory named after the domain (e.g., `http.example.com`) will be created inside `my_clone`.\n\nTo view the cloned website, open the `index.html` file in a browser.\n\nIf you see `.php` files in the directory, the website has a PHP backend, and you need to start a PHP server to run it properly.\n\n#### Starting the PHP Server\n1. **Navigate to the Cloned Website Directory**\n\n```bash\ncd /home/kali/Desktop/my_clone/http.example.com\n```\n2. 
**Start the PHP Server**\n\nReplace `yourmachineipaddress` with your actual local IP (e.g., `192.168.1.5`):\n\n```bash\nphp -S yourmachineipaddress:8080\n```\n\n**Example:**\n\n```bash\nphp -S 192.168.1.5:8080\n```\n3. **Open the Cloned Website in a Browser**\n\nIn your web browser, enter:\n\n```\nhttp://yourmachineipaddress:8080\n```\n**Example:**\n\n```\nhttp://192.168.1.5:8080\n```\n\n**Now, you should be able to access and interact with the cloned website.**\n\n\n#### How It Works\nSiteScraper follows these steps:\n\n1. `Initial Crawl`: Downloads the main page of the target site.\n2. `Recursive Crawling`: Finds all internal links, then recursively crawls and saves them.\n3. `Asset Handling`: Downloads and saves linked assets (CSS, JS, images).\n4. `File Structure Preservation`: Saves files with the same structure as the original website, maintaining directories and paths.\n\n## Uninstallation\n\n```bash\ncd SiteScraper\n```\n```bash\ncd 'Site Scraper'\n```\n```bash\nsudo python3 install.py\n```\nThen enter `n` to uninstall.\n\n\n## License\n\n\nThis project is licensed under the MIT License.\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fs-r-e-e-r-a-j%2FSiteScraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fs-r-e-e-r-a-j%2FSiteScraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fs-r-e-e-r-a-j%2FSiteScraper/lists"}