{"id":23007581,"url":"https://github.com/ice-wzl/datareaper","last_synced_at":"2025-08-14T03:31:44.656Z","repository":{"id":212393025,"uuid":"731395186","full_name":"ice-wzl/DataReaper","owner":"ice-wzl","description":"DataReaper is a powerful Python tool designed to harvest data from publicly accessible HTTP servers. It combines the capabilities of Shodan search with web scraping techniques to efficiently gather information from targeted websites.","archived":false,"fork":false,"pushed_at":"2024-02-22T04:27:29.000Z","size":50,"stargazers_count":10,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-02-22T05:28:57.808Z","etag":null,"topics":["data-visualization","datascience","datascraping","osint","osint-python","osint-tool","python3","redteam","vulnerability"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ice-wzl.png","metadata":{"files":{"readme":"README.md","changelog":"history.txt","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-12-14T01:45:12.000Z","updated_at":"2024-02-11T15:51:52.000Z","dependencies_parsed_at":"2023-12-14T02:55:07.124Z","dependency_job_id":"f5c77dc0-0c75-42e9-bb08-2fb7c29fd1d6","html_url":"https://github.com/ice-wzl/DataReaper","commit_stats":{"total_commits":24,"total_committers":2,"mean_commits":12.0,"dds":"0.29166666666666663","last_synced_commit":"3e420aaa35a8593c85d6ae3c08c49cf556347f3d"},"previous_names":["ice-wzl/datareaper"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ice-wzl%2FDataReape
r","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ice-wzl%2FDataReaper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ice-wzl%2FDataReaper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ice-wzl%2FDataReaper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ice-wzl","download_url":"https://codeload.github.com/ice-wzl/DataReaper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229795792,"owners_count":18125286,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-visualization","datascience","datascraping","osint","osint-python","osint-tool","python3","redteam","vulnerability"],"created_at":"2024-12-15T08:16:28.037Z","updated_at":"2024-12-15T08:16:28.715Z","avatar_url":"https://github.com/ice-wzl.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DataReaper (DARE)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/ice-wzl/DataReaper/assets/75596877/c537207c-1d48-4766-b7e3-91a1f896ec04\"/\u003e\n\u003c/p\u003e\n\n# DataReaper (DARE): Documentation\n\nDataReaper is a powerful Python tool designed to harvest data from publicly accessible HTTP servers. 
It combines the capabilities of Shodan search with web scraping techniques to efficiently gather information from targeted websites.\n\n## Key Features:\n\n- Shodan Integration: Queries Shodan based on specific criteria and stores the results in the result.txt file.\n- Web Scraping: Extracts content and links from target websites.\n- Reaping: Optionally gathers subdirectories and files for deeper analysis.\n- Tor Support: Routes scans through Tor to anonymize them and protect your identity.\n- History Tracking: Maintains a history file to avoid redundant scans and save time.\n\n## Installation:\n\n- Python 3.6+: Ensure Python 3.6 or later is installed.\n\n- Virtual Environment: Create a virtual environment to manage program dependencies separately from your system packages.\n````\npython3 -m venv venv\nsource venv/bin/activate\n````\n- Dependencies: Install required packages from requirements.txt.\n````\npip3 install -r requirements.txt\n````\n- Non-Free Shodan Membership:\n    - A paid Shodan membership is required to access the API and use this program's full functionality.\n    - Create a Shodan account and upgrade to a paid plan if needed.\n    - Obtain your API key from your account dashboard.\n    - Create a file named api.txt in the same directory as the program.\n    - Enter your API key as the only line in the file.\n\n## Usage:\n- Requests are made through Tor by default. If you keep this default, ensure Tor is running on your system. 
Install Tor if it is not already present on your system.\n````\nsudo apt install tor\nsudo systemctl start tor\n````\n\n- Run the program:\n````\npython3 DataReaper.py\n````\n- Options:\n    - `-q`: Perform a Shodan query and update the result.txt file.\n    - `-s`: Scan and enumerate targets listed in the result.txt file.\n    - `-r`: Reap subdirectories and files from harvested targets (requires `-s`).\n    - `-x`: Execute all actions: perform a Shodan query, scan targets, and reap data (equivalent to `-q -s -r`).\n    - `-n`: Disable Tor support; scans are not anonymized through Tor.\n    - `-i`: Ignore the history file and scan all targets again, regardless of past scans.\n    - `-p [port number]`: Port number to query or scan (default: 8000).\n    - `-t [target ip]`: Target IP to scan. Assumes a scan unless `-r` is specified.\n\n- Output:\n    - Shodan query results are stored in the result.txt file.\n    - A history of scanned targets is maintained in the history.txt file.\n    - Harvested files are saved in directories named after the target IP address.\n\n## Examples:\n\n- Update results and scan targets:\n````\npython data_reaper.py -q -s\n````\n- Perform a complete data harvest with Tor:\n````\npython data_reaper.py -x\n````\n- Ignore history and scan all targets without Tor:\n````\npython data_reaper.py -s -i -n\n````\n## Disclaimer:\n\n- DataReaper is designed for educational and research purposes only. 
Use it responsibly and ethically, and consider the legal and ethical implications of any data collection you perform.\n\n## Further Information:\n\n- Shodan API Documentation: https://developer.shodan.io/api\n- Python Requests Library: https://readthedocs.org/projects/requests/\n\n### Thank you for using DataReaper!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fice-wzl%2Fdatareaper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fice-wzl%2Fdatareaper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fice-wzl%2Fdatareaper/lists"}