{"id":13585506,"url":"https://github.com/megadose/OnionSearch","last_synced_at":"2025-04-07T10:31:00.577Z","repository":{"id":40342155,"uuid":"248205029","full_name":"megadose/OnionSearch","owner":"megadose","description":"OnionSearch is a script that scrapes urls on different .onion search engines. ","archived":false,"fork":false,"pushed_at":"2024-08-08T11:31:59.000Z","size":75,"stargazers_count":1369,"open_issues_count":14,"forks_count":182,"subscribers_count":25,"default_branch":"master","last_synced_at":"2025-04-02T02:11:12.909Z","etag":null,"topics":["ahmia","deeplink","information-gathering","onion","open-source-intelligence","osint","osint-tools","phobos","pypi","python","scrapes-urls","search-engines"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/megadose.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-18T10:48:51.000Z","updated_at":"2025-04-01T22:13:51.000Z","dependencies_parsed_at":"2024-01-14T10:22:39.663Z","dependency_job_id":"d3c6e8c5-a258-404b-8251-0c09ea243141","html_url":"https://github.com/megadose/OnionSearch","commit_stats":{"total_commits":40,"total_committers":6,"mean_commits":6.666666666666667,"dds":0.7,"last_synced_commit":"fc9d62c84f9cac9cd260d727bdac770ce3c4c743"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megadose%2FOnionSearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megadose%2FOnionSearch/ta
gs","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megadose%2FOnionSearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megadose%2FOnionSearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/megadose","download_url":"https://codeload.github.com/megadose/OnionSearch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247636056,"owners_count":20970852,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ahmia","deeplink","information-gathering","onion","open-source-intelligence","osint","osint-tools","phobos","pypi","python","scrapes-urls","search-engines"],"created_at":"2024-08-01T15:04:58.999Z","updated_at":"2025-04-07T10:31:00.300Z","avatar_url":"https://github.com/megadose.png","language":"Python","funding_links":[],"categories":["[↑](#-content) 🛠️ Tools","Python","[](#table-of-contents) Table of contents"],"sub_categories":["[](#darknetdeepweb-search-tools)Darknet/deepweb search tools"],"readme":"# OnionSearch\n👋 Hi there! For any professional inquiries or collaborations, please reach out to me at:\nmegadose@protonmail.com\n\n📧 Preferably, use your professional email for correspondence. 
Let's keep it short and sweet, and all in English!\n\n![PyPI](https://img.shields.io/pypi/v/onionsearch) ![PyPI - Week](https://img.shields.io/pypi/dw/onionsearch) ![PyPI - Downloads](https://static.pepy.tech/badge/onionsearch) ![PyPI - License](https://img.shields.io/pypi/l/onionsearch)\n#### For BTC Donations : 1FHDM49QfZX6pJmhjLE5tB2K6CaTLMZpXZ\n## Educational purposes only\n\nOnionSearch is a Python3 script that scrapes urls on different \".onion\" search engines.\n\n![](https://files.catbox.moe/vguy1e.png)\n\n### Demo\n\n![](https://github.com/megadose/gif-demo/raw/master/onionsearch.gif)\n\n\n## 💡 Prerequisite\n[Python 3](https://www.python.org/download/releases/3.0/)\n\n## 📚 Currently supported Search engines\n- ahmia\n- darksearchio\n- onionland\n- notevil\n- darksearchenginer\n- phobos\n- onionsearchserver\n- torgle\n- onionsearchengine\n- tordex\n- tor66\n- tormax\n- haystack\n- multivac\n- evosearch\n- deeplink\n\n## 🛠️ Installation\n### With PyPI\n\n```pip3 install onionsearch```\n\n### With Github\n\n```bash\ngit clone https://github.com/megadose/OnionSearch.git\ncd OnionSearch/\npython3 setup.py install\n```\n\n\n## 📈  Usage\n\nHelp:\n```\nusage: onionsearch [-h] [--proxy PROXY] [--output OUTPUT]\n                  [--continuous_write CONTINUOUS_WRITE] [--limit LIMIT]\n                  [--engines [ENGINES [ENGINES ...]]]\n                  [--exclude [EXCLUDE [EXCLUDE ...]]]\n                  [--fields [FIELDS [FIELDS ...]]]\n                  [--field_delimiter FIELD_DELIMITER] [--mp_units MP_UNITS]\n                  search\n\npositional arguments:\n  search                The search string or phrase\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --proxy PROXY         Set Tor proxy (default: 127.0.0.1:9050)\n  --output OUTPUT       Output File (default: output_$SEARCH_$DATE.txt), where $SEARCH is replaced by the first chars of the search string and $DATE is replaced by the datetime\n  --continuous_write 
CONTINUOUS_WRITE\n                        Write progressively to output file (default: False)\n  --limit LIMIT         Set a max number of pages per engine to load\n  --engines [ENGINES [ENGINES ...]]\n                        Engines to request (default: full list)\n  --exclude [EXCLUDE [EXCLUDE ...]]\n                        Engines to exclude (default: none)\n  --fields [FIELDS [FIELDS ...]]\n                        Fields to output to csv file (default: engine name link), available fields are shown below\n  --field_delimiter FIELD_DELIMITER\n                        Delimiter for the CSV fields\n  --mp_units MP_UNITS   Number of processing units (default: core number minus 1)\n\n[...]\n```\n\n### Multi-processing behaviour\n\nBy default, the script will run with the parameter `mp_units = cpu_count() - 1`. This means that on a machine with 4 cores,\nit will run 3 scraping functions in parallel. You can set `mp_units` to any value, but it is recommended to leave it at the default.\nYou may want to set it to 1 to run all requests sequentially (disabling the multi-processing feature).\n\nPlease note that continuous writing to the csv file has not been *heavily* tested with the multi-processing feature and therefore\nmay not work as expected.\n\nPlease also note that the progress bars may not be properly displayed when `mp_units` is greater than 1.\n**It does not affect the results**, so don't worry.\n\n### Examples\n\nTo request all the engines for the word \"computer\":\n```\nonionsearch \"computer\"\n```\n\nTo request all the engines except \"Ahmia\" and \"Candle\" for the word \"computer\":\n```\nonionsearch \"computer\" --exclude ahmia candle\n```\n\nTo request only \"Tor66\", \"DeepLink\" and \"Phobos\" for the word \"computer\":\n```\nonionsearch \"computer\" --engines tor66 deeplink phobos\n```\n\nThe same as above, but limiting the number of pages to load per engine to 3:\n```\nonionsearch \"computer\" --engines tor66 deeplink phobos --limit 3\n```\n\nPlease kindly note 
that the list of supported engines (and their keys) is given in the script help (-h).\n\n\n### Output\n\n#### Default output\n\nBy default, the file is written at the end of the process. The file is CSV-formatted and contains the following columns:\n```\n\"engine\",\"name of the link\",\"url\"\n```\n\n#### Customizing the output fields\n\nYou can customize what is written to the output file by using the parameters `--fields` and `--field_delimiter`.\n\n`--fields` allows you to add, remove, and re-order the output fields. The default fields are shown just above. Instead, you can for instance\nchoose to output:\n```\n\"engine\",\"name of the link\",\"url\",\"domain\"\n```\nby setting `--fields engine name link domain`.\n\nYou can even choose to output:\n```\n\"engine\",\"domain\"\n```\nby setting `--fields engine domain`.\n\nThese are only examples; many other combinations are possible.\n\nFinally, you can also choose to modify the CSV delimiter (comma by default), for instance: `--field_delimiter \";\"`.\n\n#### Changing filename\n\nThe filename will be set by default to `output_$DATE_$SEARCH.txt`, where $DATE represents the current datetime and $SEARCH the first\ncharacters of the search string.\n\nYou can modify this filename by using `--output` when running the script, for instance:\n```\nonionsearch \"computer\" --output \"\\$DATE.csv\"\nonionsearch \"computer\" --output output.txt\nonionsearch \"computer\" --output \"\\$DATE_\\$SEARCH.csv\"\n...\n```\n(Note that it might be necessary to escape the dollar character.)\n\nIn the csv file produced, the name and url strings are sanitized as much as possible, but there might still be some problems...\n\n#### Write progressively\n\nYou can choose to write to the output progressively (instead of writing everything at the end), which prevents\nlosing the results if something goes wrong. 
To do so you have to use `--continuous_write True`, just as is:\n```\nonionsearch \"computer\" --continuous_write True\n```\nYou can then use the `tail -f` (tail follow) Unix command to actively watch or monitor the results of the scraping.\n## Thank you to [Gobarigo](https://github.com/Gobarigo)\n## Thank you [mxrch](https://github.com/mxrch) for this logo\n\n## 📝 License\n[GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.fr.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmegadose%2FOnionSearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmegadose%2FOnionSearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmegadose%2FOnionSearch/lists"}