Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bonedaddy/ipfs-searx-scraper
a tool for scraping ipfs results from searx
- Host: GitHub
- URL: https://github.com/bonedaddy/ipfs-searx-scraper
- Owner: bonedaddy
- License: mit
- Created: 2018-10-15T23:45:39.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2015-12-13T08:14:47.000Z (almost 9 years ago)
- Last Synced: 2024-10-15T18:10:09.421Z (29 days ago)
- Language: Python
- Size: 1000 Bytes
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# noetic-searx-scraper
This is a Python script for scraping search results from the JSON API of [a public searx instance](https://github.com/asciimoo/searx/wiki/Searx-instances)\*. It was set up to scrape results for pages from [IPFS](https://ipfs.io), but the query can be changed to anything. The code isn't very pretty; for example, you have to hardcode the number of result pages you want to scrape. I'm sure a lot of it could be cleaned up with a little work.
\* Ideally, you should just [run searx locally](https://github.com/asciimoo/searx#alternative-recommended-installation) and scrape that, so you don't put too much load on a public instance. I had trouble setting it up locally, but YMMV. If you do use a public instance, be mindful of how many pages you're scraping.
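For illustration, here is a minimal sketch of the scraping loop described above. It is not the script in this repo. It assumes searx's `/search` endpoint with the `q`, `format=json` and `pageno` query parameters, and a JSON response with a `results` list whose entries carry a `url`. The function names, the `localhost:8888` default (a common local searx port), the optional `must_contain` filter, the delay between requests, and stopping early on an empty page are all hypothetical choices made for this example.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, List, Optional

# Hypothetical default: a locally running searx instance, as the note above recommends.
LOCAL_INSTANCE = "http://localhost:8888"


def build_search_url(instance: str, query: str, page: int) -> str:
    """Build a searx /search URL that asks for JSON output for one results page."""
    params = urllib.parse.urlencode({"q": query, "format": "json", "pageno": page})
    return f"{instance.rstrip('/')}/search?{params}"


def fetch_json(url: str) -> Dict[str, Any]:
    """Download one results page and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def extract_urls(payload: Dict[str, Any], must_contain: Optional[str] = None) -> List[str]:
    """Return result URLs, optionally keeping only those containing a substring."""
    urls = [r.get("url", "") for r in payload.get("results", [])]
    return [u for u in urls if u and (must_contain is None or must_contain in u)]


def scrape(
    instance: str,
    query: str,
    pages: int,
    must_contain: Optional[str] = None,
    delay_seconds: float = 2.0,
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Scrape up to `pages` result pages, pausing between requests."""
    found: List[str] = []
    for page in range(1, pages + 1):
        if page > 1:
            sleep(delay_seconds)  # spread requests out to go easy on the instance
        payload = fetch(build_search_url(instance, query, page))
        if not payload.get("results"):
            break  # no raw results on this page, so stop requesting more
        found.extend(extract_urls(payload, must_contain))
    return found
```

Usage would be something like `scrape(LOCAL_INSTANCE, "ipfs", pages=3, must_contain="ipfs")`. The page count is still a plain argument, much like the hardcoded value mentioned above. The delay and early stop are there to keep the request count down. Some public instances may have JSON output turned off, which is one more reason to point this at a local instance.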
## Contributing
This whole thing could probably be made much nicer. I'm happy to accept any pull requests, although it's unlikely I'll work on it much myself.
## License
MIT.