Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/bonedaddy/ipfs-searx-scraper

a tool for scraping ipfs results from searx

Host: GitHub
URL: https://github.com/bonedaddy/ipfs-searx-scraper
Owner: bonedaddy
License: mit
Created: 2018-10-15T23:45:39.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2015-12-13T08:14:47.000Z (almost 9 years ago)
Last Synced: 2024-10-15T18:10:09.421Z (29 days ago)
Language: Python
Size: 1000 Bytes
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# noetic-searx-scraper

This is a Python script for scraping search results from the JSON API of [a public searx instance](https://github.com/asciimoo/searx/wiki/Searx-instances)\*. It was set up to scrape results for pages from [IPFS](https://ipfs.io), but the query can be changed to anything. The code isn't very pretty: the number of results pages to scrape is hardcoded, and there is plenty that could be cleaned up with a little work.
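The general approach can be sketched as follows. This is not the script in this repository. It is a minimal illustration that assumes the instance serves results at `/search` with the `q`, `format=json` and `pageno` query parameters, and that the JSON response has a `results` list whose entries carry a `url` field. Some public instances disable the JSON format entirely. The instance URL and the function names (`build_search_url`, `fetch_json`, `scrape`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical instance URL (assumption): replace with an instance you may query.
SEARX_URL = "https://searx.example.org/search"


def build_search_url(base_url, query, page):
    """Build a searx search URL that asks for JSON output for one results page."""
    params = urllib.parse.urlencode({"q": query, "format": "json", "pageno": page})
    return f"{base_url}?{params}"


def fetch_json(url):
    """Download a URL and decode the response body as JSON."""
    req = urllib.request.Request(url, headers={"User-Agent": "searx-scraper-sketch"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def scrape(base_url, query, num_pages, fetch=fetch_json):
    """Collect unique result URLs from pages 1..num_pages, in first-seen order.

    `fetch` is injectable so the paging logic can be exercised offline.
    """
    seen = []
    for page in range(1, num_pages + 1):
        payload = fetch(build_search_url(base_url, query, page))
        results = payload.get("results", [])
        if not results:
            break  # stop early once the instance returns an empty page
        for result in results:
            url = result.get("url")
            if url and url not in seen:
                seen.append(url)
    return seen
```

Calling `scrape(SEARX_URL, "ipfs", num_pages=3)` would request up to three pages. Stopping on the first empty page avoids hardcoding an exact page count: `num_pages` becomes an upper bound rather than a guess.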

\* Ideally, you should just [run searx locally](https://github.com/asciimoo/searx#alternative-recommended-installation) and scrape that, so you don't put too much load on a public instance. I had trouble setting it up locally, but YMMV. If you do use a public instance, be mindful of how many pages you're scraping.
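One way to be mindful of load is to space out requests, whichever instance you scrape. The sketch below wraps any single-argument fetch function (for example, one that downloads a searx JSON page) so that consecutive calls are at least `delay_seconds` apart. The local URL assumes searx is listening on port 8888, which is an assumption about your `settings.yml`. The 2-second default delay and the name `throttled` are hypothetical choices, not values from this project.

```python
import time

# Assumption: a local searx instance on port 8888; adjust to match your settings.yml.
LOCAL_SEARX_URL = "http://localhost:8888/search"


def throttled(fetch, delay_seconds=2.0, sleep=time.sleep):
    """Wrap `fetch` so consecutive calls are spaced at least `delay_seconds` apart.

    `sleep` is injectable so the wrapper can be tested without real waiting.
    """
    last_call = None

    def wrapper(url):
        nonlocal last_call
        now = time.monotonic()
        if last_call is not None:
            # Only wait for whatever remains of the delay since the previous call.
            wait = delay_seconds - (now - last_call)
            if wait > 0:
                sleep(wait)
        last_call = time.monotonic()
        return fetch(url)

    return wrapper
```

Because the wrapper only sleeps for the remaining part of the delay, slow responses from the instance are not penalized twice.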

## Contributing

This whole thing could probably be made much nicer. I'm happy to accept any pull requests, although it's unlikely I'll work on it much myself.

## License

MIT.