https://github.com/ngshiheng/burplist
Web crawler for Burplist, a search engine for craft beers in Singapore
- Host: GitHub
- URL: https://github.com/ngshiheng/burplist
- Owner: ngshiheng
- License: mit
- Created: 2021-03-25T14:16:25.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-10-17T11:52:06.000Z (about 1 month ago)
- Last Synced: 2024-10-25T09:29:21.248Z (23 days ago)
- Topics: craftbeer, python, scrapy
- Language: Python
- Homepage: https://burplist.com
- Size: 1.42 MB
- Stars: 13
- Watchers: 1
- Forks: 5
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: docs/CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# Burplist
[![CI](https://github.com/ngshiheng/burplist/actions/workflows/ci.yml/badge.svg)](https://github.com/ngshiheng/burplist/actions/workflows/ci.yml)
[![CD](https://github.com/ngshiheng/burplist/actions/workflows/cd.yml/badge.svg)](https://github.com/ngshiheng/burplist/actions/workflows/cd.yml)

## Context
Welcome to the official repository for the [Burplist](https://burplist.com) web crawler, built using [Scrapy](https://scrapy.org/).
Growing up in a frugal family, I would spend hours browsing online, looking for the best bang for my buck. Needless to say, the process was super exhausting and slowly turned into frustration.
So then I thought, why not just create a search engine for craft beers?
[Read more...](https://jerrynsh.com/how-i-built-burplist-for-free/).
## Disclaimer
This software is intended for research purposes only. Users must abide by the relevant laws and regulations of their location; please do not use it for illegal purposes. Users bear all consequences arising from any illegal use.
## Features
- [x] 10+ unique [spiders](./burplist/spiders/) for top craft beer sites in Singapore
- [x] [Sentry](https://sentry.io/) integration
- [x] [ScrapeOps](https://scrapeops.io) integration
- [x] [Scraper API](https://www.scraperapi.com/?fp_ref=jerryng) for proxy requests
- [x] Automated random user agent rotation
- [x] Colored logging
- [x] Data deduplication pipeline
- [x] Database migration with [Alembic](https://alembic.sqlalchemy.org/en/latest/)
- [x] Delayed requests middleware

## Requirements
- [python](https://www.python.org/downloads/)
- [pip](https://pip.pypa.io/en/stable/installation/)
- [poetry](https://python-poetry.org/docs/#installation)
- [docker](https://docs.docker.com/get-docker/)

## Usage
See [this documentation](docs/USAGE.md) on how to use Burplist.
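The data deduplication pipeline listed under Features is not shown in this README. As a rough illustration only, a Scrapy item pipeline that discards repeated items could look like the sketch below. The class name `DuplicatesPipeline` and the assumption that each item carries a unique `url` field are hypothetical and not taken from Burplist's source.

```python
# Hypothetical sketch, not Burplist's actual pipeline: drop any item whose
# key field has already been seen during the current crawl.
try:
    from scrapy.exceptions import DropItem
except ImportError:  # lets the sketch run without Scrapy installed
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class DuplicatesPipeline:
    """Discard items that share a key with an item processed earlier."""

    key_field = "url"  # assumed unique field; an illustration, not the real schema

    def __init__(self):
        # Keys seen so far; lives only for the lifetime of this crawl.
        self.seen = set()

    def process_item(self, item, spider):
        key = item[self.key_field]
        if key in self.seen:
            # Raising DropItem tells Scrapy to discard this item.
            raise DropItem(f"Duplicate item found: {key}")
        self.seen.add(key)
        return item
```

Scrapy only runs a pipeline that is registered in the `ITEM_PIPELINES` setting, e.g. `ITEM_PIPELINES = {"burplist.pipelines.DuplicatesPipeline": 300}`, where the module path is hypothetical and the number sets the order in which pipelines run (lower values run first). Because the `seen` set is in memory, this sketch only deduplicates within a single crawl.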
## Contributing
For guidance on setting up a development environment and how to make a contribution, read the [contributing guidelines](docs/CONTRIBUTING.md).