Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/marcbperez/python-webcrawler
Crawls HTML pages for prices and other pieces of data.
https://github.com/marcbperez/python-webcrawler
crawler docker gradle python
Last synced: about 2 months ago
JSON representation
Crawls HTML pages for prices and other pieces of data.
- Host: GitHub
- URL: https://github.com/marcbperez/python-webcrawler
- Owner: marcbperez
- License: apache-2.0
- Created: 2017-03-09T21:18:48.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2020-11-18T12:06:02.000Z (about 4 years ago)
- Last Synced: 2024-10-20T00:35:45.980Z (3 months ago)
- Topics: crawler, docker, gradle, python
- Language: Python
- Homepage:
- Size: 73.2 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# python-webcrawler
Crawls HTML pages for prices and other pieces of data.
## Installation
Start by downloading and building the project when necessary. The following
commands will do the job on most Debian based Linux distributions.```bash
git clone https://github.com/marcbperez/python-webcrawler
cd python-webcrawler
sudo ./gradlew
```## Usage
The parser can retrieve the price of calling a landline in Canada, Germany,
Iceland, Pakistan, Singapore and South Africa.```python
from webcrawler.parser import Parser
from webcrawler.parserhelper import ParserHelperhelper = ParserHelper()
parser = Parser(helper, 'utf-8')
canada_landline_price = parser.landline_price(helper.CANADA())
```## Testing
Test checks are executed automatically every time the project is built. Builds
can be done remotely or continuously on a development context. For continuous
integration and development use docker-compose. This is recommended to keep the
system clean while the project is built every time the sources change.```bash
sudo docker-compose up
```For continuous integration and development without any dependencies use the
Gradle wrapper. This is the best option if the wrapper is available and the
Docker context is not valid. For a full list of tasks, see
`sudo ./gradlew tasks --all`. For a CI cycle use `sudo ./gradlew --continuous`.For continuous integration and development without Docker or the project wrapper
use Gradle directly. This will create the wrapper in case it is not present.
Similar to the above, for a CI cycle use `sudo gradle --continuous`. Gradle
3.4.1 is required for this to work. Plain Docker is also available for remote
integration tasks and alike. Build the image with `sudo docker build .` and run
a new container with it. Information on how to install Docker and docker-compose
can be found in their [official page][install-docker-compose]. A similar
installation guide is available [for Gradle][install-gradle].## Troubleshooting
The [issue tracker][issue-tracker] intends to manage and compile bugs,
enhancements, proposals and tasks. Reading through its material or reporting to
its contributors via the platform is strongly recommended.## Contributing
This project adheres to [Semantic Versioning][semver] and to certain syntax
conventions defined in [.editorconfig][editorconfig]. To get a list of changes
refer to the [CHANGELOG][changelog]. Only branches prefixed by *feature-*,
*hotfix-*, or *release-* will be considered:- Fork the project.
- Create your new branch: `git checkout -b feature-my-feature develop`
- Commit your changes: `git commit -am 'Added my new feature.'`
- Push the branch: `git push origin feature-my-feature`
- Submit a pull request.## Credits
This project is created by [marcbperez][author] and maintained by its
[author][author] and contributors.## License
This project is licensed under the [Apache License Version 2.0][license].
[author]: https://marcbperez.github.io
[issue-tracker]: https://github.com/marcbperez/python-webcrawler/issues
[editorconfig]: .editorconfig
[changelog]: CHANGELOG.md
[license]: LICENSE
[semver]: http://semver.org
[install-docker-compose]: https://docs.docker.com/compose/install/
[install-gradle]: https://gradle.org/install