https://github.com/stupidcucumber/elephant-crawler
System for mining texts from websites.
- Host: GitHub
- URL: https://github.com/stupidcucumber/elephant-crawler
- Owner: stupidcucumber
- License: mit
- Created: 2024-10-18T17:23:00.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-27T10:27:04.000Z (over 1 year ago)
- Last Synced: 2025-01-19T08:15:31.402Z (over 1 year ago)
- Topics: data, data-mining-python, python
- Language: Python
- Homepage:
- Size: 111 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# elephant-crawler


## Development
To start contributing to this repository:
1. Install requirements:
```
python -m pip install -r requirements.dev.txt
```
2. Install the pre-commit hook:
```
pre-commit install
```
You're good to go!
### Architecture

1. DB stores all data from the texts.
2. Core-API provides access to the database for the external services.
3. Crawler-SVC starts all
## Deployment
The only thing you need to do is run:
```
docker-compose up --build
```
Then all scraped texts are available at the endpoint:
```
http://localhost:8081/scrapped-texts
```
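To read the texts from a script, a minimal sketch using only the Python standard library could look like this. The URL is the one given above; the response format is not documented here, so the helper assumes the body is UTF-8 and tries JSON first, falling back to plain text. The names `parse_payload` and `fetch_scrapped_texts` are hypothetical and not part of this repository.

```python
import json
import urllib.request

# Endpoint taken from the Deployment section; everything else is an assumption.
SCRAPPED_TEXTS_URL = "http://localhost:8081/scrapped-texts"


def parse_payload(raw: bytes):
    """Decode a response body as UTF-8 and return parsed JSON when possible.

    The actual response format of the endpoint is not documented, so this
    falls back to returning the decoded text unchanged.
    """
    text = raw.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def fetch_scrapped_texts(url: str = SCRAPPED_TEXTS_URL, timeout: float = 10.0):
    """Send a GET request to the endpoint and return the parsed body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_payload(response.read())
```

Once the containers are running, calling `fetch_scrapped_texts()` should return whatever the endpoint serves; if the services are not up yet, `urllib` raises `urllib.error.URLError`.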