Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ivangolt/telegram_posts_parser
telegram posts parser using ETL process
dvc dvc-pipeline python s3 telegram
- Host: GitHub
- URL: https://github.com/ivangolt/telegram_posts_parser
- Owner: ivangolt
- Created: 2024-07-09T07:19:42.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-10-27T16:49:27.000Z (19 days ago)
- Last Synced: 2024-10-27T19:48:19.128Z (19 days ago)
- Topics: dvc, dvc-pipeline, python, s3, telegram
- Language: Python
- Homepage:
- Size: 14.8 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Telegram vacancies parser (DVC pipeline and S3 storage)
Parses posts from a list of Telegram channels using a DVC pipeline. Parsing is done with the [snscrape](https://github.com/tobe93gf/snscrape) library.
**Pipeline stages:**
1) Parsing Telegram channels
2) Preprocessing posts (for use in model training)
3) Pushing to S3 storage (this project uses cloud storage from cloud.ru)
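
The preprocessing stage is not described in detail here, so the following is only a minimal sketch of what such a step might look like, assuming posts arrive as plain strings. The function names `clean_post` and `preprocess` and the specific cleaning rules (stripping URLs, collapsing whitespace, dropping empty posts and duplicates) are hypothetical and are not taken from the repository.

```python
import re

# Hypothetical cleaning rules; the real project may preprocess differently.
URL_RE = re.compile(r"https?://\S+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Remove URLs and collapse runs of whitespace in a single post."""
    without_urls = URL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", without_urls).strip()


def preprocess(posts: list[str]) -> list[str]:
    """Clean every post, dropping empty results and exact duplicates.

    The order of first occurrence is preserved.
    """
    seen: set[str] = set()
    result: list[str] = []
    for raw in posts:
        cleaned = clean_post(raw)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result
```

A stage like this would sit between the parsing step and the S3 upload, taking the raw parsed posts as input and writing the cleaned list as its output.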
## Project setup
`pip install poetry`
`poetry install`
Run dvc pipeline:
`dvc repro`