telegram posts parser using ETL process
https://github.com/ivangolt/telegram_posts_parser

dvc dvc-pipeline python s3 telegram

# Telegram vacancies parser (DVC pipeline and S3 storage)

Parses posts from a list of Telegram channels using a DVC pipeline. Parsing is done with the [snscrape](https://github.com/tobe93gf/snscrape) library.

**Pipeline stages:**

1) Parsing the Telegram channels

2) Preprocessing the posts (for use in model training)

3) Pushing the results to S3 storage (this project uses cloud storage from cloud.ru)
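
The three stages above can be sketched as plain Python functions. This is only an illustrative outline, not the project's actual code: the function names, the post fields (`channel`, `text`), the cleaning rules, and the output path are all assumptions, and the parsing and upload steps are stubbed so the sketch runs without network access or S3 credentials.

```python
import json
import re
from pathlib import Path

# All names below are hypothetical; the project's real modules, file paths
# and post fields are not shown in this README.

URL_RE = re.compile(r"https?://\S+")


def parse_channels(channels):
    """Stage 1 (stubbed): return raw posts for each channel.

    The real stage would call snscrape here; this stub returns fixed data
    so the sketch runs offline.
    """
    return [
        {"channel": name, "text": f"Vacancy from {name}:  see https://example.com/job "}
        for name in channels
    ]


def preprocess(posts):
    """Stage 2: drop URLs and collapse whitespace in each post's text."""
    cleaned = []
    for post in posts:
        text = URL_RE.sub("", post["text"])
        text = " ".join(text.split())
        cleaned.append({**post, "text": text})
    return cleaned


def push(posts, out_path):
    """Stage 3 (stubbed): write JSON locally instead of uploading to S3."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(
        json.dumps(posts, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return out_path


if __name__ == "__main__":
    raw = parse_channels(["example_channel"])  # hypothetical channel name
    push(preprocess(raw), "data/processed/posts.json")  # hypothetical path
```

In a DVC pipeline, each of these functions would typically be its own stage declared in `dvc.yaml`, so that `dvc repro` reruns only the stages whose dependencies have changed.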

## Project setup

`pip install poetry`

`poetry install`

Run the DVC pipeline:

`dvc repro`