https://github.com/samalyarov/telegram_monitor

A telegram channel parser + binary text classifier utilizing a simple logistic regression model

# Telegram monitoring script

The script is built on the Telethon library and uses a personal Telegram account to collect messages from open channels or channels to which the user has access. Data can be collected from multiple channels (listed in a separate file). This version of the script then applies text pre-processing and a trained logistic regression model to decide whether a message is "of interest" and, if so, forwards it to a specified channel.
What counts as "of interest" is subjective: it depends entirely on the training samples and the interests of whoever uses the script.
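
The classification step can be illustrated with a minimal standard-library sketch. It is not the code from `TG_Monitor.py`: the real script relies on pymorphy2, nltk and a scikit-learn model loaded from disk, while here the token weights, bias, threshold and function names are all hypothetical stand-ins chosen to show how a logistic regression score turns a message into a forward/skip decision. Fetching and forwarding messages via Telethon is left out.

```python
from __future__ import annotations

import math
import re

# Hypothetical weights; a trained model would learn these from labelled posts.
WEIGHTS = {"looking": 1.2, "recommend": 0.9, "need": 0.8, "sale": -1.5}
BIAS = -1.0
THRESHOLD = 0.5  # assumed decision threshold

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[^\W\d_]+")  # runs of letters (Latin, Cyrillic, ...)


def preprocess(text: str) -> list[str]:
    """Lowercase the text, drop URLs, and split it into alphabetic tokens."""
    text = URL_RE.sub(" ", text.lower())
    return TOKEN_RE.findall(text)


def interest_probability(tokens: list[str]) -> float:
    """Logistic regression score: sigmoid of the bias plus summed token weights."""
    z = BIAS + sum(WEIGHTS.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def is_of_interest(text: str) -> bool:
    """Return True when the message should be forwarded."""
    return interest_probability(preprocess(text)) >= THRESHOLD
```

With these made-up weights, `is_of_interest("Looking for a driver, need help")` returns `True`, while a message containing none of the weighted words falls below the threshold and would be skipped.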

In this particular example, the script is used by a concierge company to ease the monitoring of many chats in which it searches for potential clients. This saved a lot of working time (an estimated 70-80% workload reduction on that particular task) and expanded the monitoring range: the script made it possible to consistently monitor many more channels and groups, substantially increasing the company's coverage at no additional cost (a single worker could monitor several times more channels).

Files included in repository:

- `t_channels.xlsx` is a list of channels (with names and links) from which posts are loaded. The list is kept in an .xlsx file so that colleagues (especially those with no coding experience) can easily read it and leave comments and remarks.
- `config.ini` is a config file listing the user's username, password and phone number, as well as the bot API settings for the connection.
- `TG_Monitor.py` is the script itself.
- `Posts_prediction.ipynb` is a Jupyter Notebook with preliminary data analysis and model training, as well as an overall analysis of the model and prospects for future development.
- To be operational, the folder must also include 3 .pkl files: a trained logistic regression model, a trained vectorizer model and the vocabulary for that vectorizer. All three can be created with the `Posts_prediction` notebook.
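
As a rough illustration of how the files above might be read at startup, the sketch below parses an INI file with `configparser` and unpickles a set of artifacts. The section name `Telegram`, the keys `api_id`, `api_hash` and `phone`, and both helper function names are assumptions made for this example; the actual layout of `config.ini` and the real .pkl file names are defined by the repository, not by this snippet.

```python
import configparser
import pickle
from pathlib import Path


def load_settings(path: str) -> dict:
    """Read connection settings from an INI file into a plain dict."""
    parser = configparser.ConfigParser()
    # ConfigParser.read() returns the list of files it managed to parse.
    if not parser.read(path, encoding="utf-8"):
        raise FileNotFoundError(f"config file not found: {path}")
    section = parser["Telegram"]  # hypothetical section name
    return {
        "api_id": section.getint("api_id"),  # hypothetical keys
        "api_hash": section["api_hash"],
        "phone": section["phone"],
    }


def load_artifacts(folder: str, names: dict) -> dict:
    """Unpickle each artifact, given a mapping of label -> file name."""
    loaded = {}
    for key, filename in names.items():
        with open(Path(folder) / filename, "rb") as fh:
            loaded[key] = pickle.load(fh)
    return loaded
```

A call such as `load_artifacts(".", {"model": "model.pkl"})` (file name hypothetical) would return a dict holding the unpickled object. Since `pickle.load` can execute arbitrary code, it should only be pointed at .pkl files produced by your own notebook run.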

The folder is designed to be packaged later into a single .exe file (using PyInstaller) for easy use.

*Libraries used: telethon, configparser, datetime, numpy, pandas, pytz, matplotlib, seaborn, pickle, re, pymorphy2, nltk, scikit-learn (sklearn), eli5, tqdm (tqdm_notebook)*