{"id":16087991,"url":"https://github.com/twoeightnine/xvii_admin_bot","last_synced_at":"2026-01-20T00:36:47.747Z","repository":{"id":37221052,"uuid":"281902558","full_name":"TwoEightNine/xvii_admin_bot","owner":"TwoEightNine","description":"ml-bot to answer the most common questions to users","archived":false,"fork":false,"pushed_at":"2022-12-08T11:14:41.000Z","size":112,"stargazers_count":2,"open_issues_count":12,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-13T18:53:10.874Z","etag":null,"topics":["bot","machine-learning","nlp","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TwoEightNine.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-07-23T09:00:42.000Z","updated_at":"2024-12-27T17:40:38.000Z","dependencies_parsed_at":"2023-01-25T12:15:37.843Z","dependency_job_id":null,"html_url":"https://github.com/TwoEightNine/xvii_admin_bot","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TwoEightNine%2Fxvii_admin_bot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TwoEightNine%2Fxvii_admin_bot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TwoEightNine%2Fxvii_admin_bot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TwoEightNine%2Fxvii_admin_bot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TwoEightNine","download_url":"https://codeload.github.com/TwoEightNine/xvii_admin_bot
/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247694392,"owners_count":20980729,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bot","machine-learning","nlp","python"],"created_at":"2024-10-09T13:34:33.298Z","updated_at":"2026-01-20T00:36:47.710Z","avatar_url":"https://github.com/TwoEightNine.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## russian answer bot for groups of vk (and not only)\n\nthis is an ML solution for building a bot for groups on the vk social network \n(but you can easily write your own delegate for another social network).\nthe bot's behavior is based on real users' messages sent earlier. \ncurrently the bot supports only the russian language\n\nthis bot was created to assist me with answering the most frequent questions in vk group's messages (see [xvii messenger for vk](https://github.com/TwoEightNine/XVII))\n\nread further for more information about how it works\n\n**this project is at the baseline stage**\n\n### installation\n\n#### step 0. cloning and setup\n\nclone this repository using git\n```bash\ngit clone https://github.com/TwoEightNine/xvii_admin_bot.git\ncd xvii_admin_bot\n```\n\nthen create and activate a virtual environment and install the dependencies\n\n```bash\nsudo apt install python3-venv\npython3.6 -m venv admin_bot_env\nsource admin_bot_env/bin/activate\npip install -r requirements.txt\n```\n\nin the root directory, create a file `secret.py` with some sensitive information. 
the file should look like this:\n\n```python\naccess_token = 'your token here'\nno_fetch_users = [13371337228]\n```\n\n`access_token` is a token to access group messages (to obtain the token, visit [this page](https://vk.com/dev/authcode_flow_group); do not forget to use `scope=messages,offline`).\n`no_fetch_users` is a list of user ids. to ignore messages from a user, put their id here\n\n#### step 1. fetching messages\n\nto fetch messages run\n```bash\npython3 fetcher.py --count COUNT --social SOCIAL [-h]\n```\n\nwhere `COUNT` is the number of dialogs to fetch messages from, \n`SOCIAL` is the social network to use. run with `-h`\nto see which social networks are supported\n\nthe script will load messages into `data/messages.csv`\n\n#### step 2. find clusters\n\nthis step performs semi-automatic labelling. \nfetched messages are lemmatized and cleaned, then converted to tf-idf vectors.\nspectral clustering is used. to perform clustering run:\n\n```bash\npython3 clusterizer.py [--search] [--clusters_count CL_COUNT] --random_state RND_ST [-h]\n```\n\nwhere `--search` is an optional flag that searches for a better number of clusters,\n`--clusters_count` is required for the final clustering and \nsets the preferred number of clusters,\n`--random_state` is an integer seed for reproducibility\n\nyou may want to run the search (with the `--search` flag) to calculate clustering metrics\nfor different numbers of clusters. 
in this case the script will print these metrics\n\nafter the search you should have a 'good' number of clusters for your task.\nnow run the script again with `--clusters_count YOUR_VALUE` and \nit will create `data/model_explanation.txt` \nwith the most frequent words in every cluster.\nif the clustering result does not look good,\nyou can rerun clustering with a different number of clusters or a different random state\n\n\nusing this data, create `classes.json` in the following format:\n\n```json\n{\n  \"your_class_1\": {\n    \"clusters\": [3, 7, 11],\n    \"response\": \"your_response_for_class_1\"\n  },\n  \"your_class_2\": {\n    \"clusters\": [2],\n    \"response\": \"__UNREAD\"\n  },\n  \"your_class_3\": {\n    \"clusters\": [16],\n    \"response\": \"your_response_for_class_3\"\n  }\n}\n```\n\nthis maps clusters to the classes you need.\nthe model is trained on these classes. \n\n`clusters` are the indexes of the clusters that match your class\n\n`response` is the answer sent to the user. this field may contain the special markers \n`__UNREAD` and `__READ`. in these cases no response is sent\nand the conversation is left read (`__READ`, no answer needed) or unread \n(`__UNREAD`, human attention needed)\n\nall clusters not mentioned implicitly belong to the class `undefined` \nwith response `__UNREAD`\n\n#### step 3. find and train a model\n\nafter you have created `classes.json` you can search for and train a model to\nmake predictions. 
\n\nto search, run\n\n```bash\npython3 modeller.py --search [--cv CV] [--sort_by METRIC]\n```\n\nwhere `--search` is a flag that indicates that you want to\nperform a search (using sklearn's `GridSearch`), \n`CV` is the number of folds to use in cross-validation,\n`METRIC` is the alias of the metric to sort by\n\nyou can use the default search params (estimators and their parameters) \nor define your own in `hyperparams.py` (variable `search_estimators`)\n\nafter the search you will see the 5 best results (according to `--sort_by`)\nand can explore all configurations in `data/search_results.csv`. the best model\nshould be set in `hyperparams.py` as `final_estimator`\n\nto train a model run\n\n```bash\npython3 modeller.py [--cv CV] [--pca_n_components N_COM]\n```\n\nwhere `N_COM` is the value of `PCA()`'s `n_components`;\nif it is not set, PCA is not used\n\n`data/model_pipeline.pkl` and `data/model_classes.pkl` will be created\n\n**optionally**, you can interactively check the model using\n\n```bash\npython3 predictor.py\n```\n\nenter a russian message and see which class the model predicts for it\n\n#### step 4. run and chill\n\nthe bot is ready to start. to launch it, run\n\n```bash\npython3 bot.py --social SOCIAL\n```\n\nin stdout you will see status messages, incoming messages and \npredicted answers\n\n###### twoeightnine, 2020","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftwoeightnine%2Fxvii_admin_bot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftwoeightnine%2Fxvii_admin_bot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftwoeightnine%2Fxvii_admin_bot/lists"}