{"id":15651296,"url":"https://github.com/pszemraj/ai-msgbot","last_synced_at":"2025-10-14T03:31:50.948Z","repository":{"id":40496192,"uuid":"417333815","full_name":"pszemraj/ai-msgbot","owner":"pszemraj","description":"Training \u0026 Implementation of chatbots leveraging GPT-like architecture with the aitextgen package to enable dynamic conversations.","archived":true,"fork":false,"pushed_at":"2022-09-06T07:22:08.000Z","size":70054,"stargazers_count":48,"open_issues_count":0,"forks_count":10,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-01-30T23:32:22.238Z","etag":null,"topics":["ai","aitextgen","chat-application","chatbot","deep-learning","deepspeed","deployment","gpt-2","gpt-j","gpt-j-6b","gradio","huggingface","huggingface-transformers","natural-language-processing","nlp","nlp-parsing","telegram","telegram-bot","text-generation","transformers"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pszemraj.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-15T01:31:37.000Z","updated_at":"2025-01-16T07:37:13.000Z","dependencies_parsed_at":"2022-08-27T18:50:35.629Z","dependency_job_id":null,"html_url":"https://github.com/pszemraj/ai-msgbot","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/pszemraj/ai-msgbot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pszemraj%2Fai-msgbot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pszemraj%2Fai-msgbot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/pszemraj%2Fai-msgbot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pszemraj%2Fai-msgbot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pszemraj","download_url":"https://codeload.github.com/pszemraj/ai-msgbot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pszemraj%2Fai-msgbot/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279017767,"owners_count":26086145,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","aitextgen","chat-application","chatbot","deep-learning","deepspeed","deployment","gpt-2","gpt-j","gpt-j-6b","gradio","huggingface","huggingface-transformers","natural-language-processing","nlp","nlp-parsing","telegram","telegram-bot","text-generation","transformers"],"created_at":"2024-10-03T12:37:49.965Z","updated_at":"2025-10-14T03:31:45.932Z","avatar_url":"https://github.com/pszemraj.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI Chatbots based on GPT Architecture\n\n\u003e It sure seems like there are a lot of text-generation chatbots out there, but it's hard to find a python package or model that is easy to tune around a simple text file of message data. 
This repo is a simple attempt to help solve that problem.\n\n\u003cimg src=\"https://i.imgur.com/aPxUkH7.png\" width=\"384\" height=\"256\"/\u003e\n\n`ai-msgbot` covers the practical use case of building a chatbot that sounds like you (or some dataset/persona you choose) by training a text-generation model to generate conversation in a consistent structure. This structure is then leveraged to deploy a chatbot that is a \"free-form\" model that _consistently_ replies like a human.\n\n There are three primary components to this project:\n\n1. parsing a dataset of conversation-like data\n2. training a text-generation model. This repo is designed around using the Google Colab environment for model training.\n3. Deploy the model to a chatbot interface for users to interact with, either locally or on a cloud service.\n\nIt relies on the [`aitextgen`](https://github.com/minimaxir/aitextgen) and [`python-telegram-bot`](https://github.com/python-telegram-bot/python-telegram-bot) libraries. Examples of how to train larger models with DeepSpeed are in the `notebooks/colab-huggingface-API` directory.\n\n```sh\npython ai_single_response.py -p \"greetings sir! what is up?\"\n\n... generating...\n\nfinished!\n('hello, i am interested in the history of vegetarianism. 
i do not like the '\n 'idea of eating meat out of respect for sentient life')\n```\n\nSome of the trained models can be interacted with through the HuggingFace spaces and model inference APIs on the [ETHZ Analytics Organization](https://huggingface.co/ethzanalytics) page on huggingface.co.\n\n* * *\n\n**Table of Contents**\n\n\u003c!-- TOC --\u003e\n\n- [Quick outline of repo](#quick-outline-of-repo)\n  - [Example](#example)\n- [Quickstart](#quickstart)\n- [Repo Overview and Usage](#repo-overview-and-usage)\n  - [New to Colab?](#new-to-colab)\n  - [Parsing Data](#parsing-data)\n  - [Training a text generation model](#training-a-text-generation-model)\n    - [Training: Details](#training-details)\n  - [Interaction with a Trained model](#interaction-with-a-trained-model)\n  - [Improving Response Quality: Spelling \u0026 Grammar Correction](#improving-response-quality-spelling--grammar-correction)\n- [WIP: Tasks \u0026 Ideas](#wip-tasks--ideas)\n- [Extras, Asides, and Examples](#extras-asides-and-examples)\n  - [examples of command-line interaction with \"general\" conversation bot](#examples-of-command-line-interaction-with-general-conversation-bot)\n  - [Other resources](#other-resources)\n- [Citations](#citations)\n\n\u003c!-- /TOC --\u003e\n\n* * *\n\n## Quick outline of repo\n\n- training and message EDA notebooks in `notebooks/`\n- python scripts for parsing message data into a standard format for training GPT are in `parsing-messages/`\n- example data (from the _Daily Dialogues_ dataset) is in `conversation-data/`\n- Usage of default models is available via the `download_models.py` script\n\n### Example\n\nThis response is from a [bot on Telegram](https://t.me/GPTPeter_bot), finetuned on the author's messages\n\n\u003cimg src=\"https://i.imgur.com/OJB5EMw.png\" width=\"550\" height=\"150\" /\u003e\n\nThe model card can be found [here](https://huggingface.co/pszemraj/opt-peter-2.7B).\n\n## Quickstart\n\n\u003e **NOTE: to build all the requirements, you _may_ need 
Microsoft C++ Build Tools, found [here](https://visualstudio.microsoft.com/visual-cpp-build-tools/)**\n\n1. clone the repo\n2. cd into the repo directory: `cd ai-msgbot/`\n3. Install the requirements: `pip install -r requirements.txt`\n    - if using conda: `conda env create --file environment.yml`\n    - _NOTE:_ if there are any errors with the conda install, it may ask for an environment name which is `msgbot`\n4. download the models: `python download_models.py` _(if you have a GPT-2 model, save the model to the working directory, and you can skip this step)_\n5. run the bot: `python ai_single_response.py -p \"hey, what's up?\"` or enter a \"chatroom\" with `python conv_w_ai.py -p \"hey, what's up?\"`\n    - _Note:_ for either of the above, the `-h` parameter can be passed to see the options (or look in the script file)\n\nPut together in a shell block:\n\n```sh\ngit clone https://github.com/pszemraj/ai-msgbot.git\ncd ai-msgbot/\npip install -r requirements.txt\npython download_models.py\npython ai_single_response.py -p \"hey, what's up?\"\n```\n\n* * *\n\n## Repo Overview and Usage\n\n### A general process-flow\n\n\u003cimg src=\"https://i.imgur.com/9iUEzvV.png\" width=\"872\" height=\"400\" /\u003e\n\n### Parsing Data\n\n- the first step in understanding what is going on here is to recognize that the process ultimately teaches GPT-2 to recognize a \"script\" of messages and respond as such.\n  - this is done with the `aitextgen` library, and it's recommended to read through [some of their docs](https://docs.aitextgen.io/tutorials/colab/) and take a look at the _Training your model_ section before returning here.\n- essentially, to generate a novel chatbot from just text (without going through too much trouble as required in other libraries.. 
can you easily abstract your friend's WhatsApp messages into a \"persona\"?)\n\n**An example of what a \"script\" is:**\n\n    speaker a:\n    hi, becky, what's up?\n\n    speaker b:\n    not much, except that my mother-in-law is driving me up the wall.\n\n    speaker a:\n    what's the problem?\n\n    speaker b:\n    she loves to nit-pick and criticizes everything that i do. i can never do anything right when she's around.\n\n    ..._Continued_...\n\nmore to come, but check out `parsing-messages/parse_whatsapp_output.py` for a script that will parse messages exported with the standard [WhatsApp chat export feature](https://faq.whatsapp.com/196737011380816/?locale=en_US#:~:text=You%20can%20use%20the%20export,with%20media%20or%20without%20media.). consolidate all the WhatsApp message export folders into a root directory, and pass the root directory to this script.\n\n\u003cfont color=\"yellow\"\u003e TODO: more words \u003c/font\u003e\n\n### Training a text generation model\n\nThe next step is to leverage the text-generative model to reply to messages. This is done by \"behind the scenes\" parsing/presenting the query with either a real or artificial speaker name and having the response be from `target_name` and, in the case of GPT-Peter, it is me.\n\nDepending on computing resources and so forth, it is possible to keep track of the conversation in a helper script/loop and then feed in the prior conversation and _then_ the prompt, so the model can use the context as part of the generation sequence, with of course the [attention mechanism](https://arxiv.org/abs/1706.03762) ultimately focusing on the last text passed to it (the actual prompt).\n\nThe final step is deploying this pipeline to an endpoint where a user can send in a message and the model replies. 
This repo has several options; see the `deploy-as-bot/` directory, which has an associated README.md file.\n\n#### Training: Details\n\n- an example dataset (_Daily Dialogues_) parsed into the script format can be found locally in the `conversation-data` directory.\n  - When learning, it is probably best to use a conversational dataset such as _Daily Dialogues_ as the last dataset to finetune the GPT2 model. Still, before that, the model can \"learn\" various pieces of information from something like a natural questions-focused dataset.\n  - many more datasets are available online at [PapersWithCode](https://paperswithcode.com/datasets?task=dialogue-generation\u0026mod=texts) and [GoogleResearch](https://research.google/tools/datasets/). Seems that _Google Research_ also has a tool for searching for datasets online.\n- Note that training is done in google colab itself. try opening `notebooks/colab-notebooks/GPT_general_conv_textgen_training_GPU.ipynb` in Google Colab (see the HTML button at the top of that notebook or click [this link to a shared git gist](https://colab.research.google.com/gist/pszemraj/06a95c7801b7b95e387eafdeac6594e7/gpt2-general-conv-textgen-training-gpu.ipynb))\n- Essentially, a script needs to be parsed and loaded into the notebook as a standard .txt file with formatting as outlined above. Then, the text-generation model will load and train using _aitextgen's_ wrapper around the PyTorch lightning trainer. Essentially, the text is fed into the model, and it self-evaluates for a \"test\" as to whether a text message chain (somewhere later in the doc) was correctly predicted or not.\n\n\u003cfont color=\"yellow\"\u003e TODO: more words \u003c/font\u003e\n\n### New to Colab?\n\n`aitextgen` is largely designed around leveraging Colab's free-GPU capabilities to train models. Training a text generation model and most transformer models, _is resource intensive_. 
If new to the Google Colab environment, check out the below to understand more of what it is and how it works.\n\n- [Google's FAQ](https://research.google.com/colaboratory/faq.html)\n- [Medium Article on Colab + Large Datasets](https://satyajitghana.medium.com/working-with-huge-datasets-800k-files-in-google-colab-and-google-drive-bcb175c79477)\n- [Google's Demo Notebook on I/O](https://colab.research.google.com/notebooks/io.ipynb)\n- [A better Colab Experience](https://towardsdatascience.com/10-tips-for-a-better-google-colab-experience-33f8fe721b82)\n\n### Interaction with a Trained model\n\n- Command line scripts:\n  - `python ai_single_response.py -p \"hey, what's up?\"`\n  - `python conv_w_ai.py -p \"hey, what's up?\"`\n  - You can pass the argument `--model \u003cNAME OF LOCAL MODEL DIR\u003e` to change the model.\n  - Example: `python conv_w_ai.py -p \"hey, what's up?\" --model \"GPT2_trivNatQAdailydia_774M_175Ksteps\"`\n- Some demos are available on the ETHZ Analytics Group's huggingface.co page (_no code required!_):\n  - [basic chatbot](https://huggingface.co/spaces/ethzanalytics/dialogue-demo)\n  - [GPT-2 XL Conversational Chatbot](https://huggingface.co/spaces/ethzanalytics/dialogue-demo)\n- Gradio - locally hosted runtime with public URL.\n  - See: `deploy-as-bot/gradio_chatbot.py`\n  - The UI and interface will look similar to the demos above, but run locally \u0026 are more customizable.\n- Telegram bot - Runs locally, and anyone can message the model from the Telegram messenger app(s).\n  - See: `deploy-as-bot/telegram_bot.py`\n  - An example chatbot by one of the authors is usually online and can be found [here](https://t.me/GPTPeter_bot)\n\n### Improving Response Quality: Spelling \u0026 Grammar Correction\n\nOne of this project's primary goals is to train a chatbot/QA bot that can respond to the user \"unaided\" where it does not need hardcoding to handle questions/edge cases. 
That said, sometimes the model will generate a bunch of strings together. Applying \"general\" spell correction helps make the model responses as understandable as possible without interfering with the response/semantics.\n\n- Implemented methods:\n  - **symspell** (via the pysymspell library) _NOTE: while this is fast and works, it sometimes corrects out common text abbreviations to random other short words that are hard to understand, e.g., **tues** and **idk** and so forth_\n  - **gramformer** (via transformers `pipeline()` object). a pretrained NN that corrects grammar and (to be tested) hopefully does not have the issue described above. Links: [model page](https://huggingface.co/prithivida/grammar_error_correcter_v1), [the model's GitHub](https://github.com/PrithivirajDamodaran/Gramformer/)\n- **Grammar Synthesis** (WIP) - Some promising results come from training a text2text generation model that, through \"pseudo-diffusion,\" is trained to denoise **heavily** corrupted text while learning to _not_ change the semantics of the text. 
A checkpoint and more details can be found [here](https://huggingface.co/pszemraj/grammar-synthesis-base) and a notebook [here](https://colab.research.google.com/gist/pszemraj/91abb08aa99a14d9fdc59e851e8aed66/demo-for-grammar-synthesis-base.ipynb).\n\n* * *\n\n## WIP: Tasks \u0026 Ideas\n\n- [x] finish out `conv_w_ai.py` that is capable of being fed a whole conversation (or at least, the last several messages) to prime response and \"remember\" things.\n- [ ] better text generation\n\n- add-in option of generating multiple responses to user prompts, automatically applying sentence scoring to them, and returning the one with the highest mean sentence score.\n- constrained textgen\n  - [x] explore constrained textgen\n  - [ ] add constrained textgen to repo\n        [x] assess generalization of hyperparameters for \"text-message-esque\" bots\n- [ ] add write-up with hyperparameter optimization results/learnings\n\n- [ ] switch repo API from `aitextgen` to `transformers pipeline` object\n- [ ] Explore model size about \"human-ness.\"\n\n## Extras, Asides, and Examples\n\n### examples of command-line interaction with \"general\" conversation bot\n\nThe following responses were received for general conversational questions with the `GPT2_trivNatQAdailydia_774M_175Ksteps` model. This is an example of what the model is capable of (and much more!!) in terms of learning to interact with another person, especially in a different language:\n\n    python ai_single_response.py --time --model \"GPT2_trivNatQAdailydia_774M_175Ksteps\" --prompt \"where is the grocery store?\"\n\n    ... generating...\n\n    finished!\n\n    \"it's on the highway over there.\"\n    took 38.9 seconds to generate.\n\n    python ai_single_response.py --time --model \"GPT2_trivNatQAdailydia_774M_175Ksteps\" --prompt \"what should I bring to the party?\"\n\n    ... 
generating...\n\n    finished!\n\n    'you need to just go to the station to pick up a bottle.'\n    took 45.9 seconds to generate.\n\n    C:\\Users\\peter\\PycharmProjects\\gpt2_chatbot\u003epython ai_single_response.py --time --model \"GPT2_trivNatQAdailydia_774M_175Ksteps\" --prompt \"do you like your job?\"\n\n    ... generating...\n\n    finished!\n\n    'no, not very much.'\n    took 50.1 seconds to generate.\n\n### Other resources\n\nThese are probably worth checking out if you find you like NLP/transformer-style language modeling:\n\n1. [The Huggingface Transformer and NLP course](https://huggingface.co/course/chapter1/2?fw=pt)\n2. [Practical Deep Learning for Coders](https://course.fast.ai/) from fast.ai\n\n* * *\n\n## Citations\n\nTODO: add citations for datasets and main packages used.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpszemraj%2Fai-msgbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpszemraj%2Fai-msgbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpszemraj%2Fai-msgbot/lists"}