{"id":20405544,"url":"https://github.com/astrabert/bloom-multilingual-chatbot","last_synced_at":"2026-04-23T03:32:00.217Z","repository":{"id":237494672,"uuid":"794548309","full_name":"AstraBert/bloom-multilingual-chatbot","owner":"AstraBert","description":"Conversate effortlessly in more than 50 languages!","archived":false,"fork":false,"pushed_at":"2024-05-06T09:18:25.000Z","size":62,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-25T15:12:44.443Z","etag":null,"topics":["deep-translator","docker-image","gradio-interface","gradio-python-llm","huggingface-transformers","langdetect","large-language-models","local-ai","localhost","multilingual","text-generation","text-generation-webui"],"latest_commit_sha":null,"homepage":"https://astrabert.github.io/bloom-multilingual-chatbot/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AstraBert.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-01T12:41:21.000Z","updated_at":"2025-06-15T08:01:08.000Z","dependencies_parsed_at":"2024-05-02T08:16:50.477Z","dependency_job_id":"6345b0a1-f610-46d8-b002-57d1fdc1df6c","html_url":"https://github.com/AstraBert/bloom-multilingual-chatbot","commit_stats":null,"previous_names":["astrabert/bloom-multilingual-chatbot"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/AstraBert/bloom-multilingual-chatbot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraBert%2Fbloom-multilingual-chatbot","tags
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraBert%2Fbloom-multilingual-chatbot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraBert%2Fbloom-multilingual-chatbot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraBert%2Fbloom-multilingual-chatbot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AstraBert","download_url":"https://codeload.github.com/AstraBert/bloom-multilingual-chatbot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraBert%2Fbloom-multilingual-chatbot/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32164890,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-23T02:19:40.750Z","status":"ssl_error","status_checked_at":"2026-04-23T02:17:55.737Z","response_time":53,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-translator","docker-image","gradio-interface","gradio-python-llm","huggingface-transformers","langdetect","large-language-models","local-ai","localhost","multilingual","text-generation","text-generation-webui"],"created_at":"2024-11-15T05:11:53.101Z","updated_at":"2026-04-23T03:32:00.201Z","avatar_url":"https://github.com/AstraBert.png","language":"Python","readme":"# Bloom 
Multilingual Chatbot\n\n## Converse effortlessly in more than 50 languages!\n\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/languages/top/AstraBert/bloom-multilingual-chatbot\" alt=\"GitHub top language\"\u003e\n   \u003cimg src=\"https://img.shields.io/github/commit-activity/t/AstraBert/bloom-multilingual-chatbot\" alt=\"GitHub commit activity\"\u003e\n   \u003cimg src=\"https://img.shields.io/badge/chatbot-stable-green\" alt=\"Static Badge\"\u003e\n   \u003cimg src=\"https://img.shields.io/badge/Release-v0.0.0-purple\" alt=\"Static Badge\"\u003e\n   \u003cimg src=\"https://img.shields.io/badge/Docker_image_size-5.54GB-red\" alt=\"Static Badge\"\u003e\n   \u003cimg src=\"https://img.shields.io/badge/Supported_platforms-linux/amd64-brown\" alt=\"Static Badge\"\u003e\n   \u003cdiv\u003e\n        \u003ca href=\"https://astrabert.github.io/bloom-multilingual-chat\"\u003e\u003cimg src=\"./multilingualbloom.png\"\u003e\u003c/a\u003e\n        \u003cp\u003e\u003ci\u003eThis logo was generated with \u003ca href=\"https://www.coze.com/s/ZmFqxkofJ/\"\u003eCoderLogon\u003c/a\u003e, a Coze bot that generates logos for your GitHub repos, exploiting the \u003ca href=\"https://pollinations.ai/\"\u003ePollinations AI\u003c/a\u003e API\u003c/i\u003e\u003c/p\u003e\n   \u003c/div\u003e\n\u003c/div\u003e\n\n## Yes, ChatGPT is multilingual, but...\n...it does not yield the same high performance that you get by querying it in English.\n\nFor non-native speakers, this can be an initial barrier, for two reasons:🚧\n\n### 1. Engineering effective prompts\nWhen English is not your first language, formulating on-point questions that fully express what you mean can be hard, and it is not unusual for ChatGPT or other language models to get confused about what you are asking for, at least in their first answers.🤔\n\n### 2. 
Unreliable results in your mother tongue\nOn the other hand, when trying to speak with the LLM in your native language (especially if it is not well represented in the cultural products of the World Wide Web), you can bump into awkward phrasing, errors, or difficulties in interpreting idioms and other everyday expressions.🤨\n\n## What can we do?\nIt would be great if we could train a multilingual LLM from scratch, and [BigScience](https://bigscience.huggingface.co/), for instance, is doing a lot in this direction with Bloom🌸.\n\nNevertheless, we can also build upon already-existing English-based models, without fine-tuning or retraining them, thanks to a clever workaround: a wrapper function that translates the user's native-language query into English, feeds it to the LLM, retrieves the response, and finally translates it back from English into the original language.㊗️\n\nCurious to try it? Let's use some Python to build it!🐍\n\n### 1. Import all the necessary dependencies\nTo build a multilingual chatbot, you'll need several dependencies, which you can install via `pip`:\n\n```bash\npython3 -m pip install transformers==4.39.3 \\\nlangdetect==1.0.9 \\\ndeep-translator==1.11.4 \\\ntorch==2.1.2 \\\ngradio==4.28.3\n```\nLet's see what these packages do:\n\n- **transformers** is a package by Hugging Face that helps you interact with models on the HF Hub ([GitHub](https://github.com/huggingface/transformers))\n- **langdetect** is a package for automated language detection ([GitHub](https://github.com/Mimino666/langdetect))\n- **deep-translator** is a package to translate sentences, based on several translation services ([GitHub](https://github.com/nidhaloff/deep-translator))\n- **torch** is a package to manage tensors and dynamic neural networks in Python ([GitHub](https://github.com/pytorch/pytorch))\n- **gradio** is a package developed to ease the development of app interfaces in Python and other languages 
([GitHub](https://github.com/gradio-app/gradio))\n\n### 2. Build the back-end architecture\nWe need to build a back-end architecture that looks like this (diagram created with [Drawio](https://app.diagrams.net/)):\n\n![Multilingual chatbot flowchart](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/njyj6q8h96z8yhrb0dv5.png)\n\nLet's define a `Translation` class that detects the original language and translates the text:\n\n```python\nfrom langdetect import detect\nfrom deep_translator import GoogleTranslator\n\nclass Translation:\n    def __init__(self, text, destination):\n        self.text = text\n        self.destination = destination\n        try:\n            self.original = detect(self.text) # detect the original language\n        except Exception:\n            self.original = \"auto\" # if detection fails, default to \"auto\"\n    def translatef(self):\n        translator = GoogleTranslator(source=self.original, target=self.destination) # use Google Translate, one of the fastest translators available\n        translation = translator.translate(self.text)\n        return translation\n```\n\nAs you can see, the class takes, as arguments, the text we want to translate (`text`) and the language we want to translate it into (`destination`). 
\n\nLet's now load the LLM that we want to use: we'll start with BigScience's Bloom-1.7B, a medium-sized LLM and a good match for hardware with 16GB of RAM and a 2-core CPU.\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, pipeline\n\nmodel = AutoModelForCausalLM.from_pretrained(\"bigscience/bloom-1b7\") # load the model\ntokenizer = AutoTokenizer.from_pretrained(\"bigscience/bloom-1b7\") # load the tokenizer\n\npipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer, max_new_tokens=2048, repetition_penalty=1.2, do_sample=True, temperature=0.4) # prepare the inference pipeline; sampling must be enabled for the temperature to take effect\n```\nWe define a maximum number of generated tokens (2048), set the repetition penalty to 1.2 (fairly high) to keep the model from repeating the same thing over and over again, and keep the temperature (the \"creativity\" in generating the response) quite low.\n\nNow, let's create a function that takes a message from the chat, translates it to English (unless it is already in English), feeds it as a prompt to Bloom, retrieves the English response, and back-translates it into the original language:\n\n```python\ndef reply(message, history):\n    txt = Translation(message, \"en\")\n    if txt.original == \"en\":\n        response = pipe(message)\n        return response[0][\"generated_text\"]\n    else:\n        translation = txt.translatef()\n        response = pipe(translation)\n        t = Translation(response[0][\"generated_text\"], txt.original)\n        res = t.translatef()\n        return res\n```\nWe have all we need for our back-end architecture; it is time to build the front-end interface!\n\n### 3. 
Build the front-end user interface\nWith Gradio, building the user interface is as simple as one line of code:\n\n```python\nimport gradio as gr\n\ndemo = gr.ChatInterface(fn=reply, title=\"Multilingual-Bloom Bot\")\n```\n\nNow we can launch the application with:\n\n```python\ndemo.launch()\n```\n\nAssuming we saved the whole script in a file named `chat.py`, we can run the chatbot from the terminal:\n\n```bash\npython3 chat.py\n```\n\nThen we patiently wait and head over to the local server link that Gradio prints once everything is loaded and ready to work!\n\nIf you want to find the source code, go to the [scripts](./scripts/) folder.\n\n## Demo\nDo you want to try what we just created? Make sure to visit this Hugging Face Space I built: [as-cle-bert/bloom-multilingual-chat](https://huggingface.co/spaces/as-cle-bert/bloom-multilingual-chat)💻.\n\n## Run `bloom-multilingual-chatbot` on your machine\n\n**bloom-multilingual-chatbot** is also available as a Docker image:\n\n```bash\ndocker pull ghcr.io/astrabert/bloom-multilingual-chatbot:latest\n```\nYou can then run it with the following command:\n\n```bash\ndocker run -p 7860:7860 ghcr.io/astrabert/bloom-multilingual-chatbot:latest\n```\n**IMPORTANT NOTE**: `docker run` does not log the port on which the app is running until you press `Ctrl+C`, but pressing it also interrupts the execution! The app binds to `0.0.0.0:7860` inside the container, so just open `localhost:7860` in your browser and refresh it after 1 to 5 minutes (depending on your computer and network), by which time the model and the tokenizer should be loaded and the app ready to work!\n\nAnother fundamental caveat is that we are dealing here with a relatively small language model (approx. 
3GB), so it is CPU-friendly (you can run it without a GPU): indeed, 8GB of RAM and a 12-core CPU can be enough to run the Docker container, but language generation will be really slow.\n\n**You will need at least 16 to 32 GB of RAM and/or a GPU to speed up the model.**\n\n## Support\nIf you like the idea, make sure to show your support by leaving a little ⭐ on GitHub!\n\nIf you wish, support my open-source work by [funding me on GitHub](https://github.com/sponsors/AstraBert): in this way, it will be possible for me to improve my multilingual chatbot's performance by hosting it on more powerful hardware on HF.\n","funding_links":["https://github.com/sponsors/AstraBert"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrabert%2Fbloom-multilingual-chatbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fastrabert%2Fbloom-multilingual-chatbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrabert%2Fbloom-multilingual-chatbot/lists"}