{"id":30692920,"url":"https://github.com/pytlicek/llm-train-chat-example","last_synced_at":"2025-09-02T05:07:04.303Z","repository":{"id":227547583,"uuid":"771746805","full_name":"Pytlicek/LLM-Train-Chat-Example","owner":"Pytlicek","description":"LLM Training experiment","archived":false,"fork":false,"pushed_at":"2024-03-13T22:01:22.000Z","size":61,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-09-01T07:31:49.522Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Pytlicek.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-03-13T21:52:30.000Z","updated_at":"2024-03-13T22:01:26.000Z","dependencies_parsed_at":"2024-03-13T23:23:48.780Z","dependency_job_id":"186a5a99-4777-4fe7-9661-58e89c9f6ca3","html_url":"https://github.com/Pytlicek/LLM-Train-Chat-Example","commit_stats":null,"previous_names":["pytlicek/llm-train-chat-example"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Pytlicek/LLM-Train-Chat-Example","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pytlicek%2FLLM-Train-Chat-Example","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pytlicek%2FLLM-Train-Chat-Example/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pytlicek%2FLLM-Train-Chat-Example/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pytlicek%2FLLM-Train-Chat-Example/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Pytlicek","download_url":"https://codeload.github.com/Pytlicek/LLM-Train-Chat-Example/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pytlicek%2FLLM-Train-Chat-Example/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273233235,"owners_count":25068731,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-02T02:00:09.530Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-02T05:06:55.794Z","updated_at":"2025-09-02T05:07:04.293Z","avatar_url":"https://github.com/Pytlicek.png","language":"Python","readme":"# BERT-Based Text Generation\n\nThis repository contains code for training a BERT model on masked language modeling and for generating text from prompts with the trained model.\n\n## Installation\n\nEnsure you have Python installed, then install PyTorch and the Hugging Face `transformers` and `tokenizers` libraries:\n\n```\npip install transformers tokenizers torch\n```\n\n## Files\n\n- `train.py`: Trains a BERT model on a text dataset for masked language modeling, using the `transformers` library and a custom dataset class.\n- `chat.py`: Demonstrates how to generate text from prompts using the trained BERT model. 
Note that BERT is not designed for autoregressive text generation, so the results may not always be coherent.\n\n## Usage\n\n### Training the Model\n\nRun the `train.py` script to train the model. Ensure a dataset named `dataset.txt` is in the same directory:\n\n```\npython train.py\n```\n\nThe trained model and tokenizer are saved in the `./results` directory.\n\n### Generating Text\n\nUse the `chat.py` script to generate text from prompts with the trained model:\n\n```\npython chat.py\n```\n\n## Note\n\nGenerated text quality may vary, since BERT is designed for understanding tasks such as masked language modeling rather than for generation. Still, this project demonstrates custom training and text generation with a BERT model.\n\nEnjoy exploring BERT-based text generation!\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytlicek%2Fllm-train-chat-example","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpytlicek%2Fllm-train-chat-example","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytlicek%2Fllm-train-chat-example/lists"}