{"id":16271672,"url":"https://github.com/jianzhnie/LLMToolkit","last_synced_at":"2025-03-19T23:30:58.398Z","repository":{"id":103457907,"uuid":"466051066","full_name":"jianzhnie/LLMToolkit","owner":"jianzhnie","description":"LLMToolkit  is a toolkit for NLP(Natural Language Processing) and LLM(Large Language Models) using Pytorch.  ","archived":false,"fork":false,"pushed_at":"2024-11-25T03:18:27.000Z","size":1370,"stargazers_count":6,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-28T21:41:34.146Z","etag":null,"topics":["bert","elmo","gpt","nlp","pytorch","t5","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jianzhnie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-04T08:57:09.000Z","updated_at":"2025-01-31T09:20:21.000Z","dependencies_parsed_at":"2023-10-16T02:46:33.079Z","dependency_job_id":"45d90e95-6fcb-4da8-b21d-f330ba1ff057","html_url":"https://github.com/jianzhnie/LLMToolkit","commit_stats":null,"previous_names":["jianzhnie/llmtoolkit"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jianzhnie%2FLLMToolkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jianzhnie%2FLLMToolkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jianzhnie%2FLLMToolkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jianzh
nie%2FLLMToolkit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jianzhnie","download_url":"https://codeload.github.com/jianzhnie/LLMToolkit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244031033,"owners_count":20386534,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","elmo","gpt","nlp","pytorch","t5","transformer"],"created_at":"2024-10-10T18:14:22.429Z","updated_at":"2025-03-19T23:30:58.392Z","avatar_url":"https://github.com/jianzhnie.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLMToolkit\n\n\u003cimg src=\"docs/imgs/PLMfamily.jpg\" alt=\"PLMfamily\" style=\"zoom:200%;\" /\u003e\n\n## Introduction\n\n**`llmtoolkit`** is a toolkit for NLP (Natural Language Processing) and LLMs (Large Language Models) built on **PyTorch**. **`llmtoolkit`** implements many language models and data preprocessing methods. 
More importantly, it provides many examples that run end-to-end.\n\n## Tokenizer\n\n- [x] [BaseTokenizer](\u003c\u003e)\n- [x] [JiebaTokenizer](\u003c\u003e)\n- [x] [SentencePieceTokenizer](\u003c\u003e)\n- [x] [BytePairEncoding(BPE)Tokenizer](\u003c\u003e)\n- [x] [BertTokenizer](\u003c\u003e)\n\n## Supported Models\n\nSupported Language Models:\n\n- [x] [RNNLM](\u003c\u003e)\n- [x] [CNNLM](\u003c\u003e)\n- [x] [Ngram](\u003c\u003e)\n- [x] [SkipGram](\u003c\u003e)\n- [x] [CBOW](\u003c\u003e)\n- [x] [GloVe](\u003c\u003e)\n- [x] [CoVe](\u003c\u003e)\n- [x] [ELMo](\u003c\u003e)\n- [x] [ULMFiT](\u003c\u003e)\n- [x] [Seq2Seq | Attention Seq2Seq](\u003c\u003e)\n\nSupported Transformer Models:\n\n- [x] [Transformer](\u003c\u003e)\n- [x] [BERT](\u003c\u003e)\n- [x] [XLNet](\u003c\u003e)\n- [x] [GPT](\u003c\u003e)\n- [x] [GPT2](\u003c\u003e)\n- [x] [RoBERTa](\u003c\u003e)\n- [x] [T5](\u003c\u003e)\n\n## Dependencies\n\n- Python 3.7+\n- PyTorch 1.5.0+\n\n## References\n\n- https://zh.d2l.ai/\n  - Dive into Deep Learning, D2L.ai\n- https://github.com/dmlc/gluon-nlp/\n  - GluonNLP: NLP made easy\n- https://github.com/huggingface/tokenizers\n  - Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.\n- https://github.com/The-AI-Summer/self-attention-cv\n  - Self-attention building blocks for computer vision applications in PyTorch\n- [自然语言处理：基于预训练模型的方法 (Natural Language Processing: Methods Based on Pre-trained Models)](https://item.jd.com/13344628.html) (authors: Wanxiang Che, Jiang Guo, Yiming Cui)\n\n## License\n\n`llmtoolkit` is released under the Apache 2.0 license.\n\n## Citation\n\nPlease cite this repository if you use its data or code.\n\n```bibtex\n@misc{llmtoolkit,\n  author = {jianzhnie},\n  title = {llmtoolkit: llmtoolkit is a toolkit for NLP and LLMs using Pytorch},\n  year = {2023},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = 
{\\url{https://github.com/jianzhnie/LLMToolkit}},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjianzhnie%2FLLMToolkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjianzhnie%2FLLMToolkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjianzhnie%2FLLMToolkit/lists"}