https://github.com/jianzhnie/LLMToolkit
LLMToolkit is a toolkit for NLP (Natural Language Processing) and LLMs (Large Language Models) using PyTorch.
- Host: GitHub
- URL: https://github.com/jianzhnie/LLMToolkit
- Owner: jianzhnie
- License: apache-2.0
- Created: 2022-03-04T08:57:09.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-11-25T03:18:27.000Z (6 months ago)
- Last Synced: 2025-02-28T21:41:34.146Z (3 months ago)
- Topics: bert, elmo, gpt, nlp, pytorch, t5, transformer
- Language: Python
- Homepage:
- Size: 1.31 MB
- Stars: 6
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# LLMToolkit
## Introduction
**`llmtoolkit`** is a toolkit for NLP (Natural Language Processing) and LLMs (Large Language Models) built on **PyTorch**. **`llmtoolkit`** implements many language models and data preprocessing methods and, more importantly, provides many examples that run end-to-end.
## Tokenizer
- [x] [BaseTokenizer](<>)
- [x] [JiebaTokenizer](<>)
- [x] [SentencePieceTokenizer](<>)
- [x] [BytePairEncoding (BPE) Tokenizer](<>) (see the BPE sketch after this list)
- [x] [BertTokenizer](<>)
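
As a rough illustration of what the BPE tokenizer above does under the hood, here is a minimal byte-pair-encoding training sketch in plain Python. This is not `llmtoolkit`'s actual API; the helper names and toy corpus are invented for this example.

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count frequencies of adjacent symbol pairs across the vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of `pair` into a single new symbol."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy corpus: words pre-split into characters plus an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}

merges = []
for _ in range(10):  # learn 10 merge operations
    stats = get_pair_stats(vocab)
    if not stats:
        break
    best = max(stats, key=stats.get)
    vocab = merge_pair(best, vocab)
    merges.append(best)

print(merges)  # e.g. [('e', 's'), ('es', 't'), ('est', '</w>'), ...]
```
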
## Supported Models

Supported Language Models:
- [x] [RNNLM](<>)
- [x] [CNNLM](<>)
- [x] [Ngram](<>)
- [x] [SkipGram](<>) (see the skip-gram sketch after this list)
- [x] [CBOW](<>)
- [x] [GloVe](<>)
- [x] [CoVe](<>)
- [x] [ELMo](<>)
- [x] [ULMFiT](<>)
- [x] [Seq2Seq | Attention Seq2Seq](<>)
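
To give a flavor of the classic word-embedding models in this list (SkipGram/CBOW), below is a minimal skip-gram model in plain PyTorch, trained with a full-vocabulary softmax rather than negative sampling. It is an illustrative sketch only, not `llmtoolkit`'s implementation; the class and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    """Minimal skip-gram: predict a context word from a center word."""
    def __init__(self, vocab_size, embed_dim=100):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, embed_dim)  # center-word vectors
        self.out_proj = nn.Linear(embed_dim, vocab_size)     # scores over the vocabulary

    def forward(self, center_ids):
        return self.out_proj(self.in_embed(center_ids))      # (batch, vocab_size) logits

# Toy usage: (center, context) index pairs from a pre-built vocabulary.
vocab_size = 1000
model = SkipGram(vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

centers = torch.randint(0, vocab_size, (32,))   # stand-in batch of center-word ids
contexts = torch.randint(0, vocab_size, (32,))  # stand-in batch of context-word ids

optimizer.zero_grad()
loss = loss_fn(model(centers), contexts)
loss.backward()
optimizer.step()
print(loss.item())
```
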
Supported Transformer Models:
- [x] [Transformer](<>) (see the attention sketch after this list)
- [x] [Bert](<>)
- [x] [XLNet](<>)
- [x] [GPT](<>)
- [x] [GPT2](<>)
- [x] [RoBERTa](<>)
- [x] [T5](<>)
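
The Transformer-family models listed above all build on scaled dot-product attention. The following minimal PyTorch sketch shows that core operation; it is illustrative only and not taken from `llmtoolkit`, and the function name and tensor shapes are assumptions for the example.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, heads, q_len, k_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

# Toy usage: batch of 2, 4 heads, sequence length 8, head dimension 16.
q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 4, 8, 16]) torch.Size([2, 4, 8, 8])
```
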
## Dependencies

- Python 3.7+
- PyTorch 1.5.0+

## References
- https://zh.d2l.ai/
  - Dive into Deep Learning, D2L.ai
- https://github.com/dmlc/gluon-nlp/
  - GluonNLP: NLP made easy
- https://github.com/huggingface/tokenizers
  - Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
- https://github.com/The-AI-Summer/self-attention-cv
  - Self-attention building blocks for computer vision applications in PyTorch
- [自然语言处理：基于预训练模型的方法](https://item.jd.com/13344628.html) (Natural Language Processing: A Method Based on Pre-trained Models; by Che Wanxiang, Guo Jiang, and Cui Yiming)

## License
`llmtoolkit` is released under the Apache 2.0 license.
## Citation
Please cite this repository if you use its data or code.
```bibtex
@misc{llmtoolkit,
  author       = {jianzhnie},
  title        = {llmtoolkit: A toolkit for NLP and LLMs using PyTorch},
  year         = {2023},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/jianzhnie/LLMToolkit}},
}
```