Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Tele-AI/Telechat
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/Tele-AI/Telechat
- Owner: Tele-AI
- Created: 2024-01-07T06:06:18.000Z (12 months ago)
- Default Branch: master
- Last Pushed: 2024-08-27T03:07:19.000Z (4 months ago)
- Last Synced: 2024-08-28T03:46:27.620Z (4 months ago)
- Language: Python
- Size: 6.64 MB
- Stars: 1,729
- Watchers: 24
- Forks: 94
- Open Issues: 44
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
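- StarryDivineSky - Tele-AI/Telechat - 7B and TeleChat-12B. TeleChat-PTD is a comprehensive, large-scale Chinese dataset extracted from the pretraining corpus of TeleChat, China Telecom's Xingchen large language model. The data comes mainly from web pages, books, official media, and similar sources. We filtered it with a combination of rules and models and applied similarity-based deduplication to extract data of the highest possible quality. The TeleChat-PTD dataset publicly releases about 270 million records of pure Chinese text; the raw size is about 1 TB, or 480 GB compressed, across 189 files. Other redundant information has already been removed from the dataset. [Data download](https://huggingface.co/datasets/Tele-AI/TeleChat-PTD) (A01_Text Generation_Text Dialogue / Large Language Dialogue Models and Data)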