https://github.com/philipmay/llm-data
LLM Training Data
https://github.com/philipmay/llm-data
llm sft
Last synced: 11 months ago
JSON representation
LLM Training Data
- Host: GitHub
- URL: https://github.com/philipmay/llm-data
- Owner: PhilipMay
- License: mit
- Created: 2024-04-06T06:24:26.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-07T06:53:40.000Z (about 2 years ago)
- Last Synced: 2025-06-30T17:49:56.271Z (12 months ago)
- Topics: llm, sft
- Language: Jupyter Notebook
- Homepage:
- Size: 237 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# LLM Data
This repository is mainly about cleaning, converting and checking LLM training datasets.
## Datasets
New datasets cleaned and created by this project:
- [UltraChat-200k-ShareGPT-clean](https://huggingface.co/datasets/PhilipMay/UltraChat-200k-ShareGPT-clean)
- [Nectar-ShareGPT-clean](https://huggingface.co/datasets/PhilipMay/Nectar-ShareGPT-clean)
## Licensing
Copyright (c) 2024 [Philip May](https://philipmay.org/)
Licensed under the **MIT License** (the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License by reviewing the file
[LICENSE](https://github.com/PhilipMay/llm-data/blob/main/LICENSE) in the repository.