# LLM Data

This repository focuses on cleaning, converting, and checking datasets for LLM training.

## Datasets

New datasets cleaned and created by this project:

- [UltraChat-200k-ShareGPT-clean](https://huggingface.co/datasets/PhilipMay/UltraChat-200k-ShareGPT-clean)
- [Nectar-ShareGPT-clean](https://huggingface.co/datasets/PhilipMay/Nectar-ShareGPT-clean)
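
As an illustration of the kind of check such a dataset might go through, here is a minimal sketch that validates a single record in the commonly used ShareGPT layout (a `conversations` list of turns with `from` and `value` keys). This is not the code used by this project: the function name `check_sharegpt_record`, the set of allowed roles, and the specific rules are assumptions made for the example, and the actual schema of the datasets above should be checked on their dataset cards.

```python
from __future__ import annotations

from typing import Any

# Assumed ShareGPT-style roles; the real datasets may use a different set.
ALLOWED_ROLES = {"system", "human", "gpt"}


def check_sharegpt_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one ShareGPT-style record.

    Hypothetical helper for illustration only. An empty list means the
    record passed all of the (assumed) checks below.
    """
    turns = record.get("conversations")
    if not isinstance(turns, list) or not turns:
        return ["missing or empty 'conversations' list"]

    problems: list[str] = []
    previous_role = None
    for index, turn in enumerate(turns):
        role = turn.get("from") if isinstance(turn, dict) else None
        value = turn.get("value") if isinstance(turn, dict) else None

        # Each turn needs a known speaker role.
        if role not in ALLOWED_ROLES:
            problems.append(f"turn {index}: unexpected role {role!r}")
        # Each turn needs non-blank text.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"turn {index}: empty or non-string value")
        # The same speaker should not appear twice in a row.
        if role is not None and role == previous_role:
            problems.append(f"turn {index}: role {role!r} repeats")
        previous_role = role
    return problems
```

Calling `check_sharegpt_record(record)` on each row of a dataset and keeping only the rows that return an empty list would be one simple way to filter out malformed conversations.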

## Licensing

Copyright (c) 2024 [Philip May](https://philipmay.org/)

Licensed under the **MIT License** (the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License by reviewing the file
[LICENSE](https://github.com/PhilipMay/llm-data/blob/main/LICENSE) in the repository.