Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/TanGentleman/EduScribe-LLM-Backend
Pythonic dataset processing for fine-tuning LLMs. Used for a CalHacks 2023 award winning project.
Last synced: 3 days ago
- Host: GitHub
- URL: https://github.com/TanGentleman/EduScribe-LLM-Backend
- Owner: TanGentleman
- Created: 2023-10-31T23:36:25.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-01T00:13:17.000Z (about 1 year ago)
- Last Synced: 2023-11-01T01:23:34.770Z (about 1 year ago)
- Language: Python
- Size: 7.81 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome_ai_agents - Eduscribe-Llm-Backend - Pythonic dataset processing for fine-tuning LLMs. Used for a CalHacks 2023 award winning project. (Building / Datasets)
README
# EduScribe-LLM-Backend
Pythonic dataset processing for fine-tuning LLMs. Used for a CalHacks 2023 award winning project.

EduScribe Devpost: https://devpost.com/software/eduscribe
Repository: https://github.com/VinnyXP/EduScribe

## Procedure
1. Download a dataset from Hugging Face. For this project, I chose https://huggingface.co/datasets/vgoldberg/longform_article_summarization
2. Set filepaths and configuration constants in `config.py`
3. Run `python parse_parquet.py`

## Features
1. Functions that parse parquet files into a format suitable for fine-tuning LLMs through the Together.ai API.
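
The procedure above (read a parquet file, reshape each row, write it out for fine-tuning) could be sketched roughly as follows. This is a minimal illustration, not the code in `parse_parquet.py`: the constant names, the `text` and `summary` column names, the prompt template, and the choice of a JSONL file with one `text` field per line are all assumptions. Check the dataset card and the Together.ai documentation for the actual columns and the expected upload format.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator

# Hypothetical constants; the real names in config.py are not shown in the README.
PARQUET_PATH = Path("data/train.parquet")
OUTPUT_PATH = Path("data/train.jsonl")
ARTICLE_COLUMN = "text"      # assumed column name
SUMMARY_COLUMN = "summary"   # assumed column name


def load_records(parquet_path: Path) -> Iterator[dict]:
    """Yield one dict per row of a parquet file (needs pandas plus pyarrow)."""
    import pandas as pd  # imported lazily so the helpers below work without it

    frame = pd.read_parquet(parquet_path)
    yield from frame.to_dict(orient="records")


def to_training_example(record: dict) -> dict:
    """Merge an article and its summary into a single training string."""
    article = str(record[ARTICLE_COLUMN]).strip()
    summary = str(record[SUMMARY_COLUMN]).strip()
    # Assumed template; any consistent prompt layout would work here.
    return {"text": f"Article:\n{article}\n\nSummary:\n{summary}"}


def write_jsonl(records: Iterable[dict], output_path: Path) -> int:
    """Write one JSON object per line and return the number of lines written."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with output_path.open("w", encoding="utf-8") as handle:
        for record in records:
            example = to_training_example(record)
            handle.write(json.dumps(example, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    written = write_jsonl(load_records(PARQUET_PATH), OUTPUT_PATH)
    print(f"Wrote {written} examples to {OUTPUT_PATH}")
```

Keeping the row-to-example step in its own function makes it easy to swap in a different prompt template or column mapping without touching the file I/O.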