Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/autodistill/autodistill-gpt-text
Use GPT and LLaMAfile to classify text for use in training smaller, fine-tuned text classification models.
https://github.com/autodistill/autodistill-gpt-text
Last synced: 5 days ago
JSON representation
Use GPT and LLaMAfile to classify text for use in training smaller, fine-tuned text classification models.
- Host: GitHub
- URL: https://github.com/autodistill/autodistill-gpt-text
- Owner: autodistill
- Created: 2024-05-21T10:15:41.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-06-11T15:28:28.000Z (5 months ago)
- Last Synced: 2024-06-12T00:48:31.187Z (5 months ago)
- Language: Python
- Homepage:
- Size: 12.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Autodistill GPT Module
This repository contains the code supporting the GPT (text) base model for use with [Autodistill](https://github.com/autodistill/autodistill).
You can use Autodistill GPT to classify text using OpenAI's GPT models for use in training smaller, fine-tuned text classification models. You can also use Autodistill GPT to use LLaMAfile text generation models.
Read the full [Autodistill documentation](https://autodistill.github.io/autodistill/).
## Installation
To use GPT or LLaMAfile models with Autodistill, you need to install the following dependency:
```bash
pip3 install autodistill-gpt-text
```## Quickstart (LLaMAfile)
```python
from autodistill_gpt_text import GPTClassifier# define an ontology to map class names to our GPT prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated annotations
# then, load the model
base_model = GPTClassifier(
ontology=CaptionOntology(
{
"computer vision": "computer vision",
"natural language processing": "nlp"
}
),
base_url = "http://localhost:8080/v1", # your llamafile server
model_id="LLaMA_CPP"
)# label a single text
result = GPTClassifier.predict("This is a blog post about computer vision.")# label a JSONl file of texts
base_model.label("data.jsonl", output="output.jsonl")
```## Quickstart (GPT)
```python
from autodistill_gpt_text import GPTClassifier# define an ontology to map class names to our GPT prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated annotations
# then, load the model
base_model = GPTClassifier(
ontology=CaptionOntology(
{
"computer vision": "computer vision",
"natural language processing": "nlp"
}
)
)# label a single text
result = GPTClassifier.predict("This is a blog post about computer vision.")# label a JSONl file of texts
base_model.label("data.jsonl", output="output.jsonl")
```The output JSONl file will contain all the data in your original file, with a new `classification` key in each entry that contains the predicted text label associated with that entry.
## License
This project is licensed under an [MIT license](LICENSE).
## 🏆 Contributing
We love your input! Please see the core Autodistill [contributing guide](https://github.com/autodistill/autodistill/blob/main/CONTRIBUTING.md) to get started. Thank you 🙏 to all our contributors!