https://github.com/autodistill/autodistill-gpt-text

Use GPT and LLaMAfile to classify text for use in training smaller, fine-tuned text classification models.

Host: GitHub
URL: https://github.com/autodistill/autodistill-gpt-text
Owner: autodistill
Created: 2024-05-21T10:15:41.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-06-11T15:28:28.000Z (about 1 year ago)
Last Synced: 2025-02-14T03:18:33.850Z (5 months ago)
Language: Python
Homepage:
Size: 12.7 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Autodistill GPT Module

This repository contains the code supporting the GPT (text) base model for use with [Autodistill](https://github.com/autodistill/autodistill).

You can use Autodistill GPT to classify text with OpenAI's GPT models, then use the labeled text to train smaller, fine-tuned text classification models. Autodistill GPT also works with LLaMAfile text generation models.

Read the full [Autodistill documentation](https://autodistill.github.io/autodistill/).

## Installation

To use GPT or LLaMAfile models with Autodistill, you need to install the following dependency:

```bash
pip3 install autodistill-gpt-text
```

## Quickstart (LLaMAfile)

```python
from autodistill.detection import CaptionOntology
from autodistill_gpt_text import GPTClassifier

# define an ontology to map class names to our GPT prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated annotations
# then, load the model
base_model = GPTClassifier(
    ontology=CaptionOntology(
        {
            "computer vision": "computer vision",
            "natural language processing": "nlp"
        }
    ),
    base_url="http://localhost:8080/v1",  # your llamafile server
    model_id="LLaMA_CPP"
)

# label a single text (call predict on the model instance, not the class)
result = base_model.predict("This is a blog post about computer vision.")

# label a JSONL file of texts
base_model.label("data.jsonl", output="output.jsonl")
```

## Quickstart (GPT)

```python
from autodistill.detection import CaptionOntology
from autodistill_gpt_text import GPTClassifier

# define an ontology to map class names to our GPT prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated annotations
# then, load the model
base_model = GPTClassifier(
    ontology=CaptionOntology(
        {
            "computer vision": "computer vision",
            "natural language processing": "nlp"
        }
    )
)

# label a single text (call predict on the model instance, not the class)
result = base_model.predict("This is a blog post about computer vision.")

# label a JSONL file of texts
base_model.label("data.jsonl", output="output.jsonl")
```

The output JSONL file contains all the data from your original file, plus a new `classification` key in each entry that holds the predicted text label for that entry.
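As a quick sanity check on the labeled output, you may want to tally how many entries received each label. The sketch below uses only the Python standard library and relies on the `classification` key described above. The `count_labels` helper and the `text` field in the sample records are hypothetical and used for illustration only; your entries will carry whatever fields your input file had.

```python
import io
import json
from collections import Counter


def count_labels(lines):
    """Tally the `classification` value of each JSONL record (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Records without the key are counted under None rather than raising.
        counts[record.get("classification")] += 1
    return counts


# Stand-in for the contents of output.jsonl; the "text" field name is an assumption.
sample = io.StringIO(
    '{"text": "A post about object detection.", "classification": "computer vision"}\n'
    '{"text": "A post about tokenizers.", "classification": "nlp"}\n'
    '{"text": "A post about image segmentation.", "classification": "computer vision"}\n'
)
label_counts = count_labels(sample)
print(label_counts.most_common())
```

To run this against a real file, pass an open file handle instead of the sample, for example `with open("output.jsonl") as f: count_labels(f)`. A heavily skewed tally can be a hint that the ontology captions need rewording before you train a smaller model on the labels.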

## License

This project is licensed under an [MIT license](LICENSE).

## 🏆 Contributing

We love your input! Please see the core Autodistill [contributing guide](https://github.com/autodistill/autodistill/blob/main/CONTRIBUTING.md) to get started. Thank you 🙏 to all our contributors!
