https://github.com/sidmohan0/gpt_annotator
- Host: GitHub
- URL: https://github.com/sidmohan0/gpt_annotator
- Owner: sidmohan0
- License: mit
- Created: 2023-10-22T02:15:17.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-26T17:38:34.000Z (over 2 years ago)
- Last Synced: 2025-01-21T10:09:53.545Z (over 1 year ago)
- Language: Python
- Size: 74.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.MD
README
# GPT Annotator
## Overview
GPT Annotator uses GPT models to annotate text for Named Entity Recognition (NER). It accepts `.md`, `.txt`, `.pdf`, `.docx`, and `.html` files as input.
## Features
- Extract text from multiple file formats.
- Tokenize and process sentences using NLTK.
- Generate annotated data suitable for NER tasks. Output is currently written as JSONL.
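The features above can be read as a small pipeline: choose an extractor by file extension, split the text into sentences, and write one JSON object per line. The following is a minimal, standard-library-only sketch of that shape, not the repository's actual code. The function names (`extract_text`, `split_sentences`, `write_jsonl`) and the record fields (`text`, `entities`) are hypothetical, and the regex splitter is a simple stand-in for NLTK's sentence tokenizer. Only `.md`, `.txt`, and `.html` are handled here; `.pdf` and `.docx` would presumably go through PyPDF2 and python-docx.

```python
import json
import re
from html.parser import HTMLParser
from pathlib import Path


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(path):
    """Return the text of a .md, .txt, or .html file (other formats omitted)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in {".md", ".txt"}:
        return path.read_text(encoding="utf-8")
    if suffix == ".html":
        parser = _TextOnly()
        parser.feed(path.read_text(encoding="utf-8"))
        parser.close()
        # Collapse runs of whitespace left behind by the markup.
        return " ".join(" ".join(parser.parts).split())
    raise ValueError(f"unsupported file type in this sketch: {suffix}")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def write_jsonl(records, out_path):
    """Write one JSON object per line (the JSONL convention)."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

In this sketch, each sentence would become a record such as `{"text": sentence, "entities": []}`, with the `entities` list filled in by the model's annotations. The actual schema used by the project may differ.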
## Requirements
- Python 3.x
- NLTK
- OpenAI API key
- PyPDF2
- python-docx
## Installation
```bash
# Clone the repository
git clone https://github.com/sidmohan0/gpt_annotator.git
# Move into the project directory, where requirements.txt lives
cd gpt_annotator
# Install dependencies
pip install -r requirements.txt
```
## Usage
Set up your `.env` file with the following variables:
```
SAMPLES=
MODEL=
OPENAI_API_KEY=
PATH=
```
Run the main script:
```bash
python main.py
```
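The `.env` file above is a list of `KEY=VALUE` lines. How `main.py` loads it is not shown in this README, so the following is only a sketch of one way to read and check such a file with the standard library. `read_env_file` and `missing_keys` are hypothetical names. Note that `PATH` is also the name of the operating system's executable search path, so this sketch keeps the values in a dictionary rather than exporting them into the process environment.

```python
from pathlib import Path

# Keys listed in the Usage section of this README.
REQUIRED = ("SAMPLES", "MODEL", "OPENAI_API_KEY", "PATH")


def read_env_file(path=".env"):
    """Parse KEY=VALUE lines into a dict without touching os.environ."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

Calling `missing_keys(read_env_file())` before running the script gives a quick check that none of the four variables was left blank.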
## Contributing
See [`CONTRIBUTING.md`](CONTRIBUTING.md) for guidelines on how to contribute to this project.
## License
This project is licensed under the MIT License. See [`LICENSE.MD`](LICENSE.MD) for more details.
---