https://github.com/shamspias/gpt3-data-preprocessing
This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/shamspias/gpt3-data-preprocessing
- Owner: shamspias
- Created: 2023-01-20T20:05:07.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-29T10:46:34.000Z (over 2 years ago)
- Last Synced: 2023-03-05T11:07:48.877Z (over 2 years ago)
- Topics: artificial-intelligence, data-preprocessing, data-preprocessing-pipelines, data-science, gpt-3, machine-learning
- Language: Python
- Homepage:
- Size: 11.7 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
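
The description names three steps: tokenization, removal of stop words and punctuation, and formatting for GPT-3 input. The repository's own code is not shown on this page, so the following is only a minimal sketch of what such a pipeline might look like, not the author's implementation. The function names, the tiny stop-word list, and the choice of a JSONL record with `prompt` and `completion` keys are all assumptions made for illustration. Text extraction from PDF and DOCX is left out, so the input here is a plain string.

```python
import json
import re
import string

# Tiny illustrative stop-word list (an assumption; real pipelines
# usually rely on a fuller list from a library such as NLTK or spaCy).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for", "from", "with"}

# ASCII punctuation characters only.
PUNCTUATION = set(string.punctuation)


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def clean_tokens(tokens):
    """Drop stop words and single-character ASCII punctuation tokens."""
    return [t for t in tokens if t not in STOP_WORDS and t not in PUNCTUATION]


def to_jsonl_record(text, completion=""):
    """Build one JSON line with hypothetical 'prompt' and 'completion' keys."""
    cleaned = " ".join(clean_tokens(tokenize(text)))
    return json.dumps({"prompt": cleaned, "completion": completion})


if __name__ == "__main__":
    sample = "The model is trained on text from PDF and DOCX files."
    print(to_jsonl_record(sample))
```

Writing one such record per line produces a JSONL file. Whether stop-word removal actually helps depends on the downstream use, since it discards words the model would otherwise see.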