https://github.com/jhj0517/document_classification
finetune text classification model
ai deep-learning document-classification open-source text-classification
- Host: GitHub
- URL: https://github.com/jhj0517/document_classification
- Owner: jhj0517
- Created: 2023-12-07T12:59:18.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-12-10T10:04:34.000Z (over 2 years ago)
- Last Synced: 2025-02-12T10:18:22.749Z (about 1 year ago)
- Topics: ai, deep-learning, document-classification, open-source, text-classification
- Language: Jupyter Notebook
- Homepage:
- Size: 4.9 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Document classification
This repository is for fine-tuning text classification models.
It focuses on fine-tuning a pre-trained BERT model using the [ratsnlp](https://github.com/ratsgo/ratsnlp) package.
# Notebook
To try it in Google Colab, open the [notebook](https://colab.research.google.com/github/jhj0517/document_classification/blob/master/notebook/document_classification.ipynb).
# Dataset
You will need to prepare a dataset with two columns: `document` for the text and `label` for its class. An example of the dataset format is as follows:
| label | document |
|----------|----------------|
| sadness | I'm so sad |
| happiness | I'm happy!! |
For a more detailed understanding, please refer to the [example dataset](https://github.com/jhj0517/document_classification/tree/master/example_data).
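
As a rough illustration of the two-column format, the sketch below reads `label`/`document` rows and maps each label to an integer id, which is the form most classifiers expect. It is not taken from this repository: the CSV layout, the comma delimiter, and the helper names `load_examples` and `build_label_map` are assumptions made for the example, so check the example dataset for the actual file format.

```python
import csv
import io

# Hypothetical in-memory CSV mirroring the table above.
# The real file name, delimiter, and column order are assumptions.
SAMPLE_CSV = """label,document
sadness,I'm so sad
happiness,I'm happy!!
"""


def load_examples(handle):
    """Read (document, label) pairs and fail fast if a column is missing."""
    reader = csv.DictReader(handle)
    missing = {"label", "document"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    examples = []
    for row in reader:
        document = (row["document"] or "").strip()
        label = (row["label"] or "").strip()
        if document and label:  # skip rows with an empty field
            examples.append((document, label))
    return examples


def build_label_map(examples):
    """Assign each distinct label an integer id, in sorted label order."""
    labels = sorted({label for _, label in examples})
    return {label: index for index, label in enumerate(labels)}


examples = load_examples(io.StringIO(SAMPLE_CSV))
label_map = build_label_map(examples)
```

For a file on disk, passing `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` wrapper should work the same way, assuming the file really is comma-separated with a header row.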
This repository includes a small sample dataset sourced from Kaggle: [Kaggle Dataset](https://www.kaggle.com/datasets/pashupatigupta/emotion-detection-from-text)
Note: The sample in the repo is very small, so it is recommended to prepare a much larger dataset for actual fine-tuning.