https://github.com/taylor-eos/bert-classifier
Text classification experiment using language model
- Host: GitHub
- URL: https://github.com/taylor-eos/bert-classifier
- Owner: Taylor-eOS
- Created: 2024-09-22T18:25:01.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-29T09:10:08.000Z (over 1 year ago)
- Last Synced: 2025-01-14T18:41:52.468Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 26.4 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
This was a learning project that uses DistilBERT to train a model and predict the type of each block of text in a PDF (header, body, footer, quote). The script used to reach roughly 97% accuracy, but after I added neighboring blocks as context, the model no longer knows what to focus on, and predictions are very bad. I made a [manual tool](https://github.com/Taylor-eOS/manual-classifier) instead, which is more robust. Machine learning is not the right tool for a task where you have limited ground truth and need accurate results. Still, the project runs and might be an interesting application of the technology for learning.
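
To make the "neighboring blocks as context" idea concrete, here is a minimal sketch of one way to pair each block with its neighbors while keeping the target block separate from the context. It is not taken from this repository: the function name `build_pairs`, the `window` parameter, and the `" | "` separator are all hypothetical, and nothing here is claimed to fix the accuracy drop described above.

```python
# Hypothetical sketch, not the repository's code: pair each text block with
# its neighbors so the target stays in its own field, apart from the context.

LABELS = ["header", "body", "footer", "quote"]  # classes named in the README


def build_pairs(blocks, window=1, sep=" | "):
    """Return one (target, context) tuple per block.

    `window` is the number of neighbors taken from each side; `sep` joins
    the neighbor texts. Both are assumptions made for this illustration.
    """
    pairs = []
    for i, target in enumerate(blocks):
        before = blocks[max(0, i - window):i]      # up to `window` blocks before
        after = blocks[i + 1:i + 1 + window]       # up to `window` blocks after
        context = sep.join(before + after)
        pairs.append((target, context))
    return pairs


blocks = ["Chapter 1", "It was a dark night.", "Page 3"]
pairs = build_pairs(blocks)
for target, context in pairs:
    print(repr(target), "<-", repr(context))
```

With Hugging Face `transformers`, pairs like these could be passed to a tokenizer as its `text` and `text_pair` arguments, with `truncation="only_second"` so that only the context is trimmed when the input is too long. Whether that helps the model focus on the target block is an open question here, not a result.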