https://github.com/taylor-eos/bert-classifier

Text classification experiment using language model

README

This was a learning project that uses DistilBERT to classify the blocks of text in a PDF by type (header, body, footer, quote). The script used to predict with accuracy in the vicinity of 97%, but then I added neighboring blocks as context. Now the model doesn't know what to focus on, and predictions are very bad. I made a [manual tool](https://github.com/Taylor-eOS/manual-classifier) instead, which is more robust. Machine learning is not the right tool for a task where you have limited ground truth and need accurate results. But the project runs, and it might be an interesting application of the technology to learn from.
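
To illustrate what "adding neighboring blocks as context" can look like, here is a minimal sketch in plain Python. It is not the code from this repository: the function name `block_with_context`, the `window` parameter, and the choice to return the target block and its neighbors as two separate strings are all assumptions made for the example.

```python
from typing import List, Tuple

# The four block types named in this README.
LABELS = ["header", "body", "footer", "quote"]


def block_with_context(blocks: List[str], index: int, window: int = 1) -> Tuple[str, str]:
    """Return (target, context) for the block at `index`.

    Hypothetical helper: `context` is the text of up to `window` blocks
    before and after the target, joined with spaces. The target itself
    is kept separate so it stays distinguishable from its neighbors.
    """
    target = blocks[index]
    before = blocks[max(0, index - window):index]
    after = blocks[index + 1:index + 1 + window]
    context = " ".join(before + after)
    return target, context


if __name__ == "__main__":
    page = ["Chapter 1", "It was a cold morning.", "Page 3"]
    for i in range(len(page)):
        target, context = block_with_context(page, i)
        print(f"target={target!r} context={context!r}")
```

Keeping the target and its context as two strings means they could be passed to a tokenizer as a text pair rather than as one concatenated string. Whether that would help with the focus problem described above is untested here.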