https://github.com/taylorwood/adrdemo
Simple F# demonstration of text classification
f-sharp text-classification tf-idf
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/taylorwood/adrdemo
- Owner: taylorwood
- License: mit
- Created: 2014-03-05T02:01:44.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2015-06-27T16:26:08.000Z (over 10 years ago)
- Last Synced: 2025-05-14T04:18:32.390Z (5 months ago)
- Topics: f-sharp, text-classification, tf-idf
- Language: F#
- Homepage: http://taylorwood.github.io/2015/06/15/text-classification.html
- Size: 160 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
ADRDemo
=======

A simple F# demonstration of Automated Document Recognition using techniques like text tokenization, n-grams, TF-IDF weighting, CSV parsing, and text classification.
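
As a rough illustration of the techniques named above, the following is a minimal sketch of tokenization, word n-grams, and TF-IDF weighting. It is not the repository's code: the function names (`tokenize`, `ngrams`, `termFrequencies`, `inverseDocFrequencies`, `tfidf`) are hypothetical, and it assumes one common TF-IDF variant (relative term frequency multiplied by the natural log of the inverse document frequency).

```fsharp
open System
open System.Text.RegularExpressions

// Hypothetical helper: split text into lowercase word tokens.
let tokenize (text: string) =
    Regex.Matches(text.ToLowerInvariant(), @"[a-z0-9']+")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> List.ofSeq

// Build word n-grams by sliding a window of size n over the tokens.
// Inputs shorter than n produce an empty list.
let ngrams n (tokens: string list) =
    tokens
    |> Seq.windowed n
    |> Seq.map (String.concat " ")
    |> List.ofSeq

// Term frequency: occurrences of each term divided by the document's term count.
let termFrequencies (terms: string list) =
    let total = float (List.length terms)
    terms
    |> List.countBy id
    |> List.map (fun (term, count) -> term, float count / total)
    |> Map.ofList

// Inverse document frequency: log (N / number of documents containing the term).
let inverseDocFrequencies (docs: string list list) =
    let n = float (List.length docs)
    docs
    |> List.collect List.distinct
    |> List.countBy id
    |> List.map (fun (term, docCount) -> term, log (n / float docCount))
    |> Map.ofList

// TF-IDF weight per term; terms missing from the IDF table get weight 0.
let tfidf (idf: Map<string, float>) (terms: string list) =
    termFrequencies terms
    |> Map.map (fun term tf ->
        tf * (idf |> Map.tryFind term |> Option.defaultValue 0.0))

// Usage: weight the terms of the first of two tiny example documents.
let docs =
    [ "the cat sat on the mat"; "the dog chased the cat" ]
    |> List.map tokenize
let idf = inverseDocFrequencies docs
let weights = tfidf idf docs.[0]
```

In this variant, a term that occurs in every document (such as "the" in the example) gets a weight of zero, while a term unique to one document (such as "mat") gets a positive weight.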
The code expects training data in the form of plain text files organized into folders by category:
- \TrainingData
  - \CategoryA
    - \Sample1.txt
    - \Sample2.txt
  - \CategoryB
    - \SampleA.txt
...and a plain text file to be classified: "unknown.txt".

It also assumes the existence of a word whitelist CSV file, but this can be easily changed to a blacklist ("stopwords") or removed altogether.
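
The folder layout and word list described above could be read along these lines. This is a sketch under stated assumptions, not the repository's implementation: the function names are hypothetical, each category folder's name is taken as the label, and the CSV file is assumed to contain comma-separated words with no quoting.

```fsharp
open System
open System.IO

// Hypothetical helper: read every .txt file in each category folder
// under `root`, returning (category, text) pairs. The folder name is the label.
let loadTrainingData (root: string) =
    Directory.GetDirectories root
    |> Seq.collect (fun dir ->
        let category = Path.GetFileName dir
        Directory.GetFiles(dir, "*.txt")
        |> Seq.map (fun file -> category, File.ReadAllText file))
    |> List.ofSeq

// Hypothetical helper: load a set of lowercase words from a CSV file.
// Assumes plain comma-separated words, one or more per line, without quoting.
let loadWordList (path: string) =
    File.ReadAllLines path
    |> Seq.collect (fun line -> line.Split([| ',' |]))
    |> Seq.map (fun w -> w.Trim().ToLowerInvariant())
    |> Seq.filter (fun w -> w <> "")
    |> Set.ofSeq

// Whitelist: keep only tokens that appear in the word list.
let keepWhitelisted (whitelist: Set<string>) (tokens: string list) =
    tokens |> List.filter (fun t -> whitelist.Contains t)

// Blacklist ("stopwords"): drop tokens that appear in the word list.
let dropStopwords (stopwords: Set<string>) (tokens: string list) =
    tokens |> List.filter (fun t -> not (stopwords.Contains t))
```

Switching from a whitelist to a stopword list then amounts to swapping `keepWhitelisted` for `dropStopwords`, and removing the filter altogether means passing the tokens through unchanged.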