
https://github.com/taylorwood/adrdemo

Simple F# demonstration of text classification

ADRDemo
=======

A simple F# demonstration of Automated Document Recognition (ADR) using text tokenization, n-grams, TF-IDF weighting, CSV parsing, and text classification.
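The README does not show the project's code, so the following is only a minimal F# sketch of three of the listed steps (tokenization, n-grams, and TF-IDF weighting), not the repository's implementation. The regex-based tokenizer, the plain `log(N / df)` IDF variant, and the names `tokenize`, `ngrams`, `termFrequency`, `inverseDocFrequency`, and `tfidf` are all assumptions chosen for illustration.

```fsharp
// Hypothetical helpers sketching tokenization, n-grams and TF-IDF;
// not taken from the adrdemo repository.
open System.Text.RegularExpressions

/// Lower-cases the text and splits it into runs of letters and digits.
let tokenize (text: string) : string list =
    Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{Nd}]+")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> List.ofSeq

/// Joins each window of n consecutive tokens into one space-separated term.
let ngrams (n: int) (tokens: string list) : string list =
    tokens
    |> List.windowed n
    |> List.map (String.concat " ")

/// Term frequency: occurrences of each term divided by the document's term count.
let termFrequency (terms: string list) : Map<string, float> =
    let total = float (List.length terms)
    terms
    |> List.countBy id
    |> List.map (fun (term, count) -> term, float count / total)
    |> Map.ofList

/// Inverse document frequency over a corpus of term lists: log(N / df(term)).
let inverseDocFrequency (corpus: string list list) : Map<string, float> =
    let n = float (List.length corpus)
    corpus
    |> List.collect List.distinct
    |> List.countBy id
    |> List.map (fun (term, df) -> term, log (n / float df))
    |> Map.ofList

/// TF-IDF weight of every term in one document; terms missing from the IDF table get 0.
let tfidf (idf: Map<string, float>) (terms: string list) : Map<string, float> =
    termFrequency terms
    |> Map.map (fun term tf -> tf * (Map.tryFind term idf |> Option.defaultValue 0.0))
```

With this IDF variant, a term that occurs in every training document gets weight 0, so very common words stop influencing the comparison. Bigrams can be weighted the same way by passing `ngrams 2 (tokenize text)` instead of the raw tokens.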

The code expects training data as plain-text files, organized into one folder per category:

- \TrainingData
    - \CategoryA
        - \Sample1.txt
        - \Sample2.txt
    - \CategoryB
        - \SampleA.txt
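A minimal sketch of loading that folder layout, assuming (as the layout implies) that each sub-folder name is the category label and every `.txt` file inside it is one training sample. `loadTrainingData` is a hypothetical name, not a function from the repository.

```fsharp
// Hypothetical loader for the TrainingData\<Category>\<Sample>.txt layout.
open System.IO

/// Reads every .txt file in each category sub-folder of root and
/// returns (category, fileText) pairs; the folder name is the label.
let loadTrainingData (root: string) : (string * string) list =
    Directory.GetDirectories root
    |> Array.toList
    |> List.collect (fun dir ->
        let category = Path.GetFileName dir
        Directory.GetFiles(dir, "*.txt")
        |> Array.toList
        |> List.map (fun file -> category, File.ReadAllText file))
```

The directory enumeration order is not guaranteed, so sort the result if a stable order matters.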

...and a plain-text file to classify: `unknown.txt`.
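The README names text classification but does not say which classifier is used. One simple option, sketched here as an assumption rather than the project's method, is 1-nearest-neighbour: compare the TF-IDF vector of the unknown file with each training vector by cosine similarity and return the category of the closest match. Vectors are represented as `Map<string, float>` (term to weight); `cosine` and `classify` are hypothetical names.

```fsharp
// Hypothetical 1-nearest-neighbour classifier over sparse TF-IDF vectors.

/// Cosine similarity between two sparse term-weight vectors (0 if either is empty).
let cosine (a: Map<string, float>) (b: Map<string, float>) : float =
    let dot =
        a |> Seq.sumBy (fun (KeyValue (term, wa)) ->
            match Map.tryFind term b with
            | Some wb -> wa * wb
            | None -> 0.0)
    let norm (v: Map<string, float>) =
        v |> Seq.sumBy (fun (KeyValue (_, w)) -> w * w) |> sqrt
    let denom = norm a * norm b
    if denom = 0.0 then 0.0 else dot / denom

/// Returns the category of the training vector most similar to `unknown`.
let classify (training: (string * Map<string, float>) list) (unknown: Map<string, float>) : string =
    training
    |> List.maxBy (fun (_, vec) -> cosine unknown vec)
    |> fst
```

`List.maxBy` throws on an empty list, so at least one training document is required. Averaging the vectors of each category into a single centroid is a common alternative when folders hold many samples.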

It also expects a CSV file containing a word whitelist; this can easily be changed to a blacklist (a stopword list) or removed altogether.
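A sketch of that whitelist/blacklist switch, assuming a very simple CSV layout: words separated by commas or line breaks, with no quoted fields (a real CSV parser would be needed for those). `parseWordList`, `WordFilter`, and `applyFilter` are hypothetical names; the file contents would typically be read with `File.ReadAllText` and passed to `parseWordList`.

```fsharp
// Hypothetical word-list filter; assumes an unquoted, comma/newline separated CSV.
open System

/// Parses words separated by commas or line breaks into a lower-cased set.
let parseWordList (csv: string) : Set<string> =
    csv.Split([| ','; '\n'; '\r' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.map (fun w -> w.Trim().ToLowerInvariant())
    |> Array.filter (fun w -> w <> "")
    |> Set.ofArray

/// Which word list, if any, to apply to a token stream.
type WordFilter =
    | Whitelist of Set<string>   // keep only listed words
    | Blacklist of Set<string>   // drop listed words (stopwords)
    | NoFilter                   // keep every token

let applyFilter (filter: WordFilter) (tokens: string list) : string list =
    match filter with
    | Whitelist words -> tokens |> List.filter (fun t -> Set.contains t words)
    | Blacklist words -> tokens |> List.filter (fun t -> not (Set.contains t words))
    | NoFilter -> tokens
```

Modelling the three modes as one union type keeps the switch between whitelist, stopwords, and no filtering to a single value change.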