https://github.com/kanishknavale/text-mining-with-tf-idf-and-cosine-similarity

cosine-similarity-scores information-retreival l2-regularization lemmatization linguistics machine-learning nltk optimization perceptron text-classification text-mining tf-idf tokenization torch-sparse-matrix


# Text Mining with TF-IDF & Cosine Similarity

A simple Python repository for perceptron-based text mining. It covers linguistic preprocessing of a dataset for text classification, and retrieval of the texts most similar to a given query.
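As a rough illustration of the retrieval idea named in the title, the sketch below builds TF-IDF vectors and ranks documents by cosine similarity to a query. It is a minimal standard-library sketch, not the repository's code: the function names, the whitespace tokenizer (a stand-in for the NLTK-based preprocessing), the smoothed IDF formula, and the sample texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def fit_idf(tokenized_docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as {term: weight}; terms unseen in the corpus are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, docs, top_k=3):
    """Return up to top_k (score, document) pairs, best match first."""
    tokenized = [tokenize(d) for d in docs]
    idf = fit_idf(tokenized)
    doc_vecs = [tfidf_vector(toks, idf) for toks in tokenized]
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine_similarity(q_vec, dv), d) for dv, d in zip(doc_vecs, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Invented sample texts, for illustration only.
sample_docs = [
    "super app einfach top",
    "die anmeldung funktioniert nicht",
    "langeweile pur aber gut gemacht",
]
for score, doc in most_similar("anmeldung funktioniert", sample_docs, top_k=2):
    print(f"{score:.3f}  {doc}")
```

Dictionaries keep the vectors sparse, which matters because most terms do not occur in any given text.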

New implementation: added PyTorch-based optimization, including handling for the buggy loading of a sparse `csr_matrix` into a CUDA tensor.
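The README does not say how the `csr_matrix` loading issue is handled. One common route, assumed here, is to expand the CSR arrays into coordinate (COO) form and build the sparse tensor from that. The hypothetical helper below shows only the index bookkeeping, in plain Python so it runs without SciPy or PyTorch.

```python
def csr_to_coo(indptr, indices, data):
    """Expand CSR arrays into parallel COO lists (rows, cols, values).

    indptr[r] .. indptr[r + 1] is the slice of `indices` and `data`
    that belongs to row r.
    """
    rows, cols, values = [], [], []
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            rows.append(row)
            cols.append(indices[k])
            values.append(data[k])
    return rows, cols, values


# CSR form of the 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 5, 6]].
indptr = [0, 2, 3, 6]
indices = [0, 2, 2, 0, 1, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

rows, cols, values = csr_to_coo(indptr, indices, data)
print(list(zip(rows, cols, values)))
```

With SciPy and PyTorch installed, `csr_matrix.tocoo()` exposes the same `row`, `col`, and `data` arrays, and `torch.sparse_coo_tensor` accepts a 2 x nnz index tensor together with the values and the matrix shape. The resulting tensor can then be moved to the GPU with `.to("cuda")`.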

## Outcomes

1. NumPy implementation:

|Vanilla Optimization|Optimization with L2-Regularization|
|:--:|:--:|
|(image not available)|(image not available)|

Top 5 weighted terms:

|Terms|Weights|Terms: L2|Weights: L2|
|:--:|:--:|:--:|:--:|
|langeweile|7.094|top|5.8911|
|geilo|7.0535|langeweile|5.8396|
|best|6.7828|geilo|5.7615|
|love|6.376|perfekt|5.6325|
|exzellent|6.3534|super|5.6279|

2. PyTorch implementation:

|Vanilla Optimization|Optimization with L2-Regularization|
|:--:|:--:|
|(image not available)|(image not available)|
|Histogram: Weights|Penalized Weights|
|(image not available)|(image not available)|

Top 5 weighted terms:

|Terms|Weights|Terms: L2|Weights: L2|
|:--:|:--:|:--:|:--:|
|erfolgreichen|20.5452|cool|8.8814|
|anmeldungen|20.0064|geil|8.0933|
|angemessene|19.658|super|6.7332|
|eonfach|19.5906|top|5.4004|
|verarbeitung|19.5136|gut|4.8924|
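The tables above compare learned term weights with and without L2 regularization. The repository's actual loss and update rule are not shown here, so the following is only a sketch under stated assumptions: a classic mistake-driven perceptron with labels of +1 and -1, L2 applied as weight decay, and an invented toy dataset. `train_perceptron` and `top_terms` are hypothetical names.

```python
def train_perceptron(samples, labels, vocab_size, epochs=20, lr=0.1, l2=0.0):
    """Mistake-driven perceptron with optional L2 weight decay.

    samples: list of sparse vectors {feature_index: value}
    labels:  list of +1 / -1 class labels
    """
    weights = [0.0] * vocab_size
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if l2 > 0.0:
                # Gradient step on (l2 / 2) * ||w||^2: shrink all weights toward zero.
                weights = [w * (1.0 - lr * l2) for w in weights]
            score = bias + sum(weights[i] * v for i, v in x.items())
            if y * score <= 0.0:
                # Misclassified (or on the boundary): move toward the correct side.
                for i, v in x.items():
                    weights[i] += lr * y * v
                bias += lr * y
    return weights, bias


def top_terms(weights, vocabulary, k=5):
    """Return the k (term, weight) pairs with the largest weights."""
    ranked = sorted(zip(vocabulary, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy data, assumed for illustration only.
vocabulary = ["super", "top", "schlecht", "langsam"]
samples = [{0: 1.0, 1: 1.0}, {2: 1.0, 3: 1.0}, {0: 1.0}, {3: 1.0}]
labels = [1, -1, 1, -1]

vanilla_w, _ = train_perceptron(samples, labels, len(vocabulary))
penalized_w, _ = train_perceptron(samples, labels, len(vocabulary), l2=0.5)

print("vanilla:", top_terms(vanilla_w, vocabulary, k=2))
print("with L2:", top_terms(penalized_w, vocabulary, k=2))
```

The decay term pulls every weight toward zero on each step, so the penalized run ends with smaller weights overall. That matches the direction of the difference between the two weight columns in the tables.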

## Dependencies

Install dependencies using:

```bash
pip3 install -r requirements.txt
```
