https://github.com/vibhoothi/vsm
Vector Space Model Calculation using NLTK
information-retrieval vector-space-model
Last synced: over 1 year ago
- Host: GitHub
- URL: https://github.com/vibhoothi/vsm
- Owner: vibhoothi
- Created: 2019-05-01T20:02:52.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2019-05-02T09:46:18.000Z (about 7 years ago)
- Last Synced: 2025-01-25T15:13:03.080Z (over 1 year ago)
- Topics: information-retrieval, vector-space-model
- Language: Python
- Size: 27.3 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Vector Space Model
## What are the different types of models in Information Retrieval
Two types of models are covered here:
* Boolean Retrieval
* Vector Space Model
## Disadvantages of the Boolean Retrieval Model
* Similarity function is boolean
* Exact-match only, no partial matches
* Retrieved documents not ranked
* All terms are equally important
* The choice of Boolean operators influences the result far more than any single critical word
* Query language is expressive but complicated
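
The exact-match and no-ranking limitations listed above can be seen in a minimal sketch of Boolean AND retrieval. The toy corpus, the whitespace tokenizer, and the function name `boolean_and` are assumptions made for illustration; they are not taken from this repository.

```python
# Hypothetical toy corpus; document ids and text are illustrative only.
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
    "d3": "dogs and cats make good pets",
}


def boolean_and(query_terms, docs):
    """Return the set of document ids that contain every query term."""
    hits = set()
    for doc_id, text in docs.items():
        # Simplifying assumption: lowercase and split on whitespace.
        tokens = set(text.lower().split())
        if all(term in tokens for term in query_terms):
            hits.add(doc_id)
    return hits


# Only d2 contains both terms; the result is an unordered set, not a ranking.
both = boolean_and(["cat", "dog"], docs)

# d1 matches two of the three terms, but a partial match counts for nothing.
all_three = boolean_and(["cat", "mat", "dog"], docs)

print(both, all_three)
```

Because the result is a plain set, there is no way to say that one matching document is a better answer than another, which is the gap the Vector Space Model addresses.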
## What is Vector Space Model
* In the Vector Space Model, both documents and queries are vectors; each component w(i,j) is the weight of term j in document i
* Documents use a "bag-of-words" representation: word order is ignored and only term weights are kept
* Similarity of a document vector to a query vector = cosine of the angle between them
* Cosine is a normalized dot product
* Documents ranked by decreasing cosine value
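* Formula: $\mathrm{sim}(d,q) = \dfrac{d \cdot q}{\lVert d \rVert \, \lVert q \rVert} = \dfrac{\sum_j w(d,j)\, w(q,j)}{\sqrt{\sum_j w(d,j)^2}\, \sqrt{\sum_j w(q,j)^2}}$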
* sim(d,q) = 1 when d = q
* sim(d,q) = 0 when d and q share no terms
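
The points above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's implementation. It assumes raw term counts as the weights w(i,j) and a whitespace tokenizer in place of NLTK, and the names `term_vector`, `cosine_similarity`, and `rank` are hypothetical. Real systems commonly use TF-IDF weights instead of raw counts.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: maps each term to its raw count (assumed weight)."""
    return Counter(text.lower().split())


def cosine_similarity(d, q):
    """Cosine of the angle between two sparse term vectors."""
    # Dot product over the terms the two vectors share.
    dot = sum(weight * q[term] for term, weight in d.items() if term in q)
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # An empty vector has no direction; treat it as no match.
    return dot / (norm_d * norm_q)


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing cosine value."""
    q = term_vector(query)
    scores = [(doc_id, cosine_similarity(term_vector(text), q))
              for doc_id, text in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus for illustration.
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
    "d3": "dogs and cats make good pets",
}

for doc_id, score in rank("cat mat", docs):
    print(doc_id, round(score, 3))
```

Unlike a Boolean AND query, `d2` still receives a non-zero score for matching only one of the two query terms, so partial matches are retrieved and ranked below the fuller match `d1`.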