Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/felipehummel/TinySearchEngine
Vector space model implemented in a few lines
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/felipehummel/TinySearchEngine
- Owner: felipehummel
- Created: 2011-06-21T21:22:01.000Z (over 13 years ago)
- Default Branch: master
- Last Pushed: 2016-01-15T16:43:55.000Z (almost 9 years ago)
- Last Synced: 2024-05-31T20:58:14.162Z (6 months ago)
- Language: Scala
- Homepage:
- Size: 212 KB
- Stars: 81
- Watchers: 7
- Forks: 23
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Tiny Search
-----------

How many lines of code does it take to write a reasonable, understandable, full-text search engine?
The code in this repository gives a quick and easy overview of the Vector Space Model (tf-idf).
Feel free to contribute improvements and implementations in other languages.

Other languages
---------------

Feel free to submit pull requests with implementations in any other language.
You can follow the same requirements as the Scala version:

- in-memory index;
- norms and IDF calculated online;
- default OR operator between query terms;
- index one document per line from a single file;
- read stopwords from a file.

Scala
-----

There are two Scala versions of the Vector Space Model. They are similar, except that "freakinTinySearch.scala" saves a few more lines by getting rid of classes.

Warnings:

- I only tested the Scala code with Scala 2.9.
- This is not intended for real-world production use. It is just for fun and educational purposes.
- The Scala code calculates document norm and term IDF on-the-fly while processing the query. This is far from optimal, but it makes things shorter.
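
To make the requirements and the on-the-fly norm/IDF calculation concrete, here is a minimal sketch. It is not the code from this repository: the names `TinySearchSketch`, `Index`, `tokenize` and `fromFiles` are hypothetical, the tokenizer (lowercase, split on non-alphanumerics) and the IDF variant `ln(1 + N/df)` are assumptions, and it targets recent Scala versions rather than 2.9. The query norm is skipped because it is the same for every document and does not change the ranking.

```scala
import scala.io.Source

// Illustrative sketch only; names and formulas are assumptions, not the repository's code.
object TinySearchSketch {

  // Assumed tokenizer: lowercase, split on non-alphanumerics, drop stopwords.
  def tokenize(text: String, stopwords: Set[String]): Seq[String] =
    text.toLowerCase.split("[^\\p{L}\\p{N}]+").toSeq
      .filter(t => t.nonEmpty && !stopwords(t))

  final class Index(docs: IndexedSeq[String], stopwords: Set[String]) {
    // Per-document term frequencies, kept in memory.
    private val docTerms: IndexedSeq[Map[String, Int]] =
      docs.map(d => tokenize(d, stopwords).groupBy(identity).map { case (t, occ) => (t, occ.size) })

    // Inverted index: term -> (docId -> term frequency).
    private val postings: Map[String, Map[Int, Int]] =
      docTerms.zipWithIndex
        .flatMap { case (tf, id) => tf.toSeq.map { case (t, f) => (t, id, f) } }
        .groupBy(_._1)
        .map { case (t, entries) => (t, entries.map(e => (e._2, e._3)).toMap) }

    // IDF computed at query time (assumed variant: ln(1 + N / df)).
    def idf(term: String): Double =
      postings.get(term).map(p => math.log(1.0 + docs.size.toDouble / p.size)).getOrElse(0.0)

    // Document norm computed at query time: far from optimal, but short.
    def norm(docId: Int): Double =
      math.sqrt(docTerms(docId).toSeq.map { case (t, f) => val w = f * idf(t); w * w }.sum)

    // OR semantics: any document containing at least one query term is scored.
    def search(query: String, topK: Int = 10): Seq[(Int, Double)] = {
      val partial = for {
        term <- tokenize(query, stopwords).distinct
        (docId, tf) <- postings.getOrElse(term, Map.empty[Int, Int]).toSeq
      } yield (docId, tf * idf(term) * idf(term)) // doc weight times query weight
      partial.groupBy(_._1).toSeq
        .map { case (id, ps) => (id, ps.map(_._2).sum / norm(id)) }
        .sortBy(-_._2)
        .take(topK)
    }
  }

  // One document per line from a single file; stopwords read from a second file.
  def fromFiles(docsPath: String, stopwordsPath: String): Index = {
    def lines(path: String): Vector[String] = {
      val src = Source.fromFile(path)
      try src.getLines().toVector finally src.close()
    }
    new Index(lines(docsPath), lines(stopwordsPath).map(_.trim.toLowerCase).filter(_.nonEmpty).toSet)
  }
}
```

For example, `new TinySearchSketch.Index(Vector("the quick brown fox", "the lazy dog", "quick quick dog"), Set("the")).search("quick")` returns `(docId, score)` pairs ranked by cosine-style score, where `docId` is the zero-based line number of the document.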