https://github.com/raphsenn/info-retrieval-notebooks
Some information retrieval algorithms and data structures (inverted index, ranking (BM25, tf, idf scores), fuzzy search, ...)
data-science fuzzy-search information-retrieval information-systems inverted-index searching-algorithms
Last synced: 6 months ago
- Host: GitHub
- URL: https://github.com/raphsenn/info-retrieval-notebooks
- Owner: raphsenn
- License: mit
- Created: 2024-08-08T12:30:04.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-09-03T09:59:46.000Z (11 months ago)
- Last Synced: 2024-11-20T18:50:23.466Z (8 months ago)
- Topics: data-science, fuzzy-search, information-retrieval, information-systems, inverted-index, searching-algorithms
- Language: Jupyter Notebook
- Homepage:
- Size: 492 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# info-retrieval-notebooks
* Designed for viewing in GitHub.
## Implemented Algorithms and Data Structures
### search
* InvertedIndex
* InvertedIndex (via vector space model, linear algebra, sparse matrices)
* Similarity search (via cosine similarity)
* Fuzzy string search
* Ranking and evaluation
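
As a rough illustration of how an inverted index and BM25 ranking fit together, here is a minimal sketch. It is not taken from the notebooks: the function names (`tokenize`, `build_inverted_index`, `bm25_search`), the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the three sample documents are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to a list of (doc_id, term_frequency) postings."""
    index = defaultdict(list)
    doc_lengths = []
    for doc_id, text in enumerate(docs):
        tokens = tokenize(text)
        doc_lengths.append(len(tokens))
        for term, tf in Counter(tokens).items():
            index[term].append((doc_id, tf))
    return index, doc_lengths


def bm25_search(query, index, doc_lengths, k1=1.5, b=0.75):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, [])
        if not postings:
            continue  # term does not occur in any document
        df = len(postings)
        # One common smoothed idf variant; it is always positive.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings:
            # Length normalization: longer documents are penalized via b.
            norm = tf + k1 * (1 - b + b * doc_lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample documents, used only to exercise the sketch.
docs = [
    "the matrix is a science fiction movie",
    "the godfather is a crime movie",
    "a documentary about science",
]
index, doc_lengths = build_inverted_index(docs)
results = bm25_search("science fiction", index, doc_lengths)
```

Only documents that share at least one term with the query appear in `results`, which is the main benefit of the inverted index: scoring touches the postings of the query terms rather than every document.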
### databases
* Basic database operations (project, select, cartesian product)
* More database operations (equi join, merge join, hash join, group by)
* SPARQL to SQL algorithm
* SQL to SPARQL algorithm
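
To make the hash join entry above concrete, the following is a minimal sketch of an equi join implemented with a build phase and a probe phase. It is an assumption-laden example rather than the notebooks' code: the `hash_join` function, its column-index arguments, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, left_col, right_col):
    """Equi join two lists of row tuples on left[left_col] == right[right_col]."""
    # Build phase: hash the left relation on its join column.
    buckets = defaultdict(list)
    for row in left:
        buckets[row[left_col]].append(row)
    # Probe phase: look up each right row's key and emit concatenated rows.
    joined = []
    for row in right:
        for match in buckets.get(row[right_col], []):
            joined.append(match + row)
    return joined


# Made-up sample relations: (movie_id, title) and (movie_id, rating).
movies = [(1, "Alien"), (2, "Heat"), (3, "Solaris")]
ratings = [(1, 8.5), (3, 7.9), (3, 8.1), (4, 6.0)]
joined = hash_join(movies, ratings, 0, 0)
```

Building the hash table on the smaller relation keeps memory use down; the join then runs in time roughly proportional to the size of both inputs plus the output, instead of the quadratic cost of filtering a cartesian product.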
### Used datasets
#### IMDB movies dataset
[https://www.kaggle.com/datasets/ashpalsingh1525/imdb-movies-dataset](https://www.kaggle.com/datasets/ashpalsingh1525/imdb-movies-dataset)