https://github.com/sadit/textsearch.jl
Searching methods and models for textual data; it was designed to work with SimilaritySearch.jl
similarity-search term-weighting-schemes text-search textmodel vector-representations
- Host: GitHub
- URL: https://github.com/sadit/textsearch.jl
- Owner: sadit
- License: mit
- Created: 2017-05-22T16:16:53.000Z (about 8 years ago)
- Default Branch: main
- Last Pushed: 2024-10-08T15:12:24.000Z (8 months ago)
- Last Synced: 2025-02-21T02:34:13.421Z (4 months ago)
- Topics: similarity-search, term-weighting-schemes, text-search, textmodel, vector-representations
- Language: Julia
- Homepage:
- Size: 2.76 MB
- Stars: 8
- Watchers: 2
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[](https://sadit.github.io/TextSearch.jl/dev)
[](https://github.com/sadit/TextSearch.jl/actions)
[](https://codecov.io/gh/sadit/TextSearch.jl)

# TextSearch
`TextSearch.jl` is a package for creating vector representations of text in a mostly language-independent way. It is intended to be used with [SimilaritySearch.jl](https://github.com/sadit/SimilaritySearch.jl), but can be used independently if needed.
`TextSearch.jl` was renamed from `TextModel.jl` to reflect its capabilities and mission. For generic text analysis you should use other packages like [TextAnalysis.jl](https://github.com/johnmyleswhite/TextAnalysis.jl).
It supports a number of simple text preprocessing functions, and three different kinds of tokenizers, i.e., word n-grams, character q-grams, and skip-grams. It supports creating multisets of tokens, commonly named bag of words (BOW).
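To make the three token types and the bag-of-words idea concrete, here is a minimal sketch in plain Julia. It does not use the `TextSearch.jl` API: the helper names (`word_ngrams`, `char_qgrams`, `skipgrams`, `bag_of_words`) are hypothetical, and the skip-gram definition used here (pairs of words separated by a fixed number of skipped words) is a simplifying assumption that may differ from the package's own.

```julia
# Plain-Julia illustration; none of these helpers belong to TextSearch.jl.

# Word n-grams: every run of n consecutive words, joined by a space.
word_ngrams(words, n) = [join(words[i:i+n-1], " ") for i in 1:length(words)-n+1]

# Character q-grams: every run of q consecutive characters.
function char_qgrams(text, q)
    chars = collect(text)  # collect first so indexing is safe for non-ASCII text
    [String(chars[i:i+q-1]) for i in 1:length(chars)-q+1]
end

# Skip-grams (assumed definition): word pairs with `skip` words between them.
skipgrams(words, skip) = [words[i] * " " * words[i+skip+1] for i in 1:length(words)-skip-1]

# Bag of words: a multiset stored as token => number of occurrences.
function bag_of_words(tokens)
    bow = Dict{String,Int}()
    for t in tokens
        bow[t] = get(bow, t, 0) + 1
    end
    bow
end

text = "the cat sat on the mat"
words = String.(split(lowercase(text)))

# Mix unigrams, bigrams, and skip-grams into a single token list.
tokens = vcat(words, word_ngrams(words, 2), skipgrams(words, 1))
bow = bag_of_words(tokens)

println(bow["the"])              # 2, since "the" appears twice as a unigram
println(char_qgrams("cat", 2))   # ["ca", "at"]
```

Character q-grams need no word segmentation, so they apply to languages and noisy text where whitespace splitting is unreliable; this is one way a representation can stay mostly language independent.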
`TextSearch.jl` can produce sparse vector representations based on term-weighting schemes like TF, IDF, and TFIDF. It also supports term-weighting schemes designed to cope with text classification tasks, mostly based on distributional representations.

# Installing
You may install the package as follows:
```julia
] add TextSearch
```
You can also run the test suite as follows:
```julia
] test TextSearch
```

## Using the library
The directory [examples](https://github.com/sadit/TextSearch.jl/tree/master/src) contains a few examples of how to use it, based on [Pluto.jl](https://github.com/fonsp/Pluto.jl).
After cloning the repository, you must instantiate the project environment.
```julia
using Pkg
pkg"instantiate"
```
Once you have instantiated your environment, run Pluto and explore the example notebooks.
```julia
using Pluto
Pluto.run()
```