https://github.com/tobka777/text-similarity-search
Semantic Search
- Host: GitHub
- URL: https://github.com/tobka777/text-similarity-search
- Owner: tobka777
- License: apache-2.0
- Created: 2021-12-13T10:45:23.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-12-26T12:39:36.000Z (about 2 months ago)
- Last Synced: 2024-12-26T13:25:37.633Z (about 2 months ago)
- Language: Python
- Homepage:
- Size: 1.43 MB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# SearchAPI
SearchAPI is a library for semantic similarity search. It creates representations of the search queries and of the document attributes using a chosen text similarity method. Depending on the configuration, either the vector representation or the text of an attribute is used to build the document index. A search query is sent to a search server such as Elasticsearch, which applies an appropriate distance measure and score function to compute a similarity value and ranks the results by it.
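To make the ranking idea concrete, here is a minimal standalone sketch. It is not SearchAPI's code: in SearchAPI the scoring is done by the search server. The sketch assumes cosine similarity as the distance measure, and the function names (`cosine_similarity`, `rank_documents`), document ids, and toy vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "not similar"
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, documents):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors; a real setup would obtain them from the
# chosen text similarity method instead of writing them by hand.
documents = {
    "game-a": [1.0, 0.0, 0.0],
    "game-b": [0.6, 0.8, 0.0],
    "game-c": [0.0, 0.0, 1.0],
}
query_vector = [1.0, 0.0, 0.0]

ranking = rank_documents(query_vector, documents)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.2f}")
```

A document whose vector points in the same direction as the query scores highest, and an orthogonal one scores zero. A search server applies the same principle at index scale.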
As an example, the SearchAPI was adapted for the web-based information system for serious games SG-IC.

## Installation
```
$ git clone https://github.com/serious-games-darmstadt/text-similarity-search.git
$ cd text-similarity-search
$ cp .env.example .env
$ docker-compose up -d --build
```

## Configuration
### .env / [.env.example](https://github.com/tobka777/text-similarity-search/blob/main/.env.example)
- `WEBSITE_URL`: URL of the web page that provides the games as JSON under /api/{lang}/games (default: http://localhost:3000)
- `ELASTIC_URL`: URL of the Elasticsearch cluster (default: http://localhost:9200)
- `APP_KEY`: key that protects index creation
- `CACHE_MIN`: how long search queries are cached in memory, in minutes (default: 60)
- `MIN_SCORE`: fixed minimum score (default: sum of the weights of all attributes stored as vector representations)

### [attribute.json](https://github.com/tobka777/text-similarity-search/blob/main/app/config/attribute.json)
The JSON array contains one object for each attribute to be considered, with the following values:
- `attribute`: path in JSON like `gameInfo.titleInfo.title` (`.`: object; `[]`: array; `[*]`: array of objects; `*`: all values of an object)
- `boost`: weighting of the attribute (default: `0` - no boost)
- `vector`: `true` if saved as vector representation and `false` if saved as text (default: `true`)
- `name`: alternative name, used if the name should not be derived from `attribute` (default: `attribute`)
- `source`: whether the attribute is returned in the results (default: `false`)
- `search`: whether the attribute is taken into account during search (default: `true`)
- `similar`: whether the attribute is considered when comparing the similarity between documents (default: `false`)
- `similar_boost`: alternative weighting for the similarity comparison between documents (default: `boost`)

## Contributing
Pull requests are welcome. For major changes, please open an [issue](https://github.com/tobka777/text-similarity-search/issues) first to discuss what you would like to change.

## License
Distributed under the Apache 2.0 License. See [LICENSE](https://github.com/tobka777/text-similarity-search/blob/main/LICENCE) for more information.