https://github.com/tobka777/text-similarity-search

Semantic Search
https://github.com/tobka777/text-similarity-search

Last synced: 9 days ago
JSON representation

Semantic Search

Host: GitHub
URL: https://github.com/tobka777/text-similarity-search
Owner: tobka777
License: apache-2.0
Created: 2021-12-13T10:45:23.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2024-12-26T12:39:36.000Z (7 months ago)
Last Synced: 2025-03-28T02:23:47.546Z (4 months ago)
Language: Python
Homepage:
Size: 1.42 MB
Stars: 3
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# SearchAPI

SearchAPI is a library for a semantic similarity search. This API creates the representations of the search queries and the attributes of the documents using a chosen text similarity method. Depending on the configuration, the vector representation or the text of an attribute is used to create the index of the documents. A search query requests a search server such as Elasticsearch, which uses an appropriate distance measure and a score function to determine a similarity value and use it to rank the results.
As an example, the SearchAPI was adapted for the web-based information system for serious games SG-IC.

## Installation
```
$ git clone https://github.com/serious-games-darmstadt/text-similarity-search.git
$ cd text-similarity-search
$ cp .env.example .env
$ docker-compose up -d --build
```

## Configuration
### .env / [.env.example](https://github.com/tobka777/text-similarity-search/blob/main/.env.example)
- `WEBSITE_URL`: URL of the web page that provides the games as JSON under /api/{lang}/games (default: http://localhost:3000)
- `ELASTIC_URL`: URL of the Elasticsearch cluster (default: http://localhost:9200)
- `APP_KEY`: Key as a protection of the index creation
- `CACHE_MIN`: Duration of the caching (in memory) of the search queries in minutes (default: 60 min)
- `MIN_SCORE`: fixed minimum score (default: sum of all weights of the attributes in vector representation)

### [attribute.json](https://github.com/tobka777/text-similarity-search/blob/main/app/config/attribute.json)
The JSON array contains an object with the following values for each attribute to be considered.
- `attribute`: path in JSON like `gameInfo.titleInfo.title` (`.`: object; `[]`: array; `[*]`: array of objects; `*`: all values of an object)
- `boost`: weighting of the attribute (default: `0` - no boost)
- `vector`: `true` if saved as vector representation and `false` if saved as text (default: `true`)
- `name`: alternative name if not to be derived from attributes (default: `attribute`)
- `source`: Attributes to be returned on result (default: `false`)
- `search`: Attributes to be taken into account during search (default: `true`)
- `similar`: Attributes to be considered when comparing similarity between documents (default: `false`)
- `similar_boost`: Alternative weighting for similarity comparison between documents (default: `boost`)

## Contributing
Pull requests are welcome. For major changes, please open an [issue](https://github.com/tobka777/text-similarity-search/issues) first to discuss what you would like to change.

## License

Distributed under the Apache 2.0 License. See [LICENSE](https://github.com/tobka777/text-similarity-search/blob/main/LICENCE) for more information.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tobka777/text-similarity-search

Awesome Lists containing this project

README