https://github.com/codelibs/vespa-opensearch-app
- Host: GitHub
- URL: https://github.com/codelibs/vespa-opensearch-app
- Owner: codelibs
- Created: 2023-11-13T13:01:14.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-28T12:52:44.000Z (over 2 years ago)
- Last Synced: 2025-06-04T17:11:38.233Z (about 1 year ago)
- Language: Java
- Size: 17.6 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Vespa OpenSearch Proxy Application
==================================
## Overview
This project provides an OpenSearch-compatible API proxy for the Vespa search engine. It lets applications that use OpenSearch or Elasticsearch client libraries talk to Vespa as if it were an OpenSearch cluster.
The proxy translates OpenSearch API requests into Vespa operations, so that tools from the OpenSearch/Elasticsearch ecosystem can use Vespa as their search backend.
## Features
### Index Operations
- **Create Index**: `PUT /{index}` - Create a new index with optional settings and mappings
- **Delete Index**: `DELETE /{index}` - Remove an index
- **Get Index**: `GET /{index}` - Retrieve index information
- **Index Exists**: `HEAD /{index}` - Check if an index exists
- **List Indices**: `GET /_cat/indices` - List all indices
### Document Operations
- **Index Document**: `POST /{index}/_doc` or `POST /{index}/_doc/{id}` - Add a document with an auto-generated or specified ID
- **Create Document**: `POST /{index}/_create/{id}` or `PUT /{index}/_create/{id}` - Create a document (fails if it already exists)
- **Update Document**: `PUT /{index}/_doc/{id}` - Update an existing document
- **Get Document**: `GET /{index}/_doc/{id}` - Retrieve a document by ID
- **Delete Document**: `DELETE /{index}/_doc/{id}` - Remove a document
### Bulk Operations
- **Bulk API**: `POST /_bulk` or `POST /{index}/_bulk` - Perform multiple index/create/update/delete operations in a single request
### Search Operations
- **Search**: `GET/POST /{index}/_search` or `GET/POST /_search` - Search for documents using OpenSearch query DSL
- **Count**: `GET/POST /{index}/_count` or `GET/POST /_count` - Count documents matching a query
- **Multi Get**: `GET/POST /{index}/_mget` or `GET/POST /_mget` - Retrieve multiple documents by IDs
### Advanced Document Operations
- **Partial Update**: `POST /{index}/_update/{id}` - Update specific fields of a document
- **Refresh**: `POST /{index}/_refresh` or `POST /_refresh` - Refresh the index (no-op for Vespa, returns success)
### Cluster Information
- **Cluster Health**: `GET /_cluster/health` - Get cluster health status
- **Cluster State**: `GET /_cluster/state` - Get cluster state information
- **Root Info**: `GET /` - Get basic cluster and version information
### Index Settings and Mappings
- **Get Mapping**: `GET /{index}/_mapping` - Retrieve index mappings
- **Update Mapping**: `PUT /{index}/_mapping` - Update index mappings
- **Get Settings**: `GET /{index}/_settings` - Retrieve index settings
- **Update Settings**: `PUT /{index}/_settings` - Update index settings
## Architecture
The application consists of several key components:
- **RestApiProxyHandler**: Main HTTP request handler that routes requests to appropriate actions
- **VespaClient**: Client for communicating with Vespa's Document API
- **Action Classes**: Individual handlers for different OpenSearch API endpoints
  - `RootAction`: Root endpoint (/)
  - `ClusterHealthAction`: Cluster health endpoint
  - `ClusterStateAction`: Cluster state endpoint
  - `IndicesAction`: Index management operations
  - `DocumentAction`: Document CRUD operations
  - `MappingAction`: Index mapping operations
  - `SettingsAction`: Index settings operations
  - `BulkAction`: Bulk operations
  - `CatIndicesAction`: Indices listing
  - `SearchAction`: Search operations
  - `CountAction`: Document count operations
  - `MgetAction`: Multi-document get operations
  - `UpdateAction`: Partial document updates
  - `RefreshAction`: Index refresh operations
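
As a rough illustration of how a handler such as `RestApiProxyHandler` might route requests to the action classes listed above, the sketch below matches the HTTP method and path against a small ordered table of patterns. The `RouteSketch` class, its regular expressions, and the `resolve` method are hypothetical and are not taken from the project's source; only the action names come from the list above. It assumes any configured path prefix has already been removed and that index names do not start with an underscore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical routing table: "METHOD path" patterns mapped to action names.
final class RouteSketch {
    private static final class Route {
        final Pattern pattern;
        final String action;

        Route(String regex, String action) {
            this.pattern = Pattern.compile(regex);
            this.action = action;
        }
    }

    // An index path segment: anything that does not start with '_' (assumption).
    private static final String INDEX = "/[^/_][^/]*";
    private static final List<Route> ROUTES = new ArrayList<>();

    static {
        // Checked in order; underscore-prefixed endpoints come before the bare index pattern.
        ROUTES.add(new Route("GET /", "RootAction"));
        ROUTES.add(new Route("GET /_cluster/health", "ClusterHealthAction"));
        ROUTES.add(new Route("GET /_cluster/state", "ClusterStateAction"));
        ROUTES.add(new Route("GET /_cat/indices", "CatIndicesAction"));
        ROUTES.add(new Route("POST (" + INDEX + ")?/_bulk", "BulkAction"));
        ROUTES.add(new Route("(GET|POST) (" + INDEX + ")?/_search", "SearchAction"));
        ROUTES.add(new Route("(GET|POST) (" + INDEX + ")?/_count", "CountAction"));
        ROUTES.add(new Route("(GET|POST) (" + INDEX + ")?/_mget", "MgetAction"));
        ROUTES.add(new Route("POST (" + INDEX + ")?/_refresh", "RefreshAction"));
        ROUTES.add(new Route("POST " + INDEX + "/_update/[^/]+", "UpdateAction"));
        ROUTES.add(new Route("(GET|PUT) " + INDEX + "/_mapping", "MappingAction"));
        ROUTES.add(new Route("(GET|PUT) " + INDEX + "/_settings", "SettingsAction"));
        ROUTES.add(new Route("(POST|PUT) " + INDEX + "/_create/[^/]+", "DocumentAction"));
        ROUTES.add(new Route("(GET|POST|PUT|DELETE) " + INDEX + "/_doc(/[^/]+)?", "DocumentAction"));
        ROUTES.add(new Route("(GET|PUT|DELETE|HEAD) " + INDEX, "IndicesAction"));
    }

    /** Returns the name of the first matching action, or null when no route matches. */
    static String resolve(String method, String path) {
        String key = method + " " + path;
        for (Route route : ROUTES) {
            if (route.pattern.matcher(key).matches()) {
                return route.action;
            }
        }
        return null;
    }
}
```

With this table, `RouteSketch.resolve("DELETE", "/myindex/_doc/1")` yields `DocumentAction`, while an unrecognized path such as `/_unknown` yields `null`.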
## Usage
### Build
```bash
mvn package
```
### Start Vespa
```bash
docker run --detach --name vespa --hostname vespa-container \
  --publish 8080:8080 --publish 19071:19071 vespaengine/vespa
```
### Deploy Application
```bash
curl --header Content-Type:application/zip \
  --data-binary @target/application.zip \
  localhost:19071/application/v2/tenant/default/prepareandactivate
```
### Example Requests
#### Create an index
```bash
curl -X PUT "localhost:8080/opensearch/myindex" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "content": { "type": "text" }
    }
  }
}'
```
#### Index a document
```bash
curl -X POST "localhost:8080/opensearch/myindex/_doc/1" -H 'Content-Type: application/json' -d'
{
  "title": "Hello Vespa",
  "content": "This is a test document"
}'
```
#### Get a document
```bash
curl -X GET "localhost:8080/opensearch/myindex/_doc/1"
```
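
Behind a request like this, the proxy has to address a single document in Vespa. Vespa's Document API (`/document/v1`) identifies a document by namespace, document type, and ID. The sketch below builds such a path. How the project actually maps an OpenSearch index name onto the namespace and document type is not stated in this README, so the `DocumentPathSketch` class, its method names, and the choice of arguments in the usage note are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds a Vespa /document/v1 path for one document.
final class DocumentPathSketch {
    /** Builds /document/v1/{namespace}/{documentType}/docid/{id} with each segment percent-encoded. */
    static String documentPath(String namespace, String documentType, String id) {
        return "/document/v1/" + encode(namespace)
                + "/" + encode(documentType)
                + "/docid/" + encode(id);
    }

    private static String encode(String segment) {
        // URLEncoder produces form encoding, so a space becomes '+'; a path segment needs '%20'.
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

For instance, `DocumentPathSketch.documentPath("myindex", "doc", "1")` gives `/document/v1/myindex/doc/docid/1`, assuming the index name is used as the namespace and `doc` as the document type.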
#### Delete a document
```bash
curl -X DELETE "localhost:8080/opensearch/myindex/_doc/1"
```
#### Bulk operations
```bash
curl -X POST "localhost:8080/opensearch/_bulk" -H 'Content-Type: application/json' -d'
{"index":{"_index":"myindex","_id":"1"}}
{"title":"Document 1","content":"First document"}
{"index":{"_index":"myindex","_id":"2"}}
{"title":"Document 2","content":"Second document"}
'
```
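
The bulk body is newline-delimited JSON: each action line is followed by a source line, except `delete`, which has none. A minimal sketch of how such a body could be split into operations before each one is forwarded to Vespa is shown below. The `BulkLinesSketch` class and its `Op` type are hypothetical, and the action name is detected with a simple regular expression rather than a full JSON parser, which a real implementation would likely use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical splitter for an OpenSearch-style bulk body (NDJSON).
final class BulkLinesSketch {
    /** One bulk operation: action name, raw metadata line, raw source line (null for delete). */
    static final class Op {
        final String action;
        final String meta;
        final String source;

        Op(String action, String meta, String source) {
            this.action = action;
            this.meta = meta;
            this.source = source;
        }
    }

    // Matches the first key of an action line, e.g. {"index":{...}}
    private static final Pattern ACTION =
            Pattern.compile("^\\s*\\{\\s*\"(index|create|update|delete)\"\\s*:");

    static List<Op> split(String body) {
        List<Op> ops = new ArrayList<>();
        String[] lines = body.split("\\r?\\n");
        int i = 0;
        while (i < lines.length) {
            String line = lines[i++];
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines between operations
            }
            Matcher m = ACTION.matcher(line);
            if (!m.find()) {
                throw new IllegalArgumentException("Expected an action line but got: " + line);
            }
            String action = m.group(1);
            String source = null;
            if (!action.equals("delete")) {
                // index, create and update carry their payload on the next line
                if (i >= lines.length) {
                    throw new IllegalArgumentException("Missing source line for action: " + action);
                }
                source = lines[i++];
            }
            ops.add(new Op(action, line, source));
        }
        return ops;
    }
}
```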
#### Check cluster health
```bash
curl -X GET "localhost:8080/opensearch/_cluster/health"
```
#### Search for documents
```bash
curl -X POST "localhost:8080/opensearch/myindex/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "title": "Hello"
    }
  },
  "size": 10
}'
```
#### Count documents
```bash
curl -X GET "localhost:8080/opensearch/myindex/_count" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}'
```
#### Get multiple documents
```bash
curl -X POST "localhost:8080/opensearch/myindex/_mget" -H 'Content-Type: application/json' -d'
{
  "ids": ["1", "2", "3"]
}'
```
#### Partial update
```bash
curl -X POST "localhost:8080/opensearch/myindex/_update/1" -H 'Content-Type: application/json' -d'
{
  "doc": {
    "title": "Updated Title"
  }
}'
```
#### Refresh index
```bash
curl -X POST "localhost:8080/opensearch/myindex/_refresh"
```
## Testing
The project includes unit and integration tests:
### Unit Tests
- Action routing tests for all endpoint handlers
- Path matching validation
### Integration Tests
- Full API workflow tests
- Index lifecycle management
- Document CRUD operations
- Cluster information retrieval
Run tests with:
```bash
mvn test
```
## Configuration
The application can be configured through `services.xml`:
```xml
http://localhost:8080
doc
/opensearch
```
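
The `/opensearch` value matches the path prefix used in the example requests. Assuming it is indeed a configurable prefix, the sketch below shows one way a handler could strip it before matching the remaining path against the OpenSearch endpoints. The `PathPrefixSketch` class and its method are hypothetical.

```java
// Hypothetical helper that removes a configured path prefix from a request path.
final class PathPrefixSketch {
    /**
     * Returns the path relative to the prefix ("/" for the prefix itself),
     * or null when the request path is not under the prefix.
     */
    static String stripPrefix(String prefix, String requestPath) {
        if (requestPath.equals(prefix) || requestPath.equals(prefix + "/")) {
            return "/";
        }
        if (requestPath.startsWith(prefix + "/")) {
            return requestPath.substring(prefix.length());
        }
        return null; // e.g. "/opensearchx/..." must not match "/opensearch"
    }
}
```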
## Supported Query DSL
The application translates the following OpenSearch Query DSL constructs into Vespa YQL:
### Supported Query Types
**Full Text Queries:**
- `match` - Full text search with tokenization
- `match_phrase` - Phrase matching
- `multi_match` - Search across multiple fields
- `query_string` - Query string syntax (basic support)
**Term-Level Queries:**
- `term` - Exact term matching
- `terms` - Match any of multiple terms
- `range` - Numeric/date range queries (gt, gte, lt, lte)
- `exists` - Field existence check
- `prefix` - Prefix matching
- `wildcard` - Wildcard pattern matching
- `ids` - Match documents by IDs
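
To make the `range` translation concrete, the sketch below turns a set of bounds into a YQL comparison expression. It is only an illustration of the idea: the `RangeToYqlSketch` class is hypothetical, it handles numeric bounds only (date values would first need converting to whatever numeric representation the Vespa schema uses), and it assumes the field name has already been validated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical translation of an OpenSearch range clause into a YQL condition.
final class RangeToYqlSketch {
    // Fixed order keeps the output deterministic regardless of map iteration order.
    private static final String[][] OPERATORS = {
        {"gt", ">"}, {"gte", ">="}, {"lt", "<"}, {"lte", "<="}
    };

    static String translate(String field, Map<String, ? extends Number> bounds) {
        List<String> parts = new ArrayList<>();
        for (String[] op : OPERATORS) {
            Number value = bounds.get(op[0]);
            if (value != null) {
                parts.add(field + " " + op[1] + " " + value);
            }
        }
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("range query needs at least one of gt, gte, lt, lte");
        }
        return String.join(" and ", parts);
    }
}
```

Under these assumptions, bounds of `gte: 18` and `lte: 65` on the field `age` become `age >= 18 and age <= 65`.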
**Compound Queries:**
- `bool` - Boolean query with must, should, must_not, and filter clauses
  - `must` - All clauses must match (AND)
  - `should` - At least one clause should match (OR)
  - `must_not` - Clauses must not match (NOT)
  - `filter` - Clauses must match (like must, without scoring)
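
The clause semantics described above can be sketched as a function that combines already-translated YQL conditions. The `BoolToYqlSketch` class is hypothetical and follows this README's description literally: `must` and `filter` are joined with `and`, `should` becomes a required `or` group, and `must_not` clauses are negated. OpenSearch itself treats `should` as optional when `must` or `filter` is present, and the sketch ignores scoring differences between `must` and `filter`; whether the project handles either case is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical combination of translated clauses into one YQL where-expression.
final class BoolToYqlSketch {
    static String combine(List<String> must, List<String> should,
                          List<String> mustNot, List<String> filter) {
        List<String> positive = new ArrayList<>();
        for (String clause : must) {
            positive.add("(" + clause + ")");
        }
        for (String clause : filter) {
            positive.add("(" + clause + ")");
        }
        if (!should.isEmpty()) {
            List<String> alternatives = new ArrayList<>();
            for (String clause : should) {
                alternatives.add("(" + clause + ")");
            }
            positive.add("(" + String.join(" or ", alternatives) + ")");
        }
        if (positive.isEmpty()) {
            // A negation needs something to subtract from, so start from "match everything".
            positive.add("true");
        }
        StringBuilder yql = new StringBuilder(String.join(" and ", positive));
        for (String clause : mustNot) {
            yql.append(" and !(").append(clause).append(")");
        }
        return yql.toString();
    }
}
```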
**Special Features:**
- User-supplied values are escaped to prevent YQL injection
- Nested `bool` queries
- Combinations of the query types listed above
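
To illustrate what escaping against YQL injection can look like, the sketch below quotes a user-supplied value as a YQL string literal and rejects field names that are not plain identifiers. The `YqlEscapeSketch` class, the set of characters it escapes, and the field-name pattern are assumptions for illustration; the project's own escaping rules may differ.

```java
import java.util.regex.Pattern;

// Hypothetical escaping helper for building YQL conditions from user input.
final class YqlEscapeSketch {
    // Assumed shape of a safe field name: identifier characters and dots only.
    private static final Pattern FIELD = Pattern.compile("[A-Za-z_][A-Za-z0-9_.]*");

    /** Wraps the value in double quotes, escaping backslashes, quotes and control characters. */
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            switch (ch) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(ch);
            }
        }
        return sb.append('"').toString();
    }

    /** Builds `field contains "value"` after validating the field name. */
    static String contains(String field, String value) {
        if (!FIELD.matcher(field).matches()) {
            throw new IllegalArgumentException("Invalid field name: " + field);
        }
        return field + " contains " + quote(value);
    }
}
```

A value that tries to close the string literal early stays inside it after quoting, because its embedded double quotes are emitted as `\"`.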
### Query Examples
```json
// Bool query with multiple clauses
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "search" } }
      ],
      "should": [
        { "match": { "content": "vespa" } },
        { "match": { "description": "engine" } }
      ],
      "must_not": [
        { "term": { "status": "deleted" } }
      ],
      "filter": [
        { "range": { "created_at": { "gte": "2024-01-01" } } }
      ]
    }
  }
}
// Range query
{
  "query": {
    "range": {
      "age": {
        "gte": 18,
        "lte": 65
      }
    }
  }
}
// Multi-match query
{
  "query": {
    "multi_match": {
      "query": "search engine",
      "fields": ["title", "content", "description"]
    }
  }
}
```
## Limitations
- Index metadata (settings, mappings) is kept in memory, because Vespa schemas are static and cannot be changed through the API
- Some advanced OpenSearch features are not yet fully supported:
  - Aggregations (coming soon)
  - Suggesters
  - Percolate queries
  - Scripting
  - Nested/Parent-child documents
  - Advanced query string syntax (partial support only)
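
The first limitation can be pictured with a small in-memory registry like the one below. It is a sketch under stated assumptions, not the project's implementation: the `IndexMetadataSketch` class and its methods are hypothetical. Because such a registry lives only in the running process, its contents would presumably be lost when the container restarts.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory store for index mappings, keyed by index name.
final class IndexMetadataSketch {
    private final Map<String, Map<String, Object>> mappingsByIndex = new ConcurrentHashMap<>();

    /** Registers an index; returns false when it already exists. */
    boolean create(String index, Map<String, ?> mappings) {
        Map<String, Object> copy = Map.copyOf(mappings); // immutable snapshot
        return mappingsByIndex.putIfAbsent(index, copy) == null;
    }

    boolean exists(String index) {
        return mappingsByIndex.containsKey(index);
    }

    /** Returns the stored mappings, or null for an unknown index. */
    Map<String, Object> mappings(String index) {
        return mappingsByIndex.get(index);
    }

    /** Removes an index; returns false when it was not registered. */
    boolean delete(String index) {
        return mappingsByIndex.remove(index) != null;
    }
}
```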
## License
This project is under development. License information will be provided in future releases.
## Contributing
Contributions are welcome! Please submit pull requests or open issues for bugs and feature requests.