An open API service indexing awesome lists of open source software.

https://github.com/datashaman/opensearch-model


https://github.com/datashaman/opensearch-model

Last synced: 10 months ago
JSON representation

Awesome Lists containing this project

README

          

# datashaman/opensearch-model

Laravel-oriented implementation of [opensearch-model](https://github.com/elastic/elasticsearch-rails/tree/master/elasticsearch-model).

Supports Laravel 5.4 and higher on PHP 7.0/7.1. PHP 7.2 will be supported and tested once Travis directly supports it.

Note that at least PHP 7.1 is required for Laravel 5.6+.

[![Build Status](https://travis-ci.org/datashaman/opensearch-model.svg?branch=master)](https://travis-ci.org/datashaman/opensearch-model)
[![StyleCI](https://styleci.io/repos/61363628/shield?style=flat)](https://styleci.io/repos/61363628)

*NB* This is currently *BETA* quality software. Use on production at your own risk, and be aware that there might some
further simplifications of the API in the next version or two.

The idea is to stay fairly faithful to the *Ruby on Rails* implementation, but housed in *Laravel*.

## Installation

Install the package using composer:

composer require datashaman/opensearch-model

Configure the service provider in `config/app.php`:

```
...
Datashaman\OpenSearch\Model\ServiceProvider::class,
...
```

Configure the alias in `config.app.php`:

```
...
'OpenSearch' => Datashaman\OpenSearch\Model\OpenSearchFacade::class,
...
```

Copy base config into your applicatin:

php artisan vendor:publish --tag=config --provider='Datashaman\OpenSearch\Model\ServiceProvider'

Edit `config/opensearch.php` to your liking, setting `OPENSEARCH_HOSTS` (comma-delimited definition of host:port) in `.env` should cover most use cases.

## Usage

Let's suppose you have an `Article` model:

```php
Schema::create('articles', function (Blueprint $table) {
$table->increments('id');
$table->string('title');
});

class Article extends Eloquent
{
}

Article::create([ 'title' => 'Quick brown fox' ]);
Article::create([ 'title' => 'Fast black dogs' ]);
Article::create([ 'title' => 'Swift green frogs' ]);
```

## Setup

To add the OpenSearch integration for this model, use the `Datashaman\OpenSearch\Model\OpenSearchModel` trait in your class. You must also add a protected static `$opensearch` property for storage:

```php
use Datashaman\OpenSearch\Model\OpenSearchModel;

class Article extends Eloquent
{
use OpenSearchModel;
protected static $opensearch;
}
```

This will extend the model with functionality related to OpenSearch.

### Proxy

The package contains a big amount of class and instance methods to provide all this functionality.

To prevent polluting your model namespace, *nearly* all functionality is accessed via static method `Article::opensearch()`.

### OpenSearch client

The module will setup a [client](https://github.com/elasticsearch/elasticsearch-ruby/tree/master/elasticsearch), connected to `localhost:9200`, by default. You can access and use it like any other `OpenSearch::Client`:

```php
Article::opensearch()->client()->cluster()->health();
=> [ "cluster_name" => "opensearch", "status" => "yellow", ... ]
```

To use a client with a different configuration, set a client for the model using `OpenSearch\ClientBuilder`:

```php
Article::opensearch()->client(ClientBuilder::fromConfig([ 'hosts' => [ 'api.server.org' ] ]));
```

### Importing the data

The first thing you'll want to do is import your data to the index:

```php
Article::opensearch()->import([ 'force' => true ]);
```

It's possible to import only records from a specific scope or query, transform the batch with the transform and preprocess options,
or re-create the index by deleting it and creating it with correct mapping with the force option -- look for examples in the method documentation.

No errors were reported during importing, so... let's search the index!

### Searching

For starters, we can try the *simple* type of search:

```php
$response = Article::search('fox dogs');

$response->took();
=> 3

$response->total();
=> 2

$response[0]->_score;
=> 0.02250402

$response[0]->title;
=> "Fast black dogs"
```

#### Search results

The returned `response` object is a rich wrapper around the JSON returned from OpenSearch, providing access to response metadata and the actual results (*hits*).

The `response` object delegates to an internal `LengthAwarePaginator`. You can get a `Collection` via the delegate `getCollection` method, althought the paginator also delegates mmethods to its `Collection` so either of these work:

```php
$response->results()
->map(function ($r) { return $r->title; })
->all();
=> ["Fast black dogs", "Quick brown fox"]

$response->getCollection()
->map(function ($r) { return $r->title; })
->all();
=> ["Fast black dogs", "Quick brown fox"]

$response->filter(function ($r) { return preg_match('/^Q/', $r->title); })
->map(function ($r) { return $r->title; })
->all();
=> ["Quick brown fox"]
```

As you can see in the examples above, use the `Collection::all()` method to get a regular array.

Each OpenSearch *hit* is wrapped in the `Result` class.

`Result` has a dynamic getter:

* *index*, *id*, *score* and *source* are pulled from the top-level of the *hit*.
e.g. *index* is *hit\[_index\]* etc
* if not one of the above, it looks for an existing item in the top-level hit.
e.g. *_version* is *hit\[_version\]* (if defined)
* if not one of the above, it looks for an existing item in *hit\[_source\]* \(the document\).
e.g. *title* is *hit\[_source\]\[title\]* (if defined)
* if nothing resolves from above, it triggers a notice and returns null

It also has a `toArray` method which returns the hit as an array.

#### Search results as database records

Instead of returning documents from OpenSearch, the records method will return a collection of model instances, fetched from the primary database, ordered by score:

```php
$response->records()
->map(function ($article) { return $article->title; })
->all();
=> ["Fast black dogs", "Quick brown fox"]
```

The returned object is a `Collection` of model instances returned by your database, i.e. the `Eloquent` instance.

The records method returns the real instances of your model, which is useful when you want to access your model methods - at the expense of slowing down your application, of course.

In most cases, working with results coming from OpenSearch is sufficient, and much faster.

When you want to access both the database `records` and search `results`, use the `eachWithHit` (or `mapWithHit`) iterator:

```php
$lines = [];
$response->records()->eachWithHit(function ($record, $hit) {
$lines[] = "* {$record->title}: {$hit->_score}";
});

$lines;
=> [ "* Fast black dogs: 0.01125201", "* Quick brown fox: 0.01125201" ]

$lines = $response->records()->mapWithHit(function ($record, $hit) {
return "* {$record->title}: {$hit->_score}";
})->all();

$lines;
=> [ "* Fast black dogs: 0.01125201", "* Quick brown fox: 0.01125201" ]
```

Note the use `Collection::all()` to convert to a regular array in the `mapWithHit` example. `Collection` methods prefer to return `Collection` instances instead of regular arrays.

The first argument to `records` is an `options` array, the second argument is a callback which is passed the query builder to modify it on-the-fly. For example, to re-order the records differently to the results (from above):

```php
$response
->records([], function ($query) {
$query->orderBy('title', 'desc');
})
->map(function ($article) { return $article->title; })
->all();

=> [ 'Quick brown fox', 'Fast black dogs' ]
```

Notice that adding an `orderBy` call to the query overrides the ordering of the records, so that it is no longer the same as the results.

#### Searching multiple models

**TODO** Implement a Facade for cross-model searching.

#### Pagination

You can implement pagination with the `from` and `size` search parameters. However, search results can be automatically paginated much like Laravel does.

```php
# Delegates to the results on page 2 with 20 per page
$response->perPage(20)->page(2);

# Records on page 2 with 20 per page; records ordered the same as results
# Order of the `page` and `perPage` calls doesn't matter
$response->page(2)->perPage(20)->records();

# Results on page 2 with (default) 15 results per page
$response->page(2)->results();

# Records on (default) page 1 with 10 records per page
$response->perPage(10)->records();
```

You have access to a length-aware paginator (the response delegates internally to the `results()` call, so you don't need to call results() on the chain):

```php
$response->page(2)->results();
=> object(Illuminate\Pagination\LengthAwarePaginator) ...

$results = response->page(2);

$results->setPath('/articles');
$results->render();
=>


```

The rendered HTML was tidied up slightly for readability.

#### The OpenSearch DSL

**TODO** Integrate this with a query builder.

### Index Configuration

For proper search engine function, it's often necessary to configure the index properly. This package provides class methods to set up index settings and mappings.

```php
Article::settings(['index' => ['number_of_shards' => 1]], function ($s) {
$s['index'] = array_merge($s['index'], [
'number_of_replicas' => 4,
]);
});

Article::settings->toArray();
=> [ 'index' => [ 'number_of_shards' => 1, 'number_of_replicas' => 4 ] ]

Article::mappings(['dynamic' => false], function ($m) {
$m->indexes('title', [
'analyzer' => 'english',
'index_options' => 'offsets'
]);
});

Article::mappings()->toArray();
=> [ "article" => [
"dynamic" => false,
"properties" => [
"title" => [
"analyzer" => "english",
"index_options" => "offsets",
"type" => "string",
]
]
]]
```

You can use the defined settings and mappings to create an index with desired configuration:

```php
Article::opensearch()->client()->indices()->delete(['index' => Article::indexName()]);
Article::opensearch()->client()->indices()->create([
'index' => Article::indexName(),
'body' => [
'settings' => Article::settings()->toArray(),
'mappings' => Article::mappings()->toArray(),
],
]);
```

There's a shortcut available for this common operation (convenient e.g. in tests):

```php
Article::opensearch()->createIndex(['force' => true]);
Article::opensearch()->refreshIndex();
```

By default, index name will be inferred from your class name, you can set it explicitely, however:

```php
class Article {
protected static $indexName = 'article-production';
protected static $documentType = 'post';
}
```

Alternately, you can set them using the following static methods:

```php
Article::indexName('article-production');
Article::documentType('post');
```

## Updating the Documents in the Index

Usually, we need to update the OpenSearch index when records in the database are created, updated or deleted; use the index_document, update_document and delete_document methods, respectively:

```php
Article::first()->indexDocument();
=> [ 'ok' => true, ... "_version" => 2 ]

Note that this implementation differs from the Ruby one, where the instance has an opensearch() method and proxy object. In this package, the instance methods are added directly to the model. Implementing the same pattern in PHP is not easy to do cleanly.

### Automatic callbacks

You can auomatically update the index whenever the record changes, by using the `Datashaman\\OpenSearch\\Model\\Callbacks` trait in your model:

```php
use Datashaman\OpenSearch\Model\OpenSearchModel;
use Datashaman\OpenSearch\Model\Callbacks;

class Article
{
use OpenSearchModel;
use Callbacks;
}

Article::first()->update([ 'title' => 'Updated!' ]);

Article::search('*')->map(function ($r) { return $r->title; });
=> [ 'Updated!', 'Fast black dogs', 'Swift green Frogs' ]
```

The automatic callback on record update keeps track of changes in your model (via Laravel's `getDirty` implementation), and performs a partial update when this support is available.

The automatic callbacks are implemented in database adapters coming with this package. You can easily implement your own adapter: please see the relevant chapter below.

### Custom Callbacks

In case you would need more control of the indexing process, you can implement these callbacks yourself, by hooking into `created`, `saved`, `updated` or `deleted` events:

```php
Article::saved(function ($article) {
$result = $article->indexDocument();
Log::debug("Saved document", compact('result'));
});

Article::deleted(function ($article) {
$result = $article->deleteDocument();
Log::debug("Deleted document", compact('result'));
});
```

Regrettably there are no `committed` events in `Eloquent` like in Ruby's `ActiveRecord`.

### Asychronous Callbacks

Of course, you're still performing an HTTP request during your database transaction, which is not optimal for large-scale applications. A better option would be to process the index operations in background, with Laravel's `Queue` facade:

```php
Article::saved(function ($article) {
Queue::pushOn('default', new Indexer('index', Article::class, $article->id));
});
```

An example implementation of the `Indexer` class could look like this (source included in package):

```php
class Indexer implements SelfHandling, ShouldQueue
{
use InteractsWithQueue, SerializesModels;

public function __construct($operation, $class, $id)
{
$this->operation = $operation;
$this->class = $class;
$this->id = $id;
}

public function handle()
{
$class = $this->class;

switch ($this->operation) {
case 'index':
$record = $class::find($this->id);
$class::opensearch()->client()->index([
'index' => $class::indexName(),
'id' => $record->id,
'body' => $record->toIndexedArray(),
]);
$record->indexDocument();
break;
case 'delete':
$class::opensearch()->client()->delete([
'index' => $class::indexName(),
'id' => $this->id,
]);
break;
default:
throw new Exception('Unknown operation: '.$this->operation);
}
}
}
```

## Model Serialization

By default, the model instance will be serialized to JSON using the output of the `toIndexedArray` method, which is defined automatically by the package:

```php
Article::first()->toIndexedArray();
=> [ 'title' => 'Quick brown fox' ]
```

If you want to customize the serialization, just implement the `toIndexedArray` method yourself, for instance with the `toArray` method:

```php
class Article
{
use OpenSearchModel;

public function toIndexedArray($options = null)
{
return $this->toArray();
}
}
```

The re-defined method will be used in the indexing methods, such as `indexDocument`.

## Attribution

Original design from [elasticsearch-model](https://github.com/elastic/elasticsearch-rails/tree/master/elasticsearch-model) which is:

* Copyright (c) 2014 OpenSearch
* Licensed with Apache 2.0 license (detail in LICENSE.txt)

Changes include a rewrite of the core logic in PHP, as well as slight enhancements to accomodate Laravel and Eloquent.

## License

This package inherits the same license as its original. It is licensed under the Apache2 license, quoted below:

Copyright (c) 2023 datashaman

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.