https://github.com/machinio/solrb
Solr + Ruby + OOP + ❤️ = Solrb
- Host: GitHub
- URL: https://github.com/machinio/solrb
- Owner: machinio
- License: mit
- Created: 2018-07-25T12:44:49.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-01-17T14:28:13.000Z (12 months ago)
- Last Synced: 2024-10-29T00:54:39.897Z (2 months ago)
- Topics: object-oriented, oop, ruby, solr, solr-client
- Language: Ruby
- Homepage:
- Size: 294 KB
- Stars: 39
- Watchers: 7
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
[![CircleCI](https://circleci.com/gh/machinio/solrb/tree/master.svg?style=svg)](https://circleci.com/gh/machinio/solrb/tree/master)
[![Maintainability](https://api.codeclimate.com/v1/badges/81e84c1c42f10f9da801/maintainability)](https://codeclimate.com/github/machinio/solrb/maintainability)
[![Gem Version](https://badge.fury.io/rb/solrb.svg)](https://badge.fury.io/rb/solrb)

Solrb
=====

Object-Oriented approach to Solr in Ruby.

## Table of contents
* [Installation](#installation)
* [Configuration](#configuration)
  * [Setting Solr URL via environment variable](#setting-solr-url-via-environment-variable)
  * [Single core configuration](#single-core-configuration)
  * [Multiple core configuration](#multiple-core-configuration)
  * [Solr Cloud](#solr-cloud)
  * [Master-slave](#master-slave)
    * [Gray list](#gray-list)
  * [Basic Authentication](#basic-authentication)
* [Indexing](#indexing)
* [Querying](#querying)
  * [Simple Query](#simple-query)
  * [Querying multiple cores](#querying-multiple-cores)
  * [Query with field boost](#query-with-field-boost)
  * [Query with filtering](#query-with-filtering)
  * [Query with sorting](#query-with-sorting)
  * [Query with grouping](#query-with-grouping)
  * [Query with facets](#query-with-facets)
  * [Query with boosting functions](#query-with-boosting-functions)
    * [Dictionary boosting function](#dictionary-boosting-function)
  * [Field list](#field-list)
* [Deleting documents](#deleting-documents)
* [Active Support instrumentation](#active-support-instrumentation)
* [Testing](#testing)
* [Running specs](#running-specs)

# Installation

Add `solrb` to your Gemfile:
```ruby
gem 'solrb'
```

If you are going to use Solrb with Solr Cloud:

```ruby
gem 'zk' # required for solrb solr-cloud integration
gem 'solrb'
```

# Configuration

## Setting Solr URL via environment variable

The simplest way to use Solrb is to set the `SOLR_URL` environment variable (which includes the core name):

```ruby
ENV['SOLR_URL'] = 'http://localhost:8983/solr/demo'
```

You can also use `Solr.configure` to specify the Solr URL explicitly:

```ruby
Solr.configure do |config|
  config.url = 'http://localhost:8983/solr/demo'
end
```

Note that fields that are not configured will be passed as-is to Solr.
*So you only need to specify fields in configuration if you want Solrb to modify them at runtime*.

## Single core configuration

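As a rough mental model of the field mapping that a core definition drives, the sketch below resolves a field name to a Solr field name. `SketchFieldMap` and `solr_name_for` are hypothetical names used only for illustration; they are not part of Solrb's API, and the real implementation may differ.

```ruby
# Hypothetical stand-in for a configured core; not Solrb's internal classes.
class SketchFieldMap
  def initialize
    @fields = {}
    @dynamic_fields = {}
  end

  # Registers a dynamic field pattern such as '*_text'.
  def dynamic_field(name, solr_name:)
    @dynamic_fields[name] = solr_name
  end

  # Registers a field that maps either to a fixed Solr name or to a dynamic field.
  def field(name, solr_name: nil, dynamic_field: nil)
    @fields[name] = { solr_name: solr_name, dynamic_field: dynamic_field }
  end

  # Resolves the name sent to Solr; unconfigured fields pass through unchanged.
  def solr_name_for(name)
    config = @fields[name]
    return name.to_s if config.nil?

    if config[:dynamic_field]
      pattern = @dynamic_fields.fetch(config[:dynamic_field])
      pattern.sub('*', name.to_s)
    else
      (config[:solr_name] || name).to_s
    end
  end
end

map = SketchFieldMap.new
map.dynamic_field :text, solr_name: '*_text'
map.field :title, dynamic_field: :text
map.field :tags, solr_name: :tags_array

puts map.solr_name_for(:title)
puts map.solr_name_for(:tags)
puts map.solr_name_for(:price)
```

The last call shows the pass-through case: `:price` was never configured, so its name is left untouched.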
Use `Solr.configure` for additional configuration:

```ruby
Solr.configure do |config|
  config.url = 'http://localhost:8983/solr/demo'

  # This gem uses Faraday to make requests to Solr. You can specify additional Faraday
  # options here.
  config.faraday_options = {}

  # Core's URL is 'http://localhost:8983/solr/demo'
  # Adding fields to work with
  config.define_core do |f|
    f.field :title, dynamic_field: :text
    f.dynamic_field :text, solr_name: '*_text'
  end
end
```

## Multiple core configuration

```ruby
Solr.configure do |config|
  config.url = 'http://localhost:8983/solr'

  # Define a core with fields that will be used with Solr.
  # Core URL is 'http://localhost:8983/solr/listings'
  config.define_core(name: :listings) do |f|
    # When a dynamic_field is present, the field name will be mapped to match the dynamic field.
    # Here, "title" will be mapped to "title_text"
    # You must define a dynamic field to be able to use the dynamic_field option
    f.field :title, dynamic_field: :text

    # When solr_name is present, the field name will be mapped to the solr_name at runtime
    f.field :tags, solr_name: :tags_array

    # define a dynamic field
    f.dynamic_field :text, solr_name: '*_text'
  end

  # Pass `default: true` to use one core as a default.
  # Core's URL is 'http://localhost:8983/solr/cars'
  config.define_core(name: :cars, default: true) do |f|
    f.field :manufacturer, solr_name: :manuf_s
    f.field :model, solr_name: :model_s
  end
end
```

Warning: Solrb doesn't support fields with the same name. If you have two fields with the same name mapping
to a single Solr field, you'll have to rename one of the fields.

```ruby
# ...
config.define_core do |f|
  # ...
  # Not allowed: Two fields with same name 'title'
  f.field :title, solr_name: :article_title
  f.field :title, solr_name: :page_title
end
# ...
```

## Solr Cloud

To enable Solr Cloud mode, you must define the ZooKeeper URLs in the Solr config block.
In Solr Cloud mode you don't need to provide a Solr URL (`config.url` or `ENV['SOLR_URL']`).
Solrb will watch the ZooKeeper state to receive up-to-date information about active Solr nodes, including their URLs.

You can also specify the ACL credentials for ZooKeeper. [More Information](https://lucene.apache.org/solr/guide/7_6/zookeeper-access-control.html#ZooKeeperAccessControl-AboutZooKeeperACLs)

```ruby
Solr.configure do |config|
  config.zookeeper_urls = ['localhost:2181', 'localhost:2182', 'localhost:2183']
  config.zookeeper_auth_user = 'zk_acl_user'
  config.zookeeper_auth_password = 'zk_acl_password'
end
```

If you are using the Puma web server in clustered mode, you must call `enable_solr_cloud!` in the `on_worker_boot`
callback so that each Puma worker connects to ZooKeeper.

```ruby
on_worker_boot do
  Solr.enable_solr_cloud!
end
```

## Master-slave

To enable master-slave mode, you must define a master URL and a slave URL in the Solr config block.
In master-slave mode you don't need to provide a Solr URL (`config.url` or `ENV['SOLR_URL']`).

```ruby
Solr.configure do |config|
  config.master_url = 'localhost:8983'
  config.slave_url = 'localhost:8984'
  # Disable select queries from master:
  config.disable_read_from_master = true
  # Specify Gray-list service
  config.nodes_gray_list = Solr::MasterSlave::NodesGrayList::InMemory.new
end
```

If you are using the Puma web server in clustered mode, you must call `enable_master_slave!` in the `on_worker_boot`
callback so that master-slave mode is enabled in each Puma worker.

```ruby
on_worker_boot do
  Solr.enable_master_slave!
end
```

### Gray list

Solrb provides two built-in gray-list services:
- `Solr::MasterSlave::NodesGrayList::Disabled`: disabled service (default). It does nothing.
- `Solr::MasterSlave::NodesGrayList::InMemory`: in-memory service. It stores failed URLs in an instance variable, so the list is not shared across threads or servers. URLs are marked as "gray" for 5 minutes, but if all URLs are gray, the policy will try to send requests to these URLs earlier.

You can implement your own service with the corresponding API.

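To make the in-memory behavior more concrete, here is a minimal sketch of a gray list with a 5-minute expiry and a fallback when every URL is gray. The class name `SketchGrayList` and the methods `add`, `gray?` and `select_active` are assumptions made for this illustration; check the Solrb source for the interface a custom service actually has to implement.

```ruby
# Illustrative only: a hypothetical gray list, not Solrb's implementation.
class SketchGrayList
  DEFAULT_TTL = 5 * 60 # seconds a URL stays gray

  # The clock is injectable so that expiry can be exercised without waiting.
  def initialize(ttl: DEFAULT_TTL, clock: -> { Time.now })
    @ttl = ttl
    @clock = clock
    @gray_until = {}
  end

  # Marks a URL as gray, starting from now.
  def add(url)
    @gray_until[url] = @clock.call + @ttl
  end

  # True while the URL's gray period has not expired.
  def gray?(url)
    expires_at = @gray_until[url]
    !expires_at.nil? && expires_at > @clock.call
  end

  # Returns the URLs that are not gray; if all are gray, returns all of them
  # so that requests are retried before the gray period ends.
  def select_active(urls)
    active = urls.reject { |url| gray?(url) }
    active.empty? ? urls : active
  end
end

list = SketchGrayList.new
list.add('http://localhost:8984')
p list.select_active(['http://localhost:8983', 'http://localhost:8984'])
```

Because the state lives in a plain hash on one object, each process keeps its own list, which matches the "not shared across threads or servers" caveat above.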
## Force node URL
You can force solrb to use a specific node URL with the `with_node_url` method:
```ruby
Solr.with_node_url('http://localhost:9000') do
  Solr::Query::Request.new(search_term: 'example', query_fields: query_fields).run
end
```

## Basic Authentication

Basic authentication is supported by Solrb. You can enable it by providing `auth_user` and `auth_password`
in the config block.

```ruby
Solr.configure do |config|
  config.auth_user = 'user'
  config.auth_password = 'password'
end
```

# Indexing

```ruby
# creates a single document and commits it to index
doc = Solr::Update::Commands::Add.new
doc.add_field(:id, 1)
doc.add_field(:name, 'Solrb!!!')

commit = Solr::Update::Commands::Commit.new
request = Solr::Update::Request.new([doc, commit])
request.run
```

You can also create an indexing document directly from attributes:

```ruby
doc = Solr::Update::Commands::Add.new(doc: { id: 5, name: 'John' })
```

# Querying

## Simple Query
```ruby
query_field = Solr::Query::Request::QueryField.new(field: :name)

request = Solr::Query::Request.new(search_term: 'term', query_fields: [query_field])
request.run(page: 1, page_size: 10)
```
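
Solr itself pages results with the `start` and `rows` parameters. The sketch below shows how `page` and `page_size` could translate into them, assuming 1-based page numbers; that Solrb performs exactly this conversion is an assumption, and `solr_paging_params` is a hypothetical helper, not a Solrb method.

```ruby
# Hypothetical helper: converts 1-based paging into Solr's start/rows parameters.
def solr_paging_params(page:, page_size:)
  raise ArgumentError, 'page must be >= 1' if page < 1
  raise ArgumentError, 'page_size must be >= 0' if page_size.negative?

  { start: (page - 1) * page_size, rows: page_size }
end

p solr_paging_params(page: 1, page_size: 10)
p solr_paging_params(page: 3, page_size: 10)
```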

## Querying multiple cores

For a multi-core configuration, use a `Solr.with_core` block:

```ruby
Solr.with_core(:models) do
  Solr.delete_by_id(3242343)
  Solr::Query::Request.new(search_term: 'term', query_fields: query_fields)
  Solr::Update::Request.new([doc])
end
```

## Query with field boost

```ruby
query_fields = [
  # Use boost_magnitude argument to apply boost to a specific field that you query
  Solr::Query::Request::QueryField.new(field: :name, boost_magnitude: 16),
  Solr::Query::Request::QueryField.new(field: :title)
]
request = Solr::Query::Request.new(search_term: 'term', query_fields: query_fields)
request.run(page: 1, page_size: 10)
```

## Query with filtering

```ruby
query_fields = [
  Solr::Query::Request::QueryField.new(field: :name),
  Solr::Query::Request::QueryField.new(field: :title)
]
filters = [Solr::Query::Request::Filter.new(type: :equal, field: :title, value: 'A title')]
request = Solr::Query::Request.new(search_term: 'term', query_fields: query_fields, filters: filters)
request.run(page: 1, page_size: 10)
```

### AND and OR filters

```ruby
usa_filter =
  Solr::Query::Request::AndFilter.new(
    Solr::Query::Request::Filter.new(type: :equal, field: :country, value: 'USA'),
    Solr::Query::Request::Filter.new(type: :equal, field: :region, value: 'Idaho')
  )
canada_filter =
  Solr::Query::Request::AndFilter.new(
    Solr::Query::Request::Filter.new(type: :equal, field: :country, value: 'Canada'),
    Solr::Query::Request::Filter.new(type: :equal, field: :region, value: 'Alberta')
  )

location_filters = Solr::Query::Request::OrFilter.new(usa_filter, canada_filter)
request = Solr::Query::Request.new(search_term: 'term', filters: location_filters)
request.run(page: 1, page_size: 10)
```

### Filtering by a Geofilt

```ruby
spatial_point = Solr::SpatialPoint.new(lat: 40.0, lon: -120.0)

filters = [
  Solr::Query::Request::Geofilt.new(field: :location, spatial_point: spatial_point, spatial_radius: 100)
]

request = Solr::Query::Request.new(search_term: 'term', filters: filters)
request.run(page: 1, page_size: 10)
```

### Filtering by an Arbitrary Rectangle

```ruby
spatial_rectangle = Solr::SpatialRectangle.new(
  top_left: Solr::SpatialPoint.new(lat: 40.0, lon: -120.0),
  bottom_right: Solr::SpatialPoint.new(lat: 30.0, lon: -110.0)
)

filters = [
  Solr::Query::Request::Filter.new(type: :equal, field: :location, value: spatial_rectangle)
]

request = Solr::Query::Request.new(search_term: 'term', filters: filters)
request.run(page: 1, page_size: 10)
```

## Query with sorting

```ruby
query_fields = [
  Solr::Query::Request::QueryField.new(field: :name),
  Solr::Query::Request::QueryField.new(field: :title)
]
sort_fields = [Solr::Query::Request::Sorting::Field.new(name: :name, direction: :asc)]
request = Solr::Query::Request.new(search_term: 'term', query_fields: query_fields)
request.sorting = Solr::Query::Request::Sorting.new(fields: sort_fields)
request.run(page: 1, page_size: 10)
```

The default sorting logic is as follows: nulls last, not-nulls first.

```ruby
query_fields = [
  Solr::Query::Request::QueryField.new(field: :name)
]
sort_fields = [
  Solr::Query::Request::Sorting::Field.new(name: :is_featured, direction: :desc),
  Solr::Query::Request::Sorting::Function.new(function: "score desc")
]
request = Solr::Query::Request.new(search_term: 'term', query_fields: query_fields)
request.sorting = Solr::Query::Request::Sorting.new(fields: sort_fields)
request.run(page: 1, page_size: 10)
```

## Query with grouping

```ruby
query_fields = [
  Solr::Query::Request::QueryField.new(field: :name),
  Solr::Query::Request::QueryField.new(field: :category)
]
request = Solr::Query::Request.new(search_term: 'term', query_fields: query_fields)
request.grouping = Solr::Query::Request::Grouping.new(field: :category, limit: 10)
request.run(page: 1, page_size: 10)
```

## Query with facets

```ruby
query_fields = [
  Solr::Query::Request::QueryField.new(field: :name),
  Solr::Query::Request::QueryField.new(field: :category)
]
request = Solr::Query::Request.new(search_term: 'term', query_fields: query_fields)
request.facets = [Solr::Query::Request::Facet.new(type: :terms, field: :category, options: { limit: 10 })]
request.run(page: 1, page_size: 10)
```

## Query with boosting functions

```ruby
query_fields = [
  Solr::Query::Request::QueryField.new(field: :name),
  Solr::Query::Request::QueryField.new(field: :category)
]
request = Solr::Query::Request.new(search_term: 'term', query_fields: query_fields)
request.boosting = Solr::Query::Request::Boosting.new(
  multiplicative_boost_functions: [Solr::Query::Request::Boosting::RankingFieldBoostFunction.new(field: :name)],
  phrase_boosts: [Solr::Query::Request::Boosting::PhraseProximityBoost.new(field: :category, boost_magnitude: 4)]
)
request.run(page: 1, page_size: 10)
```

### Dictionary boosting function

Sometimes you want dictionary-style boosting. For example, given a hash (dictionary)

```ruby
{3025 => 2.0, 3024 => 1.5, 3023 => 1.2}
```

and a field of `category_id`, the resulting boosting function will be:

```
if(eq(category_id_it, 3025), 2.0, if(eq(category_id_it, 3024), 1.5, if(eq(category_id_it, 3023), 1.2, 1)))
```
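
The nesting of that expression can be reproduced in a few lines of plain Ruby. This is only a sketch of the idea: `dictionary_boost_function` is a hypothetical helper, not Solrb's implementation, and it takes the already-mapped Solr field name (`category_id_it` here) as given. It emits the function without spaces.

```ruby
# Hypothetical helper: folds a {value => boost} hash into nested Solr if(eq(...)) calls.
# The innermost fallback is the default boost, used when no dictionary key matches.
def dictionary_boost_function(solr_field, dictionary, default: 1)
  dictionary.to_a.reverse.reduce(default.to_s) do |fallback, (value, boost)|
    "if(eq(#{solr_field},#{value}),#{boost},#{fallback})"
  end
end

puts dictionary_boost_function('category_id_it', { 3025 => 2.0, 3024 => 1.5, 3023 => 1.2 })
```

Reversing the pairs before folding keeps the first dictionary entry as the outermost `if`, so entries are checked in the order they were written.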

Note that the spaces in the `if(eq(...))` expression were added for readability; real Solr query functions must always be written without spaces.

Example of usage:

```ruby
category_id_boosts = {3025 => 2.0, 3024 => 1.5, 3023 => 1.2}
request.boosting = Solr::Query::Request::Boosting.new(
  multiplicative_boost_functions: [
    Solr::Query::Request::Boosting::DictionaryBoostFunction.new(field: :category_id,
                                                                 dictionary: category_id_boosts)
  ]
)
```

## Query with shards.preference

```ruby
shards_preference = Solr::Query::Request::ShardsPreference.new(
  properties: [
    Solr::Query::Request::ShardsPreferences::Property.new(name: 'replica.type', value: 'PULL')
  ]
)
request = Solr::Query::Request.new(search_term: 'term', shards_preference: shards_preference)
request.run(page: 1, page_size: 10)
```

## Field list

```ruby
query_fields = [
  Solr::Query::Request::QueryField.new(field: :name),
  Solr::Query::Request::QueryField.new(field: :category)
]
request = Solr::Query::Request.new(search_term: 'term', query_fields: query_fields)
# Solr::Query::Request will return only :id field by default.
# Specify additional return fields (fl param) by setting the request field_list
request.field_list = [:name, :category]
request.run(page: 1, page_size: 10)
```

# Deleting documents

```ruby
# Delete by document ID
Solr.delete_by_id(3242343)
Solr.delete_by_id(3242343, commit: true)

# Delete by query
Solr.delete_by_query('*:*')
Solr.delete_by_query('*:*', commit: true)

# Delete by filters
filters = [Solr::Query::Request::Filter.new(type: :equal, field: :country, value: 'Canada')]
commands = [Solr::Update::Commands::Delete.new(filters: filters)]
# `commit?` is a placeholder for your own condition
commands << Solr::Update::Commands::Commit.new if commit?
request = Solr::Update::Request.new(commands)
request.run
```

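For orientation, Solr's JSON update handler accepts delete and commit commands shaped like the ones built below. Whether Solrb sends exactly these payloads is an assumption; `delete_payload` is a hypothetical helper written for this illustration and is not part of the gem.

```ruby
require 'json'

# Hypothetical helper: builds a Solr JSON update body for a delete command,
# optionally followed by a commit.
def delete_payload(id: nil, query: nil, commit: false)
  raise ArgumentError, 'pass id or query' if id.nil? && query.nil?

  delete = id.nil? ? { query: query } : { id: id.to_s }
  body = { delete: delete }
  body[:commit] = {} if commit
  JSON.generate(body)
end

puts delete_payload(id: 3242343)
puts delete_payload(query: '*:*', commit: true)
```
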
# Active Support instrumentation
This gem publishes events via [Active Support Instrumentation](https://edgeguides.rubyonrails.org/active_support_instrumentation.html).
To subscribe to Solrb events, you can add this code to an initializer:

```ruby
ActiveSupport::Notifications.subscribe('request.solrb') do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  if Logger::INFO == Rails.logger.level
    Rails.logger.info("Solrb #{event.duration.round(1)}ms")
  elsif Logger::DEBUG == Rails.logger.level && Rails.env.development?
    Pry::ColorPrinter.pp(event.payload)
  end
end
```

# Testing

You can inspect the parameters of each Solr query request made through Solrb by requiring the
`solr/testing` file in your test suite. The query parameters are accessible via
`Solr::Testing.last_solr_request` after each request.

```ruby
require 'solr/testing'

RSpec.describe MyTest do
  let(:query) { Solr::Query::Request.new(search_term: 'Solrb') }

  it 'returns the last solr request params' do
    query.run(page: 1, page_size: 10)
    expect(Solr::Testing.last_solr_request.body[:params]).to eq({ ... })
  end
end
```

# Running specs

This project is set up to use CI to run all specs against a real Solr.
If you want to run them locally, you can either use the [CircleCI CLI](https://circleci.com/docs/2.0/local-cli/)
or do a completely manual setup (for up-to-date steps, see the CircleCI config):

```sh
docker run -it --name test-solr -p 8983:8983/tcp -t solr:8.11.2-slim

# Copy default configset
docker exec -u 0 $(docker ps | grep test-solr | cut -d ' ' -f 1) sh -c "mkdir /var/solr/data/configsets \
&& cp -R /opt/solr/server/solr/configsets/_default /var/solr/data/configsets/ \
&& chown -R solr:solr /var/solr/data/configsets"

# create a core
curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=test-core&configSet=_default'

# disable field guessing
curl http://localhost:8983/solr/test-core/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
SOLR_URL=http://localhost:8983/solr/test-core rspec
```