https://github.com/yannick-cw/elastic-indexer4s
Stream your stuff into elasticsearch
- Host: GitHub
- URL: https://github.com/yannick-cw/elastic-indexer4s
- Owner: yannick-cw
- License: mit
- Created: 2017-02-26T12:08:40.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-02-28T14:04:02.000Z (8 months ago)
- Last Synced: 2024-03-15T16:12:10.499Z (8 months ago)
- Topics: akka-streams, elastic4s, elasticsearch, scala, streaming
- Language: Scala
- Size: 157 KB
- Stars: 18
- Watchers: 3
- Forks: 6
- Open Issues: 61
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## ElasticIndexer4s
[![Build Status](https://travis-ci.org/yannick-cw/elastic-indexer4s.svg?branch=master)](https://travis-ci.org/yannick-cw/elastic-indexer4s)
[![Coverage Status](https://coveralls.io/repos/github/yannick-cw/elastic-indexer4s/badge.svg?branch=master)](https://coveralls.io/github/yannick-cw/elastic-indexer4s?branch=master)

### Overview
ElasticIndexer4s is a library that helps you write to elasticsearch from your Scala application.
Its main purpose is to spare you from writing your own implementation for streaming data to elasticsearch
and from taking care of switching the alias and deleting old indices.
It builds heavily on [elastic4s](https://github.com/sksamuel/elastic4s).

#### Why?
At work we had multiple Scala jobs that wrote elasticsearch indices. These jobs ran on a
nightly basis, and each one had its own implementation and follow-up logic for switching to the new alias and deleting old indices.
This was a perfect task to generalize, and that is exactly what this library does for you: it writes to elasticsearch and
afterwards optionally switches the alias to the newly written index and deletes old versions of the index.
The only things you have to provide are some settings, an `akka.stream.scaladsl.Source[A]` and a `com.sksamuel.elastic4s.Indexable[A]`.

### Getting Started
ElasticIndexer4s is currently available for Scala 2.12. If you need it for Scala 2.11, open an issue.
To get started with SBT, simply add the following to your `build.sbt` file:

```scala
libraryDependencies += "io.github.yannick-cw" % "elastic_indexer4s_2.12" % "0.6.4"
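// equivalently, assuming standard Scala cross-version publishing (note the
// "_2.12" suffix above), sbt's %% operator should resolve the same artifact:
// libraryDependencies += "io.github.yannick-cw" %% "elastic_indexer4s" % "0.6.4"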
```

First, you need a configuration:
```scala
import com.yannick_cw.elastic_indexer4s.elasticsearch.elasic_config.ElasticWriteConfig
import com.sksamuel.elastic4s.http.ElasticNodeEndpoint

// basic configuration, without mappings, settings or analyzers;
// this will create an index named with the prefix test_index_ plus a date suffix
val config = ElasticWriteConfig(
  esNodeEndpoints = List(ElasticNodeEndpoint("http", "localhost", 9200, None)),
  esTargetIndexPrefix = "test_index",
  esTargetType = "documents"
)
```

Then you'll need an Akka `Source` of data together with an `Indexable`, in this case using circe to transform to JSON:
```scala
import akka.stream.scaladsl.Source
import com.sksamuel.elastic4s.Indexable
import io.circe.generic.auto._
import io.circe.syntax._

case class Document(s: String)

object Document {
  implicit val indexable: Indexable[Document] = new Indexable[Document] {
    override def json(t: Document): String = t.asJson.noSpaces
  }
}

val dummySource = Source.single(Document("testDoc"))
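// In a real job the Source would typically stream many documents; with plain
// Akka Streams (not specific to this library) that could, for example, be:
// val documents = Source(List(Document("doc1"), Document("doc2")))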
```

In the last step you connect the `Source` and run the indexer with optional steps:
```scala
import scala.concurrent.Future

import com.yannick_cw.elastic_indexer4s.ElasticIndexer4s

// IndexError and RunResult are the library's result types
val runResult: Future[Either[IndexError, RunResult]] = ElasticIndexer4s(config)
  .from(dummySource)
  .switchAliasFrom(alias = "alias", minT = 0.95, maxT = 1.25)
  .deleteOldIndices(keep = 2, aliasProtection = true)
  .run
```

Your result will either be a `RunResult` if everything went fine or an `IndexError`, with detailed information on which steps succeeded and which failed.
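As a minimal sketch (using only standard Scala; the printing is just illustrative), you could inspect the outcome once the `Future` completes:

```scala
import scala.concurrent.ExecutionContext.Implicits.global

runResult.foreach {
  case Right(result) => println(s"indexing succeeded: $result")
  case Left(error)   => println(s"indexing failed: $error")
}
```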
The switching and deleting are optional.

### More Information
#### Possible configuration
| config name | meaning |
| ----------- | ------- |
| `elasticNodeEndpoints: List[ElasticNodeEndpoint]` | all elasticsearch nodes to connect to |
| `indexPrefix: String` | the prefix that will be used in the index name; a date will be added |
| `docType: String` | name for the type of documents to be written to elasticsearch; used as the elasticsearch `type` |
| `mappingSetting: MappingSetting` | all mappings and settings, see below for more details |
| `writeBatchSize: Int` | the number of documents written in one batch, defaults to 50 |
| `writeConcurrentRequest: Int` | the number of parallel requests to elasticsearch, defaults to 10 |
| `writeMaxAttempts: Int` | the number of retry attempts for a write before failure, defaults to 5 |
| `logWriteSpeedEvery: FiniteDuration` | the interval at which the number of written documents is logged, defaults to one minute |
| `waitForElasticTimeout: FiniteDuration` | time to wait for the document count before switching the alias, defaults to 5 seconds |
| `sniffCluster` | activate or deactivate the [sniffing feature](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/6.7/_usage.html) for the client |

The `MappingSetting` can be passed to specify mappings and settings for the index.
You can either pass a
```scala
case class TypedMappingSetting(
  analyzer: List[AnalyzerDefinition] = List.empty,
  mappings: List[MappingDefinition] = List.empty,
  shards: Option[Int] = None,
  replicas: Option[Int] = None
)
```
with analyzers and mappings defined in elastic4s syntax or pass a
```scala
StringMappingSetting.unsafeString(source: String): Either[ParsingFailure, MappingSetting]
```
which gives you a `ParsingFailure` on creation if the string can't be parsed as JSON.
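As a sketch of both variants (assuming `TypedMappingSetting` and `StringMappingSetting` live in the same `elasic_config` package as `ElasticWriteConfig`, and that the `ParsingFailure` is circe's; the JSON shape below is only illustrative):

```scala
import com.yannick_cw.elastic_indexer4s.elasticsearch.elasic_config.{
  StringMappingSetting,
  TypedMappingSetting
}

// explicit shard and replica counts, default (empty) analyzers and mappings
val typedSetting = TypedMappingSetting(shards = Some(5), replicas = Some(1))

// raw JSON settings and mappings, validated at creation time
val stringSetting = StringMappingSetting.unsafeString(
  """{
    |  "settings": { "number_of_shards": 5, "number_of_replicas": 1 },
    |  "mappings": { "documents": { "properties": { "s": { "type": "text" } } } }
    |}""".stripMargin
)
```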
The `.from([source])` method requires you to provide an `Indexable` for the elements to be indexed. Alternatively,
you can use `.fromBuilder` if you want to provide an implicit `RequestBuilder`. This allows for more control over
the indexing, e.g. you can specify which `id` or which `timestamp` to use.
#### Alias switching
You can add the `.switchAliasFrom` step to the process of writing, if you want to switch an alias from
an old index to the newly written one. If no old index with this alias was found, the new index gets
the alias without switching.
There are three parameters you can pass to
```scala
.switchAliasFrom(alias = "alias", minT = 0.95, maxT = 1.25)
```
The `alias` to switch from and add to the new index, a `minT` threshold defining the minimum document count of the
new index as a fraction of the old index's count, and a `maxT` threshold defining the maximum document count of the
new index, again as a fraction of the old index's count, for the switch to be allowed. For example, with `minT = 0.95`
and `maxT = 1.25`, an old index holding 1,000,000 documents only hands over its alias if the new index contains
between 950,000 and 1,250,000 documents.

#### Index deletion
Another optional step is `.deleteOldIndices`, which allows you to clean up old indices.
It has two parameters:
```scala
.deleteOldIndices(keep = 2, aliasProtection = true)
```
where `keep` defines how many indices with the configured `indexPrefix` are kept in the cluster, and
`aliasProtection`, if set to `true`, prevents deletion of any index that has an alias set.

#### Akka Stream
For writing the stream to elasticsearch a `system` and `materializer` are needed. You can either pass them to
`ElasticIndexer4s` or let ElasticIndexer4s create its own system.
Additionally, you can pass in a `Decider` before starting to write, to control what happens on failure in the stream.
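A `Decider` here is the standard Akka Streams supervision decider (plain Akka API, not specific to this library); for example, to skip failing elements instead of aborting the whole stream:

```scala
import akka.stream.Supervision

// resume on errors considered recoverable, stop the stream on anything else
val decider: Supervision.Decider = {
  case _: IllegalArgumentException => Supervision.Resume
  case _                           => Supervision.Stop
}
```

It can then be passed to the indexer before connecting the source: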
```scala
ElasticIndexer4s(config)
  .withDecider(decider)
  .from(dummySource)
```

### Changelog
* 0.5.0: switched to elasticsearch 6.x.x, using the HTTP client instead of the TCP client
* 0.5.1: greatly increased the detail level of failure reporting
* 0.6.0: breaking change moving to elastic4s `6.3.3`, dropping all integration tests, as they are no longer supported by elastic4s (only by spinning up a cluster yourself or via embedded Docker)
* 0.6.5: updated lots of versions and fixed cluster sniffing for the REST client; also removed the cluster from the settings, as it is not supported by the REST client