https://github.com/xiaozhiliaoo/elasticsearch-plugins
Elasticsearch Custom Plugins
- Host: GitHub
- URL: https://github.com/xiaozhiliaoo/elasticsearch-plugins
- Owner: xiaozhiliaoo
- Created: 2022-11-27T10:41:59.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-11-28T02:39:17.000Z (over 2 years ago)
- Last Synced: 2025-02-16T08:27:43.004Z (3 months ago)
- Topics: elasticsearch, elasticsearch-plugins
- Language: Java
- Homepage:
- Size: 66.4 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
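# Elasticsearch Plugins (ES version 7.15.1)
[Single-character analyzer: onecharstandard](#single-character-analyzer-onecharstandard)
# Single-character analyzer: onecharstandard
By default, Elasticsearch analyzes `text` fields with the [StandardAnalyzer](https://www.elastic.co/guide/en/elasticsearch/reference/8.5/analysis-standard-analyzer.html). For a purely numeric value such as the mobile number "123456789", this analyzer emits the whole string as a single token, "123456789", so a query for "123" or "345" finds no documents. The StandardAnalyzer does accept a **max_token_length** setting, and setting it to 1 produces one token per character; the `_analyze` requests later in this section show the effect. This makes it possible to find documents by part of a mobile number. Repeating that configuration for every index is tedious, which is why the onecharstandard plugin was created. The manual configuration is shown first: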
## Create the index
```
PUT yourindex
{
  "mappings": {
    "properties": {
      "mobile": {
        "type": "text",
        "analyzer": "custom_onecharstandard",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_onecharstandard": {
          "type": "standard",
          "max_token_length": 1
        }
      }
    }
  }
}
```
## Analyze text
```
// Tokenized into 1, 2, 3, 4, 5, 6, 7, 8
POST _analyze
{
  "tokenizer": {
    "type": "standard",
    "max_token_length": 1
  },
  "text": "12345678"
}

// Tokenized into the single token 12345678
POST _analyze
{
  "analyzer": "standard",
  "text": "12345678"
}
```
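The difference between the two `_analyze` requests can be modeled in a few lines of Python. This is only a rough sketch: it handles ASCII letters and digits, not the full Unicode Text Segmentation rules, and the function name `standard_like_tokens` is made up for illustration. The default of 255 is the documented default of `max_token_length`.

```python
import re


def standard_like_tokens(text, max_token_length=255):
    """Rough model of the standard tokenizer for ASCII letters and digits.

    Each run of letters/digits becomes a token; a run longer than
    max_token_length is cut into pieces of at most that many characters.
    """
    tokens = []
    for run in re.findall(r"[A-Za-z0-9]+", text):
        for start in range(0, len(run), max_token_length):
            tokens.append(run[start:start + max_token_length])
    return tokens


# Default limit: the whole number stays a single token.
print(standard_like_tokens("12345678"))
# Limit of 1: one token per digit.
print(standard_like_tokens("12345678", max_token_length=1))
```

With the limit set to 1, every digit becomes its own searchable token, which is what makes partial-number lookups possible.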
## Index documents
```
// doc1
POST yourindex/_doc
{
  "mobile": "1"
}

// doc2
POST yourindex/_doc
{
  "mobile": "2"
}

// doc3
POST yourindex/_doc
{
  "mobile": "12"
}

// doc4
POST yourindex/_doc
{
  "mobile": "21"
}

// doc5
POST yourindex/_doc
{
  "mobile": "123"
}
```
## Search documents
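- Documents can be found by part of a mobile number (the last four digits are a common choice). The query must be a match_phrase (phrase) query, whose slop defaults to 0, and not a match query: match defaults to operator OR, so it also returns documents whose mobile value does not contain the queried digit sequence.

The inverted index for the five documents above:

| term | docId   |
| ---- | ------- |
| 1    | 1,3,4,5 |
| 2    | 2,3,4,5 |
| 3    | 5       |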
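The note on match_phrase versus match can be checked with a small positional-index model. This is a sketch in plain Python, not Elasticsearch itself: it assumes one token per character (as with `max_token_length` set to 1), ignores scoring, and the helper names `build_index`, `match` and `match_phrase` are hypothetical.

```python
from collections import defaultdict

# The five sample documents indexed earlier, keyed by doc number.
DOCS = {1: "1", 2: "2", 3: "12", 4: "21", 5: "123"}


def build_index(docs):
    """Map each single-character term to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            index[ch].setdefault(doc_id, []).append(pos)
    return index


INDEX = build_index(DOCS)


def match(query, operator="or"):
    """Docs containing any ("or") or all ("and") of the query's characters."""
    sets = [set(INDEX.get(ch, {})) for ch in query]
    if not sets:
        return []
    combine = set.union if operator == "or" else set.intersection
    return sorted(combine(*sets))


def match_phrase(query):
    """Docs where the query's characters occur adjacent and in order (slop 0)."""
    hits = []
    for doc_id in match(query, operator="and"):
        starts = INDEX[query[0]][doc_id]
        if any(
            all(start + i in INDEX[ch][doc_id] for i, ch in enumerate(query))
            for start in starts
        ):
            hits.append(doc_id)
    return hits


for term in "123":
    print(term, sorted(INDEX[term]))
print("match or :", match("12"))
print("match and:", match("12", operator="and"))
print("phrase   :", match_phrase("12"))
```

In this model, `match("12")` returns all five documents, `match("12", operator="and")` returns docs 3, 4 and 5, and `match_phrase("12")` returns docs 3 and 5. Only the phrase variant respects both order and adjacency, which is why it is the right choice for substring-style lookups.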
```
// match returns all documents, doc1 through doc5, because operator defaults to OR.
GET yourindex/_search
{
  "query": {
    "match": {
      "mobile": "12"
    }
  }
}

// match_phrase analyzes "12" into 1, 2 and requires both tokens, adjacent and in that order.
// It returns doc3 ("12") and doc5 ("123").
GET yourindex/_search
{
  "query": {
    "match_phrase": {
      "mobile": "12"
    }
  }
}

// match with operator "and" is similar to match_phrase but ignores order and adjacency,
// so it returns doc3, doc4 and doc5.
GET yourindex/_search
{
  "query": {
    "match": {
      "mobile": {
        "operator": "and",
        "query": "12"
      }
    }
  }
}

// No hits: the mobile field contains no token "12".
GET yourindex/_search
{
  "query": {
    "term": {
      "mobile": {
        "value": "12"
      }
    }
  }
}

// Returns only doc3, whose keyword value is exactly "12".
GET yourindex/_search
{
  "query": {
    "term": {
      "mobile.keyword": {
        "value": "12"
      }
    }
  }
}
```
- Because the mapping defines a keyword multi-field (fields: keyword), a term query can also be used for exact-value lookups.
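```
GET yourindex/_search
{
  "query": {
    "term": {
      "mobile.keyword": "123"
    }
  }
}
```
However, this approach requires configuring the analyzer in both mappings and settings whenever an index is created, which is inconvenient. The plugin therefore provides a new analyzer, **onecharstandard**: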
```
// The analyzer can be referenced directly at index creation; no custom analyzer has to be defined in settings.
PUT yourindex
{
  "mappings": {
    "properties": {
      "mobile": {
        "type": "text",
        "analyzer": "onecharstandard",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
```
# Plugin
[https://www.elastic.co/guide/en/elasticsearch/plugins/current/intro.html](https://www.elastic.co/guide/en/elasticsearch/plugins/current/intro.html)
## Running the plugin
1. Run the zip task.
2. Move the elasticsearch-plugins-0.0.1.jar file into the ESPlugin-es7.15.1-0.0.1.zip archive.
3. Install: `elasticsearch-plugin.bat install file:///D:/ELK/ESPlugin-es7.15.1-0.0.1.zip`
4. Restart Elasticsearch.
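Step 2 can be scripted rather than done by hand. The following is a minimal sketch using Python's standard `zipfile` module; the function name `add_jar_to_plugin_zip` is hypothetical, and placing the jar at the root of the archive is an assumption about the layout the zip task produces.

```python
import zipfile
from pathlib import Path


def add_jar_to_plugin_zip(zip_path, jar_path):
    """Append the jar to an existing plugin zip and return the entry names.

    Assumption: the jar belongs at the root of the archive.
    """
    jar = Path(jar_path)
    with zipfile.ZipFile(zip_path, mode="a") as archive:
        # Skip the write on re-runs so the archive never holds a duplicate entry.
        if jar.name not in archive.namelist():
            archive.write(jar, arcname=jar.name)
        return archive.namelist()
```

For the files named in the steps above, the call would be `add_jar_to_plugin_zip("ESPlugin-es7.15.1-0.0.1.zip", "elasticsearch-plugins-0.0.1.jar")`, run from the directory that contains both files.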
## Listing plugins
`elasticsearch-plugin.bat list`
## Removing the plugin
`elasticsearch-plugin.bat remove ESPlugin`
# About the Standard Analyzer
Official documentation: [Analysis Standard Analyzer](https://www.elastic.co/guide/en/elasticsearch/reference/8.5/analysis-standard-analyzer.html)

| Character Filter | Tokenizer | Token Filter |
| ---------------- | --------- | ------------ |
| None | [Standard Tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/8.5/analysis-standard-tokenizer.html) | [Lower Case Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/8.5/analysis-lowercase-tokenfilter.html) |
| | | [Stop Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/8.5/analysis-stop-tokenfilter.html) (disabled by default) |

The Standard Analyzer consists of the Standard Tokenizer and two token filters.
The Standard Tokenizer is based on the Unicode Text Segmentation algorithm, the lowercase token filter converts uppercase to lowercase, and the stop token filter removes stop words. The following request produces a similar effect with the stop filter enabled:
```
// Produces 2, quick, brown, foxes, jumped, over, lazy, dog's, bone: uppercase is lowercased and the stop word "the" is removed.
POST _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "stop"],
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```
# References
1. Official ES plugin examples: [https://github.com/elastic/elasticsearch/tree/main/plugins/examples](https://github.com/elastic/elasticsearch/tree/main/plugins/examples)
2. Official ES plugin documentation: [https://www.elastic.co/guide/en/elasticsearch/plugins/current/intro.html](https://www.elastic.co/guide/en/elasticsearch/plugins/current/intro.html)