https://github.com/jolicode/emoji-search
:smile: Emoji synonyms to build your own emoji-capable search engine (elasticsearch, solr, OpenSearch)
- Host: GitHub
- URL: https://github.com/jolicode/emoji-search
- Owner: jolicode
- License: mit
- Created: 2016-02-25T11:36:19.000Z (over 9 years ago)
- Default Branch: main
- Last Pushed: 2025-02-07T12:32:03.000Z (8 months ago)
- Last Synced: 2025-06-07T00:09:47.328Z (4 months ago)
- Topics: analyzer, cldr, elasticsearch, elasticsearch-plugin, emoji, emoticons, hacktoberfest, opensearch, plugin
- Language: PHP
- Homepage: https://jolicode.com/blog/elasticsearch-icu-now-understands-emoji
- Size: 24.4 MB
- Stars: 221
- Watchers: 25
- Forks: 64
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🙂 Emoji, flags & emoticons support for Elasticsearch
Add support for **emoji** and **flags** in any **Lucene** compatible search engine!
If you wish to search `🍩` to find **donuts** in your documents, you have come to the
right place. We offer synonym files ready to use in Elasticsearch and OpenSearch analyzers.
## Requirements to index emoji in Elasticsearch
There are no requirements for Elasticsearch >= 6.7.

Using an older version of Elasticsearch? See the table below.

| Version | Requirements |
|--------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| Elasticsearch >= 6.4 and < 6.7 | You need to install the official [ICU Plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html). See our [blog post about this change](https://jolicode.com/blog/elasticsearch-icu-now-understands-emoji). |
| Elasticsearch < 6.4 | You need our [custom ICU Tokenizer Plugin](https://github.com/jolicode/emoji-search/tree/6.2.4/esplugin), see our [blog post](http://jolicode.com/blog/search-for-emoji-with-elasticsearch) (2016). |

Run the following test to verify that you get 4 EMOJI tokens:

```json
GET _analyze
{
  "text": ["🍩 🇫🇷 👩‍🚒 🚣🏾‍♀"]
}
```

## The Synonyms, flags and emoticons

What you need to search with emoji is a way to expand them to words that can
match searches and documents, in **your language**. That's the goal of the
[synonym dictionaries](/synonyms).

We build Solr / Lucene compatible synonym files in all languages supported by
[Unicode CLDR](http://cldr.unicode.org/) so you can set them up in an analyzer.
A file looks like this:

```
👩‍🚒 => 👩‍🚒, firefighter, firetruck, woman
👩‍✈ => 👩‍✈, pilot, plane, woman
🥓 => 🥓, bacon, meat, food
🥔 => 🥔, potato, vegetable, food
😅 => 😅, cold, face, open, smile, sweat
😆 => 😆, face, laugh, mouth, open, satisfied, smile
🚎 => 🚎, bus, tram, trolley
🇫🇷 => 🇫🇷, france
🇬🇧 => 🇬🇧, united kingdom
```

For emoticons, use [this mapping](emoticons.txt) with a char_filter to replace
emoticons with emoji.

### Installation

Download the emoji and emoticon file you want from this repository and store
them in `PATH_TO_ES/config/analysis` (_or anywhere Elasticsearch can read_).

```
config
├── analysis
│ ├── cldr-emoji-annotation-synonyms-en.txt
│ └── emoticons.txt
├── elasticsearch.yml
...
```

Use them like this (this is a complete _English_ example with Elasticsearch >=
6.7):

```json
PUT /tweets
{
  "settings": {
    "analysis": {
      "filter": {
        "english_emoji": {
          "type": "synonym",
          "synonyms_path": "analysis/cldr-emoji-annotation-synonyms-en.txt"
        },
        "emoji_variation_selector_filter": {
          "type": "pattern_replace",
          "pattern": "\\uFE0E|\\uFE0F",
          "replacement": ""
        },
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_keywords": {
          "type": "keyword_marker",
          "keywords": ["example"]
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "english_with_emoji": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "emoji_variation_selector_filter",
            "english_emoji",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "english_with_emoji"
      }
    }
  }
}
```

You can now test the result with:

```json
GET tweets/_analyze
{
  "field": "content",
  "text": "🍩 🇫🇷 👩‍🚒 🚣🏾‍♀"
}
```

## How to contribute

### Build from CLDR SVN
You will need:
- php cli
- php zip, mbstring, xml and curl extensions
- a running Elasticsearch (`make start`)

Edit the tag in `tools/build-released.php` and run `php tools/build-released.php`.
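
After a build, a quick sanity check of the generated files can catch malformed rules early. The sketch below is not part of this repository's tooling: it is a minimal Python illustration of the `emoji => emoji, word, ...` rule format shown earlier, and the names `parse_rule` and `check_file` are hypothetical. It assumes one rule per line, `#` for comment lines, and no escaped commas inside terms.

```python
def parse_rule(line):
    """Split one explicit-mapping rule, e.g. '🥓 => 🥓, bacon, meat, food'.

    Returns (key, expansions), or None for blank lines and '#' comments.
    Assumption: terms never contain escaped commas ('\\,').
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if "=>" not in line:
        raise ValueError(f"not an explicit mapping: {line!r}")
    key, _, right = line.partition("=>")
    key = key.strip()
    expansions = [term.strip() for term in right.split(",")]
    if not key or any(not term for term in expansions):
        raise ValueError(f"empty key or term: {line!r}")
    return key, expansions


def check_file(path):
    """Parse every line of a synonym file and return the number of rules."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if parse_rule(line) is not None:
                count += 1
    return count
```

For example, `check_file("synonyms/cldr-emoji-annotation-synonyms-en.txt")` would raise a `ValueError` on the first malformed line, assuming the file lives at that path in your checkout.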
### Update emoticons
Run `php tools/build-emoticon.php`.
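
To try a regenerated `emoticons.txt`, one option is to load it through Elasticsearch's `mapping` char_filter, which reads its rules from `mappings_path`. The Python sketch below only builds and prints the settings body. The names `emoticons` and `emoticon_test` are hypothetical, and the path assumes the file was copied to `config/analysis` as described in the installation section.

```python
import json


def emoticon_analysis_settings(mappings_path="analysis/emoticons.txt"):
    """Return an `analysis` fragment that rewrites emoticons before tokenizing."""
    return {
        "analysis": {
            "char_filter": {
                "emoticons": {  # hypothetical char_filter name
                    "type": "mapping",
                    "mappings_path": mappings_path,
                }
            },
            "analyzer": {
                "emoticon_test": {  # hypothetical analyzer name
                    "type": "custom",
                    "char_filter": ["emoticons"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            },
        }
    }


if __name__ == "__main__":
    # Print a body suitable for creating a throwaway test index.
    print(json.dumps({"settings": emoticon_analysis_settings()}, indent=2))
```

The printed JSON can be sent as the body of an index creation request. The same `char_filter` entry can also be added to the `english_with_emoji` analyzer from the example above.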
## Licenses
Emoji data courtesy of CLDR. See [unicode-license.txt](unicode-license.txt) for
details. Some modifications have been made to the data, [see
here](https://github.com/jolicode/emoji-search/issues/6). Emoticon data is based on
[https://github.com/wooorm/emoticon/](https://github.com/wooorm/emoticon/)
(MIT).

This repository is distributed under the [MIT License](LICENSE). Feel free to use
and contribute as you please!