https://github.com/YadaYuki/omochi
Full text search engine from scratch by Goʕ◔ϖ◔ʔ (Just a toy) 😊
ddd ent go golang information-retrieval search search-engine
- Host: GitHub
- URL: https://github.com/YadaYuki/omochi
- Owner: YadaYuki
- License: mit
- Created: 2022-05-03T15:16:22.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-07-08T05:36:41.000Z (over 2 years ago)
- Last Synced: 2024-08-03T13:13:51.795Z (3 months ago)
- Topics: ddd, ent, go, golang, information-retrieval, search, search-engine
- Language: Go
- Homepage: https://zenn.dev/yukiyada/articles/7e2c67d8406f0d
- Size: 16.8 MB
- Stars: 100
- Watchers: 1
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Omochi 😊
Full text search engine from scratch by Golangʕ◔ϖ◔ʔ (Just a toy)
## ✨ Features
- Omochi is an inverted-index-based search engine written in Go.
- Any document can be searched once it has been indexed.
- Documents are searched through a RESTful API.
- Supported languages: English and Japanese.
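To illustrate what "inverted-index-based" means, here is a minimal in-memory sketch. It is not Omochi's actual implementation: the `InvertedIndex` type, its `Add` and `Search` methods, the naive whitespace tokenizer, and the sample titles are all assumptions made for illustration. Omochi itself persists its index in a database and handles Japanese text, which this sketch does not attempt.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// InvertedIndex maps each term to the set of document IDs containing it.
// Hypothetical structure for illustration; not Omochi's real schema.
type InvertedIndex map[string]map[int]struct{}

// Add tokenizes content naively (lowercase, split on whitespace) and
// records docID under every resulting term.
func (idx InvertedIndex) Add(docID int, content string) {
	for _, term := range strings.Fields(strings.ToLower(content)) {
		if idx[term] == nil {
			idx[term] = make(map[int]struct{})
		}
		idx[term][docID] = struct{}{}
	}
}

// Search returns matching document IDs in ascending order.
// With mode "And" a document must contain every keyword;
// any other mode is treated as "Or" (one keyword is enough).
func (idx InvertedIndex) Search(keywords []string, mode string) []int {
	hits := make(map[int]int)     // docID -> number of distinct keywords matched
	seen := make(map[string]bool) // guards against duplicate keywords
	distinct := 0
	for _, kw := range keywords {
		term := strings.ToLower(kw)
		if seen[term] {
			continue
		}
		seen[term] = true
		distinct++
		for id := range idx[term] {
			hits[id]++
		}
	}
	var ids []int
	for id, n := range hits {
		if mode != "And" || n == distinct {
			ids = append(ids, id)
		}
	}
	sort.Ints(ids)
	return ids
}

func main() {
	idx := InvertedIndex{}
	idx.Add(1, "Toy Story")
	idx.Add(2, "Toy Soldiers")
	idx.Add(3, "The NeverEnding Story")

	fmt.Println(idx.Search([]string{"toy", "story"}, "And")) // [1]
	fmt.Println(idx.Search([]string{"toy", "story"}, "Or"))  // [1 2 3]
}
```

Looking up a term is a single map access, so query cost depends on the number of matching documents rather than on the size of the whole collection.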
## 📍 Requirements
- [Golang](https://golang.org/) 1.18+
- [Docker](https://www.docker.com/) 20.10+
## 📦 Setup
#### **Create network**
Create a [Docker network](https://docs.docker.jp/engine/reference/commandline/network_create.html) named `omochi_network`:
```
$ docker network create omochi_network
```
#### **Database migration**
Omochi stores its inverted indexes and documents in [MariaDB](https://mariadb.org/) and uses [Ent](https://entgo.io/) as the ORM.
To migrate the database, first open a shell in the Docker container:
```
$ docker-compose run api bash
```
Then run the database migration:
```
$ go run ./cmd/migrate/migrate.go
```
#### **Seed data**
To let you try the search engine, this project provides two sample datasets in TSV format.
The English dataset is a **[Movie title dataset](https://github.com/YadaYuki/omochi/tree/yadayuki/add-readme/cmd/seeds/data/en/movie.tsv)**, and the Japanese dataset is a **[Doraemon comic title dataset](https://github.com/YadaYuki/omochi/blob/yadayuki/add-readme/cmd/seeds/data/ja/doraemon.tsv)**.
First, open a shell in the Docker container:
```
$ docker-compose run api bash
```
Then seed the data:
```
$ go run {path to seed.go}
```
To seed the Japanese dataset, replace `{path to seed.go}` with `./cmd/seeds/ja/seed.go`. For the English dataset, use `./cmd/seeds/eng/seed.go`.
## 🏇 Start Application
After completing the setup, start the application by running:
```
$ docker-compose up
```
The application starts a RESTful API that listens on port 8081.
## 🌎 How to use & Demo
After seeding the data, you can search documents by sending a GET request to `/v1/document/search`.
The query parameters are as follows:
- **`"keywords"`**: Keywords to search for. Separate multiple search terms with commas, as in `"hoge,fuga,piyo"`.
- **`"mode"`**: Search mode. The supported modes are `"And"` and `"Or"`.
#### Demo
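The request and response can also be handled from Go. The sketch below is illustrative only: `buildSearchURL`, `decodeSearchResponse`, and the `Document` and `SearchResponse` types are hypothetical names, and the field layout is inferred from the demo responses in this README rather than taken from Omochi's source. Note that `url.Values.Encode` percent-encodes the comma as `%2C`; the sketch assumes the server decodes query strings in the standard way.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
	"time"
)

// Document mirrors one entry of the "documents" array in the demo output.
// Hypothetical type inferred from the README's sample responses.
type Document struct {
	ID               int       `json:"id"`
	Content          string    `json:"content"`
	TokenizedContent []string  `json:"tokenized_content"`
	CreatedAt        time.Time `json:"created_at"`
	UpdatedAt        time.Time `json:"updated_at"`
}

// SearchResponse is the assumed top-level shape of a search response.
type SearchResponse struct {
	Documents []Document `json:"documents"`
}

// buildSearchURL joins the keywords with commas and appends the optional
// mode ("And" or "Or") as query parameters of /v1/document/search.
func buildSearchURL(baseURL string, keywords []string, mode string) string {
	q := url.Values{}
	q.Set("keywords", strings.Join(keywords, ","))
	if mode != "" {
		q.Set("mode", mode)
	}
	return baseURL + "/v1/document/search?" + q.Encode()
}

// decodeSearchResponse parses a JSON response body into SearchResponse.
func decodeSearchResponse(body []byte) (SearchResponse, error) {
	var res SearchResponse
	err := json.Unmarshal(body, &res)
	return res, err
}

func main() {
	fmt.Println(buildSearchURL("http://localhost:8081", []string{"toy", "story"}, "And"))
	// http://localhost:8081/v1/document/search?keywords=toy%2Cstory&mode=And

	// One document copied from the demo output, used here instead of a live request.
	sample := []byte(`{"documents":[{"id":1,"content":"Toy Story",` +
		`"tokenized_content":["toy","story"],` +
		`"created_at":"2022-07-08T13:49:24+09:00",` +
		`"updated_at":"2022-07-08T13:49:24+09:00"}]}`)
	res, err := decodeSearchResponse(sample)
	if err != nil {
		panic(err)
	}
	for _, d := range res.Documents {
		fmt.Println(d.ID, d.Content, d.TokenizedContent)
	}
}
```

To query a running instance, pass the URL from `buildSearchURL` to `http.Get` and feed the response body to `decodeSearchResponse`.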
- **Doraemon comic title dataset**
After seeding the **Doraemon comic title dataset**, you can search for documents that include "ドラえもん" with:
```
$ curl "http://localhost:8081/v1/document/search?keywords=ドラえもん" | jq .
{
  "documents": [
    {
      "id": 12054,
      "content": "ドラえもんの歌",
      "tokenized_content": [
        "ドラえもん",
        "歌"
      ],
      "created_at": "2022-07-08T12:59:49+09:00",
      "updated_at": "2022-07-08T12:59:49+09:00"
    },
    {
      "id": 11992,
      "content": "恋するドラえもん",
      "tokenized_content": [
        "恋する",
        "ドラえもん"
      ],
      "created_at": "2022-07-08T12:59:48+09:00",
      "updated_at": "2022-07-08T12:59:48+09:00"
    },
    {
      "id": 11230,
      "content": "ドラえもん登場!",
      "tokenized_content": [
        "ドラえもん",
        "登場"
      ],
      "created_at": "2022-07-08T12:59:44+09:00",
      "updated_at": "2022-07-08T12:59:44+09:00"
    },
    ...
```
- **Movie title dataset**
After seeding the **Movie title dataset**, you can search for documents that include both "toy" and "story" with:
```
$ curl "http://localhost:8081/v1/document/search?keywords=toy,story&mode=And" | jq .
{
  "documents": [
    {
      "id": 1,
      "content": "Toy Story",
      "tokenized_content": [
        "toy",
        "story"
      ],
      "created_at": "2022-07-08T13:49:24+09:00",
      "updated_at": "2022-07-08T13:49:24+09:00"
    },
    {
      "id": 39,
      "content": "Toy Story of Terror!",
      "tokenized_content": [
        "toy",
        "story",
        "terror"
      ],
      "created_at": "2022-07-08T13:49:34+09:00",
      "updated_at": "2022-07-08T13:49:34+09:00"
    },
    {
      "id": 83,
      "content": "Toy Story That Time Forgot",
      "tokenized_content": [
        "toy",
        "story",
        "time",
        "forgot"
      ],
      "created_at": "2022-07-08T13:49:53+09:00",
      "updated_at": "2022-07-08T13:49:53+09:00"
    },
    {
      "id": 213,
      "content": "Toy Story 2",
      "tokenized_content": [
        "toy",
        "story"
      ],
      "created_at": "2022-07-08T13:50:35+09:00",
      "updated_at": "2022-07-08T13:50:35+09:00"
    },
    {
      "id": 352,
      "content": "Toy Story 3",
      "tokenized_content": [
        "toy",
        "story"
      ],
      "created_at": "2022-07-08T13:51:23+09:00",
      "updated_at": "2022-07-08T13:51:23+09:00"
    }
  ]
}
```
## 📚 Reference
#### Dataset
- Fujiko F. Fujio, Doraemon (Tentomushi Comics) 1-45, Shogakukan, 1974-1996
- Rounak Banik, "The Movies Dataset", Kaggle. https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset. Accessed on 07/08
#### Book
- [Information Retrieval: Implementing and Evaluating Search Engines](https://www.amazon.co.jp/Information-Retrieval-Implementing-Evaluating-Engines/dp/0262026511)
- [情報検索アルゴリズム](https://www.amazon.co.jp/%E6%83%85%E5%A0%B1%E6%A4%9C%E7%B4%A2%E3%82%A2%E3%83%AB%E3%82%B4%E3%83%AA%E3%82%BA%E3%83%A0-%E5%8C%97-%E7%A0%94%E4%BA%8C/dp/4320120361/ref=pd_lpo_3?pd_rd_i=4320120361&psc=1)
- [Pythonではじめる 情報検索プログラミング](https://www.amazon.co.jp/Python%E3%81%A7%E3%81%AF%E3%81%98%E3%82%81%E3%82%8B-%E6%83%85%E5%A0%B1%E6%A4%9C%E7%B4%A2%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9F%E3%83%B3%E3%82%B0-%E4%BD%90%E8%97%A4-%E9%80%B2%E4%B9%9F/dp/4627818610)
- [WEB+DB PRESS Vol.126. 特集 Goで作って学ぶ検索エンジン](https://www.amazon.co.jp/WEB-DB-PRESS-Vol-126-%E7%9C%9F%E5%A3%81/dp/4297125390)
- [検索エンジン自作入門 ~手を動かしながら見渡す検索の舞台裏](https://www.amazon.co.jp/%E6%A4%9C%E7%B4%A2%E3%82%A8%E3%83%B3%E3%82%B8%E3%83%B3%E8%87%AA%E4%BD%9C%E5%85%A5%E9%96%80-%E6%89%8B%E3%82%92%E5%8B%95%E3%81%8B%E3%81%97%E3%81%AA%E3%81%8C%E3%82%89%E8%A6%8B%E6%B8%A1%E3%81%99%E6%A4%9C%E7%B4%A2%E3%81%AE%E8%88%9E%E5%8F%B0%E8%A3%8F-%E5%B1%B1%E7%94%B0-%E6%B5%A9%E4%B9%8B/dp/4774167533)
## 🧑‍💻 License
[MIT](https://github.com/YadaYuki/omochi/blob/yadayuki/add-readme/LICENSE)