{"id":13676065,"url":"https://github.com/YadaYuki/omochi","last_synced_at":"2025-04-29T03:30:32.413Z","repository":{"id":36982673,"uuid":"488259772","full_name":"YadaYuki/omochi","owner":"YadaYuki","description":"Full text search engine from scratch by Goʕ◔ϖ◔ʔ (Just a toy) 😊","archived":false,"fork":false,"pushed_at":"2022-07-08T05:36:41.000Z","size":17584,"stargazers_count":102,"open_issues_count":0,"forks_count":5,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-11-11T17:45:42.054Z","etag":null,"topics":["ddd","ent","go","golang","information-retrieval","search","search-engine"],"latest_commit_sha":null,"homepage":"https://zenn.dev/yukiyada/articles/7e2c67d8406f0d","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/YadaYuki.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-05-03T15:16:22.000Z","updated_at":"2024-10-31T13:46:32.000Z","dependencies_parsed_at":"2022-08-08T18:31:26.171Z","dependency_job_id":null,"html_url":"https://github.com/YadaYuki/omochi","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YadaYuki%2Fomochi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YadaYuki%2Fomochi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YadaYuki%2Fomochi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YadaYuki%2Fomochi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/YadaYuki","download_url":"https://codeload.github.com/YadaYuki/omochi/tar.gz/
refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251426693,"owners_count":21587633,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ddd","ent","go","golang","information-retrieval","search","search-engine"],"created_at":"2024-08-02T13:00:17.575Z","updated_at":"2025-04-29T03:30:31.461Z","avatar_url":"https://github.com/YadaYuki.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n    \u003cimg height=200 src=\"https://user-images.githubusercontent.com/57289763/177349765-887dd049-f5cf-440f-9a57-e04161019759.png\" alt=\"Mochi grilled on a shichirin (charcoal grill)\"\u003e\n\u003c/div\u003e\n\n\u003ch1 align=\"center\"\u003eOmochi 😊\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\u003cstrong\u003eFull text search engine from scratch by Golangʕ◔ϖ◔ʔ (Just a toy)\u003c/strong\u003e\u003c/p\u003e\n\n## ✨ Features\n\n- Omochi is an inverted-index-based search engine written in Golang.\n- Any document can be searched once it has been indexed.\n- Documents are searched through a RESTful API.\n- Supported languages: English and Japanese.\n\n\u003cdiv align=\"center\"\u003e\n \u003cimg width=\"673\" alt=\"Screenshot 2022-07-08 11 08 15\" src=\"https://user-images.githubusercontent.com/57289763/177902420-998af3a1-1387-4943-8332-eea8fe90aca2.png\"\u003e\n\u003c/div\u003e\n\n\u003c!-- https://app.diagrams.net/#G15YMpFAnxuCpX0XI_Yjx7IuFPEc-k9Zf9 --\u003e\n\n## 📍 Requirements\n\n- [Golang](https://golang.org/) 1.18+\n- [Docker](https://www.docker.com/) 20.10+\n\n## 📦 Setup\n\n#### **Create network**\n\nCreate 
a [Docker network](https://docs.docker.jp/engine/reference/commandline/network_create.html) named `omochi_network`:\n```\n$ docker network create omochi_network\n```\n\n#### **Database migration**\n\nOmochi uses [MariaDB](https://mariadb.org/) to store inverted indexes and documents, and [Ent](https://entgo.io/) as its ORM.\n\nTo run the database migration, first open a shell in the `api` container:\n```\n$ docker-compose run api bash\n```\n\nThen run the migration:\n\n```\n$ go run ./cmd/migrate/migrate.go\n```\n\n#### **Seed data**\n\nTo try out the search engine, this project provides two sample datasets in TSV format.\n\nThe English sample is a **[Movie title dataset](https://github.com/YadaYuki/omochi/tree/yadayuki/add-readme/cmd/seeds/data/en/movie.tsv)**, and the Japanese sample is a **[Doraemon comic title dataset](https://github.com/YadaYuki/omochi/blob/yadayuki/add-readme/cmd/seeds/data/ja/doraemon.tsv)**.\n\nFirst, open a shell in the `api` container:\n\n```\n$ docker-compose run api bash\n```\n\nThen seed the data:\n\n```\n$ go run {path to seed.go}\n```\n\nReplace `{path to seed.go}` with `./cmd/seeds/ja/seed.go` for the Japanese dataset or `./cmd/seeds/eng/seed.go` for the English dataset.\n\n## 🏇 Start Application\n\nAfter completing the setup, start the application by running:\n\n```\n$ docker-compose up\n```\n\nThis starts the RESTful API server, which listens on port 8081.\n\n## 🌎 How to use \u0026 Demo\n\nOnce the data is seeded, you can search documents by sending a GET request to `/v1/document/search`.\n\nThe query parameters are as follows:\n\n- **`keywords`**: Keywords to search for. To specify multiple search terms, separate them with commas, e.g. `hoge,fuga,piyo`.\n- **`mode`**: Search mode. 
Supported values are `And` and `Or`.\n\n#### Demo\n\n- **Doraemon comic title dataset**\n\nAfter seeding the **Doraemon comic title dataset**, you can search for documents containing \"ドラえもん\" (Doraemon) with:\n```\n$ curl \"http://localhost:8081/v1/document/search?keywords=ドラえもん\" | jq .\n{\n  \"documents\": [\n    {\n      \"id\": 12054,\n      \"content\": \"ドラえもんの歌\",\n      \"tokenized_content\": [\n        \"ドラえもん\",\n        \"歌\"\n      ],\n      \"created_at\": \"2022-07-08T12:59:49+09:00\",\n      \"updated_at\": \"2022-07-08T12:59:49+09:00\"\n    },\n    {\n      \"id\": 11992,\n      \"content\": \"恋するドラえもん\",\n      \"tokenized_content\": [\n        \"恋する\",\n        \"ドラえもん\"\n      ],\n      \"created_at\": \"2022-07-08T12:59:48+09:00\",\n      \"updated_at\": \"2022-07-08T12:59:48+09:00\"\n    },\n    {\n      \"id\": 11230,\n      \"content\": \"ドラえもん登場！\",\n      \"tokenized_content\": [\n        \"ドラえもん\",\n        \"登場\"\n      ],\n      \"created_at\": \"2022-07-08T12:59:44+09:00\",\n      \"updated_at\": \"2022-07-08T12:59:44+09:00\"\n    },\n    ... 
\n```\n\n- **Movie title dataset**\n\nAfter seeding the **Movie title dataset**, you can search for documents containing both \"toy\" and \"story\" with:\n```\n$ curl \"http://localhost:8081/v1/document/search?keywords=toy,story\u0026mode=And\" | jq .\n{\n  \"documents\": [\n    {\n      \"id\": 1,\n      \"content\": \"Toy Story\",\n      \"tokenized_content\": [\n        \"toy\",\n        \"story\"\n      ],\n      \"created_at\": \"2022-07-08T13:49:24+09:00\",\n      \"updated_at\": \"2022-07-08T13:49:24+09:00\"\n    },\n    {\n      \"id\": 39,\n      \"content\": \"Toy Story of Terror!\",\n      \"tokenized_content\": [\n        \"toy\",\n        \"story\",\n        \"terror\"\n      ],\n      \"created_at\": \"2022-07-08T13:49:34+09:00\",\n      \"updated_at\": \"2022-07-08T13:49:34+09:00\"\n    },\n    {\n      \"id\": 83,\n      \"content\": \"Toy Story That Time Forgot\",\n      \"tokenized_content\": [\n        \"toy\",\n        \"story\",\n        \"time\",\n        \"forgot\"\n      ],\n      \"created_at\": \"2022-07-08T13:49:53+09:00\",\n      \"updated_at\": \"2022-07-08T13:49:53+09:00\"\n    },\n    {\n      \"id\": 213,\n      \"content\": \"Toy Story 2\",\n      \"tokenized_content\": [\n        \"toy\",\n        \"story\"\n      ],\n      \"created_at\": \"2022-07-08T13:50:35+09:00\",\n      \"updated_at\": \"2022-07-08T13:50:35+09:00\"\n    },\n    {\n      \"id\": 352,\n      \"content\": \"Toy Story 3\",\n      \"tokenized_content\": [\n        \"toy\",\n        \"story\"\n      ],\n      \"created_at\": \"2022-07-08T13:51:23+09:00\",\n      \"updated_at\": \"2022-07-08T13:51:23+09:00\"\n    }\n  ]\n}\n```\n\n## 📚 References\n\n#### Datasets\n\n- Fujiko F. Fujio. *Doraemon* (Tentomushi Comics), vols. 1–45. Shogakukan, 1974–1996.\n- Rounak Banik. \"The Movies Dataset\". Kaggle. https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset. 
Accessed on 07/08\n\n#### Books\n\n- [Information Retrieval: Implementing and Evaluating Search Engines](https://www.amazon.co.jp/Information-Retrieval-Implementing-Evaluating-Engines/dp/0262026511)\n- [情報検索アルゴリズム](https://www.amazon.co.jp/%E6%83%85%E5%A0%B1%E6%A4%9C%E7%B4%A2%E3%82%A2%E3%83%AB%E3%82%B4%E3%83%AA%E3%82%BA%E3%83%A0-%E5%8C%97-%E7%A0%94%E4%BA%8C/dp/4320120361/ref=pd_lpo_3?pd_rd_i=4320120361\u0026psc=1) (Information Retrieval Algorithms)\n- [Pythonではじめる 情報検索プログラミング](https://www.amazon.co.jp/Python%E3%81%A7%E3%81%AF%E3%81%98%E3%82%81%E3%82%8B-%E6%83%85%E5%A0%B1%E6%A4%9C%E7%B4%A2%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9F%E3%83%B3%E3%82%B0-%E4%BD%90%E8%97%A4-%E9%80%B2%E4%B9%9F/dp/4627818610) (Getting Started with Information Retrieval Programming in Python)\n- [WEB+DB PRESS Vol.126. 特集 Goで作って学ぶ検索エンジン](https://www.amazon.co.jp/WEB-DB-PRESS-Vol-126-%E7%9C%9F%E5%A3%81/dp/4297125390) (Special feature: Learning search engines by building one in Go)\n- [検索エンジン自作入門 ~手を動かしながら見渡す検索の舞台裏](https://www.amazon.co.jp/%E6%A4%9C%E7%B4%A2%E3%82%A8%E3%83%B3%E3%82%B8%E3%83%B3%E8%87%AA%E4%BD%9C%E5%85%A5%E9%96%80-%E6%89%8B%E3%82%92%E5%8B%95%E3%81%8B%E3%81%97%E3%81%AA%E3%81%8C%E3%82%89%E8%A6%8B%E6%B8%A1%E3%81%99%E6%A4%9C%E7%B4%A2%E3%81%AE%E8%88%9E%E5%8F%B0%E8%A3%8F-%E5%B1%B1%E7%94%B0-%E6%B5%A9%E4%B9%8B/dp/4774167533) (Introduction to Building Your Own Search Engine: A Hands-on Look Behind the Scenes of Search)\n\n## 🧑‍💻 License\n\n[MIT](https://github.com/YadaYuki/omochi/blob/yadayuki/add-readme/LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYadaYuki%2Fomochi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FYadaYuki%2Fomochi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYadaYuki%2Fomochi/lists"}