{"id":20703873,"url":"https://github.com/euiyounghwang/python-search_engine","last_synced_at":"2026-04-12T14:54:35.163Z","repository":{"id":211657411,"uuid":"729666831","full_name":"euiyounghwang/python-search_engine","owner":"euiyounghwang","description":"python-elasticsearch","archived":false,"fork":false,"pushed_at":"2024-04-06T01:53:53.000Z","size":60651,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-17T19:55:26.079Z","etag":null,"topics":["ansible","ansible-playbook","docker","docker-compose","elasticsearch","elasticsearch-curator","grpc","kafka","kafka-consumer","kafka-producer","logstash","poetry","pytest","python3","restapi-framework","shell-script","swagger","swagger-ui"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/euiyounghwang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":"support-diagnostics-8.0.3/diagnostics.bat","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-10T00:21:11.000Z","updated_at":"2024-02-10T05:26:56.000Z","dependencies_parsed_at":"2025-01-17T20:05:22.108Z","dependency_job_id":null,"html_url":"https://github.com/euiyounghwang/python-search_engine","commit_stats":null,"previous_names":["euiyounghwang/python-elasticsearch"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/euiyounghwang%2Fpython-search_engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/euiyounghwang%2Fpython-search_engine/tags","release
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/euiyounghwang%2Fpython-search_engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/euiyounghwang%2Fpython-search_engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/euiyounghwang","download_url":"https://codeload.github.com/euiyounghwang/python-search_engine/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242970887,"owners_count":20214887,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ansible","ansible-playbook","docker","docker-compose","elasticsearch","elasticsearch-curator","grpc","kafka","kafka-consumer","kafka-producer","logstash","poetry","pytest","python3","restapi-framework","shell-script","swagger","swagger-ui"],"created_at":"2024-11-17T01:09:53.098Z","updated_at":"2026-04-12T14:54:30.140Z","avatar_url":"https://github.com/euiyounghwang.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Python-Search-Engine\n\u003cimg src=\"https://img.shields.io/badge/fastapi-109989?style=for-the-badge\u0026logo=FASTAPI\u0026logoColor=white\"/\u003e \u003cimg src=\"https://img.shields.io/badge/Elastic_Search-005571?style=for-the-badge\u0026logo=elasticsearch\u0026logoColor=white\" /\u003e \u003cimg src=\"https://img.shields.io/badge/Apache_Kafka-231F20?style=for-the-badge\u0026logo=apache-kafka\u0026logoColor=white\" /\u003e \u003cimg 
src=\"https://img.shields.io/badge/Prometheus-000000?style=for-the-badge\u0026logo=prometheus\u0026labelColor=000000\" /\u003e \u003cimg src=\"https://img.shields.io/badge/Grafana-F2F4F9?style=for-the-badge\u0026logo=grafana\u0026logoColor=orange\u0026labelColor=F2F4F9\" /\u003e \u003cimg src=\"https://img.shields.io/badge/Docker-2CA5E0?style=for-the-badge\u0026logo=docker\u0026logoColor=white\" /\u003e \u003cimg src=\"https://img.shields.io/badge/Shell_Script-121011?style=for-the-badge\u0026logo=gnu-bash\u0026logoColor=white\" /\u003e \u003cimg src=\"https://img.shields.io/badge/Ansible-000000?style=for-the-badge\u0026logo=ansible\u0026logoColor=white\" /\u003e \u003cimg src=\"https://img.shields.io/badge/Swagger-85EA2D?style=for-the-badge\u0026logo=Swagger\u0026logoColor=white\" /\u003e\n\nThis project serves as a basic API for indexing and searching with Elasticsearch (`In 7.10, Elasticsearch released the point-in-time API`. As of 2021, starting with the 7.11 release, it is free under the Server Side Public License (SSPL) or the Elastic License).\n- Build a Docker service and a pytest instance with a Dockerfile for creating an index with sample datasets and searching with Elasticsearch\n- Run the project in a local environment using the `./service_start.sh` script\n- Build a Docker instance for testing with pytest using `./docker-compose.yml`, or follow the steps under `Install Service and Test with Elasticsearch Cluster based on Docker`\n- __\u003ci\u003e[Resharding Strategy]\u003c/i\u003e__ : To optimize the performance of Elasticsearch clusters, we have to consider a few questions: how should the index be organized? How many shards? How many replicas? Which mappings? Any other settings?  
(Optimize index, Index/Search Performance with the number of shards)\n- Estimate the number of primary/replica shards (one replica by default) with the resharding strategy by sending a POST request with the data size to `'/cluster/sharding_predict'`\n- We can also test OpenSearch \u0026 Dashboard with the Cerebro tool for implementing \u0026 monitoring after building the Docker instance, using a single node or multiple nodes with `./docker-compose.yml`\n- Script files (`https://github.com/euiyounghwang/python-search_engine/tree/master/tools`) for indexing with Elasticsearch and OpenSearch\n- Elasticsearch Curator : Curator (`./Curator/curator-run-job.sh`) is breaking into version-dependent releases. Curator 6.x will work with Elasticsearch 6.x, Curator 7.x will work with Elasticsearch 7.x, and when it is released, Curator 8.x will work with Elasticsearch 8.x. (\u003ci\u003ehttps://github.com/euiyounghwang/python-search_engine/blob/master/Curator/README.md\u003c/i\u003e)\n- Indexing through Logstash from Kafka Broker : Kafka Producer/Consumer (https://github.com/euiyounghwang/python-search_engine/blob/master/kafka/READMD.md), Kafka-Logstash-Elasticsearch, Kafka Consumer with the FastAPI framework, Kafka-Prometheus-Exporter for monitoring\n\n\n#### Install Poetry\n```bash\n# Official installer guide: https://python-poetry.org/docs/?ref=dylancastillo.co#installing-with-the-official-installer\n```\n\n#### Using Python Virtual Environment\n```bash\npython -m venv .venv\nsource .venv/bin/activate\n```\n\n#### Using Poetry: Create the virtual environment in the same directory as the project and install the basic dependencies:\n```bash\npoetry config virtualenvs.in-project true\npoetry init\npoetry add fastapi\npoetry add uvicorn\npoetry add pytz\npoetry add elasticsearch==7.10\npoetry add numpy\npoetry add pytest\npoetry add python-dotenv\npoetry add opensearch-py\n```\n\n#### Install this project's dependencies with Poetry\n```bash\nsource .venv/bin/activate\npoetry 
install  # use the --no-root option when building on CircleCI or in a Docker environment\n```\n\n#### Install Service and Test with Elasticsearch Cluster based on Docker\n- Build a single-node ES with Kibana based on version 7.10.0 (you can choose to build \u0026 create the REST-API service (`fn-elasticsearch-api`) and Pytest (`fn-elasticsearch-api-test`) instances using `./docker-compose.yml` to interact with the single-node ES cluster)\n```bash\ndocker run --name kibaba-run --network bridge -e \"ELASTICSEARCH_URL=http://host.docker.internal:9209\" -e \"ES_JAVA_OPTS=-Xms1g -Xmx1g\" -e \"ELASTICSEARCH_HOSTS=http://host.docker.internal:9209\" -p 5801:5601 docker.elastic.co/kibana/kibana:7.10.0\ndocker run --name es8-run --network bridge -p 9209:9200 -p 9114:9114 -p 9309:9300 -e \"http.cors.enabled=true\" -e \"http.cors.allow-origin=\\\"*\\\"\" -e \"http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization\" -e \"http.cors.allow-credentials=true\" -e \"xpack.security.enabled=false\" -e \"discovery.type=single-node\" -e \"ES_JAVA_OPTS=-Xms2g -Xmx2g\" docker.elastic.co/elasticsearch/elasticsearch:7.10.0\n```\n- Build a multi-node ES cluster based on version 7.10.0 with Kibana, REST-API service \u0026 Pytest instances: build \u0026 create the instances using `./docker-compose.yml`, or use `./docker-build.sh` to build the Docker image, `./docker-run.sh` to run the service, and `./docker-tests.sh` to test with pytest (you can also build a single-node cluster (`single_node`, `single_node_kibana`) without the xpack option, or a multi-node cluster with xpack; run `docker-compose -f ./create-certs.yml run --rm create_certs` to create the certs)\n\n\n#### Elasticsearch Cluster Diagnostics (Supported by Elasticsearch)\n- The support diagnostic utility is a Java application that can interrogate a running Elasticsearch cluster or Logstash process to obtain data about the state of the cluster at that point in time. 
It is compatible with all versions of Elasticsearch (including alpha, beta and release candidates), and for Logstash versions greater than 5.0, and for Kibana v6.5+. The release version of the diagnostic is independent of the Elasticsearch, Kibana or Logstash version it is being run against.\n- `Diagnostics Guide` \u003ci\u003e(https://olamideolajide.medium.com/how-to-collect-diagnostics-for-a-cloud-elasticsearch-cluster-4a20841a815a, https://github.com/elastic/support-diagnostics/releases/tag/8.0.3)\u003c/i\u003e\n```bash\npython-elasticsearch git:(master) ✗ ./support-diagnostics-8.0.3/diagnostics.sh --host localhost --port 9209\nUsing /usr/bin/java as Java Runtime\nUsing -Xms256m -Xmx2000m  for options.\nProcessing diagnosticInputs...\n\nCreating temp directory: /Users/euiyoung.hwang/ES/Python_Workspace/python-elasticsearch/local-diagnostics\nConfiguring log file.\nChecking for diagnostic version updates.\nIssue encountered while checking diagnostic version for updates.\nFailed to get current diagnostic version from Github.\nIf Github is not accessible from this environemnt current supported version cannot be confirmed.\nGetting Elasticsearch Version.\nChecking the supplied hostname against the node information retrieved to verify location. 
This may take some time.\n...\nResults written to: /Users/euiyoung.hwang/ES/Python_Workspace/python-elasticsearch/local-diagnostics/commercial/watcher_stack.json\nResults written to: /Users/euiyoung.hwang/ES/Python_Workspace/python-elasticsearch/local-diagnostics/commercial/xpack.json\nWriting diagnostic manifest.\nClosing logger.\nArchiving diagnostic results.\nArchive: /Users/euiyoung.hwang/ES/Python_Workspace/python-elasticsearch/local-diagnostics-20231215-160457.tar.gz was created\nDeleted directory: /Users/euiyoung.hwang/ES/Python_Workspace/python-elasticsearch/local-diagnostics.\n```\n- Extract diagnostics archives such as `local-diagnostics-20231215-160457.tar` into a folder\n\n#### Install OpenSearch based on Docker for testing\n- OpenSearch (\u003ci\u003ehttps://opensearch.org/docs/latest/install-and-configure/install-opensearch/docker\u003c/i\u003e) is a scalable, flexible, and extensible open-source software suite for search, analytics, and observability applications licensed under Apache 2.0.\n- Build a single-node OpenSearch with Dashboard (`http://localhost:5901`) based on the latest version (you can test it using `https://localhost:9250` or `curl https://localhost:9250 -ku 'admin:admin'`)\n```bash\ndocker run --name opensearch-es01 --network bridge -p 9250:9200 -e \"node.name=opensearch-es01\" -e \"discovery.type=single-node\" opensearchproject/opensearch\ndocker exec -it  opensearch-es01 /bin/bash -c /usr/share/opensearch/plugins/opensearch-security/tools/hash.sh\n# docker run --name opensearch-dashboard --network bridge -p 5901:5601 -e \"opensearch_hosts='[\\\"https://host.docker.internal:9250\\\"]'\" opensearchproject/opensearch-dashboards\ndocker run --name opensearch-dashboard --network bridge -p 5901:5601 -v /Users/euiyoung.hwang/ES/Python_Workspace/python-elasticsearch/custom-opensearch-dashboards.yml:/usr/share/opensearch-dashboards/config/opensearch_dashboards.yml opensearchproject/opensearch-dashboards\n```\n- Send requests to the server to verify that 
OpenSearch is up and running:\n```bash\ncurl -XGET 'https://localhost:9250' -u 'admin:admin' --insecure\ncurl -XGET 'https://localhost:9250/_cat/nodes?v' -u 'admin:admin' --insecure\ncurl -XGET 'https://localhost:9250/_cat/plugins?v' -u 'admin:admin' --insecure\n```\n\n#### Run Local Environment\n- Running the `./service_start.sh` script automatically validates the status of the Elasticsearch cluster using `./wait_for_es.sh`, and uses `./DevOps_Shell/read_config.sh` to read the ES host value from `./config.yaml`.\n```bash\nsource .venv/bin/activate\n(.venv) ➜  python-elasticsearch git:(master) ✗ ./service_start.sh\nget_value_from_yaml -\u003e  http://localhost:9209\nElasticSearch is up\nINFO:     Will watch for changes in these directories: ['/Users/euiyoung.hwang/ES/Python_Workspace/python-elasticsearch']\nWARNING:  \"workers\" flag is ignored when reloading is enabled.\nINFO:     Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)\nINFO:     Started reloader process [17469] using StatReload\n[2023-12-09 20:36:18,420] [INFO] [injector] [read_config_yaml] {\n  \"app\": {\n    \"es\": {\n      \"es_host\": \"http://localhost:9209\",\n      \"index\": {\n        \"alias\": \"metrics_search\"\n      }\n    }\n  }\n}\n[2023-12-09 20:36:18,421] [INFO] [config] [__init__] @@self.hosts - http://localhost:9209\nINFO:     Started server process [17471]\nINFO:     Waiting for application startup.\nINFO:     Application startup complete.\n```\n\n\n#### Swagger for REST-API\n![Alt text](./screenshot/Swagger_API.png)\n\n\n\n### Pytest\n- Activate the virtual environment using `source .venv/bin/activate`\n- Run this command manually: `poetry run py.test -v --junitxml=test-reports/junit/pytest.xml --cov-report html --cov tests/` or `./pytest.sh`\n```bash\n(.venv) ➜  python-elasticsearch git:(master) ./pytest.sh \n============================================= test session starts ==============================================\nplatform darwin -- Python 3.9.7, pytest-7.4.3, pluggy-1.3.0 
-- /Users/euiyoung.hwang/ES/Python_Workspace/python-elasticsearch/.venv/bin/python\ncachedir: .pytest_cache\nrootdir: /Users/euiyoung.hwang/ES/Python_Workspace/python-elasticsearch/tests\nconfigfile: pytest.ini\nplugins: cov-4.1.0, anyio-3.7.1\ncollected 7 items                                                                                              \n\ntests/test_api.py::test_skip SKIPPED (no way of currently testing this)                                  [ 14%]\ntests/test_api.py::test_api PASSED                                                                       [ 28%]\ntests/test_api.py::test_rest_api PASSED                                                                  [ 42%]\ntests/test_elasticsearch.py::test_elasticsearch PASSED                                                   [ 57%]\ntests/test_elasticsearch.py::test_indics_analyzer_elasticsearch PASSED                                   [ 71%]\ntests/test_elasticsearch.py::test_search_elasticsearch PASSED                                            [ 85%]\ntests/test_elasticsearch.py::test_api_es_search PASSED                                                   [100%]\n```\n\n- Badge : https://github.com/alexandresanlim/Badges4-README.md-Profile","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feuiyounghwang%2Fpython-search_engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feuiyounghwang%2Fpython-search_engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feuiyounghwang%2Fpython-search_engine/lists"}