{"id":29401767,"url":"https://github.com/ome/omero_search_engine","last_synced_at":"2025-10-05T18:34:44.697Z","repository":{"id":38190007,"uuid":"410822777","full_name":"ome/omero_search_engine","owner":"ome","description":"App which is used to search metadata (key-value pairs)","archived":false,"fork":false,"pushed_at":"2025-08-29T08:49:27.000Z","size":167178,"stargazers_count":1,"open_issues_count":23,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-09-06T00:40:33.441Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ome.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-09-27T09:33:18.000Z","updated_at":"2025-08-29T08:49:34.000Z","dependencies_parsed_at":"2023-09-24T03:47:51.610Z","dependency_job_id":"2a742418-fecb-4fba-8940-f04430329dd5","html_url":"https://github.com/ome/omero_search_engine","commit_stats":{"total_commits":571,"total_committers":4,"mean_commits":142.75,"dds":0.1996497373029772,"last_synced_commit":"d368bad2556f59531665091bf26d71fda4de886f"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/ome/omero_search_engine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ome%2Fomero_search_engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ome%2Fomero_search_engine/tags","releases_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ome%2Fomero_search_engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ome%2Fomero_search_engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ome","download_url":"https://codeload.github.com/ome/omero_search_engine/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ome%2Fomero_search_engine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278499400,"owners_count":25997322,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-10T16:33:32.224Z","updated_at":"2025-10-05T18:34:44.670Z","avatar_url":"https://github.com/ome.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":".. image:: https://github.com/ome/omero_search_engine/workflows/Build/badge.svg\r\n   :target: https://github.com/ome/omero_search_engine/actions\r\n\r\n.. 
image:: https://readthedocs.org/projects/omero-search-engine/badge/?version=latest\r\n    :target: https://omero-search-engine.readthedocs.io/en/latest/?badge=latest\r\n    :alt: Documentation Status\r\n\r\nOMERO Search Engine\r\n--------------------\r\n\r\n* OMERO search engine app is used to search metadata (``key-value`` pairs).\r\n* It leverages Elasticsearch, a distributed, free, and open-source search and analytics engine designed for large data volumes.\r\n* Most of the search operators are supported, e.g. equals, not equals, contains.\r\n* It allows users to run complex queries on the data using ‘and’ and ‘or’.\r\n* It enables search for images, projects, screens, plates and datasets.\r\n* It enables search for any attribute value, even when the specific attribute is unknown.\r\n\r\n* It supports indexing data from database servers, database backups, and CSV files. Support for JSON is currently being developed.\r\n\r\n* It supports searching data from multiple data sources.\r\n* Once indexed, data from any source becomes readily searchable.\r\n* The search results can be restricted to one or multiple data sources.\r\n\r\n* The search engine query is a dict that has three parts:\r\n\r\n  * The **first part** is ``and_filters``, which is a list. Each item in the list is a dict that has three keys:\r\n\r\n    * name: attribute name (name in annotation_mapvalue table)\r\n    * value: attribute value (value in annotation_mapvalue table)\r\n    * operator: the operator used to search the data, e.g. ``equals``, ``not_equals``, ``contains``, etc.\r\n  * The **second part** of the query is ``or_filters``; it holds alternative conditions and answers questions such as finding the images that satisfy one or more of the conditions inside a list. 
It is a list of dicts and has the same format as the dicts inside ``and_filters``.\r\n  * The **third part** is ``main_attributes``; it allows the user to search using one or more of ``project_id, dataset_id, group_id, owner_id``, etc. It supports two operators, ``equals`` and ``not_equals``. Hence, it is possible to search one project instead of all the projects. It is also possible to restrict the results to those which belong to a specific user or group.\r\n\r\n* The search engine returns the results in a JSON which has the following keys:\r\n\r\n  * ``notice``: reports a message to the sender which may include an error message.\r\n  * ``Error``: specific error message\r\n  * ``query_details``: The submitted query.\r\n  * ``resource``: The resource, e.g. image\r\n  * ``server_query_time``: The server query time in seconds\r\n  * ``results``: the query results, which is a dict with the following keys:\r\n\r\n    * ``size``: Total number of query results.\r\n    * ``page``: page number; a page contains a maximum of 10,000 records. If the results have more records, they are returned across more than one page.\r\n    * ``bookmark``: Used to request the next page if the results exceed 10,000 records.\r\n    * ``total_pages``: Total number of pages that contain the results.\r\n    * ``results``: a list that contains the results. Each item inside the list is a dict that contains the image id, name, and all the metadata (key-value pairs, i.e. \"key_values\") for the image. Each item also has other data such as the image owner id, group id, and project id and name.\r\n* It is possible to query the search engine to get all the available resources (e.g. image) and their keys (names) using the following URL::\r\n\r\n    http://127.0.0.1:5577/api/v1/resources/all/keys\r\n\r\n* The user can get the available values for a specific key for a resource, e.g. 
what are the available values for \"Organism\"::\r\n\r\n    http://127.0.0.1:5577/api/v1/resources/image/getannotationvalueskey/?key=Organism\r\n\r\n* The following Python script sends a query to the search engine and gets the results:\r\n\r\n.. code-block:: python\r\n\r\n    import logging\r\n    import requests\r\n    import json\r\n    import sys\r\n\r\n    # url to send the query\r\n    image_ext = \"/resources/image/searchannotation/\"\r\n    # url to get the next page for a query, bookmark is needed\r\n    image_page_ext = \"/resources/image/searchannotation_page/\"\r\n    # search engine url\r\n    base_url = \"http://127.0.0.1:5577/api/v1/\"\r\n\r\n    logging.basicConfig(stream=sys.stdout, level=logging.INFO)\r\n\r\n\r\n    def query_the_search_ending(query, main_attributes):\r\n        query_data = {\"query\": {'query_details': query,\"main_attributes\":main_attributes}}\r\n        query_data_json = json.dumps(query_data)\r\n        resp = requests.post(url=\"%s%s\" % (base_url, image_ext), data=query_data_json)\r\n        res = resp.text\r\n        try:\r\n            returned_results = json.loads(res)\r\n        except ValueError:\r\n            # the response body is not valid JSON, so log it as-is\r\n            logging.info(res)\r\n            return\r\n\r\n        if not returned_results.get(\"results\"):\r\n            logging.info(returned_results)\r\n            return\r\n\r\n        elif len(returned_results[\"results\"]) == 0:\r\n            logging.info(\"Your query returns no results\")\r\n            return\r\n\r\n        logging.info(\"Query results:\")\r\n        total_results = returned_results[\"results\"][\"size\"]\r\n        logging.info(\"Total no of result records %s\" % total_results)\r\n        logging.info(\"Server query time: %s seconds\" % returned_results[\"server_query_time\"])\r\n        logging.info(\"Included results in the current page %s\" % len(returned_results[\"results\"][\"results\"]))\r\n\r\n        received_results_data = 
[]\r\n        for res in returned_results[\"results\"][\"results\"]:\r\n            received_results_data.append(res)\r\n\r\n        received_results = len(returned_results[\"results\"][\"results\"])\r\n        #set the bookmark to be used in the next the page if the number of pages is greater than 1\r\n        bookmark = returned_results[\"results\"][\"bookmark\"]\r\n        #get the total number of pages\r\n        total_pages = returned_results[\"results\"][\"total_pages\"]\r\n        page = 1\r\n        logging.info(\"bookmark: %s, page: %s, received results: %s\" % (\r\n        bookmark, (str(page) + \"/\" + str(total_pages)), (str(received_results) + \"/\" + str(total_results))))\r\n        while received_results \u003c total_results:\r\n            page += 1\r\n            query_data = {\"query\": {'query_details': returned_results[\"query_details\"]}, \"bookmark\": bookmark}\r\n            query_data_json = json.dumps(query_data)\r\n            resp = requests.post(url=\"%s%s\" % (base_url, image_page_ext), data=query_data_json)\r\n            res = resp.text\r\n            try:\r\n                returned_results = json.loads(res)\r\n            except Exception as e:\r\n                logging.info(\"%s, Error: %s\"%(resp.text,e))\r\n                return\r\n            bookmark = returned_results[\"results\"][\"bookmark\"]\r\n            received_results = received_results + len(returned_results[\"results\"][\"results\"])\r\n            for res in returned_results[\"results\"][\"results\"]:\r\n                received_results_data.append(res)\r\n\r\n            logging.info(\"bookmark: %s, page: %s, received results: %s\" % (\r\n            bookmark, (str(page) + \"/\" + str(total_pages)), (str(received_results) + \"/\" + str(total_results))))\r\n\r\n        logging.info(\"Total received results: %s\" % len(received_results_data))\r\n        return received_results_data\r\n\r\n\r\n    query_1 = {\"and_filters\": [{\"name\": \"Organism\", \"value\": 
\"Homo sapiens\", \"operator\": \"equals\"},\r\n                               {\"name\": \"Antibody Identifier\", \"value\": \"CAB034889\", \"operator\": \"equals\"}],\r\n               \"or_filters\": [[{\"name\": \"Organism Part\", \"value\": \"Prostate\", \"operator\": \"equals\"},\r\n                              {\"name\": \"Organism Part Identifier\", \"value\": \"T-77100\", \"operator\": \"equals\"}]]}\r\n    query_2 = {\"and_filters\": [{\"name\": \"Organism\", \"value\": \"Mus musculus\", 'operator': 'equals'}]}\r\n    main_attributes=[]\r\n    logging.info(\"Sending the first query:\")\r\n    results_1 = query_the_search_ending(query_1,main_attributes)\r\n    logging.info(\"=========================\")\r\n    logging.info(\"Sending the second query:\")\r\n    results_2 = query_the_search_ending(query_2,main_attributes)\r\n    # The above returns 130834 within 23 projects\r\n    # [101, 301, 351, 352, 353, 405, 502, 504, 801, 851, 852, 853, 1151, 1158, 1159, 1201, 1202, 1451, 1605, 1606, 1701, 1902, 1903]\r\n    # It is possible to get the results in one project, e.g. 101, by using the main_attributes filter\r\n    main_attributes_2={ \"and_main_attributes\": [{\r\n        \"name\":\"project_id\",\"value\": 101, \"operator\":\"equals\"}]}\r\n    results_3=query_the_search_ending(query_2,main_attributes_2)\r\n    # It is possible to get the results and exclude one project, e.g. 
101\r\n    main_attributes_3={\"and_main_attributes\":[{\"name\":\"project_id\",\"value\": 101, \"operator\":\"not_equals\"}]}\r\n    results_4=query_the_search_ending(query_2,main_attributes_3)\r\n\r\n* There is a `simple GUI \u003chttps://github.com/ome/omero_search_engine_client/tree/elastic_search\u003e`_ to build the query and send it to the search engine\r\n\r\n  * It is used to build the query\r\n  * It will display the results when they are ready\r\n* The app uses Elasticsearch\r\n\r\n  * The method ``create_index`` inside `manage.py \u003cmanage.py\u003e`_ creates a separate index for image, project, dataset, screen, plate, and well using two templates:\r\n\r\n    * Image template (image_template) for the image index. It combines several OMERO tables (image, annotation_mapvalue, imageannotationlink, project, dataset, well, plate, and screen) into a single Elasticsearch index.\r\n    * Non-image template (non_image_template) for other indices (project, dataset, well, plate, screen). 
It is derived from some OMERO tables depending on the resource; for example, for the project index it combines project, projectannotationlink and annotation_mapvalue.\r\n    * Both of the templates are in `elasticsearch_templates.py \u003comero_search_engine/cache_functions/elasticsearch/elasticsearch_templates.py\u003e`_\r\n    * The data can be moved using SQL queries which generate the CSV files; the queries are in `sql_to_csv.py \u003comero_search_engine/cache_functions/elasticsearch/sql_to_csv.py\u003e`_\r\n    * The method ``add_resource_data_to_es_index`` inside `manage.py \u003cmanage.py\u003e`_ reads the CSV files and inserts the data into the Elasticsearch index.\r\n* The data can be transferred directly from the OMERO database to Elasticsearch using the ``get_index_data_from_database`` method inside `manage.py \u003cmanage.py\u003e`_:\r\n\r\n  * It creates the Elasticsearch indices for each resource\r\n  * It queries the OMERO database, then processes the returned data and pushes it to the Elasticsearch indices.\r\n  * This process can take a relatively long time, depending on the hosting machine specs. 
The user can adjust how many rows are processed per call to the OMERO database:\r\n\r\n    * Set the number of rows using the ``set_cache_rows_number`` method inside `manage.py \u003cmanage.py\u003e`_; the following example sets the number to 10000::\r\n\r\n        $ python manage.py set_cache_rows_number -s 10000\r\n* The system supports restoring a database from a backup using the ``restore_postgresql_database`` method inside `manage.py \u003cmanage.py\u003e`_.\r\n\r\n* The data can also be moved using SQL queries which generate the CSV files; the queries are in `sql_to_csv.py \u003comero_search_engine/cache_functions/elasticsearch/sql_to_csv.py\u003e`_\r\n\r\nFor the configuration and installation instructions, please read the following document `configuration_installation \u003cdocs/configuration/configuration_installation.rst\u003e`_\r\n\r\nDisclaimer\r\n----------\r\n\r\n* The search engine is currently intended to be used with public data.\r\n* There is no authentication or access control in place yet.\r\n* All the data in the Elasticsearch indices is exposed publicly.\r\n\r\nLicense\r\n-------\r\n\r\nOMERO search engine is released under the GPL v2.\r\n\r\nCopyright\r\n---------\r\n\r\n2022, The Open Microscopy Environment, Glencoe Software, Inc.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fome%2Fomero_search_engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fome%2Fomero_search_engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fome%2Fomero_search_engine/lists"}