{"id":21396171,"url":"https://github.com/machinelearningzh/ogd_ai-search","last_synced_at":"2026-02-03T06:34:13.922Z","repository":{"id":249530417,"uuid":"831329159","full_name":"machinelearningZH/ogd_ai-search","owner":"machinelearningZH","description":"Semantic, lexical, multilingual search in your OGD metadata catalog.","archived":false,"fork":false,"pushed_at":"2025-01-03T14:03:27.000Z","size":6491,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-07T17:06:19.283Z","etag":null,"topics":["ai","hybrid-search","machine-learning","ogd","openai","opendata","python","semantic-search","semanticsearch","streamlit","weaviate"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/machinelearningZH.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-20T08:33:19.000Z","updated_at":"2025-01-03T14:03:30.000Z","dependencies_parsed_at":"2024-11-22T21:16:45.147Z","dependency_job_id":null,"html_url":"https://github.com/machinelearningZH/ogd_ai-search","commit_stats":null,"previous_names":["machinelearningzh/ogd_ai-search"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/machinelearningZH/ogd_ai-search","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinelearningZH%2Fogd_ai-search","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinelearningZH%2Fogd_ai-search/tags","releases_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/machinelearningZH%2Fogd_ai-search/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinelearningZH%2Fogd_ai-search/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/machinelearningZH","download_url":"https://codeload.github.com/machinelearningZH/ogd_ai-search/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/machinelearningZH%2Fogd_ai-search/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260590310,"owners_count":23033035,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","hybrid-search","machine-learning","ogd","openai","opendata","python","semantic-search","semanticsearch","streamlit","weaviate"],"created_at":"2024-11-22T14:25:22.400Z","updated_at":"2026-02-03T06:34:13.917Z","avatar_url":"https://github.com/machinelearningZH.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🦄 OGD AI Search\n\n**Semantic, lexical, and multilingual search for your OGD metadata catalog.**\n\n![GitHub License](https://img.shields.io/github/license/machinelearningzh/ogd_ai-search)\n[![PyPI - Python](https://img.shields.io/badge/python-v3.10+-blue.svg)](https://github.com/machinelearningZH/ogd_ai-search)\n[![GitHub Stars](https://img.shields.io/github/stars/machinelearningZH/ogd_ai-search.svg)](https://github.com/machinelearningZH/ogd_ai-search/stargazers)\n[![GitHub 
Issues](https://img.shields.io/github/issues/machinelearningZH/ogd_ai-search.svg)](https://github.com/machinelearningZH/ogd_ai-search/issues)\n[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/machinelearningZH/ogd_ai-search.svg)](https://github.com/machinelearningZH/ogd_ai-search/pulls)\n[![Current Version](https://img.shields.io/badge/version-0.2-green.svg)](https://github.com/machinelearningZH/ogd_ai-search)\n\u003ca href=\"https://github.com/astral-sh/ruff\"\u003e\u003cimg alt=\"linting - Ruff\" class=\"off-glb\" loading=\"lazy\" src=\"https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json\"\u003e\u003c/a\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eContents\u003c/summary\u003e\n\n- [Usage](#usage)\n- [Overview](#overview)\n- [What is semantic search?](#what-is-semantic-search)\n- [Project Team](#project-team)\n- [Feedback and Contributing](#feedback-and-contributing)\n- [Disclaimer](#disclaimer)\n\n\u003c/details\u003e\n\n![](_imgs/app_ui.png)\n\n## Usage\n\n```bash\n# Clone the repository\ngit clone https://github.com/machinelearningZH/ogd_ai-search.git\ncd ogd_ai-search\n\n# Install dependencies\npip3 install uv\nuv venv\nsource .venv/bin/activate\nuv sync\n\n# Create search index\n# Run 01_mdv_search.ipynb to create the Weaviate search index\n\n# Start the app\ncd _streamlit\nstreamlit run ai-search.py\n```\n\n## Overview\n\nSearch the [Canton of Zurich's open government data catalog](https://www.zh.ch/en/politics-state/statistics-data/data-catalog.html#/) using hybrid search that combines **lexical keyword matching** with **semantic similarity**. The application supports **multiple languages**, including German and many other European languages.\n\nThe search uses [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small), a compact multilingual model that supports German and has a 512-token context length, to compute embeddings via sentence-transformers. 
Search results are powered by [Weaviate](https://weaviate.io/), an open-source vector database.\n\n## What is semantic search?\n\nSemantic search finds text based on meaning rather than exact keywords. For example, searching for _disease_ can return documents containing _illness_, _virus_, _infection_, _treatment_, or _healthcare_ without the exact word _disease_ appearing.\n\nLanguage models learn word and sentence similarities from large text corpora using statistical methods and machine learning. While semantic search has many advantages, it is approximate rather than exact and **may include false positives or miss relevant entries**.\n\n**Hybrid search combines lexical and semantic approaches**, delivering both exact keyword matches and semantically similar results.\n\n## Project Team\n\n**Laure Stadler**, **Chantal Amrhein**, **Patrick Arnecke** – [Statistisches Amt Zürich: Team Data](https://www.zh.ch/de/direktion-der-justiz-und-des-innern/statistisches-amt/data.html)\n\nMany thanks to **Corinna Grobe** and our former colleague **Adrian Rupp**.\n\n## Feedback and Contributing\n\nWe'd love to hear from you. Share your feedback or ideas by [emailing us](mailto:datashop@statistik.zh.ch), opening an issue, or submitting a pull request.\n\nWe use [Ruff](https://docs.astral.sh/ruff/) for linting and code formatting with default settings.\n\n## Disclaimer\n\nThis software (the Software) incorporates models (Models) from Hugging Face and others and has been developed according to and with the intent to be used under Swiss law. Please be aware that the EU Artificial Intelligence Act (EU AI Act) may, under certain circumstances, be applicable to your use of the Software. You are solely responsible for ensuring that your use of the Software as well as of the underlying Models complies with all applicable local, national and international laws and regulations. 
By using this Software, you acknowledge and agree (a) that it is your responsibility to assess which laws and regulations, in particular regarding the use of AI technologies, are applicable to your intended use and to comply therewith, and (b) that you will hold us harmless from any action, claims, liability or loss in respect of your use of the Software.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmachinelearningzh%2Fogd_ai-search","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmachinelearningzh%2Fogd_ai-search","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmachinelearningzh%2Fogd_ai-search/lists"}