{"id":50109042,"url":"https://github.com/mgourlis/search_query_dsl","last_synced_at":"2026-05-23T12:05:57.917Z","repository":{"id":333896216,"uuid":"1138892609","full_name":"mgourlis/search_query_dsl","owner":"mgourlis","description":"Unified search API for Python — write JSON queries once, run them against SQLAlchemy or in-memory collections. Features streaming, nested boolean logic, automatic JOINs, JSONB, PostGIS, and full-text search.","archived":false,"fork":false,"pushed_at":"2026-01-29T21:34:28.000Z","size":69,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-30T10:27:27.702Z","etag":null,"topics":["dsl","fastapi","jsonb","postgis","query-builder","search"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mgourlis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-21T08:51:21.000Z","updated_at":"2026-01-29T21:34:32.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mgourlis/search_query_dsl","commit_stats":null,"previous_names":["mgourlis/search_query_dsl"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mgourlis/search_query_dsl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgourlis%2Fsearch_query_dsl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgourlis%2Fs
earch_query_dsl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgourlis%2Fsearch_query_dsl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgourlis%2Fsearch_query_dsl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mgourlis","download_url":"https://codeload.github.com/mgourlis/search_query_dsl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgourlis%2Fsearch_query_dsl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33394702,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-23T04:15:53.637Z","status":"ssl_error","status_checked_at":"2026-05-23T04:15:53.242Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dsl","fastapi","jsonb","postgis","query-builder","search"],"created_at":"2026-05-23T12:05:57.162Z","updated_at":"2026-05-23T12:05:57.905Z","avatar_url":"https://github.com/mgourlis.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Search Query DSL\n\nA **Domain-Specific Language (DSL)** for expressing complex database queries as JSON with support for:\n\n- ✅ **Unified API**: Single `search()` function for both Memory and SQLAlchemy backends.\n- ✅ **Streaming Support**: Memory-efficient `search_stream()` for large result sets.\n- ✅ 
**Nested Logic**: Complex boolean expressions (AND, OR, NOT).\n- ✅ **Relationship Traversal**: Automatic, robust JOINs with alias handling.\n- ✅ **Pagination \u0026 Ordering**: Full support for `limit`, `offset`, and multi-field `order_by`.\n- ✅ **Query Validation**: Backend-aware validation ensures only supported operators are used.\n- ✅ **JSONB \u0026 Geospatial**: Advanced field queries and PostGIS support.\n- ✅ **Full-Text Search**: PostgreSQL tsvector and simple token-based search.\n- ✅ **Async Hooks**: Custom traversal/join logic with async side-effect support.\n- ✅ **Smart Resolvers**: Implicit list traversal and fuzzy matching for error suggestions.\n\n## Table of Contents\n\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [Core Concepts](#core-concepts)\n- [Usage Examples](#usage-examples)\n  - [Basic Queries](#basic-queries)\n  - [Relationship Traversal](#relationship-traversal)\n  - [Complex Nested Logic](#complex-nested-logic)\n  - [Geospatial Queries](#geospatial-queries)\n  - [Full-Text Search](#full-text-search)\n  - [Pagination \u0026 Ordering](#pagination--ordering)\n  - [Streaming Results](#streaming-results)\n- [JSON Structure](#json-structure)\n- [Supported Operators](#supported-operators)\n- [Performance Tips](#performance-tips)\n- [Integration](#integration)\n- [License](#license)\n\n## Installation\n\n```bash\n# Core only\npip install search-query-dsl\n\n# With SQLAlchemy \u0026 PostGIS\npip install search-query-dsl[sqlalchemy,geoalchemy]\n\n# With FastAPI support\npip install search-query-dsl[fastapi]\n\n# Everything\npip install search-query-dsl[all]\n```\n\n## Quick Start\n\n### Unified Search API\n\nThe `search()` function automatically detects the backend based on your source type.\n\n```python\nfrom search_query_dsl.api import search\n\n# 1. 
Define Query (Dictionary or Object)\nquery = {\n    \"groups\": [{\n        \"group_operator\": \"and\",\n        \"conditions\": [\n            {\"field\": \"status\", \"operator\": \"=\", \"value\": \"active\"},\n            {\"field\": \"priority\", \"operator\": \"\u003e\", \"value\": 5}\n        ]\n    }],\n    \"limit\": 10,\n    \"order_by\": [\"-created_at\"]\n}\n\n# 2. Search In-Memory (Source is List/Iterable)\nitems = [{\"status\": \"active\", \"priority\": 10}, {\"status\": \"inactive\"}]\nresults = await search(query, items)\n\n# 3. Search SQLAlchemy (Source is AsyncSession)\nasync with session:\n    results = await search(query, session, model=User)\n```\n\n## Core Concepts\n\n### Query Builder Pattern\n\nUse the builder for a more Pythonic API:\n\n```python\nfrom search_query_dsl.core.builder import SearchQueryBuilder\n\nquery = (\n    SearchQueryBuilder()\n    .add_condition(\"status\", \"=\", \"active\")\n    .add_condition(\"priority\", \"\u003e=\", 5)\n    .order_by(\"-created_at\")\n    .limit(20)\n    .build()\n)\n```\n\n### Backend Auto-Detection\n\nThe library automatically chooses the right backend:\n\n- **MemoryBackend**: For lists, iterables, or single objects\n- **SQLAlchemyBackend**: For AsyncSession instances\n\n### Query Validation\n\nQueries are validated before execution:\n\n```python\nfrom search_query_dsl.core.validator import validate_search_query\n\n# Validation checks for:\n# - Valid operator names\n# - Required values for operators\n# - Valid limit/offset values\n# - Non-empty condition groups\nvalidate_search_query(query, operators={\"=\", \"\u003e\", \"in\"})\n```\n\n## Usage Examples\n\n### Basic Queries\n\n```python\n# Simple equality\nquery = {\n    \"groups\": [{\n        \"conditions\": [{\"field\": \"name\", \"operator\": \"=\", \"value\": \"Alice\"}]\n    }]\n}\n\n# Range query\nquery = {\n    \"groups\": [{\n        \"conditions\": [\n            {\"field\": \"age\", \"operator\": \"\u003e=\", \"value\": 18},\n        
    {\"field\": \"age\", \"operator\": \"\u003c=\", \"value\": 65}\n        ]\n    }]\n}\n\n# IN operator\nquery = {\n    \"groups\": [{\n        \"conditions\": [\n            {\"field\": \"status\", \"operator\": \"in\", \"value\": [\"active\", \"pending\"]}\n        ]\n    }]\n}\n```\n\n### Relationship Traversal\n\nAutomatic JOINs are handled for you in SQLAlchemy:\n\n```python\n# Query: \"Find users whose profile city is 'New York'\"\nquery = {\n    \"groups\": [{\n        \"conditions\": [\n            {\"field\": \"profile.address.city\", \"operator\": \"=\", \"value\": \"New York\"}\n        ]\n    }]\n}\nresults = await search(query, session, model=User)\n```\n\n**Features:**\n- Detects self-referential relationships (e.g. `parent.name`)\n- Reuses aliases if a table is already joined\n- Validates leaf nodes are valid SQL columns\n\n### Complex Nested Logic\n\nBuild complex OR/AND/NOT combinations:\n\n```python\n# (status = 'active' AND priority \u003e 5) OR (urgent = true)\nquery = {\n    \"groups\": [{\n        \"group_operator\": \"or\",\n        \"conditions\": [\n            {\n                \"group_operator\": \"and\",\n                \"conditions\": [\n                    {\"field\": \"status\", \"operator\": \"=\", \"value\": \"active\"},\n                    {\"field\": \"priority\", \"operator\": \"\u003e\", \"value\": 5}\n                ]\n            },\n            {\"field\": \"urgent\", \"operator\": \"=\", \"value\": True}\n        ]\n    }]\n}\n```\n\n### Geospatial Queries\n\n```python\n# Find points within a polygon\nquery = {\n    \"groups\": [{\n        \"conditions\": [{\n            \"field\": \"location\",\n            \"operator\": \"within\",\n            \"value\": {\n                \"type\": \"Polygon\",\n                \"coordinates\": [[\n                    [-122.4, 37.8],\n                    [-122.4, 37.7],\n                    [-122.3, 37.7],\n                    [-122.3, 37.8],\n                    [-122.4, 37.8]\n   
             ]]\n            }\n        }]\n    }]\n}\n\n# Fast bounding box query (uses spatial index)\nquery = {\n    \"groups\": [{\n        \"conditions\": [{\n            \"field\": \"location\",\n            \"operator\": \"bbox_intersects\",\n            \"value\": [-122.5, 37.7, -122.3, 37.9]  # [minX, minY, maxX, maxY]\n        }]\n    }]\n}\n\n# Distance query\nquery = {\n    \"groups\": [{\n        \"conditions\": [{\n            \"field\": \"location\",\n            \"operator\": \"dwithin\",\n            \"value\": [\n                {\"type\": \"Point\", \"coordinates\": [-122.4, 37.8]},\n                1000  # meters\n            ]\n        }]\n    }]\n}\n```\n\n### Full-Text Search\n\n```python\n# PostgreSQL full-text search (SQLAlchemy backend)\nquery = {\n    \"groups\": [{\n        \"conditions\": [{\n            \"field\": \"description\",\n            \"operator\": \"fts\",\n            \"value\": \"python database\"\n        }]\n    }]\n}\n\n# Phrase search\nquery = {\n    \"groups\": [{\n        \"conditions\": [{\n            \"field\": \"content\",\n            \"operator\": \"fts_phrase\",\n            \"value\": \"machine learning\"\n        }]\n    }]\n}\n```\n\n### Pagination \u0026 Ordering\n\n```python\nfrom search_query_dsl.core.builder import SearchQueryBuilder\n\nquery = (\n    SearchQueryBuilder()\n    .add_condition(\"status\", \"=\", \"active\")\n    .order_by(\"-created_at\", \"name\")  # DESC created, ASC name\n    .limit(20)\n    .offset(40)\n    .build()\n)\n```\n\n**Note**: Prefix field names with `-` for descending order.\n\n### Streaming Results\n\nFor large result sets, use `search_stream()` to process results one at a time without loading everything into memory:\n\n```python\nfrom search_query_dsl.api import search_stream\n\n# Stream from SQLAlchemy with batching (recommended)\nasync with async_session() as session:\n    async for user in search_stream(query, session, model=User, batch_size=100):\n        await 
process(user)  # Process one at a time\n\n# Stream from in-memory collection\nitems = [{\"status\": \"active\", \"priority\": 10}, {\"status\": \"inactive\"}]\nasync for item in search_stream(query, items):\n    await process(item)\n```\n\n#### Batch Size\n\nThe `batch_size` parameter controls how many rows are fetched from the database per round trip:\n\n| `batch_size` | Behavior | Use Case |\n|--------------|----------|----------|\n| `None` (default) | Row-by-row fetching | Minimal memory, many round trips |\n| `100-1000` | Batched fetching | **Recommended** - balanced performance |\n| Large value | More memory per batch | High-throughput scenarios |\n\n```python\n# Fetch 500 rows at a time, yield one at a time\nasync for user in search_stream(query, session, model=User, batch_size=500):\n    await process(user)\n```\n\n**Benefits:**\n- **Memory Efficient**: Doesn't load all results into memory at once.\n- **Server-Side Streaming**: SQLAlchemy backend uses `stream_scalars()` for true database-level streaming.\n- **Configurable Batching**: Tune `batch_size` to balance memory usage vs network round trips.\n- **Same Query Format**: Uses the exact same query structure as `search()`.\n\n### In-Memory List Traversal\n\nThe memory backend supports implicit traversal for lists:\n\n```python\ndata = {\n    \"users\": [\n        {\"name\": \"Alice\", \"role\": \"admin\"},\n        {\"name\": \"Bob\", \"role\": \"user\"}\n    ]\n}\n\n# Query: \"users.name\"\n# Matches if ANY user in the list has name \"Alice\"\nquery = {\n    \"groups\": [{\n        \"conditions\": [\n            {\"field\": \"users.name\", \"operator\": \"=\", \"value\": \"Alice\"}\n        ]\n    }]\n}\n```\n\n### Custom Logic with Async Hooks\n\nCustomize traversal for dynamic tables or polymorphic relationships:\n\n```python\nfrom search_query_dsl.backends.sqlalchemy import SQLAlchemyResolutionContext, HookResult\n\nasync def my_custom_hook(ctx: SQLAlchemyResolutionContext):\n    if ctx.current_attr == 
\"dynamic_field\":\n         # Perform async lookups (e.g. Redis/Cache)\n         cached_info = await get_schema_info()\n         \n         # Return resolution result\n         return HookResult(...)\n\n# Pass hooks to search function\nresults = await search(query, session, model=MyModel, hooks=[my_custom_hook])\n```\n\n## JSON Structure\n\nA `SearchQuery` is composed of nested `groups` of `conditions`.\n\n```json\n{\n  \"groups\": [\n    {\n      \"group_operator\": \"or\",\n      \"conditions\": [\n        {\n          \"field\": \"created_at\",\n          \"operator\": \"\u003e\",\n          \"value\": \"2024-01-01\"\n        },\n        {\n          \"group_operator\": \"and\",\n          \"conditions\": [\n             {\"field\": \"status\", \"operator\": \"=\", \"value\": \"pending\"},\n             {\"field\": \"urgent\", \"operator\": \"=\", \"value\": true}\n          ]\n        }\n      ]\n    }\n  ],\n  \"limit\": 10,\n  \"offset\": 0,\n  \"order_by\": [\"-created_at\"]\n}\n```\n\n## Supported Operators\n\n| Type | Operators |\n|------|-----------|\n| **Comparison** | `=`, `!=`, `\u003e`, `\u003c`, `\u003e=`, `\u003c=` |\n| **Set** | `in`, `not_in`, `all`, `between`, `not_between` |\n| **String** | `like`, `not_like`, `ilike`, `contains`, `icontains`, `startswith`, `istartswith`, `endswith`, `iendswith`, `regex`, `iregex` |\n| **Null/Empty** | `is_null`, `is_not_null`, `is_empty`, `is_not_empty` |\n| **JSONB** | `jsonb_contains`, `jsonb_contained_by`, `jsonb_has_key`, `jsonb_has_any_keys`, `jsonb_has_all_keys`, `jsonb_path_exists` |\n| **Geometry** | `intersects`, `within`, `contains_geom`, `touches`, `crosses`, `overlaps`, `disjoint`, `geom_equals`, `distance_lt`, `dwithin`, `bbox_intersects` |\n| **Full-Text Search** | `fts`, `fts_phrase` |\n\n## Performance Tips\n\n### SQLAlchemy Backend\n\n1. 
**Use Spatial Indexes**: For geometry queries, ensure your geometry columns have spatial indexes:\n   ```sql\n   CREATE INDEX idx_location ON places USING GIST(location);\n   ```\n\n2. **Bounding Box First**: Use `bbox_intersects` before expensive operations like `within`:\n   ```python\n   # Fast spatial index query\n   {\"field\": \"location\", \"operator\": \"bbox_intersects\", \"value\": [minX, minY, maxX, maxY]}\n   ```\n\n3. **FTS Indexes**: For full-text search, create tsvector columns with indexes:\n   ```sql\n   ALTER TABLE documents ADD COLUMN search_vector tsvector;\n   CREATE INDEX idx_search ON documents USING GIN(search_vector);\n   ```\n\n4. **Limit Early**: Apply `limit` and `offset` to reduce result set size.\n\n5. **Index Foreign Keys**: Ensure relationship fields are indexed for efficient JOINs.\n\n### Memory Backend\n\n1. **Pre-filter**: Reduce dataset size before passing to `search()`.\n\n2. **Simple Operators**: Use simpler operators (`=`, `in`) instead of complex ones (`regex`, `fts`) when possible.\n\n3. 
**Avoid Deep Nesting**: Minimize nested groups for better performance.\n\n## Integration\n\n### FastAPI\n\nSimplify endpoint integration with the provided helper:\n\n```python\nfrom fastapi import Body\nfrom search_query_dsl.contrib.fastapi import SearchQuerySchema\nfrom search_query_dsl import search, SearchQuery\n\n@app.post(\"/search\")\nasync def search_items(query: SearchQuerySchema = Body(...)):\n    # Convert Pydantic model to SearchQuery\n    search_query = SearchQuery.from_dict(query.model_dump())\n    return await search(search_query, session, model=Item)\n```\n\n#### Streaming with FastAPI\n\nUse `StreamingResponse` for memory-efficient large result sets:\n\n```python\nfrom fastapi import Body\nfrom fastapi.responses import StreamingResponse\nfrom search_query_dsl.contrib.fastapi import SearchQuerySchema\nfrom search_query_dsl import search_stream, SearchQuery\nimport json\n\n@app.post(\"/search/stream\")\nasync def stream_search(query: SearchQuerySchema = Body(...)):\n    search_query = SearchQuery.from_dict(query.model_dump())\n    \n    async def generate():\n        async with async_session() as session:\n            async for item in search_stream(search_query, session, model=Item):\n                yield json.dumps(item.to_dict()) + \"\\n\"\n    \n    return StreamingResponse(generate(), media_type=\"application/x-ndjson\")\n```\n\n### Django\n\nUse the DRF integration for automatic serialization and validation:\n\n```python\nfrom rest_framework import viewsets\nfrom search_query_dsl.contrib.django import SearchQueryMixin, SearchQuerySerializer\nfrom search_query_dsl import search, SearchQuery\n\nclass ItemViewSet(SearchQueryMixin, viewsets.ModelViewSet):\n    search_model = Item  # Your SQLAlchemy model\n    \n    async def list(self, request):\n        # Automatically parses and validates from request.data\n        query = self.get_search_query(request)\n        \n        # Execute search\n        async with async_session() as session:\n          
  results = await self.execute_search(query, session=session)\n            return Response({\"results\": results})\n\n# Or use the serializer directly\nclass ManualSearchView(APIView):\n    async def post(self, request):\n        serializer = SearchQuerySerializer(data=request.data)\n        serializer.is_valid(raise_exception=True)\n        \n        query = SearchQuery.from_dict(serializer.validated_data)\n        async with async_session() as session:\n            results = await search(query, session, model=Item)\n            return Response({\"results\": results})\n```\n\n#### Streaming with Django\n\nUse `StreamingHttpResponse` for large result sets:\n\n```python\nfrom django.http import StreamingHttpResponse\nfrom search_query_dsl import search_stream, SearchQuery\nimport json\n\nclass StreamSearchView(APIView):\n    async def post(self, request):\n        serializer = SearchQuerySerializer(data=request.data)\n        serializer.is_valid(raise_exception=True)\n        \n        query = SearchQuery.from_dict(serializer.validated_data)\n        \n        async def generate():\n            async with async_session() as session:\n                async for item in search_stream(query, session, model=Item):\n                    yield json.dumps(item.to_dict()) + \"\\n\"\n        \n        return StreamingHttpResponse(\n            generate(),\n            content_type=\"application/x-ndjson\"\n        )\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmgourlis%2Fsearch_query_dsl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmgourlis%2Fsearch_query_dsl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmgourlis%2Fsearch_query_dsl/lists"}