{"id":48620847,"url":"https://github.com/catenarytransit/cypress","last_synced_at":"2026-04-09T03:35:50.957Z","repository":{"id":329509577,"uuid":"1118698738","full_name":"catenarytransit/cypress","owner":"catenarytransit","description":"Geocoder, Autocomplete, and Search for OSM data in ElasticSearch","archived":false,"fork":false,"pushed_at":"2026-03-09T04:56:22.000Z","size":424,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-09T09:37:59.393Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/catenarytransit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"open_collective":"catenarymaps"}},"created_at":"2025-12-18T06:34:48.000Z","updated_at":"2026-03-09T04:56:25.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/catenarytransit/cypress","commit_stats":null,"previous_names":["catenarytransit/cypress"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/catenarytransit/cypress","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catenarytransit%2Fcypress","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catenarytransit%2Fcypress/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catenarytransit%2Fcypress/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catenarytransit%2Fcypress/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/catenarytransit","download_url":"https://codeload.github.com/catenarytransit/cypress/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catenarytransit%2Fcypress/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31584808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"online","status_checked_at":"2026-04-09T02:00:06.848Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-09T03:35:50.561Z","updated_at":"2026-04-09T03:35:50.949Z","avatar_url":"https://github.com/catenarytransit.png","language":"Rust","funding_links":["https://opencollective.com/catenarymaps"],"categories":[],"sub_categories":[],"readme":"# Cypress\n\nA Rust-based geocoding system with Elasticsearch, inspired by [Pelias](https://pelias.io/) and [Nominatim](https://nominatim.org/).\n\n![1200x680](https://github.com/user-attachments/assets/496f0dba-7e6d-4b50-90cc-744f21909ece)\n\n## Features\n\n- **OSM PBF Ingestion** - Parses OpenStreetMap data with multilingual name support\n- **Road Way Merging** - Automatically merges adjacent road segments with the same name to reduce disk space usage\n- **Point-in-Polygon Admin Lookup** - Assigns an administrative hierarchy to each place using R-tree spatial indexing\n- **Elasticsearch Backend** - Full-text search with edge n-gram autocomplete\n- **Wikidata Integration** - Enriches place names with multilingual labels from Wikidata\n- **Location \u0026 Bounding Box Bias** - Boosts results near the user's location or viewport\n- **Data Refresh** - Re-imports files and automatically removes stale documents\n\n## Requirements\n\n- Rust 1.70+\n- Elasticsearch 8.x\n- ScyllaDB 5.x+\n- 8 GB+ RAM for a Switzerland import\n\n## Quick Start\n\n### 1. Prerequisites\n\nEnsure the following services are installed and running:\n\n- **Elasticsearch 8.x** (Default: http://localhost:9200)\n- **ScyllaDB 5.x+** (Default: 127.0.0.1:9042)\n\n### 2. Build\n\n```bash\ncargo build --release\n```\n\n### 3. Import Data\n\n```bash\n# Configure regions.toml and run:\ncargo run --release --bin ingest -- batch --config regions.toml\n\n# Or run directly (Scylla defaults to 127.0.0.1):\ncargo run --release --bin ingest -- single \\\n  --file switzerland-latest.osm.pbf \\\n  --create-index \\\n  --refresh \\\n  --wikidata \\\n  --scylla-url 127.0.0.1\n\n# With custom Scylla and Elasticsearch URLs:\ncargo run --release --bin ingest -- single \\\n  --file switzerland-latest.osm.pbf \\\n  --es-url http://elasticsearch:9200 \\\n  --scylla-url 10.0.0.5 \\\n  --create-index \\\n  --refresh\n```\n\n### 4. Start Query Server\n\n```bash\ncargo run --release --bin query -- --listen 0.0.0.0:3000\n```\n\n## Data Management\n\n### Road Way Merging\n\nBy default, Cypress automatically merges adjacent road segments (ways) that share the same name and highway type.\n\n**Benefits:**\n- **Reduced disk space**: Fewer documents stored in Elasticsearch\n- **Faster search**: Fewer documents to scan during queries\n- **Cleaner results**: One result per street instead of dozens of segments\n- **Lower costs**: Reduced storage and compute requirements\n\n**How it works:**\n1. During ingestion, all named road ways are collected\n2. Ways are grouped by name and highway type (e.g., \"Main Street|residential\")\n3. Adjacent ways (ways that share an endpoint node) are merged into a single road\n4. The merged road is indexed with its full geometry and bounding box\n5. A category tag records how many ways were merged (e.g., `merged_ways:5`)\n\n**Which roads are merged:**\n- Residential streets, primary/secondary/tertiary roads\n- Service roads, living streets, pedestrian ways\n- Tracks, footways, cycleways, and paths\n- **NOT merged**: Motorways, motorway links, and other link roads\n\nYou can disable this feature with `--merge-roads false`, but this is not recommended for production use.\n\n### Wiping a Region\n\nTo remove data for a specific region (e.g., to re-import it or to free up space), use the `wipe_region.sh` script:\n\n```bash\n# Wipe data for Albania\n./scripts/wipe_region.sh Albania\n\n# Wipe data using a custom Elasticsearch URL\n./scripts/wipe_region.sh Germany --url http://10.0.0.5:9200\n```\n\nThe script selects the documents to delete by their `source_file` field, using the region definitions in `scripts/import_global.sh`.\n\n### Index Management\n\nDelete the `places` index:\n```bash\ncurl -X DELETE \"http://localhost:9200/places\"\n```\n\nDelete the `cypress_versions` index:\n```bash\ncurl -X DELETE \"http://localhost:9200/cypress_versions\"\n```\n\nAlternatively, reset the versions with the `reset-versions` subcommand of the `ingest` binary:\n```bash\ncargo run --bin ingest -- reset-versions\n```\n\n## API Endpoints\n\n### Forward Geocoding\n\n```bash\n# Basic search\ncurl \"http://localhost:3000/v1/search?text=Zurich\"\n\n# With language preference\ncurl \"http://localhost:3000/v1/search?text=Genève\u0026lang=fr\"\n\n# With bounding box filter\ncurl \"http://localhost:3000/v1/search?text=bahnhof\u0026bbox=8.5,47.3,8.6,47.4\"\n\n# With location bias\ncurl \"http://localhost:3000/v1/search?text=restaurant\u0026focus.point.lat=47.37\u0026focus.point.lon=8.54\"\n```\n\n### Reverse Geocoding\n\n```bash\ncurl \"http://localhost:3000/v1/reverse?point.lat=47.37\u0026point.lon=8.54\"\n```\n\n### Autocomplete\n\n```bash\ncurl \"http://localhost:3000/v1/autocomplete?text=zur\"\n```\n\n## Response Format\n\nResults are returned in a GeoJSON-like format that includes all available language variants of each name:\n\n```json\n{\n  \"features\": [\n    {\n      \"type\": \"Feature\",\n      \"geometry\": {\n        \"type\": \"Point\",\n        \"coordinates\": [8.54, 47.37]\n      },\n      \"properties\": {\n        \"id\": \"node/123456\",\n        \"layer\": \"locality\",\n        \"name\": \"Zürich\",\n        \"names\": {\n          \"default\": \"Zürich\",\n          \"de\": \"Zürich\",\n          \"fr\": \"Zurich\",\n          \"it\": \"Zurigo\",\n          \"en\": \"Zurich\"\n        },\n        \"country\": \"Switzerland\",\n        \"region\": \"Zürich\",\n        \"confidence\": 42.5\n      }\n    }\n  ]\n}\n```\n\n## Architecture\n\n```\ncypress/\n├── src/\n│   ├── lib.rs              # Shared library\n│   ├── models/             # Place, AdminHierarchy, etc.\n│   ├── elasticsearch/      # ES client, schema, bulk indexer\n│   ├── pip/                # Point-in-Polygon admin lookup\n│   ├── wikidata/           # SPARQL label fetcher\n│   ├── ingest/             # OSM PBF import binary\n│   └── query/              # HTTP query server\n└── schema/\n    └── places_mapping.json # Elasticsearch index mapping\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatenarytransit%2Fcypress","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcatenarytransit%2Fcypress","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatenarytransit%2Fcypress/lists"}