{"id":22419473,"url":"https://github.com/dunkelstern/osmgeocoder","last_synced_at":"2025-08-01T04:31:35.303Z","repository":{"id":49037748,"uuid":"134432093","full_name":"dunkelstern/osmgeocoder","owner":"dunkelstern","description":"OpenStreetMap / OpenAddresses.io geocoder written in python","archived":false,"fork":false,"pushed_at":"2022-07-15T12:02:30.000Z","size":158,"stargazers_count":16,"open_issues_count":3,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-08T18:49:40.665Z","etag":null,"topics":["geocoder","imposm3","libpostal","openaddresses","openstreetmap","python","trigram-search"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dunkelstern.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-05-22T15:00:05.000Z","updated_at":"2024-07-12T22:29:01.000Z","dependencies_parsed_at":"2022-09-08T15:11:23.713Z","dependency_job_id":null,"html_url":"https://github.com/dunkelstern/osmgeocoder","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/dunkelstern/osmgeocoder","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dunkelstern%2Fosmgeocoder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dunkelstern%2Fosmgeocoder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dunkelstern%2Fosmgeocoder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dunkelstern%2Fosmgeocoder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/dunkelstern","download_url":"https://codeload.github.com/dunkelstern/osmgeocoder/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dunkelstern%2Fosmgeocoder/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268099752,"owners_count":24196100,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-31T02:00:08.723Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["geocoder","imposm3","libpostal","openaddresses","openstreetmap","python","trigram-search"],"created_at":"2024-12-05T16:15:51.968Z","updated_at":"2025-08-01T04:31:35.016Z","avatar_url":"https://github.com/dunkelstern.png","language":"Python","readme":"# OSMGeocoder\n\nPython implementation for a OSM / Openaddresses.io Geocoder.\n\nThis geocoder is implemented in PostgreSQL DB functions as much as possible, there is a simple API and an example flask app included.\n\nYou will need PostgreSQL 9.5+ (or 11.0+ for OpenAddresses.io support) with PostGIS installed as well as some disk space and data-files from OpenStreetMap and (optionally) OpenAddresses.io.\n\nData import will be done via [Omniscale's imposm3](https://github.com/omniscale/imposm3) and a supplied python script to import the openaddresses.io data.\n\nOptionally you can use the [libpostal machine learning address classifier](https://github.com/openvenues/libpostal) to parse 
addresses supplied as input to the forward geocoder.\n\nTo format the addresses returned by the reverse geocoder, the `worldwide.yml` from the [OpenCageData address-formatting repository](https://github.com/OpenCageData/address-formatting) is used, so each address follows the customs of the country being geocoded.\n\nSee `README.md` in the [repository](https://github.com/dunkelstern/osmgeocoder) for more information.\n\n## Changelog\n\n### v1.0\n\n- Initial release, reverse geocoding works, forward geocoding is slow\n\n### v2.0\n\n**Warning:** DB Format changed, you'll have to re-import data\n\n- Fixed forward geocoding speed\n- Fixed import scripts to be more resilient\n- Made Openaddresses.io completely optional\n- Restored compatibility with older Python 3.x versions\n- Restored compatibility with older PostgreSQL DB versions (9.5+ if you do not use openaddresses.io)\n- Switched to `pipenv`\n\n### v2.0.1\n\n- Fix missing import for structured forward geocoding\n- Fix Copy and Paste error in forward geocoding SQL\n\nIf you're coming from `2.0.0`, re-run the finalize step to update the SQL functions:\n\n```bash\n$ pipenv run bin/finalize_geocoder.py --db postgresql://geocoder:password@localhost/osmgeocoder\n```\n\n### v2.1.0\n\n- Add type hints to all interfaces\n- Add `_dict` variants for geocoding functions to get _raw_ data instead of formatted strings\n- Bugfix: Reading of custom opencage data file for address formatting was broken\n- Returned addresses now contain county and state if available\n\n## TODO\n\n- Return Attribution in API and in webservices\n\n## \"Quick\" and dirty how-to\n\n**Statistics outdated, will be updated shortly**\n\nJust for your information, this process takes a lot of time for a big import. 
Example figures on a machine with a Core i7-7700K at 4.2 GHz with a Samsung (SATA-)SSD and 32 GB of RAM (and some tuned buffer sizes for Postgres):\n\n- Import of the Europe-Region of OpenStreetMap:\n    - Import time: 3 hours\n    - OSM Data file: 20 GB\n    - Temporary space needed: 35 GB\n    - Final size in DB: 58.7 GB\n    - Summary of space requirement: 115 GB\n- Import of the two Openaddresses.io files for Europe:\n    - Import time: 1 hour\n    - Data files: 4 GB\n    - Temporary space needed: 2 GB\n    - Final size in DB: 18 GB\n    - Summary of space requirement: 24 GB\n- Conversion of the OpenStreetMap data into geocoding format:\n    - Conversion time: 5 hours\n    - Final size in DB: 10.5 GB\n\nSo in summary you'll need 9 hours of time and 150 GB of disk space.\nAfter cleanup you'll need 28.5 GB of disk space for the Europe data set. A compressed DB export of the converted data comes to 2.8 GB and expands on import to the 28.5 GB mentioned above.\n\n1. Create a PostgreSQL Database (we use the name `osmgeocoder` for the DB name and `geocoder` for the DB user in the example)\n2. 
Create the PostGIS, trigram and fuzzy string search extensions for the DB:\n```sql\nCREATE SCHEMA gis;                              -- isolate postgis into its own schema for easier development\nALTER SCHEMA gis OWNER TO geocoder;\nCREATE EXTENSION postgis WITH SCHEMA gis;       -- put postgis into gis schema\n\nCREATE SCHEMA str;                              -- isolate string functions into its own schema for easier development\nALTER SCHEMA str OWNER TO geocoder;\nCREATE EXTENSION pg_trgm WITH SCHEMA str;       -- trigram search, used for forward geocoding\nCREATE EXTENSION fuzzystrmatch WITH SCHEMA str; -- metaphone search, used for text prediction\n\nCREATE SCHEMA crypto;                           -- isolate crypto functions into its own schema for easier development\nALTER SCHEMA crypto OWNER TO geocoder;\nCREATE EXTENSION pgcrypto WITH SCHEMA crypto;   -- used to generate uuids\n\nALTER DATABASE osmgeocoder SET search_path TO public, gis, str, crypto; -- set search path to include the other schemas\n```\n3. Fetch a copy of [imposm3](https://github.com/omniscale/imposm3)\n4. Get an OpenStreetMap data file (for example from [Geofabrik](http://download.geofabrik.de/), start with a small region!)\n5. Create a virtualenv and install packages:\n```bash\npipenv sync\n```\n6. See below for importing openaddresses.io data if needed (this is completely optional)\n7. Import some OpenStreetMap data into the DB (grab a coffee or two):\n```bash\n$ bin/prepare_osm.py --db postgresql://geocoder:password@localhost/osmgeocoder --import-data osm.pbf --optimize\n```\n8. Modify the configuration file to match your setup. The example config is in `osmgeocoder/data/config-example.json`.\n9. Optionally install and start the postal machine learning address categorizer (see below)\n10. Import the geocoding functions into the DB:\n```bash\n$ bin/finalize_geocoder.py --db postgresql://geocoder:password@localhost/osmgeocoder\n```\n11. 
Geocode:\n```bash\nbin/address2coordinate.py --config config.json --center 48.3849 10.8631 Lauterl\nbin/coordinate2address.py --config config.json 48.3849 10.8631\n```\n\nFor a full example see the ``example_setup.sh`` shell script.\n\n**NOTE:** you can also install this via pip:\n- The scripts from the `bin` directory will be copied to your environment.\n- An example config file will be placed in your virtualenv in `osmgeocoder/data/config-example.json`\n- The PIP installation will not install `flask` and `gunicorn` nor will it try to install `postal`,\n  if you want to use those services you need to install those optional dependencies yourself (read on!)\n\n\n## Optional import of openaddresses.io data\n\nFor some countries there are not enough buildings tagged in the OSM data so we can use the [OpenAddresses.io](http://results.openaddresses.io) data to augment the OSM data.\n\nThe import is relatively slow as the data is contained in a large number of zipped CSV files. We use multiple threads to import the data faster, but it can still take a while.\n\n### Importing openaddresses.io data\n\n```bash\n# download openaddress.io data\nwget https://s3.amazonaws.com/data.openaddresses.io/openaddr-collected-europe.zip\n# run an import\npipenv run bin/import_openaddress_data.py \\\n    --db postgresql://geocoder:password@host/osmgeocoder \\\n    --threads 4 \\\n    --optimize \\\n    openaddr-collected-europe.zip\n```\n\nThe import creates several tables in your DB: `license`, which contains the licenses of the imported data (the API will return the license attribution string with the data); `oa_city`, which is a foreign key target of `oa_street`; and `oa_house`, which references `oa_street` and contains the imported data.\n\nIf you want to import more than one file, just do so: the tables will not be cleared between import runs, although the indices will be dropped and rebuilt after the import. 
Skip the `--optimize` flag for the imports and run an optimize only pass last to save some time.\n\nIf you want to save even more time import with `--fast`, but be aware this leaves the DB without any indices or foreign key constraints, an optimize pass is required after importing with this flag!\n\nIf you want to start over run the command with the `--clean-start` flag... Be careful, this destroys all openaddresses.io data in the tables.\n\n\n## Optional support for libpostal\n\n### Installation of libpostal\n\nBe aware that the make process will download some data-files (about 1GB in size). The installation of libpostal\nwill need around 1 GB of disk space and about 2 GB of disk space while compiling.\n\nCurrently there is no Ubuntu package for `libpostal`, so we have to install it by hand:\n\n```bash\ngit clone https://github.com/openvenues/libpostal\ncd libpostal\n./bootstrap.sh\n./configure --prefix=/opt/libpostal --datadir=/opt/libpostal/share\nmake -j4\nsudo make install\necho \"/opt/libpostal/lib\" | sudo tee /etc/ld.so.conf.d/libpostal.conf\nsudo ldconfig\necho 'export PKG_CONFIG_PATH=\"$PKG_CONFIG_PATH:/opt/libpostal/lib/pkgconfig\"' | sudo tee /etc/profile.d/libpostal.sh\n```\n\nNow log out and on again or run a new login shell (e.g. 
`bash -l`) and install the missing Python modules:\n\n```bash\nworkon osmgeocoder\nCFLAGS=\"-L/opt/libpostal/lib -I/opt/libpostal/include\" pip install postal\npip install gunicorn\npip install flask\n```\n\n### Run the classifier service\n\n**Source checkout:**\n\n```bash\npipenv run bin/postal_service.py --config config/config.json\n```\n\n**PIP install:**\n\n```bash\n/path/to/virtualenv/bin/postal_service.py --config config.json\n```\n\nAttention: Depending on the speed of your disk, startup of this service may take some seconds\n(this is why this is implemented as a service) and it will take about 2 GB of RAM, so be warned!\n\n\nIf you want to run it in production mode just run it with `gunicorn` directly.\nSee the [Gunicorn documentation](http://docs.gunicorn.org/en/latest/settings.html) for further information.\nA simple example follows (one worker, run as daemon, bind to 127.0.0.1:3200):\n\n```bash\npipenv run gunicorn postal_service:app \\\n    --bind 127.0.0.1:3200 \\\n    --workers 1 \\\n    --pid /var/run/postal_service.pid \\\n    --log-file /var/log/postal_service.log \\\n    --daemon\n```\n\n**Attention**: Every worker takes that 2 GB RAM toll!\n\n## Running a HTTP geocoding service\n\nThe file `geocoder_service.py` is a simple Flask app to present the geocoder as a HTTP service.\n\n### Installation\n\n```bash\npipenv run pip install gunicorn\npipenv run pip install flask\n```\n\nYou will need a working config file too.\n\n### Run the service\n\nThe service will search for a config file in the following places:\n- `~/.osmgeocoderrc`\n- `~/.config/osmgeocoder.json`\n- `/etc/osmgeocoder.json`\n- `osmgeocoder.json`\n\nYou can override the path by setting the environment variable `GEOCODER_CONFIG`.\n\nGunicorn example:\n\n```bash\npipenv run gunicorn geocoder_service:app \\\n    --env 'GEOCODER_CONFIG=config/config.json' \\\n    --bind 127.0.0.1:8080 \\\n    --workers 4 \\\n    --pid /var/run/osmgeocoder_service.pid \\\n    --log-file 
/var/log/osmgeocoder_service.log \\\n    --daemon\n```\n\n### Defined API-Endpoints\n\n#### Forward geocoding\n\nAddress string to coordinate.\n\n- Endpoint `/forward`\n- Method `POST`\n- Content-Type `application/json`\n- Body:\n    - `address`: (required) User input / address to convert to coordinates\n    - `center`: (optional) Array with center coordinate to sort matches\n    - `country`: (optional) ISO Country code, use only if no center coordinate is available as it slows down the geocoder massively.\n- Response: Array of objects\n    - `address`: Fully written address line, formatted by country standards\n    - `lat`: Latitude\n    - `lon`: Longitude\n    - `license`: License attribution string\n\n#### Reverse geocoding\n\nCoordinate to address string.\n\n- Endpoint `/reverse`\n- Method `POST`\n- Content-Type `application/json`\n- Body:\n    - `lat`: Latitude\n    - `lon`: Longitude\n- Response: Object\n    - `address`: Nearest address to the point (building search) or `null`, formatted by country standards\n    - `license`: License attribution string\n\n#### Predictive text\n\nIntelligent text completion while typing.\n\n- Endpoint `/predict`\n- Method `POST`\n- Content-Type `application/json`\n- Body:\n    - `query`: User input\n- Response: Object\n    - `predictions`: Up to 10 text predictions, sorted by equality and most common first\n\n\n## Config file\n\nExample:\n\n```json\n{\n  \"db\": {\n    \"dbname\": \"osm\",\n    \"user\": \"osm\",\n    \"password\": \"password\"\n  },\n  \"opencage_data_file\": \"data/worldwide.yml\",\n  \"postal\": {\n    \"service_url\": \"http://localhost:3200/\",\n    \"port\": 3200\n  }\n}\n```\n\nKeys:\n\n- `db`: Database configuration this will be built into a [Postgres connection string](https://www.postgresql.org/docs/current/static/libpq-connect.html#id-1.7.3.8.3.5)\n- `postal` -\u003e `service_url`: (optional) URL where to find the libpostal service, if not supplied searching is reduced to street names only\n- 
`postal` -\u003e `port`: (optional) only used when running the libpostal service directly without explicitly using gunicorn\n- `opencage_data_file`: (optional) Data file for the address formatter, defaults to the one included in the package\n\n## API documentation\n\nThe complete project contains actually only two classes:\n\n### `Geocoder`.\n\nPublicly accessible method prototypes are:\n\n```python\ndef __init__(self, db=None, db_handle=None, address_formatter_config=None, postal=None):\n    pass\n\ndef forward(self, address, country=None, center=None):\n    pass\n\ndef forward_dict(self, address, country=None, center=None):\n    pass\n\ndef forward_structured(self, road=None, house_number=None, postcode=None, city=None, country=None, center=None):\n    pass\n\ndef forward_structured_dict(self, road=None, house_number=None, postcode=None, city=None, country=None, center=None):\n    pass\n\ndef reverse(self, lat, lon, radius=100, limit=10):\n    pass\n\ndef reverse_dict(self, lat, lon, radius=100, limit=10):\n    pass\n\ndef reverse_epsg3857(self, x, y, radius=100, limit=10):\n    pass\n\ndef reverse_epsg3857_dict(self, x, y, radius=100, limit=10):\n    pass\n\ndef predict_text(self, input):\n    pass\n```\n\n#### `__init__`\n\nInitialize a geocoder, this will read all files to be used and set up the DB connection.\n- `db`: Dictionary with DB config, when used the geocoder will create a DB-connection on its own\n- `db_handle`: Postgres connection, use this if the connection is handled outside the scope of the geocoder (for example when you want to use the geocoder in Django)\n- `address_formatter_config`: Path to the `worldwide.yaml` (optional)\n- `postal`: Dictionary with postal config (at least `service_url` key)\n\nsee __Config File__ above for more info.\n\n#### `forward` and `forward_dict`\n\nGeocode an address to a lat, lon location.\n- `address`: Address to code\n- `country`: (optional) Country code to restrict search and format address\n- `center`: 
(optional) Center coordinate to sort results for (will be used to determine country too, so you can skip the `country` flag)\n\nThis function is a generator which `yield`s the obtained results.\n\n#### `forward_structured` and `forward_structured_dict`\n\nGeocode an address to a lat, lon location without using the address classifier, use this if your input is already structured.\n- `road`: (optional) Street/Road name\n- `house_number`: (optional) House number, this is a string because of things like `1a`\n- `postcode`: (optional) Post code, this is a string because not all countries use purely numeric codes and some codes have leading zeros\n- `city`: (optional) City\n- `country`: (optional) Country code to restrict search and format address\n- `center`: (optional) Center coordinate to sort results for (will be used to determine country too, so you can skip the `country` flag)\n\nBe sure that at least one of `road`, `postcode` or `city` is filled, results are not predictable if none is set.\nThis function is a generator which `yield`s the obtained results.\n\n#### `reverse` and `reverse_dict`\n\nGeocode a lat, lon location into a readable address:\n- `lat`: Latitude to code\n- `lon`: Longitude to code\n- `radius`: Search radius in meters\n- `limit`: (optional) maximum number of results to return\n\nThis function is a generator which `yield`s the obtained results.\n\n#### `reverse_epsg3857` and `reverse_epsg3857_dict`\n\nGeocode an x, y location in EPSG 3857 projection (aka Web Mercator) into a readable address:\n- `x`: X coordinate\n- `y`: Y coordinate\n- `radius`: Search radius in meters\n- `limit`: (optional) maximum number of results to return\n\nUse this function if you're using Web Mercator in your application internally to avoid constant re-projection between lat, lon and x, y.\nThis function is a generator which `yield`s the obtained results.\n\n#### `predict_text`\n\nReturn possible text prediction results for the user input. 
This could be used while the user is typing their query to reduce the load on the database (by avoiding typos and running fewer requests against the geocoder because the user skips over typing long words one character at a time).\n- `input`: User input\n\nThis function is a generator which `yield`s the obtained results.\n\n**ATTENTION**: Do not feed complete \"sentences\" into this function as it will not yield the expected result. Tokenize into words on the client side and only request predictions for the current word the user is editing.\n\n\n### `AddressFormatter`\n\nPublicly accessible method prototypes are:\n\n```python\ndef __init__(self, config=None):\n    pass\n\ndef format(self, address, country=None):\n    pass\n```\n\n#### `__init__`\n\nInitialize the address formatter\n- `config`: (optional) override default config file to use for the address formatter, defaults to config file included in this package\n\n#### `format`\n\nFormat an address in the default layout used in the specified country. Return value may contain line breaks.\n- `address`: Dictionary that contains the address parts, see below for recognized keys\n- `country`: Country code of the formatting template to use\n\nRecognized keys in `address`:\n- `attention`\n- `house`\n- `road`\n- `house_number`\n- `postcode`\n- `city`\n- `town`\n- `village`\n- `county`\n- `state`\n- `country`\n- `suburb`\n- `city_district`\n- `state_district`\n- `state_code`\n- `neighbourhood`\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdunkelstern%2Fosmgeocoder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdunkelstern%2Fosmgeocoder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdunkelstern%2Fosmgeocoder/lists"}