https://github.com/huels-originals/geobatchpy
Python client for Geoapify API
address geocoding openstreetmap python
- Host: GitHub
- URL: https://github.com/huels-originals/geobatchpy
- Owner: huels-originals
- License: mit
- Created: 2022-09-10T19:26:57.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-06-03T08:45:19.000Z (over 1 year ago)
- Last Synced: 2024-09-23T19:47:39.126Z (3 months ago)
- Topics: address, geocoding, openstreetmap, python
- Language: Python
- Homepage:
- Size: 186 KB
- Stars: 14
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# A CLI and Python Client for the Geoapify API
We have been using the Geoapify API to **geocode millions of location records** for data validation and analytics. We built
this package to make this process comfortable using Python and the command line.

Why Geoapify and this package may also be a good fit for you:
- You need to batch process large numbers of location records (geocode, reverse geocode, places & details).
- The license must support commercial use without restrictions.
- It needs to be cheap (or even free if you don't need more than 6k addresses per day).

Sign up at [geoapify.com](https://geoapify.com/) and start with their free plan of 3k credits per day, which translates
to up to 6k address geocodings.

## Install our package with `pip`
This package is available on the public PyPI:
```shell
pip install geobatchpy
```

## Examples
See our documentation at [geobatchpy.readthedocs.io](https://geobatchpy.readthedocs.io/en/latest/) for a growing number of
comprehensive example use cases. Below we illustrate both the Python API and the CLI with a tiny batch geocoding
example.

### A simple batch geocoding example using the Python API

Below we geocode multiple addresses in a single batch. There are two ways to provide the location data as input:
either as a list of strings, one per address, which are treated as free text searches, or as structured input in
the form of a list of dictionaries, again one per address. See the
[Geoapify API documentation](https://apidocs.geoapify.com/) for a complete list of address attributes accepted by the
geocoding services. Use the optional `parameters` dictionary to set attributes that all your addresses have in
common. For example, below we request results in French.

```python
from geobatchpy import Client

client = Client(api_key='')  # pass your Geoapify API key here
addresses = ['Hülser Markt 1, 47839 Krefeld',
             'DB Schenker, Essen, Germany',
             'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen']

# see the geoapify.com API docs for more optional parameters
res = client.batch.geocode(locations=addresses, parameters={'lang': 'fr'}, simplify_output=True)
```

Alternatively, you can provide a list of dictionaries, with every address in structured form. If you still need
the free text search for some addresses, use the `'text'` attribute. Here is the same example, with the first
two addresses translated to structured form:

```python
addresses = [{'city': 'Krefeld', 'street': 'Hülser Markt', 'housenumber': 1, 'postcode': '47839'},
             {'name': 'DB Schenker', 'city': 'Essen', 'country': 'Germany'},
             {'text': 'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen'}]
```

```python
# Showing the first of three result sets: res[0]
{
    "query": {
        "text": "Hülser Markt 1, 47839 Krefeld",
        "parsed": {
            "housenumber": "1",
            "street": "hülser markt",
            "postcode": "47839",
            "city": "krefeld",
            "expected_type": "building",
        },
    },
    "datasource": {
        "sourcename": "openstreetmap",
        "attribution": "© OpenStreetMap contributors",
        "license": "Open Database License",
        "url": "https://www.openstreetmap.org/copyright",
    },
    "name": "Metzgerei Etteldorf",
    "housenumber": "1",
    "street": "Hülser Markt",
    "suburb": "Hüls",
    "city": "Krefeld",
    "state": "Rhénanie-du-Nord-Westphalie",
    "postcode": "47839",
    "country": "Allemagne",
    "country_code": "de",
    "lon": 6.510696417033254,
    "lat": 51.373026800000005,
    "formatted": "Metzgerei Etteldorf, Hülser Markt 1, 47839 Krefeld, Allemagne",
    "address_line1": "Metzgerei Etteldorf",
    "address_line2": "Hülser Markt 1, 47839 Krefeld, Allemagne",
    "category": "commercial.food_and_drink.butcher",
    "result_type": "amenity",
    "rank": {
        "importance": 0.31100000000000005,
        "popularity": 5.585340759145855,
        "confidence": 1,
        "confidence_city_level": 1,
        "confidence_street_level": 1,
        "match_type": "inner_part",
    },
    "place_id": "516b5e6500f40a1a40590a449957bfaf4940f00102f9010ecff70d00000000c002019203134d65747a676572656920457474656c646f7266",
}
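
# A minimal sketch, not part of geobatchpy: reduce one simplified result to the
# fields most analyses need. `to_row` is a hypothetical helper; the keys are the
# ones shown in the sample above, and .get() yields None when a key is missing.
def to_row(result: dict) -> dict:
    rank = result.get('rank', {})
    return {
        'formatted': result.get('formatted'),
        'lat': result.get('lat'),
        'lon': result.get('lon'),
        'confidence': rank.get('confidence'),
    }

# Usage, assuming `res` is the list returned by client.batch.geocode:
# rows = [to_row(r) for r in res]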
```

### The same batch geocoding example using the CLI
We built the `geobatch` command line interface to make batch processing large numbers of records more comfortable.
Steps:
1. Prepare a JSON file as input.
2. Use `geobatch submit` to submit one or more jobs to the Geoapify servers.
3. Use `geobatch receive` to monitor progress and retrieve the results.

```python
# Step 1 - written in Python:
from geobatchpy.batch import parse_geocoding_inputs
from geobatchpy.utils import write_data_to_json_file

addresses = ['Hülser Markt 1, 47839 Krefeld',
             'DB Schenker, Essen, Germany',
             'JCI Beteiligungs GmbH, Am Schimmersfeld 5, Ratingen']

data = {
    'api': '/v1/geocode/search',  # see the Geoapify API docs for other APIs that work with batch processing
    'inputs': parse_geocoding_inputs(locations=addresses),
    'batch_len': 2,  # optional - will put first two addresses in batch 1, last address in batch 2
    'id': 'my-batch-geocoding-job'  # optional - a reference which will be reused in the output file
}

write_data_to_json_file(data=data, file_path='')  # set file_path to where the input JSON should be written
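
# Illustration only (a hypothetical helper, not geobatchpy's internal code): how a
# batch_len of 2 would partition three inputs into consecutive batches.
def split_into_batches(items, batch_len):
    """Return consecutive slices of at most batch_len items."""
    return [items[i:i + batch_len] for i in range(0, len(items), batch_len)]

example_batches = split_into_batches(['address 1', 'address 2', 'address 3'], batch_len=2)
# example_batches == [['address 1', 'address 2'], ['address 3']]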
```

The following command submits one or more jobs and stores job URLs to disk. Those URLs are required to monitor
and retrieve results.

```shell
geobatch submit --api-key
```

You can omit the `--api-key` option if you set the `GEOAPIFY_KEY` environment variable. Next we start monitoring
progress:

```shell
geobatch receive --api-key
```

We can abort the monitoring at any time and restart later, provided the jobs are still in the cache of the
Geoapify servers (24 hours).

## References and further reading
- [geoapify.com API documentation](https://apidocs.geoapify.com/)
- [Towards Data Science - Deduplicate and clean-up millions of location records](https://towardsdatascience.com/deduplicate-and-clean-up-millions-of-location-records-abcffb308ebf)