https://github.com/alexprengere/neobase
Minimalist GeoBases: single file, no dependency, compatible with Python 2.6+, Python 3.x, Pypy
- Host: GitHub
- URL: https://github.com/alexprengere/neobase
- Owner: alexprengere
- License: apache-2.0
- Created: 2015-11-25T15:24:44.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2024-05-28T09:19:48.000Z (about 2 years ago)
- Last Synced: 2024-05-29T00:37:25.123Z (about 2 years ago)
- Topics: geography, map, python, web
- Language: Python
- Homepage:
- Size: 71.5 MB
- Stars: 17
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE

NeoBase |actions|_ |cratev|_ |crated|_
======================================

.. _actions : https://github.com/alexprengere/neobase/actions/workflows/python-package.yml
.. |actions| image:: https://github.com/alexprengere/neobase/actions/workflows/python-package.yml/badge.svg
.. _cratev : https://pypi.org/project/NeoBase/
.. |cratev| image:: https://img.shields.io/pypi/v/neobase.svg
.. _crated : https://pypi.org/project/NeoBase/
.. |crated| image:: https://static.pepy.tech/badge/neobase

Minimalist `GeoBases `__
implementation:

- no dependencies
- compatible with Python 3.9+, CPython and PyPy
- one data source:
  `opentraveldata `__
- one Python module for easier distribution on clusters (like Hadoop)
- faster load time (5x)
- tested with pytest and tox

.. code:: python

    >>> from neobase import NeoBase
    >>> b = NeoBase()
    >>> b.get('ORY', 'city_code_list')
    ['PAR']
    >>> b.get('ORY', 'city_name_list')
    ['Paris']
    >>> b.get('ORY', 'country_code')
    'FR'
    >>> b.distance('ORY', 'CDG')
    34.87...
    >>> b.get_location('ORY')
    LatLng(lat=48.72..., lng=2.35...)

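
The ``distance`` call above returns a great-circle distance in kilometres. As a rough illustration, the sketch below computes such a distance with the haversine formula, assuming a spherical Earth with a mean radius of 6371 km. The ``haversine`` helper is hypothetical and NeoBase's own implementation may use different constants, so values can differ slightly.

.. code:: python

    from math import asin, cos, radians, sin, sqrt

    EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


    def haversine(lat0, lng0, lat1, lng1):
        """Great-circle distance in km between two points given in degrees."""
        phi0, phi1 = radians(lat0), radians(lat1)
        dphi = phi1 - phi0
        dlmb = radians(lng1 - lng0)
        a = sin(dphi / 2) ** 2 + cos(phi0) * cos(phi1) * sin(dlmb / 2) ** 2
        # min() guards against rounding pushing sqrt(a) slightly above 1
        return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))

For example, ``haversine(0.0, 0.0, 0.0, 1.0)`` is about 111.19 km, the length of one degree of longitude at the equator on this sphere.
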
Installation
------------

Use the Python package:

.. code:: bash

    pip install neobase

Docs
----

Check out `readthedocs `__ for the API documentation.

You can provide your own source data when initializing:

.. code:: python

    from neobase import NeoBase

    with open("file.csv") as f:
        N = NeoBase(f)

Otherwise, the embedded data file is loaded, unless the ``OPTD_POR_FILE`` environment variable is set, in which case the file at that path is loaded instead.

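
The precedence described above (explicit file first, then ``OPTD_POR_FILE``, then the embedded data) can be sketched as follows. The ``resolve_por_source`` helper and its ``embedded_path`` argument are hypothetical and only illustrate the lookup order; they are not part of NeoBase's API.

.. code:: python

    import os


    def resolve_por_source(embedded_path, file_obj=None):
        """Pick the data source: explicit file, then OPTD_POR_FILE, then embedded file."""
        if file_obj is not None:
            return file_obj  # data passed to the constructor always wins
        env_path = os.environ.get("OPTD_POR_FILE")
        if env_path:
            return env_path  # path taken from the environment variable
        return embedded_path  # fall back to the file shipped with the package

With ``OPTD_POR_FILE`` unset and no file object, ``resolve_por_source("embedded.csv")`` simply returns ``"embedded.csv"``.
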
You can also download the latest data source yourself, but you are then exposed to any breaking changes in the upstream data:

.. code:: python

    from io import StringIO
    from urllib.request import urlopen
    from neobase import NeoBase, OPTD_POR_URL

    data = urlopen(OPTD_POR_URL).read().decode('utf8')
    N = NeoBase(StringIO(data))
    N.get("PAR")

You can also change the reference date used to determine which points are valid:

.. code:: python

    N = NeoBase(date="2000-01-01")
    N.get("AIY")  # was decommissioned in 2015

By default, the reference date is today, unless the ``OPTD_POR_DATE`` environment variable is set, in which case its value is used.

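
Conceptually, the reference date selects the points that were valid on that day. The sketch below assumes each point has optional ``date_from`` / ``date_until`` bounds stored as ``YYYY-MM-DD`` strings, where an empty string means unbounded and both bounds are inclusive; the helper names and these assumptions are illustrative, not NeoBase internals. Since ISO dates in this format sort chronologically, plain string comparison is enough.

.. code:: python

    import os
    from datetime import date


    def reference_date():
        """Return OPTD_POR_DATE if set, else today's date, as YYYY-MM-DD."""
        return os.environ.get("OPTD_POR_DATE") or date.today().isoformat()


    def is_valid_on(date_from, date_until, ref):
        """True if ref lies within [date_from, date_until]; '' means unbounded."""
        if date_from and ref < date_from:
            return False  # the point did not exist yet
        if date_until and ref > date_until:
            return False  # the point was already decommissioned
        return True

For instance, a point with a made-up ``date_until="2015-06-30"`` is valid for ``ref="2000-01-01"`` but not for ``ref="2020-01-01"``.
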
You can also control how duplicates are handled, i.e. points sharing the same IATA code, such as NCE the airport and NCE the city. By default, all points are kept, but you can keep only the first point for each IATA code:

.. code:: python

    N = NeoBase(duplicates=False)
    len(N)  # about 10,000 "only"

The ``OPTD_POR_DUPLICATES`` environment variable controls this as well: set it to ``0`` to drop duplicates.

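
A minimal sketch of the "first point wins" rule, together with the ``OPTD_POR_DUPLICATES`` convention described above. The helpers and the ``(iata_code, data)`` input shape are hypothetical; NeoBase's actual bookkeeping for duplicate keys may differ.

.. code:: python

    import os


    def keep_duplicates():
        """Duplicates are kept unless OPTD_POR_DUPLICATES is set to '0'."""
        return os.environ.get("OPTD_POR_DUPLICATES") != "0"


    def first_per_code(rows):
        """Keep only the first (iata_code, data) pair seen for each IATA code."""
        seen = set()
        kept = []
        for code, data in rows:
            if code not in seen:  # later points with the same code are dropped
                seen.add(code)
                kept.append((code, data))
        return kept

For example, ``first_per_code([("NCE", "airport"), ("NCE", "city")])`` keeps only ``("NCE", "airport")``.
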
Finally, you can customize the loaded fields by subclassing:

.. code:: python

    from neobase import NeoBase

    class SubNeoBase(NeoBase):
        KEY = 0  # iata_code
        # These are the default loaded fields: (field name, column index, converter or None)
        FIELDS = (
            ("name", 6, None),
            ("lat", 8, None),
            ("lng", 9, None),
            ("page_rank", 12, lambda s: float(s) if s else None),
            ("country_code", 16, None),
            ("country_name", 18, None),
            ("continent_name", 19, None),
            ("timezone", 31, None),
            ("city_code_list", 36, lambda s: s.split(",")),
            ("city_name_list", 37, lambda s: s.split("=")),
            ("location_type", 41, None),
            ("currency", 46, None),
        )

    N = SubNeoBase()

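
Judging from the defaults above, each ``FIELDS`` entry is a ``(field_name, column_index, converter)`` triple, where ``None`` keeps the raw string. The sketch below applies such a tuple to one line of data; it assumes a ``^``-delimited file, as used by the OpenTravelData POR files, and uses toy column indices and made-up values. The ``parse_row`` helper is hypothetical and NeoBase's own loader may differ.

.. code:: python

    # Toy FIELDS with small column indices (the real ones point into the OPTD file)
    FIELDS = (
        ("name", 1, None),
        ("page_rank", 2, lambda s: float(s) if s else None),
        ("city_code_list", 3, lambda s: s.split(",")),
    )


    def parse_row(line, fields, key_index=0, delimiter="^"):
        """Build (key, record) from one delimited line and a FIELDS-like tuple."""
        columns = line.rstrip("\n").split(delimiter)
        record = {}
        for name, index, converter in fields:
            raw = columns[index]
            # A converter of None keeps the raw string unchanged
            record[name] = raw if converter is None else converter(raw)
        return columns[key_index], record


    key, record = parse_row("ORY^Paris Orly^0.5^PAR", FIELDS)

Here ``key`` is ``"ORY"`` and ``record["city_code_list"]`` is ``["PAR"]``; an empty page rank column would yield ``None`` thanks to its converter.
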
Command-line interface
----------------------

You can query the data using:

.. code:: bash

    python -m neobase PAR NCE

Tests
-----

.. code:: bash

    tox

A note about performance
------------------------

The geographical operations like ``N.find_near("ORY", 100)`` or ``N.find_closest_from("ORY")`` perform a full scan of the data, and are not optimized (remember that this library has no dependencies by design).

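
To make the cost concrete, a full scan boils down to the sketch below: each query computes the distance to every point, so it is O(n) per call. The ``find_near_linear`` helper and its ``{code: (lat, lng)}`` input are hypothetical, and the haversine distance with a 6371 km Earth radius is an assumption; NeoBase's ``find_near`` may differ in signature and output.

.. code:: python

    from math import asin, cos, radians, sin, sqrt

    EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


    def haversine(lat0, lng0, lat1, lng1):
        """Great-circle distance in km between two points given in degrees."""
        phi0, phi1 = radians(lat0), radians(lat1)
        a = (sin((phi1 - phi0) / 2) ** 2
             + cos(phi0) * cos(phi1) * sin(radians(lng1 - lng0) / 2) ** 2)
        return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


    def find_near_linear(points, lat, lng, radius_km):
        """Return sorted (distance_km, code) pairs within radius_km of (lat, lng)."""
        hits = []
        for code, (plat, plng) in points.items():  # full scan: O(n) per query
            d = haversine(lat, lng, plat, plng)
            if d <= radius_km:
                hits.append((d, code))
        return sorted(hits)

Every call touches all points, which is why a spatial index pays off when many queries are made.
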
If you want a more efficient solution, you should use a spatial index like a *BallTree*, for example using `scikit-learn `__:

.. code:: python

    import numpy as np
    from sklearn.neighbors import BallTree
    from neobase import NeoBase

    N = NeoBase()

    # Collect IATA codes and their (lat, lng) coordinates in radians
    iata_codes = []
    coords = []
    for key in N:
        lat, lon = N.get_location(key)
        if lat is not None and lon is not None:
            iata_codes.append(N.get(key, "iata_code"))
            coords.append([np.radians(lat), np.radians(lon)])

    coords = np.array(coords)
    # The haversine metric expects [lat, lng] pairs in radians
    tree = BallTree(coords, metric="haversine")

    def find_closest_with_balltree(coord):
        point = np.radians(coord)
        _, idx = tree.query([point], k=1)  # indices of the nearest point
        iata_code = iata_codes[idx[0][0]]
        return iata_code

    paris = (48.8566, 2.3522)
    print(find_closest_with_balltree(paris))  # <0.1ms
    print(list(N.find_closest_from_location(paris)))  # ~30ms