{"id":16966364,"url":"https://github.com/mar-muel/local-geocode","last_synced_at":"2025-03-22T14:31:03.787Z","repository":{"id":62576589,"uuid":"228201333","full_name":"mar-muel/local-geocode","owner":"mar-muel","description":"Simple library for efficient geocoding without making API calls","archived":false,"fork":false,"pushed_at":"2024-02-12T16:22:32.000Z","size":89,"stargazers_count":23,"open_issues_count":7,"forks_count":5,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-18T11:52:02.046Z","etag":null,"topics":["countries","geocode","geocoding","geolocation","geonames","geoparser","parser","twitter"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mar-muel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-15T14:54:12.000Z","updated_at":"2025-03-15T15:52:32.000Z","dependencies_parsed_at":"2024-10-28T13:21:54.002Z","dependency_job_id":"c7d64ae1-55b1-4933-ba43-fb650945bbd1","html_url":"https://github.com/mar-muel/local-geocode","commit_stats":{"total_commits":68,"total_committers":3,"mean_commits":"22.666666666666668","dds":0.02941176470588236,"last_synced_commit":"b8eb1a5c616ee186601c2ed0d663a3019c5349d1"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mar-muel%2Flocal-geocode","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mar-muel%2Flocal-geocode/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/mar-muel%2Flocal-geocode/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mar-muel%2Flocal-geocode/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mar-muel","download_url":"https://codeload.github.com/mar-muel/local-geocode/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244971804,"owners_count":20540860,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["countries","geocode","geocoding","geolocation","geonames","geoparser","parser","twitter"],"created_at":"2024-10-14T00:05:35.725Z","updated_at":"2025-03-22T14:31:03.484Z","avatar_url":"https://github.com/mar-muel.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Local-geocode :earth_americas:\n\nThis is a very simple geocoding library which runs fully locally (without calling any APIs) and has therefore no limits in terms of processing. It runs very fast due to using an efficient in-memory datastructure called [Flashtext](https://github.com/vi3k6i5/flashtext). It uses data from http://www.geonames.org/.\n\nThis project is mainly used in the context of decoding data from the \"user.location\" field of tweets but it can in principle be used on any address/location raw text field. Note that if you need very precise geographical information it is better to use one of the many available APIs. 
By default this repo only detects places with more than 30k inhabitants.\n\nI have compared the predictions of local-geocode with those of geopy for 500 Twitter user locations. Local-geocode performs significantly better (85% accuracy) than geopy (64% accuracy) for this use case. Read more about the benchmark [here](benchmark/benchmark.md).\n\n# Install\n```\npip install local-geocode\n```\n\n# Example usage\nLocal-geocode can parse arbitrary location names in many languages, as well as numerous alternative place names, and returns geographic information.\n\n```python\nfrom geocode.geocode import Geocode\n\ngc = Geocode()\ngc.load()  # load geonames data\n\nmydata = ['Tel Aviv', 'Mangalore 🇮🇳']\n\nfor input_text in mydata:\n    locations = gc.decode(input_text)\n    print(locations)\n\n[\n    {\n        \"name\": \"Tel Aviv\",\n        \"official_name\": \"Tel Aviv\",\n        \"country_code\": \"IL\",\n        \"longitude\": 34.780570000000004,\n        \"latitude\": 32.08088,\n        \"geoname_id\": \"293397\",\n        \"location_type\": \"city\",\n        \"population\": 432892\n    }\n]\n[\n    {\n        \"name\": \"Mangalore\",\n        \"official_name\": \"Mangalore\",\n        \"country_code\": \"IN\",\n        \"longitude\": 74.85603,\n        \"latitude\": 12.91723,\n        \"geoname_id\": \"1263780\",\n        \"location_type\": \"city\",\n        \"population\": 417387\n    },\n    {\n        \"name\": \"\ud83c\uddee\ud83c\uddf3\",\n        \"official_name\": \"Republic of India\",\n        \"country_code\": \"IN\",\n        \"longitude\": 79.0,\n        \"latitude\": 22.0,\n        \"geoname_id\": \"1269750\",\n        \"location_type\": \"country\",\n        \"population\": 1352617328\n    }\n]\n```\n\n# Usage\nThe easiest way to integrate `local-geocode` into your project is to run `pip install local-geocode`. You can also clone this repository and copy the folder `geocode` into your project. 
\n\n## Configuration\nWhen installed with pip, local-geocode comes packaged with 2 pickle files which were generated using the default configuration. You can however change the configuration and then re-compute the pickle files for your needs.\n\nThe `Geocode()` initializer accepts the following arguments:\n* `min_population_cutoff` (default: 30k): Places below this population size are excluded\n* `large_city_population_cutoff` (default: 200k): Cities with a population size larger than this will be prioritized. Example: \"Los Angeles, USA\" will result in \"Los Angeles\" as the first result, and not \"USA\".\n* `location_types`: Provide a list of location types which you would like to filter. By default it uses all location types (i.e. `['city', 'place', 'country', 'admin1', 'admin2', 'admin3', 'admin4', 'admin5', 'admin6', 'admin_other', 'continent', 'region']`).\n\nExample:\n```python\nfrom geocode.geocode import Geocode\n\ngc = Geocode(min_population_cutoff=100000)\ngc.load()  # downloads geonames data (~1.2GB), parses data, generates pickle files in \u003cpackage folder\u003e/geocode/data for new configuration\n```\n(This may take 1-2min to run)\n\n\n## Prioritization\nIf multiple locations are detected in an input string, local-geocode sorts the output by the following prioritization:\n1. Large cities (`population size \u003e large_city_population_cutoff`)\n2. States/provinces (admin level 1)\n3. Countries\n4. Places (`population size \u003c= large_city_population_cutoff`)\n5. Counties (admin levels \u003e 1)\n6. Continents\n7. 
Regions\n\n## Parallelized\nIf you have a large number of texts to decode, consider using `decode_parallel`, which runs `decode` in parallel:\n```python\nfrom geocode.geocode import Geocode\n\ngc = Geocode()\ngc.load()  # load geonames data\n\n# in practice, a large number of items\nmydata = ['Tel Aviv', 'Mangalore 🇮🇳']\nnum_cpus = None  # by default, use all CPUs\n\nlocations = gc.decode_parallel(mydata, num_cpus=num_cpus)\nprint(locations)\n```\n\n# Contact\nPlease open an issue if you run into problems!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmar-muel%2Flocal-geocode","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmar-muel%2Flocal-geocode","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmar-muel%2Flocal-geocode/lists"}