{"id":29702574,"url":"https://github.com/devgateway/geocoder-ie","last_synced_at":"2026-02-18T01:02:13.059Z","repository":{"id":40744159,"uuid":"94159564","full_name":"devgateway/geocoder-ie","owner":"devgateway","description":"automatically  geocode aid projects by applying natural language processing techniques ","archived":false,"fork":false,"pushed_at":"2022-12-08T00:53:46.000Z","size":13784,"stargazers_count":4,"open_issues_count":33,"forks_count":0,"subscribers_count":15,"default_branch":"master","last_synced_at":"2024-04-15T15:31:33.646Z","etag":null,"topics":["classifier","classifier-training","geocoder","machine-learning"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devgateway.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-06-13T02:19:33.000Z","updated_at":"2020-09-14T10:39:40.000Z","dependencies_parsed_at":"2023-01-25T01:45:43.464Z","dependency_job_id":null,"html_url":"https://github.com/devgateway/geocoder-ie","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/devgateway/geocoder-ie","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devgateway%2Fgeocoder-ie","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devgateway%2Fgeocoder-ie/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devgateway%2Fgeocoder-ie/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devgateway%2Fgeocoder-ie/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devg
ateway","download_url":"https://codeload.github.com/devgateway/geocoder-ie/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devgateway%2Fgeocoder-ie/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266680690,"owners_count":23967795,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-23T02:00:09.312Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classifier","classifier-training","geocoder","machine-learning"],"created_at":"2025-07-23T12:39:41.054Z","updated_at":"2026-02-18T01:02:10.987Z","avatar_url":"https://github.com/devgateway.png","language":"Python","readme":"\nAutoGeocoder\n=====\nAutoGeocoder reads through text provided in various document formats (PDF, DOC, TXT) to identify activity locations, and then produces a final list of georeferenced location names. The tool is written entirely in Python 3 and combines well-known tools and libraries such as NLTK and scikit-learn.\n\n\n## Installation steps\n### Requirements\n- Python 3.\n- Anaconda or pip (python-devel is required).\n\n### Installation steps\n1. Install the dependencies using pip or Anaconda\n```\n(Linux pip)\npip install -r requirements.txt\n\n(Anaconda)\nconda install --yes --file requirements.txt\n```\n2. Download Stanford NER from https://nlp.stanford.edu/software/CRF-NER.shtml#Download\n3. 
Start the Stanford NER server\n\n```\njava -mx400m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 9094\n```\n4. Run `python setup.py` to download the NLTK data.\n5. Update the Stanford settings in geocoder.ini\n```\n[standford]\nhost = localhost\nport = 9094\n```\n\n## Using the tool\n\n### Command line help\n\n```\ngeocoder.sh --help\n\nusage: main.py [-h] [-c {geocode,download,generate,train}]\n               [-f FILE [FILE ...]] [-p ORGANISATION] [-t COUNTRIES]\n               [-l LIMIT] [-n NAME]\n\nAuto-geocode activity projects\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -c {geocode,download,generate,train}, --command {geocode,download,generate,train}\n                        Use geocode to auto-geocode file. Use download to get\n                        raw data from IATI registry. Use generate to generate\n                        a corpora. Use train to train a new classifier\n  -f FILE [FILE ...], --file FILE [FILE ...]\n                        Use together with -c geocode to pass a file to\n                        process, The file can be a IATI activities file, pdf\n                        document, odt document or txt file\n  -p ORGANISATION, --publisher ORGANISATION\n                        Use together with -c download to download data of\n                        specific IATI Publisher\n  -t COUNTRIES, --countries COUNTRIES\n                        Use together with -c geocode to filter geonames search\n                        Use together with -c download to download data of\n                        specific countries\n  -l LIMIT, --limit LIMIT\n                        Use together with -c download to limit the Number of\n                        activities to download for each publisher/country\n  -n NAME, --name NAME  set the new classifier name\n  -o {xml,tsv,json}, --output {xml,tsv,json}\n                        Set output format, default 
json\n```\n\n```\ngeocoder.sh -f example.pdf -tGN\nexample.pdf will be geocoded\n(geocoder) C:\\projects\\clean_copy\u003egeocoder.sh  -f example.pdf -tGN -otsv\nexample.pdf will be geocoded\n2017-09-07 10:17:04,637 root         INFO     Detecting document language\n2017-09-07 10:17:05,152 root         INFO     Splitting document in sentences\n\nReading pdf pages  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30\n\n2017-09-07 10:17:08,970 root         INFO     There are 380 sentences to process, split took 4.321118570914758ms\n2017-09-07 10:17:08,985 root         INFO     80 geographical sentences found\n2017-09-07 10:17:10,080 root         INFO     NER extraction took 1.0949942523466651ms\n2017-09-07 10:17:11,106 root         INFO     fouta djallon was geocode as RGN with coordinates 11.5,-12.5\n2017-09-07 10:17:11,666 root         INFO     gaoual was geocode as ADM2 with coordinates 11.75,-13.2\n2017-09-07 10:17:12,241 root         INFO     koundara was geocode as ADM2 with coordinates 12.41667,-13.16667\n2017-09-07 10:17:12,921 root         INFO     middle guinea was geocode as RGN with coordinates 11,-12.5\n2017-09-07 10:17:13,469 root         INFO     Wasn't able to geocode  niger\n2017-09-07 10:17:13,470 root         INFO     Let's try using others parameters\n2017-09-07 10:17:14,029 root         INFO     niger was geocode as PRK with coordinates 10.5,-10.2\n2017-09-07 10:17:14,572 root         INFO     guinea was geocode as PCLI with coordinates 10.83333,-10.66667\n2017-09-07 10:17:15,194 root         INFO     republic of guinea was geocode as PCLI with coordinates 10.83333,-10.66667\n2017-09-07 10:17:15,749 root         INFO     conakry was geocode as ADM1 with coordinates 9.60703,-13.597\n2017-09-07 10:17:16,447 root         INFO     koundara prefectures was geocode as ADM2 with coordinates 12.41667,-13.16667\n2017-09-07 10:17:16,447 root         INFO     ua Too short location name\n2017-09-07 10:17:17,034 root         INFO  
   Wasn't able to geocode  atlantic ocean\n2017-09-07 10:17:17,034 root         INFO     Let's try using others parameters\n2017-09-07 10:17:17,613 root         INFO     Wasn't able to geocode  atlantic ocean\n2017-09-07 10:17:18,174 root         INFO     upper guinea was geocode as RGN with coordinates 10.5,-9.5\nResults were saved in out.tsv\n```\n\n\n### Output format (-o)\n\n- xml: supported only when processing an IATI XML file; the output is a copy of the original XML file with the new locations embedded.\n- tsv and json: can be used for any input file.\n\n## Web interface\nAutoGeocoder provides a simple web interface for uploading and geocoding documents and for reviewing the geocoding results together with their related texts.\n\n### Setup\n1. Install PostgreSQL\n2. Create the geocoder database\n```\ncreatedb -Upostgres autogeocoder\n```\n3. Run the SQL script\n```\npsql -Upostgres -dautogeocoder -f sql/geocoder.sql\n```\n4. Configure the database connection in geocoder.ini (db_name must match the database created in step 2)\n```\n[postgres]\nuser_name=postgres\npassword=postgres\nport=5432\nhost=localhost\ndb_name=geocoder\n```\n\n\n5. Configure uWSGI\n\n The uWSGI project aims at developing a full stack for building hosting services.\n For details, see \n \u003ca href=\"https://uwsgi-docs.readthedocs.io/en/latest/\" target=\"_blank\"\u003ehttps://uwsgi-docs.readthedocs.io/en/latest/\u003c/a\u003e\n \n To install uWSGI, run:\n\n```\npip install uwsgi\n```\n \nuwsgi.ini\n```\n[uwsgi]\nproject = autogeocoder\nmodule = wsgi\nwsgi-file=/src/wsgi.py\nmaster = true\nprocesses = 5\nsocket = 0.0.0.0:9095\nprotocol = http\ncallable = app\nchdir=/home/sdimunzio/projects/geocoder-ie/src/\nvirtualenv=/home/sdimunzio/envs/geocoder/\ndie-on-term = true\n```\n\n6. 
Run uWSGI\n```\nuwsgi uwsgi.ini\n```\n\n## Training your own text classifier\nThe text classifier reduces the number of false positives by filtering out paragraphs that should not be passed to the named entity extraction phase. You can train your own classifier so that it learns from your documents.\n\n\n## Classifier Training\nThe default classifier was trained on a small dataset, so training your own text classifier is recommended for better precision.\n\nBefore following the steps below, make sure the AutoGeocoder database is already configured.\n\n\n1. Download IATI data from the IATI registry\n```\n# African Development Bank publisher code is 46002\ngeocoder.sh -c download --publisher=46002 --countries=ALL\n```\n2. Generate the corpora table\n```\ngeocoder.sh -c generate\n```\n3. Go to the web interface and open the training data manager link\n4. Look for sentences that contain geographical information and flag them as Geography\n5. Look for other sentences and flag them as None\n6. Train a new classifier\n```\ngeocoder.sh -c train -n my_classifier\n```\n7. Edit geocoder.ini and change the default classifier name\n```\n[ie]\ndefault_classifier= my_classifier\n```\n8. Geocode your documents\n```\ngeocoder.sh -f mydocument.pdf\n```\n\n\n## AutoGeocoder process workflow\n\n\n![Alt text](workflow-image.png?raw=true \"Workflow\")\n\n## Geocoder Suite Technical Guide\nFor detailed installation instructions, see the technical guide: https://drive.google.com/file/d/1QhGoI_syJq3FqO0lm3Zc7j5ziuWro5TK/view\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevgateway%2Fgeocoder-ie","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevgateway%2Fgeocoder-ie","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevgateway%2Fgeocoder-ie/lists"}