{"id":26228932,"url":"https://github.com/thunderpoot/isogloss","last_synced_at":"2025-09-12T11:43:39.580Z","repository":{"id":220173447,"uuid":"750922855","full_name":"thunderpoot/isogloss","owner":"thunderpoot","description":"ISO 639 and IETF Language Code Lookup Tool","archived":false,"fork":false,"pushed_at":"2024-10-07T18:23:48.000Z","size":1866,"stargazers_count":7,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-19T19:52:36.380Z","etag":null,"topics":["bcp47","command-line","command-line-tool","ietf-language-tag","ietf-language-tags","iso-3166-1","iso639","iso639-1","iso639-2","iso639-3","language-classification","languages","locales","localization","python","shell-script"],"latest_commit_sha":null,"homepage":"https://thunderpoot.github.io/isogloss/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thunderpoot.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-01-31T15:35:48.000Z","updated_at":"2024-10-09T14:48:33.000Z","dependencies_parsed_at":"2025-04-25T16:31:24.350Z","dependency_job_id":null,"html_url":"https://github.com/thunderpoot/isogloss","commit_stats":null,"previous_names":["thunderpoot/isogloss"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/thunderpoot/isogloss","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thunderpoot%2Fisogloss","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thunderpoot%2Fisogloss/tags","releases_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/thunderpoot%2Fisogloss/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thunderpoot%2Fisogloss/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thunderpoot","download_url":"https://codeload.github.com/thunderpoot/isogloss/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thunderpoot%2Fisogloss/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264384304,"owners_count":23599609,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bcp47","command-line","command-line-tool","ietf-language-tag","ietf-language-tags","iso-3166-1","iso639","iso639-1","iso639-2","iso639-3","language-classification","languages","locales","localization","python","shell-script"],"created_at":"2025-03-12T21:34:33.577Z","updated_at":"2025-07-09T03:04:51.557Z","avatar_url":"https://github.com/thunderpoot.png","language":"Python","funding_links":["https://github.com/sponsors/thunderpoot"],"categories":[],"sub_categories":[],"readme":"![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge\u0026logo=python\u0026logoColor=ffdd54)\n![JavaScript](https://img.shields.io/badge/javascript-%23323330.svg?style=for-the-badge\u0026logo=javascript\u0026logoColor=%23F7DF1E)\n\n# 🌐 iso·gloss\n\n![isogloss](isogloss.jpg)\n\n### ISO 639 and IETF Language Code Lookup Tool\n\n`isogloss` is a Python–based command–line tool designed for looking up language details based on [ISO 
639](https://www.iso.org/iso-639-language-code) codes and IETF ([BCP-47](https://www.rfc-editor.org/info/bcp47)) language tags. It provides comprehensive information about languages, including their names, native names, and additional details associated with each code or tag.\n\nThere is also a [web–based version here](https://thunderpoot.github.io/isogloss). The [BCP47 parser](https://thunderpoot.github.io/isogloss/bcp-index.html) has some known issues, documented below in the \"Errata\" section.\n\nElsewhere, [the word isogloss](https://en.wikipedia.org/wiki/Isogloss) means a boundary line on a map denoting the regional use of a particular linguistic characteristic, but in this case it just seemed to fit.\n\n## Features\n\n- Look up language details using ISO 639-1, 639-2/B, 639-2/T, or 639-3 codes.\n- Look up language details by language name.\n- Look up language details using IETF BCP-47 language tags.\n    - Examples: `en-GB`, `en-US`, `sv-SE`, `zh-cmn-Hans-CN-pinyin-ud1-p9t4-x-private1`, and so on.\n\n## Installation\n\nClone the repository to your local machine:\n\n```\ngit clone https://github.com/thunderpoot/isogloss.git\n```\n\nCreate a virtual environment and install the requirements:\n\n```\npython3.11 -m venv venv\nsource venv/bin/activate\npip install unidecode\n```\n\n## Usage\n\nThe script can be run directly from the command line. 
Below are some examples of how to use it:\n\nTo look up information by ISO 639 code:\n\n```\n$ isogloss/isogloss.py -c swe\n{\n  \"639-1\": \"sv\",\n  \"Scope\": \"Individual\",\n  \"Type\": \"Living\",\n  \"Native name(s)\": \"svenska\",\n  \"Other name(s)\": \"\",\n  \"639-2/T\": \"swe\",\n  \"639-2/B\": \"\",\n  \"639-3\": \"swe\",\n  \"Name(s)\": \"Swedish\"\n}\n```\n\nTo look up information by language name:\n\n```\n$ isogloss/isogloss.py -n \"egyptian arabic\"\n{\n    \"Egyptian Arabic\": \"arz\"\n}\n```\n\nExample of lookup via native name:\n\n```\n$ isogloss/isogloss.py -n 日本語\n{\n    \"\\u65e5\\u672c\\u8a9e Nihongo\": \"jpn\"\n}\n```\n\nExample of multiple results being found:\n\n```\n$ isogloss/isogloss.py -n norwegian\n{\n    \"Norwegian Nynorsk\": \"nno\",\n    \"Nynorsk, Norwegian\": \"nno\",\n    \"Bokm\\u00e5l, Norwegian\": \"nob\",\n    \"Norwegian Bokm\\u00e5l\": \"nob\",\n    \"Norwegian\": \"nor\",\n    \"Norwegian Sign Language\": \"nsl\",\n    \"Traveller Norwegian\": \"rmg\"\n}\n```\n\nLanguage names are normalised, allowing for case–insensitive and accent–insensitive matching when searching:\n\n```\n$ isogloss/isogloss.py -n espanol\n{\n    \"Judeo-espa\\u00f1ol\": \"lad\",\n    \"espa\\u00f1ol\": \"spa\"\n}\n```\n\nTo look up information by IETF language tag:\n\n```\n$ isogloss/isogloss.py -i fr-FR\n{\n    \"Language\": {\n        \"639-1\": \"fr\",\n        \"Scope\": \"Individual\",\n        \"Type\": \"Living\",\n        \"Native name(s)\": \"fran\\u00e7ais\",\n        \"Other name(s)\": \"\",\n        \"639-2/T\": \"fra\",\n        \"639-2/B\": \"fre\",\n        \"639-3\": \"fra\",\n        \"Name(s)\": \"French\"\n    },\n    \"Region\": \"France\"\n}\n```\n\n```\n$ isogloss/isogloss.py -i zh-cmn-Hans-CN-pinyin-ud1-p9t4-x-private1\n{\n    \"Primary Language\": {\n        \"639-1\": \"zh\",\n        \"639-2/B\": \"chi\",\n        \"639-2/T\": \"zho\",\n        \"639-3\": \"zho\",\n        \"Deprecated\": false,\n        \"Name(s)\": 
\"Chinese\",\n        \"Native name(s)\": \"\\u4e2d\\u6587 Zh\\u014dngw\\u00e9n; \\u6c49\\u8bed; \\u6f22\\u8a9e H\\u00e0ny\\u01d4\",\n        \"Other name(s)\": \"\",\n        \"Scope\": \"Macrolanguage\",\n        \"Type\": \"Living\"\n    },\n    \"Extended Languages\": [\n        {\n            \"639-1\": \"\",\n            \"639-2/B\": \"\",\n            \"639-2/T\": \"\",\n            \"639-3\": \"cmn\",\n            \"Deprecated\": false,\n            \"Name(s)\": \"Mandarin Chinese\",\n            \"Native name(s)\": \"\",\n            \"Other name(s)\": \"\",\n            \"Scope\": \"Individual\",\n            \"Type\": \"Living\"\n        }\n    ],\n    \"Script\": \"Han (Simplified variant)\",\n    \"Region\": \"China\",\n    \"Variant\": \"pinyin\",\n    \"Extension\": \"ud1-p9t4\",\n    \"Private Use\": \"x-private1\"\n}\n```\n\n```\n$ isogloss/isogloss.py -i ar-ajp-apc-apd-Arab-CV-arevela-g-231243-r-sdarre-x-private-x-private1 | jq\n{\n  \"Primary Language\": {\n    \"639-1\": \"ar\",\n    \"639-2/B\": \"\",\n    \"639-2/T\": \"ara\",\n    \"639-3\": \"ara\",\n    \"Deprecated\": false,\n    \"Name(s)\": \"Arabic\",\n    \"Native name(s)\": \"العربية; al'Arabiyyeẗ\",\n    \"Other name(s)\": \"\",\n    \"Scope\": \"Macrolanguage\",\n    \"Type\": \"Living\"\n  },\n  \"Extended Languages\": [\n    {\n      \"639-1\": \"\",\n      \"639-2/B\": \"\",\n      \"639-2/T\": \"\",\n      \"Deprecated\": true,\n      \"Language Name(s)\": \"South Levantine Arabic\",\n      \"Language Type\": \"Living\",\n      \"Native name(s)\": \"\",\n      \"Other name(s)\": \"\",\n      \"Scope\": \"Individual\"\n    },\n    {\n      \"639-1\": \"\",\n      \"639-2/B\": \"\",\n      \"639-2/T\": \"\",\n      \"639-3\": \"apc\",\n      \"Deprecated\": false,\n      \"Name(s)\": \"Levantine Arabic\",\n      \"Native name(s)\": \"\",\n      \"Other name(s)\": \"\",\n      \"Scope\": \"Individual\",\n      \"Type\": \"Living\"\n    },\n    {\n      \"639-1\": \"\",\n      
\"639-2/B\": \"\",\n      \"639-2/T\": \"\",\n      \"639-3\": \"apd\",\n      \"Deprecated\": false,\n      \"Name(s)\": \"Sudanese Arabic\",\n      \"Native name(s)\": \"\",\n      \"Other name(s)\": \"\",\n      \"Scope\": \"Individual\",\n      \"Type\": \"Living\"\n    }\n  ],\n  \"Script\": \"Arabic\",\n  \"Region\": \"Cabo Verde\",\n  \"Variant\": \"arevela\",\n  \"Extension\": \"g-231243-r-sdarre\",\n  \"Private Use\": \"x-private-x-private1\"\n}\n```\n\n## Files\n\n- `data/consolidated_langs.json`: Contains language data in JSON format used for the lookup.\n- `data/region_names.json`: Contains region data in JSON format used for the BCP47 lookup.\n- `data/script_codes.json`: Contains script code data in JSON format used for the BCP47 lookup.\n- `data/deprecated-639-3.csv`: Contains deprecated ISO 639-3 codes in CSV format, for quick reference.\n\n## Errata\n\nThere are known issues with the BCP47 parser in the web interface.  It uses regular expressions to validate input, such that:\n\n### Examples of valid tags:\n\n- `en`\n\n- `fr-CA`\n\n- `i-klingon`\n\n- `az-Arab-IR`\n\n- `sr-Cyrl-RS`\n\n- `zh-cmn-Hans`\n\n- `ja-JP-x-tokyo`\n\n- `uz-Cyrl-UZ-1992`\n\n- `bo-Tibt-x-dialect`\n\n- `zh-cmn-Hans-CN-x-private1`\n\n- `hy-Latn-IT-arevela-x-test`\n\n\n### Examples of invalid tags (malformed):\n\n- `en-GB-oed-x-private`\n\n- `de-CH-1901-co-phonebk-sc-gothic-x-bavaria`\n\n(and more)\n\n### Examples of inputs that reveal parsing bugs:\n\n- `ca-valencia-nedis`\n    (Highlighted input section is missing \"valencia\")\n\n- `en-US-u-islamcal`\n    (Variant \"u\" and Extension \"islamcal\", Extension section says \"u - islamcal\")\n\n- `es-419-fonipa`\n    (Extended languages blank)\n\n- `de-Latf-1901`\n    (Region undefined)\n\n- `sl-rozaj`\n    (rozaj is coloured differently in the result container to how it is in the highlighted input section)\n\n\n## Contributing\n\nContributions, issues, and feature requests are welcome!\n\n## Author\n\nWritten by T E Vaughan\n\n## 
Sponsorship\n\n[![Github-sponsors](https://img.shields.io/badge/sponsor-30363D?style=for-the-badge\u0026logo=GitHub-Sponsors\u0026logoColor=#EA4AAA)](https://github.com/sponsors/thunderpoot)\n\nIf you find this project useful, please consider sponsoring my work. \u003c3\n\n## Related Standards and RFCs\n\nThe codes used in this program conform to the following ISO standards:\n\n### Standards\n\n- [ISO 639](https://www.iso.org/iso-639-language-code) Language codes\n- [ISO 3166-1 alpha-2](https://www.iso.org/iso-3166-country-codes.html) Country codes\n- [ISO 15924](https://www.unicode.org/iso15924/) Script codes\n\n### RFCs\n\n- [RFC 1766](https://www.ietf.org/rfc/rfc1766.txt) Tags for the Identification of Languages\n- [RFC 4646](https://www.ietf.org/rfc/rfc4646.txt) Tags for Identifying Languages\n- [RFC 4647](https://www.ietf.org/rfc/rfc4647.txt) Matching of Language Tags\n\n## License\n\nThis project is [MIT licensed](https://opensource.org/licenses/MIT).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthunderpoot%2Fisogloss","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthunderpoot%2Fisogloss","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthunderpoot%2Fisogloss/lists"}