{"id":50432805,"url":"https://github.com/jhnwnstd/suxotin","last_synced_at":"2026-05-31T15:01:42.602Z","repository":{"id":236533861,"uuid":"792796624","full_name":"jhnwnstd/suxotin","owner":"jhnwnstd","description":"Python script that distinguishes vowels from consonants using Suxotin's algorithm.","archived":false,"fork":false,"pushed_at":"2026-03-27T05:47:17.000Z","size":10961,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-27T17:42:05.348Z","etag":null,"topics":["cryptography","decipherment","suxotin"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jhnwnstd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-04-27T15:54:01.000Z","updated_at":"2026-03-27T05:47:21.000Z","dependencies_parsed_at":"2024-07-19T22:59:04.428Z","dependency_job_id":"5dc18228-eaf0-47f0-b643-3049c93d29e4","html_url":"https://github.com/jhnwnstd/suxotin","commit_stats":null,"previous_names":["jhnwnstd/suxotin"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jhnwnstd/suxotin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhnwnstd%2Fsuxotin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhnwnstd%2Fsuxotin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhnwnstd%2Fsuxotin/releases","manifests_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhnwnstd%2Fsuxotin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jhnwnstd","download_url":"https://codeload.github.com/jhnwnstd/suxotin/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhnwnstd%2Fsuxotin/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33735663,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cryptography","decipherment","suxotin"],"created_at":"2026-05-31T15:01:41.662Z","updated_at":"2026-05-31T15:01:42.595Z","avatar_url":"https://github.com/jhnwnstd.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Vowel Identification Algorithms\n\nUnsupervised, language-agnostic classification of letters as vowels or consonants from raw text. No linguistic priors — purely statistical. 
Tested on 502 languages.\n\n## Algorithms\n\n| Script | Method | F1 |\n|---|---|---|\n| `classify.py` | Ensemble of SVD + Sukhotin + accent propagation | **0.965** |\n| `algorithm1.py` | SVD spectral decomposition ([Thaine \u0026 Penn 2017](https://aclanthology.org/W17-4109/)) | 0.942 |\n| `suxotin.py` | Sukhotin's adjacency algorithm (1962) | 0.919 |\n\n**`classify.py`** is the recommended entry point. It uses SVD as the primary classifier, Sukhotin to detect and correct SVD label flips, and Unicode decomposition to propagate vowel status to accented variants.\n\n## Results\n\nAcross 502 languages, the combined classifier:\n- Correctly identifies all 5 basic vowels in **85%** of languages\n- Corrects **20** SVD label-flip errors using Sukhotin's orientation\n- Adds accent-propagated vowels in **140** languages\n- Filters out Sukhotin false positives in **270** languages\n- Achieves perfect F1 on English, German, Spanish, Finnish, Portuguese, Czech, Polish, Indonesian, Latin, Estonian, and 10+ others\n\n## Usage\n\n```bash\npip install -r requirements.txt\n\npython classify.py            # recommended — writes classification_output.txt\npython algorithm1.py          # SVD only — writes algorithm1_output.txt\npython suxotin.py             # Sukhotin only — writes suxotin_algorithm_output.txt\n```\n\nEvaluate against ground truth (32 languages):\n\n```bash\npython -c \"from classify import run; from evaluate import evaluate; evaluate(run)\"\n```\n\n## How It Works\n\n**SVD** builds a binary matrix of letters vs. trigram contexts (p-frames), then uses the second right singular vector to split letters into two clusters. The cluster with the higher mean frequency is labeled as the vowels.\n\n**Sukhotin's algorithm** builds a character adjacency matrix from within-word bigrams (excluding whitespace — critical for avoiding false positives). 
It iteratively selects the highest-sum character as a vowel and subtracts its adjacency contributions until no positive sums remain.\n\n**Ensemble** runs both, uses Sukhotin to correct SVD when the two disagree on cluster orientation, then propagates vowel status to accented variants via Unicode NFKD decomposition.\n\n## Repository Structure\n\n```\nclassify.py          Combined classifier (recommended)\nalgorithm1.py        SVD spectral decomposition\nsuxotin.py           Sukhotin's adjacency algorithm\nevaluate.py          Evaluation harness (32 languages with ground truth)\nutils.py             Shared I/O and preprocessing\nlang_code.csv        Language code → name mapping\nTest/data/           Text corpus per language (502 files)\nvisualizations/      Per-language classification score charts\nLiterature/          Reference papers\n```\n\n## References\n\n- Sukhotin, B.V. (1962). *Eksperimental'noe vydelenie klassov bukv s pomoshch'ju elektronnoj vychislitel'noj mashiny*.\n- Thaine, P. \u0026 Penn, G. (2017). [Vowel and Consonant Classification through Spectral Decomposition](https://aclanthology.org/W17-4109/). Workshop on Subword and Character Level Models in NLP, pp. 82–91.\n\n## License\n\nGPL-3.0 — see [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhnwnstd%2Fsuxotin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjhnwnstd%2Fsuxotin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhnwnstd%2Fsuxotin/lists"}