{"id":13741337,"url":"https://github.com/ajaech/twitter_langid","last_synced_at":"2025-05-08T21:33:19.084Z","repository":{"id":79209776,"uuid":"66724526","full_name":"ajaech/twitter_langid","owner":"ajaech","description":"A hierarchical character-word neural network for language identification","archived":false,"fork":false,"pushed_at":"2017-01-18T22:18:58.000Z","size":1317,"stargazers_count":15,"open_issues_count":0,"forks_count":4,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-11-15T11:36:24.030Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ajaech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-08-27T16:35:30.000Z","updated_at":"2023-06-12T09:23:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"ff00bce8-fa86-422a-a467-854ba465a7a9","html_url":"https://github.com/ajaech/twitter_langid","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajaech%2Ftwitter_langid","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajaech%2Ftwitter_langid/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajaech%2Ftwitter_langid/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajaech%2Ftwitter_langid/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ajaech","download_url":"https://codeload.github.com/ajaech/twitter_langid/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253153163,"owners_count":21862318,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T04:00:58.010Z","updated_at":"2025-05-08T21:33:18.522Z","avatar_url":"https://github.com/ajaech.png","language":"Python","readme":"# twitter_langid\nFor more information, please see our paper\n[Hierarchical Character-Word Models for Language Identification](https://arxiv.org/abs/1608.03030).\n\n### Getting Started\n\nTo train a model, run the command\n\n`python langid.py path/to/outdir`\n\nFor a simple visualization of a trained model, run\n\n`python langid.py --mode=debug path/to/outdir`\n\nTo evaluate a trained model, run\n\n`python langid.py --mode=eval path/to/outdir`\n\n### Data\n\nThe data directory holds an example input file created from Wikipedia sentence fragments. The file is saved in tab-separated format. We partition the data according to the last digit of each line's id number, which determines whether that line is used for training, validation, or testing.\n","funding_links":[],"categories":["Software"],"sub_categories":["Utilities"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fajaech%2Ftwitter_langid","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fajaech%2Ftwitter_langid","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fajaech%2Ftwitter_langid/lists"}
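The README states that the tab-separated data file is split into training, validation, and test sets by the last digit of each line's id, but it does not say which digits map to which split or how the columns are laid out. The following is a minimal sketch of that kind of id-based partitioning. It assumes a hypothetical layout with the id in the first column and a hypothetical mapping (digits 0-7 for training, 8 for validation, 9 for testing). It is not the repository's own loader, and `partition_by_id` and `SPLIT_FOR_DIGIT` are illustrative names.

```python
import csv
import io
from typing import Dict, Iterable, List

# HYPOTHETICAL digit-to-split mapping: the README does not say which
# digits go to which split, so this assignment is only an illustration.
SPLIT_FOR_DIGIT = {d: "train" for d in range(8)}
SPLIT_FOR_DIGIT[8] = "validation"
SPLIT_FOR_DIGIT[9] = "test"


def partition_by_id(lines: Iterable[str], id_column: int = 0) -> Dict[str, List[List[str]]]:
    """Assign each tab-separated row to a split using the last digit of its id.

    Assumes (hypothetically) that the numeric id is stored in column `id_column`.
    """
    splits: Dict[str, List[List[str]]] = {"train": [], "validation": [], "test": []}
    # QUOTE_NONE keeps quote characters inside sentence fragments literal.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        last_digit = int(row[id_column].strip()[-1])
        splits[SPLIT_FOR_DIGIT[last_digit]].append(row)
    return splits


# Usage with a tiny in-memory sample (hypothetical columns: id, language, text).
sample = io.StringIO("1001\ten\tHello world\n1008\tfr\tBonjour\n1019\tes\tHola\n")
result = partition_by_id(sample)
print({name: [row[0] for row in rows] for name, rows in result.items()})
# prints {'train': ['1001'], 'validation': ['1008'], 'test': ['1019']}
```

Splitting on a fixed digit of the id, rather than shuffling at random, keeps the assignment deterministic. The same line always lands in the same split across runs without storing a seed or a separate index file.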