https://github.com/ajaech/twitter_langid
A hierarchical character-word neural network for language identification
- Host: GitHub
- URL: https://github.com/ajaech/twitter_langid
- Owner: ajaech
- License: unlicense
- Created: 2016-08-27T16:35:30.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-01-18T22:18:58.000Z (almost 8 years ago)
- Last Synced: 2024-04-18T22:36:57.287Z (7 months ago)
- Language: Python
- Homepage:
- Size: 1.26 MB
- Stars: 15
- Watchers: 6
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- low-resource-languages - twitter_langid - A hierarchical character-word neural network for language identification. (Software / Utilities)
README
# twitter_langid
For more information please see our paper
[Hierarchical Character-Word Models for Language Identification](https://arxiv.org/abs/1608.03030).

### Getting Started
To train a model, run:

`python langid.py path/to/outdir`

For a simple visualization of a trained model, run:

`python langid.py --mode=debug path/to/outdir`

To evaluate a trained model, run:

`python langid.py --mode=eval path/to/outdir`

### Data
The data directory holds an example input file created from Wikipedia sentence fragments. The file is saved in tab-separated format. We partition the data according to the last digit of the id number in the data file, so each line is assigned to exactly one of the training, validation, or test sets.
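
A partition by last id digit could look roughly like the sketch below. This is not the code from `langid.py`. The mapping of digits to splits (`SPLIT_BY_LAST_DIGIT`), the position of the id column (`id_column=0`), and the helper names `assign_split` and `partition` are all assumptions made for illustration, so check the repository for the actual convention.

```python
import csv
import io

# Hypothetical mapping: ids ending in 0 go to test, ids ending in 1 go to
# validation, and every other id goes to training. The real script may
# use different digits.
SPLIT_BY_LAST_DIGIT = {"0": "test", "1": "valid"}


def assign_split(example_id):
    """Return 'train', 'valid', or 'test' from the last digit of the id."""
    last_digit = str(example_id).strip()[-1]
    return SPLIT_BY_LAST_DIGIT.get(last_digit, "train")


def partition(tsv_file, id_column=0):
    """Group the rows of a tab-separated file by split.

    Assumes the id sits in column `id_column` (an illustrative default).
    """
    splits = {"train": [], "valid": [], "test": []}
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # skip blank lines
            continue
        splits[assign_split(row[id_column])].append(row)
    return splits


if __name__ == "__main__":
    # Made-up rows, only to show the call; not taken from the data directory.
    sample = io.StringIO("10\ten\thello world\n11\tfr\tbonjour\n12\tde\tguten tag\n")
    for name, rows in partition(sample).items():
        print(name, len(rows))
```

Because the split depends only on the id, the assignment is deterministic: rerunning the script on the same file reproduces the same training, validation, and test sets without storing separate files.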