Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/madrugado/language-identification
A few models for language identification and code-switching tasks
- Host: GitHub
- URL: https://github.com/madrugado/language-identification
- Owner: madrugado
- Created: 2019-07-08T13:14:49.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-05-03T19:47:08.000Z (8 months ago)
- Last Synced: 2024-05-03T21:00:48.081Z (8 months ago)
- Language: Python
- Size: 1.84 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
This is my attempt at language identification.
### Problems
1. Given a sentence, identify which of three languages it is written in: Spanish (ES), Portuguese (PT-PT) or English (EN).
2. Given a sentence, tell apart two variants of Portuguese: European Portuguese (PT-PT) and Brazilian Portuguese (PT-BR).
3. Given tweets written in English and Spanish, label each token in a tweet as belonging to the 'en', 'es' or 'other' class.
Additional information on problems 1 and 2 is in the [readme](./langid/README.md) under the langid folder, and on the third problem in the [readme](./code-switching/README.md) under the code-switching folder.
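To make the input and output of the problems concrete, here is a minimal sketch of a character-trigram Naive Bayes baseline. It is not the model used in this repository. The class `NgramLangId`, the helper `label_tokens`, the toy training sentences, and the rule that mentions, hashtags, links and non-alphabetic tokens count as 'other' are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLangId:
    """Hypothetical baseline: Naive Bayes over character n-grams, uniform class priors."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}   # label -> Counter of n-grams
        self.totals = {}   # label -> total n-gram count
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            grams = char_ngrams(sentence, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def predict(self, text):
        vocab_size = len(self.vocab)

        def log_likelihood(label):
            counts = self.counts[label]
            denom = self.totals[label] + vocab_size  # add-one smoothing
            return sum(math.log((counts[g] + 1) / denom)
                       for g in char_ngrams(text, self.n))

        return max(self.counts, key=log_likelihood)


def label_tokens(model, tweet):
    """Label each whitespace token; the 'other' rule below is an assumption."""
    labels = []
    for token in tweet.split():
        is_other = (token.startswith(("@", "#", "http"))
                    or not any(ch.isalpha() for ch in token))
        labels.append((token, "other" if is_other else model.predict(token)))
    return labels


# Toy data for illustration only; a real model needs a proper corpus.
train_sentences = [
    "the cat is on the table", "where is the library",
    "el gato está en la mesa", "dónde está la biblioteca",
    "o gato está em cima da mesa", "onde fica a biblioteca",
]
train_labels = ["en", "en", "es", "es", "pt", "pt"]
model = NgramLangId(n=3).fit(train_sentences, train_labels)

print(model.predict("the library is here"))
print(label_tokens(model, "@user the gato :)"))
```

The same classifier shape covers problems 1 and 2 by changing the label set (for example 'pt-pt' versus 'pt-br'). Per-token prediction, as in `label_tokens`, ignores context, so short tokens shared between languages will often be mislabeled.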