Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wojciechmarek/language-recognition
A project allowing to detect a language of a provided text.
https://github.com/wojciechmarek/language-recognition
ai ann artificial-neural-networks castle-windsor csharp language-detection mvvm nunit relay-command sharp-learning wpf
Last synced: 2 days ago
JSON representation
A project allowing to detect a language of a provided text.
- Host: GitHub
- URL: https://github.com/wojciechmarek/language-recognition
- Owner: wojciechmarek
- Created: 2019-04-08T10:46:12.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-04-27T21:11:57.000Z (almost 2 years ago)
- Last Synced: 2024-11-30T02:12:45.263Z (2 months ago)
- Topics: ai, ann, artificial-neural-networks, castle-windsor, csharp, language-detection, mvvm, nunit, relay-command, sharp-learning, wpf
- Language: C#
- Homepage:
- Size: 78.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# LanguageRecognition
This project allows to detection of a language of a provided text.
## Purpose
LanguageRecognition was made while I was in my third year of study to pass an individual project. Later I based on it my Bachelor's thesis.
## How it works
One of the easiest methods of detecting language is recognition by counting the frequency of letters in text. Each language uses letters differently, for example:
- in the Polish language is popular a letter: `w`
- in the English language is popular a letter: `h`Here is a screenshot from an unknown source found somewhere on the Internet (Dear Author! - forgive me 🥹) showing the distribution of a letter in polish, english, and french:
![letters](https://user-images.githubusercontent.com/27026036/206008339-20b6db47-1f1f-4e28-921c-37e43b92774f.png)
Having that knowledge, we can create an Artificial Neural Network. An ANN has to have 26 input neurons in the input layer, some hidden layers, and X amount of neurons in the output layer. The X amount depends on how many languages we want to detect.
Later, after providing a new text, we have to programmatically count the amount of every letter in the text, and divide it by the total amount of letters to obtain a value in the range of 0 to 1. Next, we pass the values to input neurons and in the result, the network should also generate values in the range of 0 to 1 on output neurons. The neuron having the highest value suggests language prediction. That's all 😇.
## Screenshots
- Main window view - shows 3 modules:
![main](https://user-images.githubusercontent.com/27026036/55724267-56dd6200-5a0b-11e9-93d2-4c426a817d8b.PNG)
- Prepare window view - allows preparing examples from different languages. Every CREATE SAMPLE press will generate a new file or will append to existing file percentage values of letters. Texts in the given language should have the same label:
![prepare](https://user-images.githubusercontent.com/27026036/55724264-5644cb80-5a0b-11e9-9d5d-7b180cd27a50.PNG)
- Train window view - allows loading samples of languages and saving the trained model in a specific location:
![train](https://user-images.githubusercontent.com/27026036/55724265-56dd6200-5a0b-11e9-94cd-98a30f2363f4.PNG)
- Recognize window view - allows recognizing input text based on the following model of trained Artificial Neural Network:
![recognize](https://user-images.githubusercontent.com/27026036/55724266-56dd6200-5a0b-11e9-96d4-bdb69ffbdf98.PNG)
## Used libraries
- Material Design Themes
- Visual Studio 2019
- Newtonsoft.Json
- Castle Windsor
- .Net Framework
- SharpLearning
- MVVM Pattern
- RelayCommand
- NUnit## Languages samples
Samples of 5 languages (🇵🇱, 🇺🇸, 🇩🇪, 🇫🇷, 🇮🇹) and trained network based on that file are available in the `/Samples` directory.
## How to run
Take a computer with Windows OS. Install .Net Framework and Visual Studio. Open the project file, and follow the instructions on what else to install, then build and run it. Lastly, load a trained model and try to detect the language of any text.