https://github.com/mc-cat-tty/language-classification
Suite of Python modules to recognise the language of a file
- Host: GitHub
- URL: https://github.com/mc-cat-tty/language-classification
- Owner: mc-cat-tty
- License: mit
- Created: 2019-12-25T09:24:53.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-07-27T11:00:18.000Z (over 2 years ago)
- Last Synced: 2024-10-28T00:21:29.624Z (3 months ago)
- Topics: csv, files, flask, frequency-table, itis-fermi-modena, language, language-analyzer, language-classification, language-classifier, language-recognition, python, python3, twitter
- Language: Python
- Size: 12.9 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# File Language Analyzer
> File Language Analyzer is a suite of Python modules that provides objects, constants and functions to recognise the language of a file, analyze its information and process (parse and generate) .csv letter frequency tables.

Keep in mind that this project is programmed very poorly; however, the logic behind the adopted method is interesting.

## Table of Contents
* [Project Status](#project-status)
* [Features](#features)
* [Math behind it](#math-behind-it)
* [Technologies](#technologies)
* [Requirements](#requirements)
* [Launch](#launch)
* [Usage](#usage)

## Project Status
![License](https://img.shields.io/badge/license-MIT-brightgreen) ![build](https://img.shields.io/badge/build-passed-brightgreen) ![Version](https://img.shields.io/badge/version-1.0.0-blue)
## Features
- Recognise the language of a file
- Convert .csv frequency table to Python dictionary
- Convert Python dictionary to .csv frequency table
- Generate frequency table starting from a set of Twitter messages

## Math behind it
By analyzing the frequency of every single letter, it is possible to detect the language of a given text.
Once the characters' frequencies have been extracted, this information can be used as a representation of the text.
To find out the text's language, we have to determine which column of the frequency table has the nearest values.
To accomplish that, we can use the Pythagorean theorem extended to 26 dimensions (the number of letters in the basic Latin alphabet), that is, the Euclidean distance between two 26-dimensional points.
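
The distance computation described here can be sketched in a few lines of Python. This is only a minimal illustration, not the project's actual code: the names `letter_frequencies`, `euclidean_distance` and `nearest_language` are hypothetical, and the reference table is built from two short sample sentences rather than from a real corpus.

```python
import math
import string

LETTERS = string.ascii_lowercase  # the 26 letters of the basic Latin alphabet


def letter_frequencies(text):
    """Return the relative frequency of each of the 26 letters in `text`."""
    counts = {letter: 0 for letter in LETTERS}
    for char in text.lower():
        if char in counts:  # ignore digits, spaces, punctuation, accented letters
            counts[char] += 1
    total = sum(counts.values())
    if total == 0:
        return {letter: 0.0 for letter in LETTERS}
    return {letter: count / total for letter, count in counts.items()}


def euclidean_distance(profile_a, profile_b):
    """Pythagorean theorem in 26 dimensions: square root of the summed squared differences."""
    return math.sqrt(sum((profile_a[l] - profile_b[l]) ** 2 for l in LETTERS))


def nearest_language(text, table):
    """Return the language in `table` whose profile is closest to the profile of `text`."""
    profile = letter_frequencies(text)
    return min(table, key=lambda language: euclidean_distance(profile, table[language]))


# Toy reference table (assumption: illustrative only; a usable table
# would be built from much larger amounts of text per language).
table = {
    "english": letter_frequencies("the quick brown fox jumps over the lazy dog"),
    "italian": letter_frequencies("quel ramo del lago di como che volge a mezzogiorno"),
}

print(nearest_language("the dog jumps over the fox", table))
```

With such a small table the result for arbitrary input is not reliable; the sketch only shows how the text and each language become points in the same 26-dimensional space.
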
By computing the distance between the given text and each language in the table, it is possible to determine which language is the nearest.

## Technologies
- **_Python_** 3.x
- Python built-in libraries
- Twitter API wrapped by **_tweepy_** library
- **_wikipedia-api_** module
- **_Flask_**

## Requirements
Install the dependencies with one of the following commands, depending on how your environment is configured:
```sh
$ pip install -r requirements.txt
```
or

```sh
$ py -m pip install -r requirements.txt
```

## Launch
If you are in a Bash-like environment with Python installed, you can run the program directly by typing:
```sh
$ ./Main.py
```

Otherwise, depending on your Python interpreter installation and your OS:
```sh
$ python Main.py
```
or
```sh
$ py Main.py
```
After that, go to http://127.0.0.1:5000 or http://localhost:5000 and try out the web interface.

The default frequency table is `letters_frequency_twitter.csv`.
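
The Features section lists conversion between .csv frequency tables and Python dictionaries. The layout of `letters_frequency_twitter.csv` is not described in this README, so the following is only a sketch under an assumed layout: a header row `letter,<language>,...` followed by one row per letter. The function names `csv_to_table` and `table_to_csv` and the sample values are hypothetical and do not come from the project.

```python
import csv
import io


def csv_to_table(csv_file):
    """Read a frequency table into {language: {letter: frequency}}.

    Assumed (hypothetical) layout: header `letter,<lang1>,<lang2>,...`,
    then one row per letter.
    """
    reader = csv.DictReader(csv_file)
    languages = [name for name in reader.fieldnames if name != "letter"]
    table = {language: {} for language in languages}
    for row in reader:
        for language in languages:
            table[language][row["letter"]] = float(row[language])
    return table


def table_to_csv(table, csv_file):
    """Write {language: {letter: frequency}} back in the same assumed layout."""
    languages = list(table)
    letters = sorted({letter for profile in table.values() for letter in profile})
    writer = csv.writer(csv_file)
    writer.writerow(["letter", *languages])
    for letter in letters:
        writer.writerow(
            [letter, *(table[language].get(letter, 0.0) for language in languages)]
        )


# Two-letter table with made-up values, not real language data.
sample = "letter,lang_a,lang_b\na,0.75,0.25\nb,0.25,0.75\n"
table = csv_to_table(io.StringIO(sample))
print(table)

out = io.StringIO()
table_to_csv(table, out)
print(out.getvalue())
```

To work with a file on disk instead of the in-memory sample, pass a file object opened with `open(path, newline="")` to either function.
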
## Usage
To use the functions in `tweetrain.py`, you have to provide your personal Twitter API tokens.
Find the first four uppercase variables in the file and put the proper value between the double quotes of each one.