https://github.com/4rivappa/machine-translation

Host: GitHub
URL: https://github.com/4rivappa/machine-translation
Owner: 4rivappa
Created: 2022-12-24T14:10:31.000Z (over 2 years ago)
Default Branch: master
Last Pushed: 2022-12-24T14:11:26.000Z (over 2 years ago)
Last Synced: 2025-01-16T10:36:41.033Z (5 months ago)
Language: Python
Size: 568 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.txt

README

  __  __            _     _              _____                    _       _   _
 |  \/  | __ _  ___| |__ (_)_ __   ___  |_   _| __ __ _ _ __  ___| | __ _| |_(_) ___  _ __
 | |\/| |/ _` |/ __| '_ \| | '_ \ / _ \   | || '__/ _` | '_ \/ __| |/ _` | __| |/ _ \| '_ \
 | |  | | (_| | (__| | | | | | | |  __/   | || | | (_| | | | \__ \ | (_| | |_| | (_) | | | |
 |_|  |_|\__,_|\___|_| |_|_|_| |_|\___|   |_||_|  \__,_|_| |_|___/_|\__,_|\__|_|\___/|_| |_|

Methods Implemented:

    Data Collection (Parallel corpus)

    Tokenization

    Creating Inverted Index

    Calculating most probable sentence

    POS Tagging (Part-of-Speech)

    Named Entity Recognition (NER)

    Output Generator Model (uses POS tagging and NER)
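
    Sketch (tokenization and inverted index):

    The first steps above can be illustrated with a minimal sketch. It is
    not the code in indexer.py: the toy romanized corpus, the whitespace
    tokenizer, and the names PARALLEL_CORPUS, tokenize and
    build_inverted_index are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy parallel corpus, romanized and made up for illustration only:
# document id -> (Telugu sentence, Hindi sentence).
PARALLEL_CORPUS = {
    0: ("nenu badi ki veltanu", "main school jaata hoon"),
    1: ("nenu annam tintanu", "main khaana khaata hoon"),
    2: ("aame badi ki veltundi", "vah school jaati hai"),
}


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return sentence.lower().split()


def build_inverted_index(corpus):
    """Map each source-side token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, (source, _target) in corpus.items():
        for token in tokenize(source):
            index[token].add(doc_id)
    return dict(index)


index = build_inverted_index(PARALLEL_CORPUS)
print(sorted(index["badi"]))  # [0, 2]
```

    Storing document ids as sets keeps lookups cheap and makes it easy to
    count how many query tokens each sentence pair shares.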

Running:

    Command:

        $> python indexer.py "input-telugu-sequence"

    Flow:

        - Tokenizing the parallel corpus

        - Building an inverted index from tokens to document IDs

        - Finding the most probable Hindi sentence

        - Computing POS tags for the input Telugu sentence

        - Identifying named entities (rule 1, rule 2) in the input Telugu sentence

        - Generating the output with the model, based on NER and POS tagging
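
    Sketch (most probable sentence):

    The README does not say how the most probable Hindi sentence is scored.
    One plausible reading, shown here purely as an assumption, is to pick the
    sentence pair whose Telugu side shares the most tokens with the input.
    INVERTED_INDEX, HINDI_SENTENCES and most_probable_sentence are
    hypothetical names, and the romanized data is made up.

```python
from collections import Counter

# Hypothetical prebuilt inverted index: Telugu token -> ids of the
# sentence pairs whose Telugu side contains that token.
INVERTED_INDEX = {
    "nenu": {0, 1},
    "badi": {0, 2},
    "ki": {0, 2},
    "veltanu": {0},
    "annam": {1},
    "tintanu": {1},
    "aame": {2},
    "veltundi": {2},
}

# Hindi side of the toy sentence pairs, keyed by the same ids.
HINDI_SENTENCES = {
    0: "main school jaata hoon",
    1: "main khaana khaata hoon",
    2: "vah school jaati hai",
}


def most_probable_sentence(query, index, targets):
    """Return (target sentence, score) for the pair sharing the most query tokens.

    The score is the fraction of distinct query tokens found in the candidate,
    a simple stand-in for whatever probability the project actually computes.
    """
    tokens = set(query.lower().split())
    if not tokens:
        return None, 0.0
    hits = Counter()
    for token in tokens:
        for doc_id in index.get(token, ()):
            hits[doc_id] += 1
    if not hits:
        return None, 0.0
    # Break ties on the lowest document id so the result is deterministic.
    best_id = min(hits, key=lambda d: (-hits[d], d))
    return targets[best_id], hits[best_id] / len(tokens)


sentence, score = most_probable_sentence(
    "nenu badi ki veltanu", INVERTED_INDEX, HINDI_SENTENCES
)
print(sentence, score)
```

    The POS tagging, NER rules and output model are not sketched here,
    because the README gives no detail on how they work.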
