https://github.com/4rivappa/machine-translation
- Host: GitHub
- URL: https://github.com/4rivappa/machine-translation
- Owner: 4rivappa
- Created: 2022-12-24T14:10:31.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-12-24T14:11:26.000Z (over 2 years ago)
- Last Synced: 2025-01-16T10:36:41.033Z (5 months ago)
- Language: Python
- Size: 568 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.txt
README
 __  __            _     _              _____                    _       _   _
|  \/  | __ _  ___| |__ (_)_ __   ___  |_   _| __ __ _ _ __  ___| | __ _| |_(_) ___  _ __
| |\/| |/ _` |/ __| '_ \| | '_ \ / _ \   | || '__/ _` | '_ \/ __| |/ _` | __| |/ _ \| '_ \
| |  | | (_| | (__| | | | | | | |  __/   | || | | (_| | | | \__ \ | (_| | |_| | (_) | | | |
|_|  |_|\__,_|\___|_| |_|_|_| |_|\___|   |_||_|  \__,_|_| |_|___/_|\__,_|\__|_|\___/|_| |_|

Methods Implemented:
Data Collection (Parallel corpus)
Tokenization
Creating Inverted Index
Calculating most probable sentence
Part-of-Speech (POS) Tagging
Named Entity Recognition (NER)
Output Generator Model (uses POS tags and named entities)

Running:
Command:
$> python indexer.py "input-telugu-sequence"
Flow:
- Creating tokens from the parallel corpus
- Creating an inverted index that maps tokens to document IDs
- Calculating the most probable Hindi sentence
- Calculating POS tags for the input Telugu sentence
- Identifying named entities (rule 1, rule 2) in the input Telugu sentence
- Generating the output with a model based on the NER and POS tagging results
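
Sketch:
The first three steps of the flow can be illustrated with a minimal Python sketch. This is not the code in indexer.py. The toy corpus (romanized for readability), the whitespace tokenizer, the token-overlap scoring, and the names CORPUS, tokenize, build_inverted_index and most_probable_hindi are all assumptions made for illustration. POS tagging, the NER rules and the output model are not sketched, because the README does not describe them.

```python
from collections import Counter, defaultdict

# Toy parallel corpus (romanized, invented for illustration):
# doc-id -> (Telugu sentence, Hindi sentence)
CORPUS = {
    0: ("naa peru ravi", "mera naam ravi hai"),
    1: ("nenu badiki veltunnanu", "main school ja raha hoon"),
    2: ("naa peru sita", "mera naam sita hai"),
}


def tokenize(sentence):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return sentence.lower().split()


def build_inverted_index(corpus):
    """Map each Telugu token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, (telugu, _hindi) in corpus.items():
        for token in tokenize(telugu):
            index[token].add(doc_id)
    return index


def most_probable_hindi(query, corpus, index):
    """Return the Hindi side of the document sharing the most query tokens.

    Token overlap is an assumed stand-in for the repository's scoring.
    Returns None when no token of the query occurs in the corpus.
    """
    votes = Counter()
    for token in set(tokenize(query)):
        for doc_id in index.get(token, ()):
            votes[doc_id] += 1
    if not votes:
        return None
    # Highest overlap wins; ties go to the lowest document id.
    best_id = min(votes, key=lambda d: (-votes[d], d))
    return corpus[best_id][1]


index = build_inverted_index(CORPUS)
print(most_probable_hindi("naa peru sita", CORPUS, index))  # mera naam sita hai
```

The inverted index means only documents that share at least one token with the input are scored, so the lookup does not have to scan the whole corpus.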