https://github.com/guyandtheworld/makri
Malayalam Corpus, along with a web based POSTagger. ✒️📃
machine-learning malayalam python
- Host: GitHub
- URL: https://github.com/guyandtheworld/makri
- Owner: guyandtheworld
- Created: 2017-07-19T12:35:23.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-03-26T17:14:09.000Z (almost 8 years ago)
- Last Synced: 2025-04-30T17:52:08.122Z (9 months ago)
- Topics: machine-learning, malayalam, python
- Language: Python
- Homepage: https://makri.ml/
- Size: 3.23 MB
- Stars: 6
- Watchers: 3
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Makri - Malayalam Knowledge Ripper
Makri is a part-of-speech (POS) tagger for Malayalam, built with [RDRPOSTagger](https://github.com/datquocnguyen/RDRPOSTagger) and trained on a POS-tagged silver corpus of over 80,000 lines.
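RDRPOSTagger takes its training data as one sentence per line, with each token written as `word/TAG`. Assuming the silver corpus uses that layout, a minimal Python 3 sketch for loading tagged lines and counting tag frequencies might look like this. The function names and the tags in the example (`PR_PRP`, `N_NN`, `V_VM`) are illustrative placeholders and are not taken from the Makri repository.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parse_tagged_line(line: str) -> List[Tuple[str, str]]:
    """Split one 'word/TAG word/TAG ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST '/', so a word that itself contains '/' stays intact.
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"malformed token: {token!r}")
        pairs.append((word, tag))
    return pairs


def tag_frequencies(lines: Iterable[str]) -> Counter:
    """Count how often each tag occurs across a corpus of tagged lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(tag for _, tag in parse_tagged_line(line))
    return counts


if __name__ == "__main__":
    # Illustrative sentences; the tags are placeholders, not Makri's tagset.
    corpus = ["ഞാൻ/PR_PRP വീട്ടിൽ/N_NN പോയി/V_VM", "അവൻ/PR_PRP വന്നു/V_VM"]
    print(parse_tagged_line(corpus[0]))
    print(tag_frequencies(corpus).most_common())
```

Splitting on the last `/` rather than the first keeps tokens such as fractions or URLs intact, since a tag is not expected to contain a slash.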
## Details
The scraper is built with Scrapy (Python). `makri-links.py` collects links to Malayalam articles, and `makri-sentences.py` fetches Malayalam text from websites and writes it into files divided by category.
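The spiders themselves are Scrapy code, but the post-processing step (keeping only Malayalam text and writing it into one file per category) can be sketched with the Python 3 standard library alone. This is not the code from `makri-sentences.py`: the function names, the sentence splitter, the 50% Malayalam-character threshold, and the `<category>.txt` file layout are all assumptions. Malayalam characters are detected through the Unicode Malayalam block, U+0D00 to U+0D7F.

```python
import os
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Characters in the Unicode Malayalam block (U+0D00 to U+0D7F).
MALAYALAM_CHAR = re.compile(r"[\u0D00-\u0D7F]")
# Assumed sentence boundary: whitespace after ., ! or ?, or any run of newlines.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+|\n+")


def extract_malayalam_sentences(text: str, min_ratio: float = 0.5) -> List[str]:
    """Return sentences in which at least `min_ratio` of the
    non-whitespace characters are Malayalam (hypothetical threshold)."""
    sentences = []
    for chunk in SENTENCE_BOUNDARY.split(text):
        chunk = chunk.strip()
        visible = [c for c in chunk if not c.isspace()]
        if not visible:
            continue
        malayalam = sum(1 for c in visible if MALAYALAM_CHAR.match(c))
        if malayalam / len(visible) >= min_ratio:
            sentences.append(chunk)
    return sentences


def write_by_category(pages: Iterable[Tuple[str, str]], out_dir: str) -> Dict[str, int]:
    """Append Malayalam sentences from (category, raw_text) pairs to
    out_dir/<category>.txt, one sentence per line, and return the number
    of sentences written per category.
    Assumes category names are safe to use as file names."""
    os.makedirs(out_dir, exist_ok=True)
    written: Dict[str, int] = defaultdict(int)
    for category, text in pages:
        sentences = extract_malayalam_sentences(text)
        if not sentences:
            continue
        path = os.path.join(out_dir, f"{category}.txt")
        # Append mode, so repeated scraping runs extend the same category file.
        with open(path, "a", encoding="utf-8") as fh:
            fh.writelines(s + "\n" for s in sentences)
        written[category] += len(sentences)
    return dict(written)
```

Filtering by the share of Malayalam characters, rather than requiring every character to be Malayalam, keeps sentences that contain punctuation, digits, or the occasional English word.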
A project by Adarsh S and Jithin James at ICFOSS, under the supervision of Dr. Rajeev RR.
## FOSSASIA Talk
- [Tamil NLP creator and talk mentor Ashok R](https://github.com/AshokR/)
- [Slides](https://docs.google.com/presentation/d/1A1n1HqGkXPgyarPUB91tb208IGt2KuKqj5umgUCw5Uw/edit?usp=sharing)
- [BLARK Ideology](http://www.elsnet.org/dox/krauwer-specom2003.pdf)
- [Language Resource Classification](http://ixa.si.ehu.es/sites/default/files/dokumentuak/3855/LEIPZIG_2014_Sarasola.pdf)
- [Language Statistics Data](https://economictimes.indiatimes.com/tech/internet/how-online-vernacular-market-is-becoming-the-big-battle-ground-for-tech-cos/articleshow/63248994.cms)
## Web App
A Django-based web app makes the tagger available as a service. The development server can be started with `python2 manage.py runserver`.
The web app provides a text input for tagging text live and a file upload option for tagging text files.

The service also exposes a web endpoint for use in other applications. Sending the input as the `q` query parameter returns the tagged data, for example: `curl -G -v "http://127.0.0.1:8000/" --data-urlencode "q=input"`.
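The same request can be made from Python 3 with the standard library. This sketch assumes the development server is running at the address used in the `curl` example and that the tagged result comes back in the response body. The response format is not documented here, so the hypothetical helper `tag_text` returns the raw body without parsing it.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Address of the local development server from the curl example.
BASE_URL = "http://127.0.0.1:8000/"


def build_tag_url(text: str, base_url: str = BASE_URL) -> str:
    """Put `text` in the `q` query parameter, like curl -G --data-urlencode "q=...".
    quote_via=quote percent-encodes spaces as %20 instead of '+'."""
    return base_url + "?" + urlencode({"q": text}, quote_via=quote)


def tag_text(text: str, base_url: str = BASE_URL, timeout: float = 10.0) -> str:
    """Send `text` to the tagging endpoint and return the raw response body."""
    with urlopen(build_tag_url(text, base_url), timeout=timeout) as resp:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

With the development server running, `tag_text("ഞാൻ വീട്ടിൽ പോയി")` should return the same body as the equivalent `curl` call.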
## Team Members
- @isht3, @jjmachan: creators of the project
- @abinmn: creator of the deployable web app