https://github.com/mohammadrezaamani/full-nlp-tutorial

Host: GitHub
URL: https://github.com/mohammadrezaamani/full-nlp-tutorial
Owner: MohammadrezaAmani
Created: 2023-07-03T11:27:22.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-07-11T22:28:48.000Z (over 2 years ago)
Last Synced: 2025-01-13T11:12:22.604Z (11 months ago)
Size: 76.2 KB
Stars: 4
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# FULL NLP TUTORIAL

This Git repository collects the tutorials, libraries, and datasets you need to learn and master Natural Language Processing.

NUMPY
---

| ID | MOH | MOB | COURSE |
|--|--|--|--|
| 1 | [ ] | [ ] | [Learn to write a NumPy tutorial](https://github.com/numpy/numpy-tutorials/tree/main/content/tutorial-style-guide.md) |
| 2 | [ ] | [ ] | [Tutorial: Linear algebra on n-dimensional arrays](https://github.com/numpy/numpy-tutorials/tree/main/content/tutorial-svd.md) |
| 3 | [ ] | [ ] | [Tutorial: Determining Moore's Law with real data in NumPy](https://github.com/numpy/numpy-tutorials/tree/main/content/mooreslaw-tutorial.md) |
| 4 | [ ] | [ ] | [Tutorial: Saving and sharing your NumPy arrays](https://github.com/numpy/numpy-tutorials/tree/main/content/save-load-arrays.md) |
| 5 | [ ] | [ ] | [Tutorial: NumPy deep learning on MNIST from scratch](https://github.com/numpy/numpy-tutorials/tree/main/content/tutorial-deep-learning-on-mnist.md) |
| 6 | [ ] | [ ] | [Tutorial: X-ray image processing](https://github.com/numpy/numpy-tutorials/tree/main/content/tutorial-x-ray-image-processing.md) |
| 7 | [ ] | [ ] | [Tutorial: NumPy deep reinforcement learning with Pong from pixels](https://github.com/numpy/numpy-tutorials/tree/main/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md) |
| 8 | [ ] | [ ] | [Tutorial: Masked Arrays](https://github.com/numpy/numpy-tutorials/tree/main/content/tutorial-ma.md) |
| 9 | [ ] | [ ] | [Tutorial: Static Equilibrium](https://github.com/numpy/numpy-tutorials/tree/main/content/tutorial-static_equilibrium.md) |
| 10 | [ ] | [ ] | [Tutorial: Plotting Fractals](https://github.com/numpy/numpy-tutorials/tree/main/content/tutorial-plotting-fractals.ipynb) |
| 11 | [ ] | [ ] | [Tutorial: NumPy natural language processing from scratch with a focus on ethics](https://github.com/numpy/numpy-tutorials/tree/main/content/tutorial-nlp-from-scratch.md) |
| 12 | [ ] | [ ] | [Tutorial: Analysing the impact of the lockdown on air quality in Delhi, India](https://github.com/numpy/numpy-tutorials/tree/main/content/tutorial-air-quality-analysis.md) |

PANDAS
---

| ID | MOH | MOB | COURSE |
|--|--|--|--|
| 0 | [ ] | [x] | [quick pandas tutorial](https://github.com/chiphuyen/just-pandas-things/blob/master/just-pandas-things.ipynb) |
| 1 | [ ] | [ ] | [pandas examples](https://github.com/codebasics/py/tree/master/pandas) |
| 2 | [ ] | [x] | [pandas workshop](https://github.com/stefmolin/pandas-workshop/) (see the [slides](https://github.com/stefmolin/pandas-workshop/tree/main/slides) folder) |
| 3 | [ ] | [x] | [pandas data processing](https://github.com/mebauer/data-analysis-using-python/blob/main/2-data-inspection-cleaning-wrangling.ipynb) |

PLOTTING
---

| ID | MOH | MOB | COURSE |
|--|--|--|--|
| 0 | [ ] | [ ] | [matplotlib for beginners](https://github.com/rougier/matplotlib-tutorial) |
| 1 | [ ] | [ ] | [drawing animation with matplotlib](https://www.geeksforgeeks.org/using-matplotlib-for-animations/) |
| 2 | [ ] | [ ] | [plot examples](https://github.com/mebauer/data-analysis-using-python/blob/main/3-plotting-visualizations.ipynb) |
| 3 | [ ] | [ ] | [data visualization notebook from the pandas workshop](https://github.com/stefmolin/pandas-workshop/blob/main/slides/3-data_visualization.ipynb) |
| 4 | [ ] | [ ] | [seaborn tutorial](https://github.com/clair513/Seaborn-Tutorial) |
| 5 | [ ] | [ ] | [plotly tutorial](https://www.geeksforgeeks.org/python-plotly-tutorial/) |

SCIKIT-LEARN
---

| ID | MOH | MOB | COURSE |
|--|--|--|--|
| 0 | [ ] | [ ] | [scikit-learn tutorial](https://github.com/justmarkham/scikit-learn-videos) |
| 1 | [ ] | [ ] | [scikit-learn examples](https://github.com/scikit-learn/scikit-learn/tree/main/examples) |

# NLP in Persian

### Libraries

- [Hazm](https://github.com/sobhe/hazm): Python library for digesting Persian text.

- [Parsivar](https://github.com/ICTRC/Parsivar): A Language Processing Toolkit for Persian

- [Perke](https://github.com/AlirezaTheH/perke): Perke is a Python keyphrase extraction package for the Persian language. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new models.

- [Perstem](https://github.com/jonsafari/perstem): Persian stemmer, morphological analyzer, transliterator, and partial part-of-speech tagger

- [ParsiAnalyzer](https://github.com/NarimanN2/ParsiAnalyzer): Persian Analyzer For Elasticsearch

- [virastar](https://github.com/aziz/virastar): Cleaning up Persian text!
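
Several of the libraries above clean up or normalize Persian text before further processing. As a rough illustration of what such a step involves, here is a minimal standard-library sketch. It is not how Hazm, Parsivar, or virastar implement normalization; the function name `normalize_persian` and the choice of rules (Arabic yeh and kaf mapped to their Persian forms, Arabic-Indic digits mapped to Persian digits, repeated spaces collapsed) are assumptions made for this example only.

```python
import re

# Characters that commonly appear in Persian text typed with an Arabic layout.
_CHAR_MAP = {
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}
# Arabic-Indic digits (U+0660..U+0669) -> Persian digits (U+06F0..U+06F9).
_CHAR_MAP.update({chr(0x0660 + i): chr(0x06F0 + i) for i in range(10)})

_TRANSLATION = str.maketrans(_CHAR_MAP)


def normalize_persian(text: str) -> str:
    """Apply a few illustrative normalization rules to Persian text."""
    text = text.translate(_TRANSLATION)   # unify letter and digit variants
    text = re.sub(r"[ \t]+", " ", text)   # collapse runs of spaces and tabs
    return text.strip()


# Hypothetical input: a word written with Arabic kaf, two spaces, Arabic-Indic digits.
raw = "\u0643\u062a\u0627\u0628  \u0661\u0662"
print(normalize_persian(raw))
```

For real projects, the libraries listed above handle many more cases (such as zero-width non-joiners and punctuation spacing), so this sketch is only a starting point for understanding the idea.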

### Datasets

- [Bijankhan Corpus](https://dbrg.ut.ac.ir/بیژن%E2%80%8Cخان/): Bijankhan corpus is a tagged corpus that is suitable for natural language processing research on the Persian (Farsi) language. This collection is gathered from daily news and common texts. All documents in the collection are categorized into subjects such as political, cultural and so on. In total, there are 4300 different subjects. The Bijankhan collection contains about 2.6 million manually tagged words with a tag set that contains 40 Persian POS tags.

- [Uppsala Persian Corpus (UPC)](https://sites.google.com/site/mojganserajicom/home/upc): Uppsala Persian Corpus (UPC) is a large, freely available Persian corpus. The corpus is a modified version of the Bijankhan corpus with additional sentence segmentation and consistent tokenization. It contains 2,704,028 tokens and is annotated with 31 part-of-speech tags. The part-of-speech tags are listed with explanations in [this table](https://sites.google.com/site/mojganserajicom/home/upc/Table_tag.pdf).

- [Large-Scale Colloquial Persian](http://hdl.handle.net/11234/1-3195): Large Scale Colloquial Persian Dataset (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a comprehensive problem. LSCP includes 120M sentences from 27M casual Persian tweets, with dependency relations in syntactic annotation, part-of-speech tags, sentiment polarity, and automatic translations of the original Persian sentences into English (EN), German (DE), Czech (CS), Italian (IT), and Hindi (HI). Learn more about this project at the [LSCP webpage](https://iasbs.ac.ir/~ansari/lscp/).

- [ArmanPersoNERCorpus](https://github.com/HaniehP/PersianNER): The dataset includes 250,015 tokens and 7,682 Persian sentences in total. It is available in 3 folds to be used in turn as training and test sets. Each file contains one token, along with its manually annotated named-entity tag, per line. Each sentence is separated with a newline. The NER tags are in IOB format.

- [FarsiYar PersianNER](https://github.com/Text-Mining/Persian-NER): The dataset includes about 25,000,000 tokens and about 1,000,000 Persian sentences in total, based on the [Persian Wikipedia Corpus](https://github.com/Text-Mining/Persian-Wikipedia-Corpus). The NER tags are in IOB format. More than 1000 volunteers contributed tag improvements to this dataset via a web panel or an Android app. Updated tags are released every two weeks.

- [PERLEX](http://farsbase.net/PERLEX.html): The first Persian dataset for relation extraction, which is an expert-translated version of the “Semeval-2010-Task-8” dataset. Link to the relevant publication.

- [Persian Syntactic Dependency Treebank](http://dadegan.ir/catalog/perdt): This treebank is supplied for free noncommercial use. For commercial use, contact the treebank's maintainers. It contains 29,982 annotated sentences, including samples from almost all verbs of the Persian valency lexicon.

- [Uppsala Persian Dependency Treebank (UPDT)](http://stp.lingfil.uu.se/~mojgan/UPDT.html): Dependency-based syntactically annotated corpus.

- [Hamshahri](https://dbrg.ut.ac.ir/hamshahri/): The Hamshahri collection is a standard, reliable Persian text collection that was used at the Cross Language Evaluation Forum (CLEF) in 2008 and 2009 to evaluate Persian information retrieval systems.
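
The NER datasets above describe a simple file layout: one token per line with its tag, sentences separated by an empty line, and tags in IOB format. The following is a minimal sketch of reading such a file and grouping tagged tokens into entity spans. It assumes the token and tag are separated by whitespace; the function names, the sample tokens, and the tag names (`B-PER`, `I-PER`, `B-LOC`, `O`) are made up for illustration and may differ from what a given dataset actually uses.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, tag) pairs


def read_iob(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token tag' lines into sentences; an empty line ends a sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        # Assumption: the tag is the last whitespace-separated field.
        token, tag = line.rsplit(maxsplit=1)
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Merge B-/I- tagged tokens into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label: Optional[str] = None
    for token, tag in sentence:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-"):
            tokens.append(token)  # continue the current entity
        else:
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


# Hypothetical sample in the described layout (transliterated tokens).
sample_lines = ["Ali B-PER", "Daei I-PER", "dar O", "Tehran B-LOC", "", "Salam O"]
sentences = read_iob(sample_lines)
print(extract_entities(sentences[0]))
```

To use this with a real file, pass an open file object to `read_iob`, after checking the dataset's own documentation for its exact separator and tag set.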