https://github.com/linuxscout/asmai-arabic-semantic
Asmai: Al'Asma'i arabic semantic analyzer
- Host: GitHub
- URL: https://github.com/linuxscout/asmai-arabic-semantic
- Owner: linuxscout
- License: gpl-3.0
- Created: 2020-08-06T21:42:12.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-09-02T16:26:47.000Z (8 months ago)
- Last Synced: 2024-09-25T16:58:56.870Z (7 months ago)
- Language: Python
- Size: 9.94 MB
- Stars: 5
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Authors: AUTHORS.md
README
# Asmai: (Al'asma'i) Arabic semantic analysis
Asmai (the Al'asma'i semantic library) is an Arabic semantic analysis library for Python.


Developers: Taha Zerrouki: http://tahadz.com
taha dot zerrouki at gmail dot com

Features | value
---------|---------------------------------------------------------------------------------
Authors | [Authors.md](https://github.com/linuxscout/asmai-arabic-semantic/master/AUTHORS.md)
Release | 0.1
License |[GPL](https://github.com/linuxscout/asmai-arabic-semantic/master/LICENSE)
Tracker |[linuxscout/asmai/Issues](https://github.com/linuxscout/asmai-arabic-semantic/issues)
Source |[Github](http://github.com/linuxscout/asmai-arabic-semantic)
Feedback |[Comments](https://github.com/linuxscout/asmai-arabic-semantic/)
Accounts |[@Twitter](https://twitter.com/linuxscout)

## Description
Asmai (Al'asma'i) is an Arabic semantic analysis library for Python. It extracts word pairs that carry a semantic relation of one of the following types: subject-verb, verb-object, or word composition.
### Features:
* Extracts word pairs that carry semantic relations of the types: subject, object, annexation (idafa)
### Usage
#### Install
```shell
pip install asmai
```
## Citation
```bibtex
@thesis{zerrouki2020adawat,
author = {Taha Zerrouki},
title = {Towards An Open Platform For Arabic Language Processing},
type = {PhD thesis},
institution = {Ecole Nationale Supérieure d'informatique, Alger, Algérie},
date = {2020},
}
```
#### Test
```python
import asmai.anasem as asm

text = u"يعبد الله منذ أن تطلع الشمس"
anasem = asm.SemanticAnalyzer()
result = anasem.analyze_text(text)
# the result contains objects; pretty-print them
anasem.pprint(result)
```
* Extract semantic relations, display only found relations
```python
>>> import pprint
>>> sem_result = anasem.display_sem(result)
>>> pprint.pprint(sem_result)
[[['الشَّمْسُ', 'تَطْلُعَ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
  ['الشَّمْسُ', 'تَطْلُعُ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
  ['الشَّمْسُ', 'تَطْلُعْ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
  ['الشَّمْسُ', 'تَطْلَعَ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
  ['الشَّمْسُ', 'تَطْلَعُ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
  ['الشَّمْسُ', 'تَطْلَعْ', 'شَمْسٌ', 'طَلَعَ', 'Subject']]]
```
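The six rows above differ only in the vocalization of the verb form; they all point to the same pair of lemmas. As a minimal sketch, such rows can be collapsed to unique triples with plain Python. This assumes each row has the five-field layout visible in the output (two vocalized surface forms, then their two lemmas, then the relation label). The helper name `unique_relations` is hypothetical and is not part of asmai.

```python
def unique_relations(sem_result):
    """Collapse relation rows to unique (lemma_a, lemma_b, relation) triples.

    Assumption: each row is [form_a, form_b, lemma_a, lemma_b, relation],
    matching the layout printed by display_sem in the example above.
    """
    seen = []
    for group in sem_result:  # one group per matched word pair
        for row in group:
            triple = (row[2], row[3], row[4])
            if triple not in seen:  # preserve first-seen order
                seen.append(triple)
    return seen


# Sample copied from the output above (two of the six rows).
sample = [[
    ['الشَّمْسُ', 'تَطْلُعَ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
    ['الشَّمْسُ', 'تَطْلُعُ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
]]

print(unique_relations(sample))
```

Both sample rows share the same lemmas and label, so only one triple is kept.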
* Extract semantic relation, display all words and tags
```python
>>> sem_result = anasem.display_sem(result, all=True)
>>> pprint.pprint(sem_result)
[('يعبد', 'O', []),
 ('الله', 'O', []),
 ('منذ', 'O', []),
 ('أن', 'O', []),
 ('تطلع', 'B', []),
 ('الشمس',
  'I',
  [['الشَّمْسُ', 'تَطْلُعَ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
   ['الشَّمْسُ', 'تَطْلُعُ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
   ['الشَّمْسُ', 'تَطْلُعْ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
   ['الشَّمْسُ', 'تَطْلَعَ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
   ['الشَّمْسُ', 'تَطْلَعُ', 'شَمْسٌ', 'طَلَعَ', 'Subject'],
   ['الشَّمْسُ', 'تَطْلَعْ', 'شَمْسٌ', 'طَلَعَ', 'Subject']])]
```
* Convert to pandas
```python
>>> import pandas as pd
>>>
>>> # flatten the result
... df = pd.DataFrame(anasem.decode(result))
>>> print(df.head())
action affix affix_key forced_word_case ... unvocalized unvoriginal vocalized word
0 -ي-- -ي--|المضارع المنصوب:هو:y False ... يعبد عبد يُعَبِّدَ يعبد
1 -ي-- -ي--|المضارع المجهول المجزوم:هو:y False ... يعبد عبد يُعَبَّدْ يعبد
2 -ي-- -ي--|المضارع المجهول:هو:y False ... يعبد عبد يُعَبَّدُ يعبد
3 -ي-- -ي--|المضارع المعلوم:هو:y False ... يعبد عبد يُعَبِّدُ يعبد
4 -ي-- -ي--|المضارع المجزوم:هو:y False ... يعبد عبد يُعَبِّدْ يعبد

[5 rows x 50 columns]
>>> df.to_csv("output/test.csv", encoding="utf8", sep="\t")
```
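The tagged output of `display_sem(result, all=True)` shown earlier is a list of `(word, tag, relations)` tuples, so it can also be flattened into tab-separated rows without pandas. The sketch below uses only the standard library and assumes the tuple and row layout seen in that output. The function names and the column names `form_a` and `form_b` are hypothetical labels chosen for illustration.

```python
import csv
import io

FIELDS = ["word", "tag", "form_a", "form_b", "relation"]


def flatten_tagged(tagged):
    """Yield one dict per relation row; words without relations yield one blank row."""
    for word, tag, relations in tagged:
        if not relations:
            yield {"word": word, "tag": tag, "form_a": "", "form_b": "", "relation": ""}
        for rel in relations:
            # Assumption: rel is [form_a, form_b, lemma_a, lemma_b, relation].
            yield {"word": word, "tag": tag,
                   "form_a": rel[0], "form_b": rel[1], "relation": rel[4]}


def to_tsv(tagged):
    """Serialize the flattened rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(flatten_tagged(tagged))
    return buffer.getvalue()


# Sample shaped like the all=True output shown earlier (shortened).
sample = [
    ('تطلع', 'B', []),
    ('الشمس', 'I', [['الشَّمْسُ', 'تَطْلُعُ', 'شَمْسٌ', 'طَلَعَ', 'Subject']]),
]

print(to_tsv(sample))
```

Keeping one row per relation makes the result easy to filter by the `relation` column in any spreadsheet or command-line tool.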
#### Requirements
1. pyarabic
2. sqlite
3. sylajone

## Data Structure:
### Semantic database
```sql
CREATE TABLE sqlite_sequence(name,seq);
CREATE TABLE "derivations" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL UNIQUE ,
"verb" varchar NOT NULL ,
"transitive" BOOL NOT NULL DEFAULT 1,
"derived" VARCHAR NOT NULL ,
"type" VARCHAR NOT NULL
);
```
CSV Structure:
* Derivation
1. id : unique id in the database
2. verb : vocalized verb
3. transitive : whether the verb is transitive
4. derived : word derived from the verb
5. type : derivation type

##### Semantic relations
```sql
CREATE TABLE "relations" (
"id" INTEGER PRIMARY KEY NOT NULL ,
"first" VARCHAR NOT NULL DEFAULT ('') ,
"second" VARCHAR NOT NULL DEFAULT ('') ,
"rule" VARCHAR NOT NULL DEFAULT (0)
);
```
CSV Structure:
1. id : unique id in the database
2. first : first word
3. second : second word
4. rule : the extraction rule number
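
To experiment with the two schemas above without touching the shipped database, they can be loaded into an in-memory SQLite database using Python's standard `sqlite3` module. This is only a sketch: the inserted rows, the rule value `"1"`, the type value `"example-type"`, and the helper `lookup_rule` are illustrative placeholders, not data or functions taken from asmai.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "derivations" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL UNIQUE,
    "verb" VARCHAR NOT NULL,
    "transitive" BOOL NOT NULL DEFAULT 1,
    "derived" VARCHAR NOT NULL,
    "type" VARCHAR NOT NULL
);
CREATE TABLE "relations" (
    "id" INTEGER PRIMARY KEY NOT NULL,
    "first" VARCHAR NOT NULL DEFAULT (''),
    "second" VARCHAR NOT NULL DEFAULT (''),
    "rule" VARCHAR NOT NULL DEFAULT (0)
);
""")

# Placeholder rows for illustration only.
conn.execute(
    'INSERT INTO derivations ("verb", "derived", "type") VALUES (?, ?, ?)',
    ("طَلَعَ", "طَالِعٌ", "example-type"),
)
conn.execute(
    'INSERT INTO relations ("first", "second", "rule") VALUES (?, ?, ?)',
    ("شمس", "طلع", "1"),
)


def lookup_rule(connection, first, second):
    """Return the rule stored for a (first, second) word pair, or None."""
    row = connection.execute(
        'SELECT "rule" FROM relations WHERE "first" = ? AND "second" = ?',
        (first, second),
    ).fetchone()
    return row[0] if row else None


print(lookup_rule(conn, "شمس", "طلع"))
# id is auto-assigned and transitive falls back to its default.
print(conn.execute('SELECT "id", "transitive" FROM derivations').fetchone())
```

Parameterized queries (`?` placeholders) keep the Arabic strings out of the SQL text, which avoids quoting problems.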