https://github.com/saidziani/sumrized
Automatic Text Summarization (English/Arabic).
arabic-nlp nlp-machine-learning text-summarization
- Host: GitHub
- URL: https://github.com/saidziani/sumrized
- Owner: saidziani
- License: gpl-3.0
- Created: 2017-12-05T22:50:35.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-07-01T11:05:46.000Z (about 8 years ago)
- Last Synced: 2024-04-23T01:55:25.539Z (about 2 years ago)
- Topics: arabic-nlp, nlp-machine-learning, text-summarization
- Language: Jupyter Notebook
- Homepage: http://sumrized.com/
- Size: 9.74 MB
- Stars: 37
- Watchers: 4
- Forks: 9
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# Arabic Text Summarization
## Starting Project
* Make sure pip for Python 3 is installed:
```text
sudo apt-get install python3-pip
```
* Make sure NLTK (Natural Language Toolkit) is installed for Python 3:
```text
sudo pip3 install -U nltk
```
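To confirm that the package is visible to the Python 3 interpreter you intend to use, a quick check along these lines may help. This is only a sketch: the helper name `is_installed` is an assumption made for illustration and is not part of this project.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if the top-level `package` can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Reports whether NLTK is visible to this interpreter.
    print("nltk installed:", is_installed("nltk"))
```

If this prints `False`, pip probably installed NLTK for a different Python interpreter than the one running the script.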
## Project content
```text
.
├── docs                    <- All documentation about the project
│   ├── reports             <- Reports on current project progress
│   ├── references          <- Reference papers and links related to this project
│   └── sphinx              <- API documentation automatically generated from docstrings in the code
│
├── lib                     <- All of the project's source code
│   ├── data-generation     <- Code for data generation, if needed
│   └── preprocessing       <- Code for data preprocessing
│
├── models                  <- Code to train, test, and run models
│   ├── dumps               <- Trained model files
│   └── scripts             <- Scripts to run models
│
├── Readme.md               <- Current project info
├── requirements.txt        <- Packages and modules needed to run the project
└── tests                   <- Unit tests for the code in lib/
    └── lib
        ├── analysis
        ├── data-generation
        └── preprocessing
```
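If you are setting up a fresh checkout, the directories in the layout above could be created with a small script such as the one below. It is a minimal sketch, not part of the repository: the function name `scaffold` and the `PROJECT_DIRS` list are assumptions, and it assumes `tests/lib` mirrors the listed subdirectories.

```python
from pathlib import Path
from typing import List

# Directory names copied from the project tree; files are not created.
PROJECT_DIRS = [
    "docs/reports",
    "docs/references",
    "docs/sphinx",
    "lib/data-generation",
    "lib/preprocessing",
    "models/dumps",
    "models/scripts",
    "tests/lib/analysis",
    "tests/lib/data-generation",
    "tests/lib/preprocessing",
]


def scaffold(root: str) -> List[Path]:
    """Ensure every project directory exists under `root` and return their paths."""
    paths = []
    for rel in PROJECT_DIRS:
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
        paths.append(path)
    return paths
```

Calling `scaffold(".")` from the project root is safe to repeat, since existing directories are left untouched.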
## Guidelines
### Data
* Symlink to your raw data:
```text
user@host:/my/awesome/project$ ln -s /path/to/your/raw/data .
```
* Data location must follow this structure:
```text
data
├── raw <- Raw data
├── temp <- Transformed data stored temporarily, if needed
└── preprocessed <- Preprocessed data ready to run in a model
```
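Before running any code that depends on this layout, it can be useful to verify that the expected directories are present. The check below is a hedged sketch; `missing_data_dirs` and `REQUIRED_SUBDIRS` are hypothetical names introduced here, not project code.

```python
from pathlib import Path
from typing import List

# Subdirectories required by the data structure described above.
REQUIRED_SUBDIRS = ("raw", "temp", "preprocessed")


def missing_data_dirs(data_root: str = "data") -> List[str]:
    """Return the required subdirectories that are absent under `data_root`."""
    root = Path(data_root)
    # Path.is_dir() follows symlinks, so a symlinked directory counts as present.
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]
```

An empty list means the layout is complete; otherwise the returned names tell you which directories still need to be created or linked.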
#### Data is immutable
* Treat data/raw (and its format) as immutable. Never edit your raw data, especially not manually, and especially not in Excel.
* Don't overwrite your raw data. Don't save multiple versions of the raw data.
* The code you write should move the raw data through a pipeline to your final analysis.
* You shouldn't have to run every step each time you want to make a new figure, but anyone should be able to reproduce the final products with only the code in lib/ and the data in data/raw.
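One way to enforce these rules in code is to have each pipeline step read from data/raw, write only to a separate output directory, and verify afterwards that the raw files are unchanged. The following is a minimal sketch under stated assumptions: the names `checksum_tree` and `run_step` are hypothetical, and the raw files are assumed to be UTF-8 text.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict


def checksum_tree(directory: Path) -> Dict[str, str]:
    """Map each file's path relative to `directory` to its SHA-256 digest."""
    return {
        str(path.relative_to(directory)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(directory.rglob("*"))
        if path.is_file()
    }


def run_step(raw_dir: Path, out_dir: Path, transform: Callable[[str], str]) -> None:
    """Apply `transform` to every raw file, writing the results under `out_dir`."""
    before = checksum_tree(raw_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for rel in before:
        text = (raw_dir / rel).read_text(encoding="utf-8")
        target = out_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(transform(text), encoding="utf-8")
    # Fail loudly if the step changed anything under the raw directory.
    if checksum_tree(raw_dir) != before:
        raise RuntimeError("raw data was modified during the pipeline step")
```

For example, `run_step(Path("data/raw"), Path("data/preprocessed"), str.lower)` would write lowercased copies to data/preprocessed while leaving data/raw untouched.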