https://github.com/saidziani/sumrized

Automatic Text Summarization (English/Arabic).
arabic-nlp nlp-machine-learning text-summarization

Host: GitHub
URL: https://github.com/saidziani/sumrized
Owner: saidziani
License: gpl-3.0
Created: 2017-12-05T22:50:35.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2018-07-01T11:05:46.000Z (about 8 years ago)
Last Synced: 2024-04-23T01:55:25.539Z (about 2 years ago)
Topics: arabic-nlp, nlp-machine-learning, text-summarization
Language: Jupyter Notebook
Homepage: http://sumrized.com/
Size: 9.74 MB
Stars: 37
Watchers: 4
Forks: 9
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.md

README

# Arabic Text Summarization

## Starting Project

* Make sure pip for Python 3 is installed
```text
sudo apt-get install python3-pip
```

* Make sure NLTK (Natural Language Toolkit) is installed for Python 3
```text
sudo pip3 install -U nltk
```
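
To confirm that the install is visible to the interpreter you plan to use, a minimal check could look like the sketch below. It relies only on the standard library; the helper name `is_installed` is made up for this illustration and is not part of the project.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the current interpreter can locate `package_name`."""
    # find_spec looks up a top-level module without importing it.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("nltk"):
        print("NLTK is available")
    else:
        print("NLTK is missing; rerun the pip command above")
```

Run it with `python3` so that the check uses the same interpreter that pip installed into.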

## Project content

```text
.
├── docs <- All documentation about the project
│   ├── reports <- Reports on current project progress
│   ├── references <- All reference papers and links related to this project go here
│   └── sphinx <- Automatically generated API documentation from docstrings in the code
│
├── lib <- All of the project's source code goes here
│   ├── data-generation <- Code for data generation, if needed
│   └── preprocessing <- Code for data preprocessing
│
├── models <- Code to train, test, and run models
│   ├── dumps <- Trained model files
│   └── scripts <- Scripts to run models
│
├── Readme.md <- Current project info
├── requirements.txt <- Packages and modules needed for the project to run
└── tests <- Unit tests for the code in lib/
    └── lib
        ├── analysis
        ├── data-generation
        └── preprocessing
```

## Guidelines

### Data

* Symlink to your raw data
```text
user@host:/my/awesome/project$ ln -s /path/to/your/raw/data .
```

* Data location must follow this structure:

```text
data
├── raw <- Raw data
├── temp <- transformed data stored temporarily if needed
└── preprocessed <- preprocessed data to run in a model
```
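
The layout above can be checked before running any model code. The following is a small sketch, assuming the `data` directory sits in the current working directory; the function name `missing_data_dirs` is hypothetical and not taken from the repository.

```python
from pathlib import Path
from typing import List

# Subdirectories named in the layout above.
EXPECTED_SUBDIRS = ("raw", "temp", "preprocessed")


def missing_data_dirs(data_root: str = "data") -> List[str]:
    """Return the expected subdirectories that do not exist under data_root."""
    root = Path(data_root)
    # is_dir() follows symlinks, so a symlinked directory also counts.
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    print("Missing:", ", ".join(missing) if missing else "nothing")
```
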
#### Data is immutable
* Treat data/raw (and its format) as immutable. Don't ever edit your raw data, especially not manually, and especially not in Excel.
* Don't overwrite your raw data. Don't save multiple versions of the raw data.
* The code you write should move the raw data through a pipeline to your final analysis.
* You shouldn't have to run all of the steps every time you want to make a new figure, but anyone should be able to reproduce the final products with only the code in lib/ and the data in data/raw.
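
As an illustration of these rules, a preprocessing step can read from data/raw and write only to data/preprocessed. This is a minimal sketch, not the project's actual pipeline: the function name `run_preprocessing`, the `.txt` file pattern, and the whitespace-normalizing transform in the usage comment are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, List


def run_preprocessing(raw_dir: str, out_dir: str,
                      transform: Callable[[str], str]) -> List[Path]:
    """Apply `transform` to every .txt file in raw_dir and write results to out_dir."""
    source = Path(raw_dir)
    target = Path(out_dir)
    # Refuse to write into the raw directory so raw data is never overwritten.
    if source.resolve() == target.resolve():
        raise ValueError("out_dir must differ from raw_dir")
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for raw_file in sorted(source.glob("*.txt")):
        # Raw files are only read; nothing is written back to raw_dir.
        text = raw_file.read_text(encoding="utf-8")
        out_file = target / raw_file.name
        out_file.write_text(transform(text), encoding="utf-8")
        written.append(out_file)
    return written


# Example call (placeholder transform that collapses runs of whitespace):
# run_preprocessing("data/raw", "data/preprocessed",
#                   lambda text: " ".join(text.split()))
```

Because the output is derived entirely from data/raw and the code, deleting data/preprocessed and rerunning the step should reproduce it.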
