https://github.com/jplusplus/wikileaks-cg-analyser
Cable processing tools.
https://github.com/jplusplus/wikileaks-cg-analyser
Last synced: 10 months ago
JSON representation
Cable processing tools.
- Host: GitHub
- URL: https://github.com/jplusplus/wikileaks-cg-analyser
- Owner: jplusplus
- License: gpl-3.0
- Created: 2013-11-05T16:12:47.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2013-11-10T00:19:59.000Z (over 12 years ago)
- Last Synced: 2024-04-14T04:55:28.849Z (about 2 years ago)
- Language: Python
- Size: 383 KB
- Stars: 1
- Watchers: 10
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Wikileaks: Cables analyser
This project includes 4 modules to ease the document analyse: an ngrams extractor, a string cleaner, a geo references extractor and a database connector (PostGreSQL).
## Installation
**1. Prerequisite**
```bash
sudo apt-get install build-essential git-core python python-pip python-dev
sudo pip install virtualenv
```
**2. Download the project**
```bash
git clone git@github.com:jplusplus/wikileaks-cg-analyser.git
```
**3. Install**
```bash
make install
```
## Usage
*Tokens* alway represents the string to analyse.
### String cleaner
```bash
$ python ./cleaner.py
usage: cleaner.py [-h] [-l LANGUAGE] [-c CACHE] [--slugify] [--no-slugify] tokens```
```
#### Exemple
Remove all `french` stopwords and slugify the string:
```bash
$ python ./cleaner.py -l french --slugify "Les sanglots longs des violons de l'automne blessent mon coeur d'une langueur monotone."
sanglots longs violons automne blessent coeur langueur monotone
```
### Ngrams extractor
```bash
$ python ./ngrams.py
usage: ngrams.py [-h] [-m N_MIN] [-x N_MAX] tokens
```
### Geo references extractor
```bash
$ python ./geo.py
usage: geo.py [-h] {city,country} tokens
```
## License
This software is under the [GNU GENERAL PUBLIC LICENSE v3](./LICENSE).