https://github.com/fracpete/reuters21578
Python script to convert the Reuters-21578 dataset into an ARFF file for MEKA.
meka multi-label-classification python3 text-classification
- Host: GitHub
- URL: https://github.com/fracpete/reuters21578
- Owner: fracpete
- License: mit
- Created: 2022-05-29T04:17:27.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-06-08T02:56:33.000Z (about 4 years ago)
- Last Synced: 2024-10-19T12:15:54.302Z (over 1 year ago)
- Topics: meka, multi-label-classification, python3, text-classification
- Language: Python
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# reuters21578
Python script to convert the Reuters-21578 dataset into an ARFF file for MEKA.
## Original data
Download the original data from the following page and extract the archive:
http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html
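For orientation, the extracted archive contains SGML files (`.sgm`) in which each article is wrapped in a `<REUTERS>` element, with its topics listed as `<D>` entries inside `<TOPICS>` and the article text inside `<BODY>`. The following is a minimal sketch of pulling topics and body text out of such records with regular expressions. It is not the code used by this repository; the function name `parse_records` and the sample record are hypothetical, and real files may need a permissive encoding (for example `latin-1`) when read from disk.

```python
import html
import re

# Patterns for the parts of a record that matter for multi-label classification.
RECORD_RE = re.compile(r"<REUTERS\b.*?</REUTERS>", re.DOTALL)
TOPICS_RE = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.DOTALL)
LABEL_RE = re.compile(r"<D>(.*?)</D>")
BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.DOTALL)


def parse_records(sgml):
    """Yield (topics, body) tuples for each <REUTERS> record in the text."""
    for record in RECORD_RE.findall(sgml):
        topics_match = TOPICS_RE.search(record)
        topics = LABEL_RE.findall(topics_match.group(1)) if topics_match else []
        body_match = BODY_RE.search(record)
        # Records without a <BODY> element yield an empty string.
        body = html.unescape(body_match.group(1)).strip() if body_match else ""
        yield topics, body


# Hypothetical, hand-written record used only for illustration.
sample = (
    '<REUTERS TOPICS="YES" NEWID="1">'
    "<TOPICS><D>cocoa</D><D>coffee</D></TOPICS>"
    "<TEXT><TITLE>EXAMPLE</TITLE><BODY>Prices rose &lt;1 pct&gt; today.</BODY></TEXT>"
    "</REUTERS>"
)
records = list(parse_records(sample))
print(records)
```

Each record can carry zero, one, or several topics, which is why the dataset is treated as a multi-label problem.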
## Install
* Clone repository
```bash
git clone https://github.com/fracpete/reuters21578.git
```
* Change into repo
```bash
cd reuters21578
```
* Create virtual environment and install the package
```bash
virtualenv -p /usr/bin/python3 venv
./venv/bin/pip install .
```
## Generate ARFF
You can use the `reuters-generate` console script to generate the ARFF file:
```
usage: reuters-generate [-h] -t FILE -d DIR -o FILE

Generates a MEKA ARFF file from the Reuters 21578 SGML files.

optional arguments:
  -h, --help            show this help message and exit
  -t FILE, --topics_file FILE
                        the file with all the topics (one per line).
  -d DIR, --data_dir DIR
                        the directory with the .sgm files.
  -o FILE, --output_file FILE
                        the ARFF file to generate.
```
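To illustrate what a MEKA ARFF file looks like, here is a minimal sketch that writes one binary attribute per topic followed by a string attribute for the text. MEKA reads the number of label attributes from the `-C` option in the relation name. The exact attribute layout and names produced by `reuters-generate` may differ; `quote`, `to_meka_arff`, the `text` attribute name, and the sample data are all assumptions made for this example.

```python
def quote(value):
    """Quote a string for ARFF, escaping backslashes, quotes and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace("'", "\\'")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return "'" + escaped + "'"


def to_meka_arff(all_topics, documents, relation="reuters21578"):
    """Build ARFF text: one {0,1} attribute per topic, then one string attribute."""
    # "-C N" tells MEKA that the first N attributes are the labels.
    lines = ["@relation " + quote(f"{relation}: -C {len(all_topics)}"), ""]
    for topic in all_topics:
        lines.append(f"@attribute {quote(topic)} {{0,1}}")
    lines.append("@attribute text string")
    lines.extend(["", "@data"])
    for topics, text in documents:
        present = set(topics)
        flags = ["1" if topic in present else "0" for topic in all_topics]
        lines.append(",".join(flags + [quote(text)]))
    return "\n".join(lines) + "\n"


# Hypothetical topics and documents, not taken from the real dataset.
arff = to_meka_arff(
    ["cocoa", "coffee", "grain"],
    [(["cocoa", "coffee"], "Prices rose today."), ([], "No topics here.")],
)
print(arff)
```

Placing the label attributes first keeps the file compatible with the positive `-C` convention, so no additional class-index configuration should be needed when loading it in MEKA.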