https://github.com/hsm207/imdb_data
Script to download IMDb data and convert it to tsv files
- Host: GitHub
- URL: https://github.com/hsm207/imdb_data
- Owner: hsm207
- License: gpl-3.0
- Created: 2019-01-17T07:47:40.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-01-17T10:55:35.000Z (over 6 years ago)
- Last Synced: 2025-01-14T11:16:58.650Z (9 months ago)
- Topics: imdb-dataset
- Language: Python
- Size: 14.6 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Introduction
This repository contains code to create a tsv file of the [IMDb dataset](http://ai.stanford.edu/~amaas/data/sentiment/) using
the [tensor2tensor](https://github.com/tensorflow/tensor2tensor) library.

# Usage
1. Create and switch to a new Python 3.6+ environment.
2. Navigate to the project's root directory.
3. Execute:
```bash
pip install -r requirements.txt
```
4. Execute:
```bash
python create_imdb_dataset.py --output_dir OUTPUT_DIR
```
where `OUTPUT_DIR` is the directory in which to save the training
and test files.
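
Once the script has finished, a quick sanity check is to count the rows in each generated file. The following is only a minimal sketch: the README does not state the output file names or column layout, so the helper names `read_tsv` and `summarize_output_dir` are hypothetical, and the code assumes the files end in `.tsv`, are UTF-8 encoded, and use plain tab separation without quoting.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Return the rows of a tab-separated file as lists of strings."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        # Assumption: fields are not quoted, so quote characters that
        # appear inside review text are kept as literal characters.
        return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def summarize_output_dir(output_dir):
    """Map each .tsv file name found in output_dir to its row count."""
    # Assumption: the generated files carry a .tsv extension.
    tsv_paths = sorted(Path(output_dir).glob("*.tsv"))
    return {path.name: len(read_tsv(path)) for path in tsv_paths}
```

Calling `summarize_output_dir("OUTPUT_DIR")` with the directory passed to `--output_dir` returns a dictionary of file names and row counts, which can be compared against the sizes of the training and test splits you expect.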