Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dolthub/eli5-dataset
- Host: GitHub
- URL: https://github.com/dolthub/eli5-dataset
- Owner: dolthub
- Created: 2021-05-21T22:46:04.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-05-21T23:10:00.000Z (over 3 years ago)
- Last Synced: 2024-11-08T08:32:49.529Z (about 2 months ago)
- Language: Shell
- Size: 12.7 KB
- Stars: 1
- Watchers: 5
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ELI5 Dataset
Uploaded to DoltHub: https://www.dolthub.com/repositories/max-hoffman/eli5
Information on dataset: https://facebookresearch.github.io/ELI5/download.html
HuggingFace served as the initial source for the data: https://huggingface.co/datasets/eli5
The `./scripts/populate_db.sh` script takes about 5 minutes to run:
```bash
+ dolt init
Successfully initialized dolt data repository.
+ dolt sql -q 'create table eli5_train (q_id text primary key, title text, selftext text, document text, subreddit text, answers json, title_urls json, selftext_urls json, answers_urls json)'
+ dolt table import --pk id --update-table eli5_train /Users/max-hoffman/Documents/sandbox/dolt/eli5-fb-dataset/scripts/../tmp/train.csv
Rows Processed: 502937, Additions: 502937, Modifications: 0, Had No Effect: 0
Import completed successfully.
+ dolt sql -q 'create table eli5_test (q_id text primary key, title text, selftext text, document text, subreddit text, answers json, title_urls json, selftext_urls json, answers_urls json)'
+ dolt table import --pk id --update-table eli5_test /Users/max-hoffman/Documents/sandbox/dolt/eli5-fb-dataset/scripts/../tmp/test.csv
Rows Processed: 38738, Additions: 38738, Modifications: 0, Had No Effect: 0
Import completed successfully.
+ dolt sql -q 'create table eli5_validation (q_id text primary key, title text, selftext text, document text, subreddit text, answers json, title_urls json, selftext_urls json, answers_urls json)'
+ dolt table import --pk id --update-table eli5_validation /Users/max-hoffman/Documents/sandbox/dolt/eli5-fb-dataset/scripts/../tmp/validation.csv
Rows Processed: 16994, Additions: 16994, Modifications: 0, Had No Effect: 0
Import completed successfully.
+ dolt commit -am 'Initialize and import eli5 data'
commit jaaf8f0e6sobhjtdotj73gl3v1ccamjk
Author: Max Hoffman
Date: Fri May 21 15:48:37 -0700 2021

    Initialize and import eli5 data
```
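
The trace above repeats the same two steps (create a table, import a CSV) for each of the three splits, so the flow can be sketched as a loop. This is a minimal sketch reconstructed from the trace only, not the actual contents of `./scripts/populate_db.sh`. The `TMP_DIR` and `DRY_RUN` variables and the `create_table_sql`, `run`, and `populate` helpers are hypothetical names introduced for illustration. The `dolt` commands and flags are copied from the trace as-is, including `--pk id`, and may behave differently on newer Dolt versions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumption: the CSV files live in ./tmp, as the paths in the trace suggest.
TMP_DIR="${TMP_DIR:-./tmp}"
# Hypothetical switch: with DRY_RUN=1 (the default here) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Column list copied from the create-table statements in the trace.
COLUMNS='q_id text primary key, title text, selftext text, document text, subreddit text, answers json, title_urls json, selftext_urls json, answers_urls json'

# Print the CREATE TABLE statement for one split (train, test, or validation).
create_table_sql() {
  printf 'create table eli5_%s (%s)' "$1" "$COLUMNS"
}

# Execute a command, or print it (without shell quoting) when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Initialize the repository, create and fill one table per split, then commit.
populate() {
  local split
  run dolt init
  for split in train test validation; do
    run dolt sql -q "$(create_table_sql "$split")"
    run dolt table import --pk id --update-table "eli5_$split" "$TMP_DIR/$split.csv"
  done
  run dolt commit -am 'Initialize and import eli5 data'
}

populate
```

Run as written, the sketch only prints the eight commands it would execute. Setting `DRY_RUN=0` executes them, which requires the `dolt` CLI on the `PATH` and the three CSV files present in `TMP_DIR`.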