https://github.com/marcw/dgtools

Tools to work with Discogs data dumps

# dgtools

A command line utility to work with the Discogs data dumps.

It makes it super easy to:

- List data dumps
- Download a specific dump
- Convert dumps to NDJSON or Parquet
- Import a dump into a PostgreSQL database

## Usage

```
dgtools [global options] command [command options] [arguments...]
```

### Global Options

- `--discogs-bucket` - The URL of the Discogs data dumps (default: "https://discogs-data-dumps.s3.us-west-2.amazonaws.com")

## Commands

### dump

Work with Discogs data dump files.

#### dump list

List the files in the Discogs data dumps.

```
dgtools dump list [options]
```

**Options:**
- `--year` - Filter by year
- `--month` - Filter by month
- `--type` - Filter by data type
- `--no-table` - Don't print the table (output filenames only)
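A minimal Python sketch of the kind of filtering `dump list` performs. It assumes dump files follow a `discogs_YYYYMMDD_<type>.xml.gz` naming pattern (an assumption about the bucket layout, not read from dgtools' source), and `filter_dumps` is a hypothetical helper that works on object keys already fetched from the bucket.

```python
import re
from typing import Iterable, List, Optional

# Assumed naming scheme, e.g. "data/2024/discogs_20240201_releases.xml.gz".
DUMP_NAME = re.compile(r"discogs_(\d{4})(\d{2})(\d{2})_(\w+)\.")


def filter_dumps(keys: Iterable[str], year: Optional[int] = None,
                 month: Optional[int] = None,
                 dump_type: Optional[str] = None) -> List[str]:
    """Return the keys that match every filter that was given."""
    selected = []
    for key in keys:
        match = DUMP_NAME.search(key)
        if match is None:
            continue  # skip keys that don't follow the assumed naming scheme
        y, m, _day, kind = match.groups()
        if year is not None and int(y) != year:
            continue
        if month is not None and int(m) != month:
            continue
        if dump_type is not None and kind != dump_type:
            continue
        selected.append(key)
    return selected


keys = [
    "data/2024/discogs_20240101_artists.xml.gz",
    "data/2024/discogs_20240201_releases.xml.gz",
    "data/2023/discogs_20231201_releases.xml.gz",
]
print(filter_dumps(keys, year=2024, dump_type="releases"))
```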

#### dump structure

Dump the structure of an XML file.

```
dgtools dump structure [options] <file>
```

**Arguments:**
- `file` - The file to dump the structure of

**Options:**
- `--stop-after X` - Stop the analysis after X records
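One way to analyse the structure of a large XML dump is a single streaming pass with `xml.etree.ElementTree.iterparse`, counting the element paths seen inside each record. This sketch assumes records are the direct children of the root element (for example `<release>` under `<releases>`); `xml_structure` is a hypothetical name and its output is not dgtools' own format.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import BinaryIO, List, Optional


def xml_structure(stream: BinaryIO, stop_after: Optional[int] = None) -> Counter:
    """Count how often each element path (e.g. "release/artists/artist/name")
    occurs across records, stopping after `stop_after` records if given."""
    paths: Counter = Counter()
    stack: List[str] = []
    records = 0
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            if len(stack) >= 2:  # inside a record; stack[0] is the root
                paths["/".join(stack[1:])] += 1
        else:
            stack.pop()
            if len(stack) == 1:  # just closed a direct child of the root
                records += 1
                elem.clear()  # drop the finished record's subtree
                if stop_after is not None and records >= stop_after:
                    break
    return paths


sample = b"<releases><release id='1'><title>A</title></release></releases>"
print(xml_structure(io.BytesIO(sample)))
```

For a compressed dump, pass `gzip.open(path, "rb")` as the stream; `iterparse` reads it incrementally, so only the current record is held in memory.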

#### dump download

Download a Discogs data dump.

```
dgtools dump download [options] <name>
```

**Arguments:**
- `name` - The file to download

**Options:**
- `--out-dir` - The output directory (default: ".")
- `--overwrite` - Force the download even if the file already exists
- `--checksum` - Check the checksum of the file after downloading (default: true)
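A sketch of what a post-download checksum check can look like. It assumes each monthly dump ships a checksum listing in `sha256sum` format (`<hex digest>  <file name>`), which is an assumption here rather than something this README states; `sha256_of`, `expected_checksum` and `verify` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so multi-gigabyte dumps never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_checksum(checksum_text: str, filename: str) -> Optional[str]:
    """Look up `filename` in a sha256sum-style listing."""
    for line in checksum_text.splitlines():
        parts = line.split()
        # sha256sum marks binary-mode entries with a leading "*".
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    return None


def verify(path: Path, checksum_text: str) -> bool:
    expected = expected_checksum(checksum_text, path.name)
    if expected is None:
        raise ValueError(f"{path.name} is not listed in the checksum file")
    return sha256_of(path) == expected


# Tiny self-check on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "discogs_20240201_artists.xml.gz"
    demo.write_bytes(b"example")
    listing = f"{hashlib.sha256(b'example').hexdigest()}  {demo.name}\n"
    print(verify(demo, listing))
```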

#### dump convert

Convert a dump to a different format.

```
dgtools dump convert --out <output> [options] <name>
```

**Arguments:**
- `name` - The file to convert

**Options:**
- `--out` - The output file
- `--stop-after X` - Stop conversion after X records
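The README does not spell out the conversion rules, so here is a naive sketch of XML-to-NDJSON conversion, one JSON object per line. It assumes each record is a direct child of the root element; the mapping in the hypothetical `element_to_dict` (attributes and children as keys, repeated children as lists) is an illustrative choice, not necessarily what dgtools emits. Parquet output would additionally need a schema and a library such as pyarrow and is not shown.

```python
import io
import json
import xml.etree.ElementTree as ET
from typing import BinaryIO, Optional, TextIO, Union


def element_to_dict(elem: ET.Element) -> Union[dict, str]:
    """Naive mapping: attributes and child elements become keys, repeated
    children become lists, and a leaf with only text collapses to that text."""
    out: dict = dict(elem.attrib)
    for child in elem:
        value = element_to_dict(child)
        if child.tag not in out:
            out[child.tag] = value
        elif isinstance(out[child.tag], list):
            out[child.tag].append(value)
        else:
            out[child.tag] = [out[child.tag], value]
    text = (elem.text or "").strip()
    if text and not out:
        return text
    if text:
        out["text"] = text
    return out


def xml_to_ndjson(src: BinaryIO, dst: TextIO, stop_after: Optional[int] = None) -> int:
    """Write one JSON object per record and return how many were written."""
    depth = 0
    written = 0
    for event, elem in ET.iterparse(src, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth != 1:
            continue  # not a record: either a nested element or the root itself
        dst.write(json.dumps(element_to_dict(elem), ensure_ascii=False) + "\n")
        elem.clear()  # the record is serialised, so release its subtree
        written += 1
        if stop_after is not None and written >= stop_after:
            break
    return written


sample = b"<releases><release id='1'><title>A</title></release></releases>"
out = io.StringIO()
xml_to_ndjson(io.BytesIO(sample), out)
print(out.getvalue(), end="")  # {"id": "1", "title": "A"}
```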

### db

Work with a database.

**Options:**
- `--database-url` - The URL of the database to connect to (default: "postgres://$USER@localhost:5432/dgtools"; can also be set via the `DATABASE_URL` environment variable)
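The default and the environment variable imply a lookup order. A sketch of that resolution, assuming the command-line flag wins over `DATABASE_URL`, which in turn wins over the documented default (the precedence is an assumption; `resolve_database_url` is hypothetical):

```python
import getpass
import os
from typing import Mapping, Optional


def resolve_database_url(flag_value: Optional[str] = None,
                         env: Mapping[str, str] = os.environ) -> str:
    """Assumed precedence: --database-url flag, then DATABASE_URL, then the
    documented default built from the current user name."""
    if flag_value:
        return flag_value
    if env.get("DATABASE_URL"):
        return env["DATABASE_URL"]
    user = env.get("USER") or getpass.getuser()  # mirrors $USER in the default
    return f"postgres://{user}@localhost:5432/dgtools"


print(resolve_database_url(env={"USER": "alice"}))
```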

#### db prepare

Prepare the database for import by running migrations.

```
dgtools db prepare
```
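A minimal sketch of an idempotent migration runner of the kind `db prepare` describes. Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a server; the `schema_migrations` bookkeeping table and both migrations are hypothetical, not dgtools' actual schema.

```python
import sqlite3
from typing import List

# Hypothetical migrations as (version, up SQL, down SQL).
MIGRATIONS = [
    (1, "CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT)", "DROP TABLE artists"),
    (2, "CREATE TABLE releases (id INTEGER PRIMARY KEY, title TEXT)", "DROP TABLE releases"),
]


def prepare(conn: sqlite3.Connection) -> List[int]:
    """Apply every migration not yet recorded and return the versions applied.
    Expects a connection opened with isolation_level=None so the explicit
    BEGIN/COMMIT below control the transactions."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for version, up_sql, _down_sql in MIGRATIONS:
        if version in applied:
            continue  # already run: this is what makes prepare safe to repeat
        conn.execute("BEGIN")
        try:
            # SQLite, like PostgreSQL, has transactional DDL, so a failing
            # migration leaves neither the schema change nor its record behind.
            conn.execute(up_sql)
            conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
        newly_applied.append(version)
    return newly_applied


conn = sqlite3.connect(":memory:", isolation_level=None)
first_run = prepare(conn)   # applies versions 1 and 2
second_run = prepare(conn)  # nothing left to apply
print(first_run, second_run)
```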

#### db import

Import data from a dump file to the database.

```
dgtools db import <file>
```

**Arguments:**
- `file` - The file to import the data from
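A sketch of streaming a dump into a table in fixed-size batches, so memory stays flat regardless of dump size. It assumes each `<release>` carries its id as an attribute and its title as a child element, and uses `sqlite3` with a hypothetical `releases` table purely so the example runs anywhere; a PostgreSQL importer would more likely use `COPY` for bulk loading.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
from itertools import islice
from typing import BinaryIO, Iterator, Tuple


def iter_releases(stream: BinaryIO) -> Iterator[Tuple[int, str]]:
    """Stream (id, title) pairs from <release> records that are direct
    children of the root element."""
    depth = 0
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1 and elem.tag == "release":
            yield int(elem.attrib["id"]), elem.findtext("title", default="")
            elem.clear()  # free the record once its values have been yielded


def import_releases(conn: sqlite3.Connection, stream: BinaryIO,
                    batch_size: int = 1000) -> int:
    """Insert releases in batches, one transaction per batch."""
    rows = iter_releases(stream)
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return total
        with conn:  # commits the batch, or rolls it back if an insert fails
            conn.executemany("INSERT INTO releases (id, title) VALUES (?, ?)", batch)
        total += len(batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE releases (id INTEGER PRIMARY KEY, title TEXT)")
sample = b"<releases><release id='1'><title>A</title></release><release id='2'><title>B</title></release></releases>"
print(import_releases(conn, io.BytesIO(sample), batch_size=1))
```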

#### db nuke

Nuke the database by rolling back all migrations.

```
dgtools db nuke
```
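Rolling back all migrations means running each applied migration's down step, newest first. A sketch of that, with Python's `sqlite3` standing in for PostgreSQL; the migrations, the `schema_migrations` table and `nuke` itself are hypothetical.

```python
import sqlite3
from typing import List

# Hypothetical migrations as (version, up SQL, down SQL).
MIGRATIONS = [
    (1, "CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT)", "DROP TABLE artists"),
    (2, "CREATE TABLE releases (id INTEGER PRIMARY KEY, title TEXT)", "DROP TABLE releases"),
]


def nuke(conn: sqlite3.Connection) -> List[int]:
    """Run the down step of every applied migration, newest first, and return
    the versions rolled back. Expects a connection opened with
    isolation_level=None so the explicit BEGIN/COMMIT control transactions."""
    down = {version: down_sql for version, _up_sql, down_sql in MIGRATIONS}
    applied = [row[0] for row in conn.execute(
        "SELECT version FROM schema_migrations ORDER BY version DESC")]
    for version in applied:
        conn.execute("BEGIN")
        try:
            conn.execute(down[version])
            conn.execute("DELETE FROM schema_migrations WHERE version = ?", (version,))
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
    return applied


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE schema_migrations (version INTEGER PRIMARY KEY)")
for version, up_sql, _down_sql in MIGRATIONS:  # simulate a prepared database
    conn.execute(up_sql)
    conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
rolled_back = nuke(conn)  # drops releases, then artists
print(rolled_back)
```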

## LICENSE

Please see [LICENSE.md](LICENSE.md)