{"id":19877454,"url":"https://github.com/mla/pg_sample","last_synced_at":"2025-05-16T04:03:40.838Z","repository":{"id":958013,"uuid":"743648","full_name":"mla/pg_sample","owner":"mla","description":"PostgreSQL utility for creating a small, sample database from a larger one","archived":false,"fork":false,"pushed_at":"2025-03-30T18:04:40.000Z","size":171,"stargazers_count":324,"open_issues_count":18,"forks_count":49,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-09T06:09:44.276Z","etag":null,"topics":["database","postgresql"],"latest_commit_sha":null,"homepage":"","language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mla.png","metadata":{"files":{"readme":"README.md","changelog":"Changes","contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2010-06-27T19:25:46.000Z","updated_at":"2025-04-03T15:46:03.000Z","dependencies_parsed_at":"2024-03-03T18:33:27.113Z","dependency_job_id":"23c31891-e0b9-4218-b903-fe4b47c3fae1","html_url":"https://github.com/mla/pg_sample","commit_stats":{"total_commits":149,"total_committers":10,"mean_commits":14.9,"dds":"0.24832214765100669","last_synced_commit":"ddede105b82831cbddfe067bd2762a97f4aa6211"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mla%2Fpg_sample","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mla%2Fpg_sample/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mla%2Fpg_sample/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/mla%2Fpg_sample/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mla","download_url":"https://codeload.github.com/mla/pg_sample/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254464891,"owners_count":22075570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","postgresql"],"created_at":"2024-11-12T16:37:29.904Z","updated_at":"2025-05-16T04:03:40.817Z","avatar_url":"https://github.com/mla.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NAME\n\npg\\_sample - extract a small, sample dataset from a larger PostgreSQL\ndatabase while maintaining referential integrity.\n\n# SYNOPSIS\n\npg\\_sample \\[ option... \\] \\[ dbname \\]\n\n# DESCRIPTION\n\npg\\_sample is a utility for exporting a small, sample dataset from a\nlarger PostgreSQL database. The output and command-line options closely\nresemble the pg\\_dump backup utility (although only the plain-text format\nis supported).\n\nThe sample database produced includes all tables from the original,\nmaintains referential integrity, and supports circular dependencies.\n\nTo build an actual instance of the sample database, the output of this script\ncan be piped to the psql utility. 
For example, assuming we have an existing\nPostgreSQL database named \"mydb\", a sample database could be constructed with:\n\n```\n$ createdb sampledb\n$ pg_sample mydb | psql -v ON_ERROR_STOP=1 sampledb\n```\n\nThe \"-v ON_ERROR_STOP=1\" option is not required but is recommended to catch any\nimport errors.\n\n\n## Requirements\n\n- PostgreSQL 8.1 or later\n- pg\\_dump should be in your search path (in order to dump the schema)\n- Perl DBI and DBD::Pg (\u003e= 2.0) modules\n\n## Installation\n\nSee the [Docker section](#using-with-docker) for details on how to\nrun pg_sample with Docker.\n\nTo install locally:\n\n1. Clone the repo. e.g.,\n    ```\n    $ git clone git@github.com:mla/pg_sample.git\n    ```\n2. Install dependencies. For Ubuntu / Mint, try:\n    ```\n    $ sudo apt install perl libdbi-perl libdbd-pg-perl\n    ```\n3. Run it.\n    ```\n    $ cd pg_sample\n    $ ./pg_sample ... # See below for options\n    ```\n\n## Command-line Options\n\n_dbname_\n\n    Specifies the database to sample. If not specified, uses the\n    environment variable PGDATABASE, if defined; otherwise, uses\n    the username of the user executing the script.\n\n__\\-a__  \n__\\--data-only__\n\n    Output only the data, not the schema (data definitions).\n\n__\\--help__\n\n    Output detailed options and exit.\n\n__\\-E__ _encoding_  \n__\\--encoding=__*encoding*\n\n    Use the specified character set encoding. If not specified, uses the\n    environment variable PGCLIENTENCODING, if defined; otherwise, uses\n    the encoding of the database.\n\n__\\-f__ _file_  \n__\\--file=__*file*\n\n    Send output to the specified file. If omitted, standard output is used.\n\n__\\--force__\n\n    Drop the sample schema if it exists.\n\n__\\--keep__\n\n    Don't delete the sample schema when the script finishes.\n\n__\\--limit=__*limit*\n\n    As a numeric value, specifies the default number of rows to copy from\n    each table (defaults to 100). 
Note that sample tables may end up with\n    significantly more rows in order to satisfy foreign key constraints.\n\n    If the value is a string, it is interpreted as a pattern/rule pair to\n    apply to matching tables. Examples:\n\n        # include all rows from the users table\n        --limit=\"users = *\"\n\n        # include 1,000 rows from the users table\n        --limit=\"users = 1000\"\n\n        # include 10% of the total rows from the users table\n        --limit=\"users = 10%\"\n\n        # include all users where the deactivated column is false\n        --limit=\"users = NOT deactivated\"\n\n        # include all rows from all tables in the forums schema\n        --limit=\"forums.* = *\"\n\n        # include 5% of the total rows from each table in the log schema\n        # and 50% from every other table\n        --limit=\"log.* = 5%, * = 50%\"\n\n    The limit option may be specified multiple times. Multiple pattern/rule\n    pairs can also be specified as a single comma-separated value. For example:\n\n        # include all rows from the ads table; otherwise default to 300 rows\n        --limit=\"ads=*,*=300\"\n\n    Rules are applied in order with the first match taking precedence.\n\n__\\--ordered__\n\n    Guarantees deterministic row ordering in the generated scripts by ordering\n    by primary key.\n\n    --ordered-desc and --ordered-asc are also available to\n    control whether the sort is descending or ascending, respectively.\n    Results are in descending order by default (newest records first).\n\n__\\--random__\n\n    Randomize the rows initially selected from each table. May significantly\n    increase the running time of the script.\n\n__\\--schema=__*name*\n\n    The schema name to export (defaults to all).\n\n__\\--sample-schema=__*name*\n\n    The schema name to use for the sample database (defaults to _pg_sample).\n\n__\\--trace__\n\n    Turn on Perl DBI tracing. 
See the DBI module documentation for details.\n\n__\\--verbose__\n\n    Output status information to standard error.\n\nThe following options control the database connection parameters.\n\n__\\-h__ _host_  \n__\\--host=__*host*\n\n    The host name to connect to. Defaults to the PGHOST environment\n    variable if not specified.\n\n__\\-p__ _port_  \n__\\--port=__*port*\n\n    The database port to connect to. Defaults to the PGPORT environment\n    variable, if set; otherwise, the default port is used.\n\n__\\-U__ _username_  \n__\\--username=__*username*\n\n    User name to connect as.\n\n__\\-W__ _password_  \n__\\--password=__*password*\n\n    Password to connect with.\n\n## Using with Docker\n\nWe support running `pg_sample` as a `docker` container:\n\n```\nsudo docker run --network=host -v \"$(pwd):/io\" mla12/pg_sample -v [option ...] --file /io/myfile.sql \u003cdbname\u003e\n```\n\n# TROUBLESHOOTING\n\n## Working with JSON Fields\n\nIf you get the following error:\n\n```\ncould not identify an equality operator for type json\n```\n\nthen one or more of your tables have `json` columns. This error occurs because PostgreSQL's `json` type does not define an equality operator. To solve this problem, you can convert these `json` columns into `jsonb` columns. If that is not feasible in your situation, an alternative is to run the [contrib/add_json_equality_operator.sql](https://github.com/mla/pg_sample/blob/master/contrib/add_json_equality_operator.sql) script against the database you are sampling; it creates helper functions for comparing `json` columns.\n\n# LICENSE\n\nThis code is released under the Artistic License. 
See [perlartistic](http://search.cpan.org/perldoc?perlartistic).\n\n# SEE ALSO\n\ncreatedb(1), pg\\_dump(1), psql(1)\n\n# AUTHOR\n\nMaurice Aubrey \u003cmaurice.aubrey@gmail.com\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmla%2Fpg_sample","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmla%2Fpg_sample","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmla%2Fpg_sample/lists"}