{"id":16578011,"url":"https://github.com/glynnbird/couchimport","last_synced_at":"2025-04-06T01:10:29.342Z","repository":{"id":18212582,"uuid":"21348369","full_name":"glynnbird/couchimport","owner":"glynnbird","description":"CouchDB import tool to allow data to be bulk inserted","archived":false,"fork":false,"pushed_at":"2024-07-31T11:52:57.000Z","size":1614,"stargazers_count":140,"open_issues_count":0,"forks_count":30,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-28T03:38:02.346Z","etag":null,"topics":["command-line","couchdb","csv","export","import"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/glynnbird.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-06-30T11:08:19.000Z","updated_at":"2024-11-10T07:10:37.000Z","dependencies_parsed_at":"2023-11-15T10:24:20.712Z","dependency_job_id":"f06828d2-8bf1-4034-b036-a68f6c7f6f4d","html_url":"https://github.com/glynnbird/couchimport","commit_stats":{"total_commits":140,"total_committers":15,"mean_commits":9.333333333333334,"dds":"0.33571428571428574","last_synced_commit":"4ee5fe3da4555d38300eb3cb8fbec065d55e1f6f"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glynnbird%2Fcouchimport","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glynnbird%2Fcouchimport/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glynnbird%2Fcouchimpor
t/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glynnbird%2Fcouchimport/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/glynnbird","download_url":"https://codeload.github.com/glynnbird/couchimport/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247419861,"owners_count":20936012,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line","couchdb","csv","export","import"],"created_at":"2024-10-11T22:12:52.130Z","updated_at":"2025-04-06T01:10:29.324Z","avatar_url":"https://github.com/glynnbird.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# couchimport\n\n## Introduction\n\nWhen populating CouchDB databases, the source data is often a file of JSON documents, or structured CSV/TSV data exported from another database.\n\n*couchimport* is designed to assist with importing such data into CouchDB efficiently. Simply pipe a file full of JSON documents into *couchimport*, telling it the URL and database to send the data to.\n\n\u003e Note: `couchimport` used to handle the CSV to JSON conversion, but this part is now handled by [csvtojsonlines](https://www.npmjs.com/package/csvtojsonlines), keeping this package smaller and easier to maintain. 
The `couchimport@1.6.5` package is the last version to support CSV/TSV natively - from 2.0 onwards, `couchimport` is only for pouring JSONL files into CouchDB.\n\n\u003e Also note: the companion CSV export utility (couchexport) is now hosted at [couchcsvexport](https://www.npmjs.com/package/couchcsvexport).\n\n## Installation\n\nInstall using npm or another Node.js package manager:\n\n```sh\nnpm install -g couchimport\n```\n\n## Usage\n\n_couchimport_ can either read JSON docs (one per line) from _stdin_ e.g.\n\n```sh\ncat myfile.json | couchimport\n```\n\nor by passing a filename as the last parameter:\n\n```sh\ncouchimport myfile.json\n```\n\n*couchimport*'s configuration parameters can be stored in environment variables or supplied as command line arguments.\n\n## Configuration - environment variables\n\nSimply set the `COUCH_URL` environment variable e.g. for a hosted Cloudant database\n\n```sh\nexport COUCH_URL=\"https://myusername:myPassw0rd@myhost.cloudant.com\"\n```\n\nand define the name of the CouchDB database to write to by setting the `COUCH_DATABASE` environment variable e.g.\n\n```sh\nexport COUCH_DATABASE=\"mydatabase\"\n```\n\nSimply pipe the text data into \"couchimport\":\n\n```sh\ncat mydata.jsonl | couchimport\n```\n\n## Configuring - command-line options\n\nSupply the `--url` and `--database` parameters as command-line parameters instead:\n\n```sh\ncouchimport --url \"http://user:password@localhost:5984\" --database \"mydata\" mydata.jsonl\n```\n\nor by piping data into _stdin_:\n\n```sh\ncat mydata.jsonl | couchimport --url \"http://user:password@localhost:5984\" --database \"mydata\" \n```\n\n## Handling CSV/TSV data\n\nWe can use another package [csvtojsonlines](https://www.npmjs.com/package/csvtojsonlines) to convert CSV/TSV files into a JSONL stream acceptable to `couchimport`:\n\n```sh\n# CSV file ----\u003e JSON lines ---\u003e CouchDB\ncat transactions.csv | csvtojsonlines --delimiter ',' | couchimport --db ledger\n```\n\n## Generating 
random data\n\n_couchimport_ can be paired with [datamaker](https://www.npmjs.com/package/datamaker) to generate any amount of sample data:\n\n```sh\n# template ---\u003e datamaker ---\u003e 100 JSON docs ---\u003e couchimport ---\u003e CouchDB\necho '{\"_id\":\"{{uuid}}\",\"name\":\"{{name}}\",\"email\":\"{{email true}}\",\"dob\":\"{{date 1950-01-01}}\"}' | datamaker -f json -i 100 | couchimport --db people\nwritten {\"docCount\":100,\"successCount\":1,\"failCount\":0,\"statusCodes\":{\"201\":1}}\nwritten {\"batch\":1,\"batchSize\":100,\"docSuccessCount\":100,\"docFailCount\":0,\"statusCodes\":{\"201\":1},\"errors\":{}}\nImport complete\n```\n\nor with the template as a file:\n\n```sh\ncat template.json | datamaker -f json -i 10000 | couchimport --db people\n```\n\n## Understanding errors\n\nWe know if we get an HTTP 4xx/5xx response, then all of the documents failed to be written to the database. But as _couchimport_ is writing data in bulk, the bulk request may get an HTTP 201 response that doesn't mean that _all_ of the documents were written. Some of the document ids may have been in the database already. So the _couchimport_ output includes counts of the number of documents that were written successfully and the number that failed, and a tally of the HTTP response codes and individual document error messages:\n\ne.g.\n\n```js\nwritten {\"batch\":10,\"batchSize\":1,\"docSuccessCount\":4,\"docFailCount\":6,\"statusCodes\":{\"201\":10},\"errors\":{\"conflict\":6}}\n```\n\nThe above log line shows that after the tenth batch of writes, we have written 4 documents and failed to write 6 others. There were six \"conflict\" errors, meaning that there was a clash of document id or id/rev combination.\n\n## Parallel writes\n\nOlder versions of _couchimport_ supported the ability to have multiple HTTP requests in flight at any one time, but the new simplified _couchimport_ does not. 
To achieve the same thing, simply split your file of JSON docs into smaller pieces and run multiple _couchimport_ jobs:\n\n```sh\n# split large file into files of 1m lines each\n# this will create files xaa, xab, xac etc\nsplit -l 1000000 massive.txt\n# find all files starting with x and using xargs,\n# spawn a max of 2 processes at once running couchimport,\n# one for each file\nfind . -name \"x*\" | xargs -t -I % -P 2 couchimport --db test %\n```\n\n## Environment variables reference\n\n* COUCH_URL - the url of the CouchDB instance (required, or to be supplied on the command line)\n* COUCH_DATABASE - the database to deal with (required, or to be supplied on the command line)\n* COUCH_BUFFER_SIZE - the number of records written to CouchDB per bulk write (defaults to 500, not required)\n* IAM_API_KEY - to authenticate using IBM IAM, set IAM_API_KEY to your API key and a bearer token will be used in the HTTP requests.\n\n## Command-line parameters reference\n\nYou can also configure `couchimport` using command-line parameters:\n\n* `--help` - show help\n* `--url`/`-u` - the url of the CouchDB instance (required, or to be supplied in the environment)\n* `--database`/`--db`/`-d` - the database to deal with (required, or to be supplied in the environment)\n* `--buffer`/`-b` - the number of records written to CouchDB per bulk write (defaults to 500, not required)\n\n## Using programmatically\n\nIn your project, add `couchimport` into the dependencies of your package.json or run `npm install --save couchimport`. 
In your code, require the library with\n\n```js\nconst couchimport = require('couchimport')\n```\n\nand set your options in an object whose keys are the same as the command-line parameters:\n\ne.g.\n\n```js\nconst fs = require('fs')\n\n// note: await must be used inside an async function (or an ES module)\nconst opts = { url: \"http://localhost:5984\", database: \"mydb\", rs: fs.createReadStream('myfile.json') }\nawait couchimport(opts)\n```\n\n\u003e Note: `rs` is the read stream from which data will be read (default: `stdin`) and `ws` is the write stream to which the output will be written (default: `stdout`)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglynnbird%2Fcouchimport","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fglynnbird%2Fcouchimport","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglynnbird%2Fcouchimport/lists"}