{"id":19651309,"url":"https://github.com/embulk/embulk-input-mongodb","last_synced_at":"2025-04-28T16:31:23.286Z","repository":{"id":38325671,"uuid":"41950168","full_name":"embulk/embulk-input-mongodb","owner":"embulk","description":"MongoDB input plugin for Embulk loads records from MongoDB.","archived":false,"fork":false,"pushed_at":"2023-12-07T14:22:58.000Z","size":323,"stargazers_count":18,"open_issues_count":6,"forks_count":17,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-17T15:13:37.475Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/embulk.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-09-05T05:24:15.000Z","updated_at":"2024-11-18T09:00:29.000Z","dependencies_parsed_at":"2024-11-11T15:06:05.276Z","dependency_job_id":"76c896ae-3dd1-4cc9-8a67-8e07dfebb777","html_url":"https://github.com/embulk/embulk-input-mongodb","commit_stats":{"total_commits":111,"total_committers":8,"mean_commits":13.875,"dds":0.5045045045045045,"last_synced_commit":"c54dfd9bedd0c0a22d936ca2fd2ffada807f7ffa"},"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-mongodb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-mongodb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-mongodb/releases","manifests_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-mongodb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/embulk","download_url":"https://codeload.github.com/embulk/embulk-input-mongodb/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251345917,"owners_count":21574806,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T15:06:00.829Z","updated_at":"2025-04-28T16:31:22.686Z","avatar_url":"https://github.com/embulk.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MongoDB input plugin for Embulk\n\n[![Build Status](https://travis-ci.org/hakobera/embulk-input-mongodb.svg)](https://travis-ci.org/hakobera/embulk-input-mongodb)\n\nMongoDB input plugin for Embulk loads records from MongoDB.\nThis plugin loads documents as single-column records (column name is \"record\"). You can use filter plugins such as [embulk-filter-expand_json](https://github.com/civitaspo/embulk-filter-expand_json) or [embulk-filter-add_time](https://github.com/treasure-data/embulk-filter-add_time) to convert the json column to typed columns. 
[Rename filter](https://www.embulk.org/docs/built-in.html#rename-filter-plugin) is also useful to rename the typed columns.\n\n## Overview\n\nThis plugin only works with embulk \u003e= 0.8.8.\n\n* **Plugin type**: input\n* **Guess supported**: no\n\n## Configuration\n\n- Connection parameters\n  One of the following is required.\n  \n  - use MongoDB connection string URI\n    - **uri**: [MongoDB connection string URI](https://docs.mongodb.org/manual/reference/connection-string/) (e.g. 'mongodb://localhost:27017/mydb') (string, required)\n  - use separated URI parameters\n    - **hosts**: list of hosts. Each entry is a pair of `host` (string, required) and `port` (integer, optional, default: 27017)\n    - **auth_method**: Auth method. One of `scram-sha-1`, `mongodb-cr`, `auto` (string, optional, default: null)\n    - **auth_source**: Auth source. The name of the database where the user is defined (string, optional, default: null)\n    - **user**: (string, optional)\n    - **password**: (string, optional)\n    - **database**: (string, required)\n    - **tls**: `true` to use TLS to connect to the host (boolean, optional, default: `false`)\n    - **tls_insecure**: `true` to disable various certificate validations (boolean, optional, default: `false`)\n      - This option is similar to an option of the official `mongo` command.\n      - See also: https://www.mongodb.com/docs/manual/reference/connection-string/#mongodb-urioption-urioption.tlsInsecure\n- **collection**: source collection name (string, required)\n- **fields**: **(deprecated)** ~~hash records that has the following two fields (array, required)~~\n  ~~- name: Name of the column~~\n  ~~- type: Column types as follows~~\n    ~~- boolean~~\n    ~~- long~~\n    ~~- double~~\n    ~~- string~~\n    ~~- timestamp~~\n- **id_field_name**: Name of the Object ID field. 
Set this if you want to change the default name `_id` (string, optional, default: \"_id\")\n- **query**: A JSON document used for [querying](https://docs.mongodb.com/manual/tutorial/query-documents/) on the source collection. Only documents that match this condition are loaded from the collection. (string, optional)\n- **projection**: A JSON document used for [projection](https://docs.mongodb.com/manual/reference/operator/projection/positional/) on query results. Only the fields selected by this projection are included in the loaded documents. (string, optional)\n- **sort**: Ordering of results (string, optional)\n- **aggregation**: Aggregation query (string, optional). See [Aggregation query](#aggregation-query) for more details.\n- **batch_size**: Limits the number of objects returned in one [batch](https://mongodb.github.io/mongo-java-driver/3.8/javadoc/com/mongodb/DBCursor.html#batchSize-int-) (integer, optional, default: 10000)\n- **incremental_field**: List of field names used for incremental loading (list, optional, can't be used with the `sort` option)\n- **last_record**: Last loaded record for incremental loading (hash, optional)\n- **stop_on_invalid_record**: Stop the bulk load transaction if a document includes an invalid record (such as an unsupported object type) (boolean, optional, default: false)\n- **json_column_name**: column name used in outputs (string, optional, default: \"record\")\n\n## Example\n\n### Authentication\n\n#### Use separated URI parameters\n\n```yaml\nin:\n  type: mongodb\n  hosts:\n  - {host: localhost, port: 27017}\n  user: myuser\n  password: mypassword\n  database: my_database\n  auth_method: scram-sha-1\n  auth_source: auth_db\n  collection: \"my_collection\"\n```\n\nIf you set `auth_method: auto`, the client will negotiate the best mechanism based on the version of the server that the client is authenticating to.\n\nIf the server version is 3.0 or higher, the driver will authenticate using the SCRAM-SHA-1 mechanism.\n\nOtherwise, the driver will authenticate using the MONGODB_CR mechanism. 
\n\n#### Use URI String\n\n```yaml\nin:\n  type: mongodb\n  uri: mongodb://myuser:mypassword@localhost:27017/my_database?authMechanism=SCRAM-SHA-1\u0026authSource=another_database\n```\n\n### Exporting all objects\n\n#### Specify with MongoDB connection string URI.\n\n```yaml\nin:\n  type: mongodb\n  uri: mongodb://myuser:mypassword@localhost:27017/my_database\n  collection: \"my_collection\"\n```\n\n#### Specify with separated URI parameters.\n\n```yaml\nin:\n  type: mongodb\n  hosts:\n  - {host: localhost, port: 27017}\n  - {host: example.com, port: 27017}\n  user: myuser\n  password: mypassword\n  database: my_database\n  collection: \"my_collection\"\n```\n\n### Filtering documents by query and projection\n\n```yaml\nin:\n  type: mongodb\n  uri: mongodb://myuser:mypassword@localhost:27017/my_database\n  collection: \"my_collection\"\n  query: '{ field1: { $gte: 3 } }'\n  projection: '{ \"_id\": 1, \"field1\": 1 }'\n  sort: '{ \"field1\": 1 }'\n```\n\n### Incremental loading\n\n```yaml\nin:\n  type: mongodb\n  uri: mongodb://myuser:mypassword@localhost:27017/my_database\n  collection: \"my_collection\"\n  query: '{ field1: { $gt: 3 } }'\n  projection: '{ \"_id\": 1, \"field1\": 1, \"field2\": 1 }'\n  incremental_field:\n    - \"field2\"\n  last_record: {\"field2\": 13215}\n```\n\nThe plugin will generate the following query and sort values.\nNote that you can't use the `incremental_field` option together with the `sort` option.\n\n```\nquery { field1: { $gt: 3 }, field2: { $gt: 13215}}\nsort {\"field2\": 1} # field2 ascending\n```\n\nWhen the field type is `ObjectId` or `DateTime`, you have to specify `last_record` in MongoDB Extended JSON notation (`$oid` or `$date`).\n\n```yaml\n# ObjectId field\nin:\n  type: mongodb\n  incremental_field:\n    - \"_id\"\n  last_record: {\"_id\": {\"$oid\": \"5739b2261c21e58edfe39716\"}}\n\n# DateTime field\nin:\n  type: mongodb\n  incremental_field:\n    - \"time_field\"\n  last_record: {\"time_field\": {\"$date\": \"2015-01-25T13:23:15.000Z\"}}\n```\n\n#### Run Incremental 
load\n\n```\n$ embulk run /path/to/config.yml -c config-diff.yml\n```\n\n### Aggregation query\n\nThis plugin supports aggregation queries, so you can write more complex queries like the one below.\n\nThe `aggregation` option can't be used together with the `sort`, `limit`, `skip`, or `query` options. Incremental loading also doesn't work with aggregation queries.\n\n```yaml\nin:\n  type: mongodb\n  aggregation: '{ $match: {\"int32_field\":{\"$gte\":5 }} }'\n```\n\nSee also [Aggregation — MongoDB Manual](https://docs.mongodb.com/manual/aggregation/) and [Aggregation Pipeline Stages — MongoDB Manual](https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/)\n\n### Advanced usage with filter plugins\n\n```yaml\nin:\n  type: mongodb\n  uri: mongodb://myuser:mypassword@localhost:27017/my_database\n  collection: \"my_collection\"\n  query: '{ \"age\": { $gte: 3 } }'\n  projection: '{ \"_id\": 1, \"age\": 1, \"ts\": 1, \"firstName\": 1, \"lastName\": 1 }'\n\nfilters:\n  # convert json column into typed columns\n  - type: expand_json\n    json_column_name: record\n    expanded_columns:\n      - {name: _id, type: long}\n      - {name: ts, type: string}\n      - {name: firstName, type: string}\n      - {name: lastName, type: string}\n\n  # rename column names\n  - type: rename\n    columns:\n      _id: id\n      firstName: first_name\n      lastName: last_name\n\n  # convert string \"ts\" column into timestamp \"time\" column\n  - type: add_time\n    from_column:\n      name: ts\n      timestamp_format: \"%Y-%m-%dT%H:%M:%S.%N%z\"\n    to_column:\n      name: time\n      type: timestamp\n```\n\n## Build\n\n```\n$ ./gradlew gem\n```\n\n## Test\n\nFirst, install Docker and Docker Compose, then run `docker-compose up -d` to launch a local MongoDB server.\nYou can then run the tests with `./gradlew test`.\n\n```sh\n$ docker-compose up -d\nCreating embulk-input-mongodb_server ... done\nCreating mongo-express               ... done\nCreating mongoClientTemp             ... 
done\n\n$ docker-compose ps\n           Name                          Command                 State                            Ports\n------------------------------------------------------------------------------------------------------------------------------\nembulk-input-mongodb_server   docker-entrypoint.sh mongod      Up           0.0.0.0:27017-\u003e27017/tcp, 0.0.0.0:27018-\u003e27018/tcp\nmongo-express                 tini -- /docker-entrypoint ...   Up           0.0.0.0:8081-\u003e8081/tcp\nmongoClientTemp               docker-entrypoint.sh mongo ...   Restarting\n\n$ ./gradlew test  # add -t to watch for file changes and rebuild continuously\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-input-mongodb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fembulk%2Fembulk-input-mongodb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-input-mongodb/lists"}