{"id":13514796,"url":"https://github.com/embulk/embulk","last_synced_at":"2025-05-13T22:08:43.098Z","repository":{"id":20799395,"uuid":"24084730","full_name":"embulk/embulk","owner":"embulk","description":"Embulk: Pluggable Bulk Data Loader.","archived":false,"fork":false,"pushed_at":"2024-12-03T15:04:05.000Z","size":8064,"stargazers_count":1769,"open_issues_count":159,"forks_count":201,"subscribers_count":104,"default_branch":"master","last_synced_at":"2025-04-29T21:39:27.962Z","etag":null,"topics":["bulk-loader","embulk"],"latest_commit_sha":null,"homepage":"https://www.embulk.org/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/embulk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-09-16T03:58:52.000Z","updated_at":"2025-04-20T13:40:57.000Z","dependencies_parsed_at":"2023-02-16T13:45:56.579Z","dependency_job_id":"ef8f6e7f-d2bf-4540-9a3a-d5ad52dadbc4","html_url":"https://github.com/embulk/embulk","commit_stats":{"total_commits":2514,"total_committers":55,"mean_commits":45.70909090909091,"dds":0.6225139220365951,"last_synced_commit":"4168db146038ff429f7787bca1c748dc12010732"},"previous_names":[],"tags_count":198,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk/releases","manifests_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/embulk","download_url":"https://codeload.github.com/embulk/embulk/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254036828,"owners_count":22003654,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bulk-loader","embulk"],"created_at":"2024-08-01T05:01:01.899Z","updated_at":"2025-05-13T22:08:38.075Z","avatar_url":"https://github.com/embulk.png","language":"Java","readme":"# What's Embulk?\n\nEmbulk is a parallel bulk data loader that **helps data transfer between various storages, databases, NoSQL and cloud services**.\n\n**Embulk supports plugins** to add functions. You can [share the plugins](https://plugins.embulk.org/) to keep your custom scripts readable, maintainable, and reusable.\n\n[![Embulk](https://gist.githubusercontent.com/frsyuki/f322a77ee2766a508ba9/raw/e8539b6b4fda1b3357e8c79d3966aa8148dbdbd3/embulk-overview.png)](http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed/12)\n[Embulk, an open-source plugin-based parallel bulk data loader](http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed) at Slideshare\n\n# Document\n\nEmbulk documents: https://www.embulk.org/\n\n### Using plugins\n\nYou can use plugins to load data from/to various systems and file formats. 
Here is the list of publicly released plugins: [list of plugins by category](https://plugins.embulk.org/).\n\nAn example is the [embulk-output-command](https://github.com/embulk/embulk-output-command) plugin. It executes an external command to output the records.\n\nTo install plugins, you can use the `embulk gem install \u003cname\u003e` command:\n\n```\nembulk gem install embulk-output-command\nembulk gem list\n```\n\nEmbulk bundles some built-in plugins such as `embulk-encoder-gzip` or `embulk-formatter-csv`. You can use those plugins with the following configuration file:\n\n```yaml\nin:\n  type: file\n  path_prefix: \"./try1/csv/sample_\"\n  ...\nout:\n  type: command\n  command: \"cat - \u003e task.$INDEX.$SEQID.csv.gz\"\n  encoders:\n    - {type: gzip}\n  formatter:\n    type: csv\n```\n\n### Resuming a failed transaction\n\nEmbulk supports resuming failed transactions.\nTo enable resuming, you need to start the transaction with the `-r PATH` option:\n\n```\nembulk run config.yml -r resume-state.yml\n```\n\nIf the transaction fails, Embulk stores its state in the YAML file. You can retry the transaction using exactly the same command:\n\n```\nembulk run config.yml -r resume-state.yml\n```\n\nIf you give up on resuming the transaction, you can use the `embulk cleanup` subcommand to delete intermediate data:\n\n```\nembulk cleanup config.yml -r resume-state.yml\n```\n\n### Using plugin bundle\n\nThe `embulk mkbundle` subcommand creates an isolated bundle of plugins. You can install plugins (gems) into the bundle directory instead of the ~/.embulk directory. This makes it easy to manage versions of plugins.\nTo use the bundle, add the `-b \u003cbundle_dir\u003e` option to the `guess`, `preview`, or `run` subcommand. `embulk mkbundle` also generates some example plugins as \\\u003cbundle_dir\u003e/embulk/\\*.rb files.\n\nSee the generated \\\u003cbundle_dir\u003e/Gemfile file for how plugin bundles work.\n\n```\nembulk mkbundle ./embulk_bundle  # please edit ./embulk_bundle/Gemfile to add plugins. Detailed usage is written in the Gemfile\nembulk guess -b ./embulk_bundle ...\nembulk run   -b ./embulk_bundle ...\n```\n\n## Use cases\n\n* [Scheduled bulk data loading to Elasticsearch + Kibana 5 from CSV files](https://www.embulk.org/recipes/scheduled-csv-load-to-elasticsearch-kibana5.html)\n\nFor further details, visit the [Embulk documentation](https://www.embulk.org/).\n\n## Upgrading to the latest version\n\nThe following command updates Embulk itself to a specific released version.\n\n```sh\nembulk selfupdate x.y.z\n```\n\n## Embulk Development\n\n### Build\n\n```\n./gradlew cli  # creates pkg/embulk-VERSION.jar\n```\n\nYou can see JaCoCo's test coverage report at `${project}/build/reports/tests/index.html`\nYou can see FindBugs' report at `${project}/build/reports/findbug/main.html`  # FIXME coverage information is not included somehow\n\nYou can use the `classpath` task to set up `bundle exec ./bin/embulk` for development:\n\n```\n./gradlew -t classpath  # -x test: skip test\n./bin/embulk\n```\n\nTo deploy artifacts to your local Maven repository at ~/.m2/repository/:\n\n```\n./gradlew install\n```\n\nTo compile the source code of the embulk-core project only:\n\n```\n./gradlew :embulk-core:compileJava\n```\n\nThe `dependencies` task shows the dependency tree of the embulk-core project:\n\n```\n./gradlew :embulk-core:dependencies\n```\n\n### Update JRuby\n\nModify `jrubyVersion` in `build.gradle` to update the JRuby version used by Embulk.\n\n### Release\n\n#### Prerequisite: Sonatype OSSRH\n\nYou need an account in [Sonatype OSSRH](https://central.sonatype.org/pages/ossrh-guide.html), and you must configure it in your `~/.gradle/gradle.properties`.\n\n```\nossrhUsername=(your Sonatype OSSRH username)\nossrhPassword=(your Sonatype OSSRH password)\n```\n\n#### Prerequisite: PGP signatures\n\nYou need your [PGP signatures to release artifacts into Maven Central](https://central.sonatype.org/pages/working-with-pgp-signatures.html), and you must [configure Gradle to use your key to sign](https://docs.gradle.org/current/userguide/signing_plugin.html).\n\n```\nsigning.keyId=(the last 8 symbols of your keyId)\nsigning.password=(the passphrase used to protect your private key)\nsigning.secretKeyRingFile=(the absolute path to the secret key ring file containing your private key)\n```\n\n#### Release\n\nModify `version` in `build.gradle` at a detached commit to bump the Embulk version.\n\n```\ngit checkout --detach master\n(Remove \"-SNAPSHOT\" in \"version\" in build.gradle.)\ngit add build.gradle\ngit commit -m \"Release vX.Y.Z\"\ngit tag -a vX.Y.Z\n(Write the release note for vX.Y.Z in the tag annotation.)\n./gradlew clean \u0026\u0026 ./gradlew release\ngit push -u origin vX.Y.Z\n```\n","funding_links":[],"categories":["Java","Projects","Data Ingestion","1. language","Uncategorized","II. Databases, search engines, big data and machine learning","大数据","项目","Ruby"],"sub_categories":["Utility","1.1 ruby","Uncategorized","1. Databases and storages","实用程序","Libraries"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fembulk%2Fembulk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk/lists"}